IAM: Identity-Aware Human Motion and Shape Joint Generation

UIUC · Meta Reality Labs · Brown University
IAM teaser figure showing identity-consistent motion generation.

Identity-Consistent Motion Generation enables decoupled control of action dynamics and subject morphology. Given identity cues and motion prompts, the model synthesizes diverse body shapes together with motions that remain physically consistent with each subject's morphology.

Abstract

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions.

We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics.

Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate that our approach improves motion realism and motion-identity consistency while maintaining high overall motion quality.

Framework

Figure 2: Overview of the proposed framework.

(a) Data Processing Pipeline: We extract motion sequences M, shape parameters β, and multimodal identity descriptions (Ti, Ii) from diverse sources (in-the-wild videos or MoCap data).
(b) Motion-Shape Generation: A multimodal identity conditioning framework integrates textual and visual priors through frozen encoders to jointly generate identity-consistent motion sequences and body shapes via a diffusion model.
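The data flow above can be sketched in a few lines: fuse frozen text/image identity features into one conditioning vector, then run a reverse diffusion process over a *joint* sample that stacks the motion sequence M and the shape parameters β, so identity cues modulate both together. Everything here is an illustrative assumption (toy dimensions, an averaging fusion, a stand-in linear denoiser, a simplified update rule), not the paper's actual architecture or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's actual config):
# frames, per-frame motion dim, shape (beta) dim, identity embedding dim.
T, D_M, D_B, D_ID = 16, 32, 10, 64

def identity_embed(text_feat, image_feat):
    """Fuse frozen text/image encoder outputs into one identity embedding.
    Here a simple average; the paper learns multimodal conditioning."""
    return 0.5 * (text_feat + image_feat)

def denoiser(x, t, c):
    """Stand-in noise predictor over the joint motion+shape vector.
    A real model would be a conditional diffusion network; a fixed
    linear map of the identity condition suffices to show the wiring."""
    W = np.ones((x.size, c.size)) * 1e-3        # hypothetical conditioning map
    return np.tanh(0.1 * x) + (W @ c).reshape(x.shape) + 0.0 * t

def sample_joint(id_text, id_image, steps=4):
    """Generate motion M and shape beta jointly from identity cues."""
    c = identity_embed(id_text, id_image)
    x = rng.standard_normal(T * D_M + D_B)      # joint noise: motion ++ shape
    for t in reversed(range(steps)):            # simplified reverse process
        x = x - 0.1 * denoiser(x, t, c)         # gradient-like denoising step
    motion = x[: T * D_M].reshape(T, D_M)       # M: motion sequence (T x D_M)
    betas = x[T * D_M:]                         # beta: body shape parameters
    return motion, betas

motion, betas = sample_joint(np.zeros(D_ID), np.zeros(D_ID))
print(motion.shape, betas.shape)                # (16, 32) (10,)
```

The key design point the sketch mirrors is that motion and shape live in one denoised sample rather than being predicted by separate heads, which is what lets the identity condition shape the dynamics directly.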

Qualitative Results on HumanML3D

We compare IAM (Diffusion) with VQ-based IAM and Shape My Moves. Each case keeps the same motion prompt and identity prompt across methods.

Zero-shot Generation on Unseen Prompts

Side-by-side comparison between Shape My Moves and IAM on unseen identities. Identity conditioning follows the paper: each example uses a text identity description together with a reference image (identity keyframe from the source video).

Identity-Controllable Generation

Given the same motion prompt, IAM generates diverse identity-consistent motions for different body types.

"A person hesitantly walks across a wobbly rope bridge at an outdoor adventure park, holding onto the overhead safety harness for balance."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

"A person performs a cheerful dance on a white background, swaying their hips and moving their arms."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

"A person walks forward holding bricks with both hands."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

BibTeX

@article{jia2026iam,
  author    = {Jia, Wenqi and Li, Zekun and Mittal, Abhay and Tang, Chengcheng and Guo, Chuan and Wang, Lezi and Rehg, James M. and Tao, Lingling and An, Sizhe},
  title     = {IAM: Identity-Aware Human Motion and Shape Joint Generation},
  journal   = {arXiv preprint},
  year      = {2026},
}