IAM: Identity-Aware Human Motion and Shape Joint Generation

Abstract

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions.

We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics.

Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate improved motion realism and motion-identity consistency while maintaining high motion quality.

Framework

(a) Data Processing Pipeline: We extract motion sequences M, shape parameters β, and multimodal identity descriptions (text T_i and image I_i) from diverse sources (in-the-wild videos or MoCap data).
(b) Motion-Shape Generation: A multimodal identity conditioning framework integrates textual and visual priors through frozen encoders to jointly generate identity-consistent motion sequences and body shapes via a diffusion model.
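To make "jointly generate motion and shape via a diffusion model" concrete, here is a minimal sketch under stated assumptions. It is not IAM's implementation: the flat concatenation of a T×D motion sequence with a 10-dimensional SMPL-style shape vector, the linear DDPM variance schedule, and the helper names (pack, unpack, alpha_bars, q_sample) are all illustrative choices. The point is that once motion and β live in one vector, a single forward-noising process (and therefore a single denoiser) covers both.

```python
import math
import random

def pack(motion, shape):
    """Flatten a T x D motion sequence and a shape vector into one diffusion target (assumed layout)."""
    flat = [v for frame in motion for v in frame]
    return flat + list(shape)

def unpack(x, num_frames, feat_dim):
    """Split a joint vector back into (motion, shape)."""
    n = num_frames * feat_dim
    motion = [x[i * feat_dim:(i + 1) * feat_dim] for i in range(num_frames)]
    return motion, x[n:]

def alpha_bars(num_steps, start=1e-4, end=0.02):
    """Cumulative products of (1 - variance) for an assumed linear DDPM variance schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        var = start + (end - start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - var
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) for the joint motion-shape vector."""
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, noise)]
    return xt, noise

# Toy usage: 2 frames of 3 features plus 10 shape coefficients (sizes are hypothetical).
motion = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
shape = [0.0] * 10
x0 = pack(motion, shape)
abar = alpha_bars(1000)
xt, eps = q_sample(x0, t=500, abar=abar, rng=random.Random(0))
noisy_motion, noisy_shape = unpack(xt, num_frames=2, feat_dim=3)
```

Because shape and motion are noised and denoised together, the model can learn their correlation directly rather than predicting motion for a fixed body; the identity and motion conditions would enter the denoiser, which is not shown here.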

Qualitative Results on HumanML3D

We compare IAM (Diffusion) with its VQ-based variant and with Shape My Moves. Each case uses the same motion prompt and identity prompt for all methods.

Motion Prompt:

"a person throws something with their right hand hard"

Identity Prompt:

"a male with a lean, athletic build"

Ours

VQ-based

Shape My Moves

Motion Prompt:

"a person jumps and spin in the air"

Identity Prompt:

"a female with a short, average build"

Ours

VQ-based

Shape My Moves

Motion Prompt:

"a person goes into a short jog before stopping"

Identity Prompt:

"a male with a lean, average build"

Ours

VQ-based

Shape My Moves

Motion Prompt:

"a person kicks with the left foot"

Identity Prompt:

"a female with a slim build. she is tall, with long torso and legs"

Ours

VQ-based

Shape My Moves

Zero-shot Generation on Unseen Prompts

Side-by-side comparison between Shape My Moves and IAM on unseen identities. Following the paper's setup, each example conditions on a text identity description together with a reference image (an identity keyframe from the source video).
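The conditioning described above pairs a text description with a reference image. As a minimal sketch, assuming each frozen encoder has already produced a fixed-length embedding (the specific encoders and the fusion rule IAM uses are not given on this page), the two could be combined into one identity vector by L2-normalizing each modality and concatenating them, with zeros standing in for a missing image. The function names here are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse_identity(text_emb, image_emb=None, image_dim=None):
    """Build one identity-conditioning vector from a text embedding and an optional image embedding.

    Assumption (not from the paper): normalize each modality, then concatenate.
    A missing image is replaced by zeros so the conditioning size stays fixed.
    """
    text_part = l2_normalize(text_emb)
    if image_emb is None:
        if image_dim is None:
            raise ValueError("image_dim is required when image_emb is missing")
        image_part = [0.0] * image_dim
    else:
        image_part = l2_normalize(image_emb)
    return text_part + image_part

cond = fuse_identity([3.0, 4.0], [0.0, 2.0])  # toy 2-d embeddings
print(cond)  # [0.6, 0.8, 0.0, 1.0]
```

Normalizing per modality keeps one encoder's embedding scale from dominating the other; zero-filling is one simple way to allow text-only identity prompts, such as the HumanML3D cases above, to share the same conditioning interface.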

Motion Prompt:

"A person jogs forward."

Identity Prompt:

Reference image for identity conditioning (Case 1).

"A young adult female with a plus-size, heavy build."

Shape My Moves

Ours

Motion Prompt:

"A person sits on a bicycle and pedals forward, their legs moving in a continuous, alternating circular motion as they hold the handlebars and ride along a path."

Identity Prompt:

Reference image for identity conditioning (Case 2).

"A young adult female with a slim, athletic build. She has dark hair tied back in a ponytail."

Shape My Moves

Ours

Motion Prompt:

"A person stands holding a long-handled rake. They push the rake forward into a smoking pile on the ground, then pull it back, spreading the ashes. They repeat this motion."

Identity Prompt:

Reference image for identity conditioning (Case 3).

"An older adult male with a stocky build, bald on top with hair on the sides."

Shape My Moves

Ours

Motion Prompt:

"A person walks at a medium pace across a vast, flat, snow-covered frozen lake on an overcast day."

Identity Prompt:

Reference image for identity conditioning (Case 4).

"An older adult male with a white beard, of average height and build. He is wearing heavy winter clothing, including a parka, snow pants, a hat, and gloves, and is carrying a large backpack."

Shape My Moves

Ours

Identity-Controllable Generation

Given the same motion prompt, IAM generates diverse identity-consistent motions for different body types.
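One way to run such a comparison so that differences between outputs can be attributed to identity alone is to hold both the motion prompt and the sampling noise seed fixed and vary only the identity condition. The page does not say whether IAM's figures were produced this way; the sketch below assumes a hypothetical sampler interface generate(motion_prompt, identity_prompt, rng) and uses a stand-in toy sampler purely for demonstration.

```python
import random

def identity_sweep(generate, motion_prompt, identity_prompts, seed=0):
    """Generate one sample per identity, holding the motion prompt and noise seed fixed.

    `generate(motion_prompt, identity_prompt, rng)` is a hypothetical sampler interface.
    """
    results = {}
    for identity in identity_prompts:
        rng = random.Random(seed)  # fresh generator with the same seed for every identity
        results[identity] = generate(motion_prompt, identity, rng)
    return results

def toy_generate(motion_prompt, identity_prompt, rng):
    """Stand-in sampler: echoes its inputs and its first noise draw."""
    return (motion_prompt, identity_prompt, rng.random())

samples = identity_sweep(
    toy_generate,
    "A person walks forward holding bricks with both hands.",
    ["Young adult female, slender build", "Muscular adult male"],
)
```

With the toy sampler, every identity receives the same first noise draw, which confirms that only the identity condition changes between calls.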

"A person hesitantly walks across a wobbly rope bridge at an outdoor adventure park, holding onto the overhead safety harness for balance."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

"A person performs a cheerful dance on a white background, swaying their hips and moving their arms."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

"A person walks forward holding bricks with both hands."

Young adult female, slender build

Young adult male, slender build

Muscular adult male

Older adult female, heavy-set build

Older adult male, overweight build

BibTeX

@article{jia2026iam,
  author    = {Jia, Wenqi and Li, Zekun and Mittal, Abhay and Tang, Chengcheng and Guo, Chuan and Wang, Lezi and Rehg, James M. and Tao, Lingling and An, Sizhe},
  title     = {IAM: Identity-Aware Human Motion and Shape Joint Generation},
  journal   = {arXiv preprint},
  year      = {2026},
}

Identity-Consistent Motion Generation enables decoupled control of action dynamics and subject morphology. Given identity cues and motion prompts, the model synthesizes diverse body shapes while producing motions that remain physically consistent with body morphology.
