DexGrasp-Zero

Abstract

To meet the demands of increasingly diverse dexterous hand hardware, it is crucial to develop a policy that enables zero-shot cross-embodiment grasping without redundant re-learning. Cross-embodiment alignment is challenging due to heterogeneous hand kinematics and physical constraints. Existing approaches typically predict intermediate motion targets and retarget them to each embodiment, which may introduce errors and violate embodiment-specific limits, hindering transfer across diverse hands. To overcome these limitations, we propose DexGrasp-Zero, a policy that learns universal grasping skills from diverse embodiments, enabling zero-shot transfer to unseen hands. We first introduce a morphology-aligned graph representation that maps each hand's kinematic keypoints to anatomically grounded nodes and equips each node with tri-axial orthogonal motion primitives, enabling structural and semantic alignment across different morphologies. Relying on this graph-based representation, we design a Morphology-Aligned Graph Convolutional Network (MAGCN) to encode the graph for policy learning. MAGCN incorporates a Physical Property Injection mechanism that fuses hand-specific physical constraints into the graph features, enabling adaptive compensation for varying link lengths and actuation limits for precise and stable grasping. Our extensive simulation evaluations on the YCB dataset demonstrate that our policy, jointly trained on four heterogeneous hands (Allegro, Shadow, Schunk, Ability), achieves an 85% zero-shot success rate on unseen hardware (LEAP, Inspire), outperforming the state-of-the-art method by 59.5%. Real-world experiments further evaluate our policy on three robot platforms (LEAP, Inspire, Revo2), achieving an 82% average success rate on unseen objects.

Paradigm comparison figure. **Paradigm shift: prior approaches versus our method.** (a) Prior paradigm: Existing methods train on a simplified, lossy unified state space. They output intermediate motion targets that hand-specific retargeting models must then convert into physical joint commands. This adds complexity and can produce kinematically infeasible actions. (b) Our paradigm: We learn a single universal policy end-to-end. The policy operates on a lossless morphology-aligned graph representation and outputs actions in a hand-agnostic motion-primitive space. Physical commands are generated directly through a fixed hand-specific mapping $\mathcal{M}_h$, removing the need for trainable retargeting modules. (c) Real-world deployment on unseen hands validates the effectiveness and zero-shot transfer capability of our approach.
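The fixed mapping $\mathcal{M}_h$ in panel (b) can be pictured as a lookup table from (node, primitive) pairs to a hand's joints, followed by rescaling into each joint's range. The sketch below is only an illustration under stated assumptions: it assumes primitive-space actions normalized to [-1, 1], and the node names, joint names, joint limits, and the function `apply_fixed_mapping` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass

PRIMITIVES = ("flexion", "abduction", "axial_rotation")


@dataclass(frozen=True)
class JointSpec:
    """One actuated joint of a specific hand (illustrative values only)."""
    name: str
    lower: float  # lower joint limit in radians
    upper: float  # upper joint limit in radians


# Hypothetical fixed table for a toy finger: (node, primitive) -> joint.
# Nothing in this table is learned; it is written once per hand.
TOY_HAND_MAPPING = {
    ("index_proximal", "flexion"): JointSpec("index_mcp_flex", 0.0, 1.6),
    ("index_proximal", "abduction"): JointSpec("index_mcp_abd", -0.4, 0.4),
}


def apply_fixed_mapping(actions, mapping):
    """Convert primitive-space actions in [-1, 1] into joint targets.

    Primitives the hand cannot realize (absent from `mapping`) are skipped,
    and every target stays inside the joint's limits by construction.
    """
    targets = {}
    for key, value in actions.items():
        joint = mapping.get(key)
        if joint is None:
            continue  # this hand has no joint for that primitive
        clipped = max(-1.0, min(1.0, value))
        mid = 0.5 * (joint.upper + joint.lower)
        half_range = 0.5 * (joint.upper - joint.lower)
        targets[joint.name] = mid + clipped * half_range
    return targets


policy_output = {
    ("index_proximal", "flexion"): 1.0,
    ("index_proximal", "abduction"): 0.0,
    ("index_proximal", "axial_rotation"): 0.7,  # no such joint on the toy hand
}
joint_targets = apply_fixed_mapping(policy_output, TOY_HAND_MAPPING)
```

Because the table is fixed and the rescaling respects joint limits, no trainable retargeting step is involved in this sketch; whether the actual policy emits absolute targets or increments is not stated on this page.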

Videos

Real-world Deployment

Trained on Allegro, Shadow, Ability, and Schunk hands; zero-shot deployment in the real world on LEAP, Inspire, and Revo2.

Real-world evaluation props (10 objects) used in our hardware experiments.

Inspire Hand (real-world)

LEAP Hand (real-world)

Revo2 Hand (real-world)

Simulation

Simulation grasping results across all hands.

All Hands (simulation)

Method

Universal hand representation. (a) Morphology-Aligned State Graph Representation: nodes correspond to anatomical units, edges follow kinematic chains, yielding a hand-agnostic semantic graph structure. (b) Schematic of three motion primitives (Flexion, Abduction, Axial Rotation) on a Schunk hand, showing their physical motion effects at representative joints.
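As a rough illustration of panel (a), the following sketch places one hand's keypoints on a shared node template whose edges follow each digit's kinematic chain. The template (a palm node plus three segments per digit), the node names, and the per-node primitive mask are assumptions made for this example; the mask is only loosely in the spirit of the activation mask mentioned in the ablations, whose exact definition is not given on this page.

```python
PRIMITIVES = ("flexion", "abduction", "axial_rotation")

# Hypothetical shared template: one palm node plus three segments per digit.
DIGITS = ("thumb", "index", "middle", "ring", "little")
SEGMENTS = ("proximal", "middle", "distal")
NODES = ["palm"] + [f"{d}_{s}" for d in DIGITS for s in SEGMENTS]
NODE_INDEX = {name: i for i, name in enumerate(NODES)}


def kinematic_edges():
    """Edges follow each digit's chain: palm -> proximal -> middle -> distal."""
    edges = []
    for digit in DIGITS:
        chain = ["palm"] + [f"{digit}_{s}" for s in SEGMENTS]
        edges += [(NODE_INDEX[a], NODE_INDEX[b]) for a, b in zip(chain, chain[1:])]
    return edges


def build_hand_graph(keypoints, primitive_support):
    """Place one hand's data on the shared template.

    keypoints:         node name -> (x, y, z) keypoint position
    primitive_support: node name -> set of primitives the hand can actuate there
    Nodes the hand lacks get zero features and are masked out.
    """
    features, node_mask, primitive_mask = [], [], []
    for name in NODES:
        present = name in keypoints
        features.append(list(keypoints.get(name, (0.0, 0.0, 0.0))))
        node_mask.append(present)
        supported = primitive_support.get(name, set()) if present else set()
        primitive_mask.append([p in supported for p in PRIMITIVES])
    return {
        "features": features,
        "node_mask": node_mask,
        "primitive_mask": primitive_mask,
        "edges": kinematic_edges(),
    }


# Toy four-fingered hand: no little finger, and no axial rotation anywhere.
toy_keypoints = {n: (0.0, 0.0, 0.0) for n in NODES if not n.startswith("little")}
toy_support = {
    "index_proximal": {"flexion", "abduction"},
    "index_middle": {"flexion"},
}
graph = build_hand_graph(toy_keypoints, toy_support)
```

Every hand yields arrays of the same shape under this scheme, so a single network can consume them; hands with fewer digits simply leave the corresponding nodes masked.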

Results

Method	Variant	Allegro (train)	Shadow (train)	Ability (train)	Schunk (train)	LEAP (unseen)	Inspire (unseen)	Avg. seen	Avg. unseen
CrossDex	per-object	0.81	0.85	0.90	0.90	0.34	0.44	0.865	0.39
CrossDex	multi-object	0.39	0.69	0.42	0.60	0.19	0.34	0.525	0.265
DexGrasp-Zero (Ours)	w/o motion primitives	0.52	0.51	0.64	0.59	0.39	0.29	0.565	0.34
DexGrasp-Zero (Ours)	w/o $\mathcal{G}_{\text{physical}}$ priors	0.91	0.89	0.90	0.84	0.82	0.79	0.885	0.805
DexGrasp-Zero (Ours)	w/o $M_{\text{activation}}$ & $r_{\text{pen}}$	0.92	0.91	0.90	0.81	0.50	0.76	0.885	0.63
DexGrasp-Zero (Ours)	full model	0.92	0.95	0.90	0.91	0.93	0.82	0.92	0.85

Cross-embodiment training and zero-shot transfer results (success rate). Variants of our method: w/o $\mathcal{G}_{\text{physical}}$ priors removes hand-specific physical property encoding; early fusion concatenates physical features with node states at input (instead of layer-wise fusion in GCN); w/o motion primitives replaces the motion-primitive space with raw joint commands; w/o $M_{\text{activation}}$ & $r_{\text{pen}}$ disables the activation mask conditioning and the corresponding action-feasibility penalty in the reward; full model is our complete DexGrasp-Zero policy.
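The difference between early fusion and layer-wise fusion can be sketched with a toy graph convolution in plain Python. This is a minimal illustration, not the MAGCN implementation: the mean-aggregation layer, the additive injection term, and all function names are assumptions, since the page only states that physical features are fused layer by layer rather than concatenated once at the input.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a):
    return [[max(0.0, x) for x in row] for row in a]


def mean_adjacency(num_nodes, edges):
    """Row-normalized adjacency with self-loops (simple mean aggregation)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def gcn_layer(adj, h, w):
    """Plain graph convolution: aggregate neighbors, then apply ReLU."""
    return relu(matmul(adj, matmul(h, w)))


def gcn_layer_with_injection(adj, h, phys, w, u):
    """Assumed injection form: add projected physical features before ReLU."""
    return relu(add(matmul(adj, matmul(h, w)), matmul(phys, u)))


def early_fusion(adj, h, phys, weights):
    """Concatenate physical features with node states once, at the input."""
    x = [h_row + p_row for h_row, p_row in zip(h, phys)]
    for w in weights:
        x = gcn_layer(adj, x, w)
    return x


def layerwise_fusion(adj, h, phys, weights, injections):
    """Re-inject the same physical features at every layer."""
    x = h
    for w, u in zip(weights, injections):
        x = gcn_layer_with_injection(adj, x, phys, w, u)
    return x


# Toy chain of three nodes with 2-D states and one physical feature per node
# (for example a link length; the values are made up).
adj = mean_adjacency(3, [(0, 1), (1, 2)])
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phys = [[0.5], [1.0], [1.5]]
eye2 = [[1.0, 0.0], [0.0, 1.0]]
u = [[0.1, 0.2]]

out_layerwise = layerwise_fusion(adj, h, phys, [eye2, eye2], [u, u])
out_early = early_fusion(adj, h, phys, [[[1.0, 0.0], [0.0, 1.0], [0.1, 0.2]], eye2])
```

In the early-fusion path the physical features enter only through the first weight matrix, so later layers see them only after they have been mixed with the node states. In the layer-wise path every layer has direct access to them.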

Three-Finger Gripper Test

Zero-shot generalization to Barrett Hand. **Barrett Hand (3-finger, 8-DoF) zero-shot transfer in simulation.** Success rate: **0.70** on YCB objects.

BibTeX

@misc{wu2026dexgraspzeromorphologyalignedpolicyzeroshot,
      title={DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping}, 
      author={Yuliang Wu and Yanhan Lin and WengKit Lao and Yuhao Lin and Yi-Lin Wei and Wei-Shi Zheng and Ancong Wu},
      year={2026},
      eprint={2603.16806},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.16806}, 
}