DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping

Yuliang Wu, Yanhan Lin, WengKit Lao, Yuhao Lin, Yi-Lin Wei, Wei-Shi Zheng, Ancong Wu*
Sun Yat-sen University
*Corresponding author
RSS 2026 · Accepted

Accepted to Robotics: Science and Systems (RSS) 2026.

Abstract

To meet the demands of increasingly diverse dexterous hand hardware, it is crucial to develop a policy that enables zero-shot cross-embodiment grasping without redundant re-learning. Cross-embodiment alignment is challenging due to heterogeneous hand kinematics and physical constraints. Existing approaches typically predict intermediate motion targets and retarget them to each embodiment, which may introduce errors and violate embodiment-specific limits, hindering transfer across diverse hands. To overcome these limitations, we propose DexGrasp-Zero, a policy that learns universal grasping skills from diverse embodiments and transfers zero-shot to unseen hands. We first introduce a morphology-aligned graph representation that maps each hand's kinematic keypoints to anatomically grounded nodes and equips each node with tri-axial orthogonal motion primitives, enabling structural and semantic alignment across different morphologies. Building on this graph representation, we design a Morphology-Aligned Graph Convolutional Network (MAGCN) to encode the graph for policy learning. MAGCN incorporates a Physical Property Injection mechanism that fuses hand-specific physical constraints into the graph features, allowing the policy to compensate adaptively for varying link lengths and actuation limits and thus grasp precisely and stably. Extensive simulation evaluations on the YCB dataset show that our policy, jointly trained on four heterogeneous hands (Allegro, Shadow, Schunk, Ability), achieves an 85% zero-shot success rate on unseen hardware (LEAP, Inspire), outperforming the state-of-the-art method by 59.5%. Real-world experiments on three robotic hands (LEAP, Inspire, Revo2) further show an 82% average success rate on unseen objects.

Paradigm shift: prior approaches versus our method. (a) Prior paradigm: Existing methods train on a simplified and lossy unified state space. They output intermediate motion targets that require hand-specific retargeting models to convert into physical joint commands. This adds complexity and can lead to kinematically infeasible actions. (b) Our paradigm: We learn a single universal policy end-to-end. The policy operates on a lossless morphology-aligned graph representation and outputs actions in a hand-agnostic motion-primitive space. Physical commands are generated directly through a fixed hand-specific mapping $\mathcal{M}_h$, removing the need for trainable retargeting modules. (c) Real-world deployment on unseen hands validates the effectiveness and zero-shot transfer capability of our approach.
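The fixed mapping $\mathcal{M}_h$ is what lets one primitive-space policy drive different hands. The page does not give the form of this mapping, so what follows is a minimal Python sketch built on several assumptions. Each physical joint is assigned by hand to one (node, primitive) slot. Actions are normalized to [-1, 1]. The mapping is an affine rescale into the joint's limits. `JointSpec`, `map_primitives_to_joints`, the node names and the limit values are all hypothetical.

```python
from dataclasses import dataclass

# Tri-axial motion primitives named in the method figure.
PRIMITIVES = ("flexion", "abduction", "rotation")


@dataclass(frozen=True)
class JointSpec:
    """One physical joint of a specific hand (all field values are illustrative)."""
    name: str       # joint name, e.g. as it appears in the hand's URDF
    node: str       # anatomical graph node the joint belongs to (hypothetical naming)
    primitive: str  # which primitive axis drives this joint
    lower: float    # lower joint limit (rad)
    upper: float    # upper joint limit (rad)


def map_primitives_to_joints(prim_actions, joints):
    """Hypothetical fixed mapping M_h from normalized primitive actions to joint targets.

    prim_actions: dict node -> dict primitive -> value in [-1, 1].
    joints: list[JointSpec] describing one hand.
    Primitives that no joint of this hand realizes are simply ignored.
    """
    commands = {}
    for j in joints:
        if j.primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive {j.primitive!r}")
        a = prim_actions.get(j.node, {}).get(j.primitive, 0.0)
        a = max(-1.0, min(1.0, a))  # clip to the normalized action range
        # Affine rescale [-1, 1] -> [lower, upper], so commands never leave the joint limits.
        commands[j.name] = j.lower + 0.5 * (a + 1.0) * (j.upper - j.lower)
    return commands


# Usage: a toy hand whose index MCP has flexion and abduction but no axial rotation.
toy_hand = [
    JointSpec("index_mcp_flex", "index_mcp", "flexion", 0.0, 1.6),
    JointSpec("index_mcp_abd", "index_mcp", "abduction", -0.3, 0.3),
]
cmd = map_primitives_to_joints(
    {"index_mcp": {"flexion": 0.0, "abduction": 1.0, "rotation": 0.5}}, toy_hand
)
print(cmd)
```

This mapping has no trainable parameters. Supporting a new hand in this sketch therefore only requires writing that hand's joint table. Primitives the hand lacks (here, axial rotation) are silently dropped.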

Videos

Real-world Deployment

Trained on Allegro, Shadow, Ability, and Schunk hands; zero-shot deployment in the real world on LEAP, Inspire, and Revo2.

Real-world evaluation props (10 objects) used in our hardware experiments.
Inspire Hand (real-world)
LEAP Hand (real-world)
Revo2 Hand (real-world)

Simulation

Simulation grasping results across all hands.

All Hands (simulation)

Method

Universal hand representation. (a) Morphology-Aligned State Graph Representation: nodes correspond to anatomical units, edges follow kinematic chains, yielding a hand-agnostic semantic graph structure. (b) Schematic of three motion primitives (Flexion, Abduction, Axial Rotation) on a Schunk hand, showing their physical motion effects at representative joints.
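One way to build a hand-agnostic graph is to fix a single canonical node set and let each hand switch nodes on or off. The sketch below makes several assumptions. It uses a wrist node plus three units per finger (`mcp`, `pip`, `dip`). Edges run along each chain from the wrist outward. Adjacency uses the standard GCN normalization $D^{-1/2}(A+I)D^{-1/2}$. The node naming, the segment choice and the helper names are hypothetical and do not reproduce the paper's exact layout.

```python
import math

# Hypothetical canonical node set: a wrist node plus three anatomical units per finger.
FINGERS = ("thumb", "index", "middle", "ring", "little")
SEGMENTS = ("mcp", "pip", "dip")


def canonical_nodes():
    return ["wrist"] + [f"{f}_{s}" for f in FINGERS for s in SEGMENTS]


def canonical_edges():
    """Edges follow each kinematic chain: wrist -> mcp -> pip -> dip."""
    edges = []
    for f in FINGERS:
        chain = ["wrist"] + [f"{f}_{s}" for s in SEGMENTS]
        edges += list(zip(chain[:-1], chain[1:]))
    return edges


def hand_graph(active_fingers):
    """Build a hand-specific graph on the shared node set.

    Returns (nodes, mask, a_hat): mask[i] is 1 if the hand has node i, and a_hat is
    the symmetric-normalized adjacency D^-1/2 (A + I) D^-1/2 restricted to active nodes.
    """
    nodes = canonical_nodes()
    idx = {name: i for i, name in enumerate(nodes)}
    mask = [1 if name == "wrist" or name.split("_")[0] in active_fingers else 0
            for name in nodes]
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = float(mask[i])  # self-loops only on nodes the hand actually has
    for u, v in canonical_edges():
        if mask[idx[u]] and mask[idx[v]]:
            adj[idx[u]][idx[v]] = adj[idx[v]][idx[u]] = 1.0
    deg = [sum(row) for row in adj]
    a_hat = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
              for j in range(n)] for i in range(n)]
    return nodes, mask, a_hat


# Usage: a four-finger hand (no little finger) keeps the same 16-node layout.
nodes, mask, a_hat = hand_graph({"thumb", "index", "middle", "ring"})
print(len(nodes), sum(mask))
```

This sketch keeps absent fingers as masked nodes rather than deleting them. That choice preserves node indices across hands, so the same learned weights always see the same anatomical slot on every embodiment.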
Architecture of DexGrasp-Zero. At each time step $t$: (a) The Morphology-Aligned Graph Encoder encodes the hand-object state into node features $\mathbf{X}^{h}_{\text{node},t}$ and a global feature $\mathbf{x}^{h}_{g,t}$ using a hand-specific graph (adjacency $\mathbf{A}^h$); a GCN with per-layer physical priors produces embeddings $\mathbf{E}^{h}_{\text{node},t}$ and $\mathbf{E}^{h}_{g,t}$. (b) The Physical Property Encoder parses the hand's URDF to build a physical graph $\mathcal{G}^{h}_{\text{physical}}$ (joint limits, link lengths, etc.) and an activation mask $\mathbf{M}^{h}_{\text{activation}}$, which are encoded into $\mathbf{E}^{h}_{p}$ and fused into every GCN layer. (c) The decoder outputs motion primitives $\boldsymbol{\alpha}^{h}_{\text{prim}}$: wrist 6-DoF commands from $\mathbf{E}^{h}_{g,t}$ and wrist features, and joint actions from masked node embeddings; the joint actions are mapped via the hand-specific $\mathcal{M}_h$ to executable joint commands $\alpha^{h}_{\text{physical},t}$.
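Parts (a) and (b) of the figure can be illustrated with a small pure-Python sketch. It extracts per-joint physical features from a URDF, then injects a physical embedding into a GCN layer. The following choices are assumptions made for illustration: the feature set, the link-length approximation, and the additive fusion rule `ReLU(A_hat (H W + E_p U))`. The paper may fuse differently. All function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def physical_features_from_urdf(urdf_xml):
    """Hypothetical per-joint physical features: [lower, upper, range, link_length].

    Link length is approximated by the norm of the joint's <origin xyz>, i.e. the offset
    from the parent link frame (an assumption, not the paper's exact feature set).
    """
    feats = {}
    for joint in ET.fromstring(urdf_xml).iter("joint"):
        if joint.get("type") not in ("revolute", "prismatic"):
            continue  # fixed or continuous joints have no finite limits to encode
        limit = joint.find("limit")
        lower = float(limit.get("lower", 0.0)) if limit is not None else 0.0
        upper = float(limit.get("upper", 0.0)) if limit is not None else 0.0
        origin = joint.find("origin")
        xyz = ([float(v) for v in origin.get("xyz", "0 0 0").split()]
               if origin is not None else [0.0, 0.0, 0.0])
        feats[joint.get("name")] = [lower, upper, upper - lower,
                                    math.sqrt(sum(v * v for v in xyz))]
    return feats


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer_with_physics(a_hat, h, e_p, w, u, mask):
    """One GCN layer with additive physical-property injection (assumed form).

    a_hat: (N, N) normalized adjacency; h: (N, D) node features; e_p: (N, P) per-node
    physical embedding; w: (D, F), u: (P, F) weights; mask: length-N activation mask.
    Returns mask * ReLU(a_hat @ (h @ w + e_p @ u)).
    """
    hw, pu = matmul(h, w), matmul(e_p, u)
    fused = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(hw, pu)]  # inject physics
    agg = matmul(a_hat, fused)  # message passing along kinematic-chain edges
    return [[m * max(0.0, x) for x in row] for m, row in zip(mask, agg)]


# Usage: features from a one-joint toy URDF, then one layer on a two-node graph.
urdf = """<robot name="toy">
  <link name="palm"/><link name="proximal"/>
  <joint name="index_mcp" type="revolute">
    <parent link="palm"/><child link="proximal"/>
    <origin xyz="0.03 0.04 0"/>
    <limit lower="-0.5" upper="1.5" effort="1" velocity="1"/>
  </joint>
</robot>"""
feats = physical_features_from_urdf(urdf)
a_hat = [[0.5, 0.5], [0.5, 0.5]]  # two connected nodes with self-loops
out = gcn_layer_with_physics(a_hat, [[1.0], [3.0]], [[0.0], [2.0]], [[1.0]], [[1.0]], [1, 0])
print(feats, out)
```

Calling the layer function once per layer, each time with its own `u`, corresponds to the layer-wise fusion described above. Concatenating `e_p` with `h` only before the first layer would instead correspond to the early-fusion variant named in the results caption.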

Results

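| Method | Variant | Allegro (train) | Shadow (train) | Ability (train) | Schunk (train) | LEAP (unseen) | Inspire (unseen) | Avg. seen | Avg. unseen |
|---|---|---|---|---|---|---|---|---|---|
| CrossDex | per-object | 0.81 | 0.85 | 0.90 | 0.90 | 0.34 | 0.44 | 0.865 | 0.39 |
| CrossDex | multi-object | 0.39 | 0.69 | 0.42 | 0.60 | 0.19 | 0.34 | 0.525 | 0.265 |
| DexGrasp-Zero (Ours) | w/o motion primitives | 0.52 | 0.51 | 0.64 | 0.59 | 0.39 | 0.29 | 0.565 | 0.34 |
| DexGrasp-Zero (Ours) | w/o $\mathcal{G}_{\text{physical}}$ priors | 0.91 | 0.89 | 0.90 | 0.84 | 0.82 | 0.79 | 0.885 | 0.805 |
| DexGrasp-Zero (Ours) | w/o $M_{\text{activation}}$ & $r_{\text{pen}}$ | 0.92 | 0.91 | 0.90 | 0.81 | 0.50 | 0.76 | 0.885 | 0.63 |
| DexGrasp-Zero (Ours) | full model | 0.92 | 0.95 | 0.90 | 0.91 | 0.93 | 0.82 | 0.92 | 0.85 |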
Cross-embodiment training and zero-shot transfer results (success rate). Variants of our method: w/o $\mathcal{G}_{\text{physical}}$ priors removes hand-specific physical property encoding; early fusion concatenates physical features with node states at input (instead of layer-wise fusion in GCN); w/o motion primitives replaces the motion-primitive space with raw joint commands; w/o $M_{\text{activation}}$ & $r_{\text{pen}}$ disables the activation mask conditioning and the corresponding action-feasibility penalty in the reward; full model is our complete DexGrasp-Zero policy.
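The last ablation removes the activation mask and the feasibility penalty $r_{\text{pen}}$ together. Their exact definitions are not given on this page. The sketch below assumes a per-node, per-primitive binary mask and a quadratic penalty on actions emitted for primitives the hand cannot realize. The function name and the weight `0.1` are hypothetical.

```python
def masked_actions_and_penalty(actions, activation, weight=0.1):
    """Hypothetical action masking and feasibility penalty.

    actions, activation: (N, 3) nested lists over nodes x (flexion, abduction, rotation);
    activation[i][k] is 1 if the hand can physically realize primitive k at node i, else 0.
    Returns (masked actions to pass on to M_h, penalty term r_pen <= 0).
    """
    masked, sq = [], 0.0
    for a_row, m_row in zip(actions, activation):
        masked.append([a * m for a, m in zip(a_row, m_row)])       # zero infeasible slots
        sq += sum((1 - m) * a * a for a, m in zip(a_row, m_row))  # energy on infeasible slots
    return masked, -weight * sq


# Usage: node 0 lacks axial rotation, node 1 only flexes.
actions = [[0.5, 1.0, -1.0], [0.2, 0.0, 0.0]]
activation = [[1, 1, 0], [1, 0, 0]]
masked, r_pen = masked_actions_and_penalty(actions, activation)
print(masked, r_pen)
```

Masking alone makes infeasible outputs harmless at execution time. The penalty adds a training signal that discourages the policy from producing those outputs in the first place.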

Three-Finger Gripper Test

Barrett Hand (3-finger, 8-DoF) zero-shot transfer in simulation. Success rate: 0.70 on YCB objects.

BibTeX

@misc{wu2026dexgraspzeromorphologyalignedpolicyzeroshot,
      title={DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping}, 
      author={Yuliang Wu and Yanhan Lin and WengKit Lao and Yuhao Lin and Yi-Lin Wei and Wei-Shi Zheng and Ancong Wu},
      year={2026},
      eprint={2603.16806},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.16806}, 
}