Humanoid Benchmark
Independent humanoid-robot index · Xodexa
XHI v1.0.0

Methodology — the Xodexa Humanoid Index (XHI)


The XHI is a transparent, reproducible composite score from 0 to 100 for general-purpose humanoid robots. It is designed to be defensible and hard to game: every input is an observable, published spec; every transform is documented here; and the pillar weights reflect where the field broadly agrees value is created. Autonomy and manipulation dominate. Raw locomotion, now solved well enough to be table stakes, no longer wins on its own.

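$$\mathrm{XHI} = \sum_{p} \mathrm{weight}_p \times \mathrm{pillar\_score}_p$$
(summed over the six pillars p; each pillar is normalised to 0–100 and the weights sum to 100%)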

Pillar weights

Pillar | Weight
Autonomy & Intelligence | 25%
Manipulation & Dexterity | 20%
Mobility & Locomotion | 15%
Hardware & Engineering | 15%
Commercial Readiness & Deployment | 15%
Ecosystem & Viability | 10%
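
As an illustration of the weighted sum, the sketch below combines six pillar scores using the weights in the table. The pillar key names, the `xhi` function and the example scores are hypothetical and are not taken from app/ranking.py; the sketch only assumes each pillar has already been scored on 0–100.

```python
# Pillar weights from the table above, as fractions that sum to 1.
# The key names are hypothetical labels for the six pillars.
PILLAR_WEIGHTS: dict[str, float] = {
    "autonomy": 0.25,
    "manipulation": 0.20,
    "mobility": 0.15,
    "hardware": 0.15,
    "commercial": 0.15,
    "ecosystem": 0.10,
}


def xhi(pillar_scores: dict[str, float],
        weights: dict[str, float] = PILLAR_WEIGHTS) -> float:
    """Weighted sum of 0-100 pillar scores, giving a 0-100 composite."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("pillar weights must sum to 1")
    missing = weights.keys() - pillar_scores.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    return sum(w * pillar_scores[p] for p, w in weights.items())


# Hypothetical pillar scores for one robot.
example = {"autonomy": 70, "manipulation": 50, "mobility": 60,
           "hardware": 55, "commercial": 52, "ecosystem": 40}
print(round(xhi(example), 2))  # 56.55
```

Passing a different `weights` dict is one simple way to try the re-weighting mentioned in the honesty notes.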

Why these weights

  • Autonomy & Intelligence — 25%. The single biggest open problem and value driver. A teleoperated robot is a puppet; a task-autonomous one is a worker.
  • Manipulation & Dexterity — 20%. Useful work in human spaces is bottlenecked on hands.
  • Mobility & Locomotion — 15%. Necessary but no longer the differentiator it was in the ASIMO era.
  • Hardware & Engineering — 15%. DoF, actuation modernity, payload-to-weight, onboard compute.
  • Commercial Readiness — 15%. Real paid pilots beat announcement videos.
  • Ecosystem & Viability — 10%. Capital decides who survives to iterate — weighted lowest because money ≠ capability.

The anti-hype rule: teleoperation is capped

The most common way humanoid demos mislead is by showing a human-piloted robot as if it were autonomous. XHI scores the demonstrated autonomy mode on an explicit ladder, and teleoperation sits near the floor. This is why a polished consumer robot that relies on remote VR operators ranks below a plainer machine that genuinely does its own work.

Autonomy level | Base score (points)
research | 20
teleoperated | 35
supervised-autonomy | 70
task-autonomous | 92

Bonuses: +8 if the robot ships a named end-to-end, vision-language-action (VLA) or foundation policy; +5 for dedicated onboard AI compute. The autonomy pillar is capped at 100.
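
The autonomy rule can be sketched as a lookup plus two bonuses and a cap. This is a minimal illustration, assuming the ladder levels are stored as the strings shown in the table; the function name and boolean flags are hypothetical.

```python
# Base points per demonstrated autonomy mode, from the ladder above.
AUTONOMY_BASE = {
    "research": 20,
    "teleoperated": 35,
    "supervised-autonomy": 70,
    "task-autonomous": 92,
}
NAMED_POLICY_BONUS = 8        # named end-to-end / VLA / foundation policy
ONBOARD_AI_COMPUTE_BONUS = 5  # dedicated onboard AI compute


def autonomy_score(level: str, has_named_policy: bool,
                   has_onboard_ai_compute: bool) -> int:
    """Ladder base plus bonuses, capped at 100. Unknown levels raise KeyError."""
    score = AUTONOMY_BASE[level]
    if has_named_policy:
        score += NAMED_POLICY_BONUS
    if has_onboard_ai_compute:
        score += ONBOARD_AI_COMPUTE_BONUS
    return min(score, 100)


print(autonomy_score("teleoperated", True, True))           # 48
print(autonomy_score("supervised-autonomy", False, False))  # 70
print(autonomy_score("task-autonomous", True, True))        # 100 (105 before the cap)
```

Even with both bonuses, a teleoperated robot (48) stays below a supervised-autonomy robot with none (70), which is the anti-hype rule expressed in numbers.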

How each pillar is computed

  • Autonomy = autonomy-ladder base + AI/compute bonuses.
  • Manipulation = hand-DoF (55%, vs a 22-DoF human-hand reference) + payload (30%) + dexterous-hands flag (15%).
  • Mobility = walk speed (55%, vs 2.5 m/s) + runtime (45%, vs 5 h).
  • Hardware = total DoF (50%, vs 60) + payload-to-weight efficiency (25%) + actuation modernity (electric > hybrid > hydraulic) + onboard compute.
  • Commercial = maturity ladder + 5 pts per named deployment (max +20).
  • Ecosystem = funding + valuation, with a viability floor for deep-pocketed corporate parents.
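
Manipulation and Mobility are the two pillars whose sub-weights and reference values are fully stated, so they can be sketched directly. The sketch assumes the min-max floor is 0 (so a sub-metric is its value divided by the reference, clipped to [0, 1]), takes the 25 kg payload reference from the normalisation section, and reads hand DoF as per hand against the 22-DoF human-hand reference. All function and parameter names are hypothetical.

```python
from typing import Optional


def _ratio(value: Optional[float], ref: float) -> float:
    """Value as a fraction of its frontier reference, clipped to [0, 1].

    Assumes the min-max floor is 0. A missing spec (None) scores 0,
    matching the no-imputation rule.
    """
    if value is None:
        return 0.0
    return max(0.0, min(1.0, value / ref))


def manipulation_score(hand_dof: Optional[float], payload_kg: Optional[float],
                       has_dexterous_hands: bool) -> float:
    """55% hand DoF (vs 22) + 30% payload (vs 25 kg) + 15% dexterous-hands flag."""
    return 100.0 * (0.55 * _ratio(hand_dof, 22.0)
                    + 0.30 * _ratio(payload_kg, 25.0)
                    + 0.15 * (1.0 if has_dexterous_hands else 0.0))


def mobility_score(walk_speed_mps: Optional[float],
                   runtime_h: Optional[float]) -> float:
    """55% walk speed (vs 2.5 m/s) + 45% runtime (vs 5 h)."""
    return 100.0 * (0.55 * _ratio(walk_speed_mps, 2.5)
                    + 0.45 * _ratio(runtime_h, 5.0))
```

For example, `manipulation_score(11, 20.0, True)` gives about 66.5, `mobility_score(1.25, None)` gives about 27.5 because the missing runtime contributes nothing, and `mobility_score(3.0, 6.0)` gives 100 because both inputs exceed their references and are clipped.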

Commercial maturity ladder

Status | Base score (points)
research | 10
retired | 15
prototype | 28
pilot | 52
limited-production | 76
commercial | 94
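
The Commercial pillar can be sketched the same way: a ladder lookup plus 5 points per named deployment, with the deployment bonus capped at +20. The final cap at 100 is an assumption drawn from the rule that every pillar is normalised to 0–100; the function and parameter names are hypothetical.

```python
# Base points per commercial status, from the maturity ladder above.
MATURITY_BASE = {
    "research": 10,
    "retired": 15,
    "prototype": 28,
    "pilot": 52,
    "limited-production": 76,
    "commercial": 94,
}
POINTS_PER_DEPLOYMENT = 5
MAX_DEPLOYMENT_BONUS = 20


def commercial_score(status: str, named_deployments: int) -> int:
    """Ladder base + 5 points per named deployment (bonus max +20).

    The cap at 100 is an assumption, taken from the rule that every
    pillar is normalised to 0-100.
    """
    bonus = min(MAX_DEPLOYMENT_BONUS,
                POINTS_PER_DEPLOYMENT * max(0, named_deployments))
    return min(100, MATURITY_BASE[status] + bonus)


print(commercial_score("pilot", 3))       # 67
print(commercial_score("prototype", 10))  # 48 (bonus capped at +20)
print(commercial_score("commercial", 2))  # 100 (104 before the cap)
```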

Corporate-backed viability floor applies to: Boston Dynamics, Honda, Hyundai, LG, Samsung, Tesla, XPeng, Xiaomi.

Normalisation reference points

Sub-metrics are min-max normalised against frontier reference values, so a perfect 100 means "at or beyond the best demonstrated humanoid", not merely best in this list. A missing spec scores 0 for that sub-metric (absence is treated as informative, never imputed).

Reference point | Value
Walk speed | 2.5 m/s
Runtime | 5.0 h
Hand DoF | 22
Payload | 25.0 kg
Total DoF | 60
Funding | $1,000M
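
The normalisation rule can be applied to a whole spec sheet at once. This sketch assumes the min-max floor is 0, so a spec scores value ÷ reference × 100, clipped at 100 for values at or beyond the frontier; the metric key names and functions are hypothetical, and funding is expressed in millions of US dollars.

```python
from collections.abc import Mapping
from typing import Optional

# Frontier reference values from the list above; the key names are hypothetical.
REFERENCE_POINTS: dict[str, float] = {
    "walk_speed_mps": 2.5,
    "runtime_h": 5.0,
    "hand_dof": 22.0,
    "payload_kg": 25.0,
    "total_dof": 60.0,
    "funding_musd": 1000.0,
}


def normalise(metric: str, value: Optional[float]) -> float:
    """Scale a raw spec to 0-100 against its reference (floor assumed to be 0).

    Values at or beyond the reference score 100; a missing spec scores 0.
    """
    if value is None:
        return 0.0
    ref = REFERENCE_POINTS[metric]
    return 100.0 * min(max(value, 0.0), ref) / ref


def normalise_specs(specs: Mapping[str, Optional[float]]) -> dict[str, float]:
    """Normalise every reference metric; metrics absent from specs score 0."""
    return {metric: normalise(metric, specs.get(metric))
            for metric in REFERENCE_POINTS}


print(normalise_specs({"walk_speed_mps": 3.0, "hand_dof": 11, "funding_musd": 250.0}))
# {'walk_speed_mps': 100.0, 'runtime_h': 0.0, 'hand_dof': 50.0, 'payload_kg': 0.0, 'total_dof': 0.0, 'funding_musd': 25.0}
```

Treating a missing key exactly like an explicit `None` keeps the "absence is informative, never imputed" rule in one place.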

Honesty notes. (1) Each spec was cross-checked against at least two independent reputable sources; disputed figures are flagged, and each robot carries a confidence grade. (2) The index measures published capability and readiness, not unverifiable demo claims: where a vendor claim could not be independently corroborated, the conservative reading was used. (3) The weights are an editorial judgement. The full formula and constants are open in app/ranking.py and via the methodology API, so anyone can re-weight and recompute.