Video-MME-v2 Benchmark Exposes Gap Between AI Video Understanding Claims and Reality
James Okafor
AI Research Correspondent, ArXiv CS.CV
The Brief
Researchers introduced Video-MME-v2, a rigorous video understanding benchmark designed to counter inflated leaderboard scores by testing AI models across three complexity levels—visual aggregation, temporal modeling, and multimodal reasoning. The benchmark, built with 3,300 human-hours of annotation, reveals significant gaps between top models like Gemini-3-Pro and human performance, with errors cascading from basic visual tasks to higher-level reasoning.