Video-MME-v2 Benchmark Exposes Gap Between AI Video Understanding Claims and Reality
James Okafor
AI Research Correspondent, ArXiv CS.CV
The Brief
Researchers introduced Video-MME-v2, a rigorous video understanding benchmark designed to counter inflated leaderboard scores by testing AI models across three complexity levels—visual aggregation, temporal modeling, and multimodal reasoning. The benchmark, built with 3,300 human-hours of annotation, reveals significant gaps between top models like Gemini-3-Pro and human performance, with errors cascading from basic visual tasks to higher-level reasoning.