Google DeepMind's Perch 2.0, an AI model trained on millions of recordings of birds and other land animals, can classify whale vocalisations with competitive accuracy — a finding that intrigued the researchers who tested it.
The result, presented at the NeurIPS conference workshop on AI for Non-Human Animal Communication last December, builds on nearly a decade of whale bioacoustics research at Google DeepMind and Google Research. Previous work from the team includes algorithms to detect humpback whale calls and a multispecies model capable of identifying eight distinct whale species. Perch 2.0 was never designed for ocean acoustics, yet when researchers tested it on marine datasets, it outperformed or matched dedicated whale models.
From Birdsong to Biotwangs: How Transfer Learning Bridges Two Worlds
The mechanism behind this cross-domain success is transfer learning — a technique where knowledge acquired during one task carries over to a different but related one. Rather than building an entirely new model for whale sounds, the team adapted Perch 2.0 by training a simple logistic regression classifier on top of the frozen model's existing feature representations, a lightweight approach sometimes called linear probing.
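That workflow can be sketched in a few lines. In the sketch below, randomly generated vectors stand in for the embeddings a frozen Perch 2.0 encoder would produce; the embedding size, class setup, and noise level are hypothetical choices for illustration, not details from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
EMBED_DIM = 256          # hypothetical; the real embedding size may differ
N_CLASSES = 2            # e.g. "whale call" vs "other ocean noise"

# Stand-in for a frozen pretrained encoder's output: embeddings for each
# class cluster around a fixed centre in embedding space.
centres = rng.normal(size=(N_CLASSES, EMBED_DIM))

def fake_embeddings(label: int, n: int) -> np.ndarray:
    return centres[label] + 0.5 * rng.normal(size=(n, EMBED_DIM))

X_train = np.vstack([fake_embeddings(c, 16) for c in range(N_CLASSES)])
y_train = np.repeat(np.arange(N_CLASSES), 16)

# The adaptation step described in the paper: a plain logistic regression
# trained on top of the embeddings — no encoder weights are updated.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = np.vstack([fake_embeddings(c, 50) for c in range(N_CLASSES)])
y_test = np.repeat(np.arange(N_CLASSES), 50)
accuracy = clf.score(X_test, y_test)
```

Because only the small linear classifier is trained, this kind of adaptation runs in seconds on a laptop, which is part of why it is attractive for field researchers.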
The researchers converted 5-second windows of audio from three marine datasets into spectrograms — visual maps of sound intensity across frequencies over time. Perch 2.0 processed these images and produced embeddings, compact feature sets that capture the most acoustically meaningful attributes of each sound clip.
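The spectrogram step itself is conceptually simple. A bare-bones version using only NumPy might look like the following; the sample rate, FFT size, and hop length here are illustrative choices, not the model's actual input specification:

```python
import numpy as np

SR = 32_000        # assumed sample rate for this sketch
WINDOW_S = 5.0     # the 5-second analysis window from the article
N_FFT, HOP = 1024, 512

def spectrogram(audio: np.ndarray) -> np.ndarray:
    """Short-time Fourier magnitude in dB; rows are time frames,
    columns are frequency bins from 0 Hz up to SR/2."""
    window = np.hanning(N_FFT)
    frames = [
        np.abs(np.fft.rfft(window * audio[i : i + N_FFT]))
        for i in range(0, len(audio) - N_FFT + 1, HOP)
    ]
    return 20 * np.log10(np.stack(frames) + 1e-8)

# A synthetic 5-second clip: a steady 200 Hz tone standing in for a
# low-frequency whale call.
t = np.arange(int(SR * WINDOW_S)) / SR
clip = np.sin(2 * np.pi * 200 * t)

spec = spectrogram(clip)
```

The resulting array is the "visual map" the article describes: the tone shows up as a bright horizontal band in the low-frequency bins, the kind of pattern an image-style model can learn to recognise.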
"We're always making new discoveries about call types. We're always learning new things about underwater sounds. There are so many mysterious ocean noises that you can't just have one fixed model." — Lauren Harrell, Google Research
Critically, the classifier performed well even when trained on as few as 4 embeddings per dataset, with performance improving as that number rose to 32. This low data requirement matters in marine biology, where labelled recordings of specific whale calls are often scarce and expensive to obtain.
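That data-efficiency claim is easy to probe on simulated data: hold the embedding distribution fixed and vary only how many labelled examples the logistic regression sees. Everything below is synthetic — the dimensions, noise level, and per-class sampling are assumptions for illustration (the paper counts examples per dataset), not measurements from the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
DIM, NOISE = 64, 2.0   # hypothetical embedding size and within-class spread

# Two synthetic "call types" clustered around fixed centres, standing in
# for embeddings from a frozen pretrained encoder.
centres = rng.normal(size=(2, DIM))

def sample(n_per_class: int):
    X = np.vstack([centres[c] + NOISE * rng.normal(size=(n_per_class, DIM))
                   for c in (0, 1)])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_test, y_test = sample(200)

# Sweep the training-set size over the range the article mentions (4 to 32).
scores = {}
for n_train in (4, 8, 16, 32):
    X_tr, y_tr = sample(n_train)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores[n_train] = clf.score(X_test, y_test)
```

On data like this, accuracy typically climbs as the training budget grows from 4 toward 32 examples, mirroring the trend the researchers report, though the exact numbers depend entirely on the simulated noise level.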
Why Bird Models Understand Whale Calls
Lauren Harrell, a data scientist at Google Research, and her colleagues offer three explanations for why a model trained on avian sounds transfers effectively to cetacean acoustics.
First, birds and marine mammals may have evolved similar physical mechanisms of vocal production — an evolutionary parallel that produces structurally comparable sounds despite vastly different environments. Second, large models trained on diverse, high-volume datasets tend to generalise well even to out-of-domain tasks, a pattern well-documented in other areas of machine learning. Third, classifying bird calls is genuinely difficult: thousands of species produce overlapping, fine-grained acoustic signals, and learning to distinguish them may train the model to detect subtle features that prove equally useful underwater.
"The whistles of killer whale populations are in the same kind of spectrogram range as many of the bird vocalisations," Harrell explains. The model also handles low-frequency calls from birds, amphibians, and mammals — making it sensitive to the kind of acoustic dynamics that appear in underwater recordings.
Competitive Performance Against Specialist Models
When benchmarked against comparable bird bioacoustics models, the existing multispecies whale model, and models trained on coral reef sounds, Perch 2.0 ranked either first or second across the marine datasets tested. The other bird bioacoustics models also performed strongly, which points to transfer from avian training data, rather than Perch 2.0's scale alone, as the explanation.
It is worth noting that these benchmark results were reported by the researchers themselves in a workshop paper, not yet through independent peer review. The datasets used — three marine audio collections containing whale sounds and other aquatic noises — have not been publicly detailed in full, and external validation of the findings has not yet been published.
The practical implication, according to the team, is significant. Building and training a dedicated marine foundation model requires substantial computational resources and large volumes of labelled data. If an existing terrestrial model can perform comparably with minimal fine-tuning, researchers working on marine conservation gain a faster, cheaper route to deploying acoustic monitoring tools.
What This Means
For conservation scientists monitoring whale populations through passive acoustic recording, Perch 2.0 offers a ready-made, computationally efficient starting point — one that could accelerate species identification and call discovery without the cost of building marine-specific AI infrastructure from the ground up.
