Researchers have shown that large language models can autonomously control scientific laboratory instruments, writing custom code and refining their own operating strategies, without the scientists involved needing significant programming skills.

The study, posted to arXiv in April 2025, uses a working hardware setup — configurable as either a single-pixel camera or a scanning photocurrent microscope — as a test bed. The authors demonstrate that ChatGPT can generate functional instrument-control scripts from natural-language prompts, and that LLM-based agents can operate the equipment independently and iterate on their own control strategies.

The Problem: A Programming Wall Between Scientists and Their Equipment

Controlling sophisticated laboratory instruments typically demands proficiency in low-level or specialised programming languages. For researchers whose expertise lies in chemistry, biology, or physics rather than software, this creates a practical barrier. Experiments that could otherwise be designed by domain experts must instead be routed through programmers or constrained to whatever off-the-shelf software vendors provide.

This is not a minor inconvenience. Custom instrument control is often essential for novel experimental designs, and the gap between a researcher's scientific intuition and their ability to implement it in code can slow discovery or push unconventional experiments off the table entirely.

The authors argue LLM-based tools could democratise laboratory automation, allowing domain experts to focus on science rather than software.

What the Case Study Actually Shows

The team's demonstration centres on a dual-purpose optical setup. In single-pixel camera mode, the system reconstructs images by taking many sequential light measurements rather than using a conventional pixel array. In scanning photocurrent microscope mode, it maps electrical responses across a material surface. Both modes require precise, coordinated instrument control — making them a reasonable stress test for an LLM-driven approach.
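The single-pixel principle can be sketched in a few lines: the scene is illuminated with a sequence of known patterns, the detector records one scalar per pattern, and the image is recovered by solving a linear system. The paper's actual reconstruction pipeline is not spelled out here; this is a minimal least-squares illustration with simulated data.

```python
import numpy as np

def reconstruct_single_pixel(patterns, measurements):
    """Recover a flattened image from sequential single-pixel readings.

    patterns:     (M, N) array; each row is one flattened illumination pattern
    measurements: (M,) array; one detector reading per pattern
    Solves the least-squares problem patterns @ image ~ measurements.
    """
    image, *_ = np.linalg.lstsq(patterns, measurements, rcond=None)
    return image

# Simulated run: a 4x4 "scene" measured with 32 random binary patterns.
rng = np.random.default_rng(0)
scene = rng.random(16)                                 # ground-truth pixels
patterns = rng.integers(0, 2, size=(32, 16)).astype(float)
measurements = patterns @ scene                        # one scalar per pattern
recovered = reconstruct_single_pixel(patterns, measurements)
print(np.allclose(recovered, scene))                   # noiseless case: True
```

With more patterns than pixels and no noise, the recovery is exact; a real instrument would add noise, so practical systems use many more measurements or compressive-sensing reconstructions.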

According to the paper, ChatGPT was used to produce custom control scripts for this hardware. The authors report that the model could translate high-level descriptions of what the experiment should do into working code, substantially cutting the time and expertise required for setup. These results are self-reported by the research team and have not yet undergone peer review.

The more ambitious part of the work involves extending this into autonomous AI agents — systems that do not just generate code on request but independently operate the instruments, assess the results, and adjust their control parameters. The paper describes agents capable of iteratively refining their strategies, a property the authors frame as a step toward instruments that can conduct experiments with minimal human intervention.

From Script Generator to Autonomous Operator

There is a meaningful distinction between an LLM that writes code when asked and an agent that runs an instrument end-to-end. The first is a productivity tool; the second resembles an automated researcher. The paper spans both, presenting the script-generation capability as a foundation and the agent behaviour as the more forward-looking implication.

The iterative refinement aspect is particularly noteworthy. Rather than generating a single script and stopping, the described agents evaluate their own outputs — presumably against some experimental objective or quality metric — and update their approach accordingly. The paper does not appear to provide detailed quantitative benchmarks on how well this refinement performs compared to human-written control strategies, a gap worth noting.
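Since the paper's agent internals are not published, the evaluate-and-refine loop it describes can only be sketched abstractly. Everything below is hypothetical: `run_measurement` stands in for the instrument interface, `propose_revision` for an LLM call, and `score_fn` for whatever quality metric the agent optimises.

```python
def refine_control(params, run_measurement, score_fn, propose_revision,
                   max_rounds=10, target=0):
    """Measure, score, request a revised strategy, keep only improvements."""
    score = score_fn(run_measurement(params))
    for _ in range(max_rounds):
        if score >= target:                  # good enough: stop refining
            break
        candidate = propose_revision(params, score)
        cand_score = score_fn(run_measurement(candidate))
        if cand_score > score:               # accept only measurable gains
            params, score = candidate, cand_score
    return params, score

# Toy demo: the "instrument" echoes a setting, quality peaks at setting 8,
# and the "LLM" revision just nudges the setting upward by one.
best, final = refine_control(
    params=0,
    run_measurement=lambda p: p,
    score_fn=lambda reading: -abs(reading - 8),
    propose_revision=lambda p, s: p + 1,
)
print(best, final)   # → 8 0
```

The accept-only-improvements guard is one design choice among many; without published benchmarks it is unclear how the authors' agents actually decide when a revised strategy is better.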

The choice of instrumentation also matters. Photocurrent microscopy and single-pixel imaging are not routine benchtop procedures; they involve coordinated hardware triggering, signal acquisition timing, and data reconstruction. Automating even a subset of these operations carries more weight than demonstrating LLM control over simpler equipment.
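The coordination burden is easiest to see in a raster-scan loop: each grid point requires moving the stage, waiting for it to settle, arming an acquisition, and reading back a value, in strict order. The interface below (`move_to`, `trigger`, `read`) is a hypothetical stand-in for the setup's real drivers, with stub hardware classes so the sketch runs.

```python
import time

class FakeStage:
    """Stand-in for a motorised stage driver."""
    def move_to(self, x, y):
        self.pos = (x, y)

class FakeDetector:
    """Stand-in for a triggered photocurrent readout."""
    def __init__(self):
        self.armed = False
    def trigger(self):
        self.armed = True
    def read(self):
        assert self.armed, "read() before trigger()"   # enforce ordering
        self.armed = False
        return 1.0                                     # constant stub signal

def scan_photocurrent(stage, detector, xs, ys, settle_s=0.0):
    """Raster over a grid: move, settle, trigger, read -- in that order."""
    grid = []
    for y in ys:
        row = []
        for x in xs:
            stage.move_to(x, y)      # position the sample under the beam
            time.sleep(settle_s)     # let mechanics settle before acquiring
            detector.trigger()       # arm one acquisition
            row.append(detector.read())
        grid.append(row)
    return grid

grid = scan_photocurrent(FakeStage(), FakeDetector(), xs=[0, 1, 2], ys=[0, 1])
print(len(grid), len(grid[0]))   # → 2 3
```

Getting any one step wrong — reading before triggering, skipping the settle delay — silently corrupts the map, which is why generating this kind of code correctly is a nontrivial test for an LLM.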

Who This Is Built For

The framing throughout the paper is explicitly about access. The authors emphasise reducing the barrier for researchers who lack computational skills — a constituency that is larger than it might appear. Many working scientists in experimental fields manage their careers with only basic coding ability, relying on established software packages and, where those fall short, collaborators with programming expertise.

If LLM-assisted control can substitute for that expertise — at least for instrument setup and routine operation — it could shift who is able to do what kind of science. Smaller labs without dedicated software engineers, researchers in lower-resource institutions, and scientists trying to prototype novel measurement approaches could all benefit.

The limitations are real, however. LLMs can produce plausible-looking code that contains subtle errors, and in a laboratory context a software bug is not just an inconvenience — it can damage equipment, corrupt data, or create safety risks. The paper does not address in detail how errors in LLM-generated scripts were caught and handled during the case study.
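One common safeguard, which the paper does not confirm using, is an allow-list layer between generated code and the hardware that range-checks every command before it executes. The command names and limits below are purely illustrative.

```python
# Hypothetical safety limits -- real values come from the instrument specs.
SAFE_LIMITS = {
    "laser_power_mw": (0.0, 5.0),
    "stage_x_um":     (-500.0, 500.0),
    "stage_y_um":     (-500.0, 500.0),
}

def validate_command(name, value):
    """Reject any command outside the allow-list or its safe range."""
    if name not in SAFE_LIMITS:
        raise ValueError(f"unknown command: {name}")
    lo, hi = SAFE_LIMITS[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} outside safe range [{lo}, {hi}]")
    return value

validate_command("stage_x_um", 120.0)            # in range: accepted
try:
    validate_command("laser_power_mw", 50.0)     # over the limit: rejected
except ValueError as err:
    print(err)
```

A guard like this cannot catch logically wrong but in-range commands, so it complements rather than replaces human review of generated control code.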

What This Means

If autonomous LLM-based agents can reliably control and adapt laboratory instruments, the bottleneck in experimental science shifts from software implementation toward experimental design — which is precisely where domain expertise matters most.