A new AI framework can direct signal-reflecting antenna arrays toward wireless users without collecting the detailed channel measurements that have blocked real-world deployment of this technology, according to research published on arXiv.

Reflective surface technology — known as Reconfigurable Intelligent Surfaces (RIS) — has been a centrepiece of next-generation wireless research for years. The premise is straightforward: coat walls, ceilings, or structures with arrays of tunable elements that redirect radio signals around obstacles, extending coverage and boosting signal quality without adding conventional transmitters. The obstacle has been equally straightforward. Before a surface can reflect signals usefully, a system needs Channel State Information (CSI) — a precise mathematical picture of how radio waves travel between transmitter, surface, and receiver. Estimating CSI at scale, across hundreds or thousands of surface elements, demands computational resources that push the technology toward impracticality.

Replacing Channel Maths With Spatial Reasoning

The paper proposes abandoning CSI estimation entirely. Instead, the authors train a team of cooperating agents using Multi-Agent Reinforcement Learning (MARL), specifically an algorithm called Multi-Agent Proximal Policy Optimization (MAPPO), in which each agent controls a subset of mechanically adjustable metallic reflector panels. Crucially, the agents receive only the user's physical coordinates as input, not channel measurements.

To make that tractable, the researchers introduce a key abstraction: rather than having agents reason directly about the mechanical angles of every panel, they map the high-dimensional control problem onto a lower-dimensional "virtual focal point" space. Each agent learns to steer its panels toward a point in that simplified space, and the combined effect focuses reflected energy on the user's location.
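The idea of steering panels toward a focal point can be illustrated with simple mirror geometry. The sketch below is not the paper's mapping, which the article does not spell out. It assumes ideal flat mirrors and a known transmitter position, and the names `panel_normals`, `normals_to_angles`, `tx_pos`, and `focal_point` are hypothetical. It shows how a single three-number focal point can determine two tilt angles for every panel.

```python
import numpy as np

def panel_normals(panel_centers, tx_pos, focal_point):
    """Unit normals that specularly reflect a ray from tx_pos toward focal_point.

    Assumes ideal flat mirrors (an illustrative simplification). The required
    normal bisects the directions from the panel to the transmitter and from
    the panel to the focal point. Undefined if those directions are opposite.
    """
    centers = np.asarray(panel_centers, dtype=float)
    to_tx = np.asarray(tx_pos, dtype=float) - centers
    to_fp = np.asarray(focal_point, dtype=float) - centers
    to_tx = to_tx / np.linalg.norm(to_tx, axis=1, keepdims=True)
    to_fp = to_fp / np.linalg.norm(to_fp, axis=1, keepdims=True)
    bisector = to_tx + to_fp
    return bisector / np.linalg.norm(bisector, axis=1, keepdims=True)

def normals_to_angles(normals):
    """Azimuth and elevation (radians) of each unit normal."""
    azimuth = np.arctan2(normals[:, 1], normals[:, 0])
    elevation = np.arcsin(np.clip(normals[:, 2], -1.0, 1.0))
    return azimuth, elevation

# Hypothetical layout: four panels along a wall, one transmitter, one focal point.
panel_centers = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.0],
                          [1.0, 0.0, 1.0], [1.5, 0.0, 1.0]])
tx_pos = np.array([0.0, 5.0, 2.0])
focal_point = np.array([3.0, 4.0, 1.0])  # the low-dimensional action

normals = panel_normals(panel_centers, tx_pos, focal_point)
azimuth, elevation = normals_to_angles(normals)  # eight angles from three numbers
```

In this toy setting the agent's action has three dimensions regardless of how many panels it controls, which is the kind of reduction the focal-point abstraction is meant to provide.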

By replacing complex channel modelling with spatial intelligence, the framework sidesteps the fundamental physical-layer barrier that has stalled reflective surface deployment.

The training architecture follows a Centralized Training with Decentralized Execution (CTDE) pattern. During training, agents share information to learn cooperative strategies. At deployment, each agent acts independently using only its local inputs — a design that scales without requiring continuous inter-agent communication in the field.
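The split between training and deployment can be sketched structurally. The code below is not MAPPO and performs no learning. It is a minimal illustration, with hypothetical class names, sizes, and linear models, of which component sees what: a critic that reads the joint observation during training, and actors that each read only their own input at execution time.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 3, 3  # hypothetical sizes

class Actor:
    """Decentralised policy: maps one agent's local observation to an action."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))
        self.b = np.zeros(ACT_DIM)

    def act(self, obs):
        # tanh keeps each action component in [-1, 1]
        return np.tanh(self.W @ obs + self.b)

class CentralCritic:
    """Training-only value estimate computed from all agents' observations."""
    def __init__(self):
        self.w = rng.normal(scale=0.1, size=N_AGENTS * OBS_DIM)

    def value(self, joint_obs):
        return float(self.w @ np.concatenate(joint_obs))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Assumption: every agent observes the same user coordinates.
user_xyz = np.array([2.0, 1.5, 1.0])
local_obs = [user_xyz.copy() for _ in range(N_AGENTS)]

# Training time: the critic sees the joint observation.
joint_value = critic.value(local_obs)

# Execution time: each actor decides alone; the critic is not used.
actions = [actor.act(obs) for actor, obs in zip(actors, local_obs)]
```

Only the `Actor` objects would need to run in the field under this pattern, which is why no inter-agent messaging appears in the execution step.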

What the Simulations Show

The team evaluated the system using high-fidelity ray-tracing simulations in dynamic, non-line-of-sight (NLOS) environments — scenarios where no direct path exists between transmitter and receiver, and where the geometry changes as users move. These benchmarks are self-reported by the authors and have not been independently verified.

According to the paper, the MARL framework achieved up to a 26.86 dB signal gain over static flat reflectors — a substantial improvement, given that every 3 dB roughly represents a doubling of received signal power. The system also outperformed both single-agent deep reinforcement learning baselines and hardware-constrained alternatives on two specific metrics: spatial selectivity (how precisely the signal is focused) and temporal stability (how consistently coverage is maintained as users move).
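To put the decibel figure in linear terms, a gain in dB converts to a power ratio as 10 raised to the power (dB / 10). The short sketch below applies that standard conversion; the function name `db_to_power_ratio` is illustrative.

```python
def db_to_power_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear power ratio."""
    return 10 ** (gain_db / 10)

ratio_3db = db_to_power_ratio(3.0)          # just under 2, i.e. roughly a doubling
ratio_reported = db_to_power_ratio(26.86)   # roughly 485 times the received power

print(f"3 dB     -> x{ratio_3db:.3f}")
print(f"26.86 dB -> x{ratio_reported:.0f}")
```

On this conversion, the reported best-case gain corresponds to roughly 485 times the received power of the static flat reflector baseline.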

One result with clear practical significance concerns what the authors call "deployment resilience." The trained agents maintained stable signal coverage even when user location estimates contained up to 1.0 metre of error, a tolerance relevant to real-world positioning systems, which are rarely perfect.
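A robustness test of this kind can be set up by corrupting the coordinates fed to the agents. The article does not say how the authors modelled positioning error, so the sketch below assumes one simple option: an offset drawn uniformly from a sphere of radius 1.0 metre. The function name `perturb_position` and the example coordinates are hypothetical.

```python
import numpy as np

def perturb_position(true_xyz, max_error_m, rng):
    """Return true_xyz plus a random offset of length at most max_error_m.

    Assumed error model: uniform inside a sphere (not taken from the paper).
    """
    direction = rng.normal(size=3)
    direction = direction / np.linalg.norm(direction)
    # The cube root makes the samples uniform over the sphere's volume.
    radius = max_error_m * rng.uniform() ** (1.0 / 3.0)
    return np.asarray(true_xyz, dtype=float) + radius * direction

rng = np.random.default_rng(42)
true_xyz = np.array([2.0, 1.5, 1.0])

# Coordinates an agent would receive in place of the true user position.
noisy = np.array([perturb_position(true_xyz, 1.0, rng) for _ in range(1000)])
errors = np.linalg.norm(noisy - true_xyz, axis=1)
```

Feeding `noisy` rather than `true_xyz` to a trained policy, and comparing the resulting coverage, is one way such a tolerance could be measured in simulation.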

Why Mechanical Reflectors, Not Electronic Ones

The specific choice of mechanically adjustable metallic reflectors, rather than electronically tunable RIS panels, is notable. Electronic RIS elements offer faster switching but introduce their own engineering complexity and cost. Mechanically steered reflectors are simpler and potentially cheaper to manufacture and maintain, though they respond more slowly to rapid channel changes. The MARL approach appears designed in part to compensate for that slower response through predictive, learned behaviour rather than reactive channel tracking.

The framework's reliance on user coordinates rather than channel measurements also aligns with a trend in wireless research toward positioning-assisted communication — using GPS or network-derived location data to inform radio resource management without full channel characterisation.

Scalability and the Path to Real Networks

The research addresses a credible concern about multi-agent systems: whether learned cooperation breaks down as network size grows. The CTDE architecture is designed to remain scalable: because each agent executes independently, deployment complexity does not grow with the number of agents in the way it would under centralised control.

The authors describe the results as validating MARL-driven spatial abstractions as a scalable pathway toward AI-empowered wireless networks — a claim that remains to be tested in physical hardware trials, which the paper does not include. Simulation-to-reality gaps are a recognised challenge in applying reinforcement learning to physical systems, particularly those involving precise mechanical actuation.

Nonetheless, the core contribution — demonstrating that a cooperative AI system can learn effective reflector control using only coarse location data, with no channel estimation — addresses one of the most cited barriers to RIS adoption at scale.

What This Means

If the approach transfers from simulation to hardware, network operators could deploy reflective surface arrays in complex indoor and urban environments without the channel estimation infrastructure that has made this class of technology expensive and difficult to manage — potentially accelerating the role of AI-controlled passive infrastructure in 6G network design.