Tenstorrent plans to ship the QuietBox 2, a desktop AI workstation priced at $9,999, in Q2 2026 — a machine that fits on a home-office desk, draws power from a standard wall outlet, and runs large language models at speeds faster than a live response from GPT-5 or Claude.

The demand for local AI inference has outpaced what conventional PC hardware can deliver. A typical laptop carries enough memory to load a model with 8 billion to 13 billion parameters, while even high-end workstation GPUs struggle with models above 70 billion parameters. The QuietBox 2 is built specifically to close that gap, without requiring a data centre power supply or a six-figure budget.

Four Custom Chips Where a GPU Tower Can't Fit

The QuietBox 2 houses four of Tenstorrent's Blackhole application-specific integrated circuits — RISC-V chips designed exclusively for AI workloads. Each card carries 120 Tensix AI accelerators and 32 GB of GDDR6 memory, giving the system a total of 480 Tensix cores and 128 GB of GDDR6, supplemented by 256 GB of DDR5 system memory for a combined 384 GB. That is enough to load OpenAI's GPT-OSS-120B and run Meta's Llama 3.1 70B at close to 500 tokens per second — described by Tenstorrent as several times faster than average responses from frontier cloud models.
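Whether a given model fits in that memory is straightforward arithmetic: weight footprint scales with parameter count times bytes per parameter. A minimal sketch (the helper function and the precision choices are illustrative assumptions, not Tenstorrent's published sizing method):

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough memory needed for model weights alone, in GB.

    Ignores KV cache, activations, and runtime overhead, which add more.
    """
    # params_billions * 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB
    return params_billions * bytes_per_param

# Llama 3.1 70B at 8-bit quantization: ~70 GB of weights,
# which fits in the QuietBox 2's 128 GB of GDDR6.
print(weight_footprint_gb(70, 1.0))   # 70.0
# At 16-bit precision the same model needs ~140 GB
# and would spill into the slower DDR5 system memory.
print(weight_footprint_gb(70, 2.0))   # 140.0
```

This is also why the 256 GB of DDR5 matters: weights that exceed GDDR6 capacity can still be held locally, just at lower bandwidth.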

"Our 128 gigabytes of GDDR6 RAM would require four Nvidia RTX 5090 graphics cards. That couldn't fit in today's 1,600-watt form factor, and the cost for four RTX 5090 GPUs is huge." — Milos Trajkovic, cofounder and systems engineer, Tenstorrent

The memory density matters because GDDR6 capacity and bandwidth determine how much of a model can sit close to the accelerators and how quickly its weights can be fed to them. Matching 128 GB with consumer GPUs would take four Nvidia RTX 5090s (32 GB each); Nvidia recommends a 1,000 W system per card, which makes a four-card rig impractical at roughly 4,000 W and potentially more expensive, for the GPUs alone, than the QuietBox 2's entire asking price.
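One way to see why bandwidth matters: during single-stream autoregressive decoding, every generated token must stream the model's active weights from memory, so aggregate bandwidth divided by weight bytes gives an upper bound on tokens per second. A sketch with hypothetical numbers (the per-card bandwidth figure below is an assumption for illustration, not a Blackhole specification):

```python
def decode_tps_upper_bound(total_bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode throughput for a
    memory-bandwidth-bound model: one full weight read per token."""
    return total_bandwidth_gb_s / weights_gb

# Hypothetical: four cards at 500 GB/s of GDDR6 each, 70 GB of 8-bit weights.
bound = decode_tps_upper_bound(4 * 500, 70)
print(round(bound, 1))  # 28.6 tokens/s for a single stream
```

Aggregate figures like the roughly 500 tokens per second quoted for Llama 3.1 70B come from batching many concurrent requests, which amortises each weight read across streams; the single-stream bound above is why raw GDDR6 bandwidth, not just capacity, drives the design.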

Power Budget as a Design Constraint

The QuietBox 2 draws a maximum of 1,400 W under full load. A standard 15-ampere, 120-volt North American circuit supplies up to 1,800 W, and the usual 80 per cent derating for continuous loads leaves a budget of about 1,440 W, so the machine runs safely without a dedicated high-amperage line. This single engineering decision expands the potential user base considerably: developers working from home offices can deploy the system without electrical upgrades or facilities approvals.
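The arithmetic behind that claim is simple enough to spell out. A sketch, assuming the standard NEC-style 80 per cent derating for continuous loads (the helper function is illustrative):

```python
def continuous_budget_w(volts: float, amps: float, derate: float = 0.8) -> float:
    """Continuous power budget of a branch circuit: peak capacity (V * A)
    reduced by the 80% derating applied to continuous loads."""
    return volts * amps * derate

budget = continuous_budget_w(120, 15)
print(budget)           # 1440.0 W continuous on a 15 A / 120 V circuit
print(budget - 1400)    # 40.0 W of headroom over the QuietBox 2's max draw
```

A 1,400 W ceiling lands just under the 1,440 W continuous budget, which suggests the power target was chosen to fit an ordinary household circuit rather than the other way around.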

The chassis deliberately reinforces the desktop-PC form factor. It supports a micro-ATX motherboard with an AMD x86 CPU and compatible chipset, uses closed-loop liquid cooling, includes customisable RGB LED lighting, and features a semitransparent side panel. According to Chris Goulet, thermal-mechanical engineer and team lead at Tenstorrent, internal developers have been requesting units precisely because the setup is frictionless: ship the box, plug it in, and it works.

Direct Competitor: Nvidia's DGX Station at $85,000

Tenstorrent is not operating in a vacuum. Nvidia released the DGX Spark last year and opened orders for the DGX Station, powered by the GB300 chip, on 16 March 2026. DGX Station variants will be sold through Asus, Dell, and MSI, with one retailer listing the MSI configuration at $85,000. The DGX Station offers up to 748 GB of memory and draws up to 1,600 W, close to the peak capacity of a standard 15 A circuit.

Nvidia's director of product marketing, Allyn Bourgoyne, has said the company expects most DGX owners to use the devices as remotely accessed servers — sending jobs over a network from a separate laptop rather than sitting directly at the machine. The DGX Station runs DGX OS, a proprietary Ubuntu variant built around Nvidia's CUDA ecosystem.

Tenstorrent's positioning differs in key ways. The QuietBox 2 runs standard Ubuntu with a full desktop environment accessible via HDMI, making it usable as a primary workstation rather than just a remote compute node. The x86 CPU and conventional PC architecture also improve software compatibility compared to Nvidia's ARM-based DGX chipsets.

An Open-Source Stack as a Developer Argument

A major differentiator is software philosophy. Tenstorrent has made its entire software stack open source, including TT-Forge (its AI compiler) and TT-Metalium (a low-level SDK providing kernel-level hardware control), both available on GitHub. The company has also published the instruction set architecture for its Tensix cores, allowing developers to inspect exactly how workloads execute on the silicon.

For teams evaluating AI infrastructure, this openness reduces vendor lock-in risk and lowers the barrier to custom optimisation. Nvidia's CUDA ecosystem remains dominant and deeply integrated into the ML toolchain, but it is proprietary — and developers working with Tenstorrent hardware face no equivalent constraints.

The $9,999 price point also matters in enterprise procurement contexts. At roughly one-ninth the cost of the MSI DGX Station, teams can deploy multiple QuietBox 2 units — one per developer, for instance — rather than sharing a single remote server.

What This Means

For AI developers who need serious local inference capability without a data-centre budget, the QuietBox 2 represents a cost-accessible option for running sub-100B parameter models at meaningful speed — and its open-source stack gives engineering teams a level of hardware transparency that Nvidia's ecosystem does not offer.