Tokens Per Watt Per Dollar: The 2026 AI Efficiency Metric
Cost per GPU-hour, PUE, tokens per second — each sees one layer. Tokens per watt per dollar is the one number that prices intelligence the way owners actually sell it.
Tokens per watt per dollar fuses three things at once: how many tokens a system produces, how much power it burns, and how much capital it ties up. So instead of three metrics that each tell a partial truth, you get a single gauge of what a unit of intelligence really costs to make. We treat it as the operating dial of every campus we build, the same way NVIDIA frames the AI factory around the token as the unit of output.
Here is why the old metrics mislead. Cost per GPU-hour ignores power entirely. Power Usage Effectiveness ignores capital. Tokens per second ignores both. Each one optimizes for whoever invented it — the cloud vendor, the facility engineer, the model researcher — and none of them prices the build for the person who funds the whole thing. Therefore an owner who tunes one variable in isolation will misprice the entire campus, often by enough to erase the margin.
Every figure in this brief is sourced in full on our tokens-per-watt research page.
The three terms, in plain language
The metric has one numerator and two denominators. First, the token term is throughput: the sellable tokens the system actually produces under real load. Second, the watt term is the total power drawn to produce them, facility overhead included. Third, the dollar term is the capital tied up, amortized over the life of the build. Raise the tokens, lower the watts, lower the dollars, and the number goes up. The discipline is that you cannot cheat one term without paying in another, which is exactly why the metric surfaces the trade-offs that single metrics hide.
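As a minimal sketch of how the three terms combine, the Python below divides tokens by the product of power and amortized capital. The function name, the choice of units, and every input value are illustrative assumptions, not figures from any real campus.

```python
def tokens_per_watt_per_dollar(tokens, facility_watts, amortized_dollars):
    """Tokens divided by (power x capital).

    Units are an assumption for this sketch: tokens served in a period,
    average total facility watts over that period, and capital amortized
    to the same period.
    """
    if facility_watts <= 0 or amortized_dollars <= 0:
        raise ValueError("power and capital must be positive")
    return tokens / (facility_watts * amortized_dollars)


# Hypothetical inputs, chosen only to show the direction of each lever.
baseline = tokens_per_watt_per_dollar(1e12, 2_000_000, 5_000_000)
more_tokens = tokens_per_watt_per_dollar(1.2e12, 2_000_000, 5_000_000)
fewer_watts = tokens_per_watt_per_dollar(1e12, 1_600_000, 5_000_000)
fewer_dollars = tokens_per_watt_per_dollar(1e12, 2_000_000, 4_000_000)

for name, value in [
    ("baseline", baseline),
    ("more tokens", more_tokens),
    ("fewer watts", fewer_watts),
    ("fewer dollars", fewer_dollars),
]:
    print(f"{name:>13}: {value:.3f}")
```

Each variant changes exactly one term, so the printout shows the score rising in all three cases. The absolute values mean nothing outside this sketch; only the comparison does.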
The watt term is where the values live
The watt term is the one most builds get wrong, and it is also where our engineering and our promises are the same thing. Start with overhead. The Green Grid, the body that created the PUE metric, frames the overhead plainly: PUE is total facility power divided by the power that reaches IT equipment, so an industry-average PUE near 1.5 means about a third of every purchased watt is lost before it reaches a chip. By contrast, a closed-loop liquid campus runs near 1.1, where the loss is roughly 9 percent, because liquid cooling removes the fan tax and the heat is captured rather than thrown away.
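The overhead arithmetic follows directly from the definition of PUE as total facility power divided by IT power. A short sketch, with a function name of our own choosing:

```python
def overhead_fraction(pue):
    """Share of total facility power that never reaches IT equipment.

    PUE = total facility power / IT power, so the IT share is 1 / PUE
    and everything else is overhead.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - 1.0 / pue


for pue in (1.5, 1.1):
    print(f"PUE {pue}: {overhead_fraction(pue):.1%} of purchased power is overhead")
```

At a PUE of 1.5 the overhead is one third of purchased power; at 1.1 it is about 9 percent.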
Then there is the source of the watt itself. Because we generate power on the campus and island it from the grid, the electricity behind every token is firm, owned, and priced on our terms. That is the behind-the-meter logic, and it also means we draw no grid capacity from the town. And because the cooling loop is sealed, the campus draws zero municipal water. So the two moves that improve the watt term the most are the same two refusals we make to the community: take no grid power, take no town water. The efficiency and the good-neighbor design are one decision, made out of steel and water pipe.
The dollar term is where the gift lives
The dollar term is capital, amortized — and it is more than a finance line. Energy price volatility moves it, which owned generation stabilizes. Hidden operations costs move it, which integration controls. Above all, deployment speed moves it: a campus that earns eighteen months sooner amortizes the same capital across far more tokens, so NREL research on multi-year interconnection waits is not a footnote — it is a direct hit to the dollar term for anyone who builds the conventional way.
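To see why timing hits the dollar term, here is a rough sketch that spreads a fixed amount of capital over the tokens produced inside a fixed horizon. The fixed horizon, the function name, and all numbers are assumptions made for illustration only.

```python
def capital_per_token(capital_dollars, tokens_per_month, horizon_months, delay_months):
    """Capital divided by tokens produced inside a fixed horizon.

    Assumption: the horizon is fixed on the calendar, so every month of
    delay before first tokens is a month of output lost for good.
    """
    productive_months = horizon_months - delay_months
    if productive_months <= 0:
        raise ValueError("no productive months inside the horizon")
    return capital_dollars / (tokens_per_month * productive_months)


# Hypothetical build: same capital and throughput, first tokens 18 months apart.
fast = capital_per_token(100e6, 1e12, horizon_months=60, delay_months=6)
slow = capital_per_token(100e6, 1e12, horizon_months=60, delay_months=24)

print(f"capital per token, fast build: {fast:.3e}")
print(f"capital per token, slow build: {slow:.3e}")
print(f"slow build carries {slow / fast - 1:.0%} more capital per token")
```

Under these assumed inputs the slower build has 36 productive months against 54, so each of its tokens carries half again as much capital.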
Here is the part that matters to the towns we build in. We drive the dollar term down by building the liquid-cooled pods on a domestic line in Fort Worth rather than importing them. That decision lands as cost discipline on the P&L and as skilled jobs in the community — the same line item, read two ways. Lower cost for the buyer; real work for the neighbor.
The token term is where the workload meets the metal
The token term rewards the operator who keeps the machine busy and efficient. Specifically, sustained utilization matters more than peak throughput, because idle accelerators burn capital and power while producing nothing. Furthermore, model size and batching shape how many tokens each watt yields, and inference-only optimizations push the number up without new hardware — which is the whole subject of our inference playbook. In short, the token term is won by tuning the workload to the silicon, not by buying more silicon.
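The utilization point can be sketched with a simple power model. We assume, purely for illustration, that an accelerator draws a fixed idle power plus a load-dependent part that scales linearly with utilization; real hardware is not this tidy, and the numbers below are hypothetical.

```python
def tokens_per_joule(utilization, peak_tokens_per_s, idle_watts, peak_watts):
    """Tokens per joule under an assumed linear power model.

    Assumption: power = idle + (peak - idle) * utilization, and token
    output scales linearly with utilization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_s = peak_tokens_per_s * utilization
    watts = idle_watts + (peak_watts - idle_watts) * utilization
    return tokens_per_s / watts


# Hypothetical accelerator: 10,000 tokens/s at peak, 300 W idle, 1,000 W at full load.
low = tokens_per_joule(0.3, 10_000, 300, 1_000)
high = tokens_per_joule(0.9, 10_000, 300, 1_000)

print(f"30% utilized: {low:.2f} tokens per joule")
print(f"90% utilized: {high:.2f} tokens per joule")
```

Because idle power is paid whether or not tokens flow, the busier machine yields more tokens per joule on the same silicon.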
How to model it for your own build
You do not need our campus to use the metric; you need to instrument three things. First, instrument the token term: measure real tokens served under production load, not a benchmark. Second, instrument the watt term: meter total facility power, not just IT power, so overhead is visible. Third, instrument the dollar term: amortize true all-in capital over the real useful life. Then stress-test it — vary energy price, utilization, and timeline — because a number that only holds in the best case is not a number you can underwrite. Finally, report the metric next to its three inputs, so no one can improve the score by hiding a term.
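The stress test above can be sketched as a small scenario grid. This is one possible model, not a prescribed method: it assumes energy cost is folded into the dollar term alongside capital, that facility power is constant, and that the horizon is fixed. All names and numbers are hypothetical.

```python
from itertools import product


def scenario(energy_price_per_kwh, utilization, delay_months,
             peak_tokens_per_month=1e12, facility_watts=2e6,
             capital_dollars=100e6, horizon_months=60):
    """Return the three inputs and the composite score for one scenario."""
    months = horizon_months - delay_months
    tokens = peak_tokens_per_month * utilization * months
    hours = months * 730  # approximate hours per month
    energy_cost = (facility_watts / 1000) * hours * energy_price_per_kwh
    dollars = capital_dollars + energy_cost  # assumption: energy sits in the dollar term
    return {
        "tokens": tokens,
        "watts": facility_watts,
        "dollars": dollars,
        "score": tokens / (facility_watts * dollars),
    }


# Vary energy price, utilization, and timeline together.
grid = product((0.04, 0.08, 0.12), (0.5, 0.7, 0.9), (6, 24))
results = [
    dict(price=p, utilization=u, delay=d, **scenario(p, u, d))
    for p, u, d in grid
]

worst = min(results, key=lambda r: r["score"])
best = max(results, key=lambda r: r["score"])

# Report the score next to its inputs so no term can hide.
for label, r in (("worst", worst), ("best", best)):
    print(f"{label}: score={r['score']:.3e} tokens={r['tokens']:.2e} "
          f"watts={r['watts']:.2e} dollars={r['dollars']:.2e} "
          f"(price={r['price']}, utilization={r['utilization']}, delay={r['delay']})")
```

The spread between the worst and best rows is the part worth underwriting against; a score quoted without that spread is a best case.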
Why we build to this number
I will say the personal part plainly. I do not optimize for tokens per watt per dollar because it is a clever metric. I optimize for it because the moves that win it are the moves I already believe in. Generating our own power wins the watt term and keeps the town’s grid untouched. A closed loop wins the watt term and keeps the town’s water. Building domestically wins the dollar term and puts the neighbor to work. My grandparents taught me to give more than you take, and it turns out the most efficient campus and the best-neighbor campus are the same campus. If you want the wider model, read our other field notes or explore the rest of SAVRN.
Frequently asked questions
How is tokens per watt per dollar actually calculated?
You divide sellable token throughput by the product of total power drawn and amortized capital. Specifically, all three terms must use real production numbers — measured tokens, total facility watts, and true all-in cost over useful life — or the score flatters the build.
Which input moves the score the most?
It depends on the build, but the watt term is usually the biggest lever, because overhead and power source compound. Notably, removing cooling overhead and owning generation can move the number more than a hardware upgrade.
Is the metric only useful for large operators?
No. Because it is a ratio, it scales to any size — a single-block campus and a multi-campus operator can both report it. Therefore a smaller buyer can use it to compare vendors just as rigorously as a hyperscaler.
How does owning the power change the score?
Directly and durably. Because on-site generation lowers and stabilizes the cost per kilowatt-hour, it improves the watt and dollar terms at once. As a result, an owned-power campus holds its score even when grid prices climb.
Why does deployment speed affect an efficiency metric?
Because capital amortizes over output, and output starts only when the campus runs. Consequently, a build that reaches first tokens in months rather than years spreads the same capital across far more tokens, which lowers the capital behind each token and improves the dollar term.
What is the most common mistake in measuring it?
Measuring the token term on a benchmark instead of production load. By contrast, real workloads have idle time, batching limits, and mixed model sizes, so a benchmark score routinely overstates the number a buyer will actually see.
How should the metric trend through 2030?
Upward, on every term: hardware gets more efficient, cooling and power get owned, and software optimizations improve. However, demand rises faster than the grid can keep up, so the operators who own their power will separate furthest from those who do not.
Can I use it to compare two vendors directly?
Yes, if both report it with their inputs shown. Importantly, insist on seeing the three terms separately, because a single composite number can be gamed by trading a hidden weakness in one term for a strength in another.
Does the metric capture data control or data residency?
Not by itself — it is a cost-of-intelligence number. However, owned, in-region campuses tend to score well and satisfy data-residency requirements at the same time, so the two goals usually point in the same direction.
What is SAVRN's actual advantage on the metric?
We own the three levers that move it: on-site power for the watt term, domestic manufacturing for the dollar term, and liquid cooling at density for the token term. Because those are owned rather than rented, the advantage compounds over the life of the campus instead of eroding.