On-Demand vs Reserved vs Spot: Cloud GPU Pricing Models Compared

Last reviewed on 2026-04-30 · 10 min read

The hourly rate quoted on a provider's pricing page is only one of several prices that apply to the same GPU. The same H100 might cost $4.99/hr on demand, around $3.20/hr on a one-year commitment, and under $1/hr on the spot market — and the choice of pricing model often matters more than the choice of provider.

This page lays out every meaningful cloud GPU pricing model you are likely to encounter, what each one is good for, where it hurts, and the decision criteria that actually move the needle. The goal is to leave you with a default for your workload that is hard to second-guess.

The five pricing models

| Model | Typical discount vs on-demand | Commitment | Interruption risk |
|---|---|---|---|
| On-demand | 0% (baseline) | None — pay per second/minute/hour | None |
| Reserved instance | 30–55% | 1- or 3-year, instance-type specific | None |
| Committed-use / savings plan | 20–55% | 1- or 3-year, dollar amount or compute hours | None |
| Spot / preemptible | 50–90% | None | High — provider can reclaim with short notice |
| Reserved capacity (long-term contract) | 40–60% (negotiated) | Multi-year, often with minimum spend | None |

1. On-demand

On-demand is the baseline you see on the public pricing page. The provider gives you a guaranteed instance for as long as you want it, billed in fine-grained increments. You can stop and start at will, you can change instance type at any time, and you owe nothing once you terminate.

This is the right starting point for any new workload. Until you know how often you will run the job, how long it takes, and what GPU it needs, locking in a discount is premature.

2. Reserved instances

A reserved instance is a forward purchase: you commit to paying for an instance for one or three years, and the provider discounts the rate in exchange. Some providers let you change the instance family, region, or operating system mid-term ("convertible"); others lock the SKU.

Reservations are simple to model: take your annualized on-demand spend on that SKU and apply the discount. They make sense for steady-state inference servers, persistent training pipelines, or development environments that nobody bothers to shut down on weekends.
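
To make the modeling concrete, here is a minimal sketch of the break-even arithmetic. The hourly rate and the 40% discount are illustrative assumptions, not quotes from any provider:

```python
# Back-of-the-envelope reservation model. The rates below are illustrative
# assumptions, not quotes from a pricing page.
HOURS_PER_YEAR = 8_760

on_demand_rate = 4.99        # $/hr, assumed on-demand price for the SKU
reserved_discount = 0.40     # 40% off, mid-range of the table above

reserved_rate = on_demand_rate * (1 - reserved_discount)
annual_reserved_cost = reserved_rate * HOURS_PER_YEAR

# A reservation is billed whether or not the instance runs, so it only wins
# if you would otherwise buy more on-demand hours than the break-even point.
break_even_hours = annual_reserved_cost / on_demand_rate
break_even_utilization = break_even_hours / HOURS_PER_YEAR

print(f"Reserved: ${annual_reserved_cost:,.0f}/yr")
print(f"Break-even utilization: {break_even_utilization:.0%}")
```

The general rule falls out of the arithmetic: with a discount of d, the reservation only wins if you would otherwise keep the instance busy more than (1 - d) of the hours in the term.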

Common pitfalls:

- Committing before the workload is validated; a reservation bought in week one often covers the wrong SKU by month three.
- Hardware churn: a new GPU generation or a change in model size can make the reserved SKU the wrong answer mid-term.
- Idle reservations: the bill arrives whether or not the instance runs, so a forgotten reservation is pure waste.

3. Committed-use discounts and savings plans

Committed-use discounts on Google Cloud and savings plans on AWS are the more flexible cousin of reserved instances. Instead of reserving a specific SKU, you commit to a dollar amount of compute spend per hour over the term, and the discount applies to whatever instances you actually run, within scope.

This is usually a better default than instance-specific reservations for any team whose GPU mix is going to change. The discount is slightly lower, but you stop paying for the right answer to last quarter's question.
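
Here is a rough sketch of how a dollar-per-hour commitment gets applied, under simplified assumptions (one flat discount, usage measured in on-demand dollars; real providers meter the commitment at the discounted per-SKU rate):

```python
# Simplified model of a dollar-per-hour compute commitment. The commitment
# size and discount are assumptions, and the covered/overflow split is a
# simplification of how providers actually meter usage.
commitment_per_hour = 10.00   # $/hr of on-demand usage covered (assumption)
plan_discount = 0.30          # 30% off covered usage (assumption)

def hourly_bill(on_demand_usage: float) -> float:
    """Cost for one hour whose usage is worth `on_demand_usage` at on-demand rates."""
    covered = min(on_demand_usage, commitment_per_hour)
    overflow = on_demand_usage - covered
    # The commitment is paid in full even when usage falls below it.
    return commitment_per_hour * (1 - plan_discount) + overflow

print(hourly_bill(14.0))  # $7 for the covered spend + $4 overflow = $11.00
print(hourly_bill(6.0))   # still $7.00: unused commitment is not refunded
```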

4. Spot / preemptible

Spot capacity is the provider's surplus inventory, sold at a steep discount on the understanding that they can reclaim it. Termination notice is short — anywhere from "no warning" on some marketplaces to 30 seconds on Google Cloud, two minutes on AWS, and around five minutes on a few specialty providers.

Spot is the highest-leverage cost reduction available, but it shifts engineering burden onto your training loop:

- Jobs have to checkpoint frequently and resume cleanly from the last checkpoint, because any step can be the last one before a reclaim.
- Something has to catch the termination notice and requeue or fail over the interrupted work.
- Capacity can vanish for hours in a busy region, so you need a fallback to on-demand or to another region when the queue stalls.

The full operational playbook is in the spot instance guide. Spot pays back fastest for embarrassingly parallel work — hyperparameter sweeps, batch inference, large data preprocessing — and pays back least for single long jobs that cannot tolerate restart overhead.
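
As a concrete illustration of that burden, here is a minimal sketch of a resumable job that checkpoints periodically and reacts to a termination signal. The file path, step counts, and the assumption that the reclaim notice arrives as SIGTERM are all illustrative; real providers also expose the notice through a metadata endpoint:

```python
import os
import signal
import time

CKPT_PATH = "checkpoint.txt"   # illustrative path, not a provider convention
SAVE_EVERY = 50                # steps between checkpoints (assumption)
TOTAL_STEPS = 1_000

stop_requested = False

def handle_termination(signum, frame):
    # Model the reclaim notice as a SIGTERM; some providers instead require
    # polling a metadata endpoint for the warning.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, handle_termination)

def load_step() -> int:
    # Resume from the last completed step if a checkpoint exists.
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return int(f.read().strip())
    return 0

def save_step(step: int) -> None:
    # Write atomically so a reclaim mid-write cannot corrupt the checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(step))
    os.replace(tmp, CKPT_PATH)

def train_one_step(step: int) -> None:
    time.sleep(0.01)  # stand-in for the real training step

step = load_step()
while step < TOTAL_STEPS and not stop_requested:
    train_one_step(step)
    step += 1
    if step % SAVE_EVERY == 0:
        save_step(step)

save_step(step)  # final checkpoint before exiting (or being reclaimed)
```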

5. Reserved capacity (negotiated)

Above a certain size — typically several hundred GPUs or several million dollars annually — the public discount tiers stop being the best deal. Providers will negotiate a multi-year capacity contract, often with a minimum committed spend in exchange for guaranteed access to scarce SKUs (H100, H200, B200) and a price floor far below the public rate. This model dominates large training-cluster deployments and is invisible from the public pricing page.

Decision framework

Three questions usually settle the choice:

  1. Is the workload steady-state or bursty? Steady → reservation, savings plan, or contract. Bursty → on-demand and/or spot.
  2. Can the workload tolerate restarts? Yes → spot is a candidate for a chunk of the spend. No → stay on guaranteed capacity.
  3. How confident are you in the SKU? Confident → instance-specific reservation. Unsure → savings plan / committed-use, which keeps options open.

A common, defensible default for production teams: cover the always-on baseline (inference servers, dev environments) with a 1-year savings plan, run elastic workloads (research jobs, training experiments) on on-demand, and use spot for anything fault-tolerant on top of that. Three-year commitments only make sense if you have already validated the workload with a one-year commit.
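
Read as a decision tree, the three questions collapse into a few lines. This is a toy sketch of the framework above, not a tool any provider ships:

```python
def default_pricing_model(steady_state: bool, tolerates_restarts: bool,
                          sku_confident: bool) -> str:
    """Toy encoding of the three questions above. Illustrative only."""
    if not steady_state:
        # Bursty work: no commitment; spot if interruptions are tolerable.
        return "spot (with on-demand fallback)" if tolerates_restarts else "on-demand"
    # Steady-state work: some form of commitment pays off.
    if not sku_confident:
        return "savings plan / committed-use"
    return "reserved instance (validate with a 1-year term first)"

print(default_pricing_model(steady_state=True, tolerates_restarts=False,
                            sku_confident=False))
# -> savings plan / committed-use
```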

Worked example: a 6-month research project

Consider a team that needs roughly one A100 80GB for six months — say 3,500 hours of compute, with frequent interruptions tolerated for half of that.
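
Back-of-the-envelope arithmetic makes the comparison concrete. The A100 rate and the spot discount below are illustrative assumptions, chosen from inside the ranges quoted earlier:

```python
# Illustrative arithmetic for the 6-month example. The rates are assumptions
# for the sketch, not quotes from a provider.
total_hours = 3_500
on_demand_rate = 1.80          # $/hr for an A100 80GB (assumed)
spot_discount = 0.65           # 65% off, inside the 50-90% range above

all_on_demand = total_hours * on_demand_rate

interruptible_hours = total_hours / 2   # half the work tolerates restarts
hybrid = (interruptible_hours * on_demand_rate * (1 - spot_discount)
          + (total_hours - interruptible_hours) * on_demand_rate)

print(f"All on-demand: ${all_on_demand:,.0f}")
print(f"Hybrid (half spot): ${hybrid:,.0f}")
print(f"Savings: {1 - hybrid / all_on_demand:.0%}")   # roughly a third
```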

The hybrid path saves about a third of the bill, requires no commitment, and only asks the team to put basic checkpointing in place. That is the typical shape: the largest savings come from combining models, not from picking the cheapest one.

Common mistakes

Related reading