On-Demand vs Reserved vs Spot: Cloud GPU Pricing Models Compared

Last reviewed on 2026-04-30 · 10 min read

The hourly rate quoted on a provider's pricing page is only one of several prices that apply to the same GPU. The same H100 might cost $4.99/hr on demand, around $3.20/hr on a one-year commitment, and under $1/hr on the spot market — and the choice of pricing model often matters more than the choice of provider.

This page lays out every meaningful cloud GPU pricing model you are likely to encounter, what each one is good for, where it hurts, and the decision criteria that actually move the needle. The goal is to leave you with a default for your workload that is hard to second-guess.

The five pricing models

| Model | Typical discount vs on-demand | Commitment | Interruption risk |
|---|---|---|---|
| On-demand | 0% (baseline) | None — pay per second/minute/hour | None |
| Reserved instance | 30–55% | 1- or 3-year, instance-type specific | None |
| Committed-use / savings plan | 20–55% | 1- or 3-year, dollar amount or compute hours | None |
| Spot / preemptible | 50–90% | None | High — provider can reclaim with short notice |
| Reserved capacity (long-term contract) | 40–60% (negotiated) | Multi-year, often with minimum spend | None |

1. On-demand

On-demand is the baseline you see on the public pricing page. The provider gives you a guaranteed instance for as long as you want it, billed in fine-grained increments. You can stop and start at will, you can change instance type at any time, and you owe nothing once you terminate.

This is the right starting point for any new workload. Until you know how often you will run the job, how long it takes, and what GPU it needs, locking in a discount is premature.

2. Reserved instances

A reserved instance is a forward purchase: you commit to paying for an instance for one or three years, and the provider discounts the rate in exchange. Some providers let you change the instance family, region, or operating system mid-term ("convertible"); others lock the SKU.

Reservations are simple to model: take your annualized on-demand spend on that SKU and apply the discount. They make sense for steady-state inference servers, persistent training pipelines, or development environments that nobody bothers to shut down on weekends.
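
To make the modeling concrete, here is a minimal sketch of the break-even arithmetic. The hourly rate and the 40% discount are illustrative assumptions, not quotes from any provider:

```python
# Back-of-the-envelope reservation model. The rates below are illustrative
# assumptions, not quotes from a pricing page.
HOURS_PER_YEAR = 8_760

on_demand_rate = 4.99        # $/hr, assumed on-demand price for the SKU
reserved_discount = 0.40     # 40% off, mid-range of the table above

reserved_rate = on_demand_rate * (1 - reserved_discount)
annual_reserved_cost = reserved_rate * HOURS_PER_YEAR

# A reservation is billed whether or not the instance runs, so it only wins
# if you would otherwise buy more on-demand hours than the break-even point.
break_even_hours = annual_reserved_cost / on_demand_rate
break_even_utilization = break_even_hours / HOURS_PER_YEAR

print(f"Reserved: ${annual_reserved_cost:,.0f}/yr")
print(f"Break-even utilization: {break_even_utilization:.0%}")
```

The general rule falls out of the arithmetic: with a discount of d, the reservation only wins if you would otherwise keep the instance busy more than (1 - d) of the hours in the term.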

Common pitfalls:

- Committing before the workload is validated; a reservation bought in week one often covers the wrong SKU by month three.
- Hardware churn: a new GPU generation or a change in model size can make the reserved SKU the wrong answer mid-term.
- Idle reservations: the bill arrives whether or not the instance runs, so a forgotten reservation is pure waste.

3. Committed-use discounts and savings plans

Committed-use discounts on Google Cloud and savings plans on AWS are the more flexible cousin of reserved instances. Instead of reserving a specific SKU, you commit to a dollar amount of compute spend per hour over the term, and the discount applies to whatever instances you actually run, within scope.

This is usually a better default than instance-specific reservations for any team whose GPU mix is going to change. The discount is slightly lower, but you stop paying for the right answer to last quarter's question.
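
Here is a rough sketch of how a dollar-per-hour commitment gets applied, under simplified assumptions (one flat discount, usage measured in on-demand dollars; real providers meter the commitment at the discounted per-SKU rate):

```python
# Simplified model of a dollar-per-hour compute commitment. The commitment
# size and discount are assumptions, and the covered/overflow split is a
# simplification of how providers actually meter usage.
commitment_per_hour = 10.00   # $/hr of on-demand usage covered (assumption)
plan_discount = 0.30          # 30% off covered usage (assumption)

def hourly_bill(on_demand_usage: float) -> float:
    """Cost for one hour whose usage is worth `on_demand_usage` at on-demand rates."""
    covered = min(on_demand_usage, commitment_per_hour)
    overflow = on_demand_usage - covered
    # The commitment is paid in full even when usage falls below it.
    return commitment_per_hour * (1 - plan_discount) + overflow

print(hourly_bill(14.0))  # $7 for the covered spend + $4 overflow = $11.00
print(hourly_bill(6.0))   # still $7.00: unused commitment is not refunded
```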

4. Spot / preemptible

Spot capacity is the provider's surplus inventory, sold at a steep discount on the understanding that they can reclaim it. Termination notice is short — anywhere from "no warning" on some marketplaces to 30 seconds on Google Cloud, two minutes on AWS, and around five minutes on a few specialty providers.

Spot is the highest-leverage cost reduction available, but it shifts engineering burden onto your training loop:

- Jobs have to checkpoint frequently and resume cleanly from the last checkpoint, because any step can be the last one before a reclaim.
- Something has to catch the termination notice and requeue or fail over the interrupted work.
- Capacity can vanish for hours in a busy region, so you need a fallback to on-demand or to another region when the queue stalls.

The full operational playbook is in the spot instance guide. Spot pays back fastest for embarrassingly parallel work — hyperparameter sweeps, batch inference, large data preprocessing — and pays back least for single long jobs that cannot tolerate restart overhead.
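
As a concrete illustration of that burden, here is a minimal sketch of a resumable job that checkpoints periodically and reacts to a termination signal. The file path, step counts, and the assumption that the reclaim notice arrives as SIGTERM are all illustrative; real providers also expose the notice through a metadata endpoint:

```python
import os
import signal
import time

CKPT_PATH = "checkpoint.txt"   # illustrative path, not a provider convention
SAVE_EVERY = 50                # steps between checkpoints (assumption)
TOTAL_STEPS = 1_000

stop_requested = False

def handle_termination(signum, frame):
    # Model the reclaim notice as a SIGTERM; some providers instead require
    # polling a metadata endpoint for the warning.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, handle_termination)

def load_step() -> int:
    # Resume from the last completed step if a checkpoint exists.
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return int(f.read().strip())
    return 0

def save_step(step: int) -> None:
    # Write atomically so a reclaim mid-write cannot corrupt the checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(step))
    os.replace(tmp, CKPT_PATH)

def train_one_step(step: int) -> None:
    time.sleep(0.01)  # stand-in for the real training step

step = load_step()
while step < TOTAL_STEPS and not stop_requested:
    train_one_step(step)
    step += 1
    if step % SAVE_EVERY == 0:
        save_step(step)

save_step(step)  # final checkpoint before exiting (or being reclaimed)
```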

5. Reserved capacity (negotiated)

Above a certain size — typically several hundred GPUs or several million dollars annually — the public discount tiers stop being the best deal. Providers will negotiate a multi-year capacity contract, often with a minimum committed spend in exchange for guaranteed access to scarce SKUs (H100, H200, B200) and a price floor far below the public rate. This model dominates large training-cluster deployments and is invisible from the public pricing page.

Decision framework

Three questions usually settle the choice:

  1. Is the workload steady-state or bursty? Steady → reservation, savings plan, or contract. Bursty → on-demand and/or spot.
  2. Can the workload tolerate restarts? Yes → spot is a candidate for a chunk of the spend. No → stay on guaranteed capacity.
  3. How confident are you in the SKU? Confident → instance-specific reservation. Unsure → savings plan / committed-use, which keeps options open.

A common, defensible default for production teams: cover the always-on baseline (inference servers, dev environments) with a 1-year savings plan, run elastic workloads (research jobs, training experiments) on on-demand, and use spot for anything fault-tolerant on top of that. Three-year commitments only make sense if you have already validated the workload with a one-year commit.
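
Read as a decision tree, the three questions collapse into a few lines. This is a toy sketch of the framework above, not a tool any provider ships:

```python
def default_pricing_model(steady_state: bool, tolerates_restarts: bool,
                          sku_confident: bool) -> str:
    """Toy encoding of the three questions above. Illustrative only."""
    if not steady_state:
        # Bursty work: no commitment; spot if interruptions are tolerable.
        return "spot (with on-demand fallback)" if tolerates_restarts else "on-demand"
    # Steady-state work: some form of commitment pays off.
    if not sku_confident:
        return "savings plan / committed-use"
    return "reserved instance (validate with a 1-year term first)"

print(default_pricing_model(steady_state=True, tolerates_restarts=False,
                            sku_confident=False))
# -> savings plan / committed-use
```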

Worked example: a 6-month research project

Consider a team that needs roughly one A100 80GB for six months — say 3,500 hours of compute, with frequent interruptions tolerated for half of that.
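
Back-of-the-envelope arithmetic makes the comparison concrete. The A100 rate and the spot discount below are illustrative assumptions, chosen from inside the ranges quoted earlier:

```python
# Illustrative arithmetic for the 6-month example. The rates are assumptions
# for the sketch, not quotes from a provider.
total_hours = 3_500
on_demand_rate = 1.80          # $/hr for an A100 80GB (assumed)
spot_discount = 0.65           # 65% off, inside the 50-90% range above

all_on_demand = total_hours * on_demand_rate

interruptible_hours = total_hours / 2   # half the work tolerates restarts
hybrid = (interruptible_hours * on_demand_rate * (1 - spot_discount)
          + (total_hours - interruptible_hours) * on_demand_rate)

print(f"All on-demand: ${all_on_demand:,.0f}")
print(f"Hybrid (half spot): ${hybrid:,.0f}")
print(f"Savings: {1 - hybrid / all_on_demand:.0%}")   # roughly a third
```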

The hybrid path saves about a third of the bill, requires no commitment, and only asks the team to put basic checkpointing in place. That is the typical shape: the largest savings come from combining models, not from picking the cheapest one.

Common mistakes

Related reading