Hidden Costs of Cloud GPUs: Storage, Egress, and Idle Time
A cloud GPU bill is a stack of line items, only one of which is the GPU itself. Looking at the hourly rate alone is the most common mistake teams make when planning a workload — and it is the reason a 30-day project budgeted at $1,500 of "GPU time" can land at $2,000 once the invoice arrives.
This page goes through every cost category that lives next to the GPU rate, what drives each one, and the patterns that usually catch teams off guard. Check it against any provider's quote before signing off on a budget.
1. Block storage attached to the GPU instance
Most GPU instances ship with a small system disk and need additional block storage for datasets, checkpoints, and intermediate artifacts. That storage is billed separately from the instance — typically per GB-month, with a multiplier for higher-IOPS or higher-throughput tiers.
For a single instance running for a few hours this is invisible; for a multi-week training run with a 2 TB dataset and a steady stream of checkpoints, it can add 5–15% on top of the GPU hourly rate. The bigger trap: storage continues to bill while the instance is stopped. A team that "saves money" by powering down a dev box on weekends still pays for the volume sitting underneath it.
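To see how this scales over a multi-week run, here is a rough back-of-the-envelope sketch. The $0.12/GB-month rate and the checkpoint growth numbers are assumptions for illustration, not any particular provider's price sheet.

```python
# Rough block-storage estimate for a training project.
# The $/GB-month rate is an assumption; substitute your provider's price.
GB_MONTH_RATE = 0.12          # assumed general-purpose SSD rate, $/GB-month

def storage_cost(dataset_gb, checkpoint_gb_per_week, weeks, rate=GB_MONTH_RATE):
    """Average provisioned GB over the run, billed per GB-month."""
    months = weeks / 4.345                     # calendar weeks -> months
    # Checkpoints accumulate roughly linearly, so the average footprint
    # over the run is about half the final checkpoint size.
    avg_checkpoint_gb = checkpoint_gb_per_week * weeks / 2
    provisioned_gb = dataset_gb + avg_checkpoint_gb
    return provisioned_gb * rate * months

# 2 TB dataset, 150 GB of new checkpoints per week, 4-week run:
print(f"${storage_cost(2048, 150, 4):,.0f}")   # ~ $260 at the assumed rate
# The volume keeps billing while the instance is stopped -- powering down
# the box over weekends does not pause this line item.
```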
2. Snapshots and custom images
Snapshotting a volume so you can spin up a new instance with the same state is convenient, and the snapshot is billed as incremental storage at a small premium over the block-storage rate. Custom images for fast boot, built once with all your dependencies pre-installed, also sit on snapshot storage. The cost is low per snapshot but compounds when teams take a snapshot per training run and never delete the old ones.
3. Data egress
Egress — bytes leaving the cloud or moving between regions — is the line item that surprises the most teams. Same-region traffic between availability zones is usually charged at a small per-GB rate; cross-region traffic and traffic to the public internet are charged at much higher rates that have not dropped much over the last decade.
| Direction | Typical pricing pattern | Where it bites |
|---|---|---|
| Inside one availability zone | Often free | Rarely |
| Between AZs in the same region | Low per-GB fee both ways | Multi-AZ training and replication |
| Between regions | Higher per-GB fee, source-side | Cross-region replication, multi-region inference |
| To the public internet | Highest per-GB fee | Serving model outputs, data exports, downloading large checkpoints |
Two patterns to watch for:
- Cross-region datasets. Hosting your dataset in one region and your GPUs in another adds a cross-region egress charge to every read, and every training epoch pays it again.
- Serving models from the cloud. If your inference output goes back over the public internet to users, the bytes-out are billed at the highest tier. Throughput-heavy use cases (image and video generation) hit this hardest. The sketch after this list puts rough numbers on both patterns.
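A minimal egress estimate to make the table concrete. The per-GB prices are assumptions in the ballpark of typical list pricing; swap in the rates from your provider's transfer page.

```python
# Back-of-the-envelope egress estimate. Per-GB prices are placeholder
# assumptions, not any provider's published rates.
CROSS_REGION_PER_GB = 0.02    # assumed cross-region transfer, $/GB
INTERNET_PER_GB     = 0.09    # assumed internet egress, $/GB

def egress_cost(dataset_gb, epochs, same_region, artifacts_out_gb):
    # Reading a remote dataset pays cross-region egress on every epoch;
    # a same-region dataset pays nothing on this line.
    dataset_reads = 0 if same_region else dataset_gb * epochs * CROSS_REGION_PER_GB
    # Anything pulled out to the public internet (checkpoints, logs,
    # evaluation data) is billed at the highest tier.
    pulls = artifacts_out_gb * INTERNET_PER_GB
    return dataset_reads + pulls

# 2 TB dataset, 3 epochs, dataset hosted in another region, 400 GB pulled out:
print(f"${egress_cost(2048, 3, same_region=False, artifacts_out_gb=400):,.0f}")
# ~ $159: about $123 of avoidable cross-region reads plus $36 of internet egress.
```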
4. Networking add-ons
Distributed training across more than one machine usually depends on a fast cross-node fabric. The GPU rate does not include the networking; you pay for it explicitly:
- InfiniBand or RoCE on cluster nodes is sometimes included in the listed price for high-end SKUs, but on many providers it shows up as a per-hour or per-port surcharge.
- VPC peering, transit gateways, NAT gateways, and PrivateLink endpoints are billed per hour and per GB. NAT gateways in particular are a frequent source of "where did $400 go?" tickets; the sketch after this list shows why.
- Load balancers and API gateways add their own per-hour and per-million-request fees if your inference endpoint sits behind one.
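A minimal sketch of how a NAT gateway bill assembles itself, assuming generic hourly and per-GB processing rates (both placeholders, not any provider's published price):

```python
# A NAT gateway bills per hour *and* per GB processed, so leaving one up
# for a month while instances pull data through it compounds quickly.
NAT_HOURLY = 0.045   # assumed $/hour per gateway
NAT_PER_GB = 0.045   # assumed $/GB processed

def nat_cost(hours, gb_processed):
    return hours * NAT_HOURLY + gb_processed * NAT_PER_GB

# One gateway up for 30 days while instances pull ~6 TB of data through it:
print(f"${nat_cost(720, 6144):,.0f}")   # ~ $309, before any actual egress fees
```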
5. Idle instances and "forgot to terminate"
The single biggest hidden cost is paying for GPUs that are not doing useful work. Common shapes:
- A research instance left running over a weekend.
- A development environment kept warm because the start-up scripts take 20 minutes to install dependencies.
- An inference autoscaler with a minimum of 1 GPU during quiet hours, paying full price to serve nothing.
- A failed training job whose teardown script crashed before shutting the instance down.
Three habits that pay back quickly: budget alarms on every project, an idle-detector job that powers down instances with sustained low GPU utilization (a minimal sketch follows), and a "minimum replicas = 0" policy on any inference deployment that can tolerate cold starts.
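Here is a minimal sketch of the idle detector, written as a long-running watchdog rather than a cron job. The thresholds and the shutdown call are assumptions; in practice you would stop the instance through the provider's API so billing actually pauses.

```python
#!/usr/bin/env python3
"""Idle-watchdog sketch: power the box off after sustained low GPU utilization.
Thresholds and the shutdown mechanism are assumptions; adapt to your provider."""
import subprocess
import time

THRESHOLD_PCT = 5        # "idle" means average utilization below this
GRACE_MINUTES = 60       # how long it must stay idle before acting
CHECK_EVERY_S = 60

def gpu_utilization() -> float:
    """Average GPU utilization across all devices, via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    values = [float(line) for line in out.splitlines() if line.strip()]
    return sum(values) / len(values)

idle_minutes = 0.0
while True:
    if gpu_utilization() < THRESHOLD_PCT:
        idle_minutes += CHECK_EVERY_S / 60
    else:
        idle_minutes = 0.0
    if idle_minutes >= GRACE_MINUTES:
        # Stopping via the provider's API is usually better than an OS
        # shutdown, since some providers keep billing a merely halted instance.
        subprocess.run(["sudo", "shutdown", "-h", "now"])
        break
    time.sleep(CHECK_EVERY_S)
```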
6. The CPU and RAM that come with the GPU
Hyperscaler GPU instances bundle GPUs with a fixed CPU/RAM ratio. You cannot rent the GPU and skip the rest. For an 8x H100 box you will get hundreds of vCPUs and a terabyte or more of RAM, included in the rate but consuming budget you might prefer to spend elsewhere. The lesson is not to fight the bundle; it is to size correctly. Asking for "just one GPU" on a hyperscaler is more expensive than the same GPU on a marketplace where you can rent the SKU à la carte.
7. Support, premium-tier, and license fees
Enterprise support tiers are priced as a percentage of monthly cloud spend, often 3–10%. For a regulated or production-critical workload that tier is non-negotiable; for a research project it is dead weight. Some specialized SKUs (Windows GPUs, certain ISV-licensed images) also carry a per-hour license premium on top of the base rate.
Worked example: budgeting a 30-day fine-tune
Take a planned 30-day fine-tune on a single H100 80GB at $2.49/hr. The temptation is to budget 720 hours × $2.49 ≈ $1,793 and stop. Now stack the line items most teams actually incur:
| Line | Estimate | Why |
|---|---|---|
| GPU instance | $1,793 | 720 hours at on-demand rate. |
| Block storage (2 TB, 30 days) | $200–300 | Dataset, checkpoints, container images. |
| Snapshots / images | $10–40 | One image and a handful of checkpoint snapshots. |
| Egress (download model artifacts, evaluation data, logs) | $30–150 | Depends heavily on size of artifacts you pull out. |
| NAT gateway / load balancer (if used) | $50–100 | Per-hour plus per-GB processing fees. |
| Idle hours (estimate 10%) | ~$180 | Time the instance was up but the GPU was at <5% utilization. |
Realized total: typically $2,250–$2,550. The "hidden" portion is 25–40% of the GPU rate, and almost all of it is avoidable with discipline. The single biggest lever is shutting the instance down when it is not training.
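The same stacking as a few lines of arithmetic, using the mid-range figures from the table above:

```python
# Reproducing the worked example: stack the line items instead of stopping
# at "hours x rate". Fixed dollar figures are mid-range table estimates.
GPU_HOURLY = 2.49
HOURS = 30 * 24

line_items = {
    "gpu_instance":     HOURS * GPU_HOURLY,           # ~$1,793
    "block_storage":    250,                           # 2 TB for 30 days
    "snapshots_images": 25,
    "egress":           90,
    "nat_or_lb":        75,
    "idle_hours":       0.10 * HOURS * GPU_HOURLY,     # ~10% waste
}

total = sum(line_items.values())
overhead = total / line_items["gpu_instance"] - 1
print(f"GPU-only budget: ${line_items['gpu_instance']:,.0f}")
print(f"Realized total:  ${total:,.0f}  (+{overhead:.0%} over the GPU rate)")
# -> roughly $2,410, about 35% over the naive estimate.
```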
How to plan for this
- When comparing two providers, multiply the GPU rate by 1.25 as a first-pass overhead estimate. The real number lives between 1.05 and 1.5 depending on workload shape.
- Keep dataset and GPU in the same region. Always.
- Tag every resource by project so you can see in the bill which workload generated which storage, snapshot, and egress.
- Set a hard budget alarm at 70% of the planned spend. If you are at 70% with a third of the calendar left, something is wrong before the bill arrives, not after.
- Cross-check the workload cost in the cost calculator, then add the overhead estimate above.
Related reading
- On-demand vs reserved vs spot — choosing the discount structure that fits the workload shape.
- Cheapest A100 rentals — the headline rate is one input; the lines above are the others.
- AWS vs Lambda Labs — providers package the surrounding services very differently, which changes the realized cost.