Cloud disks are expensive. Really expensive. Most engineers intuitively understand this, but the magnitudes are worth considering.
Imagine you're running a typical triply replicated "big data" system like Cassandra or Kafka in AWS. You could use VMs with local SSDs, but since you're the person who will inevitably get paged in the middle of the night when something goes wrong, you'd rather use more durable EBS volumes instead. If that feels like overkill to you, consider for a moment how easy it is to fat-finger a single Kubernetes configuration and accidentally terminate all of your VMs, losing the entire contents of their local disks along with them.
Of course, you’re a practical person, so you do some research and select EBS GP3 volumes since they're the newest generation and the most cost-effective at $0.08/GiB per month.
$0.08/GiB. That's pretty expensive, but EBS volumes provide a lot of value over a raw disk: they can be detached from one VM and quickly re-attached to another. Also, perhaps more importantly, they can be resized. These two features dramatically simplify cluster operations, and can save you from a lot of costly mistakes, so they're probably worth paying for.
$0.08/GiB. Pricey, but again probably worth it to reduce time spent on operations and outages, and for peace of mind. But actually… it's not really $0.08/GiB because your data system is replicated three times. So it's more like $0.24/GiB. Also, while EBS volumes can be resized, it's not the sort of thing you can easily automate or do in a pinch, so you'll need to run your system with plenty of headroom. If you want to run the system with 50% of storage free for disasters and changes in workload volume, then every GiB of data needs two GiB of provisioned disk, and you have to double the cost again to $0.48/GiB.
Suddenly, we’re talking about a lot of money. Almost half a dollar to store a single GiB of data for a month! Storing 128 GiB of data for a year will cost $737 at this rate. Everyone knows the cloud is expensive, but a brand new iPhone 15 Pro with 128 GiB of storage only costs $1000! Something's not right here.
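If you'd like to check that arithmetic yourself, here is a minimal sketch in Python. The prices, replication factor, and free-space target are the ones quoted above; the function and variable names are ours and purely illustrative.

```python
# Figures quoted in the post. All names here are illustrative.
EBS_GP3_PER_GIB_MONTH = 0.08   # list price, dollars per GiB per month
REPLICATION_FACTOR = 3         # triply replicated system
FREE_SPACE_FRACTION = 0.50     # share of provisioned disk kept empty


def effective_cost_per_gib_month(list_price, replication, free_fraction):
    """Dollars per month for each GiB of data you actually store.

    Every GiB of data is written `replication` times, and each copy
    needs 1 / (1 - free_fraction) GiB of provisioned disk.
    """
    return list_price * replication / (1.0 - free_fraction)


per_gib = effective_cost_per_gib_month(
    EBS_GP3_PER_GIB_MONTH, REPLICATION_FACTOR, FREE_SPACE_FRACTION
)
annual_128 = per_gib * 128 * 12  # 128 GiB of data stored for 12 months

print(f"${per_gib:.2f}/GiB per month, ${annual_128:.0f}/year for 128 GiB")
# prints: $0.48/GiB per month, $737/year for 128 GiB
```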
The problem is that tools like EBS, while amazingly powerful, were designed to help lift and shift datacenter software that expects a very stable and static environment into the cloud. That's why they’re so commonly used for pre-cloud technology like Kafka, but that's also what makes them so expensive.
There is another way though, and that's to re-architect the software from the ground up around cloud primitives like object storage. In AWS, S3 only costs $0.02/GiB per month of storage, and that’s post-replication and with (effectively) infinite headroom. That's 24x cheaper than EBS, and you don't have to figure out how to resize volumes on the fly or spend weeks performing capacity planning and load testing to determine exactly how many IOPS will be required ahead of time.
You can play with the numbers a bit here: local SSDs instead of EBS, or 2x replication instead of 3x if you really like to live on the edge. But the unit economics in cloud environments are fundamental: object storage is orders of magnitude more cost effective than local disks, so any system storing or processing large volumes of data must eventually shed its local disks entirely.
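To play with the numbers yourself, here is a rough sketch that recomputes the EBS-to-S3 storage cost ratio for a few replication factors and free-space targets. It uses only the two list prices quoted in this post and looks at stored bytes alone; request and transfer charges are not modeled. The function name and the alternative parameter values (2x replication, 25% free space) are our own assumptions for illustration.

```python
# List prices quoted in the post, in dollars per GiB per month.
EBS_GP3_PER_GIB_MONTH = 0.08
S3_PER_GIB_MONTH = 0.02


def ebs_to_s3_ratio(replication, free_fraction,
                    ebs_price=EBS_GP3_PER_GIB_MONTH,
                    s3_price=S3_PER_GIB_MONTH):
    """How many times more a GiB of data costs on EBS than on S3.

    Storage cost only: the EBS side is scaled up by replication and
    free-space headroom, while the S3 price is taken as-is because it
    is already post-replication.
    """
    effective_ebs = ebs_price * replication / (1.0 - free_fraction)
    return effective_ebs / s3_price


# Hypothetical scenarios: the post's 3x / 50% case plus leaner variants.
for replication in (3, 2):
    for free_fraction in (0.50, 0.25):
        ratio = ebs_to_s3_ratio(replication, free_fraction)
        print(f"{replication}x replication, {free_fraction:.0%} free: "
              f"{ratio:.1f}x the cost of S3")
```

Even the leanest of these scenarios, 2x replication with only 25% of the disk kept free, still comes out at roughly ten times the S3 storage price.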
If you’ve been nodding along the whole time, that last sentence may have just made you pause. We know what you’re thinking. “Get rid of the local disks entirely? Surely it’s enough to just move the older data to object storage, and keep the hot data on local disks.”
Unfortunately, the concept of “tiered storage” is a siren song: it sounds great on paper, but ultimately doesn’t work out in practice. Often, it just ends up making the operators' lives harder instead of easier, and it’s an architectural dead-end as well.
Zero disks would be better, much better, but we’ll share more on the deficiencies of tiered storage and contrast it with modern zero-disk architectures in a subsequent post!