WarpStream S3 Express One Zone Benchmark and Total Cost of Ownership

HN Disclaimer: WarpStream sells a drop-in replacement for Apache Kafka built directly on-top of object storage.

WarpStream was the first data streaming system ever built directly on top of object storage with zero local disks. In our original public benchmarks, we wrote in great detail about how WarpStream’s stateless architecture enables massive cost reductions compared to Apache Kafka at the cost of increased latency.

When S3 Express One Zone (S3EOZ) was first released, we were the first data streaming system to announce support for it. S3EOZ reduced WarpStream’s latency significantly, but also increased its cost due to S3EOZ’s pricing structure. S3EOZ was a great addition to WarpStream because it enabled customers to choose between latency and costs with a single architecture, and even to mix and match high and low latency workloads within a single cluster using Agent Groups. Still, it was expensive compared to S3 standard, and we rarely recommended it to customers unless they had strict latency requirements.

A few weeks ago AWS announced that they were dramatically reducing the cost of S3EOZ by up to 85%. For most realistic use cases, S3EOZ is still more expensive than S3 standard, but with the new price reductions the delta between the two is much smaller than it used to be. So we felt like now was a great time to revisit our public benchmarks and total cost of ownership analysis with S3EOZ in mind.

Results

Our previous public benchmarks blog post was extremely detailed, so we won’t repeat all of that here. However, we’re happy to report that with S3EOZ, WarpStream can land data durably with significantly lower latency than any other zero-disk data streaming system on the market.

In our tests, WarpStream achieved a P99 Produce latency of 169ms and a median Produce latency of just 105ms:

This is roughly 3x lower than what we’re able to accomplish using S3 standard.

TCO

In addition, WarpStream can do this extremely cost-effectively. In our benchmark, we used 5 m7g.xl instances to write 268 MiB/s of traffic, which consumed roughly 50% of the Agent CPU (we allocated 3 vCPUs to each Agent).

VM cost: $0.108/hr (Linux reserved) * 5 (Agents) * 24 * 30 == $338/month in VM fees.

The workload averaged just under 150 PUTs/s and just under 800 GETs/s, so our object storage API costs are as follows:

PUTs: ($0.00113/1000) * 150 (PUT/s) * 2 (replication to two different S3EOZ buckets in different AZs) * 60 * 60 * 24 * 30 == $1,034/month.
GETs: ($0.00003/1000) * 800 (GET/s) * 60 * 60 * 24 * 30 == $62/month.

Storage in S3EOZ is significantly more expensive than in S3 standard, but that doesn’t impact WarpStream’s total cost of ownership because WarpStream lands data into S3EOZ, but within seconds it compacts that data into S3 standard, so the effective storage rate remains the same as it would be without using S3EOZ: ~$0.02/GiB-month. Fortunately, this is one of the dimensions in which the reduced latency doesn’t cost us anything extra at all!

As a result, WarpStream’s S3 storage costs for this workload are ~$130/month.

The final piece of the puzzle is bandwidth. Unlike S3 standard, S3EOZ bills for data uploads ($0.0032/GiB) and retrievals ($0.0006/GiB). Understanding this portion of the cost structure requires understanding WarpStream’s architecture in more depth, but the TLDR; is that we have to pay the per-GiB upload fee twice (once for each S3EOZ bucket we replicate the data to at ingestion time), and then we have to pay the per-GiB retrieval fee four times: once for each AZ that the Agents are running in (to serve live consumers) and once for the compaction from S3EOZ to S3 Standard.

Our workload has a compression ratio of 4x, so our upload fees are:
(0.268GiB/4) * 60 * 60 * 24 * 30 * 2 (replication) * $0.0032 = $1,111/month

Similarly, our retrieval fees are:(0.268GiB/4) * 60 * 60 * 24 * 30 * 4 (live consumers + compaction) * $0.0006 = $416/month

If we add that all up, we get:$338 (vms) + $1,034 (PUTs) + $62(GETs) + $1,111 (uploads) + $416 (retrievals) == $2,961/month in infrastructure costs.

An equivalent 3 AZ Open Source Kafka cluster would cost over $20,252/month, with the inter-zone networking fees alone costing almost five times as much as the total infrastructure costs for WarpStream ($14,765 vs. $2,961).

Even if we compare against the most highly optimized Kafka cluster possible, a single zone cluster with fetch-from-follower enabled, the low-latency WarpStream cluster with S3EOZ is still cheaper at an infrastructure level ($8,223/month for Apache Kafka vs. $2,961/month for WarpStream):

The WarpStream cluster will have slightly higher latency than the Apache Kafka cluster, but not by much, and the WarpStream cluster can run in three availability zones for no additional cost, making it significantly more reliable and durable.

Of course, WarpStream isn’t free. We have to factor in WarpStream’s control plane fees to get the true total cost of ownership running in low-latency mode:

That’s 63% cheaper than the equivalent self-hosted open-source Apache Kafka cluster, and roughly the same cost as a self-hosted Apache Kafka cluster running in a single availability zone, but with significantly better durability, availability, and most importantly, operability. The WarpStream cluster auto-scales, will never run out of disk space or require partition rebalancing, and most importantly, ensures you get to sleep through the night.

Of course, if that cost is still too high, you can always run WarpStream using S3 standard and reduce the WarpStream cost even further. If you want to learn more, we’ve encoded all of these calculations into our public pricing calculator: https://www.warpstream.com/pricing. Just click the “Latency Breakdown” toggle to enable S3EOZ and compare WarpStream’s total cost of ownership to a variety of different alternatives.

For more details about running WarpStream in low-latency mode, check out our docs.

Appendix

Agent Configuration

m7g.xl instances with WarpStream Agent container assigned 3vCPUs and 12 GiB of RAM.
All default settings except WARPSTREAM_BATCH_TIMEOUT are configured to 50ms instead of the default of 250ms (which increases costs, but reduces latency).
https://docs.warpstream.com/warpstream/byoc/advanced-agent-deployment-options/low-latency-clusters

Benchmark Configuration

OpenMessaging workload configuration:

name: benchmark

topics: 1
partitionsPerTopic: 288

messageSize: 1024
useRandomizedPayloads: true
randomBytesRatio: 0.25
randomizedPayloadPoolSize: 1000

subscriptionsPerTopic: 1
consumerPerSubscription: 64
producersPerTopic: 64
producerRate: 270000
consumerBacklogSizeGB: 0
testDurationMinutes: 5760

‍

‍OpenMessaging driver configuration:

name: Kafka
driverClass: io.openmessaging.benchmark.driver.kafka.KafkaBenchmarkDriver
replicationFactor: 3
topicConfig: |
 min.insync.replicas=2
commonConfig: |
bootstrap.servers=$BOOTSTRAP_URL:9092

producerConfig: |
 linger.ms=25
 batch.size=100000
 buffer.memory=128000000
 max.request.size=64000000
 compression.type=lz4
 metadata.max.age.ms=60000
 metadata.recovery.strategy=rebootstrap

consumerConfig: |
 auto.offset.reset=earliest
 enable.auto.commit=true
 auto.commit.interval.ms=20000
 max.partition.fetch.bytes=100485760
 fetch.max.bytes=100485760

Create a free WarpStream account and start streaming with $400 in free credits.

Get Started

Author