
WarpStream Blog

Kafka Transactions Explained (Twice!)

Jan 13, 2025
Manu Cupcic
In this blog post, we'll explain how transactions work in Kafka by comparing and contrasting their implementation in two different systems: the official Apache Kafka project and WarpStream.

Getting Rid of (Kafka) Noisy Neighbors Without Having to Buy a Mansion

Dec 3, 2024
Aratz Manterola Lasa
In this post, we’ll look at what noisy neighbors are, the current ways to handle them (cluster quotas and mirroring clusters), and how WarpStream’s solution compares in terms of elasticity, operational simplicity, and cost efficiency.

Introducing WarpStream BYOC Schema Registry

Nov 25, 2024
Brian Shih
WarpStream BYOC reimplements the Kafka protocol with a stateless, zero-disk cloud-native architecture, replacing Kafka brokers with WarpStream Agents to simplify operations. But data streaming extends beyond Kafka clusters alone.

The Case for Shared Storage

Nov 19, 2024
Richard Artoul
In this post, I’ll start off with a brief overview of “shared nothing” vs. “shared storage” architectures in general. This discussion will be a bit abstract and high-level, but the goal is to share with you some of the guiding philosophy that ultimately led to WarpStream’s architecture.

Kafka Replication Without the (Offset) Gaps

Nov 13, 2024
Arjun Nair
Orbit is a tool that creates identical, inexpensive, scalable, and secure continuous replicas of Kafka clusters. It is built into WarpStream and works without any user intervention to create WarpStream replicas of any Apache Kafka-compatible source cluster.

Announcing Schema Validation with AWS Glue Schema Registry

Sep 25, 2024
Brian Shih
WarpStream now supports AWS Glue Schema Registries, in addition to the Kafka-compatible schema registries. The WarpStream Agent can use schemas stored in the user’s AWS Glue Schema Registries to validate records.

Dealing with rejection (in distributed systems)

Aug 13, 2024
Richard Artoul
Backpressure is a really simple concept. When the system is nearing overload, it should start “saying no” by slowing down or rejecting requests. Of course, the big question is: How do we know when we should reject a request?

Announcing WarpStream Schema Validation

Jul 18, 2024
Brian Shih
WarpStream now has the capability to connect to external schema registries, and verify that records actually conform to the provided schema.

The Kafka Metric You're Not Using: Stop Counting Messages, Start Measuring Time

Jul 16, 2024
Aratz Manterola Lasa
Traditional offset-based monitoring can be misleading due to varying message sizes and consumption rates. To address this, you can introduce a time-based metric for a more accurate assessment of consumer group lag.

Multiple Regions, Single Pane of Glass

Jun 20, 2024
Emmanuel Pot
How we built support for running WarpStream's control plane and Metadata Store in multiple regions, while still presenting our platform as a single pane of glass.

Secure by default: How WarpStream’s BYOC deployment model secures the most sensitive workloads

Jun 10, 2024
Caleb Grillo
WarpStream's Zero Disk Architecture enables a BYOC deployment model that is secure by default and does not require any external access to the customer's environment.

Announcing Bento, the open source fork of the project formerly known as Benthos

May 31, 2024
Richard Artoul
Announcing Bento, the open source fork of the project formerly known as Benthos.

Zero Disks is Better (for Kafka)

May 23, 2024
Richard Artoul
A follow-up to "Tiered Storage Won't Fix Kafka", this post covers the many advantages that WarpStream's Zero Disk Architecture provides over Apache Kafka.

Tiered Storage Won’t Fix Kafka

Apr 28, 2024
Richard Artoul
Tiered storage is a hot topic in the world of data streaming systems, and for good reason. Cloud disks are (really) expensive, object storage is cheap, and in most cases, live consumers are just reading the most recently written data. Paying for expensive cloud disks to store historical data isn’t cost-effective, so historical data should be moved (tiered) to object storage. On paper, it makes all the sense in the world.

Cloud Disks are (Really!) Expensive

Apr 20, 2024
Richard Artoul
Cloud disks are expensive. Really expensive. Most engineers intuitively understand this, but the magnitudes are worth considering.

The Original Sin of Cloud Infrastructure

Mar 14, 2024
Richard Artoul
Many of today's most highly adopted open source “big data” infrastructure projects – like Cassandra, Kafka, Hadoop, etc. – follow a common story. A large company, startup or otherwise, faces a unique, high-scale infrastructure challenge that's poorly supported by existing tools. They create an internal solution for their specific needs, and then later (kindly) open source it for the greater community to use. Now, even smaller startups can benefit from the work and expertise of these seasoned engineering teams. Great, right?

Deterministic Simulation Testing for Our Entire SaaS

Mar 12, 2024
Richard Artoul
How we leverage Antithesis to deterministically simulate our entire SaaS platform and verify its correctness, all the way from signup to running entire Kafka workloads.

Kafka as a KV Store: deduplicating millions of keys with just 128 MiB of RAM

Mar 4, 2024
Manu Cupcic
A huge part of building a drop-in replacement for Apache Kafka® was implementing support for compacted topics. The primary difference between a “regular” topic in Kafka and a “compacted” topic is that Kafka will asynchronously delete records from compacted topics that are not the latest record for a specific key within a given partition.

Anatomy of a serverless usage based billing system

Feb 8, 2024
Richard Artoul
Serverless products and usage-based billing models go hand in hand, almost by definition. A product that is truly serverless effectively has to have usage-based pricing, otherwise it’s not really serverless!

S3 Express is All You Need

Nov 28, 2023
Richard Artoul
The future of modern data infrastructure is object storage.

Unlocking Idempotency with Retroactive Tombstones

Nov 18, 2023
Richard Artoul
How we separated data from metadata to build support for idempotent producers in our Apache Kafka protocol layer.

Minimizing S3 API Costs with Distributed mmap

Oct 9, 2023
Richard Artoul
We first introduced WarpStream in our blog post "Kafka is Dead, Long Live Kafka", but to summarize: WarpStream is a Kafka protocol compatible data streaming system built directly on top of object storage.

Hacking the Kafka PRoTocOL

Sep 18, 2023
Richard Artoul
How we built stateless load balancing into a protocol that was never designed for it.

Kafka is dead, long live Kafka

Jul 25, 2023
Richard Artoul
Chances are you probably had a strong reaction to the title of this post. In our experience, Kafka is one of the most polarizing technologies in the data space. Some people hate it, some people swear by it, but almost every technology company uses it.