Zero Ops Schema Migration: WarpStream Schema Linking

Mar 25, 2025
Brian Shih
HN Disclaimer: WarpStream sells a drop-in replacement for Apache Kafka built directly on top of object storage.

What is WarpStream Schema Linking?

We previously launched WarpStream Bring Your Own Cloud (BYOC) Schema Registry, a Confluent-compatible schema registry designed with a stateless, zero-disk, BYOC architecture. 

Today, we’re excited to announce WarpStream Schema Linking, a tool to continuously migrate any Confluent-compatible schema registry into a WarpStream BYOC Schema Registry. WarpStream now has a comprehensive Data Governance suite to handle schema needs, stretching from schema validation to schema registry and now migration and replication. 

In addition to migrating schemas, Schema Linking preserves schema IDs, subjects, compatibility rules, etc. This means that after a migration, the destination schema registry behaves identically to the source schema registry at the API level.

WarpStream Schema Linking works with any schema registry that supports Confluent’s Schema Registry API (such as Confluent, Redpanda, and Aiven’s schema registries). It is not tied to any specific schema registry implementation, so it works even if the source schema registry isn’t backed by internal Kafka topics.

WarpStream Schema Linking provides an easy migration path from your current schema registry to WarpStream. You can also use it to:

  • Create scalable, cheap read replicas for your schema registry.
  • Sync schemas between different regions/cloud providers to enable multi-region architecture.
  • Facilitate disaster recovery by having a standby schema registry replica in a different region.

Architecture

Like every WarpStream product, WarpStream’s Schema Linking was designed with WarpStream’s signature data plane / control plane split. During the migration, none of your schemas ever leave your cloud environment. The only data that goes to WarpStream’s control plane is metadata like subject names, schema IDs, and compatibility rules.

WarpStream Schema Linking is embedded natively into the WarpStream Schema Registry Agents, so all you have to do is point them at your existing schema registry cluster, and they’ll take care of the rest automatically.

Schema migration is orchestrated by a scheduler running in WarpStream’s control plane. During migration, the scheduler delegates jobs to the Agents running in your VPC to perform tasks such as fetching schemas, fetching metadata, and storing schemas in your object store.

Reconciliation

WarpStream Schema Linking is a declarative framework. You define a configuration file that describes how your Agents should connect to the source schema registry, and the scheduler takes care of the rest.

The scheduler syncs the source and destination schema registry using a process called reconciliation. Reconciliation is a technique used by many declarative systems, such as Terraform, Kubernetes, and React, to keep two systems in sync. It largely follows these four steps:

  • Computing the desired state. 
  • Computing the current state. 
  • Diffing between the desired state and the current state.
  • Applying changes to make the current state match the desired state.
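The four steps above can be sketched as a single generic reconciliation pass. This is a minimal illustration that models each state as a set of items; the function name reconcile and its callback parameters are hypothetical and are not part of any WarpStream API.

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


def reconcile(
    get_desired: Callable[[], Set[T]],
    get_current: Callable[[], Set[T]],
    apply: Callable[[T], None],
) -> int:
    """Run one reconciliation pass and return the number of changes applied."""
    desired = get_desired()       # 1. compute the desired state
    current = get_current()       # 2. compute the current state
    missing = desired - current   # 3. diff desired against current
    for item in missing:          # 4. apply changes to close the gap
        apply(item)
    return len(missing)


# Usage: the "destination" set converges on the "source" set.
source = {1, 2, 3}
destination = {1}
applied = reconcile(lambda: source, lambda: destination, destination.add)
```

Running the same pass a second time applies nothing, which is what makes this kind of loop safe to run continuously.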

What do the desired and current states look like for WarpStream Schema Linking? To answer that question, we need to look at how a schema registry is structured.

A schema registry is organized around subjects, which are scopes within which schemas evolve. Each subject has a monotonically increasing list of subject versions which point to registered schemas. Subject versions are immutable. You can delete a subject version, but you cannot modify the schema it points to[1]. Conceptually, a subject is kind of like a git branch and the subject versions are like git commits.

The subject versions of the source registry represent the desired state. During reconciliation, the scheduler submits jobs to the Agent to fetch subject versions from the source schema registry.
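As a rough illustration of what fetching subject versions involves, the Confluent Schema Registry REST API exposes GET /subjects and GET /subjects/{subject}/versions. The sketch below is not WarpStream’s implementation: the names fetch_subject_versions and http_get_json are hypothetical, and authentication, retries, and error handling are omitted.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url: str):
    """Fetch a URL and decode its JSON body (no auth; illustration only)."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_subject_versions(
    base_url: str,
    get_json: Callable[[str], object] = http_get_json,
) -> Dict[str, List[int]]:
    """Return {subject: [version, ...]} for a Confluent-compatible registry."""
    base = base_url.rstrip("/")
    subjects = get_json(f"{base}/subjects")  # JSON array of subject names
    return {
        # Subject names may contain "/" and other reserved characters,
        # so they are percent-encoded before being placed in the path.
        subject: sorted(
            get_json(f"{base}/subjects/{quote(subject, safe='')}/versions")
        )
        for subject in subjects
    }
```

Calling fetch_subject_versions with the base URL of a registry returns only subject names and version numbers, never schema bodies. The get_json parameter exists so the HTTP layer can be swapped out, for example in tests.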

Similarly, the subject versions of the destination schema registry represent the current state. During reconciliation, the scheduler fetches the destination schema registry’s subject versions from WarpStream’s metadata store.

Diffing is efficient and simple. The scheduler just has to compare the arrays of subject versions to determine the minimal set of schemas that need to be migrated. 
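A minimal sketch of that comparison, assuming each state is represented as a mapping from subject name to a list of version numbers. The function name diff_subject_versions and this representation are assumptions made for illustration, and the sketch only covers versions missing from the destination.

```python
from typing import Dict, List, Tuple

SubjectVersions = Dict[str, List[int]]


def diff_subject_versions(
    desired: SubjectVersions,
    current: SubjectVersions,
) -> List[Tuple[str, int]]:
    """Return the (subject, version) pairs present in desired but not in current."""
    missing: List[Tuple[str, int]] = []
    for subject in sorted(desired):
        have = set(current.get(subject, []))
        missing.extend(
            (subject, version)
            for version in sorted(desired[subject])
            if version not in have
        )
    return missing


source = {"orders-value": [1, 2, 3], "users-value": [1]}
destination = {"orders-value": [1, 2]}
to_migrate = diff_subject_versions(source, destination)
# to_migrate == [("orders-value", 3), ("users-value", 1)]
```

Because the inputs are only subject names and integers, the comparison never needs the schema bodies.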

Using subject versions to represent the desired and current state is the key to enabling the data plane / control plane split. It allows the scheduler to figure out which schemas to migrate without having access to the schemas themselves.

Finally, the scheduler submits jobs to the Agent to fetch and migrate the missing schemas. Note that this is a simplified description of WarpStream Schema Linking. In addition to migrating schemas, it also has to migrate metadata such as compatibility rules.

Figure: Diffing the current state and the desired state.

Observability

Existing schema migration tools like Confluent Schema Linking work by copying the internal Kafka topic (i.e., _schemas) used to store schemas. Users of these tools can track the migration process by looking at the topic offset of the copied topic.

Since WarpStream Schema Linking doesn’t work by copying an internal topic, it needs an alternative mechanism for users to track progress.

As discussed in the previous section, the scheduler computes the desired and current state during reconciliation. Statistics derived from these states are made available to you through WarpStream’s Console and through metrics emitted by your Agents, so you can track the progress of each sync.

Some of the stats include the number of source and destination subject versions, the number of newly migrated subject versions for each sync, etc.
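To make the relationship between these numbers concrete, here is a small sketch of how per-sync statistics could be modeled. The class SyncStats and its field names are hypothetical and do not correspond to WarpStream’s actual metric names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncStats:
    """Counts from one sync; the field names are illustrative only."""

    source_subject_versions: int
    destination_subject_versions: int
    newly_migrated_subject_versions: int

    @property
    def remaining(self) -> int:
        """Subject versions still missing from the destination."""
        gap = self.source_subject_versions - self.destination_subject_versions
        return max(gap, 0)

    @property
    def caught_up(self) -> bool:
        """True once the destination holds every source subject version."""
        return self.remaining == 0


stats = SyncStats(
    source_subject_versions=120,
    destination_subject_versions=100,
    newly_migrated_subject_versions=15,
)
# stats.remaining == 20, so this sync has not caught up yet.
```

A gap that shrinks across successive syncs plays the same role that topic offsets play when tracking a copied topic.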

Next Steps

To set up WarpStream Schema Linking, read the doc on how to get started. The easiest way to get started is to create an ephemeral schema registry cluster with the warpstream playground command. This way, you can experiment with migrating schemas into your playground schema registry.

Notes

[1] If you hard delete a subject and then register a new subject version with a different schema, the newly created subject version will point to a different schema than before. Check out the docs for limitations of WarpStream Schema Linking.

Author
Brian Shih
Software Engineer