What is Kappa Architecture? - ixdb

Kappa Architecture – Where Every Thing Is A Stream (pathirage.org)

Kappa Architecture is a software architecture pattern. Rather than a mutable store such as a relational (SQL) database or a wide-column store like Cassandra, the canonical data store in a Kappa Architecture system is an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving.
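The flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the API of any real log store: `AppendOnlyLog`, `apply_event`, and the `email_changed` event shape are hypothetical names chosen for the example. The log keeps every historical record, while the serving view holds only a derived answer (each user's latest email).

```python
from typing import Any, Iterator


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for the canonical log store.

    The only write operation is append; records are never updated or deleted.
    """

    def __init__(self) -> None:
        self._records: list = []

    def append(self, record: Any) -> int:
        """Append a record and return its offset (position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> Iterator[tuple]:
        """Yield (offset, record) pairs starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


def apply_event(event: dict, view: dict) -> None:
    """Streaming step (hypothetical): fold one event into a user -> email view."""
    if event["type"] == "email_changed":
        view[event["user"]] = event["email"]


log = AppendOnlyLog()
log.append({"type": "email_changed", "user": "alice", "email": "alice@old.example"})
log.append({"type": "email_changed", "user": "bob", "email": "bob@example.com"})
log.append({"type": "email_changed", "user": "alice", "email": "alice@new.example"})

# Stream every record from the log into an auxiliary store built for lookups.
serving_view: dict = {}
for _, event in log.read_from(0):
    apply_event(event, serving_view)

print(serving_view["alice"])  # alice@new.example
```

The superseded address `alice@old.example` is still in the log; only the derived view discards it.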

Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed. In place of batch processing, historical data is reprocessed by replaying the log through the same streaming system, starting from the earliest retained record and running as fast as the system allows.
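A minimal sketch of what replaying the log means, assuming a plain Python list as the log and a hypothetical `StreamJob` class that tracks its own read offset. A job started at offset 0 first catches up on history and then keeps processing new appends with the same handler, so there is no separate batch code path. In a real deployment this typically corresponds to starting a consumer at the earliest retained offset of the log; the details depend on the log store.

```python
class StreamJob:
    """Hypothetical streaming job that tracks its read position in the log.

    The same `poll` loop handles both replayed history and newly arriving
    events.
    """

    def __init__(self, log: list, handler, start_offset: int = 0):
        self.log = log              # in-memory stand-in for a log store
        self.handler = handler      # per-event processing function
        self.offset = start_offset  # next offset to read
        self.output: dict = {}      # derived view written by the handler

    def poll(self) -> int:
        """Process every record appended since the last poll; return how many."""
        processed = 0
        while self.offset < len(self.log):
            self.handler(self.log[self.offset], self.output)
            self.offset += 1
            processed += 1
        return processed


def count_page_views(event: dict, out: dict) -> None:
    """Illustrative handler: count views per page."""
    out[event["page"]] = out.get(event["page"], 0) + 1


log = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]

job = StreamJob(log, count_page_views)  # start at offset 0: replay history
job.poll()                              # catches up on the 3 historical events
log.append({"page": "/about"})          # a new live event arrives
job.poll()                              # the same code processes it
print(job.output)  # {'/home': 2, '/about': 2}
```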

But why?

Kappa Architecture greatly simplifies database migrations and reorganizations: just delete your serving-layer database and populate a new copy from the canonical store! Since there is no batch processing layer, only one set of code needs to be maintained, rather than separate batch and streaming implementations of the same logic.
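A migration of this kind might look like the following sketch. It assumes a hypothetical in-memory `serving` dict holding versioned copies of a view and an `active` pointer that readers consult; `build_view_v1` and `build_view_v2` are illustrative names. The new view, with a different schema, is built from the log next to the old one, readers are switched over, and the old copy is dropped.

```python
from collections import defaultdict

# Canonical log of (hypothetical) email-change events.
log = [
    {"user": "alice", "email": "alice@a.example"},
    {"user": "bob", "email": "bob@b.example"},
    {"user": "carol", "email": "carol@a.example"},
]


def build_view_v1(events):
    """Old serving schema: user -> latest email."""
    return {e["user"]: e["email"] for e in events}


def build_view_v2(events):
    """New serving schema: email domain -> sorted list of users."""
    latest = {}
    for e in events:
        latest[e["user"]] = e["email"]  # later events overwrite earlier ones
    by_domain = defaultdict(list)
    for user, email in sorted(latest.items()):
        by_domain[email.split("@", 1)[1]].append(user)
    return dict(by_domain)


serving = {"v1": build_view_v1(log)}
active = "v1"  # hypothetical pointer that readers use to pick a view

# Migration: build the new view from the canonical log alongside the old one,
# switch readers over, then drop the old copy.
serving["v2"] = build_view_v2(log)
active = "v2"
del serving["v1"]

print(serving[active])  # {'a.example': ['alice', 'carol'], 'b.example': ['bob']}
```

Dropping the old view is safe because it is derived data: running `build_view_v1(log)` again would regenerate it.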

Says who?

The idea of Kappa Architecture was first described by Jay Kreps from LinkedIn in the article “Questioning the Lambda Architecture”. Then came Martin Kleppmann’s talk “Turning the database inside out with Apache Samza” at Strange Loop 2014, which inspired this website.


HOW DO I MAKE MY OWN?

RESOURCES

Questioning the Lambda Architecture
Apache Kafka and the Next 700 Stream Processing Systems
The Log: What every software engineer should know about real-time data’s unifying abstraction (article by Jay Kreps)
Discovering Kappa Architecture the hard way (presentation)
Kappa Architecture: Our Experience (Linux Foundation presentation)
Liquid: Unifying Nearline and Offline Big Data Integration (paper; a summary is also available)
Functional Programming with Kafka Streams and Scala (article by Joan Goyeau)

Tools

LOG DATA STORES

An append-only immutable log store is the canonical store in a Kappa Architecture (or Lambda Architecture) system. Some log databases:
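To make the properties of such a store concrete, here is a minimal sketch of a durable append-only log that stores one JSON record per line in a file. `FileLog` is a hypothetical name, and the design assumes a single writer. It locates offsets by scanning the file, which is acceptable for illustration; production log stores typically avoid that cost with segment files and offset indexes.

```python
import json
import os
import tempfile


class FileLog:
    """Hypothetical durable append-only log: one JSON record per line.

    Offsets are record indexes (0, 1, 2, ...). Existing lines are never
    rewritten; writes only ever open the file in append mode.
    Assumes a single writer process.
    """

    def __init__(self, path: str):
        self.path = path

    def size(self) -> int:
        """Number of records in the log (linear scan, for simplicity)."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            return sum(1 for _ in f)

    def append(self, record: dict) -> int:
        """Durably append a record and return its offset."""
        offset = self.size()
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the append durable before returning
        return offset

    def read_from(self, offset: int):
        """Yield (offset, record) for every record at or after `offset`."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i >= offset:
                    yield i, json.loads(line)


with tempfile.TemporaryDirectory() as d:
    log = FileLog(os.path.join(d, "events.log"))
    log.append({"user": "alice", "action": "signup"})
    log.append({"user": "bob", "action": "signup"})
    log.append({"user": "alice", "action": "login"})
    tail = list(log.read_from(1))  # a consumer resuming at offset 1

print(tail)
```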

STREAMING COMPUTATION SYSTEMS

In Kappa Architecture, data is fed from the log store into a streaming computation system. Some distributed streaming systems:
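A typical stateful streaming computation is a windowed aggregation. The sketch below counts events per key in fixed, non-overlapping (tumbling) time windows. `tumbling_window_counts` is a hypothetical helper, and it assumes events arrive in timestamp order; real streaming systems also have to handle late and out-of-order events, for example with watermarks.

```python
from collections import Counter
from typing import Iterable, Iterator


def tumbling_window_counts(events: Iterable[dict], window_s: int) -> Iterator[tuple]:
    """Count events per key in tumbling windows of `window_s` seconds.

    Assumes events are ordered by their "ts" field (no late data).
    Emits (window_start, counts) each time a window closes.
    """
    current_start = None
    counts: Counter = Counter()
    for event in events:
        start = event["ts"] - event["ts"] % window_s  # window this event belongs to
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # previous window is complete
            counts = Counter()
        current_start = start
        counts[event["key"]] += 1
    if current_start is not None:
        # Flush the last open window (only meaningful for a finite replay).
        yield current_start, dict(counts)


events = [
    {"ts": 0, "key": "a"},
    {"ts": 15, "key": "b"},
    {"ts": 59, "key": "a"},
    {"ts": 60, "key": "a"},
    {"ts": 130, "key": "b"},
]
result = list(tumbling_window_counts(events, window_s=60))
print(result)  # [(0, {'a': 2, 'b': 1}), (60, {'a': 1}), (120, {'b': 1})]
```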

SERVING LAYER STORES

The purpose of the serving layer is to provide optimized responses to queries. These databases aren’t used as canonical stores: at any point, you can wipe them and regenerate them from the canonical data store. Almost any database, in-memory or persistent, might be used in the serving layer, including special-purpose databases such as those for full-text search.
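As an illustration of a special-purpose serving store, the sketch below derives a toy full-text index (term to document ids) from the log. `build_search_index` and the `doc_id`/`text` event fields are assumptions for this example; a real deployment would feed a dedicated search engine instead. Because the index is derived entirely from the log, it can be thrown away and rebuilt at any time.

```python
import re
from collections import defaultdict


def build_search_index(log: list) -> dict:
    """Derive a toy inverted index (term -> set of doc ids) from the log.

    Later events for the same doc id replace earlier versions, so the
    index reflects only the latest text of each document.
    """
    latest = {}
    for event in log:
        latest[event["doc_id"]] = event["text"]
    index = defaultdict(set)
    for doc_id, text in latest.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


log = [
    {"doc_id": 1, "text": "Kappa architecture uses a log"},
    {"doc_id": 2, "text": "Lambda architecture adds a batch layer"},
    {"doc_id": 1, "text": "Kappa architecture replays the log"},
]

index = build_search_index(log)
print(sorted(index["architecture"]))  # [1, 2]
print(index.get("uses"))              # None: doc 1's old text was superseded

# The index is disposable: regenerating it from the log yields the same view.
rebuilt = build_search_index(log)
print(rebuilt == index)  # True
```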