What is Kappa Architecture?

Kappa Architecture – Where Every Thing Is A Stream (pathirage.org)

Kappa Architecture is a software architecture pattern. Rather than using a relational SQL database or a key-value store like Cassandra, the canonical data store in a Kappa Architecture system is an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving.

Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed. To replace batch processing, historical data is simply replayed through the streaming system at high speed.

But why?

Kappa Architecture revolutionizes database migrations and reorganizations: just delete your serving layer database and populate a new copy from the canonical store! Since there is no batch processing layer, only one set of code needs to be maintained.
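The idea of wiping a serving store and repopulating it from the log can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real log system: the `AppendOnlyLog` class, the `build_view` helper, and the `(user, amount)` event shape are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

Event = Tuple[str, int]  # hypothetical event shape: (user, amount)


class AppendOnlyLog:
    """Canonical store: events are appended and read, never modified."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read(self, from_offset: int = 0) -> Iterator[Event]:
        # Replay events in the order they were written.
        yield from self._events[from_offset:]


def build_view(log: AppendOnlyLog,
               fold: Callable[[Dict[str, int], Event], None]) -> Dict[str, int]:
    """Stream the whole log through `fold` to populate a fresh serving store."""
    view: Dict[str, int] = defaultdict(int)
    for event in log.read():
        fold(view, event)
    return dict(view)


def total_per_user(view: Dict[str, int], event: Event) -> None:
    user, amount = event
    view[user] += amount


def count_per_user(view: Dict[str, int], event: Event) -> None:
    user, _ = event
    view[user] += 1


log = AppendOnlyLog()
for e in [("ann", 5), ("bob", 3), ("ann", 7)]:
    log.append(e)

totals = build_view(log, total_per_user)  # first serving store
counts = build_view(log, count_per_user)  # "migration": new view, same log
print(totals, counts)
```

The second call stands in for a migration: the old view is simply discarded and a differently shaped one is derived from the same unchanged log, using the same streaming code path.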

Says who?

The idea of Kappa Architecture was first described in an article by Jay Kreps of LinkedIn. Then came the talk “Turning the database inside out with Apache Samza” by Martin Kleppmann at Strange Loop 2014, which inspired this web site.

TURNING THE DATABASE INSIDE OUT WITH APACHE SAMZA

HOW DO I MAKE MY OWN?

RESOURCES

Tools

LOG DATA STORES

An append-only immutable log store is the canonical store in a Kappa Architecture (or Lambda Architecture) system. Some log databases:

STREAMING COMPUTATION SYSTEMS

In Kappa Architecture, data is fed from the log store into a streaming computation system. Some distributed streaming systems:

SERVING LAYER STORES

The purpose of the serving layer is to provide optimized responses to queries. These databases aren’t used as canonical stores: at any point, you can wipe them and regenerate them from the canonical data store. Almost any database, in-memory or persistent, might be used in the serving layer. This also includes special-purpose databases, e.g. for full text search.

A python package that gives you easy access to the most valuable datasets of Germany

https://github.com/bundesAPI/deutschland

https://github.com/bundesAPI

On the federal API portal and here at GitHub.com/bundesAPI you will find documentation for the programming interfaces of the German federal government's administrative services and information portals. The documentation is generally provided in the OpenAPI 3 API documentation format and is therefore both human- and machine-readable.

At present, bund.dev already hosts documentation for more than 30 programming interfaces, and the number of interfaces that are available online and can be documented is expected to grow substantially by 2024 as a result of the “Act Amending the E-Government Act and Introducing the Act on the Use of Public Sector Data” (2nd Open Data Act).
https://de.wikipedia.org/wiki/Lilith_Wittmann

SourceGrid

SourceGrid is a free, open-source grid control. It supports virtual grids, custom cells and editors, advanced formatting options, and many other features. SourceGrid is a Windows Forms control written entirely in C#; its goal is to provide a simple but flexible grid for any case in which a series of data must be displayed or edited in table format. Many controls of this type are available, but they are often expensive, difficult to customize, or not compatible with .NET. SourceGrid also lets users work with a customizable data source that is not in DataSet format.

Free Controls: https://www.syncfusion.com/products/communitylicense

Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL

https://trino.io

Trino is a highly parallel, distributed query engine built from the ground up for efficient, low-latency analytics.

Access data from multiple systems within a single query. For example, join historic log data stored in an S3 object storage with customer data stored in a MySQL relational database.
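As a rough sketch of what such a cross-system query might look like: Trino addresses tables as `catalog.schema.table`, so one statement can reference two catalogs. The catalog, schema, table, and column names below (`hive.weblogs.requests`, `mysql.crm.customers`, `customer_id`, `name`) are hypothetical and depend entirely on how a given cluster is configured. The Python helper only builds the SQL text; it does not connect to Trino.

```python
def federated_join_sql(log_table: str, customer_table: str) -> str:
    """Build one query that joins tables living in two different catalogs."""
    return (
        "SELECT c.customer_id, c.name, count(*) AS requests\n"
        f"FROM {log_table} AS l\n"
        f"JOIN {customer_table} AS c ON l.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.name\n"
        "ORDER BY requests DESC"
    )


# Hypothetical names: a Hive catalog over S3 and a MySQL catalog.
query = federated_join_sql("hive.weblogs.requests", "mysql.crm.customers")
print(query)
```

The resulting string could then be submitted through whichever Trino client is in use.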

https://github.com/andreas5588/trino

Open Source Data Quality and Profiling download | SourceForge.net

31 Data lineage tools – DBMS Tools

Ultorg: General-Purpose, User-Friendly Database Software

Intersystems IRIS \ Docker \ Openflights Dataset

Some links on the topic in the title:

https://github.com/andreas5588/openflights_dataset

https://community.intersystems.com/post/using-docker-container-group-iris-openflights-dataset-and-apache-zeppelin

https://community.intersystems.com/post/tips-and-tricks-brand-new-load-data-command

https://hub.docker.com/r/andreasschneiderixdbde/openflights-iris-zeppelin

https://hub.docker.com/r/andreasschneiderixdbde/openflights-iris

https://openexchange.intersystems.com/package/openflights_dataset

https://www.kaggle.com/datasets/sherrytp/airline-delay-analysis/code

The Million Song Dataset Challenge aims at being the best possible offline evaluation of a music recommendation system
https://www.kaggle.com/c/msdchallenge/overview

Do you want to query complex data structures in an iterative way? Do you have access to hierarchical data structures that need to be queried? This course will teach you the tools required to solve these questions. You will learn how to write recursive queries and query hierarchical data structures. To do this, you will use Common Table Expressions (CTE) and the recursion principle on a wide variety of datasets. You will, for example, dig into a flight plan dataset and learn how to find the best and cheapest connection between two airports. After completing this course, you will understand the principle of recursion, and be able to identify and create hierarchical data models.

https://www.datacamp.com/courses/hierarchical-and-recursive-queries-in-sql-server
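To give a feel for the kind of recursive query the course description mentions, here is a small sketch that finds the cheapest connection between two airports with a recursive CTE. It uses SQLite through Python's standard library as a stand-in, so the syntax differs slightly from SQL Server (which writes `WITH` without the `RECURSIVE` keyword). The `flights` table, its prices, and the three-hop limit are invented for illustration and are not taken from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights (origin TEXT, destination TEXT, price INTEGER);
INSERT INTO flights VALUES
  ('VIE', 'FRA', 100),
  ('FRA', 'JFK', 400),
  ('VIE', 'LHR', 150),
  ('LHR', 'JFK', 300),
  ('VIE', 'JFK', 600);
""")

CHEAPEST = """
WITH RECURSIVE routes(airport, path, total, hops) AS (
    -- anchor member: start at the departure airport with zero cost
    SELECT :start, :start, 0, 0
    UNION ALL
    -- recursive member: extend every known route by one more flight
    SELECT f.destination,
           r.path || '>' || f.destination,
           r.total + f.price,
           r.hops + 1
    FROM routes AS r
    JOIN flights AS f ON f.origin = r.airport
    WHERE r.hops < 3                          -- cap the search depth
      AND instr(r.path, f.destination) = 0    -- never revisit an airport
)
SELECT path, total
FROM routes
WHERE airport = :goal
ORDER BY total, hops
LIMIT 1
"""

best = conn.execute(CHEAPEST, {"start": "VIE", "goal": "JFK"}).fetchone()
print(best)  # ('VIE>LHR>JFK', 450)
```

The anchor member seeds the recursion, and the recursive member keeps joining the routes found so far against `flights` until no new rows qualify.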

This article aims at showing good practices on how to retrieve data with SQL using practical examples on the data above. The following topics are covered:

  • operations on columns
  • most common joins
  • aggregations and window functions
  • tips to handle complex queries

https://www.mit.edu/~amidi/teaching/data-science-tools/tutorial/queries-with-sql/

Have you ever wondered about the differences between a subquery and a common table expression (CTE) in SQL? The concepts seem very similar, but knowing the difference – and when to use each one – will help you write efficient and readable queries.

https://learnsql.com/blog/sql-subquery-cte-difference/
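A small side-by-side comparison may make the distinction concrete. The sketch below runs the same logic once as a subquery in the `FROM` clause and once as a CTE, using SQLite from Python's standard library. The `orders` table and its rows are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('ann', 50), ('ann', 70), ('bob', 20), ('cat', 200);
""")

# Subquery: the aggregation is nested inside the FROM clause.
SUBQUERY = """
SELECT customer, total
FROM (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) AS t
WHERE total > 100
ORDER BY customer
"""

# CTE: the same aggregation is named up front and then queried like a table.
CTE = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
)
SELECT customer, total
FROM totals
WHERE total > 100
ORDER BY customer
"""

sub_rows = conn.execute(SUBQUERY).fetchall()
cte_rows = conn.execute(CTE).fetchall()
print(sub_rows == cte_rows, cte_rows)  # True [('ann', 120), ('cat', 200)]
```

Both forms return the same rows here; the CTE mainly helps readability when the intermediate result is reused or the nesting gets deep.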

MDX Studio

MDX Studio is a tool developed by Mosha Pasumansky, a former Analysis Services developer.
This tool is invaluable when writing an MDX query: it offers code formatting, a built-in system for analyzing query performance, and many other features for writing MDX queries.

Unfortunately, the source code is not publicly available and Mosha no longer updates the project. Other contributors have made an effort to keep the tool compatible with newer versions of Analysis Services and client connection libraries.

https://www.sqlbi.com/tools/mdx-studio/

Martin Fowler – Software Design

Kent Beck is an American software engineer and the creator of Extreme Programming. He is an original signer of the Agile Manifesto, the author of the Extreme Programming book series, and a proponent of Test-Driven Development.

https://www.hanselman.com/blog/the-weekly-source-code-33-microsoft-open-source-inside-google-chrome

Developer and Power Users Tool List for Windows

https://www.hanselman.com/blog/scott-hanselmans-2021-ultimate-developer-and-power-users-tool-list-for-windows

ROBOCOPY

robocopy <source> <target> /MIR /sec /XD "<exclude folder>" /MT /NP /DCOPY:T /COPY:DT

/COPY:copyflag[s] :: what to COPY for files (default is /COPY:DAT).
                      (copyflags : D=Data, A=Attributes, T=Timestamps).
                      (S=Security=NTFS ACLs, O=Owner info, U=aUditing info).

/DCOPY:T :: COPY Directory Timestamps.