Lakehouse overview

Lakehouse is the query-first option in Quix Lake. It persists Kafka topic data as Apache Iceberg tables on your blob storage, backed by a Quix-managed catalog and a SQL query engine, so you can run interactive SQL and time-series analytics without standing up a separate warehouse.

If you need byte-for-byte replay fidelity rather than query access, see Data Lake. Not sure which to pick? Read Choosing between them in the Quix Lake overview.

What you get

SQL access: query historical topic data with standard SQL; no warehouse to manage.
Apache Iceberg tables: open table format on your bucket. The Parquet files and Iceberg metadata are also readable by external Iceberg-aware engines.
Time-series friendly: Iceberg partitioning and columnar Parquet let range scans and aggregates skip the files and columns they don't need.
Catalog-backed: schemas, partitions, snapshots, and statistics are tracked by a Quix-managed catalog, so the Query engine and UI never have to scan storage to discover them.
Yours: Parquet lives in your blob storage; only catalog metadata lives in Quix-managed services.
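
As an illustration of the kind of time-range aggregate this is meant for, here is a minimal sketch. The table name `sensor_readings` and its columns are hypothetical, and Python's built-in `sqlite3` is used purely as a local stand-in so the query can run anywhere; the Lakehouse Query engine's SQL dialect and connection method may differ.

```python
import sqlite3

# Hypothetical table and columns; a real Lakehouse table mirrors your topic's schema.
QUERY = """
SELECT device_id,
       COUNT(*)         AS readings,
       AVG(temperature) AS avg_temperature
FROM sensor_readings
WHERE ts >= ? AND ts < ?
GROUP BY device_id
ORDER BY device_id
"""


def run_demo():
    # sqlite3 is only a stand-in engine; it is not what the Lakehouse runs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sensor_readings (ts TEXT, device_id TEXT, temperature REAL)"
    )
    conn.executemany(
        "INSERT INTO sensor_readings VALUES (?, ?, ?)",
        [
            ("2024-01-01T00:00:00", "a", 20.0),
            ("2024-01-01T00:30:00", "a", 22.0),
            ("2024-01-01T00:45:00", "b", 18.0),
            ("2024-01-01T02:00:00", "a", 30.0),  # outside the queried hour
        ],
    )
    # Half-open time range [start, end): the shape of filter that lets an
    # engine skip partitions and files outside the window.
    rows = conn.execute(
        QUERY, ("2024-01-01T00:00:00", "2024-01-01T01:00:00")
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The filter on the timestamp column is the important part: a bounded time range is what allows partition and file pruning to take effect.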

Prerequisites

A blob storage connection must be configured for the cluster. The same connection can be shared with the Data Lake Sink if you want both.

Components

A Lakehouse is provisioned per blob storage connection and shared across the workspaces that use it. Once it's set up, you interact with two surfaces:

Surface	What it does
Lakehouse Sink	Connector you deploy per Kafka topic. Writes Iceberg tables to your bucket and registers them with the catalog.
UI	In-portal page to browse tables, inspect schemas, and run SQL interactively.

Behind the scenes, the Lakehouse runs a small set of managed services. See Catalog, Query, and Database if you want to understand how they fit together. You don't deploy or configure these directly; Quix provisions and operates them for you.

How it works

flowchart LR
    kafka([Kafka topic])
    sink[Lakehouse Sink]
    bucket[("Your blob storage<br/>Parquet + Iceberg metadata")]
    query[Query engine]
    catalog[Catalog]
    ui[Lakehouse UI]
    apps[Your apps]

    kafka --> sink
    sink -- "register snapshot" --> catalog
    sink -- "Parquet files" --> bucket
    query -- "table metadata" --> catalog
    query -- "read Parquet" --> bucket
    ui -- "SQL" --> query
    apps -- "SQL" --> query

Ingest: the Lakehouse Sink consumes from Kafka and writes Parquet into an Iceberg table on your bucket.
Register: the Catalog commits the new files into a new Iceberg snapshot, tracking schema, partition specs, and file statistics.
Query: the Query engine plans SQL against the latest snapshot, prunes files by partition values and column statistics, and reads only the relevant Parquet.
Explore: use the UI to browse tables and run queries in the portal, or call the Query service directly from your own apps.
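
The pruning in the Query step can be pictured with a small model. This is a simplified sketch, not the Query engine's actual planner: the `DataFile` record, the daily partitioning, and the `prune` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DataFile:
    """Hypothetical per-file metadata, similar in spirit to what a catalog tracks."""
    path: str
    partition_day: date   # assumed daily partitioning on the timestamp
    ts_min: datetime      # column statistics: smallest timestamp in the file
    ts_max: datetime      # column statistics: largest timestamp in the file


def prune(files, start, end):
    """Return paths of files that may contain rows with start <= ts < end."""
    kept = []
    for f in files:
        # Partition pruning: skip whole days outside the query range.
        if not (start.date() <= f.partition_day <= end.date()):
            continue
        # Statistics pruning: skip files whose timestamp range cannot overlap.
        if f.ts_max < start or f.ts_min >= end:
            continue
        kept.append(f.path)
    return kept


def example():
    files = [
        DataFile("d1/a.parquet", date(2024, 1, 1),
                 datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 59)),
        DataFile("d1/b.parquet", date(2024, 1, 1),
                 datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 59)),
        DataFile("d2/a.parquet", date(2024, 1, 2),
                 datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 0, 59)),
    ]
    # Query window: 10:30 to 11:30 on 1 January.
    return prune(files, datetime(2024, 1, 1, 10, 30), datetime(2024, 1, 1, 11, 30))


if __name__ == "__main__":
    print(example())
```

Only metadata is consulted to decide which files to open, which is why a narrow time range can touch a small fraction of the table.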

The Lakehouse backend is provisioned per blob storage connection. Workspaces that share a connection share the Lakehouse and its tables.
Sinks are deployed per workspace and bind to the Lakehouse for the workspace's blob storage automatically.

Operational behavior

Concurrent writers: multiple sink replicas, and sinks from different workspaces that share the blob storage, all commit through Iceberg. Iceberg's optimistic concurrency resolves conflicts: a commit that loses the race is retried against the latest table state rather than overwriting it.
Read isolation: Iceberg provides snapshot isolation, so a query reads the snapshot it started with and in-flight writes don't change its results.
Schema evolution: Iceberg's standard rules (additive columns, nullable widening) work by default.
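
The first two behaviors can be sketched with a toy in-memory catalog. This is an illustrative model of optimistic concurrency and snapshot isolation in general, not Iceberg's or Quix's implementation; `ToyCatalog`, `CommitConflict`, and `append_with_retry` are hypothetical names.

```python
import threading


class CommitConflict(Exception):
    """Raised when a commit is based on a snapshot that is no longer current."""


class ToyCatalog:
    def __init__(self):
        self._lock = threading.Lock()
        # Snapshot id is the list index; each snapshot is an immutable tuple of paths.
        self._snapshots = [()]

    def current_snapshot_id(self):
        with self._lock:
            return len(self._snapshots) - 1

    def files(self, snapshot_id):
        with self._lock:
            return self._snapshots[snapshot_id]

    def commit(self, base_id, new_files):
        # Compare-and-swap: succeed only if nobody committed since base_id.
        with self._lock:
            if base_id != len(self._snapshots) - 1:
                raise CommitConflict(base_id)
            self._snapshots.append(self._snapshots[base_id] + tuple(new_files))
            return len(self._snapshots) - 1


def append_with_retry(catalog, new_files, max_attempts=5):
    """Re-read the latest snapshot and retry when another writer won the race."""
    for _ in range(max_attempts):
        base = catalog.current_snapshot_id()
        try:
            return catalog.commit(base, new_files)
        except CommitConflict:
            continue
    raise RuntimeError("gave up after repeated commit conflicts")


def demo():
    catalog = ToyCatalog()
    pinned = catalog.current_snapshot_id()       # a reader pins its snapshot
    stale_base = catalog.current_snapshot_id()   # writer B reads its base
    append_with_retry(catalog, ["a.parquet"])    # writer A commits first
    try:
        catalog.commit(stale_base, ["b.parquet"])  # writer B's base is now stale
        conflict_seen = False
    except CommitConflict:
        conflict_seen = True
    append_with_retry(catalog, ["b.parquet"])    # writer B retries on the latest
    latest = catalog.files(catalog.current_snapshot_id())
    return conflict_seen, catalog.files(pinned), latest


if __name__ == "__main__":
    print(demo())
```

The reader's pinned snapshot stays empty throughout, while both writers' files end up in the latest snapshot without either overwriting the other.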

Security

Sinks and the Query engine authenticate against the Catalog with a Quix-managed bearer token. You don't configure auth tokens manually.
Your data lives in your bucket; Quix never copies the Parquet anywhere outside it.
Workspace boundaries are enforced at the portal layer.