Lakehouse overview
Lakehouse is the query-first option in Quix Lake. It persists Kafka topic data as Apache Iceberg tables on your blob storage, with a Quix-managed catalog and a SQL query engine — so you can run interactive SQL and time-series analytics without standing up a separate warehouse.
If you need byte-for-byte replay fidelity rather than query access, see Data Lake. Not sure which to pick? Read Choosing between them in the Quix Lake overview.
What you get
- SQL access: query historical topic data with standard SQL; no warehouse to manage
- Apache Iceberg tables: an open table format on your bucket. External Iceberg-aware engines can read the Parquet files and Iceberg metadata as well
- Time-series friendly: Iceberg partitioning and columnar Parquet let range scans and aggregates skip partitions and files outside the requested range and read only the columns they need
- Catalog-backed: schemas, partitions, snapshots, and statistics are tracked by a Quix-managed catalog, so the Query engine and UI can list tables and plan queries without scanning storage
- Yours: Parquet lives in your blob storage; only catalog metadata lives in Quix-managed services
Prerequisites
A blob storage connection must be configured for the cluster. The same connection can be shared with the Data Lake Sink if you want both.
Components
A Lakehouse is provisioned per blob storage connection and shared across the workspaces that use it. Once it's set up, you interact with two surfaces:
| Surface | What it does |
|---|---|
| Lakehouse Sink | Connector you deploy per Kafka topic. Writes Iceberg tables to your bucket and registers them with the catalog. |
| UI | In-portal page to browse tables, inspect schemas, and run SQL interactively. |
Behind the scenes, the Lakehouse runs a small set of managed services — see Catalog, Query, and Database if you want to understand how it fits together. You don't deploy or configure these directly; Quix provisions and operates them for you.
How it works
```mermaid
flowchart LR
    kafka([Kafka topic])
    sink[Lakehouse Sink]
    bucket[("Your blob storage<br/>Parquet + Iceberg metadata")]
    query[Query engine]
    catalog[Catalog]
    ui[Lakehouse UI]
    apps[Your apps]
    kafka --> sink
    sink -- "register snapshot" --> catalog
    sink -- "Parquet files" --> bucket
    query -- "table metadata" --> catalog
    query -- "read Parquet" --> bucket
    ui -- "SQL" --> query
    apps -- "SQL" --> query
```
- Ingest: the Lakehouse Sink consumes from Kafka and writes Parquet files into an Iceberg table on your bucket.
- Register: the sink commits the new files to the Catalog as a new Iceberg snapshot; the Catalog tracks the table's schema, partition specs, and file statistics.
- Query: the Query engine plans SQL against the latest snapshot, prunes files using partition values and column statistics, and reads only the relevant Parquet.
- Explore: use the UI to browse tables and run queries in the portal, or call the Query service directly from your own apps.
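To make the Query step concrete, here is a minimal, hypothetical sketch of file pruning. It is not the Query engine's planner: it assumes a table partitioned by day on a timestamp column `ts`, and `DataFile`, `plan_scan`, and `day_partition` are illustrative names. `day_partition` mirrors Iceberg's `day` transform (days since 1970-01-01), and the per-file `min_ts`/`max_ts` stand in for the column statistics the Catalog tracks.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_partition(ts: datetime) -> int:
    """Mirror of Iceberg's `day` transform: whole days since 1970-01-01 UTC."""
    return (ts - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    path: str
    day: int          # partition value (hypothetical day-partitioned table)
    min_ts: datetime  # lower bound of `ts` from the file's column statistics
    max_ts: datetime  # upper bound of `ts` from the file's column statistics


def plan_scan(files: list[DataFile], start: datetime, end: datetime) -> list[DataFile]:
    """Return only the files that may hold rows with start <= ts < end."""
    first_day, last_day = day_partition(start), day_partition(end)
    selected = []
    for f in files:
        # Partition pruning: skip files in day partitions outside the window.
        if not first_day <= f.day <= last_day:
            continue
        # Statistics pruning: skip files whose [min_ts, max_ts] misses the window.
        if f.max_ts < start or f.min_ts >= end:
            continue
        selected.append(f)
    return selected


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


files = [
    DataFile("a.parquet", day_partition(utc(2024, 5, 1)), utc(2024, 5, 1, 0), utc(2024, 5, 1, 11)),
    DataFile("b.parquet", day_partition(utc(2024, 5, 1)), utc(2024, 5, 1, 12), utc(2024, 5, 1, 23)),
    DataFile("c.parquet", day_partition(utc(2024, 5, 2)), utc(2024, 5, 2, 0), utc(2024, 5, 2, 23)),
]

# Window: 2024-05-01 14:00 to 18:00 UTC.
hits = plan_scan(files, utc(2024, 5, 1, 14), utc(2024, 5, 1, 18))
print([f.path for f in hits])  # ['b.parquet']
```

Partition values discard `c.parquet` without looking at its statistics; statistics then discard `a.parquet`, whose rows all end before the window opens. Only one of three files is read.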
Multi-workspace sharing
- The Lakehouse backend is provisioned per blob storage connection. Workspaces that share a connection share the Lakehouse and its tables.
- Sinks are deployed per workspace and automatically bind to the Lakehouse for that workspace's blob storage connection.
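The sharing rule can be modeled in a few lines. This is a hypothetical sketch, not Quix code: `Cluster`, `Lakehouse`, `deploy_sink`, and the connection names are invented for illustration, and naming each table after its topic is an assumption. It models only backend sharing, not the portal-level workspace boundaries described under Security.

```python
from dataclasses import dataclass, field


@dataclass
class Lakehouse:
    connection: str
    tables: set[str] = field(default_factory=set)


class Cluster:
    """Hypothetical model: one Lakehouse per blob storage connection."""

    def __init__(self, workspace_connections: dict[str, str]) -> None:
        self.workspace_connections = workspace_connections  # workspace -> connection
        self._lakehouses: dict[str, Lakehouse] = {}

    def lakehouse_for(self, workspace: str) -> Lakehouse:
        connection = self.workspace_connections[workspace]
        # Keyed by connection, not workspace, so workspaces that share a
        # connection resolve to the same Lakehouse.
        if connection not in self._lakehouses:
            self._lakehouses[connection] = Lakehouse(connection)
        return self._lakehouses[connection]

    def deploy_sink(self, workspace: str, topic: str) -> None:
        # A sink is deployed in one workspace but binds to the shared Lakehouse.
        self.lakehouse_for(workspace).tables.add(topic)


cluster = Cluster({"dev": "s3-shared", "staging": "s3-shared", "prod": "s3-prod"})
cluster.deploy_sink("dev", "sensor-data")
cluster.deploy_sink("prod", "orders")
print(sorted(cluster.lakehouse_for("staging").tables))  # ['sensor-data']
```

A table written by a sink in `dev` is visible from `staging` because both workspaces resolve to the Lakehouse for `s3-shared`, while `prod` gets its own.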
Operational behavior
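- Concurrent writers: multiple sink replicas, and sinks from different workspaces that share the blob storage connection, all commit through the Iceberg catalog. Iceberg's optimistic concurrency resolves overlapping commits: a writer whose base snapshot is out of date refreshes and retries its commit on top of the latest snapshot.
- Read isolation: Iceberg provides snapshot isolation. Each query reads a single snapshot, so commits that land while it runs don't change its results.
- Schema evolution: Iceberg's standard evolution rules (additive columns, nullable widening) apply by default.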
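The commit and isolation behavior above can be illustrated with a toy model. This is a sketch only, not Iceberg's or Quix's implementation: `Catalog`, `Snapshot`, `CommitConflict`, and `append_files` are hypothetical names, and snapshot IDs are modeled as sequential integers for readability. The core idea matches Iceberg's protocol: a commit atomically swaps the table's current snapshot only if it still equals the snapshot the writer started from; otherwise the writer re-reads and re-applies its append.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int          # sequential here for readability (an assumption)
    files: tuple[str, ...]    # data files visible in this snapshot


class CommitConflict(Exception):
    pass


class Catalog:
    """Toy catalog holding a pointer to the table's current snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = Snapshot(0, ())

    def current(self) -> Snapshot:
        return self._current

    def commit(self, base: Snapshot, new: Snapshot) -> None:
        # Compare-and-swap: succeed only if nobody committed since `base` was read.
        with self._lock:
            if self._current.snapshot_id != base.snapshot_id:
                raise CommitConflict
            self._current = new


def append_files(catalog: Catalog, new_files: list[str], max_retries: int = 5) -> Snapshot:
    """Append data files, retrying on top of the latest snapshot after a conflict."""
    for _ in range(max_retries):
        base = catalog.current()
        new = Snapshot(base.snapshot_id + 1, base.files + tuple(new_files))
        try:
            catalog.commit(base, new)
            return new
        except CommitConflict:
            continue  # another writer won; re-read and re-apply the append
    raise CommitConflict(f"gave up after {max_retries} attempts")


catalog = Catalog()
reader_view = catalog.current()        # a running query pins snapshot 0
append_files(catalog, ["a.parquet"])   # sink replica 1 commits
stale = catalog.current()              # replica 3 reads snapshot 1 ...
append_files(catalog, ["b.parquet"])   # ... but replica 2 commits first
try:
    catalog.commit(stale, Snapshot(stale.snapshot_id + 1, stale.files + ("c.parquet",)))
except CommitConflict:
    append_files(catalog, ["c.parquet"])  # replica 3 retries on the latest snapshot

print(catalog.current().files)  # ('a.parquet', 'b.parquet', 'c.parquet')
print(reader_view.files)        # ()
```

Because snapshots are immutable, the query that pinned snapshot 0 still sees an empty table, while all three appends end up in the latest snapshot without any writer losing data.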
Security
- Sinks and the Query engine authenticate against the Catalog with a Quix-managed bearer token. You don't configure auth tokens manually.
- Your data lives in your bucket; Quix never copies the Parquet files anywhere outside it.
- Workspace boundaries are enforced at the portal layer.
See also
- Lakehouse Sink — persist a topic as an Iceberg table
- UI — browse and query in the portal
- Query — SQL surface for your apps
- Catalog — how table metadata is tracked
- Database — backing storage for the Catalog
- Blob storage connections
- Data Lake overview — replay-first alternative