What is Quix Lake
Quix Lake is the central storage layer of Quix Cloud. It captures, organizes, and manages Kafka topic data in an open, file-based format on blob storage systems such as Amazon S3, Azure Blob Storage, Google Cloud Storage, or MinIO.
Instead of relying on proprietary databases, Quix Lake uses open formats and Hive-style partitioning so your data stays:
- Portable: readable by any tool that supports Avro and Parquet (see the query sketch after this list)
- Efficient: optimized for analytics with Parquet plus partitions
- Flexible: usable in both real-time and historical pipelines
- Future-proof: the foundation for Replay, metadata search, and time-series queries
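As a quick illustration of that portability, the Parquet index files (whose layout is shown below) can be queried directly with an external tool such as DuckDB. This is a minimal sketch: the bucket and workspace names are placeholders, and the Topic/Key columns come from the Hive-style path segments:

```python
import duckdb

# Minimal sketch: query Quix Lake's Parquet index files straight from blob
# storage. Bucket and workspace names are placeholders; Topic and Key are
# surfaced from the Hive-style path segments (Topic=.../Key=...).
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")   # enables s3:// paths
con.execute("SET s3_region = 'us-east-1';")   # plus credentials as needed

print(con.execute("""
    SELECT Topic, Key, count(*) AS index_files
    FROM read_parquet(
        's3://my-bucket/my-workspace/Metadata/**/*.parquet',
        hive_partitioning = 1   -- exposes Topic= and Key= as columns
    )
    GROUP BY Topic, Key
""").fetchall())
```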
Prerequisites
A blob storage connection must be configured for the environment. See Blob storage connections in the related links.
Why it exists
Earlier persistence options were tied to specific databases and SDKs, which limited replay fidelity and format choice. Quix Lake rethinks this:
- Kafka messages are persisted exactly as they arrive, including timestamps, headers, partitions, offsets, and idle gaps (a schema sketch follows this list)
- Metadata is indexed alongside raw data to enable fast discovery without scanning Avro
- Services like API, Replay, and Sink operate directly on the open files in your bucket
- You keep full control of storage, security, and lifecycle in your own cloud account
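To make "exactly as they arrive" concrete, the sketch below writes one Kafka message to a snappy-compressed Avro file with fastavro. The schema and field names are illustrative assumptions, not the actual Quix Lake schema; see the open format page for the real one.

```python
from fastavro import parse_schema, writer

# Hypothetical schema sketching the per-message fields Quix Lake preserves.
# Field names are illustrative; the real schema is on the open format page.
schema = parse_schema({
    "type": "record",
    "name": "KafkaMessage",
    "fields": [
        {"name": "timestamp", "type": "long"},  # epoch milliseconds
        {"name": "key",       "type": ["null", "bytes"]},
        {"name": "value",     "type": ["null", "bytes"]},
        {"name": "headers",   "type": {"type": "map", "values": "bytes"}},
        {"name": "partition", "type": "int"},
        {"name": "offset",    "type": "long"},
    ],
})

record = {
    "timestamp": 1755776884034,
    "key": b"B",
    "value": b'{"temperature": 21.4}',
    "headers": {"source": b"sensor-1"},
    "partition": 0,
    "offset": 331135,
}

with open("sample.avro", "wb") as out:
    # snappy codec (needs python-snappy), matching the .avro.snappy suffix
    writer(out, schema, [record], codec="snappy")
```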
Where your data lives
Data is written to your bucket in a predictable layout for easy discovery and external tooling.
Examples
Raw Avro:

```
<bucket>/<workspaceId>/Raw/Topic=csv-data/Key=B/Start=2025-08-21/
    ts_1755776884034_1755776886089_part_0_off_331135_331334.avro.snappy
```

Parquet index and custom metadata:

```
<bucket>/<workspaceId>/Metadata/Topic=data-source-json/Key=6/
    index_raw_0_129879.parquet
    metadata_<...>.parquet
```
See open format for the full layout and schemas.
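Because the layout is plain strings, external tooling can recover partition values and file-level metadata from object keys alone. A small sketch follows; the filename convention (first/last timestamp in ms, Kafka partition, offset range) is inferred from the example above, so treat the parsing as an assumption rather than a stable contract:

```python
import re

# Example object key, matching the raw Avro layout shown above.
key = ("my-workspace/Raw/Topic=csv-data/Key=B/Start=2025-08-21/"
       "ts_1755776884034_1755776886089_part_0_off_331135_331334.avro.snappy")

# Hive-style partition segments have the form Name=value.
partitions = dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
print(partitions)  # {'Topic': 'csv-data', 'Key': 'B', 'Start': '2025-08-21'}

# Filename: ts_<firstMs>_<lastMs>_part_<partition>_off_<first>_<last>
m = re.search(r"ts_(\d+)_(\d+)_part_(\d+)_off_(\d+)_(\d+)", key)
if m:
    first_ms, last_ms, partition, first_off, last_off = map(int, m.groups())
    print(f"partition {partition}, offsets {first_off}-{last_off}, "
          f"span {last_ms - first_ms} ms")
```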
What you can do
- Explore datasets with the Quix Lake UI or API
- Replay persisted datasets back into Kafka with full fidelity
- Search and filter by time ranges, topics, keys, and custom metadata
- Query externally using DuckDB, Spark, Trino, Athena, or BigQuery over Avro and Parquet
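The raw Avro files can be read by external engines as well. Here is a sketch assuming DuckDB's community Avro extension; the bucket name is a placeholder and the object path is the example from this page:

```python
import duckdb

# Sketch: read a raw Avro file directly, assuming the DuckDB community
# Avro extension is available; the object path is illustrative.
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL avro FROM community; LOAD avro;")

print(con.execute("""
    SELECT *
    FROM read_avro('s3://my-bucket/my-workspace/Raw/Topic=csv-data/Key=B/Start=2025-08-21/ts_1755776884034_1755776886089_part_0_off_331135_331334.avro.snappy')
    LIMIT 10
""").fetchall())
```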
Cross-environment access
With the right permissions, you can browse datasets written by other environments using the Environment switcher in the Catalog.
How it works
- Ingest: a sink writes raw Kafka messages to Avro files in your storage
- Index: Parquet index files summarize time, partition, offsets, and sizes
- Discover: the UI and APIs read the index to list and filter quickly
- Replay: any discovered dataset can be streamed back into Kafka with the original order and timing preserved or simulated (see the sketch after this list)
- Use: build pipelines that mix historical data with live streams, and run queries over Parquet
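To make the replay step concrete, here is a minimal sketch of timing-simulated replay: it re-produces stored records and sleeps out the original inter-message gaps. This is not the Quix Replay service itself; the record fields and topic name are assumptions.

```python
import time
from confluent_kafka import Producer

def replay(records, topic, bootstrap="localhost:9092", speed=1.0):
    """Re-produce records (dicts with timestamp in ms, key, value, headers,
    sorted by timestamp) to Kafka, simulating the original gaps;
    speed > 1.0 replays faster than real time."""
    producer = Producer({"bootstrap.servers": bootstrap})
    prev_ts = None
    for rec in records:
        if prev_ts is not None:
            gap_s = (rec["timestamp"] - prev_ts) / 1000.0 / speed
            if gap_s > 0:
                time.sleep(gap_s)  # reproduce the original idle gap
        producer.produce(
            topic,
            key=rec["key"],
            value=rec["value"],
            headers=list(rec["headers"].items()),
            timestamp=rec["timestamp"],  # keep the original event time
        )
        prev_ts = rec["timestamp"]
    producer.flush()
```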
Operational behavior
- Soft deletion: deleting items from the catalog moves them to Trash for a short retention window before permanent removal; within that window you can restore them or delete them forever
- Security: you control IAM, keys, encryption, retention, and audit in your own cloud account