
Quix Lake Open format

Quix Lake stores Kafka messages and metadata as open files in your blob storage (S3, GCS, Azure Blob, MinIO). The layout favors portability, fast discovery, and full-fidelity replay.

Read with anything

Files are standard Avro and Parquet. Open them with DuckDB, Spark, Trino or Presto, Pandas or PyArrow, Athena, or BigQuery external tables.

Storage layout (Hive-style)

<bucket>
  <workspaceId>
    Raw
      Topic=<topic>
        Key=<stream-key>
          Start=<yyyy-mm-dd>
            ts_<startTs>_<endTs>_part_<p>_off_<first>_<last>.avro[.codec]

    Metadata
      Topic=<topic>
        Key=<stream-key>
          index_raw_<partition>_<seq>.parquet   # index files
          metadata_*.parquet                    # optional custom metadata
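The Hive-style path segments and the segment filename together encode everything needed to locate a slice of a topic. As an illustrative sketch (the parsing helper below is not part of Quix Lake; only the path and filename conventions come from the layout above), a segment path can be unpacked with stdlib string handling:

```python
import re

# Pattern for the Avro segment filename shown in the layout above:
# ts_<startTs>_<endTs>_part_<p>_off_<first>_<last>.avro[.codec]
SEGMENT_RE = re.compile(
    r"ts_(?P<start_ts>\d+)_(?P<end_ts>\d+)"
    r"_part_(?P<partition>\d+)"
    r"_off_(?P<first_offset>\d+)_(?P<last_offset>\d+)"
    r"\.avro(?:\.(?P<codec>\w+))?$"
)

def parse_segment_path(path: str) -> dict:
    """Split a Raw segment path into its Hive-style fields."""
    fields = {}
    parts = path.strip("/").split("/")
    for part in parts:
        if "=" in part:  # Topic=<topic>, Key=<stream-key>, Start=<yyyy-mm-dd>
            k, v = part.split("=", 1)
            fields[k.lower()] = v
    m = SEGMENT_RE.match(parts[-1])
    if m is None:
        raise ValueError(f"unrecognized segment filename: {parts[-1]}")
    fields.update({k: (v if k == "codec" else int(v))
                   for k, v in m.groupdict().items() if v is not None})
    return fields

# Hypothetical bucket, workspace, topic, and key, purely for illustration:
info = parse_segment_path(
    "my-bucket/ws-1/Raw/Topic=clicks/Key=user-42/Start=2024-05-01/"
    "ts_1714550400000_1714553999999_part_0_off_100_199.avro.snappy"
)
```

Because every field is in the path and filename, a query engine can prune topics, keys, days, partitions, and offset ranges without opening a single Avro file.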

Schemas and formats

Avro data files

Short description

  • Every Kafka message is persisted as one Avro record.
  • Each record captures the payload, timestamp, partition, offset, and headers; optional gap-marker records flag periods with no messages.

Record fields

Field        Type             Meaning
Payload      bytes or string  Serialized message value
TimestampMs  long             Message timestamp in milliseconds
Partition    int              Kafka partition number
Offset       long             Kafka offset
Headers      map              Kafka headers (optional)
Gap          boolean          True when the row marks a gap (optional)
GapReason    string           Human-readable reason (optional)
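The record shape above can be sketched as a plain dataclass. This is a stand-in for the Avro schema, not Quix Lake code, and the assumption that a gap marker carries its own timestamp and offset is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RawRecord:
    """One persisted Kafka message; field names follow the table above."""
    Payload: bytes
    TimestampMs: int
    Partition: int
    Offset: int
    Headers: Optional[dict] = None
    Gap: bool = False            # True only for synthetic gap-marker rows
    GapReason: Optional[str] = None

# A normal message, and a gap marker for a quiet period:
msg = RawRecord(b'{"clicked": true}', 1714550400000, 0, 100,
                Headers={"source": "web"})
gap = RawRecord(b"", 1714550500000, 0, 101, Gap=True,
                GapReason="no messages for 60s")
```

Keeping gaps as explicit rows means replay tooling can distinguish "no data arrived" from "data is missing".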

Parquet index files

Short description

  • Compact Parquet descriptors summarize where data lives so the Catalog and APIs can discover datasets without scanning Avro.

Columns

Column          Type      Meaning
Path            string    Full object path to the referenced file
Topic           string    Kafka topic
Key             string    Stream key
Partition       int       Kafka partition number
TimestampStart  long      First record timestamp in ms
TimestampEnd    long      Last record timestamp in ms
OffsetStart     long      First Kafka offset in the segment
OffsetEnd       long      Last Kafka offset in the segment
RecordCount     long      Number of records
FileSizeBytes   long      Size of the referenced file
CreatedAt       datetime  Descriptor creation time in UTC
DeletedAt       datetime  Soft-delete marker (nullable)
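Discovery against this index boils down to an overlap test on each descriptor's time range, skipping soft-deleted rows. In practice these columns are read from the Parquet index files; in this sketch plain dicts stand in, and the helper name is hypothetical:

```python
def live_segments(index_rows, window_start_ms, window_end_ms):
    """Return paths of index rows whose [TimestampStart, TimestampEnd]
    overlaps the query window, skipping soft-deleted descriptors."""
    return [
        row["Path"]
        for row in index_rows
        if row.get("DeletedAt") is None
        and row["TimestampStart"] <= window_end_ms
        and row["TimestampEnd"] >= window_start_ms
    ]

index_rows = [
    {"Path": "a.avro", "TimestampStart": 0,    "TimestampEnd": 999,  "DeletedAt": None},
    {"Path": "b.avro", "TimestampStart": 1000, "TimestampEnd": 1999, "DeletedAt": None},
    {"Path": "c.avro", "TimestampStart": 2000, "TimestampEnd": 2999, "DeletedAt": "2024-06-01"},
]
paths = live_segments(index_rows, 500, 2500)   # c.avro excluded: soft-deleted
```

Only the files returned here need to be opened, which is why the index makes discovery fast relative to scanning every Avro segment.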

Custom metadata files

Short description

  • Optional key- or dataset-level properties you add, for example an experiment ID, labels, or business tags. Stored as Parquet and indexed for search.

Columns

Column         Type      Meaning
Topic          string    Kafka topic
Key            string    Stream key
MetadataKey    string    Property name
MetadataValue  string    Property value
UpdatedUtc     datetime  When this metadata entry was last updated
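Because each property is one name/value row, searching by tag is a simple filter over these columns. A minimal sketch (the rows would come from the metadata Parquet files; the function and sample values are hypothetical):

```python
def keys_with(metadata_rows, name, value):
    """Stream keys whose custom metadata contains the given name/value pair."""
    return sorted({
        r["Key"] for r in metadata_rows
        if r["MetadataKey"] == name and r["MetadataValue"] == value
    })

rows = [
    {"Topic": "clicks", "Key": "user-42", "MetadataKey": "experiment", "MetadataValue": "A"},
    {"Topic": "clicks", "Key": "user-7",  "MetadataKey": "experiment", "MetadataValue": "B"},
]
matches = keys_with(rows, "experiment", "A")
```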

Guarantees

  • Portability with standard Avro and Parquet
  • Replay-ready messages with order, timestamps, headers, partitions, offsets, and gaps preserved
  • Fast discovery using Parquet index instead of scanning Avro
  • A stable base for time-series and SQL over Parquet
