Lakehouse Sink

The Lakehouse Sink consumes a Kafka topic, writes time-partitioned Parquet files to your blob storage, and registers each write as an Apache Iceberg table snapshot — so the Query engine can SQL over the table.

It is a separate connector from the Data Lake Sink. Choose this one when you want SQL access over your topic; choose Data Lake when you want byte-perfect replay.

Prerequisites

A blob storage connection configured for the cluster
A Lakehouse provisioned for that blob storage

Configuration

The sink is a managed service. You configure it through the connector UI or in your pipeline YAML.

Required

Setting	Notes
`topic`	Kafka topic to consume. The table name in the catalog is derived from the topic name.
`timestampColumn`	Field in the payload used as the Iceberg time partition key

Partitioning

Setting	Notes
`hiveColumns`	Additional partition columns (Hive-style) — typically a stream key or another high-cardinality dimension

Quix Streams / Kafka

Setting	Default	Notes
`consumerGroup`	`quixstreams-default`	Use a unique group per sink
`autoOffsetReset`	`latest`	`latest` or `earliest`
`commitTimeInterval`	`60`	Seconds between offset commits
`commitMsgInterval`	`0`	Messages between commits (0 = disabled)
`logLevel`	`INFO`	`INFO` or `DEBUG`

Schema

The sink auto-discovers the schema from incoming Kafka messages. You don't supply one explicitly. Schema evolution follows Iceberg's standard rules (additive columns and nullable widening).

Example YAML

deployments:
  - name: Lakehouse - Sink
    application: Lakehouse.Sink
    version: latest
    deploymentType: Managed
    resources:
      limits:
        cpu: 500
        memory: 1024
      replicas: 1
    configuration:
      topic: telemetry
      consumerGroup: lakehouse-sink-telemetry
      autoOffsetReset: latest
      timestampColumn: ts
      hiveColumns: deviceId

What it writes

Parquet data files on your blob storage, partitioned by time and any hiveColumns you set
Iceberg metadata — table metadata files (snapshot, manifest list, manifests) committed via the catalog
Catalog updates — each commit registers a new Iceberg snapshot

Operational behavior

Offset commits — committed after Parquet writes and catalog updates succeed (at-least-once delivery).
Ordering — per Kafka partition.
Scaling — increase replicas to parallelize by Kafka partitions; Iceberg's optimistic concurrency reconciles snapshot writes from multiple replicas.
Recovery — restarts pick up from the last committed offset.

Monitoring

Logs — per-batch writes, catalog commits, retries
Metrics — records persisted, bytes written, batches/sec
Lakehouse UI — new partitions and snapshots surface immediately after the catalog commit

Security

Uses the cluster's blob storage connection.
Authenticates against the catalog with a Quix-managed token — you don't configure it.
Honors workspace boundaries enforced by Quix.

Running alongside `DataLake.Sink`

The Lakehouse Sink and the Data Lake Sink are independent connectors. To get both Replay (Data Lake) and SQL (Lakehouse) for the same topic, deploy one of each with different consumer groups. Each progresses at its own pace; storage cost roughly doubles for the topic.