Apache Iceberg

Sink data into Iceberg, quixly

Pre-process data before loading it into Apache Iceberg to simplify your lakehouse architecture.


Architecture diagram showing data sources, transformations and sinks

What can you build with Quix?

Integrate your data your way

Quix provides out-of-the-box connectors for many destinations, including databases, data lakes and data warehouses. Unlike alternatives such as Kafka Connect, they are not a black box: you can fork them into your own Git repository and customise them for your use case.

Pre-process data in a tabular data format

Quix’s open source Python library for stream processing, Quix Streams, enables you to transform your data in-stream using a tabular data format. You can also aggregate, join, downsample or enrich data from any cache or external system.

Pure Python

Both connectors and transformations are written in pure Python, so data engineers and scientists can easily customise data ingestion pipelines. Specialized Source, Processing and Sink APIs take care of the heavy lifting so you can get the job done with fewer headaches.

No throughput limits

Send as much data as you want; Quix’s serverless infrastructure can handle it. Quix connectors also handle backpressure and checkpointing to ensure no data is duplicated or lost and your systems aren’t overloaded.

No limits on how you structure your data

If your raw data has lots of nested layers that are not optimal for Iceberg, you can easily re-structure your data before sinking it to your cloud object storage.
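For instance, flattening nested JSON into top-level, column-friendly fields is often all the restructuring a table format needs. A minimal sketch in plain Python — the event shape and field names are hypothetical, not a Quix API:

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Recursively flatten nested dicts into dotted top-level keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Example: a nested sensor event becomes a flat, column-friendly row
event = {"device": {"id": "d1", "loc": {"lat": 51.5, "lon": -0.1}}, "temp": 21.3}
row = flatten(event)
# row == {"device.id": "d1", "device.loc.lat": 51.5, "device.loc.lon": -0.1, "temp": 21.3}
```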

Transform your data efficiently for optimal ingestion to databases, lakes and warehouses

Integrate data from any source

Pre-process, transform and validate data

Sink it to Apache Iceberg

Source

Ingest data from any source, including popular streaming technologies like Apache Kafka, AWS MSK or AWS Kinesis. Use out-of-the-box connectors, or, when that’s not enough, quickly customise a connector by forking the nearest example. Building custom connectors is easy with Quix’s pure Python Source API.
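As a rough illustration of the pattern a custom source follows (poll an upstream system, hand records downstream, advance an offset), here is a plain-Python sketch; `fetch_batch` and `produce` are hypothetical stand-ins, not the Quix Source API:

```python
def run_source(fetch_batch, produce, start_offset=0):
    """Poll an upstream system in batches and hand each record downstream.

    fetch_batch(offset) -> (records, next_offset); an empty batch ends the loop.
    produce(record) pushes one record into the pipeline (e.g. onto a Kafka topic).
    """
    offset = start_offset
    while True:
        records, next_offset = fetch_batch(offset)
        if not records:
            break
        for record in records:
            produce(record)
        offset = next_offset  # advance only after the batch is handed off
    return offset

# Hypothetical upstream: two pages of records, then empty
pages = [[1, 2], [3]]

def fetch_batch(offset):
    if offset < len(pages):
        return pages[offset], offset + 1
    return [], offset

out = []
final = run_source(fetch_batch, out.append)
# out == [1, 2, 3], final == 2
```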

Transformation

Prepare your data with Quix Streams, an open source Python library for processing data with streaming DataFrames. Use built-in operators for aggregation, windowing, filtering, group-by, branching, merging and more. Integrate and enrich your data before loading it to Iceberg by connecting to caches and external systems.
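To make the windowed operators concrete, here is the kind of tumbling-window downsample such a pipeline performs, sketched in plain Python rather than the Quix Streams API (window size and field names are assumptions):

```python
from collections import defaultdict

def downsample(events, window_ms=1000):
    """Average the 'value' field per key over fixed (tumbling) windows."""
    windows = defaultdict(list)  # (key, window_start) -> values seen in that window
    for e in events:
        window_start = (e["ts"] // window_ms) * window_ms
        windows[(e["key"], window_start)].append(e["value"])
    return {k: sum(vals) / len(vals) for k, vals in windows.items()}

events = [
    {"key": "sensor-1", "ts": 100, "value": 10.0},
    {"key": "sensor-1", "ts": 900, "value": 20.0},
    {"key": "sensor-1", "ts": 1500, "value": 30.0},
]
agg = downsample(events)
# agg == {("sensor-1", 0): 15.0, ("sensor-1", 1000): 30.0}
```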

Destination

Sink data to cloud blob stores in Iceberg format, including AWS S3, GCS and Azure Blob Storage. Other databases, lake formats and warehouses are also supported. Quix sink connectors automatically handle backpressure and checkpointing to ensure no data is duplicated or lost and your database is not overloaded.
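The write-then-checkpoint pattern behind that guarantee can be sketched in plain Python; `write_batch` and `commit_offset` are illustrative stand-ins for the connector's internals, not Quix functions:

```python
def run_sink(records, write_batch, commit_offset, batch_size=2):
    """Buffer records, write in batches, and checkpoint only after each
    successful write: a crash replays at most one uncommitted batch, so
    nothing is lost and duplicates stay bounded."""
    buffer = []
    last_committed = None
    offset = -1
    for offset, record in enumerate(records):
        buffer.append(record)
        if len(buffer) >= batch_size:
            write_batch(buffer)    # e.g. append a data file to the Iceberg table
            commit_offset(offset)  # checkpoint only after the write succeeds
            last_committed = offset
            buffer = []
    if buffer:                     # flush the final partial batch
        write_batch(buffer)
        commit_offset(offset)
        last_committed = offset
    return last_committed

writes, commits = [], []
last = run_sink(["a", "b", "c"], writes.append, commits.append, batch_size=2)
# writes == [["a", "b"], ["c"]], commits == [1, 2]
```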

Better performance and lower TCO


Lower TCO

Integrate your data with your data lake for a fraction of the typical cost and with greater control, compared to popular streaming solutions such as AWS Kinesis Firehose or SaaS tools like Fivetran.

Get started

Get started quickly with our open source connectors.

Sink connector

Source connector

Apache Iceberg sink

Book a chat with us

Schedule a free 30 minute chat with a member of our team to discuss your use case and get all your questions answered.

Let's talk!
Quix Streams GitHub

