Integrate data from multiple sources into an Iceberg Lakehouse
With this project you can:
- Migrate data from multiple source systems to Apache Iceberg
- Standardize data formats across different input sources
- Support both streaming and database change data capture (CDC)
- Process data from IoT devices using MQTT and Telegraf
- Maintain data consistency during migration
- Track resource usage with built-in monitoring
- Scale each component independently with containerized deployment
Main project components
MQTT Source
Collect data from IoT devices and sensors using the MQTT protocol. This example uses a mock data source, but you can easily connect it to a real MQTT broker.
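For illustration, here is a minimal sketch of the kind of mock publisher this source stands in for, written with the paho-mqtt Python package. The broker address, topic name, and payload fields are placeholders, not part of the project.

```python
import json
import random
import time

import paho.mqtt.client as mqtt

# Placeholder broker and topic; the project's MQTT source provides its own mock data.
client = mqtt.Client()  # paho-mqtt 1.x style; pass mqtt.CallbackAPIVersion.VERSION2 on 2.x
client.connect("localhost", 1883)
client.loop_start()

while True:
    # Emit one fake sensor reading per second
    payload = {
        "device_id": "sensor-001",
        "temperature": round(random.uniform(18.0, 25.0), 2),
        "timestamp": time.time(),
    }
    client.publish("sensors/telemetry", json.dumps(payload))
    time.sleep(1)
```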
InfluxDB 2.0 Source
Extract incoming time series data from InfluxDB using the Quix InfluxDB connector.
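As a rough sketch of what this source does under the hood, the official influxdb-client package can pull recent points with a Flux query. The URL, token, org, and bucket names below are placeholders; in the project the Quix connector handles this for you.

```python
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()

# Flux query: everything written to the bucket in the last 5 minutes
flux = 'from(bucket: "sensors") |> range(start: -5m)'
for table in query_api.query(flux):
    for record in table.records:
        print(record.get_time(), record.get_field(), record.get_value())
```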
Postgres CDC Source
Capture database changes from PostgreSQL in real time using the Quix CDC connector for PostgreSQL.
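CDC for PostgreSQL generally relies on logical replication, so the database needs `wal_level = logical` before the source can stream changes. A quick sanity check with psycopg2, assuming placeholder connection details:

```python
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW wal_level;")
    (wal_level,) = cur.fetchone()
    print(f"wal_level = {wal_level}")  # should print 'logical' for CDC to work
conn.close()
```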
Telegraf Source
Collect metrics and sensor data using Telegraf agents. This example uses a mock data source, but you can easily connect Quix to a real device running a Telegraf agent.
Normalization services
Four normalization services, one per source (MQTT, InfluxDB, PostgreSQL, and Telegraf), each standardize the incoming data and write it into a single normalized data stream.
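A minimal sketch of one such service using the Quix Streams Python library is shown below. The topic names, field names, and output schema are placeholders; the real services map each source's format onto the project's shared schema.

```python
from quixstreams import Application

app = Application(
    broker_address="localhost:9092",   # placeholder; set by the Quix platform in a deployment
    consumer_group="mqtt-normalization",
)
input_topic = app.topic("mqtt-raw")
output_topic = app.topic("normalized-data")

def normalize(row: dict) -> dict:
    # Map a raw MQTT payload onto the shared, normalized schema.
    return {
        "source": "mqtt",
        "device_id": row.get("device_id"),
        "metric": "temperature",
        "value": row.get("temperature"),
        "timestamp": row.get("timestamp"),
    }

sdf = app.dataframe(input_topic)
sdf = sdf.apply(normalize)
sdf = sdf.to_topic(output_topic)

if __name__ == "__main__":
    app.run()  # older quixstreams versions require app.run(sdf)
```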
AWS S3 Iceberg Sink
Read from the normalized data stream and write the normalized data to Apache Iceberg tables stored in AWS S3.
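For context, appending a batch to an Iceberg table can look like the sketch below, using pyiceberg and pyarrow. The catalog name, table identifier, and columns are placeholders, and the columns must match the table's schema; the project's sink service handles all of this for you.

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Assumes a catalog named "default" (e.g. AWS Glue or a REST catalog) is already
# configured via ~/.pyiceberg.yaml or environment variables.
catalog = load_catalog("default")
table = catalog.load_table("lakehouse.normalized_data")

batch = pa.table({
    "device_id": ["sensor-001"],
    "metric": ["temperature"],
    "value": [21.4],
})
table.append(batch)
```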
Technologies used
- MQTT and Telegraf for IoT data collection
- InfluxDB 2.0 for time series data
- PostgreSQL with change data capture
- Apache Iceberg tables on AWS S3
- Quix for stream processing and containerized deployment
Using this template
To use this template, fork the project repo, sign up for a Quix account, and create a project based on your new fork. For more details, see this guide to creating a project.
To write to S3 you’ll need to provide your AWS credentials as application secrets.
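Quix typically exposes application secrets to deployments as environment variables, so inside the sink service reading them can be as simple as the sketch below (variable names are placeholders):

```python
import os

# Placeholder secret names; use whatever names you gave the secrets in Quix.
aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]
```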
Once you’re set up, you can connect the project to your own data sources using one of our connectors. Then, adapt the normalization logic to fit your data.