Optimizing Data Management with Preprocessing
This project shows how to add preprocessing to your analytics pipeline. It combines your historical static data with real-time data, then applies preprocessing for compliance so that only the data essential for reporting and analysis is stored. This helps you adhere to data processing regulations and reduces compute and storage costs.
Main project components
User DB CDC
Subscribe to changes made to the User Database.
Analytics API
Subscribe to telemetry data emitted by your servers.
Anonymization
Sanitize confidential data before it reaches your analytics pipeline.
Preprocessing Service
Combine, filter, and aggregate your real-time and static data sources before performing further analysis.
Recommendation Engine
Generate customer recommendations in real time from live data.
S3 Apache Iceberg Sink
Sink analytics data to S3 in Apache Iceberg format.
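To make the Anonymization and Preprocessing Service steps more concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the record shapes, the field names (`user_id`, `event`, `ip`, `country`, `plan`), the `SECRET_KEY` value, and the helper functions `pseudonymize`, `preprocess`, and `count_by` are all assumptions made for illustration. It shows one way to replace a raw user ID with a keyed hash, enrich a telemetry event with static user data, keep only the fields needed for reporting, and aggregate the result.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Optional

# Hypothetical secret; in a real deployment this would come from a secret store.
SECRET_KEY = b"replace-me"

# Hypothetical static user records, as they might arrive from the User DB CDC feed.
USERS = {
    "u1": {"country": "DE", "plan": "pro", "email": "a@example.com"},
    "u2": {"country": "US", "plan": "free", "email": "b@example.com"},
}

# Assumed allow-list: only these fields are kept for reporting and analysis.
ESSENTIAL_FIELDS = ("user_key", "country", "plan", "event")


def pseudonymize(user_id: str) -> str:
    """Replace a user ID with a keyed hash so the raw ID never reaches analytics."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def preprocess(event: dict) -> Optional[dict]:
    """Enrich a telemetry event with static user data, keeping essential fields only."""
    user = USERS.get(event.get("user_id"))
    if user is None:
        return None  # filter out events that cannot be matched to a known user
    enriched = {
        "user_key": pseudonymize(event["user_id"]),
        "country": user["country"],
        "plan": user["plan"],
        "event": event["event"],
    }
    # Anything not on the allow-list (email, IP address, ...) is dropped here.
    return {field: enriched[field] for field in ESSENTIAL_FIELDS}


def count_by(records, *keys):
    """Count records per combination of the given keys."""
    counts = defaultdict(int)
    for record in records:
        counts[tuple(record[k] for k in keys)] += 1
    return dict(counts)


# Hypothetical telemetry events, as they might arrive from the Analytics API.
events = [
    {"user_id": "u1", "event": "page_view", "ip": "203.0.113.7"},
    {"user_id": "u1", "event": "page_view", "ip": "203.0.113.7"},
    {"user_id": "u2", "event": "purchase", "ip": "198.51.100.2"},
    {"user_id": "u9", "event": "page_view", "ip": "192.0.2.1"},
]

clean = [r for r in map(preprocess, events) if r is not None]
summary = count_by(clean, "country", "event")
print(summary)
```

The keyed hash keeps events from the same user linkable without exposing the original ID, and the allow-list means a new sensitive field added upstream is dropped by default rather than stored by accident. In a streaming deployment the same per-event logic would run inside the service that consumes the telemetry topic, with the aggregated output passed on to the sink.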