Python-native processing and connectors
Pure Python: no wrappers around Java libraries and no cross-language debugging.
A Sources & Sinks API for building custom connectors that move data in and out of Kafka whilst handling retries and backpressure for you. JSON, Avro, Protobuf and Schema Registry support keep your ever-changing data valid and clean.
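As a minimal sketch of the pattern, a custom sink can be a small subclass. This example assumes the `BatchingSink`/`SinkBatch` base classes from `quixstreams.sinks` and the `StreamingDataFrame.sink()` call; the `StdoutSink` name, broker address, and topic name are illustrative, so check the Sources & Sinks docs for your version.

```python
from quixstreams import Application
from quixstreams.sinks import BatchingSink, SinkBatch


class StdoutSink(BatchingSink):
    """Illustrative sink: prints each batch of records to stdout.
    A real sink would write to an external system and raise on failure
    so the framework can retry the batch and apply backpressure."""

    def write(self, batch: SinkBatch):
        for item in batch:
            print(item.value)


app = Application(broker_address="localhost:9092", consumer_group="sink-demo")
events = app.topic("events", value_deserializer="json")

sdf = app.dataframe(events)
sdf.sink(StdoutSink())

if __name__ == "__main__":
    app.run()
```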
Process streaming data with a DataFrame API
Treat real-time data streams as continuously updating tables via a Streaming DataFrame API. Ideal for transitioning projects from Pandas or PySpark.
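A minimal sketch of that pattern, assuming a local Kafka broker at localhost:9092; the topic and field names are illustrative:

```python
from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="sdf-demo")
readings = app.topic("sensor-readings", value_deserializer="json")
output = app.topic("readings-fahrenheit", value_serializer="json")

sdf = app.dataframe(readings)
# Each message is treated like a row in a continuously updating table.
sdf["temperature_f"] = sdf["temperature_c"] * 9 / 5 + 32
sdf = sdf.to_topic(output)

if __name__ == "__main__":
    app.run()
```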
Use built-in operators for aggregation, grouping, windowing, filtering, branching, merging, and more to build stateful applications in fewer lines of Python code.
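For instance, filtering plus a one-minute tumbling-window average takes only a few operator calls. This sketch assumes the `tumbling_window`/`mean`/`final` windowing API; the topic and field names are illustrative.

```python
from datetime import timedelta

from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="window-demo")
readings = app.topic("sensor-readings", value_deserializer="json")

sdf = app.dataframe(readings)
# Drop malformed rows, then reduce each message to the numeric field
# so the window has a plain number to aggregate.
sdf = sdf.filter(lambda row: row.get("temperature_c") is not None)
sdf = sdf.apply(lambda row: row["temperature_c"])
# Stateful one-minute tumbling window; .final() emits one result
# per key each time a window closes.
sdf = sdf.tumbling_window(duration_ms=timedelta(minutes=1)).mean().final()

if __name__ == "__main__":
    app.run()
```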
Flexible, scalable and fault-tolerant
Checkpointing and exactly-once processing guarantees keep your data pipelines durable and fault-tolerant through unpredictable infrastructure failures.
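Enabling this is a one-line setting. The sketch below assumes the `processing_guarantee` option on `Application` (check the docs for your version); the broker address and group name are illustrative.

```python
from quixstreams import Application

app = Application(
    broker_address="localhost:9092",
    consumer_group="orders-processor",
    # Kafka transactions commit each record's output and consumer offsets
    # atomically, so a crash mid-batch never double-counts or drops data.
    processing_guarantee="exactly-once",
)
```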
Because it builds on Kafka, your service scales horizontally: add instances to the consumer group and Kafka rebalances partitions across them. Kafka's data replication provides redundancy and high availability for your data consumers.
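In practice, scaling means partitioning the topic and running more copies of the same app. The sketch below assumes `TopicConfig` from `quixstreams.models`; the partition and replication counts are illustrative.

```python
from quixstreams import Application
from quixstreams.models import TopicConfig

app = Application(broker_address="localhost:9092", consumer_group="scaled-app")
# Six partitions let up to six app instances consume in parallel;
# replication_factor=3 keeps copies on three brokers for availability.
orders = app.topic(
    "orders",
    value_deserializer="json",
    config=TopicConfig(num_partitions=6, replication_factor=3),
)
```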