June 6, 2024

Quix Streams—a reliable Faust alternative for Python stream processing

A detailed comparison between Faust and Quix Streams covering criteria like performance, coding experience, features, integrations, and product maturity.

Quix offers a pure Python framework for building real-time data pipelines. It's a Kafka client with a stream processing library rolled into one. No JVM, no cross-language debugging: just a simple Pandas-like API for handling streaming data. Deploy in your stack or on Quix Cloud for scalable, stateful, and fault-tolerant stream processing.

Nowadays, Python is the most popular programming language in the world for data science, analytics, data engineering, and machine learning. These use cases often require collecting and processing streaming data in real time. 

Stream processing technologies have traditionally been Java-based. However, since many of Python’s real-world applications are underpinned by streaming data, it’s no surprise that an ecosystem of Python stream processing tools is emerging. 

Faust, PySpark, and PyFlink are arguably the most well-known Python stream processors available today, as they have been around the longest. To be fair, though, of these three, only Faust is a pure Python stream processing engine with a client-side framework. PySpark and PyFlink, meanwhile, are wrappers around server-side JVM engines (Apache Spark and Apache Flink), which can lead to additional complexity. Check out our PyFlink deep dive to learn more about the challenges of building and deploying PyFlink pipelines. 

In any case, Faust, PySpark, and PyFlink are not the only options available for Python stream processing. Newer, next-generation alternatives have started to appear over the last couple of years, with Quix Streams being a prime example.

This article explores why Quix Streams is a viable alternative to Faust for Python stream processing. We’ll compare them based on the following criteria:

  • Coding experience
  • Performance
  • Ease of use
  • Stream processing features
  • Integrations and compatibility
  • Maturity and support

Faust overview

Originally developed at Robinhood, Faust is an open-source Python stream processing library that works in conjunction with Apache Kafka. 

Faust is used to deliver distributed systems, real-time analytics, real-time data pipelines, real-time ML pipelines, event-driven architectures, and applications requiring high-throughput, low-latency message processing (e.g., fraud detection apps and user activity tracking systems).

Key Faust features and capabilities

  • Supports both stream processing and event processing
  • No DSL
  • Works with other Python libraries, such as NumPy, PyTorch, Pandas, Django, and Flask 
  • Uses Python async/await syntax
  • High performance (a single worker instance is supposedly capable of processing tens of thousands of events every second)
  • Supports stateful operations (such as windowed aggregations) via tables
  • Highly available and fault tolerant by design, with support for exactly-once semantics (at least in theory) 

Faust certainly offers an appealing set of features and capabilities, but some users have also reported drawbacks. For example, Kapernikov (a software development agency) assessed several stream processing solutions for one of their projects, including Faust, and ultimately disqualified it. The reasoning is laid out in an article by Frank Dekervel published on the Kapernikov blog. In a nutshell, the Kapernikov team concluded that Faust can be unreliable, that exactly-once processing doesn’t always work as expected, that the code can be verbose, and that testing can be problematic. 

Another example: in a tutorial published on Towards Data Science, the author, Ali Osia (CTO and Co-Founder at InnoBrain), mentions that “Faust’s documentation can be confusing, and the source code can be difficult to comprehend”.

It’s also concerning that Robinhood has abandoned Faust (it’s unclear why exactly). There is now a fork of Robinhood’s original Faust repo which is maintained by the open source community. However, it has many open issues, and without any commercial backing or significant ongoing community involvement, it’s uncertain whether Faust will continue to mature (and to what extent).

Quix Streams overview

Quix Streams is an open-source, cloud-native library for data streaming and stream processing using Kafka and pure Python. It’s designed to give you the power of a distributed system in a lightweight library by combining Kafka's low-level scalability and resiliency features with an easy-to-use Python interface.

Quix Streams enables you to implement high-performance stream processing pipelines with numerous real-world applications, including real-time machine learning, predictive maintenance, clickstream analytics, fraud detection, and sentiment analysis. Check out the gallery of Quix templates and these tutorials to see more concrete examples of what you can build with Quix Streams. 

Key Quix Streams features and capabilities

  • Pure Python (no JVM, no wrappers, no DSL, no cross-language debugging)
  • No orchestrator, no server-side engine
  • Compatible with various Kafka brokers: Apache Kafka brokers version 0.10 or later, Quix Cloud-managed brokers, and Confluent Cloud, Redpanda, Aiven, and Upstash brokers
  • Streaming DataFrame API (similar to pandas DataFrame) for tabular data transformations
  • Easily integrates with the entire Python ecosystem (pandas, scikit-learn, TensorFlow, PyTorch, etc.)
  • Support for many serialization formats, including JSON and Quix-specific formats
  • Support for stateful operations (using RocksDB)
  • Support for aggregations over tumbling and hopping time windows
  • At-least-once Kafka processing guarantees
  • Designed to run and scale resiliently via container orchestration (Kubernetes)
  • Easily runs locally and in Jupyter Notebook for convenient development and debugging
  • Seamless integration with the fully managed Quix Cloud platform

Faust vs Quix Streams: A head-to-head comparison

Now that we’ve introduced Faust and Quix Streams, it’s time to see how these two Python stream processors compare based on criteria like performance, ease of use, coding experience, feature set, and integrations. These are all essential factors to consider before choosing the stream processing technology that’s best suited for your needs. 

Coding experience

I’ll start by comparing the coding experience when using Quix Streams and Faust. Specifically, I’ll show you how to use these two solutions to calculate the rolling aggregations of temperature readings coming from various sensors. While Quix/Faust acts as the stream processor in our example, Redpanda is the (Kafka-compatible) streaming data platform used to: 

  • Collect raw sensor readings and serve them to Quix Streams/Faust for processing.
  • Store the results of temperature aggregations once they have been calculated by the stream processing component. 

Quix Streams code example

Here’s how to install Quix Streams:

pip install quixstreams

And here’s the stream processing logic:

import os
import logging
from datetime import timedelta

from quixstreams import Application

logger = logging.getLogger(__name__)

TOPIC = "raw-temperature"  # Redpanda input topic
SINK = "agg-temperature"  # Redpanda output topic
WINDOW = 10  # defines the length of the time window in seconds
WINDOW_EXPIRES = 1  # defines, in seconds, how late data can arrive before it is excluded from the window

# The broker address and consumer group below are placeholders (assumed values);
# adjust them to match your own Redpanda setup
app = Application(
    broker_address=os.environ.get("BROKER_ADDRESS", "localhost:19092"),
    consumer_group="temperature-aggregator",
    auto_offset_reset="earliest",
)

input_topic = app.topic(TOPIC, value_deserializer="json")
output_topic = app.topic(SINK, value_serializer="json")

sdf = app.dataframe(input_topic)
sdf = sdf.update(lambda value: logger.info(f"Input value received: {value}"))


def initializer(value: dict) -> dict:
    # Called for the first message in each window.
    # The "json" deserializer has already turned each message into a dict,
    # so the temperature reading is available as value["value"]
    return {
        "count": 1,
        "min": value["value"],
        "max": value["value"],
        "mean": value["value"],
    }


def reducer(aggregated: dict, value: dict) -> dict:
    # Called for every subsequent message that falls into the same window
    aggcount = aggregated["count"] + 1
    return {
        "count": aggcount,
        "min": min(aggregated["min"], value["value"]),
        "max": max(aggregated["max"], value["value"]),
        # Incremental mean: (old mean * old count + new value) / new count
        "mean": (aggregated["mean"] * aggregated["count"] + value["value"]) / aggcount,
    }


### Define the window parameters such as type and length
sdf = (
    # Define a tumbling window of 10 seconds
    sdf.tumbling_window(timedelta(seconds=WINDOW), grace_ms=timedelta(seconds=WINDOW_EXPIRES))

    # Create a "reduce" aggregation with "reducer" and "initializer" functions
    .reduce(reducer=reducer, initializer=initializer)

    # Emit results only for closed 10 second windows
    .final()
)

### Apply the window to the Streaming DataFrame and define the data points to include in the output
sdf = sdf.apply(
    lambda value: {
        "time": value["end"],  # Use the window end time as the timestamp for messages sent to the 'agg-temperature' topic
        "temperature": value["value"],  # Send a dictionary of {count, min, max, mean} values for the temperature parameter
    }
)

sdf = sdf.to_topic(output_topic)
sdf = sdf.update(lambda value: logger.info(f"Produced value: {value}"))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger.info("Starting application")
    app.run(sdf)

Quix Streams offers a high-level abstraction over Kafka with pandas DataFrame-like operations (read this article to learn more about the Quix Streaming DataFrame API). This declarative syntax enhances readability and reduces code length, providing a streamlined experience to Python developers.  

Note that this was a short, simplified demonstration focusing strictly on Quix Streams as a stream processor. For the complete, extended version of this use case, please refer to the “Aggregating Real-time Sensor Data with Python and Redpanda” blog post published in Towards Data Science by my colleague Tomáš Neubauer, Quix CTO. That blog post contains additional details and helpful context, including how to set up Redpanda and run the stream processing pipeline.
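To see what the initializer and reducer compute, here is a minimal, dependency-free sketch that replays the same logic over a few readings belonging to a single window. It does not call Quix Streams at all: `replay()` is a hypothetical helper that mimics how a reduce() aggregation passes the first message of a window to the initializer and every later message to the reducer, and the readings are made-up values.

```python
from typing import Dict, List


def initializer(value: dict) -> dict:
    # Same logic as the Quix Streams example: start a window from its first reading
    return {"count": 1, "min": value["value"], "max": value["value"], "mean": value["value"]}


def reducer(aggregated: dict, value: dict) -> dict:
    # Fold one more reading into the running aggregate
    count = aggregated["count"] + 1
    return {
        "count": count,
        "min": min(aggregated["min"], value["value"]),
        "max": max(aggregated["max"], value["value"]),
        "mean": (aggregated["mean"] * aggregated["count"] + value["value"]) / count,
    }


def replay(messages: List[Dict]) -> dict:
    # Hypothetical helper: the first message goes through the initializer,
    # each later message goes through the reducer
    state = initializer(messages[0])
    for message in messages[1:]:
        state = reducer(state, message)
    return state


readings = [{"value": v} for v in (21, 25, 19, 23)]
result = replay(readings)
print(result)
```

Because the reducer only keeps a count and a running mean, it never needs to revisit earlier readings: each new mean is derived from the previous mean, the previous count, and the new value.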

Faust code example

Once you’ve installed Faust, here’s how to define the stream processing logic:

import os
from datetime import datetime, timedelta

import faust

TOPIC = "raw-temperature"
SINK = "agg-temperature"
TABLE = "tumbling-temperature"
WINDOW = 10  # 10 seconds window
WINDOW_EXPIRES = 1  # seconds a closed window is kept before it expires
PARTITIONS = 1

# The app ID and broker URL below are placeholders (assumed values);
# adjust them to match your own Redpanda setup
app = faust.App(
    "temperature-aggregator",
    broker=os.environ.get("KAFKA_BROKER", "kafka://localhost:19092"),
    topic_partitions=PARTITIONS,
)
app.conf.table_cleanup_interval = 1.0  # how often (in seconds) expired windows are cleaned up


### Input records
class Temperature(faust.Record, isodates=True, serializer="json"):
    ts: datetime = None
    value: int = None


### Output records
class AggTemperature(faust.Record, isodates=True, serializer="json"):
    ts: datetime = None
    count: int = None
    mean: float = None
    min: int = None
    max: int = None


source = app.topic(TOPIC, key_type=str, value_type=Temperature)
sink = app.topic(SINK, value_type=AggTemperature)


def window_processor(key, events):
    # Called by the table each time a window closes
    timestamp = datetime.fromtimestamp(key[1][0])  # key[1] is the window range (start, end) in epoch seconds
    values = [event.value for event in events]
    count = len(values)
    mean = sum(values) / count
    min_value = min(values)
    max_value = max(values)

    aggregated_event = AggTemperature(
        ts=timestamp, count=count, mean=mean, min=min_value, max=max_value
    )

    print(f"Processing window: {len(values)} events, Aggregated results: {aggregated_event}")

    sink.send_soon(value=aggregated_event)  # sends the aggregation result to the destination topic


tumbling_table = (
    app.Table(
        TABLE,
        default=list,
        partitions=PARTITIONS,
        on_window_close=window_processor,
    )
    .tumbling(WINDOW, expires=timedelta(seconds=WINDOW_EXPIRES))
    .relative_to_field(Temperature.ts)
)


@app.agent(source)
async def calculate_tumbling_temperatures(temperatures):
    async for temperature in temperatures:
        # Append each reading to the list stored for the current window
        value_list = tumbling_table["events"].value()
        value_list.append(temperature)
        tumbling_table["events"] = value_list


if __name__ == "__main__":
    app.main()  # start a worker with, e.g.: python this_file.py worker -l info

Here are a few key takeaways:

  • Faust provides high-level abstractions like tables and agents that simplify the development of distributed stream processing applications using Kafka and Python.
  • Faust supports asynchronous operations (it uses Python’s native async/await syntax).
  • Faust aims to port many of the concepts and capabilities offered by Kafka Streams (a Java-based technology) to Python. 

Note that the snippet above is a simplified demonstration, focusing exclusively on Faust. For a complete, end-to-end example that discusses the Faust code in detail and offers instructions on how to set up Redpanda, see the “Stream processing with Redpanda and Faust” blog post. 
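One design difference between the two examples is worth spelling out. The Faust version appends every event to a list stored in the windowed table and computes the statistics in window_processor when the window closes, whereas the Quix Streams version keeps a running aggregate of just four numbers. The dependency-free sketch below, in which `Event` is a hypothetical stand-in for the Temperature record and the readings are made up, shows that both strategies produce the same statistics for one window.

```python
from collections import namedtuple

# Hypothetical stand-in for the Faust Temperature record
Event = namedtuple("Event", ["ts", "value"])


def buffered_aggregate(events: list) -> dict:
    # Faust-example style: every event is kept until the window closes,
    # then the statistics are computed in one pass
    values = [event.value for event in events]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def incremental_aggregate(events: list) -> dict:
    # Quix-example style: only four numbers are kept, updated as each event arrives
    state = None
    for event in events:
        if state is None:
            state = {"count": 1, "min": event.value, "max": event.value, "mean": event.value}
        else:
            count = state["count"] + 1
            state = {
                "count": count,
                "min": min(state["min"], event.value),
                "max": max(state["max"], event.value),
                "mean": (state["mean"] * state["count"] + event.value) / count,
            }
    return state


window_events = [Event(ts=i, value=v) for i, v in enumerate([20, 22, 24, 18])]
print(buffered_aggregate(window_events))
print(incremental_aggregate(window_events))
```

The trade-off, at least in these two examples, is state size: the buffered approach stores every event until its window closes, while the incremental approach stores a constant amount of state per window, at the cost of only supporting aggregations that can be updated one value at a time.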

Stream processing features

Here’s how Quix Streams and Faust compare when it comes to core stream processing functionality:

Supported operations and transformations

With Quix Streams, you can perform the following operations:

  • apply() – apply a function to transform the value and return a new value
  • group_by() – regroup messages during processing
  • contains() – check whether a key is present in the message value
  • update() – apply a function to mutate the value in place, or to perform a side effect that doesn't change the value (e.g., printing it to the console)
  • filter() – filter values
  • to_topic() – produce the current value to a Kafka topic
  • compose() – compose all functions into a single closure

Note that you can implement custom processing functions using the apply(), update(), and filter() methods. For example, you can use apply() to select and extract columns or transform data from one format to another.   

Stream joins are on the roadmap.

Learn more about processing and transforming data with Quix Streams
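The difference between apply(), update(), and filter() is easiest to see side by side. The class below is a toy, in-memory model of those three behaviors as the list above describes them; `ToyPipeline` is hypothetical, is not part of Quix Streams, and says nothing about how the library is implemented internally.

```python
from typing import Any, Callable, List, Tuple


class ToyPipeline:
    """Toy, in-memory model of apply/update/filter semantics (not the Quix Streams API)."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def apply(self, func: Callable[[Any], Any]) -> "ToyPipeline":
        # The function's return value replaces the message value
        self.steps.append(("apply", func))
        return self

    def update(self, func: Callable[[Any], Any]) -> "ToyPipeline":
        # The return value is ignored; in-place mutations and side effects remain
        self.steps.append(("update", func))
        return self

    def filter(self, func: Callable[[Any], bool]) -> "ToyPipeline":
        # Messages for which the function returns False are dropped
        self.steps.append(("filter", func))
        return self

    def process(self, values: list) -> list:
        results = []
        for value in values:
            keep = True
            for kind, func in self.steps:
                if kind == "apply":
                    value = func(value)
                elif kind == "update":
                    func(value)
                elif kind == "filter" and not func(value):
                    keep = False
                    break
            if keep:
                results.append(value)
        return results


seen = []
pipeline = (
    ToyPipeline()
    .filter(lambda v: v["temperature"] is not None)  # drop incomplete readings
    .update(lambda v: seen.append(v["sensor"]))  # side effect only
    .apply(lambda v: {"sensor": v["sensor"], "fahrenheit": v["temperature"] * 9 / 5 + 32})
)
result = pipeline.process([
    {"sensor": "a", "temperature": 20},
    {"sensor": "b", "temperature": None},
    {"sensor": "c", "temperature": 30},
])
print(result)
```

Note how the filtered-out reading never reaches the update() side effect, and how only apply() changes the shape of the value passed downstream.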

With Faust, you can combine (join) streams and use the following operations:

  • group_by() – repartition the stream
  • items() – iterate over keys and values
  • events() – access raw messages
  • take() – buffer up values in the stream
  • enumerate() – count values
  • through() – forward through another topic
  • filter() – filter values to omit from the stream
  • echo() – repeat to one or more topics
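Several of these operations build on Faust’s async/await style. As an illustration of what take() does, here is a plain asyncio sketch of batching: values from an async stream are buffered and yielded in lists of up to `max_` items. The `take()` function and the `readings()` source are hypothetical re-creations, not Faust code; Faust’s own stream.take() also accepts a `within` timeout that flushes a partial batch after a given number of seconds, which this sketch leaves out for brevity.

```python
import asyncio
from typing import AsyncIterator, List


async def take(stream: AsyncIterator[int], max_: int) -> AsyncIterator[List[int]]:
    # Buffer up to max_ values, then yield them as one batch.
    # Simplification: there is no timeout, so a partial batch
    # is only flushed when the stream ends.
    buffer: List[int] = []
    async for value in stream:
        buffer.append(value)
        if len(buffer) >= max_:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


async def readings() -> AsyncIterator[int]:
    # Hypothetical async source standing in for a Faust stream
    for value in range(7):
        await asyncio.sleep(0)
        yield value


async def main() -> List[List[int]]:
    batches = []
    async for batch in take(readings(), max_=3):
        batches.append(batch)
    return batches


batches = asyncio.run(main())
print(batches)
```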

Stateful processing

Quix Streams supports stateful operations, with RocksDB as the state store. Faust also supports stateful operations, with RocksDB and Aerospike as state stores.

Windowing and aggregations

Quix Streams offers aggregations over tumbling and hopping windows. The following aggregation functions are supported:

  • reduce() – perform custom aggregations using "reducer" and "initializer" functions
  • min() – get the minimum value within a window
  • max() – get the maximum value within a window
  • mean() – get the mean value within a window
  • sum() – sum values within a window
  • count() – count the number of values within a window

Aggregations over sliding windows are on the roadmap.
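The difference between tumbling and hopping windows comes down to how many windows a message lands in. The sketch below assigns a millisecond timestamp to windows under two assumptions: windows are aligned to the Unix epoch, and they are half-open intervals [start, end). Both are common conventions, but check the Quix Streams documentation for its exact boundary rules; the function names are hypothetical.

```python
from typing import List, Tuple


def tumbling_window(ts_ms: int, size_ms: int) -> Tuple[int, int]:
    # Every timestamp falls into exactly one fixed-size, non-overlapping window
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)


def hopping_windows(ts_ms: int, size_ms: int, step_ms: int) -> List[Tuple[int, int]]:
    # A new window of length size_ms starts every step_ms, so windows overlap
    # and a single timestamp can belong to several of them
    windows = []
    start = ts_ms - (ts_ms % step_ms)  # latest window start at or before ts_ms
    # Walk back through earlier starts while the window still covers ts_ms
    while start >= 0 and start > ts_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= step_ms
    return sorted(windows)


print(tumbling_window(12_500, 10_000))  # 10-second tumbling window
print(hopping_windows(12_500, 10_000, 5_000))  # 10-second windows, 5-second step
```

With a 10-second window and a 5-second step, a message at 12.5 seconds belongs to two overlapping hopping windows but to exactly one tumbling window.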

Per its documentation, Faust offers aggregations over tumbling, hopping, and sliding windows. You can use the following types of aggregation functions:

  • min() 
  • max() 
  • mean()
  • sum() 
  • count()
Processing guarantees

Quix Streams offers at-least-once processing, with exactly-once processing on the roadmap. In theory, Faust offers exactly-once processing; in practice, existing bugs can prevent you from achieving exactly-once semantics.
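To see why the processing guarantee matters, consider what at-least-once delivery means in practice: if a consumer crashes after processing some messages but before committing their offsets, those messages are delivered again after a restart. The sketch below simulates such a redelivery and shows one generic mitigation, making processing idempotent by skipping (partition, offset) pairs that were already handled. This technique is not specific to either library, the function names are hypothetical, and the in-memory set is a simplification; a real application would need to persist it durably.

```python
from typing import List, Set, Tuple

Delivery = Tuple[int, int, str]  # (partition, offset, value)


def consume_at_least_once(deliveries: List[Delivery]) -> List[str]:
    # Naive consumer: processes every delivery, including redeliveries
    return [value for _, _, value in deliveries]


def consume_idempotently(deliveries: List[Delivery]) -> List[str]:
    # Skip any (partition, offset) pair that has already been processed
    processed: Set[Tuple[int, int]] = set()
    results = []
    for partition, offset, value in deliveries:
        if (partition, offset) in processed:
            continue
        processed.add((partition, offset))
        results.append(value)
    return results


# Offsets 1 and 2 are delivered twice, e.g. because the consumer crashed
# after processing them but before committing its offsets
deliveries = [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 1, "b"), (0, 2, "c"), (0, 3, "d")]
print(consume_at_least_once(deliveries))
print(consume_idempotently(deliveries))
```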

Both Quix Streams and Faust offer diverse capabilities for processing and transforming streaming data. Quix Streams is still expanding its capabilities: stream joins, aggregations over sliding windows, and exactly-once processing guarantees are among the planned improvements. Meanwhile, it’s unclear if and when new stream processing capabilities will be added to Faust; I couldn't find any public roadmap to refer to.

In short, Quix Streams offers rich capabilities and is evolving, with promising enhancements on the horizon, while Faust offers an established toolkit with some practical limitations (around exactly-once processing, for example).

Performance and reliability

Regarding Faust: unfortunately, there’s very little information available about what kind of latency, throughput, and scale you can expect from it. I couldn’t find any blog posts or benchmarks covering this topic.

The Faust documentation states that a single-core Faust worker instance can already process tens of thousands of events every second. Faust is designed to be highly available and to survive network problems and server crashes; in the case of node failure, it can recover automatically.

However, there are a few unresolved performance issues that might put some people off. For example:

  • Issue #175 – a Faust application should be able to run indefinitely, but consumers stop running one by one until they've all stopped responding and the entire application hangs.
  • Issue #214 – Faust sometimes crashes during rebalancing.
  • Issue #306 – after being idle, producers are slow to send new messages, and consumers are slow to read them.
  • Issue #247 – memory keeps increasing (and is never freed) even when Faust agents (functions that process data) aren’t doing any processing.

Quix Streams, on the other hand, is a much newer library, so naturally it has fewer open issues: the more users a library has, the greater the probability that someone will find a bug. Nevertheless, it's worth noting that Quix Streams is built and maintained by Formula 1 engineers with extensive knowledge of data streaming and stream processing at scale. Under the hood, the library leverages Kafka and Kubernetes to provide data partitioning, consumer groups, state management, replication, and scalability. This enables Quix Streams deployments to reliably handle up to millions of messages or multiple GBs of data per second, with consistently low latencies (in the millisecond range). So far, we haven't seen any performance issues.

Quix Streams performs best when it runs in containers on Quix Cloud, our fully managed platform for running serverless stream processing applications.

CK Delta uses Quix to process 40GB of Wi-Fi data per day from 180 underground train stations. Learn more.

In any case, I encourage you to test both Quix Streams and Faust yourself to see how well they perform at scale under the workload of your specific use case.
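As a first step before a full end-to-end benchmark, it can help to measure how fast your own per-message logic runs in isolation. The sketch below, in which `measure_throughput()` and `to_fahrenheit()` are hypothetical helpers, times a handler over synthetic messages. It excludes network, broker, and serialization costs, so treat the result as a rough upper bound and follow up with a test against your actual brokers and workload.

```python
import time
from typing import Callable, Iterable


def measure_throughput(handler: Callable[[dict], object], messages: Iterable[dict]) -> float:
    # Run the handler over every message and return messages per second.
    # Only in-process work is measured: no network, broker, or serialization.
    count = 0
    start = time.perf_counter()
    for message in messages:
        handler(message)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def to_fahrenheit(message: dict) -> dict:
    # Hypothetical handler standing in for your own processing step
    return {"sensor": message["sensor"], "f": message["c"] * 9 / 5 + 32}


sample = [{"sensor": f"s{i % 10}", "c": i % 40} for i in range(100_000)]
rate = measure_throughput(to_fahrenheit, sample)
print(f"{rate:,.0f} messages/second")
```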

Ease of use

Choosing an easy-to-use stream processing solution reduces the complexity of dealing with streaming data and streamlines the development and maintenance of stream processing pipelines. Here’s how Faust and Quix Streams compare in terms of ease of use:

Official docs and learning resources
  • Quix Streams: Detailed, maintained documentation, supplemented by many tutorials, blog posts, and a gallery of templates (pre-built projects) to quickly get started with app development.
  • Faust: There are no official learning resources other than the docs. However, in this Reddit post, users have labeled the Faust documentation as “bad”. Additionally, it doesn’t look like the docs have been maintained since Robinhood abandoned the project.

API design
  • Quix Streams: No DSL. Offers a pure Python Streaming DataFrame API, which is essentially a high-level abstraction over Kafka with pandas DataFrame-like operations.
  • Faust: No DSL. Faust ports Kafka Streams concepts to Python, using async/await syntax to define asynchronous stream processing tasks.

Learning curve
  • Quix Streams: Gentle learning curve (a few days).
  • Faust: It’s safe to expect a slightly steeper learning curve (up to a few weeks), because the documentation could benefit from further improvements and there’s a lack of additional learning resources, making it harder to learn the basics of Faust.

Fully managed deployments
  • Quix Streams: You can deploy Quix Streams apps to Quix Cloud, a fully managed platform.
  • Faust: There are no vendors offering fully managed Faust deployments. You need to deploy and manage Faust applications yourself, either on your own infrastructure or on a cloud provider's compute resources.

Time to production
  • Quix Streams: As long as you’re using Quix Cloud, you can configure and deploy Quix Streams applications in minutes, with very little infrastructure and platform management required.
  • Faust: While Faust is easy to run, provisioning and tuning the underlying infrastructure can take from days to weeks, depending on the complexity and specifics of your use case.

Faust and Quix Streams both offer a purely Pythonic experience, with no DSL involved. This is great for any Python developer or data professional looking to use either tool. Beyond this similarity, Quix Streams is designed to match Faust in terms of user-friendliness and ease of adoption, and it comes with extensive, regularly updated documentation, a rich selection of learning materials (including tutorials and templates), and fully managed deployment options. It also enables you to develop and deploy stream processing applications faster, while keeping infrastructure management complexity to a bare minimum.

Integration and compatibility

Performance, ease of use, coding experience, and feature set are critical aspects to consider before choosing a stream processing solution. But so are built-in integrations and compatibility. Here’s how Quix Streams and Faust fare in this regard:

Kafka broker compatibility
  • Quix Streams: Apache Kafka brokers (version 0.10 or later). Guaranteed to work well with various managed Kafka solutions: Quix Cloud, Redpanda Cloud, Confluent Cloud, Aiven for Apache Kafka, and Upstash Kafka.
  • Faust: Apache Kafka brokers (version 0.10 or later). Should work with managed Kafka solutions (at least in theory).

Python ecosystem and connectors
  • Quix Streams: Easily integrates with the wider Python ecosystem of libraries. Connectors for various source and destination systems.
  • Faust: Easily integrates with the wider Python ecosystem of libraries. No sink or source connectors.

And here are some comments with additional details and context:

  • Faust theoretically works with any Kafka broker (version 0.10 or later), be it managed by a vendor or self-hosted. In practice, though, you’d have to test how well Faust works with specific managed Kafka solutions.
  • Quix Streams works with self-hosted Kafka brokers and has been extensively tested to ensure seamless compatibility with various managed Kafka solutions.
  • Both Quix Streams and Faust are pure Python technologies without a DSL, giving you access to the wider Python ecosystem and making it easy to integrate libraries like Pandas, PyTorch, and scikit-learn into your workflows.
  • Quix provides sink and source connectors for systems like InfluxDB, Redis, Postgres, Segment, and MQTT platforms, simplifying your integration process. This is not the case with Faust, which doesn’t offer ready-made connectors or a formal connector library for integrating with external systems or databases.

Product maturity and support

How mature and well-maintained is the product? Is there a community of contributors and users who can help me if needed? These are questions you need to ponder when selecting any new software solution.

Maturity (accurate as of June 4th, 2024)

Quix Streams:
  • First released in March 2023
  • Regularly maintained, with new releases roughly every month or two
  • Latest version: 2.5.1 (released in May 2024)
  • Only a handful of GitHub issues (all enhancements)

Faust:
  • First released in 2018
  • Frequent releases until February 2020 (while under Robinhood tutelage)
  • No new releases from February until the end of November 2020, due to Robinhood deprecating Faust and pulling support for it. In November 2020, a GitHub fork of the project was created, which is maintained by an open-source community.
  • Frequent, regular releases since November 2020 (although most of them are small)
  • Latest version: 0.11.0 (released in March 2024)
  • More than 120 open GitHub issues, most of them bugs

Support community
  • Quix Streams: A growing, helpful, and responsive community, actively maintaining and enhancing the library and supporting its users.
  • Faust: There currently seems to be only one person maintaining Faust (judging by GitHub contributors), so it can be harder to get help with issues you may encounter.

Faust faces a few challenges. Both of its GitHub repositories (the initial, now retired Robinhood project, and the newer, community-maintained fork) come with a number of open bugs and issues. As discussed, this is often indicative of a project’s popularity and wide adoption: the more a technology is used, the likelier it is to be exposed to a wide range of operating conditions, user interactions, and integration scenarios, each of which can reveal new problems. Unearthing bugs is not a bad thing (quite the contrary). But as with any open-source library, it's unclear how long these issues will take to fix. As many of you know, maintaining an open-source project in your spare time is hard.

Like many technologies in the stream processing space, Quix Streams has the luxury of being backed by a commercial entity with a vested interest in maintaining it (much like Confluent with Apache Kafka, or Databricks with Spark). A team of developers is dedicated to fixing bugs, releasing new features, and enhancing the library on a regular cadence. There’s also a roadmap of planned improvements, the Quix Community is growing, and users get very fast responses to their technical questions (there’s even an AI assistant to help with questions about using Quix Streams).

A brief conclusion

Faust has the merit of being one of the earliest solutions that made it possible for developers and data professionals to implement stream processing logic in pure Python. However, it has gotten a bit rough around the edges. Its repo has more than 100 open issues, affecting things like performance, scalability, and processing semantics, and the documentation is only updated sporadically. It’s unclear what the future holds for Faust, especially since the community maintaining and improving the project isn't as active as it used to be.

If you’re looking for a reliable, future-proof Faust alternative for Python stream processing, I encourage you to check out Quix Streams. The team behind Quix Streams has dedicated resources to its maintenance, its documentation, and the expansion of its feature set. You also have the option of deploying your Python applications to a tailor-made runtime environment that integrates tightly with the Quix Streams Python library.

To learn more, check out the following resources:


Words by Steve Rosam