Consuming Multiple Topics with Applications
Applications
now support consuming multiple topics by initializing multiple
StreamingDataFrames
(SDF
). This may also be referred to as a multi-SDF
Application
.
Multi-Topic Use Cases
The benefits of consuming from multiple topics in one Application are a little more nuanced, but the main benefits are:
Consolidating Applications
It may help to consolidate two or more Applications
that share similar
operational contexts.
Code Sharing
It's now much easier to share/use code that applies to multiple topics by having it
all in one Application
.
Joining Topics (coming soon)
Joins will vastly simplify many problems that require handling data from multiple topics at once.
Using Multiple Topics
Initialize an Application
and all topics as normal with Application.topic()
.
Then, for each consumer topic name T
, initialize a SDF
as normal with
sdf_T-name = Application.dataframe(T)
, stored as a unique variable (ex: sdf_T-name
here).
The Application
will track all SDF
s generated this way and will execute all of
them when Application.run()
is called.
Note that you cannot use the same topic across multiple SDF
s.
Simple Example
from quixstreams import Application
app = Application("localhost:9092")
input_topic_a = app.topic("input_a")
input_topic_b = app.topic("input_b")
output_topic = app.topic("output")
sdf_a = app.dataframe(input_topic_a)
sdf_a = sdf_a.apply(func_x).to_topic(output_topic)
sdf_b = app.dataframe(input_topic_b)
sdf_b.update(func_y).to_topic(output_topic)
app.run()
Branching vs Multi-SDFs
Branching is independent of multi-SDF
;
branches can be used in each of the SDF
s from multiple topics, but they cannot
interact with one another in any way.
StreamingDataFrame
Usage
For each SDF
, add operations as normal. Each topic's messages will be processed by
its respective SDF
.
Limitations
There are no additional restrictions for SDF
's when used with multiple topics.
However, each SDF
should be treated like the others do not exist: they cannot
interact or share any operations with one another in any way.
State
Each SDF
's state used in a multi-SDF
implementation is entirely independent
(including all stateful=True
operations), meaning SDF
s cannot access or manipulate
the state of another SDF
.
As a reminder, state is ultimately tied to a given topic (and thus its SDF
).
See here to learn more about stateful processing.
Multiple Topics: NOT parallel
Though multiple StreamingDataFrame
s are involved with multiple topics, they do not run in parallel:
- The
Application
instance always has a single consumer, which reads messages one-by-one from multiple topics. - After the message is consumed from the topic, it is routed to the corresponding
StreamingDataFrame
responsible for the processing of this topic.
Processing multiple topics directly affects the throughput for each topic because more messages will be processed using the same resources.
Upcoming Features
Joins
Joins are a way of combining two topics together into one data stream using various options and conditions.
They are on the immediate roadmap.