You got stream processing to work. Now how do you get it to scale?

Data scientists and engineers are frustrated by the challenges of scaling data infrastructure. They know what’s needed, but they lack the time, resources and expertise to implement and maintain it.

Mike Rosam

CEO & Co-Founder

Leadership often underestimates infrastructure challenges

Data science is hard to explain and hard to understand. Leadership doesn’t always know the difference between a data scientist, a data analyst and a data engineer — or which one they need. Eyes are likely to glaze over when you try to explain why you can’t just plug real-time data streams into the existing infrastructure and make it work.

In an attempt to describe data quality issues, former data scientist and engineer Dan Friedman writes in Data Science: Reality Doesn’t Meet Expectations, “I’d often compare it to a garbage bag that ripped, had its content spewed all over the ground, and your partner has asked you to find a beautiful earring that was accidentally inside.”

This is the mess many data engineers and scientists face at the beginning of a data project. There are common complaints about the lack of quality data, messy storage methods, and poor data infrastructure. These challenges can leave a data project stranded as companies struggle to hire and retain someone with the skills to solve the infrastructure issues and complete the desired task.

Data scientists and data engineers statistics.

Source: Stack Overflow, 2020 Developer survey.

Scale and efficiency in data stream processing

“Through 2020, 80% of AI projects will remain alchemy, run by wizards whose talents will not scale in the organization.”

Gartner Top Strategic Predictions for 2019 and Beyond

Getting data stream processing to work is good, but it’s not good enough. Too often, successful projects hinge on individual talent and, therefore, won’t scale. As volume and ambition increase, the underlying architecture must scale up to keep pace.

Scalability is a vital part of data stream processing infrastructure and something to consider from day one to avoid time-consuming rebuilds, such as those needed by companies like Alibaba and Twitter.

Real-time data pipeline during 11.11 Global Shopping Festival. Source: Alibaba

Retail giant Alibaba uses stream processing to handle peak traffic and provide real-time data for media displays, business intelligence and live studio products for executives and customer service representatives. In 2017, the Alibaba streaming computing team fully upgraded its stream computing architecture, reporting on average more than twice the peak stream processing capability and more than five times the processing capability.

The upgrades enabled Alibaba to successfully process 256,000 payments per second at peak volume during the 2017 Singles Day shopping festival — more than double the volume processed in 2016. Since then, Alibaba has continued investing in stream processing infrastructure, acquiring Apache Flink backer data Artisans in 2019.

Twitter also made significant upgrades to its stream processing infrastructure. To avoid a lengthy migration process, it opted to build a new system dubbed “Heron” to replace its existing system, “Storm,” explained Baz Bhäte in “Twitter Storm vs Heron. A real-time streaming system demands.” Heron’s latency is 5–15x lower than Storm’s and increases more gradually, and its throughput is 10–14x higher than Storm’s, according to Bhäte.

Massive companies like Alibaba and Twitter have the resources to build and rebuild data stream infrastructure from scratch. But there are solutions for smaller companies that lack the resources to build, scale and maintain their own stream processing infrastructure.

Scale stream processing solutions faster with Quix

I talk to engineers who have solutions but don’t have the time to scale them. For them, Quix is the rare silver bullet. They can migrate those solutions into our platform and scale from there.

We’ve taken care of all the complicated infrastructure and made it Python-friendly, so developers and data scientists can get straight to coding or building their models. This is the type of big win that can erase months of frustration and open up new opportunities for data-driven products. And teams can do what they do best, feeling proud of what they accomplish.

Once you start using stream processing, it’s easy to see more use cases, and with Quix, scaling up to meet those needs is as easy as sliding a switch. Want to learn more? Book a demo with one of Quix’s friendly experts to talk through your technical challenges in scaling your data infrastructure. We’re here to help. Join us in our Slack channel if you have any questions.

Last updated:

Dec 12, 2023
