December 4, 2024 | Industry insights

When a European manufacturing leader needed to modernize their data stack

Learn how an industrial machinery provider went from processing their sensor data in batches to real time using Python and Quix Streams instead of Flux.



An industrial machinery provider with over €3 billion in annual revenue faced a critical technical challenge: the query language they relied on for processing sensor data (Flux with InfluxDB) was being deprecated, forcing them to rethink how they analyze data from machines worth over €10 million each.

Executive Summary:

  • The Flux deprecation presented an opportunity to transition from batch to real-time processing.
  • They considered SQL as a replacement, but Python offered them the ability to build complex logic and implement statistical formulas—use cases for which SQL isn't really designed.
  • Although real-time processing was appealing, they wanted to avoid the complexity of managing the required infrastructure.
  • They planned to process more data at the edge and use MQTT to address patchy data connectivity.
  • They didn’t want a solution that would overwhelm their engineers, who had little time for steep learning curves.
  • They chose Quix because it is a fully-managed, Python-native solution that supports InfluxDB, MQTT, and processing at the edge. 

Collecting data from machines worth millions

The customer was a European manufacturing technology provider that designs heavy machinery that processes raw metals into critical industrial components.

Each machine, valued at around €10 million, handles tons of material daily. Though not mass-produced, these machines play a significant role in global manufacturing. A specialist we spoke to estimated that around 30–40% of the world’s copper wire passed through their equipment.

Our customer began collecting sensor data from its machines in 2008, monitoring key metrics like temperature, pressure, and motor health.

Here’s how their batch data pipeline looked when we spoke to them: 

  • Data Acquisition and Storage: They used a software application called Eva Analyzer (developed by a company close to Siemens) to collect and store raw data from sensors on their heavy machinery. This data was first stored locally on a PC next to each machine, then transferred to InfluxDB using Telegraf as an agent.
  • Data Analysis and Visualization: They used InfluxDB to store and process the data centrally and Grafana to visualize it. This setup allowed them to create dashboards and perform some basic statistical analysis using Flux, InfluxDB's scripting language. These Flux queries were run as scheduled tasks on the InfluxDB server and the results were written to a separate bucket (similar to a database).

A product deprecation forces a change to the data pipeline

In 2023, they faced a major challenge: InfluxDB deprecated Flux—the query language that they were using to process data in InfluxDB. Apparently, InfluxDB users found Flux harder to learn and less performant than SQL-like alternatives, leading to poor adoption despite significant investment in its development.

The Flux deprecation required our customer to adopt a new approach to data analysis. They still wanted to use InfluxDB, but they needed a different way to process the data. Initially, InfluxDB encouraged Flux users to move to SQL or InfluxQL (their own SQL-like language, tailor-made for querying time-series data), so they evaluated this option first. It would work almost identically to their existing setup, but they would need to rewrite the scheduled Flux tasks as SQL tasks.

Recognizing that Python is more flexible than SQL

After an initial assessment, their principal IoT specialist was hesitant to adopt SQL. Their data processing involved relatively simple statistics and mathematical formulas, but expressing these operations in SQL would result in complex and unwieldy code. Python, on the other hand, offered a more elegant and flexible approach to handle these calculations.

“It's not complicated what we do,” admits the IoT specialist, “but we do need to do some statistics, some multiplication, some division. And SQL code is not going to be nice at the end. I think Python is the way to go.”
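
To illustrate the point (a hypothetical sketch, not the customer's actual code): the "some statistics, some multiplication, some division" the specialist describes fits in a few lines of plain Python using only the standard library, with no SQL gymnastics.

```python
import statistics

# Hypothetical sensor readings for one machine cycle (e.g., motor temperature in °C)
readings = [71.2, 73.5, 72.8, 74.1, 73.0, 72.4]

# The kind of simple statistics and arithmetic the specialist mentions
mean_temp = statistics.fmean(readings)
stdev_temp = statistics.stdev(readings)
temp_range = max(readings) - min(readings)

print(f"mean={mean_temp:.2f} stdev={stdev_temp:.2f} range={temp_range:.2f}")
```

The same calculation in SQL would require window functions or subqueries over the raw table; in Python it is direct, and anything from the wider Python ecosystem (NumPy, SciPy) is one import away if the formulas grow.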

After joining the company in 2008, the IoT specialist spearheaded the adoption of InfluxDB and Flux to improve data analysis and visualization, a significant advancement in their data processing capabilities at the time. But now it all had to be rewritten, so the specialist saw the deprecation as a good opportunity to switch to real-time processing.

The dilemma of how to convert scheduled Flux tasks to Python

Although the team knew Python was the answer, they faced several challenges in adopting it. Since InfluxDB does not support Python tasks, they needed a separate solution for scheduling and running the Python-based processing jobs: one that would read the raw data out of InfluxDB, process it, and write the results back in.

They considered implementing their own solution, and they also looked at batch processing tools such as Apache Airflow and Prefect (which are both Python based). However, none of these options was especially appealing: each involved too much infrastructure management and would consume too much precious engineering resource (especially since no one had any experience with such tools).

Quix offers a low-friction path to real-time processing

After evaluating various batch options, the IoT specialist discovered Quix through an InfluxDB webinar. Initially skeptical about adopting a streaming platform, they were won over by Quix's ability to simplify complex data processing patterns.


“First of all, I said we don't need streaming. It's too complex. But then if it's possible to really have the tools so stripped down so that we don't need to deal with all the complexity behind it, then I think it would be an option.”
— The IoT specialist

There were several key factors that drove their decision to try Quix:

  • Python Integration: Quix is a Python-first tool, which aligned with their desire to move away from SQL and leverage Python's capabilities.
  • Windowing operations: Quix supports built-in window operations for time-based processing which they needed to segment machine cycles.
  • Usage-Based Pricing: Available with Quix Cloud, this meant they only paid for what they used, which is important for a pilot project.
  • Containerized Environment: Because Quix uses containerized processes, they would avoid the risk of runaway resource consumption.
  • Edge Processing: The open source Quix Streams Python SDK would give them the capability to run processing at the edge.
  • MQTT Support: Quix supports MQTT, a lightweight messaging protocol suitable for handling data from machines in remote locations with potentially unstable connections.
  • Event-Driven Architecture: Quix’s event handlers enable efficient processing triggered by specific events, such as the start of a machine operation, addressing the need for event-based analysis.
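
To make the windowing point concrete (a plain-Python illustration, not the Quix Streams API): segmenting a stream of timestamped readings into fixed, non-overlapping "tumbling" windows amounts to bucketing each event by the start of the window it falls into, then aggregating each bucket.

```python
from collections import defaultdict

def tumbling_windows(events, window_ms):
    """Group (timestamp_ms, value) events into fixed, non-overlapping windows.

    Returns {window_start_ms: [values]}: the grouping a tumbling window
    performs before applying an aggregation such as mean or max.
    """
    windows = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % window_ms)  # floor to the window boundary
        windows[window_start].append(value)
    return dict(windows)

# Hypothetical pressure readings split into 5-second windows
events = [(1000, 2.1), (3000, 2.3), (6000, 2.0), (9000, 2.4), (11000, 2.2)]
print(tumbling_windows(events, window_ms=5000))
# {0: [2.1, 2.3], 5000: [2.0, 2.4], 10000: [2.2]}
```

In a streaming system the same grouping runs continuously over unbounded data, with the framework handling state and late arrivals; that is the complexity Quix Streams hides behind its built-in window operations.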

According to the specialist, the decision to try real-time processing was also "kind of a bet into the future." Indeed, he recognized that the value of data is highest when it's processed closest to the source of the event.

In essence, our customer's preference for real-time processing (despite not strictly needing it at present) reflects a strategic decision. They are motivated by the recognized benefits of real-time insights, the desire to modernize their infrastructure, and the need to keep pace with industry trends toward digitalization and Industry 4.0.

The transition to real-time processing in Quix

The move to Quix was relatively straightforward. Since Quix offers a managed, serverless solution (Quix Cloud) they just needed to create a Quix account and initialize a project using the default Kafka installation that is built into Quix Cloud. 

Next, they used the Quix InfluxDB connector to poll InfluxDB for new data and stream the data to a Kafka topic. 

This enabled them to start transitioning the Flux logic to Python. Each query was turned into a continuously running Python process that ran inside a container deployed to Quix Cloud (which uses managed Kubernetes under the hood). The calculation results could be continuously written back to InfluxDB for analysis. 
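
As a rough sketch of what one such conversion might look like (hypothetical field names and formula, not the customer's actual logic): each scheduled Flux query becomes a Python function applied to every incoming message, with the Quix Streams wiring noted in comments rather than shown.

```python
# A hypothetical per-message calculation: what a scheduled Flux query that
# derived power draw from raw sensor fields might become in Python.
# In Quix Streams this function would be attached to a streaming DataFrame
# reading from the Kafka topic (roughly sdf.apply(enrich)), with the result
# continuously sunk back to InfluxDB.

def enrich(row: dict) -> dict:
    """Add derived fields to a raw sensor message."""
    power_kw = row["voltage_v"] * row["current_a"] / 1000  # apparent power, kW
    return {**row, "power_kw": round(power_kw, 2)}

message = {"machine_id": "press-07", "voltage_v": 400.0, "current_a": 12.5}
print(enrich(message))
# {'machine_id': 'press-07', 'voltage_v': 400.0, 'current_a': 12.5, 'power_kw': 5.0}
```

The key difference from the batch setup is that this function runs once per message as data arrives, rather than over a whole window of rows on a schedule.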

Enhancing the real-time pipeline with MQTT and edge processing 

The team plans to enhance their data pipeline by integrating MQTT and edge computing in the near future. The IoT specialist envisions MQTT as a bridge to feed data into Kafka, hosted within the Quix platform. This approach leverages MQTT’s lightweight protocol, ideal for gathering data from globally dispersed machines operating in environments with unreliable internet connectivity. Our customer’s distributed operations, including those in regions like China, benefit from MQTT’s resilience. The ultimate objective is to transition from a database-centric to a full streaming architecture, where MQTT continuously collects and processes machine data in real time.

In addition, the IoT specialist anticipates deploying Quix containers on edge devices, aiming to reduce latency, optimize bandwidth, and enhance security. Their current edge hardware includes industrial-grade devices such as the Revolution Pi, and they plan to test out Quix Edge for local processing. While not immediately required, the IoT specialist expects edge computing to become a priority within the next two years, driven by the need for faster responses, reduced data transmission costs, and improved data privacy through localized processing.

Comparing data pipelines: The past and the future

To understand the transition from batch to real-time, we’ve visualized our customer’s previous data pipeline and how it will look once they’ve completed their transition.

Before

Raw data was ingested directly into InfluxDB via Telegraf running at the edge (next to the machines). The raw data was processed in batches on the InfluxDB server using scheduled Flux queries.

After

Telegraf will still run at the edge but it will transmit data to an MQTT broker instead of InfluxDB. A Quix Streams process will read the MQTT messages and write them to a Kafka topic for downstream processing. Other Quix Streams processes will read from the Kafka topic and analyze the data in real time. The results of the real-time calculations will then be written to InfluxDB.
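
As a hedged sketch of the planned bridge (topic layout and field names are hypothetical; a real implementation would pair an MQTT client such as paho-mqtt with a Kafka producer): the core of it is translating each MQTT message, topic plus JSON payload, into a keyed Kafka record.

```python
import json

def mqtt_to_kafka_record(mqtt_topic: str, payload: bytes) -> tuple[str, dict]:
    """Turn an MQTT message into a (key, value) pair for a Kafka producer.

    Assumes a topic layout like "machines/<machine_id>/sensors". Keying by
    machine ID keeps each machine's readings ordered within a Kafka partition.
    """
    data = json.loads(payload)
    machine_id = mqtt_topic.split("/")[1]
    return machine_id, {"machine_id": machine_id, **data}

key, value = mqtt_to_kafka_record("machines/press-07/sensors", b'{"temp_c": 72.4}')
print(key, value)
# press-07 {'machine_id': 'press-07', 'temp_c': 72.4}
```

Because MQTT tolerates unstable links (queued messages, lightweight reconnects), this bridge lets remote machines publish whenever connectivity allows while Kafka provides the durable, ordered log for downstream processing.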



This means that InfluxDB no longer needs to process the raw data. This task is offloaded to Quix which writes new calculations continuously to InfluxDB.

Interested in making a transition to real-time processing?

Maybe your team uses InfluxDB and are facing a similar challenge or perhaps you’re in a similar industry and want to move to real-time processing. Why not get in touch to discuss a pilot project?

If you still need convincing, check out these other resources:


Related content

  • Rethinking “Build vs Buy” for Data Pipelines — “Build vs buy” is outdated — most companies need tools that provide the flexibility of a build with the convenience of a buy. It’s time for a middle ground. Words by Mike Rosam.
  • How to Empower Data Teams for Effective Machine Learning Projects — Learn how to boost the success rates of ML projects by empowering data teams through a shift-left approach to collaboration and governance. Words by Mike Rosam.