Success Story

Modernizing an Industrial IoT data stack for processing machine sensor data

Manufacturing
Data engineering

First of all, I said we don't need streaming. It's too complex. But then if it's possible to really have the tools so stripped down so that we don't need to deal with all the complexity behind it, then I think it would be an option.

IIoT Specialist

An industrial machine builder with over €3 billion annual revenue faced a critical technical challenge: the query language they relied on for processing sensor data (Flux with InfluxDB) was being deprecated, forcing them to rethink how they analyze data from machinery.

Executive Summary:

  • The Flux deprecation presented an opportunity to transition from batch to real-time processing.
  • They considered SQL as a replacement, but Python offered them the ability to build complex logic and implement statistical formulas—use cases for which SQL isn't really designed.
  • Although real-time processing was appealing, they wanted to avoid the complexity of managing the required infrastructure.
  • They planned to process more data at the edge and use MQTT to address patchy data connectivity.
  • They didn't want a solution that would overwhelm their engineers, who had little time for steep learning curves.
  • They chose Quix because it is a fully-managed, Python-native solution that supports InfluxDB, MQTT, and processing at the edge. 

Collecting data from machines worth millions

The customer is a manufacturing technology provider that designs and builds heavy machinery for the metals industry.

The machinery is tailor-made for various applications across the metals industry and plays a significant role in the global manufacturing network.

Our customer has systematically analyzed machine sensor data since its early days, monitoring key metrics like temperature, pressure, and motor health.

The data processing evolved over the years. Here's how their batch data pipeline looked in the past: 

  • Data Acquisition and Storage: A proprietary solution collected data from a PLC (such as a Siemens S7) and buffered it. Telegraf then fed this data to a central InfluxDB instance.
  • Data Analysis and Visualization: InfluxDB was used to process the data and Grafana was used for visualizations. This setup allowed them to create dashboards and perform some basic statistical analysis using Flux, InfluxDB's scripting language. These Flux queries ran as scheduled tasks on the InfluxDB server, and the results were written to a separate bucket (InfluxDB's equivalent of a database).

A product deprecation forces a change to the data pipeline

In 2023, they faced a major challenge: InfluxDB deprecated Flux—the query language that they were using to process data in InfluxDB. Apparently, InfluxDB users found Flux harder to learn and less performant than SQL-like alternatives, leading to poor adoption despite significant investment in its development.

The Flux deprecation required our customer to adopt a new approach to data analysis. They still wanted to use InfluxDB, but they needed a different way to process the data. InfluxDB encouraged Flux users to move to SQL or InfluxQL (its own SQL-like language tailor-made for querying time-series data), so they evaluated this option first. It would work almost identically to their existing setup, but they would need to rewrite the scheduled tasks as SQL tasks.

Recognizing that Python is more flexible than SQL

After an initial assessment, their principal IIoT specialist was hesitant to adopt SQL. Their data processing involved relatively simple statistics and mathematical formulas, but expressing these operations in SQL would result in complex and unwieldy code. Python, on the other hand, offered a more elegant and flexible approach to handle these calculations.

“It's not complicated what we do,” admits the IIoT specialist, “but we do need to do some statistics, some multiplication, some division. And SQL code is not going to be nice at the end. I think Python is the way to go.”
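To illustrate the kind of logic he means, here is a minimal sketch in plain Python. The metric names and formulas are hypothetical, not the customer's actual calculations; the point is that statistics and derived ratios read naturally in Python where the equivalent SQL would be unwieldy.

```python
import statistics

def summarize_cycle(temperatures, pressures, motor_currents):
    """Compute simple per-cycle statistics over lists of sensor readings.

    The metrics and formulas here are illustrative only.
    """
    return {
        "temp_mean": statistics.mean(temperatures),
        "temp_stdev": statistics.stdev(temperatures),
        "pressure_max": max(pressures),
        # A derived ratio of the kind that is awkward to express in SQL
        "load_factor": statistics.mean(motor_currents) / max(motor_currents),
    }

summary = summarize_cycle(
    temperatures=[71.2, 73.5, 74.1, 72.8],
    pressures=[1.01, 1.04, 1.09, 1.03],
    motor_currents=[12.0, 14.5, 15.0, 13.5],
)
print(summary["pressure_max"])  # 1.09
```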

After joining the company in 2008, the IIoT specialist spearheaded the adoption of InfluxDB to improve data analysis and visualization. This shift represented a significant advancement in their data processing capabilities. But now it all had to be rewritten, so the specialist saw it as a good opportunity to switch to real-time processing.

The dilemma of how to convert scheduled Flux tasks to Python

Although the team knew Python was the answer, they faced several challenges in adopting it. Since InfluxDB does not support Python tasks, they needed a separate solution for scheduling and running the Python-based processing jobs. This solution would have to read the raw data out of InfluxDB, process it, and then write it back in again.
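A sketch of that "read, process, write back" loop helps show why it needs a scheduler outside InfluxDB. Only the processing function is shown in full; the InfluxDB calls appear as comments, and the bucket and machine names are hypothetical (the `InfluxDBClient` mentioned is the official InfluxDB 2.x Python client).

```python
# Sketch of the scheduled job InfluxDB itself could not run.
# Bucket, field, and machine names below are hypothetical.

def process_batch(rows):
    """Aggregate raw (machine_id, value) sensor rows into one mean per machine."""
    groups = {}
    for machine_id, value in rows:
        groups.setdefault(machine_id, []).append(value)
    return {m: sum(vals) / len(vals) for m, vals in groups.items()}

# In a real scheduler (cron, Airflow, Prefect, ...) this function would be
# wrapped with the I/O, roughly:
#   client = InfluxDBClient(url=..., token=..., org=...)
#   rows = client.query_api().query(...)                          # read raw bucket
#   results = process_batch(rows)
#   client.write_api().write(bucket="processed", record=results)  # write back

batch_result = process_batch([("press_1", 10.0), ("press_1", 14.0), ("saw_2", 3.0)])
print(batch_result)  # {'press_1': 12.0, 'saw_2': 3.0}
```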

They considered implementing their own solution, and they also looked at batch processing tools such as Apache Airflow and Prefect (which are both Python-based). However, none of these options were especially appealing because they had limited in-house software and data engineering skills, and the tooling involved too much infrastructure management.

Quix offers a low-friction path to real-time processing

After evaluating various batch options, the IIoT specialist discovered Quix through an InfluxDB webinar. Initially skeptical about adopting a streaming platform, they were won over by Quix's ability to simplify complex data processing patterns.


“First of all, I said we don't need streaming. It's too complex. But then if it's possible to really have the tools so stripped down so that we don't need to deal with all the complexity behind it, then I think it would be an option.”
— The IIoT specialist

There were several key factors that drove their decision to try Quix:

  • Python Integration: Quix is a Python-first tool, which aligned with their desire to move away from SQL and leverage Python's capabilities.
  • Windowing operations: Quix supports built-in window operations for time-based processing which they needed to segment machine cycles.
  • Usage-Based Pricing: Quix Cloud is billed based on usage, meaning they only paid for what they used, which was important for a pilot project.
  • Containerized Environment: Because Quix uses containerized processes, they would avoid the risk of runaway resource consumption.
  • Edge Processing: The open source Quix Streams Python SDK would give them the capability to run processing at the edge.
  • MQTT Support: Quix supports MQTT, a lightweight messaging protocol suitable for handling data from machines in remote locations with potentially unstable connections.
  • Event-Driven Architecture: Quix’s event handlers enable efficient processing triggered by specific events, such as the start of a machine operation, addressing the need for event-based analysis.
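To make the windowing point concrete, here is an illustration of the idea itself rather than the Quix Streams API: a tumbling window simply buckets timestamped readings into fixed, non-overlapping intervals, which is how time-based segmentation of machine cycles works. The data values are invented for the example.

```python
from collections import defaultdict

def tumbling_window(readings, window_ms):
    """Group (timestamp_ms, value) readings into fixed, non-overlapping windows
    and return the mean of each window, keyed by window start time."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[(ts // window_ms) * window_ms].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

readings = [(0, 2.0), (400, 4.0), (1100, 6.0), (1900, 10.0)]
print(tumbling_window(readings, window_ms=1000))
# {0: 3.0, 1000: 8.0}
```

In a streaming framework the same grouping happens continuously as messages arrive, instead of over a finished list.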

According to the specialist, the decision to try real-time processing was also "kind of a bet into the future." Indeed, he recognized that the value of data is highest when it's processed closest to the source of the event.

In essence, our customer's preference for real-time processing (despite not strictly needing it at present) reflects a strategic decision. They are motivated by the recognized benefits of real-time insights, the desire to modernize their infrastructure, and the need to keep pace with industry trends toward digitalization and Industry 4.0.

The transition to real-time processing in Quix

The move to Quix was relatively straightforward. Since Quix offers a managed, serverless solution (Quix Cloud) they just needed to create a Quix account and initialize a project using the default Kafka installation that is built into Quix Cloud. 

Next, they used the Quix InfluxDB connector to poll InfluxDB for new data and stream the data to a Kafka topic. 

This enabled them to start transitioning the Flux logic to Python. Each query was turned into a continuously running Python process that ran inside a container deployed to Quix Cloud (which uses managed Kubernetes under the hood). The calculation results could be continuously written back to InfluxDB for analysis. 
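Conceptually, each migrated task becomes a long-running consumer loop rather than a scheduled query. The following framework-free sketch shows the shape of that change; in practice Quix Streams handles the Kafka consumption and the numbers here are illustrative.

```python
def running_mean(stream):
    """Consume a stream of readings and yield an updated running mean after
    each one -- the continuous counterpart of a scheduled batch query."""
    total = 0.0
    count = 0
    for value in stream:
        total += value
        count += 1
        # In the real pipeline, each result is written back to InfluxDB here.
        yield total / count

# With Kafka, `stream` would be messages from a topic; here, a plain list:
means = list(running_mean([10.0, 20.0, 60.0]))
print(means)  # [10.0, 15.0, 30.0]
```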

Enhancing the real-time pipeline with MQTT and edge processing 

The team plans to enhance their data pipeline by integrating MQTT and edge computing in the near future. The IIoT specialist envisions MQTT as a bridge to feed data into Kafka, hosted within the Quix platform. This approach leverages MQTT's lightweight protocol, ideal for gathering data from globally dispersed machines operating in environments with unreliable internet connectivity. Our customer's distributed operations, including those in regions like China, benefit from MQTT's resilience. The ultimate objective is to transition from a database-centric to a full streaming architecture, where MQTT continuously collects and processes machine data in real time.

In addition, the IIoT specialist anticipates deploying Quix containers on edge devices, aiming to reduce latency, optimize bandwidth, and enhance security. Their current edge hardware includes industrial-grade devices such as the Revolution Pi, and they plan to test out Quix Edge for local processing. While not immediately required, the IIoT specialist expects edge computing to become a priority within the next two years, driven by the need for faster responses, reduced data transmission costs, and improved data privacy through localized processing.

Comparing data pipelines: The past and the future

To understand the transition from batch to real-time, we've visualized our customer's previous data pipeline and how it will look once they've completed their transition.

Before

Raw data was ingested directly into InfluxDB via Telegraf running at the edge (next to the machines). The raw data was processed in batches on the InfluxDB server using scheduled Flux queries.

After

Telegraf will still run at the edge but it will transmit data to an MQTT broker instead of InfluxDB. A Quix Streams process will read the MQTT messages and write them to a Kafka topic for downstream processing. Other Quix Streams processes will read from the Kafka topic and analyze the data in real time. The results of the real-time calculations will then be written to InfluxDB.
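The bridge step in that future pipeline reduces to a small mapping from an MQTT message to a Kafka record. The sketch below makes assumptions not in the source: the MQTT topic naming scheme and Kafka topic name are hypothetical, and the surrounding paho-mqtt and Kafka producer wiring is shown only as comments.

```python
import json

def mqtt_to_kafka_record(mqtt_topic, payload_bytes):
    """Map an MQTT message to a (kafka_topic, key, value) record.

    Assumes MQTT topics shaped like 'plant/<site>/<machine>/telemetry'
    and JSON payloads -- both hypothetical conventions for this sketch.
    """
    _, site, machine, _ = mqtt_topic.split("/")
    value = json.loads(payload_bytes)
    return ("machine-telemetry", f"{site}.{machine}", value)

# In the real bridge, a paho-mqtt on_message callback would call this function
# and hand the record to a Kafka producer running inside Quix Cloud.
record = mqtt_to_kafka_record("plant/shanghai/press_1/telemetry", b'{"temp": 71.2}')
print(record)
# ('machine-telemetry', 'shanghai.press_1', {'temp': 71.2})
```

Keeping the mapping pure like this makes it easy to test independently of any broker connectivity, which matters when the machines sit behind unreliable links.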



This means that InfluxDB no longer needs to process the raw data. This task is offloaded to Quix which writes new calculations continuously to InfluxDB.

Interested in making a transition to real-time processing?

Maybe your team uses InfluxDB and is facing a similar challenge, or perhaps you're in a similar industry and want to move to real-time processing. Why not get in touch to discuss a pilot project?

If you still need convincing, check out these other resources:


Ready to get started?

Learn more about how companies are building data integration pipelines with Quix.

Explore the platform