From sensors to insights: the challenges of handling high-volume industrial data in real time
IIoT is transforming industrial data collection, but legacy systems can’t keep up. Learn how scalable real-time streaming solutions solve the challenge.

Thirty years ago, industrial data collection meant a maintenance worker making hourly rounds with a clipboard. Today, factories and plants are continuous data generators, with thousands of sensors monitoring everything from equipment vibration to environmental conditions, production throughput, and product quality.
But generating data is the easy part. The real challenge is making sense of it in real time. Industry 4.0 depends on information transparency and real-time insights, yet most industrial systems were built for a batch-processing world. They weren’t designed to handle the sheer volume, velocity, and variety of industrial IoT (IIoT) data.
This shift in industrial data collection has created a set of challenges that legacy systems weren’t built to handle. Let’s look at some of the obstacles to ingesting, analyzing, and acting on IIoT data in real time.
- Scaling: Why handling natural growth and unpredictable spikes in sensor data is a challenge for legacy systems.
- Costs: How the pricing models of legacy tools can make large-scale IIoT deployments financially unsustainable.
- Latency: Why processing delays can make it impossible to respond when it matters.
- Bandwidth: How ever-growing data volumes put networks under strain, creating bottlenecks.
Data volume and scalability
It’s no accident that “data explosion” is a widely used term in IoT circles. In the time between clipboard-based data gathering and today’s IIoT infrastructure, data collection has gone from occasional snapshots to an unbroken stream from thousands of devices in a single plant.
To put it in context, over the past decade three changes have led to a huge increase in the potential volume of data from IIoT:
- Sensors everywhere: The cost of IIoT sensors more than halved between 2010 and 2020, making speculative and opportunistic deployments more common. Companies can now afford to instrument everything, even without an immediate need for the data.
- Higher frequency, higher volume: While many IIoT devices still take measurements at longer intervals, some now capture millisecond-level telemetry, generating thousands of readings per second. This shift means modern systems can produce hundreds or even thousands of times more data than their predecessors.
- Always-on connectivity: With the expansion of 5G and low-power wide-area networks (LPWAN) like LoRaWAN for remote deployments, and high-capacity Wi-Fi 6 and Zigbee in industrial settings, more IIoT sensors are permanently connected. That means more devices are transmitting data continuously rather than in batches when they happen to have connectivity.
And that presents us with a disconnect between the needs of IIoT and the systems that typically ingest industrial data: in particular, data historians.
The limitations of data historians with IIoT data
For three or four decades, data historians have worked away logging readings from SCADA systems and industrial controllers. But they were designed as batch processors for historical analysis, not for ingesting millisecond-resolution streams of data from always-on telemetry devices.
And that leads to the following problems for industrial IoT:
- They’re a scaling bottleneck: Traditional data historians are not cloud native. Instead, they run on-premises, typically on a single server. That rules out on-demand scaling, as hardware must be procured and downtime arranged well in advance of any increase in demand. Keeping up with IIoT data volumes means you’ll quickly hit the capacity ceiling of the single machine the data historian runs on.
- Batch processing prevents real-time response: Data historians follow a “store first, analyze later” model. That worked fine when data collection was infrequent, but it’s too slow for real-time applications like predictive maintenance.
- Data gets siloed: Many data historians lock data into proprietary formats, making integration with more modern systems time consuming and potentially expensive.
Each of these challenges is solvable, but not by simply expanding the same old architecture. Instead, we need to take a different approach altogether.
Scaling horizontally with distributed systems
Let’s tackle scaling first. The sheer volume of IIoT data can’t be processed by a single machine sitting in a manufacturing plant’s data closet. Instead, we need to do two things:
- Break up the workload: Instead of one system handling everything from ingestion to storage to analysis, each part of the process should run independently.
- Distribute the work across multiple machines: Whether physical or virtual, these machines should be able to scale up or down as needed, rather than being limited by the capacity of a single server.
And this is exactly what open source projects such as Apache Kafka do: they split ingestion, storage, and processing into separate components and spread the work across a cluster of machines, so you can scale out to thousands of individual servers to process huge volumes of incoming data. And because Kafka and its ilk are widely adopted, modern, open source projects, they also tend to be easier to integrate with, meaning less engineering cost and risk in connecting them to IIoT devices and other parts of your system.
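To make that concrete, here’s a minimal sketch of publishing sensor readings to a Kafka topic using the kafka-python client. The broker address, topic name, and payload fields are all assumptions for illustration, not a prescribed setup.

```python
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

# Assumed local broker and topic name; replace with your own cluster details.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_reading(sensor_id: str) -> None:
    """Send one simulated reading, keyed by sensor ID so readings from
    the same sensor land on the same partition."""
    reading = {
        "sensor_id": sensor_id,
        "timestamp_ms": int(time.time() * 1000),
        "vibration_mm_s": round(random.uniform(0.1, 5.0), 3),
    }
    producer.send("iiot-sensor-readings", key=sensor_id, value=reading)

for i in range(100):
    publish_reading(f"pump-{i % 10}")

producer.flush()  # make sure buffered messages are delivered before exiting
```

Because the topic is partitioned, adding brokers (and consumers) lets the same code keep up as the number of sensors grows.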
But there’s another side to this as well.
Real-time and event-driven
We’ve spoken about data historians in the context of batch processing. Kafka and similar tools are event-driven: rather than storing data first and analyzing it later, they process data as it arrives, routing it to the right systems in real time.
Let’s look at an example to see how the combination of distributed systems and event-driven architectures results in a more scalable system that’s better able to handle large volumes of data.
Imagine a grid-scale battery storage facility. Its job is to smooth out peaks and troughs in electricity production. Keeping it running efficiently requires understanding charge levels, battery cycles, temperature, demand from the grid, and so on. Potentially, that equates to thousands of readings each second.
Even with more frequent batch jobs, millisecond-level changes happen too fast for delayed processing to be useful. Instead, by streaming sensor data into a real-time system, the facility can respond the moment something changes, adjusting power distribution, triggering alerts, or taking corrective action instantly.
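As a sketch of what that event-driven consumption might look like, here’s a minimal Kafka consumer that checks each reading the moment it arrives and flags a cell temperature crossing a threshold. The topic name, field names, and the 45 °C threshold are illustrative assumptions, not real operating limits.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Assumed topic and broker; in practice these come from your deployment config.
consumer = KafkaConsumer(
    "battery-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

TEMP_ALERT_C = 45.0  # illustrative threshold only

def handle_reading(reading: dict) -> None:
    """React to a single telemetry event as soon as it arrives."""
    if reading.get("cell_temp_c", 0.0) > TEMP_ALERT_C:
        # In a real system this might trigger an alert or a control action.
        print(f"ALERT: cell {reading['cell_id']} at {reading['cell_temp_c']} °C")

# The consumer yields messages as they are produced, so each reading is
# processed the moment it lands on the topic rather than in a later batch job.
for message in consumer:
    handle_reading(message.value)
```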
And even if a system can technically scale, its pricing model might not.
Exploding costs
Managing IIoT data isn’t only a technical concern. Legacy systems, and especially data historians, tend to have pricing models that make no sense at the volumes and velocity you’re likely to see in an industrial IoT project.
If we stick with the example of data historians, their pricing tends to be based around:
- A base price: This is the entry fee simply for having the software, usually bundled with a usage allowance. It’s typically charged monthly or annually, though some vendors offer a perpetual license.
- Per-tag pricing: Tags are the data points we want to monitor. That roughly equates to one tag per sensor. Depending on the vendor, you might get a few hundred or thousand bundled with your base price. The more tags you want to monitor, the higher your fee.
- User licenses: Per-seat licensing to cover the people using the system.
- Additional functionality: There might be charges for things like ODBC connectors, making the data available in a web interface, and so on.
And this is a problem when we have a situation like our battery storage example. We’ve already seen how it generates thousands of readings per second across charge cycles, voltage levels, temperature, and grid demand. But under a per-tag pricing model, the issue isn’t just the frequency of readings: it’s the fact that each unique parameter being tracked is chargeable.
For a grid-scale battery system, that means every individual sensor—whether it’s tracking voltage, charge cycles, or temperature—adds to the bill, quickly making high-resolution monitoring cost-prohibitive. And that means you end up monitoring what’s affordable rather than what’s available.
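To make the math concrete, here’s a back-of-the-envelope calculation showing how per-tag licensing grows with the number of monitored parameters. The prices are entirely made up for illustration and are not any vendor’s actual rates.

```python
# Illustrative numbers only: these are not any vendor's real prices.
base_price_per_year = 20_000          # base licence with 1,000 tags included
included_tags = 1_000
price_per_extra_tag_per_year = 15     # each additional monitored parameter

def annual_cost(total_tags: int) -> int:
    """Annual licence cost under a simple per-tag pricing model."""
    extra_tags = max(0, total_tags - included_tags)
    return base_price_per_year + extra_tags * price_per_extra_tag_per_year

# A grid-scale battery site: e.g. 2,000 cells x 5 parameters each = 10,000 tags.
for tags in (1_000, 10_000, 50_000):
    print(f"{tags:>6} tags -> ${annual_cost(tags):,} per year")
```

Even with modest per-tag rates, instrumenting everything pushes the bill up by an order of magnitude or more.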
With a cloud-based streaming data tool, you instead pay for the computing resources you actually use rather than for artificial constraints like tag counts.
Real-time processing and latency
Even if a system can scale to ingest large volumes of data and its pricing model makes that affordable, there’s still a risk that it won’t be able to process that data in real time.
Late-arriving data is a particular problem because it makes two things much harder:
- Detecting issues at the right moment: This is the most obvious problem, in that delayed data prevents the system from responding in a timely manner.
- Finding patterns over time: If delays cause data to be recorded out of order, it becomes harder to analyze accurately later on (a small reordering sketch follows this list).
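One common mitigation is to have devices stamp each reading with an event time at the source, then reorder on arrival before analysis. Here’s a minimal sketch of that idea; the field names and values are purely illustrative.

```python
from operator import itemgetter

# Readings as they arrived at the server (arrival order != event order).
arrived = [
    {"sensor_id": "cell-7", "event_time_ms": 1_700_000_000_250, "voltage": 3.61},
    {"sensor_id": "cell-7", "event_time_ms": 1_700_000_000_050, "voltage": 3.65},
    {"sensor_id": "cell-7", "event_time_ms": 1_700_000_000_150, "voltage": 3.63},
]

# Reorder by the device-side timestamp so later analysis sees the true sequence.
ordered = sorted(arrived, key=itemgetter("event_time_ms"))

for reading in ordered:
    print(reading["event_time_ms"], reading["voltage"])
```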
And finding the cause of delays can be tricky because even small issues can compound latency along the data processing pipeline.
Devices themselves can introduce delays due to limited resources like memory and processing power. When readings can’t be buffered or transmitted fast enough, data may be delayed—or in the worst case, dropped entirely. A store-and-forward approach with on-device storage can prevent outright data loss but does nothing to reduce latency. Edge computing, where some processing and filtering happen on the device, helps by reducing the volume of data sent, lowering network strain and improving response times.
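As a sketch of what that on-device filtering can look like, here’s a simple deadband filter: a reading is only forwarded when it differs from the last transmitted value by more than a threshold. The threshold and values are assumptions for illustration.

```python
from typing import Optional

class DeadbandFilter:
    """Forward a reading only when it moves more than `threshold`
    away from the last value that was actually sent."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.last_sent: Optional[float] = None

    def should_send(self, value: float) -> bool:
        if self.last_sent is None or abs(value - self.last_sent) > self.threshold:
            self.last_sent = value
            return True
        return False

# Example: only transmit temperature changes bigger than 0.5 °C.
deadband = DeadbandFilter(threshold=0.5)
readings = [20.0, 20.1, 20.2, 20.9, 21.0, 21.8]
to_send = [r for r in readings if deadband.should_send(r)]
print(to_send)  # [20.0, 20.9, 21.8] -> far fewer messages over the network
```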
In our battery storage example, even a short delay in voltage readings could mean the facility fails to respond to grid demand when it’s needed.
But where it’s not possible to reduce the amount of data sent over the network, that can lead to another pinch point.
Network bandwidth constraints
IIoT devices tend to be deployed in challenging environments. Even where they’re not out in a remote location connecting over a patchy cellular link, bandwidth is often limited. Partly, that’s down to:
- Reliance on low-power networks: Many IIoT sensors run on LPWAN technologies like LoRaWAN or NB-IoT, which trade high bandwidth for energy efficiency to maximize battery life. These networks are optimized for small, infrequent transmissions rather than continuous high-throughput data streams.
- Industrial environments make cabling difficult: Retrofitting wired networking in factories, processing plants, and energy facilities is expensive and disruptive. Wireless alternatives often face interference from heavy machinery, metal structures, or electromagnetic noise.
- Shared bandwidth in constrained networks: Many IIoT deployments rely on existing industrial control networks (e.g. fieldbuses or shared Wi-Fi), which were never designed for high-volume, high-frequency data transmission.
But there are also issues with the IIoT data itself that make it challenging for the network to keep up, including:
- Congestion from simultaneous transmissions: If too many devices send data at once, network congestion gets worse, causing delays or dropped packets.
- High-frequency sensor data: Some IIoT devices, such as vibration sensors or energy monitors, generate readings every millisecond, far exceeding the capacity of traditional industrial networks.
- Bursty transmissions: Devices operating in low-power or intermittently connected environments may store readings and send them in large bursts. Similarly, devices that increase their reading frequency during unstable periods might make issues worse by generating more traffic when there’s an ongoing issue. If poorly timed, both can overwhelm the network.
- Protocol inefficiencies: Legacy protocols might not be well suited to the combination of high data volumes and poor network conditions. They can also carry a lot of protocol overhead, leading to a poor signal-to-noise ratio.
Whether you’re using a real-time streaming platform or a batch processing tool, the network itself is often the bottleneck. But with more efficient modern protocols, edge processing, and additional network capacity, you can ease the strain and make sure data arrives on time.
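On the protocol-efficiency point, a lot of the win comes simply from how readings are encoded on the wire. Here’s a small comparison of the same reading sent as verbose JSON versus a packed binary payload; the field layout is an assumption for illustration.

```python
import json
import struct

# One reading: sensor ID, millisecond timestamp, and a float value.
sensor_id = 42
timestamp_ms = 1_700_000_000_123
value = 3.6125

# Verbose JSON payload, as a naive device firmware might send it.
json_payload = json.dumps(
    {"sensor_id": sensor_id, "timestamp_ms": timestamp_ms, "value": value}
).encode("utf-8")

# Packed binary payload: unsigned short + unsigned long long + float.
binary_payload = struct.pack("!HQf", sensor_id, timestamp_ms, value)

print(len(json_payload), "bytes as JSON")   # roughly 65 bytes
print(len(binary_payload), "bytes packed")  # 14 bytes
```

Multiplied across thousands of sensors sending readings every few milliseconds, that difference in payload size adds up quickly.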
Getting the infrastructure for scalable, high-volume IIoT data
The challenge with IIoT data isn’t just volume. It’s also about making sure systems can keep up in real time. Older systems that use batch processing introduce delays, drive up costs, and create data silos that make it harder to act on your data.
Stream processing removes these barriers by handling data continuously as it arrives. Combined with distributed systems and event-driven architectures, these tools make it easier to scale in response to enormous volumes of data, while giving you real-time response and analysis.
Many companies are already leveraging modern data streaming platforms like Confluent Cloud, Redpanda, and Quix. Quix is especially versatile because it doesn’t just move and route streaming data (as all streaming platforms do) but also allows you to perform complex statistical calculations in real time, making it a powerful choice for demanding use cases. To see how teams are tackling the challenges of processing high-volume sensor data at scale, take a look at the Quix customer stories.
Mike Rosam is Co-Founder and CEO at Quix, where he works at the intersection of business and technology to pioneer the world's first streaming data development platform. He was previously Head of Innovation at McLaren Applied, where he led the data analytics product line. Mike has a degree in Mechanical Engineering and an MBA from Imperial College London.