Are data historians getting in the way of Industry 4.0?
Learn how data historians impact Industry 4.0 adoption, understand their limitations and discover alternative approaches to managing data from OT systems.
People have been talking about Industry 4.0 for over a decade, but now it’s finally getting serious. Perhaps we can credit the recent AI hype for reigniting the conversation. Thanks to extensive media coverage of OpenAI, it’s now common knowledge that the more data you feed AI, the smarter it becomes. And who are the gatekeepers of data in the industrial landscape? Data historians.
The data historian market is aware of this and is rapidly modernizing. Yet, for most companies, it’s still fairly cumbersome to move data from offline OT systems into online IT systems (which include AI and ML solutions).
In this article, I’ll look at the weaknesses of legacy data historians and how the market is adapting to overcome these weaknesses. I’ll also look at whether data historians (legacy or modern) are always the best source of OT data, and when it's better to bypass them altogether.
It’s time to adapt or be left behind
For a long time, it's been easy to tune out the Industry 4.0 hype (just like it's been easy to tune out today's AI hype). There's been a lot of hot air, but we've also seen early adopters make eye-opening efficiency gains.
For example, in computer-aided design (CAD), generative AI is being used to accelerate product design processes. About 70% of manufacturers are now using this technology for discrete processes, reporting significant time savings. ABB and Schneider Electric have developed comprehensive AI-driven platforms (ABB Ability™ and EcoStruxure™, respectively) that optimize industrial operations, enhance energy efficiency, and improve resource utilization.
This progress is inevitably trickling down to small and medium-sized enterprises (SMEs), and soon it will be more than a trickle. SMEs that keep doing things "the way it's always been done" will die a slow death as more flexible competitors become far more efficient (regulations and compliance hurdles notwithstanding).
The larger early adopters have figured out how to reliably and efficiently get their OT data into IT systems. Often this has involved building their own complex proprietary solutions. SMEs can’t afford to build their own software, so the data historian market has evolved to make this transition easier. Unfortunately, most SMEs are still stuck with legacy data historians (if they’re using any at all). This is a big problem.
What are legacy data historians good at?
Before I start criticizing legacy data historians, let's begin on a positive note. Data historians have always had a "very particular set of skills". They are designed to record, store, and retrieve high-frequency time-series data from industrial control systems like SCADA, PLCs, and DCS. Plant operators and engineers can then review that data for live monitoring or to analyze historical trends.
They can do things that regular databases can’t do, such as:
Parsing domain-specific data: OT systems like PLCs and SCADA systems have very specific (and sometimes exotic) ways of formatting and structuring data. Data historians are purpose-built for handling this kind of data, and they're often integrated with asset frameworks that map raw sensor signals to meaningful operational models. This mapping makes it easier for engineers and analysts to interpret and use the data effectively (see the short sketch after this list).
Interfacing with older systems: Many industrial facilities rely on proprietary systems that are tightly coupled with data historians from the same vendor (think Siemens, Rockwell, and so on). This makes data historians invaluable for maintaining compatibility in environments with aging infrastructure while still providing access to critical operational data.
Tailoring the user experience to industrial engineers: Data historians often offer a comprehensive suite of features, including data acquisition, validation, compression, storage, retrieval, and basic visualization. Historians are also designed to be intuitive for industrial engineers so they can get operational insights without requiring advanced programming skills or extensive IT support.
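To make the asset framework idea concrete, here's a minimal sketch in Python of what such a mapping might look like. The tag names, hierarchy, and fields are entirely hypothetical; real asset frameworks (such as AVEVA's PI Asset Framework) are far richer, but the principle is the same: raw signals get an operational context.

```python
# Hypothetical mapping from raw historian tags to an asset hierarchy.
ASSET_MODEL = {
    "Plant1/Line2/Pump3/Temperature": {
        "raw_tag": "AI_4021",       # tag name as stored by the historian
        "unit": "degC",
        "asset_type": "centrifugal_pump",
    },
    "Plant1/Line2/Pump3/VibrationX": {
        "raw_tag": "AI_4022",
        "unit": "mm/s",
        "asset_type": "centrifugal_pump",
    },
}

def resolve(raw_tag: str) -> str:
    """Return the human-readable asset path for a raw historian tag."""
    for path, meta in ASSET_MODEL.items():
        if meta["raw_tag"] == raw_tag:
            return path
    raise KeyError(f"unknown tag: {raw_tag}")

print(resolve("AI_4021"))  # -> Plant1/Line2/Pump3/Temperature
```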
However, when it comes to Industry 4.0, they have a lot of weaknesses too, which is why the market is adapting.
What is the state of the data historian market?
There are many types of data historian, but for the purposes of this article, I'll group the product ecosystem into "traditional" and "ISV". If you're stuck with a legacy data historian, it's more likely to be from a traditional vendor, although some ISVs have plenty of legacy versions floating around too (e.g., OSIsoft).
Traditional
Traditional data historians are more tightly coupled with specific hardware brands such as Siemens or Allen Bradley (Rockwell). They often rely on proprietary data formats, lack robust APIs, and require significant manual effort to integrate with modern IT systems and cloud-based platforms. Although I’ve labeled them as “traditional,” many of these vendors are modernizing their products to meet Industry 4.0 demands, with features like cloud integration and APIs. However, the older legacy versions (still in use by many customers) lack this interoperability.
Vendors in this category include:
- Rockwell Automation: FactoryTalk Historian (now leveraging AVEVA PI).
- GE Vernova: Proficy Historian.
- Siemens: SIMATIC PCS 7 Process Historian.
- ABB: ABB Ability™ Symphony® Plus Historian.
- Honeywell: Honeywell Batch Historian.
- Emerson: DeltaV Batch Historian.
Note that this categorization isn’t fully precise because these data historian software offerings are often sold and function independently of their hardware lines.
Independent Software Vendors (ISVs)
The data historians marketed by ISVs tend to have a different design philosophy because they're not tied to a specific OT system. They can adapt faster to changes in the industry and tend to prioritize flexibility, openness, and scalability. Nowadays, they claim to offer broad interoperability, real-time data processing, and cloud integration, which makes them more naturally aligned with the goals of Industry 4.0.
Examples of ISVs include:
- AVEVA: AVEVA Historian (integrating the older OSIsoft PI system).
  - Started as a pure software company but was acquired by Schneider Electric.
- dataPARC: PARCserver.
- Canary Labs: Canary Historian.
- Factry: Factry Historian.
- eLynx Technologies: eLynx Data Historian.
- Inductive Automation: Tag Historian Module.
- Prosys OPC: Prosys OPC UA Historian.
- VROC: DataHUB+.
Although some ISVs have been around for decades (e.g. OSIsoft and Canary), many newer ISVs have popped up to fill a growing demand for greater interoperability with cloud systems.
What challenges are data historian vendors trying to address?
Both traditional vendors and ISVs are offering more “modern” data historians to address a fundamental set of problems that come with older legacy systems.
Interoperability Between OT and IT Systems
Hubert Yoshida, former CTO of Hitachi Vantara, famously wrote that “OT is from Mars and IT is from Venus”—a perfect analogy for the disconnect between operational technology (OT) and information technology (IT). Data historians sit squarely in this divide.
The strengths of older data historians are also their weaknesses. They excel at capturing and organizing domain-specific data from OT systems such as SCADA or PLCs, but they often struggle to translate this data into formats usable by IT systems. IT platforms (e.g., Databricks or Amazon EMR) require data in structured formats like Parquet, ORC, or JSON, optimized for distributed processing and advanced analytics. Getting data from a historian into such formats typically involves bespoke ETL (Extract, Transform, Load) pipelines, processes that are both time-consuming and resource-intensive.
While the broader software industry offers numerous ETL tools for IT-to-IT integrations, the OT world presents unique challenges. Data historians frequently use proprietary formats and interfaces, requiring custom integration for every specific implementation. This lack of standardized interoperability is a significant obstacle to bridging OT and IT. The industry needs data historians that can seamlessly interface with both SCADA systems and modern IT platforms, eliminating the need for custom-built solutions.
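To show what the simplest version of such a pipeline looks like, here's a hedged sketch that assumes the historian can at least export a CSV file (the file name and column names are hypothetical). It reshapes that export into a partitioned Parquet dataset that an IT analytics platform can query efficiently:

```python
import pandas as pd  # requires pandas with pyarrow installed

# Assumption: the historian exported one row per sample with columns
# tag, timestamp, value. Column names are hypothetical.
df = pd.read_csv("historian_export.csv", parse_dates=["timestamp"])

# Normalize types so downstream engines (Spark, Databricks, EMR) can
# rely on a consistent schema.
df["tag"] = df["tag"].astype("string")
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# Partition by calendar day so analytical queries can skip irrelevant files.
df["date"] = df["timestamp"].dt.date.astype("string")
df.to_parquet("historian_parquet/", partition_cols=["date"], index=False)
```

The hard part in real deployments is everything before that CSV export: proprietary interfaces, timestamp alignment, and historian-specific compression, which is exactly why these pipelines end up bespoke and expensive.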
As I said before, if you can’t efficiently pipe clean, reliable, and consolidated OT data into IT systems, you won’t be able to see the benefits of modern data-driven solutions (such as predictive maintenance and process optimization).
Data cherry-picking and siloed architectures
The limitations of legacy data historians often lead to a phenomenon known as "data cherry-picking." Since accessing data from these systems can be complex—due to proprietary interfaces and a lack of modern APIs—users focus on the easiest-to-access datasets, ignoring potentially valuable information. This piecemeal approach limits the scope of analysis and hinders innovation.
Adding to the challenge, data is frequently siloed. Legacy historians often organize data by physical assets or production areas, storing it across separate servers rather than in a unified, searchable system. This fragmented architecture stems from older historians' inability to handle the sheer volume of data generated by modern PLCs, IoT devices, and sensors. To avoid overwhelming these systems, organizations resort to distributing data across different physical storage locations.
Manual administration exacerbates these issues. Many legacy systems require users to request access through system administrators, who may restrict access or discourage intensive queries to prevent system crashes. High licensing fees, often tied to the volume of monitored data, further encourage a narrow focus, discouraging comprehensive data collection.
Legacy historians also lack robust support for metadata and context. Without a framework to establish relationships between data points (e.g., which sensor belongs to which machine), users struggle to form a holistic understanding of their systems. Instead, they rely on what’s immediately accessible and understandable, leaving critical insights unexplored.
Data cherry-picking leads to uninformed decisions and duplicated effort because everyone is working from a different dataset. Centralizing data reduces the likelihood of cherry-picking.
Have modern data historians cleared the way to Industry 4.0?
Not entirely. While modern data historians represent a significant leap forward, they still have a few weaknesses:
Skill Gaps
Although modern historians are more user-friendly than traditional systems, they remain niche technologies requiring expertise to implement and manage effectively. Professionals tasked with deploying these systems—particularly in hybrid OT-IT environments—often lack the necessary training. This shortage of skills and familiarity among engineers, IT specialists, and operators can lead to incomplete implementations or inefficient data pipelines, ultimately diminishing the benefits these systems are designed to deliver.
Granularity Trade-Offs
Modern historians can handle high-frequency data better than their predecessors, but for extremely granular data (e.g., millisecond-level sampling), they can still fall short. Real-time systems like Quix are often better suited for such use cases, where data needs to be processed as it arrives rather than queried in batches after ingestion.
Cost
Modern data historians are not cheap. Their pricing models are often tied to the volume of data signals monitored or the number of devices integrated, leading to significant expenses for organizations with high-frequency data or a wide array of IoT-enabled equipment. Open-source alternatives and general-purpose time-series databases may offer more cost-effective solutions, albeit without the domain-specific optimizations of modern historians.
However, there’s still reason to be optimistic because…
You don’t have to rely exclusively on a data historian
Some companies augment their data historians with a general-purpose time series database. It’s also possible to bypass the historian altogether (for specific data types).
However, I'm not telling you to ditch your current historian (whether it's legacy or modern). I'm just saying you'll need other tools to get to Industry 4.0. Some of them can extend your current historian; others will work alongside it.
Let’s take a closer look at the latter scenario. In some cases, you might want to ingest data closer to the source rather than getting it out of the historian. It can be a lot cheaper because you can use open-source, general-purpose systems that are designed for processing high-velocity, real-time data.
To understand how this works, let’s look at how data gets from machine to historian.
Traditional and modern data historians alike rarely connect directly to machines. Instead, the path from machine to data historian typically involves several intermediary systems:
- Machine → OPC Server → SCADA System → Data Historian
- The OPC server handles protocol translation, converting raw signals into a standardized format.
- The SCADA system organizes these signals (or "tags") into meaningful structures that align with operational workflows.
- The data historian performs further transformations, often through integration with an asset framework, to place the data into a broader hierarchy or operational context.
Each layer in this chain adds value by refining, tagging, and contextualizing data from its raw format. However, there are still situations where it’s better to bypass parts of this chain.
Why ingest data closer to the source?
Each system in the chain often operates at a different sampling resolution, with granularity decreasing as data moves downstream. For example:
- OPC servers may provide raw, high-frequency data at millisecond intervals.
- SCADA systems aggregate this into lower-frequency data (e.g., every few seconds) for operational purposes.
- Historians may store aggregated data (e.g., every 10-30 seconds) for long-term trends.
If your application requires high-resolution, real-time data—such as for vibration analysis, predictive maintenance, or equipment fault detection—pulling data directly from the OPC server is often the better choice. This avoids the loss of granularity introduced by downstream systems.
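As a rough illustration of what "pulling data directly from the OPC server" can look like, here's a minimal sketch using the open-source asyncua library for OPC UA. The endpoint URL, node ID, and 100 ms sampling interval are assumptions made for the example, not values from any particular plant:

```python
import asyncio
from asyncua import Client  # pip install asyncua

OPC_URL = "opc.tcp://192.168.0.10:4840"        # hypothetical endpoint
NODE_ID = "ns=2;s=Line2.Pump3.VibrationX"      # hypothetical node

class ChangeHandler:
    def datachange_notification(self, node, value, data):
        # Called on every reported change; hand the value to your own
        # pipeline (stream processor, message broker, etc.) here.
        print(f"{node} -> {value}")

async def main():
    async with Client(url=OPC_URL) as client:
        node = client.get_node(NODE_ID)
        subscription = await client.create_subscription(100, ChangeHandler())
        await subscription.subscribe_data_change(node)
        await asyncio.sleep(60)  # keep the subscription alive for a minute

asyncio.run(main())
```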
Ingesting data closer to the source also reduces latency and is generally cheaper. For instance, connecting directly to the OPC server bypasses licensing fees tied to the volume of data stored in SCADA systems or historians. It's also cheaper to store the data, because it can be continuously aggregated in real time rather than being kept in its most fine-grained form. In the IT world this is called "shifting left", so if you're interested in learning more, check out the article "Shifting Left: Discover What's Possible When You Process Data Closer to the Source".
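And here's a toy, batch-style sketch of the "aggregate before you store" idea: high-frequency readings are rolled up into one-second means, so only the summarized stream needs long-term storage. The window size and sample format are illustrative assumptions; a production setup would do this continuously in a streaming engine rather than over an in-memory list.

```python
from collections import defaultdict

def one_second_means(samples):
    """Roll (timestamp_in_seconds, value) samples up into per-second means."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts)].append(value)
    for second in sorted(buckets):
        values = buckets[second]
        yield second, sum(values) / len(values)

# Example: 100 ms samples collapse to one stored value per second.
raw = [(0.0, 1.0), (0.1, 1.2), (0.9, 1.4), (1.0, 2.0), (1.5, 2.4)]
print(list(one_second_means(raw)))  # [(0, 1.2), (1, 2.2)]
```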
Conclusion
There are many paths to Industry 4.0, and sometimes data historians get in the way, but you can go around them. Historians are great at consolidating, tagging, and contextualizing industrial data, but their limitations can prevent companies from getting real-time insights and high-resolution analytics.
Data historians have improved, but they haven’t entirely eliminated issues like granularity loss, proprietary complexity, and high costs. In many cases, the solution lies in rethinking how data is processed and analyzed—sometimes bypassing traditional pathways in favor of real-time capabilities.
This article is just the start of the conversation. In upcoming content, I’ll dive deeper into how you can address these challenges, exploring data processing approaches that enable faster, smarter decisions while saving you money.
Mike Rosam is Co-Founder and CEO at Quix, where he works at the intersection of business and technology to pioneer the world's first streaming data development platform. He was previously Head of Innovation at McLaren Applied, where he led the data analytics product line. Mike has a degree in Mechanical Engineering and an MBA from Imperial College London.