
9 Aug, 2022 | Explainer

Four solutions for handling time series data

Most data in streaming applications such as IoT, finance, user behavior analysis and automotive is time series data. Learn how to capture, process and apply it to get the most value from it.

Words by Tomáš Neubauer, CTO & Co-Founder
How to capture and store time series data

Why time series data saves you time and effort — when you handle it correctly

Most data in streaming applications such as IoT, finance, user behavior analysis and automotive is time series data: a group of data points indexed in time order. Such data can be described by a series of values taken at fixed points in time. Time series data is usually sampled at equally spaced intervals; this is often the easiest approach and yields the most meaningful data, since values can be compared like-for-like across intervals. Regardless of the application, the time index is one of the most valuable properties of the data.

In our collective experience, we’ve formulated and tested solutions to common problems that arise when dealing with time series data.

For multiple parameters, use tables.

In use cases where we measure multiple parameters at the same time, it is practical and efficient to send data in table-like structures. This is in contrast to the key-value approach other technologies use (Kafka Streams, for example). The Quix SDK supports pandas DataFrames, which are popular in the data science community; the sketch below shows the kind of table this produces.
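As a rough illustration (the parameter names here are invented for the example), one DataFrame row per sample keeps every simultaneously measured parameter next to a single timestamp:

```python
import pandas as pd

# One row per sample, one column per parameter. The key-value
# alternative would need three separate events per timestamp.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-08-09 12:00:00",
        "2022-08-09 12:00:01",
        "2022-08-09 12:00:02",
    ]),
    "engine_temp_c": [92.1, 92.4, 92.3],
    "oil_pressure_bar": [4.8, 4.9, 4.9],
    "rpm": [11800, 12050, 12010],
})
```

A frame like this can be written to a stream in one operation instead of being exploded into one event per parameter and timestamp.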

For data streams with various speeds, use buffering.

Quite often, data is collected and streamed from several sources and, although the data points were recorded at the same time, they do not arrive on the platform simultaneously. We provide buffer capability in our SDK to help with this problem: related data can be analyzed together, with the timestamp as the unifying guide. The sketch below illustrates the idea.
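A minimal sketch of such an alignment buffer, written in plain Python rather than against the SDK's actual API, holds each timestamp's samples until every source has reported:

```python
from collections import defaultdict

class AlignmentBuffer:
    """Hold per-source samples until a timestamp is complete, then
    release them together. Illustrative only; the Quix SDK exposes
    its own buffer configuration."""

    def __init__(self, sources):
        self.sources = set(sources)
        self.pending = defaultdict(dict)  # timestamp -> {source: value}

    def add(self, timestamp, source, value):
        self.pending[timestamp][source] = value
        if self.sources <= self.pending[timestamp].keys():
            # Every source has reported for this timestamp: emit the row.
            return timestamp, self.pending.pop(timestamp)
        return None

buf = AlignmentBuffer(["gps", "accelerometer"])
buf.add(1000, "gps", (51.5074, -0.1278))
print(buf.add(1000, "accelerometer", (0.02, 9.81, 0.11)))
# -> (1000, {'gps': (51.5074, -0.1278), 'accelerometer': (0.02, 9.81, 0.11)})
```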

To acquire time series data, use a replay service.

Getting live data into a platform so data scientists can analyze and use it often involves a long lead time. This can be because a complicated streaming infrastructure needs to be in place first, the data source itself is complicated, or a third party provides the data. As a result, most projects start with some sort of static historical data collected offline. Usually, this data is handed over to data scientists in CSV or JSON files. Data scientists can then start analyzing the data to understand it, but it is difficult to build a real-time pipeline around static files.

We introduced replay capability in the Quix platform that lets data scientists import data from historic files and replay it as streams, as if it were live. This makes it possible to start building real-time processing blocks months (or sometimes years) before actual live data is streamed into the platform. When the integration effort is completed and real data is flowing in, the processing pipeline seamlessly switches from replayed streams to real streams. The sketch below shows the core of such a replay.
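The core of a replay service is re-publishing historical records while preserving their original spacing in time. A minimal sketch, assuming a CSV file with a timestamp column and a publish callable standing in for a real producer:

```python
import time
import pandas as pd

def replay_csv(path, publish, speed=1.0):
    """Re-publish historical records as if they were arriving live.

    `publish` is whatever sink you stream into (a stand-in for a
    real producer); `speed` > 1 replays faster than real time.
    """
    df = pd.read_csv(path, parse_dates=["timestamp"]).sort_values("timestamp")
    previous = None
    for record in df.itertuples(index=False):
        if previous is not None:
            # Sleep for the original gap between consecutive records.
            time.sleep((record.timestamp - previous).total_seconds() / speed)
        publish(record._asdict())
        previous = record.timestamp

# e.g. replay_csv("telemetry.csv", publish=print, speed=10)
```

A speed of 10 replays a day of recordings in under two and a half hours, which is convenient during development.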

For performance, don’t put a database at the center of your architecture.

In the Formula 1 world, we dealt with massive amounts of time series data from car sensors. We learned:

  • There is no magic database solution for all types of data. There are good time series databases, good document-store databases, and good relational databases. Each technology excels in one area but falls short in at least one other.
  • The database gets in the way of scaling your streaming analytics platform. Options are typically limited to scaling vertically or using a limited, expensive sharding solution.
  • Database infrastructure is expensive. Per million values processed, it is multiple orders of magnitude more expensive than using a message broker such as Kafka. For example, recording parameter-based data such as temperature readings over time as a sequence of events is approximately 145 times more memory- and storage-intensive than recording this data as a continuous stream.

Of course, this doesn’t mean we should just throw out the database. We need a database, just not at the core of our architecture.

At Quix, we use best-in-class database technologies, working together under the hood of our data catalog, to give our customers great performance in all areas. If persistence is enabled on a topic, data streamed using our SDK is persisted in the data catalog in an optimal way. This is completely orthogonal to the live processing pipeline and can be used for model training, historical exploration, or building dashboards.

The Quix data catalog is not a simple data lake, where JSON messages or CSV files are stored. Data sent via the SDK is parsed and persisted in context and in the time domain. That means:

  • Data is efficiently accessible for queries (aggregations, masking, grouping, etc.).
  • Very efficient data compression is possible: there is no need to store all 1,440 once-a-minute messages from a day of temperature sensor data when the reading sat at a steady -18°C for three hours of it (see the sketch after this list).
  • Data can be served in a desirable format, e.g. a pandas DataFrame for model training.
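To make the compression point concrete, here is a change-only compression sketch (illustrative, not the catalog's actual storage format) that collapses any steady stretch of readings to a single point:

```python
def compress_changes(samples, tolerance=0.0):
    """Keep only the points where the value actually changes.

    `samples` is a list of (timestamp, value) pairs; a steady
    stretch collapses to its first point.
    """
    kept = []
    for ts, value in samples:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    return kept

# Three hours of minutely readings at a steady -18 °C collapse to one point.
readings = [(minute, -18.0) for minute in range(180)]
print(len(compress_changes(readings)))  # 1
```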

Get going faster with Quix

Quix is a developer-first platform. Sign up to get started immediately with our free plan and try it out today; it might cover all your needs. I hope you'll agree that Quix lets you focus on value-added activities.



Tomas Neubauer is Co-Founder and CTO at Quix, responsible for the direction of the company across the full technical stack and acting as a technical authority for the engineering team. He was previously Technical Lead at McLaren, where he led the architecture uplift of real-time telemetry acquisition for Formula 1 racing. He later led platform development outside motorsport, reusing the know-how he gained from racing.


