ActiveMQ vs. Kafka: comparing messaging middleware technologies
Messaging middleware emerged as a response to the increasing complexity and interdependency of computing systems. Messaging middleware allows different components to communicate indirectly, using messages passed via a middle layer. This helps decouple components and promotes flexibility, scalability, and robustness. The advent of messaging middleware has been instrumental in the success of many enterprise-level systems, allowing them to evolve and adapt to changing business needs without massive reengineering efforts.
This article compares two popular message oriented middleware technologies, Apache ActiveMQ and Apache Kafka, by exploring their differences, similarities, and use cases. Before we dive into a detailed analysis, here are some key takeaways:
- ActiveMQ is a traditional, Java-based message broker. It offers multi-protocol capabilities and supports flexible messaging patterns (e.g., point-to-point and pub/sub).
- Kafka is a distributed event streaming platform designed to handle high-velocity, high-volume streaming data. It’s a good choice for real-time data pipelines and stream processing.
- ActiveMQ follows a “complex broker, simple consumer” approach, while Kafka has a “simple broker, complex consumer” model.
- Kafka and ActiveMQ are open source solutions, and there are vendors offering commercial support for both.
- They’re both fault-tolerant and highly available solutions, but Kafka is better equipped to deal with hyper-scale scenarios (petabytes of data and trillions of messages per day, distributed across hundreds or even thousands of brokers).
- ActiveMQ and Kafka offer various clients, targeting multiple languages (for instance, Java, Go, Python, Node.js, C++, etc.)
- Kafka offers more integrations and has a larger and more active community.
Apache ActiveMQ is a multi-protocol Java-based message broker that implements the Java Message Service (JMS). It was initially developed by LogicBlaze, and later donated to the Apache Software Foundation. ActiveMQ is a popular choice for integrating various applications and enabling flexible communication between different components.
ActiveMQ (both the Classic version and the newer Artemis version) employs an architecture that includes producers, consumers, and message brokers. Producers create messages and push them to destinations, which are managed by brokers. There are two types of destinations:
- Queues. They work in a point-to-point fashion; there is a single consumer reading each message.
- Topics. They implement the pub/sub pattern; messages are dispatched to all consumers subscribed to a topic.
Consumers then read messages from queues/topics. To do so, they can pull messages or, alternatively, the broker can push them. The push model tends to be more responsive, delivering messages to consumers as soon as they arrive at the broker. On the other hand, the pull model gives consumers more control over when messages are consumed, preventing them from becoming overwhelmed.
As a short aside, there are some differences compared to Kafka, which has topics, but not queues. Furthermore, Kafka consumers pull messages (the broker can’t push them).
ActiveMQ supports clustering to share message processing load across multiple brokers. Clustering also ensures high availability and fault tolerance.
We’ll now briefly discuss some of the differences between ActiveMQ Classic and ActiveMQ Artemis. While both versions largely offer the same features and serve the same use cases, there are some internal architectural differences. Here are the most notable ones:
- In ActiveMQ Classic, we have a few different implementations of the IO connectivity layer, like tcp (synchronous) and nio (non-blocking). In comparison, Artemis uses Netty to implement the IO layer (non-blocking).
- ActiveMQ Classic relies on external tools for data persistence: KahaDB, or a JDBC-compliant database. Artemis can use a JDBC database for this purpose too, but it also has a built-in message store (append-only message journal).
- ActiveMQ Classic brokers use cursors, which are caches of messages ready to be dispatched to consumers. Brokers try to keep all incoming messages in there. When brokers run out of memory, messages are added to the message store, and the caching stops. Meanwhile, in Artemis, the whole message journal is kept in memory, and messages are dispatched directly from it. When the Artemis broker runs out of memory, messages are stored in sequential page files on the producer side. Once memory is freed, messages are moved from these page files into the Artemis journal.
- Unlike ActiveMQ Classic, Artemis only implements queues internally. Multicast routing is used to implement publish/subscribe semantics (topics), where all subscribers get their own internal queue, and messages are routed to all of them. Anycast routing is used to implement point-to-point semantics, where there’s only one queue, and all consumers subscribe to it.
ActiveMQ’s features and capabilities
Unlike Kafka, which uses a binary protocol over TCP for publishing and consuming messages, ActiveMQ supports multiple wire protocols:
- OpenWire (ActiveMQ’s native protocol)
- HornetQ (only supported by ActiveMQ Artemis)
- XMPP (only supported by ActiveMQ Classic)
- WSIF (only supported by ActiveMQ Classic)
- RSS and Atom (only supported by ActiveMQ Classic)
Similar to Kafka, ActiveMQ allows data to be persisted (although it’s not really designed for long-term data storage). However, while Kafka brokers store data on disk, ActiveMQ offers different options:
- ActiveMQ Classic. There are two options — KahaDB (file-based storage), or a JDBC-compliant database (for instance, MySQL, Oracle Database, or PostgreSQL).
- ActiveMQ Artemis. The recommended option is to use the append-only file journal, which consists of a set of files on disk. Alternatively, you can store broker state and messages in PostgreSQL, MySQL, or Apache Derby via JDBC. However, this second option is less mature and comes with diminished performance. Check out the Artemis persistence documentation for more details.
Beyond data persistence, ActiveMQ has some other capabilities that help ensure reliability and high availability. In the case of ActiveMQ Classic, this could mean using a primary/replica broker configuration, where the replica can take over if the primary broker fails. Alternatively, you can configure a network of brokers, so messages can be forwarded across brokers. This helps provide load balancing and enables broker redundancy. It also helps with horizontal scaling.
Meanwhile, ActiveMQ Artemis offers high availability and failover capabilities like live-backup groups and client failover. In a live-backup group, each live broker can have one or more backup broker (note that these backup brokers are not operational unless failover occurs). Client failover can be automatic — if a connection fails, the client will reconnect to a backup broker. If you don’t want to use automatic failover, there’s also the option of implementing your manual reconnection logic at the application level.
ActiveMQ leverages a “complex broker, simple consumer” approach. This means that an ActiveMQ broker is responsible for routing messages to appropriate consumers, maintaining the state of consumers, tracking which messages have been consumed, and handling redelivery if necessary. Meanwhile, consumers manage no sophisticated routing logic — they simply consume messages from the broker.
ActiveMQ’s “complex broker, simple consumer” model simplifies the development of client applications, but it puts more pressure on the broker, as it must manage all the various aspects of messaging. As we discuss later in the article, Kafka uses an opposite approach: “simple broker, complex consumer”.
I’ll end this section with a few words about ActiveMQ’s performance. ActiveMQ Artemis is better equipped to provide high performance compared to ActiveMQ Classic. This is because Artemis, unlike ActiveMQ Classic, has a completely asynchronous, non-blocking architecture. Furthermore, Artemis offers the option of storing data in a journal (which is faster than moving it to a third-party database for storage). Records are appended (added to the end of the journal), which minimizes random access operations (typically the slowest operations on disk).
But even when using Artemis, there are still some latency and throughput limitations. For instance, the protocol you use affects performance (e.g., AMQP and STOMP are slower and come with additional overhead compared to MQTT and OpenWire). In general, ActiveMQ Artemis can handle a respectable throughput and can provide low latencies in moderate workloads. However, it’s not designed and optimized to handle millions of messages per second, as is the case with Kafka.
Like Kafka, ActiveMQ is an open source technology available under the Apache License version 2.0. If you don’t want to deal with deploying, managing, and scaling ActiveMQ in-house, you can offload this responsibility to a number of third-party vendors who offer managed ActiveMQ services. Common examples are AWS Amazon MQ, Red Hat AMQ Broker, and OpenLogic. It’s worth noting that, compared to Kafka, there are fewer commercial support options for ActiveMQ.
ActiveMQ offers JMS, REST, and WebSocket interfaces, alongside various clients, for programming languages like .NET, C++, Erlang, Go, Haskell, Node.js, Python, and Ruby on Rails. Since ActiveMQ supports the JMS API, any JMS-compliant client can interact with ActiveMQ.
In terms of ready-made integrations with other systems, ActiveMQ is nowhere near Kafka. While Kafka benefits from a rich suite of sink and source connectors for hundreds of other systems, ActiveMQ offers limited integrations, with frameworks like Apache Camel and Spring. ActiveMQ also offers JMS bridges, allowing you to easily connect your ActiveMQ deployment to other JMS-compliant brokers, like RabbitMQ and IBM MQ.
Another key difference compared to Kafka is that ActiveMQ does not provide any native stream processing capabilities — there’s no ActiveMQ equivalent of Kafka Streams.
Finally, the ActiveMQ community is not as large and active as Kafka’s. There’s a smaller number of organizations using ActiveMQ, fewer educational resources, and a significantly lower number of meetups and events dedicated to ActiveMQ.
While ActiveMQ is a traditional message broker, Apache Kafka is a distributed system meant to handle high-velocity, high-volume, and fault-tolerant data streams. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. As an event streaming platform, Kafka is a popular choice for building real-time data pipelines, event-driven architectures, and microservices applications.
Similar to ActiveMQ, Kafka’s architecture consists of producers, consumers, and brokers. Producers write messages to brokers, while consumers read the data ingested by brokers, following the publish/subscribe pattern. Note that consumers use a pull mechanism (long polling) to fetch data from brokers. The main advantage of this pull approach is that consumers get to control the rate at which they consume messages, without the risk of becoming overwhelmed.
Brokers run on a Kafka cluster, while producers and consumers are entirely decoupled from the system. Each broker stores the data sent by producers in topics — collections of messages belonging to the same group/category. Note that there’s no concept of message queues in Kafka — this is a significant difference compared to ActiveMQ, which has both topics and queues.
Kafka’s topics can be divided into multiple partitions, which brings benefits like fault tolerance, scalability, and parallelism. Each broker can store selected partitions from a topic, while the other partitions can be distributed across other brokers. This approach helps split and balance the workload between brokers.
Kafka’s features and capabilities
Unlike ActiveMQ, Kafka does not offer multi-protocol capabilities for publishing and consuming messages. Instead, Kafka uses a binary protocol over TCP that defines all APIs as request-response message pairs.
Kafka's log-based storage model uses a single, append-only log file for each topic partition, where messages are written sequentially and stored on broker disks. Reads are sequential, starting from an offset (note that consumers are responsible for managing their offsets). Writes are appended to the end of the log.
You can configure Kafka to store messages indefinitely. This capability is essential for recovery and continuity in case of failures or disasters. For enhanced reliability, availability, and fault tolerance, you can set up replicas for each topic’s partitions across a configurable number of brokers. This way, if a broker becomes unavailable, automatic failover to another replica in the cluster is possible, so messages remain available. In addition to intra-cluster replication, you can use MirrorMaker to replicate entire Kafka clusters. These replicated clusters can be located in different data centers or even different regions (geo-replication).
In contrast to ActiveMQ’s “complex broker, simple consumer” approach, Kafka uses a “simple broker, complex consumer” model. This means that Kafka brokers are responsible for storing data, and do not actively keep track of which messages were read by consumers.
Instead, each consumer controls the rate at which it reads messages and keeps track of the latest message it has read (offset) in-memory. For durability and to ensure that the consumer can pick up where it left off after restarts or failures, the offset must be stored somewhere more permanent than just in-memory. That’s why consumers periodically commit their offsets for durable storage in an internal Kafka topic (in earlier Kafka versions, ZooKeeper was used to store consumer offsets; however, this wasn’t very efficient, because ZooKeeper wasn’t designed to handle a large number of writes, which is often the case when offsets are frequently committed).
Kafka's "simple broker, complex consumer" approach makes developing and managing consumers more challenging. However, there are benefits too:
- Improved scalability. By offloading much of the tracking and processing logic to the consumer, the broker remains lightweight and able to handle a higher volume of messages.
- Parallel processing. Different consumers can read different parts of the message log simultaneously. This allows for parallel processing of messages, providing a more efficient use of resources.
- Low latency. With less processing required at the broker level, messages can be delivered to consumers with potentially lower latency.
In terms of performance, Kafka clusters are capable of high throughput workloads (millions of messages per second), with extremely low latencies (in the millisecond range). In addition, Kafka is well known for its scalability. Kafka can reliably handle petabytes of data and trillions of messages per day, distributed across hundreds (or even thousands) of brokers, as demonstrated in production by giants like Netflix, Uber, and LinkedIn. See, for instance, how LinkedIn uses Kafka to handle 7 trillion messages per day. I found no evidence that ActiveMQ is geared towards the same level of scalability and throughput as Kafka.
Just like ActiveMQ, Kafka is an open source technology available under the Apache License version 2.0. There are also numerous third-party vendors providing managed Kafka services, which simplify the deployment, scaling, and management of Kafka clusters. Examples include Quix, Confluent, Amazon MSK, Aiven, Instaclustr, and Azure HDInsight. It’s worth pointing out that commercial support for Kafka is more mature and widespread compared to ActiveMQ.
Kafka offers a wide variety of clients, both official, and community-made. These clients target various programming languages, including (but not limited to) Java, Scala, Go, Python, C, C++, Ruby, .NET, PHP, Node.js, and Swift.
It’s safe to say that Kafka has a much larger ecosystem of integrations compared to ActiveMQ. The Kafka Connect framework allows you to easily ingest data from other systems into Kafka, and stream data from Kafka topics to various destinations. There are hundreds of connectors for different types of systems, such as databases (e.g., MongoDB), storage systems (like Azure Blob Storage), messaging systems (for instance, JMS), and many more. Kafka even provides sync and source connectors for ActiveMQ.
Kafka also has the upper hand over ActiveMQ when it comes to native stream processing capabilities. The Kafka Streams library allows you to build real-time stream processing apps with features like joins, aggregations, windowing, and exactly-once processing. In comparison, ActiveMQ doesn’t provide any built-in stream processing features.
Finally, Kafka benefits from a vast and active community, and a numerous user base (in contrast, the ActiveMQ community is significantly smaller). Thousands of companies spanning virtually every industry have embraced Kafka. There are hundreds of Kafka meetups, and tens of Kafka-focused events and conferences all over the world. In addition, there are thousands of blog posts, tutorials, and educational resources related to Kafka, offering a wealth of information on Kafka usage and best practices.
ActiveMQ vs. Kafka: head-to-head comparison
The following table summarizes the differences and similarities between Kafka and ActiveMQ:
ActiveMQ vs. Kafka: use cases
The decision to use ActiveMQ instead of Kafka (or vice versa) depends on the specifics of your use case.
ActiveMQ is a traditional messaging system that's great in the following scenarios:
- Flexible asynchronous messaging. You can use ActiveMQ both as a message queue software, and as a pub/sub message broker. Furthermore, due to its integration with Apache Camel, ActiveMQ is a good choice for implementing various Enterprise Integration Patterns, such as message filtering, message routing (via JMS API message selector), dead letter channels, or request-reply.
- Interoperability. As it supports multiple transport protocols and programming languages, ActiveMQ can act as a communication bridge for apps written in different languages, and using different protocols. For example, with ActiveMQ in the middle, a mobile application developed in Swift and communicating via MQTT can seamlessly exchange data with an enterprise system written in Java that uses the OpenWire protocol.
- Transactional messaging. This is due to ActiveMQ’s guaranteed message delivery, message ordering, persistence capabilities, atomic operations, flexible acknowledgment modes, and message redelivery policies.
Meanwhile, Kafka is an excellent choice for the following use cases:
- High throughput and scalable data pipelines. This is due to Kafka’s distributed system design, horizontal scaling capabilities, and efficient handling of large volumes of real-time data across multiple producers and consumers.
- Stream processing. Kafka has built-in stream processing capabilities (Kafka Streams). Additionally, Kafka integrates with external stream processing frameworks like Apache Spark, Apache Flink, and Apache Storm, to name just a few.
- Event sourcing. Kafka’s immutable and append-only log structure ensures a reliable, ordered, and replayable record of events, enabling the full historical sequence of state changes to be stored and queried.
- Log aggregation. Kafka’s distributed design enables high-throughput and low-latency ingestion of log data, facilitating real-time centralization and analysis.
- Data integration. Due to its numerous sync and source connectors, Kafka makes it easy to transfer data between diverse systems.
Kafka and ActiveMQ total cost of ownership (TCO)
Kafka and ActiveMQ are available for use under a free, open source license. However, just because they are free to use doesn’t mean they are cost-free. Self-hosting open source Kafka/ActiveMQ means you will have to deal with the following main categories of expenses:
- Infrastructure costs. Includes the servers, storage, and networking resources required.
- Operational costs. Refers to all the costs of maintaining, scaling, monitoring, and optimizing your deployment.
- Human resources and manpower. This involves the costs of recruiting and training the required staff (DevOps engineers, developers, architects, etc.), and paying their salaries.
- Downtime costs. While hard to quantify, unexpected cluster failures and service unavailability can lead to reputational damage, reduced customer satisfaction, data loss, missed business opportunities, and lost revenue.
- Miscellaneous expenses. Additional expenses may be required for security and compliance, auditing purposes, and integrations (e.g., building custom integrations and clients in new languages).
The TCO for self-hosting Kafka and ActiveMQ can differ wildly depending on the specifics of your use case, the number of clusters and their size, and the volume of data. The total cost of a self-managed Kafka or ActiveMQ deployment can range from tens of thousands of $ per year (for small deployments and one engineer on payroll) up to millions of $ per year (for large deployments and teams of engineers and architects). A few comments:
- It’s possible for Kafka to be more expensive than ActiveMQ, for the innocuous reason that Kafka is designed for hyper-scale scenarios (thousands of brokers, trillions of messages per day), while ActiveMQ can’t reach the same level of scalability. The more brokers and messages going through the system, the higher the cost of ownership.
- Kafka comes with higher data storage costs, as you can persist vast volumes of data indefinitely. In contrast, you wouldn’t spend as much on persistence with ActiveMQ, which generally stores data for shorter periods of time (usually just long enough to ensure message delivery).
- You might spend more time searching for and recruiting staff skilled in ActiveMQ than Kafka (this is because ActiveMQ is less popular and has a smaller community, with a limited number of experts).
- With ActiveMQ, you might spend more on building custom integrations (if relevant to your use case). Meanwhile, Kafka has a rich ecosystem of ready-made connectors that allow it to seamlessly integrate with other systems.
If self-managing Kafka or ActiveMQ isn’t to your taste, you have the option of fully managed deployments. The burden of self-hosting might make managed services more cost-effective, especially if you have a smaller team, limited expertise, and faster time to market is important to you. As mentioned earlier in this article, there are various ActiveMQ and Kafka vendors out there (the Kafka ones are more numerous), so you can choose the one with the friendliest pricing model for your specific use case and usage patterns.
After reading this article, I hope you better understand the key differences and similarities between Kafka and ActiveMQ, and you can more easily decide which one is best suited to your specific needs. If you’re keen to explore other messaging middleware technologies to see if they make more sense for your use case, check out some of our other blog posts:
If you conclude that Kafka is the right solution for you, but want to avoid the complexity of managing it in-house, I invite you to give Quix a go. A fully managed platform that combines Kafka with serverless stream processing, Quix offers an environment to build, test, and deploy services that derive insights from real-time data streams. Quix removes the need for you to deal with the operational complexity of deploying and scaling Kafka, reducing the cost and time required to extract business value from real-time data. Check out the official documentation to learn more about the capabilities of the Quix platform.
What’s a Rich Text element?
The rich text element allows you to create and format headings, paragraphs, blockquotes, images, and video all in one place instead of having to add and format them individually. Just double-click and easily create content.
Static and dynamic content editing
A rich text element can be used with static or dynamic content. For static content, just drop it into any page and begin editing. For dynamic content, add a rich text field to any collection and then connect a rich text element to that field in the settings panel. Voila!
How to customize formatting for each rich text
Headings, paragraphs, blockquotes, figures, images, and figure captions can all be styled after a class is added to the rich text element using the "When inside of" nested selector system.
Mike Rosam is Co-Founder and CEO at Quix, where he works at the intersection of business and technology to pioneer the world's first streaming data development platform. He was previously Head of Innovation at McLaren Applied, where he led the data analytics product line. Mike has a degree in Mechanical Engineering and an MBA from Imperial College London.