Stream processing systems

Stream processing systems are used to process continuous streams of data. They have many use cases, including real-time analytics, data enrichment, [observability] and [monitoring], and more.

Stream processing systems are used in event-driven systems, as an alternative to [event buses] and message queues. While event buses and message queues are used to decouple components in distributed software, with the aim of making complex systems more modular and scalable, stream processing systems are optimized for the real-time processing and analysis of data – though they can be (and often are) used in the implementation of conventional events and messages as well.

Stream processing systems are distinguished from message queues and event buses by their extremely high [throughput].

Examples of stream processing systems include:

Other common features of stream processing systems include data retention and replayability. Messages may be persisted until their time-to-live (TTL) expires. In the meantime, those messages can be resent to consumers.

Like event-driven systems, stream processing systems tend to implement fan out patterns, which means multiple consumers can connect to the same queue (stream of data), each getting one copy of each message in the queue. This allows multiple independent components to process data in different ways, eg. for logs, data analysis, and real-time service updates.

In summary, stream processing systems can be good solutions for the following use cases:

  • You want to process uniform messages.

  • The messages require a short time to process.

  • You want messages to fan out to multiple decoupled consumers, because there is not a huge cost associated with processing a single message.

  • You have multiple independent systems that need to process the same message in different ways.

  • You need messages to be processed really, really quickly.

  • Constant, rather than "bursty", data flow. Kafka, for example, is designed to handle data flowing through the system at all times. By comparison, traditional message queues like RabbitMQ work better when you don’t have that.