Message Queues and Event-Driven Architecture
How message queues decouple producers from consumers, the differences between pub/sub and point-to-point messaging, and the trade-offs between at-least-once and exactly-once delivery guarantees.
Terminology
- Message queue: a middleware component that accepts messages from producers, stores them durably, and delivers them to consumers in order, decoupling the sender from the receiver
- Producer (publisher): the component that creates and sends messages to the queue or topic
- Consumer (subscriber): the component that reads and processes messages from the queue or topic
- Broker: the server process that manages queues, stores messages, and coordinates delivery between producers and consumers
- Topic: a named channel in a pub/sub system where messages are published; subscribers receive copies of all messages sent to topics they subscribe to
- Queue (point-to-point): a message destination where each message is delivered to exactly one consumer, enabling work distribution across a pool of workers
- Pub/sub (publish-subscribe): a messaging pattern where messages are broadcast to all subscribers of a topic, enabling fan-out communication
- Consumer group: a set of consumers that share the work of processing messages from a topic; each message is delivered to one consumer within the group
- Offset: a sequential identifier for a message's position within a partition or queue, used by consumers to track their read position
- Acknowledgment (ack): a signal from the consumer to the broker confirming that a message has been successfully processed and can be removed or marked as consumed
- At-most-once delivery: a guarantee where messages may be lost but are never delivered more than once; the producer sends without waiting for acknowledgment
- At-least-once delivery: a guarantee where messages are never lost but may be delivered more than once; the broker retries until the consumer acknowledges
- Exactly-once delivery: a guarantee where each message is processed exactly one time; achieved through idempotent consumers or transactional protocols
- Dead letter queue (DLQ): a special queue where messages that cannot be processed after a configured number of retries are moved for manual inspection
- Backpressure: a flow control mechanism where a slow consumer signals the producer or broker to reduce the message sending rate
- Idempotency: the property that processing the same message multiple times produces the same result as processing it once, essential for safe at-least-once delivery
- Partition: a subdivision of a topic that enables parallel consumption; messages within a partition are ordered, but ordering across partitions is not guaranteed
What & Why
In a simple architecture, services call each other directly. Service A sends an HTTP request to Service B and waits for a response. This works until Service B is slow, overloaded, or down. When that happens, Service A blocks, its thread pool fills up, and failures cascade through the system.
Message queues break this tight coupling. Instead of calling Service B directly, Service A drops a message into a queue and moves on. Service B picks up the message when it is ready. If Service B is temporarily down, messages accumulate in the queue and are processed when it recovers. The producer and consumer operate independently, at their own pace, without knowing about each other.
This decoupling provides three key benefits. First, resilience: if a consumer crashes, messages are not lost because the broker stores them durably. Second, scalability: you can add more consumers to process messages in parallel without changing the producer. Third, traffic shaping: the queue absorbs bursts of traffic, smoothing out spikes so downstream services receive a steady flow.
The two fundamental messaging patterns are point-to-point and pub/sub. In point-to-point, each message goes to exactly one consumer, which is ideal for distributing work (order processing, image resizing, email sending). In pub/sub, each message is broadcast to all subscribers, which is ideal for event notification (user signed up, payment completed, inventory changed).
The hardest design decision is the delivery guarantee. At-most-once is fast but lossy. At-least-once is safe but requires consumers to handle duplicates. Exactly-once is the ideal but is expensive to achieve, typically requiring idempotent consumers or transactional coordination between the broker and the consumer's data store.
How It Works
Point-to-Point Messaging
In point-to-point messaging, the broker maintains a queue of messages. Multiple consumers can connect to the same queue, but each message is delivered to only one of them. The broker typically uses round-robin or least-busy assignment. When a consumer acknowledges a message, it is removed from the queue. If a consumer fails before acknowledging, the broker redelivers the message to another consumer.
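The deliver/ack/redeliver cycle above can be sketched as a minimal in-memory queue. This is an illustration of the mechanism, not a real broker: `PointToPointQueue` and its method names are invented for this example, and a production broker would also persist messages and time out unacked deliveries.

```python
import threading
from collections import deque

class PointToPointQueue:
    """Minimal sketch of a point-to-point queue with ack/redelivery."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = deque()   # messages awaiting delivery
        self._in_flight = {}    # delivery_id -> message, awaiting ack
        self._next_id = 0

    def send(self, message):
        with self._lock:
            self._ready.append(message)

    def receive(self):
        """Deliver one message to one consumer; it stays in flight until acked."""
        with self._lock:
            if not self._ready:
                return None
            message = self._ready.popleft()
            self._next_id += 1
            self._in_flight[self._next_id] = message
            return self._next_id, message

    def ack(self, delivery_id):
        """Consumer confirms processing; the message is removed for good."""
        with self._lock:
            self._in_flight.pop(delivery_id, None)

    def nack(self, delivery_id):
        """Consumer failed before acking; requeue the message for redelivery."""
        with self._lock:
            message = self._in_flight.pop(delivery_id, None)
            if message is not None:
                self._ready.appendleft(message)
```

A consumer crash maps to `nack` here: the message returns to the front of the queue and the next `receive` redelivers it.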
Pub/Sub Messaging
In pub/sub, producers publish messages to a topic. Every subscriber to that topic receives a copy of every message. This enables fan-out: a single "order placed" event can trigger inventory updates, email notifications, analytics recording, and fraud detection simultaneously, each handled by a different subscriber.
Consumer groups combine pub/sub with point-to-point. Within a consumer group, each message is delivered to only one member. Across groups, every group gets every message. This lets you scale processing within a service (multiple instances in one group) while still broadcasting to multiple services (different groups).
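The two-level delivery rule, every group sees every message but only one member within a group handles it, can be sketched as follows. The `Topic` class and round-robin member selection are illustrative assumptions; real brokers distribute work via partition assignment rather than per-message round-robin.

```python
from itertools import cycle

class Topic:
    """Sketch of pub/sub with consumer groups: fan-out across groups,
    work-sharing within each group."""

    def __init__(self):
        self._groups = {}  # group_id -> (member callbacks, round-robin index)

    def subscribe(self, group_id, members):
        # Each member is a callable that processes one message.
        self._groups[group_id] = (members, cycle(range(len(members))))

    def publish(self, message):
        for members, rr in self._groups.values():
            # Every group receives the message; exactly one member handles it.
            members[next(rr)](message)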
Delivery Guarantees
At-most-once: the producer sends the message and does not retry. If the broker or network drops it, the message is lost. This is the fastest option and acceptable for metrics, logs, or other data where occasional loss is tolerable.
At-least-once: the producer retries until the broker acknowledges receipt. The broker redelivers to consumers until they acknowledge processing. If a consumer processes a message but crashes before acknowledging, the message is redelivered, resulting in a duplicate. Consumers must be idempotent to handle this safely.
Exactly-once: achieved by combining at-least-once delivery with idempotent processing. Each message carries a unique ID (assigned by the producer); the consumer checks whether it has already processed that ID before applying the effect. Some systems (Kafka with transactions) provide exactly-once semantics at the broker level by atomically committing consumer offsets and produced messages.
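The dedup-by-ID idea can be shown with a small wrapper. This is a sketch: `make_idempotent` is an invented helper, and the in-memory set stands in for a durable store; in production the ID check and the handler's side effects must commit in one transaction.

```python
def make_idempotent(handler, processed_ids):
    """Wrap a handler so redelivered messages (same message_id) become no-ops.

    processed_ids stands in for a durable store of seen IDs.
    """
    def wrapped(message_id, payload):
        if message_id in processed_ids:
            return "skipped"          # duplicate delivery: do nothing
        result = handler(payload)     # apply the effect once
        processed_ids.add(message_id) # record the ID so retries are skipped
        return result
    return wrapped
```

With this wrapper, an at-least-once broker can redeliver a message any number of times and the effect is still applied exactly once.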
Dead Letter Queues
When a message fails processing repeatedly (poison message), the broker moves it to a dead letter queue after a configured retry limit. This prevents a single bad message from blocking the entire queue. Engineers can inspect the DLQ, fix the issue, and replay the messages.
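The retry-then-park behavior can be sketched in a few lines. `process_with_dlq` and its parameters are illustrative; a real broker tracks retry counts per message and applies backoff between attempts.

```python
def process_with_dlq(messages, handler, dlq, max_retries=3):
    """Sketch: retry each message up to max_retries, then move it to the DLQ.

    A poison message lands in dlq instead of blocking the messages behind it.
    """
    for message in messages:
        for attempt in range(1, max_retries + 1):
            try:
                handler(message)
                break  # processed successfully; move on
            except Exception:
                if attempt == max_retries:
                    dlq.append(message)  # park for manual inspection/replay
```

Note that the loop keeps advancing past the poison message, which is the whole point: the queue stays unblocked while the bad message waits in the DLQ.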
Complexity Analysis
Let $p$ be the number of partitions, $c$ be the number of consumers in a group, and $n$ be the number of messages.
| Operation | Time | Notes |
|---|---|---|
| Produce (append) | $O(1)$ | Append to end of partition log |
| Consume (sequential read) | $O(1)$ | Read at current offset |
| Seek to offset | $O(1)$ | Direct offset lookup |
| Fan-out to $s$ subscribers | $O(s)$ | Each subscriber reads independently |
| Rebalance consumers | $O(p)$ | Reassign partitions across consumers |
Throughput scales linearly with partitions. With $p$ partitions and $c = p$ consumers, each consuming at rate $r$ messages per second, the aggregate throughput is approximately $T \approx p \cdot r$.
The maximum useful parallelism is bounded by the number of partitions: adding consumers beyond $p$ leaves some idle, since each partition is assigned to at most one consumer within a group.
Implementation
ALGORITHM ProduceMessage(broker, topic, key, value)
INPUT: broker connection, topic name, message key, message value
OUTPUT: acknowledgment with partition and offset
BEGIN
partition <- HASH(key) MOD broker.partitionCount(topic)
message <- CREATE message with (key, value, timestamp: NOW())
SEND message TO broker FOR topic AT partition
ack <- WAIT FOR broker acknowledgment
    IF ack IS timeout THEN
        // Retry the send; bound the retry count and back off in practice
        RETURN ProduceMessage(broker, topic, key, value)
    END IF
RETURN ack
END
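The produce path above (hash the key, append to the partition log, return an ack) can be sketched in Python. The list-of-lists `log` stands in for the broker, and CRC32 stands in for the broker's hash function; both are assumptions for the example, chosen because `zlib.crc32` is deterministic across runs.

```python
import time
import zlib

def choose_partition(key, partition_count):
    """Stable key -> partition mapping (CRC32 stands in for the broker's hash)."""
    return zlib.crc32(key.encode()) % partition_count

def produce(log, key, value):
    """Sketch of ProduceMessage: hash the key, append to the partition log.

    log is a list of per-partition lists; the returned (partition, offset)
    pair plays the role of the broker's acknowledgment.
    """
    partition = choose_partition(key, len(log))
    offset = len(log[partition])  # next position in the partition
    log[partition].append({"key": key, "value": value, "ts": time.time()})
    return partition, offset
```

Because the mapping is a pure function of the key, all messages with the same key land in the same partition, which is what preserves per-key ordering.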
ALGORITHM ConsumeMessages(broker, topic, groupId)
INPUT: broker connection, topic name, consumer group ID
STATE: offset: last committed offset per assigned partition
BEGIN
assignedPartitions <- broker.ASSIGN_PARTITIONS(topic, groupId)
LOOP
batch <- broker.FETCH(assignedPartitions, fromOffsets: offsets)
FOR EACH message IN batch DO
success <- PROCESS(message)
IF success THEN
offsets[message.partition] <- message.offset + 1
ELSE
retryCount <- GET_RETRY_COUNT(message)
IF retryCount >= MAX_RETRIES THEN
SEND message TO deadLetterQueue
offsets[message.partition] <- message.offset + 1
ELSE
INCREMENT_RETRY_COUNT(message)
// Do not advance offset; message will be redelivered
END IF
END IF
END FOR
COMMIT offsets TO broker
END LOOP
END
ALGORITHM IdempotentConsumer(message, processedIds, dataStore)
INPUT: message with unique messageId, set of processed IDs, data store
OUTPUT: processing result
BEGIN
IF message.messageId IN processedIds THEN
RETURN "already processed, skipping"
END IF
BEGIN TRANSACTION on dataStore
result <- APPLY_BUSINESS_LOGIC(message, dataStore)
        INSERT message.messageId INTO processedIds  // unique constraint makes a duplicate fail the transaction
COMMIT TRANSACTION
RETURN result
END
ALGORITHM PartitionRebalance(topic, consumers, partitions)
INPUT: topic, list of active consumers, list of partitions
OUTPUT: assignment map of consumer -> list of partitions
BEGIN
assignment <- empty map
SORT partitions BY partition ID
SORT consumers BY consumer ID
    partitionsPerConsumer <- FLOOR(LENGTH(partitions) / LENGTH(consumers))
remainder <- LENGTH(partitions) MOD LENGTH(consumers)
partitionIndex <- 0
FOR i <- 0 TO LENGTH(consumers) - 1 DO
count <- partitionsPerConsumer
IF i < remainder THEN
count <- count + 1
END IF
assignment[consumers[i]] <- partitions[partitionIndex .. partitionIndex + count - 1]
partitionIndex <- partitionIndex + count
END FOR
RETURN assignment
END
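The rebalance algorithm above translates directly to Python: sort both lists, split the partitions as evenly as possible, and give the first `remainder` consumers one extra partition.

```python
def rebalance(partitions, consumers):
    """Range-style assignment of partitions to consumers within a group.

    Mirrors the pseudocode: sorted inputs, floor division for the base
    share, one extra partition for the first `remainder` consumers.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    base, remainder = divmod(len(partitions), len(consumers))
    assignment, index = {}, 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < remainder else 0)
        assignment[consumer] = partitions[index:index + count]
        index += count
    return assignment
```

Sorting both lists makes the assignment deterministic, so every member of the group computes the same map; with more consumers than partitions, the surplus consumers receive empty assignments and sit idle, as noted above.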
Real-World Applications
- Order processing: e-commerce platforms publish "order placed" events to a topic; separate consumers handle payment processing, inventory reservation, shipping label generation, and confirmation emails independently
- Log aggregation: application servers produce log messages to a centralized queue (Kafka); consumers index them into Elasticsearch for search and feed monitoring dashboards in Grafana
- Task queues: web applications offload slow operations (image resizing, PDF generation, email sending) to background workers via queues like RabbitMQ or Amazon SQS
- Event sourcing: systems record every state change as an immutable event in a log; the current state is reconstructed by replaying events, enabling audit trails and temporal queries
- Microservice communication: services communicate asynchronously through events rather than synchronous HTTP calls, reducing coupling and improving resilience to individual service failures
- Stream processing: real-time analytics pipelines consume events from Kafka topics, compute aggregations (click counts, revenue totals) in sliding windows, and write results to dashboards
Key Takeaways
- Message queues decouple producers from consumers, providing resilience (messages survive consumer downtime), scalability (add consumers for parallel processing), and traffic smoothing (absorb bursts)
- Point-to-point delivers each message to one consumer for work distribution; pub/sub broadcasts to all subscribers for event notification; consumer groups combine both patterns
- At-least-once delivery with idempotent consumers is the most practical guarantee for most systems, balancing safety against complexity
- Partitions are the unit of parallelism: throughput scales linearly with partition count, but consumers beyond the partition count sit idle
- Dead letter queues prevent poison messages from blocking processing, giving engineers a safe place to inspect and replay failed messages