Microservices and API Gateway Patterns
How microservices decompose monoliths into independently deployable services, how API gateways and service meshes manage cross-cutting concerns, and when the Backend for Frontend pattern simplifies client integration.
Terminology
- Monolith: a single deployable unit containing all application logic, where components share the same process, memory, and database
- Microservice: a small, independently deployable service that owns a specific business capability and communicates with other services over the network
- API gateway: a single entry point that sits between external clients and internal microservices, handling routing, authentication, rate limiting, and protocol translation
- Service mesh: an infrastructure layer of lightweight proxies (sidecars) deployed alongside each service instance, handling service-to-service communication concerns like load balancing, retries, and observability
- Sidecar proxy: a proxy process that runs alongside a service instance (in the same pod or host), intercepting all inbound and outbound network traffic transparently
- BFF (Backend for Frontend): a pattern where each client type (web, mobile, IoT) gets its own dedicated backend service that aggregates and transforms data from downstream microservices
- API composition: a pattern where a service (often the API gateway or a BFF) calls multiple downstream services and combines their responses into a single response for the client
- Service discovery: the mechanism by which services find the network addresses of other services, either through a registry (Consul, etcd) or DNS-based resolution
- Circuit breaker: a pattern that prevents a service from repeatedly calling a failing downstream service; after a threshold of failures, the circuit "opens" and requests fail fast without attempting the call
- Bounded context: a domain-driven design concept that defines the boundary within which a particular domain model applies, often used to determine microservice boundaries
- Saga: a pattern for managing distributed transactions across multiple services using a sequence of local transactions with compensating actions for rollback
- Strangler fig: a migration pattern where a monolith is incrementally replaced by microservices, routing traffic to new services as they are built while the monolith shrinks over time
- Idempotency key: a unique identifier attached to a request that allows the receiver to detect and safely ignore duplicate requests
- Fan-out: a pattern where a single request triggers calls to multiple downstream services in parallel
What & Why
A monolithic application starts simple. All code lives in one repository, deploys as one artifact, and shares one database. For small teams and early-stage products, this is the right choice. But as the application grows, the monolith becomes painful. A change to the checkout flow requires redeploying the entire application. A memory leak in the recommendation engine takes down the payment system. The team cannot scale the search service independently from the user profile service.
Microservices address these problems by decomposing the application into small, independently deployable services. Each service owns a specific business capability (orders, payments, inventory, notifications), has its own database, and communicates with other services over the network. Teams can develop, test, deploy, and scale each service independently.
The trade-off is complexity. A monolith has one deployment, one database, and function calls between components. Microservices have dozens or hundreds of deployments, distributed data, and network calls that can fail, time out, or arrive out of order. You need infrastructure to handle routing, authentication, retries, circuit breaking, observability, and service discovery. This is where API gateways and service meshes come in.
An API gateway is the front door. External clients (browsers, mobile apps) talk to the gateway, which routes requests to the appropriate internal services. The gateway handles cross-cutting concerns that every request needs: authentication, rate limiting, request logging, and protocol translation (REST to gRPC, for example).
A service mesh handles the same concerns for internal service-to-service traffic. Instead of embedding retry logic, circuit breakers, and mutual TLS into every service, a sidecar proxy (like Envoy) is deployed alongside each service instance. The proxy intercepts all traffic and applies policies consistently, without the service code knowing about it.
How It Works
Monolith vs Microservices
In a monolith, components communicate through in-process function calls and share one database; in a microservice architecture, services communicate over the network and each owns its own data store. The monolith gains call-site simplicity; the microservices gain independent deployment, fault isolation, and per-service scaling.
API Gateway Pattern
The API gateway sits at the edge of the system. External clients send all requests to the gateway, which handles:
- Routing: maps external URLs to internal service endpoints (e.g., /api/orders routes to the Orders service).
- Authentication: validates tokens or API keys before forwarding requests, so individual services do not need to implement auth.
- Rate limiting: enforces request quotas per client or API key.
- API composition: for requests that need data from multiple services, the gateway calls them in parallel and merges the responses.
- Protocol translation: accepts REST from external clients and translates to gRPC or other internal protocols.
Backend for Frontend (BFF)
A single API gateway serving both web and mobile clients often leads to bloated responses. The web app needs detailed product data with reviews, while the mobile app needs a compact summary. The BFF pattern creates a dedicated backend for each client type. The web BFF aggregates and formats data for the web app. The mobile BFF does the same for mobile, returning smaller payloads optimized for bandwidth constraints.
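The per-client shaping a BFF performs can be sketched with a pair of view functions over the same downstream payload. This is a minimal illustration, not a full BFF service; the product fields and payload shape are hypothetical.

```python
# Hypothetical payload returned by a downstream Product service.
PRODUCT = {
    "id": "p-42",
    "name": "Trail Running Shoe",
    "price_cents": 12999,
    "description": "A long marketing description...",
    "reviews": [{"rating": 5, "text": "Great grip"}, {"rating": 4, "text": "Runs small"}],
    "images": ["full-res-1.jpg", "full-res-2.jpg"],
}

def web_bff_view(product: dict) -> dict:
    """Web BFF: rich payload with full description, reviews, and all images."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price": product["price_cents"] / 100,
        "description": product["description"],
        "reviews": product["reviews"],
        "images": product["images"],
    }

def mobile_bff_view(product: dict) -> dict:
    """Mobile BFF: compact summary optimized for bandwidth constraints."""
    ratings = [r["rating"] for r in product["reviews"]]
    return {
        "id": product["id"],
        "name": product["name"],
        "price": product["price_cents"] / 100,
        "avg_rating": round(sum(ratings) / len(ratings), 1) if ratings else None,
        "thumbnail": product["images"][0] if product["images"] else None,
    }
```

The mobile view collapses the review list into a single average rating and sends one thumbnail instead of full-resolution images, which is exactly the kind of transformation a shared gateway cannot do well for every client at once.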
Service Mesh
A service mesh deploys a sidecar proxy alongside every service instance. All traffic flows through the proxy, which provides:
- Mutual TLS: encrypts all service-to-service traffic without application code changes.
- Load balancing: distributes requests across instances of the target service.
- Retries and timeouts: automatically retries failed requests with configurable backoff.
- Circuit breaking: stops sending traffic to unhealthy instances.
- Observability: collects metrics, traces, and logs for every request.
The control plane (Istio, Linkerd) configures all the sidecar proxies centrally. Services communicate as if they are making normal network calls, unaware that the mesh is handling reliability and security.
API Composition
When a client needs data from multiple services (user profile + order history + recommendations), the gateway or BFF makes parallel calls to each service and combines the results:
- Receive client request.
- Fan out: call User Service, Order Service, and Recommendation Service in parallel.
- Wait for all responses (with timeouts).
- Merge responses into a single payload.
- Return to client.
If one downstream service is slow or fails, the composition layer can return a partial response with degraded functionality rather than failing entirely.
Complexity Analysis
Let $s$ be the number of downstream services called during API composition, $t_i$ be the response time of service $i$, and $n$ be the total number of microservices in the system.
| Operation | Latency | Notes |
|---|---|---|
| Sequential composition | $\sum_{i=1}^{s} t_i$ | Each call waits for the previous |
| Parallel composition | $\max(t_1, t_2, \ldots, t_s)$ | All calls execute concurrently |
| Gateway routing | $O(1)$ or $O(\log n)$ | Hash map or prefix-tree lookup |
| Service discovery lookup | $O(1)$ | Cached registry or DNS |
| Mesh sidecar overhead | $O(1)$ per hop | Typically 1-3ms added latency |
The overall availability of a composed request depends on the availability of every downstream service. For $s$ independent services each with availability $a$, the composed availability is $a^s$.
With 5 services each at 99.9% availability, the composed availability drops to $0.999^5 \approx 99.5\%$. This is why retries, circuit breakers, and graceful degradation are essential in microservice architectures.
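The arithmetic is easy to verify, and it also shows why a single retry helps so much. The retry model below assumes failures are independent across attempts, which real systems only approximate.

```python
def composed_availability(per_service: float, s: int) -> float:
    """Availability of a request that must succeed against all s independent services."""
    return per_service ** s

def with_one_retry(per_service: float, s: int) -> float:
    """Each call gets one retry: a service-level failure now requires two
    consecutive failures, raising per-call availability to 1 - (1 - a)^2."""
    return (1 - (1 - per_service) ** 2) ** s

print(round(composed_availability(0.999, 5), 5))  # ~0.99501: the 99.5% from the text
print(round(with_one_retry(0.999, 5), 6))         # a single retry recovers most of it
```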
Implementation
ALGORITHM APIGatewayRoute(request, routeTable)
INPUT: incoming HTTP request, route table mapping paths to services
OUTPUT: response from the target service
BEGIN
// Authenticate
token <- EXTRACT_AUTH_TOKEN(request)
IF NOT VALIDATE_TOKEN(token) THEN
RETURN 401 Unauthorized
END IF
// Rate limit
clientId <- EXTRACT_CLIENT_ID(token)
IF RATE_LIMIT_EXCEEDED(clientId) THEN
RETURN 429 Too Many Requests
END IF
// Route to service
targetService <- LOOKUP routeTable FOR request.path
IF targetService IS NIL THEN
RETURN 404 Not Found
END IF
// Forward request
response <- FORWARD(request, targetService)
RETURN response
END
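The APIGatewayRoute pseudocode above can be sketched in Python. This is an illustrative skeleton, not a production gateway: token validation, rate limiting, and forwarding are stubbed, and the rate limiter uses a fixed quota with no time window for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Gateway:
    """Minimal sketch of APIGatewayRoute: authenticate, rate-limit, route, forward."""
    route_table: dict                              # path prefix -> service name
    valid_tokens: set = field(default_factory=set)
    quota: int = 100                               # max requests per token (no window)
    counts: dict = field(default_factory=dict)

    def handle(self, path: str, token: str) -> tuple:
        # Authenticate: reject before touching any internal service
        if token not in self.valid_tokens:
            return (401, "Unauthorized")
        # Rate limit per client token
        self.counts[token] = self.counts.get(token, 0) + 1
        if self.counts[token] > self.quota:
            return (429, "Too Many Requests")
        # Route: longest matching path prefix wins
        matches = [p for p in self.route_table if path.startswith(p)]
        if not matches:
            return (404, "Not Found")
        service = self.route_table[max(matches, key=len)]
        # Forward (stub): a real gateway would proxy the request here
        return (200, f"forwarded to {service}")
```

Usage: `Gateway(route_table={"/api/orders": "orders-svc"}, valid_tokens={"t1"}).handle("/api/orders/42", "t1")` returns `(200, "forwarded to orders-svc")`, while a bad token short-circuits to 401 without any routing work.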
ALGORITHM APIComposition(request, serviceCalls)
INPUT: client request, list of (serviceName, serviceRequest) pairs
OUTPUT: merged response
BEGIN
responses <- empty map
errors <- empty list
// Fan out: call all services in parallel
futures <- empty list
FOR EACH (name, serviceReq) IN serviceCalls DO
future <- ASYNC_CALL(name, serviceReq, timeout: 2 seconds)
APPEND (name, future) TO futures
END FOR
// Collect results
FOR EACH (name, future) IN futures DO
result <- AWAIT future
IF result IS success THEN
responses[name] <- result.body
ELSE
APPEND name TO errors
responses[name] <- default fallback for name
END IF
END FOR
merged <- MERGE_RESPONSES(responses)
IF LENGTH(errors) > 0 THEN
merged.degraded <- true
merged.failedServices <- errors
END IF
RETURN merged
END
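The APIComposition pseudocode maps naturally onto asyncio. The sketch below fans out to all services concurrently, bounds each call with a timeout, and substitutes a per-service fallback on failure so the client gets a degraded rather than failed response; the callable-per-service interface is an assumption for illustration.

```python
import asyncio

async def compose(service_calls: dict, timeout: float = 2.0) -> dict:
    """Fan out to all services in parallel; degrade gracefully on failure.
    service_calls maps a service name to an async callable returning its payload."""
    async def guarded(name, call):
        try:
            return name, await asyncio.wait_for(call(), timeout)
        except Exception:
            return name, None  # timeout or downstream error -> use fallback

    results = await asyncio.gather(*(guarded(n, c) for n, c in service_calls.items()))
    merged = {name: (body if body is not None else {"fallback": True})
              for name, body in results}
    failed = [name for name, body in results if body is None]
    if failed:
        merged["degraded"] = True
        merged["failedServices"] = failed
    return merged
```

Because `guarded` catches each failure individually, one slow or broken service marks only its own slot as a fallback; the other responses still arrive in full.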
ALGORITHM CircuitBreaker(service, request)
INPUT: target service, outgoing request
STATE: failCount, state (CLOSED/OPEN/HALF_OPEN), lastFailTime
CONSTANTS: THRESHOLD, TIMEOUT_DURATION
OUTPUT: response or fast failure
BEGIN
IF state = OPEN THEN
IF NOW() - lastFailTime > TIMEOUT_DURATION THEN
state <- HALF_OPEN
ELSE
RETURN error "circuit open, failing fast"
END IF
END IF
response <- CALL(service, request)
IF response IS failure THEN
failCount <- failCount + 1
lastFailTime <- NOW()
// A failure while HALF_OPEN re-opens the circuit immediately;
// failCount is still at threshold from before, so one check covers both cases
IF state = HALF_OPEN OR failCount >= THRESHOLD THEN
state <- OPEN
END IF
RETURN error "service call failed"
ELSE
IF state = HALF_OPEN THEN
state <- CLOSED
END IF
failCount <- 0
RETURN response
END IF
END
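The state machine above translates to a small class. This is a single-threaded sketch (no locking) with an injectable clock so the OPEN-to-HALF_OPEN transition can be tested without sleeping; a production breaker would also need thread safety and per-endpoint state.

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after `threshold` consecutive failures; OPEN -> HALF_OPEN
    after `timeout_duration`; HALF_OPEN closes on success, re-opens on failure."""
    def __init__(self, threshold=3, timeout_duration=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout_duration = timeout_duration
        self.clock = clock          # injectable for testing
        self.state = "CLOSED"
        self.fail_count = 0
        self.last_fail_time = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.last_fail_time > self.timeout_duration:
                self.state = "HALF_OPEN"     # allow one trial request through
            else:
                raise RuntimeError("circuit open, failing fast")
        try:
            result = fn()
        except Exception:
            self.fail_count += 1
            self.last_fail_time = self.clock()
            if self.state == "HALF_OPEN" or self.fail_count >= self.threshold:
                self.state = "OPEN"
            raise
        if self.state == "HALF_OPEN":
            self.state = "CLOSED"            # trial succeeded: resume normal traffic
        self.fail_count = 0
        return result
```

Wrapping every downstream call in `breaker.call(...)` means that once the target is known to be down, callers fail in microseconds instead of waiting out a full connection timeout per request.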
ALGORITHM ServiceDiscovery(registry, serviceName)
INPUT: service registry, name of the target service
OUTPUT: list of healthy instance addresses
BEGIN
instances <- registry.LOOKUP(serviceName)
healthyInstances <- empty list
FOR EACH instance IN instances DO
IF instance.status = HEALTHY THEN
APPEND instance.address TO healthyInstances
END IF
END FOR
IF LENGTH(healthyInstances) = 0 THEN
RETURN error "no healthy instances for " + serviceName
END IF
RETURN healthyInstances
END
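The ServiceDiscovery lookup plus a simple client-side load balancer fit in a few lines. The registry shape (service name mapped to address/status pairs) is an assumption for illustration; real registries like Consul expose this through an HTTP or DNS API.

```python
import itertools

def healthy_addresses(registry: dict, service_name: str) -> list:
    """Mirror of the ServiceDiscovery pseudocode: filter the registry entry
    for service_name down to addresses of HEALTHY instances."""
    instances = registry.get(service_name, [])
    healthy = [addr for addr, status in instances if status == "HEALTHY"]
    if not healthy:
        raise LookupError(f"no healthy instances for {service_name}")
    return healthy

def round_robin(addresses: list):
    """Simple client-side load balancer over the discovered addresses."""
    return itertools.cycle(addresses)
```

A caller would refresh `healthy_addresses` periodically (or subscribe to registry changes) and pull the next target from `round_robin` for each outgoing request.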
Real-World Applications
- E-commerce platforms: companies decompose monolithic stores into services for catalog, cart, checkout, payments, and shipping, each scaling independently during traffic spikes like sales events
- Streaming services: video platforms use microservices for user profiles, content metadata, recommendation engines, encoding pipelines, and playback, with an API gateway handling millions of concurrent client connections
- Banking and fintech: financial institutions use microservices with strict bounded contexts for accounts, transactions, fraud detection, and compliance, with service meshes providing mutual TLS and audit logging
- Mobile backends: apps with web and mobile clients use the BFF pattern to serve optimized payloads to each platform, reducing bandwidth usage on mobile while providing rich data to web dashboards
- SaaS platforms: multi-tenant software uses API gateways for tenant isolation, authentication, and usage-based rate limiting, routing requests to shared or dedicated service instances based on the tenant's plan
Key Takeaways
- Microservices trade the simplicity of a monolith for independent deployability, technology diversity, and granular scaling, at the cost of distributed system complexity
- API gateways centralize cross-cutting concerns (auth, rate limiting, routing) at the edge, preventing duplication across services and simplifying client integration
- The BFF pattern gives each client type a dedicated backend that aggregates and shapes data from downstream services, avoiding one-size-fits-all API responses
- Service meshes (Envoy, Linkerd, Istio) handle service-to-service reliability (retries, circuit breaking, mTLS) transparently via sidecar proxies, keeping business logic clean
- Composed availability drops multiplicatively: with $s$ services at 99.9% each, overall availability is $0.999^s$, making retries, timeouts, and graceful degradation essential