Queue-Based Load Leveling

Decouple requests and protect downstream services using a queue to enable asynchronous processing



Spikes in demand can put heavy pressure on a service, and the resulting latency and errors can cascade through connected services.

Services may be connecting and coordinating legacy systems that simply cannot scale in the same way as systems provisioned using auto-scaling services like Lambda, DynamoDB or appropriately configured EC2. We may be operating with hard commercial limits on capacity, such as a budget for AWS costs or a third-party platform that enforces a contractual constraint on throughput. Finally, even auto-scaling systems take time to respond to sudden increases in demand.


Decouple requests and protect downstream services using an SQS queue to enable asynchronous processing.

Push incoming requests to SQS and create a Lambda function to consume and process the messages. Lambda polls the queue and synchronously invokes the function with an event containing a batch of queue messages. Messages can be pushed to the queue at a rate and volume independent of the throughput of the worker process.
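As a minimal sketch of the consumer side, the handler below walks the batch of records in an SQS event and reports only the messages that failed. The `process_order` function and the `order_id` field are hypothetical stand-ins for real business logic. Returning `batchItemFailures` only has an effect when partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping, which is assumed here.

```python
import json


def process_order(payload):
    """Hypothetical business logic; replace with the real downstream call."""
    if "order_id" not in payload:
        raise ValueError("missing order_id")
    return payload["order_id"]


def handler(event, context):
    """Process one batch of SQS messages delivered by Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only the failed message so the rest of the batch
            # is not returned to the queue and retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, raising an exception from the handler returns the whole batch to the queue, so messages that already succeeded are processed again.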


API Gateway and Lambda Function

Expose an API endpoint that receives requests and pushes them to a queue. If no processing is required before landing the message on the queue, consider an API Gateway service proxy integration with SQS, or push to the queue directly via the SDK.
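One way the receiving function might look is sketched below, assuming an API Gateway proxy integration event with a JSON `body`. The SQS client is passed in so the handler can be exercised without AWS access; the `request_id` field and the message envelope are illustrative choices, not part of the pattern.

```python
import json
import uuid


def make_enqueue_handler(sqs_client, queue_url):
    """Build a handler that lands each incoming request on the queue."""

    def handler(event, context):
        request_id = str(uuid.uuid4())  # hypothetical correlation id
        body = event.get("body") or "{}"
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps(
                {"request_id": request_id, "payload": json.loads(body)}
            ),
        )
        # 202 Accepted: the work has been queued, not completed.
        return {
            "statusCode": 202,
            "body": json.dumps({"request_id": request_id}),
        }

    return handler
```

In a deployed function the client would typically be `boto3.client("sqs")` and the queue URL would come from configuration, for example an environment variable. Returning an identifier gives callers something to correlate with later status checks.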

SQS Queue and Lambda Function Consumer

A Lambda function with an SQS event source mapping handles messages. Tune the consumer's concurrency and batch size to throttle throughput and buffer downstream systems. Concurrency determines how many function instances are available to process messages.
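The two tuning knobs could be set along these lines, assuming a boto3 Lambda client is passed in. The default values are placeholders chosen for illustration and should be sized against what the downstream system can absorb.

```python
def configure_consumer(lambda_client, function_name, queue_arn,
                       batch_size=10, reserved_concurrency=5):
    """Cap function concurrency, then attach the queue with a batch size."""
    # Reserved concurrency limits how many instances can run at once.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved_concurrency,
    )
    # The event source mapping makes Lambda poll the queue and invoke
    # the function with up to batch_size messages per invocation.
    return lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=batch_size,
    )
```

Lowering either value reduces the peak load placed on downstream systems at the cost of a longer queue backlog during spikes.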

Dead Letter Queue

A Dead Letter Queue (DLQ) configured on the source queue catches failed messages. A message is retried and sent to Lambda until its receive count exceeds the configured maxReceiveCount, at which point SQS moves it to the DLQ. Failed messages held in the DLQ are not lost and can be investigated, debugged and reprocessed.
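A DLQ is attached to the source queue through its `RedrivePolicy` attribute. The sketch below builds that attribute and applies it with an SQS client passed in by the caller; the default of 5 receives is an arbitrary illustrative value.

```python
import json


def build_redrive_attributes(dlq_arn, max_receive_count=5):
    """Return the queue attributes that route failed messages to the DLQ."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy as a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(policy)}


def attach_dlq(sqs_client, source_queue_url, dlq_arn, max_receive_count=5):
    """Configure the DLQ on the source queue, not on the function."""
    return sqs_client.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes=build_redrive_attributes(dlq_arn, max_receive_count),
    )
```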


SQS guarantees at-least-once delivery so Lambda functions need to be able to handle duplicate messages.
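One common way to tolerate duplicates is to record each message ID before processing and skip IDs that have already been seen. The sketch below uses an in-memory store purely for illustration; because Lambda instances do not share memory, a real deployment would need a shared store (for example a conditional write to a database table), which is an assumption outside this pattern.

```python
class InMemorySeenStore:
    """Illustrative stand-in for a shared, durable deduplication store."""

    def __init__(self):
        self._seen = set()

    def mark_if_new(self, key):
        """Record the key; return False if it was already recorded."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def forget(self, key):
        self._seen.discard(key)


def handle_once(record, store, process):
    """Process a record unless its messageId has been handled before."""
    key = record["messageId"]
    if not store.mark_if_new(key):
        return False  # duplicate delivery, safely ignored
    try:
        process(record["body"])
    except Exception:
        # Release the marker so a retry of this message is not skipped.
        store.forget(key)
        raise
    return True
```

Keying on `messageId` covers redelivery of the same message. If producers may send the same request twice as separate messages, a business-level key carried in the body would be needed instead.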

When the Lambda function concurrency setting is used to control throughput, messages that arrive faster than the available function instances can process them cause the function to report throttle errors. This is the expected behavior.

Making work asynchronous introduces new challenges in tracking the state of processes. Consider combining this pattern with the Asynchronous Request Reply pattern as one approach to communicating state to consumers.

Lambda has hard limits on scaling SQS consumers:

  • maximum 10 messages per batch
  • maximum 1000 concurrent batches
  • maximum increase of 60 provisioned function instances per minute (up to the configured function concurrency limit)
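These limits translate into rough capacity arithmetic. The sketch below takes the figures exactly as listed above and estimates a throughput ceiling and the time needed to scale up; the average batch duration is an input you would need to measure, and current AWS quotas should be checked before relying on the constants.

```python
import math

# Limits as listed in this document; verify against current AWS quotas.
MAX_BATCH_SIZE = 10
MAX_CONCURRENT_BATCHES = 1000
SCALE_UP_PER_MINUTE = 60


def max_messages_per_second(concurrency, batch_size, seconds_per_batch):
    """Upper bound on throughput for a given consumer configuration."""
    concurrency = min(concurrency, MAX_CONCURRENT_BATCHES)
    batch_size = min(batch_size, MAX_BATCH_SIZE)
    return concurrency * batch_size / seconds_per_batch


def minutes_to_reach(target_concurrency, starting_concurrency=0):
    """Minutes needed to scale up to the target concurrency."""
    target = min(target_concurrency, MAX_CONCURRENT_BATCHES)
    gap = max(0, target - starting_concurrency)
    return math.ceil(gap / SCALE_UP_PER_MINUTE)
```

For example, 5 concurrent instances each taking 2 seconds per batch of 10 messages give a ceiling of 25 messages per second, and scaling from zero to the full 1000 concurrent batches takes about 17 minutes at 60 instances per minute.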

Make sure the DLQ is configured on the source queue, not on the Lambda function. A DLQ on the Lambda function only catches failures of asynchronous invocations of that function. A DLQ on the source queue is needed to catch messages that fail processing.

Cost Profile

  • API Gateway: Request
  • API Gateway: Data Transfer
  • Lambda: Compute Time x Memory
  • CloudWatch: Log Data Ingestion