Decouple requests and protect downstream services using a queue to enable asynchronous processing
Spikes in demand can put increased pressure on a service, and the resulting latency and errors can cascade through the services connected to it.
Services may be connecting and coordinating legacy systems that simply cannot scale in the same way as systems provisioned using auto-scaling services like Lambda, DynamoDB or appropriately configured EC2. We may be operating with hard commercial limits on capacity, such as a budget for AWS costs or a third-party platform that enforces a contractual constraint on throughput. Finally, even auto-scaling systems take time to respond to sudden increases in demand.
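To make the idea concrete, here is a minimal sketch of a queue sitting between callers and a capacity-limited service. It is an illustration under stated assumptions, not a production design: the in-memory queue.Queue stands in for a durable queue service, and the names run_load_leveling, downstream and max_per_second are hypothetical, chosen only for this example.

```python
import queue
import threading
import time


def run_load_leveling(burst_size, max_per_second):
    """Accept a burst of requests at once, then drain it at a capped rate."""
    requests = queue.Queue()   # buffer between the callers and the downstream service
    handled = []               # (request_id, seconds since start) per processed request
    min_interval = 1.0 / max_per_second
    start = time.monotonic()

    def downstream(request_id):
        # Hypothetical stand-in for the legacy or rate-limited service.
        handled.append((request_id, time.monotonic() - start))

    def worker():
        while True:
            request_id = requests.get()
            if request_id is None:    # sentinel: no more work
                break
            downstream(request_id)
            time.sleep(min_interval)  # pace calls so downstream is never overrun

    consumer = threading.Thread(target=worker)
    consumer.start()

    # The spike: every request is accepted immediately, without waiting
    # for the downstream service to process it.
    for request_id in range(burst_size):
        requests.put(request_id)
    accepted_after = time.monotonic() - start

    requests.put(None)
    consumer.join()
    return accepted_after, handled


if __name__ == "__main__":
    accepted_after, handled = run_load_leveling(burst_size=10, max_per_second=50)
    print(f"burst accepted after {accepted_after:.4f}s")
    print(f"last request handled after {handled[-1][1]:.4f}s")
```

The callers finish almost immediately, while the worker spreads the same work over a longer period at a rate the downstream service can sustain. In a real deployment the queue would need to be durable and the worker would need retry and failure handling, both of which this sketch leaves out.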
Throttling can be found under Queue-Based Load Leveling.