Scaling S3: Navigating Request Rates, Throughput, and S3TA
Running S3 at any scale, from archiving petabytes of data to powering a global video-streaming service, requires predictable performance. Although S3 scales horizontally, performance is limited by two key constraints:
- Request rate limits: affected by how many internal partitions your keys span
- Network throughput: based on client bandwidth, distance, and concurrency
To maintain reliable systems, developers must understand and account for these limitations. Today, we’ll discuss request rate limits and techniques to scale them using prefixes. We’ll also dive into how S3 Transfer Acceleration and multipart uploads can improve throughput.
Scaling Request Rates with Prefixes
Let’s start with a closer look at request rate throttling.
S3 scales by dividing its key index across partitions, and object keys are designed to be partition-aware: each prefix maps to a specific partition. Partitions, however, have request limits, with a baseline rate per prefix of:
- ~3,500 writes per second for PUT, COPY, POST, and DELETE
- ~5,500 reads per second for GET and HEAD
Prefixes determine partitions
Partition behavior in S3 is driven by object keys. For instance, a wildlife video workload may separate content using key prefixes like:
birds/pigeon/…
birds/robin/…
birds/owl/…
In this setup, each prefix maps to a separate partition. Because the rate limit is enforced at the prefix level, distributing traffic evenly across three prefixes triples the baseline capacity to ~10,500 writes per second and ~16,500 reads per second across the entire dataset.
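To make this concrete, here is a minimal sketch (assuming boto3 as the client library and a hypothetical bucket named wildlife-videos) of writing clips under the three species prefixes so that no single prefix absorbs all of the traffic:

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "wildlife-videos"  # hypothetical bucket name

def upload_clip(species: str, filename: str, body: bytes) -> str:
    # Keys under birds/pigeon/, birds/robin/, and birds/owl/ land on separate
    # partitions, so spreading uploads evenly across species multiplies capacity.
    key = f"birds/{species}/{filename}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    return key

# Example: upload_clip("owl", "owl_night_flight.mp4", open("clip.mp4", "rb").read())
```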
AWS also partitions data automatically. If request volume for a single prefix exceeds its baseline capacity, S3 recognizes the increased load and divides the prefix into multiple internal partitions to increase capacity. This process is fast but not instantaneous, so users may briefly encounter 503 Slow Down responses.
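When those 503 Slow Down responses do appear, clients should back off and retry rather than fail. One option, sketched below using boto3's built-in adaptive retry mode (the bucket and key are placeholders), is to configure retries on the client:

```python
import boto3
from botocore.config import Config

# Adaptive retry mode automatically backs off on throttling responses
# such as 503 Slow Down, retrying up to max_attempts times.
s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "adaptive"}))

s3.put_object(Bucket="wildlife-videos", Key="birds/robin/robin.mp4", Body=b"...")
```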
Since there is no limit on the number of prefixes in a bucket, balancing workloads across multiple prefixes is a simple way to stay below the per-prefix rate limits.
Increasing Throughput with S3 Transfer Acceleration
Requests are only one part of the equation. After designing your S3 buckets to handle high request volumes, the next hurdle to address is throughput.
AWS does not specify throughput numbers for S3 the same way it does for request rates because throughput depends on multiple variables: client bandwidth, latency between regions, object size, concurrency, and other factors beyond AWS’s control. To help mitigate transfer bottlenecks, AWS offers S3 Transfer Acceleration (S3TA).
How S3TA Works
S3TA enhances upload and download performance by directing each transfer through the nearest AWS edge location.
For an object request such as https://birds.s3-accelerate.amazonaws.com/pigeon/pigeon.mp4, the request first hits the closest AWS edge location. It then travels across AWS’s private network to the correct S3 bucket, avoiding the public internet’s inherent latency for faster transfers.
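As a rough sketch with boto3 (reusing the bucket name from the example URL above), acceleration is enabled once per bucket, and then clients opt in by pointing the SDK at the accelerate endpoint:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# One-time setup: enable Transfer Acceleration on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="birds",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Per-client: send requests via birds.s3-accelerate.amazonaws.com.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.download_file("birds", "pigeon/pigeon.mp4", "pigeon.mp4")
```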
The latency improvement is impressive, with AWS reporting latency reductions of 1.5x-10x. However, this performance boost comes with an additional fee of approximately $0.04 per gigabyte (for both incoming and outgoing data).
Accelerating Large File Transfers with Multi-Part Uploads
To maximize throughput for very large objects, you can use S3 Transfer Acceleration with multi-part uploads.
AWS benchmarks indicate that a 485 MB file upload over the public internet using a single PUT request takes roughly 72 seconds. With S3TA, the same upload takes only 43 seconds, a roughly 40% improvement. Together with multi-part uploads, this time drops further to only 28 seconds, almost 2.6x faster than the original.
Multi-part uploads are ideal for files over 100 MB; for files larger than 5 GB they are required, since a single PUT can upload an object of at most 5 GB.
This strategy offers several advantages:
- Parallelization: Objects are broken into independent parts that can be uploaded in parallel.
- Resilience: Only the failed part is retried, avoiding the need to restart the entire upload of a large file.
- Efficiency with S3TA: S3TA routes each part to the nearest AWS edge location for consistent high throughput.
By combining multi-part uploads with S3TA, you can fully leverage concurrent network streams for maximum performance.
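A minimal sketch of this combination with boto3's managed transfer (the file name, bucket, and part sizes are illustrative assumptions) splits the file into parts and uploads them concurrently over the accelerate endpoint:

```python
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

# Client that routes requests through the S3TA accelerate endpoint.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

# Multipart settings: anything over 100 MB is split into 64 MB parts,
# with up to 8 parts uploaded in parallel. Values are illustrative.
transfer_config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file(
    "owl_documentary.mp4",   # hypothetical local file
    "birds",                 # bucket from the earlier example
    "owl/owl_documentary.mp4",
    Config=transfer_config,
)
```

The transfer manager retries failed parts individually, so a dropped connection does not force the whole upload to restart.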
Common Use Cases for S3TA
Global Content Distribution
Imagine opening an app to stream a live NBA game or catch up on your favorite Netflix series. Without S3TA, video requests have to traverse the public internet, resulting in slow startup times and possible playback disruptions.
Before S3TA, viewers faced startup delays of 5-8 seconds before videos could begin playing and could encounter frequent buffering throughout the stream.
But after S3TA, requests are routed through the closest AWS edge location and then via AWS’s private network. This results in a seamless, high-quality streaming experience, even in areas with unreliable connectivity.
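One hedged sketch of how a streaming backend might take advantage of this (the bucket and key are carried over from the earlier example) is to hand players presigned URLs generated from an accelerate-configured client, so playback requests enter AWS at the nearest edge:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

# Presigned URL pointing at the accelerate endpoint, valid for one hour.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "birds", "Key": "pigeon/pigeon.mp4"},
    ExpiresIn=3600,
)
print(url)
```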
Disaster Recovery Replication
In environments with tight recovery-point objectives (RPO), any delay in data backup translates into potential data loss. Financial services firms (e.g., Capital One) that handle millions of transactions every hour cannot risk gaps in backups. Without S3TA, uploading a 1TB snapshot takes 36 minutes over the public internet—that’s 36 minutes of unprotected data. With S3TA, the same snapshot completes in 9 minutes, cutting exposure by 75%. With multi-part uploads, the 1TB snapshot is broken into smaller parts and uploaded concurrently, bringing the total time down to just 6 minutes and shrinking the unprotected window by more than 80% compared to the original.
Key Takeaways
To achieve the best performance for workloads, developers need to carefully manage both request rates and throughput limits:
- Balance traffic across multiple key prefixes to spread load evenly, avoid 503 Slow Down errors, and allow S3 to autoscale partitions.
- Activate S3TA to route data through the closest AWS edge location and across AWS’s private network, cutting down latency for users around the world.
- Leverage multi‑part uploads to efficiently transfer large objects by uploading parts in parallel and automatically retrying any failed parts.
Collectively, these strategies transform S3 from a basic storage solution into scalable, high-performance infrastructure.