I’ve read multiple system design books, taken courses, and watched several playlists on YouTube. One of the most common and foundational system design questions that almost everyone starts with is the Bitly problem: designing a URL shortener that converts long URLs into shorter, more manageable links.

Over time, this question started to feel dry and repetitive. I often found myself skipping it whenever I saw it again.

Recently, though, I came across a very interesting variation of the problem. I tried designing a solution for it, and that is what inspired me to write this post.

The Challenge

The challenge is to design a URL shortener that supports 1 trillion URLs.

At first glance, this sounds like a simple scaling problem, but it actually changes a lot of things and breaks many of the naive assumptions that usually work for the standard Bitly-style system design question.

Even if we assume a base-62 encoding (0-9, a-z, A-Z), a 7-character string gives us roughly 3.5 trillion possible combinations.

That sounds like plenty for 1 trillion URLs, but things get complicated once we start thinking about distribution, storage, and collision guarantees.

Also, 62^7 is only the raw theoretical namespace. In practice, the usable space is smaller because some codes will be reserved for custom aliases, blocked words, internal testing, and occasionally wasted ranges from crashed allocators. The margin is still comfortable, but it is not infinite.

Constraints

  1. The shortened URL can have at most 7 characters.
  2. The system must guarantee unique URLs with no collisions.
  3. The system must support 1 trillion URLs over its entire lifetime.

Before diving into the trillion-URL challenge, let’s first revisit the standard approach used to solve the traditional URL shortener design problem.

The Traditional Approach

Functional Requirements

Core requirements

  • Users should be able to submit a long URL and receive a shortened version.
  • Optionally, users should be able to specify a custom alias for their shortened URL, e.g. www.short.ly/my-custom-alias.
  • Optionally, users should be able to specify an expiration date for their shortened URL.
  • Users should be able to access the original URL by using the shortened URL.

Below the line (out of scope)

  • User authentication and account management.
  • Analytics on link clicks, such as click counts or geographic data.

Non-Functional Requirements

Core requirements

  • The system should ensure uniqueness for short codes, where each short code maps to exactly one long URL.
  • Redirection should occur with minimal delay, ideally under 100 ms.
  • The system should be reliable and available 99.99% of the time, with availability prioritized over consistency.
  • The system should scale to support 1 billion shortened URLs and 100 million DAU.

Below the line (out of scope)

  • Real-time consistency for analytics.
  • Advanced security features such as spam detection and malicious URL filtering.

High-Level Design

We’ll go through the functional requirements one by one and design a system that satisfies them.

1. Submitting a long URL and receiving a short URL

When a user submits a long URL, the client sends a POST request to /urls with the long URL, custom alias, and expiration date.

The flow looks like this:

  1. The primary server receives the request and validates the long URL format using libraries like is-url or simple validation logic.
  2. Optionally, we can check whether this exact long URL was already shortened and return the existing short code as a deduplication optimization. In practice, though, most URL shorteners allow multiple short codes for the same long URL, since different users may want different expiration dates, separate analytics, or different custom aliases.
  3. If the URL is valid, we generate a short code and store the mapping.

Generating the short code

The standard answer is to use a hash function that produces enough randomness to make collisions unlikely. A hash function like MD5 takes an input and produces a deterministic fixed-size output. That means the same long URL would always map to the same hash, which is useful if you want deterministic code generation.

But the system still needs to store the redirect mapping, because a hash is not reversible and we still need metadata such as expiration dates, ownership, and custom aliases.

It is also not desirable if you need multiple short codes for the same URL.

Hash outputs also have high entropy, which makes them appear random. We can encode that output using base-62 and take the first 7 characters as our shortcode.

Here, encoding simply means converting binary hash output into a sequence of readable characters from a chosen alphabet so it can be used as a short, URL-friendly code.

This gives us approximately 62^7 = 3.52 trillion possible values, which is a large namespace.
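As a minimal sketch (assuming MD5 and the alphabet ordering below, both of which are implementation choices), the truncated-hash scheme looks like this:

```python
import hashlib

# One possible base-62 alphabet; the ordering is a design choice.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def truncated_hash_code(long_url: str, length: int = 7) -> str:
    """MD5 the URL, base-62 encode the digest, keep the first `length` chars."""
    n = int.from_bytes(hashlib.md5(long_url.encode()).digest(), "big")
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))[:length]
```

The same URL always produces the same code, which gives the determinism described above, and the truncation to 7 characters is exactly what makes collisions possible.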

But a large namespace does not make random or truncated-hash generation safe. If the code space has size |S| and n codes are already in use, the probability that the next random code collides is n / |S|. That means you still need retries and database checks to enforce uniqueness. In a space of this size, even a system that creates around 1 billion links would expect on the order of 10^5 colliding pairs unless it performs uniqueness checks.

At more ordinary scale, a production-friendlier baseline is to stop relying on random hashes and use a centralized counter instead. Redis is a good fit here because INCR is atomic and hands out unique integers efficiently. We can base-62-encode each integer into a short code and guarantee uniqueness without a retry loop.
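A self-contained sketch of that counter flow, with an in-memory stub standing in for the Redis client (a real deployment would call incr on a redis-py connection instead):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

class CounterStub:
    """Stand-in for a Redis client so this sketch runs anywhere.
    In production this would be redis.Redis(...), whose INCR is atomic."""
    def __init__(self):
        self._values = {}
    def incr(self, key: str) -> int:
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

def shorten(client, long_url: str) -> str:
    n = client.incr("url:counter")   # unique, monotonically increasing integer
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    code = "".join(reversed(digits))
    # ...store the (code -> long_url) mapping in the database here...
    return code
```

Because every caller receives a distinct integer, no two URLs can ever receive the same code, and no retry loop is needed.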

Even if links expire, we usually do not recycle short codes. Reusing codes creates ugly edge cases with stale caches, delayed clients, and old analytics data, so it is safer to treat the namespace as append-only over the system’s lifetime.

Once we have the short URL, we can insert it into the database along with the long URL, optional custom alias, and expiration date. Finally, we return the shortened URL to the client.

2. Accessing the original URL from the shortened URL

Once the short URL is live, users can use it to reach the original URL. Importantly, that shortened URL exists under a domain we own.

When a user accesses a shortened URL, the flow looks like this:

  1. The browser sends a GET request with the short code, for example GET /abc123.
  2. The primary server looks up the short code in the database.
  3. If the short code exists and has not expired, the server retrieves the long URL. If it has expired, the server returns 410 Gone.
  4. The server responds with an HTTP redirect, usually either 301 or 302.

For a URL shortener, a 302 redirect is often preferred because:

  • It gives us more control over the redirection process, allowing us to update or expire links later.
  • It prevents browsers from aggressively caching the redirect.
  • It still allows us to track click statistics, even though analytics are out of scope for this design.
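The lookup-and-redirect decision above can be sketched as a pure function; the db mapping and the (long_url, expires_at) tuple shape are illustrative stand-ins for the real storage layer:

```python
def redirect_response(short_code, db, now):
    """Decide the HTTP response for GET /<short_code>.
    `db` maps short codes to (long_url, expires_at-or-None) tuples."""
    entry = db.get(short_code)
    if entry is None:
        return 404, {}                       # unknown short code
    long_url, expires_at = entry
    if expires_at is not None and now >= expires_at:
        return 410, {}                       # 410 Gone: the link has expired
    return 302, {"Location": long_url}       # 302 keeps redirects under our control
```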

How do we make redirects fast?

A naive database lookup could devolve into a full table scan, which is clearly too slow.

A better baseline is to add an index, or simply make the shortened URL the primary key. That gives us indexed lookups and also enforces uniqueness.

The remaining problem is SSD IOPS. A single database instance would still struggle to keep up with heavy traffic, leading to slower response times and possible timeouts.

A much better solution is to place an in-memory cache like Redis or Memcached between the server and the database. Frequently accessed mappings from short code to long URL can live in memory.

The read path then becomes:

  • On a cache hit, return the long URL in milliseconds.
  • On a cache miss, query the database, populate the cache, and return the result.
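This cache-aside pattern is only a few lines; cache and db here are plain dict stand-ins for Redis and the database:

```python
def resolve(short_code, cache, db):
    """Cache-aside read path: check the cache first, fall back to the
    database on a miss, and populate the cache for subsequent reads."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url                  # cache hit: served from memory
    long_url = db.get(short_code)        # cache miss: go to the database
    if long_url is not None:
        cache[short_code] = long_url     # warm the cache for next time
    return long_url
```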

The difference in speed is significant:

  • Memory access time: about 100 nanoseconds (0.0001 ms)
  • SSD access time: about 0.1 ms
  • HDD access time: about 10 ms

That means memory access is roughly 1,000 times faster than SSD and 100,000 times faster than HDD.

In terms of operations per second:

  • Memory can support millions of reads per second.
  • SSDs can support roughly 100,000 IOPS.
  • HDDs typically support around 100 to 200 IOPS.

The only real challenge here is cache invalidation, which can get complicated when updates or deletions happen. In this system, though, the problem is smaller because shortened URLs are mostly read-heavy and rarely change.

The cache also needs time to warm up, so the first few requests for a URL may still hit the database. Since memory is limited, we also need to think carefully about cache size, eviction policies such as LRU, and which entries are worth storing.

3. Scaling the standard design to 1 billion URLs

Let’s do some rough sizing.

Each row in the database contains:

  • A short code, roughly 8 bytes
  • A long URL, roughly 100 bytes
  • creationTime, roughly 8 bytes
  • An optional custom alias, roughly 100 bytes
  • An expiration date, roughly 8 bytes

That totals roughly 225 bytes per row. If we round up to 500 bytes to account for metadata such as creator ID, analytics ID, and internal overhead, then 1 billion mappings would require:

500 bytes * 1 billion rows = 500 GB

That is still within the capabilities of modern SSDs, and if we need more headroom, we can shard data across multiple servers.

Reads are much more frequent than writes, so we can separate the system into reader and writer services and scale them independently. We can then add more server instances behind a load balancer to handle higher RPS without concentrating load on a single machine.

Here is the high-level difference between the two versions of the problem:

| Aspect | Standard shortener | Trillion-scale shortener |
| --- | --- | --- |
| ID generation | Centralized counter or sequence | Range-based allocation across many writers |
| Collision handling | Database checks are acceptable | Uniqueness must be generated up front |
| Storage | Single relational cluster can still work | Sharded distributed key-value storage |
| Read path | Cache in front of the database | Multi-region cache plus edge-aware reads |
| Public code shape | Sequential codes may be acceptable | Codes should be scrambled to prevent enumeration |

All of this works well for the standard problem. Now let’s go back to the trillion-URL version and look at why those answers stop working.

The core shift is from generating IDs with local randomness to treating the shortcode space as a globally allocated namespace.

Why the Usual Approaches Fail at 1 Trillion URLs

The constraints change the problem enough that several standard answers break down.

1. Truncated hashes stop being safe

The assumption that MD5 plus base-62 gives us enough entropy fails here. Once we truncate the hash to just 7 characters, we are throwing away most of the hash space, and the birthday paradox tells us that collisions become mathematically unavoidable.

The birthday paradox says that in a room of just 23 people, there is about a 50% chance that two people share the same birthday, even though there are 365 possible days.

That feels surprising because 23 is much smaller than 365.

The reason is that we are comparing many pairs, not just one.

The number of comparisons among k items is:

\[\frac{k(k-1)}{2}\]

So the probability of collision grows quadratically as the number of generated IDs increases.

After base-62 encoding, the total possible ID space is approximately 3.5 trillion. The rough intuition is that collisions start becoming noticeable around $\sqrt{N}$, but the more precise 50% threshold is:

\[k_{50} \approx \sqrt{2N \ln 2}\]

Where:

  • k_{50} is the point where the chance of at least one collision is about 50%
  • N is the size of the total ID space

For 7 base-62 characters:

\[N = 62^7 \approx 3.5 \times 10^{12}\]

So:

\[k_{50} \approx \sqrt{2 \cdot 62^7 \cdot \ln 2} \approx 2.21 \times 10^6\]

So after only about 2.2 million generated IDs, roughly 0.0002% of our 1 trillion target, the system already has a 50% chance of at least one collision. In this system, that is disastrous, because a user could be redirected to the wrong site.
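These numbers are easy to verify directly, using the standard exponential approximation of the birthday bound:

```python
import math

N = 62 ** 7                           # ~3.52 trillion possible 7-char codes
k50 = math.sqrt(2 * N * math.log(2))  # 50% collision threshold, ~2.21 million

def collision_probability(k, n=N):
    """Approximate P(at least one collision) after k random IDs in a space of n."""
    return 1.0 - math.exp(-k * (k - 1) / (2 * n))
```

At 1 billion random IDs, collision_probability(10**9) is indistinguishable from 1.0, which is why retry-on-collision turns from an edge case into the main workload.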

2. Retry-on-collision becomes too expensive

One way to patch the problem is to salt the URL and keep rehashing until you find a unique value.

That creates a loop like this:

hash -> DB check -> collision? -> rehash -> check again

At trillion-record scale, the database index will itself be several terabytes. Every collision check becomes a random lookup against that massive index, and many of those lookups will fall out of cache.

If you have heavy write traffic, you end up spending a large chunk of your compute budget just proving that a string has not been used before. In the worst case, that becomes extremely expensive and operationally ugly.

We need to stop searching for uniqueness and start generating uniqueness.

Generating Uniqueness Instead of Searching for It

The simplest way to guarantee uniqueness is to use a counter again. That mathematically guarantees a unique value for every URL.

But now we hit the next problem: we cannot hand users a giant integer because the short URL is constrained to at most 7 characters.

Turning a large integer into a 7-character code

The solution is to map the large integer into a base-62 string.

A base-62 alphabet can look like this:

  • 0-9 (10 characters)
  • a-z (26 characters)
  • A-Z (26 characters)

That gives us:

\[62 = 10 + 26 + 26\]

Here is a simple example. Assume our counter is 500.

\[500 / 62 = 8 \text{ remainder } 4\]

The remainder 4 maps to the character at index 4 in the alphabet, and the quotient 8 becomes the next digit in the conversion.

This gives us a one-to-one mapping where every integer in the usable range maps to a unique short string, and we can decode it back correctly if needed. There are no collision checks and no guesswork. The mapping itself is O(1).
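A sketch of that conversion, assuming the 0-9, a-z, A-Z alphabet in that order (the ordering itself is a free choice); with it, the worked example of 500 encodes to "84":

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62

def encode(n: int) -> str:
    """Convert a non-negative integer to its base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, BASE)       # remainder picks the next character
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def decode(s: str) -> int:
    """Reverse mapping: base-62 string back to the original integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```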

Avoiding a global counter bottleneck

A single counter is a disaster in a distributed system, because every writer would need to talk to the same coordinator.

To solve that, we introduce a coordinator service like ZooKeeper. It acts as a distributed, highly available source of truth and hands out ranges of IDs to each server.

For example:

  • Server A gets 1 to 1,000,000
  • Server B gets 1,000,001 to 2,000,000

Now each server can allocate IDs locally in memory by incrementing a local integer. That means:

  • No network calls for every URL creation
  • No coordination on every write
  • No locks
  • No database uniqueness checks

Once a server exhausts its range, it goes back to ZooKeeper and requests another one.

If Server A crashes, the worst case is that we lose some unused IDs from its range. That is fine. We have roughly 3.5 trillion total combinations and only need 1 trillion of them, so wasting some IDs is acceptable.
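The allocation loop can be sketched like this. The coordinator is faked with a process-local counter; a real system would make the ZooKeeper call inside _fetch_range, and all names here are illustrative:

```python
import itertools
import threading

RANGE_SIZE = 1_000_000

class RangeAllocator:
    """Per-server ID allocator. `_coordinator` is a process-local stand-in
    for ZooKeeper: it hands each caller the next non-overlapping range."""
    _coordinator = itertools.count(0)   # range index 0, 1, 2, ...

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0                   # empty range: forces a fetch first

    def _fetch_range(self):
        # In production this is a network call to the coordinator service.
        index = next(self._coordinator)
        self._next = index * RANGE_SIZE + 1
        self._end = self._next + RANGE_SIZE

    def next_id(self):
        with self._lock:
            if self._next >= self._end:   # range exhausted (or first call)
                self._fetch_range()
            n = self._next
            self._next += 1
            return n
```

Two allocators can now mint IDs concurrently with no shared state beyond the occasional range fetch, and a crashed allocator only forfeits the unused tail of its current range.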

This makes ID generation solid. We are no longer hoping collisions do not happen. We have mathematically eliminated them.

The Storage Wall

Now we get to the storage problem.

If each URL takes roughly 100 bytes, then 100 bytes × 1 trillion rows = 100 TB of raw URL data alone, and that is before timestamps, indexes, metadata, replication, and everything else.

That is not something a single PostgreSQL instance should handle. Even if it could, a deep B-tree index at this scale would translate into a cascade of disk lookups.

So the obvious answer is to distribute and shard the data across multiple nodes. For example, if 100 nodes each handle 5 TB, the problem becomes much more manageable.

Routing requests to the right shard

The next question is how to know which node holds which URL.

The simplest approach is modulo-based sharding. The routing rule can be as simple as:

shard = hash(short_url) % 100

That gives us O(1) routing without a central directory.

If we expect the number of shards to change frequently, true consistent hashing or rendezvous hashing is a better choice because it minimizes remapping when nodes are added or removed.
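Both routing rules fit in a few lines. One caveat: Python's built-in hash() is salted per process, so routing needs a stable digest instead; the node names below are illustrative:

```python
import hashlib

def stable_hash(s: str) -> int:
    # Built-in hash() is randomized per process; routing needs a stable digest.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def modulo_shard(short_code: str, num_shards: int = 100) -> int:
    """Simple modulo routing: O(1), but reshuffles most keys if num_shards changes."""
    return stable_hash(short_code) % num_shards

def rendezvous_shard(short_code: str, nodes):
    """Rendezvous (highest-random-weight) hashing: each key goes to the node
    with the highest combined score, so removing a node only moves its own keys."""
    return max(nodes, key=lambda node: stable_hash(node + "|" + short_code))
```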

Since this workload is mostly key-value lookups with no major joins or cross-node transactions, databases like Cassandra or DynamoDB are a much better fit than a monolithic relational database.

They are built for exactly this type of access pattern, and their LSM-tree storage engines handle write-heavy workloads much better while also simplifying sharding and replication.

The read path at trillion scale

The Pareto principle applies strongly here: a small fraction of URLs will receive most of the traffic.

So treating every URL equally is wasteful. Instead, we place a distributed cache such as Redis or Memcached in front of the storage layer.

The read path becomes:

  • Cache hit: return the URL in under a millisecond.
  • Cache miss: look up the correct shard in Cassandra, fetch the URL, store it in Redis, and then return the redirect.

Security, Global UX, and the Reductionist Mindset

There is one important issue that usually does not come up early in system design interviews: security.

Making the URLs unpredictable

If the system simply uses a visible counter, then the short URLs become predictable. If someone knows one URL, they can often guess the previous or next one.

That means an attacker could scrape the namespace, discover private links, or infer business intelligence like usage growth.

So even though we still want a counter under the hood, we need the outward-facing short codes to look random.

That is where a Feistel cipher helps. More precisely, we use it as a reversible permutation over the fixed ID space: the counter value is scrambled into another integer of the same size.

Every input still maps to a unique output, and we can still reverse it back to the original value. In other words, it behaves like format-preserving scrambling. But from the outside, the output looks random.

That gives us the best of both worlds: guaranteed uniqueness underneath and non-predictable public URLs on top.
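A minimal sketch of that idea: a few Feistel rounds over the smallest even-width power-of-two domain covering 62^7, with cycle walking to keep outputs inside the usable space. The round function and keys below are illustrative, not a vetted cipher:

```python
import hashlib

BITS = 42                     # smallest even bit-width with 2**42 > 62**7
HALF = BITS // 2
MASK = (1 << HALF) - 1
SPACE = 62 ** 7               # the usable short-code ID space
KEYS = [0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35]  # illustrative round keys

def _round(value, key):
    # Any pseudorandom function of (value, key) works as the Feistel round.
    digest = hashlib.sha256(f"{value}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big") & MASK

def permute(n):
    """One pass of a balanced Feistel network: a bijection on [0, 2**42)."""
    left, right = n >> HALF, n & MASK
    for key in KEYS:
        left, right = right, left ^ _round(right, key)
    return (left << HALF) | right

def scramble(counter):
    """Map a counter to a random-looking ID in the same space, via cycle
    walking: re-apply the permutation until the value lands back inside
    62**7. This terminates because permute is a bijection on a finite set."""
    n = permute(counter)
    while n >= SPACE:
        n = permute(n)
    return n
```

Because each Feistel step is reversible, the original counter can be recovered by running the rounds backwards, which is what makes this a permutation rather than a hash.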

Global reads and disaster scenarios

Now imagine a regional failure, or simply a user very far away from the region where the system is deployed.

If all redirects are served from, say, Northern Virginia, then a user in Tokyo has to make a full round trip to Virginia before being redirected. That can easily add around 300 ms, which is a poor user experience for something as simple as a redirect.

To avoid that, we can push reads and redirects closer to the user with a CDN like Cloudflare or Akamai, but that only helps if the short-code mapping is also available near the edge. That can be done by caching redirect responses, storing hot mappings in edge KV, or running edge compute backed by nearby replicas. Otherwise, the edge still has to call back to origin.

With edge-local data, the Tokyo user can be served from the Tokyo edge in something closer to 10 ms.

Writes are different. If a user in New York creates a link, it gets written to Cassandra and then replicated asynchronously to Tokyo.

That brings us to the CAP theorem.

Consistency, availability, and partition tolerance cannot all be maximized at the same time. In this case, we are willing to give up immediate consistency so we can preserve high availability and partition tolerance.

That means eventual consistency is acceptable. If a user in Tokyo cannot immediately access a link that was just created in New York because replication is still catching up, that is not a fatal problem. A refresh a few seconds later should resolve it.

So we are okay with eventual consistency, but we are not willing to compromise on availability or partition tolerance.

The one thing we absolutely cannot compromise on is uniqueness. That is the beauty of the range-manager approach. Even if New York and Tokyo are completely isolated from each other, as long as they were assigned non-overlapping ranges before the split, they can continue generating unique URLs independently with no risk of overlap.

Final Takeaway

Let’s step back and trace what we built here.

We did not approach this as a simple URL-shortening coding task. We approached it as a namespace-management problem.

  1. We determined the size of the namespace: 62^7, or about 3.5 trillion possible short codes.
  2. We reduced uniqueness to a sequential counter.
  3. We handled distributed coordination through range allocation with ZooKeeper.
  4. We scaled storage with sharded key-value data in Cassandra.
  5. We reduced read latency by caching the hot set with Redis.
  6. We addressed predictability and security with a Feistel cipher.

That is the reductionist mindset.

At this scale, you cannot brute-force your way through the problem with more code. You have to find the right abstraction that makes the problem manageable again.

If you walk into an interview and say, “I’ll use a hash and hope for the best,” you are thinking like a coder, which is fine. But if you talk about the physics of the ID space, explain why LSM trees outperform B-trees for this write-heavy profile, or discuss the birthday paradox and what it means for truncated hashes, you show a deeper understanding of how systems actually work.

That is what I mean when I say that understanding why a system fails is more important than knowing how to build it.

In the end, this is TinyURL at trillion scale: a simple-looking problem on the surface, but a masterclass in distributed systems underneath.