How to code
First, I prepare two HashMap to store short URL and original url. One is for storing short url as a key and original url as a value. Another is for the reverse. Next, how I create tiny url from original url. In my case, I create tiny url by random characters, such as
0-9, a-z, A-Z. And the size of tiny url is 10. If tiny url which I create is duplicate, I create it until the url is not duplicate.
How to design
In this problem of leet code, some additional questions is prepared. I pick up some questions of them.
How many unique identifiers possible?
In my case, I set that the size of tiny url is 10. And the number of characters is 26 + 26 + 10 = 62 because the letters of tiny url are alphabet (both uppercase and lowercase) and numbers. So the number of unique tiny urls is 6210 ≒ 1018. One billion is 109 and the population of the world is about 7 billion.
Estimate the maximum number of URLs a single machine can store.
The capacity depends on memory and disk size of the machine. I think about the capacity to store the urls. I assume the each number blow.
- The number of urls I have to store is
one million (10^6)
- The size of each url is
In this case, since the size of one character is about one byte, the size of url is about 100 byte. And there are one million urls, so the size is 108 byte = 100MB.
Java has 2GB memory so that there are enough capacity to store one million urls.
Estimate the maximum number of queries per second (QPS) for decoding a shortened URL in a single machine.
When we decode a shorted URL, split method and containsKey are executed. If the size of url is 100, this process takes O(102) time. In my java environment, O(102) process takes 102 nano time. Therefore one decode process takes 0.1 ms. So in a second, QPS is 107. (System.nanoTime())
How would you scale the service?
There are some solutions how to scale. When we use some machine to store data, we have to consider how would we distribute the data. If this service is used in all over the world, it’s effective to distribute each data by user location.
If you can enable caching, what would you cache and what’s the expiry time?
- Least Recently Used
- Least Frequently Used