2017-06-19

Design TinyURL in LeetCode

Tiny URL Design Cache queries per second QPS memory LeetCode

Problem

TinyUrl is a URL that a original URL is compressed. This problem is how should we design TinyURL services.

https://leetcode.com/problems/design-tinyurl/#/solutions Design TinyURL - LeetCode

How to code

First, I prepare two HashMap to store short URL and original url. One is for storing short url as a key and original url as a value. Another is for the reverse. Next, how I create tiny url from original url. In my case, I create tiny url by random characters, such as 0-9, a-z, A-Z. And the size of tiny url is 10. If tiny url which I create is duplicate, I create it until the url is not duplicate.

Source Code

How to design

In this problem of leet code, some additional questions is prepared. I pick up some questions of them.

How many unique identifiers possible?

In my case, I set that the size of tiny url is 10. And the number of characters is 26 + 26 + 10 = 62 because the letters of tiny url are alphabet (both uppercase and lowercase) and numbers. So the number of unique tiny urls is 62¹⁰ ≒ 10¹⁸. One billion is 10⁹ and the population of the world is about 7 billion.

Estimate the maximum number of URLs a single machine can store.

The capacity depends on memory and disk size of the machine. I think about the capacity to store the urls. I assume the each number blow.

The number of urls I have to store is one million (10^6)
The size of each url is 100 (10^2)

In this case, since the size of one character is about one byte, the size of url is about 100 byte. And there are one million urls, so the size is 10⁸ byte = 100MB.

Java has 2GB memory so that there are enough capacity to store one million urls.

Estimate the maximum number of queries per second (QPS) for decoding a shortened URL in a single machine.

When we decode a shorted URL, split method and containsKey are executed. If the size of url is 100, this process takes O(10²) time. In my java environment, O(10²) process takes 10² nano time. Therefore one decode process takes 0.1 ms. So in a second, QPS is 10⁷. (System.nanoTime())

How would you scale the service?

There are some solutions how to scale. When we use some machine to store data, we have to consider how would we distribute the data. If this service is used in all over the world, it’s effective to distribute each data by user location.

If you can enable caching, what would you cache and what’s the expiry time?

Least Recently Used
Least Frequently Used

2017-06-13

２分探索木からノードを削除する Delete Node in a BST in LeetCode

LeetCode Binary Search Tree Delete node

問題

2分探索木とint型のkeyが与えられる。BSTからkeyの要素を削除する問題。

leetcode.com

解法

以下のステップで解く

2分探索でkeyを見つける
削除するノードの右側のサブツリーの最小値を持つノードを削除するノードに置き換える。
右側のサブツリーから最小値のノードを削除する
(削除するノードのleft, rightのどちらかがnullの場合はnullではない方のサブツリーを削除するノードと置き換える)

f:id:atiek1121:20170613091118p:plain

ソースコード

2017-06-05

Perfect SquareかをO(logN)で判定する LeetCode

LeetCode Binary Search sqrt

問題

leetcode.com

解法

単純にfor文を用いた解法だとO(√N)となり間に合わない。 2文探索を用いることでO(logN)で解くことができる。

ソースコード

2017-05-30

【グラフ理論とネットワーク理論】Extended Complete Graph

Extended Complete Graph ネットワーク理論ネットワーク拡大完全グラフ

Connectivity and Damage Index

Connectivity

Connectivity express the reliability of the graph

Node Connectivity Cv(G) : Minimum number of nodes to be removed to make a graph unconnected.
Branch Connectivity Ce(G) : Minimum number of branches to be removed to make a graph unconnected.
Node connectivity = d → d - 1 node failures OK
Branch connectivity = d → d - 1 branch failures OK

Theorem1

Cv(G) ≦ Ce(G) ≦ δ(G) : δ = degree (the number of branches of one node)

Damage index

Damage index is an index of the increase of hops to connect in case of failure.

α-Node(Branch) Damage Index Iα(G) (Jα(G)) is defined as blow

Maximum diameter of a graph without α nodes(branches) - Diameter of the original graph G

Design Method of Extended Complete Graph

f:id:atiek1121:20170530094111p:plain

Extended Complete Graph : Complete graph Gd → Line conversion, m times → Extended complete graph.

L^m(Gd) = L(L(…(L(G))…)) (m times)

Ex. 1 f:id:atiek1121:20170530095406p:plain

	Number of nodes	Number of branched
Gd	d+1	(d+1)d
L(Gd)	(d+1)d	(d+1)d²
L²(Gd)	(d+1)d²	(d+1)d³

Comparison

	Complete graph	Extended Complete Graph
正則性 Regularity	d-Regular	d-Regular
Number of nodes `V`	d + 1	(d + 1)d^m
Number of branches `E`	(d + 1)d	(d+1)d^m+1
直径 Diameter k(G)	1	m+1
Node connectivity Cv(G)	d	d
Branch connectivity Ce(G)	d	d
次数 Degree δ(G)	d	d
節点罹障度 Node damage index Iα(G)	0	≦ m
枝罹障度 Branch damage index Jα(G)	1	≦ m+1

Throughput ratio in case of failure

k(G)/(Jα(G) + k(G)) → 0.5 for both Gd and L^m(Gd)

2017-05-24

サイズNの配列からM個の要素をランダムに取り出す

乱数シャッフル shuffle random

問題

サイズNの配列からM個の要素を取り出し新たな配列を返すメソッドを実装する。各要素が選ばれる確率は全て等確率になるようにする。

解き方

はじめに元々の配列の先頭からM個取り出して配列を作る。後は、インデックスがm以上の要素と以下の要素を交換するチャンスを等しく与えてあげることで、等確率をつくることができる。

ソースコード

2017-05-24

Least Recently Used (LRU) cache を実装する

Cracking The Coding Interview LeetCode HashMap LinkedList Least Recently Used LRU cache キャッシュ

問題

LRUキャッシュとはキーとバリューをマッピングするデータ構造で、キャパティを超えたら最後に使われたキャッシュが削除されるという挙動をする。

leetcode.com

解法

HashMapと双方向連結リストを利用することで実装することができる。リストを用いることで、使われた順番を管理することができる。

ソースコード

2017-05-23

合計の入れ替え Cracking The Coding Interview

Cracking The Coding Interview HashSet

問題

２つの整数配列が与えられる。それぞれの配列から１つずつ要素を決め、それらを交換した時に２つの配列の要素の合計値が一致するようなペアを見つける問題。

入力: {4, 1, 2, 1, 1, 2} と {3, 6, 3, 3}
出力: {1, 3}

解法

全列挙の解法 O(A*B)

i, jを添え字にしてforループを２つネストすることで２つの配列における全ての要素の組合せを得られる。i, jをスワップさせた時に合計値が一致すればそれらを出力する。この解法の計算量は(O(A*B)) (２つの配列のサイズをそれぞれA, Bとする)

O(A + B)の解法

実は、配列Aから交換する要素が決定されれば、配列B内の交換するべき要素の候補は決定する。配列A, Bの総和をsumA, sumBとし、それぞれの配列内の交換するべき要素をa, bと置くと。 sumA - a + b = sumB - b + aが成り立つ。あとはaだけを全列挙すればbが決定されるので、計算して得たbが配列B内に存在するかをチェックしてあげれば良い。この解法の計算量はO(A + B)。

マツシタのお勉強

welcome to my engineering blog

Problem

How to code

Source Code

How to design

How many unique identifiers possible?

Estimate the maximum number of URLs a single machine can store.

Estimate the maximum number of queries per second (QPS) for decoding a shortened URL in a single machine.

How would you scale the service?

If you can enable caching, what would you cache and what’s the expiry time?

問題

解法

問題

解法

Connectivity and Damage Index

Connectivity

Theorem1

Damage index

Design Method of Extended Complete Graph

Comparison

Throughput ratio in case of failure

問題

解き方

問題

解法

問題

解法

全列挙の解法 O(A*B)

O(A + B)の解法