8/180: Binary Search Tree And LRU Cache

This is day 8 of my 180 days challenge, and I am happy with my progress hope I will keep working on me like this. This challenge has been taken by myself to make learning as a habit.

Today I have gone through Binary Search Tree and LRU Cache of Data structure. I have picked these two topics as these are the interviewers favorite. So Today we are going to discuss these two topics.

Binary Search Tree

Tree (Graph theory)

A
/|\
B C D
|
E

We have access to the root node exclusively all other nodes can be accessed via the root node. The leaf nodes are the ones having no children.

A tree is a G(V, E) undirected graph in which any two vertices are connected by exactly one path or equivalently a connected acyclic undirected graph.

Linked Lists: It is another data structure so the aim is to be able to store items efficiently (insertion and removal operations). Arrays have a huge disadvantage; there may be holes in the data structure and we have to shift a lot of items this problem can be eliminated by linked lists.

We have access to the first node of the linked list other items can be accessed starting with this node.

Binary Search Tree:

12 — -layer 1 with 2⁰ node
/\
4 20 — -layer 2 with 2¹ nodes
/\
1 5 — — layer h with 2^h-1 nodes

Every node in the tree can have at most 2 children (left child and right child). Left child is smaller than the parent node. Right child is greater than the parent node. We can access the root node exclusively and all other nodes can be accessed via the root node.

Every decision can get rid of half of the data (Like with binary search). And this is how we can achieve O(logN) running time.

The height of a tree is the number of edges on the longest downwanrd path between the root and a leaf node. The number of layers the tree contains

2^h-1 = N

log2^h-1 = logN

h = logN+1

h = O(logN)

The logarithm O(log N) running time is valid only when the tree structure is balanced. We should keep height of tree at a minimum which is h = log N

The tree structure may become imbalanced which means the number of nodes significantly differ in the subtrees. If the tree is imbalanced so the h = logN relation is no more valid then the operations running time is no more O(logN) logarithmic

Imbalanced is a tree where the running time of operations can be reduced to even O(N) linear running time complexity. Balanced tree is a tree where the running time of operations are O(logN) always.

Binary search tree aims to store items efficiently. It keeps the keys in sorted order so that lookup and other operations can the principle of binary search with O(logN) running time. Each comparison allows the operations to skip over half of the tree, so that each operation takes time proportional to the logarithm of number of items stored in the tree. This is much better than O(N) the linear time required to find items by key is an unsorted array but slower than the corresponding operations on hashtables with O(1).

Maximum item in the BST is the right most leaf node.

Minimum item in the BST is the left most leaf node.

Remove a leaf node: Basically we just have to notify the parent that the child has been removed. The node will be deleted by the garbage collector.

Remove node with single child: Basically we just have to notify the parent that the left or right child has been changed.

Remove node with two children: successor : smallest item in the right subtree, predecessor: largest item in the left subtree. We swap predecessor with the node to be removed.

Tree traversal: means visiting every node of the binary search tree exactly once in O(N) linear running time.

  1. Pre order traversal -> root->left->right
  2. Post order traversal->left->right->root
  3. In order traversal->left->root->right : it gives the sorted order of elements.

LRU (Least Recently used Cache)

Naïve Approach: use a single hashtable and we achieve put and get operations in O(1) but the memory complexity is not favorable.

LRU Cache use doubly linked list to achieve this goal and we have to use hashtables as well to boost linked lists.

A ← → B← →C← →D← →E

A is head and E is tail.

We usually use age bits to track the items stored in the cache frequently used items have higher age bit values. When the cache is full we have to remove the least recently used item.

Other approach: items closer to the head are more important.

Problem: When we consider a given item (URL) that given item may be in the memory already. In this case we have to update the value. Of course first we have to find the given item: with linked list this operation takes O(N) time complexity. This is why we use HASHTABLES. We can reduce the O(N) running time to O(1).

Why to use doubly linked list->We need the pointers to the tail as well, when we remove the least significant item. If we do not have a pointer to the tail first we have to find the last item which can be done on O(N) when using singly linked list.

A(0)

We visit page A with id 0 (we use id as key in hashtable) in every insertion we check whether the item is present in the cache or not (if, yes we do an update)

B(1)← →A(0)

We visit page B with id 1

E(4)← →D(3)← →C(2)← →B(1)← →A(0)

Consider we have visited page C, D, E respectively and all these are stored in hashtable now.

We visit F(5), but now the cache is full, we will delete item from tail and insert at head.

F(5)← →E(4)← →D(3)← →C(2)← →B(1)

Now lets say we visit page D, we can find page D already in our hashtable, so we will remove item D and insert again at the head.

D(3)← →F(5)← →E(4)← →C(2)← →B(1)

This is how our least recently used cache work. I will write soon with implementation of all the two topics we discussed today.

I am Indian by birth, Punjabi by destiny. Humanity is my religion. Love to eat, travel, read books and my million dreams keep me alive.