Balanced Binary Search Trees
Here we take notes of balanced binary search trees. To be more specific, we will put more details in the so-called left leaning red-black tree.
Usually, a red-black tree is thought to be extremely hard to implement. There are even a famous lawsuit concerns about it. Here we do not talk about the traditional red-black tree which is invented in the 1980s, but we are going to talk about a newly invented variant of it which is quite easy to implement and use.
To beginning, we first introduce the preliminary binary search tree which is not guaranteed to be balanced.
Below are a skeleton of functions such BST should have(to be simpler, we just use int
as data type:
class Node {
Node left = null;
Node right = null;
Integer key;
Integer val;
public Node(Integer key, Integer val) {
this.key = key;
this.val = val;
}
}
public class BST {
Node root = null;
public Integer get(Integer key) {
return get(root, key);
}
public void put(Integer key, Integer val) {
root = put(root, key, val);
}
private Integer get(Node h, Integer key) {
if (h == null) return null;
else if (key < h.key) return get(h.left, key);
else if (key > h.key) return get(h.right, key);
else return h.val;
}
private Node put(Node h, Integer key, Integer val) {
if (h == null) return new Node(key, val);
else if (key < h.key) h.left = put(h.left, key, val);
else if (key > h.key) h.right = put(h.right, key, val);
else h.val = val;
return h;
}
}
Note in this implementation, we do not add codes for deletion which we will discuss later.
And when we put new data, we always set pointer to left or right subtree and root to the results
of put(Node root, Integer key, Integer val)
function to make the implementation more clear and simpler.
Current implementation's defects are obvious. The search tree may become to deep or even degenerate to
a chain which lead to linear searching and inserting time cost. Also it is volnerable to StackOverFlowError
.
To solve this question. we will introduce red-black tree.
AAccording to Wikipedia, a red-black tree is a binary search tree that has an extra bit for its color which can be black or red. It has the following properties:
- The root is black
- All leaves (which is
null
) are all black - If a node is red, both of its children are black
- Any path from any node to its descendant
null
leaves has the same number of black nodes
From the fourth property we know that the time cost to search the tree at most cost time where is the number of total elements in the tree. Let's assume there are black nodes from root to leaves, then there are at most red nodes in a path from root to leaves. So there are at most nodes in total in a path from root to leaves. Also the shortest path has nodes. As every path has black nodes, we can induct that is at most , thus the longest path from root to leaves is at most which is .
To correctly implement the original red-black tree is extremely hard. But fortunately, one of the authors of red-black tree himself invented a new kind of red-black tree which is easy to code. It is called left leaning red-black tree. This variant is invented at 2008.
The idea of left leaning red-black tree is quite simple: we just forbid red nodes to exist as the right child of any black node. In other words, red nodes can only appear as left child now. This limit reduced cases that need to handle when inserting or deleting a node.
The search operation is the same as the original, the problem exists in the insertion. At first, we let all new inserted node to be red node, then as to keep the red-black tree's left leaning balanced property, we have to do some extra operations. There are several different cases:
- we have a red node as right child, which violate the left leaning property. In this case, we need to perform a rotate left operation.
- we have two connected red nodes, that is we have a node that both its left and left of left child are red. At this time, we have to perform a rotate right operation.
- we have a node whose both left and right child are red. Such node may comes from a right rotation from the second case or new node inserting. As for this case, we need flip colors of two child nodes and current node.
We update the node definition and add a new function named isRed
to check a node's color as we want null
node to be black. The rotate left, rotate right and flip color operations are easy to implement too.
class Node {
Node left = null;
Node right = null;
Integer key;
Integer val;
boolean isRed = true;
public Node(Integer key, Integer val) {
this.key = key;
this.val = val;
}
}
public class RedBlackBST {
private boolean isRed(Node n) {
return !(n == null) && n.isRed;
}
private Node rotateLeft(Node h) {
Node x = h.right;
h.right = x.left;
x.left = h;
x.isRed = h.isRed;
h.isRed = true;
return x;
}
private Node rotateRight(Node h) {
Node x = h.left;
h.left = x.right;
x.right = h;
x.isRed = h.isRed;
h.isRed = true;
return x;
}
private void flipColor(Node h) {
h.isRed = true;
h.left.isRed = false;
h.right.isRed = false;
}
/* ... omitted ... */
}
As you see, to get a more simpler implementation of put operation, we let rotateLeft
and rotateRight
function returns the root of subtree after rotation, so we can modify the original put function as:
public class RedBlackBST {
/* ... omitted ... */
public void put(Integer key, Integer val) {
root = put(root, key, val);
root.isRed = false;
}
private Node put(Node h, Integer key, Integer val) {
if (h == null) return new Node(key, val);
else if (key < h.key) h.left = put(h.left, key, val);
else if (key > h.key) h.right = put(h.right, key, val);
else h.val = val;
// Adjust colors and structures
if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
if (isRed(h.left) && isRed(h.right)) flipColor(h);
return h;
}
}
Note that we need to keep root
of the whole tree to be black all the time. So after each put we always set
root.isRed
to false.
As to compare left leaning red black tree to preliminary binary search tree, we test by sequentially inserting
1000000 values. The preliminary binary search tree just raised a StackOverflowError
while our red-black tree
runs successfully in about 5.76
seconds. After reduce the key range from 1000000 to 10000, the preliminary
implementation ran for about 1.24
seconds while red black tree finished in 0.12
seconds. The time cost
is reduced quite a lot.
Delete a key
Delete a key is a little complex even in preliminary binary search tree. We can first think about
the way to delete the node with minimum or maximum key of subtree. We also use recursive way here.
To find the lowest key, we follow on pointers to left child until we meet a node whose left node
is null
, then we remove that node. Following code shows how to do it:
private Node deleteMin(Node x) {
if (x.left == null) return x.right;
x.left = deleteMin(x.left);
return x;
}
The implementation of deleteMax
is similar. Now we make use of this method to enables removing
a node whose both two children are not null. The idea is simple. We replace the node with the node
with minimum key in its right subtree and then use deleteMin
to delete that min node.(Or we replace
it with the node with maximum key in its left subtree)
To achieve this, we may need another function to get the node with minimum key first. Here we use loop instead of recursion to do this.
private Node min(Node x) {
while (x.left != null) {
x = x.left;
}
return x;
}
Then, along with deletion method of nodes which either left or right node is null, we can provide the delete function:
private Node delete(Node x, Integer key) {
if (x == null) return null;
if (key < x.key) delete(x.left, key);
else if (key > x.key) delete(x.right, key);
else {
if (x.left == null) return x.right;
else if (x.right == null) return x.left;
Node t = x;
x = min(t.right);
x.right = deleteMin(t.right);
x.left = t.left;
}
return x;
}
The deletion in red-black tree is in similar idea but is more complex as we need to keep the balance of tree and subtrees while recursively deletion. For the details of this aspect please refer to the book Algorithm.
Select and Rank
In any binary search tree, if we keep the size of subtree in each node, we will be going to perform select and rank operation fast. The select operation returns the kth least key in the tree and rank operation returns a key's rank in the tree.
The implememtation is like:
private int size(Node x) {
if (x == null) return 0;
else return x.size;
}
public Integer select(int k) {
Node x = select(root, k);
if (x != null) return x.key;
else return x;
}
private Node select(Node x, int k) {
// k starts from zero
if (x == null) return null;
int t = size(x.left);
if (t > k) return select(x.left, k);
else (t < k) return select(x.right, k - t - 1);
else return x;
}
and,
public int rank(int key) {
return rank(root, key);
}
private int rank(Node x, int key) {
if (x == null) return 0;
if (key < x.key) return rank(x.left, key);
else if (key > x.key) return size(x.left) + 1 + rank(x.right, key);
else return size(x.left);
}
Range query
We can also perform range query on the tree. For example, if we'd like to return keys between lo
and hi
inclusive,
we can use the following code:
public Iterable<integer> keys(Integer lo, Integer hi) {
Queue<Integer> queue = new Queue<Integer>();
keys(root, queue, lo, hi);
return queue;
}
private void keys(Node x, Queue<Integer> queue, Integer lo, Integer hi) {
if (x == null) return;
if (x.key > lo) keys(x.left, queue, lo, hi);
if (x.key >= lo && x.key <= hi) queue.enqueue(x.key);
if (x.key < hi) keys(x.right, queue, lo, hi);
}