Sunday, June 2, 2019

Merging heaps in O(log N)?

Can you merge two heaps in O(log N)? Turns out if they are leftist heaps, you can!
But what is a leftist heap, and how is it different from a binary heap?

Recall that a binary heap is a complete binary tree, and hence balanced. It looks something like this:
               1
             /   \
            2     3
           / \   /
          5   6 7
             Fig 1
It's clear that it supports O(log N) operations.

In contrast, this is what a leftist heap looks like:
               1
              / \
             3   2
            /   /
           5   7
          /
         6
             Fig 2
It's neither complete nor balanced. Yet, it supports O(log N) operations too!

How does it work?


Idea behind Leftist Heap


Each node in a leftist heap has a 'rank'. The rank of a node is the length of the shortest path from that node down to an empty (NULL) descendant; by definition, the rank of an empty (NULL) node is 0. Hence in the example tree above (Fig 2), the rank of node '1' is 2, while the rank of node '3' is 1.

Each node maintains the following properties:
1. the value of the current node is no larger than the values of its children (the usual min-heap property)
2. the rank of its left child is greater than or equal to the rank of its right child (the leftist property!)

The leftist property has a consequence: the rank of a node is exactly 1 + the rank of its right child, because the shortest path to an empty node always goes right. Do you see it?

So, how does the leftist property help us support O(log N) insertion and deletion operations? Introducing, the "merge" operation!

Merging Leftist Heap


As it turns out, we can merge two leftist heaps efficiently:
1. First we compare the min values (roots) of the two heaps. Call the one with the smaller min value A, and the other B.
2. We recursively merge B with the right child of A; call the result M.
3. Then we attach M as the new right child of A.
4. Wait, what if M has a larger rank than the left child of A? Well, we simply swap the two children (and update A's rank). This restores the leftist property of heap A.

Here is an illustration:

               1         MERGE WITH           4
              / \                            / \
             3   2                          7   5
            /   /                          /
           5   7                          12
          /
         6
              A                               B

Here we merge B with the right child of A. Let's focus on those:

          2    MERGE WITH         4
         /                       / \
        7                       7   5
                               /
                              12

In this case, we merge heap '4' with the right child of '2'. Since the right child is
empty, we simply attach '4' there:

                2
               / \
              7   4
                 / \
                7   5
               /
              12

But since the left child of '2' has a rank of 1 while its right child has a rank of 2, we must swap the children:
                2
               / \
              4   7
             / \
            7   5
           /
          12
               M
Next we attach this to the right child of the original heap A:
               1 
              / \ 
             3   2
            /   / \
           5   4   7
          /   / \
         6   7   5
            /
           12
Finally, since the left child of '1' has rank 1 while its right child has rank 2, we must do a final swap and we get the result of the merge as:

                1 
              /   \ 
             2     3
            / \   /
           4  7  5 
          / \   /
         7  5  6
        /
       12

The time complexity of the merge operation is O(rank of the heap), where the rank of a heap means the rank of its root node.

But what is the upper bound of the rank, in relation to the size of the heap? It turns out that a leftist heap with rank R has at least \(2^R-1\) nodes. This means \(2^R-1 \le N\), i.e. \(R \le \log_2 (N+1)\), so \(R = O(\log N)\).

Proof: By induction on S(R), the minimum size of a leftist heap with rank R.
Base case: S(1). The smallest such leftist heap is a single leaf node, hence S(1) = 1 = \(2^1-1\).
Inductive case: Suppose \(S(k) = 2^k-1\) for all k <= R. What is S(R+1)?
Consider the right subtree of the root. It is a leftist heap with rank exactly R, so its smallest possible size is S(R) = \(2^R-1\).
How about the left subtree of the root? By the leftist property it has rank >= R, so its smallest possible size is also S(R) = \(2^R-1\), attained when its rank is exactly R.
Hence S(R+1) = \((2^R-1) + (2^R-1) + 1 = 2^{R+1}-1\), proving the inductive case.

Hence the time complexity of the merge operation is O(log N).


Insertion and Deletion on Leftist Heap


Insertion is just merging the existing heap with a new single-node heap. Deletion of the root is done by merging its left and right children.

Hence these operations are O(log N)!

Here is the full implementation:

#include <cstdio>
#include <cstdlib>
#include <queue>
#include <cassert>
using namespace std;

// NOTE: This is an implementation of leftist heap for educational purposes
// only. It does not deal with any memory management, and it pays 0 respect
// for software eng practices. Read and use at your own risk.

// Leftist heap node definition.
struct Node {
  int value;
  int rank;
  Node *left;
  Node *right;
};

// Private functions.
Node *create_leaf_node(int value) {
  return new Node{value, 1, nullptr, nullptr};
}

int get_rank(Node *node) {
  if (!node) return 0;
  return node->rank;
}

// Creates a new node with the given children arranged to satisfy the
// leftist property (higher-rank child on the left).
Node *create_node(int value, Node *heap_a, Node *heap_b) {
  if (get_rank(heap_a) >= get_rank(heap_b)) {
    return new Node{value, get_rank(heap_b)+1, heap_a, heap_b};
  }
  return new Node{value, get_rank(heap_a)+1, heap_b, heap_a};
}

// Merges leftist heap 'first' with 'second' and returns the pointer to the
// merged heap. Destroys both 'first' and 'second'.
Node *merge(Node *first, Node *second) {
  if (!first) {
    return second;
  }
  if (!second) {
    return first;
  }
  if (first->value > second->value) {
    return merge(second, first);
  }
  return create_node(first->value, first->left, merge(first->right, second));
}

// Debug interface.
void print(Node *heap) {
  if (!heap) {
    printf("E");
    return;
  }
  printf("(%d[%d]", heap->value, heap->rank);
  print(heap->left);
  print(heap->right);
  printf(")");
}

// Public interfaces.
Node *push(Node *heap, int value) {
  return merge(heap, create_leaf_node(value));
}

int top(Node *heap) {
  return heap->value;
}

Node *pop(Node *heap) {
  return merge(heap->left, heap->right);
}

int main() {
  int N = 500000;
  Node *heap = nullptr;
  std::priority_queue<int> pq;
  for (int i = 0; i < N; ++i) {
    int v = rand() % N;
    pq.push(-v);
    heap = push(heap, v);
  }
  while (N-- > 0) {
    assert(top(heap) == -pq.top());
    heap = pop(heap);
    pq.pop();
  }
  return 0;
}
