Sunday, November 15, 2020

in-place merge sort (sort of)

So I just saw this discussion https://stackoverflow.com/questions/2571049/how-to-sort-in-place-using-the-merge-sort-algorithm and I think the top-voted answer is brilliant. I felt like I had found a rare gem, and I want to share it with the rest of the world.

The problem statement is: "How do you perform in-place merge sort?".

In-place more or less means requiring O(1) extra space. In this post, however, I'm going to describe a variant that requires O(log N) extra space and O(N log^2 N) time, since the final algorithm still uses recursion. I think that's a good trade-off: it still highlights the main ideas, and the code stays simple.

We'll arrive at the algorithm by a series of false starts.

Let's start with the ordinary merge sort:

sort(A[], i, j):
    m = (i + j) / 2
    sort(A[], i, m)
    sort(A[], m+1, j)
    merge(A[], i, m, m+1, j)

Or in plain English: to sort, you sort the first half, sort the second half, and then merge the two sorted lists into one sorted list.

The merge operation is very simple, just like merging 2 sorted decks of cards: repeatedly compare the top card of each pile and move the smaller one onto a third pile.

This "third" pile is usually represented as a temporary array, large enough to hold the result of the merge, which means O(N) extra space.
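To make the baseline concrete, here is what that buffered merge looks like in C (a quick sketch with names of my own choosing, not code from the post):

```c
#include <assert.h>
#include <string.h>

/* Textbook merge: combine the sorted runs a[i..m] and a[m+1..j] into a
   temporary buffer tmp of size at least j - i + 1, then copy back.
   The buffer is exactly the O(N) extra space we'd like to get rid of. */
void merge_with_buffer(int a[], int i, int m, int j, int tmp[]) {
    int x = i, y = m + 1, w = 0;
    while (x <= m && y <= j)
        tmp[w++] = a[x] <= a[y] ? a[x++] : a[y++];
    while (x <= m) tmp[w++] = a[x++];   /* drain the left run */
    while (y <= j) tmp[w++] = a[y++];   /* drain the right run */
    memcpy(a + i, tmp, (size_t)w * sizeof *tmp);
}
```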

Here comes the challenge: Can we do better than this in terms of extra space required?

If you have never thought about this problem before, I recommend thinking about it a little (or a lot) before I spoil it for you.


OK.


The first thing we might try is to merge the two sorted halves right away, in place. That's pretty hard.

So there comes the brilliant idea #1 from the Stack Overflow answer: what if we leave some unsorted regions which we can use to store our temporary merge operations?

For example, say we now only aim to sort half of the array, leaving the other half unsorted:
[  <------ 1/2 unsorted ------->   |   <------- 1/2 to be sorted -------> ]

Then we can use the unsorted region to store our merge operation by swapping sorted items with unsorted items.

Here's an illustration:
2 5 | 1 4 | A B C D |

[A, B, C, D] are the unsorted items (their actual values don't matter), while [2, 5] and [1, 4] are the sorted lists to be merged. Step by step, the merge looks as follows (each comment describes the swap that produces the next line):

2 5 | 1 4 | A B C D |   // swap A and 1.
^     ^     ^
2 5 | A 4 | 1 B C D |   // swap B and 2
^       ^     ^
B 5 | A 4 | 1 2 C D |   // swap C and 4
  ^     ^       ^
B 5 | A C | 1 2 4 D |   // swap D and 5
  ^       ^       ^
B D | A C | 1 2 4 5 |   // Done
    ^     ^         ^

We end up with [B, D, A, C] unsorted on the left side, and [1, 2, 4, 5] sorted on the right side.
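In C, this swap-based merge is nearly identical to the buffered one; the only difference is that we swap into the buffer region instead of copying into a separate array (a sketch of my own, mirroring the illustration above):

```c
#include <assert.h>

/* Merge the sorted runs a[x..y] and a[m..n] into the region starting at
   a[w] by swapping. The (arbitrary) old contents of the target region
   end up scattered where the two runs used to be -- and that's fine,
   since that region was unsorted anyway. */
void merge_by_swapping(int a[], int x, int y, int m, int n, int w) {
    while (x <= y && m <= n) {
        int s = (a[x] <= a[m]) ? x++ : m++;  /* index of the smaller head */
        int t = a[w]; a[w] = a[s]; a[s] = t;
        ++w;
    }
    while (x <= y) { int t = a[w]; a[w] = a[x]; a[x] = t; ++w; ++x; }
    while (m <= n) { int t = a[w]; a[w] = a[m]; a[m] = t; ++w; ++m; }
}
```

On the example above, with 90..93 standing in for A..D, merging [2, 5] and [1, 4] into the last four slots leaves [1, 2, 4, 5] there and some permutation of 90..93 in front.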

With this idea alone, you would be able to sort half of the array. What about the other half of the array? Well, we can do the same too! So schematically it will look like this:
[ <--- 1/4 unsorted ----> | <---- 1/4 sorted ---> | <--- 1/2 sorted ----> ]

However, at this point, what we want is to merge the 1/4 sorted and 1/2 sorted parts, but we only have a 1/4 unsorted buffer, which is clearly not enough to hold 1/4 + 1/2 = 3/4 of the items.

Here comes brilliant idea #2, which I will call "in-place merging with empty slots".

Say we have 2 sorted lists of sizes M and N. For example, say M = 3 and N = 5, as shown in the following illustration:

3 6 7 | A B C 1 2 5 8 9 |

Here, [3, 6, 7] is a sorted list of size M = 3, and [1, 2, 5, 8, 9] is a sorted list of size N = 5.
A, B, C are the empty slots; since M = 3, there are 3 of them. We name them A, B, C just to help keep track of them. In reality, we don't care about the order of the empty slots.

How shall we proceed in merging them? Follow this and you will get it:

3 6 7 | A B C 1 2 5 8 9 |   // swap A and 1
^       ^     ^
3 6 7 | 1 B C A 2 5 8 9 |   // swap B and 2
^         ^     ^
3 6 7 | 1 2 C A B 5 8 9 |   // swap C and 3
^           ^     ^
C 6 7 | 1 2 3 A B 5 8 9 |   // swap A and 5
  ^           ^   ^
C 6 7 | 1 2 3 5 B A 8 9 |   // swap B and 6
  ^             ^   ^
C B 7 | 1 2 3 5 6 A 8 9 |   // swap A and 7
    ^             ^ ^
C B A | 1 2 3 5 6 7 8 9 |   // Done (kind of)
      ^             ^

It might be harder to follow because the empty slots get intermixed with the sorted list. However, it has a very intuitive feel to it: as the merging process goes on, the empty slots "bubble up" and, one by one, they "pop" out to the front.

Now, how would we apply this idea to the previous problem of merging the 1/4 sorted list with the 1/2 sorted list? Here's how: what if we re-arrange the order in which we do things, so that the 1/4 unsorted region ends up between them, as follows:
[ <--- 1/4 sorted ----> | <-- 1/4 unsorted --> | <----------------- 1/2 sorted --------------->]

Then we can pretend that the 1/4 unsorted items are "empty slots" which will bubble up and get popped with the items in the 1/4 sorted array, using our "in-place merging with empty slots" demonstrated above!

We'll then end up with the following:
[ <--- 1/4 unsorted ---> | <---------------------- 3/4 sorted ------------------------> ]

We can then repeat the process on the 1/4 unsorted region: sort half of it to get [1/8 sorted | 1/8 unsorted | 3/4 sorted], perform the "in-place merging with empty slots" again, and so on, shrinking the unsorted region to 1/8, 1/16, 1/32 of the array, and so on.

When will this end? It ends when the unsorted region is small enough, e.g. fewer than 10 elements, at which point we simply run one insertion sort over the entire array. That final pass costs O(N), since insertion sort runs in O(N + number of inversions), and with only a handful of misplaced elements an "almost sorted" array has O(N) inversions.

Here's the code (disclaimer: might be wrong)
#include <stdio.h>

#define N 1000000

int a[N];

void swap(int a[], int i, int j) {
    int c = a[i];
    a[i] = a[j];
    a[j] = c;
}

/* Merge the sorted runs a[x..y] and a[m..n] into the region starting at w
   by swapping, so the old contents of that region end up where the runs were. */
void merge(int a[], int x, int y, int m, int n, int w) {
    while (x <= y && m <= n)
        swap(a, w++, a[x] < a[m] ? x++ : m++);
    while (x <= y)
        swap(a, w++, x++);
    while (m <= n)
        swap(a, w++, m++);
}

void insertion_sort(int a[], int i, int j) {
    int m, n;
    for (m = i; m <= j; ++m)
        for (n = m; n > i && a[n] < a[n - 1]; --n)
            swap(a, n, n - 1);
}

/* Sort a[i..j]: keep sorting half of the unsorted region [i..j] and
   merging it into the sorted region to its right; finish the last few
   elements with one insertion-sort pass over the whole range. */
void sort(int a[], int i, int j) {
    int m = j + 1, n = j, w, x;   /* [m..n] is the sorted region */
    while (j - i > 1) {
        x = (i + j + 1) / 2 - 1;  /* sort this half of [i..j] ... */
        w = (i + j) / 2 + 1;      /* ... then merge it, writing at w */
        sort(a, i, x);
        merge(a, i, x, m, n, w);
        m = w;
        j = w - 1;
    }
    insertion_sort(a, i, n);
}

int main(void) {
    int n, i;
    scanf("%d", &n);
    for (i = 0; i < n; ++i)
        scanf("%d", &a[i]);
    sort(a, 0, n - 1);
    printf("%d\n", n);
    for (i = 0; i < n; ++i)
        printf("%d\n", a[i]);
    return 0;
}

Now, analysis.

Extra space?
Well, the extra space needed is O(log N): the implementation still recurses, which requires a stack to keep track of the recursion, and the recursion can only go O(log N) deep.

Run-time complexity?
More difficult to see. First of all, a single merge operation is O(n). What about sort? Let T(n) be the run-time of sort. Each pass of the loop merges O(n) elements and recursively sorts half of what remains, so T(n) is the sum of [O(n) + T(k)] for k = n/2, n/4, n/8, ...
Since we call merge at most O(log n) times, the merge operations take O(n log n) time overall.
So it reduces to T(n) = O(n log n) + T(n/2) + T(n/4) + T(n/8) + ...
Since T grows at least linearly, T(n/4) + T(n/8) + ... <= T(n/2), so T(n) <= O(n log n) + 2 T(n/2), which solves to T(n) = O(n log^2 n).
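Unrolling that last recurrence makes the bound explicit:

```latex
T(n) \le c\,n\log n + 2\,T(n/2)
     \le c\,n\log n + c\,n\log\frac{n}{2} + 4\,T(n/4)
     \le \cdots
     \le c\,n\left(\log n + \log\frac{n}{2} + \log\frac{n}{4} + \cdots\right)
     \le c\,n\log^2 n = O(n\log^2 n).
```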

Are you not happy with the fact that this runs slower than O(N log N) merge sort? Then here's a challenge for you: How would you improve the above implementation so that it really runs in O(N log N) time? Hint: we haven't exploited brilliant idea #1 in the code yet. Enjoy!

Sunday, June 2, 2019

Merging heaps in O(log N)?

Can you merge two heaps in O(log N)? Turns out if they are leftist heaps, you can!
But what is a leftist heap, and how is it different from a binary heap?

Recall that a binary heap is both complete and balanced, so it looks something like this:
               1
             /   \
            2     3
           / \   /
          5   6 7
             Fig 1
It's clear that it supports O(log N) operations.

In contrast, this is what leftist heap looks like:
               1
              / \
             3   2
            /   /
           5   7
          /
         6
             Fig 2
It's neither complete nor balanced. Yet, it supports O(log N) operations too!

How does it work?
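The post stops at the question, so here is a sketch of the usual answer (my own C code, not the author's). Merge only ever walks down the right spines, and the leftist invariant -- the rank (length of the rightmost path) of the left child is at least that of the right child -- keeps every right spine at O(log N) nodes:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int key, rank;                 /* rank = length of the rightmost path */
    struct Node *left, *right;
} Node;

Node *make(int key) {
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->rank = 1;
    n->left = n->right = NULL;
    return n;
}

/* Merge two leftist min-heaps in O(log N): recurse down the right
   spines, then swap children wherever the leftist invariant
   rank(left) >= rank(right) would be violated. */
Node *merge(Node *h1, Node *h2) {
    if (!h1) return h2;
    if (!h2) return h1;
    if (h2->key < h1->key) { Node *t = h1; h1 = h2; h2 = t; }
    h1->right = merge(h1->right, h2);
    /* restore the invariant by swapping children if needed */
    if (!h1->left || h1->left->rank < h1->right->rank) {
        Node *t = h1->left; h1->left = h1->right; h1->right = t;
    }
    h1->rank = (h1->right ? h1->right->rank : 0) + 1;
    return h1;
}
```

Insert is then a merge with a singleton, and delete-min is a merge of the root's two children (freeing the old root is omitted in this sketch).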

Thursday, June 1, 2017

Codeforces Round #416 (Div. 2) C. Vladik and Memorable Trip

Problem Statement:
Codeforces Round #416 (Div. 2) C. Vladik and Memorable Trip

Summary:
An integer array A of size n <= 5000 has entries ranging from 1 to 5000. We pick disjoint segments of the array such that, for each chosen segment S:
1. If i is in S, then any j such that a[j] == a[i] must also be in S.
2. The score of S is computed as the XOR of its elements.
The goal is to maximise the sum of the scores of the chosen segments.

Solution:
I like this problem. This is solvable using a dynamic programming approach.

Codeforces Round #416 (Div. 2) E. Vladik and Entertaining Flags

Problem Statement:
Codeforces Round #416 (Div. 2) E. Vladik and Entertaining Flags
Summary:
Given an integer matrix of size m x n with m <= 10, n <= 10^5, find the number of connected components in the segment [l, r] of the matrix (i.e. the rectangle (0, l) -> (m-1, r)). Two cells belong to the same connected component if they are adjacent (share an edge) and have the same value.
There are q <= 10^5 such queries.

Solution:
A cool problem which can be solved using a Segment Tree. This is made possible by the following observation: consider two adjacent segments [l, m] and [m+1, r]. If we know the number of components in each segment, we can walk along the column where the two segments meet and check whether any adjacent components merge into one.

Sunday, May 28, 2017

Codeforces Round #415 (Div. 2) E. Find a car [or Div. 1 C]

Problem Statement:
Codeforces Round #415 (Div. 2) E. Find a car

Paraphrase:
An integer matrix M is defined as:
M[i][j] = the minimum integer x >= 1 such that:
- M[i][k] != x for all k < j
- M[k][j] != x for all k < i

Example:
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1

Goal: Given a rectangular block in M {(x1, y1), (x2, y2)} and an integer K, compute the sum of elements in that rectangle, but exclude elements whose value is larger than K.
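The definition is easy to sanity-check with a brute-force sketch (my own throwaway C, 0-indexed, not part of the problem): for each cell, just take the least x >= 1 missing from the cells to its left and above. It reproduces the 4 x 4 example.

```c
#include <assert.h>

enum { SZ = 4 };

int M[SZ][SZ];

/* Fill M by the definition: M[i][j] is the smallest x >= 1 that appears
   neither earlier in row i nor earlier in column j. */
void build(void) {
    int i, j, k, x;
    for (i = 0; i < SZ; ++i)
        for (j = 0; j < SZ; ++j) {
            int used[2 * SZ + 2] = {0};
            for (k = 0; k < j; ++k) used[M[i][k]] = 1;
            for (k = 0; k < i; ++k) used[M[k][j]] = 1;
            for (x = 1; used[x]; ++x)
                ;
            M[i][j] = x;
        }
}
```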


Thursday, March 16, 2017

UVa 434 - Matty's Blocks

Problem Statement:
UVa 434 - Matty's Blocks

Summary:
This problem pertains to a matrix of size K x K, where each entry is a non-negative integer.
Let the maximum values on each row be max_row[0 .. K-1], and the maximum values on each column be max_col[0 .. K - 1].

E.g.          
[ 1 4 2 5 3 ]  --> 5
[ 8 1 7 6 0 ]  --> 8
[ 3 3 4 1 5 ]  --> 5  max_row
[ 6 5 0 3 2 ]  --> 6
[ 2 4 1 6 6 ]  --> 6
  v v v v v
  8 5 7 6 5
  max_col

Our job is to fill in the matrix to fulfil those requirements. In general there is more than one way to do it. Compute:
1. Minimum sum of entries possible.
2. Maximum sum of entries possible.

Note that no constraint is placed on K in the original problem statement. It can be small, it can be very large.


Monday, February 27, 2017

a bit of algo: LZW Compression

LZW compression is a magic show. I am left awe-struck as the algorithm compresses and uncompresses bits of data. The magic is not in the fact that it does compress the data, no, that's the boring part. The magic is in the simplicity and elegance of it.

Compression
Given a string "bananababa", the algorithm starts with a code table. We initialize the code table with the following entries:

0 - a
1 - b
2 - n

This is called the initial code table.

LZW then iterates through the letters in the string while also updating the code table. The algorithm is simply:

buffer := empty;
for c in str:
    if buffer + c is in code_table:
        buffer := buffer + c;
    else:
        output code_table[buffer];
        add (next_index_number++, buffer + c) into code_table;
        buffer := c;
output code_table[buffer];

That's it! So with our example "bananababa", we get:
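Tracing that by hand is a good exercise; here is a small C sketch of the same loop (the table representation and function names are my own) that can be used to check the answer. On "bananababa" with the initial table {0: a, 1: b, 2: n}, it emits the codes 1 0 2 4 0 3 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXCODES 64
#define MAXLEN 16

/* code table: code number -> string */
char table[MAXCODES][MAXLEN];
int ncodes;

int lookup(const char *s) {
    int i;
    for (i = 0; i < ncodes; ++i)
        if (strcmp(table[i], s) == 0)
            return i;
    return -1;
}

/* LZW-compress str (every single character must already be in the
   initial table); writes the emitted codes into out, returns the count. */
int compress(const char *str, int out[]) {
    char buffer[MAXLEN] = "", cand[MAXLEN];
    int nout = 0;
    const char *p;
    for (p = str; *p; ++p) {
        snprintf(cand, sizeof cand, "%s%c", buffer, *p);
        if (lookup(cand) >= 0) {
            strcpy(buffer, cand);            /* keep extending the match */
        } else {
            out[nout++] = lookup(buffer);    /* emit code for buffer */
            strcpy(table[ncodes++], cand);   /* learn buffer + c */
            buffer[0] = *p;                  /* restart from c */
            buffer[1] = '\0';
        }
    }
    out[nout++] = lookup(buffer);
    return nout;
}
```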