Sunday, November 15, 2020

in-place merge sort (sort of)

So I just saw this discussion https://stackoverflow.com/questions/2571049/how-to-sort-in-place-using-the-merge-sort-algorithm and I think the top-voted answer is brilliant. Felt like I have found a rare gem and want to share it to the rest of the world.

The problem statement is: "How do you perform in-place merge sort?".

In-place more or less means requiring O(1) extra space. However, in this post, I'm going to describe a variant which requires O(log N) extra space and takes O(N log^2 N) to run, since the final algorithm will still be using recursions. I think it's a good trade-off, since it still highlights the main ideas and the code is still simple.

We'll arrive at the algorithm by a series of false starts.

Let's start with the ordinary merge sort:

sort(A[], i, j):
    m = (i + j) / 2
    sort(A[], i, m)
    sort(A[], m+1, j)
    merge(A[], i, m, m+1, j)

Or in plain English: to sort, you sort the first half, then sort the second half, and then merge the two sorted list into one sorted list.

The merge operation is very simple, just like merging 2 sorted decks of cards: repeatedly take one from the top of each pile, then place the smaller one on the third pile.

This "third" pile is usually represented as a temporary array, large enough to hold the result of the merge, which means O(N) extra space.

Here comes the challenge: Can we do better than this in terms of extra space required?

If you have never thought of this problem before, I recommend you thinking about it a little bit (or a lot) before I spoil it to you.


OK.


The first thing that we may try to do is to right away try to merge the two sorted halves. That's pretty hard.

So there comes the brilliant idea #1 from the Stack Overflow answer: what if we leave some unsorted regions which we can use to store our temporary merge operations?

For example, say we now only aim to sort half of the array, leaving the other half unsorted:
[  <------ 1/2 unsorted ------->   |   <------- 1/2 to be sorted -------> ]

Then we can use the unsorted region to store our merge operation by swapping sorted items with unsorted items.

Here's an illustration:
2 5 | 1 4 | A B C D |

[A, B, C, D] is the unsorted items, while [2, 5] and [1, 4] are the sorted lists to be merged. Then the merging operation, done step by step, will look as follows:

2 5 | 1 4 | A B C D |   // swap A and 1.
^     ^     ^
2 5 | A 41 B C D |   // swap B and 2
^       ^     ^
B 5 | A 4 | 1 2 C D |   // swap C and 4
  ^     ^       ^
B 5 | A C | 1 2 4 D |   // swap D and 5
  ^       ^       ^
B D | A C | 1 2 4 5 |   // Done
    ^     ^         ^

We end up with [B, D, A, C] unsorted on the left side, and [1, 2, 4, 5] sorted on the right side.

With this idea alone, you would be able to sort half of the array. What about the other half of the array? Well, we can do the same too! So schematically it will look like this:
[ <--- 1/4 unsorted ----> | <---- 1/4 sorted ---> | <--- 1/2 sorted ----> ]

However at this point, what we want is to merge the 1/4 sorted and 1/2 sorted parts, but we only have 1/4 unsorted buffer which is clearly not enough to contain 1/4 + 1/2 = 3/4 of the items.

Here comes brilliant idea #2, which I will call "in-place merging with empty slots".

Say we 2 sorted lists of size M and N. For example, say M = 3 and N = 5, as shown in the following illustration:

2 4 7 | A B C 1 3 5 6 9 |

Here, [2, 4, 7] is a sorted list of size M = 3, and [1, 3, 5, 6, 9] is a sorted of size N = 5. 
A, B, C are the empty slots, since M = 3, there are 3 of them. We name them A, B, C just to help keeping track of them. In reality, we don't care they order of the empty slots.

How shall we proceed in merging them? Follow this and you will get it:

3 6 7 | A B C 1 2 5 8 9 |   // swap A and 1
^       ^     ^
3 6 7 | 1 B C A 2 5 8 9 |   // swap B and 2
^         ^     ^
3 6 7 | 1 2 C A B 5 8 9 |   // swap C and 3
^           ^     ^
C 6 7 | 1 2 3 A B 5 8 9 |   // swap A and 5
  ^           ^   ^
C 6 7 | 1 2 3 5 B A 8 9 |   // swap B and 6
  ^             ^   ^
C B 7 | 1 2 3 5 6 A 8 9 |   // swap A and 7
    ^             ^ ^
C B A | 1 2 3 5 6 7 8 9 |   // Done (kind of)
      ^             ^
                    ^

It might be harder to understand because the empty slots are intermixed with the sorted list. However it has very intuitive feel to it. As the merging process goes on, the empty slots "bubble up" and one by one they "pop".

Now, how would we apply this idea to the previous problem of merging 1/4 sorted list to 1/2 sorted list? Here's how. What if we re-arrange the order we do things, and instead end up with the 1/4 unsorted array in between them, as follows:
[ <--- 1/4 sorted ----> | <-- 1/4 unsorted --> | <----------------- 1/2 sorted --------------->]

Then we can pretend that the 1/4 unsorted items are "empty slots" which will bubble up and get popped with the items in the 1/4 sorted array, using our "in-place merging with empty slots" demonstrated above!

We'll then end up with the following:
[ <--- 1/4 unsorted ---> | <---------------------- 3/4 sorted ------------------------> ]

We can then repeat the process on the 1/4 unsorted to get 1/8 sorted, 1/8 unsorted, 3/4 sorted, then perform the "in-place merging with empty slots", and so on until we have 1/8, 1/16, 1/32 unsorted region and so on.

When will this end? It ends when the unsorted region is small enough, e.g. < 10 elements, where we can simply perform an insertion sort on the entire array once, which will be of the order O(N) since insertion sort is linear on "almost sorted" arrays.

Here's the code (disclaimer: might be wrong)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <stdio.h>
#define N 1000000
int a[N];
void swap(int i,int j){
    int c=a[i];
    a[i]=a[j];
    a[j]=c;
}
void merge(int a[],int x,int y,int m,int n,int w) {
    while(x<=y&&m<=n){
        swap(w++,a[x]<a[m]?x++:m++);
    }
    while(x<=y){
        swap(w++,x++);
    }
    while(m<=n){
        swap(w++,m++);
    }
}
void insertion_sort(int a[],int i,int j){
    int m,n;
    for(m=i;m<=j;++m){
        for(n=m;n>i&&a[n]<a[n-1];--n){
            swap(n,n-1);
        }
    }
}
void sort(int a[],int i,int j) {
    int m=j+1,n=j,w,x;
    while(j-i>1){
        x=(i+j+1)/2-1;
        w=(i+j)/2+1;
        sort(a,i,x);
        merge(a,i,x,m,n,w);
        m=w;
        j=w-1;
    }
    insertion_sort(a,i,n);
}
int main() {
    int n,i;
    scanf("%d",&n);
    for(i=0;i<n;++i)scanf("%d",&a[i]);
    sort(a,0,n-1);
    printf("%d\n",n);
    for(i=0;i<n;++i)printf("%d\n",a[i]);
    return 0;
}

Now, analysis.

Extra space?
Well, extra space needed is O(log N) since the implementation still does a recursion, which requires stack to keep track of the recursion, and the recursion can only go O(log N) deep. 

Run-time complexity?
More difficult to see. First of all, the merge operation is O(N). What about sort? Let T(n) be the run-time complexity of sort. Then T(n) = sum of [O(n) + T(k)] for k = n/2, n/4, n/8, ... 
Since we call merge only at most O(log n) times, the merge operations take overall O(n log n) time.
So it reduces to T(n) = O(n log n) + T(n/2) + T(n/4) + T(n/8) + ...
Which reduces to T(n) = O(n log n) + 2 T(n/2), which reduces to T(n) = O(n log^2 n).

Are you not happy with the fact that this runs slower than O(N log N) merge sort? Then here's a challenge for you: How would you improve the above implementation so that it really runs in O(N log N) time? Hint: we haven't exploited brilliant idea #1 in the code yet. Enjoy!

7 comments:

  1. This is an interesting approach to in-place merge sort! Using O(log N) extra space while still preserving the fundamental merge sort idea is a great trade-off. I’m excited to see how the algorithm is structured!
    Industrial Dust Cleaner
    Centralized Dust Collector

    ReplyDelete
  2. I love how the explanation starts with breaking down the false starts. It's always good to explore why certain methods don't work before landing on a viable solution.
    You can also explore
    fume scrubber
    Dust Collector India

    ReplyDelete
  3. The complexity of this algorithm (O(N log^2 N)) is something to keep in mind. While it's not the most optimal, it’s a nice compromise between space efficiency and time complexity. I’ll definitely try implementing this approach.
    Industrial Dust Removal India
    Cyclone Dust Collector India

    ReplyDelete
  4. I’m curious if this approach could be further optimized for specific types of data or use cases. Have you tested this with larger datasets?

    Shrink Wrapping Machine Supplier
    Centrifugal Blower in Delhi

    ReplyDelete
  5. Great explanation! In-place sorting is always challenging, and this algorithm provides a solid middle ground. It's nice to see an approach that keeps the code relatively simple while improving space usage.

    Pulse Jet Bag Filter India
    Centrifugal Blower in Manufacturer

    ReplyDelete
  6. I really appreciate how you’ve broken down the core concepts and offered a reasonable trade-off between time and space complexity. I’ve been struggling with in-place merge sort for a while, and this seems like a good solution to experiment with!
    Invest in Brands
    Dust Collector

    ReplyDelete
  7. The use of recursion is tricky for in-place algorithms. Even though it doesn't achieve O(1) space, the fact that it reduces extra space to O(log N) makes it still pretty efficient for many practical applications.
    Centrifugal Blowers India
    Shrink Packing Machine Supplier

    ReplyDelete