The problem statement is: "How do you perform in-place merge sort?".
In-place more or less means requiring O(1) extra space. However, in this post, I'm going to describe a variant which requires O(log N) extra space and takes O(N log^2 N) time to run, since the final algorithm still uses recursion. I think it's a good trade-off: it still highlights the main ideas, and the code stays simple.
We'll arrive at the algorithm by a series of false starts.
Let's start with the ordinary merge sort:
sort(A[], i, j):
    if i >= j: return
    m = (i + j) / 2
    sort(A[], i, m)
    sort(A[], m+1, j)
    merge(A[], i, m, m+1, j)
Or in plain English: to sort, you sort the first half, then sort the second half, and then merge the two sorted lists into one sorted list.
The merge operation is very simple, just like merging 2 sorted decks of cards: repeatedly compare the top card of each pile, then place the smaller one on the third pile.
This "third" pile is usually represented as a temporary array, large enough to hold the result of the merge, which means O(N) extra space.
Here comes the challenge: Can we do better than this in terms of extra space required?
If you have never thought about this problem before, I recommend thinking about it a little bit (or a lot) before I spoil it for you.
OK.
The first thing we might try is to merge the two sorted halves in place right away. That turns out to be pretty hard.
So here comes brilliant idea #1 from the Stack Overflow answer: what if we leave some unsorted region which we can use as scratch space for our merge operations?
For example, say we now only aim to sort half of the array, leaving the other half unsorted:
[ <------ 1/2 unsorted -------> | <------- 1/2 to be sorted -------> ]
Then we can use the unsorted region to hold the output of our merge, by swapping sorted items with unsorted items instead of copying them.
Here's an illustration:
2 5 | 1 4 | A B C D |
[A, B, C, D] are the unsorted items, while [2, 5] and [1, 4] are the sorted lists to be merged. The merge, done step by step, looks as follows:
2 5 | 1 4 | A B C D |   // swap A and 1
^     ^     ^
2 5 | A 4 | 1 B C D |   // swap B and 2
^       ^     ^
B 5 | A 4 | 1 2 C D |   // swap C and 4
  ^     ^       ^
B 5 | A C | 1 2 4 D |   // swap D and 5
  ^               ^
B D | A C | 1 2 4 5 |   // Done
We end up with [B, D, A, C] unsorted on the left side, and [1, 2, 4, 5] sorted on the right side.
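Here is a standalone sketch of this swap-based merge; swap_items and swap_merge are my own hypothetical names, but essentially the same routine reappears as merge in the full program at the end of this post. It merges the sorted runs a[x..y] and a[m..n] into the positions starting at w, and the displaced buffer items end up where the runs used to be:

/* Swap two elements of the array. */
void swap_items(int a[], int i, int j) {
    int c = a[i]; a[i] = a[j]; a[j] = c;
}

/* Merge sorted runs a[x..y] and a[m..n] into positions w, w+1, ...
   by swapping, so no items are lost. */
void swap_merge(int a[], int x, int y, int m, int n, int w) {
    while (x <= y && m <= n)                        /* pick the smaller head... */
        swap_items(a, w++, a[x] < a[m] ? x++ : m++);  /* ...and swap it in */
    while (x <= y) swap_items(a, w++, x++);         /* flush the first run  */
    while (m <= n) swap_items(a, w++, m++);         /* flush the second run */
}

For the example above, swap_merge(a, 0, 1, 2, 3, 4) does it: the runs sit at positions 0..1 and 2..3, and writing starts at position 4.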
With this idea alone, you would be able to sort half of the array. What about the other half of the array? Well, we can do the same too! So schematically it will look like this:
[ <--- 1/4 unsorted ----> | <---- 1/4 sorted ---> | <--- 1/2 sorted ----> ]
However at this point, what we want is to merge the 1/4 sorted and 1/2 sorted parts, but we only have a 1/4 unsorted buffer, which is clearly not enough to hold 1/4 + 1/2 = 3/4 of the items.
Here comes brilliant idea #2, which I will call "in-place merging with empty slots".
Say we have 2 sorted lists of size M and N. For example, say M = 3 and N = 5, as shown in the following illustration:
3 6 7 | A B C 1 2 5 8 9 |
Here, [3, 6, 7] is a sorted list of size M = 3, and [1, 2, 5, 8, 9] is a sorted list of size N = 5.
A, B, C are the empty slots; since M = 3, there are 3 of them. We name them A, B, C just to help keep track of them. In reality, we don't care about the order of the empty slots.
How shall we proceed with the merge? Follow this trace and you will get it:
3 6 7 | A B C 1 2 5 8 9 |   // swap A and 1
^       ^     ^
3 6 7 | 1 B C A 2 5 8 9 |   // swap B and 2
^         ^     ^
3 6 7 | 1 2 C A B 5 8 9 |   // swap C and 3
^           ^     ^
C 6 7 | 1 2 3 A B 5 8 9 |   // swap A and 5
  ^           ^   ^
C 6 7 | 1 2 3 5 B A 8 9 |   // swap B and 6
  ^             ^   ^
C B 7 | 1 2 3 5 6 A 8 9 |   // swap A and 7
    ^             ^ ^
C B A | 1 2 3 5 6 7 8 9 |   // Done (kind of)
It might be harder to follow because the empty slots are intermixed with the sorted list. However, it has a very intuitive feel to it: as the merging process goes on, the empty slots "bubble up" and one by one they "pop" out to the left.
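A nice observation is that this is the exact same swap-based merge as before; only the location of the buffer changed. Here is a quick self-contained demo that replays the trace above, using 0s as the "empty" slots (the names are again my own, not from the Stack Overflow answer):

#include <stdio.h>

void swap_items(int a[], int i, int j) {
    int c = a[i]; a[i] = a[j]; a[j] = c;
}

void swap_merge(int a[], int x, int y, int m, int n, int w) {
    while (x <= y && m <= n)
        swap_items(a, w++, a[x] < a[m] ? x++ : m++);
    while (x <= y) swap_items(a, w++, x++);
    while (m <= n) swap_items(a, w++, m++);
}

int main(void) {
    /* [3,6,7] sorted, three empty slots (0s), [1,2,5,8,9] sorted */
    int a[] = {3, 6, 7, 0, 0, 0, 1, 2, 5, 8, 9};
    swap_merge(a, 0, 2, 6, 10, 3);  /* runs at 0..2 and 6..10, write at 3 */
    for (int i = 0; i < 11; ++i) printf("%d ", a[i]);
    printf("\n");                   /* prints: 0 0 0 1 2 3 5 6 7 8 9 */
    return 0;
}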
Now, how would we apply this idea to the previous problem of merging the 1/4 sorted list with the 1/2 sorted list? Here's how: what if we re-arrange the order in which we do things, so that we end up with the 1/4 unsorted region between them, as follows:
[ <--- 1/4 sorted ----> | <-- 1/4 unsorted --> | <----------------- 1/2 sorted --------------->]
Then we can pretend that the 1/4 unsorted items are "empty slots", which bubble up and pop as they are swapped with the items of the 1/4 sorted list, using the "in-place merging with empty slots" demonstrated above!
We'll then end up with the following:
[ <--- 1/4 unsorted ---> | <---------------------- 3/4 sorted ------------------------> ]
We can then repeat the process on the 1/4 unsorted region: sort half of it to get 1/8 sorted, 1/8 unsorted and 3/4 sorted, perform the "in-place merging with empty slots" again, and so on, shrinking the unsorted region to 1/8, 1/16, 1/32 of the array, and so on.
When will this end? It ends when the unsorted region is small enough, e.g. < 10 elements (the code below cuts off at 2), at which point we can simply perform one insertion sort on the entire array. That costs only O(N), since insertion sort is linear on "almost sorted" arrays: it runs in O(N + number of inversions), and a constant-size unsorted prefix contributes only O(N) inversions.
Here's the code (disclaimer: might be wrong):
#include <stdio.h>

#define N 1000000

int a[N];

/* Swap two elements of the global array a. */
void swap(int i, int j) {
    int c = a[i];
    a[i] = a[j];
    a[j] = c;
}

/* Merge the sorted runs a[x..y] and a[m..n] into the positions starting
   at w, swapping instead of copying, so the items previously sitting at
   the write positions end up where the runs used to be. */
void merge(int a[], int x, int y, int m, int n, int w) {
    while (x <= y && m <= n) {
        swap(w++, a[x] < a[m] ? x++ : m++);
    }
    while (x <= y) {
        swap(w++, x++);
    }
    while (m <= n) {
        swap(w++, m++);
    }
}

void insertion_sort(int a[], int i, int j) {
    int m, n;
    for (m = i; m <= j; ++m) {
        for (n = m; n > i && a[n] < a[n-1]; --n) {
            swap(n, n-1);
        }
    }
}

/* Sort a[i..j]: repeatedly sort half of the unsorted region and merge it
   into the sorted region [m..n] on the right, until the unsorted region
   is tiny; then finish with a single insertion sort pass. */
void sort(int a[], int i, int j) {
    int m = j + 1, n = j, w, x;
    while (j - i > 1) {
        x = (i + j + 1) / 2 - 1;   /* end of the half we sort next       */
        w = (i + j) / 2 + 1;       /* where the merged result will start */
        sort(a, i, x);
        merge(a, i, x, m, n, w);
        m = w;                     /* the sorted region grew to [w..n]   */
        j = w - 1;                 /* the unsorted region shrank         */
    }
    insertion_sort(a, i, n);
}

int main() {
    int n, i;
    scanf("%d", &n);
    for (i = 0; i < n; ++i) scanf("%d", &a[i]);
    sort(a, 0, n - 1);
    printf("%d\n", n);
    for (i = 0; i < n; ++i) printf("%d\n", a[i]);
    return 0;
}
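The program reads n followed by n integers, and prints n followed by the numbers in sorted order. For example, on the input 5 3 1 2 5 4 it prints 5 and then 1 2 3 4 5, one number per line.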
Now, analysis.
Extra space?
Well, the extra space needed is O(log N): the implementation still recurses, recursion needs a stack to keep track of the pending calls, and the recursion here only goes O(log N) deep.
Run-time complexity?
More difficult to see. First of all, each merge operation is linear in the size of the region it touches, so each call costs at most O(n). What about sort? Let T(n) be the running time of sort on n items. The loop inside sort recurses on regions of size n/2, n/4, n/8, ... and does one merge after each recursive call, so T(n) = sum of [O(n) + T(k)] for k = n/2, n/4, n/8, ...
Since the loop calls merge at most O(log n) times, the merge operations take O(n log n) time overall.
So it reduces to T(n) = O(n log n) + T(n/2) + T(n/4) + T(n/8) + ...
Since T grows at least linearly, T(n/4) + T(n/8) + ... <= T(n/2), so T(n) <= O(n log n) + 2 T(n/2), which solves to T(n) = O(n log^2 n).
Are you unhappy that this runs slower than the O(N log N) merge sort? Then here's a challenge for you: how would you improve the above implementation so that it really runs in O(N log N) time? Hint: we haven't exploited brilliant idea #1 in the code yet. Enjoy!