Sunday, May 8, 2016

How is deque implemented?

It sounds really simple, and I have never really given it a thought before. However just now I realised that it is not as straight-forward as I might have led to believe!

A refresher: deque is a data structure that supports O(1) append operations on both ends (head and tail) and O(1) random look-ups.

Hence deque has a similar characteristics as a vector, only better. Wait a minute, how could that be? If deque is simply a doubly linked-list, then random look-ups will be O(N). That's when I realise that I have been so ignorant and oblivious.

The next thought that comes to mind is by using a circular array where we keep track of the head and tail pointers. I think this approach might work, it does give the desired characteristics: appending on both ends is obviously O(1), and look-up of an index can be computed by taking into account the offset introduced by the head pointer. However this is not how deques are usually implemented. Partly I think because when the current array is full, copying it to a larger one will be costly. Furthermore, since elements are moved to the newer array, pointers to the corresponding elements on the previous array will be invalidated, hence that would not be a very nice behaviour to work with.

The way deque is implemented is by using vector of fixed-sized arrays (also called data blocks). All information of the addresses of our data blocks are stored in a central vector called "map". map has a monotonic property such that the data blocks are arranged in sequential order in map. (i.e. by following the data blocks in map, we can list down all the elements in our deque in sequential total order)

We keep track of: num_of_elements (the number of elements we currently have), first_block (index of map containing the address of our first block), first_elem (so that map[first_block][first_elem] is the first element in our deque) and block_size (the size of a data block).

data block: [x x 0 1] [6 x x x] [2 3 4 5]
block addr:      3         4         8
       map: [x x 3 8 4 x x]

num_of_elements = 7
    first_block = 2
     first_elem = 2
     block_size = 4

An append operation is done by finding the first block (or last block, depending on the append operation) and insert our element to that block accordingly, just like what we will do with a plain vector/array. If that block is full prior to our insertion, we simply allocate a new block and place our element there, and then append the address of the newly created data block to our map (wow, recursive!). Deletion is also O(1) under this scheme.

Look up is done by indexing arithmetic using information on first_block, first_elem, and block_size. The i-th element is either in the first block (i.e. i < block_size - first_elem) or it's not. If it is in the first block, then map[first_block][i+block_size] will be the i-th element. Else, we compute which block it is in (i.e. cur_block = (i - block_size + first_elem)/block_size + 1) and hence the position of that element in the block (i.e. cur_elem = i - block_size + first_elem - cur_block*block_size), which locates our element at map[cur_block][cur_elem].

Finally, when our initial map vector is full, we can assign a bigger vector without having to worry that the pointers to the elements are invalidated. (Although we still incur the overhead of O(N), which can be amortized).

Interesting data structure indeed!