__1. What's the motivation behind this data structure?__

Being a friend of the Segment Tree, the Fenwick Tree is designed to answer range queries efficiently. Historically it is used mostly for calculating cumulative frequencies, or in other words, the sum of the values in a range [i, j]. It can actually only handle queries of the form [1, j], but in the case of cumulative frequencies one can query [1, j], then query [1, i-1], and subtract the results to get the desired effect.

__2. What's it look like?__

Using the array A = [1, 4, -3, 5, 6, 1, 3, -1, 0, 2], a Fenwick tree looks like this:

    i       1  2   3  4  5  6  7   8  9  10
    A       1, 4, -3, 5, 6, 1, 3, -1, 0,  2
    fenwick 1, 5, -3, 7, 6, 7, 3, 16, 0,  2

2D representation:

                                       16
                   +---------+----+----+
                   7         |    |    |
         +----+----+         |    |    |
         5    |    |         7    |    |         2
    +----+    |    |    +----+    |    |    +----+
    1    4    -3   5    6    1    3    -1   0    2
    0001 0010 0011 0100 0101 0110 0111 1000 1001 1010
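As a sanity check on the table above, here is a minimal Python sketch (the function name is mine) that builds the fenwick array from A by adding each A[i] to node i and to every ancestor whose range covers i:

```python
def build_fenwick(A):
    # A is 0-indexed on input; the tree is 1-indexed internally.
    n = len(A)
    fenwick = [0] * (n + 1)
    for i in range(1, n + 1):
        j = i
        while j <= n:
            fenwick[j] += A[i - 1]
            j += j & -j  # climb to the next node covering position i
    return fenwick[1:]

A = [1, 4, -3, 5, 6, 1, 3, -1, 0, 2]
print(build_fenwick(A))  # [1, 5, -3, 7, 6, 7, 3, 16, 0, 2]
```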

__3. Where is the tree?__

Oddly enough, I found it difficult to think about the Fenwick Tree as a tree, despite its name. After some contemplation, I think the parent-child relationship of a Fenwick Tree can be described as follows:

1) A node with index v whose binary representation is A1B, where B is made up of zero bits (i.e. the 1 in A1B denotes the rightmost set bit of v), is the parent of A01000[...]0, A01100[...]0, A01110[...]0, and so on up to A01111[...]1. Hence the parent of node v can be computed as v + (v & -v); more concretely, v is a child of v + (v & -v).

2) If v = A1B (where B is made up of zero bits), then node v covers the sum of the range A0B + 1 to A1B. This is the motivating construction of the Fenwick tree: it allows us to compute a range query in O(log n) time (where log n is the length of the binary representation of v). To see this, suppose that v = A1B1C, where B and C are made up of zero bits (so the two 1s shown are the two rightmost set bits of v). Then we know that node A1B1C covers the values from A1B0C + 1 to A1B1C. All we still need is the total sum from 1 to A1B0C, which is the same problem with one fewer set bit (hence a recursive operation!). By 'peeling' off the rightmost set bit one by one, we can compute the total sum from 1 to v. This peeling operation is written v - (v & -v) (we can also see it as jumping to the left sibling of node v).

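These two bit manipulations can be sketched directly (function names are mine): the parent jump is v + (v & -v), and peeling drops the rightmost set bit via v - (v & -v).

```python
def lowbit(v):
    return v & -v  # isolates the rightmost set bit of v

def parent(v):
    # rule 1): the parent of node v
    return v + lowbit(v)

def peel(v):
    # rule 2): drop the rightmost set bit (jump to the left sibling)
    return v - lowbit(v)

# index 6 = 0110: its parent is 8 = 1000
print(parent(6))  # 8
# peeling 7 = 0111 visits 6 = 0110, then 4 = 0100, then 0; these are
# exactly the nodes whose values are summed for the prefix query [1, 7]
print(peel(7), peel(peel(7)), peel(peel(peel(7))))  # 6 4 0
```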

__4. Code for Range query and Update__

Range query [1, v] is supported as follows:

rmq(v):
    result = 0
    while (v) {
        result += fenwick[v]; // or result = min(result, fenwick[v]) for RMQ
        v -= v & -v;
    }
    return result

On the other hand, an update with incremental value val looks as follows:

update(v, val):
    while (v <= length) {
        fenwick[v] += val; // or fenwick[v] = min(fenwick[v], val) for RMQ
        v += v & -v;
    }

Notice that we only support 'incremental' changes, such as adding/subtracting some delta, or performing monotonic minimisation/maximisation.
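Putting the two loops together, here is a minimal runnable Python version of the sum variant (the class name is mine), checked against the example array from section 2:

```python
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, v, val):
        # add val at position v
        while v <= self.n:
            self.tree[v] += val
            v += v & -v

    def rmq(self, v):
        # sum over the prefix [1, v]
        result = 0
        while v > 0:
            result += self.tree[v]
            v -= v & -v
        return result

f = Fenwick(10)
for i, x in enumerate([1, 4, -3, 5, 6, 1, 3, -1, 0, 2], start=1):
    f.update(i, x)

print(f.rmq(7))             # 17, i.e. 1 + 4 - 3 + 5 + 6 + 1 + 3
print(f.rmq(7) - f.rmq(2))  # 12, the sum over [3, 7]
```

The min variant follows by swapping in the commented alternatives from the pseudocode above, with the caveat that only monotonic updates are then supported.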

__5. Range update?__

Notice also that the update procedure above only handles one entry at a time. As it turns out, there is a way to support range updates for cumulative frequencies by manipulating two Fenwick trees at a time.

Firstly, suppose that we have two arrays mult and offset. When we perform a range update on [3, 5] with value +2 for example, we set the values in mult as follows:

mult : [ 0 0 2 2 2 0 0]

While offset is set as follows:

offset: [ 0 0 4 4 4 -6 -6]

Of course, if we had performed that range update directly on our cumulative array, it would look like this:

sum : [ 0 0 2 4 6 6 6]

How does that help? When we want to compute the value of the cumulative frequency at index i after the update, we calculate sum[i] = i * mult[i] - offset[i]. Hence for example:

at i = 3, sum[3] = 3 * 2 - 4 = 2.

at i = 5, sum[5] = 5 * 2 - 4 = 6.

at i = 6, sum[6] = 6 * 0 - (-6) = 6.

Hence we can recover the value of sum[i] from both mult[i] and offset[i]. Now, if both mult and offset are fenwick trees, we can perform the updates as follows: mult.update(3, 2) and then mult.update(6, -2). And we can perform the following to offset: offset.update(3, 4) and offset.update(6, -10). Then sum[i] can be computed as mult.rmq(i) * i - offset.rmq(i).

More generally, we have:

update(left, right, val) {
    mult.update(left, val);
    mult.update(right+1, -val);
    offset.update(left, (left-1)*val);
    offset.update(right+1, -right*val);
}

rmq(i) {
    return mult.rmq(i) * i - offset.rmq(i);
}
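Assuming the point-update Fenwick tree from section 4, the two-tree scheme can be sketched as follows (class names are mine); running the example update on [3, 5] with +2 reproduces the sum array from the text.

```python
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, v, val):
        while v <= self.n:
            self.tree[v] += val
            v += v & -v

    def rmq(self, v):
        s = 0
        while v > 0:
            s += self.tree[v]
            v -= v & -v
        return s

class RangeFenwick:
    def __init__(self, n):
        self.mult = Fenwick(n + 1)    # +1 so update(right + 1, ...) stays in range
        self.offset = Fenwick(n + 1)

    def update(self, left, right, val):
        # add val to every element in [left, right]
        self.mult.update(left, val)
        self.mult.update(right + 1, -val)
        self.offset.update(left, (left - 1) * val)
        self.offset.update(right + 1, -right * val)

    def rmq(self, i):
        # cumulative frequency over [1, i]
        return self.mult.rmq(i) * i - self.offset.rmq(i)

rf = RangeFenwick(7)
rf.update(3, 5, 2)
print([rf.rmq(i) for i in range(1, 8)])  # [0, 0, 2, 4, 6, 6, 6]
```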

__6. Additional note: Fun technique__

So the above trick of splitting updates into a multiplication part and an offset part is quite interesting; whoever comes up with something like that by themselves is a genius. But the more interesting trick for me is actually the part where we reduce the range update on the underlying mult and offset trees to sublinear complexity.

The observation is this: to increment/decrement by val every entry from index i through N of an array, we can instead perform a single update on a Fenwick tree at index i with that value. This technique pushes the running time of the above range update down from O(N) to O(lg N).

To see this more clearly, consider an array, for example A = [1, 2, 3, 2]. Suppose that we convert A into a Fenwick tree representation FA such that FA.rmq(i) = A[i] (this can be done by iterating through the elements of A and performing an update on FA at position i with value A[i] - A[i-1]). Now, if we want to perform a range update on [L, R] by +val, we first update FA at position L by value val, then update FA at position R+1 by value -val. The first operation makes FA.rmq(i) increase by val for all i >= L, and the second makes sure that for all i > R the effect of +val is negated. This achieves the same effect as updating every element of A[L..R] one by one, but takes only O(lg N) time.
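A small Python sketch of this (names are mine), using A = [1, 2, 3, 2] from above: FA stores the difference array, so FA.rmq(i) recovers A[i], and a range update touches only two positions.

```python
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, v, val):
        while v <= self.n:
            self.tree[v] += val
            v += v & -v

    def rmq(self, v):
        s = 0
        while v > 0:
            s += self.tree[v]
            v -= v & -v
        return s

A = [1, 2, 3, 2]
n = len(A)
FA = Fenwick(n + 1)  # +1 headroom for the update at R + 1
prev = 0
for i, x in enumerate(A, start=1):
    FA.update(i, x - prev)  # store differences so FA.rmq(i) == A[i]
    prev = x

print([FA.rmq(i) for i in range(1, n + 1)])  # [1, 2, 3, 2]

# range update: add +5 to A[2..3] with just two point updates
L, R, val = 2, 3, 5
FA.update(L, val)
FA.update(R + 1, -val)
print([FA.rmq(i) for i in range(1, n + 1)])  # [1, 7, 8, 2]
```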

Here we make a trade-off: indexing into FA is now O(lg N), as opposed to O(1) on A, but performing a range update on FA is now O(lg N) instead of the O(N) it would be on A.

Yeah that's pretty cool.