## Monday, June 6, 2016

### Codeforces Round #355 (Div. 2) - D. Vanya and Treasure

Problem Statement:
Codeforces Round #355 (Div. 2) - D. Vanya and Treasure

Summary:
You have a 2D array m x n with values [1..p] on each cell (p <= n*m). Distance between two cells is defined as Manhattan distance. Compute the path with minimum total distance that passes through 1, 2, ..., p in order. [Additionally, the problem always starts from cell (1, 1)].

Solution:
Tough problem. I will discuss two solutions to this problem, one using Fenwick/Segment Tree, and one that uses BFS. The editorial can be found here: http://codeforces.com/blog/entry/45181.

Motivation:
The main strategy is of course doing a dynamic programming on the level of key obtained. Let dp[row][col] be the minimum distance needed to reach cell with value p from cell (row, col). If cell (row, col) has key k, dp[row][col] is simply the minimum of dp[next_row][next_col] + |row-next_row| + |col-next_col|, for all cell (next_row, next_col) that have key k+1. But of course, this alone will not be fast enough as the run time complexity is O(nm * nm).

Fenwick/Segment Tree based solution:
This in an idea that is shared on some comments in the editorial page (brilliant people), which is the solution that I preferred. It makes use of an observation regarding the symmetry of the problem and a sweeping line technique.

The problem with our formulation previously is that we uses absolute function |x| which ties the parameters of two cells tightly. However, if we can reduce the problem into several subparts with symmetrical properties, then we might have better luck.

Indeed, suppose that we subdivide the regions into 4 regions around (row, col). Now let's focus on the top-left region of (row, col). That is, if we only consider those cells such that next_row <= row and next_col <= col, we will have dp[row][col] = mininum of dp[next_row][next_col] - (next_row+next_col) + (row+col). We see that we can group the terms next_row and next_col together independently from row and col. Since (row, col) is fixed, all we need is to minimise dp[next_row][next_col] - next_row - next_col. This suggest that we can use segment tree/fenwick tree somehow.

The trick is to sort those cells and perform a sweeping of the cells along row values. Initialise an RMQ data structure. Now perform a sweep along rows, and go through the cells in each row from left to right:
1. When we see a cell (i, j) with key k+1, we update the RMQ at index j with its dp[i][j]-i-j.
2. Then when we encounter a cell (i, j) with key k, we make a query (1, j) to retrieve the minimum term for dp[next_row][next_col] - next_row - next_col for cell (i, j). Hence dp[i][j] = rmq(1, j) + i + j.

After completing the sweeping and updating the dp values for the top-left region, we repeat the process on the rest of the regions. One way to reduce the implementation complexity is to perform a rotation and repeat the same procedure above on the rotated cells. Hence we can repeat this rotation a few times until all subdivision is covered.

Implementation:

#include <iostream>
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cassert>
#include <utility>
using namespace std;
constexpr int INF = 87654321;
class Fenwick {
public:
vector<int> tree;
int length;
int Rmq(int i) {
int ret = INF;
while (i) {
ret = min(ret, tree[i]);
i -= i & -i;
}
return ret;
}
void Update(int i, int val) {
while (i <= length) {
tree[i] = min(tree[i], val);
i += i&-i;
}
}
Fenwick(){}
Fenwick(int length) : tree(length+1, INF), length(length) {}
void reset() {
for (int i = 0; i < length+1; ++i) tree[i] = INF;
}
};
int n,m,p;
Fenwick fenwick[4];
vector<vector<int>> board, dp;
vector<vector<pair<int,int>>> idx;
vector<vector<pair<int,int>>> level[4], true_idx[4];

void Solve() {
for (int x=0;x<4;++x){
fenwick[x] = Fenwick(m);
level[x].clear();
level[x].resize(p+1);
true_idx[x].clear();
true_idx[x].resize(p+1);
vector<pair<int,pair<pair<int,int>,pair<int,int>>>> chest;
for (int i = 1; i <= n; ++i) {
for (int j = 1; j <= m; ++j) {
chest.push_back({board[i][j], {{i,j}, idx[i][j]}});
}
}
sort(chest.begin(), chest.end());
for (int i=0;i<n*m;++i){
level[x][chest[i].first].push_back(chest[i].second.first);
true_idx[x][chest[i].first].push_back(chest[i].second.second);
}

vector<vector<int>> rotated_board(m+1,vector<int>(n+1,0));
vector<vector<pair<int,int>>> rotated_idx(m+1, vector<pair<int,int>>(n+1,{0,0}));
for (int j = 1; j <=m; ++j) {
for (int i = 1; i<=n; ++i) {
rotated_board[j][i]=board[n+1-i][j];
rotated_idx[j][i]=idx[n+1-i][j];
}
}
board = rotated_board;
idx = rotated_idx;
swap(n,m);
}
dp = vector<vector<int>>(n+1,vector<int>(m+1,INF));
dp[level[0][p][0].first][level[0][p][0].second]=0;
for (int v=p-1;v>=1;--v) {
for(int x=0;x<4;++x){
auto& cur_tree = fenwick[x];
cur_tree.reset();
const auto& query = level[x][v];
const auto& data = level[x][v+1];
const auto& query_idx = true_idx[x][v];
const auto& data_idx = true_idx[x][v+1];
int i = 0, j = 0;
while (i < query.size()) {
while (j < data.size() && (data[j].first < query[i].first ||
(data[j].first == query[i].first &&
data[j].second < query[i].second))) {
cur_tree.Update(data[j].second,
dp[data_idx[j].first][data_idx[j].second]-
data[j].first-data[j].second);
++j;
}
auto& val = dp[query_idx[i].first][query_idx[i].second];
val = min(val, cur_tree.Rmq(query[i].second)+query[i].first+query[i].second);
++i;
}
}
}
}

int main() {
scanf("%d%d%d",&n,&m,&p);
board = vector<vector<int>>(n+1,vector<int>(m+1,0));
idx = vector<vector<pair<int,int>>>(n+1,vector<pair<int,int>>(m+1,{0,0}));
for(int i=1;i<=n;++i){
for(int j=1;j<=m;++j){
scanf("%d",&board[i][j]);
idx[i][j] = {i,j};
}
}
Solve();
int ans = INF;
for (int i=0;i<level[0][1].size();++i){
ans = min(ans, dp[level[0][1][i].first][level[0][1][i].second]+abs(level[0][1][i].first-1)+
abs(level[0][1][i].second-1));
}
printf("%d\n",ans);
return 0;
}


The complexity is O(nm log nm).

BFS based solution:
This is the solution that is described by the official editorial. I have not tried this myself, because I still having difficulties trying to prove its runtime complexity. But I really like this solution because it is much simpler than the above, and comes off as pretty genius. And the idea goes as follows. Instead of using RMQ as above, realise that if we perform a BFS on a set of cells with key k, the runtime complexity is O(nm) (because BFS is O(V+E) where V is nm while E is around 4nm (since each cell is connected to at most 4 other cells)). Borrowing the notation from the editorial, if the number of cells with key k is cnt[k], then suppose that our algorithm runs BFS if cnt[k] * cnt[k-1] > n*m, and perform the simple nested iteration otherwise. Then we can say with deduce that at most there is O(sqrt(nm)) instances where we run a BFS, hence overall the BFS procedure will be O(nm sqrt(nm)).
Proof (source of idea if from a dude in the comment section, clever person):
First of all recall the AM-GM for 2 variables (a+b)/2 >= sqrt(ab) for a, b positive numbers. Now we have mn = cnt[1] + cnt[2] + ... + cnt[p] > (cnt[1]+cnt[2])/2 + (cnt[2]+cnt[3])/2 + ... + (cnt[p-1]+cnt[p])/2 > sqrt(cnt[1]*cnt[2]) + sqrt(cnt[2]*cnt[3]) + ... + sqrt(cnt[p-1] * cnt[p])
Suppose that there are more than sqrt(mn) pairs of cnt[k], cnt[k+1] such that cnt[k]*cnt[k+1] > nm. Then we have sqrt(cnt[1]*cnt[2]) + ... + sqrt(cnt[p-1]*cnt[p]) > sqrt(nm) * sqrt(nm) = nm, a contradiction! Therefore there are at most sqrt(nm) BFS runs.

However, I do not know how would one prove the bound for the nested iteration. It might be provable from some form of amortized analysis, or there might be some simple observations that I can't see yet at the moment. Oh well. Updated: Alright! Indeed the proof is pretty elementary, but I could only see it after reading the discussions in the editorial. I love brilliant people and the knowledge that they share. So here is the proof that the bound for the nested iteration is O(nm * sqrt(nm)):
Given that two positive numbers p and q such that pq < nm, WLOG, p < q, hence we have p*p < pq < nm => p < sqrt(nm), i.e. the smaller of the two number will be less than sqrt(nm). Hence cnt[i] * cnt[i+1] < sqrt(mn) max(cnt[i], cnt[i+1] < sqrt(mn) * (cnt[i] + cnt[i+1]). Now what we want to compute is sum(cnt[i] * cnt[i+1]) for all i such that cnt[i]*cnt[i+1] < nm. Hence we have sum(cnt[i] * cnt[i+1]) < sqrt(nm) * 2 *(cnt[1] + cnt[2] + ... + cnt[p]) = 2 sqrt(nm) * nm. Therefore we conclude that the sum is O(nm sqrt(nm)). Woohoo!