Saturday, December 27, 2014

Codeforces Round 272 Div. 1 Problem D - Dreamon and Binary

Problem Statement:
477D - Dreamon and Binary

Solution:
A loaded DP problem.. And it features an interesting use of suffix array prefix doubling technique. The official editorial to this problem seems to be quite detailed, but somehow I find it a bit hard to understand. You may find my explanation even more confusing, but hopefully still useful.



Firstly, let's focus on finding the number of ways to print x[1..n]. Consider the following situation:
Suppose that we have printed x[1..j] for some j. Now we want to find the number of ways to fill up x[j+1 .. n], but to do this we also need the information about what is the last substring being printed. So Let W(i,j) be the number of ways to print up x[j+1 .. n], given that x[1..j] is already printed up, and that the last substring printed was the string x[i..j]. Then we will have the following options:
1. We know that the length of x[i..j] (the last substring printed) is len =  j-i+1. Therefore, if the adjacent substring to x[i..j] with equal length len, that is x[(j+1) .. (2j-i+1)], is a valid substring (which means that x[i..j] <= x[(j+1) .. (2j-i+1)]), then we can print this substring and therefore W(i,j) can be derived from W(j+1, 2j-i+1).
2. However, if the adjacent substring that has an equal length is not a valid substring, we can simply extend this substring by one, resulting in x[(j+1)..(2j-i+2)] which is certainly bigger than x[i..j]. Hence W(i,j) is derivable from W(j+1,2j-i+2) instead.
3. And in either case 1 or 2, we can simply extend the last string x[i..j] to x[i..j+1], and therefore some W(i,j) can always be derived from W(i,j+1).

Now we need an efficient way to do the condition check in case 1, since comparing the string character by character (an O(N) operation) will leave us with a running time complexity of \(O(N^3)\). This can be done by building a suffix array on x, but the important information is not only the final suffix array per say, but also entire ranking information on every prefix length being considered. This means that while updating the ranking of suffixes in each iteration of suffix array construction, we obtain the ranking table of substrings by length = 1, 2, 4, 8, ... and so on. How to utilize this to compare substrings of length other than those lengths? It can be cleverly done as follows:

Let the two substrings of length k in question be [i..(i+k-1)] and [j..(j+k-1)]. We find the maximum power n such that \(2^n \leq k\). That is to say, \(k = 2^n + r\) for some \(r < 2^n\). Think of r as the offset. Hence to fully compare [i..(i+k-1)] with [j..(j+k-1)], we can actually compare [i..(i+\(2^n\)-1)] with [j .. (j+\(2^n\)-1)], as well as [(i+r)..(i+r+\(2^n\)-1)] with [(j+r) .. (j+r+\(2^n\)-1)], both are found on the ranking table! This pushes down the checking time from \(O(N)\) to \(O(1)\), hence pushes down the running time complexity to \(O(N^2)\).

Then the remaining part is to implement the idea, which can be pretty challenging:

#include <iostream>
#include <cstdio>
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
using namespace std;


string s;
long long MOD = (long long) 1e9 + 7LL;
int $ = -1;
int r[20][5005];
long long dp[5005][5005];
int cmp[5005];
pair<int,int> val[5005][5005];
int n;

//is [i..j] <= [j+1 .. (2*j+i-1)] ?
int check(int len, int i, int j){
 int k = -1;
 int tmp = len;
 while(tmp){
  ++k;
  tmp >>= 1;
 }
 int L = 1 << k;
 if(L == len) {
  return r[k][i] <= r[k][j+1];
 }
 if(r[k][i] == r[k][j+1]){
  int offset = len - L;
  return r[k][i+offset] <= r[k][j+1+offset];
 }
 return r[k][i] < r[k][j+1];
}

void update_path(int i, int j, int u, int v){
 if(val[i][j].first == -1){
  val[i][j] = val[u][v];
  if(i!=u) val[i][j].second ++;
 } else {
  if(val[i][j].first == val[u][v].first){
   val[i][j].second = min(val[i][j].second, val[u][v].second + (i != u ? 1 : 0));
  } else if(val[i][j].first < val[u][v].first) {
   if(n - val[i][j].first >= 13) {
    val[i][j] = val[u][v];
    if(i != u) val[i][j].second++;
   } else {
    int tmp = cmp[val[u][v].first] + val[u][v].second + (u!=i?1:0);
    if(cmp[val[i][j].first] + val[i][j].second > tmp){
     val[i][j] = val[u][v];
     if(u!=i)val[i][j].second ++;
    }
   }
  }
 }
}


int main(){
 cin >> s;
 n = s.size();
 for(int i=0;i<n;++i) r[0][i] = (int)s[i];
 for(int k=1,len=1;;++k,len*=2){
  vector<pair<pair<int,int>,int> > st;
  for(int i=0;i<n;++i){
   int j = i+len;
   if(j >= n) j = $;
   else j = r[k-1][j];
   st.push_back(make_pair(make_pair(r[k-1][i],j),i));
  }
  sort(st.begin(), st.end());
  int cnt = 0;
  r[k][st[0].second] = cnt;
  for(int i=1;i<n;++i){
   if(st[i].first != st[i-1].first) ++cnt;
   r[k][st[i].second] = cnt;
  }
  if(len >= n) {
   break;
  }
 }
 for(int i=max(0,n-13);i<n;++i){
  cmp[i] = 0;
  for(int j=i;j<n;++j){
   cmp[i] *= 2;
   cmp[i] += (s[j] == '1' ? 1 : 0);
  }
 }
 for(int i=0;i<n;++i)
  for(int j=0;j<n;++j)
   val[i][j].first = -1;
 for(int i=0;i<n;++i){
  dp[i][n-1] = (s[i] == '1' ? 1 : 0);
  val[i][n-1] = make_pair(i,1);
 }
 for(int j=n-2;j>=0;--j){
  for(int i=0;i<=j;++i){
   dp[i][j] = dp[i][j+1];
   update_path(i,j,i,j+1);
   if(s[j+1] != '0'){
    if(2*j-i+1 < n && check(j-i+1, i, j)){
     dp[i][j] = (dp[i][j] + dp[j+1][2*j-i+1]) % MOD;
     update_path(i,j,j+1,2*j-i+1);
    } else if (2*j-i+2 < n){
     dp[i][j] = (dp[i][j] + dp[j+1][2*j-i+2]) % MOD;
     update_path(i,j,j+1,2*j-i+2);
    }
   }
  }
 }
 long long minlen = 0;
 for(int i = val[0][0].first; i < n;++i){
  minlen *= 2LL;
  minlen %= MOD;
  minlen += (s[i] == '1'? 1:0);
  minlen %= MOD;
 }
 minlen += val[0][0].second;
 minlen %= MOD;
 printf("%d\n", (int) dp[0][0]);
 printf("%d\n", (int) minlen);
 return 0;
}