## Tuesday, January 13, 2015

### SPOJ MORSE - Decoding Morse Sequences

Problem Statement:
SPOJ - MORSE

Solution:
This is a very... tedious problem. The idea is to build a trie structure for efficient search of words in the given dictionary. Since each letters in the dictionary is at most only 20 letters long, which can contain up to 26 letters, we can build the trie structure in O(M) time where M is the total sum of the lengths of the words.

Next, we do a depth first search like algorithm on the morse code (defined as morse[i]), starting from i = 0. Initially we also have a pointer cur pointing to the root of the trie. So the states of the algorithm is S(i, cur), where S(i,cur) gives us the number of ways to match morse[i..N] with words from dictionary.

For each i, we check what are the alphabet letters that can be decoded using the prefix of [i..N]. For each letter L that can be decoded, we check whether L is a child of the node pointed by cur. If so, we go to that state S(i+len(L), L) and recurse. Furthermore, if the path from root to cur forms a word in the dictionary, which means morse[0..i] are exactly matched with words from the dictionary, hence we can start matching a new phrase at index i, therefore we go to state S(i, root).

With this dynamic programming approach, we can do a complete search on all possible states more efficiently to find the total number of ways of matching the morse code. What is the running time? Since i $$\in [0..N]$$ and cur can be as many as M, on the surface the algorithm seems to run on O(NM). However, since in each state, cur can only move the its child, and since trie is a direct acyclic graph of at most with longest path of length 20, we actually have an O(N) algorithm (with some moderately large constant)! (Disclaimer: I'm doing the analysis half awake, so forgive me if the analysis turns up to be inaccurate :) )

There is a gotcha in the test cases prepared in SPOJ, as there are actually some words that are repeated on the dictionary. True story. So our code actually need to handle those cases. I'd say that this is one very tedious problem..

#include <iostream>
#include <cstdio>
#include <algorithm>
#include <string>
#include <vector>
#include <cassert>
using namespace std;

string morse;
vector<string> dict;
int dicttree[200003][30];
vector<int> state[100003];
long long dp[100003];
int N, sz;
int dollar = 29;
string pattern[26] = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."};

void build_dict_tree(){
for(int i=0;i<=200000;++i){
for(int j=0;j<30;++j){
dicttree[i][j] = 0;
}
}
int cnt = 1;
for(int i=0;i<N;++i){
int cur = 0;
int k = 0;
while(dicttree[cur][dict[i][k]-'A'] != 0){
cur = dicttree[cur][dict[i][k]-'A'];
++k;
if(k >= dict[i].size())break;
}
if(k == dict[i].size()){
dicttree[cur][dollar] += 1;
} else {
while(k < dict[i].size()){
dicttree[cur][dict[i][k]-'A'] = cnt;
cur = cnt;
++cnt;
++k;
}
dicttree[cur][dollar] += 1;
}
}
}

void build_state(){
for(int i=0;i<sz;++i){
state[i].clear();
}
for(int i=0;i<sz;++i){
for(int j=0;j<26;++j){
bool ok = true;
for(int k=0;k<pattern[j].size();++k){
if(morse[i+k] != pattern[j][k]){
ok = false;
break;
}
}
if(ok) state[i].push_back(j);
}
}
}

long long dfs(int u, int start, int trie){
assert(u<=sz);
if(u == sz){
return dicttree[trie][dollar];
}
if(start == u) if(dp[u]!=-1)return dp[u];
long long ret = 0;
for(int i=0;i<state[u].size();++i){
int c = state[u][i];
if(dicttree[trie][c])ret += dfs(u+pattern[c].size(), start, dicttree[trie][c]);
}
if(dicttree[trie][dollar]){
ret += 1LL*dicttree[trie][dollar] * dfs(u, u, 0);
}
if(start == u) {
dp[u] = ret;
}
return ret;
}

int main(){
int tc;
scanf("%d",&tc);
while(tc--){
cin >> morse;
sz = morse.size();
scanf("%d",&N);
dict = vector<string>(N+3);
for(int i=0;i<N;++i){
cin >> dict[i];
}
build_dict_tree();
build_state();
for(int i=0;i<sz;++i){
dp[i] = -1;
}
long long ans = dfs(0,0,0);
cout << ans << endl;
}
return 0;
}