Problem Statement:
471D - MUH and Cube Walls
Solution:
And interesting string matching problem since it does not so obviously look like one. For the \(a_i\) and \(b_i\) given, store \(s_i\) and \(p_i\) as follows: \(s_i = a_i - a_{i-1}\) and \(p_i = b_i - b_{i-1}\), which are basically array of differences. Then the problem reduces to finding occurrences of p in s, which can be done efficiently in \(O(N)\) using KMP algorithm. Don't forget about one special case where b has only one element.
Showing posts with label String Matching. Show all posts
Showing posts with label String Matching. Show all posts
Saturday, December 20, 2014
Monday, December 15, 2014
Codeforces Round #282 Div. 1 - Obsessive String
Problem Statement:
494B - Obsessive String
Solution:
I like this problem, and for me it is a particularly challenging one. Thankfully the editorial for this round is very well-written and detailed. I learnt quite a lot from this problem.
Firstly we need to find an efficient way to mark all occurrence of t in s, and here we can employ the \(O(N)\) Knuth-Morris-Pratt string matching algorithm. We keep track of all indexes i of s such that s[i-|t|+1 .. i] matches t exactly in an array g[i], setting such indexes g[i] as 1, and 0 otherwise. The hardest part is to come up with the right DP state. Here, the one that will work is: let f[i] be the number of ways to choose the substrings from [1..i], such that the last (right-most) substring ends with index i.
494B - Obsessive String
Solution:
I like this problem, and for me it is a particularly challenging one. Thankfully the editorial for this round is very well-written and detailed. I learnt quite a lot from this problem.
Firstly we need to find an efficient way to mark all occurrence of t in s, and here we can employ the \(O(N)\) Knuth-Morris-Pratt string matching algorithm. We keep track of all indexes i of s such that s[i-|t|+1 .. i] matches t exactly in an array g[i], setting such indexes g[i] as 1, and 0 otherwise. The hardest part is to come up with the right DP state. Here, the one that will work is: let f[i] be the number of ways to choose the substrings from [1..i], such that the last (right-most) substring ends with index i.
Monday, October 13, 2014
a bit of dp on string: SPOJ - DSUBSEQ
Problem Statement:
SPOJ - DSUBSEQ
Summary:
Given a string S, find the number of distinct subsequences.
S only contains uppercase characters.
\(|S| \leq 10^5\), 100 test cases, and time limit of 2s.
Solution:
Let D[i][j] be the number of distinct subsequences starting with alphabet j in a string of length i. Also define sum[i] be the total number of distinct subsequences in S[0..i].
Consider the case where we are appending S[i] to substring S[0..(i-1)]. Then we have the following
D[i][j] is equal to:
1. D[i-1][j] (for all j != S[i]),
2. sum[i] + 1 (for j == S[i])
Using this relationship, we consider i from 0 to |S|-1 incrementally and update D[i][j] and sum[i] accordingly, where sum[|S|-1] will be the answer. By updating D[i][j] as such, we will have , we will have an \(O(|S|)\) running time, but with a rather big constant hiding, and a space usage of \(O(|S|)\) with an equally high constant term.
Improving the algorithm, we can just maintain D[j] for each alphabet, and a variable sum which we will update on each iteration:
1. newsum = 2 * sum + 1 - D[current_alphabet]
2. D[current_alphabet] = sum + 1
3. sum = newsum
Cutting down the space to \(O(1)\) and resulting in a blazingly fast solution :D
SPOJ - DSUBSEQ
Summary:
Given a string S, find the number of distinct subsequences.
S only contains uppercase characters.
\(|S| \leq 10^5\), 100 test cases, and time limit of 2s.
Solution:
Let D[i][j] be the number of distinct subsequences starting with alphabet j in a string of length i. Also define sum[i] be the total number of distinct subsequences in S[0..i].
Consider the case where we are appending S[i] to substring S[0..(i-1)]. Then we have the following
D[i][j] is equal to:
1. D[i-1][j] (for all j != S[i]),
2. sum[i] + 1 (for j == S[i])
Using this relationship, we consider i from 0 to |S|-1 incrementally and update D[i][j] and sum[i] accordingly, where sum[|S|-1] will be the answer. By updating D[i][j] as such, we will have , we will have an \(O(|S|)\) running time, but with a rather big constant hiding, and a space usage of \(O(|S|)\) with an equally high constant term.
Improving the algorithm, we can just maintain D[j] for each alphabet, and a variable sum which we will update on each iteration:
1. newsum = 2 * sum + 1 - D[current_alphabet]
2. D[current_alphabet] = sum + 1
3. sum = newsum
Cutting down the space to \(O(1)\) and resulting in a blazingly fast solution :D
a bit of string dp: Codeforces Round #272 Problem E (Div. 2)
Problem Statement:
Codeforces - 476E
Summary:
Given a string \(s\) and a pattern \(p\). Denote \(S_k\) as the set of strings produced from removing exactly \(k\) letters from \(s\). For each \(w \in S_k\), we define occurrences of \(p\) in \(w\) as the maximum number of non-overlapping substrings of \(w\) that match \(p\) exactly. Then our task is, for each \(k \leq |s|\) output the maximum occurrences of \(p\) amongst all \(w \in S_k\).
\(|s| \leq 2000, |p| \leq 500\)
Solution:
Firstly, suppose that we are considering the [i .. \(|s|\)] part of the string s. If we denote next[i] as the index immediately next to the end of a minimum length subsequence in [i .. \(|s|\)] that matches p. We also denote cost[i] as the number of letters that must be skipped (or deleted) in order to complete that subsequence. Then we have the following recursive relationship:
Denote L(i, k) be the maximum occurrences of pattern p in a substring [i .. \(|s|\)] after deleting exactly \(k\) letters. Then
L(i, k) = max { L(i+1, k), L(next[i], k - cost[i]) }
Our answers would be on L(0, k) for all the required k values.
Codeforces - 476E
Summary:
Given a string \(s\) and a pattern \(p\). Denote \(S_k\) as the set of strings produced from removing exactly \(k\) letters from \(s\). For each \(w \in S_k\), we define occurrences of \(p\) in \(w\) as the maximum number of non-overlapping substrings of \(w\) that match \(p\) exactly. Then our task is, for each \(k \leq |s|\) output the maximum occurrences of \(p\) amongst all \(w \in S_k\).
\(|s| \leq 2000, |p| \leq 500\)
Solution:
Firstly, suppose that we are considering the [i .. \(|s|\)] part of the string s. If we denote next[i] as the index immediately next to the end of a minimum length subsequence in [i .. \(|s|\)] that matches p. We also denote cost[i] as the number of letters that must be skipped (or deleted) in order to complete that subsequence. Then we have the following recursive relationship:
Denote L(i, k) be the maximum occurrences of pattern p in a substring [i .. \(|s|\)] after deleting exactly \(k\) letters. Then
L(i, k) = max { L(i+1, k), L(next[i], k - cost[i]) }
Our answers would be on L(0, k) for all the required k values.
Wednesday, July 9, 2014
a bit of string matching: Suffix Automaton
I will not have the time to expand the derivation of this data structure yet, but here I will present the algorithm as written in a paper by Blumer et. al. on smallest suffix automaton that recognizes all subwords.
Problem Statement:
Given a text (fixed), build a deterministic finite automaton (DFA) which accepting states are substrings in the given text.
Algorithm: Construction of the DFA (called Suffix Automaton / Directed acyclic word graphs (DAWG) )
Build-DAWG (string s):
1. Create a node u representing the empty state
2. Set root as u, sink points to u
3. For i from 0 to length of s:
3.1 Create a new node called newsink
3.2 Create a primary edge from sink to newsink with label s[i]
3.3 Set currentstate points to sink, and suffixstate as null
3.4 While currentstate is not root and suffixstate is null:
3.4.1 Set currentstate = currentstate.suff
3.4.2 If currentstate has a primary edge labeled s[i]:
3.4.2.1 Set suffixstate to the state that edge points to
3.4.3 Else if currentstate has a secondary edge labeled s[i]:
3.4.3.1 Let child be the state that edge points to
3.4.3.2 Set newstate = Split( currentstate, child )
3.4.3.3 Set suffixstate = newstate
3.4.4 Else:
3.4.4.1 Create a secondary edge from currentstate to newsink with label s[i]
3.5 If suffixstate is still null:
3.5.1 Set suffixstate = root
3.6 Set newsink.suff = suffixstate
3.7 Set sink = newsink
Split (parent, child):
1. Create a new node newstate
2. Set that secondary edge to a primary edge from parent to newstate
3. For all outgoing primary and secondary edge from child:
3.1 Create a similar outgoing secondary edge from newstate pointing to the same state
4. Set newstate.suff = child.suff
5. Reset child.suff = newstate
6. Set currentstate = parent
7. While currentstate is not root:
7.1 Set currentstate = currentstate.suff
7.2 If currentstate has a secondary edge to child:
7.2.1 Reset the secondary edge to point to newstate
7.3 Else:
7.3.1 Break from the loop
8. Return newstate
Here is an implementation (a very inefficient one :D haha) in C++ :
Problem Statement:
Given a text (fixed), build a deterministic finite automaton (DFA) which accepting states are substrings in the given text.
Algorithm: Construction of the DFA (called Suffix Automaton / Directed acyclic word graphs (DAWG) )
Build-DAWG (string s):
1. Create a node u representing the empty state
2. Set root as u, sink points to u
3. For i from 0 to length of s:
3.1 Create a new node called newsink
3.2 Create a primary edge from sink to newsink with label s[i]
3.3 Set currentstate points to sink, and suffixstate as null
3.4 While currentstate is not root and suffixstate is null:
3.4.1 Set currentstate = currentstate.suff
3.4.2 If currentstate has a primary edge labeled s[i]:
3.4.2.1 Set suffixstate to the state that edge points to
3.4.3 Else if currentstate has a secondary edge labeled s[i]:
3.4.3.1 Let child be the state that edge points to
3.4.3.2 Set newstate = Split( currentstate, child )
3.4.3.3 Set suffixstate = newstate
3.4.4 Else:
3.4.4.1 Create a secondary edge from currentstate to newsink with label s[i]
3.5 If suffixstate is still null:
3.5.1 Set suffixstate = root
3.6 Set newsink.suff = suffixstate
3.7 Set sink = newsink
Split (parent, child):
1. Create a new node newstate
2. Set that secondary edge to a primary edge from parent to newstate
3. For all outgoing primary and secondary edge from child:
3.1 Create a similar outgoing secondary edge from newstate pointing to the same state
4. Set newstate.suff = child.suff
5. Reset child.suff = newstate
6. Set currentstate = parent
7. While currentstate is not root:
7.1 Set currentstate = currentstate.suff
7.2 If currentstate has a secondary edge to child:
7.2.1 Reset the secondary edge to point to newstate
7.3 Else:
7.3.1 Break from the loop
8. Return newstate
Here is an implementation (a very inefficient one :D haha) in C++ :
#include <cstdio> #include <iostream> #include <vector> #include <string> #include <algorithm> using namespace std; //Implementation of Suffix Automaton //First Try, Must be very INEFFICIENT //Using pointers maybe struct state; struct edge; state* split(state* parent, state* child, vector<edge*>::iterator it); struct edge { state* child; char label; }; struct state { //information regarding the state int index; int pos; //suffix link state* suff; //prim edge vector<edge*> prim_edge; //sec edge vector<edge*> sec_edge; }; state *root, *sink; int idx = 1; void init(string str) { //create the root state root = new state; root->suff = NULL; root->prim_edge.clear(); root->sec_edge.clear(); root->index = 0; root->pos = -1; //initialize sink sink = root; int len = str.size(); for(int i=0;i<len;++i) { //edge label char a = str[i]; state* newsink = new state; newsink->index = idx++; newsink->pos = i; newsink->suff = NULL; //initialize new edge to the new sink edge* to_newsink = new edge; to_newsink->child = newsink; to_newsink->label = a; sink->prim_edge.push_back(to_newsink); state* currstate = sink; while( currstate->suff != NULL && newsink->suff == NULL) { currstate = currstate->suff; bool finished = false; //if exist prim_edge of currstate with label == a for(vector<edge*>::iterator it = currstate->prim_edge.begin(); it != currstate->prim_edge.end(); ++it) { if( (*it)->label == a ) { //update suff link of newsink to currstate newsink->suff = (*it)->child; finished = true; break; } } if( finished ) break; //if exist secondary edge for(vector<edge*>::iterator it = currstate->sec_edge.begin(); it != currstate->sec_edge.end(); ++it ) { if( (*it)->label == a ) { //split the state state* child = (*it)->child; state* clone = split(currstate, child, it); //update suffix link of newsink to clone newsink->suff = clone; finished = true; break; } } if( finished ) break; //Else, add secondary edge to newsink and continue currstate->sec_edge.push_back(to_newsink); } //if newsink still does not have suffix link if (newsink->suff == NULL) { //point to source newsink->suff = root; } //update sink sink = newsink; } } state* split(state* parent, state* child, vector<edge*>::iterator iter) { state* clone = new state; clone->index = idx++; clone->pos = child->pos; char label = (*iter)->label; //copy all outgoing edge for(vector<edge*>::iterator it = child->prim_edge.begin(); it != child->prim_edge.end();++it) { edge* e = new edge; e->label = (*it)->label; e->child = (*it)->child; clone->sec_edge.push_back(e); } for(vector<edge*>::iterator it = child->sec_edge.begin(); it != child->sec_edge.end();++it) { edge* e = new edge; e->label = (*it)->label; e->child = (*it)->child; clone->sec_edge.push_back(e); } //update suffix link of clone clone->suff = child->suff; //update suffix link of child child->suff = clone; //update parent's sec edge to child to prim edge to clone parent->sec_edge.erase(iter); edge* to_clone = new edge; to_clone->label = label; to_clone->child = clone; parent->prim_edge.push_back(to_clone); //update all edge to child to point to clone state* currstate = parent; while( currstate->suff != NULL ) { currstate = currstate->suff; bool update = false; for(vector<edge*>::iterator it = currstate->sec_edge.begin(); it != currstate->sec_edge.end(); ++it) { if( (*it)->child == child ) { (*it)->child = clone; update = true; break; } } if( !update ) break; } return clone; } void print(state* s) { printf("state with index: %d\n", s->index); vector<edge*>::iterator it; for(it = s->prim_edge.begin(); it != s->prim_edge.end(); ++it) { printf( "==%c (%d)" , (*it)->label, (*it)->child->index); printf("\n"); } for(it = s->sec_edge.begin(); it != s->sec_edge.end(); ++it) { printf( "--%c (%d)" , (*it)->label, (*it)->child->index); printf("\n"); } for(it = s->prim_edge.begin(); it != s->prim_edge.end(); ++it) { print((*it)->child); } for(it = s->sec_edge.begin(); it != s->sec_edge.end(); ++it) { print((*it)->child); } } int search(string str) { int len = str.size(); state* currstate = root; for(int i=0;i<len;++i) { vector<edge*>::iterator it; bool found = false; for(it = currstate->prim_edge.begin(); it!= currstate->prim_edge.end(); ++it) { if( (*it)->label == str[i] ) { currstate = (*it)->child; found = true; break; } } if( found ) continue; for(it = currstate->sec_edge.begin(); it!= currstate->sec_edge.end(); ++it) { if( (*it)->label == str[i] ) { currstate = (*it)->child; found = true; break; } } if (!found) return -1; } return currstate->pos; } int main() { printf("Welcome to SUFFIX AUTOMATON SEARCH\n"); printf("Enter your string/dictionary\n"); string str; getline(cin,str); init(str); string src; while (1) { printf("Enter your search term:\n"); getline(cin,src); int pos = search(src); if(pos < 0) { printf("The term is not found\n"); } else { printf("The term is found at end-pos: %d\n", pos); } } return 0; }
Subscribe to:
Posts (Atom)