Monday, October 13, 2014

a bit of dp on string: SPOJ - DSUBSEQ

Problem Statement:
SPOJ - DSUBSEQ

Summary:
Given a string S, find the number of distinct subsequences.
S only contains uppercase characters.
\(|S| \leq 10^5\), 100 test cases, and time limit of 2s.

Solution:
Let D[i][j] be the number of distinct subsequences starting with alphabet j in a string of length i. Also define sum[i] be the total number of distinct subsequences in S[0..i].

Consider the case where we are appending S[i] to substring S[0..(i-1)]. Then we have the following
D[i][j] is equal to:
1. D[i-1][j] (for all j != S[i]),
2. sum[i] + 1 (for j == S[i])

Using this relationship, we consider i from 0 to |S|-1 incrementally and update D[i][j] and sum[i] accordingly, where sum[|S|-1] will be the answer. By updating D[i][j] as such, we will have , we will have an \(O(|S|)\) running time, but with a rather big constant hiding, and a space usage of \(O(|S|)\) with an equally high constant term.

Improving the algorithm, we can just maintain D[j] for each alphabet, and a variable sum which we will update on each iteration:
1. newsum = 2 * sum + 1 - D[current_alphabet]
2. D[current_alphabet] = sum + 1
3. sum = newsum

Cutting down the space to \(O(1)\) and resulting in a blazingly fast solution :D