Class FastPrefixFunctionCalculator
An implementation of IPrefixFunctionCalculator which takes advantage of the property that any non-longest border is a also a border of the longest border, to calculate the Prefix Function efficiently.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.KnuthMorrisPratt.PrefixFunction
Assembly: MoreStructures.dll
Syntax
public class FastPrefixFunctionCalculator : IPrefixFunctionCalculator
Remarks
ALGORITHM
- The algorithm iterates over all the chars of the text, from the first to the last, calculating the Prefix
Function s for each index i.
- The Prefix Function at index 0, s(0) = 0
, since a 1-char string can't have any border.
- For each any other index, it calculates the value of the Prefix Function based on values of s at lower
indexes and based on T.
- if T[i] = T[s(i - 1)]
, where s(i - 1)
has been calculated at the previous step, then
s(i) = s(i - 1) + 1
.
- This is because T[0..s(i - 1)]
is, by definition of s, a border of T[0..i]
and, if
T[i] = T[s(i - 1)]
, then T[0..s(i - 1) + 1] = T[i - s(i)..i]
. So T[0..s(i - 1) + 1]
is a
border of T[0..(i + 1)]
and there can't be any longer border, since s(i + 1) <= s(i) + 1
(the Prefix Function can grow of at most 1 at each index).
- if T[i] != T[s(i - 1)]
, s(i)
will be smaller than s(i - 1) + 1
. While s can increase of
at most 1 at each index, it can decrease to 0 in a single step (if T[i] is such that there are no borders of
T[0..(i + 1)]).
- Because a border of T[0..(i + 1)] (prefix of T including up to the i-th char) is also a border of T[0..i]
(prefix of T at previous step, i.e. including up to the (i - 1)-th char), EXCEPT for the char T[i], which
would come just after the border of T[0..i], the longest border of T[0..(i + 1)] (whose length is s(i)),
can be found by looking for the longest border of T[0..i] followed by T[i].
- If such a border of length w is found, then there is a border of T[0..i], T[0..w], followed by T[w] = T[i].
- Which means that T[0..(w + 1)] == T[(i - w)..(i + 1)]
.
- Therefore, T[0..(w + 1)] is a border of T[0..(i + 1)] and s(i + 1) = w
.
- If no borders are found, the Prefix Function at index i is zero.
COMPLEXITY
- There are as many top-level iterations as indexes of the Prefix Function and chars of the text.
- For each top-level iteration, there are at most as many sub-iterations as borders of the prefix, each one
checking a single char of the text. In the worst case there are O(n) borders, meaning that each sub-iteration
requires O(n) char comparisons.
- That would seem to imply that the number of sub-iterations in total, throughout all top-level iterations,
would be quadratic.
- However, the length of the border calculated by the inner loop is always non-negative and is increased of at
most n times (since it is increased of at most 1 per char of text).
- Moreover, it is progressively shorten by the inner loop, decreased of at least 1 per sub-iteration. This is
because, when the char following the current border is different than the current char (which is the
condition of the inner loop), the next border is the longest border of the current border, which is strictly
shorter than the current border.
- Therefore the length of the border can reach at most n thoughout all top-level iterations, and is always
non-negative and decreased by 1 at each sub-iteration, which means that the total number of inner loop
iterations through all top-level iterations is O(n).
- All accesses to values calculated in previous steps and to chars of the text are done in constant time.
- An array of integers, as big as the numbers of chars in the text, is allocated at the beginning, to memoize
the value calculated at each step, since that value may be used in following iterations.
- No other data structure is allocated by the algorithm.
- Therefore, Time Complexity is O(n) and Space Complexity is O(n).
Methods
| Improve this Doc View SourceGetValues(String)
Declaration
public IEnumerable<int> GetValues(string text)
Parameters
Type | Name | Description |
---|---|---|
System.String | text |
Returns
Type | Description |
---|---|
IEnumerable<System.Int32> |