Search Results for

    Show / Hide Table of Contents

    Class FastPrefixFunctionCalculator

    An implementation of IPrefixFunctionCalculator which takes advantage of the property that any non-longest border is a also a border of the longest border, to calculate the Prefix Function efficiently.

    Inheritance
    System.Object
    FastPrefixFunctionCalculator
    Implements
    IPrefixFunctionCalculator
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.KnuthMorrisPratt.PrefixFunction
    Assembly: MoreStructures.dll
    Syntax
    public class FastPrefixFunctionCalculator : IPrefixFunctionCalculator
    Remarks

    ALGORITHM
    - The algorithm iterates over all the chars of the text, from the first to the last, calculating the Prefix Function s for each index i.
    - The Prefix Function at index 0, s(0) = 0, since a 1-char string can't have any border.
    - For each any other index, it calculates the value of the Prefix Function based on values of s at lower indexes and based on T.
    - if T[i] = T[s(i - 1)], where s(i - 1) has been calculated at the previous step, then s(i) = s(i - 1) + 1.
    - This is because T[0..s(i - 1)] is, by definition of s, a border of T[0..i] and, if T[i] = T[s(i - 1)], then T[0..s(i - 1) + 1] = T[i - s(i)..i]. So T[0..s(i - 1) + 1] is a border of T[0..(i + 1)] and there can't be any longer border, since s(i + 1) <= s(i) + 1 (the Prefix Function can grow of at most 1 at each index).
    - if T[i] != T[s(i - 1)], s(i) will be smaller than s(i - 1) + 1. While s can increase of at most 1 at each index, it can decrease to 0 in a single step (if T[i] is such that there are no borders of T[0..(i + 1)]).
    - Because a border of T[0..(i + 1)] (prefix of T including up to the i-th char) is also a border of T[0..i] (prefix of T at previous step, i.e. including up to the (i - 1)-th char), EXCEPT for the char T[i], which would come just after the border of T[0..i], the longest border of T[0..(i + 1)] (whose length is s(i)), can be found by looking for the longest border of T[0..i] followed by T[i].
    - If such a border of length w is found, then there is a border of T[0..i], T[0..w], followed by T[w] = T[i].
    - Which means that T[0..(w + 1)] == T[(i - w)..(i + 1)].
    - Therefore, T[0..(w + 1)] is a border of T[0..(i + 1)] and s(i + 1) = w.
    - If no borders are found, the Prefix Function at index i is zero.

    COMPLEXITY
    - There are as many top-level iterations as indexes of the Prefix Function and chars of the text.
    - For each top-level iteration, there are at most as many sub-iterations as borders of the prefix, each one checking a single char of the text. In the worst case there are O(n) borders, meaning that each sub-iteration requires O(n) char comparisons.
    - That would seem to imply that the number of sub-iterations in total, throughout all top-level iterations, would be quadratic.
    - However, the length of the border calculated by the inner loop is always non-negative and is increased of at most n times (since it is increased of at most 1 per char of text).
    - Moreover, it is progressively shorten by the inner loop, decreased of at least 1 per sub-iteration. This is because, when the char following the current border is different than the current char (which is the condition of the inner loop), the next border is the longest border of the current border, which is strictly shorter than the current border.
    - Therefore the length of the border can reach at most n thoughout all top-level iterations, and is always non-negative and decreased by 1 at each sub-iteration, which means that the total number of inner loop iterations through all top-level iterations is O(n).
    - All accesses to values calculated in previous steps and to chars of the text are done in constant time.
    - An array of integers, as big as the numbers of chars in the text, is allocated at the beginning, to memoize the value calculated at each step, since that value may be used in following iterations.
    - No other data structure is allocated by the algorithm.
    - Therefore, Time Complexity is O(n) and Space Complexity is O(n).

    Methods

    | Improve this Doc View Source

    GetValues(String)

    Declaration
    public IEnumerable<int> GetValues(string text)
    Parameters
    Type Name Description
    System.String text
    Returns
    Type Description
    IEnumerable<System.Int32>
    Remarks

    Implements

    IPrefixFunctionCalculator

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX