Search Results for

    Show / Hide Table of Contents

    Class PcsBasedSuffixArrayBuilder

    An algorithm for building the MoreStructures.SuffixArrays.SuffixArray based on fast PCS comparison.

    Inheritance
    System.Object
    PcsBasedSuffixArrayBuilder
    Implements
    ISuffixArrayBuilder
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixArrays.Builders
    Assembly: MoreStructures.dll
    Syntax
    public class PcsBasedSuffixArrayBuilder : ISuffixArrayBuilder
    Remarks

    Remark: the following analysis is based on the default implementations used for each of the four steps of the Suffix Array building algorithm, as set by PcsBasedSuffixArrayBuilder(TextWithTerminator, IDictionary<Char, Int32>). Using a builder different from the default for any of the step would result into a different runtime and complexity.

    ALGORITHM
    The algorithm is based on two lists, position list and equivalence class list, which are iteratively recalculated, over and over again, for longer and longer PCS of Text, until the full content of Text is covered by the PCS.
    More in detail:
    - Position and equivalence class lists are first calculated for 1-char PCS, i.e. for single chars, using CountingSortCharsSorter and OrderBasedSingleCharPcsClassifier respectively.
    - Position and equivalence class lists are then calculated for PCS of length double than the current one, using CountingSortDoubleLengthPcsSorter and EqClassesBasedDoubleLengthPcsClassifier respectively.
    - The operation is repeated until the current PCS length becomes bigger or equal than the length of Text.
    - At this point, the last calculated order is the order of PCS which are at least as long as the full Text.
    - Because all such strings include a full suffixes of Text terminated by a terminator, which is a unique char, the resulting order is the order of all suffixes of Text, which is by definition the Suffix Array of Text.

    COMPLEXITY
    - The algorithm has a bootstrap part (1-char PCS) and an iterative part (doubling length PCS).
    - It first calculates position and equivalence lists for 1-char PCS, which are respectively O(n + sigma) and O(n) operations, both in Time and Space Complexity, where n is the length of Text and sigma is the size of the alphabet of Text.
    - Then the iterative part is executed. The number of top-level iterations is logarithmic with n, because the PCS length doubles at every iteration and the termination condition is that the PCS length is at least as big as the length of Text.
    - The two operations done in the iteration loop both have linear Time and Space Complexity.
    - Therefore, both Time and Space Complexity are O(n * log(n) + sigma).
    - If sigma is O(n), Time and Space Complexity are O(n * log(n)).

    Constructors

    | Improve this Doc View Source

    PcsBasedSuffixArrayBuilder(TextWithTerminator, Func<String, ICharsSorter>, Func<String, IList<Int32>, ISingleCharPcsClassifier>, Func<Int32, IList<Int32>, IList<Int32>, IDoubleLengthPcsSorter>, Func<Int32, IList<Int32>, IList<Int32>, IDoubleLengthPcsClassifier>)

    Declaration
    public PcsBasedSuffixArrayBuilder(TextWithTerminator text, Func<string, ICharsSorter> singleCharPcsSorterBuilder, Func<string, IList<int>, ISingleCharPcsClassifier> singleCharPcsClassifierBuilder, Func<int, IList<int>, IList<int>, IDoubleLengthPcsSorter> doubleLengthPcsSorterBuilder, Func<int, IList<int>, IList<int>, IDoubleLengthPcsClassifier> doubleLengthPcsClassifierBuilder)
    Parameters
    Type Name Description
    TextWithTerminator text

    Func<System.String, ICharsSorter> singleCharPcsSorterBuilder

    Func<System.String, IList<System.Int32>, ISingleCharPcsClassifier> singleCharPcsClassifierBuilder

    Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsSorter> doubleLengthPcsSorterBuilder

    Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsClassifier> doubleLengthPcsClassifierBuilder

    Remarks

    Allows to specify the algorithm to be used for each of the four steps of the Suffix Array building algorithm, each one via a dedicated builder.

    | Improve this Doc View Source

    PcsBasedSuffixArrayBuilder(TextWithTerminator, IDictionary<Char, Int32>)

    Declaration
    public PcsBasedSuffixArrayBuilder(TextWithTerminator text, IDictionary<char, int> alphabet)
    Parameters
    Type Name Description
    TextWithTerminator text

    IDictionary<System.Char, System.Int32> alphabet

    The alphabet of Text, i.e. the list of all System.Char potentially appearing in Text, mapped to an alphabet index. Required by the Counting Sort algorithm, which builds the histogram of occurrences in the input, of all chars of the alphabet of the input.

    Remarks

    Uses the best implementations for each of the four steps of the Suffix Array building algorithm, resulting in an overall linear Time and Space Complexity.

    Properties

    | Improve this Doc View Source

    DoubleLengthPcsClassifierBuilder

    The builder of the IDoubleLengthPcsClassifier implementation used in the Suffix Array building algorithm, to classify PCS of length 2 * L, once PCS of length L have been sorted and classified and PCS of length 2 * L have been sorted.

    Declaration
    public Func<int, IList<int>, IList<int>, IDoubleLengthPcsClassifier> DoubleLengthPcsClassifierBuilder { get; }
    Property Value
    Type Description
    Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsClassifier>
    Remarks

    Takes as first input parameter a System.Int32, containing the length L of the PCS.
    Takes as second input parameter a of System.Int32, containing the equivalence classes of the PCS of length L, previously calculated and potentially needed to calculate the equivalence classes of PCS of length 2 * L.
    Takes as third input parameter a of System.Int32, containing the order of the PCS of length L, previously calculated and potentially needed to calculate the equivalence classes of PCS of length 2 * L.
    Returns a suitable IDoubleLengthPcsClassifier implementation.

    | Improve this Doc View Source

    DoubleLengthPcsSorterBuilder

    The builder of the IDoubleLengthPcsSorter implementation used in the Suffix Array building algorithm, to sort PCS of length 2 * L, once PCS of length L have been sorted and classified.

    Declaration
    public Func<int, IList<int>, IList<int>, IDoubleLengthPcsSorter> DoubleLengthPcsSorterBuilder { get; }
    Property Value
    Type Description
    Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsSorter>
    Remarks

    Takes as first input parameter a System.Int32, containing the length L of the PCS.
    Takes as second input parameter a of System.Int32, containing the order of the PCS of length L, previously calculated and potentially needed to calculate the order of PCS of length 2 * L.
    Takes as third input parameter a of System.Int32, containing the equivalence classes of the PCS of length L, previously calculated and potentially needed to calculate the order of PCS of length 2 * L.
    Returns a suitable IDoubleLengthPcsSorter implementation.

    | Improve this Doc View Source

    SingleCharPcsClassifierBuilder

    The builder of the ISingleCharPcsClassifier implementation used to find equivalence classes of 1-char PCS in the Suffix Array building algorithm.

    Declaration
    public Func<string, IList<int>, ISingleCharPcsClassifier> SingleCharPcsClassifierBuilder { get; }
    Property Value
    Type Description
    Func<System.String, IList<System.Int32>, ISingleCharPcsClassifier>
    Remarks

    Takes as first input parameter a System.String, containing the input text, whose chars have to be classified.
    Takes as second input parameter a of System.Int32, containing the order of the 1-char PCS, previously calculated.
    Returns a suitable ISingleCharPcsClassifier implementation.

    | Improve this Doc View Source

    SingleCharPcsSorterBuilder

    The builder of the ICharsSorter implementation used to sort 1-char PCS in the Suffix Array building algorithm.

    Declaration
    public Func<string, ICharsSorter> SingleCharPcsSorterBuilder { get; }
    Property Value
    Type Description
    Func<System.String, ICharsSorter>
    Remarks

    Takes as input a single System.String parameter, containing the input text, whose chars have to be sorted.
    Returns a suitable ICharsSorter implementation.

    | Improve this Doc View Source

    Text

    The TextWithTerminator, to build the MoreStructures.SuffixArrays.SuffixArray of.

    Declaration
    public TextWithTerminator Text { get; }
    Property Value
    Type Description
    TextWithTerminator

    Methods

    | Improve this Doc View Source

    Build()

    Declaration
    public SuffixArray Build()
    Returns
    Type Description
    MoreStructures.SuffixArrays.SuffixArray

    Implements

    ISuffixArrayBuilder

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX