Search Results for

    Show / Hide Table of Contents

    Class CountingSortDoubleLengthPcsSorter

    An implementation of IDoubleLengthPcsSorter using Counting Sort, to perform sorting of partial cyclic shifts (PCS) of length 2 * L of a string, given the order and the equivalence classes of the PCS of length L.

    Inheritance
    System.Object
    CountingSortDoubleLengthPcsSorter
    Implements
    IDoubleLengthPcsSorter
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixArrays.CyclicShifts
    Assembly: MoreStructures.dll
    Syntax
    public class CountingSortDoubleLengthPcsSorter : IDoubleLengthPcsSorter
    Remarks

    ADVANTAGES AND DISADVANTAGES
    - Compared to NaiveDoubleLengthPcsSorter, this implementation has way better runtime (linear instead of cubic).
    - Unlike NaiveDoubleLengthPcsSorter, this implementation requires specific data structures to be provided, in alternative to the input, to calculate the position list.
    - However, unlike NaiveDoubleLengthPcsSorter, it does not require to know whether input contains a terminator or not.
    - This is because such piece of information would only be needed when running comparisons between PCS.
    - This sorter, on the other hand, uses externally provided lists precisely in order to avoid the need for costly PCS comparisons, which are "embedded" in the externally provided data structures.

    ALGORITHM
    - The algorithm takes advantage of the fact that PCS of length L have been already ordered, and their order from the smallest to the biggest, is defined in the provided position list O, of the PCS of the input string of length L.
    - The PCS of length 2 * L which the algorithm sorts are the ones starting at index i - L, for each i in O. Let's call O' this extension of O to the the L chars preceding each index of O.
    - PCS of length 2 * L starting at index i - L are made of two halves: the first, from index i - L to index i excluded, which is unsorted; and the second, from index i to i + L excluded, which is already sorted and whose order is the one defined in O.
    - The algorithm uses a stable implementation of the Counting Sort to sort in linear time O', under Counting Sort assumption of a small alphabet.
    - Moreover, the algorithm only looks at the first half of each PCS of length 2 * L, because the second half is already sorted in O, hence in O', and the implementation of the sorting is stable, i.e. it preserves order of suffixes, as they appear in the input being sorted, to the output, among the strings sharing the same prefix.
    - Counting Sort is adapted in the following way: instead of directly building an histogram of occurrences of the first halves (PCS of length L), it builds an histogram of their equivalence classes, which are integer simple to use as a key in Counting Sort and limited in number by the number of chars in the input string.
    - The order list of the PCS of length 2 * L is calculated by iterating in reverse order over the equivalence classes of the second halves, which are sorted in lexicographic ascending order in the provided position list.

    COMPLEXITY
    - The algorithm is just a variation of the Counting Sort, where, instead of sorting by PCS (which would require comparing O(n) strings, or having a much larger alphabet), a sorting by equivalence classes is done.
    - Direct access to the position list, as well as to the list of equivalence class is done in constant time.
    - Therefore, Time and Space Complexity are both O(n + sigma), where n is the length of the input string, and also of the position and equivalence classes lists and sigma is the length of the alphabet, i.e. the number of distinct equivalence classes of PCS of length L.
    - Because there are n PCS of length L in the input string, there are at most n distinct equivalent classes.
    - Therefore, Time and Space Complexity are actually O(n).

    Constructors

    | Improve this Doc View Source

    CountingSortDoubleLengthPcsSorter(Int32, IList<Int32>, IList<Int32>)

    Declaration
    public CountingSortDoubleLengthPcsSorter(int pcsLength, IList<int> order, IList<int> eqClasses)
    Parameters
    Type Name Description
    System.Int32 pcsLength

    IList<System.Int32> order

    IList<System.Int32> eqClasses

    Properties

    | Improve this Doc View Source

    EqClasses

    The equivalence classes of the already sorted PCS of length .

    Declaration
    public IList<int> EqClasses { get; }
    Property Value
    Type Description
    IList<System.Int32>
    | Improve this Doc View Source

    Order

    The position list of the already sorted PCS of length .

    Declaration
    public IList<int> Order { get; }
    Property Value
    Type Description
    IList<System.Int32>
    | Improve this Doc View Source

    PcsLength

    The length L of the PCS. Remark: sorting is done of PCS of length 2 * L, not L.

    Declaration
    public int PcsLength { get; }
    Property Value
    Type Description
    System.Int32

    Methods

    | Improve this Doc View Source

    Sort()

    Declaration
    public IList<int> Sort()
    Returns
    Type Description
    IList<System.Int32>
    Remarks

    Implements

    IDoubleLengthPcsSorter

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX