Class CountingSortDoubleLengthPcsSorter
An implementation of IDoubleLengthPcsSorter using Counting Sort, to perform sorting of partial cyclic shifts (PCS) of length 2 * L of a string, given the order and the equivalence classes of the PCS of length L.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.CyclicShifts
Assembly: MoreStructures.dll
Syntax
public class CountingSortDoubleLengthPcsSorter : IDoubleLengthPcsSorter
Remarks
ADVANTAGES AND DISADVANTAGES
- Compared to NaiveDoubleLengthPcsSorter, this implementation has way better runtime (linear
instead of cubic).
- Unlike NaiveDoubleLengthPcsSorter, this implementation requires specific data structures
to be provided, in alternative to the input, to calculate the position list.
- However, unlike NaiveDoubleLengthPcsSorter, it does not require to know whether
input contains a terminator or not.
- This is because such piece of information would only be needed when running comparisons between PCS.
- This sorter, on the other hand, uses externally provided lists precisely in order to avoid the
need for costly PCS comparisons, which are "embedded" in the externally provided data structures.
ALGORITHM
- The algorithm takes advantage of the fact that PCS of length L have been already ordered, and their order
from the smallest to the biggest, is defined in the provided position list O, of the PCS of the input string
of length L.
- The PCS of length 2 * L which the algorithm sorts are the ones starting at index i - L, for each i in O.
Let's call O' this extension of O to the the L chars preceding each index of O.
- PCS of length 2 * L starting at index i - L are made of two halves: the first, from index i - L to index i
excluded, which is unsorted; and the second, from index i to i + L excluded, which is already sorted and
whose order is the one defined in O.
- The algorithm uses a stable implementation of the Counting Sort to sort in linear time O', under Counting
Sort assumption of a small alphabet.
- Moreover, the algorithm only looks at the first half of each PCS of length 2 * L, because the second half is
already sorted in O, hence in O', and the implementation of the sorting is stable, i.e. it preserves order of
suffixes, as they appear in the input being sorted, to the output, among the strings sharing the same prefix.
- Counting Sort is adapted in the following way: instead of directly building an histogram of occurrences of
the first halves (PCS of length L), it builds an histogram of their equivalence classes, which are integer
simple to use as a key in Counting Sort and limited in number by the number of chars in the input string.
- The order list of the PCS of length 2 * L is calculated by iterating in reverse order over the equivalence
classes of the second halves, which are sorted in lexicographic ascending order in the provided position
list.
COMPLEXITY
- The algorithm is just a variation of the Counting Sort, where, instead of sorting by PCS (which would require
comparing O(n) strings, or having a much larger alphabet), a sorting by equivalence classes is done.
- Direct access to the position list, as well as to the list of equivalence class is done in constant time.
- Therefore, Time and Space Complexity are both O(n + sigma), where n is the length of the input string, and
also of the position and equivalence classes lists and sigma is the length of the alphabet, i.e. the number
of distinct equivalence classes of PCS of length L.
- Because there are n PCS of length L in the input string, there are at most n distinct equivalent classes.
- Therefore, Time and Space Complexity are actually O(n).
Constructors
| Improve this Doc View SourceCountingSortDoubleLengthPcsSorter(Int32, IList<Int32>, IList<Int32>)
Declaration
public CountingSortDoubleLengthPcsSorter(int pcsLength, IList<int> order, IList<int> eqClasses)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | pcsLength | |
IList<System.Int32> | order | |
IList<System.Int32> | eqClasses |
Properties
| Improve this Doc View SourceEqClasses
The equivalence classes of the already sorted PCS of length
Declaration
public IList<int> EqClasses { get; }
Property Value
Type | Description |
---|---|
IList<System.Int32> |
Order
The position list of the already sorted PCS of length
Declaration
public IList<int> Order { get; }
Property Value
Type | Description |
---|---|
IList<System.Int32> |
PcsLength
The length L of the PCS. Remark: sorting is done of PCS of length 2 * L, not L.
Declaration
public int PcsLength { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
| Improve this Doc View SourceSort()
Declaration
public IList<int> Sort()
Returns
Type | Description |
---|---|
IList<System.Int32> |