Class PcsBasedSuffixArrayBuilder
An algorithm for building the MoreStructures.SuffixArrays.SuffixArray based on fast PCS comparison.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.Builders
Assembly: MoreStructures.dll
Syntax
public class PcsBasedSuffixArrayBuilder : ISuffixArrayBuilder
Remarks
Remark: the following analysis is based on the default implementations used for each of the four steps of the Suffix Array building algorithm, as set by PcsBasedSuffixArrayBuilder(TextWithTerminator, IDictionary<Char, Int32>). Using a builder different from the default for any of the step would result into a different runtime and complexity.
ALGORITHM
The algorithm is based on two lists, position list and equivalence class list, which are iteratively
recalculated, over and over again, for longer and longer PCS of Text, until the full content of
Text is covered by the PCS.
More in detail:
- Position and equivalence class lists are first calculated for 1-char PCS, i.e. for single chars, using
CountingSortCharsSorter and OrderBasedSingleCharPcsClassifier respectively.
- Position and equivalence class lists are then calculated for PCS of length double than the current one, using
CountingSortDoubleLengthPcsSorter and EqClassesBasedDoubleLengthPcsClassifier
respectively.
- The operation is repeated until the current PCS length becomes bigger or equal than the length of
Text.
- At this point, the last calculated order is the order of PCS which are at least as long as the full
Text.
- Because all such strings include a full suffixes of Text terminated by a terminator, which is a
unique char, the resulting order is the order of all suffixes of Text, which is by definition
the Suffix Array of Text.
COMPLEXITY
- The algorithm has a bootstrap part (1-char PCS) and an iterative part (doubling length PCS).
- It first calculates position and equivalence lists for 1-char PCS, which are respectively
O(n + sigma) and O(n) operations, both in Time and Space Complexity, where n is the length of
Text and sigma is the size of the alphabet of Text.
- Then the iterative part is executed. The number of top-level iterations is logarithmic with n, because the
PCS length doubles at every iteration and the termination condition is that the PCS length is at least as big
as the length of Text.
- The two operations done in the iteration loop both have linear Time and Space Complexity.
- Therefore, both Time and Space Complexity are O(n * log(n) + sigma).
- If sigma is O(n), Time and Space Complexity are O(n * log(n)).
Constructors
| Improve this Doc View SourcePcsBasedSuffixArrayBuilder(TextWithTerminator, Func<String, ICharsSorter>, Func<String, IList<Int32>, ISingleCharPcsClassifier>, Func<Int32, IList<Int32>, IList<Int32>, IDoubleLengthPcsSorter>, Func<Int32, IList<Int32>, IList<Int32>, IDoubleLengthPcsClassifier>)
Declaration
public PcsBasedSuffixArrayBuilder(TextWithTerminator text, Func<string, ICharsSorter> singleCharPcsSorterBuilder, Func<string, IList<int>, ISingleCharPcsClassifier> singleCharPcsClassifierBuilder, Func<int, IList<int>, IList<int>, IDoubleLengthPcsSorter> doubleLengthPcsSorterBuilder, Func<int, IList<int>, IList<int>, IDoubleLengthPcsClassifier> doubleLengthPcsClassifierBuilder)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator | text | |
Func<System.String, ICharsSorter> | singleCharPcsSorterBuilder | |
Func<System.String, IList<System.Int32>, ISingleCharPcsClassifier> | singleCharPcsClassifierBuilder | |
Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsSorter> | doubleLengthPcsSorterBuilder | |
Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsClassifier> | doubleLengthPcsClassifierBuilder |
Remarks
Allows to specify the algorithm to be used for each of the four steps of the Suffix Array building algorithm, each one via a dedicated builder.
PcsBasedSuffixArrayBuilder(TextWithTerminator, IDictionary<Char, Int32>)
Declaration
public PcsBasedSuffixArrayBuilder(TextWithTerminator text, IDictionary<char, int> alphabet)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator | text | |
IDictionary<System.Char, System.Int32> | alphabet | The alphabet of Text, i.e. the list of all System.Char potentially appearing in Text, mapped to an alphabet index. Required by the Counting Sort algorithm, which builds the histogram of occurrences in the input, of all chars of the alphabet of the input. |
Remarks
Uses the best implementations for each of the four steps of the Suffix Array building algorithm, resulting in an overall linear Time and Space Complexity.
Properties
| Improve this Doc View SourceDoubleLengthPcsClassifierBuilder
The builder of the IDoubleLengthPcsClassifier implementation used in the Suffix Array building algorithm, to classify PCS of length 2 * L, once PCS of length L have been sorted and classified and PCS of length 2 * L have been sorted.
Declaration
public Func<int, IList<int>, IList<int>, IDoubleLengthPcsClassifier> DoubleLengthPcsClassifierBuilder { get; }
Property Value
Type | Description |
---|---|
Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsClassifier> |
Remarks
Takes as first input parameter a System.Int32, containing the length L of the PCS.
Takes as second input parameter a
Takes as third input parameter a
Returns a suitable IDoubleLengthPcsClassifier implementation.
DoubleLengthPcsSorterBuilder
The builder of the IDoubleLengthPcsSorter implementation used in the Suffix Array building algorithm, to sort PCS of length 2 * L, once PCS of length L have been sorted and classified.
Declaration
public Func<int, IList<int>, IList<int>, IDoubleLengthPcsSorter> DoubleLengthPcsSorterBuilder { get; }
Property Value
Type | Description |
---|---|
Func<System.Int32, IList<System.Int32>, IList<System.Int32>, IDoubleLengthPcsSorter> |
Remarks
Takes as first input parameter a System.Int32, containing the length L of the PCS.
Takes as second input parameter a
Takes as third input parameter a
Returns a suitable IDoubleLengthPcsSorter implementation.
SingleCharPcsClassifierBuilder
The builder of the ISingleCharPcsClassifier implementation used to find equivalence classes of 1-char PCS in the Suffix Array building algorithm.
Declaration
public Func<string, IList<int>, ISingleCharPcsClassifier> SingleCharPcsClassifierBuilder { get; }
Property Value
Type | Description |
---|---|
Func<System.String, IList<System.Int32>, ISingleCharPcsClassifier> |
Remarks
Takes as first input parameter a System.String, containing the input text, whose chars have to be
classified.
Takes as second input parameter a
Returns a suitable ISingleCharPcsClassifier implementation.
SingleCharPcsSorterBuilder
The builder of the ICharsSorter implementation used to sort 1-char PCS in the Suffix Array building algorithm.
Declaration
public Func<string, ICharsSorter> SingleCharPcsSorterBuilder { get; }
Property Value
Type | Description |
---|---|
Func<System.String, ICharsSorter> |
Remarks
Takes as input a single System.String parameter, containing the input text, whose chars have to be
sorted.
Returns a suitable ICharsSorter implementation.
Text
The TextWithTerminator, to build the MoreStructures.SuffixArrays.SuffixArray of.
Declaration
public TextWithTerminator Text { get; }
Property Value
Type | Description |
---|---|
TextWithTerminator |
Methods
| Improve this Doc View SourceBuild()
Declaration
public SuffixArray Build()
Returns
Type | Description |
---|---|
MoreStructures.SuffixArrays.SuffixArray |