Class NaiveDoubleLengthPcsClassifier
An implementation of IDoubleLengthPcsClassifier which solely depends on the input string, to generate equivalence classes.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.CyclicShifts
Assembly: MoreStructures.dll
Syntax
public class NaiveDoubleLengthPcsClassifier : IDoubleLengthPcsClassifier
Remarks
ADVANTAGES AND DISADVANTAGES
- Unlike other implementations, such as OrderBasedDoubleLengthPcsClassifier or
EqClassesBasedDoubleLengthPcsClassifier, this classifier needs to know whether the input being
provided includes a terminator (i.e. its last char is "special" as it uniquely identify the end of the
string) or not.
- This information is required since this classifier performs classification by actually sorting PCS, and
sorting PCS requires knowing whether a terminator is present or not (and which one it is).
- Other implementations don't require such information, since they never compare PCS against each other: they
either use an externally provided position list or compare via the equivalence class list.
ALGORITHM
- PCS of length L are generated as actual strings from the input string, then sorted in lexicographic order via
the
- The ordered sequence of PCS with corresponding starting index in the input string is then iterated over.
- A new equivalence class is generated (increasing the current counter) every time a PCS differs from the
previous one. The equivalence class is assigned to the output at the index of the PCS.
- Finally, the equivalence class list is returned.
COMPLEXITY
- Extracting the n PCS of length L from the input string has O(n * L) Time and Space Complexity.
- Sorting the n PCS of length L via the LINQ-provided QuickSort has O(n^2 * L) Time Complexity and O(n) Space
Complexity.
- Iterating over the n PCS of length L is an O(n) operation. Each iteration does O(L) work, since, while
direct access to the equivalence class list is done in constant time, comparison between the current PCS
and the previous one is a comparison of two strings of length L, and requires all L chars to be comparerd in
the worst case.
- Instantiating the equivalence class list of output is also an O(n) operation.
- Therefore, Time Complexity, as driven by sorting, is O(n^2 * L) and Space Complexity, as driven by the PCS
generating and iteration, is O(n * L).
Constructors
| Improve this Doc View SourceNaiveDoubleLengthPcsClassifier(String, Int32, Boolean)
Declaration
public NaiveDoubleLengthPcsClassifier(string input, int pcsLength, bool inputWithTerminator)
Parameters
Type | Name | Description |
---|---|---|
System.String | input | |
System.Int32 | pcsLength | |
System.Boolean | inputWithTerminator | Whether |
Properties
| Improve this Doc View SourceInput
The input text, whose PCS of length PcsLength have to be classified.
Declaration
public string Input { get; }
Property Value
Type | Description |
---|---|
System.String |
PcsLength
The length of the PCS of Input to be classified.
Declaration
public int PcsLength { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
| Improve this Doc View SourceClassify()
Declaration
public IList<int> Classify()
Returns
Type | Description |
---|---|
IList<System.Int32> |