Class OrderBasedDoubleLengthPcsClassifier
An implementation of IDoubleLengthPcsClassifier which uses the externally provided position list of the PCS of length PcsLength, in addition to the input string, to generate equivalence classes.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.CyclicShifts
Assembly: MoreStructures.dll
Syntax
public class OrderBasedDoubleLengthPcsClassifier : IDoubleLengthPcsClassifier
Remarks
ADVANTAGES AND DISADVANTAGES
- Unlike NaiveDoubleLengthPcsClassifier, this implementation requires additional data structures
to be provided, in addition to the Input, to calculate equivalence classes.
- However, unlike NaiveDoubleLengthPcsClassifier, it does not require to know whether
Input contains a terminator or not.
- This is because such piece of information would only be needed when running comparisons between PCS.
- This classifier, on the other hand, uses an externally provided position list precisely in order to avoid the
need for costly PCS comparisons, which are "embedded" in the externally provided position list.
- PCS are actually still compared for equality. However, comparison for equality, unlike comparison for
sorting, doesn't require to know whether a char is terminator or not, since a terminator is a char, it is
only equal to itself and different from any other char.
ALGORITHM
- The externally provided ordered sequence of PCS Order is iterated over.
- A new equivalence class is generated (increasing the current counter) every time a PCS differs from the
previous one. The equivalence class is assigned to the output at the index of the PCS being checked.
- The comparison between a PCS and the previous one (both of length PcsLength), is done comparing
the PcsLength chars of the two strings.
- Finally, the equivalence class list is returned.
COMPLEXITY
- Iterating over the n PCS of length L is an O(n) operation. Each iteration does O(L) work, since, while
direct access to the equivalence class list is done in constant time, comparison between the current PCS
and the previous one is a comparison of two strings of length L, and requires all L chars to be comparerd in
the worst case.
- Instantiating the equivalence class list of output is also an O(n) operation.
- Therefore, Time Complexity, as driven by the iteration over Order, is O(n * L). Space
Complexity, driven as well by iteration over Order storing the previous PCS, is O(n + L).
Constructors
| Improve this Doc View SourceOrderBasedDoubleLengthPcsClassifier(String, Int32, IList<Int32>)
Declaration
public OrderBasedDoubleLengthPcsClassifier(string input, int pcsLength, IList<int> order)
Parameters
Type | Name | Description |
---|---|---|
System.String | input | |
System.Int32 | pcsLength | |
IList<System.Int32> | order |
Properties
| Improve this Doc View SourceInput
The input text, whose PCS of length PcsLength have to be classified.
Declaration
public string Input { get; }
Property Value
Type | Description |
---|---|
System.String |
Order
The position list of the PCS of length PcsLength, to be used, in addition to the Input, to calculate the equivalence classes.
Declaration
public IList<int> Order { get; }
Property Value
Type | Description |
---|---|
IList<System.Int32> |
PcsLength
The length the PCS of Input to be classified.
Declaration
public int PcsLength { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
| Improve this Doc View SourceClassify()
Declaration
public IList<int> Classify()
Returns
Type | Description |
---|---|
IList<System.Int32> |