Class OrderBasedSingleCharPcsClassifier
A ISingleCharPcsClassifier implementation which uses an externally provided position list Order of the 1-char PCS of the Input, to calculate equivalence classes.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.CyclicShifts
Assembly: MoreStructures.dll
Syntax
public class OrderBasedSingleCharPcsClassifier : ISingleCharPcsClassifier
Remarks
ADVANTAGES
- Compared to the naive implementation of NaiveSingleCharPcsClassifier, it has better runtime and
only allocates an array of n elements, where n is the length of the Input.
- However, it requires the position list of the Input to be provided to the classifier at
construction time.
- Calculating the position list is a linear time operation only when some assumptions can be made on the input,
such as an alphabet of limited size (which is the main scenario for CountingSortCharsSorter).
In all other cases, the runtime has a worst case of O(n * log(n)).
ALGORITHM
- An array of n elements EC, to accomodate the equivalence classes of the result, is allocated upfront.
- The item of EC with index P[0]
, where P is the list of positions, is set to 0. This is because
P[0]
is the index in the Input T, of the smallest char in T. Therefore, there are no
smaller chars in T and the equivalence class of T[P[0]]
is hence the smallest, i.e. 0.
- EC[O[i + 1]]
can be calculated from EC[O[i]]
: two scenarios are possible.
- If T[O[i + 1]] == T[O[i]]
, it means that the two chars T[O[i + 1]]
and T[O[i]]
, which
come one after the other one in the position list, are the same and should have the same equivalence class.
Therefore, the equivalence class of T[O[i + 1]]
, EC[O[i + 1]]
can be set to the equivalence
class of T[O[i]]
, EQ[O[i]]
.
- If, on the other hand, T[O[i + 1]] != T[O[i]]
, it means that the two chars T[O[i + 1]]
and
T[O[i]]
, which come one after the other one in the position list, are different and should have
different equivalence classes. Therefore, the equivalence class of T[O[i + 1]]
, EC[O[i + 1]]
can be set to the successor of the equivalence class of T[O[i]]
, EQ[O[i]]
.
COMPLEXITY
- There are as many iterations as items of the position list.
- Within each iteration, direct accesses to items of Input and Order by index are
done in constant time.
- Therefore, Time and Space Complexity are O(n), excluding the cost of calculating the position list, which is
externally provided. Check ICharsSorter implementations for the complexity of the algorithms
calculating the position list of a string.
Constructors
| Improve this Doc View SourceOrderBasedSingleCharPcsClassifier(String, IList<Int32>)
Declaration
public OrderBasedSingleCharPcsClassifier(string input, IList<int> order)
Parameters
Type | Name | Description |
---|---|---|
System.String | input | |
IList<System.Int32> | order |
Properties
| Improve this Doc View SourceInput
The input text, whose 1-char PCS have to be classified.
Declaration
public string Input { get; }
Property Value
Type | Description |
---|---|
System.String |
Order
The position list of the 1-char PCS of the Input.
Declaration
public IList<int> Order { get; }
Property Value
Type | Description |
---|---|
IList<System.Int32> |
Remarks
Has as many items as chars in the Input.
It is specifically required by this implementation of the algorithm and has to be externally provided.
It can be calculated with any implementation of Sort(String).
Methods
| Improve this Doc View SourceClassify()
Declaration
public IList<int> Classify()
Returns
Type | Description |
---|---|
IList<System.Int32> | A list of equivalence classes, with as many items as chars in the Input. |