Class PrefixFunctionBasedBorderExtraction
An implementation of IBordersExtraction which first calculate the prefix function of the text, and then uses it to calculate the borders efficiently.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.KnuthMorrisPratt.Borders
Assembly: MoreStructures.dll
Syntax
public class PrefixFunctionBasedBorderExtraction : IBordersExtraction
Remarks
ALGORITHM
- The algorithm takes advantage of the following property of borders: any non-longest border of the text is a
also a border of the longest border of the text.
- That is, if T[0..w] is a border of T of length w, and T has longest border T[0..lb] of length lb, where
0 < w < lb, then T[0..w] is also border of T[0..lb]. That is: any non-longest border of T is also
border of the longest border of T.
- Given the Prefix Function s of T, the length of the longest border of T can be calculated as s(n - 1), where
n is the length of T. The longest border of the longest border can be calculated as s(s(n - 1) - 1). And so
on...
- Therefore, borders are found iteratively: the next border (shorter than the longest) can be calculated as the
longest border of the border found at the previous iteration, by using the Prefix Function on the predecessor
of the previous result: w(i) = s(w(i - 1) - 1); w(i + 1) = s(w(i) - 1) = s(s(w(i - 1) - 1) - 1), etc.
- The iteration terminates when the Prefix Function returns 0, i.e. there is no border.
COMPLEXITY
- Values of the Prefix Function have to be iterated up to the last, since the longest border of the provided
text T has a length equal to the value of the Prefix Function in n - 1, where n is the length of T.
- Then there are as many iterations as the number of borders of T. In the worst case, T can have a number of
borders comparable to the chars in it. For example, the string new string('a', n)
has n - 1 borders:
new string('a', n - 1)
, new string('a', n - 2)
, ..., "a".
- For each iteration, the border is returned as new string, and that requires iterating over each char of the
border. In the worst case, borders of T can have a length comparable to the length of T. For example the
string new string('a', n)
has a border of length n - 1
.
- Direct accessing of the list of values of the Prefix Function is done in constant time.
- Storing the Prefix Function values requires storing n integers, each of constant size.
- Every iteration the string of a single border is built and returned.
- Therefore, Time Complexity is O(Tpf * n + n ^ 2) and Space Complexity is O(Spf * n + n), where Tpf and Spf
are the amortized cost of the Prefix Function values over the n chars of T.
Constructors
| Improve this Doc View SourcePrefixFunctionBasedBorderExtraction(IPrefixFunctionCalculator)
Declaration
public PrefixFunctionBasedBorderExtraction(IPrefixFunctionCalculator prefixFunctionCalculator)
Parameters
Type | Name | Description |
---|---|---|
IPrefixFunctionCalculator | prefixFunctionCalculator |
Remarks
Properties
| Improve this Doc View SourcePrefixFunctionCalculator
The IPrefixFunctionCalculator implementation to be used, to calculate the Prefix Function of the text.
Declaration
protected IPrefixFunctionCalculator PrefixFunctionCalculator { get; }
Property Value
Type | Description |
---|---|
IPrefixFunctionCalculator |
Methods
| Improve this Doc View SourceGetAllBordersByDescLength(String)
Declaration
public IEnumerable<string> GetAllBordersByDescLength(string text)
Parameters
Type | Name | Description |
---|---|---|
System.String | text |
Returns
Type | Description |
---|---|
IEnumerable<System.String> |