Class Matcher
Exposes utility methods to match a text against a pattern, using the Knuth-Morris-Pratt algorithm.
Inheritance
Inherited Members
Namespace: MoreStructures.KnuthMorrisPratt.Matching
Assembly: MoreStructures.dll
Syntax
public static class Matcher
Methods
| Improve this Doc View SourceMatch(String, String, Char, Boolean)
Tries to match the provided pattern
against the provided text
, using the
Knuth-Morris-Pratt algorithm. Returns all matches found.
Declaration
public static IEnumerable<Match<int>> Match(string text, string pattern, char separator, bool validateSeparator = false)
Parameters
Type | Name | Description |
---|---|---|
System.String | text | The text, to match the pattern against. |
System.String | pattern | The pattern to match. |
System.Char | separator | A special character, to be used as separator in the concatenated string, between |
System.Boolean | validateSeparator | Whether an additional validation check should be performed, before actually running the algorithm, on whether
|
Returns
Type | Description |
---|---|
IEnumerable<Match<System.Int32>> | A sequence of all matches, from the one starting at the lowest index in |
Remarks
ALGORITHM
The Knuth-Morris-Pratt algorithm:
- First builds a concatenated string in the form "pattern-separator-text".
- Then calculates the Prefix Function of the concatenated string.
- Finally, iterates over the Prefix Function, retaining all indexes i such that the value of the Prefix
Function at i is equal to the length of the pattern. That means that the prefix of the concatenated
string from 0 to i (included) has a border of length equal to the length of the pattern, which means that
the prefix contains the pattern.
- For each retained index i, a successful match is emitted, starting at index i - 2 * m and matching
exactly m chars, where m is the length of the pattern.
COMPLEXITY
- The only validation check which is not done in constant time is whether text and pattern contain the
separator. This validation check is by default not done, and has to be opted in via the flag
validateSeparator
. When executed it takes O(n + m) time, since it has to compare the
separator against each char of the text and of the pattern.
- The concatenated string has length n + m + 1, where n is the length of the text and m is the length of
the pattern. Building it requires iterating both over the pattern and the text.
- Calculating the Prefix Function of the concatenated string via FastPrefixFunctionCalculator
is a O(n + m + 1) operation, both in time and space.
- Iterating over the indexes of the Prefix Function which refer to the text is an O(n) operation.
- Each iteration makes a constant-time comparison and yield the result.
- Therefore, both Time and Space Complexity are O(n + m).