Class SuffixArrayBasedMatcher
A IMatcher implementation which uses a MoreStructures.SuffixArrays.SuffixArray to perform text pattern matching.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.BurrowsWheelerTransform.Matching
Assembly: MoreStructures.dll
Syntax
public class SuffixArrayBasedMatcher : IMatcher
Remarks
Text and its MoreStructures.SuffixArrays.SuffixArray has to be provided as an additional input.
If not already available, it can be built via a
SuffixStructureBasedSuffixArrayBuilder<TEdge, TNode>, if a Suffix Structure of the provided text
is already available (such as a Suffix Tree or Trie), and via NaiveSuffixArrayBuilder otherwise.
The algorithm performs two binary searches over the SortedBWT in sequence.
- The first binary search looks for the first index i of the SortedBWT such that the provided
pattern matches the i-th element of the MoreStructures.SuffixArrays.SuffixArray (meaning that the pattern is
fully contained in the i-th suffix of Text in SuffixArray).
- If i is not found, a failing Match is returned. Otherwise, a second binary search is
performed.
- The second binary search looks for the last index j of the SortedBWT respecting the same
condition as the first binary search. This search starts from the index found in the first search (included).
- The second search is guaranteed to succeed (as in the worst case the last occurrence corresponds to the first
one, found in the first search.
- The array of indexes is as long as the length n of SortedBWT, which is also the length of the
text and the MoreStructures.SuffixArrays.SuffixArray.
- Each binary search does a number of comparisons which is logarithmic over n.
- Each comparison is of at most n chars.
- So Time Complexity is O(n * log(n)) and Space Complexity is O(n).
Constructors
| Improve this Doc View SourceSuffixArrayBasedMatcher(RotatedTextWithTerminator, TextWithTerminator, SuffixArray)
Declaration
public SuffixArrayBasedMatcher(RotatedTextWithTerminator sbwt, TextWithTerminator text, SuffixArray suffixArray)
Parameters
Type | Name | Description |
---|---|---|
RotatedTextWithTerminator | sbwt | |
TextWithTerminator | text | |
MoreStructures.SuffixArrays.SuffixArray | suffixArray |
Remarks
Properties
| Improve this Doc View SourceBWT
Declaration
public RotatedTextWithTerminator BWT { get; }
Property Value
Type | Description |
---|---|
RotatedTextWithTerminator |
Remarks
Unlike SortedBWT, BWT is not required to perform Pattern Matching against Text, and is not supported.
OrderedAscListSearch
The ISearch implementation to be used when searching for items in lists sorted in ascending order.
Declaration
protected static ISearch OrderedAscListSearch { get; }
Property Value
Type | Description |
---|---|
ISearch |
SortedBWT
The sorted version of the Burrows-Wheeler Transform. Also, the first column of the Burrows-Wheeler Matrix.
Declaration
public RotatedTextWithTerminator SortedBWT { get; }
Property Value
Type | Description |
---|---|
RotatedTextWithTerminator |
SuffixArray
The MoreStructures.SuffixArrays.SuffixArray of Text.
Declaration
public SuffixArray SuffixArray { get; }
Property Value
Type | Description |
---|---|
MoreStructures.SuffixArrays.SuffixArray |
Text
The TextWithTerminator, to do pattern matching against.
Declaration
public TextWithTerminator Text { get; }
Property Value
Type | Description |
---|---|
TextWithTerminator |
Remarks
Requires to get the actual suffix from the i-th element of SuffixArray, to be compared against the pattern.
Methods
| Improve this Doc View SourceBuildComparer(IEnumerable<Char>)
Builds a SuffixArrayAgainstPatternComparer instance, or a derivation of it, to be used in Match(IEnumerable<Char>), to find the start and end indexes in SortedBWT, corresponding to first and last matches of the pattern in Text.
Declaration
protected virtual SuffixArrayAgainstPatternComparer BuildComparer(IEnumerable<char> pattern)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<System.Char> | pattern |
Returns
Type | Description |
---|---|
SuffixArrayAgainstPatternComparer | An instance of SuffixArrayAgainstPatternComparer instance, or a derivation of it. |
Match(IEnumerable<Char>)
Tries to match the provided pattern
against Text, via the
SortedBWT and the SuffixArray of Text.
Declaration
public Match Match(IEnumerable<char> pattern)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<System.Char> | pattern |
Returns
Type | Description |
---|---|
Match |