Class NarrowingIntervalMatcher
Implements
Inherited Members
Namespace: MoreStructures.BurrowsWheelerTransform.Matching
Assembly: MoreStructures.dll
Syntax
public class NarrowingIntervalMatcher : IMatcher
Remarks
ALGORITHM
This is a basic implementation, narrowing the matching interval at every iteration with two linear scans of the
BWT:
- the first from the beginning of the current interval and up to the first char matching the current char;
- and the second from the end of the current interval and up to the last char matching the current char.
COMPLEXITY
- No precomputation cost is paid on instantiation, except for sorting of the BWT to build the
SortedBWT, which takes O(n * log(n)) time using QuickSort, but can
also run in linear time for a constant size alphabet using the Counting Sort.
- Either way, the predominant cost is the main narrowing interval algorithm, which runs for each char in the
BWT (i.e. n times) two linear scans of the BWT itself (on the order of n), resulting in quadratic time
execution.
- Therefore, Time Complexity is O(n^2) and Space Complexity is O(n).
Constructors
| Improve this Doc View SourceNarrowingIntervalMatcher(RotatedTextWithTerminator, RotatedTextWithTerminator)
Builds an instance of this finder, for the provided bwt
and
sbwt
. Because both BWT and SortedBWT are provided, no sorting happens.
Declaration
public NarrowingIntervalMatcher(RotatedTextWithTerminator bwt, RotatedTextWithTerminator sbwt)
Parameters
Type | Name | Description |
---|---|---|
RotatedTextWithTerminator | bwt | The Burrows-Wheeler Transform. Corresponds to the last column of the Burrows-Wheeler Matrix. |
RotatedTextWithTerminator | sbwt | The sorted version of the Burrows-Wheeler Transform. |
Properties
| Improve this Doc View SourceBWT
The Burrows-Wheeler Transform. Also, the last column of the Burrows-Wheeler Matrix.
Declaration
public RotatedTextWithTerminator BWT { get; }
Property Value
Type | Description |
---|---|
RotatedTextWithTerminator |
CharComparer
Declaration
protected IComparer<char> CharComparer { get; }
Property Value
Type | Description |
---|---|
IComparer<System.Char> |
OrderedAscListSearch
The ISearch implementation to be used when searching for items in lists sorted in ascending order.
Declaration
protected static ISearch OrderedAscListSearch { get; }
Property Value
Type | Description |
---|---|
ISearch |
SortedBWT
The sorted version of the Burrows-Wheeler Transform. Also, the first column of the Burrows-Wheeler Matrix.
Declaration
public RotatedTextWithTerminator SortedBWT { get; }
Property Value
Type | Description |
---|---|
RotatedTextWithTerminator |
Methods
| Improve this Doc View SourceBuildLastFirstFinder()
Builds the ILastFirstFinder instance which is then (potentially) used by all iterations of the matching algorithm over the pattern to match against BWT and SortedBWT.
Declaration
protected virtual ILastFirstFinder BuildLastFirstFinder()
Returns
Type | Description |
---|---|
ILastFirstFinder | An instance of ILastFirstFinder. |
Remarks
The ILastFirstFinder implementation used is PrecomputedFinder.
Match(IEnumerable<Char>)
Declaration
public virtual Match Match(IEnumerable<char> pattern)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<System.Char> | pattern |
Returns
Type | Description |
---|---|
Match |
Remarks
The pattern matching is done via successive narrowing of a interval, defined by a start and an end index.
- At the beginning the interval is as big as the provided BWTransform (and its text).
- The algorithm proceeds in reverse: from the last char of the pattern P, P[^1] to the first, P[0].
- Search in Sorted BWT for the range of indexes (first1, last1) having value P[^1] via a ISearch
implementation (because the input is sorted, binary search is possible).
- The char in BWT at indexes first1 and last1 represent the predecessor of all instances of P[^1] in P. The
interval (first1, last1) can then be narrowed down to (first2, last2), taking into account only the chars
in BWT which match the predecessor of P[^1], P[^2].
- By last-first property, new indexes (first3, last3) of the chars in Sorted BWT corresponding to first2 and
last2 in BWT, can be found. Those are the first and last of the new narrowed range, ready for step 4.
- When all chars of P, up to P[0], have been consumed, all matches have been identified as an interval in Sorted BWT.
NarrowInterval(Char, ILastFirstFinder, Int32, Int32)
Narrows the provided (startIndex
, endIndex
) interval, (possibly) using
the provided ILastFirstFinder finder
for last-first matching.
Declaration
protected virtual (bool success, int narrowedStartIndex, int narrowedEndIndex) NarrowInterval(char currentChar, ILastFirstFinder finder, int startIndex, int endIndex)
Parameters
Type | Name | Description |
---|---|---|
System.Char | currentChar | The char currently being processed. |
ILastFirstFinder | finder | The finder used to perform last-first matching, if needed. When pattern matching a single instance is shared across all iterations over the pattern. |
System.Int32 | startIndex | The lower extreme of the interval to be narrowed. |
System.Int32 | endIndex | The higher extreme of the interval to be narrowed. |
Returns
Type | Description |
---|---|
System.ValueTuple<System.Boolean, System.Int32, System.Int32> | An interval narrower than the one provided as input, or (-1, -1), if narrowing resulted into an empty set (i.e. overall matching has failed). |
Remarks
Narrowing is performed in five sub-steps:
- a linear scan in BWT from
startIndex
downwards is done, to identify the narrowed start; - a linear scan in BWT from
endIndex
upwards is done, to identify the narrowed end; - a last-first of the narrowed start is done, to find the corresponding narrowed start in the Sorted BWT;
- a last-first of the narrowed end is done, to find the corresponding narrowed end in the Sorted BWT;
- the narrowed interval in Sorted BWT is returned.