Class NaiveBuilder
Implements
Inherited Members
Namespace: MoreStructures.BurrowsWheelerTransform.Builders
Assembly: MoreStructures.dll
Syntax
public class NaiveBuilder : IBuilder
Remarks
Check specific builder methods, such as BuildMatrix(TextWithTerminator), for further information about the complexity of each operation.
Methods
| Improve this Doc View SourceBuildMatrix(BWTransform)
Rebuilds the original BWMatrix from a BWTransform representing the last column of the Burrows-Wheeler Matrix (which is also the Burrows-Wheeler Transform).
Declaration
public BWMatrix BuildMatrix(BWTransform lastBWMColumn)
Parameters
Type | Name | Description |
---|---|---|
BWTransform | lastBWMColumn | The last column of the Burrows-Wheeler Matrix. |
Returns
Type | Description |
---|---|
BWMatrix | The matrix, wrapped into a BWMatrix object. |
Remarks
Because the entire Burrows-Wheeler Matrix is built from the text with an invertible function, and the same happens for the Burrows-Wheeler Transform of the text, it's possible to get back the entire matrix from its last column.
BuildMatrix(TextWithTerminator)
Builds Burrows-Wheeler objects, such as BWMatrix and BWTransform of the provided TextWithTerminator.
Declaration
public virtual BWMatrix BuildMatrix(TextWithTerminator text)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator | text | The text to build the BWM, with its terminator (required). |
Returns
Type | Description |
---|---|
BWMatrix | The matrix, wrapped into a BWMatrix object. |
Remarks
COMPLEXITY
Since this operation requires computing a n * n matrix, where n is the
Length of text
, it can be intensive operation, both in
time.
Time Complexity:
- Sorting a large number of strings on a large non-constant alphabet takes n * log(n) * m, where m is
the cost of a comparison of two n-sized strings, which is O(n).
- Therefore Time Complexity is O(n^2 * log(n)).
- If the alphabet can be considered of constant size and comparison between two strings happens in
constant time, the complexity is O(n * log(n)).
Space Complexity:
- The output is a n * n matrix of chars (all cyclic rotations of a n-sized string).
- Therefore Space Complexity is O(n^2 * m), when no assumption is made on the size of a char being
constant, where m = log(w, M), with w = size of a word in memory and M = size of the alphabet.
- If the alphabet can be considered of constant size, the complexity is O(n^2).
BuildTransform(BWMatrix)
Builds the Burrows-Wheeler Transform from the provided BWMatrix.
Declaration
public virtual BWTransform BuildTransform(BWMatrix matrix)
Parameters
Type | Name | Description |
---|---|---|
BWMatrix | matrix | The matrix, whose BWT has to be calculated. |
Returns
Type | Description |
---|---|
BWTransform | The transform, wrapped into a BWTransform object. |
Remarks
BuildTransform(TextWithTerminator)
Declaration
public virtual BWTransform BuildTransform(TextWithTerminator text)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator | text |
Returns
Type | Description |
---|---|
BWTransform |
Remarks
COMPLEXITY
- Done without constructing the BWMatrix of text
, which would requires
O(n^2) space.
- Instead, n VirtuallyRotatedTextWithTerminator objects are created (one per char of
text
), mapping a specific rotation of the original text
and taking
into account the rotation in its all its char-position dependent functionalities, such as
CompareTo(VirtuallyRotatedTextWithTerminator),
GetEnumerator() etc.
InvertMatrix(BWMatrix)
Declaration
public virtual TextWithTerminator InvertMatrix(BWMatrix matrix)
Parameters
Type | Name | Description |
---|---|---|
BWMatrix | matrix |
Returns
Type | Description |
---|---|
TextWithTerminator |
Remarks
COMPLEXITY
- No computation to be done, except for building the string of the TextWithTerminator.
- Time Complexity = O(n), Space Complexity = O(n), where n = edge of matrix
.
InvertTransform(RotatedTextWithTerminator)
Declaration
public virtual TextWithTerminator InvertTransform(RotatedTextWithTerminator lastBWMColumn)
Parameters
Type | Name | Description |
---|---|---|
RotatedTextWithTerminator | lastBWMColumn |
Returns
Type | Description |
---|---|
TextWithTerminator |
Remarks
ALGORITHM
This implementation inverts the BWT by iteratively building n+1-mers from n-mers.
- 1-mers (first column of the matrix) is just the last column (BWT), sorted.
That gives a matrix M0 of 1 columns and n rows (where n = length of
lastBWMColumn
).
- 2-mers are derived from 1-mers, by juxtaposing side-by-side last column (BWT) and M0, sorted.
That gives a matrix M1 of 2 columns and n rows.
- 3-mers are derived from 2-mers, by juxtaposing side-by-side last column (BWT) and M1, sorted.
That gives a matrix M2 of 3 columns and n rows.
- And so on, up to (n - 1)-mers and matrix M(n - 2) of n - 1 columns and n rows.
- The last column is already known (BWT), so the text can be extracted from the first line: the first
char is the separator, the rest is the text without separator.
COMPLEXITY
- There are n top-level iterations, where n is the length of lastBWMColumn
.
- Each iteration takes n * log(n) * m time to sort, where m is the length of strings to compare = n.
- So total Time Complexity is O(n^3 * log(n)) and Space Complexity is O(n^2).