Class SuffixAndLcpArraysBasedSuffixTreeBuilder
A IBuilder<TEdge, TNode> implementation of SuffixTreeNode structures, using the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the full text, to build the tree efficiently.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixTrees.Builders
Assembly: MoreStructures.dll
Syntax
public class SuffixAndLcpArraysBasedSuffixTreeBuilder : IBuilder<SuffixTreeEdge, SuffixTreeNode>
Remarks
ADVANTAGES AND DISADVANTAGES
- Like NaivePartiallyRecursiveSuffixTreeBuilder and UkkonenSuffixTreeBuilder,
it builds the tree one suffix of the text at a time, adding at least a leaf and potentially an intermediate
node at every iteration.
- Unlike them, the suffixes are not added to the tree in the order in which they appear in the text, i.e. they
are not processed from the longest to the shortest, rather from the smallest to the biggest in lexicographic
ascending order.
- It also requires the construction of the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the
input, whereas the naive and the Ukkonen implementations don't require any auxiliary structure.
- Compared to the naive implementation, it has better runtime, even including the cost of building auxiliary
structures: all-round performance is less than the quadratic performance of the naive approach, but worse
than the linear performance of the Ukkonen algorithm.
ALGORITHM
- First, build full text, from the provided TextWithTerminator instances, and the corresponding
MoreStructures.SuffixArrays.SuffixArray SA and MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray LCPA, which are both fully enumerated in order to be
able to direct access them by index.
- Then, build the initial mutable tree: just the root node with a single leaf linked by an edge labelled with
the 1st suffix in the MoreStructures.SuffixArrays.SuffixArray: SA[0]
.
- This represents the initial state of the mutable tree, which will be modified iteration by iteration going
through the Suffix Array from the 2nd to the last suffix (in lexicographic ascending order).
- The item of the LCP Array in position i - 1, gives the LCP of such suffix with the previous suffix:
LCPA[i - 1] = LCP(SA[i - 1], SA[i])
. Such LCP value is compared with the LCP value calculated at the
previous iteration.
- Different insertion strategies are applied, depending on the result of the comparison. At each iteration the
reference to the last inserted leaf and to the previous LCP value are kept.
- Case 1: if the current LCP is smaller than the previous, the root-to-leaf path of the leaf about to be
added shares a shorter path with the previous leaf. So move up in the tree from the last inserted leaf, up
until the first node which makes current LCP value non smaller than the previous LCP value. Then check
whether it's fallbacking to Case 2 or 3.
- Case 2: if the current LCP is the same as the previous, the root-to-leaf path of the leaf about to be
added shares exactly the same path with the previous leaf. Therefore the new leaf can be attached directly
to the parent of the previous leaf, which becomes a sibling of the new leaf. No new intermediate node.
- Case 3: if the current LCP is bigger than the previous, a part of the previous edge is in common
between the previous leaf and the leaf about to be created in this iteration. Therefore, build new
intermediate, detach old edge going to previous leaf, attach new intermediate edge.
- Finally, after all iterations have been executed and all suffixes of the text have been included in the
mutable tree, build the final SuffixTreeNode structure from the mutable tree, using a
MoreStructures.MutableTrees.Conversions.FullyIterativeConversion.
COMPLEXITY
- The complexity of building the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray, via the provided
SuffixAndLcpArraysBuilder is not included in this cost analysis. This has to be taken into
account when comparing this implementation of IBuilder<TEdge, TNode> of
SuffixTreeNode structures with other implementations, such as
NaivePartiallyRecursiveSuffixTreeBuilder or UkkonenSuffixTreeBuilder.
- Building the initial mutable tree requires building two nodes and an edge, which can both be done in constant
time.
- Then, n - 1 iterations are performed, where n is the length of the input, executing optionally Case 1 and
then either Case 2 or 3.
- Cases 2 and 3 perform the insertion of a leaf and potentially of an intermediate node, both O(1) operations.
Case 1 may seem an O(n) operation. However, due to LCP Array property, moving up in the tree will happen at
most once per iteration.
- Final step is building the final tree from the mutable tree, which requires the traversal of the O(n) nodes
of the tree. MoreStructures.MutableTrees.Conversions.FullyIterativeConversion does that in O(n) time and
space, via a stack and a precomputed data structure to find the number of terminators per edge in O(1) time.
- In conclusion, both Time and Space Complexity are O(n).
Constructors
| Improve this Doc View SourceSuffixAndLcpArraysBasedSuffixTreeBuilder(Func<TextWithTerminator, (SuffixArray, LcpArray)>)
Declaration
public SuffixAndLcpArraysBasedSuffixTreeBuilder(Func<TextWithTerminator, (SuffixArray, LcpArray)> suffixAndLcpArraysBuilder)
Parameters
Type | Name | Description |
---|---|---|
Func<TextWithTerminator, System.ValueTuple<MoreStructures.SuffixArrays.SuffixArray, MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray>> | suffixAndLcpArraysBuilder |
Remarks
Properties
| Improve this Doc View SourceSuffixAndLcpArraysBuilder
The builder to be used to construct the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the full text, both required by this algorithm to build the SuffixTreeNode structure.
Declaration
public Func<TextWithTerminator, (SuffixArray, LcpArray)> SuffixAndLcpArraysBuilder { get; }
Property Value
Type | Description |
---|---|
Func<TextWithTerminator, System.ValueTuple<MoreStructures.SuffixArrays.SuffixArray, MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray>> |
Methods
| Improve this Doc View SourceBuildTree(TextWithTerminator[])
Declaration
public SuffixTreeNode BuildTree(params TextWithTerminator[] texts)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator[] | texts |
Returns
Type | Description |
---|---|
SuffixTreeNode |