Class NaivePartiallyRecursiveSuffixTreeBuilder
Builds objects, such as edges and nodes, for SuffixTreeNode structures.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.SuffixTrees.Builders
Assembly: MoreStructures.dll
Syntax
public class NaivePartiallyRecursiveSuffixTreeBuilder : IBuilder<SuffixTreeEdge, SuffixTreeNode>
Remarks
ADVANTAGES AND DISADVANTAGES
- Implemented as an iteration of recursive visit of the tree being built, with as many iterations as the number
of suffixes of the input (where the longest suffix is the text itself) and one level of recursion per char of
each suffix.
- Limited by call stack depth and usable with input text of a "reasonable" length (i.e. string having a length
< ~1K chars).
ALGORITHM
- For each suffix S of the input T, start from the root node N of the tree, initialized as a mutable tree made
of a simple leaf.
- Compare the first char of S with the first char of each of the edges coming from N.
- If there is no edge e such that label(e)[0] == S[0]
, it means that there is no descendant of N sharing
a path with S. So, create a new leaf under N, attached by an edge with a label equal to S.
- Otherwise, it means that there is a path to be shared between the new leaf and at least one of the
descendants of N.
- In this case, compare the current suffix and the label of such edge e, for the LCP.
- If the prefix in common is shorter than the length of the label of the edge e, create an intermediate node,
push down the child pointed by the edge in the current node and add a new node for the reminder of the suffix
(up to next terminator) as second child of the intermediate.
- Otherwise, eat prefixLength chars from the edge, move to the child pointed by the edge entirely matching the
beginning of the current suffix and repeat the same operation.
- Finally, after all iterations have been executed and all suffixes of the text have been included in the
mutable tree, build the final SuffixTreeNode structure from the mutable tree, using a
MoreStructures.MutableTrees.Conversions.FullyIterativeConversion.
COMPLEXITY
- The generation of the full text, done by
GenerateFullText(TextWithTerminator[]), is a O(n)
operation, where n is the sum of the length of each TextWithTerminator in the input
(terminators included).
- The initial tree, just a root-leaf, is created in constant time.
- The top level loop is executed n times, once per char of the full text.
- Finding the edge with the same first char is O(avgEdges), where avgEdges is the average number of edges
coming out of a node of the tree. This operation is done for all cases (leaf creation alone or leaf +
intermediate node).
- Over all the n nodes of the tree, taking into account the worst case where avgEdges is O(n), Time Complexity
would become O(n^2). However, the total number of edges in a tree is O(n), so the overall cost of these
operations for the entire tree is O(n), and not O(n^2).
- Finding the LCP between the suffix and the label of the current edge are O(n) operations, since the length
of a generic suffix of the text is O(n) and they require iterating over all the chars.
- Such operations are only done when an intermediate node is required. However, in the worst case there are as
many intermediate as leaves in the tree (i.e. each iteration adds an intermediate node and a leaf).
Therefore, the number of intermediate nodes is O(n), and the two O(n) operations above are to be repeated
O(n) times.
- Final step is building the final tree from the mutable tree, which requires the traversal of the O(n) nodes
of the tree. MoreStructures.MutableTrees.Conversions.FullyIterativeConversion does that in O(n) time and
space, via a stack and a precomputed data structure to find the number of terminators per edge in O(1) time.
- In conclusion, Time Complexity = O(n^2) and Space Complexity = O(n) where n = length of the text to match.
- Compared to tries, trees are more compact due to edge coalescing and edge label compression (i.e. edge
strings stored as pair (start, length), rather than as a substring of length chars). Each recursion add a
leaf and at most one intermediate node, so Space Complexity ~ 2 * n = O(n).
Methods
| Improve this Doc View SourceBuildTree(TextWithTerminator[])
Declaration
public SuffixTreeNode BuildTree(params TextWithTerminator[] texts)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator[] | texts |
Returns
Type | Description |
---|---|
SuffixTreeNode |