Search Results for

    Show / Hide Table of Contents

    Class SuffixAndLcpArraysBasedSuffixTreeBuilder

    A IBuilder<TEdge, TNode> implementation of SuffixTreeNode structures, using the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the full text, to build the tree efficiently.

    Inheritance
    System.Object
    SuffixAndLcpArraysBasedSuffixTreeBuilder
    Implements
    IBuilder<SuffixTreeEdge, SuffixTreeNode>
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixTrees.Builders
    Assembly: MoreStructures.dll
    Syntax
    public class SuffixAndLcpArraysBasedSuffixTreeBuilder : IBuilder<SuffixTreeEdge, SuffixTreeNode>
    Remarks

    ADVANTAGES AND DISADVANTAGES
    - Like NaivePartiallyRecursiveSuffixTreeBuilder and UkkonenSuffixTreeBuilder, it builds the tree one suffix of the text at a time, adding at least a leaf and potentially an intermediate node at every iteration.
    - Unlike them, the suffixes are not added to the tree in the order in which they appear in the text, i.e. they are not processed from the longest to the shortest, rather from the smallest to the biggest in lexicographic ascending order.
    - It also requires the construction of the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the input, whereas the naive and the Ukkonen implementations don't require any auxiliary structure.
    - Compared to the naive implementation, it has better runtime, even including the cost of building auxiliary structures: all-round performance is less than the quadratic performance of the naive approach, but worse than the linear performance of the Ukkonen algorithm.

    ALGORITHM
    - First, build full text, from the provided TextWithTerminator instances, and the corresponding MoreStructures.SuffixArrays.SuffixArray SA and MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray LCPA, which are both fully enumerated in order to be able to direct access them by index.
    - Then, build the initial mutable tree: just the root node with a single leaf linked by an edge labelled with the 1st suffix in the MoreStructures.SuffixArrays.SuffixArray: SA[0].
    - This represents the initial state of the mutable tree, which will be modified iteration by iteration going through the Suffix Array from the 2nd to the last suffix (in lexicographic ascending order).
    - The item of the LCP Array in position i - 1, gives the LCP of such suffix with the previous suffix: LCPA[i - 1] = LCP(SA[i - 1], SA[i]). Such LCP value is compared with the LCP value calculated at the previous iteration.
    - Different insertion strategies are applied, depending on the result of the comparison. At each iteration the reference to the last inserted leaf and to the previous LCP value are kept.
    - Case 1: if the current LCP is smaller than the previous, the root-to-leaf path of the leaf about to be added shares a shorter path with the previous leaf. So move up in the tree from the last inserted leaf, up until the first node which makes current LCP value non smaller than the previous LCP value. Then check whether it's fallbacking to Case 2 or 3.
    - Case 2: if the current LCP is the same as the previous, the root-to-leaf path of the leaf about to be added shares exactly the same path with the previous leaf. Therefore the new leaf can be attached directly to the parent of the previous leaf, which becomes a sibling of the new leaf. No new intermediate node.
    - Case 3: if the current LCP is bigger than the previous, a part of the previous edge is in common between the previous leaf and the leaf about to be created in this iteration. Therefore, build new intermediate, detach old edge going to previous leaf, attach new intermediate edge.
    - Finally, after all iterations have been executed and all suffixes of the text have been included in the mutable tree, build the final SuffixTreeNode structure from the mutable tree, using a MoreStructures.MutableTrees.Conversions.FullyIterativeConversion.

    COMPLEXITY
    - The complexity of building the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray, via the provided SuffixAndLcpArraysBuilder is not included in this cost analysis. This has to be taken into account when comparing this implementation of IBuilder<TEdge, TNode> of SuffixTreeNode structures with other implementations, such as NaivePartiallyRecursiveSuffixTreeBuilder or UkkonenSuffixTreeBuilder.
    - Building the initial mutable tree requires building two nodes and an edge, which can both be done in constant time.
    - Then, n - 1 iterations are performed, where n is the length of the input, executing optionally Case 1 and then either Case 2 or 3.
    - Cases 2 and 3 perform the insertion of a leaf and potentially of an intermediate node, both O(1) operations. Case 1 may seem an O(n) operation. However, due to LCP Array property, moving up in the tree will happen at most once per iteration.
    - Final step is building the final tree from the mutable tree, which requires the traversal of the O(n) nodes of the tree. MoreStructures.MutableTrees.Conversions.FullyIterativeConversion does that in O(n) time and space, via a stack and a precomputed data structure to find the number of terminators per edge in O(1) time.
    - In conclusion, both Time and Space Complexity are O(n).

    Constructors

    | Improve this Doc View Source

    SuffixAndLcpArraysBasedSuffixTreeBuilder(Func<TextWithTerminator, (SuffixArray, LcpArray)>)

    Declaration
    public SuffixAndLcpArraysBasedSuffixTreeBuilder(Func<TextWithTerminator, (SuffixArray, LcpArray)> suffixAndLcpArraysBuilder)
    Parameters
    Type Name Description
    Func<TextWithTerminator, System.ValueTuple<MoreStructures.SuffixArrays.SuffixArray, MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray>> suffixAndLcpArraysBuilder

    Remarks

    Properties

    | Improve this Doc View Source

    SuffixAndLcpArraysBuilder

    The builder to be used to construct the MoreStructures.SuffixArrays.SuffixArray and the MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray of the full text, both required by this algorithm to build the SuffixTreeNode structure.

    Declaration
    public Func<TextWithTerminator, (SuffixArray, LcpArray)> SuffixAndLcpArraysBuilder { get; }
    Property Value
    Type Description
    Func<TextWithTerminator, System.ValueTuple<MoreStructures.SuffixArrays.SuffixArray, MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray>>

    Methods

    | Improve this Doc View Source

    BuildTree(TextWithTerminator[])

    Declaration
    public SuffixTreeNode BuildTree(params TextWithTerminator[] texts)
    Parameters
    Type Name Description
    TextWithTerminator[] texts
    Returns
    Type Description
    SuffixTreeNode
    Remarks

    Implements

    IBuilder<TEdge, TNode>

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    BuilderExtensions.BuildTree<TEdge, TNode>(IBuilder<TEdge, TNode>, String)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX