Search Results for

    Show / Hide Table of Contents

    Class UkkonenSuffixTreeBuilder

    Builds objects, such as edges and nodes, for SuffixTreeNode structures, using the Ukkonen algorithm.

    Inheritance
    System.Object
    UkkonenSuffixTreeBuilder
    Implements
    IBuilder<SuffixTreeEdge, SuffixTreeNode>
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixTrees.Builders
    Assembly: MoreStructures.dll
    Syntax
    public class UkkonenSuffixTreeBuilder : IBuilder<SuffixTreeEdge, SuffixTreeNode>
    Remarks

    Iterative implementation of the Ukkonen algorithm: see https://en.wikipedia.org/wiki/Ukkonen%27s_algorithm for an introduction and https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf for the original paper.

    ALGORITHM OPTIMIZATIONS
    The algorithm applies some optimizations to achieve linear complexity:
    - Edge Labels compression: strings on edges are stored as (start, end) indexes. This makes the space used by a single edge constant, i.e. O(1), because it always consists of two integers, regardless of how many chars of the text it represents.
    - Global End: all leafs have a dynamic end index, autoincremented every new phase. This makes the time used to apply Rule 1 Extension constant, i.e. O(1), because incrementing the global end also increments all leaf implicitely.
    - Suffix Links: all internal nodes point to another internal node sharing the suffix. This makes more efficient traversal, because storing and jumping the suffix link when needed means traversal doesn't have to be done again.
    - Terminators Index Map: a function is used, mapping each index i of the full input text (consolidated TextWithTerminator including all TextWithTerminator instances passed in the input) to the index of the next terminator appearing in the full text not before i. Such function is used to perform edge trimming of the mutable tree structure built by the Ukkonen algorithm in linear time.

    ALGORITHM
    The algorithm is structured in phases, as many as the number of chars in the text.
    At the beginning of a new phase, both the number of remaining suffixes to take care of, and the global end pointer are both increased by 1.
    Each phase is composed of at least 1 iteration, each one taking care of remaining suffixes. At the beginning

    USECASES
    Not limited by call stack depth. Convenient with long input text (i.e. string having a length < ~1K chars).

    COMPLEXITY
    - The Time Complexity of the Ukkonen algorithm, building the mutable tree from the input texts, is O(t). Space Complexity is O(2 * t) where t = length of the text to match.
    - While there are as many phases as number of chars in text (t), and there can be multiple iterations per phase (as many as the number of remaining suffixes to process), the complexity is still linear, ~ 2t.
    - Each node of the resulting mutable tree (there are O(n) of them, and at most 2 * n), is processed by the mutable tree conversion method at most twice: once for leaves and twice for intermediate nodes, which are re-pushed onto the stack before its children, so that they can be processed after all their children have already been converted.
    - Each stack frame processing by the mutable tree conversion method performs constant-time operations only: SuffixTreeNode and SuffixTreeEdge instances creation, children dictionary instantiation and edge trimming (trimming the label of edges in generalized Suffix Trees, to the first terminator appearing in the label, when multiple are present - i.e. when the label spans multiple concatenated texts).
    - Edge trimming is a constant-time operation once the cost of calculating the Terminators Index Map is paid, which is a linear cost in time and space (linear in t).
    - Therefore, mutable tree conversion, and the overall algorithm, is linear in time and space over t.

    Methods

    | Improve this Doc View Source

    BuildTree(TextWithTerminator[])

    Declaration
    public SuffixTreeNode BuildTree(params TextWithTerminator[] texts)
    Parameters
    Type Name Description
    TextWithTerminator[] texts
    Returns
    Type Description
    SuffixTreeNode
    Remarks

    Implements

    IBuilder<TEdge, TNode>

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    BuilderExtensions.BuildTree<TEdge, TNode>(IBuilder<TEdge, TNode>, String)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX