Search Results for

    Show / Hide Table of Contents

    Class SuffixTreeBasedSnssFinder

    A ISnssFinder implementation and SuffixStructureBasedSnssFinder concretion which uses a SuffixTreeNode structure to implement Find(IEnumerable<Char>, IEnumerable<Char>).

    Inheritance
    System.Object
    SuffixStructureBasedSnssFinder
    SuffixTreeBasedSnssFinder
    Implements
    ISnssFinder
    Inherited Members
    SuffixStructureBasedSnssFinder.Terminator1
    SuffixStructureBasedSnssFinder.Terminator2
    SuffixStructureBasedSnssFinder.ValidateInput
    SuffixStructureBasedSnssFinder.ValidateTexts(IEnumerable<Char>, IEnumerable<Char>)
    SuffixStructureBasedSnssFinder.CollectPrefixChars<TEdge, TNode>(TextWithTerminator, TNode, IDictionary<TNode, TreeTraversalVisit<TEdge, TNode>>)
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixStructures.Matching
    Assembly: MoreStructures.dll
    Syntax
    public class SuffixTreeBasedSnssFinder : SuffixStructureBasedSnssFinder, ISnssFinder
    Remarks

    ADVANTAGES AND DISADVANTAGES
    - Compared to the naive implementation of NaiveSnssFinder, has both average and worst case better runtime, at the cost of space used (which is O(1) for the naive implementation).
    - Compared to SuffixTrieBasedSnssFinder, it has better Time and Space Complexity, due to Branches Coaleascing combined with Path Compression. It's, however, more complex to implement, visualize and debug step-by-step.

    ALGORITHM
    - First uses the UkkonenSuffixTreeBuilder to build a suffix tree of the concatenation of first text, Terminator1, second text and Terminator2, stopping branches at the first terminator encountered. This structure is also known as Generalized Suffix Tree.
    - Visit the suffix tree breadth-first, stopping at the first node such that the root-to-node prefix is substring of text1 but not of text2.
    - The root-to-node prefix is a substring of text1 when there is a path-to-leaf which contains an edge including Terminator1.
    - The root-to-node prefix is NOT a substring of text2 when there is no path-to-leaf which doesn't contain Terminator1.
    - Such substring of text1 is guaranteed to be the shortest in number of edges by the visit order imposed by the breadth-first search. However, unlike for Suffix Tries, in Suffix Trees the number of chars per edges varies.
    - Therefore, all pontential results have to be collected, then sorted by actual prefix length.

    COMPLEXITY
    - Validating the input requires going through text1 and text2, and takes linear time in the the number of chars n of the concatenated text "text1 ## separator1 ## text2 ## separator2", and constant space.
    - Building the Generalized Suffix Tree via the Ukkonen algorithm takes time and space at least proportional to the number of nodes of the tree, which is linear with n.
    - For each level of the breadth-first traversal of the trie, all node-to-leaf paths are checked (in the worst case).
    - There are at most n levels in the tree, since there can't be a path longer than a suffix of the concatenated text. The higher is the level, the shorter are node-to-leaf paths. However, their number is always the same or lower.
    - For each node there are as many node-to-leaf paths as leaves, and there are at most n leaves in the tree (since each suffix can add at most a single intermediate node and a single leaf, having terminator 1 or terminator2 as incoming edge).
    - Checking whether a path contains terminator1 takes constant space and a time proportional to the number of nodes in the path, which is O(n).
    - The following optimization is implemented: if a path P1 starting from a node N1 identifies a prefix p1 which is a potential SNSS, all paths starting from nodes Pi which are descendants of N1 would identify prefixes pi which would be longer than p1, so they can be excluded.
    - Rebuilding the string from each identified path takes O(n) time and space. Sorting would take time O(m * log(m) * n) where m is the number of potential SNSS (which is in average much smaller than n).
    - So in conclusion, Time Complexity is O(n^2) and Space Complexity is O(n).

    Constructors

    | Improve this Doc View Source

    SuffixTreeBasedSnssFinder(Char, Char)

    Builds an instance with the provided terminators, validating the input by default. All text passed to Find(IEnumerable<Char>, IEnumerable<Char>) must conform with these two terminators.

    Declaration
    public SuffixTreeBasedSnssFinder(char terminator1, char terminator2)
    Parameters
    Type Name Description
    System.Char terminator1

    System.Char terminator2

    Methods

    | Improve this Doc View Source

    Find(IEnumerable<Char>, IEnumerable<Char>)


    This implementation builds and uses a SuffixTreeNode structure to perform the search.

    Declaration
    public override IEnumerable<string> Find(IEnumerable<char> text1, IEnumerable<char> text2)
    Parameters
    Type Name Description
    IEnumerable<System.Char> text1
    IEnumerable<System.Char> text2
    Returns
    Type Description
    IEnumerable<System.String>
    Overrides
    SuffixStructureBasedSnssFinder.Find(IEnumerable<Char>, IEnumerable<Char>)
    Remarks

    Implements

    ISnssFinder

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX