Search Results for

    Show / Hide Table of Contents

    Class SuffixTrieBasedSnssFinder

    A ISnssFinder implementation and SuffixStructureBasedSnssFinder concretion which uses a SuffixTrieNode structure to implement Find(IEnumerable<Char>, IEnumerable<Char>).

    Inheritance
    System.Object
    SuffixStructureBasedSnssFinder
    SuffixTrieBasedSnssFinder
    Implements
    ISnssFinder
    Inherited Members
    SuffixStructureBasedSnssFinder.Terminator1
    SuffixStructureBasedSnssFinder.Terminator2
    SuffixStructureBasedSnssFinder.ValidateInput
    SuffixStructureBasedSnssFinder.ValidateTexts(IEnumerable<Char>, IEnumerable<Char>)
    SuffixStructureBasedSnssFinder.CollectPrefixChars<TEdge, TNode>(TextWithTerminator, TNode, IDictionary<TNode, TreeTraversalVisit<TEdge, TNode>>)
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixStructures.Matching
    Assembly: MoreStructures.dll
    Syntax
    public class SuffixTrieBasedSnssFinder : SuffixStructureBasedSnssFinder, ISnssFinder
    Remarks

    ADVANTAGES AND DISADVANTAGES
    - Compared to the naive implementation of NaiveSnssFinder, has better aberage runtime, at the cost of space used (which is O(1) for the naive implementation and at least quadratic here).
    - Compared to the SuffixTree-based implementation, it has way worse Time and Space Complexity, but it's easier to implement, visualize and debug step-by-step.

    ALGORITHM
    - First uses the NaivePartiallyRecursiveSuffixTrieBuilder to build a suffix trie of the concatenation of first text, Terminator1, second text and Terminator2, a.k.a. generalized suffix trie.
    - Visit the suffix trie breadth-first, stopping at the first node such that the root-to-node prefix is substring of text1 but not of text2.
    - The root-to-node prefix is a substring of text1 when there is a path-to-leaf which contains an edge including Terminator1.
    - The root-to-node prefix is NOT a substring of text2 when there is no path-to-leaf which doesn't contain Terminator1.
    - Such substring of text1 is guaranteed to be the shortest by the visit order imposed by the breadth-first search. This is true because trie edges are labelled by a single chars, so root-to-node paths which are longer by the number of edges correspond to longer prefixes. This is not true in general for Suffix Trees, where edge labels can vary in length, and a path which is composed of less edges may be longer in chars than a path with more edges but shorter labels in average.

    COMPLEXITY
    - Validating the input requires going through text1 and text2, and takes linear time in the the number of chars n of the concatenated text "text1 ## separator1 ## text2 ## separator2", and constant space.
    - Building the Generalized Suffix Trie takes time and space at least proportional to the number of nodes of the trie, which is quadratic with n (unlike for Suffix Trees).
    - For each level of the breadth-first traversal of the trie, all node-to-leaf paths are checked (in the worst case).
    - There are at most n levels in the trie, since there can't be a path longer than a suffix of the concatenated text, and all suffixes should be covered by the trie. The higher is the level, the shorter are node-to-leaf paths. However, their number is always the same or lower.
    - For each node there are as many node-to-leaf paths as leaves, and there are at most n leaves in the trie (since each suffix can add potentially multiple intermediate nodes, but always a single leaf, having terminator2 as incoming edge).
    - Checking whether a path contains terminator1 takes constant space and a time proportional to the number of nodes in the path, which is O(n).
    - Rebuilding the string from the identified path takes O(n) time and space.
    - So in conclusion, Time Complexity is O(n^3) and Space Complexity is O(n^2).

    Constructors

    | Improve this Doc View Source

    SuffixTrieBasedSnssFinder(Char, Char)

    Builds an instance with the provided terminators, validating the input by default. All text passed to Find(IEnumerable<Char>, IEnumerable<Char>) must conform with these two terminators.

    Declaration
    public SuffixTrieBasedSnssFinder(char terminator1, char terminator2)
    Parameters
    Type Name Description
    System.Char terminator1

    System.Char terminator2

    Methods

    | Improve this Doc View Source

    Find(IEnumerable<Char>, IEnumerable<Char>)


    This implementation builds and uses a SuffixTrieNode structure to perform the search.

    Declaration
    public override IEnumerable<string> Find(IEnumerable<char> text1, IEnumerable<char> text2)
    Parameters
    Type Name Description
    IEnumerable<System.Char> text1
    IEnumerable<System.Char> text2
    Returns
    Type Description
    IEnumerable<System.String>
    Overrides
    SuffixStructureBasedSnssFinder.Find(IEnumerable<Char>, IEnumerable<Char>)
    Remarks

    Implements

    ISnssFinder

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX