Class SuffixTrieBasedSnssFinder

A ISnssFinder implementation and SuffixStructureBasedSnssFinder concretion which uses a SuffixTrieNode structure to implement Find(IEnumerable<Char>, IEnumerable<Char>).

Inheritance

System.Object

SuffixStructureBasedSnssFinder

SuffixTrieBasedSnssFinder

Implements

ISnssFinder

Inherited Members

SuffixStructureBasedSnssFinder.Terminator1

SuffixStructureBasedSnssFinder.Terminator2

SuffixStructureBasedSnssFinder.ValidateInput

SuffixStructureBasedSnssFinder.ValidateTexts(IEnumerable<Char>, IEnumerable<Char>)

SuffixStructureBasedSnssFinder.CollectPrefixChars<TEdge, TNode>(TextWithTerminator, TNode, IDictionary<TNode, TreeTraversalVisit<TEdge, TNode>>)

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: MoreStructures.SuffixStructures.Matching

Assembly: MoreStructures.dll

Syntax

public class SuffixTrieBasedSnssFinder : SuffixStructureBasedSnssFinder, ISnssFinder

Remarks

ADVANTAGES AND DISADVANTAGES
- Compared to the naive implementation of NaiveSnssFinder, has better aberage runtime, at the cost of space used (which is O(1) for the naive implementation and at least quadratic here).
- Compared to the SuffixTree-based implementation, it has way worse Time and Space Complexity, but it's easier to implement, visualize and debug step-by-step.

ALGORITHM
- First uses the NaivePartiallyRecursiveSuffixTrieBuilder to build a suffix trie of the concatenation of first text, Terminator1, second text and Terminator2, a.k.a. generalized suffix trie.
- Visit the suffix trie breadth-first, stopping at the first node such that the root-to-node prefix is substring of text1 but not of text2.
- The root-to-node prefix is a substring of text1 when there is a path-to-leaf which contains an edge including Terminator1.
- The root-to-node prefix is NOT a substring of text2 when there is no path-to-leaf which doesn't contain Terminator1.
- Such substring of text1 is guaranteed to be the shortest by the visit order imposed by the breadth-first search. This is true because trie edges are labelled by a single chars, so root-to-node paths which are longer by the number of edges correspond to longer prefixes. This is not true in general for Suffix Trees, where edge labels can vary in length, and a path which is composed of less edges may be longer in chars than a path with more edges but shorter labels in average.

COMPLEXITY
- Validating the input requires going through text1 and text2, and takes linear time in the the number of chars n of the concatenated text "text1 ## separator1 ## text2 ## separator2", and constant space.
- Building the Generalized Suffix Trie takes time and space at least proportional to the number of nodes of the trie, which is quadratic with n (unlike for Suffix Trees).
- For each level of the breadth-first traversal of the trie, all node-to-leaf paths are checked (in the worst case).
- There are at most n levels in the trie, since there can't be a path longer than a suffix of the concatenated text, and all suffixes should be covered by the trie. The higher is the level, the shorter are node-to-leaf paths. However, their number is always the same or lower.
- For each node there are as many node-to-leaf paths as leaves, and there are at most n leaves in the trie (since each suffix can add potentially multiple intermediate nodes, but always a single leaf, having terminator2 as incoming edge).
- Checking whether a path contains terminator1 takes constant space and a time proportional to the number of nodes in the path, which is O(n).
- Rebuilding the string from the identified path takes O(n) time and space.
- So in conclusion, Time Complexity is O(n^3) and Space Complexity is O(n^2).

Constructors

| Improve this Doc View Source

SuffixTrieBasedSnssFinder(Char, Char)

Builds an instance with the provided terminators, validating the input by default. All text passed to Find(IEnumerable<Char>, IEnumerable<Char>) must conform with these two terminators.

Declaration

public SuffixTrieBasedSnssFinder(char terminator1, char terminator2)

Parameters

Type	Name	Description
System.Char	terminator1
System.Char	terminator2

Methods

| Improve this Doc View Source

Find(IEnumerable<Char>, IEnumerable<Char>)

This implementation builds and uses a SuffixTrieNode structure to perform the search.

Declaration

public override IEnumerable<string> Find(IEnumerable<char> text1, IEnumerable<char> text2)

Parameters

Type	Name	Description
IEnumerable<System.Char>	text1
IEnumerable<System.Char>	text2

Returns

Type	Description
IEnumerable<System.String>

Overrides

SuffixStructureBasedSnssFinder.Find(IEnumerable<Char>, IEnumerable<Char>)

Remarks

Implements

ISnssFinder

Extension Methods

SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)