Class SuffixTrieBasedSnssFinder
A ISnssFinder implementation and SuffixStructureBasedSnssFinder concretion which uses a SuffixTrieNode structure to implement Find(IEnumerable<Char>, IEnumerable<Char>).
Implements
Inherited Members
Namespace: MoreStructures.SuffixStructures.Matching
Assembly: MoreStructures.dll
Syntax
public class SuffixTrieBasedSnssFinder : SuffixStructureBasedSnssFinder, ISnssFinder
Remarks
ADVANTAGES AND DISADVANTAGES
- Compared to the naive implementation of NaiveSnssFinder, has better aberage runtime, at the
cost of space used (which is O(1) for the naive implementation and at least quadratic here).
- Compared to the SuffixTree-based implementation, it has way worse Time and Space Complexity, but it's easier
to implement, visualize and debug step-by-step.
ALGORITHM
- First uses the NaivePartiallyRecursiveSuffixTrieBuilder to build a suffix trie of the
concatenation of first text, Terminator1, second text and
Terminator2, a.k.a. generalized suffix trie.
- Visit the suffix trie breadth-first, stopping at the first node such that the root-to-node prefix is
substring of text1 but not of text2.
- The root-to-node prefix is a substring of text1 when there is a path-to-leaf which contains an edge including
Terminator1.
- The root-to-node prefix is NOT a substring of text2 when there is no path-to-leaf which doesn't contain
Terminator1.
- Such substring of text1 is guaranteed to be the shortest by the visit order imposed by the breadth-first
search. This is true because trie edges are labelled by a single chars, so root-to-node paths which are
longer by the number of edges correspond to longer prefixes. This is not true in general for Suffix Trees,
where edge labels can vary in length, and a path which is composed of less edges may be longer in chars than
a path with more edges but shorter labels in average.
COMPLEXITY
- Validating the input requires going through text1 and text2, and takes linear time in the the number of chars
n of the concatenated text "text1 ## separator1 ## text2 ## separator2", and constant space.
- Building the Generalized Suffix Trie takes time and space at least proportional to the number of nodes of the
trie, which is quadratic with n (unlike for Suffix Trees).
- For each level of the breadth-first traversal of the trie, all node-to-leaf paths are checked (in the worst
case).
- There are at most n levels in the trie, since there can't be a path longer than a suffix of the concatenated
text, and all suffixes should be covered by the trie. The higher is the level, the shorter are node-to-leaf
paths. However, their number is always the same or lower.
- For each node there are as many node-to-leaf paths as leaves, and there are at most n leaves in the trie
(since each suffix can add potentially multiple intermediate nodes, but always a single leaf, having
terminator2 as incoming edge).
- Checking whether a path contains terminator1 takes constant space and a time proportional to the number of
nodes in the path, which is O(n).
- Rebuilding the string from the identified path takes O(n) time and space.
- So in conclusion, Time Complexity is O(n^3) and Space Complexity is O(n^2).
Constructors
| Improve this Doc View SourceSuffixTrieBasedSnssFinder(Char, Char)
Builds an instance with the provided terminators, validating the input by default. All text passed to Find(IEnumerable<Char>, IEnumerable<Char>) must conform with these two terminators.
Declaration
public SuffixTrieBasedSnssFinder(char terminator1, char terminator2)
Parameters
Type | Name | Description |
---|---|---|
System.Char | terminator1 | |
System.Char | terminator2 |
Methods
| Improve this Doc View SourceFind(IEnumerable<Char>, IEnumerable<Char>)
This implementation builds and uses a SuffixTrieNode structure to perform the search.
Declaration
public override IEnumerable<string> Find(IEnumerable<char> text1, IEnumerable<char> text2)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<System.Char> | text1 | |
IEnumerable<System.Char> | text2 |
Returns
Type | Description |
---|---|
IEnumerable<System.String> |