Interface IBuilder<TEdge, TNode>
Builds objects, such as edges and nodes, for the ISuffixStructureNode<TEdge, TNode> concretion it is the builder of.
Namespace: MoreStructures.SuffixStructures.Builders
Assembly: MoreStructures.dll
Syntax
public interface IBuilder<TEdge, TNode>
where TEdge : ISuffixStructureEdge<TEdge, TNode> where TNode : ISuffixStructureNode<TEdge, TNode>
Type Parameters
Name | Description |
---|---|
TEdge | |
TNode |
Remarks
This interface allows to have a shared construction interface for objects among all structures. Object construction is externalized into a Builder object as a workaround to the limitation of not having constructor signatures directly in interfaces. See https://codeblog.jonskeet.uk/2008/08/29/lessons-learned-from-protocol-buffers-part-4-static-interfaces/
Methods
| Improve this Doc View SourceBuildTree(TextWithTerminator[])
Build a ISuffixStructureEdge<TEdge, TNode> of the provided text(s), which is a n-ary search tree in which edges coming out of a node are substrings of at least one of the texts and nodes are not directly labeled instead.
Declaration
TNode BuildTree(params TextWithTerminator[] texts)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator[] | texts | The text(s) to build the Suffix Structure of, each one with its own unique terminator (required for traversal). |
Returns
Type | Description |
---|---|
TNode | The root node of the Suffix Structure. |
Remarks
TERMINATORS
Terminators have to be unique characters, not just on the single TextWithTerminator they are
appended to, but to all TextWithTerminator items in texts
.
That means that if texts
is (T1, ..., Tn), where each TextWithTerminator
Ti has ti as Terminator, ti should not be in any of the
Text of Tj, for any j.
GENERALIZED SUFFIX STRUCTURES
When multiple TextWithTerminator instances are passed into texts
, the
Suffix Structure built is known as Generalized Suffix Structure (e.g. Generalized Suffix Tree or Trie).
It differs from a normal Suffix Structure built for the concatenation of items in texts
by the fact that it doesn't contain any root-to-leaf path identifying a suffix which spans multiple texts.
When a Suffix Structure is built for a single TextWithTerminator which a concatenation of
multiple texts, each with its own terminator T1 || t1 || ... || Tn || tn
, the resulting structure
has branches which span over multiple texts, such as suffixOf[Ti] || ti || prefixOf[T(i+1)]
or
suffixOf[Ti] || ti || T(i+1) || t(i+1) || prefixOf[T(i+2)]
.
A Generalized Suffix Structure trims those branches in construction, so that each node-to-leaf branch ends
with any of the terminator t1, ..., tn (not always tn) and doesn't contain any other terminator except the
ending one.
EDGES
Substrings of a text are identified by their start position in text and their length, rather than by a copy
of the substring itself.
The technique, known as Edge Compression allows to store edges information in constant space, and
the entire tree in linear space w.r.t. the number of nodes in the tree (which can be linear or not in the
input, depending on the type of suffix structure).
The sequence of edges in root-to-node paths in the tree identify prefixes in common to multiple suffixes of
the text(s).