Search Results for

    Show / Hide Table of Contents

    Class KasaiLcpArrayBuilder

    An ILcpArrayBuilder implementation which uses the Kasai's algorithm (2001) to compute the LCP Array in linear time.

    Inheritance
    System.Object
    NaiveLcpArrayBuilder
    KasaiLcpArrayBuilder
    Implements
    ILcpArrayBuilder
    Inherited Members
    NaiveLcpArrayBuilder.Text
    NaiveLcpArrayBuilder.SuffixArray
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: MoreStructures.SuffixArrays.LongestCommonPrefix
    Assembly: MoreStructures.dll
    Syntax
    public class KasaiLcpArrayBuilder : NaiveLcpArrayBuilder, ILcpArrayBuilder
    Remarks

    ADVANTAGES AND DISADVANTAGES
    - Compared to the naive implementation following the definition of LCP Array, implemented by NaiveLcpArrayBuilder, this algorithm has better runtime and equal Space Complexity.
    - The better runtime comes at the cost of the complexity of the algorithm itself, which is harder to analyze.
    - Moreover, unlike in the naive implementation, the construction of the array doesn't proceed in order, i.e. it doesn't proceed from the first element to the last element of the array.
    - That is, the Kasai's algorithm is not online and, in its base form, cannot be easily made lazy.

    ALGORITHM
    - The algorithm requires 3 data structures to operate: the text T, its Suffix Array SA and the Inverted Suffix Array ISA. The three data structures have n items (chars or integers).
    - T and SA are externally provided, whereas ISA is calculated scanning linearly SA and populating a dictionary of positions of SA: ISA[v] = i such that SA[i] = v. Because there is no better general algorithm than a single linear scan of the input, ISA is internally calculated, rather than externally provided.
    - The current LCP length, named here CLCP, is initialized to 0, and the index of the current suffix in T, named here CS, is initialized to the first item of SA: CLCP = 0; CS = SA[0]. This is the setup before the main loop.
    - Then n iterations are performed, filling in the resulting LPC array, named here LCPA, initialized to an empty array of length n - 1.
    - At each iteration the index in SA of the current suffix (starting in T at index CS), is calculated via ISA: k = ISA[CS].
    - Then, the next suffix in T, named here NS, is calculated as the successor of k in SA: NS = SA[k + 1].
    - Important: if ISA[CS] is the last index of SA (i.e. n - 1), then CLCP is reset and the current iteration skipped.
    - A new value of CLCP is calculated as the longest common prefix between T[CS..] and T[NS..], skipping the first CLCP - 1 chars, which are equal by construction.
    - Such new value of CLCP is then assigned to LCPA at index
    - Finally, the index of the current suffix in T, CS, is incremented modulo n and the current iteration terminates.
    - After the loop, LCPA is fully populated and can be returned as result.

    COMPLEXITY
    - Text and Suffix Array evaluation (required to have direct access by index) are O(n) operations, both in time and space.
    - Inversion of the Suffix Array is also a O(n) operation.
    - Initialization of the current LCP and the current prefix are constant-time operations.
    - The main loop of the algorithm runs n iterations.
    - The only operation which doesn't take constant time in the body of the loop is the LCP between the current suffix, initialized at the first item of the Suffix Array and then incremented by 1 modulo n at every iteration, and the next suffix, calculated via Suffix Array and its inverted version.
    - While the above is true, it should be noted that the current LCP is increased once per match and decreased once per iteration. Because the current LCP cannot be bigger than n (that's the biggest number of chars a prefix can have, hence the bigger number of chars in common between two prefixes), the total number of comparisons across all iterations is O(n).
    - Therefore, Time and Space Complexity are O(n).

    Constructors

    | Improve this Doc View Source

    KasaiLcpArrayBuilder(TextWithTerminator, SuffixArray)

    Declaration
    public KasaiLcpArrayBuilder(TextWithTerminator text, SuffixArray suffixArray)
    Parameters
    Type Name Description
    TextWithTerminator text

    MoreStructures.SuffixArrays.SuffixArray suffixArray

    Remarks

    Methods

    | Improve this Doc View Source

    Build()

    Declaration
    public override LcpArray Build()
    Returns
    Type Description
    MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray
    Overrides
    NaiveLcpArrayBuilder.Build()
    Remarks

    Implements

    ILcpArrayBuilder

    Extension Methods

    SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)
    • Improve this Doc
    • View Source
    In This Article
    Back to top Generated by DocFX