Class KasaiLcpArrayBuilder
An ILcpArrayBuilder implementation which uses the Kasai's algorithm (2001) to compute the LCP Array in linear time.
Implements
Inherited Members
Namespace: MoreStructures.SuffixArrays.LongestCommonPrefix
Assembly: MoreStructures.dll
Syntax
public class KasaiLcpArrayBuilder : NaiveLcpArrayBuilder, ILcpArrayBuilder
Remarks
ADVANTAGES AND DISADVANTAGES
- Compared to the naive implementation following the definition of LCP Array, implemented by
NaiveLcpArrayBuilder, this algorithm has better runtime and equal Space Complexity.
- The better runtime comes at the cost of the complexity of the algorithm itself, which is harder to analyze.
- Moreover, unlike in the naive implementation, the construction of the array doesn't proceed in order, i.e.
it doesn't proceed from the first element to the last element of the array.
- That is, the Kasai's algorithm is not online and, in its base form, cannot be easily made lazy.
ALGORITHM
- The algorithm requires 3 data structures to operate: the text T, its Suffix Array SA and the Inverted Suffix
Array ISA. The three data structures have n items (chars or integers).
- T and SA are externally provided, whereas ISA is calculated scanning linearly SA and populating a dictionary
of positions of SA: ISA[v] = i such that SA[i] = v
. Because there is no better general algorithm than
a single linear scan of the input, ISA is internally calculated, rather than externally provided.
- The current LCP length, named here CLCP, is initialized to 0, and the index of the current suffix in T, named
here CS, is initialized to the first item of SA: CLCP = 0; CS = SA[0]
. This is the setup before the
main loop.
- Then n iterations are performed, filling in the resulting LPC array, named here LCPA, initialized to an empty
array of length n - 1.
- At each iteration the index in SA of the current suffix (starting in T at index CS), is calculated via ISA:
k = ISA[CS]
.
- Then, the next suffix in T, named here NS, is calculated as the successor of k in SA: NS = SA[k + 1]
.
- Important: if ISA[CS]
is the last index of SA (i.e. n - 1), then CLCP is reset and the current
iteration skipped.
- A new value of CLCP is calculated as the longest common prefix between T[CS..] and T[NS..], skipping the
first CLCP - 1 chars, which are equal by construction.
- Such new value of CLCP is then assigned to LCPA at index
- Finally, the index of the current suffix in T, CS, is incremented modulo n and the current iteration
terminates.
- After the loop, LCPA is fully populated and can be returned as result.
COMPLEXITY
- Text and Suffix Array evaluation (required to have direct access by index) are O(n) operations, both in time
and space.
- Inversion of the Suffix Array is also a O(n) operation.
- Initialization of the current LCP and the current prefix are constant-time operations.
- The main loop of the algorithm runs n iterations.
- The only operation which doesn't take constant time in the body of the loop is the LCP between the current
suffix, initialized at the first item of the Suffix Array and then incremented by 1 modulo n at every
iteration, and the next suffix, calculated via Suffix Array and its inverted version.
- While the above is true, it should be noted that the current LCP is increased once per match and decreased
once per iteration. Because the current LCP cannot be bigger than n (that's the biggest number of chars a
prefix can have, hence the bigger number of chars in common between two prefixes), the total number of
comparisons across all iterations is O(n).
- Therefore, Time and Space Complexity are O(n).
Constructors
| Improve this Doc View SourceKasaiLcpArrayBuilder(TextWithTerminator, SuffixArray)
Declaration
public KasaiLcpArrayBuilder(TextWithTerminator text, SuffixArray suffixArray)
Parameters
Type | Name | Description |
---|---|---|
TextWithTerminator | text | |
MoreStructures.SuffixArrays.SuffixArray | suffixArray |
Remarks
Methods
| Improve this Doc View SourceBuild()
Declaration
public override LcpArray Build()
Returns
Type | Description |
---|---|
MoreStructures.SuffixArrays.LongestCommonPrefix.LcpArray |