Class CountingSortCharsSorter
An implementation of ICharsSorter which uses Counting Sort to sort the input.
Inheritance
Implements
Inherited Members
Namespace: MoreStructures.Strings.Sorting
Assembly: MoreStructures.dll
Syntax
public class CountingSortCharsSorter : ICharsSorter
Remarks
ADVANTAGES
- The algorithm is a Counting Sort adaptation and specialization to strings, seen as lists of chars.
- It leverages a runtime linear in the size of the input and its alphabet, which is better than any worst-case
scenario of a comparison-based algorithm, such as QuickSort or MergeSort.
- However, because its runtime depends not only on the size of the input, but also on the size of the alphabet,
it is suitable for scenarios of small alphabets only, or at least when there is an upper bound on the number
of distinct chars in the input (and they are somehow known in advance), which is small enough to fit the
histogram of occurrences in memory.
- For example: genome sequences can be mapped to 4-chars strings; English lower-case sentences are limited to
26 chars plus space, punctuation, etc; digit sequences consist of an alphabet of 10 chars only etc.
- It is not a good fit for scenarios where the number of distinct values can be very high, such as an
unconstrained Unicode string.
ALGORITHM
- First, a histogram of the occurrences in the input of all the chars in the alphabet is created, going through
each char of the input and incrementing the counter of the char in the histogram.
- Then the histogram is made cumulative, by going through each item of the histogram array, starting from the
second item, and cumulating the previous sum into the value at the current position.
- Finally, all chars of the input are iterated once again, in reverse order, to build the order list, i.e. the
list of positions.
COMPLEXITY
- Building the histogram requires iterating over all the n chars of the input.
- The histogram has a size equal to the size of the alphabet, sigma, of the input.
- Making the histogram cumulative requires going through each item of it, and there are sigma items.
- Finally, the second pass of the string means n more iterations, each one doing constant-time work.
- The second pass builds the order list, which is a new array of n items (one per index of the input).
- Therefore, Time Complexity is O(n + sigma) and Space Complexity is O(n + sigma).
Constructors
| Improve this Doc View SourceCountingSortCharsSorter(IDictionary<Char, Int32>)
Declaration
public CountingSortCharsSorter(IDictionary<char, int> alphabet)
Parameters
Type | Name | Description |
---|---|---|
IDictionary<System.Char, System.Int32> | alphabet |
Remarks
Properties
| Improve this Doc View SourceAlphabet
The alphabet of the input, i.e. the list of all System.Char potentially appearing in the input string, mapped to an alphabet index.
Declaration
public IDictionary<char, int> Alphabet { get; }
Property Value
Type | Description |
---|---|
IDictionary<System.Char, System.Int32> |
Remarks
Required by the Counting Sort algorithm, which builds the histogram of occurrences in the input, of all chars of the alphabet of the input.
Methods
| Improve this Doc View SourceSort(String)
Declaration
public IList<int> Sort(string input)
Parameters
Type | Name | Description |
---|---|---|
System.String | input |
Returns
Type | Description |
---|---|
IList<System.Int32> |