Class DfsOnEachVertexSinkBasedTopologicalSort

An ITopologicalSort implementation which assigns topological order to vertices by identifing a sink vertex at each iteration in a deterministic way, running DFS on each vertex and selecting the max sink.

Inheritance

System.Object

DfsOnEachVertexSinkBasedTopologicalSort

Implements

ITopologicalSort

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: MoreStructures.Graphs.Sorting

Assembly: MoreStructures.dll

Syntax

public class DfsOnEachVertexSinkBasedTopologicalSort : ITopologicalSort

Remarks

ADVANTAGES AND DISADVANTAGES
- This is a very naive implementation of ITopologicalSort.
- Using DFS on each vertex and building a "reachability dictionary" is conceptually simple, but very unefficient, leading to cubic runtime over the number of vertices (compared to the quadratic runtime of AnyPathToSinkBasedTopologicalSort or the linear runtime of an approach based on a single DFS pass of the graph).
- However, because all reachable vertices are retrieved, and the max is selected when looking for a sink, the final topological sort returned is easily predictable: when multiple topological sort orders are possible, the one returned is the one which has higher vertex ids at the end of the output array.
- The deterministic property is not satisfied by some implementations with better runtime, such as AnyPathToSinkBasedTopologicalSort, which starts graph exploration from "some" vertex and follows "some" path, to get to a sink.

ALGORITHM
- First, all reachable vertices, from each vertex of the graph, are identified, running a DepthFirstSearchFromVertex(IGraph, Int32) via the VisitStrategy provided in the constructor.
- Reachable vertices from each vertex are stored into a D, mapping the System.Int32 id of the vertex i to an of vertices reachable from i (including i itself).
- Then, an array of v integers, named here TS, where v is the number of vertices in the graph, is instantiated, to store the topological sort of the provided DAG.
- An of vertices already processed (and removed from D) is also instantiated to an empty set.
- Finally the main loop of the algorithm is repeated, until D remains empty.
- At each iteration, the sink vertex with the biggest index is found.
- The sink is found by looking for vertices from which no other vertices are reachable, other than the vertex itself and possibly the already processed ones (which are "virtually" removed from the graph).
- If such a sink is not found, it means that the remaining graph contains a cycle, because there is at least a vertex in the graph (otherwise the main loop condition would have been false) and for each vertex i there is a vertex j such that a directed path from i to j, i ~~> j, exists. So an is thrown as topological sort only makes sense for DAGs.
- If such a sink is found instead, it is returned as (v - 1 - i)-th item of TS, where i is the 0-based running index of the main loop of the algorithm, incremented by 1 at each iteration.
- Notice that not any sink is taken: instead of stopping at the first sink found, the sink with the highest id is returned. This makes the algorithm fully deterministic and ensures full predictability of the result.
- Finally the array TS is returned.

COMPLEXITY
- Getting the number of vertices via GetNumberOfVertices() is a constant-time operation in any classical implementation of IGraph.
- Running DFS for each vertex of the graph via DepthFirstSearchFromVertex(IGraph, Int32) is a generally expensive operation, which has a Time and Space Complexity which depends on the specific IVisitStrategy used.
- Instantiating the TS array also requires resetting all its v items.
- The main loop of the algorithm, executed v times, requires finding the last sink vertex, which is an operation quadratic in complexity. All other operations require constant time and space.
- Therefore Time Complexity is O(v * Tdfs + v^3) and Space Complexity is O(v^2 + Sdfs), where Tdfs and Sdfs are Time and Space Complexity of running a DFS from a vertex.
- Using FullyIterativeHashSetBasedGraphVisit as VisitStrategy and assuming GetAdjacentVerticesAndEdges(Int32, Boolean) can be computed in constant time, Tdfs and Sdfs becomes linear in v and e.
- In this case, Time Complexity becomes O(v * (v + e) + v^3) which can be simplified as O(v^3), since e can be at most O(v^2) (in graphs where every vertex is connected to every other vertex).

Constructors

| Improve this Doc View Source

DfsOnEachVertexSinkBasedTopologicalSort(IVisitStrategy)

Declaration

public DfsOnEachVertexSinkBasedTopologicalSort(IVisitStrategy visitStrategy)

Parameters

Type	Name	Description
IVisitStrategy	visitStrategy

Remarks

Properties

| Improve this Doc View Source

VisitStrategy

The visitor to be used to identify sink vertices, by running Depth First Searches on the graph.

Declaration

public IVisitStrategy VisitStrategy { get; }

Property Value

Type	Description
IVisitStrategy

Methods

| Improve this Doc View Source

Sort(IGraph)

Declaration

public IList<int> Sort(IGraph dag)

Parameters

Type	Name	Description
IGraph	dag

Returns

Type	Description
IList<System.Int32>

Remarks

Implements

ITopologicalSort

Extension Methods

SuffixStructureNodeExtensions.GetAllSuffixesFor<TEdge, TNode>(TNode, TextWithTerminator)