Topological Sort

Tag: [DFS]

Wiki Link: https://en.wikipedia.org/wiki/Topological_sorting

Given a graph describing the relationship between each task.

Given multiple tasks, and the constraints are:

A depends on B
B depends on C, D
C depends on D
D depends on nothing
E depends on C

Write an algorithm to generate the solution to print the order of the tasks which can let us finish all tasks.

Solution: DFS (Depth First Search) with permanent_visited and temporary_visited

def topological_sort(graph):
    if not graph:
        return []

    result = []
    permanent_visited = set()
    temporary_visited = set()

    for vertex in graph:
        if vertex in permanent_visited:
            continue

        dfs(graph, vertex, result, permanent_visited, temporary_visited)

    return result

def dfs(graph, curr, result, permanent_visited, temporary_visited):
    temporary_visited.add(curr)

    for child in graph[curr]:
        if child in permanent_visited:
            continue
        if child in temporary_visited:
            raise ValueError('The graph is not DAG(Directed Acyclic Graph)')

        dfs(graph, child, result, permanent_visited, temporary_visited)

    temporary_visited.remove(curr)
    permanent_visited.add(curr)
    result.append(curr)

if __name__ == '__main__':
    print 'Program is working'

    graph = {
        'A': ['B', 'C'],
        'B': ['C'],
        'C': ['D', 'E'],
        'D': ['E'],
        'E': [],
        'F': ['E'],
        'G': ['H'],
        'H': ['I'],
        'I': ['J'],
        'J': []
    }

    result = topological_sort(graph)
    print result

    print 'Program is end'

Note:

Time complexity = O(V + E), V is the number of vertices, and E is the number of edges.
We know time complexity of DFS = O(V + E), here the time complexity of the function "dfs" = O(V + E). And there is a outer loop to iterate all vertices (which has not been permanent visited). But Because inside of "dfs" will mark the visited vertex in permanent_visited, so actually all vertices have just been visited once. So the time complexity of Topological sort should be equal to the time complexity of DFS, which is O(V + E).

Solution [OLD Version]: DFS (Depth First Search) with temporary_visited and permanent_visited.

# topological_sorting.py

def topological_sort(graph):
    if not graph:
        return []

    result = []
    temporary_visited = set()
    permanent_visited = set()

    for v in graph:
        if v in permanent_visited:
            continue
        try:
            dfs(graph, v, result, temporary_visited, permanent_visited)
        except ValueError, e:
            print 'error:{}'.format(e.message)
            return None

    return result

def dfs(graph, v, result, temporary_visited, permanent_visited):
    if v in temporary_visited:
        raise ValueError('The graph is not DAG (Directed Acyclic Graph), there exists cycle in it.')

    temporary_visited.add(v)
    for connecting_v in graph[v]:
        if connecting_v not in permanent_visited:
            dfs(graph, connecting_v, result, temporary_visited, permanent_visited)

    permanent_visited.add(v)
    temporary_visited.remove(v)
    result.append(v)

if __name__ == '__main__':
    print 'Program is working'

    graph = {
        'A': ['B'],
        'B': ['C', 'D'],
        'C': ['D'],
        'D': [],
        'E': ['C']
    }

    sorted_sequence = topological_sort(graph)
    print sorted_sequence

    print 'Program is end'

Revelation:

Here each time, we just append the v into the end of the 'result' list, the reason is because of the way to describe the relationship between the tasks in the graph (map), such as we're saying A depends on B, and B depends on C, D, and C depends on D.
But if the above graph means: Only after finishing A then we can finish B, and after finishing B, then we can finish C and D. If like that, at the end of each level of the recursion, we should insert the v in the front of the 'result' list by result.insert(0, v).

Note:

Time complexity is T = O(V + E), V is the number of vertices, and the E is the number of edges. If E = 1/2 * V * (V - 1), it seems that T = O(V^2), but why people says the topological sort is linear time complexity algorithm??? (Personally I think, the time complexity of topological sort should be T = O(V * (V + E)))
If the graph's vertices are fully connected, the E = 1/2 * V * (V - 1).

E(v) stands for the number of edges in a fully connected graph, who has number of v vertices.
E(1) = 0
E(2) = 1
E(3) = 3
E(4) = 6
E(5) = 10
E(6) = 15
... ...

--------------------------------------------------------------------------------------------

E(2) - E(1) = 1
E(3) - E(2) = 2
E(4) - E(3) = 3
E(5) - E(4) = 4
... ...
E(v) - E(v - 1) = v - 1

If we add all above equations together, we can get:
-E(1) + E(v) = 1 + 2 + 3 + 4 + ... + v - 1
E(v) - E(1) = (1 + (v - 1)) * ((v - 1) - 1 + 1) / 2 = 1/2 * v * (v - 1)

Since E(1) = 0, so
E(v) = 1/2 * v * (v - 1)

The topological sort algorithm can only work on DAG (Directed Acyclic Graph). And also, we can use this algorithm to detect whether there exists cycle in directed graph.
DFS's time complexity is T = O(V + E).

Topological Sort

Topological Sort

results matching ""

No results matching ""