Topological Sort
Tag: [DFS]
Wiki Link: https://en.wikipedia.org/wiki/Topological_sorting
Given a graph describing the relationship between each task.
Given multiple tasks, and the constraints are:
- A depends on B
- B depends on C, D
- C depends on D
- D depends on nothing
- E depends on C
Write an algorithm to generate the solution to print the order of the tasks which can let us finish all tasks.
Solution: DFS (Depth First Search) with permanent_visited and temporary_visited
def topological_sort(graph):
if not graph:
return []
result = []
permanent_visited = set()
temporary_visited = set()
for vertex in graph:
if vertex in permanent_visited:
continue
dfs(graph, vertex, result, permanent_visited, temporary_visited)
return result
def dfs(graph, curr, result, permanent_visited, temporary_visited):
temporary_visited.add(curr)
for child in graph[curr]:
if child in permanent_visited:
continue
if child in temporary_visited:
raise ValueError('The graph is not DAG(Directed Acyclic Graph)')
dfs(graph, child, result, permanent_visited, temporary_visited)
temporary_visited.remove(curr)
permanent_visited.add(curr)
result.append(curr)
if __name__ == '__main__':
print 'Program is working'
graph = {
'A': ['B', 'C'],
'B': ['C'],
'C': ['D', 'E'],
'D': ['E'],
'E': [],
'F': ['E'],
'G': ['H'],
'H': ['I'],
'I': ['J'],
'J': []
}
result = topological_sort(graph)
print result
print 'Program is end'
Note:
- Time complexity = O(V + E), V is the number of vertices, and E is the number of edges.
- We know time complexity of DFS = O(V + E), here the time complexity of the function "dfs" = O(V + E). And there is a outer loop to iterate all vertices (which has not been permanent visited). But Because inside of "dfs" will mark the visited vertex in permanent_visited, so actually all vertices have just been visited once. So the time complexity of Topological sort should be equal to the time complexity of DFS, which is O(V + E).
Solution [OLD Version]: DFS (Depth First Search) with temporary_visited and permanent_visited.
# topological_sorting.py
def topological_sort(graph):
if not graph:
return []
result = []
temporary_visited = set()
permanent_visited = set()
for v in graph:
if v in permanent_visited:
continue
try:
dfs(graph, v, result, temporary_visited, permanent_visited)
except ValueError, e:
print 'error:{}'.format(e.message)
return None
return result
def dfs(graph, v, result, temporary_visited, permanent_visited):
if v in temporary_visited:
raise ValueError('The graph is not DAG (Directed Acyclic Graph), there exists cycle in it.')
temporary_visited.add(v)
for connecting_v in graph[v]:
if connecting_v not in permanent_visited:
dfs(graph, connecting_v, result, temporary_visited, permanent_visited)
permanent_visited.add(v)
temporary_visited.remove(v)
result.append(v)
if __name__ == '__main__':
print 'Program is working'
graph = {
'A': ['B'],
'B': ['C', 'D'],
'C': ['D'],
'D': [],
'E': ['C']
}
sorted_sequence = topological_sort(graph)
print sorted_sequence
print 'Program is end'
Revelation:
Here each time, we just append the v into the end of the 'result' list, the reason is because of the way to describe the relationship between the tasks in the graph (map), such as we're saying A depends on B, and B depends on C, D, and C depends on D.
But if the above graph means: Only after finishing A then we can finish B, and after finishing B, then we can finish C and D. If like that, at the end of each level of the recursion, we should insert the v in the front of the 'result' list by result.insert(0, v).
Note:
- Time complexity is T = O(V + E), V is the number of vertices, and the E is the number of edges. If E = 1/2 * V * (V - 1), it seems that T = O(V^2), but why people says the topological sort is linear time complexity algorithm??? (Personally I think, the time complexity of topological sort should be T = O(V * (V + E)))
- If the graph's vertices are fully connected, the E = 1/2 * V * (V - 1).
E(v) stands for the number of edges in a fully connected graph, who has number of v vertices.
E(1) = 0
E(2) = 1
E(3) = 3
E(4) = 6
E(5) = 10
E(6) = 15
... ...
--------------------------------------------------------------------------------------------
E(2) - E(1) = 1
E(3) - E(2) = 2
E(4) - E(3) = 3
E(5) - E(4) = 4
... ...
E(v) - E(v - 1) = v - 1
If we add all above equations together, we can get:
-E(1) + E(v) = 1 + 2 + 3 + 4 + ... + v - 1
E(v) - E(1) = (1 + (v - 1)) * ((v - 1) - 1 + 1) / 2 = 1/2 * v * (v - 1)
Since E(1) = 0, so
E(v) = 1/2 * v * (v - 1)
- The topological sort algorithm can only work on DAG (Directed Acyclic Graph). And also, we can use this algorithm to detect whether there exists cycle in directed graph.
- DFS's time complexity is T = O(V + E).