Topological Sort

Tag: [DFS]

Wiki Link: https://en.wikipedia.org/wiki/Topological_sorting

Given a graph describing the relationship between each task.

Given multiple tasks, and the constraints are:

  • A depends on B
  • B depends on C, D
  • C depends on D
  • D depends on nothing
  • E depends on C

Write an algorithm to generate the solution to print the order of the tasks which can let us finish all tasks.


Solution: DFS (Depth First Search) with permanent_visited and temporary_visited

def topological_sort(graph):
    if not graph:
        return []

    result = []
    permanent_visited = set()
    temporary_visited = set()

    for vertex in graph:
        if vertex in permanent_visited:
            continue

        dfs(graph, vertex, result, permanent_visited, temporary_visited)

    return result

def dfs(graph, curr, result, permanent_visited, temporary_visited):
    temporary_visited.add(curr)

    for child in graph[curr]:
        if child in permanent_visited:
            continue
        if child in temporary_visited:
            raise ValueError('The graph is not DAG(Directed Acyclic Graph)')

        dfs(graph, child, result, permanent_visited, temporary_visited)

    temporary_visited.remove(curr)
    permanent_visited.add(curr)
    result.append(curr)

if __name__ == '__main__':
    print 'Program is working'

    graph = {
        'A': ['B', 'C'],
        'B': ['C'],
        'C': ['D', 'E'],
        'D': ['E'],
        'E': [],
        'F': ['E'],
        'G': ['H'],
        'H': ['I'],
        'I': ['J'],
        'J': []
    }

    result = topological_sort(graph)
    print result

    print 'Program is end'

Note:

  • Time complexity = O(V + E), V is the number of vertices, and E is the number of edges.
  • We know time complexity of DFS = O(V + E), here the time complexity of the function "dfs" = O(V + E). And there is a outer loop to iterate all vertices (which has not been permanent visited). But Because inside of "dfs" will mark the visited vertex in permanent_visited, so actually all vertices have just been visited once. So the time complexity of Topological sort should be equal to the time complexity of DFS, which is O(V + E).

Solution [OLD Version]: DFS (Depth First Search) with temporary_visited and permanent_visited.

# topological_sorting.py

def topological_sort(graph):
    if not graph:
        return []

    result = []
    temporary_visited = set()
    permanent_visited = set()

    for v in graph:
        if v in permanent_visited:
            continue
        try:
            dfs(graph, v, result, temporary_visited, permanent_visited)
        except ValueError, e:
            print 'error:{}'.format(e.message)
            return None

    return result

def dfs(graph, v, result, temporary_visited, permanent_visited):
    if v in temporary_visited:
        raise ValueError('The graph is not DAG (Directed Acyclic Graph), there exists cycle in it.')

    temporary_visited.add(v)
    for connecting_v in graph[v]:
        if connecting_v not in permanent_visited:
            dfs(graph, connecting_v, result, temporary_visited, permanent_visited)

    permanent_visited.add(v)
    temporary_visited.remove(v)
    result.append(v)

if __name__ == '__main__':
    print 'Program is working'

    graph = {
        'A': ['B'],
        'B': ['C', 'D'],
        'C': ['D'],
        'D': [],
        'E': ['C']
    }

    sorted_sequence = topological_sort(graph)
    print sorted_sequence

    print 'Program is end'

Revelation:

  • Here each time, we just append the v into the end of the 'result' list, the reason is because of the way to describe the relationship between the tasks in the graph (map), such as we're saying A depends on B, and B depends on C, D, and C depends on D.

  • But if the above graph means: Only after finishing A then we can finish B, and after finishing B, then we can finish C and D. If like that, at the end of each level of the recursion, we should insert the v in the front of the 'result' list by result.insert(0, v).

Note:

  • Time complexity is T = O(V + E), V is the number of vertices, and the E is the number of edges. If E = 1/2 * V * (V - 1), it seems that T = O(V^2), but why people says the topological sort is linear time complexity algorithm??? (Personally I think, the time complexity of topological sort should be T = O(V * (V + E)))
  • If the graph's vertices are fully connected, the E = 1/2 * V * (V - 1).
E(v) stands for the number of edges in a fully connected graph, who has number of v vertices.
E(1) = 0
E(2) = 1
E(3) = 3
E(4) = 6
E(5) = 10
E(6) = 15
... ...

--------------------------------------------------------------------------------------------

E(2) - E(1) = 1
E(3) - E(2) = 2
E(4) - E(3) = 3
E(5) - E(4) = 4
... ...
E(v) - E(v - 1) = v - 1

If we add all above equations together, we can get:
-E(1) + E(v) = 1 + 2 + 3 + 4 + ... + v - 1
E(v) - E(1) = (1 + (v - 1)) * ((v - 1) - 1 + 1) / 2 = 1/2 * v * (v - 1)

Since E(1) = 0, so
E(v) = 1/2 * v * (v - 1)
  • The topological sort algorithm can only work on DAG (Directed Acyclic Graph). And also, we can use this algorithm to detect whether there exists cycle in directed graph.
  • DFS's time complexity is T = O(V + E).

results matching ""

    No results matching ""