Shuffle
Tags: [math], [statistics]
Link: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
Given an array, please write an algorithm to shuffle it (randomize it).
Solution: In-place
import random
def shuffle(arr):
if not arr:
return arr
arr_len = len(arr)
for i in xrange(arr_len - 1):
random_index = random.randint(i, arr_len - 1)
arr[i], arr[random_index] = arr[random_index], arr[i]
return arr
Revelation:
- To prove the algorithm generates 100% random shuffle: For example, the original array we can call it 'arr' (such as [A, B, C, D, E, F]), and the shuffled array, we can called 'shuffle_arr'(such as [B, A, E, C, D, F]) (although this algorithm is modifying the array in-place, here we still use two names to mark the array we shuffled before and after). First we can think about just one element, what's the probability the 'A' was put in the position shuffle_arr[1]? The answer is P = (probability of 'A' was not picked at first time * probability of A was picked at the second time) = (1 - (1 / n)) * (1 / n - 1) = ((n - 1) / n) * (1 / n - 1) = 1 / n. We can use this way to calculate each element, finally we will find the probability of each element at each position are all 1 / n, which prove that this algorithm can generate 100% random shuffle.
Note:
- Time complexity = O(n), n is the number of elements of the given array.
- For the last element, we don't need to swap it with others, because there is no remaining elements to swap with it, and also, it has been given opportunity to be swapped.
Solution: not in-place
import random
def shuffle(arr):
if not arr:
return arr
arr_len = len(arr)
index_set = set()
shuffle = []
while len(shuffle) < arr_len:
random_index = None
while True:
r_i = random.randint(0, arr_len - 1)
if r_i not in index_set:
random_index = r_i
break
index_set.add(random_index)
shuffle.append(arr[random_index])
return shuffle
Revelation:
- Sometimes, we may be required not to modify the given array, so we may have two ways to achieve this go, the first one is using the in-place algorithm, but not modify the given array, we can copy the given array first, and modify the copied one. The second way is using not-in-place approach.
Note:
- It's a little hard to calculate the time complexity for above algorithm, because we don't know how many times it take to generate each random_index.
- To prove the above algorithm can generate 100% random shuffle, we can use the same way (as in-place solution) to analyze the correctness of the above algorithm.