Shuffling a collection

The question

In many games you must add some randomness and that might include shuffling a collection of data.

When playing cards with friends, in real life (whaaa?!) you always hear “shuffle them again, they aren’t well shuffled enough!”, so with a proper Penn & Teller face you keep shuffling the cards, many many times, until enough randomness is introduced.

Translating this in code-land, is enough to do

shuffle(cards)

just one time, or must we do something like:


shuffle(cards)
shuffle(cards)
shuffle(cards)

If so, how many times?

What I asked myself is this: if I shuffle the collection twice, will it be more shuffled or, on the contrary, less shuffled than if I shuffle it only once?

Consider the following collection, unshuffled:

[A, B, C, D, E]

Now we shuffle it and get, for example:

[C, D, B, A, E]

My concern is that if I shuffle it again, the items might pop back in in their original place, hence making the collection less unshuffled. Can this happen? Or will it be just coincidence?

But first, how do we define a distance between collections?

Two ways, that I can think of, are:

1) Positions

Sum the difference of the positions of the elements between them.

With this method we iterate item by item for the first collection and found on which position on the other collection the item is, and calculate the difference between positions. Easier to see with pseudo-code:


int distance(List list1, List list2) {
  int ret = 0;

  for (element : list1) {
    ret += abs(list1.indexOf(element) - list2.indexOf(element));
  }

  return ret;
}

2) Permutations

Sum how many permutations must be done to the second collection so that it’s on the same order as the first collection. Again, some pseudo-code:


int permutations(List list1, List list2) {
  int ret = 0;

  for (T element : list1) {
    int posList1 = list1.indexOf(element);
    int posList2 = list2.indexOf(element);

    if (posList1 != posList2) {
      temp = list2.get(posList1);

      list2.set(posList1, element);
      list2.set(posList2, temp);

      ret++;
    }
  }

  return ret;
}

With these two ├╝ber powerful tools, we are ready to find da answer to the question.

The idea is to try and see the distances between the original collection, and a shuffled collection, and compare the results with 2 shuffles, 3 shuffles, et cetera.

Results

Size 5

Size 5

Size 10

Size 10

Size 25

Size 25

The raw data is uploaded here as a google doc

Hence, it’s straightforward: shuffle them as many times you feel like, they’ll have the same entropy

tl;dr: it doesn’t matter how many times you shuffle a collection, as long as the shuffle function is ok, it’s all the same