== Intuition ==
Imagine you have a pile of cards and you want to create a hand that is completely random. Here "completely random" means the hand must be drawn uniformly at random from the set of all possible orderings of the cards. That is, every permutation must have the same probability of being chosen.
The simplest way I can think of to do this is: pull a card at random out of the pile and add it to your hand as the first card. Then pull another card at random out of the pile and add it to your hand as the second card, keeping the cards in order. Then draw another card at random and add it as the third card, and so on until the pile is empty.
In Knuth's algorithm, the way you "pull" a card out of the pile is to swap the "current card" A[i] with a card A[j], where j is chosen uniformly at random from the range [i, n] and n is the index of the last card. This means that A[i] ends up being a card chosen uniformly at random from the subarray A[i..n], i.e. a card randomly pulled from that subarray. And since we then move on to A[i+1], for which we choose from the subarray A[i+1..n], the card we have just chosen is never put back into the pile.
This exactly follows the definition of a random permutation: the first card is chosen uniformly at random from the set of all the cards. The second card is chosen uniformly at random from the set of all the remaining cards - that is, the set of cards that does not include the previously chosen cards. And so on and so forth.
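The procedure described above is the classic Fisher-Yates / Knuth shuffle. A minimal Python sketch (the function name is my own):

```python
import random

def knuth_shuffle(cards):
    """Shuffle `cards` in place using the Fisher-Yates / Knuth algorithm."""
    n = len(cards)
    for i in range(n - 1):
        # "Pull" a card uniformly at random from the remaining pile A[i..n-1].
        # Positions below i are final, so the chosen card is never put back.
        j = random.randint(i, n - 1)
        cards[i], cards[j] = cards[j], cards[i]
    return cards
```

Note that j is drawn from [i, n-1], not [0, n-1]: allowing swaps with already-placed cards is a well-known bug that makes the shuffle non-uniform.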
== Informal proof (using the tree diagram) ==
We can draw a tree diagram to represent all the possible runs of the algorithm on a particular input. At the topmost level, the root node has n children (each child node corresponds to a different choice that the algorithm can make). At the second level, each node has n-1 children, and so on and so forth. It is therefore easy to see that there are n! possible paths through the tree, and since at each node every child is chosen with equal probability, every root-to-leaf path - and hence every leaf - has the same probability.
Now all we have to do is to show that the n! leaves correspond to n! distinct permutations; since there are exactly n! permutations in total, each is then represented by exactly one execution path. Note that at each node, the different children fix different cards in the current position, so the subtrees hanging off a node can never share a leaf permutation with each other. Recursing down the tree with this argument, we reach the bottom level - subtrees consisting of a single node - where we stop, having shown that all leaves are distinct.
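This argument can be checked exhaustively for a small deck. The sketch below (the helper `shuffle_with_choices`, which replays a fixed sequence of "random" choices, is my own) enumerates every root-to-leaf path for n = 3 and confirms that the 3! = 6 leaves are 6 distinct permutations, each reached by exactly one path:

```python
from itertools import product
from collections import Counter

def shuffle_with_choices(items, choices):
    """Run the Knuth shuffle, but with the swap indices fixed in advance."""
    a = list(items)
    for i, j in enumerate(choices):  # choices[i] must lie in [i, n-1]
        a[i], a[j] = a[j], a[i]
    return tuple(a)

n = 3
# Every possible path through the tree: j_0 in [0,2], j_1 in [1,2], j_2 = 2.
paths = product(range(0, n), range(1, n), range(2, n))
counts = Counter(shuffle_with_choices(range(n), p) for p in paths)

# 3! = 6 leaves, each a distinct permutation reached by exactly one path.
print(len(counts), set(counts.values()))  # → 6 {1}
```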
== Formal proof (by Peter J. Haas) ==
The formal proof by Peter J. Haas looks at the probability of picking an arbitrary permutation.
If we can show that the probability of picking any arbitrary permutation is 1/n! then we have proven that the shuffle is correct (by definition).
Consider an arbitrary permutation b = [b_1, b_2, ..., b_n].
We produce a permutation x by running the random shuffle algorithm.
What is the probability that x is identical to b, that is x_1 = b_1, x_2 = b_2, ..., x_n = b_n?
The probability of the Knuth shuffle producing a permutation with b_1 as its first element is clearly 1/n, since the first element is chosen uniformly at random from all n cards.
Now, the probability of the shuffle producing a permutation that starts with b_1, b_2 is: 1/n * 1/(n-1). Why? Because having chosen b_1 as the first element, there are now n-1 elements left to choose from. The probability of choosing b_2 from those n-1 elements is therefore 1/(n-1).
And the probability of choosing b_3 as the third element, given that the first two elements are b_1 and b_2? The conditional probabilities at each step are:
P(x_1 = b_1) = 1/n
P(x_2 = b_2 given x_1 = b_1) = 1/(n-1)
P(x_3 = b_3 given x_1 = b_1 and x_2 = b_2) = 1/(n-2)
and by the chain rule of conditional probability, the joint probability is their product:
P(x_1 = b_1, x_2 = b_2, x_3 = b_3) = 1/n * 1/(n-1) * 1/(n-2)
And so on and so forth, such that
P(x_1 = b_1, x_2 = b_2, ..., x_n = b_n) = 1/n * 1/(n-1) * 1/(n-2) * ... * 1/1
Hence:
P(x = b) = 1/n!
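As a quick sanity check of this telescoping product (a small sketch; n = 8 is an arbitrary choice), the conditional probabilities can be multiplied exactly with Python's `fractions` module and compared against 1/n!:

```python
from fractions import Fraction
from math import factorial

n = 8
p = Fraction(1)
for k in range(n):
    p *= Fraction(1, n - k)   # 1/n * 1/(n-1) * ... * 1/1

print(p == Fraction(1, factorial(n)))  # → True
```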
Thus the proof concludes: the probability of the shuffling algorithm picking any arbitrary permutation is 1/n!, therefore the algorithm is correct.
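The conclusion can also be checked empirically. The sketch below (seed and trial count are arbitrary choices of mine) shuffles a 3-card deck many times and checks that each of the 3! = 6 permutations occurs with frequency close to 1/6:

```python
import random
from collections import Counter

def knuth_shuffle(cards):
    # The shuffle under test, as described in the intuition section.
    n = len(cards)
    for i in range(n - 1):
        j = random.randint(i, n - 1)
        cards[i], cards[j] = cards[j], cards[i]
    return cards

random.seed(42)          # fixed seed so the experiment is reproducible
trials = 60_000
counts = Counter(tuple(knuth_shuffle([0, 1, 2])) for _ in range(trials))

# All 6 permutations should appear, each close to trials/6 = 10000 times.
expected = trials / 6
print(len(counts), all(abs(c - expected) / expected < 0.05 for c in counts.values()))
```

A 5% tolerance is very generous here: the standard deviation of each count is only about 90, so a correct shuffle passes this check with near certainty.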
Source: https://people.cs.umass.edu/~phaas/CS590M/handouts/Fisher-Yates-proof.pdf