The Monkey Shakespeare Probability

The probability that a solitary, immortal monkey, typing randomly on a standard, unbreakable keyboard, would eventually produce one of Shakespeare's plays is estimated. The question of what age of the universe, if any, would be large enough to give the monkey a 50% chance of success, is investigated.
What's past is prologue
The monkey problem is believed to have originated with a statement attributed to Huxley:

Six monkeys, set to strum unintelligently on typewriters for millions of years, would be bound in time to write all the books in the British Museum..

and is quoted by J. Jeans in Mysterious Universe, Cambridge University Press, 1930, p.4. As Jeans was an accomplished physicist, it is surprising he does not either prove or disprove the claim. Perhaps because Huxley did not specify exactly how many millions of years he was thinking of, Jeans let the remark go unchallenged.
Some approximations
The exact age of the universe one finds in the literature can be expected to vary as more accurate astronomical data is obtained by scientists. The value we will use is that currently given by NASA. The number of letters in a Shakespearean play is also an approximation. However, only order of magnitude accuracy is required to perform the calculation, so the approximations are reasonable.
  1. \( {\text{Age of Universe = 13.7 billion years =}} 10^{17} s\)   [1]
  2. \( {\text{ Number of chars/letters in a Shakespeare Play = }} 10^5 \) (Rough approximation)
  3. The keyboard contains 44 keys (Again, this varies, depending on the keyboard)
  4. All keys are equally likely to be struck by the monkey. (Since the space bar is a much larger key than the others, in reality the monkey would be expeced to strike it more often. We ignore that nuance.)
  5. Ignore upper and lower case distinctions
  6. The monkey types at a rate of one char per second
A review of probability
Suppose we perform an experiment that has a finite number of possible outcomes. The set of all possible outcomes, S, is called the sample space . Any subset, E, of the sample space is called an event . If the number of elements in the sample space is n, and the number of elements in the event is m, then the probability of the event, P(E), is defined to be \(\frac{m}{n}\). This, by the way, is the classical definition of probability. (The other way of defining probability, the Bayesian definition, will not be considered here.)

  • If the event has zero elements, then this reduces to \(\frac{0}{n}\), or a probability of zero, indicating it will never happen.
  • If the event happens to be the same as the whole sample space, then we get \(\frac{n}{n}\), or a probability of one, indicating certainty
  • An event with a probability midway between 0 and 1, P(E) = \(\frac{1}{2}\) would indicate the event was just as likely to occur as to not occur...
So in general, a probability will be greater or equal to zero, and less than or equal to one., i.e, \( 0 \leqq P \leqq 1 \)   [2]

Example: 44 key typewriter      A typewriter with 44 keys is struck once. What is the probability that the letter "a" was struck?
The sample space is the set of 44 keys. The number of elements in the sample space is 44. The event E is the set of struck keys, in this case, the "a" key. The number of elements in the event is 1. So P(E) = \(\frac{1}{44}\).
Comment: An implicit assumption, or idealization, being made in the above calculation is that all 44 keys are equally likely of being struck.
Though this be madness, yet there is method in't.
In general, the calculation of a probability reduces to solving two counting problems. First, you have to calculate the number of elements in the sample space. Second, you have to calculate the number of elements in the event. A fundamental technique that often enables us to do this is the Product Rule
If an event can occur in m ways and a second independent event can occur in n ways, then the two events can occur in mn ways.

Example: Fujara Flute      How many ways can a flautist cover one fingerhole on a flute, and then cover a second, different fingerhole?
A Fujara flute has 3 fingerholes. There are 3 choices for the first hole. Having made the initial selection, there will be two possible choices for the second fingerhole. Applying the product rule gives: 3 * 2 = 6 ways.
It is as easy to count atomies
Let our budding playwright monkey type precisely \( 10^5 \) characters. Let's enquire as to the probability that the monkey in fact produced a Shakespearean play. To do so, we will need to count the elements in the sample space, and the elements in the event.
The Sample Space
The sample space will be the set of strings consisting of \( 10^5 \) characters from an "alphabet" of 44 chars or keys. The number of elements in the sample space is, from the product rule, calculated by noting we have 44 choices for the first char, 44 choices for the second, ..., 44 for the \( 10^5 \) and last character. So, applying the product rule, the total number of elements in the sample space is:
44 * 44 * ... * 44 (\( 10^5 \) such terms) = \( {44^{10^5}} \) = \(44^{100000}\)
The Event
The event will be our Shakespearean play, which consists of a single, albeit long, string. The number of elements in the event is 1.
The Probability
Thus, we can immediately conclude: The probability that the monkey was successfull after randomly typing just \( 10^5 \) chars is \begin{align} & \text{P =} \frac{1}{44^{10000}} \\ & \\ & \text{and using } \log _{10} 44 = 1.64345 \\ & \text{and } \log _{10} P = (-100000) * 1.64345 \\ & \text{we get,} \\ & \\ & \text{P = } 10^{-164345} \end{align}
To be or not to be
If an event or experiment can only have two outcomes (success and failure), then independent repeated trials of that event are called Bernoulli trials. A standard example of a Bernoulli trial is the tossing of a coin. Here is the essential approximation in our heuristic approach to the Monkey-Shakespeare problem -- every \(10^5 \) characters typed by the monkey will be considered to be a Bernoulli trial. That is, we let the monkey type \(10^5 \) characters (the assumed number of characters in the Shakespeare play) and then ask if the given Shakespearean play was, in fact, produced (success) or not (failure). By approximating the problem as a sequence of Bernoulli trials, we can apply well known and standard results of elementary probability theory to get a heuristic answer to our basic question.
Bernoulli Trials
First, we review the basic facts of Bernoulli trials. Denote the probability of success of a Bernoulli trial by p, and the probability of failure as q, with p + q = 1. Then, we have the following result for the probability of k successes in n trials:
P (k successes) = \(n\choose k\) \(p^k q^{n-k} \)

P (1 or more success) = 1 - \( q^n \)
where \(n\choose k\) is defined as \( \frac{n!}{(n-k)!k!}\)

Example: Coin Toss      A coin is tossed 20 times, what is the probability of getting 19 heads?
\begin{align} &\text{We have} \\ &n = 20, k = 19, p = q =\frac{1}{2} \\ & \text{so,} \\ &\text{P(19 heads) =} {20\choose 19} * \frac{1}{2}^{19} * \frac{1}{2}^{20-19} \\ &= 1.812 * 10^{-5} \end{align}
The Calculation
Having layed the ground work by reviewing basic probability and counting techniques, we can now proceed with the calculation.
Number of Bernouli Trials
The monkey types one char each second. Therefore, the time for the monkey to type \( 10^5 \) chars is \( 10^5 \) seconds.
It follows that the maximum number of trials n that the monkey can perform is:
\( n = \frac{\text{age of the universe}}{\text{time for one trial}} = \frac{10^{17}}{10^5} = 10^{12}\) trials
\begin{align} &\text{Using Bernoulli, we know that} \\ \\ &\text{P(1 or more Shakespeare’s)} = 1 - q^n \\ & = 1 - {(1-p)}^n \\ & = 1 - (1 - np) + \text{neglible terms} \\ & = np \end{align}
Plugging in our values for n and p, we finally get
The Monkey Shakespeare Probability
\begin{align} \text{P(1 or more Shakespeare plays)} & = 10^{12} * 10^{-164345} \\ & = 10^{-164333} \end{align}
In other words, even if the monkey had been typing for as long as the universe has been in existence, the probability that it will type a given play of Shakespeare’s is on the order of \(10^{-164333} \)
Age of Universe Required for 50% Chance of Monkey Success
Setting \( np = \frac{1}{2} \), we get \( n = \frac{1}{2p} \), so that the number of trials n to give the monkey a 50% chance of success is
\(n_{50} = .5 * 10^{164344} = 5 * 10^{164343}\)
and so,
the (Age of the Universe)50% = \(10^5 * 5 * 10^{164344} s \)
Thus we see that the required age of the universe for a reasonable chance of monkey success is a surprisingly huge number relative to the known age \(10^{17} s \) of the universe.
In fairness to Huxley and Jeans, I must point out that before Einstein's General Theory of Relativity, and the resulting Big Bang theory, time was considered to be absolute, and infinite. Therefore, one did not typically consider the age of the universe in one's calculations, and tremendously large stretches of time were not necessarily unrealistic.
Parting is such sweet sorrow
In conclusion, I must point out why this is merely a heuristic calculation. It has been assumed that we can model the calculation by a series of Bernoulli trials. This approximation has the effect of under-estimating the probability. It would be possible, for example, for the monkey to have successfully reproduced the play after \( 10^5 + 1 \) chars (i.e.,1 trial plus the first char from the next trial) have been typed... However, approximating by a series of Bernoulli trials does have the advantage of using well known and elementary counting techniques to make obvious the rather absurd nature (absurd given our current knowledge of the finite age of the universe) of Huxley's original statement.

[1] Source
[2] It must be pointed out that Feynman, in his investigations on the foundations on quantum mechanics, has successfully extended the range of probabilities to include negative numbers.