Binomial vs. Negative Binomial vs. Poisson Distributions

The Binomial, Negative Binomial, and Poisson Distributions are closely related with one another in terms of their inherent mathematics. However, they are distinguished from one another due to the fact that they are better applied in situations suitable to them. I will attempt to provide as simple a comparison between these three probability distributions in order to fully recognize their usefulness and their closeness.

Let’s start with the simplest:

Binomial Distributions:

Binomial Probability Distributions are one of the many Discrete Probability Distributions – the random variable X only possesses discrete values in its data set (i.e., X = 1, 2, 3, 4, 5; certain values from a finite set) – that have been created and one of the most popular.

In order for one to use a Binomial Probability Distribution (or BPD for short), one must first understand what Binomial experiments are, better known as Bernoulli Trials.

Bernoulli Trials are characterized as:

  • an experiment consisting of n repeated trials.
  • having each trial posses two and only two outcomes, often labelled as success or failure.
  • the probability of success is denoted with p. the probability of failure (or no success) is denoted with q where p = 1 – q.
  • having each trial being independent. Therefore, the result of the any trial does not have any influence in the outcomes of any subsequent trials.

The simplest example to understand Bernoulli Trials is the Coin-Flip Experiment. The Coin-Flip Experiment is considered a Bernoulli Trial since:

  • it can be filled as many times as one wants.
  • it possesses only two possible outcomes: heads and tails.
  • P(heads) = 0.5 ; P(tails) = 0.5
  • each outcome from a flip of the coin will have no effect on the outcomes of future flips.

Since this particular scenario is considered a Bernoulli Trial, it can be analyzed with a Binomial Probability Distribution.

Let’s pose a question, what is the probability that 3 heads will occur out of 5 coin flips?

Using the formula:



b(x; n, p), stands for the binomial probability distribution function.

x, stands for the number of successes that result from a binomial experiment

n, stands for the number of trials conducted in the binomial experiment

p, stands for the probability of success occurring on the individual trial

q, stands for the probability of failure occurring on the individual trial

nCx, stands for the number of combinations of successes (x) occurring in a number of trials (n)

In this way, we can solve our question by having,

x = 3 heads occurring (3 successes)

n = 5 flips (5 trials conducted)

p = 0.5 chance of getting a head (chance of success)

q = 0.5 chance of getting a tail (chance of failure)


Therefore, the probability of 3 heads occurring in 5 trials is 0.3125.

Negative Binomial Distributions:

Negative Binomial Probability Distributions are similar to that of the previously mentioned distribution, apart from the one detail that makes its experiments different from that of a Bernoulli Trial.

Just like Bernoulli Trials, Negative Binomial Experiments are characterized as:

  • an experiment consisting of repeated trials.
  • having each trial posses two and only two outcomes, often labelled as success or failure.
  • the probability of success is denoted with p. the probability of failure (or no success) is denoted with where p = 1 – q.
  • having each trial being independent. Therefore, the result of the any trial does not have any influence in the outcomes of any subsequent trials.

The difference is that:

  • the experiment continues until a fixed number of successes, r, occurs

Returning back to the Coin-Flip experiment. The question is, if one continues flipping a coin, what is the probability of heads landing 3 times?

This is a Negative Binomial Experiment because:

  • the experiment consists of repeated trials until heads lands 3 times.
  • each trial has only two possible outcomes: heads or tails
  • P(heads) = 0.5 ; P(tails) = 0.5
  • the trials are independent, as getting a heads on one trial does not affect the outcome of the next and following trials.
  • the experiment continues until a fixed number of successes occurs, for example, 3 heads.

Using the formula:



b*(x; r, p), stands for the negative binomial probability distribution function.

x, stands for the number of trials required to produce r successes from the negative binomial experiment

r, stands for the number of successes in the negative binomial experiment

p, stands for the probability of success occurring on the individual trial

q, stands for the probability of failure occurring on the individual trial

x-1Cr-1, stands for the number of combinations of successes (x) occurring in a number of trials (n)

Let us assume that it takes 5 flips to produce 3 heads. In this way, we can solve our question by having,

x = 5 flips (5 trials required to produce r successes)

r = 3 heads

p = 0.5 chance of getting a head (chance of success)

q = 0.5 chance of getting a tail (chance of failure)


Therefore, the probability of 3 heads occurring in 5 trials is 0.1875.

A cool note: in situations where r=1, you get the formula for the Geometric Distribution Function, which is essentially a specific case of  the Negative Binomial Distribution Function.

Poisson Distributions:

Poisson Probability Distribution Functions are once again another type of Discrete Probability Functions. While Poisson experiments are similar to Bernoulli Trials in that only two possible outcomes exist, the similarities end there.

Poisson experiments are characterized:

  • by having only two possible outcomes: success and failure
  • with independent trials.
  • by knowing the average number of successes that occur within in a region (where a region is defined as a length, area, volume, period of time, etc…)
  • by the probability of success occurring is proportional to the size of the region (and therefore the probability of success occurring in an extremely small region of space is virtually zero).

There are two slight variations to the formula of a Poisson Probability Distribution Function. The commonly used formula is:



P(x;λ), stands for the Poisson Probability Distribution Function

λ, is the average number of successes that occur in a region.

x, the actual number of success that occur in a region.

e, Euler’s Constant, which is an irrational number with a value of 2.71828…

An example, a real estate company sells on average 2 homes per day, what is the probability that exactly 3 homes will be sold tomorrow?

In this way, we can solve our question by having,

λ = 2, where 2 homes are sold in a day, where “a day” is the region in which the successes “2 homes sold” occur.

x = 3, assuming that 3 homes are sold tomorrow, what are the chances of this event occurring within the region of a day?


Therefore, the probability of 3 homes being sold tomorrow is 0.180.

The alternative form of the Poisson Probability Distribution Function defines, λ = np


n = is the total number of trials

p = is the probability of a success of occurring.

Essentially, λ is the Expected Value of a Bernoulli Trial. The Poisson Distribution Function is nothing more than a specific case of the Binomial Distribution Function by where n is a large number, and p is a very small number.

The formula can be rewritten as under:

pois form 2

This form of the Poisson Distribution Function proves useful when solving other situations (radioactive decay, cell populations, voting, etc…)

Another example: a certain drug is effective in 98% of cases. If 2000 patients are selected, at random, what was the probability that the drug was ineffective in exactly 10 cases?

In this way,

n = 200o patients

p = 0.02 (success of drug ineffectiveness)

x = 10 cases

pois ex2

Therefore, the probability of the drug being ineffective in exactly 10 cases is 1.2276*10^-8.


The Dragon Curve (and a bit on Fractals)

If you have ever read Michael Crichton’s “Jurassic Park” (Yes, the 1993 movie was adapted from a book as most tend to be) then you would have noticed the unorthodox chapter headings.

1st Iteration

Each chapter (or iteration as the book called it) was found to have a seemingly meaningless illustration of some lines and squares adjunct to a quotation from one the book’s characters.

The First Chapter/First Iteration contained the above image with the quote: “At the earliest drawings of the fractal curve, few clues to the underlying mathematical structure will be seen.” Ian Malcolm.

With each subsequent chapter, the quotations begin to resemble the events in the story – the idea of unpredictability; chaos theory – and the illustrations become more elaborate – and eerily reptilian.


 What we have is called a Dragon Curve.

It is a fractal with a very simple iterative process:

  1. Draw a Line
  2. Rotate a copy of the Line from Step 1 90 degrees (clockwise or anti-clockwise is arbitrary) and attach it to the end of the First Line. (You will now have two perpendicular lines – an L shape).
  3. Rotate a copy of the entire L shape from Step 2 90 degrees (continue rotating it the same direction as you did in Step 2) and attach it to the end of the First L shape. (You will now have a saucepan shape).
  4. Rotate a copy of the entire saucepan shape from Step 3 90 degrees (continue rotating it the same direction) and attach it to the end of the First saucepan shape.
  5. Rotate a copy of the entire image and attach it to the end of the original.

Each step corresponds to the panel in the image below (read from left to right, top to bottom)

1-15 horizontal

*Some things to be aware of is the overlapping that occurs from Step 5 onwards. If you’re attempting to draw the curve by hand, it would be helpful to use different colours to help yourself differentiate between the original and the rotated copy.

As you can see it is an extremely simple fractal design. You can see that the number of lines double with each iteration; this is because we are copying the previous iteration. Another thing to notice is that each section of the Dragon Curve (more noticeable in higher iterations) is reduced by a factor of {\sqrt{2}} and rotated by 45°.

rotate and reduce

Dragon Curves, like many other fractals, posses a property called Self-Similarity. For a fractal to be self-similar means that if you zoom at any specific region of the fractal, no matter you far you go, it will remain the same image. An example of this can be seen in the Sierpinski Triangle or Koch Snowflake.


Unlike some fractals, Dragon Curves can be produced with different iterative processes, though they are more complicated to pursue compared to the one mentioned previously.

The first method requires you to:


  1. Take a square
  2. Divide it in half horizontally.
  3. Translate both halves in opposite directions horizontally such that their edges touch the midpoints of the longer edge.
  4. Divide the shape in vertically in sixths
  5. Translate these divisions vertically similar to Step 3.
  6. Divide the shape in horizontally in tenths
  7. Translate these divisions horizontally similar to Step 3.
  8. and so on…

The second method requires you to:

  1. Essentially draw a lot of triangles…I have been unable to understand the integral mechanics of this particular iterative process. But mathtuber TheMathGuy’s video on the Dragon Fractal (another name for the Dragon Curve; also called the Jurassic Park Dragon, Heighway Dragon, etc…) provides a detailed demonstration on another way to draw the Dragon Curve.

While the Dragon Curve is just another good-looking fractal that can be a fairly decent (yet cliché)
desktop wallpaper, fractals have numerous applications in fields such as astronomy, graphics design, Fractal naturemeteorology, geology, economics and many more.

The iterative nature of fractals allow computer designers to map landscapes in video games and 3-D maps. Fractals can be used to understand the nature of different crystal lattice structures and correlate the iterative process to their strength. Price trends are found to possess self-similarity, just like fractals, as patterns in price fluctuations follow similar patterns in different time periods, regardless of magnitude. You can calculate the length of coastlines using fractal mathematics.

Fractals ominously occur in real life: the branches of a tree, the arrangement of flower petals, the design of a seashell, tributaries in a delta, crystalline structures of compounds and rocks (ice, diamond, halite), lightning, DNA, animal horns, and many many more.

The dragon is symbolic for its mystery and revered power, and the Dragon Curve is likewise the perfect testimonial to illustrate – and confirm – the power of fractals.


Works Cited:

123 4 5 6 7 8

The Monty Hall Problem

the Pledge

Imagine you’re in a game-show. You have managed to blaze through the quarter-finals and semi-finals and end up as one of the two finalist contestants at the threshold of receiving the Grand Prize of 1 Million Dollars!

MHP all closed

Your final challenge is a simple guessing game. There are three doors: one of which contains the Grand Prize, and the remaining two contain a beautifully groomed and domesticated goat each. You must guess which door contains the Grand Prize. If you guess correctly, you receive the Grand Prize and are declared official winner of the game show. If you guess incorrectly, you will receive the goats, leaving the Grand Prize to your opponent.

MHP 3 open GMG

You have no way of knowing which door conceals which item. Whichever door you pick, you will receive the prize behind it.

The game-show host asks you to pick a door. Let us say you had a good feeling about Door A and picked it. Here’s the twist. Before the game-show host reveals the contents behind Door A, he reveals to you the content of one of the remaining two doors. The game-show host will always reveal the location of one of the goats. Let’s say he opened Door B, which contained a goat.


Then game-show host offers you a second chance: do you want to stick with your initial choice, or would like to switch doors?

MHP A'Host'C

The question is:

Should you swap?

Should you stick with your original choice?

Or does it really make a difference?

the Turn

If your answer is  that it doesn’t really make a difference whether you switch doors or not, then you are unfortunately, very very wrong. Surprising isn’t it? It in fact does make a difference, a very large one in fact, as you actually double your chances by switching from your initial choice. Don’t believe me?

Never fear, Mathematics is here! 

the Prestige

Assuming you are among the many who believed it didn’t make a difference (if you’re a smarty-pants and chose correctly to swap doors, then go be smart some place else…), then you probably made the false assumption that there was a 50% chance that the Grand Prize would be in Door A as well as Door C.


Here’s an example.

We will use the same arrangement, where Door A contains a Goat, Door B contains a Goat, and Door C contains the Grand Prize.


Let us set some cases:

1. You pick Door A [it contains a Goat], the Host shows you Door B [it contains a Goat]


2. You pick Door B [it contains a Goat], the Host shows you Door A [it contains a Goat]


3. You pick Door C [it contains the Grand Prize], the Host shows you Door A/B [it contains a Goat]


Let’s count the number of times you win if you do not switch doors and remain with your initial choice.

1. You pick Door A [it contains a Goat], the Host shows you Door B [it contains a Goat], you remain with Door A.



2. You pick Door B [it contains a Goat], the Host shows you Door A [it contains a Goat], you remain with Door B.



3. You pick Door C [it contains the Grand Prize], the Host shows you Door A/B [it contains a Goat], you remain with Door C.




  • P(Lose if you do not switch) = 2/3
  • P(Win if you do not switch) = 1/3

Let’s count the number of times you win if you switch doors rather than remaining with your initial choice.

1. You pick Door A [it contains a Goat], the Host shows you Door B [it contains a Goat], you switch to  Door C.



2. You pick Door B [it contains a Goat], the Host shows you Door A [it contains a Goat], you switch to  Door C.



3. You pick Door C [it contains the Grand Prize], the Host shows you Door B [it contains a Goat], you switch to  Door A.




  • P(Lose if you switch) = 1/3
  • P(Win if you switch) = 2/3

You can also come up with the same answer by drawing a tree diagram,

MHP Tree Diagram

or using Conditional Probability calculations,


As you can see, the Probability of Winning doubles when you decide to switch doors and the Probability of Losing halves.

This problem is known as the Monty Hall Problem or Monty Hall Paradox, and is named after the host of the TV Game Show “Let’s Make a Deal”. This problem is recognized for its counter-intuitive result – a Veridical Paradox.

For many years, it has baffled people, especially renowned mathematicians, for its paradoxical solution. Numerous simulations have been created to test whether you are more likely to win if you switch doors. The New York Times created a Flash Simulation on their website that allows you to test for yourself the Monty Hall Problem.

The Mythbusters and many other science shows have all tried their hand at this problem, and switching has always proven to be the most profitable choice.

The most important lesson to take from the Monty Hall Problem is to realize that our gut/instinctive feeling is not always correct, and that we must always be skeptical of everything.


Works Cited

1 – 2 – 3 – 4 – 5 – 6

Contact – Probable Impossibility or Improbable Possibility?


It was a dry and blistering August 15, 1977 at the Ohioan Big Ear Telescope. SETI (Search for Extra-Terrestrial Intelligence) Radio Astronomer Jerry Ehman spent this afternoon laboriously examining the shelves of printed perforated paper that detailed radio signals emanating from the cosmos. Hours spent sc800px-Wow_signalrutinizing over every seemingly inconsequential signal led to him to come across an anomaly curiously inconsistent with the library of signal data. The 6EQUJ5 code circled with his red pen indicated a radio signal thirty times stronger than any other normal signal received and a frequency ominously identical to the hydrogen atom’s resonance frequency. Ehmer’s comment of the code – Wow! –  modestly sums up the discovery. Was it first contact? Was it just an emission from celestial bodies like quasars, known for their violent ejections of electromagnetic energy? Alternatively, was it an alien races’ misguided attempt at invading our planet by kidnapping the Earth’s then “King”, Elvis Presley, who coincidentally “passed away” the following day?


This 72 second supposed alien message has since chosen not to reveal itself after our scouring the skies for a signal with the slightest resemblance. Whatever the origin and cause, the “Wow! Signal” had become an iconic symbol for the SETI’s goal, and by extension, humanity’s evolution as a technological and cosmological species. The implications of such a discovery were subject to our imagination. As Carl Sagan, astronomer once said, “Imagination will often carry us to worlds that never were. But without it we go nowhere”.

The Drake Equation

An example of this use of imagination is the Drake Equation.


The Drake Equation,  a mathematical equation created by astronomer Frank Drake in 1961, is used to approximate the number of detectable intelligent extraterrestrial civilizations in the Milky Way, and by extension our chances of encountering them. As you can see, the equation involves a variety of variables:

  • N = the number of civilizations in our galaxy with which communication might be possible
  • R* = number of new stars born each year
  • fp = percentage/fraction of stars with planets
  • ne = average number of habitable planets per solar system
  • fℓ = percentage chance a habitable planet develops life
  • fi = percentage chance that life develops intelligence
  • fc = percentage chance life can communicate across space
  • L = the length of time for which such civilizations release detectable signals into space

If you notice, the type of information for each following variable becomes more and more specific; it calculates the approximation from a macro-perspective and narrows the specifications of the variables. Essentially, the equation is structured using the Multiplicative/Fundamental Counting Principle.

The chances of finding intelligent extraterrestrial life in our galaxy can either be extremely high or extremely low. We must understand that the Drake Equation presently exists to provide perspective, rather than calculate a specific number. With the level of technology and knowledge we have so far, we can accurately come up with values for the first three variables by observing the billions of galaxies and stars in the night sky. But the values for the remaining four variables is up to our own pessimism or optimism and are the ones scientists are searching for. The problem with trying to predict the chance that a planet develops life that becomes intelligent that achieves interstellar communication is that we have only one data point, and that is us on our pale blue dot.

However, the Drake Equation is not necessarily perfect, as there have been numerous refinements and abridgements to it. For example, the nr variable describes the percentage chance of new civilizations evolving from previously extinct civilizations. This is an optional variable for the equation as it takes into consideration the billion year lifespans most terrestrial planets possess.

Each variable is laden with assumptions; for example, the fp variable factors only the number of terrestrial planets rather than gas giants Saturn and Jupiter. This makes the assumption that gas giants are unable to host intelligent life, as far as we know.

If we look at the types of values that would be used in the estimate, the values R*, fp, and ne are closer to powers of 10 rather than 1. The fl, fi, fc factors would all be numbers less than 1 (

10^-1 or even smaller). If we multiply these values together we can approximately get a value of 1. Therefore, based on these crude estimates, the number of civilization in our galaxy with which communication might be possible, N, is approximately equal to L, the longevity of a civilization.


Probabilities of there being intelligent aliens, when it comes to something as infinite as outer space, has the potential of being extremely large. Because space is so large and the formation of planets and other phenomenon is seemingly random, there is undoubtedly at least one (apart from our own) planet with intelligent life. This is analogous to finding one’s phone number in the digits of pi or any other transcendental number like e or phi. Because of pi’s random digits, there exists a chance of finding any conceivable number permutation, based on the notion of infinity and randomness. Based on these assumptions, that the universe is in fact infinite (or just terribly large) and its processes are random in nature, intelligent life should exist elsewhere. The issue is whether we can contact them or not.

“Where are they?”

– was the big question asked by Italian Physicist Enrico Fermi. If Earth is considered an atypical planet orbiting an ordinary young yellow star like billions of other planets, there should be a multitude of civilizations in space – yet where are they? So far, we have not one strong piece of evidence supporting this assumption, just evidence of the contrary.

This idea is called the Fermi Paradox – “the apparent contradiction between high estimates of the probability of the existence of extraterrestrial civilization and humanity’s lack of contact with, or evidence for, such civilizations”.

There are numerous reasons for this being the case:

  • The Universe is unimaginably large – the diameter of the observable universe (notice the word “observable”) is 93 billion light years. Intelligent life can possibly exist anywhere.
  • The Speed of Light is finite – the speed of light in a vacuum is approximately 299 792 458 m / s. It would take a photon of light 93 000 000 000 years to travel from one edge of the observable universe to the opposite end.
  • The Universe is still expanding – as discovered by astronomer Edwin Hubble from observing Doppler shifts from neighbouring celestial objects, the Universe is still expanding after 13.77 billion years. Like ants on an inflatable balloon, they get farther and farther apart even if they stay still. Similarly, everything is essentially flying away from us, and shouting “come back” into the night sky won’t help.
  • Radio Transmissions never get to reach us – radio waves are just low frequency electromagnetic radiation or light. Light, in the presence of a large mass like a star or black hole, will have its path bent due to gravitational influence. Because of this, not only do the chances of alien radio transmissions reaching us slim, but our ability to accurately locate its source will reduce.

There are other explanations for this paradox:

  • We are in fact alone – perhaps it takes extremely precise conditions for intelligent life to arise. This is known as the Rare Earth Hypothesis. This goes against the Mediocrity Principle – that Earth is like any other planet in the universe and thus the development of intelligent life is common. Proof against the Rare Earth Principle is the thousands of exo-planets that we have discovered that have Earth-like conditions.
  • We’re not looking properly – perhaps aliens are transmitting data on frequencies unknown to us, or have encrypted transmissions that we might pass for insignificant.
  • Intelligent Life tends to destroy itself – perhaps developing as an intelligent species inadvertently killed them in the process such as by destroying the ecosystem or mutual assured destruction through war.
  • Natural disasters eradicate Intelligent Species – perhaps earthquakes, volcanic eruptions, emerging viruses, or meteor impacts may have caused intelligent life to die out.
  • We have not searched long enough to detect and comprehend interstellar transmissions

There are also other explanations that can be slightly 1984-esque:

  • We are purposely not contacted – perhaps extraterrestrial intelligence has developed and has decided that we never contact them. This is known as the Zoo Hypothesis, where we would be the animals in the cage being observed. Their reasons for doing so? They might be conducting experiments on us. They might be waiting for us to reach a higher technological level before contacting us.
  • The Alien Civilization does not agree with itself. The question of who speaks for Earth/Humanity might be similar to what other civilizations might undergo.
  • It would be dangerous to communicate with other alien civilizations – perhaps alien civilizations have found out that contacting other civilizations might be disastrous, and so have decided to avoid us.
  • They are here on Earth unobserved – We might underestimate their size or form. They could be microscopic or they have taken our own human forms. Perhaps they are here among us. Maybe its the bus driver, your neighbour, your teacher…


Calculating the probability of  intelligent extraterrestrial life in the Universe is subject to numerous variables. Our chances of encountering them is largely determined by our physical constraints and perhaps extraterrestrials’ decisions

As Arthur C. Clarke once said: “Two possibilities exist: either we are alone in the Universe or we are not. Both are equally terrifying.”

Works Cited

1 – 2 – 3 – 4 – 5 – 6 – 7 – 8 – 9 – 10

Deriving the Power Rule using Binomial Theorem

This derivation goes to show once again the pervasiveness of mathematical concepts in different areas of the field. While Euler’s Formula does a better job of doing that, this simple derivation will suffice.

The Power Rule is often the first Derivative Rule many Calculus students will learn. It is commonly used to calculate the derivatives of polynomial functions, as well as reciprocal functions and nth roots of functions.

There are numerous and varied derivations and proofs on the internet that show the derivation of the Power Rule, but the one I will focus on involves using the Binomial Theorem through the First Principle or Difference Quotient.

Power Rule Binomial Theorem 3

When trying to come up with this derivation, I tried to avoid expanding the Summation. But evaluating the limit alongside manipulating the summation made it harder to see a clear set of steps to derive the Power Rule.

This proof makes you wonder (at least me) how Isaac Newton (or Gottfried Liebniz, but nobody really cares about him…) derived this Derivative Rule. Was it through observing the patterns in different polynomials? Was it using this exact derivation? Or was it…


Interestingly, the Power Rule was actually created by Liebniz rather than Newton, as commonly believed. Newton’s method of calculating polynomial derivatives was more complicated, thus being the reason why we continue using Liebniz’ notation. Newton and Liebniz were both contemporaries in the 17th century, with Newton being English and Liebniz being German. Newton had made his discoveries in 1666 and had them published in 1693, while Liebniz had made his in 1676 and published in 1684, sparking much controversy. This controversy was taken to court as it had become a matter of national pride for both countries. Being credited as the country for first inventing Calculus – then considered the forefront of mathematical innovation – would be immensely advantageous. As a result, it was taken to court under the mediation of the Royal Society (a society for the sciences – still present today). Coincidentally, Isaac Newton was the society’s president, thus leaving poor Gottfried Liebniz with charges of plagiarism. This story reveals the false perceptions we have of those in history.


Who would have considered Newton as someone who abused their own authority? It is similar to wondering if Leonardo Da Vinci was a bigot or Ludwig van Beethoven a drunkard.

If it is any consolation, England was far behind Europe for most of the 18th century in terms of mathematical innovation due to Newton’s complex notation.

In conclusion (a terrible way to end, but it gets the job done nonetheless), this specific derivation of the Power Rule reveals the interconnectedness of mathematical fields. The Binomial Theorem is useful in numerous situations and contexts, and this derivation proves so.

Works Cited

1 – 2 – 3

Proving Binomial Theorem using Mathematical Induction

The Binomial Theorem is the perfect example to show how different streams in mathematics are connected to one another: its coefficients have combinatorial roots and can be traced to terms in Pascal’s Triangle, and expansion of binomials to different orders of power can describe probability and combination distributions. The combinatorial proof as under requires no need for proving again, but after learning a method called Mathematical Induction from incessant internet browsing on a late Saturday night, I thought, why not give it a shot?

Mathematical Induction is a method of mathematical proof used to prove an expression true for all natural numbers. The steps are as under:

  1. State the proposition P(n) that needs proving.
  2. The Basis: Show P(n) is true, when n=1.
  3. The Inductive Step:
    1. Assume n=k
    2. If P(k) is true, show that P(k+1) is true
  4. If P(k+1) is true, therefore P(n) is true.

Binomial Theorem Proof w. steps 3 - Copy

(Side-note: It’s not everyday you get to use Q.E.D.)

The Game of Permutations

What do mathematicians do? Simply, they try to find a mathematical model that describes the patterns that emerge in life, the universe, and everything. Mathematicians are employed in numerous industries in everything from manufacturing to finance. Mathematicians also, however, research in other topics that appear to have very little mathematical foundation. Take for example, board games. Sudoku are some examples that have been under the scrutiny of mathematicians all over the world.

The quest to accurately find all the possible Sudoku structures has been going on for many decades, and can often require the use of super computers to deal with the complex calculations involved. While it may appear, at first, to involve simple permutation calculations, the rules of the game restrict certain permutations from being included. A variety of research has been conducted to solve these problems, and I will try to do my best to explain them as easily as possible.

The research paper credited to have accurately calculated all the possible Sudoku arrangements goes Professor Bertram Felgenhauer (Dresden University of Technology, Germany) and Professor Frazer Jarvis’ (University of Sheffield, UK) paper Enumerating Possible Sudoku Grids (2005).

If we were to look at all the possible permutations for a 9 x 9 Sudoku grid exempt of its rules, it would total up to 9!^9 = 1.0911069e+50 (9! for each of the nine 3 x 3 grids).  Their final calculation amounted to 6670903752021072936960 ≈ 6.671 × 1021.

The process in which they have calculated the answer is simplified as follows:

We shall create a classification system to identify parts of the grid.

  • The 9 x 9 Sudoku grid is called a grid
  • each 3 x 3 grid will be called a block. Therefore there are 9 blocks in total, creating the 9 x 9 grid.
  • the three rows of blocks will be called bands of the grid.
  • the three columns of blocks will be called the stacks.
  • A cell in the ith row and jth column is said to be in (i,j) position.
  • The number of distinct Sudoku grids will be denoted N.
  • The digits 1-9 that fill up each block will be referred to as symbols or digits.

Fig 1

Fig. 1: Each block is denoted a name as B#.

How many ways are there to fill B1 in a valid way? (Remember that each block must contain nine symbols of 1 to 9 with no repetitions)

Looking solely at B1, we know that the first cell (1,1) has nine possible options (i.e., 1, 2, 3, 4, 5, 6, 7, 8, 9). The second cell (1,2) now has eight possible options, since the digit used in the (1,1) cannot be repeated. The third cell (1,3) now has seven remaining possible options and so onWhat we are essentially doing is counting the number of valid of permutations of nine symbols for B1: how we could arrange 9 symbols into 9 cells/order 9 symbols. Therefore the total permutations for B1 are 9P9 = 9!. However, we need to calculate the number of valid arrangements, since B1 will depend on the arrangements of other blocks.

Fig 2 Fig. 2: One possible permutation for B1.

If N represents the number of distinct grids (abiding by game rules) and Nthe number of distinct grids with the B1 permutation shown in Fig. 2, the total number of valid grids will be N1×9!, so N1=N/9!.

The authors decide to permute B2 and B3 in terms of one permutation of B1 as shown in Fig. 2.

Since 1, 2, and 3 occur in the first row in B1, these numbers cannot occur in the rest of the row. So only the numbers 4, 5, 6, 7, 8, and 9 from the second and third rows of B1 can be used in the first row in B2 and B3.

They classify these two possibilities as pure top rows – when the numbers {4,5,6} as in the second row of B1 are kept together in B2 and the numbers {7,8,9} as in the third row of B1 are kept together in B3, and the swapped version of this – and the rest of the possibilities as mixed top rows – where the sets {4,5,6} and {7,8,9} are mixed up when using them to fill in the first rows of B2 and B3.

With the use of pure and mixed top rows, twenty possibilities occur. With this, they try and complete the first band.

(Note: the permutation of B1 remains fixed)

There are  (3!)6 ways to complete the first band in terms of the the pure top row 1,2,3;{4,5,6};{7,8,9}.

The cases of the mixed top rows are more complicated. Let’s consider the top row 1,2,3;{4,6,8};{5,7,9}. This can be completed to the first band as in Fig. 3 , where a, b, and c are the numbers 1, 2, 3 in any order.

Fig 3

Once a is picked, b and c are the remaining two numbers in any order, since they are in the same row. Since there are three choices for a, and the three digits in each of the six sets in B2 and B3 can be permuted to result in various valid first bands, the total number of configurations in this case is 3×(3!)6. You can similarly work out each of the remaining seventeen cases of first rows to obtain the same number.

Now we have the number of possible first bands given the standard B1: it is 2×(3!)6+18×3×(3!)6=2612736, where the first part of the sum is the number of first band completions from the pure top rows and the second is the number of first band completions from the mixed top rows.

Instead of calculating how many full grid completions each of these 2612736 possibilities has, Felgenhauer and Jarvis next determined which first bands share the same number of full grid completions. Such an analysis reduces the number of first bands that need to be considered when counting.

Here are some operations which leave the number of grid completions of the first band unchanged: relabeling the numbers, permuting any of the blocks in the first band, permuting the columns within any block, and permuting the three rows of the band. When any of these changes B1, we can relabel the digits to recover its standard form.

Permuting B1, B2, and B3 preserves the number of grid completions because if we start with any valid Sudoku grid, the only way to keep it valid would be to permute B4, B5, B6 and B7, B8, B9 correspondingly so that the stacks remain the same. In other words, every valid grid completion for the first band gives exactly one valid grid completion for the first band being any permutation of B1, B2, B3.

Such considerations allow us to reduce the number of specific first bands we need to consider when counting. Following Felgenhauer and Jarvis, we permute the columns of B2 and B3 so that the top row entries of each are in increasing order, and then swap B2 and B3 if necessary to make the first entry of B2 smaller than that of B3. This is called lexicographical reduction. Since there are 6 permutations of the columns in each of the two blocks and two ways to permute the blocks, lexicographical reduction tells us that, given a first band, there are 62×2=72 other first bands with the same number of grid completions. So now we only need to consider 2612736/72=36288 first bands.

For each of these possibilities, we consider permutations of all three top blocks: there are 6 of them. For each of these, there are 63 permutations of the columns within each block. After performing these operations, we relabel to get B1 back into standard form. We can similarly permute the top three rows of the band, and relabel to get B1 back into standard form. These operations preserve the number of grid completions of a first band. Felgenhauer and Jarvis used a computer program to determine that these operations reduce the number of first bands to be considered from 36288 to just 416.

The main point is that the band you started with and the different band you ended with have the same number of completions to a full Sudoku grid. So instead of calculating the number of grid completions for each of these bands, we can count it for just one of them.

There are more steps that reduce the number of grids to be considered. If we have a pair of digits {a,b} in one column with a in the ith row and b in the jth row, and the same pair in a different column with b in the ith row and a in the jth row, then swapping the places of a and b in each pair will result in a band that has the same number of grid completions as the original. This is because each pair lies in the same column, and swapping both at once keeps the One Rule satisfied for the rows involved as well. As an example, look at the numbers 8 and 9 in the sixth and ninth columns of the above example. Considering all possible cases of this left Felgenhauer and Jarvis with 174 out of the 416 first bands with which to proceed.

They also considered other configurations of the same set of digits lying in two different columns or rows, which can be permuted within their columns or rows leaving the number of grid completions invariant. This reduced the number of first bands to consider to 71, and searching through each of these 71 cases let them know that there are actually only 44 first bands whose number of grid completions need to be found. Each of these 44 bands has the same number of completions to a full grid.

Let C stand for one of these 44 bands. Then the number of ways that C can be completed to a full Sudoku grid can be calculated: call it nc. We also need the number mC of first bands that share this number nC of grid completions. Then the total number of Sudoku grids will just be N=ΣCmCnC, or the sum of mCnC over all of the 44 bands.

Felgenhauer and Jarvis wrote a computer program to carry out the final calculations. They computed the number N1 of valid completions with B1 in standard form, starting with the 44 bands. Then they multiplied this number by 9! to get the answer. They discovered that the number of possible 9 by 9 Sudoku grids is N=6670903752021072936960 which is approximately 6.671×1021.

Works Cited