Probability and Counting

Mathematics is the logic of certainty; probability is the logic of uncertainty. Probability deals with quantifying uncertainty and updating beliefs when new evidence comes to light. It is extremely useful in a wide variety of fields, since it provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena. To give just a small sample from a continually growing list of applications:

  1. Statistics: Probability is the foundation and language for statistics, enabling many powerful methods for using data to learn about the world.
  2. Physics: Current understanding of quantum physics heavily involves probability at the most fundamental level of nature. Statistical mechanics is another major branch of physics that is built on probability.
  3. Biology: Genetics is deeply intertwined with probability, both in the inheritance of genes and in modeling random mutations.
  4. Computer science: Randomized algorithms make random choices while they are run, and in many important applications they are simpler and more efficient than any currently known deterministic alternatives. Probability also plays an essential role in studying the performance of algorithms, and in machine learning and artificial intelligence.
  5. Meteorology: Weather forecasts are computed and expressed in terms of probability.
  6. Gambling: Many of the earliest investigations of probability were aimed at answering questions about gambling and games of chance; the historical roots of the subject lie here. In the 1650s, Fermat and Pascal corresponded through long letters discussing the chances involved in various gambling games.
  7. Finance: Probability is central in quantitative finance. Modeling stock prices over time and determining “fair” prices for financial instruments are based heavily on probability.
  8. Political science: In recent years, political science has become more and more quantitative and statistical. For example, Nate Silver’s successes in predicting U.S. election results, such as in the 2008 and 2012 presidential elections, were achieved using probability models to make sense of polls and to drive simulations.
  9. Medicine: The development of randomized clinical trials, in which patients are randomly assigned to receive treatment or placebo, has transformed medical research in recent years.
  10. History: Mosteller and Wallace used probability to identify the authors of the disputed Federalist Papers.

Sample Space

A sample space is the set of all possible outcomes of an experiment. "Experiment" is used here in a very broad sense: anything that results in an outcome not known in advance is an experiment. An event is a subset of the sample space. One of the major breakthroughs in the study of probability was the introduction of set theory into the field; the mathematical framework for probability is built around sets. Earlier, people relied on analogy, intuition, and heuristics to find probabilities, but set-theoretic concepts such as sample spaces and events, together with operations like union and intersection, put the field on a rigorous footing.

A sample space is usually denoted by \(S\), and events are denoted by capital letters, e.g. \(A, B, C, \dots\)

(Figure: sample space)
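
To make the set-theoretic language concrete, here is a small sketch in base R (the die-rolling sample space and the particular events are assumptions for illustration, not from the text): events are represented as subsets of \(S\) and combined with `union()`, `intersect()`, and `setdiff()`.

```r
# A toy sample space: the outcome of rolling one fair die
S <- 1:6

# Events are subsets of the sample space
A <- c(2, 4, 6)    # "the roll is even"
B <- c(4, 5, 6)    # "the roll is at least 4"

union(A, B)        # outcomes in A or B:        2 4 5 6
intersect(A, B)    # outcomes in both A and B:  4 6
setdiff(S, A)      # complement of A within S:  1 3 5
```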

Naive definition of probability

Historically, the earliest definition of the probability of an event was to count the number of ways the event could happen and divide by the total number of possible outcomes for the experiment. We call this the naive definition since it is restrictive and relies on strong assumptions; nevertheless, it is important to understand, and useful when not misused.

Let \(A\) be an event for an experiment with a finite sample space \(S\). The naive probability of \(A\) is \[ P(A) = \frac{|A|}{|S|} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes in } S}\] Here \(|A|\) denotes the size (number of elements) of \(A\). This definition assumes a finite sample space in which all outcomes are equally likely. For example, if the sample space corresponds to all integers or to an interval of the real line, then the sample space is infinite and the naive definition does not apply. Moreover, the assumption that all outcomes are equally likely typically requires some symmetry in the experiment with respect to its outcomes.

Example 1: If we toss a coin two times, the sample space is \(S = \{(HH), \;(HT), \;(TH), \;(TT) \}\). If we assume the coin is fair, then all four outcomes are equally likely, and we can find the probability of any event. For instance, define the event \(A = \{\text{the two tosses give different outcomes}\}\), i.e. \(A = \{(HT), \; (TH)\}\); then \(P(A) = |A|/|S| = 2/4 = 0.5\).
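
As a quick sketch, Example 1 can be checked in R by brute-force enumeration (the column names `toss1` and `toss2` below are my own choice): `expand.grid()` lists the sample space, and the naive definition reduces to counting rows.

```r
# Enumerate the sample space of two coin tosses
S <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"))
nrow(S)            # |S| = 4

# Event A: the two tosses give different outcomes
A <- S[S$toss1 != S$toss2, ]
nrow(A)            # |A| = 2

# Naive probability P(A) = |A| / |S|
nrow(A) / nrow(S)  # 0.5
```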

Example 2: What is the probability that there is life on Neptune? A careless application of the naive definition would argue: either there is life on Neptune or there is not, so the probability is \(1/2\). This is clearly wrong, since the two outcomes (life or no life) are not equally likely. Misapplications of the naive definition like this appear regularly in uninformed discussions.

Since the naive definition of probability relies on counting favourable outcomes, the next topic is counting.

Counting

Multiplication Rule:

If the first experiment has \(n_1\) possible outcomes, and for each outcome of the first experiment the second experiment has \(n_2\) possible outcomes, and so on for \(r\) experiments, then the combined experiment (performing all \(r\) experiments) has \(n_1 \cdot n_2 \cdots n_r\) possible outcomes overall.
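
A minimal sketch of the multiplication rule in R (the three sub-experiments here, a coin flip, a die roll, and a card suit, are assumed purely for illustration): `expand.grid()` builds the combined experiment, and its number of rows equals the product \(n_1 \cdot n_2 \cdot n_3\).

```r
coin <- c("H", "T")                                  # n1 = 2 outcomes
die  <- 1:6                                          # n2 = 6 outcomes
suit <- c("clubs", "diamonds", "hearts", "spades")   # n3 = 4 outcomes

# The combined experiment: every combination of the three sub-experiments
combined <- expand.grid(coin = coin, die = die, suit = suit)

nrow(combined)                                       # 48 rows
length(coin) * length(die) * length(suit)            # 2 * 6 * 4 = 48
```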
