Mathematics is the logic of certainty; probability is the logic of uncertainty. Probability deals with quantifying uncertainty and updating beliefs when new evidence comes to light. It is extremely useful in a wide variety of fields, since it provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena, and its list of applications is continually growing.
A sample space is the set of all possible outcomes of an experiment. "Experiment" is used here in a very broad sense: anything that results in an outcome not known in advance is an experiment. An event is a subset of the sample space. One of the major breakthroughs in the study of probability was the introduction of set theory, and the mathematical framework for probability is built around sets. Earlier, people used analogy, intuition, and heuristics to find probabilities. Set-theoretic concepts such as sample spaces and events, together with operations such as union and intersection, put the field on a rigorous footing.
A sample space is usually denoted by \(S\), and events are denoted by capital letters, e.g. \(A, B, C, \dots\)
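Because events are simply subsets of the sample space, they can be modeled directly with set objects. The Python sketch below is illustrative only: the single die roll and the particular events `A` and `B` are assumptions chosen for the example, not taken from the text.

```python
# Sample space for one roll of a six-sided die (an assumed example experiment).
S = set(range(1, 7))

# Two events, each a subset of S.
A = {s for s in S if s % 2 == 0}   # "the roll is even"
B = {s for s in S if s >= 4}       # "the roll is at least 4"

A_or_B = A | B     # union: A happens, or B happens, or both
A_and_B = A & B    # intersection: A and B both happen
not_A = S - A      # complement of A relative to S

print(sorted(A_or_B))
print(sorted(A_and_B))
print(sorted(not_A))
```

Every event built this way is again a subset of `S`, which is what lets set operations stand in for the logical statements "or", "and", and "not".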
Historically, the earliest definition of the probability of an event was to count the number of ways the event could happen and divide by the total number of possible outcomes for the experiment. We call this the naive definition since it is restrictive and relies on strong assumptions; nevertheless, it is important to understand, and useful when not misused.
Let \(A\) be an event for an experiment with a finite sample space \(S\). The naive probability of \(A\) is \[ P(A) = \frac{|A|}{|S|} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes in } S} \] Here \(|A|\) denotes the size of \(A\), that is, the number of outcomes it contains. This definition assumes that the sample space is finite and that all outcomes are equally likely. For example, if the sample space consists of all the integers or of an interval on the real line, then the sample space is infinite and the naive definition does not apply. It also assumes that the experiment is symmetric with respect to its outcomes; only then are all outcomes equally likely.
Example 1: If we toss a coin two times, the sample space is \(S = \{(HH), \;(HT), \;(TH), \;(TT) \}\). If we assume the coin is fair, all four outcomes of this experiment are equally likely, so we can find the probability of any event. For instance, define the event \(A = \{\text{the two tosses give different results}\}\), in other words \(A = \left\{(HT), \; (TH)\right\}\). Then \(P(A) = |A|/|S| = 2/4 = 0.5\).
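The calculation in Example 1 can be reproduced by listing the sample space and counting. This is a minimal sketch under the naive definition's assumptions (finite sample space, equally likely outcomes). The helper name `naive_probability` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def naive_probability(event, sample_space):
    """Return |event| / |sample_space|, assuming equally likely outcomes."""
    event = set(event)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# All ordered results of two tosses: HH, HT, TH, TT (stored as tuples).
S = set(product("HT", repeat=2))

# Event A: the two tosses give different results.
A = {outcome for outcome in S if outcome[0] != outcome[1]}

p = naive_probability(A, S)
print(p)         # 1/2
print(float(p))  # 0.5
```

Using `Fraction` keeps the result exact. The function only counts; it cannot check whether the outcomes really are equally likely, so that assumption remains the user's responsibility.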
Example 2: What is the probability that there is life on Neptune? A naive argument runs as follows: either there is life on Neptune or there is not, so the probability is \(1/2\). This is clearly wrong, since the two outcomes (life or no life) are not equally likely. We regularly see this kind of misapplication of the naive definition in uninformed circles.
Since the naive definition of probability relies on counting favourable outcomes, the next topic is counting.
If an experiment has \(n_1\) possible outcomes, and for each outcome of the first experiment a second experiment has \(n_2\) possible outcomes, and so on for \(r\) experiments, then the combined experiment has \(n_1 n_2 \cdots n_r\) possible outcomes overall. Here the combined experiment consists of all \(r\) experiments taken together.
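The multiplication rule can be checked by brute force on a small case. In the sketch below, the three sub-experiments (a coin toss, a die roll, and a card suit) are hypothetical choices made for illustration. It also assumes the simplest setting, in which each sub-experiment's set of outcomes does not depend on earlier results.

```python
from itertools import product
from math import prod

# Hypothetical sub-experiments: a coin toss, a die roll, a card suit.
experiments = [
    ["H", "T"],
    [1, 2, 3, 4, 5, 6],
    ["clubs", "diamonds", "hearts", "spades"],
]

# Count predicted by the multiplication rule: n_1 * n_2 * ... * n_r.
predicted = prod(len(outcomes) for outcomes in experiments)

# Count obtained by explicitly listing every combined outcome.
enumerated = sum(1 for _ in product(*experiments))

print(predicted, enumerated)  # 48 48
```

Explicit enumeration grows quickly with \(r\), which is why the rule is useful: it gives the count without listing the outcomes.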