Introduction to Probability

Games of chance have been around for centuries, and probability theory grew out of attempts to understand them. Famous mathematicians like Cardano, Fermat, and Pascal spent a great deal of time trying to figure them out.

The Birth of Probability Theory

Probability theory is useful not only in casinos and betting; it is also indispensable in any situation that depends on data affected by chance in some way.
Knowledge of probability is essential to data science.

Probability can be as straightforward as rolling two dice and getting a sum of 7: there’s a 1/6 chance of this happening. But what about elections? Election forecaster Nate Silver gave Obama a 94% chance of winning in 2008, then 90% in 2012. Obama won both times, in line with the forecasts. For 2016, however, Silver gave Hillary Clinton a 71% chance of winning, and she lost. These forecasts raise essential questions: How are these probabilities calculated? What is being used to drive these forecasts?
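As a quick check of the 1/6 figure, assuming it refers to the sum of two fair six-sided dice, we can list all 36 equally likely outcomes in R and count the ones that add up to 7. The names `rolls` and `p_seven` are illustrative only.

```r
# All 36 ordered outcomes of rolling two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

# Proportion of outcomes whose sum is 7
p_seven <- mean(rolls$die1 + rolls$die2 == 7)
p_seven
## [1] 0.1666667
```

Six of the 36 outcomes sum to 7, which gives 6/36 = 1/6.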

We’ll cover election forecasting in the next module. We’ll also cover statistical inference, which builds upon probability theory.

First Module

In this module, we will analyze the circumstances surrounding the financial crisis of 2007 to 2008. Part of what happened was the underestimation of the risk of securities that financial companies sold. Specifically, the risk of mortgage-backed securities and Collateralized Debt Obligations (or CDOs) was grossly underestimated.
The risk was assumed to be low, meaning the financial companies believed the homeowners would make their monthly payments.
Because many homeowners defaulted in 2007 and 2008, the prices of these securities crashed. The banks lost so much money that they needed government bailouts to avoid closing down completely.

To understand this very complicated situation, we’ll first learn the basics of probability covered by these topics:
* random variables
* independence
* Monte Carlo simulation
* expected values
* standard errors
* margins of error
* central limit theorem

Discrete Probability

Basic Principles of Categorical Data

The probability of categorical data is called discrete probability.
We will discuss the mathematical definition of probability to get precise answers to specific questions.

A more tangible way to think about the probability of an event is as a proportion of times the event occurs when we repeat the experiment over and over independently and under the same conditions.

Important notation

  1. Pr(A) = notation for the probability of an event A happening.
  2. event = something that can happen when an outcome is determined by chance.
  • For continuous variables like height and money, we can use events like “x >= 6”, but in this lesson, we’ll focus on categorical data and discrete probability.

Monte Carlo simulation

Computers provide a way to actually perform simple random experiments. Before computers, we would need a physical setup, such as colored beads in a vase, and pick beads at random.
Random number generators permit us to mimic the process of picking at random.
- Example in R is the sample function: sample()

rep() & sample()

  1. Use the rep function: rep() to generate the vase with beads (red & blue color).
beads <- rep( c("red", "blue"), times = c(2,3))
  • type “beads” to see it.
beads
## [1] "red"  "red"  "blue" "blue" "blue"

If you type sample( beads, 1), you will get one random sample.

sample( beads, 1)
## [1] "red"

We want to repeat this over & over.
- Since we cannot do this forever, we repeat the experiment a large enough number of times that the results are practically equivalent to doing it forever.
- This is a Monte Carlo simulation.

We do not cover the rigorous definition of “practically equivalent” here. Instead, we take a practical approach to deciding how many repetitions are large enough.
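One practical way to judge whether the number of repetitions is large enough is to rerun the simulation with increasing sizes and watch the estimate settle down. The sketch below assumes the vase of 2 red and 3 blue beads from above and draws with `replace = TRUE` so that each draw comes from the full vase. `B_values` and `estimates` are illustrative names, and the seed is arbitrary, set only so the run is reproducible.

```r
# Vase with 2 red and 3 blue beads, as defined earlier
beads <- rep(c("red", "blue"), times = c(2, 3))

set.seed(1)  # arbitrary seed, for reproducibility only
B_values <- c(10, 100, 1000, 10000, 100000)

# Estimated Pr(blue) for each number of repetitions
estimates <- sapply(B_values, function(B) {
  draws <- sample(beads, B, replace = TRUE)
  mean(draws == "blue")
})

data.frame(B = B_values, estimate = estimates)
```

The estimates for small B can be far from 0.6, while those for large B should stay close to it. When increasing B no longer changes the estimate in the digits we care about, B is large enough for our purposes.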

replicate()

The first example of a Monte Carlo simulation will use the replicate() function.
We’ll reenact the 2 red & 3 blue beads in the vase and see what proportions we get.

  1. Repeat the random event 10,000 times.
  • Set B to be 10,000, then use replicate function to sample from the beads 10,000 times.
B <- 10000
events <- replicate(B, sample(beads, 1))
  2. See if our definition is in agreement with the Monte Carlo simulation approximation.
  • Use table to see the distribution.
tab <- table(events)
tab
  • The exact counts change from run to run, but they should be close to 6,000 blue and 4,000 red out of the 10,000 draws.
  • Then use prop.table to give us the proportions.
prop.table(tab)
  • The Monte Carlo simulation gives a good approximation of 60% & 40%: the proportions should be close to 0.6 for blue and 0.4 for red, though not exactly equal because the draws are random.

With and Without Replacement

If you take one bead out of the vase and do the experiment again without putting it back, you are sampling without replacement. If you take a bead out and put it back into the vase (keeping the number of beads the same), you are sampling with replacement.
- For the Monte Carlo simulation, we want to sample with replacement.

Without Replacement
sample(beads, 5)
## [1] "blue" "blue" "red"  "red"  "blue"
  • You would get an error if you tried sample(beads, 6), since without replacement you run out of beads.
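The failure can be seen without stopping a script by wrapping the call in tryCatch(). This is a minimal sketch; `msg` and `six_with_replacement` are illustrative names.

```r
beads <- rep(c("red", "blue"), times = c(2, 3))

# Asking for 6 beads from a vase of 5 without replacement fails;
# tryCatch() returns the error message instead of halting.
msg <- tryCatch(
  sample(beads, 6),
  error = function(e) conditionMessage(e)
)
msg

# The same request succeeds once beads are put back after each draw
six_with_replacement <- sample(beads, 6, replace = TRUE)
length(six_with_replacement)
```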
With Replacement
  • Change the replace argument from its default, FALSE, to TRUE in the sample() function.
events <- sample(beads, B, replace = TRUE)
prop.table(table(events))
## events
##   blue    red 
## 0.6057 0.3943
  • We see a very similar answer to when we used the replicate function.

Probability Distribution

Defining a distribution for categorical outcomes is pretty straightforward.
1) Assign a probability to each category.
- For the beads in the vase, the proportion of each bead color defines the distribution.

In the next example, we’ll be using the four polling proportions.
Remember, categorical data makes it easy to define probability distributions.
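For the beads in the vase, the distribution can be read straight off the contents of the vase, with no simulation needed. The sketch below reuses table() and prop.table() from earlier; `bead_dist` is an illustrative name.

```r
beads <- rep(c("red", "blue"), times = c(2, 3))

# Exact distribution: the proportion of each color in the vase
bead_dist <- prop.table(table(beads))
bead_dist

# The probabilities of all categories add up to 1
sum(bead_dist)
```

This assigns 0.6 to blue and 0.4 to red, the values that the Monte Carlo proportions approximate.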

Independence

Two events are independent of each other if the outcome of one does not affect the other.
A classic example of this is coin tossing.
- Every time we toss a fair coin, the probability of seeing heads is 1/2 regardless of what previous tosses have revealed. Pr(heads) = 0.5
- In our beads & vase example, successive draws are independent when we sample with replacement. The probability of picking a red bead is 40% on every draw.

Not Independent events

Events are not independent when the outcome of one affects the other, as happens when we sample without replacement.
If you take a blue bead and you don’t put it back, the likelihood of choosing a blue bead again will change.
If we use the sample() function without replacement and assign the result to x, we get the beads in the order they were drawn, with none placed back. After looking at four of the five draws, we know the color of the remaining bead without even guessing.

x <- sample(beads, 5)
x[2:5]
## [1] "blue" "red"  "blue" "red"

When events are not independent, conditional probabilities are useful and necessary to make correct calculations.
Example: Probability of choosing a King if one king was previously chosen without replacement.
Pr(Card 2 is a King | Card 1 is a King) = 3/51

The vertical bar | means “given that condition” or “conditional on.”

When two events A and B are independent:
Pr(A | B) = Pr(A)
The probability of A given B is equal to the probability of A. It doesn’t matter what B is, the probability of A is unchanged.
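The relation Pr(A | B) = Pr(A) can be checked on the beads example by listing every possible pair of draws. In the sketch below, A is "the second bead is blue" and B is "the first bead is blue"; the helper `cond_blue()` and the other names are illustrative, not part of any package.

```r
beads <- rep(c("red", "blue"), times = c(2, 3))
idx <- seq_along(beads)

# With replacement: every ordered pair of bead positions is possible
with_repl <- expand.grid(first = idx, second = idx)
# Without replacement: the same bead cannot be drawn twice
without_repl <- with_repl[with_repl$first != with_repl$second, ]

# Pr(second is blue | first is blue) for a table of equally likely pairs
cond_blue <- function(pairs) {
  first_blue <- beads[pairs$first] == "blue"
  mean(beads[pairs$second][first_blue] == "blue")
}

cond_blue(with_repl)     # 0.6, the same as Pr(blue): independent
cond_blue(without_repl)  # 0.5, not 0.6: not independent
```

With replacement, knowing the first bead was blue leaves the probability for the second at 0.6. Without replacement it drops to 2/4 = 0.5, because one blue bead is gone.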

Multiplicative Rule

If we want to know the probability of A and B occurring, we use the multiplication rule.
Pr(A and B) = Pr(A) * Pr(B | A)
- The probability of A and B is equal to the probability of A multiplied by the probability of B given that A already happened.

Example: In Blackjack, we need to get 2 cards that add up close to 21. These cards are given out, without replacement.
- The probability of getting an Ace on the first draw is 1/13.
- The probability of then getting a card worth ten (a 10 or a face card) is 16/51, since one card has already been taken out of the deck.
- The probability of these two events happening is approximately 2%.
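The multiplication rule for this example can be compared with a Monte Carlo estimate. The sketch below assumes a standard 52-card deck in which the 16 cards worth ten are the 10s, Jacks, Queens, and Kings, and it only counts hands where the Ace comes first. The names `deck`, `ten_valued`, `p_exact`, and `p_mc` are illustrative, and the seed is arbitrary.

```r
# Ranks of a standard 52-card deck (suits do not matter here)
deck <- rep(c("A", 2:10, "J", "Q", "K"), times = 4)
ten_valued <- c("10", "J", "Q", "K")

# Multiplication rule: Pr(Ace first) * Pr(ten-valued second | Ace first)
p_exact <- (4 / 52) * (16 / 51)

# Monte Carlo check: deal two cards without replacement, many times
set.seed(1)  # arbitrary seed, for reproducibility only
B <- 10000
hits <- replicate(B, {
  hand <- sample(deck, 2)
  hand[1] == "A" && hand[2] %in% ten_valued
})
p_mc <- mean(hits)

c(exact = p_exact, monte_carlo = p_mc)
```

The exact value is 16/663, about 0.024, and the simulated proportion should land close to it.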

The multiplicative rule also applies to more than two events.
Pr(A and B and C) = Pr(A) * Pr(B | A) * Pr(C | A and B)
So the probability of A and B and C is equal to the probability of A, times the probability of B given that A happened, times the probability of C given that A and B happened.

If the events are independent, the calculation is very simple: just multiply their individual probabilities. But if you do this when the events are not independent, you can get very wrong answers.
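To see how much the independence assumption can matter, consider drawing three blue beads in a row from the vase of 2 red and 3 blue beads without replacement. The sketch below applies the three-event multiplication rule and compares it with the result of wrongly treating the draws as independent; the variable names are illustrative.

```r
# Vase with 2 red and 3 blue beads
n_blue <- 3
n_total <- 5

# Without replacement the draws are dependent, so each factor is conditional
# on the blue beads already removed
p_dependent <- (n_blue / n_total) *
  ((n_blue - 1) / (n_total - 1)) *
  ((n_blue - 2) / (n_total - 2))

# Wrongly treating the draws as independent reuses Pr(blue) three times
p_if_independent <- (n_blue / n_total)^3

c(correct = p_dependent, assuming_independence = p_if_independent)
```

The correct probability is 3/5 * 2/4 * 1/3 = 0.1, while the independence shortcut gives 0.216, more than twice as large.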

Assessment: Introduction to Discrete Probability

1) Probability of cyan

One ball will be drawn at random from a box containing: 3 cyan balls, 5 magenta balls, and 7 yellow balls.
What is the probability that the ball will be cyan?

cyan <- 3
magenta <- 5
yellow <- 7

p1 <- cyan / (cyan + magenta + yellow)
p1
## [1] 0.2

2) Probability of not cyan

One ball will be drawn at random from a box containing: 3 cyan balls, 5 magenta balls, and 7 yellow balls.
What is the probability that the ball will not be cyan?

p2 <- 1 - p1
p2
## [1] 0.8

3) Sampling without replacement

Instead of taking just one draw, consider taking two draws. You take the second draw without returning the first draw to the box. We call this sampling without replacement.
What is the probability that the first draw is cyan and that the second draw is not cyan?

cyan <- 3
magenta <- 5
yellow <- 7

# The variable `p1` is the probability of choosing a cyan ball from the box on the first draw.
p1 <- cyan / (cyan + magenta + yellow)

# Assign a variable `p2` as the probability of not choosing a cyan ball on the second draw without replacement.
p2 <- 1 - (cyan - 1) / (cyan + magenta + yellow - 1)

# Calculate the probability that the first draw is cyan and the second draw is not cyan.
p1 * p2
## [1] 0.1714286
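This answer can be cross-checked with a Monte Carlo simulation in the style used earlier. The sketch below is one possible check, not part of the assessment; `box`, `hits`, and `p_sim` are illustrative names and the seed is arbitrary.

```r
# Box with 3 cyan, 5 magenta, and 7 yellow balls
box <- rep(c("cyan", "magenta", "yellow"), times = c(3, 5, 7))

set.seed(1)  # arbitrary seed, for reproducibility only
B <- 10000

# Draw two balls without replacement and record whether the first is cyan
# and the second is not
hits <- replicate(B, {
  draws <- sample(box, 2)
  draws[1] == "cyan" && draws[2] != "cyan"
})
p_sim <- mean(hits)
p_sim
```

The simulated proportion should be close to the exact value of 3/15 * 12/14, about 0.171.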

4) Sampling with replacement

Now repeat the experiment, but this time, after taking the first draw and recording the color, return it back to the box and shake the box. We call this sampling with replacement.
What is the probability that the first draw is cyan and that the second draw is not cyan?

cyan <- 3
magenta <- 5
yellow <- 7

# The variable `p1` is the probability of choosing a cyan ball from the box on the first draw.
p1 <- cyan / (cyan + magenta + yellow)

# Assign a variable `p2` as the probability of not choosing a cyan ball on the second draw with replacement.
# The first ball is returned, so the box still holds all 15 balls.
p2 <- 1 - cyan / (cyan + magenta + yellow)

# Calculate the probability that the first draw is cyan and the second draw is not cyan.
p1 * p2
## [1] 0.16