Introduction to Probability

Probability began as the study of games of chance, and it has been around for centuries. Famous mathematicians such as Cardano, Fermat, and Pascal devoted a great deal of time to working it out.

The Birth of Probability Theory

Probability theory is useful not only in casinos and betting; it is also indispensable in any situation that depends on data affected by chance.
Knowledge of probability is essential to data science.

Probability can be as straightforward as rolling a 7 with two dice: there is a 1/6 chance of this happening. But what about elections? Election forecaster Nate Silver gave Obama a 94% chance of winning in 2008, then 90% in 2012. Obama won both elections, as the forecasts favored. For 2016, however, he gave Hillary Clinton a 71% chance of winning, and she lost. This section tackles essential questions such as: how are these probabilities calculated, and what drives these forecasts?
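As a quick check of the 1/6 figure, the sketch below enumerates all 36 equally likely outcomes of rolling two dice (assuming fair, independent six-sided dice) and computes the fraction that sum to 7. It uses base R's expand.grid(), which is introduced later in these notes.

```r
# Every equally likely outcome of rolling two fair six-sided dice (6 x 6 = 36 rows)
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

# Proportion of outcomes whose faces add up to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
p_seven <- mean(rolls$die1 + rolls$die2 == 7)
p_seven
```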

We’ll cover election forecasting in the next module. We’ll also cover statistical inference, which builds upon probability theory.

First Module

In this module, we will analyze the circumstances surrounding the financial crisis of 2007 to 2008. Part of what happened was that financial companies underestimated the risk of the securities they sold. Specifically, the risk of mortgage-backed securities and collateralized debt obligations (CDOs) was grossly underestimated.
The risk was assumed to be low, meaning the financial companies believed that homeowners would make their monthly payments.
When many homeowners defaulted in 2007 and 2008, the prices of these securities crashed. The banks lost so much money that they needed government bailouts to avoid closing down completely.

To understand this complicated situation, we’ll first learn the basics of probability, covering these topics:
* random variables
* independence
* Monte Carlo simulation
* expected values
* standard errors
* margins of error
* central limit theorem

Discrete Probability

Basic Principles of Categorical Data

Probability for categorical data is called discrete probability.
We will discuss the mathematical definition of probability to get precise answers to specific questions.

A more tangible way to think about the probability of an event is as a proportion of times the event occurs when we repeat the experiment over and over independently and under the same conditions.

Important notations

Pr(A) = the notation used to denote the probability of an event A happening.
event = an outcome (or set of outcomes) that can occur when something happens by chance.

For continuous variables like height and money, we can use events like “x >= 6”, but in this lesson, we’ll focus on categorical data and discrete probability.

Monte Carlo simulation

Computers provide a way to actually perform simple random experiments. Before computers, we needed a physical setup, such as colored beads in a vase, from which we picked at random.
Random number generators permit us to mimic the process of picking at random.
- In R, this is done with the sample() function.

rep() & sample()

Use the rep() function to generate a vase of beads: 2 red and 3 blue.

beads <- rep( c("red", "blue"), times = c(2,3))

Type beads to see its contents.

beads

## [1] "red"  "red"  "blue" "blue" "blue"

If you type sample(beads, 1), you will get one bead drawn at random.

sample( beads, 1)

## [1] "blue"

We want to repeat this over & over.
- Since we cannot do this forever, we repeat the experiment a large enough number of times that the results are practically equivalent to repeating it forever.
- This is a Monte Carlo simulation.

We won’t cover a rigorous definition of “practically equivalent” here. Instead, we’ll take a more practical approach to deciding how many repetitions are large enough.
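One practical way to judge whether the number of repetitions B is large enough is to run the simulation for several values of B and watch the estimate settle near the true value. The sketch below does this for Pr(blue), whose exact value is 3/5 = 0.6; the seed and the particular B values are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
beads <- rep(c("red", "blue"), times = c(2, 3))

# Estimate Pr(blue) with increasing numbers of repetitions
B_values <- c(10, 100, 1000, 10000, 100000)
estimates <- sapply(B_values, function(B) {
  draws <- replicate(B, sample(beads, 1))  # B independent single draws
  mean(draws == "blue")                    # proportion of blue draws
})

# Estimates typically move closer to 0.6 as B grows
data.frame(B = B_values, estimate = estimates)
```

The estimates need not improve at every step, but large values of B reliably land close to 0.6.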

replicate()

The first example of Monte Carlo Simulation will use the replicate() function.
We’ll reenact drawing from the vase with 2 red and 3 blue beads and estimate the probability of each color.

Repeat the random event 10,000 times.

Set B to 10,000, then use the replicate() function to draw one bead from the vase 10,000 times.

B <- 10000
events <- replicate(B, sample(beads, 1))

See if our definition agrees with the Monte Carlo simulation approximation.

Use table() to see the distribution.

tab <- table(events)
tab

The counts will be close to 6,000 blue and 4,000 red; the exact numbers vary from run to run.

Then use prop.table() to convert the counts to proportions.

prop.table(tab)

The Monte Carlo simulation gives proportions very close to the exact probabilities of 60% blue and 40% red. Because the draws are random, the estimates differ slightly from run to run and are rarely exactly 0.6 and 0.4.

With and Without Replacement

If you take a bead out of the vase and draw again without putting it back, you are sampling without replacement. If you put the bead back into the vase after each draw (keeping the same number of beads), you are sampling with replacement.
- To mimic repeated, independent draws from the same vase, we want to sample with replacement.

Without Replacement

sample(beads, 5)

## [1] "red"  "blue" "blue" "red"  "blue"

You would get an error if you tried sample(beads, 6): without replacement, you run out of beads after five draws.

With Replacement

Change the replace argument of the sample() function from its default, FALSE, to TRUE.

events <- sample(beads, B, replace = TRUE)
prop.table(table(events))

## events
##   blue    red 
## 0.5918 0.4082

The result is very similar to the one we got with the replicate() function.

Probability Distribution

Defining a distribution for categorical outcomes is straightforward:
1) Assign a probability to each category.
- For the beads in the vase, the proportion of each bead color defines the distribution.

Another example is from polling.
If you are randomly calling likely voters from a population that is 44% Democrat, 44% Republican, 10% undecided, and 2% Green, these proportions define the probability of each group.

In the next example, we’ll be using the four polling proportions.
Remember, categorical data makes it easy to define probability distributions.
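To see how such a distribution drives random draws, the sketch below simulates calls to likely voters using the prob argument of sample(), which weights each category by its probability. The group labels, seed, and number of calls are illustrative assumptions.

```r
set.seed(2024)  # arbitrary seed so the run is reproducible
parties <- c("Democrat", "Republican", "Undecided", "Green")
probs   <- c(0.44, 0.44, 0.10, 0.02)  # the polling proportions described above

# Simulate 10,000 random calls; with replacement, each call follows the same distribution
calls <- sample(parties, 10000, replace = TRUE, prob = probs)

# Observed proportions should be close to 0.44, 0.44, 0.10 and 0.02
props <- prop.table(table(calls))
props
```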

Independence

Two events are independent if the outcome of one does not affect the other.
A classic example of this is coin tossing.
- Every time we toss a fair coin, the probability of seeing heads is 1/2 regardless of what previous tosses have revealed: Pr(heads) = 0.5.
- In our beads-and-vase example, draws are independent when we sample with replacement. The probability of picking a red bead is 40% on every draw.
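Independence can be checked by simulation. The sketch below draws pairs of beads with replacement and, among the pairs whose first bead was red, computes the fraction whose second bead was also red. If the draws are independent, this conditional estimate should be close to the unconditional Pr(red) = 0.4. The seed and B are arbitrary.

```r
set.seed(10)  # arbitrary seed
beads <- rep(c("red", "blue"), times = c(2, 3))
B <- 10000

# Each column is one experiment: two draws WITH replacement
pairs <- replicate(B, sample(beads, 2, replace = TRUE))
first  <- pairs[1, ]
second <- pairs[2, ]

# Estimated Pr(second is red | first is red); should be close to Pr(red) = 0.4
p_red_given_red <- mean(second[first == "red"] == "red")
p_red_given_red
```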

Not Independent events

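Events are not independent when the outcome of one affects the probability of the other, as happens when we sample without replacement.
If you take out a blue bead and don’t put it back, the probability of choosing a blue bead on the next draw changes.
To see this, use sample() to draw all five beads without replacement and store them in x. Looking only at the last four draws, x[2:5], we can tell the color of the first bead without seeing it: the vase holds 2 red and 3 blue beads, so x[1] must be whichever color is one short in x[2:5]. In the output below, x[2:5] contains two reds and two blues, so the first bead must be blue.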

x <- sample(beads, 5)
x[2:5]

## [1] "red"  "red"  "blue" "blue"

When events are not independent, conditional probabilities are useful and necessary to make correct calculations.
Example: the probability that the second card is a King, given that the first card (drawn without replacement) was a King:
Pr(Card 2 is a King | Card 1 is a King) = 3/51

The vertical bar | means “given that” or “conditional on.”
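The same kind of simulation, run without replacement, shows a conditional probability that differs from the unconditional one. With 2 red and 3 blue beads, once a blue bead is removed, 2 of the remaining 4 beads are blue, so Pr(second is blue | first is blue) = 2/4 = 0.5 rather than 0.6. The sketch below estimates both quantities by Monte Carlo; the seed and B are arbitrary.

```r
set.seed(3)  # arbitrary seed
beads <- rep(c("red", "blue"), times = c(2, 3))
B <- 10000

# Each column is one experiment: two draws WITHOUT replacement (sample's default)
pairs <- replicate(B, sample(beads, 2))
first  <- pairs[1, ]
second <- pairs[2, ]

# Estimated Pr(second is blue | first is blue); exact value is 2/4 = 0.5
p_blue_given_blue <- mean(second[first == "blue"] == "blue")

# Unconditional Pr(second is blue); exact value is still 3/5 = 0.6
p_second_blue <- mean(second == "blue")

c(conditional = p_blue_given_blue, unconditional = p_second_blue)
```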

If two events A and B are independent, then:
Pr(A | B) = Pr(A)
The probability of A given B is equal to the probability of A. It doesn’t matter what B is, the probability of A is unchanged.

Multiplicative Rule

If we want to know the probability of A and B occurring, we use the multiplication rule.
Pr(A and B) = Pr(A) * Pr(B|A)
- The probability of A and B is equal to the probability of A multiplied by the probability of B given that A already happened.

Example: In Blackjack, the goal is to get cards that add up close to 21. The cards are dealt without replacement. Consider getting an Ace followed by a ten-valued card.
- The probability of getting an Ace on the first round is 1/13.
- The probability of getting a ten-valued card (a Ten, Jack, Queen, or King; 16 cards in all) after getting an Ace is 16/51, since one card has already been removed.
- The probability of both events happening is (1/13) * (16/51), which is approximately 2%.
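The multiplication-rule arithmetic above can be checked directly in R. Here 16 counts the ten-valued cards (Tens, Jacks, Queens, and Kings), matching the definition of face cards used later in these notes.

```r
p_ace <- 4 / 52              # Pr(first card is an Ace) = 1/13
p_ten_given_ace <- 16 / 51   # Pr(second card is ten-valued | first card is an Ace)
p_both <- p_ace * p_ten_given_ace
p_both                       # roughly 0.024, i.e. about 2%
```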

The multiplicative rule also applies to more than two events.
Pr(A and B and C) = Pr(A) * Pr(B|A) * Pr(C|A and B)
So the probability of A and B and C equals the probability of A, times the probability of B given that A happened, times the probability of C given that A and B happened.
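As an illustration of the three-event rule (an example chosen here, not taken from the lecture), the probability of drawing three Kings in a row without replacement is (4/52) * (3/51) * (2/50), where each factor conditions on the Kings already removed:

```r
# Pr(K1) * Pr(K2 | K1) * Pr(K3 | K1 and K2), drawing without replacement
p_three_kings <- (4 / 52) * (3 / 51) * (2 / 50)
p_three_kings
```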

If the events are independent, the calculation is very simple: just multiply their individual probabilities. But if the events are not independent, multiplying their unconditional probabilities can give very incorrect answers.
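The sketch below makes this warning concrete for two Kings drawn without replacement. Wrongly treating the draws as independent uses 4/52 for the second card, while the correct calculation uses the conditional probability 3/51.

```r
p_first_king <- 4 / 52

# Incorrect: treats the second draw as if the first King had been put back
p_assume_independent <- p_first_king * (4 / 52)

# Correct: the second draw is conditional on one King already being removed
p_correct <- p_first_king * (3 / 51)

c(assumed_independent = p_assume_independent, correct = p_correct)
```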

Assessment: Introduction to Discrete Probability

1) Probability of cyan

One ball will be drawn at random from a box containing: 3 cyan balls, 5 magenta balls, and 7 yellow balls.
What is the probability that the ball will be cyan?

cyan <- 3
magenta <- 5
yellow <- 7

p1 <- cyan / (cyan + magenta + yellow)
p1

## [1] 0.2

2) Probability of not cyan

One ball will be drawn at random from a box containing: 3 cyan balls, 5 magenta balls, and 7 yellow balls.
What is the probability that the ball will not be cyan?

p2 <- 1 - p1
p2

## [1] 0.8

3) Sampling without replacement

Instead of taking just one draw, consider taking two draws. You take the second draw without returning the first draw to the box. We call this sampling without replacement.
What is the probability that the first draw is cyan and that the second draw is not cyan?

cyan <- 3
magenta <- 5
yellow <- 7

# The variable `p1` is the probability of choosing a cyan ball from the box on the first draw.
p1 <- cyan / (cyan + magenta + yellow)

# Assign a variable `p2` as the probability of not choosing a cyan ball on the second draw without replacement.
p2 <- 1 - (cyan - 1) / (cyan + magenta + yellow - 1)

# Calculate the probability that the first draw is cyan and the second draw is not cyan.
p1 * p2

## [1] 0.1714286

4) Sampling with replacement

Now repeat the experiment, but this time, after taking the first draw and recording the color, return it back to the box and shake the box. We call this sampling with replacement.
What is the probability that the first draw is cyan and that the second draw is not cyan?

cyan <- 3
magenta <- 5
yellow <- 7

# The variable `p1` is the probability of choosing a cyan ball from the box on the first draw.
p1 <- cyan / (cyan + magenta + yellow)

# Assign a variable `p2` as the probability of not choosing a cyan ball on the second draw with replacement.
# Because the first ball was returned, the box again holds all 15 balls.
p2 <- 1 - cyan / (cyan + magenta + yellow)

# Calculate the probability that the first draw is cyan and the second draw is not cyan.
p1 * p2

## [1] 0.16

1.2 Combinations and Permutations

Combinations and Permutations

Probability computations are not always straightforward.
For example, what is the probability that 5 cards drawn without replacement are all of the same suit, a flush in poker?
Discrete probability teaches us how to make these computations using mathematics.
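For example, the flush question can be answered by counting. Assuming a standard 52-card deck and counting any five cards of one suit (so straight flushes are included), there are 4 suits times choose(13, 5) ways to pick five cards from that suit, out of choose(52, 5) possible hands. Base R's choose() computes these binomial coefficients:

```r
# Hands with all five cards from the same suit (includes straight flushes)
flush_hands <- 4 * choose(13, 5)

# All possible 5-card hands, where order does not matter
all_hands <- choose(52, 5)

flush_hands / all_hands   # about 0.002, or roughly 1 in 505
```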

We’re going to construct a deck of cards using R.
For this, we need the expand.grid() and paste() functions.

paste() is used to join strings together.

Example 1: Print “Three Hearts” using the number and suit variables.

number <- "Three"
suit <- "Hearts"

paste(number, suit)

## [1] "Three Hearts"

Example 2: It also works on pairs of vectors. It performs the operation element-wise.

paste(letters[1:5], as.character(1:5))

## [1] "a 1" "b 2" "c 3" "d 4" "e 5"

expand.grid() gives all combinations of the elements of the vectors it is given.

Example 1: Find all the combinations of blue or black pants with a white, grey, or plaid shirt.

expand.grid(pants = c("blue", "black"), shirt = c("white", "grey", "plaid"))

##   pants shirt
## 1  blue white
## 2 black white
## 3  blue  grey
## 4 black  grey
## 5  blue plaid
## 6 black plaid

Generate a Deck of cards

suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten", "Jack", "Queen", "King") 
deck <- expand.grid( number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)

deck

##  [1] "Ace Diamonds"   "Deuce Diamonds" "Three Diamonds" "Four Diamonds" 
##  [5] "Five Diamonds"  "Six Diamonds"   "Seven Diamonds" "Eight Diamonds"
##  [9] "Nine Diamonds"  "Ten Diamonds"   "Jack Diamonds"  "Queen Diamonds"
## [13] "King Diamonds"  "Ace Clubs"      "Deuce Clubs"    "Three Clubs"   
## [17] "Four Clubs"     "Five Clubs"     "Six Clubs"      "Seven Clubs"   
## [21] "Eight Clubs"    "Nine Clubs"     "Ten Clubs"      "Jack Clubs"    
## [25] "Queen Clubs"    "King Clubs"     "Ace Hearts"     "Deuce Hearts"  
## [29] "Three Hearts"   "Four Hearts"    "Five Hearts"    "Six Hearts"    
## [33] "Seven Hearts"   "Eight Hearts"   "Nine Hearts"    "Ten Hearts"    
## [37] "Jack Hearts"    "Queen Hearts"   "King Hearts"    "Ace Spades"    
## [41] "Deuce Spades"   "Three Spades"   "Four Spades"    "Five Spades"   
## [45] "Six Spades"     "Seven Spades"   "Eight Spades"   "Nine Spades"   
## [49] "Ten Spades"     "Jack Spades"    "Queen Spades"   "King Spades"

Use the Deck of Cards we constructed for the next questions.
1) Check that the probability of a king in the first card is 1 in 13.
- Compute the proportion of possible outcomes that satisfy our condition.
Instructions
Create a vector that contains the four King cards.
Then use %in% with the mean() function to compute the proportion of the deck that is a King.

kings <- paste("King", suits)
mean(deck %in% kings)

## [1] 0.07692308

Answer: 0.0769... = 1/13

What is the conditional probability of the second card being a King?

After one King has been removed, 3 Kings remain among the 51 cards left, so the probability is \(3/51 = 1/17\).

permutations()

We can confirm the above answer using the permutations() and combinations() functions from the gtools package.

permutations(n, r): computes, for a list of size n, all the different ways we can select r items when order matters.
Example: all the ways we can choose 2 numbers from the list 1,2,3,4,5.

library(gtools)
permutations(5, 2)

##       [,1] [,2]
##  [1,]    1    2
##  [2,]    1    3
##  [3,]    1    4
##  [4,]    1    5
##  [5,]    2    1
##  [6,]    2    3
##  [7,]    2    4
##  [8,]    2    5
##  [9,]    3    1
## [10,]    3    2
## [11,]    3    4
## [12,]    3    5
## [13,]    4    1
## [14,]    4    2
## [15,]    4    3
## [16,]    4    5
## [17,]    5    1
## [18,]    5    2
## [19,]    5    3
## [20,]    5    4

Note: order matters. Also notice that there are no 1,1; 2,2; or 3,3. Once a number is picked, it cannot appear again.
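The number of rows follows from a counting argument: there are n choices for the first item, n - 1 for the second, and so on, giving n!/(n - r)! permutations. The small helper below is a hypothetical name introduced here for illustration (it is not part of gtools) and computes that count in base R:

```r
# Hypothetical helper: number of ordered selections of r items from n
# without repetition; assumes 1 <= r <= n
n_permutations <- function(n, r) prod(n:(n - r + 1))

n_permutations(5, 2)    # rows of permutations(5, 2)
n_permutations(52, 2)   # ordered two-card hands from a 52-card deck
n_permutations(10, 7)   # 7-digit sequences with no repeated digit
```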

We can optionally pass a vector v to permutations(); the items are then drawn from v instead of from 1 to n.
Example: Generate 5 random 7-digit phone numbers from all possible 7-digit sequences with no repeated digits.

all_phone_numbers <- permutations(10, 7, v = 0:9)
n <- nrow(all_phone_numbers)
index <- sample(n, 5)
all_phone_numbers[index, ]

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    2    9    5    6    0    8    3
## [2,]    9    1    7    5    6    3    8
## [3,]    4    6    1    8    7    9    5
## [4,]    7    9    4    8    1    5    6
## [5,]    0    9    1    6    2    3    8

Setting v = 0:9 makes the digits run from 0 to 9. The code generates all 604,800 such sequences (10 * 9 * 8 * 7 * 6 * 5 * 4), picks 5 rows at random, and prints them.

Now choose 2 cards when order matters. Use permutations() with n = 52 cards and r = 2, drawing the items from the deck vector we created earlier.

hands <- permutations(52, 2, v = deck)

This will be a matrix with 2 columns and 2,652 rows (52 * 51).

Next, identify the first and second columns using this code:

first_card <- hands[ ,1]
second_card <- hands[ ,2]

We can check out how many cases have a first card that is a King.

sum(first_card %in% kings)

## [1] 204

Now, find the conditional probability by calculating the fraction of these 204 cases that also have a King as the second card.

Add up all the cases that have a King as both the first and second card, and divide by the number of cases that have a King as the first card.

sum(first_card %in% kings & second_card %in% kings) /
  sum(first_card %in% kings)

## [1] 0.05882353

Answer: 0.0588... = \(3/51\)

The code below will give the same answer as above. We’re computing the proportions instead of the totals.

mean(first_card %in% kings & second_card %in% kings) /
  mean(first_card %in% kings)

## [1] 0.05882353

This is an R version of the multiplication rule, rearranged: the probability of B given A equals the proportion of cases with both A and B (the probability of A and B) divided by the proportion of cases with A (the probability of A).
\(\Pr(B \mid A) = \Pr(A \text{ and } B) / \Pr(A)\)
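Plugging in the card counts shows the formula reproduces the earlier answer: Pr(A and B) is the fraction of the 52 * 51 ordered hands with a King in both positions, and Pr(A) is 4/52.

```r
p_A_and_B <- (4 * 3) / (52 * 51)   # Pr(first is King and second is King)
p_A <- 4 / 52                      # Pr(first is King)
p_A_and_B / p_A                    # Pr(second is King | first is King) = 3/51
```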

When Orders Do NOT Matter (Combinations)

Sometimes order doesn’t matter. In Blackjack, for example, getting an Ace first and a ten-valued card second adds up to 21, just as the reverse order does.
For this, we need to use combinations() instead of permutations().

Compare Permutations & Combinations

Look at the difference between the permutations() function and the combinations() function.

permutations(3,2)

##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    1
## [4,]    2    3
## [5,]    3    1
## [6,]    3    2

combinations(3,2)

##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    3

Since order doesn’t matter for combinations(), notice that 2,1 doesn’t appear because 1,2 already appeared. Similarly, 3,1 and 3,2 don’t appear.
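Counting confirms the difference: each unordered pair corresponds to r! ordered arrangements, so the number of combinations is n!/((n - r)! r!), which base R computes with choose(). The check below compares it with the permutation count divided by r!:

```r
n <- 3; r <- 2
n_perm <- prod(n:(n - r + 1))     # 6 ordered pairs, as in permutations(3, 2)
n_comb <- choose(n, r)            # 3 unordered pairs, as in combinations(3, 2)
n_perm / factorial(r) == n_comb   # each pair appears in r! = 2 orders

choose(52, 2)                     # unordered two-card hands from a 52-card deck
```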

Compute the probability of a natural 21 in Blackjack

Define a vector for each: all aces, all face cards.
Generate all combinations for picking 2 cards out of 52, and then count it.

#vector for all aces
aces <- paste("Ace", suits)

#vector for all ten-valued cards (King, Queen, Jack, Ten), called face cards here
facecard <- c("King", "Queen", "Jack", "Ten")
facecard <- expand.grid( number=facecard, suit=suits)
facecard <- paste( facecard$number, facecard$suit)

#combination of picking 2 cards out of 52
hands <- combinations(52, 2, v=deck)

#Proportion of hands with an Ace in the first column and a face card in the second
mean(hands[,1] %in% aces & hands[,2] %in% facecard)

## [1] 0.04826546

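Note: the code above only counts hands with the Ace in the first column. It still catches every natural 21 here because combinations() sorts the deck alphabetically by default, so a card whose name starts with “Ace” always comes before a Jack, King, Queen, or Ten. In general, it is safer to check both orders.

Here is the code that considers BOTH possibilities (Ace first or face card first):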

mean((hands[,1] %in% aces & hands[,2] %in% facecard) |
  (hands[,2] %in% aces & hands[,1] %in% facecard))

## [1] 0.04826546
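The same value can be obtained by counting by hand, as a check on the code above: there are 4 Aces and 16 ten-valued cards, so 4 * 16 = 64 of the choose(52, 2) = 1,326 unordered two-card hands are a natural 21.

```r
n_natural <- 4 * 16        # one of 4 Aces paired with one of 16 ten-valued cards
n_hands <- choose(52, 2)   # all unordered two-card hands
n_natural / n_hands        # matches the proportion computed with combinations()
```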

Instead of Combinations(), use Monte Carlo

Instead of using combinations() to deduce the exact probability of a natural 21 in Blackjack, let’s use a Monte Carlo simulation to estimate it.
- In this case, we would draw two cards over and over, and keep track of how many 21’s we get.
- Use the function sample() to draw a card without replacement.

hand <- sample(deck, 2)
hand

## [1] "Eight Clubs"   "King Diamonds"

Here’s just one hand. We didn’t get a 21.
We just repeat this over & over to get a good approximation.

B <- 10000
results <- replicate(B, {
  hand <- sample(deck, 2)
  (hand[1] %in% aces & hand[2] %in% facecard) |
    (hand[2] %in% aces & hand[1] %in% facecard)
})
mean(results)

## [1] 0.0497

Note how close the Monte Carlo approximation is to the exact probability computed with combinations().

HarvardX Intro to Probability