STAT3000 100224

Probability

The part of probability that deals with categorical outcomes is referred to as discrete probability. It will help us understand the probability theory we will later introduce for numeric and continuous data, which is much more common in data science applications.

Bead case

Suppose I have 2 red beads and 3 blue beads inside a box and I pick one at random. What is the probability of picking a red one? Your intuition tells you \(2/5 = 40\%\). A precise definition can be given by noting that there are five possible outcomes, of which two satisfy the condition necessary for the event “pick a red bead”. Since each of the five outcomes has the same chance of occurring, we conclude that the probability is 0.4 for red and 0.6 for blue.
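The counting argument above can be checked directly in R. This is a minimal sketch: the vector name `beads` matches the one used in the code further down, while `p_red` and `p_blue` are names introduced here for illustration.

```r
# Represent the box as a vector with one entry per bead
beads <- rep(c("red", "blue"), times = c(2, 3))

# Every bead is equally likely, so the probability of an event is the
# proportion of outcomes that satisfy it
p_red <- mean(beads == "red")
p_blue <- mean(beads == "blue")
c(red = p_red, blue = p_blue)
```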

A more tangible way to think about the probability of an event is as the proportion of times the event occurs when we repeat the experiment an infinite number of times, independently, and under the same conditions.

We can simulate this experiment in R. Let’s first perform a single draw.

beads <- rep(c("red", "blue"), times = c(2,3))
beads

## [1] "red"  "red"  "blue" "blue" "blue"

sample(beads, 1)

## [1] "blue"

This line of code produces one random outcome. We want to repeat this experiment an infinite number of times, but it is impossible to repeat forever. Instead, we repeat the experiment a large enough number of times to make the results practically equivalent to repeating forever.

events <- replicate(100000, sample(beads, 1))
prop.table(table(events))

## events
##    blue     red 
## 0.60121 0.39879
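To see how the estimate settles as the number of repetitions grows, the simulation can be rerun for several sample sizes. The sketch below is illustrative only: the seed and the sizes in `B_values` are arbitrary choices, and it uses `sample` with `replace = TRUE`, which draws one bead at a time from the full box just as repeating `sample(beads, 1)` does.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
beads <- rep(c("red", "blue"), times = c(2, 3))

# Estimate P(red) with increasing numbers of repetitions
B_values <- c(10, 100, 1000, 10000, 100000)
estimates <- sapply(B_values, function(B) {
  draws <- sample(beads, B, replace = TRUE)  # B independent draws
  mean(draws == "red")
})

# Compare each estimate with the exact probability 0.4
data.frame(B = B_values, estimate = estimates,
           abs_error = abs(estimates - 0.4))
```

The absolute error should generally shrink as B grows, although it need not decrease at every step.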

Independence

sample(beads, 5, replace = FALSE)

## [1] "blue" "red"  "blue" "blue" "red"

sample(beads, 5, replace = TRUE)

## [1] "red"  "blue" "blue" "blue" "blue"
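The two calls above differ in whether the draws are independent. Without replacement, the first draw changes what is left in the box, so the chance of red on the second draw depends on the first; with replacement, every draw is made from the same five beads. The sketch below estimates the probability that the second bead is red given that the first was red under both schemes. The seed, `B`, and the variable names are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility
beads <- rep(c("red", "blue"), times = c(2, 3))
B <- 100000

# Without replacement: each column holds two dependent draws
draws_no <- replicate(B, sample(beads, 2, replace = FALSE))
first_red_no <- draws_no[1, ] == "red"
p_without <- mean(draws_no[2, first_red_no] == "red")  # exact value is 1/4

# With replacement: each column holds two independent draws
draws_yes <- replicate(B, sample(beads, 2, replace = TRUE))
first_red_yes <- draws_yes[1, ] == "red"
p_with <- mean(draws_yes[2, first_red_yes] == "red")   # exact value is 2/5

c(without_replacement = p_without, with_replacement = p_with)
```

With replacement, the conditional probability stays at the unconditional value 2/5, which is what independence means; without replacement it drops to 1/4 because only one red bead remains among four.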

Combinations and permutations

First, let’s construct a deck of cards. For this, we will use the expand.grid and paste functions. We use paste to create strings by joining smaller strings. To do this, we take the number and suit of a card and create the card name like this:

number <- "Three"
suit <- "Hearts"
paste(number, suit)

## [1] "Three Hearts"

paste also works on pairs of vectors, performing the operation element-wise:

paste(letters[1:5], as.character(1:5))

## [1] "a 1" "b 2" "c 3" "d 4" "e 5"
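When one argument is shorter than the other, paste recycles it, which is how a single rank can be combined with every suit. A small sketch follows; the `suits` vector anticipates the one defined later in these notes, and `kings_of` is a name used only here.

```r
suits <- c("Diamonds", "Clubs", "Hearts", "Spades")

# The length-one string "King" is recycled across all four suits
kings <- paste("King", suits)
kings

# The sep argument controls what is placed between the pieces
kings_of <- paste("King", suits, sep = " of ")
kings_of
```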

The function expand.grid gives us all the combinations of entries of two vectors.

expand.grid(pants = c("blue", "black"), shirt = c("white", "grey", "red"))

##   pants shirt
## 1  blue white
## 2 black white
## 3  blue  grey
## 4 black  grey
## 5  blue   red
## 6 black   red

Here is how we generate a deck of cards:

suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven", 
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number=numbers, suit=suits)
deck <- paste(deck$number, deck$suit)
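A quick sanity check on the constructed deck can catch mistakes such as a misspelled or duplicated rank. This is a sketch under the same definitions as above; `suit_of_card` is a helper name introduced here.

```r
suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)

length(deck)         # number of cards in the deck
anyDuplicated(deck)  # 0 means every card name is unique

# Strip everything up to the last space to recover the suit of each card
suit_of_card <- sub(".* ", "", deck)
table(suit_of_card)  # number of cards per suit
```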

The resulting deck has 52 cards. We can double-check that the probability of drawing a King as the first card is 1/13 by computing the proportion of possible outcomes that satisfy our condition:

kings <- paste("King", suits)
mean(deck %in% kings)

## [1] 0.07692308

The permutations function from the gtools package computes, for any list of size n, all the different ways we can select r items when the order matters.

library(gtools)
permutations(3, 2)

##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    1
## [4,]    2    3
## [5,]    3    1
## [6,]    3    2

Notice that the order matters here: 3,1 is different from 1,3. Also, note that (1,1), (2,2), and (3,3) do not appear because once we pick a number, it can’t appear again.
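The number of rows in such a listing follows from a counting rule: there are n choices for the first item, n - 1 for the second, and so on, giving n × (n - 1) × ... × (n - r + 1) ordered selections. The helper `n_permutations` below is a hypothetical name written for this sketch, not a function from any package.

```r
# Number of ordered selections of r items from n without repetition:
# the product of the r largest integers up to n
n_permutations <- function(n, r) prod(seq_len(r) + n - r)

n_permutations(3, 2)   # 6, matching the six rows listed above
n_permutations(52, 2)  # ordered two-card hands from a 52-card deck
```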

To compute all possible ways we can choose two cards when the order matters, we type:

hands <- permutations(52, 2, v = deck)

first_card <- hands[,1]
second_card <- hands[,2]

Now the number of cases in which the first card was a King can be computed like this:

kings <- paste("King", suits)
sum(first_card %in% kings)

## [1] 204

To get the conditional probability, we compute what fraction of these have a King in the second card:

sum(first_card %in% kings & second_card %in% kings) / sum(first_card %in% kings)

## [1] 0.05882353

which is exactly 3/51: once a King has been drawn as the first card, 3 of the 51 remaining cards are Kings.
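The same numbers can be reproduced without gtools by enumerating ordered pairs of card positions with expand.grid. This is a sketch using base R only, offered as a cross-check rather than as the method used above; `idx`, `first_king`, and `second_king` are names introduced here.

```r
suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)
kings <- paste("King", suits)

# All ordered pairs of positions, dropping pairs that reuse the same card
idx <- expand.grid(first = 1:52, second = 1:52)
idx <- idx[idx$first != idx$second, ]
nrow(idx)  # number of ordered two-card hands

first_king <- deck[idx$first] %in% kings
second_king <- deck[idx$second] %in% kings

sum(first_king)                # hands whose first card is a King
mean(second_king[first_king])  # P(second is King | first is King)
```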

How about if the order doesn’t matter? For example, in Blackjack if you get an Ace and a face card in the first draw, it is called a Natural 21 and you win automatically. If we wanted to compute the probability of this happening, we would enumerate the combinations, not the permutations, since the order does not matter.

combinations(3,2)

##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    3

The output does not include (2,1) because (1,2) was already enumerated. The same applies to (3,1) and (3,2).
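The number of combinations is given by the binomial coefficient, available in base R as choose. Each unordered selection of r items corresponds to r! orderings, which links the two counts. A brief sketch, with variable names chosen for illustration:

```r
n_small <- choose(3, 2)   # 3, matching the three rows listed above
n_hands <- choose(52, 2)  # unordered two-card hands from a 52-card deck

# Each unordered pair can be ordered in factorial(2) = 2 ways,
# so multiplying recovers the number of ordered two-card hands
n_ordered <- n_hands * factorial(2)

c(small = n_small, hands = n_hands, ordered = n_ordered)
```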

So to compute the probability of a Natural 21 in Blackjack, we can do this:

aces <- paste("Ace", suits)

facecard <- c("King", "Queen", "Jack", "Ten")
facecard <- expand.grid(number = facecard, suit = suits)
facecard <- paste(facecard$number, facecard$suit)

hands <- combinations(52, 2, v = deck)
mean(hands[,1] %in% aces & hands[,2] %in% facecard)

## [1] 0.04826546
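The calculation above only looks for an Ace in the first column of `hands`, so it relies on the order in which the pairs are enumerated. A more defensive sketch checks both positions. It uses `combn` from base R in place of `combinations` from gtools (a substitution made here so the example runs without extra packages), and `hands_base` and `p_natural` are names introduced for illustration.

```r
suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)

aces <- paste("Ace", suits)
facecard <- expand.grid(number = c("King", "Queen", "Jack", "Ten"), suit = suits)
facecard <- paste(facecard$number, facecard$suit)

# combn returns one pair per column; transpose to get one hand per row
hands_base <- t(combn(deck, 2))

# Accept the Ace in either position
ace_first <- hands_base[, 1] %in% aces & hands_base[, 2] %in% facecard
ace_second <- hands_base[, 2] %in% aces & hands_base[, 1] %in% facecard
p_natural <- mean(ace_first | ace_second)
p_natural
```

There are 4 × 16 = 64 hands with one Ace and one ten-valued card out of choose(52, 2) = 1326, so the result should agree with the value above.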

Monte Carlo example

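Instead of enumerating all combinations, we can estimate the probability of a Natural 21 with a Monte Carlo simulation. In this case, we draw two cards over and over and keep track of how many 21s we get. We start by drawing a single hand: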

hand <- sample(deck, 2)
hand

## [1] "Four Spades" "Ace Clubs"

We then check whether one card is an Ace and the other a face card or a 10. Going forward, we include 10 when we say face card. Because the Ace can be either the first or the second card, we need to check both possibilities:

(hand[1] %in% aces & hand[2] %in% facecard) | 
  (hand[2] %in% aces & hand[1] %in% facecard)

## [1] FALSE

# Draw one two-card hand and report whether it is a Natural 21
blackjack <- function(){
  hand <- sample(deck, 2)
  (hand[1] %in% aces & hand[2] %in% facecard) | 
    (hand[2] %in% aces & hand[1] %in% facecard)
}
B <- 10000
results <- replicate(B, blackjack())
mean(results)

## [1] 0.0473
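Because the exact answer is known from the enumeration (64 favourable hands out of choose(52, 2) = 1326), the Monte Carlo estimate can be compared against it. The sketch below also reports the standard error of a proportion, sqrt(p(1 - p)/B), as a rough guide to how far an estimate based on B draws is expected to stray. The seed and the names `estimate`, `exact`, and `std_error` are assumptions made for this illustration.

```r
set.seed(2024)  # arbitrary seed for reproducibility

suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)
aces <- paste("Ace", suits)
facecard <- expand.grid(number = c("King", "Queen", "Jack", "Ten"), suit = suits)
facecard <- paste(facecard$number, facecard$suit)

# Draw one two-card hand and report whether it is a Natural 21
blackjack <- function() {
  hand <- sample(deck, 2)
  (hand[1] %in% aces & hand[2] %in% facecard) |
    (hand[2] %in% aces & hand[1] %in% facecard)
}

B <- 10000
results <- replicate(B, blackjack())

estimate <- mean(results)
exact <- 64 / choose(52, 2)
std_error <- sqrt(exact * (1 - exact) / B)

c(estimate = estimate, exact = exact, std_error = std_error)
```

An estimate within two or three standard errors of the exact value is what we would typically expect; increasing B shrinks the standard error in proportion to 1/sqrt(B).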