Introduction

In this project, I use a Kaggle data set of Canada’s national 6/49 lottery to apply basic probability concepts to a real lottery. The data set covers 3,665 drawings, dating from 1982 to 2018.

The goal of this analysis is to understand how incredibly small the probability of winning the lottery is.

Getting Started

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, so once a number is drawn, it’s not put back in the set.

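I’ll start by writing two functions: one that computes the factorial of a number, and one that computes how many ways k items can be chosen from n when order doesn’t matter.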

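factorial <- function(n){
  # Note: this masks base::factorial().
  # seq_len(n) is empty when n = 0, so the loop is skipped and 0! returns 1
  # (1:n would iterate over c(1, 0) and wrongly return 0).
  product <- 1
  for (i in seq_len(n)) {
    product <- product * i
  }
  return(product)
}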

combinations <- function(n, k) {
  numerator <- factorial(n)
  denominator <- factorial(k) * factorial(n - k)
  return(numerator / denominator)
}
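
As a sanity check on the two functions, base R already provides `choose(n, k)` for the same quantity. The sketch below is an addition to the original analysis, not part of it: it uses `choose()` and `base::factorial()` to confirm how many distinct six-number tickets exist (13,983,816), which is the denominator used throughout the rest of the project.

```r
# Cross-check with base R's built-in binomial coefficient
total_tickets <- choose(49, 6)
print(total_tickets)

# Same quantity from the formula n! / (k! * (n - k)!),
# calling base::factorial explicitly in case factorial() has been redefined
by_formula <- base::factorial(49) / (base::factorial(6) * base::factorial(43))
print(all.equal(total_tickets, by_formula))
```

For everyday use `choose()` is probably the safer option, since it avoids forming very large factorials as intermediate values.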

Next, I will build on the functions above to write another function that calculates the probability of winning the big prize. A player wins the big prize only if the six numbers on their ticket match all six numbers drawn. For example, a player holding the ticket {13, 22, 24, 27, 42, 44} wins the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they don’t win.

one_ticket_probability <- function(nums) {
  # nums is not used in the calculation: every six-number ticket is equally likely to win
  total_combinations <- combinations(49, 6) # Total number of possible outcomes
  prob <- (1 / total_combinations) * 100    # Probability (in %) that one given ticket wins
  pretty_prob <- sprintf("%1.9f", prob)     # Fixed-point string with 9 decimal places instead of scientific notation
  
  s <- paste("You have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
  return(s)
}
one_ticket_probability(c(1, 2, 3, 4, 5, 6))
## [1] "You have a 0.000007151% chance of winning the big prize."

As seen above, the probability of winning the big prize with a single ticket is tiny: about 0.000007%, or 1 chance in 13,983,816.
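
Because the numbers are drawn without replacement, the same probability can also be reached by reasoning draw by draw: the first number drawn must be one of the ticket's 6 numbers out of 49, the second one of the remaining 5 out of 48, and so on. The following sketch is an added cross-check, not part of the original analysis, and the variable names are only illustrative.

```r
# Chance that each successive drawn number is on the ticket,
# given that drawn numbers are not put back: 6/49, 5/48, 4/47, 3/46, 2/45, 1/44
step_probs <- (6:1) / (49:44)
p_sequential <- prod(step_probs)

# Counting argument: one winning combination out of all possible ones
p_counting <- 1 / choose(49, 6)

# Both routes should agree up to floating-point error
print(all.equal(p_sequential, p_counting))
```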

Comparing to Historical Data

Now, let us compare a ticket against the past winning combinations in the historical Canadian lottery data to see whether it would ever have won. First, some data familiarization.

library(tidyverse)
lottery649 <- read_csv("649.csv")
print(dim(lottery649))
## [1] 3665   11
head(lottery649, 3)
## # A tibble: 3 x 11
##   PRODUCT `DRAW NUMBER` `SEQUENCE NUMBER` `DRAW DATE` `NUMBER DRAWN 1`
##     <dbl>         <dbl>             <dbl> <chr>                  <dbl>
## 1     649             1                 0 6/12/1982                  3
## 2     649             2                 0 6/19/1982                  8
## 3     649             3                 0 6/26/1982                  1
## # ... with 6 more variables: NUMBER DRAWN 2 <dbl>, NUMBER DRAWN 3 <dbl>,
## #   NUMBER DRAWN 4 <dbl>, NUMBER DRAWN 5 <dbl>, NUMBER DRAWN 6 <dbl>,
## #   BONUS NUMBER <dbl>
tail(lottery649, 3)
## # A tibble: 3 x 11
##   PRODUCT `DRAW NUMBER` `SEQUENCE NUMBER` `DRAW DATE` `NUMBER DRAWN 1`
##     <dbl>         <dbl>             <dbl> <chr>                  <dbl>
## 1     649          3589                 0 6/13/2018                  6
## 2     649          3590                 0 6/16/2018                  2
## 3     649          3591                 0 6/20/2018                 14
## # ... with 6 more variables: NUMBER DRAWN 2 <dbl>, NUMBER DRAWN 3 <dbl>,
## #   NUMBER DRAWN 4 <dbl>, NUMBER DRAWN 5 <dbl>, NUMBER DRAWN 6 <dbl>,
## #   BONUS NUMBER <dbl>

Now, I will build a list of all the historical draws, and then write a function that counts how many times a given combination appears in it.

# Outputs a list of vectors, each containing the 6 numbers drawn
# (arguments are named with `=`; using `<-` inside list() would also create stray global variables)
historical_lots <- pmap(
  list(
    u = lottery649$`NUMBER DRAWN 1`,
    v = lottery649$`NUMBER DRAWN 2`,
    w = lottery649$`NUMBER DRAWN 3`,
    x = lottery649$`NUMBER DRAWN 4`,
    y = lottery649$`NUMBER DRAWN 5`,
    z = lottery649$`NUMBER DRAWN 6`
  ),
  .f = function(u, v, w, x, y, z) { c(u, v, w, x, y, z) }
)
check_historical_occurrences <- function(lot, hist_lots = historical_lots) {
  # setequal() ignores order, so a ticket matches however its numbers are arranged
  historical_matches <- map_lgl(hist_lots, function(x) {setequal(x, lot)})
  num_past_matches <- sum(historical_matches)
  s <- paste("The combination you entered has appeared ", 
             num_past_matches, 
             " times in the past.", sep = "")
  return(s)
}
check_historical_occurrences(c(3, 12, 11, 14, 41, 43))
## [1] "The combination you entered has appeared 1 times in the past."
check_historical_occurrences(c(1, 2, 3, 4, 5, 6))
## [1] "The combination you entered has appeared 0 times in the past."
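
The key idea in the check above is that a ticket and a draw are compared as sets, so the order of the numbers doesn't matter. To make that behaviour easy to see without loading the CSV, here is a small base R sketch. The `toy_draws` matrix is invented purely for illustration and is not taken from 649.csv, and `count_matches` is a hypothetical helper rather than part of the analysis.

```r
# Toy data invented for illustration; NOT real rows from 649.csv
toy_draws <- rbind(
  c(3, 11, 12, 14, 41, 43),
  c(8, 33, 36, 37, 39, 41),
  c(1, 6, 23, 24, 27, 39)
)

# Hypothetical helper: count the rows whose six numbers equal the ticket, ignoring order
count_matches <- function(ticket, draws) {
  sum(apply(draws, 1, function(row) setequal(row, ticket)))
}

# Same numbers as the first toy row, in reverse order
print(count_matches(c(43, 41, 14, 12, 11, 3), toy_draws))
# A combination that is absent from the toy data
print(count_matches(c(1, 2, 3, 4, 5, 6), toy_draws))
```

The first call counts one match even though the ticket is written in reverse order, and the second counts none.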

Players who buy lottery tickets compulsively often buy several at once, assuming that this greatly increases their chances of winning. Let us see how the odds change with the number of tickets bought, assuming every ticket carries a different combination.

multi_ticket_probability <- function(n) {
  total_combinations <- combinations(49, 6)
  prob <- (n / total_combinations) * 100
  pretty_prob <- sprintf("%1.9f", prob)
  s <- paste("you have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
  return(s)
}
test_amounts <- c(2, 10, 100, 10000, 1000000, 6991908, 13983816)
for (n in test_amounts) {
  # format() keeps large counts such as 1000000 out of scientific notation
  print(paste("For ", format(n, scientific = FALSE), " tickets, ",  multi_ticket_probability(n), sep = ""))
}
## [1] "For 2 tickets, you have a 0.000014302% chance of winning the big prize."
## [1] "For 10 tickets, you have a 0.000071511% chance of winning the big prize."
## [1] "For 100 tickets, you have a 0.000715112% chance of winning the big prize."
## [1] "For 10000 tickets, you have a 0.071511238% chance of winning the big prize."
## [1] "For 1000000 tickets, you have a 7.151123842% chance of winning the big prize."
## [1] "For 6991908 tickets, you have a 50.000000000% chance of winning the big prize."
## [1] "For 13983816 tickets, you have a 100.000000000% chance of winning the big prize."

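As we can see, buying more tickets raises the probability only in direct proportion to the number bought. Even 10,000 tickets give a chance of only about 0.07%; it takes 1,000,000 tickets to reach about 7%, and 6,991,908 tickets (half of all possible combinations) to reach 50%.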
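
The question can also be turned around: how many distinct tickets would it take to reach a given chance of winning the big prize? The sketch below is an illustrative addition; `tickets_needed` is a hypothetical helper, and it assumes that every ticket bought carries a different combination.

```r
# Hypothetical helper: smallest number of distinct tickets whose combined
# chance of winning the big prize is at least target_prob (a value from 0 to 1)
tickets_needed <- function(target_prob, total = choose(49, 6)) {
  ceiling(target_prob * total)
}

targets <- c(0.01, 0.10, 0.50)
for (p in targets) {
  print(paste(p * 100, "% needs ",
              format(tickets_needed(p), scientific = FALSE),
              " tickets", sep = ""))
}
```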

A player may also want to know the probability of matching exactly 3, 4, or 5 of the six winning numbers with a single ticket.

probability_less_6 <- function(n) {
  # Ways to choose which n of the 6 ticket numbers are among the numbers drawn
  n_combinations_ticket <- combinations(6, n)
  # Ways to fill the other 6 - n drawn numbers from the 43 numbers NOT on the ticket
  n_combinations_remaining <- combinations(43, 6 - n)
  successful_outcomes <- n_combinations_ticket * n_combinations_remaining
  n_combinations_total <- combinations(49, 6)

  prob <- (successful_outcomes / n_combinations_total) * 100
  pretty_prob <- sprintf("%1.9f", prob)

  s <- paste("you have a ", pretty_prob, "% chance of winning the prize.", sep = "")
  return(s)
}
winning_nums <- c(3, 4, 5)
for (n in winning_nums) {
  print(paste("For ", n, " numbers, ",  probability_less_6(n), sep = ""))
}
## [1] "For 3 numbers, you have a 1.765040387% chance of winning the prize."
## [1] "For 4 numbers, you have a 0.096861972% chance of winning the prize."
## [1] "For 5 numbers, you have a 0.001844990% chance of winning the prize."
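
Matching exactly k of the six drawn numbers is a textbook case of the hypergeometric distribution, which base R exposes as `dhyper()`. The sketch below is an added, independent way to compute these probabilities rather than part of the original analysis. It treats the 6 numbers on the ticket as "successes" and the other 43 numbers as "failures".

```r
# dhyper(x, m, n, k): probability of x successes when k items are drawn without
# replacement from a pool containing m successes and n failures.
# Here: m = 6 ticket numbers, n = 43 other numbers, k = 6 numbers drawn.
matches <- 0:6
probs <- dhyper(matches, m = 6, n = 43, k = 6)
names(probs) <- matches

# Probability (in %) of matching exactly 0, 1, ..., 6 numbers
print(round(probs * 100, 6))

# The seven cases cover every possible outcome, so they should sum to 1
print(all.equal(sum(probs), 1))
```

Summing entries of `probs` also gives "at least k matches" probabilities, for example `sum(probs[c("3", "4", "5", "6")])` for three or more.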

Conclusion

We set out to understand and quantify the chances of winning the big prize, as well as the smaller prizes for matching fewer numbers, by writing R functions that compute factorials and combinations. We now know that unless one buys a ridiculous number of tickets, the probability of winning the lottery is very low. That shouldn’t stop us from trying, but it should be enough to know where to draw the line.