Many people start playing the lottery for fun, but for some the activity escalates into habit and addiction. Like other compulsive gamblers, lottery addicts may start using savings and loans to buy tickets. This project builds the logic for a mobile app that guides lottery addicts through exercises that help them estimate their chances of winning more accurately. The hope is that the app will help them realize that buying more tickets does little to improve those chances.

#Load libraries and the data
library(readr)
library(stringr)
library(ggplot2)
library(dplyr)
library(purrr)
library(tidyr)
library(magrittr)
library(ggmap)
library(mapproj)
library(viridis)
library(RColorBrewer)
library(grid)

setwd("C:/Users/Ana/Desktop/Data Analytics/CSV Files")

#import data
data <- read_csv("649.csv") 
Parsed with column specification:
cols(
  PRODUCT = col_double(),
  `DRAW NUMBER` = col_double(),
  `SEQUENCE NUMBER` = col_double(),
  `DRAW DATE` = col_character(),
  `NUMBER DRAWN 1` = col_double(),
  `NUMBER DRAWN 2` = col_double(),
  `NUMBER DRAWN 3` = col_double(),
  `NUMBER DRAWN 4` = col_double(),
  `NUMBER DRAWN 5` = col_double(),
  `NUMBER DRAWN 6` = col_double(),
  `BONUS NUMBER` = col_double()
)
#view beginning and end of dataset
head(data, 3)
tail(data, 3)

#return number of rows in dataset
print(nrow(data))
[1] 3665
#return number of columns in dataset
print(ncol(data))
[1] 11

#list the column names, then count how many unique entries there are in each column
colnames(data)
 [1] "PRODUCT"         "DRAW NUMBER"     "SEQUENCE NUMBER" "DRAW DATE"       "NUMBER DRAWN 1"  "NUMBER DRAWN 2" 
 [7] "NUMBER DRAWN 3"  "NUMBER DRAWN 4"  "NUMBER DRAWN 5"  "NUMBER DRAWN 6"  "BONUS NUMBER"   
for(i in colnames(data)){
  print(as.integer(count(unique(data[,i]))))
}
[1] 1
[1] 3591
[1] 4
[1] 3591
[1] 34
[1] 42
[1] 43
[1] 43
[1] 38
[1] 33
[1] 50
sort(unique(data$`NUMBER DRAWN 1`))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 38

Next, calculate the probability of winning the draw with a single ticket.

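#The order in which the numbers are drawn doesn't matter, so we are looking for the total number of combinations. This is given by C = n!/((n-k)!k!)
#Where:
#n = total number of balls i.e. 49, k = number of balls drawn i.e. 6.

#first, write a function which will calculate the factorial of any non-negative whole number.
#seq_len(n) is empty when n = 0, so factorial(0) correctly returns 1 (1:n would count down from 1 to 0 and return 0).
#note that this definition masks base R's built-in factorial().

factorial <- function(n){
  answer<-1
  for(i in seq_len(n)) {
    answer<-answer*i
  }
  return(answer)
}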

#now write a function to calculate the number of permutations when only a certain number of balls are drawn.

permutation <- function(n, k) {
  factorial(n)/factorial(n-k)
}

#now write a function to calculate the number of combinations when only a certain number of balls are drawn.

combinations <- function(n, k) {
  permutation(n, k)/factorial(k)
}

#the probability of winning the draw with a single ticket is therefore 1 divided by the number of combinations.

combinations(49,6)
[1] 13983816
p_win_1t <- 1/combinations(49,6)
p_win_1t
[1] 7.151124e-08

The probability of winning the lottery with one ticket is about 1 in 14 million, i.e. 7e-8, 0.00000007 or 0.000007%.
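
As a quick cross-check, base R's built-in `choose()` gives the same count without hand-written factorials. This is only a sketch; the names `n_combs_check` and `p_check` are illustrative and not part of the app code.

```r
# number of ways to choose 6 numbers from 49 when order doesn't matter
n_combs_check <- choose(49, 6)

# probability that a single ticket matches all six numbers
p_check <- 1 / n_combs_check

print(n_combs_check)
print(p_check)
```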

The app should let the user enter their 6 chosen numbers and then display the chance of winning with those numbers. Although I've already calculated the chance, I'll write a function that takes 6 numbers as input and returns the chance of winning with them in a user-friendly way.

#write a function that takes an input of 6 numbers and returns the probability of winning with those 6 numbers
#the actual numbers selected don't make any difference, so k equals the length of the vector i.e. 6.
#the number of combinations i.e. n_combs is calculated by passing k through the combinations function. n is set to 49 as this isn't changing.
#to make it easy for the user to understand the chance, the function prints a sentence which presents the likelihood as 1 in X

one_ticket_probability <- function(a, b, c, d, e, f) {
  numbers <- c(a, b, c, d, e, f) #named "numbers" rather than "vector", which would mask a base R function
  print(numbers)
  k <- length(numbers)
  n_combs <- combinations(49, k)
  output <- sprintf("The chance of winning the lottery with these numbers is 1 in %i", n_combs)
  print(output)
}

#here's some example user input for the numbers
one_ticket_probability(2, 19, 36, 12, 11, 1)
[1]  2 19 36 12 11  1
[1] "The chance of winning the lottery with these numbers is 1 in 13983816"
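
Large counts are easier to read with thousands separators. A minimal sketch of one way to do that with base R's `format()`; the variable names `pretty_odds` and `message_text` are illustrative only.

```r
# total number of possible tickets
n_combs <- choose(49, 6)

# format() with big.mark inserts a separator between groups of three digits
pretty_odds <- format(n_combs, big.mark = ",")

message_text <- paste("The chance of winning the lottery with these numbers is 1 in", pretty_odds)
print(message_text)
```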

Next we want to look at the probability of winning the lottery if we play more than one ticket. For instance, what is the probability of winning if we play 40 tickets?

#this function takes the usual inputs i.e. n = 49 and k = 6, and adds another input for the number of tickets bought, t. We are looking at whether the first ticket OR the second ticket OR the third ticket etc. will win. As long as every ticket carries a different combination, these events are mutually exclusive, so we can add the probabilities. Since every ticket has the same probability of winning, we can just multiply the probability of one ticket winning by the number of tickets bought.

multi_ticket_probability <- function(t, n, k){
  p <- t/combinations(n, k)
  cat("The probability of winning if you buy", t, "tickets is", (p*100), "%", "\n")
}

num_tickets <- c(1, 10, 40, 100, 10000, 1000000, 6991908, 13983816)

for(i in num_tickets){
  multi_ticket_probability(i, 49, 6)
}
The probability of winning if you buy 1 tickets is 7.151124e-06 % 
The probability of winning if you buy 10 tickets is 7.151124e-05 % 
The probability of winning if you buy 40 tickets is 0.000286045 % 
The probability of winning if you buy 100 tickets is 0.0007151124 % 
The probability of winning if you buy 10000 tickets is 0.07151124 % 
The probability of winning if you buy 1e+06 tickets is 7.151124 % 
The probability of winning if you buy 6991908 tickets is 50 % 
The probability of winning if you buy 13983816 tickets is 100 % 

The probability of winning with 40 tickets is about 1 in 350,000, i.e. 0.000003 or 0.0003%. That is still an incredibly small chance.

Even if you buy 10,000 tickets, the chance you have a winning ticket is only 0.07%!

You’d need to buy a million tickets to have just a 7% chance of winning!

If you purchase 13,983,816 different tickets, you have a 100% chance of winning, because those tickets cover every possible combination and the winning one must be among them. The problem is that, assuming £1 per ticket, you'd have to spend £13,983,816 on tickets, which is more than you're likely to win.
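
The same relationship can be turned around to ask how many different tickets are needed to reach a target chance of winning. A minimal sketch, assuming every ticket carries a different combination; `tickets_needed` is a hypothetical helper, not part of the app code.

```r
# smallest number of distinct tickets whose combined chance reaches target_p
tickets_needed <- function(target_p, n = 49, k = 6) {
  ceiling(target_p * choose(n, k))
}

# a few example targets: 1%, 10% and 50%
targets <- c(0.01, 0.10, 0.50)

for (p in targets) {
  cat("To reach a", p * 100, "% chance you need", tickets_needed(p), "different tickets\n")
}
```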

Next, what is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket? First, create a function which returns the probability of getting exactly 5 (or 4 or 3) winning numbers on a single ticket.

#the probability of picking all 6 correct numbers is shown below.

#(6/49)*(5/48)*(4/47)*(3/46)*(2/45)*(1/44)

#this can be written as P = k!/(n!/(n-k)!)
#P = 1/C where C = number of combinations
#so, C = n!/((n-k)!k!) as already found above

#but there is another input needed for this question which is the number of matching balls, m
#a ticket has exactly m matching numbers when m of its numbers are among the k winning numbers
#and the other k-m are among the n-k non-winning numbers. Counting those tickets gives

#P(exactly m) = C(k, m)*C(n-k, k-m)/C(n, k)

#for m = 1 this is 6*C(43, 5)/C(49, 6), not 6/49: 6/49 is only the chance that one particular number on the ticket matches, and it ignores what the other five numbers do.

#write a function to calculate the probability of getting exactly m matching balls
#choose() is base R's built-in function for the number of combinations

matching_balls <- function(n, k, m){
  choose(k, m)*choose(n-k, k-m)/choose(n, k)
}

matching_balls(49, 6, 1)
[1] 0.4130195

for (i in 1:6) {
  prob <- matching_balls(49, 6, i)
  cat("The probability of getting exactly", i, "matching numbers is", prob, "\n")
}
The probability of getting exactly 1 matching numbers is 0.4130195 
The probability of getting exactly 2 matching numbers is 0.132378 
The probability of getting exactly 3 matching numbers is 0.0176504 
The probability of getting exactly 4 matching numbers is 0.0009686197 
The probability of getting exactly 5 matching numbers is 1.84499e-05 
The probability of getting exactly 6 matching numbers is 7.151124e-08 

The probability of getting AT LEAST 5 matching numbers is the probability of getting 5 OR 6 matching numbers. These outcomes are mutually exclusive, so the individual probabilities can be added.

#write a function that calculates the probability of getting at least x matching balls. This uses the matching_balls function and then uses a for loop to iterate through mx:k to add up the probabilities.

#the function takes three inputs - n = total number of balls, k = number of balls drawn, mx = minimum number of matching numbers

at_least_x_matches <- function(n, k, mx){
  chance <- 0
  for(i in mx:k){
    chance<-chance+matching_balls(n, k, i)
  }
  cat("The probability of getting AT LEAST", mx, "matching numbers is", chance, "\n")
}

at_least_x_matches(49, 6, 1)
The probability of getting AT LEAST 1 matching numbers is 0.564035 
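
Counting matches on a ticket is a hypergeometric problem: the 6 winning numbers are the "successes" among 49 balls, and a ticket is a draw of 6 without replacement. Base R's `dhyper()` and `phyper()` give these probabilities directly, so they can serve as an independent check on a hand-written function. This is a sketch only; the names `p_exact` and `p_at_least` are illustrative.

```r
n <- 49  # balls in the machine
k <- 6   # numbers drawn, and numbers on a ticket

# dhyper(x, m, n, k): x matches, m winning numbers, n non-winning numbers, k numbers on the ticket
p_exact <- dhyper(0:k, k, n - k, k)
names(p_exact) <- 0:k
print(p_exact)

# P(at least mx matches) is the upper tail, i.e. P(X > mx - 1)
mx <- 5
p_at_least <- phyper(mx - 1, k, n - k, k, lower.tail = FALSE)
print(p_at_least)
```

The probabilities for 0 to 6 matches cover every possible outcome, so they sum to 1, which is a useful sanity check for any "exactly m" function.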

Next create a list containing all the winning number combinations.

#extract the winning numbers as vectors, one per draw position. Each vector contains the number drawn in that position for every draw.

number_1 <- data$`NUMBER DRAWN 1`
number_2 <- data$`NUMBER DRAWN 2`
number_3 <- data$`NUMBER DRAWN 3`
number_4 <- data$`NUMBER DRAWN 4`
number_5 <- data$`NUMBER DRAWN 5`
number_6 <- data$`NUMBER DRAWN 6`

#create a list of all the winning numbers by position.

numbers_list <- list(number_1, number_2, number_3, number_4, number_5, number_6) 

#combine the numbers by draw, so each list element is a vector of the six winning numbers for one draw.

winning_numbers <- pmap(numbers_list, function(a, b, c, d, e, f) {c(a,b,c,d,e,f)})

winning_numbers[[1]]
[1]  3 11 12 14 41 43

Next, write a function that takes any 6 numbers and returns the number of times those numbers have been the winning numbers in the past. Then tell the user how likely it is that those numbers will win the next draw.

#write a function that takes 6 numbers as an input and returns the number of times those numbers have been the winning numbers in the past.

check_historical_occurance <- function(a, b, c, d, e, f) {
  user_numbers <- c(a, b, c, d, e, f)
  won_before <- 0
  #loop over every past draw; setequal() ignores order, so the user's numbers don't need sorting
  for(i in seq_along(winning_numbers)){
    if(setequal(user_numbers, winning_numbers[[i]])){
      won_before <- won_before+1
    }
  }
  cat("These numbers have been the winning numbers", won_before, "time(s) before", "\n", "The likelihood of winning with these numbers in the next draw is approximately 1 in 14 million")
}

#run the function with some random numbers
check_historical_occurance(2, 3, 4, 5, 6, 7)
These numbers have been the winning numbers 0 time(s) before 
 The likelihood of winning with these numbers in the next draw is approximately 1 in 14 million
#run the function again with some numbers we know have previously won the draw
check_historical_occurance(3, 11, 12, 14, 41, 43)
These numbers have been the winning numbers 1 time(s) before 
 The likelihood of winning with these numbers in the next draw is approximately 1 in 14 million
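
If the app has to check many tickets, looping over every past draw each time may become slow. One possible alternative, sketched here on invented data, is to turn each draw into a sorted text key once and then count matching keys. The `toy_draws` list is made up for illustration and is not taken from 649.csv; `make_key` and `count_wins` are hypothetical helpers.

```r
# made-up draws for illustration only; not real lottery results
toy_draws <- list(
  c(1, 2, 3, 4, 5, 6),
  c(10, 20, 30, 40, 45, 49),
  c(6, 5, 4, 3, 2, 1)   # same numbers as the first draw, in a different order
)

# sort each draw and paste it into a single key such as "1-2-3-4-5-6"
make_key <- function(numbers) paste(sort(numbers), collapse = "-")
draw_keys <- vapply(toy_draws, make_key, character(1))

# count how many past draws share the user's key
count_wins <- function(user_numbers) sum(draw_keys == make_key(user_numbers))

print(count_wins(c(2, 1, 4, 3, 6, 5)))
print(count_wins(c(2, 3, 4, 5, 6, 7)))
```

Because the keys are built once, each later lookup is a single vectorised comparison rather than a loop over all draws.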
---
title: "Guided Project: Mobile App for Lottery Addiction"
output: html_notebook
---
