This is a hypothetical Powerball simulation project.
Powerball is one of the most popular lottery programs in the United States. It was established in 1992, and tickets can be bought in 45 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands. Tickets are sold for $2 per play. The chance of hitting the jackpot (the top prize on a given play) is 1 in 292.2 million.
Anyone who matches all five numbers drawn from 1:69, plus the Powerball number drawn from 1:26, can claim the jackpot. Thus, a player has to choose six numbers in total.
Based on this information, if we want to be 100% sure of hitting the jackpot, we need to buy every possible combination. This is how much money we would need to spend on tickets (remember, each ticket costs $2):
choose(69,5)*26*2
## [1] 584402676
For a long time I was trying to come up with a function that would simulate a Powerball draw and record the lucky numbers (the complete set of six). I recently realized that a simple function could do such a big task.
To achieve that goal, let's follow these simple steps:
I am going to create two vectors, a and b. Vector ‘a’ is the pool for the first five numbers, which can be any whole number from 1 through 69, inclusive. Vector ‘b’ is the pool for the Powerball number (the final number), which can be any whole number from 1 through 26, inclusive.
set.seed(0914)
a = 1:69 #Creates vector 'a'
b = 1:26 #Creates vector 'b'
print(a)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
print(b) #checking to make sure the vectors were created
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
The vectors were created as expected: a holds the numbers 1:69, and b holds only 1:26.
To simulate a single draw, we need 5 random numbers from a, sampled without replacement, and 1 from b. Let R choose the lucky numbers for us.
Note: If we don’t set a seed, we get a different set of numbers every time we run this code.
set.seed(0916)
first_five<- sample(a, size = 5, replace = FALSE)
powerball<- sample(b, size = 1)
first_five
[1] 47 37 56 43 66
powerball
[1] 19
The five lucky numbers for today’s draw are 47, 37, 56, 43, and 66, and the Powerball number is 19. Anyone with these numbers on their ticket is a millionaire.
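The two sample() calls above can be wrapped into a single function, which is what this project set out to find. The following is only a minimal sketch: the name draw_powerball and its arguments white_pool and red_pool are hypothetical and do not appear in the original code.

```r
# Hypothetical helper (not part of the original post): simulate one Powerball draw.
draw_powerball <- function(white_pool = 1:69, red_pool = 1:26) {
  # Five distinct white balls, sampled without replacement
  white <- sample(white_pool, size = 5, replace = FALSE)
  # One Powerball, picked by position in its pool
  red <- red_pool[sample.int(length(red_pool), size = 1)]
  # Return the complete set of six numbers
  c(white, red)
}

set.seed(916)
one_draw <- draw_powerball()
one_draw
```

Calling draw_powerball() again without resetting the seed returns a new set of six numbers each time.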
As a statistician, I don’t really believe in getting something 100% right in one draw. I want to simulate five complete draws and then randomly choose one number from each position to make my lucky number.
To do this, I want to create an empty matrix that will store the numbers from each of the five draws. Rows in this matrix refer to complete draws, and columns refer to each position in the 5 + 1 number combination that a player needs to win the jackpot. The columns are thus named “First”, “Second”, “Third”, “Forth”, “Fifth”, and “Sixth”. Remember, the first 5 numbers come from 1:69, and the number in the last column can be any number between 1 and 26.
set.seed(0916)
five_draws <- matrix(ncol=6, nrow=5, data=NA)
colnames(five_draws) <- c("First", "Second", "Third", "Forth", "Fifth", "Sixth")
five_draws
First Second Third Forth Fifth Sixth
[1,] NA NA NA NA NA NA
[2,] NA NA NA NA NA NA
[3,] NA NA NA NA NA NA
[4,] NA NA NA NA NA NA
[5,] NA NA NA NA NA NA
Based on the output, I have successfully created an empty matrix with six columns that will record 5 complete Powerball draws from my simulation.
I don’t have any data at the moment, because I haven’t run the simulation yet. Now, I am going to write a loop that runs my simulation five times, each time sampling the first five numbers and then the Powerball number. These numbers will then be stored in my empty matrix.
set.seed(0916)
for (i in 1:5){
five_draws[i:5,1:5] <- sample(a,size = 5, replace=FALSE)
five_draws[i:5,6] <- sample(b, size=1)
}
five_draws
First Second Third Forth Fifth Sixth
[1,] 47 47 47 47 47 19
[2,] 58 17 22 21 65 7
[3,] 28 57 27 32 59 6
[4,] 50 15 29 47 45 7
[5,] 39 63 40 10 58 24
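One thing to note in the output above: row 1 repeats 47 in all five white-ball positions, which sampling without replacement cannot produce. The loop assigns each sample to the block of rows i:5 rather than to row i alone, so R recycles the five sampled values down the columns of that block. Filling a 5 x 5 block column by column with five recycled values makes every column identical, so its first row is the first sampled value repeated five times. A minimal sketch of a loop that writes exactly one draw per row follows; the name fixed_draws is hypothetical, and its output will differ from the table above.

```r
a <- 1:69
b <- 1:26
set.seed(916)

fixed_draws <- matrix(NA_integer_, nrow = 5, ncol = 6)
colnames(fixed_draws) <- c("First", "Second", "Third", "Forth", "Fifth", "Sixth")

for (i in 1:5) {
  # Index a single row so the five sampled values fill columns 1 to 5 of row i only
  fixed_draws[i, 1:5] <- sample(a, size = 5, replace = FALSE)
  # The Powerball for the same draw goes in column 6
  fixed_draws[i, 6] <- sample(b, size = 1)
}
fixed_draws
```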
Here they are: five rows of simulated numbers, although row 1 is not a valid draw because it repeats 47 in its first five positions. Now, I want to draw a random sample from each column, and that will give me my final number.
set.seed(0916)
##Setting each of the column as a vector. These numbers come from the above table.
p <- c("47","58","28","50","39")
o <- c("47","17","57","15","63")
w <- c("47","22","27","29","40")
e <- c("47","17","57","15","63")
r <- c("47","65","59","45","58")
ball <- c("19","7","6","7","24")
##Randomly selecting 1 number from all of the vectors above
p_selected <- sample(p,size=1)
o_selected <- sample(o,size=1)
w_selected <- sample(w,size=1)
e_selected <- sample(e,size=1)
r_selected <- sample(r,size=1)
ball_selected <- sample(ball,size=1)
##My final number
cbind(p_selected, o_selected, w_selected, e_selected, r_selected, ball_selected)
p_selected o_selected w_selected e_selected r_selected ball_selected
[1,] "58" "63" "40" "57" "59" "6"
As can be seen, I have come up with the luckiest Powerball numbers for today’s draw. My numbers are 58, 63, 40, 57, and 59, and the Powerball is 6.
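Typing each column out by hand is easy to get wrong (the vector e above repeats the values of o rather than the Forth column). A minimal sketch of the same idea that samples directly from the matrix columns follows. The names draws_example and pick_one are hypothetical, and the matrix is retyped from the five_draws output shown earlier so that the block runs on its own.

```r
# Values copied from the five_draws output shown earlier
draws_example <- matrix(
  c(47, 47, 47, 47, 47, 19,
    58, 17, 22, 21, 65,  7,
    28, 57, 27, 32, 59,  6,
    50, 15, 29, 47, 45,  7,
    39, 63, 40, 10, 58, 24),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("First", "Second", "Third", "Forth", "Fifth", "Sixth"))
)

# Pick one element by position; this avoids the sample(x, 1) pitfall where a
# length-one numeric x is treated as the range 1:x
pick_one <- function(x) x[sample.int(length(x), size = 1)]

set.seed(916)
# One randomly chosen number per column, returned as a named numeric vector
final_numbers <- apply(draws_example, MARGIN = 2, FUN = pick_one)
final_numbers
```

Because each position is picked independently, the same number can appear twice among the first five (47 occurs in several columns here), so the result is not guaranteed to be a valid ticket.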
Strictly speaking, simulating more draws does not change the odds: every combination has the same 1 in 292.2 million chance of hitting the jackpot. Still, in the spirit of this hypothetical exercise, I want to base my pick on more data. I am now going to simulate 100 draws and select my numbers from there.
set.seed(0980)
hundred_draws <- matrix(ncol=6, nrow=100, data=NA)
colnames(hundred_draws) <- c("First", "Second", "Third", "Forth", "Fifth", "Sixth")
head(hundred_draws)
First Second Third Forth Fifth Sixth
[1,] NA NA NA NA NA NA
[2,] NA NA NA NA NA NA
[3,] NA NA NA NA NA NA
[4,] NA NA NA NA NA NA
[5,] NA NA NA NA NA NA
[6,] NA NA NA NA NA NA
for (i in 1:100){
hundred_draws[i:100,1:5] <- sample(a,size = 5, replace=FALSE)
hundred_draws[i:100,6] <- sample(b, size=1)
}
head(hundred_draws)
First Second Third Forth Fifth Sixth
[1,] 49 49 49 49 49 16
[2,] 59 11 6 60 50 6
[3,] 52 58 65 14 21 26
[4,] 23 59 14 58 9 7
[5,] 1 16 48 34 31 11
[6,] 10 10 10 10 10 2
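In the preview above, rows 1 and 6 repeat a single number across the first five columns, which cannot happen when five balls are drawn without replacement. It is a side effect of assigning each sample to the block of rows i:100 rather than to row i alone: R recycles the five values down the block, and whenever the number of remaining rows is a multiple of five, every column of the block starts with the same value. A quick way to count such rows is sketched below. The names preview and is_valid_draw are hypothetical, and preview simply retypes the six rows shown above.

```r
# The six rows printed by head(hundred_draws) above
preview <- matrix(
  c(49, 49, 49, 49, 49, 16,
    59, 11,  6, 60, 50,  6,
    52, 58, 65, 14, 21, 26,
    23, 59, 14, 58,  9,  7,
     1, 16, 48, 34, 31, 11,
    10, 10, 10, 10, 10,  2),
  ncol = 6, byrow = TRUE
)

# A draw is valid when its five white balls are all different
is_valid_draw <- function(draw) anyDuplicated(draw[1:5]) == 0

valid <- apply(preview, MARGIN = 1, FUN = is_valid_draw)
valid        # one TRUE/FALSE per row
sum(!valid)  # number of rows that are not valid draws
```

The same apply() call works on the full hundred_draws matrix; indexing a single row in the loop, hundred_draws[i, 1:5], avoids the problem altogether.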
I am now going to check the most popular numbers in each column.
apply(X=hundred_draws, MARGIN=2, FUN = table)
$First
1 2 3 6 7 9 10 11 12 13 14 16 17 18 23 25 26 27 28 29 30 31 32 33 34 35
3 1 1 1 2 1 2 1 3 2 2 1 1 1 2 1 2 1 1 1 1 2 1 1 2 1
37 38 40 41 42 43 45 46 47 48 49 50 51 52 54 55 56 58 59 60 61 62 64 65 66 67
1 4 1 2 3 3 2 7 2 1 3 1 1 3 3 1 1 3 3 3 1 1 3 2 1 1
68 69
3 2
$Second
1 3 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 29 31 32 34
1 1 2 1 1 1 1 2 4 2 1 3 2 2 2 1 2 1 1 1 2 1 1 6 2 1
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 54 55 56 57 58 59 60 61 62 64
3 1 2 3 1 3 1 1 1 2 1 1 5 2 1 1 2 1 3 3 2 3 2 1 2 2
65 68 69
2 2 1
$Third
1 3 4 5 6 7 8 10 11 12 13 14 17 18 19 21 22 23 24 25 26 27 28 29 30 31
3 2 4 1 2 2 2 2 2 1 3 2 1 3 1 2 2 3 1 1 1 2 1 2 4 3
32 33 35 36 37 38 40 41 42 43 44 45 46 47 48 49 50 51 52 54 56 57 59 61 62 64
1 1 2 1 2 1 1 2 2 1 1 1 1 2 2 3 1 2 1 1 3 1 1 3 1 5
65 67 69
2 1 1
$Forth
1 2 3 5 6 7 10 11 12 13 14 15 16 18 20 21 23 24 25 26 27 28 29 30 31 34
2 1 1 3 2 2 3 2 2 2 1 1 3 1 3 1 1 1 2 4 2 1 1 1 1 3
35 36 37 38 39 40 41 42 44 45 47 48 49 50 51 52 54 55 56 58 59 60 61 62 63 64
1 3 2 2 3 1 1 2 1 1 1 3 3 2 1 2 1 2 3 3 2 2 3 2 1 1
65 66 69
2 1 1
$Fifth
1 2 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 21 23 25 26 28 30 31 34 35
2 3 2 2 3 2 1 6 2 1 2 1 2 1 1 2 1 1 1 1 1 2 2 3 3 1
36 37 38 39 40 42 44 46 47 48 49 50 51 52 55 57 58 59 60 62 63 64 65 66 67 68
3 2 1 1 1 2 4 2 4 1 3 2 2 2 2 1 1 1 2 3 3 1 1 1 2 1
69
2
$Sixth
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 4 1 2 1 8 3 3 6 9 8 5 1 3 3 4 4 2 7 4 4 6 3 3 3 2
It looks like the numbers were spread fairly evenly within each of these columns. These are the numbers that repeated the most:
Column 1 : 46 (7 times)
Column 2 : 31 (6 times)
Column 3 : 64 (5 times)
Column 4 : 26 (4 times)
Column 5 : 9 (6 times) &
Column 6 : 10 (9 times)
Thus, my number is: 46,31,64,26,9 & 10
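Reading the most frequent value off a long printed table is error-prone, so the tally can also be done in code. The following is a minimal sketch on a small made-up vector; most_frequent and toy_column are hypothetical names, and toy_column is not data from the simulation above.

```r
# Return the most frequent value in x and how often it occurs.
# Ties go to the smallest value, because table() sorts its names and
# which.max() returns the first maximum.
most_frequent <- function(x) {
  counts <- table(x)
  top <- which.max(counts)
  c(value = as.numeric(names(counts)[top]), times = as.integer(counts[top]))
}

toy_column <- c(9, 44, 9, 47, 9, 44, 2)  # made-up data for illustration
most_frequent(toy_column)

# Applied to a matrix of draws, this gives one value/times pair per column:
# apply(hundred_draws, MARGIN = 2, FUN = most_frequent)
```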
Now, let’s create some histograms for each of these columns, this time from 100,000 simulated draws, and decide the final numbers from there.
set.seed(0417)
HT_draws <- matrix(ncol=6, nrow=100000, data=NA)
colnames(HT_draws) <- c("First", "Second", "Third", "Forth", "Fifth", "Sixth")
for (i in 1:100000){
HT_draws[i:100000,1:5] <- sample(a,size = 5, replace=FALSE)
HT_draws[i:100000,6] <- sample(b, size=1)
}
head(HT_draws)
First Second Third Forth Fifth Sixth
[1,] 37 37 37 37 37 22
[2,] 36 19 16 1 17 3
[3,] 58 12 24 14 40 5
[4,] 58 65 23 7 40 2
[5,] 15 36 60 3 11 12
[6,] 8 8 8 8 8 13
First Second Third Forth Fifth Sixth
1: 37 37 37 37 37 22
2: 36 19 16 1 17 3
3: 58 12 24 14 40 5
4: 58 65 23 7 40 2
5: 15 36 60 3 11 12
6: 8 8 8 8 8 13
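The histograms themselves are not shown above, so here is a minimal sketch of how they could be drawn with base R graphics. It builds its own matrix with replicate(), which produces one draw per row and avoids the slow block assignment to rows i:100000. The names sim_draws and n_draws are hypothetical, and n_draws is kept small so the example runs quickly.

```r
set.seed(417)
n_draws <- 10000  # assumption: smaller than the 100,000 draws used above

# replicate() returns one draw per column, so transpose to get one draw per row
sim_draws <- t(replicate(n_draws, c(sample(1:69, size = 5), sample(1:26, size = 1))))
colnames(sim_draws) <- c("First", "Second", "Third", "Forth", "Fifth", "Sixth")

# One histogram per column, with one bin per possible number
old_par <- par(mfrow = c(2, 3))
for (col_name in colnames(sim_draws)) {
  upper <- if (col_name == "Sixth") 26 else 69
  hist(sim_draws[, col_name],
       breaks = seq(0.5, upper + 0.5, by = 1),
       main = col_name, xlab = "Number drawn")
}
par(old_par)
```

With a fair simulation the bars should be roughly level, since every number is equally likely in every position.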