Predicting the next 10 winning lines using historical data (180 days,
06/09/2023 - 12/01/2023). An explanation of the code follows the code chunk.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the data
file_path <- "euromillions-draw-history.csv"
lottery_data <- read.csv(file_path)
# Convert the DrawDate to a Date type with the new format
lottery_data$DrawDate <- as.Date(lottery_data$DrawDate, format = "%d %b %Y")
# Reshape the data to long format - for easier analysis
lottery_data_long <- lottery_data %>%
  pivot_longer(
    cols = starts_with("Ball"),
    names_to = "Ball_Type",
    values_to = "Winning_Number"
  )
# Calculate the frequency of each winning number
frequency_data <- lottery_data_long %>%
  group_by(Winning_Number) %>%
  summarise(Frequency = n()) %>%
  arrange(desc(Frequency))
# Create a complete set of numbers for main balls and lucky stars
all_main_numbers <- 1:50
all_lucky_stars <- 1:12
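# Merge with frequency_data to get a frequency for every possible number
# Note: frequency_data holds one pooled count per number, so both merges read
# the same counts; main balls and lucky stars are not tallied separately
main_numbers_probs <- merge(data.frame(Winning_Number = all_main_numbers), frequency_data, all.x = TRUE)
lucky_stars_probs <- merge(data.frame(Winning_Number = all_lucky_stars), frequency_data, all.x = TRUE)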
# Fill NAs with 0 probability
main_numbers_probs$Frequency[is.na(main_numbers_probs$Frequency)] <- 0
lucky_stars_probs$Frequency[is.na(lucky_stars_probs$Frequency)] <- 0
# Define a function to predict the next winning combination
predict_next_winning_combination <- function(main_numbers_probs, lucky_stars_probs) {
  # Calculate the probability distribution based on observed frequencies
  main_distribution <- main_numbers_probs$Frequency / sum(main_numbers_probs$Frequency)
  lucky_stars_distribution <- lucky_stars_probs$Frequency / sum(lucky_stars_probs$Frequency)
  # Sample main numbers from the probability distribution to generate a combination
  main_numbers <- sample(all_main_numbers, 5, replace = TRUE, prob = main_distribution)
  # Ensure uniqueness within the main numbers
  while (any(duplicated(main_numbers))) {
    main_numbers <- sample(all_main_numbers, 5, replace = TRUE, prob = main_distribution)
  }
  # Sample Lucky Stars from the probability distribution
  lucky_stars <- sample(all_lucky_stars, 2, replace = TRUE, prob = lucky_stars_distribution)
  # Ensure uniqueness within the lucky stars
  while (any(duplicated(lucky_stars))) {
    lucky_stars <- sample(all_lucky_stars, 2, replace = TRUE, prob = lucky_stars_distribution)
  }
  # Return the predicted winning combination
  return(c(main_numbers, lucky_stars))
}
# Predict the next 10 winning combinations
next_10_winning_combinations <- lapply(1:10, function(i)
  predict_next_winning_combination(main_numbers_probs, lucky_stars_probs)
)
# Print the predicted winning combinations
cat("Predicted Winning Combinations:\n")
## Predicted Winning Combinations:
for (i in 1:10) {
  cat("Set", i, ": Main Numbers -", next_10_winning_combinations[[i]][1:5],
      ", Lucky Stars -", next_10_winning_combinations[[i]][6:7], "\n")
}
## Set 1 : Main Numbers - 36 50 34 40 48 , Lucky Stars - 11 10
## Set 2 : Main Numbers - 23 50 29 35 5 , Lucky Stars - 12 11
## Set 3 : Main Numbers - 21 15 50 40 36 , Lucky Stars - 5 2
## Set 4 : Main Numbers - 14 4 32 49 29 , Lucky Stars - 2 11
## Set 5 : Main Numbers - 31 35 7 48 20 , Lucky Stars - 4 11
## Set 6 : Main Numbers - 34 21 23 33 12 , Lucky Stars - 8 11
## Set 7 : Main Numbers - 35 10 45 16 48 , Lucky Stars - 2 10
## Set 8 : Main Numbers - 40 17 25 26 19 , Lucky Stars - 10 8
## Set 9 : Main Numbers - 7 27 40 38 11 , Lucky Stars - 3 9
## Set 10 : Main Numbers - 17 48 6 13 23 , Lucky Stars - 6 1
Loading and Preprocessing Data:
The code starts by loading lottery data from a CSV file and converting the DrawDate column to the Date type.
The data is then reshaped to long format using the pivot_longer function from tidyr (part of the tidyverse), so that each row holds a single drawn number. This makes it easier to analyse.
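As a rough illustration of the reshaping step, the sketch below applies pivot_longer to a tiny invented table. The data frame toy_draws, its values, and its column names are assumptions made for the example; they are not taken from the real CSV.

```r
library(tidyr)

# Hypothetical two-draw sample; column names and values are invented
toy_draws <- data.frame(
  DrawDate = as.Date(c("2023-12-01", "2023-11-28")),
  Ball.1 = c(3, 11),
  Ball.2 = c(17, 17),
  Ball.3 = c(25, 40)
)

# Wide to long: one row per (draw, ball) pair instead of one row per draw
toy_long <- pivot_longer(
  toy_draws,
  cols = starts_with("Ball"),
  names_to = "Ball_Type",
  values_to = "Winning_Number"
)

print(toy_long)
```

Each draw contributes one row per ball column, so the two-row table becomes six rows with the columns DrawDate, Ball_Type and Winning_Number.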
Calculating Frequencies:
The frequency of each winning number is calculated using the group_by and summarise functions from the dplyr package.
The resulting frequency data is then arranged in descending order.
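The counting step can be sketched on a small invented vector. The example below also shows dplyr's count() as a shorter way to get the same table; toy_long and its values are hypothetical.

```r
library(dplyr)

# Hypothetical long-format sample (values invented for illustration)
toy_long <- data.frame(Winning_Number = c(17, 3, 17, 40, 17, 3))

# Same pattern as the chunk above: group, count rows, sort by count
freq_verbose <- toy_long %>%
  group_by(Winning_Number) %>%
  summarise(Frequency = n()) %>%
  arrange(desc(Frequency))

# count() groups, tallies and sorts in one call
freq_short <- toy_long %>%
  count(Winning_Number, name = "Frequency", sort = TRUE)

print(freq_short)
```

Both versions give one row per distinct number, with the most frequent number first.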
Creating a Complete Set of Numbers:
Sets of all possible main numbers (1 to 50) and lucky stars (1 to
12) are created.
Merging with Frequency Data:
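The sets of all possible numbers are merged with the frequency data to obtain a frequency for each number. Numbers that never appear in the data come back as NA and are set to a frequency of 0.
Both sets are merged with the same frequency table, so the lucky stars reuse the counts recorded for the numbers 1 to 12.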
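A minimal sketch of the merge-and-fill step, using a shortened range of 1 to 6 in place of 1 to 50. The table toy_freq and its counts are invented for the example.

```r
# Hypothetical frequency table that covers only some of the numbers
toy_freq <- data.frame(Winning_Number = c(2, 5), Frequency = c(4, 1))

# Left join against the full range so every number gets a row
toy_all <- merge(data.frame(Winning_Number = 1:6), toy_freq, all.x = TRUE)

# Numbers absent from toy_freq come back as NA; treat them as never drawn
toy_all$Frequency[is.na(toy_all$Frequency)] <- 0

# Divide by the total to turn counts into a probability vector
toy_probs <- toy_all$Frequency / sum(toy_all$Frequency)

print(toy_all)
print(toy_probs)
```

A number with a frequency of 0 gets a probability of 0, so it can never be sampled in the prediction step.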
Defining the Prediction Function:
The predict_next_winning_combination function turns the observed frequencies into probability distributions for both main numbers and lucky stars by dividing each frequency by the total.
It then samples five main numbers and two lucky stars from these distributions, with replacement.
If a sample contains duplicates, the function resamples until every number in the set is unique.
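The resample-until-unique loop can be compared with a single call to sample() using replace = FALSE, which removes each picked value before drawing the next. The sketch below shows both on invented weights; toy_numbers, toy_weights and the two helper functions are hypothetical names. The two approaches do not give exactly the same distribution over sets, so the second is an alternative rather than a drop-in replacement.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical weights for the numbers 1..10
toy_numbers <- 1:10
toy_weights <- c(5, 1, 3, 2, 4, 1, 1, 2, 3, 1)
toy_probs <- toy_weights / sum(toy_weights)

# Rejection approach, as in the function above: redraw until all values differ
draw_by_rejection <- function() {
  repeat {
    picked <- sample(toy_numbers, 5, replace = TRUE, prob = toy_probs)
    if (!any(duplicated(picked))) return(picked)
  }
}

# Single-call alternative: weighted sampling without replacement
draw_without_replacement <- function() {
  sample(toy_numbers, 5, replace = FALSE, prob = toy_probs)
}

print(draw_by_rejection())
print(draw_without_replacement())
```

One practical difference: if fewer than five numbers have a non-zero probability, the rejection loop never finishes, whereas sample() with replace = FALSE stops with an error.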
Predicting the Next 10 Winning Combinations:
The function is applied ten times using lapply to predict the next 10 winning combinations.
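Because the combinations come from random sampling, each run of the script prints different sets. The sketch below shows how fixing a seed makes a run repeatable. It uses draw_line, a hypothetical stand-in with uniform weights, rather than the real prediction function, and the seed value is arbitrary.

```r
# Hypothetical stand-in for predict_next_winning_combination(), uniform weights
draw_line <- function() {
  c(sample(1:50, 5), sample(1:12, 2))
}

set.seed(2023)  # arbitrary seed; any fixed integer works
first_run <- lapply(1:10, function(i) draw_line())

set.seed(2023)  # same seed again, so the same random stream is replayed
second_run <- lapply(1:10, function(i) draw_line())

print(identical(first_run, second_run))
```

Calling set.seed once before the lapply call in the main script would have the same effect there.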
Printing the Predictions:
The predicted winning combinations are printed with cat, showing the set number, main numbers, and lucky stars for each prediction.
In summary, the code uses the observed frequencies of winning numbers in
the historical data to generate predictions for the next set of winning
combinations. Numbers within each set are kept unique by resampling
whenever a duplicate appears. The number of predictions and the ranges of
possible numbers are plain values in the script (1:10, all_main_numbers,
all_lucky_stars), so they can be changed by editing those lines.
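One way to make those values adjustable without editing the script body is to pass them as arguments. The sketch below is a hypothetical generalisation, not the code used above: draw_lines and its argument names are invented, it uses sample() without replacement instead of the resampling loop, and the uniform weights are placeholder input.

```r
# Hypothetical generalisation; function and argument names are illustrative
draw_lines <- function(n_lines, main_probs, star_probs, n_main = 5, n_stars = 2) {
  lapply(seq_len(n_lines), function(i) {
    c(
      # The length of each weight vector defines the range of possible numbers
      sample(seq_along(main_probs), n_main, prob = main_probs),
      sample(seq_along(star_probs), n_stars, prob = star_probs)
    )
  })
}

# Placeholder input: uniform weights over 50 main numbers and 12 lucky stars
toy_lines <- draw_lines(3, main_probs = rep(1, 50), star_probs = rep(1, 12))

print(toy_lines)
```

With the real data, main_probs and star_probs would be the Frequency columns of the merged tables.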