Predicting the next 10 winning lines using historical data from 180 days of draws (06/09/2023 - 12/01/2023). An explanation of the code follows the code chunk.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the data
file_path <- "euromillions-draw-history.csv"
lottery_data <- read.csv(file_path)

# Convert DrawDate to Date; "%d %b %Y" means day, abbreviated month name, four-digit year
lottery_data$DrawDate <- as.Date(lottery_data$DrawDate, format = "%d %b %Y")

# Reshape the data to long format - for easier analysis
lottery_data_long <- lottery_data %>%
  pivot_longer(
    cols = starts_with("Ball"),
    names_to = "Ball_Type",
    values_to = "Winning_Number"
  )

# Calculate the frequency of each winning number
frequency_data <- lottery_data_long %>%
  group_by(Winning_Number) %>%
  summarise(Frequency = n()) %>%
  arrange(desc(Frequency))

# Create a complete set of numbers for main balls and lucky stars
all_main_numbers <- 1:50
all_lucky_stars <- 1:12

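# Left-join each pool onto frequency_data so every possible number has a frequency.
# Note: frequency_data is keyed only by Winning_Number, so the same counts feed both pools.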
main_numbers_probs <- merge(data.frame(Winning_Number = all_main_numbers), frequency_data, all.x = TRUE)
lucky_stars_probs <- merge(data.frame(Winning_Number = all_lucky_stars), frequency_data, all.x = TRUE)

# Numbers that never appeared come back as NA; set their frequency to 0
main_numbers_probs$Frequency[is.na(main_numbers_probs$Frequency)] <- 0
lucky_stars_probs$Frequency[is.na(lucky_stars_probs$Frequency)] <- 0

# Define a function to predict the next winning combination
predict_next_winning_combination <- function(main_numbers_probs, lucky_stars_probs) {
  # Calculate the probability distribution based on observed frequencies
  main_distribution <- main_numbers_probs$Frequency / sum(main_numbers_probs$Frequency)
  lucky_stars_distribution <- lucky_stars_probs$Frequency / sum(lucky_stars_probs$Frequency)
  
  # Sample main numbers from the probability distribution to generate a combination
  main_numbers <- sample(all_main_numbers, 5, replace = TRUE, prob = main_distribution)
  
  # Ensure uniqueness within the main numbers
  while (any(duplicated(main_numbers))) {
    main_numbers <- sample(all_main_numbers, 5, replace = TRUE, prob = main_distribution)
  }
  
  # Sample Lucky Stars from the probability distribution
  lucky_stars <- sample(all_lucky_stars, 2, replace = TRUE, prob = lucky_stars_distribution)
  
  # Ensure uniqueness within the lucky stars
  while (any(duplicated(lucky_stars))) {
    lucky_stars <- sample(all_lucky_stars, 2, replace = TRUE, prob = lucky_stars_distribution)
  }
  
  # Return the predicted winning combination
  return(c(main_numbers, lucky_stars))
}

# Predict the next 10 winning combinations
next_10_winning_combinations <- lapply(1:10, function(i) 
  predict_next_winning_combination(main_numbers_probs, lucky_stars_probs)
)

# Print the predicted winning combinations
cat("Predicted Winning Combinations:\n")
## Predicted Winning Combinations:
for (i in 1:10) {
  cat("Set", i, ": Main Numbers -", next_10_winning_combinations[[i]][1:5], 
      ", Lucky Stars -", next_10_winning_combinations[[i]][6:7], "\n")
}
## Set 1 : Main Numbers - 36 50 34 40 48 , Lucky Stars - 11 10 
## Set 2 : Main Numbers - 23 50 29 35 5 , Lucky Stars - 12 11 
## Set 3 : Main Numbers - 21 15 50 40 36 , Lucky Stars - 5 2 
## Set 4 : Main Numbers - 14 4 32 49 29 , Lucky Stars - 2 11 
## Set 5 : Main Numbers - 31 35 7 48 20 , Lucky Stars - 4 11 
## Set 6 : Main Numbers - 34 21 23 33 12 , Lucky Stars - 8 11 
## Set 7 : Main Numbers - 35 10 45 16 48 , Lucky Stars - 2 10 
## Set 8 : Main Numbers - 40 17 25 26 19 , Lucky Stars - 10 8 
## Set 9 : Main Numbers - 7 27 40 38 11 , Lucky Stars - 3 9 
## Set 10 : Main Numbers - 17 48 6 13 23 , Lucky Stars - 6 1

Loading and Preprocessing Data:

The code starts by loading lottery data from a CSV file and converting the DrawDate column to the Date type.

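The data is then reshaped to a long format using the pivot_longer function from the tidyr package (loaded as part of the tidyverse). Every column whose name starts with "Ball" is stacked into a single Winning_Number column, with the original column name kept in Ball_Type, so all drawn numbers can be counted in one pass.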
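
As a minimal sketch of the reshape, the toy example below applies the same pivot_longer call to two made-up draws. The data values and the column names Ball.1 to Ball.3 and Lucky.Star.1 are assumptions for illustration, not taken from the real CSV.

```r
library(tidyr)

# Toy draw history; values and column names are made up for illustration
toy_draws <- data.frame(
  DrawDate     = as.Date(c("2023-11-28", "2023-12-01")),
  Ball.1       = c(5, 12),
  Ball.2       = c(17, 23),
  Ball.3       = c(42, 5),
  Lucky.Star.1 = c(3, 9)
)

# Same call shape as in the code chunk above
toy_long <- pivot_longer(
  toy_draws,
  cols      = starts_with("Ball"),
  names_to  = "Ball_Type",
  values_to = "Winning_Number"
)

# One row per draw and ball column: 2 draws x 3 ball columns = 6 rows.
# Lucky.Star.1 does not match starts_with("Ball"), so it stays an ordinary column.
print(toy_long)
```

Only the columns selected by cols are stacked; everything else is repeated on each new row.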

Calculating Frequencies:

The frequency of each winning number is calculated using the group_by and summarise functions from the dplyr package.

The resulting frequency data is then arranged in descending order.
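
The code above counts only the columns whose names start with "Ball" and keeps a single frequency table. If the lucky star columns in the CSV carry a different prefix (an assumption, since the file layout is not shown here), one possible variation is to count each pool separately. The helper count_by_prefix and the toy column names below are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data; Ball.* and Lucky.Star.* column names are assumed for illustration
toy_draws <- data.frame(
  Ball.1       = c(5, 12, 5),
  Ball.2       = c(17, 23, 40),
  Lucky.Star.1 = c(3, 9, 3),
  Lucky.Star.2 = c(11, 12, 9)
)

# Hypothetical helper: frequency of each value across columns sharing a prefix
count_by_prefix <- function(draws, prefix) {
  draws %>%
    pivot_longer(
      cols      = starts_with(prefix),
      names_to  = "Ball_Type",
      values_to = "Winning_Number"
    ) %>%
    # count() with sort = TRUE does the group_by / summarise / arrange steps in one call
    count(Winning_Number, name = "Frequency", sort = TRUE)
}

main_frequency <- count_by_prefix(toy_draws, "Ball")
star_frequency <- count_by_prefix(toy_draws, "Lucky")

print(main_frequency)
print(star_frequency)
```

With two tables, each pool can later be weighted by its own counts rather than by a shared one.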

Creating a Complete Set of Numbers:

Sets of all possible main numbers (1 to 50) and lucky stars (1 to 12) are created.

Merging with Frequency Data:

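The sets of all possible numbers are merged with the frequency data (a left join via merge with all.x = TRUE), so that every possible number has a frequency. Numbers that never appeared in the data come back as NA and are set to 0. Both pools are merged against the same frequency table, which is built only from the columns whose names start with "Ball", so the weights for lucky stars 1 to 12 come from that table as well. The frequencies are turned into probabilities later, inside the prediction function.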
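
The merge-and-fill step can be sketched on a small pool. The pool 1 to 6 and the counts below are made up for illustration.

```r
# Toy frequency table: only numbers 5 and 2 have been drawn
frequency_toy <- data.frame(Winning_Number = c(5, 2), Frequency = c(3, 1))

# Left join against the full pool 1..6; all.x = TRUE keeps numbers with no match
pool_toy <- merge(data.frame(Winning_Number = 1:6), frequency_toy, all.x = TRUE)

# Numbers with no match have NA frequency; replace it with a count of 0
pool_toy$Frequency[is.na(pool_toy$Frequency)] <- 0

print(pool_toy)
```

merge sorts the result by the join column, which is why the rows line up with the pool vector when they are later used as sampling weights.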

Defining the Prediction Function:

The predict_next_winning_combination function turns the observed frequencies into probability distributions for main numbers and lucky stars by dividing each frequency by the total for its pool.

It then samples 5 main numbers and 2 lucky stars from these distributions. Sampling is done with replacement, so a draw can contain repeated numbers.

If duplicates are found, the function discards the draw and resamples until all numbers in the set are unique. A number with frequency 0 has probability 0 and is never selected.
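
The sample-and-retry idea can be sketched as a small standalone helper. The name sample_unique and the toy counts are hypothetical. The guard at the top is an addition that is not in the original function: the retry loop can only finish if at least k numbers have a nonzero count.

```r
# Hypothetical helper: draw k distinct values from pool, weighted by counts
sample_unique <- function(pool, k, counts) {
  # Without at least k nonzero weights, every draw would contain a duplicate
  stopifnot(length(pool) == length(counts), sum(counts > 0) >= k)

  # Normalise counts into a probability distribution
  weights <- counts / sum(counts)

  repeat {
    picked <- sample(pool, k, replace = TRUE, prob = weights)
    # Accept the draw only if all k values are distinct
    if (!any(duplicated(picked))) return(picked)
  }
}

set.seed(1)  # arbitrary seed so the toy run is repeatable
toy_counts <- c(4, 0, 3, 1, 0, 2)  # made-up counts for the pool 1..6
picked <- sample_unique(1:6, 3, toy_counts)
print(picked)
```

Numbers 2 and 5 have a count of 0 here, so they can never appear in the result.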

Predicting the Next 10 Winning Combinations:

The function is applied ten times using lapply to predict the next 10 winning combinations. No random seed is set, so each run of the code produces a different set of combinations.
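
If repeatable output is wanted, one option is to call set.seed before the lapply call. The sketch below uses a stand-in generator, draw_one, with uniform weights so that it runs on its own; it is not the frequency-weighted function above, and the seed value is arbitrary.

```r
# Stand-in generator for illustration only: uniform weights, no historical data
draw_one <- function() {
  c(sample(1:50, 5), sample(1:12, 2))
}

set.seed(2023)  # arbitrary seed chosen for this sketch
first_run <- lapply(1:10, function(i) draw_one())

set.seed(2023)  # resetting the seed replays the same random stream
second_run <- lapply(1:10, function(i) draw_one())

# TRUE when both runs produced the same ten combinations
print(identical(first_run, second_run))
```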

Printing the Predictions:

The predicted winning combinations are printed with cat, one line per set, showing the set number, the five main numbers, and the two lucky stars.
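
As an alternative to printing line by line, the list of combinations could be collected into a data frame. The sketch below uses Set 1 and Set 2 from the output above; the column names Main_1 to Main_5 and Star_1 to Star_2 are assumptions for illustration.

```r
# First two predicted sets, copied from the printed output above
combos <- list(
  c(36, 50, 34, 40, 48, 11, 10),
  c(23, 50, 29, 35, 5, 12, 11)
)

# Stack the vectors as rows, then label the columns
result_table <- as.data.frame(do.call(rbind, combos))
names(result_table) <- c(paste0("Main_", 1:5), paste0("Star_", 1:2))

print(result_table)
```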

In summary, the code uses the observed frequencies of winning numbers in the historical data as sampling weights to generate predictions for the next set of winning combinations. Numbers within each set are kept unique by resampling whenever a draw contains duplicates. The number of predictions and the ranges of possible numbers can be changed by editing the 1:10 ranges in the lapply call and the print loop, and the all_main_numbers and all_lucky_stars vectors.