Introduction

The purpose of this investigation is to establish whether future winning lottery numbers can be predicted from past winning results. To do this, we will analyze the past 20 years of winning numbers for the Mega Millions lottery to answer the following questions:

  • Is it possible to predict future winning lottery numbers based on previous results?

  • Can a pattern be identified regarding which winning numbers are drawn most often per game over time?

  • Does using these numbers increase our chances of winning?

  • In terms of winning, is it better to play the same numbers, or pick different numbers every game?

  • What strategies can be applied to ensure our chances of winning are increased (if any)?

  • How do winning numbers correlate to payout amounts?

  • What statistical theories are applicable to our goal, and which of them (if any) can benefit us?

Similar to previous studies, we want to understand whether it is possible to gain an edge when playing the lottery. With the data that we have gathered (from the nylottery and megamillions websites), we will identify trends in the data and use simulations to answer the questions above. In addition to testing various "winning strategies", we also want to explore lottery payout amounts as jackpots change, and expected prize money.

Mega Millions Lottery Game

Mega Millions is a US multi-state lottery game that is drawn twice a week, on Tuesdays and Fridays. To participate, players purchase one or more lottery tickets at a cost of $2 per ticket. For each ticket, players must pick 5 unique numbers from 1 to 70, and 1 number from 1 to 25 for the Mega Ball. Alternatively, players can have their numbers selected randomly at the time of ticket purchase.

Game Rules

Lottery numbers are drawn randomly twice a week, on Tuesdays and Fridays at 11 PM (EST). 50% of lottery ticket sales is allocated as prize money, 75% of which is used for the jackpot payout. The remaining 25% of the prize money is allocated to lower-tier prizes. If all of the numbers selected by the player (including the Mega Ball) are drawn, then the player wins the jackpot. If multiple players win the jackpot, then the jackpot is divided equally among the winning players. If no one wins the jackpot for a particular drawing, then the jackpot amount rolls over to the next drawing.

Mega Millions Prize Tiers

Prize Tiers and Payouts

Prize           Winning Criterion                             Payout                          Odds of Winning
Jackpot         5 matching numbers with the Mega Ball         75% of the total prize money    1 in 302,575,350
Second Prize    5 matching numbers                            $1,000,000                      1 in 12,607,306
Third Prize     4 matching numbers with the Mega Ball         $10,000                         1 in 931,001
Fourth Prize    4 matching numbers                            $500                            1 in 38,792
Fifth Prize     3 matching numbers with the Mega Ball         $200                            1 in 14,547
Sixth Prize     3 matching numbers                            $10                             1 in 606
Seventh Prize   2 matching numbers with the Mega Ball         $10                             1 in 693
Eighth Prize    1 matching number with the Mega Ball          $4                              1 in 89
Ninth Prize     No matching numbers with the Mega Ball        $2                              1 in 37
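
The odds listed above follow directly from counting combinations under the 5-of-70 plus 1-of-25 format. The R sketch below reproduces them and also estimates the expected return per ticket from the fixed (non-jackpot) prizes. The function name tier_odds and the data frame layout are our own illustrative choices, and the jackpot is left out of the expected value because its size varies from drawing to drawing.

```r
# Game format as described in this section.
white_pool  <- 70  # main numbers run from 1 to 70
white_picks <- 5   # 5 unique main numbers per ticket
mega_pool   <- 25  # Mega Ball runs from 1 to 25

# Total number of distinct tickets.
total_combinations <- choose(white_pool, white_picks) * mega_pool

# Odds ("1 in x") of matching exactly k main numbers, with or without the Mega Ball.
# tier_odds is a hypothetical helper name used only for this sketch.
tier_odds <- function(k, mega_hit) {
  white_ways <- choose(white_picks, k) *
    choose(white_pool - white_picks, white_picks - k)
  mega_ways <- if (mega_hit) 1 else mega_pool - 1
  total_combinations / (white_ways * mega_ways)
}

tiers <- data.frame(
  prize         = c("Jackpot", "Second", "Third", "Fourth", "Fifth",
                    "Sixth", "Seventh", "Eighth", "Ninth"),
  white_matches = c(5, 5, 4, 4, 3, 3, 2, 1, 0),
  mega_hit      = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  payout        = c(NA, 1e6, 1e4, 500, 200, 10, 10, 4, 2)  # NA: jackpot varies
)

exact_odds   <- mapply(tier_odds, tiers$white_matches, tiers$mega_hit)
tiers$one_in <- round(exact_odds)
print(tiers)

# Expected return per ticket from the fixed prizes only (jackpot excluded).
fixed          <- !is.na(tiers$payout)
expected_fixed <- sum(tiers$payout[fixed] / exact_odds[fixed])
print(expected_fixed)
```

Comparing expected_fixed with the $2 ticket price gives a first sense of how much of a ticket's value depends on the jackpot.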

 

Additionally, players can pay an extra dollar to purchase a multiplier number. Multiplier numbers range from 2 to 5.

If a player picks a winning multiplier number and also winning lottery numbers, then the payout for the winning lottery numbers will be multiplied by the winning multiplier number.

Relevant Academic Papers and Articles

Most papers and articles on the subject agree that trying to predict next week's lottery numbers is absurd. Despite popular beliefs and fallacies, there is no reliable way to predict the lottery. If the odds of winning the lottery are so small, someone may ask: "How come people win all the time?"

Although the odds of winning the Mega Millions jackpot are about one in 300 million, people win regularly because millions of people play every week. While we agree with the overall consensus that the lottery is unpredictable, we still want to see whether any strategy offers even a slight edge when playing.

Data Source

The main dataset that will be used for this project can be found on the NY Open Data website, and contains the twice-weekly winning Mega Millions lottery numbers from 2002 through the current week. The dataset is in CSV format and can be accessed here:

Lottery Mega Millions Winning Numbers: Beginning 2002.

Winning history: https://www.megamillions.com/jackpot-history

Drawing history: https://www.megamillions.com/Winning-Numbers/Previous-Drawings.aspx

As of writing, the dataset consists of 2,058 observations in 4 columns:

  • Draw Date: The date on which the lottery draw took place.
  • Winning Numbers: The winning numbers for the given lottery draw.
  • Mega Ball: The winning Mega Ball for the draw.
  • Multiplier: The winning multiplier number for the draw.
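
As a minimal sketch of how these four columns might be loaded and prepared in R, the example below reads a few rows and splits the winning numbers into one column per ball. The rows are made-up placeholders, not real draws, and the date format (month/day/year) and the space-separated layout of the Winning Numbers column are assumptions about the CSV export that should be checked against the downloaded file.

```r
# Hypothetical sample rows in the four-column layout described above.
# These are NOT real draws; replace csv_text with the downloaded file.
csv_text <- 'Draw Date,Winning Numbers,Mega Ball,Multiplier
01/03/2020,03 12 25 41 66,09,2
01/07/2020,07 18 33 52 70,21,3
01/10/2020,12 25 31 44 58,09,4
'

# Read everything as text first so leading zeros cannot cause surprises.
# read.csv turns "Draw Date" into "Draw.Date", and so on.
draws <- read.csv(text = csv_text, colClasses = "character")

# Assumed date format: month/day/year.
draws$draw_date  <- as.Date(draws$Draw.Date, format = "%m/%d/%Y")
draws$mega_ball  <- as.integer(draws$Mega.Ball)
draws$multiplier <- as.integer(draws$Multiplier)

# Assumed layout: five main numbers separated by single spaces.
white_balls <- t(sapply(strsplit(draws$Winning.Numbers, " ", fixed = TRUE),
                        as.integer))
colnames(white_balls) <- paste0("n", 1:5)

print(draws[, c("draw_date", "mega_ball", "multiplier")])
print(white_balls)
```

For the real file, the same steps would apply with read.csv pointed at the downloaded CSV instead of the inline text.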

Approach

Because we cannot evaluate our predictions against winning numbers that have not yet been drawn, we will split the dataset into separate training and test sets. The training set will be used to identify winning number patterns, and the test set will be used to establish whether our predictions match the winning numbers more often than chance would suggest.
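
One way to make this split, sketched below in R, is to order the draws by date and hold out the most recent ones, so that the test set plays the role of the "future". The 80/20 ratio and the simulated draws data frame are assumptions made for illustration only.

```r
set.seed(1)

# Simulated stand-in for the real dataset: one row per draw.
n_draws <- 10
draws <- data.frame(
  draw_date = seq(as.Date("2020-01-03"), by = "week", length.out = n_draws),
  mega_ball = sample(1:25, n_draws, replace = TRUE)
)

# Order chronologically so the test set contains only later draws.
draws <- draws[order(draws$draw_date), ]

train_fraction <- 0.8  # assumed split ratio
n_train <- floor(train_fraction * nrow(draws))

train <- head(draws, n_train)                # earlier draws: pattern mining
test  <- tail(draws, nrow(draws) - n_train)  # later draws: evaluation

print(range(train$draw_date))
print(range(test$draw_date))
```

A chronological split is used here rather than a random one so that no information from later draws can leak into the pattern-finding step.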

Lottery Number Selection Strategies

There are several lottery number selection strategies that we need to simulate:

Same Number Selection
This is the simplest selection strategy. We will pick the 6 numbers required by the game (5 unique numbers, and 1 Mega Ball number) within the required range (1 to 70 for the unique numbers, and 1 to 25 for the Mega Ball), and a Multiplier number within the required range of 2 to 5. The same selection is then played in every drawing.

Random Number Selection
To simulate picking random lottery numbers, we will write a number generator in R that randomly generates the required 6 numbers and optional Multiplier number.
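
A minimal sketch of such a generator is shown below. The function name pick_random_ticket and the list layout of a ticket are our own illustrative choices. Calling the function once and reusing the result for every drawing gives the same-number strategy, while calling it again for each drawing gives the random strategy.

```r
# Generate one random ticket: 5 unique main numbers from 1 to 70,
# 1 Mega Ball from 1 to 25, and a multiplier from 2 to 5.
# pick_random_ticket is a hypothetical name used for this sketch.
pick_random_ticket <- function() {
  list(
    white      = sort(sample(1:70, 5)),  # sample() draws without replacement
    mega_ball  = sample(1:25, 1),
    multiplier = sample(2:5, 1)
  )
}

set.seed(42)  # fixed seed so the simulation is reproducible
ticket <- pick_random_ticket()
print(ticket)
```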

Most Frequent Winning Numbers Selection
To generate number selections based on the most frequent winning numbers, we will write an R program that mines the training data and counts how often each winning number occurs. We will then rank the numbers by frequency and build our selection from the highest-ranked ones.
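
The counting and ranking step could look roughly like the R sketch below. It runs on simulated training draws rather than the real data, and it assumes that every training draw comes from the current 5-of-70 plus 1-of-25 format. Taking the top 5 main numbers and the single most frequent Mega Ball is one possible reading of "highest-ranked"; other cut-offs are equally plausible.

```r
set.seed(7)

# Simulated training data: one row per draw, five main numbers per row.
n_train     <- 200
train_white <- t(replicate(n_train, sort(sample(1:70, 5))))
train_mega  <- sample(1:25, n_train, replace = TRUE)

# Count how often each number was drawn. factor() with explicit levels
# keeps numbers that never appeared, with a count of zero.
white_counts <- table(factor(as.vector(train_white), levels = 1:70))
mega_counts  <- table(factor(train_mega, levels = 1:25))

# Rank by frequency and keep the most frequent numbers.
top_white <- as.integer(names(sort(white_counts, decreasing = TRUE))[1:5])
top_mega  <- as.integer(names(sort(mega_counts, decreasing = TRUE))[1])

frequent_ticket <- list(white = sort(top_white), mega_ball = top_mega)
print(frequent_ticket)
```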

Selection Strategy Evaluation

To evaluate which selection strategy (if any) is most effective, we will compare each strategy's selections against the test data to see which strategy produces winning selections most often. This will entail writing an R program that runs each individual strategy against the test data. The program will count the number of times each strategy produces a winning result, and rank the strategies accordingly. The highest-ranked strategy is the one that was most effective at matching winning lottery results on the test data.
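
The evaluation could be sketched in R as follows. The test draws are simulated, the function names prize_tier and evaluate_strategy are hypothetical, and the fixed ticket is an arbitrary example. The tier rules follow the prize table given earlier, and multipliers are ignored for simplicity.

```r
# Map one ticket and one draw to a prize tier, following the prize table.
prize_tier <- function(ticket_white, ticket_mega, drawn_white, drawn_mega) {
  k        <- length(intersect(ticket_white, drawn_white))
  mega_hit <- ticket_mega == drawn_mega
  if (k == 5 && mega_hit) return("Jackpot")
  if (k == 5)             return("Second")
  if (k == 4 && mega_hit) return("Third")
  if (k == 4)             return("Fourth")
  if (k == 3 && mega_hit) return("Fifth")
  if (k == 3)             return("Sixth")
  if (k == 2 && mega_hit) return("Seventh")
  if (k == 1 && mega_hit) return("Eighth")
  if (mega_hit)           return("Ninth")
  "None"
}

# Play one ticket per test draw and count how many draws win any prize.
# pick_ticket is a function returning list(white = ..., mega_ball = ...).
evaluate_strategy <- function(pick_ticket, test_white, test_mega) {
  tiers <- vapply(seq_len(nrow(test_white)), function(i) {
    ticket <- pick_ticket()
    prize_tier(ticket$white, ticket$mega_ball, test_white[i, ], test_mega[i])
  }, character(1))
  sum(tiers != "None")
}

set.seed(123)
n_test     <- 500  # simulated test draws, not real data
test_white <- t(replicate(n_test, sample(1:70, 5)))
test_mega  <- sample(1:25, n_test, replace = TRUE)

fixed_ticket <- list(white = c(3, 17, 29, 44, 61), mega_ball = 9)  # arbitrary
strategies <- list(
  same   = function() fixed_ticket,
  random = function() list(white = sample(1:70, 5), mega_ball = sample(1:25, 1))
)

wins <- vapply(strategies, evaluate_strategy, integer(1),
               test_white = test_white, test_mega = test_mega)
print(sort(wins, decreasing = TRUE))
```

A most-frequent-numbers strategy would plug in the same way, as another function in the strategies list that returns the ticket built from the training data.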