Overview

This page covers a range of data analytics, data science, and machine learning projects I have worked on, each focusing on a different event or domain. I hope you find the results interesting!

The code I used for the following projects can all be found in my GitHub Repository: https://github.com/andrejcc04/Portfolio

All the projects and models I’ve built are available in the Tab on the left.

Machine Learning Models

Welcome to my portfolio of machine learning projects. Dive into a collection that spans across diverse domains, showcasing my passion for leveraging data to uncover insights and make predictions. In the Formula 1 Lap Time Prediction Project, I delve into the world of motorsport analytics, employing advanced algorithms to forecast lap times with precision, crucial for race strategy optimization. Moving to the Formula 1 Race Winner Prediction Project, I harness historical race data to develop models that predict race outcomes, offering valuable insights into driver and team performance factors.

Shifting gears to the realm of sports analytics, the Premier League Champion Prediction Project employs statistical modeling to forecast league winners based on team performance metrics over multiple seasons. In the financial domain, the XLK Stock Price Prediction Project focuses on predicting stock prices using historical market data and machine learning techniques, aiding in investment decision-making with robust predictive models. Lastly, the American House Price Prediction Project explores the dynamics of real estate valuation, utilizing key property features to predict housing prices accurately. Explore these projects to see how data science and machine learning can illuminate trends and drive informed decision-making across diverse fields.

1. Formula One Machine Learning Models & Data Analysis

Ensemble Methods Model to Predict Race Winner

As I was watching the 2024 Monaco Grand Prix, I wondered how hard it would be to build a machine learning model that could (somewhat) successfully predict the winner of the races to come. I got to work, and after a couple of days of trial and error I had a working model. I started off with just a Random Forest classification model, but it wasn’t giving me the results I wanted. So I built 3 different classification models (Support Vector Machine, Random Forest, and Gradient Boosting) and combined them into a single ensemble model to see if that got me more precise results… It did!
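One common way to combine several classifiers is hard majority voting. The page does not say how the three models were combined, so the following is only a minimal Python sketch under that assumption, not the project's own code. The prediction lists and the names `svm_pred`, `rf_pred`, and `gb_pred` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most models predicted for one driver."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical per-driver outputs from three classifiers
# (1 = predicted to win, 0 = predicted not to win).
svm_pred = [0, 1, 0, 0]
rf_pred = [0, 1, 1, 0]
gb_pred = [0, 0, 1, 0]

# Combine the three votes driver by driver.
ensemble_pred = [majority_vote(votes) for votes in zip(svm_pred, rf_pred, gb_pred)]
print(ensemble_pred)
```

With an odd number of binary classifiers a tie cannot occur, which is one reason three models is a convenient choice for this kind of vote.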

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 163   8
##          1   2   7
##                                          
##                Accuracy : 0.9444         
##                  95% CI : (0.9002, 0.973)
##     No Information Rate : 0.9167         
##     P-Value [Acc > NIR] : 0.1081         
##                                          
##                   Kappa : 0.5556         
##                                          
##  Mcnemar's Test P-Value : 0.1138         
##                                          
##             Sensitivity : 0.9879         
##             Specificity : 0.4667         
##          Pos Pred Value : 0.9532         
##          Neg Pred Value : 0.7778         
##              Prevalence : 0.9167         
##          Detection Rate : 0.9056         
##    Detection Prevalence : 0.9500         
##       Balanced Accuracy : 0.7273         
##                                          
##        'Positive' Class : 0              
## 

## Predicted Winner of the 2024 Austrian Grand Prix Styrian Grand Prix : Max Verstappen

Accuracy: The overall accuracy of 0.9444 is the proportion of correct predictions the model made on the entire test dataset (170 of 180).

Precision: In this output the positive class is 0, meaning the driver did not win. Precision (Pos Pred Value) measures how often a “did not win” prediction is correct. The precision of 0.9532 indicates that when the model predicts a driver will not win the race, it is correct about 95.3% of the time.

Sensitivity (Recall): The sensitivity quantifies the model’s ability to capture all cases where a driver didn’t win (positive cases). The sensitivity of 0.9879 means that when a driver did not win the race, the model correctly identifies them as such about 98.8% of the time.

Specificity: The specificity quantifies the model’s ability to correctly identify cases where a driver won (negative cases). A specificity of 0.4667 means that the model correctly identifies the actual winner about 46.7% of the time (7 of 15) and incorrectly predicts that the winner didn’t win in the remaining 53.3% of cases.

F1 Score: The F1 score of 0.970 is the harmonic mean of precision and recall, combining both into a single balanced measure of the model’s performance.

Balanced Accuracy: The balanced accuracy of 0.7273 accounts for class imbalance by taking the average of sensitivity and specificity. It is a more reliable measure of model performance than plain accuracy when dealing with imbalanced datasets like this one, where most drivers do not win.
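Each of these metrics can be recomputed directly from the four counts in the confusion matrix printed above. The following is a minimal Python sketch, not the project's own code. It assumes class 0 (“did not win”) is the positive class, as the output states, and the function name `binary_metrics` is my own.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts read from the printed matrix, treating class 0 as positive:
# predicted 0 / actual 0 = 163, predicted 0 / actual 1 = 8,
# predicted 1 / actual 0 = 2,   predicted 1 / actual 1 = 7.
metrics = binary_metrics(tp=163, fp=8, fn=2, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because only 15 of the 180 test rows are winners, accuracy alone looks flattering; specificity and balanced accuracy show how well the model does on the rare class.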

_________________________________________________

Linear Regression Model for Lap Time Predictions, Pre & Post Race Analysis

This project focuses on developing a linear regression model to predict a lap time given the driver, circuit, and lap. Below I display Sergio Pérez’s predicted lap 30 time at each circuit on the 2024 F1 calendar, followed by a post-race analysis featuring multiple race-descriptive plots and a review of my pre-race lap time prediction.

Note: 1 second = 1000 milliseconds
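To make the unit conversion concrete, here is a small Python sketch (not the project's own code) that converts between lap-time strings and milliseconds. The helper names `parse_lap_time` and `format_lap_time` are hypothetical, and the parser accepts both separators that appear in the output on this page (`1:37:785` and `1:36.313`).

```python
def parse_lap_time(text):
    """Convert 'm:ss.mmm' or 'm:ss:mmm' into milliseconds."""
    minutes, rest = text.split(":", 1)
    seconds, millis = rest.replace(":", ".").split(".")
    return int(minutes) * 60_000 + int(seconds) * 1000 + int(millis)


def format_lap_time(milliseconds):
    """Convert milliseconds into an 'm:ss.mmm' string."""
    minutes, remainder = divmod(milliseconds, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


# Bahrain figures as reported in this section.
predicted_ms = parse_lap_time("1:37:785")
actual_ms = parse_lap_time("1:36.313")
print(format_lap_time(predicted_ms), abs(predicted_ms - actual_ms))
```

For the Bahrain lap this gives a difference of 1472 milliseconds, matching the post-race figure reported for that race.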

2024 Bahrain Grand Prix (Bahrain)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Bahrain Grand Prix : 1:37:785 
## Root Mean Squared Error: 1737 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 1472 milliseconds
## [[1]]
##   forename surname year               name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 Bahrain Grand Prix  30          1:37:785       1:36.313

2024 Saudi Arabian Grand Prix (Jeddah)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Saudi Arabian Grand Prix : 1:34:406 
## Root Mean Squared Error: 1509 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 1624 milliseconds
## [[1]]
##   forename surname year                     name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Saudi Arabian Grand Prix  30          1:34:406
##   Actual Laptime
## 1       1:32.782

2024 Australian Grand Prix (Melbourne)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Australian Grand Prix : 1:22:280 
## Root Mean Squared Error: 1550 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 100 milliseconds
## [[1]]
##   forename surname year                  name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Australian Grand Prix  30          1:22:280
##   Actual Laptime
## 1       1:22.180

2024 Japanese Grand Prix (Suzuka)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Japanese Grand Prix : 1:37:392 
## Root Mean Squared Error: 1683 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 295 milliseconds
## [[1]]
##   forename surname year                name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Japanese Grand Prix  30          1:37:392
##   Actual Laptime
## 1       1:37.097

2024 Chinese Grand Prix (Shanghai)

## Predicted lap time of Sergio Pérez on lap 30 in the 2024 Chinese Grand Prix: 1:41:434
## Root Mean Squared Error: 1243 milliseconds
##   forename surname year               name lap china_predicted_laptime
## 1   Sergio   Pérez 2024 Chinese Grand Prix  30                  101434
##   Predicted Laptime Actual Laptime milliseconds
## 1          1:41:434       2:21.644       141644
## POST RACE ANALYSIS: Off by 40210 milliseconds
## Safety Car lap 21-31

2024 Miami Grand Prix (Miami)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Miami Grand Prix : 1:32:656 
## Root Mean Squared Error: 1334 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 27937 milliseconds
## [[1]]
##   forename surname year             name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 Miami Grand Prix  30          1:32:656       2:00.593

## Safety Car Laps 28-32

2024 Emilia-Romagna Grand Prix (Imola)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Emilia Romagna Grand Prix : 1:22:158 
## Root Mean Squared Error: 1449 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 1105 milliseconds
## [[1]]
##   forename surname year                      name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Emilia Romagna Grand Prix  30          1:22:158
##   Actual Laptime
## 1       1:23.263

2024 Monaco Grand Prix (Monte Carlo)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Monaco Grand Prix : 1:19:511 
## Root Mean Squared Error: 2198 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by NA milliseconds
## [[1]]
##   forename surname year              name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 Monaco Grand Prix  30          1:19:511           <NA>

## Crashed out lap 1

2024 Canadian Grand Prix (Montreal)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Canadian Grand Prix : 1:17:692 
## Root Mean Squared Error: 1120 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 11641 milliseconds
## [[1]]
##   forename surname year                name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Canadian Grand Prix  30          1:17:692
##   Actual Laptime
## 1       1:29.333

2024 Spanish Grand Prix (Catalunya)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Spanish Grand Prix : 1:20:814 
## Root Mean Squared Error: 4366 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 589 milliseconds
## [[1]]
##   forename surname year               name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 Spanish Grand Prix  30          1:20:814       1:21.403

2024 Austrian Grand Prix (Spielberg)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Austrian Grand Prix : 1:11:208 
## Root Mean Squared Error: 1199 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 515 milliseconds
## [[1]]
##   forename surname year                name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Austrian Grand Prix  30          1:11:208
##   Actual Laptime
## 1       1:10.693

2024 British Grand Prix (Silverstone)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 British Grand Prix : 1:32:185 
## Root Mean Squared Error: 1523 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 7836 milliseconds
## [[1]]
##   forename surname year               name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 British Grand Prix  30          1:32:185       1:40.021

2024 Hungarian Grand Prix (Budapest)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Hungarian Grand Prix : 1:24:827 
## Root Mean Squared Error: 1395 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 691 milliseconds
## [[1]]
##   forename surname year                 name lap Predicted Laptime
## 1   Sergio   Pérez 2024 Hungarian Grand Prix  30          1:24:827
##   Actual Laptime
## 1       1:25.518

2024 Belgian Grand Prix (Spa)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Belgian Grand Prix : 1:49:216 
## Root Mean Squared Error: 3667 milliseconds 
## 
## POST RACE ANALYSIS: Laptime Prediction was off by 338 milliseconds
## [[1]]
##   forename surname year               name lap Predicted Laptime Actual Laptime
## 1   Sergio   Pérez 2024 Belgian Grand Prix  30          1:49:216       1:48.878

2024 Dutch Grand Prix (Zandvoort)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Dutch Grand Prix : 1:16:921 
## Root Mean Squared Error: 1270 milliseconds

2024 Italian Grand Prix (Monza)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Italian Grand Prix : 1:26:820 
## Root Mean Squared Error: 870 milliseconds

2024 Azerbaijan Grand Prix (Baku)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 European Grand Prix : 1:47:279 
## Root Mean Squared Error: 1263 milliseconds

2024 Singapore Grand Prix (Marina Bay)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Singapore Grand Prix : 1:40:180 
## Root Mean Squared Error: 1190 milliseconds

2024 United States Grand Prix (Austin)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 United States Grand Prix : 1:42:469 
## Root Mean Squared Error: 1431 milliseconds

2024 Mexican Grand Prix (Mexico City)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Mexican Grand Prix : 1:25:215 
## Root Mean Squared Error: 2283 milliseconds

2024 Brazilian Grand Prix (Sao Paulo)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Brazilian Grand Prix : 1:17:29 
## Root Mean Squared Error: 1708 milliseconds

2024 Las Vegas Grand Prix (Las Vegas)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Las Vegas Grand Prix : 1:38:630 
## Root Mean Squared Error: 1488 milliseconds

2024 Qatar Grand Prix (Losail)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Qatar Grand Prix : 1:28:124 
## Root Mean Squared Error: 1466 milliseconds

2024 Abu Dhabi Grand Prix (Abu Dhabi)

## PRE-RACE PREDICTIONS: Predicted lap time of Sergio Pérez on lap 30 in the 2024 Abu Dhabi Grand Prix : 1:30:844 
## Root Mean Squared Error: 1389 milliseconds

2. Premier League Logistic Regression Model & Data Analysis

Model to predict Premier League Champion

I’ve become a big Arsenal fan, and after they came “so close yet so far” from a Premier League title two years in a row, I started to wonder how many wins, points, or goals we would need to have a really good shot at finally winning it after 20 years. I developed a logistic regression model to predict whether a team will be crowned champions, using features such as the number of wins, draws, losses, points, goals for, and goals against. Below are the results:

##           Actual
## Prediction   0   1
##          0 231   7
##          1   6   9

It’s pretty disheartening to see that Arsenal looked favorable in every respect but still came up short against the behemoth that is Manchester City. Still, it makes me feel better that we didn’t collect 97 points in a season with only one loss and still come up short, like Liverpool in 2018-2019!

_________________________________________________

2024 Premier League Standings by Matchweek

Below I sifted through 700 lines of data to compile a plot visualizing each team’s rise (or fall) as the Premier League season unfolded from start to finish.

_________________________________________________

History of Premier League Table

The second of four graphs… I made this plot just to get a grasp of how each team that has ever played a game in the Premier League has done. I wanted to include every team for those Wigan Athletic, Derby County, Swansea City (…) fans who are just happy to see their team in a PL graphic, which is why it looks pretty messy.

(Don’t worry Blackburn fans, I have another one coming your way)

_________________________________________________

Premier League Champions by the Years

The next two plots are restricted to specific, well-known teams, so they are easier to follow. Like the previous one, this plot shows the progress of each team that has ever won the Premier League. From Leicester City’s Cinderella story to Manchester City’s steady rise to the top to Blackburn’s slow decline… This plot tells a lot!

_________________________________________________

Premier League Big 6 by the Years

The final plot is similar to the previous one, except it’s confined to the Premier League’s big 6 clubs only. I just wanted to make this one to stick it to Tottenham fans that they’ve never won a Premier League title.

Interesting Fact: The big 6 clubs have all finished in the top 6 in 5 of the last 10 PL seasons. Additionally, 5 of the 6 teams have been in the top 6 in 12 of the last 15 seasons.

It’s also intriguing to see that only 2 seasons have been won by a team outside the big 6: Blackburn’s title in 1994-95 and Leicester City’s Cinderella run, as mentioned in the plot before.

Below are some captivating statistics I found after manipulating the Premier League dataset:

Average Finishing Position of each club since the establishment of the Premier League (1993)

## # A tibble: 43 × 2
##    Team            Average_finishing_position
##    <chr>                                <dbl>
##  1 Manchester Utd                        2.62
##  2 Arsenal                               3.94
##  3 Liverpool                             4.31
##  4 Chelsea                               4.88
##  5 Manchester City                       6.96
##  6 Tottenham                             7.41
##  7 Newcastle Utd                         9.76
##  8 Blackburn                            10   
##  9 Leeds United                         10.1 
## 10 Aston Villa                          10.2 
## # ℹ 33 more rows

Total number of wins for each club since establishment of the Premier League

## # A tibble: 43 × 2
##    Team            Total_Wins
##    <chr>                <dbl>
##  1 Manchester Utd         744
##  2 Arsenal                673
##  3 Liverpool              652
##  4 Chelsea                647
##  5 Tottenham              540
##  6 Manchester City        529
##  7 Everton                439
##  8 Newcastle Utd          419
##  9 Aston Villa            392
## 10 West Ham               360
## # ℹ 33 more rows

Total number of points for each club since the establishment of the Premier League

## # A tibble: 43 × 2
##    Team            Total_Points
##    <chr>                  <dbl>
##  1 Manchester Utd          2501
##  2 Arsenal                 2314
##  3 Liverpool               2258
##  4 Chelsea                 2245
##  5 Tottenham               1913
##  6 Manchester City         1809
##  7 Everton                 1658
##  8 Newcastle Utd           1541
##  9 Aston Villa             1487
## 10 West Ham                1350
## # ℹ 33 more rows

It’s evident that the big 6 clubs have been dominating since 1993!

3. Linear Regression Model on XLK Stock Price

I trained a machine learning model by gathering and manipulating the relevant data, splitting it into two sets (train & test), fitting a linear regression model on the training set, and evaluating it on the data set aside for testing. The resulting model has a Root Mean Squared Error of only 0.69, meaning that my predictions are typically off by about $0.69. I also included a regression line to show that, on average, the stock price has gone up in the past year, as well as a perfect fit line in the other plot.

## Root Mean Squared Error: $ 0.69
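The workflow described above (split, fit, predict, score) can be illustrated with a minimal Python sketch. This is not the project's own code. The prices are made-up numbers rather than real XLK data, holding out the final days as the test set is my assumption about the split, and the helper names `fit_line` and `rmse` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical closing prices indexed by trading day (NOT real XLK data).
days = list(range(10))
prices = [100.0, 100.8, 101.1, 102.3, 102.9, 104.2, 104.8, 106.1, 106.9, 108.0]

# Train on the first 7 days, test on the last 3.
train_x, test_x = days[:7], days[7:]
train_y, test_y = prices[:7], prices[7:]

intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]
print(f"Root Mean Squared Error: $ {rmse(test_y, predictions):.2f}")
```

RMSE squares each error before averaging, so a few large misses weigh more heavily than they would in a plain average of absolute errors.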

4. Linear Regression Model on American House Prices

Recently, I found out our neighbors were selling their house and moving away. I was curious how much the house would cost, so I went on Zillow and, to my surprise, it was worth way more than ours. After looking at the pictures available, it was easy to understand why: they had a finished basement, a breakfast sunroom, and their house was nearly 1000 sqft bigger than ours.

This got me wondering how important certain features are in a home, especially the number of beds, baths, and its size in sqft. I got to work building a linear regression model that is able to predict the price of a house given the features as input. All in all, it turned out pretty good.

## Predicted price of a house in Texas with 3 beds, 3 baths, and a living space of 3000 sqft: $ 576003.1

Data Analysis Projects

This is the Data Analytics Portion of my projects. Explore a diverse collection where I apply analytical skills to uncover insights and predictions across various domains. In the NFL 2023 Quarterback Performance Analysis, I used advanced metrics to evaluate quarterback efficiency and overall performance, developing a unique formula to assess MVP ratings throughout the season, as well as comparing the performance of different players using radar charts.

Turning to the Olympic History Data Analysis, I examined decades of Olympic data to reveal trends in medal distributions and sports participation across different countries, presenting these insights through compelling visualizations and statistical analysis. Lastly, my analysis of the 2011 Masters Golf Tournament highlighted player performances throughout the entirety of the tournament.

1. NFL 2023 QB Performance Analysis

You often see people debating whether one player did better than another. Using data from the 2013-2023 NFL seasons, I developed a formula in Excel to grade each player’s MVP-worthy performance and overall grade. I then transferred the file into R to visualize my findings. Below you’ll find a table displaying the highest graded players in a season since 2013, and 4 separate scatterplots (one for each position) along with the names of the top 6 highest graded players at that position.

(If you’re interested in which performance metrics I used to determine the grade and whether a player had a good MVP score, feel free to check out the dataset on my GitHub: nfldata.xlsx)

Top 10 Highest Graded Players since 2013

## # A tibble: 10 × 9
##       id name           position team  season grade mvp_rating first_votes isMvp
##    <dbl> <chr>          <chr>    <chr>  <dbl> <dbl>      <dbl>       <dbl> <dbl>
##  1  1051 Lamar Jackson  QB       BAL     2019 0.998      0.882          50     1
##  2  1267 Patrick Mahom… QB       KC      2020 0.939      0.888           2     0
##  3  1267 Patrick Mahom… QB       KC      2018 0.942      0.957          41     1
##  4    11 Aaron Rodgers  QB       GB      2020 0.995      0.963          44     1
##  5   330 Cooper Kupp    WR       LA      2021 0.985      0.956           1     0
##  6  1188 Michael Thomas WR       NO      2019 0.948      0.908           0     0
##  7    85 Antonio Brown  WR       PIT     2015 0.961      0.932           0     0
##  8   295 Christian McC… RB       CAR     2019 0.978      0.949           0     0
##  9   295 Christian McC… RB       SF      2023 0.943      0.862           0     0
## 10   868 Jonathan Tayl… RB       IND     2021 0.968      0.903           0     0

NFL 2023 Player Performance Evaluation

## # A tibble: 10 × 4
##    name                position team  grade
##    <chr>               <chr>    <chr> <dbl>
##  1 Christian McCaffrey RB       SF    0.943
##  2 CeeDee Lamb         WR       DAL   0.897
##  3 Keenan Allen        WR       LAC   0.897
##  4 Justin Jefferson    WR       MIN   0.891
##  5 Amon-Ra St. Brown   WR       DET   0.887
##  6 Tyreek Hill         WR       MIA   0.883
##  7 Puka Nacua          WR       LA    0.858
##  8 Davante Adams       WR       LV    0.848
##  9 D.J. Moore          WR       CHI   0.847
## 10 Dak Prescott        QB       DAL   0.835

2023 Player Performance Comparison using Radar Charts

In the 2023-2024 NFL Season, Lamar Jackson won the MVP with Dak Prescott coming in 2nd. Here is how the QBs stack up against each other, along with other positional comparisons:

2. 2011 Golf - Masters Tournament Analysis

The plot below is a line graph I created summarizing the 2011 Masters golf tournament, showing the performance of each golfer and the overall winner of the competition (Charl Schwartzel).

3. A Data-driven Look into the History of the Olympics

After gaining access to multiple datasets of the Olympics containing every instance throughout every competition, from the inaugural Games back in 1896 (Greece) up until the 2016 Games in Brazil, I decided my free time would be well spent answering a couple of questions that I, like many others (I think), have been wondering about:

  1. Does the economic stability of a country affect the number of athletes it sends to the Olympics and the number of medals it wins?
  2. Does hosting the Olympics correlate with winning more medals that year?

Part I

Below are the results I found for the first question, along with the code I wrote to filter and manipulate the data so I can visualize it in a more effective manner.

As we can see, there is in fact a positive correlation between a country’s GDP per capita and both the number of medals it wins and the number of athletes it sends. In other words, countries with a higher GDP per capita tend to win more medals and send more athletes to the Olympics.
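A correlation like this is usually quantified with the Pearson correlation coefficient, which ranges from -1 to 1. The page does not state which measure was used, so the following is only a minimal Python sketch under that assumption, not the project's own code. The GDP and medal figures are hypothetical values chosen for illustration, not numbers from the Olympic datasets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical countries: GDP per capita (USD) and medals won (NOT real data).
gdp_per_capita = [2_000, 8_000, 15_000, 30_000, 45_000]
medals = [1, 4, 9, 20, 31]

r = pearson(gdp_per_capita, medals)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship, but it does not by itself show that higher GDP causes more medals.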

Part II

For the second question, I started by joining the datasets and writing a function that filters the joined dataset by country and determines, for each Games, whether that country was the host. The function also displays a plot comparing the number of medals the country won when it hosted versus when it did not. To draw a reasonable conclusion, I then compared the average number of medals all countries combined won when hosting versus in the preceding Games.

As stated earlier, I created a histogram of the difference in medals (the medals a country won when hosting minus the medals it won at the Games directly prior) to draw a reasonable conclusion.

We can see a positive host effect on the number of medals won: the overall difference is positive, so countries tend to win more medals when they host the Olympics than when they don’t.
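The host-versus-previous comparison boils down to a per-country subtraction followed by an average. Here is a minimal Python sketch of that calculation, not the project's own code. The country labels and medal counts are hypothetical placeholders, not results from the Olympic datasets.

```python
from statistics import mean

# Hypothetical (medals when hosting, medals at the Games directly prior) pairs.
# These are illustrative numbers only, NOT real Olympic results.
host_vs_previous = {
    "Country A": (46, 30),
    "Country B": (27, 29),
    "Country C": (58, 41),
    "Country D": (19, 12),
}

# Difference = medals won as host minus medals won at the previous Games.
differences = {
    country: hosted - previous
    for country, (hosted, previous) in host_vs_previous.items()
}

average_difference = mean(list(differences.values()))
print(differences)
print(f"Average host effect: {average_difference:+.1f} medals")
```

A positive average suggests a host effect, while individual countries (like the hypothetical Country B here) can still win fewer medals at home.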

Conclusion

In my diverse portfolio of projects spanning machine learning and data analysis, I’ve delved deep into various domains, from sports analytics in Formula 1, NFL, and Premier League to financial forecasting in stock markets and real estate.

Coming up: NFL MVP Linear/Logistic Regression Model, TMDB Movie Data Analysis, Alzheimer’s Classification Model

I hope my findings have been as interesting to you as they were to me!