Football - Match Win Prediction

Ashok Kumar Rayapati | Naveen Kumar Kalluri | Varun Sai Rachulapally

04-27-2022

Initial Proposal Plan

Key Peer Comments Summary

Data Summary

Dataset Description:
Odds.homeWin  Odds.draw  Odds.awayWin  halfTime_score_Home  halfTime_score_Away  fullTime_score_Home  fullTime_score_Away  homeTeam      awayTeam          winner
2.01          3.24       3.76          0                    2                    0                    2                    Fortaleza EC  CA Paranaense     AWAY_TEAM
3.77          3.06       2.10          0                    1                    0                    1                    Coritiba FBC  SC Internacional  AWAY_TEAM
2.90          2.93       2.60          3                    2                    3                    2                    SC Recife     Ceará SC          HOME_TEAM
2.14          3.24       3.39          0                    1                    1                    1                    Santos FC     RB Bragantino     DRAW
1.46          4.46       6.29          0                    1                    0                    1                    CR Flamengo   CA Mineiro        AWAY_TEAM
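
In the sample rows, the `winner` column appears to follow directly from the full-time score. The sketch below, in Python, re-types the five rows shown here and checks that relationship. The `Match` type and the `label_from_score` helper are hypothetical names introduced for illustration, and the rule is only verified against these five rows, not the full dataset.

```python
from typing import NamedTuple


class Match(NamedTuple):
    odds_home: float
    odds_draw: float
    odds_away: float
    ht_home: int
    ht_away: int
    ft_home: int
    ft_away: int
    home_team: str
    away_team: str
    winner: str


# The five sample rows from the dataset description, typed in by hand.
SAMPLE = [
    Match(2.01, 3.24, 3.76, 0, 2, 0, 2, "Fortaleza EC", "CA Paranaense", "AWAY_TEAM"),
    Match(3.77, 3.06, 2.10, 0, 1, 0, 1, "Coritiba FBC", "SC Internacional", "AWAY_TEAM"),
    Match(2.90, 2.93, 2.60, 3, 2, 3, 2, "SC Recife", "Ceará SC", "HOME_TEAM"),
    Match(2.14, 3.24, 3.39, 0, 1, 1, 1, "Santos FC", "RB Bragantino", "DRAW"),
    Match(1.46, 4.46, 6.29, 0, 1, 0, 1, "CR Flamengo", "CA Mineiro", "AWAY_TEAM"),
]


def label_from_score(ft_home: int, ft_away: int) -> str:
    """Assumed labelling rule: compare full-time goals."""
    if ft_home > ft_away:
        return "HOME_TEAM"
    if ft_home < ft_away:
        return "AWAY_TEAM"
    return "DRAW"


# Rows whose stored label disagrees with the label implied by the score.
mismatches = [m for m in SAMPLE if label_from_score(m.ft_home, m.ft_away) != m.winner]
print(len(mismatches))  # 0 for the five sample rows
```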

Data Exploration

Plot 1 - Matches and Wins with respect to Home team and Away team

Plot 2 - Top Home Winners

Plot 3 - Top Away Winners

Plot 4 - Top Three teams with respect to Home wins, Away wins and Draw
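
The plots themselves are not reproduced in this text. As a rough sketch of the tallies that Plots 2 and 3 would be built from, assuming they count wins per team by venue, the Python below counts home wins, away wins, and draws. Only the five sample rows from the dataset description are used, so the counts are illustrative and say nothing about the real rankings.

```python
from collections import Counter

# (homeTeam, awayTeam, winner) for the five sample rows in the dataset description.
ROWS = [
    ("Fortaleza EC", "CA Paranaense", "AWAY_TEAM"),
    ("Coritiba FBC", "SC Internacional", "AWAY_TEAM"),
    ("SC Recife", "Ceará SC", "HOME_TEAM"),
    ("Santos FC", "RB Bragantino", "DRAW"),
    ("CR Flamengo", "CA Mineiro", "AWAY_TEAM"),
]

home_wins = Counter()   # wins credited to the team playing at home
away_wins = Counter()   # wins credited to the visiting team
draws = Counter()       # draws credited to both teams

for home, away, winner in ROWS:
    if winner == "HOME_TEAM":
        home_wins[home] += 1
    elif winner == "AWAY_TEAM":
        away_wins[away] += 1
    else:
        draws[home] += 1
        draws[away] += 1

# most_common(3) returns the three largest tallies, as in a "top three" plot.
print(home_wins.most_common(3))
print(away_wins.most_common(3))
print(draws.most_common(3))
```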

Complete Data View

Odds.homeWin  Odds.draw  Odds.awayWin  halfTime_score_Home  halfTime_score_Away  fullTime_score_Home  fullTime_score_Away  homeTeam      awayTeam          winner
2.01          3.24       3.76          0                    2                    0                    2                    Fortaleza EC  CA Paranaense     AWAY_TEAM
3.77          3.06       2.10          0                    1                    0                    1                    Coritiba FBC  SC Internacional  AWAY_TEAM
2.90          2.93       2.60          3                    2                    3                    2                    SC Recife     Ceará SC          HOME_TEAM
2.14          3.24       3.39          0                    1                    1                    1                    Santos FC     RB Bragantino     DRAW
1.46          4.46       6.29          0                    1                    0                    1                    CR Flamengo   CA Mineiro        AWAY_TEAM

Partition and pre-processing of Data

Train Data (number of rows)
## [1] 14883
Test Data (number of rows)
## [1] 6377
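
The two counts add up to 21,260 and put roughly 70% of the data in the training set. The Python sketch below checks that ratio and shows a plain shuffled 70/30 split. The `split_indices` helper, the seed, and the 0.7 fraction are assumptions inferred from the counts; the partitioning routine actually used is not shown in this report. A plain 70% cut yields 14,882 training rows, one fewer than reported, so the original routine presumably rounds differently (for example, per class in a stratified split).

```python
import random


def split_indices(n_rows: int, train_fraction: float, seed: int = 0):
    """Shuffle row indices and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


# Counts reported in the partition step.
n_train_reported, n_test_reported = 14883, 6377
n_total = n_train_reported + n_test_reported
observed_fraction = n_train_reported / n_total

train_idx, test_idx = split_indices(n_total, 0.7)

print(n_total)                       # 21260
print(round(observed_fraction, 4))   # 0.7
print(len(train_idx), len(test_idx))
```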

AI/ML Models and Results

Naive Bayes model
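
As a reminder of how a naive Bayes classifier scores a match, the Python sketch below fits a Gaussian naive Bayes model by hand: one prior per class, plus a mean and variance per feature. It is a toy under several assumptions. It uses only the three odds columns as features, trains on the five sample rows from the dataset description, and adds a hypothetical `VAR_FLOOR` so that classes with a single row do not get zero variance. The features and settings behind the reported results are not stated here, so this does not reproduce them.

```python
import math
from collections import defaultdict

# (Odds.homeWin, Odds.draw, Odds.awayWin) -> winner, from the five sample rows.
TRAIN = [
    ((2.01, 3.24, 3.76), "AWAY_TEAM"),
    ((3.77, 3.06, 2.10), "AWAY_TEAM"),
    ((2.90, 2.93, 2.60), "HOME_TEAM"),
    ((2.14, 3.24, 3.39), "DRAW"),
    ((1.46, 4.46, 6.29), "AWAY_TEAM"),
]
VAR_FLOOR = 0.01  # hypothetical smoothing term added to every variance


def fit(rows):
    """Estimate log-prior, per-feature means and variances for each class."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    for label, feats in by_class.items():
        n = len(feats)
        columns = list(zip(*feats))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - mu) ** 2 for x in col) / n + VAR_FLOOR
            for col, mu in zip(columns, means)
        ]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def log_posterior(model, features):
    """Unnormalised log-posterior: log-prior plus summed Gaussian log-densities."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - mu) ** 2 / (2 * v)
            for x, mu, v in zip(features, means, variances)
        )
        scores[label] = log_prior + log_lik
    return scores


def predict(model, features):
    scores = log_posterior(model, features)
    return max(scores, key=scores.get)


model = fit(TRAIN)
print(predict(model, (2.90, 2.93, 2.60)))  # the HOME_TEAM training row
```

The "naive" part is the sum over features inside `log_posterior`: each odds column contributes independently given the class.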

## Confusion Matrix and Statistics
## 
##            y_pred
##             AWAY_TEAM DRAW HOME_TEAM
##   AWAY_TEAM      1392  526        41
##   DRAW            195 1318       171
##   HOME_TEAM        47  839      1848
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7148          
##                  95% CI : (0.7035, 0.7258)
##     No Information Rate : 0.4207          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5753          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: AWAY_TEAM Class: DRAW Class: HOME_TEAM
## Sensitivity                    0.8519      0.4912           0.8971
## Specificity                    0.8805      0.9009           0.7948
## Pos Pred Value                 0.7106      0.7827           0.6759
## Neg Pred Value                 0.9452      0.7091           0.9418
## Prevalence                     0.2562      0.4207           0.3230
## Detection Rate                 0.2183      0.2067           0.2898
## Detection Prevalence           0.3072      0.2641           0.4287
## Balanced Accuracy              0.8662      0.6961           0.8459
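
The headline figures can be recomputed from the nine cell counts. The Python sketch below derives accuracy, Cohen's kappa, and two per-class ratios from the matrix as printed. One point worth checking against the original code: the reported Sensitivity equals the diagonal divided by the column total, and the columns are labelled `y_pred`. If the rows hold the actual outcomes (an assumption, since the output does not label them), then the reported Sensitivity is really precision, the reported Pos Pred Value is really recall, and the No Information Rate reflects the share of predicted draws rather than actual draws.

```python
LABELS = ["AWAY_TEAM", "DRAW", "HOME_TEAM"]

# Counts copied from the confusion matrix above: rows as printed, columns = y_pred.
MATRIX = [
    [1392, 526, 41],
    [195, 1318, 171],
    [47, 839, 1848],
]


def accuracy(m):
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def cohen_kappa(m):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(m)
    total = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    p_observed = accuracy(m)
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def diagonal_over_column(m, k):
    """Matches the reported 'Sensitivity' for class k."""
    return m[k][k] / sum(m[i][k] for i in range(len(m)))


def diagonal_over_row(m, k):
    """Matches the reported 'Pos Pred Value' for class k."""
    return m[k][k] / sum(m[k])


print(round(accuracy(MATRIX), 4))     # 0.7148
print(round(cohen_kappa(MATRIX), 4))  # 0.5753
for k, label in enumerate(LABELS):
    print(label,
          round(diagonal_over_column(MATRIX, k), 4),
          round(diagonal_over_row(MATRIX, k), 4))
```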

Deep learning model using H2O

Summary

Takeaways