Introduction

This app predicts the probabilities of India winning, losing, or drawing a cricket match, using multinomial logistic regression trained on historical match results.
The model takes three user-selected inputs (opponent, venue, and format: Test / ODI / T20) and returns the predicted probability of each outcome: Win, Loss, or Draw.
The model is deliberately simple, using only a few key predictors, and is trained on historical data; past performance is no guarantee of future results.

ML algorithm used and evaluation

The app uses a multinomial logistic regression model fitted with the multinom() function from the nnet package. The model is trained on historical India match results and predicts the outcome (Win/Loss/Draw) from opponent, venue, and format.
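As a sketch of what multinom() fits here, assuming its standard baseline-category parameterisation: Draw is the reference outcome (the summary output reports coefficients only for Loss and Win), and $x$ is the vector of 0/1 dummy variables for the non-baseline levels of Opponent, Venue, and Format (baselines Australia, Away, and ODI, inferred from the coefficient names).

```latex
\log\frac{P(Y = k \mid x)}{P(Y = \text{Draw} \mid x)} = \beta_{k0} + \boldsymbol{\beta}_k^{\top} x,
\qquad k \in \{\text{Loss}, \text{Win}\}

P(Y = k \mid x) = \frac{\exp\left(\beta_{k0} + \boldsymbol{\beta}_k^{\top} x\right)}
  {1 + \sum_{j \in \{\text{Loss}, \text{Win}\}} \exp\left(\beta_{j0} + \boldsymbol{\beta}_j^{\top} x\right)},
\qquad
P(Y = \text{Draw} \mid x) = \frac{1}{1 + \sum_{j \in \{\text{Loss}, \text{Win}\}} \exp\left(\beta_{j0} + \boldsymbol{\beta}_j^{\top} x\right)}
```

Each non-baseline outcome has 9 coefficients (an intercept plus 4 opponent, 2 venue, and 2 format dummies), which accounts for the 18 free parameters reported as "18 variable" in the fitting trace.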

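# readxl reads the Excel workbook; nnet provides multinom()
library(readxl)
library(nnet)

# Load match results and drop no-result (NR) matches
data <- read_excel("data/Ind vs opponent results.xlsx")
df <- subset(data, Result != "NR")

# Merge ties and draws into a single Draw outcome
df$Result[df$Result %in% c("Tied","Draw")] <- "Draw"

# Treat the outcome and all predictors as categorical factors;
# the first level alphabetically becomes the reference (baseline) level
df$Result <- factor(df$Result)
df$Opponent <- factor(df$Opponent)
df$Venue <- factor(df$Venue)
df$Format <- factor(df$Format)

# Fit the multinomial logistic regression (Draw is the baseline outcome)
model <- multinom(Result ~ Opponent + Venue + Format, data = df)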

## # weights:  30 (18 variable)
## initial  value 1316.137522 
## iter  10 value 1026.379392
## iter  20 value 985.422744
## iter  30 value 980.993645
## final  value 980.975357 
## converged

summary(model)

## Call:
## multinom(formula = Result ~ Opponent + Venue + Format, data = df)
## 
## Coefficients:
##      (Intercept) OpponentEngland OpponentPakistan OpponentSouth Africa OpponentWest Indies  VenueHome VenueNeutral
## Loss    5.564669      -0.5569458        -1.551056            0.2923433          -0.9534112 -0.3348392     10.57441
## Win     4.810498      -0.2380381        -1.438593            0.4048800          -0.6323947  0.4819899     11.23608
##      FormatT20 FormatTest
## Loss  7.999503  -4.934941
## Win   8.695043  -5.078633
## 
## Std. Errors:
##      (Intercept) OpponentEngland OpponentPakistan OpponentSouth Africa OpponentWest Indies VenueHome VenueNeutral FormatT20
## Loss   0.7515430       0.2852758        0.3506165            0.4222167           0.2996347 0.2078663     205.1103  106.8301
## Win    0.7553691       0.2951833        0.3626136            0.4329006           0.3088540 0.2151487     205.1104  106.8301
##      FormatTest
## Loss  0.7219215
## Win   0.7234679
## 
## Residual Deviance: 1961.951 
## AIC: 1997.951
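To make the coefficients concrete, the sketch below recomputes predicted probabilities by hand from the rounded values printed by summary(model), applying the softmax with Draw as the baseline. The helper name predict_probs() is hypothetical, and the factor levels (baselines Australia, Away, ODI) are inferred from the coefficient names and the data description. Note also the very large standard errors on VenueNeutral (about 205) and FormatT20 (about 107): this pattern usually indicates near-separation, for example very few Draws in those categories, so those two coefficients are poorly determined and should not be read literally.

```r
# Sketch: reproduce multinom() predicted probabilities by hand.
# Coefficients are copied (rounded) from summary(model); Draw is the
# baseline outcome, so its linear predictor is fixed at 0.
coefs <- rbind(
  Loss = c("(Intercept)" = 5.564669, OpponentEngland = -0.5569458,
           OpponentPakistan = -1.551056, "OpponentSouth Africa" = 0.2923433,
           "OpponentWest Indies" = -0.9534112, VenueHome = -0.3348392,
           VenueNeutral = 10.57441, FormatT20 = 7.999503,
           FormatTest = -4.934941),
  Win  = c("(Intercept)" = 4.810498, OpponentEngland = -0.2380381,
           OpponentPakistan = -1.438593, "OpponentSouth Africa" = 0.4048800,
           "OpponentWest Indies" = -0.6323947, VenueHome = 0.4819899,
           VenueNeutral = 11.23608, FormatT20 = 8.695043,
           FormatTest = -5.078633)
)

# Hypothetical helper: probabilities of Draw/Loss/Win for one match
predict_probs <- function(opponent, venue, format) {
  # Levels assumed from the coefficient names; the first of each is the baseline
  stopifnot(opponent %in% c("Australia", "England", "Pakistan",
                            "South Africa", "West Indies"),
            venue %in% c("Away", "Home", "Neutral"),
            format %in% c("ODI", "T20", "Test"))
  # 0/1 design row: intercept plus one dummy per non-baseline level
  x <- setNames(numeric(ncol(coefs)), colnames(coefs))
  x["(Intercept)"] <- 1
  dummies <- c(paste0("Opponent", opponent), paste0("Venue", venue),
               paste0("Format", format))
  x[intersect(dummies, names(x))] <- 1   # baseline levels have no column
  # Linear predictors relative to Draw, then a numerically stable softmax
  eta <- c(Draw = 0, drop(coefs %*% x))
  p <- exp(eta - max(eta))
  p / sum(p)
}

round(predict_probs("Australia", "Away", "ODI"), 3)
# approximately: Draw 0.003, Loss 0.678, Win 0.319
```

With the fitted object itself, predict(model, newdata = data.frame(Opponent = "Australia", Venue = "Away", Format = "ODI"), type = "probs") should return the same values up to coefficient rounding; working from the printed table simply makes the arithmetic visible.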

Results in graphical format

[Figure: plot of model results]

Overall accuracy is moderate: the model performs best at predicting Wins but struggles to classify Draws, likely because it has so few predictor variables.
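The accuracy statement above can be checked with a confusion matrix. The sketch below uses a hypothetical helper, evaluate_predictions(), that tabulates predicted against actual outcomes and reports overall accuracy plus per-class recall (the share of each actual outcome predicted correctly). The six labels in the example are made up purely to exercise the function and are not results from this model.

```r
# Hypothetical helper: confusion matrix, accuracy and per-class recall
evaluate_predictions <- function(actual, predicted) {
  # Use one shared, sorted level set so the table is square
  lv <- sort(union(as.character(actual), as.character(predicted)))
  cm <- table(Predicted = factor(predicted, levels = lv),
              Actual = factor(actual, levels = lv))
  list(
    confusion = cm,
    accuracy = sum(diag(cm)) / sum(cm),
    # Recall per actual class: correct predictions / matches of that class
    recall = setNames(diag(cm) / colSums(cm), lv)
  )
}

# Toy labels for illustration only (not data from this project)
actual    <- c("Win", "Win", "Loss", "Draw", "Win", "Loss")
predicted <- c("Win", "Loss", "Loss", "Win", "Win", "Loss")
res <- evaluate_predictions(actual, predicted)
res$confusion
res$accuracy   # 4 of 6 correct
res$recall
```

Applied to this project, evaluate_predictions(df$Result, predict(model, newdata = df)) would give in-sample figures, since predict() on a multinom fit returns the most probable class by default; a held-out split or cross-validation would give a fairer estimate of how well the model generalises.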

Note on Data Collection

Match results were manually compiled from ESPNcricinfo, covering India’s international matches across Test, ODI, and T20 formats against five major opponents: Australia, England, South Africa, Pakistan, and West Indies.
Each record contains four variables: Opponent, Venue, Format, and Result. Venue is categorized as Home, Away, or Neutral. No-result (NR) matches were excluded, and Tied/Drawn matches were merged into a single Draw category.
The dataset is limited in scope: only five opponents are included, and match-specific factors such as team composition, player form, and pitch conditions are not captured, so predictions should be interpreted with caution.

DDP Final Project Pitch Presentation
