IPL 2023 Auction

Analysis of the IPL 2023 auction data set (source: Kaggle; date: 10-10-2023).

There are a total of 568 players, including players retained by their respective squads.

#IPL 2023 auction data set
data=read.csv("C:/Users/jsunn/Downloads/ipl_2023_dataset.csv")
#summary and structure of data
summary(data)
##  Player.Name         Base.Price            Type           Cost.in.Rs...CR. 
##  Length:568         Length:568         Length:568         Min.   : 0.0000  
##  Class :character   Class :character   Class :character   1st Qu.: 0.0000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 0.0000  
##                                                           Mean   : 0.6872  
##                                                           3rd Qu.: 0.2000  
##                                                           Max.   :18.5000  
##                                                           NA's   :325      
##  Cost.in....K.     X2022.Squad        X2023.Squad       
##  Min.   :   0.00   Length:568         Length:568        
##  1st Qu.:   0.00   Class :character   Class :character  
##  Median :   0.00   Mode  :character   Mode  :character  
##  Mean   :  82.47                                        
##  3rd Qu.:  24.00                                        
##  Max.   :2220.00                                        
##  NA's   :325
str(data)
## 'data.frame':    568 obs. of  7 variables:
##  $ Player.Name     : chr  "Shivam Mavi" "Joshua Little" "Kane Williamson" "K.S. Bharat" ...
##  $ Base.Price      : chr  "4000000" "5000000" "20000000" "2000000" ...
##  $ Type            : chr  "BOWLER" "BOWLER" "BATSMAN" "WICKETKEEPER" ...
##  $ Cost.in.Rs...CR.: num  6 4.4 2 1.2 0.5 0.5 0.2 0 0 0 ...
##  $ Cost.in....K.   : int  720 528 240 144 60 60 24 0 0 0 ...
##  $ X2022.Squad     : chr  "KKR" "" "SRH" "DC" ...
##  $ X2023.Squad     : chr  "GT" "GT" "GT" "GT" ...
#top few players
head(data)
##       Player.Name Base.Price         Type Cost.in.Rs...CR. Cost.in....K.
## 1     Shivam Mavi    4000000       BOWLER              6.0           720
## 2   Joshua Little    5000000       BOWLER              4.4           528
## 3 Kane Williamson   20000000      BATSMAN              2.0           240
## 4     K.S. Bharat    2000000 WICKETKEEPER              1.2           144
## 5    Mohit Sharma    5000000       BOWLER              0.5            60
## 6     Odean Smith    5000000  ALL-ROUNDER              0.5            60
##   X2022.Squad X2023.Squad
## 1         KKR          GT
## 2                      GT
## 3         SRH          GT
## 4          DC          GT
## 5                      GT
## 6        PBKS          GT
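Note that `str(data)` reports `Base.Price` as a character column, because retained players carry the text "Retained" instead of an amount. A minimal sketch of one way to get a numeric base price follows. The vector `base_price` is a small illustrative sample copied from the outputs above, not the full data set, and the names `base_price_rs` and `base_price_lakh` are hypothetical.

```r
# Illustrative values copied from the outputs above (not the full data set)
base_price <- c("4000000", "5000000", "20000000", "2000000", "Retained")

# Mark "Retained" as missing first, so as.numeric() does not emit a coercion warning
base_price[base_price == "Retained"] <- NA
base_price_rs <- as.numeric(base_price)

# Express the base price in lakh (1 lakh = 100,000 rupees)
base_price_lakh <- base_price_rs / 1e5
print(base_price_lakh)
```

With a numeric column, the base prices sort and plot on a true numeric scale, while retained players stay visible as `NA`.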

Including Plots

For the plots, we use the ggplot2 library. ggplot2 is a popular R data visualization package that provides a flexible framework for creating a wide range of customized graphs and plots for data analysis and presentation.

library(ggplot2)
# Create a table of the count of players in each base price category
a <- table(data$Base.Price)
a
## 
## 10000000  1500000 15000000  2000000 20000000  3000000  4000000  5000000 
##       20       10        1      274       19        4        7       61 
##  7500000 Retained 
##        9      163
# Convert the result to a data frame for plotting
a_df <- as.data.frame(a)
a_df
##        Var1 Freq
## 1  10000000   20
## 2   1500000   10
## 3  15000000    1
## 4   2000000  274
## 5  20000000   19
## 6   3000000    4
## 7   4000000    7
## 8   5000000   61
## 9   7500000    9
## 10 Retained  163
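a_df$Base.Price <- as.character(a_df$Var1)
a_df$Base.Price # base price category of each row (rownames(a_df) would only give row numbers)
##  [1] "10000000" "1500000"  "15000000" "2000000"  "20000000" "3000000" 
##  [7] "4000000"  "5000000"  "7500000"  "Retained"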
# Create the bar plot
ggplot(a_df, aes(x = Base.Price, y = Freq)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Number of Players vs. Base Price", x = "Base Price", y = "Number of Players")

From this plot, we observe that most players were either retained by their respective teams (163 players) or had a base price of 20 lakh rupees (274 players).
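The `table()` output above lists the categories in alphabetical order (10000000 comes before 1500000), because the base prices are stored as text. The sketch below shows one possible way to order the categories by amount before plotting. The counts are copied from the table above; `price_df`, `numeric_price`, and `ordered_levels` are hypothetical names.

```r
# Counts copied from the table() output above
counts <- c("10000000" = 20, "1500000" = 10, "15000000" = 1, "2000000" = 274,
            "20000000" = 19, "3000000" = 4, "4000000" = 7, "5000000" = 61,
            "7500000" = 9, "Retained" = 163)
price_df <- data.frame(Base.Price = names(counts), Freq = unname(counts),
                       stringsAsFactors = FALSE)

# Numeric value of each category; "Retained" becomes NA and is sorted last
numeric_price <- suppressWarnings(as.numeric(price_df$Base.Price))
ordered_levels <- price_df$Base.Price[order(numeric_price, na.last = TRUE)]

# A factor with explicit levels makes plotting functions keep this order
price_df$Base.Price <- factor(price_df$Base.Price, levels = ordered_levels)
print(levels(price_df$Base.Price))
```

Passing a data frame like `price_df` to `ggplot()` would then draw the bars from the lowest base price to the highest, with "Retained" at the end.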

#histogram of the number of players in each sold price category
ggplot(data, aes(x = Cost.in.Rs...CR.)) + 
  geom_histogram(binwidth = 5, fill = "green", color = "red") + 
  labs(title = "Players by sold price", x = "Sold price (cr INR)", y = "Frequency")
## Warning: Removed 325 rows containing non-finite values (`stat_bin()`).

More players fall in the 20 lakh to 2.5 cr INR range than in any other sold price category.
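The 2.5 cr upper limit of that range follows from how the bins are placed: with `binwidth = 5` and no `center` or `boundary` argument, ggplot2 is understood to put a bin boundary at half the bin width, so the first bin runs from -2.5 to 2.5. The base R sketch below reproduces that binning under this assumption. The vector `sold_cr` is an illustrative sample (the first ten values shown by `str(data)` plus the maximum of 18.5), not the full column.

```r
# Illustrative sold prices in cr INR: first ten values from str(data) plus the maximum
sold_cr <- c(6, 4.4, 2, 1.2, 0.5, 0.5, 0.2, 0, 0, 0, 18.5)

# Bin edges assumed to match binwidth = 5 with a boundary at 2.5
breaks <- seq(-2.5, 22.5, by = 5)

# Count the values per bin; intervals are closed on the right, e.g. (-2.5, 2.5]
bin_counts <- table(cut(sold_cr, breaks = breaks))
print(bin_counts)
```

Entries recorded with a sold price of 0 also land in the first bin, which is worth keeping in mind when reading the tallest bar.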

#scatter plot based on base price and sold price.
ggplot(data, aes(x = Base.Price, y = Cost.in.Rs...CR.)) + 
  geom_point(color = "red") + 
  labs(title = "price comparison", x = "base price", y = "sold price in cr")
## Warning: Removed 325 rows containing missing values (`geom_point()`).

This plot shows that most of the players who were bought were in the base price category of 20 lakh INR, and that very few players had sold prices above 10 cr INR, with the highest at 18.5 cr.
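To compare the two prices directly, both need the same unit: the base price is given in rupees and the sold price in crore (1 crore = 10,000,000 rupees). The sketch below computes the sold price as a multiple of the base price for the first three players shown by `head(data)`. The column names `base_rs`, `sold_cr`, `base_cr`, and `multiple` are hypothetical and not part of the data set.

```r
# First three rows from head(data); column names here are illustrative
players <- data.frame(
  Player.Name = c("Shivam Mavi", "Joshua Little", "Kane Williamson"),
  base_rs = c(4000000, 5000000, 20000000),
  sold_cr = c(6.0, 4.4, 2.0),
  stringsAsFactors = FALSE
)

# Convert the base price from rupees to crore (1 crore = 1e7 rupees)
players$base_cr <- players$base_rs / 1e7

# Sold price as a multiple of the base price
players$multiple <- players$sold_cr / players$base_cr
print(players[, c("Player.Name", "base_cr", "sold_cr", "multiple")])
```

A multiple of 1 means the player went at the base price, while larger values indicate competitive bidding.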

a <- table(data$X2023.Squad)
b <- names(a)
a # number of players in each squad
## 
##    CSK     DC     GT    KKR    LSG     MI   PBKS    RCB     RR    SRH Unsold 
##     25     25     25     22     25     24     22     25     25     25    325
share = round(a/sum(a)*100)
a = paste(share,"%",sep="")

The table above shows the number of players in each squad; the Unsold category covers the 325 players who were not bought.
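The percentage calculation can be checked by hand with the counts from the table. The sketch below repeats it and, as an optional variation that is not part of the original analysis, also computes the shares among bought or retained players only. The names `squad_counts`, `share_all`, `sold_only`, and `share_sold` are hypothetical.

```r
# Counts copied from the table above
squad_counts <- c(CSK = 25, DC = 25, GT = 25, KKR = 22, LSG = 25, MI = 24,
                  PBKS = 22, RCB = 25, RR = 25, SRH = 25, Unsold = 325)

# Share of all 568 players, rounded to whole percent as in the analysis
share_all <- round(squad_counts / sum(squad_counts) * 100)
print(share_all)

# Because each share is rounded separately, the total need not be exactly 100
print(sum(share_all))

# Variation: shares among the 243 players who ended up in a squad
sold_only <- squad_counts[names(squad_counts) != "Unsold"]
share_sold <- round(sold_only / sum(sold_only) * 100, 1)
print(share_sold)
```

Since the Unsold group makes up more than half of all players, excluding it makes the differences between squads easier to see.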

# Create a data frame with the data to be plotted
# (numeric share for the slice size, percentage text for the label)
c <- data.frame(category = b, value = as.numeric(share), label = a)

# Create the pie chart using ggplot2
# share of each squad in the number of players
ggplot(c, aes(x = "", y = value, fill = category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = rainbow(length(b))) +
  labs(title = "Player share of squads")

The pie chart shows the same breakdown as percentage shares.

#box plot of sold prices in cr INR
ggplot(data, aes(x = Cost.in.Rs...CR.)) + 
  geom_boxplot(fill = "orange", color = "black") + 
  labs(title = "Distribution of sold prices", x = "Sold price (cr INR)")
## Warning: Removed 325 rows containing non-finite values (`stat_boxplot()`).

There are outliers in this data, but we cannot remove them because they represent the actual sold prices of players.

summary(data$Cost.in.Rs...CR.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.6872  0.2000 18.5000     325

This is the numerical summary of the box plot above.
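The quartiles in this summary also explain why so many points appear as outliers. A box plot conventionally flags values beyond 1.5 times the interquartile range from the box. The sketch below applies that rule to the quartiles reported above, assuming the box plot uses the same quartiles as `summary()`. The names `q1`, `q3`, `iqr`, and `upper_fence` are hypothetical.

```r
# Quartiles taken from the summary() output above (cr INR)
q1 <- 0.0
q3 <- 0.2

# Interquartile range and the conventional upper fence (Q3 + 1.5 * IQR)
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
print(upper_fence)

# Sample sold prices from head(data): which ones exceed the fence?
sold_cr <- c(6.0, 4.4, 2.0, 1.2, 0.5, 0.5)
print(sold_cr[sold_cr > upper_fence + 1e-9])
```

Under this rule, every sold price above roughly 0.5 cr counts as an outlier, which reflects how strongly the zero-cost entries pull the quartiles down rather than any error in the data.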