Analysis of the IPL 2023 auction data set (source: Kaggle, date: 10-10-2023).
There are a total of 568 players, including players retained by their respective squads.
# IPL 2023 auction data set
data=read.csv("C:/Users/jsunn/Downloads/ipl_2023_dataset.csv")
#summary and structure of data
summary(data)
##  Player.Name         Base.Price            Type           Cost.in.Rs...CR.
##  Length:568         Length:568         Length:568         Min.   : 0.0000
##  Class :character   Class :character   Class :character   1st Qu.: 0.0000
##  Mode  :character   Mode  :character   Mode  :character   Median : 0.0000
##                                                           Mean   : 0.6872
##                                                           3rd Qu.: 0.2000
##                                                           Max.   :18.5000
##                                                           NA's   :325
##  Cost.in....K.      X2022.Squad        X2023.Squad
##  Min.   :   0.00    Length:568         Length:568
##  1st Qu.:   0.00    Class :character   Class :character
##  Median :   0.00    Mode  :character   Mode  :character
##  Mean   :  82.47
##  3rd Qu.:  24.00
##  Max.   :2220.00
##  NA's   :325
str(data)
## 'data.frame': 568 obs. of 7 variables:
## $ Player.Name : chr "Shivam Mavi" "Joshua Little" "Kane Williamson" "K.S. Bharat" ...
## $ Base.Price : chr "4000000" "5000000" "20000000" "2000000" ...
## $ Type : chr "BOWLER" "BOWLER" "BATSMAN" "WICKETKEEPER" ...
## $ Cost.in.Rs...CR.: num 6 4.4 2 1.2 0.5 0.5 0.2 0 0 0 ...
## $ Cost.in....K. : int 720 528 240 144 60 60 24 0 0 0 ...
## $ X2022.Squad : chr "KKR" "" "SRH" "DC" ...
## $ X2023.Squad : chr "GT" "GT" "GT" "GT" ...
#top few players
head(data)
##       Player.Name Base.Price         Type Cost.in.Rs...CR. Cost.in....K.
## 1     Shivam Mavi    4000000       BOWLER              6.0           720
## 2   Joshua Little    5000000       BOWLER              4.4           528
## 3 Kane Williamson   20000000      BATSMAN              2.0           240
## 4     K.S. Bharat    2000000 WICKETKEEPER              1.2           144
## 5    Mohit Sharma    5000000       BOWLER              0.5            60
## 6     Odean Smith    5000000  ALL-ROUNDER              0.5            60
##   X2022.Squad X2023.Squad
## 1         KKR          GT
## 2                      GT
## 3         SRH          GT
## 4          DC          GT
## 5                      GT
## 6        PBKS          GT
For the plots, we use the ggplot2 library. ggplot2 is a popular R data visualization package that provides a flexible framework for creating a wide range of high-quality, customized plots for data analysis and presentation.
library(ggplot2)
# Create a table of the count of players in each base price category
a <- table(data$Base.Price)
a
##
## 10000000  1500000 15000000  2000000 20000000  3000000  4000000  5000000
##       20       10        1      274       19        4        7       61
##  7500000 Retained
##        9      163
# Convert the result to a data frame for plotting
a_df <- as.data.frame(a)
a_df
##        Var1 Freq
## 1  10000000   20
## 2   1500000   10
## 3  15000000    1
## 4   2000000  274
## 5  20000000   19
## 6   3000000    4
## 7   4000000    7
## 8   5000000   61
## 9   7500000    9
## 10 Retained  163
# Use the base price labels themselves for the x-axis;
# rownames(a_df) would only give the row numbers "1" to "10"
a_df$Base.Price <- as.character(a_df$Var1)
# Create the bar plot
ggplot(a_df, aes(x = Base.Price, y = Freq)) +
geom_bar(stat = "identity", fill = "blue") +
labs(title = "Number of Players vs. Base Price", x = "Base Price", y = "Number of Players")
From this plot, we observe that most players either had a base price of 20 lakh rupees (274 players) or were retained by their respective teams (163 players).
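Because Base.Price is stored as text, its categories sort alphabetically (10000000 comes before 1500000), as in the table above. A minimal sketch, assuming the counts printed above are copied by hand into a small data frame, orders the bars by numeric price and labels them in lakh. The names `counts`, `price_num`, and `label` are illustrative and not part of the original script.

```r
library(ggplot2)

# Counts copied from the table() output above; "Retained" has no numeric price
counts <- data.frame(
  Base.Price = c("10000000", "1500000", "15000000", "2000000", "20000000",
                 "3000000", "4000000", "5000000", "7500000", "Retained"),
  Freq = c(20, 10, 1, 274, 19, 4, 7, 61, 9, 163)
)

# Numeric price in rupees; "Retained" becomes NA (warning suppressed on purpose)
counts$price_num <- suppressWarnings(as.numeric(counts$Base.Price))

# Readable labels: price in lakh (1 lakh = 100,000 rupees)
counts$label <- ifelse(is.na(counts$price_num), "Retained",
                       paste(counts$price_num / 1e5, "lakh"))

# Order the bars by numeric price, with "Retained" placed last
ord <- order(counts$price_num, na.last = TRUE)
counts$label <- factor(counts$label, levels = counts$label[ord])

p <- ggplot(counts, aes(x = label, y = Freq)) +
  geom_col(fill = "blue") +
  labs(title = "Number of Players vs. Base Price",
       x = "Base Price", y = "Number of Players")
print(p)
```

With the real data, the same conversion could be applied to `a_df` instead of the hand-copied `counts`.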
# histogram of the number of players in each sold price category
ggplot(data, aes(x = Cost.in.Rs...CR.)) +
geom_histogram(binwidth = 5, fill = "green", color = "red") +
labs(title = "players", x = "price", y = "Frequency")
## Warning: Removed 325 rows containing non-finite values (`stat_bin()`).
Most players fall in the lowest bin, which covers sold prices up to 2.5 crore INR; far fewer players appear in the higher sold price bins.
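With a bin width of 5 crore, most sold prices land in a single bar. A sketch of a finer-grained histogram follows. It uses only the first ten sold prices shown in the str() output as stand-in data, and the 0.5 crore bin width is an assumption rather than a tuned choice. The names `sold_cr` and `sample_df` are illustrative; with the real data, data$Cost.in.Rs...CR. would take their place.

```r
library(ggplot2)

# First ten sold prices (crore INR) from the str() output above; stand-in data only
sold_cr <- c(6, 4.4, 2, 1.2, 0.5, 0.5, 0.2, 0, 0, 0)
sample_df <- data.frame(sold_cr = sold_cr)

# Drop missing prices explicitly instead of relying on the ggplot2 warning
sample_df <- sample_df[!is.na(sample_df$sold_cr), , drop = FALSE]

# 0.5 crore bins aligned at 0, so the low-price region is resolved
p <- ggplot(sample_df, aes(x = sold_cr)) +
  geom_histogram(binwidth = 0.5, boundary = 0, fill = "green", color = "red") +
  labs(title = "Sold prices of players",
       x = "Sold price (crore INR)", y = "Number of players")
print(p)
```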
#scatter plot based on base price and sold price.
ggplot(data, aes(x = Base.Price, y = Cost.in.Rs...CR.)) +
geom_point(color = "red") +
labs(title = "price comparison", x = "base price", y = "sold price in cr")
## Warning: Removed 325 rows containing missing values (`geom_point()`).
This plot shows that most of the players who were bought had a base price of 20 lakh INR, and that very few players sold for more than 10 crore INR; the highest sold price was 18.5 crore INR.
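Base.Price is a character column, so the scatter plot treats it as a set of categories rather than as amounts. A sketch, using the six rows shown by head(data) as stand-in data, converts the base price to crore INR so that both axes share a unit. The names `sample_df`, `base_cr`, and `sold_cr` are illustrative.

```r
library(ggplot2)

# The six rows printed by head(data) above; stand-in for the full data set
sample_df <- data.frame(
  Base.Price = c("4000000", "5000000", "20000000", "2000000", "5000000", "5000000"),
  sold_cr = c(6.0, 4.4, 2.0, 1.2, 0.5, 0.5)
)

# 1 crore = 10,000,000 rupees; a non-numeric entry such as "Retained" would become NA
sample_df$base_cr <- suppressWarnings(as.numeric(sample_df$Base.Price)) / 1e7

p <- ggplot(sample_df, aes(x = base_cr, y = sold_cr)) +
  geom_point(color = "red") +
  # Dashed reference line where sold price equals base price
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(title = "Price comparison",
       x = "Base price (crore INR)", y = "Sold price (crore INR)")
print(p)
```

Points on the dashed line were sold at their base price; points above it were bid up.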
a=table(data$X2023.Squad)
b=names(a)
a#displaying no of players in each squad
##
##    CSK     DC     GT    KKR    LSG     MI   PBKS    RCB     RR    SRH Unsold
##     25     25     25     22     25     24     22     25     25     25    325
# Percentage share of each squad, rounded to whole numbers
share <- round(a / sum(a) * 100)
# Keep the counts in a and store the percentage labels separately
pct <- paste0(share, "%")
The table above shows the number of players in each 2023 squad, with unsold players counted under "Unsold".
# Create a data frame with the data to be plotted:
# numeric counts for the slice sizes, percentage strings for the labels
c <- data.frame(category = b, value = as.vector(a), label = pct)
# Create the pie chart using ggplot2
# Share of each squad in the total number of players
ggplot(c, aes(x = "", y = value, fill = category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = rainbow(length(b))) +
  labs(title = "Player share of squads")
The pie chart shows the same squad counts as percentage shares of all players.
# Box plot of sold prices (crore INR)
ggplot(data, aes(x = Cost.in.Rs...CR.)) +
  geom_boxplot(fill = "orange", color = "black") +
  labs(title = "Sold prices of players", x = "Sold price (crore INR)", y = NULL)
## Warning: Removed 325 rows containing non-finite values (`stat_boxplot()`).
There are outliers in this data, but we cannot remove them because they represent the actual sold prices of players.
summary(data$Cost.in.Rs...CR.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  0.0000  0.0000  0.0000  0.6872  0.2000 18.5000     325
This is the numerical summary behind the box plot above.
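The outliers in the box plot follow from the 1.5 × IQR rule that geom_boxplot uses by default. A small sketch, plugging in the quartiles from the summary above and checking the six sold prices shown by head(data), makes the cut-off explicit. The variable names are illustrative and not part of the original script.

```r
# Quartiles taken from summary(data$Cost.in.Rs...CR.) above
q1 <- 0
q3 <- 0.2
iqr <- q3 - q1

# Upper fence under the 1.5 * IQR rule; points beyond it are drawn as outliers
upper_fence <- q3 + 1.5 * iqr

# Sold prices (crore INR) of the six players shown by head(data)
sold_cr <- c(6.0, 4.4, 2.0, 1.2, 0.5, 0.5)
is_outlier <- sold_cr > upper_fence

print(upper_fence)
print(data.frame(sold_cr, is_outlier))
```

With these quartiles the upper fence works out to about 0.5 crore INR, so even moderately priced players are flagged as outliers. This reflects how many players have a sold price of 0, not a problem with the higher prices.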