Project Statement
Analyse the lifespan of British first-class cricketers to understand how the sport affected their longevity, and analyse their causes of death.
Read data file
#install.packages("ggplot2")
require(ggplot2)
## Loading required package: ggplot2
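cricketer_data <- "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/DAAG/cricketer.csv"
# stringsAsFactors = TRUE keeps left and cause as factors (R >= 4.0 reads strings as character by default)
cricketer <- read.csv(cricketer_data, header = TRUE, sep = ",", stringsAsFactors = TRUE)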
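The summary() output later in this report lists a count per level for left and cause, which only happens when those columns are factors. Since R 4.0.0, read.csv() leaves strings as character unless stringsAsFactors = TRUE is passed. A minimal sketch of the difference, using a three-row inline CSV that is made up for illustration (it is not the cricketer file):

```r
# Hypothetical three-row extract, typed inline so the example needs no download.
csv_text <- "left,year,life,cause
right,1890,102,alive
left,1850,70,inbed
right,1900,25,acd"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$cause)    # character: summary() reports only length and class
class(as_fct$cause)    # factor: summary() reports a count per level
summary(as_fct$cause)
```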
View file content and format
head(cricketer)
##   X  left year life dead acd kia inbed cause
## 1 1 right 1890  102    0   0   0     0 alive
## 2 2  left 1892  100    0   0   0     0 alive
## 3 3 right 1893   99    0   0   0     0 alive
## 4 4 right 1894   98    0   0   0     0 alive
## 5 5 right 1896   96    0   0   0     0 alive
## 6 6 right 1896   96    0   0   0     0 alive
Data summary and other counts
summary(cricketer)
##        X           left           year           life
##  Min.   :   1   left :1101   Min.   :1840   Min.   : 19.00
##  1st Qu.:1491   right:4859   1st Qu.:1878   1st Qu.: 49.00
##  Median :2992                Median :1908   Median : 63.00
##  Mean   :3036                Mean   :1906   Mean   : 61.84
##  3rd Qu.:4579                3rd Qu.:1935   3rd Qu.: 76.00
##  Max.   :6172                Max.   :1960   Max.   :102.00
##       dead             acd               kia              inbed
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000
##  Median :1.0000   Median :0.00000   Median :0.00000   Median :1.0000
##  Mean   :0.5683   Mean   :0.03154   Mean   :0.02013   Mean   :0.5367
##  3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:1.0000
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000
##    cause
##  acd  : 188
##  alive:2573
##  inbed:3199
##
##
##
table(cricketer$cause, cricketer$kia)
##
##            0    1
##   acd     68  120
##   alive 2573    0
##   inbed 3199    0
table(cricketer$cause)
##
##   acd alive inbed
##   188  2573  3199
table(cricketer$kia)
##
##    0    1
## 5840  120
table(cricketer$kia, cricketer$cause)
##
##      acd alive inbed
##   0   68  2573  3199
##   1  120     0     0
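As a consistency check, the cross-tabulation can be rebuilt from the counts printed above and its margins compared with the one-way tables. This is only a sketch that retypes the published counts by hand; the object name tab is introduced here for illustration.

```r
# Counts copied from the table(cause, kia) output above.
tab <- as.table(matrix(
  c(  68, 120,
    2573,   0,
    3199,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(cause = c("acd", "alive", "inbed"), kia = c("0", "1"))
))

addmargins(tab)        # appends row and column totals
margin.table(tab, 1)   # totals per cause: 188, 2573, 3199
margin.table(tab, 2)   # totals per kia value: 5840, 120
sum(tab)               # 5960 players in total
```

The margins agree with table(cricketer$cause) and table(cricketer$kia), and every kia = 1 record falls in the acd row.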
Extract subset of columns and rename
# kia flags players killed in action (per the DAAG dataset documentation), so the column is named accordingly
cricketer1 <- data.frame(handedness = cricketer$left, year = cricketer$year,
                         life = cricketer$life, dead = cricketer$dead,
                         killedInAction = cricketer$kia, cause = cricketer$cause)
head(cricketer1)
##   handedness year life dead killedInAction cause
## 1      right 1890  102    0              0 alive
## 2       left 1892  100    0              0 alive
## 3      right 1893   99    0              0 alive
## 4      right 1894   98    0              0 alive
## 5      right 1896   96    0              0 alive
## 6      right 1896   96    0              0 alive
summary(cricketer1)
##  handedness        year           life             dead
##  left :1101   Min.   :1840   Min.   : 19.00   Min.   :0.0000
##  right:4859   1st Qu.:1878   1st Qu.: 49.00   1st Qu.:0.0000
##               Median :1908   Median : 63.00   Median :1.0000
##               Mean   :1906   Mean   : 61.84   Mean   :0.5683
##               3rd Qu.:1935   3rd Qu.: 76.00   3rd Qu.:1.0000
##               Max.   :1960   Max.   :102.00   Max.   :1.0000
##  killedInAction       cause
##  Min.   :0.00000   acd  : 188
##  1st Qu.:0.00000   alive:2573
##  Median :0.00000   inbed:3199
##  Mean   :0.02013
##  3rd Qu.:0.00000
##  Max.   :1.00000
Analyse alive vs dead
In the dead column, “1” means dead and “0” means alive (status as of 1992)
liveVsdead <- table(cricketer1$dead, cricketer1$cause)
barplot(liveVsdead, main = "Alive vs Dead", xlab = "Status as of 1992", col = c("green", "red"), legend.text = c("Alive (0)", "Dead (1)"))

Analyse the subset of players who had died, focusing on cause of death
Box plot showing the spread of age at death
deathstats <- cricketer1[cricketer1$dead > 0, ]
boxplot(deathstats$life, ylab = "Age at death")

Scatter plot of birth year against age at death
Most deaths occurred at older ages, with fewer at younger ages
plot(deathstats$year, deathstats$life, main = "Age at death vs birth year",
     xlab = "Birth year", ylab = "Age at death", pch = 19)
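When reading this scatter plot, note that status is recorded as of 1992, so the age at death that can appear for a given birth year is capped at roughly 1992 minus that year. Players born late in the period therefore show up among the dead only if they died young. A small sketch of that ceiling, using the birth years from the summary quartiles (the variable names are illustrative):

```r
cutoff_year <- 1992                                 # year in which status was recorded
birth_year  <- c(1840, 1878, 1908, 1935, 1960)      # min, quartiles and max of year

# Oldest age at death that could be observed for each birth year
# (ignoring the exact month of birth).
max_observable_age <- cutoff_year - birth_year
data.frame(birth_year, max_observable_age)
```

For a player born in 1960 the ceiling is 32, which is why the upper edge of the scatter slopes downward towards the right.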

Bar plot comparing deaths “in bed”, which indicate a natural death, with accidental deaths. Accidental deaths were far fewer (188 against 3199)
# factor() drops the unused "alive" level; the bar names identify each cause, so no legend is needed
barplot(table(factor(deathstats$cause)), main = "Cause of death", xlab = "Cause of death", col = c("red", "blue"))

Histogram of age at death, showing that most of the players who died were in their 60s or older.
ggplot(deathstats, aes(x = life)) + geom_histogram(binwidth = 5) + xlab("Age at death")
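The statement about deaths at 60 or older can also be checked as a proportion instead of being read off the histogram. The sketch below uses a small made-up vector of ages at death, so its result says nothing about the real data; with the report's data the same expression would be applied to deathstats$life.

```r
# Hypothetical ages at death, for illustration only.
life <- c(23, 45, 58, 61, 67, 72, 75, 80, 84, 91)

# The mean of a logical vector is the proportion of TRUE values.
share_60_plus <- mean(life >= 60)
share_60_plus
```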

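Zoom in on accidental deaths
Bar plot showing that about 64% of accidental deaths (120 of 188) were players killed in action; in the source data, kia = 1 marks a player killed in action.
accidentstat <- cricketer[cricketer$acd > 0, ]
# table() orders the bars as kia = 0, then kia = 1, so the labels must follow that order
barplot(table(accidentstat$kia), main = "Accidental deaths", xlab = "Status as of 1992", col = c("red", "green"), names.arg = c("Other accident", "Killed in action"))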
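The share behind this bar chart can be recomputed from the counts printed earlier (68 accidental deaths with kia = 0 and 120 with kia = 1). A minimal sketch, with the counts typed in by hand and acd_counts as an illustrative name:

```r
# Counts copied from the acd row of the table(cause, kia) output.
acd_counts <- c("kia = 0" = 68, "kia = 1" = 120)

shares <- prop.table(acd_counts)   # each count divided by the total of 188
round(100 * shares, 1)             # percentages, one decimal place
```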

Histograms comparing lifespans for all players (age in 1992 for those still alive), players who had died, and players who died accidentally
hist(cricketer$life, main= "Lifespan of all Players", xlab = "Age")

hist(deathstats$life, main= "Life span of Players who died", xlab = "Age")

hist(accidentstat$life, main = "Players who died accidentally", xlab = "Age")

Conclusion
The histograms above show that most players lived long lives: ages at death are concentrated at older ages. Players who died accidentally, however, were largely under 50.
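The same comparison could be made numerically by summarising age at death per cause. The data frame below is invented purely to show the call; with the report's data, deathstats$life and deathstats$cause would take its place.

```r
# Hypothetical records, for illustration only.
toy <- data.frame(
  life  = c(24, 31, 47, 66, 72, 79, 85),
  cause = c("acd", "acd", "acd", "inbed", "inbed", "inbed", "inbed")
)

# Median age at death within each cause of death.
median_by_cause <- tapply(toy$life, toy$cause, median)
median_by_cause
```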