The following analysis compares the ages of professional soccer and hockey players. The data include the ages of all soccer players in the Bundesliga and all players in the NHL for the 2015-2016 season.
Do soccer or hockey players tend to keep playing professionally into their veteran years? That is, which sport has a larger share of veteran (older) players? For the purposes of this analysis, a veteran is any player aged 35 or older.
Skip to the analysis section if the data preparation below doesn't interest you.
Bundesliga Data:
library(jsonlite)
library(plyr)
footy<-fromJSON("https://raw.githubusercontent.com/jokecamp/FootballData/master/Germany/bundesliga-2015-2016-rosters.json")
## Read in the Bundesliga 2015-2016 rosters and convert the JSON into a nested list
The function below calculates each soccer player's age from their birthdate and converts the nested list (Team >> Player >> Characteristics) into a data frame.
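ageForTeam<-function(x)
{
  ## Flatten team x (a list of players) into a matrix of player characteristics
  team1<-laply(footy[[x]], identity)
  ## The first column holds the birthdate
  birthdate<-as.character(team1[,1])
  index<-1:nrow(team1)
  team<-data.frame(birthdate, index)
  ## Convert birthdates from "dd.mm.yyyy" strings to Date
  team$birthdate<-as.Date(team$birthdate, format="%d.%m.%Y")
  ## Approximate age in years as of today: days since birth / 365, rounded
  date<-Sys.Date()
  transform(team, age=as.numeric(round((date-birthdate)/365)))
}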
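As written, ageForTeam measures age as of the day the code is run (Sys.Date()) and rounds days/365 to the nearest whole year, so a player who is 23.6 years old is counted as 24. If ages in completed years on a fixed date are preferred (for example a date during the 2015-2016 season), a minimal sketch follows. The helper ageInYears and the reference date 2016-01-15 are hypothetical and are not part of the original analysis.

```r
## Hypothetical helper (not in the original analysis): age in completed
## years on a given reference date, vectorised over birthdates
ageInYears <- function(birthdate, refDate) {
  birthdate <- as.Date(birthdate)
  refDate <- as.Date(refDate)
  years <- as.integer(format(refDate, "%Y")) - as.integer(format(birthdate, "%Y"))
  ## Month and day as an integer mmdd, e.g. 115 for 15 January
  refMD <- as.integer(format(refDate, "%m%d"))
  birthMD <- as.integer(format(birthdate, "%m%d"))
  ## Subtract one year for players whose birthday has not yet come around
  years - ifelse(refMD >= birthMD, 0L, 1L)
}

## Example with an assumed reference date in the middle of the season
ageInYears(c("1993-12-19", "1990-05-27"), "2016-01-15")  # 22 25
```

Inside ageForTeam, the last line could then become transform(team, age=ageInYears(birthdate, refDate)) for a chosen refDate.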
The for loop then applies the function to each of the 18 Bundesliga teams and binds the results into a single data frame called soccer.
soccer<-data.frame()
for (i in 1:18) {
  soccer<-rbind(soccer, ageForTeam(i)) ## append team i's players
}
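The loop grows soccer with rbind() one team at a time. A common equivalent idiom builds a list of per-team data frames with lapply() and binds them once with do.call(rbind, ...); with the real data that would be do.call(rbind, lapply(1:18, ageForTeam)). The sketch below substitutes a hypothetical makeTeam() function with made-up ages so that it runs without the roster data.

```r
## Hypothetical stand-in for ageForTeam(): returns a small data frame per team
makeTeam <- function(i) data.frame(team = i, age = c(20 + i, 25 + i))

## Build all per-team data frames first, then bind them in a single call
combined <- do.call(rbind, lapply(1:3, makeTeam))
combined
```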
head(soccer) ##first 6 rows of the data
## birthdate index age
## 1 1993-12-19 1 23
## 2 1995-02-13 2 22
## 3 1994-03-13 3 23
## 4 1990-05-27 4 27
## 5 1993-08-15 5 24
## 6 1993-05-12 6 24
We now have the age of every player in the Bundesliga.
NHL Data:
library(xlsx)
puck <- read.xlsx("/Users/roberttalarico/Desktop/Coursera/NHL Ages.xlsx", sheetIndex = 1) ## Read in hockey data from the first sheet
head(puck) ## First 6 rows of data
## Last.Name DOB Age
## 1 Abdelkader 1987-02-25 28
## 2 Acciari 1991-12-01 24
## 3 Agostino 1992-04-30 23
## 4 Agozzino 1991-01-03 25
## 5 Alzner 1988-09-24 27
## 6 Anderson 1994-05-07 21
The spreadsheet already contains each NHL player's age, so no further processing is necessary.
Since the NHL has more players (n = 898) than the Bundesliga (n = 500), raw counts of veteran players would be misleading. Instead, we compare the proportion of veterans within each league.
##Overall Summary
summary(soccer$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.00 24.00 27.00 26.93 30.00 40.00
summary(puck$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 23.00 25.00 26.28 29.00 43.00
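Both the median (27 vs. 25) and the mean (26.93 vs. 26.28) show that the Bundesliga players are slightly older than the NHL players. Keep in mind that the soccer ages are calculated relative to Sys.Date(), the day the code is run, while the NHL ages come precomputed in the spreadsheet, so the two sets of ages may not refer to the same date.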
data1<-data.frame(age=soccer$age, sport="soccer")
data2<-data.frame(age=puck$Age, sport="hockey")
data<-rbind(data1, data2) ## Create one dataset called data
data$agecat<-cut(data$age, seq(16,45,9)) ##Divide into age groups
unique(data$agecat)
## [1] (16,25] (25,34] (34,43]
## Levels: (16,25] (25,34] (34,43]
The players are divided into three age groups: 17-25, 26-34, and 35-43.
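cut() builds intervals that are open on the left and closed on the right by default, so with breaks seq(16, 45, 9), which evaluates to 16, 25, 34, 43, an age of exactly 25 falls in (16,25] and 34 falls in (25,34]. Because the last break is 43, any age above 43 would become NA; that does not happen here because the oldest player in either league is 43. A quick check with a few illustrative ages:

```r
breaks <- seq(16, 45, 9)   # 16 25 34 43
ages <- c(17, 25, 26, 34, 35, 43, 44)
groups <- cut(ages, breaks)
## 44 lies above the last break, so its group is NA
data.frame(age = ages, group = groups)
```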
library(ggplot2)
ggplot(data=data)+
geom_bar(mapping=aes(x=agecat,y=..prop..,group=1))+
facet_grid(.~sport)+
labs(x="Age Group", title="Proportion of Each Age Group by Sport")
##proportions bar plot of age categories for hockey vs soccer
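In the plot, y=..prop.. together with group=1 makes each bar's height the share of that sport's players in the age group rather than a count, since all bars in a facet form one group. The same numbers can be checked as a table with prop.table(), normalising each row (sport) to sum to 1; with the real data the call would be prop.table(table(data$sport, data$agecat), margin = 1). The sketch below uses a tiny made-up data frame in the same shape as data.

```r
## Made-up example in the same shape as `data` (sport plus age category)
lv <- c("(16,25]", "(25,34]", "(34,43]")
toy <- data.frame(sport  = c("soccer", "soccer", "hockey", "hockey", "hockey"),
                  agecat = factor(c("(16,25]", "(25,34]", "(16,25]", "(34,43]", "(34,43]"),
                                  levels = lv))

## Row-wise proportions: each sport's age-group shares sum to 1
props <- prop.table(table(toy$sport, toy$agecat), margin = 1)
round(props, 3)
```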
A higher proportion of soccer players fall in the 26-34 age group. However, a greater proportion of hockey players are 35 or older.
Next, we calculate the proportion of players over 30 and of players aged 35 or older in each sport.
## mean() of a logical vector gives the proportion of TRUE values
soccerProportionOver30<-mean(soccer$age>30)
soccerProportionOver35<-mean(soccer$age>=35)
puckProportionOver30<-mean(puck$Age>30)
puckProportionOver35<-mean(puck$Age>=35)
lst<-list(soccerProportionOver30*100,soccerProportionOver35*100,
puckProportionOver30*100,puckProportionOver35*100)
names(lst)<-c("Soccer Players Over 30 (%)","Soccer Players Age 35+ (%)",
"Hockey Players Over 30 (%)", "Hockey Players Age 35+ (%)")
lst
## $`Soccer Players Over 30 (%)`
## [1] 18.2
##
## $`Soccer Players Age 35+ (%)`
## [1] 3.4
##
## $`Hockey Players Over 30 (%)`
## [1] 18.59688
##
## $`Hockey Players Age 35+ (%)`
## [1] 5.345212
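The four percentages above are computed one league at a time. Since the mean of a logical vector is the proportion of TRUE values, the same figures could also be obtained per sport in one call with tapply() on the combined data frame, for example tapply(data$age >= 35, data$sport, mean) * 100. A self-contained sketch on made-up ages:

```r
## Made-up ages in the same shape as `data` (not the real league data)
toy <- data.frame(age   = c(22, 31, 36, 27, 35, 40, 19, 30),
                  sport = c(rep("soccer", 4), rep("hockey", 4)))

## Percentage of players aged 35+ and over 30, split by sport
vetPct    <- tapply(toy$age >= 35, toy$sport, mean) * 100
over30Pct <- tapply(toy$age > 30,  toy$sport, mean) * 100
vetPct
over30Pct
```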
Veterans (35+) make up about 5.3% of NHL players versus 3.4% of Bundesliga players, a gap of roughly 2 percentage points in favor of hockey. The proportions of players over 30 are similar in the two leagues (18.6% for hockey vs. 18.2% for soccer).
The soccer data come from the Bundesliga only and the hockey data from the NHL only. Within these two leagues, a larger share of hockey players than soccer players keep playing into their veteran years (35+).
Note that these figures are population parameters rather than sample estimates, since every player in each league was included in the analysis. Age is also measured without error, so there is no sampling variability or measurement error to quantify.