1. Merge Team and Roster Data
2. Calculate the Number of Shot Attempts
3. Plot Shot Attempts
4. Plot Shot Attempts 2
5. Pass Reception Plots
6. Corsi!
The data set is for a game between the Penguins and Capitals in the NHL. The data is an extract of a spatio-temporal event data set.
teams <- read.csv("~/Downloads/teams.csv")
gameevents <- read.csv("~/Downloads/gameevents.csv")
gamerosters <- read.csv("~/Downloads/gamerosters.csv")
teamroster <- merge(gamerosters,teams, by = 'teamId')
- Exclude goalies
teamroster <- subset(teamroster, primaryPosition!="G")
- Grouping by teamShorthand, primaryPosition and handedness, plot a chart showing handedness by player position and team.
PIT <- subset(teamroster,teamShorthand == 'PIT')
WSH <- subset(teamroster,teamShorthand == 'WSH')
ggplot(NULL,aes(handedness))+geom_bar(data = PIT, aes(handedness,fill=primaryPosition),position = 'dodge')+ggtitle("Pittsburg Penguins")+scale_fill_manual(values = c("#CFC493","#000000"))+scale_y_continuous(limit = c(0,8))+labs(y="Count", x="Handedness",fill="Primary Position")
ggplot(NULL,aes(handedness))+geom_bar(data = WSH, aes(handedness,fill=primaryPosition),position = 'dodge')+ggtitle("Washington Capitals")+scale_fill_manual(values = c("#C8102E","#041E42"))+scale_y_continuous(limit = c(0,8))+labs(y="Count", x="Handedness",fill ="Primary Position")
- Which team has an equal numbers of left and right handed defenders?
WSH has a count of 3 for both Left handed and Right handed defenders.
- Which team seems more imbalanced in terms of handedness and towards which side?
PIT has a higher count of Left handedness at 4 for defensive players and 8 for forward players.
- Join the events and teams data frames. Grouping by teamShorthand, outcome, plot a chart showing shot attempt counts (including both successful and failed shots) grouped by team.
shotattempts <- merge(gameevents,teams, by = 'teamId')
shotattempts<-shotattempts %>% filter(shotattempts$name == "shot" | shotattempts$name=="goal")
ggplot(shotattempts, aes(outcome,fill= teamName))+geom_bar(position = 'dodge')+scale_fill_manual(values = c("#041E42","#CFC493"))+labs(y="SHOT QUANTITY",x="OUTCOME",fill ="Teams")+ggtitle("SHOT ATTEMPTS PER TEAM")+geom_text(aes(label=..count..),stat = "count",vjust = 1.5,position = position_dodge(.9), colour = "white")
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
- Which team dominated the shot attempts and what was the Corsi ratio?
PIT recorded 91 attempted shots versus WSH with 55
Corsi Calculations: total offensive shot attempts minus total defensive shot attempts:
PITcorsi<-shotattempts%>%filter(shotattempts$teamShorthand=="PIT"|shotattempts$manpowerSituation=="evenStrength")
PITcorsiTbl<-with(PITcorsi,table(name,outcome))
PITcorsiTbl
## outcome
## name failed successful
## goal 0 13
## shot 62 71
WShcorsi<-shotattempts%>%filter(shotattempts$teamShorthand=="WSH"|shotattempts$manpowerSituation=="evenStrength")
WSHcorsiTbl<-with(PITcorsi,table(name,outcome))
WSHcorsiTbl
## outcome
## name failed successful
## goal 0 13
## shot 62 71
Corsi = 71/(71+62)=.53 - Which outcome did the dominant team generate their lead in?
The higher number of total shots by PIT allowed them to generate a lead over Washington despite not having too many more successful shots.
Generate a scatterplot of all shot attempts using the xCoord and yCoord columns colored by team. Then do the same thing except this time plot xAdjCoord and yAdjCoord:
coord <- subset(shotattempts,select = c(xCoord,yCoord,period,teamShorthand))
coordadj <- subset(shotattempts,select = c(xAdjCoord,yAdjCoord,period,teamShorthand))
ggplot(coord,aes(xCoord,yCoord,color=teamShorthand))+geom_point()+labs(title = "Unadjusted Coordinates")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(coordadj,aes(xAdjCoord,yAdjCoord,color=teamShorthand))+geom_point()+labs(title = "Adjusted Coordinates")+scale_color_manual(values = c("#CFC493","#041E42"))
What is the main difference between the adjusted (Adj) and unadjusted coordinate systems?
Adjusted coordinates have inverse negative values compared to the unadjsuted coordinates.
Compare the adjusted coordinates with unadjusted coordinates by period and teamShorthand.
Unadjusted:
coordperiod1 <- subset(coord,period=="1")
coordperiod2<- subset(coord,period=="2")
coordperiod3<- subset(coord,period=="3")
coordperiod4<- subset(coord,period=="4")
ggplot(coordperiod1,aes(xCoord,yCoord,color=teamShorthand))+geom_point()+labs(title = "Unadjusted Coordinates P1")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(coordperiod2,aes(xCoord,yCoord,color=teamShorthand))+geom_point()+labs(title = "Unadjusted Coordinates P2")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(coordperiod3,aes(xCoord,yCoord,color=teamShorthand))+geom_point()+labs(title = "Unadjusted Coordinates P3")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(coordperiod4,aes(xCoord,yCoord,color=teamShorthand))+geom_point()+labs(title = "Unadjusted Coordinates P4")+scale_color_manual(values = c("#CFC493","#041E42"))
Adjusted:
adjcoordperiod1<- subset(coordadj,period=="1")
adjcoordperiod2<- subset(coordadj,period=="2")
adjcoordperiod3<- subset(coordadj,period=="3")
adjcoordperiod4<- subset(coordadj,period=="4")
ggplot(adjcoordperiod1,aes(xAdjCoord,yAdjCoord,color=teamShorthand))+geom_point()+labs(title = "Adjusted Coordinates P1")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(adjcoordperiod2,aes(xAdjCoord,yAdjCoord,color=teamShorthand))+geom_point()+labs(title = "Adjusted Coordinates P2")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(adjcoordperiod3,aes(xAdjCoord,yAdjCoord,color=teamShorthand))+geom_point()+labs(title = "Adjusted Coordinates P3")+scale_color_manual(values = c("#CFC493","#041E42"))
ggplot(adjcoordperiod4,aes(xAdjCoord,yAdjCoord,color=teamShorthand))+geom_point()+labs(title = "Adjusted Coordinates P4")+scale_color_manual(values = c("#CFC493","#041E42"))
What does this tell you about how the adjusted coordinates are made?
Adjusted coordinates are generated on the same side of the ice for each team.
Create two new columns in your shot dataframe: xPlotCoord and yPlotCoord. In this new coordinate system, all of Washington’s offensive zone shots should be negative xPlotCoord and all of Pittsburgh’s offensive zone shots should be in the positive xPlotCoord. Make sure you “flip” both the x and y coordinates! Using the newly generated coordinates, generate a scatter plot of Washington and Pittsburgh’s shots colored by team.
shotdf <- subset(shotattempts,select =c(xCoord,yCoord,period,teamShorthand,zone))
shotdfWSH <- (subset(shotdf,teamShorthand == "WSH" & zone == "oz"))%>%
mutate(xPlotCoord = case_when(xCoord > 0 ~ xCoord*-1,xCoord <0 ~ xCoord))%>%
mutate(yPlotCoord = case_when(xCoord > 0 ~ yCoord*-1,xCoord <0 ~ yCoord))
shotdfPIT <- (subset(shotdf,teamShorthand == "PIT" & zone == "oz"))%>%
mutate(xPlotCoord = case_when(xCoord >0 ~ xCoord,xCoord<0 ~ xCoord*-1))%>%
mutate(yPlotCoord = case_when(xCoord >0 ~ yCoord,xCoord <0 ~ yCoord*-1))
ggplot(NULL,aes(xCoord,yCoord))+geom_point(data = shotdfWSH,aes(xCoord,yCoord,color= teamShorthand))+geom_point(data = shotdfPIT,aes(xCoord,yCoord,color=teamShorthand))+scale_color_manual(values = c("#CFC493","#041E42"))
How many shots were from outside the OZ for each team?
outsideOZshots <- subset(gameevents, name == "shot" & zone != "oz")
When filtering for any shots out side the offensive zone “OZ” we’re not given any values. So we can conclude zero shots occured outside the offensive zone
Create a data frame by merging the events, rosters and teams dataframes. Using the the pass and reception events, generate a scatterplot where x,y locations of the pass and reception are connected by a line. To do this, you’ll need to find a way to merge the pass events to their corresponding reception events.
passreception <- subset(gameevents,name== "pass"|name=="reception")
passreception <- subset(passreception, outcome != "failed")
passreception <- merge(passreception,teamroster, by= 'playerId')
passreception <- passreception[-c(1021,1318),]
passreception <- passreception[passreception$outcome == "successful",] %>%
mutate(pair=rep(1:(n()/2),each=2))
passrecpadj <- passreception %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.25)+scale_color_manual(values = c("#CFC493","#041E42"))+labs(title = "Adjusted Pass Reception Plot")
plot(passrecpadj)
passrecp <- passreception %>% ggplot(aes(xCoord,yCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.25)+scale_color_manual(values = c("#CFC493","#041E42"))+labs(title = "Unadjusted Pass Receptions")
plot(passrecp)
Which of the three coordinate systems (unadjusted, adjusted, plot) we’ve used would be best suited to this plot and why? Ideally the unfiltered because it would lead to less layering overtop one another in the plot creating a slightly less messy visual.
Based on the locations of the pass and reception events. What are the areas of the ice that are least likely to be passed through, what areas are the most likely?
Looking at areas of the line desnsity we can see around the back of the goal as well as the neutral zone line has a slightly denser population of dots and lines where passes where intitiated and receieved.
Generate a seperate pass-reception plot for each team. What trends if any do you notice?
passrecepPIT <- subset(passreception,teamShorthand == 'PIT')
passrecepWSH <- subset(passreception,teamShorthand == 'WSH')
passrecepPIT %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+scale_color_manual(values = c("#CFC493","#041E42"))+labs(title = "PIT Pass Reception Plot")+scale_color_manual(values = c("#CFC493"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
passrecepWSH %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+scale_color_manual(values = c("#041E42"))+labs(title = "WSH Pass Reception Plot")
PIT tends to have a more concentrated pass reception near the neutral
zone where WSH has a fairly evenly distributed pass reception across
most of the ice.
Generate a seperate pass-reception plot for forwards and defenders by team. What trends if any do you notice?
passrecepD <- subset(passreception,primaryPosition == 'D')
passrecepDPIT <- subset(passrecepD,teamShorthand == "PIT")
passrecepDWSH <- subset(passrecepD,teamShorthand == "WSH")
passrecepF <- subset(passreception,primaryPosition == 'F')
passrecepFPIT <- subset(passrecepF,teamShorthand == "PIT")
passrecepFWSH <- subset(passrecepF,teamShorthand == "WSH")
passrecepDPIT %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+labs(title = "Defensive Pass Reception Plot PIT")+scale_color_manual(values = c("#CFC493"))
passrecepDWSH %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+labs(title = "Defensive Pass Reception Plot WSH")+scale_color_manual(values = c("#041E42"))
passrecepFPIT %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+labs(title = "Offensive Pass Reception Plot PIT")+scale_color_manual(values = c("#CFC493"))
passrecepFWSH %>% ggplot(aes(xAdjCoord,yAdjCoord))+geom_point(aes(color=teamShorthand))+geom_line(aes(group=pair,color=teamShorthand),alpha=.50)+labs(title = "Offensive Pass Reception Plot WSH")+scale_color_manual(values = c("#041E42"))
Defensive pass reception zones are understandably limited to one side of
the ice with offensive passes taking up most of the other side of the
ice and more concentrated behind and infront of the goal.
Using the playersOnIce column in the events dataframe, create two new columns teamPlayersOnIds and opponentPlayersOnIds.
Use the players on ice data to make a new dataframe calculating each player’s while on ice Corsi ratio.
alldata<-merge(teams,gamerosters, by = "teamId")
alldata<-merge(alldata,gameevents, by = "playerId")
pcorsi<-alldata%>%filter((alldata$name=="shot"|alldata$name=="goal")&alldata$manpowerSituation=="evenStrength")
PIT_roster<-teamroster%>%filter(teamroster$teamShorthand=="PIT")
WSH_roster <- teamroster%>%filter(teamroster$teamShorthand=="WSH")
PITcorsishot <- pcorsi%>%filter(pcorsi$teamShorthand=="PIT")
WSHcorsishot <- pcorsi%>%filter(pcorsi$teamShorthand=="WSH")
for (i in 1:18) {
PIT_roster$Corsi[i] <- length(which(grepl(PIT_roster$playerId[i], PITcorsishot$playersOnIce)))/
length(which(grepl(PIT_roster$playerId[i], pcorsi$playersOnIce)))
}
for (i in 1:18) {
WSH_roster$Corsi[i] <- length(which(grepl(WSH_roster$playerId[i], WSHcorsishot$playersOnIce)))/
length(which(grepl(WSH_roster$playerId[i], pcorsi$playersOnIce)))
}
PIT_roster[which.max(PIT_roster$Corsi),]
## teamId firstName handedness playerId jerseyNum lastName primaryPosition
## 6 30 Dominik L 1670 12 Simon F
## homeprimarycolor homesecondarycolor location teamName teamShorthand Corsi
## 6 000000 C5B358 Pittsburgh Penguins PIT 0.75
PIT_roster[which.min(PIT_roster$Corsi),]
## teamId firstName handedness playerId jerseyNum lastName primaryPosition
## 10 30 Daniel R 1088 41 Sprong F
## homeprimarycolor homesecondarycolor location teamName teamShorthand
## 10 000000 C5B358 Pittsburgh Penguins PIT
## Corsi
## 10 0.2727273
WSH_roster[which.max(WSH_roster$Corsi),]
## teamId firstName handedness playerId jerseyNum lastName primaryPosition
## 12 31 Christian L 1564 29 Djoos D
## homeprimarycolor homesecondarycolor location teamName teamShorthand Corsi
## 12 E21936 002147 Washington Capitals WSH 0.5625
WSH_roster[which.min(WSH_roster$Corsi),]
## teamId firstName handedness playerId jerseyNum lastName primaryPosition
## 6 31 Chandler L 1589 18 Stephenson F
## homeprimarycolor homesecondarycolor location teamName teamShorthand
## 6 E21936 002147 Washington Capitals WSH
## Corsi
## 6 0.04347826
The Corsi ratio is the ratio of shot attempts for divided by the total shot attempts while the player was on the ice. Make sure you join the playerids to the rosters table to get player names.
Which player had the best or worst Corsi ratio on each team?
PIT Max = Dominik w/.75
PIT Min = Daniel w/.27
WSH Max = Christian w/.56
WSH Min = Chandler w/.04