"There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:
What is the value of a shot? Or, what is the probability of a shot being a goal given its location, shooter, league, assist method, game state, number of players on the pitch and time? These are known as expected goals (xG) models.
When are teams more likely to score?
Which teams are the best or sloppiest at holding the lead?
Which teams or players make the best use of set pieces?
In which leagues is the referee more likely to give a card?
How do players compare when they shoot with their weak foot versus their strong foot? Or, which players are ambidextrous?
Identify different styles of play (shooting from long range vs. shooting from the box, crossing the ball vs. passing the ball, use of headers).
Which teams have a bias for attacking on a particular flank?
And many, many more…" https://www.kaggle.com/secareanualin/football-events/home
event_type: 0 = Announcement, 1 = Attempt, 2 = Corner, 3 = Foul, 4 = Yellow card, 5 = Second yellow card, 6 = Red card, 7 = Substitution, 8 = Free kick won, 9 = Offside, 10 = Hand ball, 11 = Penalty conceded
event_type2: 12 = Key Pass, 13 = Failed through ball, 14 = Sending off, 15 = Own goal
side: 1 = Home, 2 = Away
shot_place: 1 = Bit too high, 2 = Blocked, 3 = Bottom left corner, 4 = Bottom right corner, 5 = Centre of the goal, 6 = High and wide, 7 = Hits the bar, 8 = Misses to the left, 9 = Misses to the right, 10 = Too high, 11 = Top centre of the goal, 12 = Top left corner, 13 = Top right corner
shot_outcome: 1 = On target, 2 = Off target, 3 = Blocked, 4 = Hit the bar
location: 1 = Attacking half, 2 = Defensive half, 3 = Centre of the box, 4 = Left wing, 5 = Right wing, 6 = Difficult angle and long range, 7 = Difficult angle on the left, 8 = Difficult angle on the right, 9 = Left side of the box, 10 = Left side of the six yard box, 11 = Right side of the box, 12 = Right side of the six yard box, 13 = Very close range, 14 = Penalty spot, 15 = Outside the box, 16 = Long range, 17 = More than 35 yards, 18 = More than 40 yards, 19 = Not recorded
bodypart: 1 = right foot, 2 = left foot, 3 = head
assist_method: 0 = None, 1 = Pass, 2 = Cross, 3 = Headed pass, 4 = Through ball
situation: 1 = Open play, 2 = Set piece, 3 = Corner, 4 = Free kick
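The integer codes in the dictionary above can be turned into readable labels before plotting, so axes and facets show "Open play" instead of 1. A minimal sketch, assuming the dictionary is typed in by hand as named character vectors; only two variables are shown, and `situation_labels`, `assist_labels` and the toy rows are hypothetical stand-ins for `events`:

```r
library(dplyr)

# Hypothetical lookup vectors copied from the dictionary above:
# names are the codes (as character), values are the labels.
situation_labels <- c("1" = "Open play", "2" = "Set piece",
                      "3" = "Corner", "4" = "Free kick")
assist_labels <- c("0" = "None", "1" = "Pass", "2" = "Cross",
                   "3" = "Headed pass", "4" = "Through ball")

# Made-up rows standing in for `events`
toy <- tibble(situation = c(1L, 3L, NA), assist_method = c(1L, 0L, 2L))

decoded <- toy %>%
  mutate(
    # Index each lookup by the code; a missing code stays NA
    situation_lbl = unname(situation_labels[as.character(situation)]),
    assist_lbl    = unname(assist_labels[as.character(assist_method)])
  )
decoded
```

A named vector keeps the lookup in one place; the same `mutate()` call would work on `events` before it is passed to `ggplot()`.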
i. Libraries
library('tidyverse')
library('ggplot2')
ii. Load the data and drop the text column
# text is a free-text description of each event and is not used in this analysis
events <- read_csv('events.csv') %>%
  select(-text)
iii. Explore the dataset
glimpse(events)
## Observations: 941,009
## Variables: 21
## $ id_odsp <chr> "UFot0hit/", "UFot0hit/", "UFot0hit/", "UFot0hit...
## $ id_event <chr> "UFot0hit1", "UFot0hit2", "UFot0hit3", "UFot0hit...
## $ sort_order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...
## $ time <int> 2, 4, 4, 7, 7, 9, 10, 11, 11, 13, 14, 14, 14, 17...
## $ event_type <int> 1, 2, 2, 3, 8, 10, 2, 8, 3, 3, 8, 1, 3, 1, 1, 3,...
## $ event_type2 <int> 12, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 12, ...
## $ side <int> 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 1, ...
## $ event_team <chr> "Hamburg SV", "Borussia Dortmund", "Borussia Dor...
## $ opponent <chr> "Borussia Dortmund", "Hamburg SV", "Hamburg SV",...
## $ player <chr> "mladen petric", "dennis diekmeier", "heiko west...
## $ player2 <chr> "gokhan tore", "dennis diekmeier", "heiko wester...
## $ player_in <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ player_out <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ shot_place <int> 6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 13, N...
## $ shot_outcome <int> 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA...
## $ is_goal <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ location <int> 9, NA, NA, NA, 2, NA, NA, 2, NA, NA, 4, 15, NA, ...
## $ bodypart <int> 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA...
## $ assist_method <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, ...
## $ situation <int> 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA...
## $ fast_break <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
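The glimpse shows that several columns (event_type2, player_in, player_out, shot_place, shot_outcome, location, bodypart, situation) contain many NAs, because they only apply to some event types such as attempts or substitutions. Counting missing values per column makes this concrete. A minimal sketch with dplyr; `toy_events` is a made-up stand-in, so the `summarise()` call should be run on `events` itself to get the real counts:

```r
library(dplyr)

# Made-up rows with a few of the columns from glimpse(events)
toy_events <- tibble(
  event_type   = c(1L, 2L, 3L, 1L),
  event_type2  = c(12L, NA, NA, NA),
  shot_outcome = c(2L, NA, NA, 1L)
)

# Number of missing values in every column
na_counts <- toy_events %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```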
iv. The perfect game: home or away?
This question sets out to answer whether a “perfect game” happens more often in home or away matches.
# Create a column side_ that labels each event as "Home" or "Away".
# Keep only shots on target (shot_outcome == 1) and exclude foul-play events:
# fouls, cards, hand balls, penalties conceded (event_type) and sending-offs (event_type2).
# is_goal is not used, so the bars count shots on target rather than goals.
# Plot situation on the x-axis, faceted by assist method, with bars filled by side_
# to compare home and away team play in attacking situations.
events %>%
  mutate(side_ = as.factor(ifelse(side == 1, "Home", "Away"))) %>%
  filter(
    !is.na(shot_outcome),
    shot_outcome == 1,
    # %in% tests membership; != would recycle the vector element by element
    !event_type %in% c(3, 4, 5, 6, 10, 11),
    # NA != 14 is NA, which filter() drops, so keep missing event_type2 explicitly
    is.na(event_type2) | event_type2 != 14
  ) %>%
  ggplot(aes(situation, fill = side_)) +
  geom_bar(color = 'black', position = "dodge") +
  facet_grid(~assist_method) +
  ggtitle("Team Play in a Perfect Game")
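Two base-R behaviours are easy to trip over in filters like the one above. Comparing a column to a vector with != recycles the vector position by position instead of testing membership, which is what %in% does; and any comparison with NA returns NA, which dplyr's filter() treats as "drop this row". A small demonstration on made-up values:

```r
library(dplyr)

x <- c(1L, 3L, 4L, 1L, 3L, 4L)
fouls <- c(3L, 4L)

# != recycles `fouls` to c(3, 4, 3, 4, 3, 4) and compares position by position
recycled <- x != fouls
# %in% asks "is each value one of 3 or 4?"
membership <- !x %in% fouls

toy <- tibble(event_type2 = c(12L, NA, 14L))
# NA != 14 is NA, so filter() drops the NA row as well as the 14
dropped_na <- toy %>% filter(event_type2 != 14)
# Keeping missing values explicitly removes only the 14
kept_na <- toy %>% filter(is.na(event_type2) | event_type2 != 14)
```

Because filter() discards NA conditions silently, comparing row counts before and after a filter on a mostly-NA column is a cheap sanity check.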

In the eyes of most supporters, a “perfect game” is one where at least one goal is scored, there is no foul play (fouls, players receiving cards or being sent off) and the players play as a team. The filter above removes foul-play events and keeps only shots on target, so it works at the level of individual events rather than whole games. The variables of interest are team play, measured by how other players assist the shot, and whether it happens more in home or away games. The question matters to loyal fans: they invest a lot of money and emotional energy and want to see good-quality football, and this information could help them choose which fixtures to attend. Prior research has not answered this question, and it can be answered from the data provided.
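To apply the definition to whole games, one option is to summarise each game (id_odsp) and keep only those with at least one goal and no foul-play events. A minimal sketch, assuming foul play means event_type 3, 4, 5, 6, 10 or 11, or event_type2 14, as in the plotting filter; `perfect_game_ids()` and the toy data are hypothetical:

```r
library(dplyr)

# Codes treated as foul play (assumption mirroring the plotting filter):
# 3 Foul, 4 Yellow card, 5 Second yellow card, 6 Red card,
# 10 Hand ball, 11 Penalty conceded; event_type2 14 Sending off.
foul_types <- c(3L, 4L, 5L, 6L, 10L, 11L)

# Hypothetical helper: ids of games with >= 1 goal and no foul-play events
perfect_game_ids <- function(events) {
  events %>%
    group_by(id_odsp) %>%
    summarise(
      goals = sum(is_goal == 1, na.rm = TRUE),
      fouls = sum(event_type %in% foul_types |
                    (!is.na(event_type2) & event_type2 == 14)),
      .groups = "drop"
    ) %>%
    filter(goals >= 1, fouls == 0) %>%
    pull(id_odsp)
}

# Made-up events: game A is "perfect", B has a foul, C has no goal
toy_events <- tibble(
  id_odsp     = c("A", "A", "B", "B", "C"),
  event_type  = c(1L, 2L, 1L, 3L, 1L),
  event_type2 = c(12L, NA, NA, NA, NA),
  is_goal     = c(1L, 0L, 1L, 0L, 0L)
)

perfect <- perfect_game_ids(toy_events)
perfect
```

The plotting pipeline could then start from events %>% filter(id_odsp %in% perfect_game_ids(events)), so that only shots from qualifying games are counted.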
Key information
A “perfect game” happens more often at home than away.
Most on-target shots in a “perfect game” come from a pass in open play (situation == 1 == Open play & assist_method == 1 == Pass).
For corners and set pieces there is much less difference between home and away team play (situation == 2 == Set piece, situation == 3 == Corner).
Almost no on-target shots in a “perfect game” come with no assist method (situation == 1 == Open play & assist_method == 0 == None). This result is sensitive to missing values: a bare event_type2 != 14 condition also drops every shot whose event_type2 is NA, which can include shots with no key pass recorded.
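The observations above are read off the bar heights. They can also be checked numerically by counting shots per situation, assist method and side and computing the home share of each group. A minimal sketch; `home_share()` is a hypothetical helper and the toy rows are made up, so in practice it would be applied to the same filtered shots that feed the plot:

```r
library(dplyr)

# Hypothetical helper: share of shots taken by the home side (side == 1)
# in each situation / assist_method combination
home_share <- function(shots) {
  shots %>%
    group_by(situation, assist_method) %>%
    summarise(
      n_home = sum(side == 1),
      n_away = sum(side == 2),
      .groups = "drop"
    ) %>%
    mutate(home_share = n_home / (n_home + n_away))
}

# Made-up on-target shots
toy_shots <- tibble(
  situation     = c(1L, 1L, 1L, 3L, 3L),
  assist_method = c(1L, 1L, 1L, 2L, 2L),
  side          = c(1L, 1L, 2L, 1L, 2L)
)

shares <- home_share(toy_shots)
shares
```

A home share above 0.5 means home sides had more shots in that group; shares close to 0.5 would match the smaller home/away gap described for corners and set pieces.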