Introduction

I analyzed La Liga 2017 data and created five graphs. I paid particular attention to Barcelona, which finished first in La Liga in 2017, and Málaga, which finished at the bottom of the league. I also analyzed the relationship between score and possession.

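# Packages used below: fread (data.table), %>%/filter/select/mutate (dplyr), plots (ggplot2, ggrepel), comma (scales)
invisible(lapply(c("data.table", "dplyr", "ggplot2", "ggrepel", "scales"), library, character.only = TRUE))
filename <- "/Users/sada/R_datafiles/laliga2017.csv"
df <- fread(filename, na.strings = c("NA", ""))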

1: La Liga 2017 scores by frequency (Top 10)

This bar chart shows the ten most frequent scores in La Liga 2017. The most frequent scores were 0-1 and 2-1, followed by 1-1 and 0-0, which indicates that many games were very close. On the other hand, 3-0 ranked ninth and 3-1 ranked tenth, which indicates that some games were played between teams of clearly different strength.

scorecount <- data.frame(plyr::count(df, "Score"))
scorecount <- scorecount[order(scorecount$freq, decreasing = TRUE), ]

scorecount$freq <- as.numeric(scorecount$freq)

# Plot rows 2 to 11 of the sorted table (row 1 is excluded), ordered by frequency
ggplot(scorecount[2:11,], aes(x = reorder(Score, -freq), y = freq)) +
  geom_bar(colour="black", fill="lightblue", stat="identity") +
  labs(title = "La Liga 2017 scores by frequency (Top 10)", x = "Score", y = "Frequency") +
  # theme_light() must come before theme(), otherwise it resets the centered title
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label = freq), vjust = -0.2, colour="black")
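
The observation that many games were close can be checked by converting each score into a goal margin. The following is only a minimal sketch, assuming scores are strings of the form "home-away" as they appear in the chart. The `scores` vector and the `goal_margin` helper are hypothetical and made up for illustration; they are not taken from laliga2017.csv.

```r
# Hypothetical example scores (NOT values from laliga2017.csv)
scores <- c("0-1", "2-1", "1-1", "0-0", "3-0", "3-1")

# Hypothetical helper: absolute goal difference for "home-away" score strings
goal_margin <- function(score) {
  parts <- strsplit(score, "-", fixed = TRUE)
  vapply(parts,
         function(p) abs(as.integer(p[1]) - as.integer(p[2])),
         integer(1))
}

margins <- goal_margin(scores)

# Share of games decided by one goal or fewer (draws included)
close_share <- mean(margins <= 1)
```

Applied to the real data, the same helper could be called on `df$Score` after dropping missing scores, which would put a number on how many games were decided by a single goal or less.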


2: Barcelona Home Score by Mean of Possession

This is a line chart of home scores and average possession for Barcelona, which finished first in La Liga in 2017. The highest average possession was 79%, in 0-0 games. The lowest was 56%, in 5-1 games. Average possession was therefore above 55% for every score. This chart reflects Barcelona’s possession-based style of play.

barcelona_df <- df %>%
  filter(`Home Team` == 'BARCELONA') %>%
  select(Score, `Home Team Possession %`, `Away Team Possession %`, `Match Excitement`, `Away Team`) %>%
  data.frame()

# Mean home possession for each final score (result columns: Group.1 = score, x = mean)
mean_barcelona_df <- aggregate(barcelona_df$Home.Team.Possession.., list(barcelona_df$Score), mean)

ggplot(mean_barcelona_df, aes(x = Group.1, y = x, group=1)) +
  geom_line(color='black', size=1) +
  geom_point(shape=21, size=3, color='black', fill='indianred1') +
  labs(x="Score", y = "Mean of Possession", title="Barcelona Home Score by Mean of Possession") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_label_repel(aes(label = ifelse(x == max(x) | x == min(x), scales::comma(x) , "")), 
                   box.padding = 1, 
                   point.padding = 1, 
                   size=3, 
                   color='Grey50',
                   segment.color = 'darkblue')

3: Málaga’s relationship between Excitement, Home score, and Possession

This is a graph of the home games of Málaga, which finished at the bottom of La Liga in 2017. It shows the relationship between home score, possession, and excitement. The most exciting game ended 3-3. On the other hand, the 0-0 game, which had the highest possession, was the least exciting match. In contrast to the previous graph of Barcelona, Málaga’s average possession did not exceed 50% in home games.

malaga_df <- df %>%
  filter(`Home Team` == 'M\xc1LAGA') %>%
  select(Score, `Home Team Possession %`, `Match Excitement`) %>%
  data.frame()

mean_malaga_df <- aggregate(malaga_df[c("Match.Excitement","Home.Team.Possession..")], list(malaga_df$Score),FUN = function(x) round(mean(x),1))

ggplot(mean_malaga_df, aes(x=Group.1, y = Home.Team.Possession..)) +
  geom_bar(colour="black", fill="lemonchiffon",stat = "identity", position = position_stack(reverse = TRUE)) +
  coord_flip() +
  theme_light() +
  labs(title = "Malaga's relationship between Excitement, Home score, Possession", x ="Score", y = " Average Possession (%)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label = Home.Team.Possession..), hjust = -1, colour="black") +
  # Line and points both use the per-score means, so they trace the same values
  geom_line(inherit.aes = FALSE, data = mean_malaga_df,
            aes(x = Group.1, y = Match.Excitement, group = 1), size = 1) +
  scale_y_continuous(labels = comma,
                     sec.axis = sec_axis(~., name = "Average Match Excitement")) +
  geom_point(inherit.aes = FALSE, data = mean_malaga_df,
             aes(x = Group.1, y = Match.Excitement, group = 1),
             size = 3, shape = 21, fill = "white", color = "black") +
  # Label only the highest and lowest average excitement, positioned at those points
  geom_label_repel(aes(y = Match.Excitement,
                       label = ifelse(Match.Excitement == max(Match.Excitement) | Match.Excitement == min(Match.Excitement), scales::comma(Match.Excitement), "")),
                   box.padding = 1,
                   point.padding = 1,
                   size = 3,
                   color = 'Grey50',
                   segment.color = 'darkblue')
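
The filter in this section matches the team name through the raw byte `\xc1`, which suggests the CSV is not stored as UTF-8. The following is a minimal sketch, assuming the file is Latin-1 encoded (an assumption that the source does not confirm), of converting such a name to UTF-8 with base R's `iconv`. The variable names are hypothetical.

```r
# Assumption: the team name was read as Latin-1 bytes (0xC1 is "Á" in Latin-1)
raw_name <- "M\xc1LAGA"

# Re-encode to UTF-8 so the name can be compared with the accented spelling
utf8_name <- iconv(raw_name, from = "latin1", to = "UTF-8")

# TRUE if the conversion produced "MÁLAGA"
is_malaga <- utf8_name == "M\u00c1LAGA"
```

If that assumption holds, the same conversion could be applied to the team-name columns once after loading, and the filter could then use the readable name instead of the byte escape.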

4: Multiple Bar Charts - Barcelona vs Real Madrid, Atletico Madrid, Valencia, Villarreal

These are the results of Barcelona’s games against 2nd-place Atletico Madrid, 3rd-place Real Madrid, 4th-place Valencia, and 5th-place Villarreal in La Liga 2017. The graphs show the score and the possession of Barcelona’s opponents. In all of these games, Barcelona limited its opponents’ possession. Barcelona also lost only one of its games against the other top-five teams. These results had a great impact on the championship.

Home Barcelona

barcelona2_df <- df %>%
  filter(`Home Team` == 'BARCELONA') %>%
  filter(`Away Team` %in% c('REAL MADRID', 'ATLETICO MADRID', 'VALENCIA', 'VILLARREAL' )) %>%
  select(Score, `Home Team Possession %`, `Away Team Possession %`, `Match Excitement`, `Away Team`) %>%
  data.frame()

ggplot(barcelona2_df, aes(x = Score, y = Away.Team.Possession.., fill=Away.Team)) +
  geom_bar(colour="gray18", stat="identity", position="dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label = Away.Team.Possession..),vjust=1, colour="black") +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Barcelona (home) vs Real Madrid, Atletico Madrid, Valencia, Villarreal",
        x = "Score",
        y = "Away Team Possession",
        fill = "Away Team") +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Away.Team, ncol=2, nrow=2)

Away Barcelona

barcelona3_df <- df %>%
  filter(`Home Team` %in% c('REAL MADRID', 'ATLETICO MADRID', 'VALENCIA', 'VILLARREAL' )) %>%
  filter(`Away Team` == 'BARCELONA') %>%
  select(Score, `Home Team Possession %`, `Away Team Possession %`, `Match Excitement`, `Away Team`, `Home Team`) %>%
  data.frame()

ggplot(barcelona3_df, aes(x = Score, y = Home.Team.Possession.., fill=Home.Team)) +
  geom_bar(color="grey18", stat="identity", position="dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label = Home.Team.Possession..),vjust=1, colour="black") +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Barcelona (away) vs Real Madrid, Atletico Madrid, Valencia, Villarreal",
       x = "Score",
       y = "Home Team Possession",
       fill = "Home Team") +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Home.Team, ncol=2, nrow=2)

5: Heatmap: Home Team’s On target shots percentage

This heatmap covers the games among the top five teams in La Liga 2017. It shows the home team’s percentage of shots on target. The highest percentage was Atletico’s 80% against Barcelona. On the other hand, Valencia’s on-target percentage was below 14% in every game except the one against Real Madrid. This suggests that Valencia struggled against the stronger teams.

ontarget_df <- df %>%
   filter(`Home Team` %in% c('BARCELONA','REAL MADRID', 'ATLETICO MADRID', 'VALENCIA', 'VILLARREAL' )) %>%
   filter(`Away Team` %in% c('BARCELONA','REAL MADRID', 'ATLETICO MADRID', 'VALENCIA', 'VILLARREAL' )) %>%
   select(Score, `Home Team Possession %`, `Away Team Possession %`, `Match Excitement`, `Away Team`, `Home Team`, `Home Team Total Shots`, `Home Team On Target Shots`) %>%
   data.frame()

# Reuse the filtered top-five table and compute the on-target percentage row by row
ontarget2_df <- ontarget_df %>%
  mutate(ontargetpercent = round(100 * Home.Team.On.Target.Shots / Home.Team.Total.Shots, 1))

ggplot(ontarget2_df, aes(x = Home.Team, y = Away.Team, fill= ontargetpercent)) +
  geom_tile(color="black") + 
  geom_text(aes(label= comma(ontargetpercent))) +
  coord_equal(ratio=1) +
  labs(title="Heatmap: Home Team's On target shots percentage",
     x = "Home Team",
     y = "Away Team",
     fill = "On target shots %") +
  theme_minimal() +
  theme(plot.title = element_text(hjust=0.5)) +
  scale_fill_continuous(low="white", high="skyblue") +
  guides(fill = guide_legend(reverse=TRUE, override.aes=list(color="black")))