Introduction to My Analysis

In this analysis, I explore the top soccer goal scorers and goal-scoring trends from 2016 to 2020. Using bar charts, a pie chart, a heatmap, and a line graph, I examine individual player performance, goal distribution by country, and scoring trends across leagues and seasons. The aim is to identify the most prolific goal scorers, the countries contributing the most goals, and how goal totals changed over time.

About My Dataset

The dataset on top goal scorers from 2016 to 2020 provides a comprehensive overview of player performance during this period. It includes key statistics such as the player’s name, country, league, expected goals (xG), and numerous other soccer-related metrics. These metrics offer valuable insights into the players’ offensive contributions and overall impact within their respective teams and leagues, allowing for a detailed analysis of goal-scoring trends over the span of these years.

Findings

Key findings from the dataset highlight several trends in player performance across leagues. Lionel Messi and Cristiano Ronaldo consistently ranked among the highest goal scorers, showcasing their dominance in La Liga and Serie A. Players from the English Premier League, such as Pierre-Emerick Aubameyang, also maintained strong goal-scoring rates, reflecting the league’s competitive nature. The xG (expected goals) metric indicated that certain players, like Robert Lewandowski, often outperformed their xG, underlining their clinical finishing. The dataset also provided insights into emerging talents from leagues outside of the traditional top five, as players from countries like Belgium and France made their mark in major European competitions.
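One way to check a claim about finishing is to compare each player's goals with their expected goals. The sketch below uses invented numbers and placeholder player names, and it assumes the expected-goals column is called `xG`; the real dataset may label it differently.

```r
# Toy data with invented values; not taken from Top_Soccer_Scorers.csv.
# The column name `xG` is an assumption about how the dataset labels expected goals.
toy <- data.frame(
  Player_Names = c("Player A", "Player A", "Player B", "Player B"),
  Goals        = c(30, 25, 18, 20),
  xG           = c(26.5, 24.0, 19.5, 21.0),
  stringsAsFactors = FALSE
)

# Sum goals and xG per player, then take the difference.
# A positive value means the player scored more than the chances suggested.
xg_diff <- aggregate(cbind(Goals, xG) ~ Player_Names, data = toy, FUN = sum)
xg_diff$Goals_Minus_xG <- xg_diff$Goals - xg_diff$xG
xg_diff <- xg_diff[order(-xg_diff$Goals_Minus_xG), ]
print(xg_diff)
```

Sorting by the difference puts the players who most outperformed their xG at the top of the table.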

Visualization 1

For my first visualization, I aimed to highlight the top goal scorers from 2016 to 2020. To achieve this, I created a straightforward bar chart, with players on the x-axis and their total goals on the y-axis. To make the chart easier to read, I placed the goal total at the top of each bar, so player performances can be compared at a glance.

library(data.table)
library(ggplot2)
library(dplyr)
library(lubridate)
library(scales)
library(ggthemes)
library(RColorBrewer)
library(plotly)


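# Load the data; fread returns a data.table, so convert it to a plain data frame
filename <- "Top_Soccer_Scorers.csv"
df <- fread(filename)
df <- as.data.frame(df)

# Total goals per player across all seasons, highest first
goal_count <- df %>%
  group_by(Player_Names) %>%
  summarise(n = sum(Goals, na.rm = TRUE)) %>%
  arrange(desc(n))
top_10_goal_scorers <- head(goal_count, 10)

# Fix the factor levels so the bars keep the descending order
top_10_goal_scorers$Player_Names <- factor(top_10_goal_scorers$Player_Names,
                                           levels = top_10_goal_scorers$Player_Names)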
ggplot(top_10_goal_scorers, aes(x = reorder(Player_Names, -n), y = n)) + 
  geom_bar(colour = "black", fill = "darkred", stat = "identity") +
  geom_text(aes(label = n), vjust = -0.5, size = 2) +
  labs(title = "Top 10 Goal Scorers (2016-2020)", x = "Player", y = "Goals Scored") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size=5))

Visualization 2

For my second visualization, I designed a pie chart to showcase the 10 countries with the highest total goals scored between 2016 and 2020. This visualization provides a clear way to compare goal distribution across nations. The chart is interactive: hovering over any slice shows the country's exact goal count and its corresponding percentage, which makes it easier to see which nations contributed the most goals during this period.

# Total goals per country (rows with a missing country are dropped), top 10 only
top_10_countries <- df %>%
  filter(!is.na(Country)) %>%
  group_by(Country) %>%
  summarise(total_goals = sum(Goals, na.rm = TRUE), .groups = 'drop') %>%
  arrange(desc(total_goals)) %>%
  head(10)
plot_ly(top_10_countries, labels = ~Country, values = ~total_goals, type = 'pie', 
        textinfo = 'label+percent', 
        title = "Top 10 Countries with Most Goals Scored (2016-2020)") %>%
  layout(showlegend = TRUE)
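When reading the percentages on the pie chart, it helps to remember that each slice is a share of the plotted total (here, the top 10 countries), not of every goal in the dataset. The short sketch below reproduces that calculation on invented totals; the country labels are placeholders.

```r
# Invented totals for illustration only; country labels are placeholders.
toy_countries <- data.frame(
  Country     = c("Country A", "Country B", "Country C"),
  total_goals = c(120, 60, 20),
  stringsAsFactors = FALSE
)

# Share of each slice relative to the plotted slices only.
plotted_total <- sum(toy_countries$total_goals)
toy_countries$share_pct <- round(100 * toy_countries$total_goals / plotted_total, 1)
print(toy_countries)
```

Running the same two lines on `top_10_countries` would give a table to cross-check against the hover values.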

Visualization 3

For my third visualization, I created a heatmap to analyze goal-scoring performance across leagues. It focuses on the six leagues with the highest goal totals in the dataset and compares how prolific each league’s top goal scorers were in each year from 2016 to 2020. The heatmap helps identify which leagues consistently produce high-scoring players and whether certain leagues stand out as having a more offensive style of play.

heatmap_df <- df %>%
  group_by(League, Year) %>%
  summarise(Total_Goals = sum(Goals, na.rm = TRUE), .groups = 'keep') %>%
  data.frame()
top_6_leagues <- heatmap_df %>%
  group_by(League) %>%
  summarise(Total = sum(Total_Goals)) %>%
  arrange(desc(Total)) %>%
  slice(1:6) %>%
  pull(League)
heatmap_df <- heatmap_df %>%
  filter(League %in% top_6_leagues)
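# Build every League x Year combination so missing pairs appear as zero-goal tiles
complete_grid <- expand.grid(League = top_6_leagues, Year = unique(heatmap_df$Year),
                             stringsAsFactors = FALSE)
heatmap_df <- complete_grid %>%
  left_join(heatmap_df, by = c("League", "Year")) %>%
  mutate(Total_Goals = ifelse(is.na(Total_Goals), 0, Total_Goals),
         # Factor levels follow the goal ranking, so levels() in the plot code is not NULL
         League = factor(League, levels = top_6_leagues))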
breaks <- seq(0, max(heatmap_df$Total_Goals, na.rm = TRUE), by = 50)
ggplot(heatmap_df, aes(x = Year, y = League, fill = Total_Goals)) +
  geom_tile(color = "white", linewidth = 0.2) +  # linewidth replaces size for borders in ggplot2 >= 3.4.0
  geom_text(aes(label = ifelse(Total_Goals > 0, comma(Total_Goals), "")), size = 3, color = "black") +  # Clear labels
  coord_equal(ratio = 1) +
  labs(title = "Heatmap: Goals Scored Per Top 6 Leagues by Year",
       x = "Year",
       y = "League",
       fill = "Total Goals") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(size = 10)) +  
  scale_y_discrete(limits = rev(levels(heatmap_df$League))) +  
  scale_fill_continuous(low = "white", high = "red", breaks = breaks) +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black"))) +
  theme(panel.grid = element_blank()) 
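The grid-completion step in the heatmap code can be seen in isolation on a toy table. This is a minimal sketch with invented league names and totals, and it uses base R `merge` in place of dplyr's `left_join` so it runs without extra packages.

```r
# Invented data: League B has no row for 2017.
toy_goals <- data.frame(
  League      = c("League A", "League A", "League B"),
  Year        = c(2016, 2017, 2016),
  Total_Goals = c(40, 35, 22),
  stringsAsFactors = FALSE
)

# Every League x Year combination, kept as character rather than factor.
grid <- expand.grid(
  League = unique(toy_goals$League),
  Year   = unique(toy_goals$Year),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every grid row (a left join); unmatched rows get NA.
filled <- merge(grid, toy_goals, by = c("League", "Year"), all.x = TRUE)
filled$Total_Goals[is.na(filled$Total_Goals)] <- 0
print(filled)
```

Filling with 0 means a league-year with no rows in the data looks the same as one with zero goals, which is why the heatmap leaves the label blank for zero-valued tiles.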

Visualization 4

For my fourth visualization, I created a faceted bar chart to display the total number of goals scored each year in the five leagues with the most goals in the dataset. Each panel shows one year, with one bar per league, so the chart gives a year-by-year comparison of how goal totals changed from 2016 to 2020. Differences between panels make it easier to spot patterns or significant changes in each league's scoring, which may reflect factors like changes in team strategies, player performance, or league-wide developments.

league_df <- df %>%
  select(League, Year, Goals) %>%
  group_by(League) %>%
  summarise(Total_Goals = sum(Goals, na.rm = TRUE)) %>%
  arrange(desc(Total_Goals)) %>%
  slice(1:5) %>%  
  left_join(df, by = "League") %>%  
  group_by(Year, League) %>%
  summarise(Total_Goals = sum(Goals, na.rm = TRUE), .groups = 'keep') %>%
  data.frame()
league_df$Year <- factor(league_df$Year)
league_order <- league_df %>%
  group_by(League) %>%
  summarise(Total = sum(Total_Goals)) %>%
  arrange(desc(Total)) %>%
  pull(League)
league_df$League <- factor(league_df$League, levels = league_order)
# Order the Year facets from the most recent year to the earliest
min_year <- min(as.numeric(levels(league_df$Year)))
max_year <- max(as.numeric(levels(league_df$Year)))
league_df$Year <- factor(league_df$Year, levels = seq(max_year, min_year, by = -1))
ggplot(league_df, aes(x = League, y = Total_Goals, fill = Year)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = Total_Goals), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, size = 2, fontface = "bold") +  # Labels on top of bars
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Total Goals by Year (Top 5 Leagues)",
       x = "League",
       y = "Total Goals", 
       fill = "Year") +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Year, ncol = 3, nrow = 2)

Visualization 5

For my fifth and final visualization, I created a line plot to illustrate the fluctuations in total goals scored per year from 2016 to 2020. Connecting the yearly totals with a line makes sharp increases or declines from one year to the next easy to spot. Red dots mark each data point, and text labels show the exact goal count for each year.

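# Total goals across all leagues for each year
goals_by_year <- df %>%
  group_by(Year) %>%
  summarize(Total_Goals = sum(Goals, na.rm = TRUE))
ggplot(goals_by_year, aes(x = Year, y = Total_Goals)) +
  geom_line(color = "blue", linewidth = 1.2) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0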
  geom_point(color = "red", size = 3) +  
  geom_text(aes(label = Total_Goals), vjust = -0.5, size = 5, fontface = "bold") +
  labs(title = "Total Goals Scored Per Year (2016-2020)",
       x = "Year",
       y = "Total Goals") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
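To put numbers on the rises and falls that the line plot shows, the year-over-year change can be computed directly from the yearly totals. The sketch below uses invented totals; the column names mirror `goals_by_year`, while `Change` and `Pct_Change` are names introduced here for illustration.

```r
# Invented yearly totals, for illustration only.
toy_years <- data.frame(
  Year        = 2016:2020,
  Total_Goals = c(500, 540, 520, 560, 480)
)

# diff() returns the change between consecutive years;
# the first year has no predecessor, so it gets NA.
toy_years$Change <- c(NA, diff(toy_years$Total_Goals))

# Percentage change relative to the previous year's total.
previous_total <- c(NA, head(toy_years$Total_Goals, -1))
toy_years$Pct_Change <- round(100 * toy_years$Change / previous_total, 1)
print(toy_years)
```

The same steps applied to `goals_by_year` would show which year had the largest swing in either direction.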