In this R Markdown, I present data on the players regarded as the top goal scorers worldwide, along with information on their performance. I concentrate on players who mostly play in the Brazilian, Spanish, and English leagues, and I develop a measure for identifying the most effective players among them. I chose these leagues because, in my opinion, they are the best in the world, and I wanted to see how the best players in the best leagues performed.
To deepen my understanding, I then took a step back and looked at goal scoring from a wider perspective. I wanted to determine which nations had the potential to produce top goal scorers by looking at how goals are distributed across the world. I also identified and plotted the players with the highest minutes played per year, since a player's time on the field has a significant impact on his ability to score: a player who spends more time on the field has more opportunities to score than one who does not.
To conclude, I compare Lionel Messi and Cristiano Ronaldo, because they are widely considered the two best soccer players in the world and the discussion about who is better seems endless. The comparison sets each player's expected goals against the goals he actually scored, which indicates who lived up to expectations and who did not.
df <- read.csv("Data.csv")
# storing all downloads and libraries here #
library(lubridate)
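# plyr is loaded before dplyr so that dplyr's arrange(), mutate(), and summarise() are not masked #
library(plyr)
library(dplyr)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(ggthemes)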
library(ggrepel)
library(plotly)
library(cowplot)
# fixing the name of the League, creating data frame with top scorers, and creating effectiveness percentage for each player #
df$League[df$League == "Campeonato Brasileiro Série A"] <- "Campeonato Brasileiro Serie A"
brazil_league_most_goals <- df[df$Country == "Brazil", c("League","Player.Names", "Goals", "Matches_Played", "Year")]
brazil_league_most_goals <- brazil_league_most_goals[order(brazil_league_most_goals$Goals, decreasing = TRUE),]
brazil_league_most_goals$EffectivenessPercentage <- round(brazil_league_most_goals$Goals/brazil_league_most_goals$Matches_Played *100, 0)
# selecting only the top 15 most scoring players from Brazil (Campeonato Brasileiro Serie A) #
top_15_brazil_players <- brazil_league_most_goals %>%
  arrange(desc(Goals)) %>%
  head(15)
# creating data frame with top scorers and creating effectiveness percentage for each player #
spain_league_most_goals <- df[df$Country == "Spain", c("League","Player.Names", "Goals", "Matches_Played", "Year")]
spain_league_most_goals <- spain_league_most_goals[order(spain_league_most_goals$Goals, decreasing = TRUE), ]
spain_league_most_goals$EffectivenessPercentage <- round(spain_league_most_goals$Goals/spain_league_most_goals$Matches_Played *100, 0)
# selecting only the top 15 most scoring players from Spain (La Liga) #
top_15_spanish_players <- spain_league_most_goals %>%
  arrange(desc(Goals)) %>%
  head(15)
# creating a data frame with top scorers and creating effectiveness percentage for each player #
english_league_most_goals <- df[df$Country == "England", c("League","Player.Names", "Goals", "Matches_Played", "Year")]
english_league_most_goals <- english_league_most_goals[order(english_league_most_goals$Goals, decreasing = TRUE), ]
english_league_most_goals$EffectivenessPercentage <- round(english_league_most_goals$Goals/english_league_most_goals$Matches_Played *100, 0)
# selecting only the top 15 most scoring players from England (Premier League) #
top_15_english_players <- english_league_most_goals %>%
  arrange(desc(Goals)) %>%
  head(15)
The visualization displays the top 15 scoring seasons by players who, at some point in the period covered, were regarded as the highest scorers in the Campeonato Brasileiro Serie A. Gabriel Barbosa, with 25 goals in 2019, is the only player whose name appears twice on the list. This indicates that he was the league's leading scorer in both 2018 and 2019, which is an outstanding achievement. Thiago Galhardo, who scored 15 goals in 2020, is the most recent top scorer in the timeline.
Here is the plot for the Campeonato Brasileiro Serie A:
# time to plot our first visualization to analyze the top 15 goal scorers by Year #
ggplot(top_15_brazil_players, aes(x = Goals, y = reorder(Player.Names, Goals), fill = as.factor(Year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Top Scorers In The Campeonato Brasileiro Serie A", x = "Number of Goals", y = "Player Names", fill = "Year") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Paired", guide = guide_legend(reverse = TRUE)) +
  geom_text(aes(label = Goals), size = 3, position = position_dodge(width = 0.9), hjust = -0.5) +
  scale_x_continuous(breaks = seq(0, 50, by = 5))
The La Liga visualization immediately shows fewer distinct players than the Brazilian one. Only six players are included, because the players listed ranked among the top scorers in several seasons and therefore fill more than one of the 15 places. The leader is Lionel Messi, who was among the league's top scorers in 2016, 2017, 2018, and 2019. Staying among the top scorers for several seasons is something to be proud of, so it is worth emphasizing that these are all great numbers from outstanding performers.
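One way to check how many distinct players fill a top-scorer table, and which of them appear in more than one season, is to count rows per name. The sketch below uses an invented toy table (`toy_top`, "Player A" and so on are placeholders, not values from the data set) and base R only; it is an illustration of the idea rather than part of the original analysis.

```r
# Toy stand-in for a top-scorers table; names and numbers are invented.
toy_top <- data.frame(
  Player.Names = c("Player A", "Player A", "Player B",
                   "Player C", "Player C", "Player C"),
  Year         = c(2018, 2019, 2019, 2016, 2017, 2018),
  Goals        = c(30, 28, 25, 27, 26, 24)
)

# Number of seasons in which each player appears in the table.
appearances <- table(toy_top$Player.Names)

# Players that appear in more than one season.
repeat_players <- names(appearances[appearances > 1])

# Number of distinct players behind the rows of the table.
n_distinct_players <- length(appearances)

print(appearances)
print(repeat_players)
```

Applied to a table such as `top_15_spanish_players`, the same two lines would show how 15 rows can collapse to a handful of names.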
Here is the plot for La Liga top scorers:
# plotting visualization for Spanish League #
ggplot(top_15_spanish_players, aes(x = Goals, y = reorder(Player.Names, Goals), fill = as.factor(Year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Top Scorers In La Liga", x = "Number of Goals", y = "Player Names", fill = "Year") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Spectral", guide = guide_legend(reverse = TRUE)) +
  geom_text(aes(label = Goals), size = 3, position = position_dodge(width = 0.9), hjust = -0.6) +
  scale_x_continuous(breaks = seq(0, 40, by = 5))
In the Premier League visualization, the players to highlight are Pierre-Emerick Aubameyang, Sergio Aguero, and Mohamed Salah, because they stayed on the list of top scorers for more than one year. Aubameyang scored 22 goals in both 2018 and 2019, while Salah scored 22 goals in 2018 and 19 in 2019. Even so, the top scorer in 2019, the most recent year, was Jamie Vardy with 23 goals, which beats both of those players. Aguero appeared on the list in 2016 and 2018, with 21 goals in each season. The highest single-season total in the Premier League is Harry Kane's 29 goals, and the lowest on the list is Dele Alli's 18.
Here is the plot for the Premier League top scorers:
# plotting for English top 15 scorers #
ggplot(top_15_english_players, aes(x = Goals, y = reorder(Player.Names, Goals), fill = as.factor(Year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Top Scorers In The Premier League", x = "Number of Goals", y = "Player Names", fill = "Year") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Spectral", guide = guide_legend(reverse = TRUE)) +
  geom_text(aes(label = Goals), size = 3, position = position_dodge(width = 0.9), hjust = -0.3)
The effectiveness percentage was developed to identify the players who can be considered the most effective. It is calculated by dividing a player's goals by the number of matches he played and multiplying the result by 100. The rationale is that a player who scores more goals than the number of matches he played has an effectiveness percentage above 100%, which makes him very effective.
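As a small worked example of this calculation, the sketch below wraps the formula in a helper and applies it to three invented players. The function name `effectiveness_pct` and the toy numbers are assumptions made for illustration; the rounding to whole percentages mirrors the `round(..., 0)` used when the column is created earlier in the analysis.

```r
# Hypothetical helper: goals per match, expressed as a rounded percentage.
effectiveness_pct <- function(goals, matches_played) {
  round(goals / matches_played * 100, 0)
}

# Invented players, chosen to land above, below, and exactly at 100%.
toy_players <- data.frame(
  Player.Names   = c("Player A", "Player B", "Player C"),
  Goals          = c(36, 19, 30),
  Matches_Played = c(34, 38, 30)
)

toy_players$EffectivenessPercentage <-
  effectiveness_pct(toy_players$Goals, toy_players$Matches_Played)

print(toy_players)
# Player A: 36 / 34 * 100 rounds to 106 (more goals than matches)
# Player B: 19 / 38 * 100 = 50
# Player C: 30 / 30 * 100 = 100
```

A value of exactly 100 means one goal per match on average, so anything above it corresponds to more goals than matches played.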
Here is the plot for the Effectiveness of Players playing in the Brazilian league, Spanish league, and English league:
# selecting the top 3 most effective players from Brazil #
top_3_most_effective_players_BRA <- top_15_brazil_players %>%
  arrange(desc(EffectivenessPercentage)) %>%
  head(3)
# selecting the top 3 most effective players from Spain #
top_3_most_effective_players_SPAIN <- top_15_spanish_players %>%
  arrange(desc(EffectivenessPercentage)) %>%
  head(3)
# selecting the top 3 most effective players from England #
top_3_most_effective_players_ENG <- top_15_english_players %>%
  arrange(desc(EffectivenessPercentage)) %>%
  head(3)
# combining the three data frames into one for comparison of player effectiveness #
overall_top_3_most_effective_players <- bind_rows(
  top_3_most_effective_players_BRA,
  top_3_most_effective_players_SPAIN,
  top_3_most_effective_players_ENG
)
# comparing the effectiveness of the top 3 players from each league (Brazil, Spain, and England) #
ggplot(overall_top_3_most_effective_players, aes(x = EffectivenessPercentage, y = reorder(Player.Names, EffectivenessPercentage), fill = League)) +
  geom_point(size = 3, shape = 21, color = "black") +
  labs(title = "Effectiveness of Top Players in Brazil, Spain, and England", x = "Effectiveness Percentage", y = "Player Names", fill = "League") +
  theme_gray() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Paired", guide = guide_legend(reverse = TRUE)) +
  geom_text(aes(label = EffectivenessPercentage), hjust = -0.5, size = 3, vjust = 0.5) +
  geom_text(aes(label = Player.Names), hjust = -0.2, size = 2.5, vjust = 1.7, color = "black") +
  theme(axis.text.y = element_text(size = 10), legend.title = element_text(size = 10), legend.text = element_text(size = 8)) +
  theme(aspect.ratio = 1/3) +
  scale_x_continuous(expand = c(0.2, 0.2), breaks = seq(70, 130, by = 10))
The visualization above displays the effectiveness percentages for the top players in La Liga, the Campeonato Brasileiro Serie A, and the Premier League. The players shown are drawn from the top-scorer visualizations described above. Lionel Messi, who plays in Spain's La Liga, is clearly and consistently the most effective player: he has three percentages above 100%, which is outstanding because it means he scored more goals than he played matches in those seasons. Harry Kane, who plays in the Premier League and has an effectiveness percentage of 100%, is the second most effective player.
In addition to finding the top scorers across various leagues and nations, I decided to add the total number of goals scored by all of these players to the analysis. I chose to display the overall number of goals scored in each nation included in the data set, in the hope of drawing some conclusions about what the leagues in these nations tell us.
# total goals per country and year (the dplyr:: prefix avoids plyr masking summarise) #
countries <- df %>%
  dplyr::group_by(Country, Year) %>%
  dplyr::summarise(Goals = sum(Goals, na.rm = TRUE), .groups = "drop")
# plotting to find the difference in number of goals in each country #
plot_ly(labels = ~Country, values = ~Goals, textposition = "inside") %>%
  add_pie(data = countries[countries$Year == 2020,], name = "2020", title = "2020", domain = list(row = 0, column = 0)) %>%
  add_pie(data = countries[countries$Year == 2019,], name = "2019", title = "2019", domain = list(row = 0, column = 1)) %>%
  add_pie(data = countries[countries$Year == 2018,], name = "2018", title = "2018", domain = list(row = 1, column = 0)) %>%
  add_pie(data = countries[countries$Year == 2017,], name = "2017", title = "2017", domain = list(row = 1, column = 1)) %>%
  layout(title = "Total Goals Per Country By Year", showlegend = TRUE, grid = list(rows = 2, columns = 2))
The charts show the breakdown of goals scored in each nation by year. I find it fascinating that in 2020 the Netherlands accounted for 216 goals, about 20% of the total, followed by the United States with 196. These figures are both surprising and exciting, because they suggest that these countries are on the rise in soccer, even though I personally do not believe they have the best leagues in the world. In 2019, Italy and England were the nations with the most goals scored, which is closer to what one would expect. Spain and England scored about the same number of goals in 2018, which could also have been anticipated. Last but not least, Spain and Italy were the top two scoring nations in 2017, with 334 and 297 goals respectively.
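The percentages quoted for each country are that country's goals divided by the total for the same year. A minimal base R sketch of that calculation follows; the table `toy_goals` and its values are invented for illustration and only reuse the column names `Country`, `Year`, and `Goals` from the analysis.

```r
# Invented per-player rows: several players can share a country and year.
toy_goals <- data.frame(
  Country = c("Country A", "Country A", "Country B",
              "Country C", "Country A", "Country B"),
  Year    = c(2020, 2020, 2020, 2020, 2019, 2019),
  Goals   = c(10, 30, 40, 20, 25, 75)
)

# Sum the goals for each country within each year.
totals <- aggregate(Goals ~ Country + Year, data = toy_goals, FUN = sum)

# Share of that year's goals, in percent (ave() returns the yearly total per row).
totals$SharePct <- round(
  100 * totals$Goals / ave(totals$Goals, totals$Year, FUN = sum), 1
)

# Largest contributors first within each year.
totals <- totals[order(totals$Year, -totals$Goals), ]
print(totals)
```

Each year's shares add up to 100, which is what a pie chart of the same totals displays.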
In this section, I wanted to study the players who played the most minutes in 2016, 2017, 2018, 2019, and 2020. I first kept the players whose minutes were above the overall average, then split the data by year and computed the average number of minutes played for each year. For each year, I kept only the players above that year's average, classified them as players with high minutes played, and kept five of them per year to plot.
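The per-year selection can also be written as a single grouped pipeline. The sketch below is an alternative formulation, not the code used for the plot: it works on an invented table (`toy_minutes`), keeps the players above each year's average, and then ranks them with `slice_max()` so that the rows kept really are the highest totals. It assumes dplyr 1.0.0 or later, and it keeps two players per year only because the toy table is small. The `dplyr::` prefixes are deliberate, since plyr exports functions with the same names.

```r
library(dplyr)

# Invented minutes for eight placeholder players across two years.
toy_minutes <- data.frame(
  Player.Names = paste("Player", LETTERS[1:8]),
  Year         = rep(c(2016, 2017), each = 4),
  Mins         = c(3000, 2500, 1000, 2900, 3100, 900, 2800, 2700)
)

top_minutes_by_year <- toy_minutes %>%
  dplyr::group_by(Year) %>%                       # work year by year
  dplyr::mutate(avg_minutes = mean(Mins)) %>%     # that year's average
  dplyr::filter(Mins > avg_minutes) %>%           # above-average players only
  dplyr::slice_max(Mins, n = 2) %>%               # highest totals within each year
  dplyr::ungroup()

print(top_minutes_by_year)
```

With five years of data, replacing `n = 2` by `n = 5` would give up to 25 rows in one step instead of five separate data frames.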
Here is the plot for Players with Top Minutes Played:
# set a threshold for minutes played: keep only the players above the overall average #
# overall top minutes played #
top_minutes_played <- df %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes)
# creating a df for top minutes played only in 2016 #
# note: head(5) keeps the first five qualifying rows in data order; add arrange(desc(Mins)) before it to rank by minutes #
top_minutes_2016 <- top_minutes_played[top_minutes_played$Year == 2016,] %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes) %>%
  head(5)
# creating a df for top minutes played only in 2017 #
top_minutes_2017 <- top_minutes_played[top_minutes_played$Year == 2017,] %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes) %>%
  head(5)
# creating a df for top minutes played only in 2018 #
top_minutes_2018 <- top_minutes_played[top_minutes_played$Year == 2018,] %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes) %>%
  head(5)
# creating a df for top minutes played only in 2019 #
top_minutes_2019 <- top_minutes_played[top_minutes_played$Year == 2019,] %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes) %>%
  head(5)
# creating a df for top minutes played only in 2020 #
top_minutes_2020 <- top_minutes_played[top_minutes_played$Year == 2020,] %>%
  select(Country, Player.Names, League, Mins, Year) %>%
  mutate(avg_minutes = mean(Mins)) %>%
  filter(Mins > avg_minutes) %>%
  head(5)
# combining the yearly data frames into one #
top_minutes_played_yearly <- bind_rows(
  top_minutes_2016,
  top_minutes_2017,
  top_minutes_2018,
  top_minutes_2019,
  top_minutes_2020
)
# plotting a heat map showing the contrast between the minutes played #
ggplot(top_minutes_played_yearly, aes(x = Year, y = Player.Names, fill = Mins)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(Mins)), size = 4) +
  labs(title = "Heatmap: Players With The Most Minutes Played by Year", x = "Year", y = "Player's Name", fill = "Total Minutes Played") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_continuous(low = "white", high = "red") +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black")))
This heat map displays 25 player-season entries with high minutes played across 2016, 2017, 2018, 2019, and 2020. Only 22 names are shown because three players each appear in two different years. Maxi Gomez played just over 3,100 minutes in 2017 and 3,148 minutes in 2018, level with Iago Aspas for the highest total that year. Luis Suarez is the second repeat player, with 2,940 minutes in 2016 and 3,008 in 2017. Lionel Messi, the third, played 2,910 minutes in 2016 and 3,123 in 2017. Another player worth mentioning is Diego Rossi, whose 3,277 minutes is the highest figure cited here. Alex Pozuelo, on the other hand, played the fewest minutes, with just 2,247.
If you know anything about soccer, you are aware that the debate over who is the better player, Ronaldo or Messi, goes on forever and never seems to have a clear winner. With this debate in mind, I chose to compare the two using the information available for them in the data set: the actual number of goals each of them scored against the number of goals each was expected to score.
Here is the plot for the comparison of Lionel Messi and Cristiano Ronaldo:
# looking at Lionel Messi, Cristiano Ronaldo #
lionel_messi <- df[df$Player.Names == "Lionel Messi",]
cr7 <- df[df$Player.Names == "Cristiano Ronaldo",]
# combine into one df #
cr7_and_messi <- bind_rows(
  lionel_messi,
  cr7
)
# plotting to see if messi and ronaldo were able to score the number of goals that was expected of them #
ggplot(data = cr7_and_messi, aes(x = xG, y = Goals, color = Player.Names)) +
  geom_line(linewidth = 1) +
  geom_point(shape = 21, size = 5, color = "green", fill = "white") +
  labs(title = "Expected Goals vs. Goals", x = "Expected Goals (xG)", y = "Goals Scored", color = "Player Name") +
  scale_x_continuous(breaks = seq(5, 40, by = 2)) +
  scale_y_continuous(breaks = seq(4, 40, by = 4)) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_label_repel(aes(label = Goals),
                   box.padding = 1,
                   point.padding = 1,
                   size = 4,
                   color = "Grey50",
                   segment.color = "black",
                   max.overlaps = Inf) +
  scale_color_manual(values = c("Lionel Messi" = "blue", "Cristiano Ronaldo" = "orange"))
Each player has five green-outlined dots on the chart, one for each of the five years in the data. Comparing the first dot of each player, Messi fell short of expectations: he scored only four goals when just over five were expected. Ronaldo, on the other hand, exceeded expectations, scoring eight goals when slightly more than five were expected. At the second dot the picture is reversed: Messi was expected to score just over 21 goals but ended up with 25, while Ronaldo was expected to score slightly more than 21 and scored 21, which is not bad but means he fell just short of the mark. After that single shortfall, Lionel Messi never again performed below expectations; he raised his level and kept it. Cristiano Ronaldo also largely lived up to expectations, apart from one further instance in which he was expected to score 29 goals but managed only 26.
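The reading of the chart above amounts to checking the sign of goals minus expected goals for each season. A minimal sketch of that check is shown below on invented numbers; the table `toy_xg`, the placeholder player names, and the derived columns `GoalsMinusxG` and `Outcome` are assumptions for illustration and are not taken from the data set.

```r
# Invented seasons for two placeholder players.
toy_xg <- data.frame(
  Player.Names = c("Player A", "Player A", "Player B", "Player B"),
  Year         = c(2018, 2019, 2018, 2019),
  xG           = c(20.0, 6.5, 18.5, 30.0),
  Goals        = c(24, 5, 18, 33)
)

# Positive values mean the player scored more than expected.
toy_xg$GoalsMinusxG <- round(toy_xg$Goals - toy_xg$xG, 1)

# Label each season by whether expectations were met.
toy_xg$Outcome <- ifelse(toy_xg$GoalsMinusxG >= 0, "met or beat xG", "below xG")

print(toy_xg)
```

The same two derived columns could be added to `cr7_and_messi` to list, season by season, where each player finished above or below his expected goals.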