Welcome to an exploration of Lionel Messi’s performance on the football field. Using R, we dig into the data to uncover insights about his top shot types. We clean and transform the data, handle missing values, and check that the data types are correct. We then build visualizations, such as bar charts and scatter plots, to bring the data to life. By the end, we’ll have a clearer picture of Messi’s skills and strategies. Let’s kick off this journey into football analytics!
We’re not just looking at numbers but at the story they tell about Messi’s play style. We’ll analyze trends and patterns, ask questions, and look for answers in the data. This project is more than statistics: it’s about understanding football through the lens of one of its greatest players. So let’s dive in and see what the data says about Messi’s top shot types!
Load the dataset
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
messi_data=read.csv("C:/Users/acer/Downloads/messi data.csv")
First, I check whether the dataset contains missing values
#is.na(messi_data)
The dataset has 704 rows, so I’m not going to display the full result
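Instead of printing the full `is.na()` matrix, the missing values can be counted per column. The sketch below uses a small toy data frame (an assumption, standing in for `messi_data`) and also counts empty strings, because `read.csv()` keeps blank text cells such as an empty `Goal_assist` as `""` rather than `NA`, and `na.omit()` does not drop them.

```r
# Toy stand-in for messi_data; the real data frame comes from read.csv().
toy <- data.frame(
  Type = c("Left-footed shot", "", "Left-footed shot"),
  Goal_assist = c("Sylvinho", "", NA),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
na_counts <- colSums(is.na(toy))

# Number of empty strings in each column (not treated as missing by na.omit())
empty_counts <- colSums(toy == "", na.rm = TRUE)

print(na_counts)
print(empty_counts)
```

On the real data, replacing `toy` with `messi_data` would give one count per column without printing all 704 rows.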
Handle missing data and store the result in another data frame
messi_data_clean<-na.omit(messi_data)
head(messi_data_clean)
## Season Competition Matchday Date Venue Club
## 1 5-Apr LaLiga 34 5/1/2005 H FC Barcelona
## 2 6-May UEFA Champions League Group Stage 11/2/2005 H FC Barcelona
## 3 6-May LaLiga 13 11/27/2005 H FC Barcelona
## 4 6-May LaLiga 19 1/15/2006 H FC Barcelona
## 5 6-May LaLiga 20 1/22/2006 H FC Barcelona
## 6 6-May LaLiga 21 1/29/2006 A FC Barcelona
## Opponent Result Playing_Position Minute At_score
## 1 Albacete Balompie 2:00 CF 90+1 2:00
## 2 Panathinaikos Athens 5:00 RW 34 3:00
## 3 Racing Santander 4:01 RW 51 2:00
## 4 Athletic Bilbao 2:01 RW 50 2:01
## 5 Deportivo Alaves 2:00 CF 82 2:00
## 6 RCD Mallorca 0:03 CF 75 0:02
## Type Goal_assist
## 1 Left-footed shot Ronaldinho Gaacho
## 2 Left-footed shot
## 3 Left-footed shot Samuel Etoo
## 4 Left-footed shot Mark van Bommel
## 5 Left-footed shot Ronaldinho Gaacho
## 6 Right-footed shot Sylvinho
Count the number of rows in the dataset
nrow(messi_data_clean)
## [1] 704
Get an overview of the data frame
str(messi_data_clean)
## 'data.frame': 704 obs. of 13 variables:
## $ Season : chr "5-Apr" "6-May" "6-May" "6-May" ...
## $ Competition : chr "LaLiga" "UEFA Champions League" "LaLiga" "LaLiga" ...
## $ Matchday : chr "34" "Group Stage" "13" "19" ...
## $ Date : chr "5/1/2005" "11/2/2005" "11/27/2005" "1/15/2006" ...
## $ Venue : chr "H" "H" "H" "H" ...
## $ Club : chr "FC Barcelona" "FC Barcelona" "FC Barcelona" "FC Barcelona" ...
## $ Opponent : chr "Albacete Balompie" "Panathinaikos Athens" "Racing Santander" "Athletic Bilbao" ...
## $ Result : chr "2:00" "5:00" "4:01" "2:01" ...
## $ Playing_Position: chr "CF" "RW" "RW" "RW" ...
## $ Minute : chr "90+1" "34" "51" "50" ...
## $ At_score : chr "2:00" "3:00" "2:00" "2:01" ...
## $ Type : chr "Left-footed shot" "Left-footed shot" "Left-footed shot" "Left-footed shot" ...
## $ Goal_assist : chr "Ronaldinho Gaacho" "" "Samuel Etoo" "Mark van Bommel" ...
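The `Season` values in the `str()` output ("5-Apr", "6-May") look like season labels such as "04/05" and "05/06" that a spreadsheet converted to dates before the CSV was saved. That is an assumption; nothing in the dataset confirms it. Under that assumption, a minimal sketch for restoring the labels could look like this. `restore_season` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: turn "5-Apr" back into "04/05", assuming the
# spreadsheet read "04/05" as month/day. Values that do not match the
# "day-Mon" pattern are returned unchanged.
restore_season <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]{1,2})-([A-Za-z]{3})$", x))
  vapply(seq_along(x), function(i) {
    p <- parts[[i]]
    if (length(p) != 3) return(x[i])      # no match: keep as is
    month <- match(p[3], month.abb)       # "Apr" -> 4
    if (is.na(month)) return(x[i])
    sprintf("%02d/%02d", month, as.integer(p[2]))
  }, character(1))
}

restored <- restore_season(c("5-Apr", "6-May", "12/13"))
print(restored)
```

Whether this mapping is right should be checked against the original data source before it is applied to `messi_data_clean$Season`.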
Example of a bar plot. Fig. 1: geom_bar() view of the Minute column for his two European clubs. Each row of the dataset is one goal, so Minute is the match minute of that goal rather than playing time.
messi_data_clean%>%
group_by(Club)%>%
summarise(Minute = Minute)
## `summarise()` has grouped output by 'Club'. You can override using the
## `.groups` argument.
## # A tibble: 704 × 2
## # Groups: Club [2]
## Club Minute
## <chr> <chr>
## 1 FC Barcelona 90+1
## 2 FC Barcelona 34
## 3 FC Barcelona 51
## 4 FC Barcelona 50
## 5 FC Barcelona 82
## 6 FC Barcelona 75
## 7 FC Barcelona 83
## 8 FC Barcelona 42
## 9 FC Barcelona 84
## 10 FC Barcelona 59
## # … with 694 more rows
ggplot(messi_data_clean,aes(x=Club,y=Minute))+
geom_bar(stat = "identity")
As the chart shows, Messi played for two clubs in Europe before moving to another continent, which is quite exceptional for one player
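Because `Minute` is stored as text and each row is a single goal, a count of rows per club may be an easier comparison to read. The sketch below uses toy data; the second club label is a placeholder, not the value used in the dataset.

```r
# Toy stand-in for messi_data_clean: one row per goal.
# "Other club" is a placeholder label, not taken from the dataset.
goals <- data.frame(
  Club = c("FC Barcelona", "FC Barcelona", "FC Barcelona", "Other club"),
  stringsAsFactors = FALSE
)

# Number of goals (rows) per club
goals_per_club <- table(goals$Club)
print(goals_per_club)
```

Passing `goals_per_club` to `barplot()` would then draw one bar per club with the goal count as its height.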
Task: Who assisted Messi’s goals in each competition?
barplot(table(messi_data_clean$Competition,messi_data_clean$Goal_assist),col = "blue",main="Most assist by match")
This code creates a bar plot from the ‘Competition’ and ‘Goal_assist’ columns of my dataset. The graph shows which players sent the most decisive passes to Messi during his European football career
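With many different assist providers the bar plot gets crowded, so a ranked table can be easier to read. This is a sketch on toy data (names taken from the `head()` output above); it drops the empty strings that mark unassisted goals before counting.

```r
# Toy stand-in for messi_data_clean$Goal_assist; "" marks a goal with no assist.
goal_assist <- c("Sylvinho", "", "Samuel Etoo", "Sylvinho", "")

# Keep only goals that have a named assist provider
assisted <- goal_assist[goal_assist != ""]

# Count assists per player, most frequent first
top_assisters <- sort(table(assisted), decreasing = TRUE)
print(head(top_assisters, 10))
```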
Task: Display the top 5 competitions in which Messi scored the most goals
top5 <- sort(table(messi_data_clean$Competition), decreasing = TRUE)[1:5]
barplot(top5, main="Top 5 Competitions by Goals")
This code first counts the goals per competition with table() and sorts the counts in decreasing order. It then keeps the five highest values, which are the competitions where Messi scored the most goals, and stores them in ‘top5’. The barplot function draws a bar chart of these five competitions, with the number of goals as the height of the bars. The ‘main’ parameter gives the plot its title: “Top 5 Competitions by Goals”. This is a simple way to visualize Messi’s goal scoring across competitions (the competition whose name is not displayed is the UEFA Champions League).
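The missing axis label is most likely a space problem: base `barplot()` skips labels that would overlap. One possible fix, sketched here on toy data, is to rotate the labels with `las = 2`, shrink them with `cex.names`, and widen the bottom margin. "Cup A" is a placeholder competition name.

```r
# Toy stand-in for messi_data_clean$Competition; "Cup A" is a placeholder.
competition <- c("LaLiga", "LaLiga", "LaLiga",
                 "UEFA Champions League", "UEFA Champions League",
                 "Cup A")

# Goals per competition, highest first; head() copes with fewer than 5 levels
top <- head(sort(table(competition), decreasing = TRUE), 5)

old_par <- par(mar = c(11, 4, 4, 2))   # larger bottom margin for long names
barplot(top, las = 2, cex.names = 0.8, main = "Top 5 Competitions by Goals")
par(old_par)                            # restore the previous margins
```

Unlike `[1:5]`, `head(..., 5)` does not produce `NA` entries when there are fewer than five competitions.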
Task: Use a scatter plot to see when in a match Messi’s goals are assisted.
ggplot(messi_data_clean, aes(x=Minute, y=Goal_assist)) + geom_point()
The plot shows at which minute of a match each teammate assisted a Messi goal. The ‘geom_point()’ function plots each goal as a point, which lets us look for patterns in when the assists arrive. The plot is hard to read in the Markdown output, but on my laptop it is clearly visible and you can see the minutes at which the assists occur
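Since the scatter plot is hard to read, a coarser summary may help: group the goal minutes into 15-minute intervals and count the assisted goals in each. The sketch below uses made-up minutes and assumes `Minute` has already been converted to a number.

```r
# Toy data: goal minute and whether the goal had an assist provider.
goal_minutes <- c(12, 34, 51, 50, 82, 75, 90)
assisted     <- c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

# Six 15-minute intervals covering regular time; minutes above 90 become NA
bins <- cut(goal_minutes, breaks = seq(0, 90, by = 15), include.lowest = TRUE)

# Assisted goals per interval
assisted_by_bin <- table(bins[assisted])
print(assisted_by_bin)
```

A `barplot(assisted_by_bin)` call would show the same counts as bars.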
messi_data_clean%>%
count(Opponent)%>%
top_n(15)%>%
ggplot(aes(x=reorder(Opponent,n),y=n,fill=Opponent))+
geom_bar(stat = "identity")
## Selecting by n
ggplot(messi_data_clean,aes(x=Season,y=Minute,fill=Season))+
geom_col()
ggplot(messi_data_clean,aes(x=Playing_Position, fill=Playing_Position))+
geom_bar(width = 0.5)+
coord_polar("y",start=0)+
theme_minimal()+
xlab("Position")+
ylab("Goals")
This is a polar bar plot built with ‘ggplot2’ in R. It shows the goals Messi scored in each playing position, with a minimal theme and one color per position. Messi scored from every position he was given, though barely when he played as a middle attacker, a position that was less favorable to him because it gave him fewer chances to show what he is capable of in front of goal
top_shots <- messi_data %>%
count(Type) %>%
arrange(desc(n)) %>%
head(5)
ggplot(top_shots, aes(x=reorder(Type, n), y=n)) +
geom_bar(stat="identity", fill="darkblue") +
labs(title="Top 5 Shot Types", x="Type", y="Count")
This code creates a bar plot of the top 5 shot types in the ‘messi_data’ data frame, with the shot types ordered by count.
ggplot(messi_data_clean, aes(x=Minute, y=Type)) +
geom_point() +
labs(title="Shot Type by Minute", x="Minute", y="Shot Type")
You can clearly see that Messi scored most of his goals with his left foot (which is logical, given that he is left-footed), but he also scored many with his right foot, something not every football player can achieve. He also scored fewer tap-ins and solo runs because he is not that type of player, although he can score those goals too
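The comparison between shot types can also be expressed as percentages. This is a sketch on toy data, not the real counts; the type labels follow the ones mentioned in the text.

```r
# Toy stand-in for messi_data_clean$Type (not the real counts).
shot_type <- c("Left-footed shot", "Left-footed shot", "Left-footed shot",
               "Right-footed shot", "Tap-in")

# Share of goals per shot type, in percent, rounded to one decimal
share <- round(100 * prop.table(table(shot_type)), 1)
print(sort(share, decreasing = TRUE))
```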
Task: Apply linear regression to analyze Messi’s performance based on his position
messi_data_clean$Minute <- as.numeric(messi_data_clean$Minute)
## Warning: NAs introduced by coercion
messi_data_clean$Minute[is.na(messi_data_clean$Minute)] <- mean(messi_data_clean$Minute, na.rm = TRUE)
model <- lm(Minute~Playing_Position,data = messi_data_clean)
summary(model)
##
## Call:
## lm(formula = Minute ~ Playing_Position, data = messi_data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.76 -20.73 0.09 21.57 58.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.667 8.216 4.341 1.63e-05 ***
## Playing_PositionAM 18.190 12.421 1.464 0.1435
## Playing_PositionCF 15.243 8.349 1.826 0.0683 .
## Playing_PositionCF 17.060 9.093 1.876 0.0611 .
## Playing_PositionLW 23.333 25.981 0.898 0.3694
## Playing_PositionRW 16.093 8.381 1.920 0.0553 .
## Playing_PositionRW 11.938 8.596 1.389 0.1654
## Playing_PositionSS 6.262 9.691 0.646 0.5184
## Playing_PositionSS 14.512 9.300 1.560 0.1191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.65 on 695 degrees of freedom
## Multiple R-squared: 0.01217, Adjusted R-squared: 0.000799
## F-statistic: 1.07 on 8 and 695 DF, p-value: 0.382
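The coercion warning above comes from stoppage-time values such as "90+1", which `as.numeric()` turns into `NA` before they are replaced by the mean. One alternative, sketched here, is to add the two parts so that "90+1" becomes 91. `parse_minute` is a hypothetical helper, and counting stoppage time this way is an analysis choice rather than the method used in this report.

```r
# Hypothetical helper: convert "34" to 34 and "90+1" to 91.
parse_minute <- function(x) {
  parts <- strsplit(trimws(x), "+", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_real_)  # empty string: treat as missing
    sum(as.numeric(p))                    # base minute plus stoppage time
  }, numeric(1))
}

minutes <- parse_minute(c("90+1", "34", "45+2"))
print(minutes)
```

Applied to the original character column, this would keep the late goals as real values instead of pulling them toward the average minute.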
unique(messi_data_clean$Playing_Position)
## [1] "CF" "RW" "LW" "SS" "CF " "AM" "RW " "AM " "SS "
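The `unique()` output shows that several positions appear twice, once with a trailing space ("CF" and "CF "), which is why the regression table lists some positions twice. A minimal sketch of merging them with base R's `trimws()`:

```r
# The nine values reported by unique(messi_data_clean$Playing_Position)
positions <- c("CF", "RW", "LW", "SS", "CF ", "AM", "RW ", "AM ", "SS ")

# Remove leading and trailing whitespace so "CF " and "CF" become one level
clean_positions <- trimws(positions)

print(sort(unique(clean_positions)))
```

Assigning the trimmed values back to `Playing_Position` before fitting would leave five positions instead of nine; the coefficients and p-values would then differ from the ones shown above.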
Task: Use a box plot to display Minute by playing position
aov(Minute~Playing_Position,data = messi_data_clean)
## Call:
## aov(formula = Minute ~ Playing_Position, data = messi_data_clean)
##
## Terms:
## Playing_Position Residuals
## Sum of Squares 5201.5 422209.8
## Deg. of Freedom 8 695
##
## Residual standard error: 24.64744
## Estimated effects may be unbalanced
boxplot(messi_data_clean$Minute~messi_data_clean$Playing_Position, col=c("purple","green","orange","pink"))
Interpretation of my data and of the results of this short analysis
The model tests whether playing position affects Minute, which here is the match minute of each goal rather than Messi’s playing time. Every position coefficient has a p-value above 0.05, and the overall F-test gives p = 0.382, so the results aren’t statistically significant. Position is therefore not a strong predictor. The ANOVA asks the same question. The printed aov() output does not include a p-value, but its sums of squares show that position explains very little of the variation (5201.5 against a residual of 422209.8). The ‘unbalanced’ note means the observations are not evenly distributed across playing positions. So, while there may be some differences between positions, they are not enough to say that position is a significant factor.
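Printing an `aov()` object shows sums of squares but no test statistic; the F value and p-value come from `summary()`. The sketch below runs on simulated data (an assumption, so the numbers are not Messi's), with three made-up groups labelled like positions.

```r
# Simulated stand-in for messi_data_clean: 20 goals in each of 3 positions.
set.seed(1)
toy <- data.frame(
  Minute = c(rnorm(20, 50, 25), rnorm(20, 52, 25), rnorm(20, 51, 25)),
  Playing_Position = rep(c("CF", "RW", "SS"), each = 20)
)

fit <- aov(Minute ~ Playing_Position, data = toy)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

print(anova_table)
print(p_value)
```

For a one-way design like this one, the ANOVA p-value is the same as the F-statistic p-value that `summary()` reports for the matching `lm()` fit.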
In conclusion, this project explored how the minute of Messi’s goals relates to his playing position, using R to analyze and visualize the data. The model showed that position does not significantly affect that minute. This may suggest that Messi’s skill and versatility allow him to excel regardless of his position. Future work could refine the model or explore other factors. This project was a great opportunity to apply data analysis techniques in a real-world context.