Introduction

Welcome to a fascinating exploration of Lionel Messi’s performance on the football field. We’re diving into the data, using R, to uncover insights about his top shot types. We’ll be cleaning and transforming the data, dealing with missing values, and ensuring the data types are correct. We’ll create engaging visualizations, like bar charts and scatter plots, to bring the data to life. By the end, we’ll have a deeper understanding of Messi’s skills and strategies. Let’s kick off this exciting journey into the world of football analytics!

We’re not just looking at numbers, but at the story they tell about Messi’s playing style. We’ll analyze trends and patterns, ask interesting questions, and look for the answers in the data. This project is more than just statistics: it’s about understanding football through the lens of one of its greatest players. So, let’s dive in and discover what the data says about Messi’s top shot types!

Load the dataset

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 1.0.0 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
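# dplyr is already attached by library(tidyverse), so it does not need to be loaded again
messi_data <- read.csv("C:/Users/acer/Downloads/messi data.csv")  # change this path to where your copy of the file is saved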

Data preprocessing

First, I check whether the dataset contains any missing values.

#is.na(messi_data)  # not run: prints TRUE/FALSE for every single cell

The dataset has 704 rows, so I don’t display the full output here.
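A more compact way to run this check is to count the missing values per column instead of printing the whole TRUE/FALSE matrix. The sketch below uses a small made-up data frame (the name `toy` and its values are hypothetical). It also illustrates something that likely matters for this dataset: by default `read.csv()` reads blank text cells as empty strings (`""`), not as `NA`, so `is.na()` and `na.omit()` do not see them. That would explain why the row count stays at 704 after cleaning and why an empty Goal_assist entry survives.

```r
# Hypothetical toy data frame with the same kind of columns as messi_data (values made up)
toy <- data.frame(
  Type        = c("Left-footed shot", "Right-footed shot", "Header"),
  Goal_assist = c("Samuel Etoo", "", NA),
  stringsAsFactors = FALSE
)

# One number per column instead of a full TRUE/FALSE matrix
na_per_column <- colSums(is.na(toy))
na_per_column

# Empty strings are not NA, so they need their own check
empty_per_column <- colSums(toy == "", na.rm = TRUE)
empty_per_column

# na.omit() drops only the row with a real NA; the row with "" is kept
nrow(na.omit(toy))
```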

Handle missing data by dropping incomplete rows, and store the result in a new data frame

messi_data_clean<-na.omit(messi_data)
head(messi_data_clean)
##   Season           Competition    Matchday       Date Venue         Club
## 1  5-Apr                LaLiga          34   5/1/2005     H FC Barcelona
## 2  6-May UEFA Champions League Group Stage  11/2/2005     H FC Barcelona
## 3  6-May                LaLiga          13 11/27/2005     H FC Barcelona
## 4  6-May                LaLiga          19  1/15/2006     H FC Barcelona
## 5  6-May                LaLiga          20  1/22/2006     H FC Barcelona
## 6  6-May                LaLiga          21  1/29/2006     A FC Barcelona
##               Opponent Result Playing_Position Minute At_score
## 1    Albacete Balompie   2:00               CF   90+1     2:00
## 2 Panathinaikos Athens   5:00               RW     34     3:00
## 3     Racing Santander   4:01               RW     51     2:00
## 4      Athletic Bilbao   2:01               RW     50     2:01
## 5     Deportivo Alaves   2:00               CF     82     2:00
## 6         RCD Mallorca   0:03               CF     75     0:02
##                Type       Goal_assist
## 1  Left-footed shot Ronaldinho Gaacho
## 2  Left-footed shot                  
## 3  Left-footed shot       Samuel Etoo
## 4  Left-footed shot   Mark van Bommel
## 5  Left-footed shot Ronaldinho Gaacho
## 6 Right-footed shot          Sylvinho

Count the number of rows in the dataset

nrow(messi_data_clean)
## [1] 704

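Check the structure of the data frame. Every column, including Minute, is stored as character (chr), so Minute has to be converted to numbers before it can be used in calculations.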

str(messi_data_clean)
## 'data.frame':    704 obs. of  13 variables:
##  $ Season          : chr  "5-Apr" "6-May" "6-May" "6-May" ...
##  $ Competition     : chr  "LaLiga" "UEFA Champions League" "LaLiga" "LaLiga" ...
##  $ Matchday        : chr  "34" "Group Stage" "13" "19" ...
##  $ Date            : chr  "5/1/2005" "11/2/2005" "11/27/2005" "1/15/2006" ...
##  $ Venue           : chr  "H" "H" "H" "H" ...
##  $ Club            : chr  "FC Barcelona" "FC Barcelona" "FC Barcelona" "FC Barcelona" ...
##  $ Opponent        : chr  "Albacete Balompie" "Panathinaikos Athens" "Racing Santander" "Athletic Bilbao" ...
##  $ Result          : chr  "2:00" "5:00" "4:01" "2:01" ...
##  $ Playing_Position: chr  "CF" "RW" "RW" "RW" ...
##  $ Minute          : chr  "90+1" "34" "51" "50" ...
##  $ At_score        : chr  "2:00" "3:00" "2:00" "2:01" ...
##  $ Type            : chr  "Left-footed shot" "Left-footed shot" "Left-footed shot" "Left-footed shot" ...
##  $ Goal_assist     : chr  "Ronaldinho Gaacho" "" "Samuel Etoo" "Mark van Bommel" ...
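The str() output also hints at a data-quality problem. Season values such as "5-Apr" and "6-May" look like season labels ("04/05", "05/06") that a spreadsheet program converted into dates before the file was saved as CSV; for example, the first goal, dated 5/1/2005, has Season "5-Apr", which fits the 2004/05 season. Result values such as "2:00" similarly look like scores ("2:0") converted into times. This is an inference from the values, not something the file documents. Assuming it is correct, a hypothetical helper such as `fix_season()` below could restore the season labels so they read and sort correctly in plots. It only rewrites values of the form "<day>-<Mon>" and leaves everything else unchanged.

```r
# Hypothetical helper: turn spreadsheet-mangled labels like "5-Apr" back into "04/05"
fix_season <- function(x) {
  # Only rewrite values shaped like "<day>-<Mon>", e.g. "5-Apr"
  mangled <- grepl("^[0-9]{1,2}-[A-Z][a-z]{2}$", x)
  day <- as.integer(sub("-.*$", "", x[mangled]))
  mon <- match(sub("^[0-9]+-", "", x[mangled]), month.abb)  # "Apr" -> 4
  out <- x
  # Assumption: "04/05" was read as month 04, day 05, so month = first year, day = second year
  out[mangled] <- sprintf("%02d/%02d", mon, day)
  out
}

seasons <- fix_season(c("5-Apr", "6-May", "13/14"))
seasons
```

Applying it would look like `messi_data_clean$Season <- fix_season(messi_data_clean$Season)`. Later seasons may have been converted differently (for example, "13/14" cannot be a date), so it is worth checking `unique(messi_data_clean$Season)` first.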

Visualization techniques

Example of a bar plot. Fig 1: geom_bar (number of goals at each of his two European clubs)

messi_data_clean%>%
  group_by(Club)%>%
  summarise(Minute = Minute)  # no aggregation: this returns every goal (704 rows), not one value per club
## `summarise()` has grouped output by 'Club'. You can override using the
## `.groups` argument.
## # A tibble: 704 × 2
## # Groups:   Club [2]
##    Club         Minute
##    <chr>        <chr> 
##  1 FC Barcelona 90+1  
##  2 FC Barcelona 34    
##  3 FC Barcelona 51    
##  4 FC Barcelona 50    
##  5 FC Barcelona 82    
##  6 FC Barcelona 75    
##  7 FC Barcelona 83    
##  8 FC Barcelona 42    
##  9 FC Barcelona 84    
## 10 FC Barcelona 59    
## # … with 694 more rows
# Minute is still text here and records when each goal was scored, so count goals per club instead
ggplot(messi_data_clean, aes(x = Club)) +
  geom_bar()

As the graph shows, Messi played for only two clubs in Europe before moving to another continent, which is remarkable for a player of his level.
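A grouped summary should return one value per group. As written, `summarise(Minute = Minute)` does not aggregate anything; it only lists every goal. A minimal base-R sketch on made-up data counts goals per club and computes the median goal minute. The club names and minutes are hypothetical, and Minute is assumed to be numeric already. In dplyr, the equivalent would be `group_by(Club)` followed by `summarise(goals = n(), median_minute = median(Minute))`.

```r
# Hypothetical toy data: one row per goal, Minute already numeric (values made up)
goals <- data.frame(
  Club   = c("Club A", "Club A", "Club A", "Club B", "Club B"),
  Minute = c(90, 34, 51, 12, 70)
)

# Goals per club: one count per group
goals_per_club <- table(goals$Club)
goals_per_club

# Median goal minute per club: aggregate() collapses each group to a single value
median_minute <- aggregate(Minute ~ Club, data = goals, FUN = median)
median_minute
```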

Task: Who assisted Messi’s goals in each competition?

assisted <- subset(messi_data_clean, Goal_assist != "")  # "" marks a goal without an assist
barplot(table(assisted$Competition, assisted$Goal_assist), col = "blue", las = 2, main = "Assists to Messi's goals")

This code creates a bar plot of the assists on Messi’s goals, using the ‘Competition’ and ‘Goal_assist’ columns after removing goals without an assist. Each bar is one assisting player, stacked by competition, so the graph shows which players gave Messi the most decisive passes during his European career.
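Because unassisted goals are stored as empty strings, counting Goal_assist directly would make `""` the most frequent "assister". A minimal sketch on a made-up vector (the player names are hypothetical) drops the empty strings, ranks the assisters, and computes the share of goals that were assisted.

```r
# Hypothetical assister for each goal; "" means the goal had no assist (made up)
goal_assist <- c("Player A", "", "Player B", "Player A", "", "Player A")

# Drop unassisted goals first, otherwise "" would top the ranking
assists_by_player <- sort(table(goal_assist[goal_assist != ""]), decreasing = TRUE)
assists_by_player

# Share of goals that came from an assist
share_assisted <- mean(goal_assist != "")
share_assisted
```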

Task: Display the top 5 competitions in which Messi scored the most goals

top5 <- sort(table(messi_data_clean$Competition), decreasing = TRUE)[1:5]
barplot(top5, main="Top 5 Competitions by Goals")

This code first uses table() to count the rows (goals) for each value of the ‘Competition’ column. It then sorts those counts in decreasing order and keeps the first five, which are the competitions where Messi scored the most goals. These counts are stored in ‘top5’, and barplot() draws one bar per competition, with the number of goals as the height of the bar. The ‘main’ argument sets the plot title, “Top 5 Competitions by Goals”. This is a simple, effective way to visualize Messi’s goal-scoring across competitions. The bar without a visible name is the UEFA Champions League; its long label is most likely dropped because it would overlap its neighbours.
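One detail of the code above is that `sort(...)[1:5]` always asks for five entries. With fewer than five competitions, the extra slots are filled with NA, whereas `head(..., 5)` simply returns what exists. The sketch uses a made-up competition vector with only three distinct values.

```r
# Hypothetical competition column with only three distinct values (made up)
competition <- c("LaLiga", "LaLiga", "Copa del Rey", "LaLiga", "Supercopa", "Copa del Rey")
counts <- sort(table(competition), decreasing = TRUE)

counts[1:5]      # five slots requested, only three exist: the last two are NA
head(counts, 5)  # at most five entries, no NA padding
```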

Example of a scatter plot with geom_point

Task: Use a scatter plot to see at which minutes Messi scores his assisted goals

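ggplot(subset(messi_data_clean, Goal_assist != ""),  # keep assisted goals only; "90+1" is plotted as 90
       aes(x = as.numeric(sub("\\+.*", "", Minute)), y = Goal_assist)) + geom_point() + labs(x = "Minute")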

It plots the minute of each assisted goal against the player who provided the assist, one point per goal, so we can look for patterns in when Messi’s assisted goals arrive. With so many different assisters, the plot is hard to read at the size shown in this document, but in a larger plot window you can clearly see the minutes at which each player set Messi up.
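Because the scatter plot is hard to read with many assisters, an alternative is to group goal minutes into 15-minute windows and count goals per window. This is a minimal sketch with made-up minutes. It assumes the minutes are already numeric, with stoppage-time goals counted at their base minute (so "90+1" is 90).

```r
# Hypothetical minutes of assisted goals, already numeric (values made up)
minute <- c(5, 34, 44, 45, 51, 88, 90, 90)

# 15-minute windows up to 120 to allow for extra time; intervals are (0,15], (15,30], ...
windows <- cut(minute, breaks = seq(0, 120, by = 15))
goals_per_window <- table(windows)
goals_per_window
```

Passing `goals_per_window` to `barplot()` would draw one bar per window. Empty windows are kept because `table()` counts every level of the factor.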

Task: Count the goals Messi scored against each opponent and plot the top 15 opponents in a bar chart

messi_data_clean%>% 
  count(Opponent)%>%
  top_n(15)%>%
  ggplot(aes(x=reorder(Opponent,n),y=n,fill=Opponent))+
  geom_bar(stat = "identity")
## Selecting by n

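Task: Show the number of goals Messi scored each season (the dataset records the minute of each goal, not the minutes he played)

# Count rows (goals) per season; adding up the Minute text column would not give playing time
ggplot(messi_data_clean, aes(x = Season, fill = Season)) +
  geom_bar()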

Task: The number of goals Messi scored in each playing position

ggplot(messi_data_clean,aes(x=Playing_Position, fill=Playing_Position))+
  geom_bar(width = 0.5)+
  coord_polar("y",start=0)+
  theme_minimal()+
  xlab("Position")+
  ylab("Goals")

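This code draws a polar bar plot with ‘ggplot2’ showing how many goals Messi scored in each playing position, using a minimal theme and a different colour for each position. Because every row of the dataset is a goal, every position that appears in the chart is one in which he scored at least once. Shorter bars mark roles where he scored less often, such as attacking midfield, which keeps him further from goal. The chart counts goals, not matches, so a short bar may also simply mean that he rarely played that role. Note that labels with a trailing space (for example "CF " and "CF") are drawn as separate bars.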

Task: Messi’s most common shot types

top_shots <- messi_data_clean %>%   # use the cleaned data, like the other sections
  count(Type) %>% 
  arrange(desc(n)) %>% 
  head(5)
ggplot(top_shots, aes(x=reorder(Type, n), y=n)) +
  geom_bar(stat="identity", fill="darkblue") +
  labs(title="Top 5 Shot Types", x="Type", y="Count")

This code creates a bar plot of the five most frequent shot types in the cleaned data, ordered by count.

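Task: Shot type by minute, shown as a scatter plot

# Convert the Minute text to numbers ("90+1" becomes 90) so the x-axis is in numeric order
ggplot(messi_data_clean, aes(x = as.numeric(sub("\\+.*", "", Minute)), y = Type)) +
  geom_point() +
  labs(title = "Shot Type by Minute", x = "Minute", y = "Shot Type")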

Together with the top 5 bar chart, this shows that Messi scored most of his goals with his left foot, which is logical given that he is left-footed. He has also scored many with his right foot, something not every footballer can achieve. He scored fewer tap-ins and solo-run goals: those are less typical of his game, although he is clearly capable of scoring them.
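Comparisons such as left foot versus right foot are easiest to state as percentages. A minimal sketch with a made-up shot-type vector (hypothetical values, not Messi’s real totals) uses `prop.table()` to turn counts into shares.

```r
# Hypothetical shot types for a handful of goals (made up, not Messi's real totals)
shot_type <- c("Left-footed shot", "Left-footed shot", "Right-footed shot",
               "Header", "Left-footed shot", "Tap-in")

# Share of goals per shot type, in percent, rounded to one decimal
shares <- round(100 * prop.table(table(shot_type)), 1)
sort(shares, decreasing = TRUE)
```

On the real data, the same call would be `prop.table(table(messi_data_clean$Type))`.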

Applying some analytics techniques

Task: Apply linear regression to analyze whether the minute at which Messi scores depends on his playing position

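messi_data_clean$Minute <- as.numeric(messi_data_clean$Minute)  # stoppage-time values such as "90+1" cannot be parsed and become NA
## Warning: NAs introduced by coercion
messi_data_clean$Minute[is.na(messi_data_clean$Minute)] <- mean(messi_data_clean$Minute, na.rm = TRUE)  # replace those NAs with the mean minute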

model <- lm(Minute~Playing_Position,data = messi_data_clean)
summary(model)
## 
## Call:
## lm(formula = Minute ~ Playing_Position, data = messi_data_clean)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -48.76 -20.73   0.09  21.57  58.24 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           35.667      8.216   4.341 1.63e-05 ***
## Playing_PositionAM    18.190     12.421   1.464   0.1435    
## Playing_PositionCF    15.243      8.349   1.826   0.0683 .  
## Playing_PositionCF    17.060      9.093   1.876   0.0611 .  
## Playing_PositionLW    23.333     25.981   0.898   0.3694    
## Playing_PositionRW    16.093      8.381   1.920   0.0553 .  
## Playing_PositionRW    11.938      8.596   1.389   0.1654    
## Playing_PositionSS     6.262      9.691   0.646   0.5184    
## Playing_PositionSS    14.512      9.300   1.560   0.1191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.65 on 695 degrees of freedom
## Multiple R-squared:  0.01217,    Adjusted R-squared:  0.000799 
## F-statistic:  1.07 on 8 and 695 DF,  p-value: 0.382
unique(messi_data_clean$Playing_Position)
## [1] "CF"  "RW"  "LW"  "SS"  "CF " "AM"  "RW " "AM " "SS "
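The unique() output explains the duplicated coefficient names in the model: "CF" and "CF " (with a trailing space) are different strings, so lm() treats them as different positions. The coercion warning comes from stoppage-time minutes such as "90+1", which as.numeric() cannot parse. This is a minimal sketch of both fixes on made-up data. `trimws()` removes the stray spaces, and `sub()` keeps the minute before the "+". Counting a 90+1 goal as minute 90 is an assumption; adding the stoppage time instead would be another reasonable choice. The model is then refit on the cleaned columns.

```r
# Hypothetical toy data showing both problems (values made up)
goals <- data.frame(
  Minute           = c("90+1", "34", "45+2", "82", "12", "67"),
  Playing_Position = c("CF", "RW", "CF ", "RW ", "SS", "CF"),
  stringsAsFactors = FALSE
)

# Keep the part before "+": "90+1" -> 90, "34" -> 34 (assumption: stoppage time counts as the base minute)
goals$Minute <- as.numeric(sub("\\+.*$", "", goals$Minute))

# Strip leading/trailing spaces so "CF " and "CF" become the same level
goals$Playing_Position <- trimws(goals$Playing_Position)
sort(unique(goals$Playing_Position))

# Refit the same kind of model on the cleaned columns: one coefficient per real position
model_clean <- lm(Minute ~ Playing_Position, data = goals)
coef(model_clean)
```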

Task: Use ANOVA and a boxplot to compare goal minutes across Messi’s playing positions

aov(Minute~Playing_Position,data = messi_data_clean)
## Call:
##    aov(formula = Minute ~ Playing_Position, data = messi_data_clean)
## 
## Terms:
##                 Playing_Position Residuals
## Sum of Squares            5201.5  422209.8
## Deg. of Freedom                8       695
## 
## Residual standard error: 24.64744
## Estimated effects may be unbalanced
boxplot(messi_data_clean$Minute~messi_data_clean$Playing_Position, col=c("purple","green","orange","pink"))
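Printing an aov object, as above, shows the sums of squares but no p-value. Calling summary() on it adds the F statistic and the p-value. With a single factor, this F-test is the same as the overall F-test that summary() reports for the equivalent lm() model. The sketch uses R’s built-in PlantGrowth dataset (plant weight by treatment group) as a stand-in, since the Messi CSV is not bundled with R.

```r
# PlantGrowth ships with R: 30 plants, weight by group (ctrl, trt1, trt2)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# summary() adds the "F value" and "Pr(>F)" columns that printing the aov object leaves out
aov_table <- summary(fit_aov)[[1]]
aov_table

# The same one-way comparison written as a linear model
fit_lm <- lm(weight ~ group, data = PlantGrowth)
lm_f <- summary(fit_lm)$fstatistic  # named vector: value, numdf, dendf
lm_f
```

For this project, the corresponding call would be `summary(aov(Minute ~ Playing_Position, data = messi_data_clean))`.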

Interpreting the results of this short analysis

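The linear model estimates a different average goal minute for each position, but none of the position coefficients is significant at the 0.05 level (only the intercept is). The overall F-test gives p = 0.382, and the adjusted R-squared is about 0.001, so position explains almost none of the variation in when Messi scores. The ANOVA asks the same question. Its F statistic, (5201.5 / 8) / (422209.8 / 695) ≈ 1.07, is the same as the regression’s, so the p-value is again 0.382 and position has no significant effect. The note "Estimated effects may be unbalanced" means that the positions have different numbers of goals. This is expected in data like this and does not invalidate the test.

Two data issues should be kept in mind. First, stoppage-time minutes such as "90+1" could not be converted and were replaced by the mean minute. Second, position labels with a trailing space ("CF " versus "CF") were treated as separate positions, which is why the model has eight position coefficients instead of four.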

Conclusion

In conclusion, our project used R to analyze and visualize Messi’s goals, including whether his playing position is related to the minute at which he scores. Our model found no significant relationship between position and goal minute. This is consistent with Messi being a threat at any stage of a match whatever his role, although a non-significant result is not proof that position has no effect. Future work could refine the model (for example, by cleaning the minute and position columns more carefully) or explore other factors that may affect when he scores. This project was a great opportunity to apply data analysis techniques in a real-world context.