---
title: "European Soccer: Tactical Analysis"
author: "Subhan Khalid"
date: "01/14/2023"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source: embed
    vertical_layout: fill
  pdf_document: default
  html_document:
    df_print: paged
---
# Conclusion {.sidebar}
**Table of Contents:**

* 1.Summary
* 2.Univariate Analysis:
    - Passing
    - Build Up Play Speed
    - Team Wins
    - Goals Scored
* 3.Bivariate Analysis:
    - Key Variables
    - Strategy for scoring goals
* 4.Conclusion
# **Mainframe**
Row {data-height=250}
-------------------------------------
### **Overview**
This dashboard analyses data on six historically top soccer teams from each of the English and Spanish leagues. The aim is to analyse the tactics deployed by these teams and their impact on the number of goals scored from season 2010/11 to season 2015/16. The teams from the English league are Arsenal, Chelsea, Liverpool, Manchester City, Manchester United and Tottenham. The teams from the Spanish league are Real Madrid, Barcelona, Valencia, Athletic Bilbao, Sevilla and Atletico Madrid.
Row
-------------------------------------
### **Objective**
The project has two halves. In the first half I do data manipulation to prepare a dataset with 72 observations, one for each of the 12 teams in each of the 6 seasons. This dataset contains variables on goals scored in a season, wins, build up play speed and chances created using crossing, passing and shooting. The data manipulation involves a lot of filtering, appending and merging of tables from the SQL database. I explain the steps as I go along.

The second half of the project involves data visualization of the tactical variables and their relation with goals scored and wins. The end goal is to visualize the tactical strategy which leads to more goals scored across leagues and across seasons.
### **Executive Summary**
Through this analysis we get valuable insights into teams and their methodology and approach to scoring goals on a consistent basis. We also see the differences between the Spanish and English leagues and their understanding of the game. The data supports the claim that the Spanish league is more tactical whereas the English league is more physical. Adding further variables would give more insight into goal scoring tactics, but this analysis is a good stepping stone towards that.
```{r, echo = TRUE, include = FALSE, message = FALSE}
#install.packages("xts",repos = "http://cran.us.r-project.org")
#install.packages("dygraphs",repos = "http://cran.us.r-project.org")
#install.packages("lubridate",repos = "http://cran.us.r-project.org")
#install.packages("DT", type = "binary")
#install.packages("pdfetch", repos = "http://cran.us.r-project.org")
#install.packages("PerformanceAnalytics", repos = "http://cran.us.r-project.org")
#install.packages("stocks", repos = "http://cran.us.r-project.org")
#install.packages("flexdashboard", repos = "http://cran.us.r-project.org")
#install.packages("glue")
#install.packages("lifecycle", type = "binary")
#install.packages("rlang")
#install.packages("acepack")
#install.packages("latticeExtra")
#install.packages("flexdashboard", type = "binary")
#install.packages("kableExtra", type = "binary")
#install.packages("htmltools")
#install.packages("memoise")
#install.packages("RSQLite", type = "binary")
#install.packages("DBI", type = "binary")
#install.packages("Rcpp", type = "binary")
#install.packages("rlang", type = "binary")
library(xts)
library(pdfetch)
library(DT)
library(lubridate)
library(dygraphs)
library(quantmod)
#Loading plyr before dplyr so that dplyr's summarize() and mutate() are not masked
library(plyr)
library(dplyr)
library(knitr)
library(ggplot2)
#library(tidyr)
library(PerformanceAnalytics)
#library(stocks)
library(kableExtra)
library(flexdashboard)
```
```{r,include=FALSE}
#Loading SQL Libraries
library(DBI)
library(RSQLite)
# connect to the sqlite file (adjust the path to your local copy of the database)
database <- dbConnect(RSQLite::SQLite(),dbname="C:/Users/sk979/Downloads/ANLY 512/00_Project/database.sqlite")
#Reading in Dataframe
as.data.frame(dbListTables(database))
Country <- dbReadTable(database, 'Country')
League <- dbReadTable(database, 'League')
Match <- dbReadTable(database, 'Match')
Player <- dbReadTable(database, 'Player')
Player_Att<- dbReadTable(database, 'Player_Attributes')
Team <- dbReadTable(database, 'Team')
Team_Att <- dbReadTable(database, 'Team_Attributes')
```
```{r}
library(ggplot2)
library(tidyverse)
library(dplyr)
#Keeping the two leagues
Match_1 <- Match %>% filter(country_id==1729 | country_id==21518)
#Keeping Required Variables
Match_1<-Match_1 %>%select(id,country_id,league_id,season,stage,date,match_api_id,home_team_api_id,away_team_api_id,home_team_goal,away_team_goal)
#Keeping required ID's (the order of this vector matches Team_ID 1 to 12 below)
team_ids<-c(9825,8455,8650,8456,10260,8586,9906,8634,8633,8315,10267,8302)
Match_1<-Match_1 %>%
filter(home_team_api_id %in% team_ids | away_team_api_id %in% team_ids)
#Filtering Seasons
Match_1<-Match_1 %>% filter(season %in% c("2010/2011","2011/2012","2012/2013","2013/2014","2014/2015","2015/2016"))
#Creating an ID variable for the 12 teams
##Not having two separate ID variables
Match_1$Team_ID[Match_1$home_team_api_id==9825 | Match_1$away_team_api_id==9825]<-1
Match_1$Team_ID[Match_1$home_team_api_id==8455 | Match_1$away_team_api_id==8455]<-2
Match_1$Team_ID[Match_1$home_team_api_id==8650 | Match_1$away_team_api_id==8650]<-3
Match_1$Team_ID[Match_1$home_team_api_id==8456 | Match_1$away_team_api_id==8456]<-4
Match_1$Team_ID[Match_1$home_team_api_id==10260 | Match_1$away_team_api_id==10260]<-5
Match_1$Team_ID[Match_1$home_team_api_id==8586 | Match_1$away_team_api_id==8586]<-6
Match_1$Team_ID[Match_1$home_team_api_id==9906 | Match_1$away_team_api_id==9906]<-7
Match_1$Team_ID[Match_1$home_team_api_id==8634 | Match_1$away_team_api_id==8634]<-8
Match_1$Team_ID[Match_1$home_team_api_id==8633 | Match_1$away_team_api_id==8633]<-9
Match_1$Team_ID[Match_1$home_team_api_id==8315 | Match_1$away_team_api_id==8315]<-10
Match_1$Team_ID[Match_1$home_team_api_id==10267 | Match_1$away_team_api_id==10267]<-11
Match_1$Team_ID[Match_1$home_team_api_id==8302 | Match_1$away_team_api_id==8302]<-12
```
```{r}
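#Creating List
list<-c(9825,8455,8650,8456,10260,8586,9906,8634,8633,8315,10267,8302)
#Creating a data set for home games only
Match_3<-Match_1 %>% filter(home_team_api_id%in%list)
#Team_ID must point at the home team here, even when both sides are among the 12 teams
Match_3$Team_ID<-match(Match_3$home_team_api_id,list)
Match_3$goals_scored=Match_3$home_team_goal
Match_3$goals_conceded=Match_3$away_team_goal
Match_3$Win[Match_3$home_team_goal>Match_3$away_team_goal ]<-1
Match_3$Win[Match_3$home_team_goal==Match_3$away_team_goal ]<-0
Match_3$Win[Match_3$home_team_goal<Match_3$away_team_goal ]<-0
Match_3$Loss[Match_3$home_team_goal<Match_3$away_team_goal ]<-1
Match_3$Loss[Match_3$home_team_goal==Match_3$away_team_goal ]<-0
Match_3$Loss[Match_3$home_team_goal>Match_3$away_team_goal ]<-0
#Creating a data set for away games only
Match_4<-Match_1 %>% filter(away_team_api_id%in%list)
#Team_ID must point at the away team here
Match_4$Team_ID<-match(Match_4$away_team_api_id,list)
Match_4$goals_scored=Match_4$away_team_goal
Match_4$goals_conceded=Match_4$home_team_goal
Match_4$Win[Match_4$home_team_goal<Match_4$away_team_goal ]<-1
Match_4$Win[Match_4$home_team_goal==Match_4$away_team_goal ]<-0
Match_4$Win[Match_4$home_team_goal>Match_4$away_team_goal ]<-0
Match_4$Loss[Match_4$home_team_goal>Match_4$away_team_goal ]<-1
Match_4$Loss[Match_4$home_team_goal==Match_4$away_team_goal ]<-0
Match_4$Loss[Match_4$home_team_goal<Match_4$away_team_goal ]<-0
#Appending home and away games (the name Match_5 is assumed), then summing within team and season
Match_5<-rbind(Match_3,Match_4)
Match_5a<-Match_5 %>% group_by(Team_ID,season)%>%dplyr::summarize(Wins=sum(Win),Losses=sum(Loss),Total_goals_scored=sum(goals_scored),Total_goals_conceded=sum(goals_conceded))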
#Match_6<- Match_5a %>%
#dplyr::group_by(season) %>%
#dplyr::sample_n(6) %>%
#dplyr::mutate(season_1 = dplyr::row_number()) %>% arrange(Team_ID)
#s1<-Match_5a %>%filter(season == "2010/2011") %>%mutate(season_1 = 1)
#s2<-Match_5a %>%filter(season == "2011/2012") %>%mutate(season_1 = 2)
#s3<-Match_5a %>%filter(season == "2012/2013") %>%mutate(season_1 = 3)
#s4<-Match_5a %>%filter(season == "2013/2014") %>%mutate(season_1 = 4)
#s5<-Match_5a %>%filter(season == "2014/2015") %>%mutate(season_1 = 5)
#s6<-Match_5a %>%filter(season == "2015/2016") %>%mutate(season_1 = 6)
#Match_6 = rbind(s1, s2 , s3 , s4 , s5 , s6)
#save.image (file = "Match_6.RData")
#Match_6$season_1[str_sub(Match_6$season,1,4)==2010]<-1
#Match_6$season_1[str_sub(Match_6$season,1,4)==2011]<-2
#Match_6$season_1[str_sub(Match_6$season,1,4)==2012]<-3
#Match_6$season_1[str_sub(Match_6$season,1,4)==2013]<-4
#Match_6$season_1[str_sub(Match_6$season,1,4)==2014]<-5
#Match_6$season_1[str_sub(Match_6$season,1,4)==2015]<-6
```
```{r}
load("Match_6.RData")
#FIFA ids of the 12 teams
fifa_ids<-c(241,243,10,5,11,1,9,240,461,448,481,18)
Team_1<-Team %>%
filter(team_fifa_api_id %in% fifa_ids)
Team_Att_1<-Team_Att %>%
filter(team_fifa_api_id %in% fifa_ids)
#selecting required variables
Team_Att_1<-Team_Att_1 %>% select(team_fifa_api_id,team_api_id,date,buildUpPlaySpeed,buildUpPlayPassing,chanceCreationPassing,chanceCreationCrossing,chanceCreationShooting,defencePressure,defenceAggression,defenceTeamWidth)
#Merging the team and team attributes data
Team_2<-full_join(Team_1,Team_Att_1,by="team_api_id")
Team_2$Team_ID[Team_2$team_api_id==9825]<-1
Team_2$Team_ID[Team_2$team_api_id==8455]<-2
Team_2$Team_ID[Team_2$team_api_id==8650]<-3
Team_2$Team_ID[Team_2$team_api_id==8456]<-4
Team_2$Team_ID[Team_2$team_api_id==10260]<-5
Team_2$Team_ID[Team_2$team_api_id==8586]<-6
Team_2$Team_ID[Team_2$team_api_id==9906]<-7
Team_2$Team_ID[Team_2$team_api_id==8634]<-8
Team_2$Team_ID[Team_2$team_api_id==8633]<-9
Team_2$Team_ID[Team_2$team_api_id==8315]<-10
Team_2$Team_ID[Team_2$team_api_id==10267]<-11
Team_2$Team_ID[Team_2$team_api_id==8302]<-12
#String Manipulation to create season variable
Team_2$season_1[str_sub(Team_2$date,1,4)==2010]<-1
Team_2$season_1[str_sub(Team_2$date,1,4)==2011]<-2
Team_2$season_1[str_sub(Team_2$date,1,4)==2012]<-3
Team_2$season_1[str_sub(Team_2$date,1,4)==2013]<-4
Team_2$season_1[str_sub(Team_2$date,1,4)==2014]<-5
Team_2$season_1[str_sub(Team_2$date,1,4)==2015]<-6
```
```{r , warning=FALSE}
soccer<-merge(Team_2,Match_6,by=c("Team_ID","season_1"))
attach(soccer)
#View(soccer)
#library(foreign)
#Assigning value labels
soccer$season_1<-factor(soccer$season_1,levels=c(1,2,3,4,5,6),labels=c("2010-11","2011-12","2012-13","2013-14","2014-15","2015-16"))
soccer$League[Team_ID<=6]<-1
soccer$League[Team_ID>6]<-2
soccer$League<-factor(soccer$League,levels=c(1,2),labels=c("English","Spanish"))
z<-soccer %>% filter(League=="English")
a<-soccer %>% filter(League=="Spanish")
```
# **Univariate Visualization**
Row {.tabset .tabset-fade}
-------------------------------------
### Passing
```{r}
hist(buildUpPlayPassing,col=c("Cyan"),main="Histogram of Build up Play Passing",xlab="Build Up Play Passing")
#Line Graphs by League on Passing
z %>%filter(Team_ID %in% c(1,2,4,5)) %>%
ggplot(aes(x=factor(season_1),y=buildUpPlayPassing))+
geom_line(aes(group=team_long_name,colour=team_long_name),size=1.5)+
ylab("Build up Play Passing")+xlab("Season")+ guides(color=guide_legend(title="Team"))+theme_bw()+ggtitle("BUILD UP PLAY | PASSING","English Teams")
a %>%filter(Team_ID %in% c(8,9,11,7)) %>%
ggplot(aes(x=factor(season_1),y=buildUpPlayPassing))+
geom_line(aes(group=team_long_name,colour=team_long_name),size=1.5)+
ylab("Build up Play Passing")+xlab("Season")+ guides(color=guide_legend(title="Team"))+theme_bw()+ggtitle("BUILD UP PLAY | PASSING","Spanish Teams")
```
### Build Up Play Speed
```{r}
hist(buildUpPlaySpeed,col=c("Orange"),main="Histogram of Build up Play Speed",xlab="Build up Play Speed")
#Line Graphs of Build up Play speed
z %>%filter(Team_ID %in% c(1,2,4,5)) %>%
ggplot(aes(x=factor(season_1),y=buildUpPlaySpeed))+
geom_line(aes(group=team_long_name,colour=team_long_name),size=1.5)+
ylab("Build up Play Speed")+xlab("Season")+ guides(color=guide_legend(title="Team"))+theme_bw()+ggtitle("BUILD UP PLAY | SPEED","English Teams")
a %>%filter(Team_ID %in% c(8,9,11,7)) %>%
ggplot(aes(x=factor(season_1),y=buildUpPlaySpeed))+
geom_line(aes(group=team_long_name,colour=team_long_name),size=1.5)+
ylab("Build up Play Speed")+xlab("Season")+ guides(color=guide_legend(title="Team"))+theme_bw()+ggtitle("BUILD UP PLAY | SPEED","Spanish Teams")
```
### Wins
```{r}
z %>%
ggplot(aes(factor(team_short_name),Wins))+
geom_bar(stat="identity",aes(fill=season))+xlab("Team")+theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ggtitle("Wins English Teams")
a %>%
ggplot(aes(factor(team_short_name),Wins))+
geom_bar(stat="identity",aes(fill=season))+xlab("Team")+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ggtitle("Wins Spanish Teams")
```
### Goals Scored
```{r}
hist(Total_goals_scored,col=c("Blue"),main="Histogram of Total Goals from Season")
z %>%
ggplot(aes(factor(team_short_name),Total_goals_scored))+
geom_bar(stat="identity",aes(fill=season))+xlab("Team")+theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ggtitle("Goals Scored | English Teams")+ylab("Total Goals Scored")+scale_y_continuous(limits = c(0,600))
a %>%
ggplot(aes(factor(team_short_name),Total_goals_scored))+
geom_bar(stat="identity",aes(fill=season))+xlab("Team")+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ggtitle("Goals Scored | Spanish Teams")+ylab("Total Goals Scored")
```
# **Bivariate Visualization**
Row {.tabset .tabset-fade}
-------------------------------------
### Key Variables
Here I am doing a bi-variate analysis to compare the relationship between total goals scored and the tactical variables.
```{r}
#Scatterplots of total goals against each tactical variable, season 2015-16
#Each ggplot renders as its own figure
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(buildUpPlayPassing,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and BP Passing", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Build Up Play - Passing")+ guides(color=guide_legend(title="League"))
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(buildUpPlaySpeed,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and BP Speed", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Build up Play - Speed")+ guides(color=guide_legend(title="League"))
########
soccer %>% filter(season_1=="2015-16")%>% ggplot(aes(chanceCreationPassing,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and Chance Passing", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Chance Creation-Passing")+ guides(color=guide_legend(title="League"))
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(chanceCreationCrossing,Total_goals_scored,color=factor(League)))+theme_bw()+ggtitle("Scatterplot of Goals and Chance Crossing", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Chance Creation - Crossing")+ guides(color=guide_legend(title="League"))
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(chanceCreationShooting,Total_goals_scored,color=factor(League)))+theme_bw()+ggtitle("Scatterplot of Goals and Chance Shooting", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Chance Creation - Shooting")+ guides(color=guide_legend(title="League"))
#######
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(defenceTeamWidth,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and Team Width", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Defense Team Width")+ guides(color=guide_legend(title="League"))
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(defencePressure,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and Defence Pressure", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Defense Pressure")+ guides(color=guide_legend(title="League"))
soccer %>% filter(season_1=="2015-16")%>%ggplot(aes(defenceAggression,Total_goals_scored,color=factor(League)))+
theme_bw()+ggtitle("Scatterplot of Goals and Defence Aggression", "Season 2015-16")+geom_point()+ylab("Total Goals Scored")+xlab("Defense Aggression")+ guides(color=guide_legend(title="League"))
```
### Scoring Strategy
I plot total goals scored against tactical variables to see which combination of strategies leads to more goals. I first plot goals scored against team width coloured by build up play speed, then add defensive aggression as shape, chances created using crossing as size, and finally chances created using passing as transparency. All plots are faceted by league.
```{r}
# Creating categorical variables
soccer$speed[buildUpPlaySpeed<=50]<-0
soccer$speed[buildUpPlaySpeed>50]<-1
soccer$speed<-factor(soccer$speed,levels=c(0,1),labels=c("Low","High"))
soccer$DA[defenceAggression<=50]<-0
soccer$DA[defenceAggression>50]<-1
soccer$DA<-factor(soccer$DA,levels=c(0,1),labels=c("Low","High"))
ggplot(soccer,aes(defenceTeamWidth,Total_goals_scored,color=speed))+
geom_point(size=2.5)+
ggtitle("Scatterplot of Total Goals and Width","Colored by Speed\nFaceted by League")+
guides(color=guide_legend(title="Build up Speed"),shape=guide_legend(title="Build up Speed"))+
ylab("Total Goals Scored")+xlab("Team Width")+
theme(panel.grid.major = element_blank())+
facet_grid(cols = vars(League))
ggplot(soccer,aes(defenceTeamWidth,Total_goals_scored,color=speed,shape=DA))+
geom_point(size=2.5)+
ggtitle("Scatterplot of Total Goals and Width","Colored by Speed, Shaped by Aggression\nFaceted by League")+
guides(color=guide_legend(title="Build up Speed"),shape=guide_legend(title="Defence Aggression"))+
ylab("Total Goals Scored")+xlab("Team Width")+
theme(panel.grid.major = element_blank())+
facet_grid(cols = vars(League))
ggplot(soccer,aes(defenceTeamWidth,Total_goals_scored,color=speed,size=chanceCreationCrossing,shape=DA))+
geom_point()+
ggtitle("Scatterplot of Total Goals and Width","Colored by Speed, Shaped by Aggression, Sized by Crossing\nFaceted by League")+
guides(color=guide_legend(title="Build up Speed"),shape=guide_legend(title="Defence Aggression"),size=guide_legend(title="Chances Crossing"))+
ylab("Total Goals Scored")+xlab("Team Width")+
theme(panel.grid.major = element_blank())+
facet_grid(cols = vars(League))
ggplot(soccer,aes(defenceTeamWidth,Total_goals_scored,color=speed,size=chanceCreationCrossing,shape=DA,alpha=chanceCreationPassing))+
geom_point()+
ggtitle("Scatterplot of Total Goals and Width","Colored by Speed, Shaped by Aggression, Sized by Crossing, Alpha by Passing\nFaceted by League")+
guides(color=guide_legend(title="Build up Speed"),shape=guide_legend(title="Defence Aggression"),size=guide_legend(title="Chances Crossing"),alpha=guide_legend(title="Chances Passing"))+
ylab("Total Goals Scored")+xlab("Team Width")+
theme(panel.grid.major = element_blank())+
facet_grid(cols = vars(League))
```
# **Conclusion**
Row {data-height=850}
-------------------------------------
### **Insights & Findings**
Through this tactical analysis we see some of the tactics that teams use on the field and how they relate to the number of goals scored. Tactics that work for one team might not necessarily work for another, but we do see a pattern emerging where high scoring teams deploy similar tactics. Spanish teams typically score more goals and use high team width, aggression and chances created using passing. They are also slow in their build up play, so they play a patient, attractive game. The English game, on the other hand, is quicker in build up play, with high reliance on crossing to score goals, and English teams do not use as high a team width.

Through this analysis we get valuable insights into teams and their methodology and approach to scoring goals on a consistent basis. We also see the differences between the Spanish and English leagues and their understanding of the game. The data supports the claim that the Spanish league is more tactical whereas the English league is more physical. Adding further variables would give more insight into goal scoring tactics, but this analysis is a good stepping stone towards that.
# **Write Up**
Row {.tabset .tabset-fade}
-------------------------------------
### Manipulating Match Data
Below I have outlined the steps for data manipulation.
I start with the match dataset and keep the English and Spanish leagues. Then I keep the required variables in the dataset. Finally I filter on the 12 teams used in this analysis and create a unique ID for them in the dataset.
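
As a rough illustration of this step, here is a minimal sketch in base R on a toy table. The toy rows and the names `matches`, `kept`, `home_id` and `away_id` are assumptions made for the example; only the two league ids and the 12 team ids come from the dashboard code.

```r
# Toy stand-in for the Match table (rows are invented for illustration)
matches <- data.frame(
  country_id       = c(1729, 1729, 21518, 7809),
  home_team_api_id = c(9825, 8455, 8634, 9823),
  away_team_api_id = c(8455, 9999, 8633, 9788)
)

# The 12 team ids, in the order that defines the identifiers 1 to 12
team_ids <- c(9825, 8455, 8650, 8456, 10260, 8586,
              9906, 8634, 8633, 8315, 10267, 8302)

# Keep the two leagues, then matches involving at least one of the 12 teams
kept <- subset(
  matches,
  country_id %in% c(1729, 21518) &
    (home_team_api_id %in% team_ids | away_team_api_id %in% team_ids)
)

# match() returns the position in team_ids, i.e. the 1 to 12 identifier,
# and NA for a team outside the 12
kept$home_id <- match(kept$home_team_api_id, team_ids)
kept$away_id <- match(kept$away_team_api_id, team_ids)
print(kept)
```

Keeping a separate identifier for the home and the away side avoids ambiguity when two of the 12 teams play each other.
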
- Splitting data by home and away games

Here I create two datasets: one with the home goals scored and wins of the 12 teams, and another with their away goals scored and wins. Then I append these two datasets.
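
The split and append can be sketched as below, assuming a toy match table with hypothetical team ids 1, 2 and 99. The column names `home_team`, `away_team`, `home_goal` and `away_goal` are simplified stand-ins, not the database column names.

```r
# Toy matches: teams 1 and 2 are "ours", team 99 is an outsider
matches <- data.frame(
  home_team = c(1, 2, 99),
  away_team = c(2, 1, 1),
  home_goal = c(3, 1, 0),
  away_goal = c(1, 1, 2)
)
teams <- c(1, 2)

# Home perspective: goals scored are the home goals
h <- subset(matches, home_team %in% teams)
home <- data.frame(team = h$home_team,
                   goals_scored = h$home_goal,
                   goals_conceded = h$away_goal)

# Away perspective: goals scored are the away goals
a <- subset(matches, away_team %in% teams)
away <- data.frame(team = a$away_team,
                   goals_scored = a$away_goal,
                   goals_conceded = a$home_goal)

# Append the two views; one row per team per match played
long <- rbind(home, away)

# Once both views share the same columns, win and loss flags need one rule each
long$Win  <- as.integer(long$goals_scored > long$goals_conceded)
long$Loss <- as.integer(long$goals_scored < long$goals_conceded)
print(long)
```

A match between two of the tracked teams contributes two rows, one from each side's point of view, which is what the later per-team sums need.
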
- Appending and summing variables within groups

After appending the datasets I sum the goals scored, wins and losses by team and season and create an ID for the season variable.
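
A minimal sketch of the grouped sums in base R, on invented rows. The dashboard code uses `group_by()` and `summarize()`; `aggregate()` is swapped in here only so the example runs without extra packages.

```r
# Toy long-format data: one row per team per match (values are invented)
long <- data.frame(
  Team_ID = c(1, 1, 1, 2, 2),
  season  = c("2010/2011", "2010/2011", "2011/2012", "2010/2011", "2010/2011"),
  Win     = c(1, 0, 1, 0, 1),
  Loss    = c(0, 1, 0, 0, 0),
  goals_scored = c(2, 0, 3, 1, 4)
)

# Sum wins, losses and goals within each team and season
totals <- aggregate(
  long[, c("Win", "Loss", "goals_scored")],
  by  = list(Team_ID = long$Team_ID, season = long$season),
  FUN = sum
)
names(totals)[3:5] <- c("Wins", "Losses", "Total_goals_scored")

# Season id: 2010/2011 becomes 1, 2011/2012 becomes 2, and so on
totals$season_1 <- as.integer(substr(totals$season, 1, 4)) - 2009L
print(totals)
```
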
- Team and Team Attribute Data

Here I prepare the team attribute data. I keep the required 12 teams, merge the team and team attribute data, and keep the required variables. Then I create a unique ID for team and season.
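
The season ID for the attribute data is derived from the year of the attribute date. Below is a small sketch of that idea. The attribute values are invented, and it assumes, as the dashboard code does, that the year of each attribute record marks the starting year of the season.

```r
# Toy team attribute records (attribute values are invented)
team_att <- data.frame(
  team_api_id = c(9825, 9825, 8634),
  date = c("2010-02-22 00:00:00", "2015-09-10 00:00:00", "2013-09-20 00:00:00"),
  buildUpPlaySpeed = c(66, 59, 35)
)

# Year of the record, taken from the first four characters of the date
year <- as.integer(substr(team_att$date, 1, 4))

# Season id 1 to 6 for 2010 to 2015
team_att$season_1 <- year - 2009L

# Readable labels, indexed by the season id
season_labels <- c("2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16")
team_att$season_label <- season_labels[team_att$season_1]
print(team_att)
```
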
- Merging Match and Team dataset to create Final Working data

Here I merge the match and team attributes datasets that I have been manipulating above to create my final working soccer dataset, which contains 72 observations: 6 seasons for each of the 12 teams. I further create two datasets, one for the English league and one for the Spanish league, for the data visualization part.
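
Since the merge should give exactly 12 x 6 = 72 rows, it is worth checking for missing team and season combinations. A minimal sketch of such a check follows, with dummy values and one gap introduced on purpose. The names `grid`, `match_totals`, `soccer_demo` and `missing` are assumptions for the example.

```r
# Every team and season combination that the final data should contain
grid <- expand.grid(Team_ID = 1:12, season_1 = 1:6)

# Dummy match totals and team attributes on the same keys
match_totals <- transform(grid, Wins = 10 + season_1)
team_att     <- transform(grid, buildUpPlaySpeed = 50)

# Simulate one missing attribute record (team 3, season 2)
team_att <- team_att[!(team_att$Team_ID == 3 & team_att$season_1 == 2), ]

# Inner merge on both keys, as in the dashboard
soccer_demo <- merge(team_att, match_totals, by = c("Team_ID", "season_1"))

# Left merge against the full grid shows which combinations dropped out
check   <- merge(grid, soccer_demo, by = c("Team_ID", "season_1"), all.x = TRUE)
missing <- check[is.na(check$Wins), c("Team_ID", "season_1")]

cat("Rows after merge:", nrow(soccer_demo), "of", nrow(grid), "\n")
print(missing)
```

An inner merge drops unmatched rows silently, so comparing against the full grid is a cheap way to confirm the 72 observations.
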
### Univariate Analysis
The second tab on the dashboard provides univariate analysis of variables. It can be classified into 4 categories. Below is the interpretation of the key visualizations and interesting insights obtained from these 4 categories. There is a tab for each category on the dashboard.
- Passing

The first variable is build up play passing. A low value means a team plays short passes, while a higher value means a team plays long balls.
The histogram shows a range of build up play passing values, with some teams opting to play long balls while many teams opt for shorter, more frequent passes.
The line graph for build up play passing shows that Spanish teams like Atletico Madrid like playing long passes, probably because they have taller players who can win aerial headers. Barcelona, for example, tended to keep play short, but their value increases over time from 2012 to 2016.
We can see a downward trend for the English teams in build up play passing. Over time they are choosing to keep play short and not play many long balls. Arsenal, for example, consistently keeps play short.
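
A trend read off a line graph can also be summarised as a slope per team. The sketch below fits a straight line per team with `lm()` on invented values. "Team A" and "Team B" are hypothetical, so the numbers say nothing about the real teams.

```r
# Invented passing values for two hypothetical teams over four seasons
demo <- data.frame(
  team = rep(c("Team A", "Team B"), each = 4),
  season_1 = rep(1:4, times = 2),
  buildUpPlayPassing = c(60, 55, 50, 45, 30, 32, 34, 36)
)

# Slope of passing value on season, fitted separately for each team
slopes <- sapply(split(demo, demo$team), function(d) {
  unname(coef(lm(buildUpPlayPassing ~ season_1, data = d))["season_1"])
})

# A negative slope means play is getting shorter over time
print(round(slopes, 2))
```

With only six seasons per team, a slope like this is a descriptive summary rather than a statistical test.
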
- Build Up Play Speed

The histogram also shows that we have a range of values for build up play speed. We see a contrast among Spanish teams: teams like Real Madrid play a more counter-attacking style of football, probably because they have very quick players, while teams like Barcelona choose not to counterattack but to score more from open play. For English teams we don't see a very consistent pattern: most play counter-attacking football, with Arsenal showing a steep decline and then a sharp rise.
- Wins

We see that teams like Real Madrid and Barcelona have roughly 150 wins in total, as do teams like Manchester City, Manchester United and Tottenham.

- Goals Scored

We see that Spanish teams like Real Madrid and Barcelona record the highest number of goals scored, around 600. English teams typically score fewer goals, with the highest being Tottenham at around 500.
### Bivariate Analysis
The third tab on the dashboard provides bivariate analysis of variables. It is split into two categories: analysis of the key variables and the best strategies to score goals. The scoring strategy tab is critical as it helps us identify successful strategies to score goals.
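
The scatterplots on the key variables tab can be complemented with a correlation per league. This is a minimal sketch on invented numbers, chosen so that the two leagues show opposite relationships. It is not a result from the soccer data.

```r
# Invented values for six hypothetical team seasons
demo <- data.frame(
  League = c("English", "English", "English", "Spanish", "Spanish", "Spanish"),
  chanceCreationPassing = c(40, 50, 60, 45, 55, 65),
  Total_goals_scored    = c(60, 65, 70, 110, 90, 70)
)

# Pearson correlation between passing and goals, computed within each league
by_league <- sapply(split(demo, demo$League), function(d) {
  cor(d$chanceCreationPassing, d$Total_goals_scored)
})
print(by_league)
```

Computing the correlation within each league matters because pooling both leagues can hide or even reverse the relationship seen in each one.
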
On the scoring strategy tab, we see that Spanish teams tend to have higher team width and lower build up speed than English teams, yet they still score more goals. English teams tend to have less width and score their higher goal totals with higher build up speed.
The second graph adds defensive aggression as shape. Teams scoring a high number of goals typically press with high aggression as well, but this does not guarantee goals, as teams scoring fewer goals are also seen using high aggression.
The third graph adds chances created using crossing as a size aesthetic. We see that teams scoring a high number of goals are creating chances using crossing. This suggests a strategy for high scoring teams: playing with high team width, having high defensive aggression, and creating chances using crossing.
In the last graph I add an alpha aesthetic for chances created using passing. The high scoring Spanish teams score goals using passing techniques, with slower build up play and more team width, whereas in the English league we see less emphasis on passing and more emphasis on crossing and high build up speed.
This leads me to believe that the Spanish league is more focused on tactics, whereas the English league is less tactical and is built around quick build up football and crossing.
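
The Low and High groups used for colour and shape come from splitting a tactical variable at 50. A compact way to sketch that split and compare average goals across groups is shown below. The rows are invented, so the averages illustrate the method only.

```r
# Invented rows for six hypothetical team seasons
demo <- data.frame(
  League = c("English", "English", "Spanish", "Spanish", "Spanish", "English"),
  buildUpPlaySpeed   = c(65, 48, 35, 52, 50, 70),
  Total_goals_scored = c(70, 60, 100, 80, 65, 75)
)

# cut() with right-closed intervals: values <= 50 are Low, values > 50 are High
demo$speed <- cut(demo$buildUpPlaySpeed,
                  breaks = c(-Inf, 50, Inf),
                  labels = c("Low", "High"))

# Average goals for each league and speed group
avg <- aggregate(Total_goals_scored ~ League + speed, data = demo, FUN = mean)
print(avg)
```

A table like this puts numbers next to what the faceted scatterplots show visually.
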
### Data Drawbacks
Based on the last set of results, it would be great to have a variable for team physicality, which I assume would be high for the English league. Their game is built around high intensity football along with crossing and heading abilities to score goals, whereas Spanish teams like to pass the ball into the net, and the data support that. Having a variable for physicality and heading ability at the team level would further explain why teams play a certain way.
It would also be interesting to add variables for team budget, size of stadium, season ticket holders to see which teams have more money as having tactics is important but you need players with skill to be able to execute those tactics.
Another important variable is coaching team and its prowess. How shrewd the coaching team is in terms of training, giving confidence, recovery and in game substitution is pivotal to scoring goals. I am not sure how that could be quantified in a dataset.
Finally scoring goals on a consistent basis requires forethought and a lot of effort and a combination of things need to fall in place for everything to work.
### Challenges
I faced a challenge of data wrangling during this project. There was a lot of data in the SQL database, and making sense of it and bringing it all together was not easy. I guess that is one challenge of working with big data: not knowing at the outset which analysis will be coherent, and having to dig deep into the data to make sense of it. The challenge was bringing together different datasets, filtering the variables needed, merging, appending, and creating new variables for eventual analysis.
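
One way to lighten this wrangling is to let the database do part of the filtering, so that only the needed rows and columns reach R. The sketch below is a hedged illustration, not what the dashboard does: it builds a tiny in-memory SQLite table with invented rows and queries it with `dbGetQuery()` and bound parameters. It assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# In-memory database standing in for database.sqlite (rows are invented)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Match", data.frame(
  country_id     = c(1729, 21518, 7809),
  season         = c("2010/2011", "2009/2010", "2010/2011"),
  home_team_goal = c(2, 1, 0)
))

# Filter leagues and seasons in SQL; the ? placeholders are filled from params
res <- dbGetQuery(
  con,
  'SELECT country_id, season, home_team_goal
     FROM "Match"
    WHERE country_id IN (?, ?) AND season >= ?',
  params = list(1729, 21518, "2010/2011")
)
dbDisconnect(con)
print(res)
```

The season strings compare correctly as text here because they all share the same `YYYY/YYYY` layout.
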
For data visualization, the problem was to decide which variables to use on which graph and which graph best represents the data. To make the project flow, I also had to decide how to keep adding variables to the graphs to make them more comprehensive.