The Premier League is an English professional league for men’s association football clubs. At the top of the English football league system, it is the country’s primary football competition. Contested by 20 clubs, it operates on a system of promotion and relegation with the English Football League (EFL; known as “The Football League” before 2016-17). Welsh clubs that compete in the English football league system can also qualify.
The Premier League is the most-watched sports league in the world, broadcast in 212 territories to 643 million homes and a potential TV audience of 4.7 billion people. In the 2014-15 season, the average Premier League match attendance exceeded 36,000, second highest of any professional football league behind the Bundesliga’s 43,500. Most stadium occupancies are near capacity. The Premier League ranks third in the UEFA coefficients of leagues based on performances in European competitions over the past five seasons.
The Premier League is considered to be the toughest league to predict, as the gap in quality between the top teams and the bottom-of-the-table teams is not as wide as in other European leagues. Seeing the top-ranked team beaten by the last-ranked team is not such a big surprise here. In this project, I will try to make sense of the results from the last 8 years of action based on the team attributes for each season, taken from EA Sports' FIFA.
The following packages are required for the project:
library(tidyverse) #For data cleaning
library(dplyr) #For Data transformation
library(ggplot2) #For plotting graphs
library(RSQLite) #For importing SQLite data
library(DT) #For DataFrame
library(knitr) #For Dataset summary
library(ggExtra) #For ggmarginal
library(plotly) #for boxplot
library(wordcloud) #for wordcloud
I have used the following Kaggle dataset for this project:
The dataset contains details of:
Our dataset consists of 8 tables in total (including the internal sqlite_sequence table), with the following dimensions:
We have downloaded the dataset from Kaggle onto our local drive for now. Each of the tables in the SQLite file needs to be read in separately. Importing the SQLite dataset in R:
con <- dbConnect(RSQLite::SQLite(), dbname = "database.sqlite")
dbListTables(con)
## [1] "Country" "League" "Match"
## [4] "Player" "Player_Attributes" "Team"
## [7] "Team_Attributes" "sqlite_sequence"
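The table names listed above can also drive a single loop instead of one query per table. The following is only a sketch: it builds a small in-memory database with two made-up tables (the `con_demo` connection and its contents are assumptions for illustration, not the Kaggle file) and reads every table into a named list, which also gives the table dimensions in one step.

```r
library(DBI)
library(RSQLite)

# Toy in-memory database standing in for database.sqlite (illustrative only)
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "Country",
             data.frame(id = c(1L, 2L), name = c("England", "France")))
dbWriteTable(con_demo, "League",
             data.frame(id = c(1L, 2L), country_id = c(1L, 2L),
                        name = c("League A", "League B")))

# Read every table into a named list of data frames
table_names <- dbListTables(con_demo)
tables <- lapply(setNames(table_names, table_names),
                 function(tbl) dbReadTable(con_demo, tbl))

# Rows and columns of each table
table_dims <- t(sapply(tables, dim))
colnames(table_dims) <- c("rows", "cols")
print(table_dims)

# Close the connection once the data is in memory
dbDisconnect(con_demo)
```

With the real file, the same loop would run against `con`; the individual data frames are then available as `tables$Match`, `tables$Team` and so on.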
Country = dbGetQuery( con,'select * from Country' )
League = dbGetQuery( con,'select * from League' )
Match = dbGetQuery( con,'select * from Match' )
Player = dbGetQuery( con,'select * from Player' )
Player_Attributes = dbGetQuery( con,'select * from Player_Attributes' )
Team = dbGetQuery( con,'select * from Team' )
Team_Attributes = dbGetQuery( con,'select * from Team_Attributes' )
sqlite_sequence = dbGetQuery( con,'select * from sqlite_sequence' )
We are required to merge the filtered data from the given datasets to create the final dataset to be used for our analysis:
The following steps were taken:
#Getting League details for England
Country_League <- Country %>%
inner_join(League, by = "id") %>%
rename(countryName = name.x, leagueName = name.y) %>%
select(-country_id) %>%
filter(countryName == "England")
#Getting match details for EPL
League_Matches <- Country_League %>%
inner_join(Match, by = c("id" = "country_id")) %>%
select(-(home_player_X1:BSA)) %>%
mutate(home_team_win = ifelse(home_team_goal > away_team_goal,1,0)) %>%
mutate(year_match = substring(season,1,4))
#getting team ids from match data
home_teams <- League_Matches %>%
distinct(home_team_api_id) %>%
rename(team_id = home_team_api_id)
away_teams <- League_Matches %>%
distinct(away_team_api_id) %>%
rename(team_id = away_team_api_id)
epl_teams <- union(home_teams,away_teams)
#Getting details of teams
team_data <- epl_teams %>%
inner_join(Team, by = c("team_id" = "team_api_id")) %>%
select(team_id,team_long_name,team_short_name) %>%
inner_join(Team_Attributes, by = c("team_id" = "team_api_id")) %>%
select(-ends_with("Class"), -id, -team_fifa_api_id) %>%
mutate(year = substring(date,1,4))
#Getting details of home team
home_team_data <- League_Matches %>%
inner_join(team_data, by = (c("year_match" = "year", "home_team_api_id" = "team_id"))) %>%
rename(home_team_long_name = team_long_name, home_team_short_name = team_short_name,
home_buildUpPlaySpeed = buildUpPlaySpeed, home_buildUpPlayDribbling = buildUpPlayDribbling,
home_buildUpPlayPassing = buildUpPlayPassing, home_chanceCreationPassing = chanceCreationPassing,
home_chanceCreationCrossing = chanceCreationCrossing, home_chanceCreationShooting = chanceCreationShooting,
home_defencePressure = defencePressure, home_defenceAggression = defenceAggression,
home_defenceTeamWidth = defenceTeamWidth)
#Getting details of away team
total_team_data <- home_team_data %>%
inner_join(team_data, by = (c("year_match" = "year", "away_team_api_id" = "team_id"))) %>%
rename(away_team_long_name = team_long_name, away_team_short_name = team_short_name,
away_buildUpPlaySpeed = buildUpPlaySpeed, away_buildUpPlayDribbling = buildUpPlayDribbling,
away_buildUpPlayPassing = buildUpPlayPassing, away_chanceCreationPassing = chanceCreationPassing,
away_chanceCreationCrossing = chanceCreationCrossing, away_chanceCreationShooting = chanceCreationShooting,
away_defencePressure = defencePressure, away_defenceAggression = defenceAggression,
away_defenceTeamWidth = defenceTeamWidth)
#Replacing null values in home_buildUpPlayDribbling
#Missing ratings are filled with the column mean (truncated to an integer)
total_team_data$home_buildUpPlayDribbling <- coalesce(total_team_data$home_buildUpPlayDribbling, as.integer(mean(total_team_data$home_buildUpPlayDribbling, na.rm = TRUE)))
#Replacing null values in away_buildUpPlayDribbling
total_team_data$away_buildUpPlayDribbling <- coalesce(total_team_data$away_buildUpPlayDribbling, as.integer(mean(total_team_data$away_buildUpPlayDribbling, na.rm = TRUE)))
#Creating Columns for attribute difference
final_match_data <- total_team_data %>%
mutate(diff_buildUpPlaySpeed = home_buildUpPlaySpeed - away_buildUpPlaySpeed,
diff_buildUpPlayDribbling = home_buildUpPlayDribbling - away_buildUpPlayDribbling,
diff_buildUpPlayPassing = home_buildUpPlayPassing - away_buildUpPlayPassing,
diff_chanceCreationPassing = home_chanceCreationPassing - away_chanceCreationPassing,
diff_chanceCreationCrossing = home_chanceCreationCrossing - away_chanceCreationCrossing,
diff_chanceCreationShooting = home_chanceCreationShooting - away_chanceCreationShooting,
diff_defencePressure = home_defencePressure - away_defencePressure,
diff_defenceAggression = home_defenceAggression - away_defenceAggression,
diff_defenceTeamWidth = home_defenceTeamWidth - away_defenceTeamWidth) %>%
select(-c(id,id.y,league_id,year_match,date.y,date)) %>%
rename(match_date = date.x) %>%
arrange(stage, season)
Our final dataset consists of 2280 observations and 42 variables.
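The preparation above rests on three ideas: joining team ratings to matches on a year and team key, filling missing ratings with the column mean, and taking home-minus-away differences. Here is a minimal sketch of those three steps on toy data. The tables `toy_matches` and `toy_team_data` and all their values are invented for illustration; only the column names mirror the ones used above.

```r
library(dplyr)

# Toy stand-ins for League_Matches and team_data (all values invented)
toy_matches <- tibble(
  year_match = c("2010", "2011", "2011"),
  home_team_api_id = c(1L, 2L, 1L),
  away_team_api_id = c(2L, 1L, 2L)
)
toy_team_data <- tibble(
  team_id = c(1L, 2L, 1L, 2L),
  year = c("2010", "2010", "2011", "2011"),
  buildUpPlayDribbling = c(NA, 40L, 50L, 44L)
)

toy_final <- toy_matches %>%
  # Attach the home team's rating for the matching year
  inner_join(toy_team_data,
             by = c("year_match" = "year", "home_team_api_id" = "team_id")) %>%
  rename(home_buildUpPlayDribbling = buildUpPlayDribbling) %>%
  # Attach the away team's rating for the same year
  inner_join(toy_team_data,
             by = c("year_match" = "year", "away_team_api_id" = "team_id")) %>%
  rename(away_buildUpPlayDribbling = buildUpPlayDribbling) %>%
  mutate(
    # Fill missing ratings with the (truncated) column mean
    home_buildUpPlayDribbling = coalesce(
      home_buildUpPlayDribbling,
      as.integer(mean(home_buildUpPlayDribbling, na.rm = TRUE))
    ),
    away_buildUpPlayDribbling = coalesce(
      away_buildUpPlayDribbling,
      as.integer(mean(away_buildUpPlayDribbling, na.rm = TRUE))
    ),
    # Home-minus-away difference used as a match-level feature
    diff_buildUpPlayDribbling = home_buildUpPlayDribbling - away_buildUpPlayDribbling
  )

print(toy_final)
```

Because both joins are inner joins, a match is dropped whenever either team has no rating row for that year, so the row count after joining is worth checking against the number of matches played.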
#Dataset preview
head(final_match_data,200) %>%
datatable(caption = "Match Data")
The final dataset, final_match_data, has the following characteristics:
## [1] 2280 42
## [1] "countryName" "leagueName"
## [3] "season" "stage"
## [5] "match_date" "match_api_id"
## [7] "home_team_api_id" "away_team_api_id"
## [9] "home_team_goal" "away_team_goal"
## [11] "home_team_win" "home_team_long_name"
## [13] "home_team_short_name" "home_buildUpPlaySpeed"
## [15] "home_buildUpPlayDribbling" "home_buildUpPlayPassing"
## [17] "home_chanceCreationPassing" "home_chanceCreationCrossing"
## [19] "home_chanceCreationShooting" "home_defencePressure"
## [21] "home_defenceAggression" "home_defenceTeamWidth"
## [23] "away_team_long_name" "away_team_short_name"
## [25] "away_buildUpPlaySpeed" "away_buildUpPlayDribbling"
## [27] "away_buildUpPlayPassing" "away_chanceCreationPassing"
## [29] "away_chanceCreationCrossing" "away_chanceCreationShooting"
## [31] "away_defencePressure" "away_defenceAggression"
## [33] "away_defenceTeamWidth" "diff_buildUpPlaySpeed"
## [35] "diff_buildUpPlayDribbling" "diff_buildUpPlayPassing"
## [37] "diff_chanceCreationPassing" "diff_chanceCreationCrossing"
## [39] "diff_chanceCreationShooting" "diff_defencePressure"
## [41] "diff_defenceAggression" "diff_defenceTeamWidth"
kable(summary(final_match_data))
| countryName | leagueName | season | stage | match_date | match_api_id | home_team_api_id | away_team_api_id | home_team_goal | away_team_goal | home_team_win | home_team_long_name | home_team_short_name | home_buildUpPlaySpeed | home_buildUpPlayDribbling | home_buildUpPlayPassing | home_chanceCreationPassing | home_chanceCreationCrossing | home_chanceCreationShooting | home_defencePressure | home_defenceAggression | home_defenceTeamWidth | away_team_long_name | away_team_short_name | away_buildUpPlaySpeed | away_buildUpPlayDribbling | away_buildUpPlayPassing | away_chanceCreationPassing | away_chanceCreationCrossing | away_chanceCreationShooting | away_defencePressure | away_defenceAggression | away_defenceTeamWidth | diff_buildUpPlaySpeed | diff_buildUpPlayDribbling | diff_buildUpPlayPassing | diff_chanceCreationPassing | diff_chanceCreationCrossing | diff_chanceCreationShooting | diff_defencePressure | diff_defenceAggression | diff_defenceTeamWidth | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:2280 | Length:2280 | Length:2280 | Min. : 1.0 | Length:2280 | Min. : 839796 | Min. : 8191 | Min. : 8191 | Min. :0.000 | Min. :0.000 | Min. :0.0000 | Length:2280 | Length:2280 | Min. :25.00 | Min. :24.0 | Min. :24.00 | Min. :28.00 | Min. :31.00 | Min. :24.00 | Min. :25.00 | Min. :31.00 | Min. :30.00 | Length:2280 | Length:2280 | Min. :25.00 | Min. :24.0 | Min. :24.00 | Min. :28.00 | Min. :31.00 | Min. :24.00 | Min. :25.00 | Min. :31.00 | Min. :30.00 | Min. :-50 | Min. :-29 | Min. :-51 | Min. :-44 | Min. :-43 | Min. :-45 | Min. :-40 | Min. :-35 | Min. :-40 | |
| Class :character | Class :character | Class :character | 1st Qu.:10.0 | Class :character | 1st Qu.:1025182 | 1st Qu.: 8551 | 1st Qu.: 8551 | 1st Qu.:1.000 | 1st Qu.:0.000 | 1st Qu.:0.0000 | Class :character | Class :character | 1st Qu.:48.00 | 1st Qu.:38.0 | 1st Qu.:43.50 | 1st Qu.:42.00 | 1st Qu.:50.00 | 1st Qu.:45.75 | 1st Qu.:38.00 | 1st Qu.:41.00 | 1st Qu.:45.00 | Class :character | Class :character | 1st Qu.:48.00 | 1st Qu.:38.0 | 1st Qu.:43.50 | 1st Qu.:42.00 | 1st Qu.:50.00 | 1st Qu.:45.75 | 1st Qu.:38.00 | 1st Qu.:41.00 | 1st Qu.:45.00 | 1st Qu.:-10 | 1st Qu.: 0 | 1st Qu.:-11 | 1st Qu.:-10 | 1st Qu.: -9 | 1st Qu.:-10 | 1st Qu.:-10 | 1st Qu.:-10 | 1st Qu.: -7 | |
| Mode :character | Mode :character | Mode :character | Median :19.5 | Mode :character | Median :1351780 | Median : 8663 | Median : 8663 | Median :1.000 | Median :1.000 | Median :0.0000 | Mode :character | Mode :character | Median :58.50 | Median :38.0 | Median :51.00 | Median :49.50 | Median :59.00 | Median :54.00 | Median :44.00 | Median :50.00 | Median :51.00 | Mode :character | Mode :character | Median :58.50 | Median :38.0 | Median :51.00 | Median :49.50 | Median :59.00 | Median :54.00 | Median :44.00 | Median :50.00 | Median :51.00 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | Median : 0 | |
| NA | NA | NA | Mean :19.5 | NA | Mean :1380333 | Mean : 9195 | Mean : 9195 | Mean :1.552 | Mean :1.187 | Mean :0.4491 | NA | NA | Mean :56.57 | Mean :38.3 | Mean :52.14 | Mean :50.73 | Mean :57.28 | Mean :52.37 | Mean :44.77 | Mean :50.39 | Mean :50.83 | NA | NA | Mean :56.57 | Mean :38.3 | Mean :52.14 | Mean :50.73 | Mean :57.28 | Mean :52.37 | Mean :44.77 | Mean :50.39 | Mean :50.83 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | Mean : 0 | |
| NA | NA | NA | 3rd Qu.:29.0 | NA | 3rd Qu.:1724171 | 3rd Qu.:10003 | 3rd Qu.:10003 | 3rd Qu.:2.000 | 3rd Qu.:2.000 | 3rd Qu.:1.0000 | NA | NA | 3rd Qu.:65.00 | 3rd Qu.:38.0 | 3rd Qu.:61.00 | 3rd Qu.:59.25 | 3rd Qu.:68.25 | 3rd Qu.:60.00 | 3rd Qu.:50.25 | 3rd Qu.:58.00 | 3rd Qu.:56.00 | NA | NA | 3rd Qu.:65.00 | 3rd Qu.:38.0 | 3rd Qu.:61.00 | 3rd Qu.:59.25 | 3rd Qu.:68.25 | 3rd Qu.:60.00 | 3rd Qu.:50.25 | 3rd Qu.:58.00 | 3rd Qu.:56.00 | 3rd Qu.: 10 | 3rd Qu.: 0 | 3rd Qu.: 11 | 3rd Qu.: 10 | 3rd Qu.: 9 | 3rd Qu.: 10 | 3rd Qu.: 10 | 3rd Qu.: 10 | 3rd Qu.: 7 | |
| NA | NA | NA | Max. :38.0 | NA | Max. :1989079 | Max. :10261 | Max. :10261 | Max. :8.000 | Max. :6.000 | Max. :1.0000 | NA | NA | Max. :77.00 | Max. :60.0 | Max. :80.00 | Max. :72.00 | Max. :76.00 | Max. :80.00 | Max. :70.00 | Max. :70.00 | Max. :70.00 | NA | NA | Max. :77.00 | Max. :60.0 | Max. :80.00 | Max. :72.00 | Max. :76.00 | Max. :80.00 | Max. :70.00 | Max. :70.00 | Max. :70.00 | Max. : 50 | Max. : 29 | Max. : 51 | Max. : 44 | Max. : 43 | Max. : 45 | Max. : 40 | Max. : 35 | Max. : 40 |
Word cloud to show the most successful teams
The word cloud below depicts the team-wise rankings by total points, taking all the seasons into consideration. We observe that Manchester City and Manchester United top the leaderboard, followed by Chelsea and Arsenal.
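The code that follows reads from a table called final_match_data_points with home_team_points and away_team_points columns, whose construction is not shown in this write-up. One plausible way to derive such columns is sketched below, assuming the standard league rule of 3 points for a win, 1 for a draw and 0 for a loss. The toy matches and the name `toy_points` are invented for illustration.

```r
library(dplyr)

# Toy matches; column names mirror final_match_data (values are invented)
toy_matches <- tibble(
  home_team_long_name = c("A", "B", "C"),
  away_team_long_name = c("B", "C", "A"),
  home_team_goal = c(2L, 1L, 0L),
  away_team_goal = c(0L, 1L, 3L)
)

# Assumption: 3 points for a win, 1 for a draw, 0 for a loss
toy_points <- toy_matches %>%
  mutate(
    home_team_points = case_when(
      home_team_goal > away_team_goal ~ 3L,
      home_team_goal == away_team_goal ~ 1L,
      TRUE ~ 0L
    ),
    away_team_points = case_when(
      away_team_goal > home_team_goal ~ 3L,
      away_team_goal == home_team_goal ~ 1L,
      TRUE ~ 0L
    )
  )

print(toy_points)
```

Applied to final_match_data, the same mutate call would give a table with the two points columns the word cloud code expects.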
#team_wise_points:
points_wordcloud_home <- final_match_data_points %>%
select(home_team_long_name, home_team_points) %>% rename(team = home_team_long_name, points = home_team_points)
points_wordcloud_away <- final_match_data_points %>%
select(away_team_long_name, away_team_points) %>% rename(team = away_team_long_name, points = away_team_points)
points_wordcloud <- union_all(points_wordcloud_home,points_wordcloud_away) %>%
group_by(team) %>% summarize(total_points = sum(points)) %>% arrange(desc(total_points))
points_wordcloud[1,1] <- "Man City"
points_wordcloud[2,1] <- "Man Utd"
points_wordcloud[5,1] <- "Tottenham"
set.seed(1234)
wordcloud(words = points_wordcloud$team, freq = (points_wordcloud$total_points)^2,
min.freq = 100, random.order = FALSE, rot.per = 0.4,
colors = brewer.pal(8, "Dark2"))
Playing styles across seasons
The figure below is an interactive grouped boxplot of the combined ratings of all teams across six seasons. The ratings are divided into attack points, defense points and midfield points. We observe that after the first two seasons, the attack points have remained fairly constant. The range of midfield points seems to narrow over time, as the minimum team rating has improved over the seasons. In contrast, the range of defense points has widened over the seasons.
These attack, midfield and defense values can help describe the playing style of the teams.
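A grouped boxplot needs the three rating columns stacked into one category column and one value column. The sketch below shows that wide-to-long reshape on invented data using `pivot_longer()`, the newer tidyr counterpart of `gather()` (this assumes tidyr 1.0.0 or later). The table `toy_ratings` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy ratings, one row per team and season (values are invented)
toy_ratings <- tibble(
  season = c("2010/2011", "2010/2011"),
  team = c("A", "B"),
  defense_points = c(150L, 140L),
  midfield_points = c(160L, 155L),
  attack_points = c(170L, 165L)
)

# Stack the three rating columns into category/points pairs
toy_long <- toy_ratings %>%
  pivot_longer(
    cols = c(defense_points, midfield_points, attack_points),
    names_to = "category",
    values_to = "points"
  )

print(toy_long)
```

Each team-season row becomes three rows, one per category, which is the shape a boxplot coloured by category expects.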
final_match_data_ratings <- final_match_data %>%
mutate(home_defense_points = home_defencePressure + home_defenceAggression + home_defenceTeamWidth,
away_defense_points = away_defencePressure + away_defenceAggression + away_defenceTeamWidth,
home_midfield_points = home_buildUpPlaySpeed + home_buildUpPlayDribbling + home_buildUpPlayPassing,
away_midfield_points = away_buildUpPlaySpeed + away_buildUpPlayDribbling + away_buildUpPlayPassing,
home_attack_points = home_chanceCreationPassing + home_chanceCreationCrossing + home_chanceCreationShooting,
away_attack_points = away_chanceCreationPassing + away_chanceCreationCrossing + away_chanceCreationShooting)
ratings_season_home <- final_match_data_ratings %>%
select(season, home_team_long_name, home_defense_points, home_midfield_points, home_attack_points) %>%
rename(team = home_team_long_name, defense_points = home_defense_points,
midfield_points = home_midfield_points, attack_points = home_attack_points)
ratings_season_away <- final_match_data_ratings %>%
select(season, away_team_long_name, away_defense_points, away_midfield_points, away_attack_points) %>%
rename(team = away_team_long_name, defense_points = away_defense_points,
midfield_points = away_midfield_points, attack_points = away_attack_points)
rating_season <- union_all(ratings_season_home,ratings_season_away) %>%
distinct() %>% #keep one row per team, season and rating combination
gather(category, points, defense_points:attack_points)
#Plot from rating_season directly instead of attaching it and passing an unrelated dataset
plot_ly(rating_season, x = ~season, y = ~points, color = ~category, type = "box") %>%
layout(title = "Playing styles across seasons", boxmode = "group")
Goal Difference
The figure below is a lollipop plot of the difference between goals scored and goals conceded by each team across all seasons. A positive goal difference means the team scored more goals than it conceded, while a negative difference means the opposite. We observe that Manchester City and Manchester United have the largest positive goal differences, consistent with their position at the top of the leaderboard in the word cloud.
home_team_goals <- final_match_data %>% select(home_team_long_name, home_team_goal, away_team_goal) %>%
rename(team = home_team_long_name, goals_for = home_team_goal, goals_against = away_team_goal)
away_team_goals <- final_match_data %>% select(away_team_long_name, away_team_goal, home_team_goal) %>%
rename(team = away_team_long_name, goals_for = away_team_goal, goals_against = home_team_goal)
team_goals <- union_all(home_team_goals,away_team_goals) %>%
mutate(goal_difference = goals_for - goals_against) %>% select(team, goal_difference) %>%
group_by(team) %>% summarise(total_goal_difference = sum(goal_difference)) %>% arrange(total_goal_difference) %>%
mutate(Avg = mean(total_goal_difference, na.rm = TRUE),
Above = total_goal_difference > Avg,
team_name = factor(team, levels = .$team))
ggplot(team_goals, aes(total_goal_difference, team_name, color = Above)) +
geom_segment(aes(x = Avg, y = team_name, xend = total_goal_difference, yend = team_name), color = "grey50") +
geom_point()
Goals scored/conceded vs team attributes
The plots below are scatterplots of goals scored and goals conceded against the attack, midfield and defense ratings across different seasons. The plots for all seasons have been combined using the facet wrap functionality of ggplot.
final_match_data_ratings <- final_match_data %>%
mutate(home_defense_points = home_defencePressure + home_defenceAggression + home_defenceTeamWidth,
away_defense_points = away_defencePressure + away_defenceAggression + away_defenceTeamWidth,
home_midfield_points = home_buildUpPlaySpeed + home_buildUpPlayDribbling + home_buildUpPlayPassing,
away_midfield_points = away_buildUpPlaySpeed + away_buildUpPlayDribbling + away_buildUpPlayPassing,
home_attack_points = home_chanceCreationPassing + home_chanceCreationCrossing + home_chanceCreationShooting,
away_attack_points = away_chanceCreationPassing + away_chanceCreationCrossing + away_chanceCreationShooting)
ratings_goals_season_home <- final_match_data_ratings %>%
select(season, home_team_long_name, home_defense_points, home_midfield_points, home_attack_points, home_team_goal, away_team_goal) %>%
rename(team = home_team_long_name, defense_points = home_defense_points,
midfield_points = home_midfield_points, attack_points = home_attack_points,
goals_for = home_team_goal, goals_against = away_team_goal)
ratings_goals_season_away <- final_match_data_ratings %>%
select(season, away_team_long_name, away_defense_points, away_midfield_points, away_attack_points, away_team_goal, home_team_goal) %>%
rename(team = away_team_long_name, defense_points = away_defense_points,
midfield_points = away_midfield_points, attack_points = away_attack_points,
goals_for = away_team_goal, goals_against = home_team_goal)
rating_goals_season <- union_all(ratings_goals_season_home,ratings_goals_season_away) %>%
group_by(season,team,defense_points,midfield_points,attack_points) %>%
summarise(total_goals_for = sum(goals_for), total_goals_against = sum(goals_against)) %>%
arrange(total_goals_for)
ggplot(data = rating_goals_season, aes(x = attack_points, y = total_goals_for)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
ggplot(data = rating_goals_season, aes(x = midfield_points, y = total_goals_for)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
ggplot(data = rating_goals_season, aes(x = defense_points, y = total_goals_for)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
The next set of plots shows total goals against versus attack, midfield and defense points.
ggplot(data = rating_goals_season, aes(x = attack_points, y = total_goals_against)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
ggplot(data = rating_goals_season, aes(x = midfield_points, y = total_goals_against)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
ggplot(data = rating_goals_season, aes(x = defense_points, y = total_goals_against)) +
geom_point() +
facet_wrap(~ season, nrow = 2) + geom_smooth(method = "lm")
Hence we can see from the plots that teams with higher defense ratings and lower attack ratings tend to perform better.
Goals scored/conceded vs overall ratings
The following plots give the overall trend of goals scored and goals conceded vs the overall rating (attack + midfield + defense ratings). Each plot shows data for all years combined in a single graph.
ratings_combined <- rating_goals_season %>%
mutate(total_rating = attack_points + midfield_points + defense_points) #overall rating = attack + midfield + defense
g1 <- ggplot(data = ratings_combined, aes(x = total_rating, y = total_goals_for, color = season)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
g2 <- ggplot(data = ratings_combined, aes(x = total_rating, y = total_goals_against, color = season)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
This graph shows the number of goals scored vs the overall rating. The general trend shows that as the overall rating increases, the number of goals scored by the team decreases.
ggMarginal(g1, type = "histogram", fill = "transparent")
This graph shows the number of goals conceded vs the overall rating. The general trend shows that as the overall rating increases, the number of goals conceded by the team also increases.
ggMarginal(g2, type = "histogram", fill = "transparent")
The plots suggest that teams with lower overall attributes tend to perform better.
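A trend read off a scatterplot can be checked numerically with a correlation coefficient and the slope of a fitted line. The sketch below does this on invented numbers. The data frame `toy` is hypothetical and its values are not taken from the dataset, so the output says nothing about the real league.

```r
# Invented rating and goal totals, used only to show the calculation
toy <- data.frame(
  total_rating = c(140, 150, 160, 170, 180),
  total_goals_for = c(60, 55, 52, 45, 40)
)

# Pearson correlation between overall rating and goals scored
r <- cor(toy$total_rating, toy$total_goals_for)

# Slope of the least-squares line, the same fit geom_smooth(method = "lm") draws
fit <- lm(total_goals_for ~ total_rating, data = toy)
slope <- unname(coef(fit)["total_rating"])

print(c(correlation = r, slope = slope))
```

Running the same two calls on ratings_combined, per season or overall, would show how strong the plotted trends actually are.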
Team rankings and league points through the seasons
This is a scatter plot showing each team's points per season and how those points vary between years. It also shows the change in a team's league position from season to season and gives some insight into jumps or falls in the rankings. It shows how unpredictable the league is, as teams can rise or fall multiple places in consecutive years. A low-ranked team in one season can actually end up winning the league the next season.
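League position itself can be derived from season points by ranking teams within each season. The sketch below uses invented totals and assumes that teams level on points share a position, whereas the real league separates them on goal difference. The names `toy_points` and `toy_positions` are hypothetical.

```r
library(dplyr)

# Invented season totals for three teams over two seasons
toy_points <- tibble(
  season = c("2010/2011", "2010/2011", "2010/2011",
             "2011/2012", "2011/2012", "2011/2012"),
  team = c("A", "B", "C", "A", "B", "C"),
  season_total_points = c(80L, 70L, 40L, 50L, 75L, 77L)
)

toy_positions <- toy_points %>%
  # Rank within each season: most points gets position 1
  group_by(season) %>%
  mutate(position = min_rank(desc(season_total_points))) %>%
  # Compare each team's position with its previous season
  group_by(team) %>%
  arrange(season, .by_group = TRUE) %>%
  mutate(position_change = lag(position) - position) %>%
  ungroup()

print(toy_positions)
```

A positive position_change means the team climbed the table relative to the previous season, and the first season of each team has no previous value to compare with.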
points_home <- final_match_data_points %>%
select(season,home_team_long_name, home_team_points) %>% rename(team = home_team_long_name, points = home_team_points)
points_away <- final_match_data_points %>%
select(season,away_team_long_name, away_team_points) %>% rename(team = away_team_long_name, points = away_team_points)
points_team <- union_all(points_home,points_away) %>%
group_by(season,team) %>% summarize(season_total_points = sum(points)) %>% arrange(desc(season_total_points))
ggplot(data = points_team, aes(x = season, y = season_total_points, group = team, color = team)) +
geom_point() + geom_line() +
labs(x = "season", y = "Season points", title = "Season points data")
Exploratory Analysis:
The Premier League is considered to be the toughest league to predict, as the gap in quality between the top teams and the bottom-of-the-table teams is not as wide as in other European leagues. Based on the team rating statistics, this project tried to find relationships between team ratings and team performance, and also gave details about the best teams in England during this period.
With the use of the extensive graphical capabilities available in R, we provided some key insights about the data:
The trends in the data have been unearthed through visualization only; data mining techniques such as clustering algorithms could be used to extract further insights from the data.
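As a pointer for that follow-up work, the sketch below groups teams by playing style with k-means from base R. It is only an illustration under stated assumptions: the six teams and their ratings are invented, two clusters are chosen arbitrarily, and the features are standardised so that no single rating dominates the distance.

```r
set.seed(42)

# Invented team ratings: three attack-heavy and three defense-heavy teams
toy_attrs <- data.frame(
  team = c("A", "B", "C", "D", "E", "F"),
  attack_points = c(180, 175, 178, 120, 125, 118),
  defense_points = c(120, 125, 118, 170, 175, 172)
)

# Standardise the features before clustering
features <- scale(toy_attrs[, c("attack_points", "defense_points")])

# k-means with two clusters; several random starts make the result stable
km <- kmeans(features, centers = 2, nstart = 10)
toy_attrs$cluster <- km$cluster

print(toy_attrs)
```

On the real data, the nine rating columns per team and season could serve as features, with the number of clusters chosen by inspecting the within-cluster sum of squares for several values of k.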