Background: We have been hired by AS Roma team as their resident data scientists to help their coach to increase Roma’s success on the field and decrease failure. We present an exploratory work where we apply Association Rules to the chosen dataset in search of existing patterns. The key challenges are that there are multiple levels of attributes, team attributes, individual player attributes associated with each match results, and also the patterns have been changed over different match seasons.
How can Roma maintain its top two position in the Italian League so that they qualify for the Champions League without playing playoff games?
Assumptions
In order to increase the efficacy of our analysis and aid management in making informed decisions, we sought to focus on the most recent two years and provide a high-level overview of the match results and pattern, and then took a deep dive into analysis regarding each match involving Roma in most recent 2 seasons given the playing styles changes with the manager and the league itself over the years.
To quantify wins, losses and draws and to make comparison between team performances we followed the standard quantifying metrics used in Europian football where teams will be awarded three points for each win, no point for each loss and one point for every draw i.e. Win = 3 points Loss = 0 point Draw = 1 point
First we are loading the data from the sqlite database and converting them into data frames for analysis in R
setwd('/Users/anunaysmac/Home/MSBA Classes/Fall/EDA/HW1/HW 1/HW 1')
con <- src_sqlite("euro_soccer.sqlite")
long_team_name <- 'Roma'
country_tbl <- tbl(con, "country")
country <- collect(country_tbl)
league_tbl <- tbl(con, "league")
league <- collect(league_tbl)
match_tbl <- tbl(con, "match")
match <- collect(match_tbl)
player_tbl <- tbl(con, "player")
player <- collect(player_tbl)
player_atts_tbl <- tbl(con, "player_attributes")
player_attributes <- collect(player_atts_tbl)
team_tbl <- tbl(con, "team")
team <- collect(team_tbl)
team_atts_tbl <- tbl(con, "team_attributes")
team_attributes <- collect(team_atts_tbl)
We are first looking at the table containing match data which gives us the details of the league, season, teams playing the match, players as well as the goals by each team
match_imp <- match %>%
select(id,country_id,season,stage,date,
match_api_id,home_team_api_id,away_team_api_id,
home_team_goal,away_team_goal)
match_imp$win_team <- ifelse(match_imp$home_team_goal > match_imp$away_team_goal,
"HOME",
ifelse(match_imp$home_team_goal < match_imp$away_team_goal,
"AWAY", "TIE"))
match_imp$win_team <- as.factor(match_imp$win_team)
summary(match_imp)
## id country_id season stage
## Min. : 1 Min. : 1 Length:25979 Min. : 1.00
## 1st Qu.: 6496 1st Qu.: 4769 Class :character 1st Qu.: 9.00
## Median :12990 Median :10257 Mode :character Median :18.00
## Mean :12990 Mean :11739 Mean :18.24
## 3rd Qu.:19484 3rd Qu.:17642 3rd Qu.:27.00
## Max. :25979 Max. :24558 Max. :38.00
## date match_api_id home_team_api_id away_team_api_id
## Length:25979 Min. : 483129 Min. : 1601 Min. : 1601
## Class :character 1st Qu.: 768436 1st Qu.: 8475 1st Qu.: 8475
## Mode :character Median :1147511 Median : 8697 Median : 8697
## Mean :1195429 Mean : 9984 Mean : 9984
## 3rd Qu.:1709852 3rd Qu.: 9925 3rd Qu.: 9925
## Max. :2216672 Max. :274581 Max. :274581
## home_team_goal away_team_goal win_team
## Min. : 0.000 Min. :0.000 AWAY: 7466
## 1st Qu.: 1.000 1st Qu.:0.000 HOME:11917
## Median : 1.000 Median :1.000 TIE : 6596
## Mean : 1.545 Mean :1.161
## 3rd Qu.: 2.000 3rd Qu.:2.000
## Max. :10.000 Max. :9.000
In the match table, we select only the important columns pertinant to our analysis and we add a column that gives us the details of the team that won the particular match as a categorical variable - HOME TEAM, AWAY TEAM or TIE
We specifically look at the Italian league where Roma plays and create the score tables from the last 3 seasons to explore how roma has been doing. In our dataset, Roma has the team ID 8686. We look at the points scored at home and away games seperately as well as the total points scored in a particular season.
Top scoring teams of 2015-16 in the Italian league
## Joining, by = "team_api_id"
Top scoring teams of 2014-15 in the Italian league
## Joining, by = "team_api_id"
Top scoring teams of 2013-14 in the Italian league
## Joining, by = "team_api_id"
From the last 3 seasons we can see that Roma’s position has dropped from 2nd in 2013-14 and 2014-15 to 3rd in 2015-16, where Napoli has taken over Roma on the soreboard, however, just by 2 points.
Roma’s home and away performance across the last two seasons
roma1415_teampoints %>%
arrange(desc(total_points)) %>%
head()
roma1516_teampoints %>%
arrange(desc(total_points)) %>%
head()
Another interesting point is that Roma has been losing more points at home compared to other top teams in the league. Considering the home advantage of the fans cheering for the team as well the ground which is suitable for the home. Also, Roma has been losing points at home to teams who are lower down the table against whom Roma is expected to win.
#looking at last 2 seasons at home
roma_matches_last2_home <- match_imp %>%
filter(home_team_api_id == 8686) %>%
filter(season == '2014/2015' | season == '2015/2016')
# 38 matchs total
roma_matches_last2_home$win_team <- ifelse(roma_matches_last2_home$home_team_goal > roma_matches_last2_home$away_team_goal,"HOME",
ifelse(roma_matches_last2_home$home_team_goal < roma_matches_last2_home$away_team_goal, "AWAY", "TIE"))
roma_matches_last2_home$win_team <- as.factor(roma_matches_last2_home$win_team)
ggplot(roma_matches_last2_home, aes(x=win_team))+
geom_bar(fill='darkorange')+
theme_classic() +
ggtitle("Roma's performance on Home ground in 2014/15-2015/16")+
ylab("Number of games")+
xlab("Who won?")
Looking at the trend of past seasons, Roma’s loss percentage at home is very less: they lose 1-2 matches out of 19 matches played at home. We can also observe from the trend that Roma has been drawing a substantial number of matches with teams that are lower in rank. These draws results in 10-15 points loss for Roma which is crucial for Roma’s performance.
So we are looking at teams against which Roma is tying and should change their strategy to convert the ties into wins.
We look at teams which are very down in terms of scores in the league as compared to Roma. Roma should not be drawing against these teams, especially in their home ground.
teams_loss_draw <- roma_matches_last2_home %>%
filter(win_team == 'AWAY' | win_team == 'TIE') %>%
select(away_team_api_id) %>%
unique()
teams_draw <- roma_matches_last2_home %>%
filter(win_team == 'TIE') %>%
select(away_team_api_id) %>%
unique()
#teams which they are losing / drawing at home
ld_teams <- c(7943,8564,8543,8534,10167,9885,9882,8524,8540,9876,8636,9857)
#teams which they are drawing at home
d_teams <- c( 7943, 8564, 8543, 8534,10167, 9885, 8524, 9876, 8636, 9857)
#teams which they are losing / drawing at home multiple times
mult_ld_teams <- c(7943,8524,8564)
drawteam_points <- full_join(team,roma1516_teampoints,by='team_api_id') %>%
filter(team_api_id %in% d_teams) %>%
arrange(desc(total_points))
#most of these have very low points
ggplot(drawteam_points, aes(x =team_long_name, y = total_points))+
geom_bar(stat="identity",fill="lightblue")+
geom_hline(yintercept=80, linetype="dashed", color = "red",size=1.5)+
scale_y_continuous(breaks=seq(0,100,10))+
theme_classic() +
ggtitle("Team Points compared to Roma 2015-16 Season")+
annotate("text", x = "Parma", y = 5,label = "Not \nQualified",color = "red")+
ylab("Total Points")+
xlab("Team Name")
drawteam_points <- full_join(team,roma1415_teampoints,by='team_api_id') %>%
filter(team_api_id %in% d_teams) %>%
arrange(desc(total_points))
#most of these have very low points
ggplot(drawteam_points, aes(x =team_long_name, y = total_points))+
geom_bar(stat="identity",fill="lightblue")+
geom_hline(yintercept=70, linetype="dashed", color = "red",size=1.5)+
scale_y_continuous(breaks=seq(0,100,10))+
ggtitle("Team Points compared to Roma 2014-15 Season")+
ylab("Total Points")+
xlab("Team Name")+
annotate("text", x = "Bologna", y = 5,label = "Not \nQualified",color = "red")+
theme_classic()
Looking into Parma, Bologna, and Atalanta since these teams are very low in the league table and Roma has to win the matches against these players at home.
If the match is turning out to be a draw in the first half, the manager has to make appropriate substitutions in the second half who can score a goal and win the game which gives 2 additional points to Roma and Roma is naturally a second-half scoring team as shown above in the graph. Roma can leverage this further and bring on more impact players against a particular team. Below code will find player attributes who have a higher chance of scoring a goal in the second half against the teams selected above
Extracting Goals column from the match table to identify the players’ scores in a particular match. Based on the time the goal was scored, we categorise it as first or second half. Other attributes are bucketed to categorical quantiles.
roma_losses_draws <- roma_matches_last2_home %>%
filter(win_team == 'TIE')
match_italian <- match %>%
filter(country_id == 10257)
match_italian1 <- match_italian[!((match_italian$home_team_goal == 0) & (match_italian$away_team_goal == 0) ),]
match_italian1 <- match_italian1[!is.na(match_italian1$goal), ]
match_italian1 <- match_italian1[!(match_italian1$goal == 'NA'), ]
match_goals <-data.frame()
x <- seq(1,2775,by=1)
for (i in x)
{
match_goal1 <- xmlToDataFrame(match_italian1$goal[i])
match_goal1$matchid <- match_italian1$match_api_id[i]
if (!("player2" %in% colnames(match_goal1)))
{
match_goal1$player2 <- ''
}
if (!("subtype" %in% colnames(match_goal1)))
{
match_goal1$subtype <- ''
}
if (!("player1" %in% colnames(match_goal1) ))
{
match_goal1$player1<- ''
}
if (!("elapsed" %in% colnames(match_goal1) ))
{
match_goal1$elapsed <- ''
}
if (!("team" %in% colnames(match_goal1) ))
{
match_goal1$team <- ''
}
match_goal1 <- match_goal1 %>% select(elapsed,player2,subtype,player1,team,matchid)
match_goal1 <- match_goal1[c("elapsed","player2","subtype","player1","team","matchid")]
match_goals <- rbind(match_goals,match_goal1)
}
goals_imp <- match_goals %>%
select(elapsed,player1,team,matchid)
colnames(goals_imp) <- c('goal_time','player_api_id','team_api_id','match_api_id')
match_goals <- left_join(goals_imp,match,by='match_api_id') %>%
select(goal_time,player_api_id,team_api_id,match_api_id,home_team_api_id,away_team_api_id) %>%
filter(team_api_id == home_team_api_id) %>%
select(-team_api_id) %>%
filter(away_team_api_id %in% d_teams)
match_goals$player_api_id <- as.numeric(as.character(match_goals$player_api_id))
roma_goal_times <- match_goals %>%
filter(home_team_api_id == 8686)
roma_goal_times$goal_time <- as.numeric(as.character(roma_goal_times$goal_time))
#latest
goal_scorer_attributes <- inner_join(match_goals, player_attributes, by='player_api_id')
goal_scorer_attributes <- goal_scorer_attributes %>%
group_by(player_api_id) %>%
slice(which.max(as.Date(date)))
goal_scorer_attributes <- goal_scorer_attributes %>%
select(-c("match_api_id","home_team_api_id","id","player_fifa_api_id","date",contains("gk")))
goal_scorer_attributes$goal_time <- as.numeric(goal_scorer_attributes$goal_time)
goal_scorer_attributes$goal_time <- ifelse(goal_scorer_attributes$goal_time < 45, "FIRST_HALF","SECOND_HALF")
goal_scorer_attributes <- goal_scorer_attributes %>%
drop_na()
goal_scorer_attributes <- within(goal_scorer_attributes,
overall_rating <- as.factor(cut(overall_rating ,
quantile(overall_rating , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
potential <- as.factor(cut(potential ,
quantile(potential , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
crossing <- as.factor(cut(crossing ,
quantile(crossing , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
finishing <- as.factor(cut(finishing ,
quantile(finishing , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
heading_accuracy <- as.factor(cut(heading_accuracy ,
quantile(heading_accuracy , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
short_passing <- as.factor(cut(short_passing ,
quantile(short_passing , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
volleys <- as.factor(cut(volleys ,
quantile(volleys , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
dribbling <- as.factor(cut(dribbling ,
quantile(dribbling , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
curve <- as.factor(cut(curve ,
quantile(curve , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
free_kick_accuracy <- as.factor(cut(free_kick_accuracy ,
quantile(free_kick_accuracy, probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
long_passing <- as.factor(cut(long_passing ,
quantile(long_passing , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
ball_control <- as.factor(cut(ball_control ,
quantile(ball_control , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
acceleration <- as.factor(cut(acceleration ,
quantile(acceleration , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
sprint_speed <- as.factor(cut(sprint_speed ,
quantile(sprint_speed , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
agility <- as.factor(cut(agility ,
quantile(agility , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
reactions <- as.factor(cut(reactions ,
quantile(reactions , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
balance <- as.factor(cut(balance ,
quantile(balance , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
shot_power <- as.factor(cut(shot_power ,
quantile(shot_power , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
jumping <- as.factor(cut(jumping ,
quantile(jumping , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
stamina <- as.factor(cut(stamina ,
quantile(stamina , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
strength <- as.factor(cut(strength ,
quantile(strength , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
long_shots <- as.factor(cut(long_shots ,
quantile(long_shots , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
aggression <- as.factor(cut(aggression ,
quantile(aggression , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
interceptions <- as.factor(cut(interceptions ,
quantile(interceptions , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
positioning <- as.factor(cut(positioning ,
quantile(positioning , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
vision <- as.factor(cut(vision ,
quantile(vision , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
penalties <- as.factor(cut(penalties ,
quantile(penalties , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
marking <- as.factor(cut(marking ,
quantile(marking , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
standing_tackle <- as.factor(cut(standing_tackle ,
quantile(standing_tackle , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes,
sliding_tackle <- as.factor(cut(sliding_tackle ,
quantile(sliding_tackle , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes$goal_time <- as.factor(goal_scorer_attributes$goal_time)
goal_scorer_attributes$preferred_foot <- as.factor(goal_scorer_attributes$preferred_foot)
goal_scorer_attributes$attacking_work_rate <- as.factor(goal_scorer_attributes$attacking_work_rate)
goal_scorer_attributes$defensive_work_rate <- as.factor(goal_scorer_attributes$defensive_work_rate)
summary(goal_scorer_attributes)
## goal_time player_api_id away_team_api_id overall_rating
## FIRST_HALF :248 Min. : 5223 Min. : 7943 [62,71]:130
## SECOND_HALF:238 1st Qu.: 31301 1st Qu.: 8543 (71,74]:125
## Median : 41408 Median : 8636 (74,77]:118
## Mean : 91589 Mean : 9089 (77,89]:113
## 3rd Qu.:129226 3rd Qu.: 9876
## Max. :540307 Max. :10167
## potential preferred_foot attacking_work_rate defensive_work_rate
## [62,72]:142 left :113 high :156 high :112
## (72,75]:109 right:373 low : 49 low : 81
## (75,79]:126 medium:276 medium:288
## (79,91]:109 None : 4 o : 4
## norm : 1 ormal : 1
##
## crossing finishing heading_accuracy short_passing volleys
## [11,53]:123 [10,46.2]:122 [10,58]:134 [22,65]:135 [10,53.2]:122
## (53,67]:128 (46.2,62]:126 (58,66]:114 (65,72]:127 (53.2,68]:143
## (67,73]:119 (62,72.8]:116 (66,73]:118 (72,77]:109 (68,74] :114
## (73,91]:116 (72.8,90]:122 (73,90]:120 (77,90]:115 (74,92] :107
##
##
## dribbling curve free_kick_accuracy long_passing ball_control
## [10,61]:123 [11,56]:125 [10,48] :123 [23,54]:128 [21,68]:125
## (61,71]:133 (56,68]:128 (48,61] :123 (54,66]:129 (68,75]:129
## (71,78]:120 (68,75]:127 (61,71.8]:118 (66,74]:125 (75,80]:118
## (78,90]:110 (75,92]:106 (71.8,93]:122 (74,90]:104 (80,90]:114
##
##
## acceleration sprint_speed agility reactions balance
## [27,60.2]:122 [31,64]:132 [29,61] :125 [52,68]:123 [27,59]:128
## (60.2,69]:129 (64,69]:120 (61,69.5]:118 (68,72]:144 (59,69]:124
## (69,78] :117 (69,78]:124 (69.5,78]:126 (72,76]:121 (69,76]:118
## (78,94] :118 (78,96]:110 (78,95] :117 (76,91]: 98 (76,95]:116
##
##
## shot_power jumping stamina strength long_shots
## [23,69]:133 [27,64] :130 [20,60]:128 [30,66]:137 [11,58.2]:122
## (69,74]:125 (64,70] :129 (60,69]:145 (66,72]:108 (58.2,69]:128
## (74,79]:124 (70,77.8]:105 (69,76]: 96 (72,78]:135 (69,75] :117
## (79,93]:104 (77.8,93]:122 (76,94]:117 (78,93]:106 (75,91] :119
##
##
## aggression interceptions positioning vision penalties
## [12,56.2]:122 [13,30]:130 [10,59]:123 [14,54.2]:122 [11,55]:131
## (56.2,68]:125 (30,60]:114 (59,70]:128 (54.2,67]:124 (55,67]:119
## (68,76] :121 (60,74]:129 (70,76]:116 (67,74] :119 (67,74]:129
## (76,92] :118 (74,92]:113 (76,94]:119 (74,94] :121 (74,92]:107
##
##
## marking standing_tackle sliding_tackle
## [11,24] :137 [12,30] :129 [10,27]:124
## (24,51.5]:106 (30,61] :115 (27,59]:121
## (51.5,70]:126 (61,74.8]:120 (59,73]:124
## (70,90] :117 (74.8,90]:122 (73,90]:117
##
##
Now that we have a list of player attributes that help win against these teams, lets analyse them individually.
#looking at each of the away teams bologna(9857),empoli(8534),parma(10167)
parma <- goal_scorer_attributes %>%
filter(away_team_api_id == 10167)
parma <- parma %>%
select(-c("player_api_id","away_team_api_id"))
## Adding missing grouping variables: `player_api_id`
parma$player_api_id <- as.factor(parma$player_api_id)
ar_all_trans <- as(parma,"transactions")
#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2),
appearance = list(rhs=c("goal_time=SECOND_HALF",
"goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_parma <- rules %>%
subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
subset(subset = lift > 2) %>%
sort(by = "conf")
inspect(rules_chosen_parma[1:10])
plot(rules_chosen_parma[1:5],method="graph",size="lift",
shading = "conf",
control=list(main="Player attributes to Score against Parma in Second Half"))
## Available control parameters (with default values):
## main = Graph for 5 rules
## nodeColors = c("#66CC6680", "#9999CC80")
## nodeCol = c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF", "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF", "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol = c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF", "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF", "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha = 0.5
## cex = 1
## itemLabels = TRUE
## labelCol = #000000B3
## measureLabels = FALSE
## precision = 3
## layout = NULL
## layoutParams = list()
## arrowSize = 0.5
## engine = igraph
## plot = TRUE
## plot_options = list()
## max = 100
## verbose = FALSE
Based on the rules generated for Parma, we can see that in the second half if we have a player who is right-footed, a player with attacking work rate with a balance in the range (76,95) or a player with a shot power of (69-74) with strength of (30-66) has a higher chance of scoring a goal in the second half.
Similarly, rules are generated for Atalanta and Bologna
atalanta <- goal_scorer_attributes %>%
filter(away_team_api_id == 8524)
atalanta <- atalanta %>%
select(-c("away_team_api_id"))
atalanta$player_api_id <- as.factor(atalanta$player_api_id)
ar_all_trans <- as(atalanta,"transactions")
#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2),
appearance = list(rhs=c("goal_time=SECOND_HALF","goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_atalanta <- rules %>%
subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
subset(subset = lift > 1.9) %>%
sort(by = "conf")
inspect(rules_chosen_atalanta)
plot(rules_chosen_atalanta[1:5],method="graph",size="lift",
shading = "conf",control=list(main="Player attributes to Score against Atalanta in Second Half"))
## Available control parameters (with default values):
## main = Graph for 5 rules
## nodeColors = c("#66CC6680", "#9999CC80")
## nodeCol = c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF", "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF", "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol = c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF", "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF", "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha = 0.5
## cex = 1
## itemLabels = TRUE
## labelCol = #000000B3
## measureLabels = FALSE
## precision = 3
## layout = NULL
## layoutParams = list()
## arrowSize = 0.5
## engine = igraph
## plot = TRUE
## plot_options = list()
## max = 100
## verbose = FALSE
#bologna rules
bologna <- goal_scorer_attributes %>%
filter(away_team_api_id == 9857)
bologna <- bologna %>%
select(-c("away_team_api_id"))
bologna$player_api_id <- as.factor(bologna$player_api_id)
ar_all_trans <- as(bologna,"transactions")
#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2),
appearance = list(rhs=c("goal_time=SECOND_HALF","goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_bologna <- rules %>%
subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
subset(subset = lift > 1.5) %>%
sort(by = "conf")
plot(rules_chosen_bologna[1:5],method="graph",
size="lift", shading = "conf",
control=list(main="Player attributes to Score against Bologna in Second Half"))
## Available control parameters (with default values):
## main = Graph for 5 rules
## nodeColors = c("#66CC6680", "#9999CC80")
## nodeCol = c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF", "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF", "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol = c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF", "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF", "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha = 0.5
## cex = 1
## itemLabels = TRUE
## labelCol = #000000B3
## measureLabels = FALSE
## precision = 3
## layout = NULL
## layoutParams = list()
## arrowSize = 0.5
## engine = igraph
## plot = TRUE
## plot_options = list()
## max = 100
## verbose = FALSE
Using the rules generated, matching these attributes to see if Roma has players with these attributes : * If Roma doesn’t have players with these attributes they might consider buying a player who matches these attributes according to Roma’s budget. * If Roma has players with these attributes and not playing in the playing 11, Player with the matched attributes should come off the bench and play in the second half.
# Code to pinpoint players
match_roma <- match %>% filter(home_team_api_id == 8686 )
match_roma_14_16 <-match_roma %>% filter(season == '2014/2015' | season == '2015/2016')
player_atts_updated <- player_attributes
player_atts_updated <- merge(x = player_atts_updated, y = player, by = 'player_api_id', all = FALSE)
player_name_atts <- player_atts_updated %>%
select("player_api_id","player_name", "date", "overall_rating",
"potential", "preferred_foot", "attacking_work_rate", "defensive_work_rate"
,"crossing", "finishing", "heading_accuracy",
"short_passing", "volleys", "dribbling",
"curve", "free_kick_accuracy", "long_passing", "ball_control",
"acceleration","sprint_speed", "agility", "reactions", "balance",
"shot_power", "jumping", "stamina", "strength", "long_shots", "aggression",
"interceptions", "positioning", "vision",
"penalties", "marking", "standing_tackle",
"sliding_tackle", "gk_diving", "gk_handling",
"gk_kicking", "gk_positioning", "gk_reflexes",
"birthday", "height", "weight")
player_name_atts <- player_name_atts %>%
group_by(player_api_id) %>%
slice(which.max(as.Date(date)))
home_player_1 <- unique(match_roma_14_16$home_player_1)
home_player_2 <- unique(match_roma_14_16$home_player_2)
home_player_3 <- unique(match_roma_14_16$home_player_3)
home_player_4 <- unique(match_roma_14_16$home_player_4)
home_player_5 <- unique(match_roma_14_16$home_player_5)
home_player_6 <- unique(match_roma_14_16$home_player_6)
home_player_7 <- unique(match_roma_14_16$home_player_7)
home_player_8 <- unique(match_roma_14_16$home_player_8)
home_player_9 <- unique(match_roma_14_16$home_player_9)
home_player_10 <- unique(match_roma_14_16$home_player_10)
home_player_11<- unique(match_roma_14_16$home_player_11)
player_name_atts_updated_roma <- player_name_atts %>%
filter(player_api_id %in% c(home_player_1,home_player_2,home_player_3,
home_player_4, home_player_5, home_player_6,
home_player_7, home_player_8, home_player_9,
home_player_10, home_player_11))
#bologna
players_against_bologna <- player_name_atts_updated_roma %>%
filter(((strength >72 & strength <= 78) & (aggression > 68 & aggression <= 76)))
ggplot(players_against_bologna,aes(x=strength,y=aggression,label=player_name))+
geom_label(aes(label=player_name),fill="darkorange",color="white")+
ggtitle("Players who should be playing against Bologna \nBased on their desired attributes")+
ylab("Strength")+
xlab("Agrgression")+
scale_x_continuous(limits = c(74,80))+
scale_y_continuous(limits = c(68,78))+
theme_classic()
#Atalanta
players_against_atalanta <- player_name_atts_updated_roma %>%
filter(((crossing > 73 & crossing <= 91) & (curve >75 & curve <= 92) ))
ggplot(players_against_atalanta,aes(x=crossing,y=curve,label=player_name))+
geom_label(aes(label=player_name),fill="darkorange",color="white")+
ggtitle("Players who should be playing against Atalanta \nBased on their desired attributes")+
ylab("Curve")+
xlab("Crossing")+
scale_x_continuous(limits = c(72,96))+
scale_y_continuous(limits = c(72,85))+
theme_classic()
#Parma
players_against_parma <- player_name_atts_updated_roma %>%
filter(((shot_power > 69 & shot_power <= 74) & (strength >= 30 & strength <= 66) ))
ggplot(players_against_parma,aes(x=shot_power,y=strength,label=player_name))+
geom_label(aes(label=player_name),fill="darkorange",color="white")+
ggtitle("Players who should be playing against Parma\nBased on their desired attributes")+
ylab("Strength")+
xlab("Shot Power")+
scale_x_continuous(limits = c(67,75))+
scale_y_continuous(limits = c(50,70))+
theme_classic()
It is seen that all against these 3 teams the players with high chances of scoring in the second half is different.
To maintian the top 2 position in the Italian league, Roma has to ensure that it does not lose points by drawing matches on it’s home ground, especially to teams like Bologna, Atalanta and Parma who are quite down the points table as compared to Roma.
While playing agianst Bologna, Atalanta and Parma, always playing the same team with your best players may not be the right strategy.Instead substitutions can be made according to some of the player attributes who have a higher chance of scoring a goal against these teams
At Roma, there are 3 players who match the attributes of high strength and high aggression(Maicon,Totti and Keita) if one of these players are injured these we can substitute the the other two players. This suggests that against Bologna they can play a more aggressive kind of play in the second half by moving high up on the pitch and closing down players quick.
Maicon, Pjanic and Falque match these characteristics of high crossing and curve skill attributes against atalanta suggests that long crossing balls would lead to a goal instead of short passing game.
Salah, Ucan, Perotti and El Shaarawy are players with good shot power and average strength . In the second half these players can be brought on the field if they are on the bench in the first half. In the second half if the team is trailing or if it’s going to be a draw, players can shoot on target with power instead or more intricate style of scoring a goal.