Team Solution

Overview

Background: We have been hired by AS Roma team as their resident data scientists to help their coach to increase Roma’s success on the field and decrease failure. We present an exploratory work where we apply Association Rules to the chosen dataset in search of existing patterns. The key challenges are that there are multiple levels of attributes, team attributes, individual player attributes associated with each match results, and also the patterns have been changed over different match seasons.

Business Problem

How can Roma maintain its top two position in the Italian League so that they qualify for the Champions League without playing playoff games?

Approach

  • First of all, we will identify the teams (‘opponents’) that AS Roma lost to most frequently during home matches, and rank opponents based on their historical performance.
  • Secondly, we will specifically focus on those opponents who normally ranked low in the Italian league.
  • Thirdly, we find out the teams who won over opponents most frequently and identify the individual players (‘star players’) who scored in those teams.
  • Fourth, we explore to understand the player attributes and properties that associating with those star players.
  • Finally, we compare the key player attributes with Roma’s existing player’s attributes and make recommendations based on the final analysis.

Assumptions

In order to increase the efficacy of our analysis and aid management in making informed decisions, we sought to focus on the most recent two years and provide a high-level overview of the match results and pattern, and then took a deep dive into analysis regarding each match involving Roma in most recent 2 seasons given the playing styles changes with the manager and the league itself over the years.

To quantify wins, losses and draws and to make comparison between team performances we followed the standard quantifying metrics used in Europian football where teams will be awarded three points for each win, no point for each loss and one point for every draw i.e. Win = 3 points Loss = 0 point Draw = 1 point

Data Exploration

First we are loading the data from the sqlite database and converting them into data frames for analysis in R

setwd('/Users/anunaysmac/Home/MSBA Classes/Fall/EDA/HW1/HW 1/HW 1')
con <- src_sqlite("euro_soccer.sqlite")
long_team_name <- 'Roma'
country_tbl <- tbl(con, "country")
country <- collect(country_tbl)
league_tbl <- tbl(con, "league")
league <- collect(league_tbl)
match_tbl <- tbl(con, "match")
match <- collect(match_tbl)
player_tbl <- tbl(con, "player")
player <- collect(player_tbl)
player_atts_tbl <- tbl(con, "player_attributes")
player_attributes <- collect(player_atts_tbl)
team_tbl <- tbl(con, "team")
team <- collect(team_tbl)
team_atts_tbl <- tbl(con, "team_attributes")
team_attributes <- collect(team_atts_tbl)

We are first looking at the table containing match data which gives us the details of the league, season, teams playing the match, players as well as the goals by each team

match_imp <- match %>%
  select(id,country_id,season,stage,date,
         match_api_id,home_team_api_id,away_team_api_id,
         home_team_goal,away_team_goal)

match_imp$win_team <- ifelse(match_imp$home_team_goal > match_imp$away_team_goal, 
                             "HOME",
                             ifelse(match_imp$home_team_goal < match_imp$away_team_goal,
                                    "AWAY", "TIE"))

match_imp$win_team <- as.factor(match_imp$win_team)

summary(match_imp)
##        id          country_id       season              stage      
##  Min.   :    1   Min.   :    1   Length:25979       Min.   : 1.00  
##  1st Qu.: 6496   1st Qu.: 4769   Class :character   1st Qu.: 9.00  
##  Median :12990   Median :10257   Mode  :character   Median :18.00  
##  Mean   :12990   Mean   :11739                      Mean   :18.24  
##  3rd Qu.:19484   3rd Qu.:17642                      3rd Qu.:27.00  
##  Max.   :25979   Max.   :24558                      Max.   :38.00  
##      date            match_api_id     home_team_api_id away_team_api_id
##  Length:25979       Min.   : 483129   Min.   :  1601   Min.   :  1601  
##  Class :character   1st Qu.: 768436   1st Qu.:  8475   1st Qu.:  8475  
##  Mode  :character   Median :1147511   Median :  8697   Median :  8697  
##                     Mean   :1195429   Mean   :  9984   Mean   :  9984  
##                     3rd Qu.:1709852   3rd Qu.:  9925   3rd Qu.:  9925  
##                     Max.   :2216672   Max.   :274581   Max.   :274581  
##  home_team_goal   away_team_goal  win_team    
##  Min.   : 0.000   Min.   :0.000   AWAY: 7466  
##  1st Qu.: 1.000   1st Qu.:0.000   HOME:11917  
##  Median : 1.000   Median :1.000   TIE : 6596  
##  Mean   : 1.545   Mean   :1.161               
##  3rd Qu.: 2.000   3rd Qu.:2.000               
##  Max.   :10.000   Max.   :9.000

In the match table, we select only the important columns pertinant to our analysis and we add a column that gives us the details of the team that won the particular match as a categorical variable - HOME TEAM, AWAY TEAM or TIE

Analysis

How has Roma been performing in the Italian League?

We specifically look at the Italian league where Roma plays and create the score tables from the last 3 seasons to explore how roma has been doing. In our dataset, Roma has the team ID 8686. We look at the points scored at home and away games seperately as well as the total points scored in a particular season.

Top scoring teams of 2015-16 in the Italian league

## Joining, by = "team_api_id"

Top scoring teams of 2014-15 in the Italian league

## Joining, by = "team_api_id"

Top scoring teams of 2013-14 in the Italian league

## Joining, by = "team_api_id"

From the last 3 seasons we can see that Roma’s position has dropped from 2nd in 2013-14 and 2014-15 to 3rd in 2015-16, where Napoli has taken over Roma on the soreboard, however, just by 2 points.

Where has Roma been losing most points?

Roma’s home and away performance across the last two seasons

roma1415_teampoints %>%
  arrange(desc(total_points)) %>%
  head()
roma1516_teampoints %>%
  arrange(desc(total_points)) %>%
  head()

Another interesting point is that Roma has been losing more points at home compared to other top teams in the league. Considering the home advantage of the fans cheering for the team as well the ground which is suitable for the home. Also, Roma has been losing points at home to teams who are lower down the table against whom Roma is expected to win.

#looking at last 2 seasons at home
roma_matches_last2_home <- match_imp %>% 
  filter(home_team_api_id == 8686) %>% 
  filter(season == '2014/2015' | season == '2015/2016')

# 38 matchs total

roma_matches_last2_home$win_team <- ifelse(roma_matches_last2_home$home_team_goal > roma_matches_last2_home$away_team_goal,"HOME",
                                    ifelse(roma_matches_last2_home$home_team_goal < roma_matches_last2_home$away_team_goal, "AWAY", "TIE"))

roma_matches_last2_home$win_team <- as.factor(roma_matches_last2_home$win_team)

ggplot(roma_matches_last2_home, aes(x=win_team))+
  geom_bar(fill='darkorange')+
  theme_classic() +
  ggtitle("Roma's performance on Home ground in 2014/15-2015/16")+
  ylab("Number of games")+
  xlab("Who won?")

Looking at the trend of past seasons, Roma’s loss percentage at home is very less: they lose 1-2 matches out of 19 matches played at home. We can also observe from the trend that Roma has been drawing a substantial number of matches with teams that are lower in rank. These draws results in 10-15 points loss for Roma which is crucial for Roma’s performance.

So we are looking at teams against which Roma is tying and should change their strategy to convert the ties into wins.

Teams against whom Roma should not be losing points

We look at teams which are very down in terms of scores in the league as compared to Roma. Roma should not be drawing against these teams, especially in their home ground.

teams_loss_draw <- roma_matches_last2_home %>%
  filter(win_team == 'AWAY' | win_team == 'TIE') %>%
  select(away_team_api_id) %>%
  unique()

teams_draw <- roma_matches_last2_home %>%
  filter(win_team == 'TIE') %>%
  select(away_team_api_id) %>%
  unique()

#teams which they are losing / drawing at home
ld_teams <- c(7943,8564,8543,8534,10167,9885,9882,8524,8540,9876,8636,9857)

#teams which they are  drawing at home
d_teams <- c( 7943, 8564, 8543, 8534,10167, 9885, 8524, 9876, 8636, 9857)

#teams which they are losing / drawing at home multiple times
mult_ld_teams <- c(7943,8524,8564)

drawteam_points <- full_join(team,roma1516_teampoints,by='team_api_id') %>%
  filter(team_api_id %in% d_teams) %>%
  arrange(desc(total_points))

#most of these have very low points 

ggplot(drawteam_points, aes(x =team_long_name, y = total_points))+
  geom_bar(stat="identity",fill="lightblue")+
  geom_hline(yintercept=80, linetype="dashed", color = "red",size=1.5)+
  scale_y_continuous(breaks=seq(0,100,10))+
  theme_classic() +
  ggtitle("Team Points compared to Roma 2015-16 Season")+
  annotate("text", x = "Parma", y = 5,label = "Not \nQualified",color = "red")+
  ylab("Total Points")+
  xlab("Team Name")

drawteam_points <- full_join(team,roma1415_teampoints,by='team_api_id') %>%
  filter(team_api_id %in% d_teams) %>%
  arrange(desc(total_points))

#most of these have very low points 

ggplot(drawteam_points, aes(x =team_long_name, y = total_points))+
  geom_bar(stat="identity",fill="lightblue")+
  geom_hline(yintercept=70, linetype="dashed", color = "red",size=1.5)+
  scale_y_continuous(breaks=seq(0,100,10))+
  ggtitle("Team Points compared to Roma 2014-15 Season")+
  ylab("Total Points")+
  xlab("Team Name")+
  annotate("text", x = "Bologna", y = 5,label = "Not \nQualified",color = "red")+
  theme_classic()

Looking into Parma, Bologna, and Atalanta since these teams are very low in the league table and Roma has to win the matches against these players at home.

Which players from other teams scored goals against Parma, Bologna, and Atalanta and what attributes do they have

If the match is turning out to be a draw in the first half, the manager has to make appropriate substitutions in the second half who can score a goal and win the game which gives 2 additional points to Roma and Roma is naturally a second-half scoring team as shown above in the graph. Roma can leverage this further and bring on more impact players against a particular team. Below code will find player attributes who have a higher chance of scoring a goal in the second half against the teams selected above

Extracting Goals column from the match table to identify the players’ scores in a particular match. Based on the time the goal was scored, we categorise it as first or second half. Other attributes are bucketed to categorical quantiles.

roma_losses_draws <- roma_matches_last2_home %>%
  filter(win_team == 'TIE')

match_italian <- match %>% 
  filter(country_id == 10257)

match_italian1 <- match_italian[!((match_italian$home_team_goal == 0) & (match_italian$away_team_goal == 0) ),]
match_italian1 <- match_italian1[!is.na(match_italian1$goal), ]
match_italian1 <- match_italian1[!(match_italian1$goal == 'NA'), ]


match_goals <-data.frame()

x <- seq(1,2775,by=1)

for (i in x)
{
  match_goal1 <- xmlToDataFrame(match_italian1$goal[i])
  match_goal1$matchid <- match_italian1$match_api_id[i]
  
  if (!("player2" %in% colnames(match_goal1)))
  {
    match_goal1$player2 <- ''
  }
  if (!("subtype" %in% colnames(match_goal1)))
  {
    match_goal1$subtype <- ''
  }
  
  if (!("player1" %in% colnames(match_goal1) ))
  {
    match_goal1$player1<- ''
  }
  if (!("elapsed" %in% colnames(match_goal1) ))
  {
    match_goal1$elapsed <- ''
  }
  
  if (!("team" %in% colnames(match_goal1) ))
  {
    match_goal1$team <- ''
  }
  
  match_goal1 <- match_goal1 %>% select(elapsed,player2,subtype,player1,team,matchid)
  match_goal1 <- match_goal1[c("elapsed","player2","subtype","player1","team","matchid")]
  match_goals <- rbind(match_goals,match_goal1)
}

goals_imp <- match_goals %>%
  select(elapsed,player1,team,matchid)

colnames(goals_imp) <- c('goal_time','player_api_id','team_api_id','match_api_id')

match_goals <- left_join(goals_imp,match,by='match_api_id') %>%
  select(goal_time,player_api_id,team_api_id,match_api_id,home_team_api_id,away_team_api_id) %>%
  filter(team_api_id == home_team_api_id) %>%
  select(-team_api_id) %>%
  filter(away_team_api_id %in% d_teams)

match_goals$player_api_id <- as.numeric(as.character(match_goals$player_api_id))

roma_goal_times <- match_goals %>%
  filter(home_team_api_id == 8686)

roma_goal_times$goal_time <- as.numeric(as.character(roma_goal_times$goal_time))


#latest
goal_scorer_attributes <- inner_join(match_goals, player_attributes, by='player_api_id')

goal_scorer_attributes <- goal_scorer_attributes %>%
  group_by(player_api_id) %>%
  slice(which.max(as.Date(date)))

goal_scorer_attributes <- goal_scorer_attributes %>%
  select(-c("match_api_id","home_team_api_id","id","player_fifa_api_id","date",contains("gk")))

goal_scorer_attributes$goal_time <- as.numeric(goal_scorer_attributes$goal_time)
goal_scorer_attributes$goal_time <- ifelse(goal_scorer_attributes$goal_time < 45, "FIRST_HALF","SECOND_HALF")

goal_scorer_attributes <- goal_scorer_attributes %>%
  drop_na()

goal_scorer_attributes <- within(goal_scorer_attributes, 
    overall_rating      <- as.factor(cut(overall_rating      , 
        quantile(overall_rating    , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    potential           <- as.factor(cut(potential           , 
        quantile(potential         , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    crossing            <- as.factor(cut(crossing            , 
        quantile(crossing          , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    finishing           <- as.factor(cut(finishing           , 
        quantile(finishing         , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    heading_accuracy    <- as.factor(cut(heading_accuracy   , 
        quantile(heading_accuracy  , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    short_passing       <- as.factor(cut(short_passing       , 
        quantile(short_passing     , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    volleys             <- as.factor(cut(volleys             , 
        quantile(volleys           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    dribbling           <- as.factor(cut(dribbling           , 
        quantile(dribbling         , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    curve               <- as.factor(cut(curve               , 
        quantile(curve             , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    free_kick_accuracy  <- as.factor(cut(free_kick_accuracy , 
        quantile(free_kick_accuracy, probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    long_passing        <- as.factor(cut(long_passing        , 
        quantile(long_passing      , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    ball_control        <- as.factor(cut(ball_control        , 
        quantile(ball_control      , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    acceleration        <- as.factor(cut(acceleration        , 
        quantile(acceleration      , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    sprint_speed        <- as.factor(cut(sprint_speed        , 
        quantile(sprint_speed      , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    agility             <- as.factor(cut(agility            , 
        quantile(agility           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    reactions           <- as.factor(cut(reactions           , 
        quantile(reactions         , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    balance             <- as.factor(cut(balance             , 
        quantile(balance           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    shot_power          <- as.factor(cut(shot_power          , 
        quantile(shot_power        , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    jumping             <- as.factor(cut(jumping             , 
        quantile(jumping           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    stamina             <- as.factor(cut(stamina            , 
        quantile(stamina           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    strength            <- as.factor(cut(strength            , 
        quantile(strength          , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    long_shots          <- as.factor(cut(long_shots          , 
        quantile(long_shots        , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    aggression          <- as.factor(cut(aggression          , 
        quantile(aggression        , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    interceptions       <- as.factor(cut(interceptions       , 
        quantile(interceptions     , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    positioning         <- as.factor(cut(positioning        , 
        quantile(positioning       , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    vision              <- as.factor(cut(vision              , 
        quantile(vision            , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    penalties           <- as.factor(cut(penalties           , 
        quantile(penalties         , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    marking             <- as.factor(cut(marking             , 
        quantile(marking           , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    standing_tackle     <- as.factor(cut(standing_tackle     , 
        quantile(standing_tackle   , probs=0:4/4),include.lowest=TRUE)))
goal_scorer_attributes <- within(goal_scorer_attributes, 
    sliding_tackle      <- as.factor(cut(sliding_tackle  , 
        quantile(sliding_tackle    , probs=0:4/4),include.lowest=TRUE)))

goal_scorer_attributes$goal_time <- as.factor(goal_scorer_attributes$goal_time)
goal_scorer_attributes$preferred_foot <- as.factor(goal_scorer_attributes$preferred_foot)
goal_scorer_attributes$attacking_work_rate <- as.factor(goal_scorer_attributes$attacking_work_rate)
goal_scorer_attributes$defensive_work_rate <- as.factor(goal_scorer_attributes$defensive_work_rate)

summary(goal_scorer_attributes)
##        goal_time   player_api_id    away_team_api_id overall_rating
##  FIRST_HALF :248   Min.   :  5223   Min.   : 7943    [62,71]:130   
##  SECOND_HALF:238   1st Qu.: 31301   1st Qu.: 8543    (71,74]:125   
##                    Median : 41408   Median : 8636    (74,77]:118   
##                    Mean   : 91589   Mean   : 9089    (77,89]:113   
##                    3rd Qu.:129226   3rd Qu.: 9876                  
##                    Max.   :540307   Max.   :10167                  
##    potential   preferred_foot attacking_work_rate defensive_work_rate
##  [62,72]:142   left :113      high  :156          high  :112         
##  (72,75]:109   right:373      low   : 49          low   : 81         
##  (75,79]:126                  medium:276          medium:288         
##  (79,91]:109                  None  :  4          o     :  4         
##                               norm  :  1          ormal :  1         
##                                                                      
##     crossing       finishing   heading_accuracy short_passing      volleys   
##  [11,53]:123   [10,46.2]:122   [10,58]:134      [22,65]:135   [10,53.2]:122  
##  (53,67]:128   (46.2,62]:126   (58,66]:114      (65,72]:127   (53.2,68]:143  
##  (67,73]:119   (62,72.8]:116   (66,73]:118      (72,77]:109   (68,74]  :114  
##  (73,91]:116   (72.8,90]:122   (73,90]:120      (77,90]:115   (74,92]  :107  
##                                                                              
##                                                                              
##    dribbling       curve     free_kick_accuracy  long_passing  ball_control
##  [10,61]:123   [11,56]:125   [10,48]  :123      [23,54]:128   [21,68]:125  
##  (61,71]:133   (56,68]:128   (48,61]  :123      (54,66]:129   (68,75]:129  
##  (71,78]:120   (68,75]:127   (61,71.8]:118      (66,74]:125   (75,80]:118  
##  (78,90]:110   (75,92]:106   (71.8,93]:122      (74,90]:104   (80,90]:114  
##                                                                            
##                                                                            
##     acceleration  sprint_speed      agility      reactions      balance   
##  [27,60.2]:122   [31,64]:132   [29,61]  :125   [52,68]:123   [27,59]:128  
##  (60.2,69]:129   (64,69]:120   (61,69.5]:118   (68,72]:144   (59,69]:124  
##  (69,78]  :117   (69,78]:124   (69.5,78]:126   (72,76]:121   (69,76]:118  
##  (78,94]  :118   (78,96]:110   (78,95]  :117   (76,91]: 98   (76,95]:116  
##                                                                           
##                                                                           
##    shot_power       jumping       stamina       strength       long_shots 
##  [23,69]:133   [27,64]  :130   [20,60]:128   [30,66]:137   [11,58.2]:122  
##  (69,74]:125   (64,70]  :129   (60,69]:145   (66,72]:108   (58.2,69]:128  
##  (74,79]:124   (70,77.8]:105   (69,76]: 96   (72,78]:135   (69,75]  :117  
##  (79,93]:104   (77.8,93]:122   (76,94]:117   (78,93]:106   (75,91]  :119  
##                                                                           
##                                                                           
##      aggression  interceptions  positioning        vision      penalties  
##  [12,56.2]:122   [13,30]:130   [10,59]:123   [14,54.2]:122   [11,55]:131  
##  (56.2,68]:125   (30,60]:114   (59,70]:128   (54.2,67]:124   (55,67]:119  
##  (68,76]  :121   (60,74]:129   (70,76]:116   (67,74]  :119   (67,74]:129  
##  (76,92]  :118   (74,92]:113   (76,94]:119   (74,94]  :121   (74,92]:107  
##                                                                           
##                                                                           
##       marking     standing_tackle sliding_tackle
##  [11,24]  :137   [12,30]  :129    [10,27]:124   
##  (24,51.5]:106   (30,61]  :115    (27,59]:121   
##  (51.5,70]:126   (61,74.8]:120    (59,73]:124   
##  (70,90]  :117   (74.8,90]:122    (73,90]:117   
##                                                 
## 

Now that we have a list of player attributes that help win against these teams, lets analyse them individually.

How do we win against Parma?

#looking at each of the away teams bologna(9857),empoli(8534),parma(10167)

parma <- goal_scorer_attributes %>%
  filter(away_team_api_id == 10167) 

parma <- parma %>%
  select(-c("player_api_id","away_team_api_id"))
## Adding missing grouping variables: `player_api_id`
parma$player_api_id <- as.factor(parma$player_api_id)

ar_all_trans <- as(parma,"transactions")


#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2), 
                 appearance = list(rhs=c("goal_time=SECOND_HALF",
                                         "goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_parma <- rules %>%
  subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
  subset(subset = lift > 2) %>%
  sort(by = "conf")

inspect(rules_chosen_parma[1:10])
plot(rules_chosen_parma[1:5],method="graph",size="lift", 
     shading = "conf",
     control=list(main="Player attributes to Score against Parma in Second Half"))
## Available control parameters (with default values):
## main  =  Graph for 5 rules
## nodeColors    =  c("#66CC6680", "#9999CC80")
## nodeCol   =  c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF",  "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF",  "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol   =  c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF",  "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF",  "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha     =  0.5
## cex   =  1
## itemLabels    =  TRUE
## labelCol  =  #000000B3
## measureLabels     =  FALSE
## precision     =  3
## layout    =  NULL
## layoutParams  =  list()
## arrowSize     =  0.5
## engine    =  igraph
## plot  =  TRUE
## plot_options  =  list()
## max   =  100
## verbose   =  FALSE

Based on the rules generated for Parma, we can see that in the second half if we have a player who is right-footed, a player with attacking work rate with a balance in the range (76,95) or a player with a shot power of (69-74) with strength of (30-66) has a higher chance of scoring a goal in the second half.

Similarly, rules are generated for Atalanta and Bologna

How do we win against Atalanta?

atalanta <- goal_scorer_attributes %>%
  filter(away_team_api_id == 8524) 

atalanta <- atalanta %>%
  select(-c("away_team_api_id"))

atalanta$player_api_id <- as.factor(atalanta$player_api_id)

ar_all_trans <- as(atalanta,"transactions")


#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2), 
                 appearance = list(rhs=c("goal_time=SECOND_HALF","goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_atalanta <- rules %>%
  subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
  subset(subset = lift > 1.9) %>%
  sort(by = "conf")

inspect(rules_chosen_atalanta)
plot(rules_chosen_atalanta[1:5],method="graph",size="lift", 
     shading = "conf",control=list(main="Player attributes to Score against Atalanta in Second Half"))
## Available control parameters (with default values):
## main  =  Graph for 5 rules
## nodeColors    =  c("#66CC6680", "#9999CC80")
## nodeCol   =  c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF",  "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF",  "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol   =  c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF",  "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF",  "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha     =  0.5
## cex   =  1
## itemLabels    =  TRUE
## labelCol  =  #000000B3
## measureLabels     =  FALSE
## precision     =  3
## layout    =  NULL
## layoutParams  =  list()
## arrowSize     =  0.5
## engine    =  igraph
## plot  =  TRUE
## plot_options  =  list()
## max   =  100
## verbose   =  FALSE

How do we win against Bologna?

#bologna rules
bologna <- goal_scorer_attributes %>%
  filter(away_team_api_id == 9857) 

bologna <- bologna %>%
  select(-c("away_team_api_id"))

bologna$player_api_id <- as.factor(bologna$player_api_id)

ar_all_trans <- as(bologna,"transactions")


#Use the apriori method, the algorithm we discussed in class
rules <- apriori(ar_all_trans, parameter = list(conf = 0.5,minlen=2), 
                 appearance = list(rhs=c("goal_time=SECOND_HALF","goal_time=FIRST_HALF"), default="lhs"))
## Warning in apriori(ar_all_trans, parameter = list(conf = 0.5, minlen = 2), :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
rules_chosen_bologna <- rules %>%
  subset(subset = (rhs %pin% "goal_time=SECOND_HALF")) %>%
  subset(subset = lift > 1.5) %>%
  sort(by = "conf")
plot(rules_chosen_bologna[1:5],method="graph",
     size="lift", shading = "conf",
     control=list(main="Player attributes to Score against Bologna in Second Half"))
## Available control parameters (with default values):
## main  =  Graph for 5 rules
## nodeColors    =  c("#66CC6680", "#9999CC80")
## nodeCol   =  c("#EE0000FF", "#EE0303FF", "#EE0606FF", "#EE0909FF", "#EE0C0CFF", "#EE0F0FFF", "#EE1212FF", "#EE1515FF", "#EE1818FF", "#EE1B1BFF", "#EE1E1EFF", "#EE2222FF", "#EE2525FF", "#EE2828FF", "#EE2B2BFF", "#EE2E2EFF", "#EE3131FF", "#EE3434FF", "#EE3737FF", "#EE3A3AFF", "#EE3D3DFF", "#EE4040FF", "#EE4444FF", "#EE4747FF", "#EE4A4AFF", "#EE4D4DFF", "#EE5050FF", "#EE5353FF", "#EE5656FF", "#EE5959FF", "#EE5C5CFF", "#EE5F5FFF", "#EE6262FF", "#EE6666FF", "#EE6969FF", "#EE6C6CFF", "#EE6F6FFF", "#EE7272FF", "#EE7575FF",  "#EE7878FF", "#EE7B7BFF", "#EE7E7EFF", "#EE8181FF", "#EE8484FF", "#EE8888FF", "#EE8B8BFF", "#EE8E8EFF", "#EE9191FF", "#EE9494FF", "#EE9797FF", "#EE9999FF", "#EE9B9BFF", "#EE9D9DFF", "#EE9F9FFF", "#EEA0A0FF", "#EEA2A2FF", "#EEA4A4FF", "#EEA5A5FF", "#EEA7A7FF", "#EEA9A9FF", "#EEABABFF", "#EEACACFF", "#EEAEAEFF", "#EEB0B0FF", "#EEB1B1FF", "#EEB3B3FF", "#EEB5B5FF", "#EEB7B7FF", "#EEB8B8FF", "#EEBABAFF", "#EEBCBCFF", "#EEBDBDFF", "#EEBFBFFF", "#EEC1C1FF", "#EEC3C3FF", "#EEC4C4FF", "#EEC6C6FF", "#EEC8C8FF",  "#EEC9C9FF", "#EECBCBFF", "#EECDCDFF", "#EECFCFFF", "#EED0D0FF", "#EED2D2FF", "#EED4D4FF", "#EED5D5FF", "#EED7D7FF", "#EED9D9FF", "#EEDBDBFF", "#EEDCDCFF", "#EEDEDEFF", "#EEE0E0FF", "#EEE1E1FF", "#EEE3E3FF", "#EEE5E5FF", "#EEE7E7FF", "#EEE8E8FF", "#EEEAEAFF", "#EEECECFF", "#EEEEEEFF")
## edgeCol   =  c("#474747FF", "#494949FF", "#4B4B4BFF", "#4D4D4DFF", "#4F4F4FFF", "#515151FF", "#535353FF", "#555555FF", "#575757FF", "#595959FF", "#5B5B5BFF", "#5E5E5EFF", "#606060FF", "#626262FF", "#646464FF", "#666666FF", "#686868FF", "#6A6A6AFF", "#6C6C6CFF", "#6E6E6EFF", "#707070FF", "#727272FF", "#747474FF", "#767676FF", "#787878FF", "#7A7A7AFF", "#7C7C7CFF", "#7E7E7EFF", "#808080FF", "#828282FF", "#848484FF", "#868686FF", "#888888FF", "#8A8A8AFF", "#8C8C8CFF", "#8D8D8DFF", "#8F8F8FFF", "#919191FF", "#939393FF",  "#959595FF", "#979797FF", "#999999FF", "#9A9A9AFF", "#9C9C9CFF", "#9E9E9EFF", "#A0A0A0FF", "#A2A2A2FF", "#A3A3A3FF", "#A5A5A5FF", "#A7A7A7FF", "#A9A9A9FF", "#AAAAAAFF", "#ACACACFF", "#AEAEAEFF", "#AFAFAFFF", "#B1B1B1FF", "#B3B3B3FF", "#B4B4B4FF", "#B6B6B6FF", "#B7B7B7FF", "#B9B9B9FF", "#BBBBBBFF", "#BCBCBCFF", "#BEBEBEFF", "#BFBFBFFF", "#C1C1C1FF", "#C2C2C2FF", "#C3C3C4FF", "#C5C5C5FF", "#C6C6C6FF", "#C8C8C8FF", "#C9C9C9FF", "#CACACAFF", "#CCCCCCFF", "#CDCDCDFF", "#CECECEFF", "#CFCFCFFF", "#D1D1D1FF",  "#D2D2D2FF", "#D3D3D3FF", "#D4D4D4FF", "#D5D5D5FF", "#D6D6D6FF", "#D7D7D7FF", "#D8D8D8FF", "#D9D9D9FF", "#DADADAFF", "#DBDBDBFF", "#DCDCDCFF", "#DDDDDDFF", "#DEDEDEFF", "#DEDEDEFF", "#DFDFDFFF", "#E0E0E0FF", "#E0E0E0FF", "#E1E1E1FF", "#E1E1E1FF", "#E2E2E2FF", "#E2E2E2FF", "#E2E2E2FF")
## alpha     =  0.5
## cex   =  1
## itemLabels    =  TRUE
## labelCol  =  #000000B3
## measureLabels     =  FALSE
## precision     =  3
## layout    =  NULL
## layoutParams  =  list()
## arrowSize     =  0.5
## engine    =  igraph
## plot  =  TRUE
## plot_options  =  list()
## max   =  100
## verbose   =  FALSE

Do our players have the desired attributes?

Using the rules generated, matching these attributes to see if Roma has players with these attributes : * If Roma doesn’t have players with these attributes they might consider buying a player who matches these attributes according to Roma’s budget. * If Roma has players with these attributes and not playing in the playing 11, Player with the matched attributes should come off the bench and play in the second half.

# Code to pinpoint players

match_roma <- match %>% filter(home_team_api_id == 8686 )
match_roma_14_16 <-match_roma %>% filter(season == '2014/2015' | season == '2015/2016')

player_atts_updated <- player_attributes
player_atts_updated <- merge(x = player_atts_updated, y = player, by = 'player_api_id', all = FALSE)
player_name_atts <- player_atts_updated %>% 
  select("player_api_id","player_name", "date", "overall_rating",
         "potential", "preferred_foot", "attacking_work_rate", "defensive_work_rate"
         ,"crossing", "finishing", "heading_accuracy",
         "short_passing", "volleys", "dribbling",
         "curve", "free_kick_accuracy", "long_passing", "ball_control",
         "acceleration","sprint_speed", "agility", "reactions", "balance",
         "shot_power", "jumping", "stamina", "strength", "long_shots", "aggression",
         "interceptions", "positioning", "vision",
         "penalties", "marking", "standing_tackle",
         "sliding_tackle", "gk_diving", "gk_handling",
         "gk_kicking", "gk_positioning", "gk_reflexes",
         "birthday", "height", "weight")

player_name_atts <- player_name_atts %>%
  group_by(player_api_id) %>%
  slice(which.max(as.Date(date)))
home_player_1 <- unique(match_roma_14_16$home_player_1)
home_player_2 <- unique(match_roma_14_16$home_player_2)
home_player_3 <- unique(match_roma_14_16$home_player_3)
home_player_4 <- unique(match_roma_14_16$home_player_4)
home_player_5 <- unique(match_roma_14_16$home_player_5)
home_player_6 <- unique(match_roma_14_16$home_player_6)
home_player_7 <- unique(match_roma_14_16$home_player_7)
home_player_8 <- unique(match_roma_14_16$home_player_8)
home_player_9 <- unique(match_roma_14_16$home_player_9)
home_player_10 <- unique(match_roma_14_16$home_player_10)
home_player_11<- unique(match_roma_14_16$home_player_11)
player_name_atts_updated_roma <- player_name_atts %>% 
    filter(player_api_id %in% c(home_player_1,home_player_2,home_player_3,
                              home_player_4, home_player_5, home_player_6, 
                              home_player_7, home_player_8, home_player_9,
                              home_player_10, home_player_11))

#bologna
players_against_bologna <- player_name_atts_updated_roma %>% 
  filter(((strength >72 & strength <= 78) & (aggression > 68 & aggression <= 76)))

ggplot(players_against_bologna,aes(x=strength,y=aggression,label=player_name))+
  geom_label(aes(label=player_name),fill="darkorange",color="white")+
  ggtitle("Players who should be playing against Bologna \nBased on their desired attributes")+
  ylab("Strength")+
  xlab("Agrgression")+
  scale_x_continuous(limits = c(74,80))+
  scale_y_continuous(limits = c(68,78))+
  theme_classic()

#Atalanta
players_against_atalanta <- player_name_atts_updated_roma %>% 
  filter(((crossing > 73 & crossing <= 91) & (curve >75 & curve <= 92) ))

ggplot(players_against_atalanta,aes(x=crossing,y=curve,label=player_name))+
  geom_label(aes(label=player_name),fill="darkorange",color="white")+
  ggtitle("Players who should be playing against Atalanta \nBased on their desired attributes")+
  ylab("Curve")+
  xlab("Crossing")+
  scale_x_continuous(limits = c(72,96))+
  scale_y_continuous(limits = c(72,85))+
  theme_classic()

#Parma
players_against_parma <- player_name_atts_updated_roma %>% 
  filter(((shot_power > 69 & shot_power <= 74) & (strength >= 30 & strength <= 66) ))

ggplot(players_against_parma,aes(x=shot_power,y=strength,label=player_name))+
  geom_label(aes(label=player_name),fill="darkorange",color="white")+
  ggtitle("Players who should be playing against Parma\nBased on their desired attributes")+
  ylab("Strength")+
  xlab("Shot Power")+
  scale_x_continuous(limits = c(67,75))+
  scale_y_continuous(limits = c(50,70))+
  theme_classic()

It is seen that all against these 3 teams the players with high chances of scoring in the second half is different.

  • Against Bologna, attributes which correspond to to a goal in the second half are strength (73-78) and aggression (68-76).
  • For Atalanta, the player attributes which could result in the goal second half are crossing (74-91) rating and curve skills on (76-92).
  • For Parma, the player attributes which could result in a goal in the second half are shot power (69-74) and average strength (30-66).

Conclusions

To maintian the top 2 position in the Italian league, Roma has to ensure that it does not lose points by drawing matches on it’s home ground, especially to teams like Bologna, Atalanta and Parma who are quite down the points table as compared to Roma.

Recommendations

While playing agianst Bologna, Atalanta and Parma, always playing the same team with your best players may not be the right strategy.Instead substitutions can be made according to some of the player attributes who have a higher chance of scoring a goal against these teams

  • At Roma, there are 3 players who match the attributes of high strength and high aggression(Maicon,Totti and Keita) if one of these players are injured these we can substitute the the other two players. This suggests that against Bologna they can play a more aggressive kind of play in the second half by moving high up on the pitch and closing down players quick.

  • Maicon, Pjanic and Falque match these characteristics of high crossing and curve skill attributes against atalanta suggests that long crossing balls would lead to a goal instead of short passing game.

  • Salah, Ucan, Perotti and El Shaarawy are players with good shot power and average strength . In the second half these players can be brought on the field if they are on the bench in the first half. In the second half if the team is trailing or if it’s going to be a draw, players can shoot on target with power instead or more intricate style of scoring a goal.