Short introduction

This project applies association rule mining to a data set of football matches from the English Premier League, seasons 2014/2015 to 2020/2021. The data were scraped from a website that publishes sports statistics.

Such an analysis can reveal regularities in the data and show which match characteristics are associated with winning or losing a match.

The methods used are Apriori and ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal).

First, the data set and its transformation are described. Then some basic information about the data set is given. Finally, the results of the Apriori and ECLAT algorithms are presented and discussed.

Read the data

setwd("C:/Users/bpop/OneDrive/R/USL/project/assocRules")

library(readr)
matches_data <- read_csv("matches-data.csv")
matches_data
## # A tibble: 2,660 x 29
##    MatchID MatchDate     Week HomeTeam       AwayTeam       HomeGoalsHT AwayGoalsHT HomeGoalsFT AwayGoalsFT HomeBallPos AwayBallPos HomeShotsOffTarget AwayShotsOffTarget HomeShotsOnTarget AwayShotsOnTarget HomeBlockedShots AwayBlockedShots HomeCorners AwayCorners HomePassSuccPerc AwayPassSuccPerc HomeAerialsWon AwayAerialsWon HomeFouls AwayFouls HomeYellowCards AwayYellowCards HomeRedCards AwayRedCards
##      <dbl> <chr>        <dbl> <chr>          <chr>                <dbl>       <dbl>       <dbl>       <dbl> <chr>       <chr>                    <dbl>              <dbl>             <dbl>             <dbl>            <dbl>            <dbl>       <dbl>       <dbl>            <dbl>            <dbl>          <dbl>          <dbl>     <dbl>     <dbl>           <dbl>           <dbl>        <dbl>        <dbl>
##  1       0 Aug 16, 2014     1 Man Utd        Swansea                  0           1           1           2 60%         40%                          5                  0                 5                 4                4                1           4           0               86               80             20             10        14        20               2               4            0            0
##  2       1 Aug 16, 2014     1 QPR            Hull City                0           0           0           1 51%         49%                          7                  3                 6                 4                6                4           8           9               77               76             30             15        10        10               1               2            0            0
##  3       2 Aug 16, 2014     1 Stoke          Aston Villa              0           0           0           1 63%         37%                          4                  4                 2                 1                6                2           2           8               84               68             30              9        14         9               0               3            0            0
##  4       3 Aug 16, 2014     1 West Brom      Sunderland               1           1           2           2 58%         42%                          5                  2                 5                 2                0                3           6           3               80               75             16             15        18         9               3               1            0            0
##  5       4 Aug 16, 2014     1 Leicester City Everton                  1           2           2           2 37%         63%                          5                  5                 3                 3                3                5           3           6               77               84             27             14        16        10               1               1            0            0
##  6       5 Aug 16, 2014     1 West Ham       Tottenham                0           0           0           1 47%         53%                         10                  2                 4                 4                4                4           8           5               83               80             15             12        12        10               2               0            0            1
##  7       6 Aug 16, 2014     1 Arsenal        Crystal Palace           1           1           2           1 76%         24%                          5                  0                 6                 2                3                2           9           3               88               57             23             17        13        19               2               3            0            0
##  8       7 Aug 17, 2014     1 Liverpool      Southampton              1           0           2           1 56%         44%                          5                  4                 5                 6                2                2           2           6               86               77             23             14         8        11               1               2            0            0
##  9       8 Aug 17, 2014     1 Newcastle      Man City                 0           1           0           2 44%         56%                          9                  5                 0                 5                3                3           3           3               83               86             14             16         8        11               1               5            0            0
## 10       9 Aug 18, 2014     1 Burnley        Chelsea                  1           3           1           3 39%         61%                          6                  4                 2                 3                1                4           4           3               70               82             27             20         6         7               1               1            0            0
## # ... with 2,650 more rows

Data description

The data set contains information on Premier League football matches from 7 seasons (2014/2015 to 2020/2021), which gives 2660 observations (380 matches per season). The data were gathered with a self-built web scraper.

To perform the association rule analysis, the original data set had to be transformed so that each match is described by categorical items.

Variables available in the original data set are:

  • MatchID – number of the match in the data set
  • MatchDate – calendar date of the played match
  • Week – week of the season
  • HomeTeam – home team name
  • AwayTeam – away team name

The remaining variables come in pairs with the prefixes ‘Home’ and ‘Away’, which indicate the team. Only the names without the prefix are listed below.

  • GoalsHT – number of goals scored in the first half of the match
  • GoalsFT – number of goals scored by the end of the match
  • BallPos – percentage of ball possession
  • ShotsOffTarget – number of shots off target
  • ShotsOnTarget – number of shots on target
  • BlockedShots – number of shots blocked by the opponent
  • Corners – number of corners
  • PassSuccPerc – percentage of successful passes
  • AerialsWon – number of aerial duels won
  • Fouls – number of fouls committed
  • YellowCards – number of yellow cards
  • RedCards – number of red cards

There are no missing values in the data set.

Data transformation

The data set was transformed so that each match is described in words. Each pair of home and away statistics was combined into one variable: the away team's value was subtracted from the home team's value, and the result indicates whether the home team recorded a higher, equal or lower value than the away team for that statistic.

matches_data <- read_csv("matches_assoc_rules.csv")
matches_data
## # A tibble: 2,660 x 12
##     ...1 Goals_desc BallPos_desc           ShotsOffTarget_desc        ShotsOnTarget_desc        BlockedShots_desc       Corners_desc      PassSuccPerc_desc         AerialsWon_desc       Fouls_desc                YellowCards_desc            RedCards_desc           
##    <dbl> <chr>      <chr>                  <chr>                      <chr>                     <chr>                   <chr>             <chr>                     <chr>                 <chr>                     <chr>                       <chr>                   
##  1     1 away win   higher home possession more home shots off target more home shots on target more home blocked shots more home corners better home pass accuracy more home aerials won away team more aggressive away team more yellow cards equal red cards         
##  2     2 away win   higher home possession more home shots off target more home shots on target more home blocked shots more away corners better home pass accuracy more home aerials won equal aggressiveness      away team more yellow cards equal red cards         
##  3     3 away win   higher home possession equal shots off target     more home shots on target more home blocked shots more away corners better home pass accuracy more home aerials won home team more aggressive away team more yellow cards equal red cards         
##  4     4 tie        higher home possession more home shots off target more home shots on target more away blocked shots more home corners better home pass accuracy more home aerials won home team more aggressive home team more yellow cards equal red cards         
##  5     5 tie        higher away possession equal shots off target     equal shots on target     more away blocked shots more away corners better away pass accuracy more home aerials won home team more aggressive equal yellow cards          equal red cards         
##  6     6 away win   higher away possession more home shots off target equal shots on target     equal blocked shots     more home corners better home pass accuracy more home aerials won home team more aggressive home team more yellow cards away team more red cards
##  7     7 home win   higher home possession more home shots off target more home shots on target more home blocked shots more home corners better home pass accuracy more home aerials won away team more aggressive away team more yellow cards equal red cards         
##  8     8 home win   higher home possession more home shots off target more away shots on target equal blocked shots     more away corners better home pass accuracy more home aerials won away team more aggressive away team more yellow cards equal red cards         
##  9     9 away win   higher away possession more home shots off target more away shots on target equal blocked shots     equal corners     better away pass accuracy more away aerials won away team more aggressive away team more yellow cards equal red cards         
## 10    10 away win   higher away possession more home shots off target more away shots on target more away blocked shots more home corners better away pass accuracy more home aerials won away team more aggressive equal yellow cards          equal red cards         
## # ... with 2,650 more rows

This reduced the number of variables to 11 (the first column, ...1, is only a row index). Each variable now takes one of three values:

  • Higher home team statistics
  • Equal statistics
  • Higher away team statistics
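The transformed file is read ready-made above, so the code that produced it is not part of this report. A minimal sketch of the described step could look as follows. It assumes that the sign of the home-minus-away difference decides the label; the helpers `describe_diff()` and `to_number()` are hypothetical, the rows are the first three matches from the listing, and the label "equal possession" is an assumed name.

```r
# Hypothetical helper: turn a home-minus-away difference into a text label.
describe_diff <- function(home, away, home_label, equal_label, away_label) {
  diff <- home - away
  ifelse(diff > 0, home_label, ifelse(diff < 0, away_label, equal_label))
}

# Possession is stored as text such as "60%", so strip the percent sign first.
to_number <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))

# First three matches of the original data set (two statistics only).
toy <- data.frame(
  HomeBallPos = c("60%", "51%", "63%"),
  AwayBallPos = c("40%", "49%", "37%"),
  HomeCorners = c(4, 8, 2),
  AwayCorners = c(0, 9, 8)
)

toy$BallPos_desc <- describe_diff(
  to_number(toy$HomeBallPos), to_number(toy$AwayBallPos),
  "higher home possession", "equal possession", "higher away possession"
)
toy$Corners_desc <- describe_diff(
  toy$HomeCorners, toy$AwayCorners,
  "more home corners", "equal corners", "more away corners"
)

toy[, c("BallPos_desc", "Corners_desc")]
```

The labels produced for these three matches agree with the first three rows of the transformed data shown above.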

Basic info

library(arules)
library(arulesViz)

transMatches<-read.transactions("matches_assoc_rules.csv", format="basket", sep=",", skip=1) # reading the file as transactions
transMatches
## transactions in sparse format with
##  2660 transactions (rows) and
##  2693 items (columns)
summary(transMatches)
## transactions as itemMatrix in sparse format with
##  2660 rows (elements/itemsets/transactions) and
##  2693 columns (items) and a density of 0.004455997 
## 
## most frequent items:
##            equal red cards more home shots off target          more home corners    more home blocked shots     higher home possession                    (Other) 
##                       2470                       1450                       1437                       1397                       1390                      23776 
## 
## element (itemset/transaction) length distribution:
## sizes
##   12 
## 2660 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      12      12      12      12      12      12 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

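We see that the most frequent item is the one saying that both teams received the same number of red cards (in most matches, none). The other most frequent items show that home teams generally dominate the games.

Note that the 2693 items include the 2660 row numbers from the first column of the file, which were read as items (they are visible as the first entry of each transaction listed below). Only 33 items, 11 variables with 3 values each, describe the matches, and each transaction holds 11 of them plus its row number. Since every row number occurs only once, these items fall below all support thresholds used later and do not enter any rule.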
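The first field of every row in the CSV file is a row number, and `read.transactions()` as called above turns it into an item. One way to avoid this is the `cols` argument, which for the basket format gives the position of the field holding the transaction IDs. A minimal sketch on a hypothetical three-row file in the same layout (the file contents are invented for illustration):

```r
library(arules)

# Hypothetical file: a header line, then a row number followed by descriptions.
csv_lines <- c(
  ",Goals_desc,BallPos_desc,RedCards_desc",
  "1,away win,higher home possession,equal red cards",
  "2,home win,higher home possession,equal red cards",
  "3,tie,higher away possession,equal red cards"
)
toy_file <- tempfile(fileext = ".csv")
writeLines(csv_lines, toy_file)

# As in the report: the row number becomes one more item per transaction.
with_index <- read.transactions(toy_file, format = "basket", sep = ",", skip = 1)

# cols = 1 declares the first field to be the transaction ID instead.
without_index <- read.transactions(toy_file, format = "basket", sep = ",",
                                   skip = 1, cols = 1)

length(itemLabels(with_index))     # row numbers are counted as items
length(itemLabels(without_index))  # only the match descriptions remain
```

Applied to the real file, this change would alter the item count and transaction length reported above, but not the mined rules, because the row numbers never reach the minimum support.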

Some example transactions are listed below:

inspect(transMatches[1:5])
##     items                                                                                                                                                                                                                                                              
## [1] {1, away team more aggressive, away team more yellow cards, away win, better home pass accuracy, equal red cards, higher home possession, more home aerials won, more home blocked shots, more home corners, more home shots off target, more home shots on target}
## [2] {2, away team more yellow cards, away win, better home pass accuracy, equal aggressiveness, equal red cards, higher home possession, more away corners, more home aerials won, more home blocked shots, more home shots off target, more home shots on target}     
## [3] {3, away team more yellow cards, away win, better home pass accuracy, equal red cards, equal shots off target, higher home possession, home team more aggressive, more away corners, more home aerials won, more home blocked shots, more home shots on target}    
## [4] {4, better home pass accuracy, equal red cards, higher home possession, home team more aggressive, home team more yellow cards, more away blocked shots, more home aerials won, more home corners, more home shots off target, more home shots on target, tie}     
## [5] {5, better away pass accuracy, equal red cards, equal shots off target, equal shots on target, equal yellow cards, higher away possession, home team more aggressive, more away blocked shots, more away corners, more home aerials won, tie}

Item frequencies

itemFrequencyPlot(transMatches, support = 0.1)

itemFrequencyPlot(transMatches, topN=10, type="relative", main="Item Frequency")

Once again, home teams tend to record the better (higher) statistics.

The most frequent item is an equal number of red cards, which in most matches means that no red card was shown.

The only item among the most frequent ones that favours the away team is aggressiveness (the number of fouls committed).

These results suggest that home teams generally control the game more, and that away teams commit more fouls, possibly in response.

Apriori method

Apriori is the standard algorithm for deriving association rules. The default parameter values are used: a minimum support of 0.1 and a minimum confidence of 0.8. With 2660 matches, a rule must therefore hold in at least 266 of them.

matchesRules <- apriori(transMatches, parameter = list(support = 0.1, confidence = 0.8, maxlen= 10))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1     10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 266 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2693 item(s), 2660 transaction(s)] done [0.01s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [1915 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(matchesRules)
## set of 1915 rules
## 
## rule length distribution (lhs + rhs):sizes
##   1   2   3   4   5   6   7 
##   1  28 277 541 628 363  77 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   5.000   4.652   5.000   7.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift           count       
##  Min.   :0.1000   Min.   :0.8000   Min.   :0.1019   Min.   :0.932   Min.   : 266.0  
##  1st Qu.:0.1120   1st Qu.:0.8556   1st Qu.:0.1259   1st Qu.:1.011   1st Qu.: 298.0  
##  Median :0.1305   Median :0.9258   Median :0.1455   Median :1.564   Median : 347.0  
##  Mean   :0.1509   Mean   :0.9068   Mean   :0.1665   Mean   :1.473   Mean   : 401.3  
##  3rd Qu.:0.1650   3rd Qu.:0.9495   3rd Qu.:0.1805   3rd Qu.:1.834   3rd Qu.: 439.0  
##  Max.   :0.9286   Max.   :0.9935   Max.   :1.0000   Max.   :2.223   Max.   :2470.0  
## 
## mining info:
##          data ntransactions support confidence                                                                                         call
##  transMatches          2660     0.1        0.8 apriori(data = transMatches, parameter = list(support = 0.1, confidence = 0.8, maxlen = 10))
# inspect(matchesRules)

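The algorithm found 1915 association rules. Rules with 5 items are the most common (628), followed by rules with 4 (541), 6 (363) and 3 (277) items.

Confidence is high by construction, since 0.8 was set as the minimum; the median is 0.93. Lift is less uniform: the median is 1.56 and the maximum is 2.22, but a quarter of the rules have a lift of about 1.01 or lower, which means that their left side adds almost nothing to the base rate of the right side.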

Plots of the rules by metrics

plot(matchesRules)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

We see that confidence is high in general, while support is low for most rules (the median is 0.13). The plot also shows that a large number of the discovered rules have a high lift.

matches_rules_byConf = sort(matchesRules, by = 'confidence', decreasing = TRUE)
plot(matches_rules_byConf[1:10], method = 'graph', measure = 'support', shading = 'confidence')

We can see some strong rules in the set, with relatively high support and confidence. A couple of items are linked quite strongly (they are connected through the dot in the upper right part of the graph).

Graph for groups

plot(matchesRules, method="grouped")

The graph shows many rules that are roughly equally important in terms of lift and support. ‘Equal red cards’ appears with high support in every plotted rule, because it is frequent both in the data and in the rules.
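Because ‘equal red cards’ occurs in about 93% of the matches, adding it to the left side of a rule rarely changes what the rule predicts. One possible way to thin out such rules is `is.redundant()` from arules, which flags a rule when a more general rule with the same right side has the same or a higher confidence. A sketch on hypothetical toy transactions (not the match data):

```r
library(arules)

# Hypothetical toy transactions built from three of the item labels.
toy_trans <- as(list(
  c("better home pass accuracy", "equal red cards", "higher home possession"),
  c("better home pass accuracy", "equal red cards", "higher home possession"),
  c("better home pass accuracy", "higher home possession"),
  c("equal red cards")
), "transactions")

toy_rules <- apriori(toy_trans,
                     parameter = list(support = 0.5, confidence = 0.8),
                     control = list(verbose = FALSE))

# Keep only rules for which no more general rule does at least as well.
pruned_rules <- toy_rules[!is.redundant(toy_rules)]

length(toy_rules)
length(pruned_rules)
inspect(pruned_rules)
```

The same indexing could be applied to `matchesRules`; how many of the 1915 rules would remain has not been checked here.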

Rules inspection

The 5 rules with the highest confidence:

inspect(matches_rules_byConf[1:5])
##     lhs                                                                                                                                                  rhs                      support   confidence coverage  lift     count
## [1] {away team more yellow cards, better home pass accuracy, more home blocked shots, more home corners, more home shots off target}                  => {higher home possession} 0.1146617 0.9934853  0.1154135 1.901202 305  
## [2] {away team more yellow cards, better home pass accuracy, equal red cards, more home blocked shots, more home corners, more home shots off target} => {higher home possession} 0.1082707 0.9931034  0.1090226 1.900471 288  
## [3] {away team more yellow cards, better home pass accuracy, more home blocked shots, more home shots off target, more home shots on target}          => {higher home possession} 0.1030075 0.9927536  0.1037594 1.899802 274  
## [4] {away team more aggressive, better home pass accuracy, more home blocked shots, more home shots off target, more home shots on target}            => {higher home possession} 0.1045113 0.9893238  0.1056391 1.893238 278  
## [5] {away team more yellow cards, better home pass accuracy, equal red cards, more home blocked shots, more home shots off target}                    => {higher home possession} 0.1259398 0.9882006  0.1274436 1.891089 335

The rules with the highest confidence all have higher home ball possession on the right side. Their confidence is close to 1: in almost every match that satisfies the left side, the home team also had more possession. The left side shows which match characteristics are associated with it.

A lift of about 1.9 means that higher home possession occurs about 1.9 times as often in matches that satisfy the left side as in all matches, where it occurs in about 52% of cases.

These items are:

  • Away team gets more yellow cards
  • Home team has higher pass accuracy
  • Home team has more corners, shots off target, shots on target and blocked shots.
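The quality measures in the listing can be reproduced by hand from counts. The sketch below does this for the first rule, taking the rule count (305) from the listing, the left-side count (307) from coverage × 2660, and the count of ‘higher home possession’ (1390) from the item frequency summary. The variable names are chosen for illustration only.

```r
n_matches  <- 2660  # all transactions
count_rule <- 305   # matches containing the left side and the right side
count_lhs  <- 307   # matches containing the left side (0.1154135 * 2660)
count_rhs  <- 1390  # matches with "higher home possession"

support    <- count_rule / n_matches               # share of all matches
coverage   <- count_lhs / n_matches                # share matching the left side
confidence <- count_rule / count_lhs               # P(right side | left side)
lift       <- confidence / (count_rhs / n_matches) # confidence vs. base rate

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 6)
```

The four values agree with the first row of the listing (0.1146617, 0.9934853, 0.1154135 and 1.901202) up to rounding.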

What is associated with the team winning

What is the profile of matches in which the home team wins?

Note: The support and confidence thresholds were lowered (to 0.01 and 0.08) in order to obtain more rules.

rules.HomeTeamWin<-apriori(data=transMatches, parameter=list(supp=0.01, conf=0.08), appearance=list(default="lhs", rhs="home win"), control=list(verbose=F)) 
summary(rules.HomeTeamWin)
## set of 8810 rules
## 
## rule length distribution (lhs + rhs):sizes
##    1    2    3    4    5    6    7    8    9   10 
##    1   29  286 1163 2388 2576 1574  602  164   27 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.000   6.000   5.737   7.000  10.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift            count        
##  Min.   :0.01015   Min.   :0.1399   Min.   :0.01203   Min.   :0.3138   Min.   :  27.00  
##  1st Qu.:0.01316   1st Qu.:0.4308   1st Qu.:0.02594   1st Qu.:0.9661   1st Qu.:  35.00  
##  Median :0.01880   Median :0.5093   Median :0.04023   Median :1.1422   Median :  50.00  
##  Mean   :0.02856   Mean   :0.5236   Mean   :0.05787   Mean   :1.1742   Mean   :  75.96  
##  3rd Qu.:0.03195   3rd Qu.:0.6246   3rd Qu.:0.06729   3rd Qu.:1.4008   3rd Qu.:  85.00  
##  Max.   :0.44586   Max.   :0.8857   Max.   :1.00000   Max.   :1.9865   Max.   :1186.00  
## 
## mining info:
##          data ntransactions support confidence                                                                                                                                                        call
##  transMatches          2660    0.01       0.08 apriori(data = transMatches, parameter = list(supp = 0.01, conf = 0.08), appearance = list(default = "lhs", rhs = "home win"), control = list(verbose = F))
rules.HomeTeamWin.byconf<-sort(rules.HomeTeamWin, by="confidence", decreasing=TRUE)
inspect(head(rules.HomeTeamWin.byconf))
##     lhs                                                                                                                                                                     rhs        support    confidence coverage   lift     count
## [1] {home team more aggressive, home team more yellow cards, more away corners, more away shots off target, more home shots on target}                                   => {home win} 0.01165414 0.8857143  0.01315789 1.986509 31   
## [2] {equal red cards, home team more aggressive, home team more yellow cards, more away corners, more away shots off target, more home shots on target}                  => {home win} 0.01052632 0.8750000  0.01203008 1.962479 28   
## [3] {better away pass accuracy, equal red cards, higher away possession, home team more aggressive, more away corners, more home aerials won, more home shots on target} => {home win} 0.01203008 0.8648649  0.01390977 1.939748 32   
## [4] {better away pass accuracy, higher away possession, home team more aggressive, more away corners, more home aerials won, more home shots on target}                  => {home win} 0.01315789 0.8536585  0.01541353 1.914614 35   
## [5] {equal red cards, higher away possession, home team more aggressive, more away corners, more home aerials won, more home shots on target}                            => {home win} 0.01315789 0.8536585  0.01541353 1.914614 35   
## [6] {higher away possession, home team more yellow cards, more away corners, more home aerials won, more home shots on target}                                           => {home win} 0.01090226 0.8529412  0.01278195 1.913005 29
is.significant(head(rules.HomeTeamWin.byconf, 10), transMatches)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

The home-win rules with the highest confidence describe matches in which the home team is more aggressive (commits more fouls and receives more yellow cards), wins more aerial duels and has more shots on target, while the away team has more shots off target, higher ball possession and more corners (the last two are surprising). These are associations rather than causes, and each of the listed rules covers only about 30 matches.

All ten rules with the highest confidence, including the six listed, are statistically significant.
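To give an idea of what such a significance test involves, the first home-win rule can be checked by hand with Fisher's exact test, which is the default method of `is.significant()` according to the arules documentation. The significance level and any correction for multiple testing depend on the package version and arguments, so the sketch below illustrates the kind of test rather than reproducing the call above. The counts come from the listing: 31 matches for the rule, 35 for the left side (coverage × 2660) and 1186 home wins overall.

```r
n_matches  <- 2660
count_rule <- 31    # left side and "home win" together
count_lhs  <- 35    # left side alone (0.01315789 * 2660)
count_rhs  <- 1186  # all home wins (largest count in the rule summary)

# 2 x 2 table: left side present or absent versus home win or not.
contingency <- matrix(
  c(count_rule,             count_lhs - count_rule,
    count_rhs - count_rule, n_matches - count_lhs - count_rhs + count_rule),
  nrow = 2, byrow = TRUE,
  dimnames = list(lhs = c("yes", "no"), home_win = c("yes", "no"))
)

# One-sided test: is a home win more likely when the left side holds?
test <- fisher.test(contingency, alternative = "greater")
test$p.value
```

A small p-value here says only that the left side and a home win occur together more often than expected under independence; with just 35 matching games it says little about how stable the rule is.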


What is the profile of matches in which the away team wins?

rules.AwayTeamWin<-apriori(data=transMatches, parameter=list(supp=0.01, conf=0.08), appearance=list(default="lhs", rhs="away win"), control=list(verbose=F)) 
summary(rules.AwayTeamWin)
## set of 6420 rules
## 
## rule length distribution (lhs + rhs):sizes
##    1    2    3    4    5    6    7    8    9 
##    1   28  276  971 1900 1945 1011  261   27 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.000   6.000   5.506   6.000   9.000 
## 
## summary of quality measures:
##     support          confidence         coverage            lift            count       
##  Min.   :0.01015   Min.   :0.08382   Min.   :0.01316   Min.   :0.2638   Min.   : 27.00  
##  1st Qu.:0.01241   1st Qu.:0.28489   1st Qu.:0.02998   1st Qu.:0.8968   1st Qu.: 33.00  
##  Median :0.01692   Median :0.38462   Median :0.04774   Median :1.2107   Median : 45.00  
##  Mean   :0.02293   Mean   :0.40521   Mean   :0.06760   Mean   :1.2756   Mean   : 60.99  
##  3rd Qu.:0.02594   3rd Qu.:0.53893   3rd Qu.:0.08158   3rd Qu.:1.6965   3rd Qu.: 69.00  
##  Max.   :0.31767   Max.   :0.80952   Max.   :1.00000   Max.   :2.5483   Max.   :845.00  
## 
## mining info:
##          data ntransactions support confidence                                                                                                                                                        call
##  transMatches          2660    0.01       0.08 apriori(data = transMatches, parameter = list(supp = 0.01, conf = 0.08), appearance = list(default = "lhs", rhs = "away win"), control = list(verbose = F))
rules.AwayTeamWin.byconf<-sort(rules.AwayTeamWin, by="confidence", decreasing=TRUE)
inspect(head(rules.AwayTeamWin.byconf))
##     lhs                                                                                                                               rhs        support    confidence coverage   lift     count
## [1] {equal red cards, more away blocked shots, more away shots on target, more home corners, more home shots off target}           => {away win} 0.01278195 0.8095238  0.01578947 2.548323 34   
## [2] {equal aerials won, equal red cards, higher away possession, more away shots on target}                                        => {away win} 0.01052632 0.8000000  0.01315789 2.518343 28   
## [3] {better away pass accuracy, equal aerials won, equal red cards, higher away possession, more away shots on target}             => {away win} 0.01052632 0.8000000  0.01315789 2.518343 28   
## [4] {home team more aggressive, more away shots off target, more away shots on target, more home blocked shots, more home corners} => {away win} 0.01052632 0.8000000  0.01315789 2.518343 28   
## [5] {more away blocked shots, more away shots on target, more home corners, more home shots off target}                            => {away win} 0.01315789 0.7954545  0.01654135 2.504034 35   
## [6] {better away pass accuracy, equal aerials won, equal red cards, more away shots on target}                                     => {away win} 0.01052632 0.7777778  0.01353383 2.448389 28
is.significant(head(rules.AwayTeamWin.byconf, 10), transMatches)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

The away-win rules with the highest confidence describe matches in which the away team has more shots on target, more shots blocked, higher ball possession and better pass accuracy, while the home team has more corners, more shots off target and is more aggressive.

All ten rules with the highest confidence are statistically significant.

ECLAT method

This section presents some results from the ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm.

Its main advantage over Apriori is that it is usually faster, which matters when working with a large data set.

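ECLAT itself only finds frequent itemsets, so it takes a support threshold but no confidence threshold. It was run with a minimum support of 0.1 and itemsets of at most 5 items; the minimum confidence of 0.8 was applied afterwards, when rules were derived from the itemsets with ruleInduction().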
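The speed of ECLAT comes from a vertical data layout: for every item it keeps the list of transaction IDs that contain it, and the support of an itemset is the size of the intersection of these lists. The sketch below shows only this core step on hypothetical toy transactions, not the full algorithm and not the arules implementation.

```r
# Hypothetical toy transactions, not the match data.
transactions <- list(
  c("home win", "higher home possession", "equal red cards"),
  c("home win", "equal red cards"),
  c("away win", "higher home possession", "equal red cards"),
  c("home win", "higher home possession")
)

# Vertical layout: for every item, the IDs of the transactions containing it.
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(items, function(item) {
  which(vapply(transactions, function(t) item %in% t, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its items' ID lists.
itemset_support <- function(itemset) {
  shared <- Reduce(intersect, tid_lists[itemset])
  length(shared) / length(transactions)
}

itemset_support(c("home win", "higher home possession"))
itemset_support(c("home win", "equal red cards", "higher home possession"))
```

Once the ID lists are built, no further pass over the transactions is needed, because longer itemsets are evaluated by intersecting lists that already exist.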

itemsets_eclat = eclat(transMatches, parameter=list(supp=0.1, maxlen=5)) 
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE     0.1      1      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 266 
## 
## create itemset ... 
## set transactions ...[2693 item(s), 2660 transaction(s)] done [0.01s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating bit matrix ... [25 row(s), 2660 column(s)] done [0.00s].
## writing  ... [1405 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
matches_rules_eclat<-ruleInduction(itemsets_eclat, transMatches, confidence=0.8)
# inspect(head(matches_rules_eclat, 10))
rules_eclat.bysupp<-sort(matches_rules_eclat, by="support", decreasing=TRUE)
inspect(head(rules_eclat.bysupp, 10))
##      lhs                                          rhs                         support   confidence lift     itemset
## [1]  {more home corners}                       => {equal red cards}           0.5078947 0.9401531  1.012473 1378   
## [2]  {more home shots off target}              => {equal red cards}           0.5067669 0.9296552  1.001167 1380   
## [3]  {more home blocked shots}                 => {equal red cards}           0.4902256 0.9334288  1.005231 1374   
## [4]  {higher home possession}                  => {equal red cards}           0.4875940 0.9330935  1.004870 1366   
## [5]  {higher home possession}                  => {better home pass accuracy} 0.4849624 0.9280576  1.799295 1323   
## [6]  {better home pass accuracy}               => {higher home possession}    0.4849624 0.9402332  1.799295 1323   
## [7]  {more home shots on target}               => {equal red cards}           0.4842105 0.9326575  1.004400 1350   
## [8]  {better home pass accuracy}               => {equal red cards}           0.4804511 0.9314869  1.003140 1319   
## [9]  {away team more aggressive}               => {equal red cards}           0.4530075 0.9341085  1.005963 1262   
## [10] {equal red cards, higher home possession} => {better home pass accuracy} 0.4522556 0.9275251  1.798263 1304
is.significant(head(rules_eclat.bysupp, 10), transMatches)
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

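Seven of the ten rules with the highest support have ‘equal red cards’ on the right side.

The significance test marks only the three rules with a lift of about 1.8 as statistically significant. The rules that predict ‘equal red cards’ have a lift close to 1, because this item occurs in about 93% of all matches regardless of the left side.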

rules_eclat.byconf<-sort(matches_rules_eclat, by="confidence", decreasing=TRUE)
inspect(head(rules_eclat.byconf, 10))
##      lhs                                                                                                              rhs                         support   confidence lift     itemset
## [1]  {away team more yellow cards, better home pass accuracy, more home blocked shots, more home shots off target} => {higher home possession}    0.1353383 0.9863014  1.887454  533   
## [2]  {away team more yellow cards, higher home possession, more home shots off target, more home shots on target}  => {better home pass accuracy} 0.1255639 0.9852507  1.910180  518   
## [3]  {away team more yellow cards, better home pass accuracy, more home blocked shots, more home corners}          => {higher home possession}    0.1387218 0.9840000  1.883050  534   
## [4]  {away team more yellow cards, better home pass accuracy, more home corners, more home shots off target}       => {higher home possession}    0.1364662 0.9837398  1.882552  536   
## [5]  {away team more yellow cards, better home pass accuracy, more home blocked shots, more home shots on target}  => {higher home possession}    0.1274436 0.9826087  1.880388  520   
## [6]  {higher away possession, more away blocked shots, more away corners, more away shots on target}               => {better away pass accuracy} 0.1056391 0.9825175  2.177914  183   
## [7]  {away team more yellow cards, higher home possession, home win, more home shots on target}                    => {better home pass accuracy} 0.1026316 0.9820144  1.903905  415   
## [8]  {higher away possession, more away corners, more away shots off target, more away shots on target}            => {better away pass accuracy} 0.1007519 0.9816850  2.176068  248   
## [9]  {better home pass accuracy, more home blocked shots, more home shots off target, more home shots on target}   => {higher home possession}    0.2015038 0.9816850  1.878620 1270   
## [10] {away team more yellow cards, better home pass accuracy, equal red cards, more home blocked shots}            => {higher home possession}    0.1597744 0.9815242  1.878313  532
is.significant(head(rules_eclat.byconf, 10), transMatches)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

The rules with the highest confidence have either higher home ball possession or better pass accuracy (of the home or the away team) on the right side. All of the listed rules are statistically significant.

Conclusions

Association rules made it possible to discover some regularities in the investigated data set. The rules describing match results were not very strong: the best of them reach a confidence of 0.8 to 0.9 but cover only about 1% of the matches. Still, they give a basic profile of the matches won by home and by away teams.

The ECLAT results show that the rules with the highest support are not always statistically significant.