Lab 7: Multiple Regression and Bootstrapping

Author

Micah Lohr

load packages

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(broom)
library(data.table)


Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose

library(performance)
library(patchwork)
library(car)

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

library(rsample)

Essentials

1.) Load data ‘soccer’ from tidytuesday

find soccer here

soccer <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-04/soccer21-22.csv') %>%
drop_na()

Rows: 380 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): Date, HomeTeam, AwayTeam, FTR, HTR, Referee
dbl (16): FTHG, FTAG, HTHG, HTAG, HS, AS, HST, AST, HF, AF, HC, AC, HY, AY, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(soccer)

tibble [380 × 22] (S3: tbl_df/tbl/data.frame)
 $ Date    : chr [1:380] "13/08/2021" "14/08/2021" "14/08/2021" "14/08/2021" ...
 $ HomeTeam: chr [1:380] "Brentford" "Man United" "Burnley" "Chelsea" ...
 $ AwayTeam: chr [1:380] "Arsenal" "Leeds" "Brighton" "Crystal Palace" ...
 $ FTHG    : num [1:380] 2 5 1 3 3 1 3 0 2 1 ...
 $ FTAG    : num [1:380] 0 1 2 0 1 0 2 3 4 0 ...
 $ FTR     : chr [1:380] "H" "H" "A" "H" ...
 $ HTHG    : num [1:380] 1 1 1 2 0 1 2 0 2 0 ...
 $ HTAG    : num [1:380] 0 0 0 0 1 0 0 1 1 0 ...
 $ HTR     : chr [1:380] "H" "H" "H" "H" ...
 $ Referee : chr [1:380] "M Oliver" "P Tierney" "D Coote" "J Moss" ...
 $ HS      : num [1:380] 8 16 14 13 14 9 13 14 17 13 ...
 $ AS      : num [1:380] 22 10 14 4 6 17 11 19 8 18 ...
 $ HST     : num [1:380] 3 8 3 6 6 5 7 3 3 3 ...
 $ AST     : num [1:380] 4 3 8 1 3 3 2 8 9 4 ...
 $ HF      : num [1:380] 12 11 10 15 13 6 18 4 4 11 ...
 $ AF      : num [1:380] 8 9 7 11 15 10 13 14 3 8 ...
 $ HC      : num [1:380] 2 5 7 5 6 5 2 3 7 3 ...
 $ AC      : num [1:380] 5 4 6 2 8 4 4 11 6 11 ...
 $ HY      : num [1:380] 0 1 2 0 2 1 3 1 1 2 ...
 $ AY      : num [1:380] 0 2 1 0 0 2 1 1 0 1 ...
 $ HR      : num [1:380] 0 0 0 0 0 0 0 0 0 0 ...
 $ AR      : num [1:380] 0 0 0 0 0 0 0 0 0 0 ...

After you load the data, record which variables are categorical and which are numeric.

CATEGORICAL: Date, HomeTeam, AwayTeam, full time result (FTR), halftime results (HTR), Referee

NUMERIC: full time home goals (FTHG), full time away goals (FTAG), halftime home goals (HTHG), halftime away goals (HTAG), number of shots taken by the home team (HS), number of shots taken by the away team (AS), number of shots on target by the home team (HST), number of shots on target by the away team (AST), number of fouls by the home team (HF), number of fouls by the away team (AF), number of corners taken by the home team (HC), number of corners taken by the away team (AC), number of yellow cards received by the home team (HY), number of yellow cards received by the away team (AY), number of red cards received by the home team (HR), number of red cards received by the away team (AR)
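The same split can be read off the column types programmatically. This is a minimal sketch, assuming a small made-up data frame `matches` that only mimics a few of the soccer columns (the values are invented for illustration):

```r
# Toy stand-in for the match data (hypothetical values, not the real file)
matches <- data.frame(
  HomeTeam = c("Brentford", "Burnley", "Chelsea"),
  FTR      = c("H", "A", "H"),
  FTHG     = c(2, 1, 3),
  HS       = c(8, 14, 13),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns, FALSE for everything else
is_num <- vapply(matches, is.numeric, logical(1))

numeric_vars     <- names(matches)[is_num]
categorical_vars <- names(matches)[!is_num]

numeric_vars
categorical_vars
```

Storage type is only a first pass: in the str() output above, Date is stored as character even though it is really a date.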

2.) Let’s consider the effects of home team shots (HS), home team (HomeTeam), and home team fouls (HF) on home team goals (full time home goals). Build a fully interactive multiple linear regression model. Assess model fit and then model assumptions. How well does the model fit the data? Is the model valid?

Build a model

socmod <- lm(FTHG~HS*HomeTeam*HF, data=soccer)
summary(socmod)


Call:
lm(formula = FTHG ~ HS * HomeTeam * HF, data = soccer)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7138 -0.6469 -0.0615  0.5152  3.9047 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)  
(Intercept)                    0.697352   3.708178   0.188   0.8510  
HS                             0.055902   0.190075   0.294   0.7689  
HomeTeamAston Villa            0.881435   4.996848   0.176   0.8601  
HomeTeamBrentford             -2.533050   5.203392  -0.487   0.6268  
HomeTeamBrighton               3.717225   6.575577   0.565   0.5723  
HomeTeamBurnley               -0.276222   6.857200  -0.040   0.9679  
HomeTeamChelsea                6.034787   4.925771   1.225   0.2215  
HomeTeamCrystal Palace        -2.318297   4.283831  -0.541   0.5888  
HomeTeamEverton                0.319386   4.645420   0.069   0.9452  
HomeTeamLeeds                 -3.624261   5.001969  -0.725   0.4693  
HomeTeamLeicester             -0.792986   4.114557  -0.193   0.8473  
HomeTeamLiverpool              4.175053   5.026751   0.831   0.4069  
HomeTeamMan City             -10.661735   5.330983  -2.000   0.0464 *
HomeTeamMan United             7.326569   4.781359   1.532   0.1265  
HomeTeamNewcastle              0.168113   4.800783   0.035   0.9721  
HomeTeamNorwich                2.094630   5.334412   0.393   0.6948  
HomeTeamSouthampton           -0.310685   4.470009  -0.070   0.9446  
HomeTeamTottenham             -2.750661   4.805536  -0.572   0.5675  
HomeTeamWatford               -2.913636   4.142058  -0.703   0.4823  
HomeTeamWest Ham              -2.027195   4.796066  -0.423   0.6728  
HomeTeamWolves                -1.557139   4.102273  -0.380   0.7045  
HF                            -0.009934   0.365871  -0.027   0.9784  
HS:HomeTeamAston Villa         0.061853   0.328940   0.188   0.8510  
HS:HomeTeamBrentford           0.182702   0.332776   0.549   0.5834  
HS:HomeTeamBrighton           -0.262377   0.425503  -0.617   0.5379  
HS:HomeTeamBurnley             0.031909   0.467206   0.068   0.9456  
HS:HomeTeamChelsea            -0.391915   0.267354  -1.466   0.1437  
HS:HomeTeamCrystal Palace      0.105923   0.251576   0.421   0.6740  
HS:HomeTeamEverton             0.011687   0.286931   0.041   0.9675  
HS:HomeTeamLeeds               0.243965   0.297532   0.820   0.4129  
HS:HomeTeamLeicester           0.146672   0.235410   0.623   0.5337  
HS:HomeTeamLiverpool          -0.109337   0.242931  -0.450   0.6530  
HS:HomeTeamMan City            0.558982   0.261335   2.139   0.0332 *
HS:HomeTeamMan United         -0.414803   0.258899  -1.602   0.1102  
HS:HomeTeamNewcastle          -0.025273   0.307031  -0.082   0.9345  
HS:HomeTeamNorwich            -0.290202   0.363840  -0.798   0.4257  
HS:HomeTeamSouthampton         0.053585   0.266735   0.201   0.8409  
HS:HomeTeamTottenham           0.208991   0.306756   0.681   0.4962  
HS:HomeTeamWatford             0.257706   0.248203   1.038   0.3000  
HS:HomeTeamWest Ham            0.147201   0.312313   0.471   0.6378  
HS:HomeTeamWolves              0.010637   0.248269   0.043   0.9659  
HS:HF                          0.001280   0.019025   0.067   0.9464  
HomeTeamAston Villa:HF        -0.084654   0.465462  -0.182   0.8558  
HomeTeamBrentford:HF           0.221657   0.505506   0.438   0.6613  
HomeTeamBrighton:HF           -0.268913   0.596677  -0.451   0.6525  
HomeTeamBurnley:HF            -0.085970   0.651113  -0.132   0.8950  
HomeTeamChelsea:HF            -0.450434   0.447628  -1.006   0.3151  
HomeTeamCrystal Palace:HF      0.196644   0.417779   0.471   0.6382  
HomeTeamEverton:HF            -0.128741   0.491577  -0.262   0.7936  
HomeTeamLeeds:HF               0.213802   0.446558   0.479   0.6324  
HomeTeamLeicester:HF           0.048414   0.403071   0.120   0.9045  
HomeTeamLiverpool:HF          -0.289467   0.533930  -0.542   0.5881  
HomeTeamMan City:HF            1.036326   0.560155   1.850   0.0653 .
HomeTeamMan United:HF         -0.787337   0.499734  -1.576   0.1162  
HomeTeamNewcastle:HF          -0.073170   0.446549  -0.164   0.8700  
HomeTeamNorwich:HF            -0.254151   0.505784  -0.502   0.6157  
HomeTeamSouthampton:HF         0.024514   0.429623   0.057   0.9545  
HomeTeamTottenham:HF           0.187767   0.454811   0.413   0.6800  
HomeTeamWatford:HF             0.117282   0.411028   0.285   0.7756  
HomeTeamWest Ham:HF            0.451202   0.482874   0.934   0.3508  
HomeTeamWolves:HF              0.302944   0.420129   0.721   0.4714  
HS:HomeTeamAston Villa:HF     -0.005724   0.028970  -0.198   0.8435  
HS:HomeTeamBrentford:HF       -0.018001   0.032204  -0.559   0.5766  
HS:HomeTeamBrighton:HF         0.015377   0.038286   0.402   0.6882  
HS:HomeTeamBurnley:HF          0.002547   0.044694   0.057   0.9546  
HS:HomeTeamChelsea:HF          0.031442   0.024617   1.277   0.2025  
HS:HomeTeamCrystal Palace:HF  -0.008544   0.024429  -0.350   0.7268  
HS:HomeTeamEverton:HF          0.006583   0.032144   0.205   0.8379  
HS:HomeTeamLeeds:HF           -0.017376   0.025839  -0.672   0.5018  
HS:HomeTeamLeicester:HF       -0.010627   0.024129  -0.440   0.6599  
HS:HomeTeamLiverpool:HF        0.006820   0.025865   0.264   0.7922  
HS:HomeTeamMan City:HF        -0.048830   0.028501  -1.713   0.0877 .
HS:HomeTeamMan United:HF       0.046000   0.028069   1.639   0.1023  
HS:HomeTeamNewcastle:HF        0.006266   0.028067   0.223   0.8235  
HS:HomeTeamNorwich:HF          0.027014   0.033845   0.798   0.4254  
HS:HomeTeamSouthampton:HF     -0.006353   0.025158  -0.253   0.8008  
HS:HomeTeamTottenham:HF       -0.011332   0.028415  -0.399   0.6903  
HS:HomeTeamWatford:HF         -0.013823   0.024557  -0.563   0.5739  
HS:HomeTeamWest Ham:HF        -0.030930   0.030862  -1.002   0.3171  
HS:HomeTeamWolves:HF          -0.015949   0.027111  -0.588   0.5568  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.16 on 300 degrees of freedom
Multiple R-squared:  0.3949,    Adjusted R-squared:  0.2355 
F-statistic: 2.478 on 79 and 300 DF,  p-value: 1.774e-08

The model's p-value is less than 0.05; however, the adjusted R-squared is 0.2355, which is pretty low: the model does not even explain a quarter of the variability found in the data.
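The gap between the multiple and adjusted R-squared comes from the penalty for the number of estimated terms. A small sketch of that arithmetic, using the numbers printed in the summary above (the variable names are just for illustration):

```r
# Values copied from the summary(socmod) output above
r2 <- 0.3949   # Multiple R-squared
n  <- 380      # number of matches (rows)
p  <- 79       # number of estimated slopes (numerator df of the F-statistic)

# Adjusted R-squared shrinks R-squared according to how many terms were fit
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2
```

Here n - p - 1 equals the 300 residual degrees of freedom, and the result should agree with the reported 0.2355 up to rounding of the inputs.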

Assess model fit

sigma(socmod)

[1] 1.159849

mean(soccer$FTHG)

[1] 1.513158

sigma(socmod)/mean(soccer$FTHG)

[1] 0.7665088

#residual standard error is 76.7% of the mean home goals: not good

model_performance(socmod)

# Indices of model performance

AIC      |     AICc |      BIC |    R2 | R2 (adj.) |  RMSE | Sigma
------------------------------------------------------------------
1263.266 | 1307.843 | 1582.420 | 0.395 |     0.236 | 1.031 | 1.160

#Not looking particularly great: R2 is low, and AIC and BIC only mean something when compared against other models

Assess model assumptions

vif(socmod)

there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif

                       GVIF Df GVIF^(1/(2*Df))
HS             3.262830e+02  1        18.06331
HomeTeam       2.960377e+40 19        11.61543
HF             4.473665e+02  1        21.15104
HS:HomeTeam    8.630016e+40 19        11.94713
HS:HF          5.313753e+02  1        23.05158
HomeTeam:HF    1.377771e+41 19        12.09511
HS:HomeTeam:HF 2.036968e+41 19        12.22021

check_model(socmod)

Variable `Component` is not in your data frame :/

Assumptions:
1. Linearity: looks pretty good, the reference line is mainly flat and horizontal.
2. Normality: normality of residuals is looking pretty good, not great, but I would still consider the assumption met.
3. Equal variance: the reference line is not flat, so this assumption does not look like it is met.
4. Independence: we know nothing about the experimental design; I am assuming independence because I have to do this lab.
5. Collinearity: based on the VIFs, there is high collinearity in this data. The assumption of no collinearity is not met.
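Some of the VIF inflation in a model with interactions can be structural, because a product term is correlated with the variables it is built from. The sketch below illustrates this on simulated data (the two predictors are invented stand-ins, not the soccer columns) by computing a VIF by hand as 1 / (1 - R²) before and after centering:

```r
set.seed(1)

# Simulated stand-ins for two count-like predictors (hypothetical values)
x1 <- rnorm(200, mean = 12, sd = 4)
x2 <- rnorm(200, mean = 10, sd = 3)

# VIF of one predictor: regress it on the other terms, then 1 / (1 - R^2)
vif_of <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

# Raw scale: x1 is strongly related to its own product term
raw_vif <- vif_of(x1, data.frame(x2 = x2, x1x2 = x1 * x2))

# Centered scale: subtract the means before forming the product
c1 <- x1 - mean(x1)
c2 <- x2 - mean(x2)
centered_vif <- vif_of(c1, data.frame(c2 = c2, c1c2 = c1 * c2))

c(raw = raw_vif, centered = centered_vif)
```

This does not say the soccer predictors are free of collinearity; it only shows why the VIFs from the fully interactive model are hard to read on their own, which is also what the vif() message about higher-order terms hints at.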

How well does the model fit the data? Is the model valid? Overall, the model does not fit the data well at all. Furthermore, because multiple assumptions are violated, this model is not really valid.

3.) Run through a top-down modeling approach to find the best fit model! Be sure to check assumptions after each change and compare performance. What model is the best fit?

socmod2 <- lm(FTHG~HS*HomeTeam+HF, data=soccer)
check_model(socmod2)

Variable `Component` is not in your data frame :/

socmod3 <- lm(FTHG~HS+HomeTeam*HF, data=soccer)
check_model(socmod3)

Variable `Component` is not in your data frame :/

socmod4 <- lm(FTHG~HS+HomeTeam+HF, data=soccer)
check_model(socmod4)

Variable `Component` is not in your data frame :/

socmod5 <- lm(FTHG~HS*HomeTeam, data=soccer)
check_model(socmod5)

Variable `Component` is not in your data frame :/

socmod6 <- lm(FTHG~HS+HomeTeam, data=soccer)
check_model(socmod6)

Variable `Component` is not in your data frame :/

socmod7 <- lm(FTHG~HS*HF, data=soccer)
check_model(socmod7)

Variable `Component` is not in your data frame :/

socmod8 <- lm(FTHG~HS+HF, data=soccer)
check_model(socmod8)

Variable `Component` is not in your data frame :/

socmod9 <- lm(FTHG~HomeTeam*HF, data=soccer)
check_model(socmod9)

Variable `Component` is not in your data frame :/

socmod10 <- lm(FTHG~HomeTeam+HF, data=soccer)
check_model(socmod10)

Variable `Component` is not in your data frame :/

#Only compare models that do not violate collinearity!!!
compare_performance(socmod4,socmod6,socmod8,socmod10, rank=TRUE)

# Comparison of Model Performance Indices

Name     | Model |    R2 | R2 (adj.) |  RMSE | Sigma | AIC weights | AICc weights | BIC weights | Performance-Score
-------------------------------------------------------------------------------------------------------------------
socmod6  |    lm | 0.260 |     0.218 | 1.140 | 1.173 |       0.727 |        0.746 |    9.91e-14 |            85.70%
socmod4  |    lm | 0.260 |     0.216 | 1.140 | 1.174 |       0.270 |        0.243 |    5.13e-15 |            66.25%
socmod8  |    lm | 0.162 |     0.158 | 1.213 | 1.217 |       0.003 |        0.012 |       1.000 |            19.40%
socmod10 |    lm | 0.190 |     0.145 | 1.192 | 1.226 |    2.95e-08 |     3.02e-08 |    4.01e-21 |             8.16%

Technically, the best-fit model is socmod6, the additive model of home team shots and home team on FTHG. However, the most complex model that still works well is socmod4, which includes all three terms (home team shots, home team, and home team fouls) in an additive model for their effect on full time home team goals. It best meets our assumptions of linearity, normality, equal variance, independence, and no collinearity, and its performance is only a bit worse than socmod6's.
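The top-down idea used above (start from the most interactive model, then drop terms) can also be checked with AIC and nested F-tests in base R. This is a sketch on R's built-in `mtcars` data as a stand-in; the variables and models are not the soccer ones:

```r
# Stand-in data: R's built-in mtcars, NOT the soccer data
full     <- lm(mpg ~ wt * hp, data = mtcars)  # interactive model
additive <- lm(mpg ~ wt + hp, data = mtcars)  # drop the interaction
reduced  <- lm(mpg ~ wt,      data = mtcars)  # drop hp as well

# Lower AIC indicates a better trade-off between fit and complexity
aic_table <- AIC(full, additive, reduced)
aic_table[order(aic_table$AIC), ]

# F-tests for nested models: does each added term improve the fit?
nested_tests <- anova(reduced, additive, full)
nested_tests
```

AIC values are only comparable between models fit to the same response and the same rows, which is why each candidate here uses the same data frame.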

4.) After identifying the best fit model, build the appropriate graph! See our multiple regression tutorial. Next, build a coef plot for the model. Using patchwork, show me a 2-panel figure with the coef plot and the graph for the model.

Build the appropriate graph

socmod4g<-socmod4 %>% 
  augment() %>%
  ggplot(aes(x=HS, y=FTHG, color=as.factor(HF)))+
  geom_point()+
  geom_line(aes(y=.fitted))+
  theme_classic()+
  facet_wrap(~HomeTeam)
socmod4g

ugly<-socmod4 %>% 
  augment() %>%
  ggplot(aes(x=HS, y=FTHG, color=HomeTeam))+
  geom_point(aes(size=HF))+
  geom_line(aes(y=.fitted),linewidth=1)+
  theme_classic()

ugly

Build a coef plot for the model

coefs<-tidy(socmod4, quick=FALSE)
coefs

# A tibble: 22 × 5
   term                   estimate std.error statistic      p.value
   <chr>                     <dbl>     <dbl>     <dbl>        <dbl>
 1 (Intercept)              0.567     0.396     1.43   0.152       
 2 HS                       0.0711    0.0123    5.79   0.0000000151
 3 HomeTeamAston Villa      0.105     0.388     0.270  0.787       
 4 HomeTeamBrentford       -0.256     0.388    -0.661  0.509       
 5 HomeTeamBrighton        -0.587     0.384    -1.53   0.127       
 6 HomeTeamBurnley         -0.450     0.389    -1.16   0.248       
 7 HomeTeamChelsea          0.188     0.383     0.493  0.623       
 8 HomeTeamCrystal Palace   0.0173    0.389     0.0445 0.965       
 9 HomeTeamEverton          0.0265    0.389     0.0681 0.946       
10 HomeTeamLeeds           -0.524     0.388    -1.35   0.178       
# … with 12 more rows

ci<-data.table(confint(socmod4), keep.rownames='term')
ci

                      term       2.5 %     97.5 %
 1:            (Intercept) -0.21075026 1.34567111
 2:                     HS  0.04697836 0.09525245
 3:    HomeTeamAston Villa -0.65790223 0.86770425
 4:      HomeTeamBrentford -1.01962630 0.50674849
 5:       HomeTeamBrighton -1.34330647 0.16847085
 6:        HomeTeamBurnley -1.21505431 0.31484437
 7:        HomeTeamChelsea -0.56404249 0.94094445
 8: HomeTeamCrystal Palace -0.74815835 0.78277336
 9:        HomeTeamEverton -0.73870205 0.79165904
10:          HomeTeamLeeds -1.28763336 0.24012543
11:      HomeTeamLeicester -0.40505935 1.12026992
12:      HomeTeamLiverpool -0.23029360 1.27539679
13:       HomeTeamMan City  0.40498557 1.90586943
14:     HomeTeamMan United -0.65931506 0.84986417
15:      HomeTeamNewcastle -0.82887822 0.69807362
16:        HomeTeamNorwich -1.44200427 0.10053024
17:    HomeTeamSouthampton -0.97626468 0.55131870
18:      HomeTeamTottenham -0.27164995 1.24653097
19:        HomeTeamWatford -1.22220073 0.31782064
20:       HomeTeamWest Ham -0.44147985 1.08723763
21:         HomeTeamWolves -1.09133427 0.44009047
22:                     HF -0.03898277 0.03385659
                      term       2.5 %     97.5 %

cidf<-cbind(coefs,ci)
cidf

                     term    estimate  std.error   statistic      p.value
1             (Intercept)  0.56746042 0.39571125  1.43402653 1.524375e-01
2                      HS  0.07111540 0.01227341  5.79426619 1.505048e-08
3     HomeTeamAston Villa  0.10490101 0.38787673  0.27044935 7.869704e-01
4       HomeTeamBrentford -0.25643891 0.38807207 -0.66080227 5.091641e-01
5        HomeTeamBrighton -0.58741781 0.38436075 -1.52829811 1.273216e-01
6         HomeTeamBurnley -0.45010497 0.38896800 -1.15717736 2.479714e-01
7         HomeTeamChelsea  0.18845098 0.38263433  0.49250932 6.226612e-01
8  HomeTeamCrystal Palace  0.01730751 0.38923064  0.04446594 9.645578e-01
9         HomeTeamEverton  0.02647849 0.38908557  0.06805314 9.457813e-01
10          HomeTeamLeeds -0.52375396 0.38842394 -1.34840803 1.783796e-01
11      HomeTeamLeicester  0.35760529 0.38780626  0.92212356 3.570851e-01
12      HomeTeamLiverpool  0.52255160 0.38281318  1.36503032 1.731007e-01
13       HomeTeamMan City  1.15542750 0.38159115  3.02792008 2.640982e-03
14     HomeTeamMan United  0.09527456 0.38370020  0.24830468 8.040410e-01
15      HomeTeamNewcastle -0.06540230 0.38821878 -0.16846763 8.663105e-01
16        HomeTeamNorwich -0.67073702 0.39218059 -1.71027591 8.808125e-02
17    HomeTeamSouthampton -0.21247299 0.38837935 -0.54707591 5.846674e-01
18      HomeTeamTottenham  0.48744051 0.38598883  1.26283581 2.074701e-01
19        HomeTeamWatford -0.45219004 0.39154164 -1.15489644 2.489031e-01
20       HomeTeamWest Ham  0.32287889 0.38866769  0.83073252 4.066784e-01
21         HomeTeamWolves -0.32562190 0.38935599 -0.83630895 4.035390e-01
22                     HF -0.00256309 0.01851899 -0.13840334 8.899995e-01
                     term       2.5 %     97.5 %
1             (Intercept) -0.21075026 1.34567111
2                      HS  0.04697836 0.09525245
3     HomeTeamAston Villa -0.65790223 0.86770425
4       HomeTeamBrentford -1.01962630 0.50674849
5        HomeTeamBrighton -1.34330647 0.16847085
6         HomeTeamBurnley -1.21505431 0.31484437
7         HomeTeamChelsea -0.56404249 0.94094445
8  HomeTeamCrystal Palace -0.74815835 0.78277336
9         HomeTeamEverton -0.73870205 0.79165904
10          HomeTeamLeeds -1.28763336 0.24012543
11      HomeTeamLeicester -0.40505935 1.12026992
12      HomeTeamLiverpool -0.23029360 1.27539679
13       HomeTeamMan City  0.40498557 1.90586943
14     HomeTeamMan United -0.65931506 0.84986417
15      HomeTeamNewcastle -0.82887822 0.69807362
16        HomeTeamNorwich -1.44200427 0.10053024
17    HomeTeamSouthampton -0.97626468 0.55131870
18      HomeTeamTottenham -0.27164995 1.24653097
19        HomeTeamWatford -1.22220073 0.31782064
20       HomeTeamWest Ham -0.44147985 1.08723763
21         HomeTeamWolves -1.09133427 0.44009047
22                     HF -0.03898277 0.03385659

colnames(cidf)

[1] "term"      "estimate"  "std.error" "statistic" "p.value"   "term"     
[7] "2.5 %"     "97.5 %"

cidf<-cidf[,-6]

cidf<- cidf %>%
  rename("lower"="2.5 %",
         "upper"="97.5 %")

cidf

                     term    estimate  std.error   statistic      p.value
1             (Intercept)  0.56746042 0.39571125  1.43402653 1.524375e-01
2                      HS  0.07111540 0.01227341  5.79426619 1.505048e-08
3     HomeTeamAston Villa  0.10490101 0.38787673  0.27044935 7.869704e-01
4       HomeTeamBrentford -0.25643891 0.38807207 -0.66080227 5.091641e-01
5        HomeTeamBrighton -0.58741781 0.38436075 -1.52829811 1.273216e-01
6         HomeTeamBurnley -0.45010497 0.38896800 -1.15717736 2.479714e-01
7         HomeTeamChelsea  0.18845098 0.38263433  0.49250932 6.226612e-01
8  HomeTeamCrystal Palace  0.01730751 0.38923064  0.04446594 9.645578e-01
9         HomeTeamEverton  0.02647849 0.38908557  0.06805314 9.457813e-01
10          HomeTeamLeeds -0.52375396 0.38842394 -1.34840803 1.783796e-01
11      HomeTeamLeicester  0.35760529 0.38780626  0.92212356 3.570851e-01
12      HomeTeamLiverpool  0.52255160 0.38281318  1.36503032 1.731007e-01
13       HomeTeamMan City  1.15542750 0.38159115  3.02792008 2.640982e-03
14     HomeTeamMan United  0.09527456 0.38370020  0.24830468 8.040410e-01
15      HomeTeamNewcastle -0.06540230 0.38821878 -0.16846763 8.663105e-01
16        HomeTeamNorwich -0.67073702 0.39218059 -1.71027591 8.808125e-02
17    HomeTeamSouthampton -0.21247299 0.38837935 -0.54707591 5.846674e-01
18      HomeTeamTottenham  0.48744051 0.38598883  1.26283581 2.074701e-01
19        HomeTeamWatford -0.45219004 0.39154164 -1.15489644 2.489031e-01
20       HomeTeamWest Ham  0.32287889 0.38866769  0.83073252 4.066784e-01
21         HomeTeamWolves -0.32562190 0.38935599 -0.83630895 4.035390e-01
22                     HF -0.00256309 0.01851899 -0.13840334 8.899995e-01
         lower      upper
1  -0.21075026 1.34567111
2   0.04697836 0.09525245
3  -0.65790223 0.86770425
4  -1.01962630 0.50674849
5  -1.34330647 0.16847085
6  -1.21505431 0.31484437
7  -0.56404249 0.94094445
8  -0.74815835 0.78277336
9  -0.73870205 0.79165904
10 -1.28763336 0.24012543
11 -0.40505935 1.12026992
12 -0.23029360 1.27539679
13  0.40498557 1.90586943
14 -0.65931506 0.84986417
15 -0.82887822 0.69807362
16 -1.44200427 0.10053024
17 -0.97626468 0.55131870
18 -0.27164995 1.24653097
19 -1.22220073 0.31782064
20 -0.44147985 1.08723763
21 -1.09133427 0.44009047
22 -0.03898277 0.03385659
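The bind, drop, and rename steps above can likely be collapsed into one call, since broom's `tidy()` accepts `conf.int = TRUE` for `lm` fits. A minimal sketch on a stand-in model (built-in `mtcars`, not the soccer model):

```r
library(broom)

# Stand-in model on R's built-in mtcars, NOT the soccer model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# conf.int = TRUE appends conf.low and conf.high columns (95% by default)
coef_tbl <- tidy(fit, conf.int = TRUE)
coef_tbl
```

The interval columns come back named `conf.low` and `conf.high`, so a plot built this way would map those to `xmin` and `xmax` instead of `lower` and `upper`.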

cidf$term=as.factor(cidf$term)

soccerci<- ggplot(data=cidf, aes(x=estimate, y=term))+
  geom_vline(xintercept = 0, linetype=2)+
  geom_point(size=3)+
  geom_errorbarh(aes(xmin=lower, xmax=upper),height=0.2)+
  theme_classic()
soccerci

Patchwork

ugly+soccerci

Depth

1.) Bootstrap the coef plot from Essential #4, above.

set.seed(420) 

soccer_intervals<- reg_intervals(FTHG ~ HS+HomeTeam+HF, data=soccer, 
                                   type='percentile',
                                   keep_reps=FALSE)

soccer_intervals

# A tibble: 21 × 6
   term                    .lower .estimate .upper .alpha .method   
   <chr>                    <dbl>     <dbl>  <dbl>  <dbl> <chr>     
 1 HF                     -0.0375  -0.00240 0.0325   0.05 percentile
 2 HomeTeamAston Villa    -0.623    0.0983  0.833    0.05 percentile
 3 HomeTeamBrentford      -0.987   -0.273   0.466    0.05 percentile
 4 HomeTeamBrighton       -1.38    -0.606   0.214    0.05 percentile
 5 HomeTeamBurnley        -1.18    -0.463   0.263    0.05 percentile
 6 HomeTeamChelsea        -0.636    0.175   1.03     0.05 percentile
 7 HomeTeamCrystal Palace -0.669    0.00141 0.728    0.05 percentile
 8 HomeTeamEverton        -0.717    0.00608 0.687    0.05 percentile
 9 HomeTeamLeeds          -1.16    -0.535   0.102    0.05 percentile
10 HomeTeamLeicester      -0.424    0.345   1.16     0.05 percentile
# … with 11 more rows

cleats<-ggplot(data=soccer_intervals, aes(x=.estimate, y=term))+
  geom_vline(xintercept=0, linetype=2)+
  geom_errorbarh(aes(xmin=.lower, xmax=.upper),height=0.2)+
  geom_point(size=3)+
  theme_classic()
cleats
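`reg_intervals()` hides the resampling loop. The percentile idea behind it can be sketched by hand in base R: resample rows with replacement, refit the model, and take the middle 95% of the resulting slopes. This is an illustration on the built-in `mtcars` data, not a claim about how rsample implements it internally:

```r
set.seed(420)

# Stand-in data: built-in mtcars, NOT the soccer data
n_rows <- nrow(mtcars)

# Refit the model on 1000 resamples of the rows and keep the wt slope
boot_slopes <- replicate(1000, {
  resample <- mtcars[sample(n_rows, replace = TRUE), ]
  coef(lm(mpg ~ wt + hp, data = resample))[["wt"]]
})

# Percentile interval: the middle 95% of the bootstrap slopes
slope_ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
slope_ci
```

Repeating the same steps for every coefficient would give one interval per term, which is the shape of the table returned above.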

2.) Calculate means and 95% CIs of full time home goals and full time away goals (using bootstrapping). Plot the results and interpret the plot (is there a home advantage or not?)

soccer1 <- soccer
n=nrow(soccer1)

soccerbs<- 1:1000 %>% 
  map_dfr(
    ~soccer1 %>%
      slice_sample(n=n, replace=TRUE) %>%
      summarize(meanhg=mean(FTHG), meanag=mean(FTAG))) %>%
  mutate(n=n)

soccerbs

# A tibble: 1,000 × 3
   meanhg meanag     n
    <dbl>  <dbl> <int>
 1   1.61   1.33   380
 2   1.50   1.48   380
 3   1.59   1.31   380
 4   1.45   1.34   380
 5   1.46   1.33   380
 6   1.39   1.29   380
 7   1.45   1.28   380
 8   1.61   1.31   380
 9   1.57   1.34   380
10   1.61   1.24   380
# … with 990 more rows

calc_CIs<-soccerbs %>%
  dplyr::summarize(meanhgboot=mean(meanhg),meanagboot=mean(meanag), CIhg=1.96*sd(meanhg), CIag=1.96*sd(meanag))

calc_CIs

# A tibble: 1 × 4
  meanhgboot meanagboot  CIhg  CIag
       <dbl>      <dbl> <dbl> <dbl>
1       1.51       1.31 0.132 0.126

CIs_long<- calc_CIs %>%
  pivot_longer(meanhgboot:meanagboot,names_to='team',values_to='goals') %>%
   pivot_longer(CIhg:CIag, names_to='CIcat',values_to='CI')

CIs_long

# A tibble: 4 × 4
  team       goals CIcat    CI
  <chr>      <dbl> <chr> <dbl>
1 meanhgboot  1.51 CIhg  0.132
2 meanhgboot  1.51 CIag  0.126
3 meanagboot  1.31 CIhg  0.132
4 meanagboot  1.31 CIag  0.126

CIs_long<- CIs_long[c(1,4),]

CIs_long

# A tibble: 2 × 4
  team       goals CIcat    CI
  <chr>      <dbl> <chr> <dbl>
1 meanhgboot  1.51 CIhg  0.132
2 meanagboot  1.31 CIag  0.126

CIs_long$team <- recode_factor(CIs_long$team, meanhgboot="Home Team",meanagboot="Away Team")

CIs_long

# A tibble: 2 × 4
  team      goals CIcat    CI
  <fct>     <dbl> <chr> <dbl>
1 Home Team  1.51 CIhg  0.132
2 Away Team  1.31 CIag  0.126

ggplot(data=CIs_long, aes(x=team, y=goals))+
  geom_point()+
  geom_errorbar(aes(ymin=goals-CI,ymax=goals+CI),width=0.2)+
  theme_classic()

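Based on this graph, there is no clear home advantage. The home team's mean (1.51 goals) sits above the away team's (1.31 goals), but the two 95% confidence intervals overlap (roughly between 1.38 and 1.44 goals), so the plot alone does not separate them.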
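Because each match supplies both a home and an away score, another way to look at the question is to bootstrap the per-match difference directly. The sketch below uses simulated Poisson goal counts as a stand-in (the rates 1.5 and 1.3 and the data frame `toy` are assumptions made up for illustration, not the real results):

```r
set.seed(7)

# Simulated stand-in for 380 matches (invented Poisson goal counts)
toy <- data.frame(home = rpois(380, lambda = 1.5),
                  away = rpois(380, lambda = 1.3))

# Each match contributes one home-minus-away difference
goal_diff <- toy$home - toy$away

# Resample matches with replacement and average the differences
boot_diff <- replicate(1000, mean(sample(goal_diff, replace = TRUE)))

# Percentile 95% CI for the mean difference
diff_ci <- quantile(boot_diff, probs = c(0.025, 0.975))
diff_ci
```

With the real data the difference would be `soccer$FTHG - soccer$FTAG`; an interval that excludes 0 would point toward a home advantage, while one that includes 0 would not.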

3.) Add raw data behind your 95% CI plot above!

soccer2 <- soccer1 %>%
  select(FTHG,FTAG) %>%
  pivot_longer(FTHG:FTAG, names_to='team',values_to='goals')

soccer2$team<- recode_factor(soccer2$team, FTHG='Home Team',FTAG='Away Team')

ggplot(data=CIs_long, aes(x=team, y=goals))+
  geom_point()+
  geom_errorbar(aes(ymin=goals-CI,ymax=goals+CI),width=0.2)+
  geom_jitter(data=soccer2, aes(x=team,y=goals), alpha=0.6,size=0.3)+
  theme_classic()