Basic Use: Plotting Results from One or More Regression Models

Broom package

The broom package takes the messy output of built-in R functions such as lm(), nls(), or t.test() and turns it into tidy tibbles.

It summarizes key information about models in tidy tibble()s. broom provides three verbs to make it convenient to interact with model objects:

tidy() summarizes information about model components
glance() reports information about the entire model
augment() adds information about observations to a data set
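A minimal sketch of all three verbs on a toy model (fit is our own illustrative name, using the built-in mtcars data):

fit <- lm(mpg ~ wt, data = mtcars)
broom::tidy(fit)     # component level: one row per coefficient
broom::glance(fit)   # model level: one row for the whole fit
broom::augment(fit)  # observation level: one row per data point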

#vignette("broom")

#Installation

# we recommend installing the entire tidyverse 
# modeling set, which includes broom:
#install.packages("tidymodels")

# alternatively, to install just broom:
#install.packages("broom")
#install.packages("https://cran.rstudio.com/bin/windows/Rtools")

# to get the development version from GitHub:
#install.packages("devtools")
#devtools::install_github("tidymodels/broom")

tidy() produces a tibble() where each row contains information about an important component of a model. For regression models, this often corresponds to regression coefficients. This can be useful if you want to inspect a model or create custom visualizations.

#devtools::install_github("kassmbara/datarium")
#library(tidyverse) # for Data Manipulation
#install.packages("datarium")
library(broom)
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
## ✔ dials        1.1.0     ✔ rsample      1.1.1
## ✔ dplyr        1.1.0     ✔ tibble       3.1.8
## ✔ ggplot2      3.4.0     ✔ tidyr        1.3.0
## ✔ infer        1.0.4     ✔ tune         1.0.1
## ✔ modeldata    1.1.0     ✔ workflows    1.1.2
## ✔ parsnip      1.0.3     ✔ workflowsets 1.0.0
## ✔ purrr        1.0.1     ✔ yardstick    1.1.0
## ✔ recipes      1.0.4
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ recipes::step()  masks stats::step()
## • Search for functions across packages at https://www.tidymodels.org/find/
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ✔ stringr 1.5.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ purrr::discard()    masks scales::discard()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ stringr::fixed()    masks recipes::fixed()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ readr::spec()       masks yardstick::spec()
library(datarium)
data("marketing", package = "datarium")
head(marketing, 4)
##   youtube facebook newspaper sales
## 1  276.12    45.36     83.04 26.52
## 2   53.40    47.16     54.12 12.48
## 3   20.64    55.08     83.16 11.16
## 4  181.80    49.56     70.20 22.20

Building the model

We want to build a model for estimating sales based on the advertising budgets invested in youtube, facebook and newspaper, as follows:

model <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(model)
## 
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5932  -1.0690   0.2902   1.4272   3.3951 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
summary(model)$coefficients # to see only the regression coefficient part
##                 Estimate  Std. Error    t value     Pr(>|t|)
## (Intercept)  3.526667243 0.374289884  9.4222884 1.267295e-17
## youtube      0.045764645 0.001394897 32.8086244 1.509960e-81
## facebook     0.188530017 0.008611234 21.8934961 1.505339e-54
## newspaper   -0.001037493 0.005871010 -0.1767146 8.599151e-01

For a given predictor, the t-statistic evaluates whether or not there is a significant association between the predictor and the outcome variable, that is, whether the beta coefficient of the predictor is significantly different from zero.

It can be seen that changes in the youtube and facebook advertising budgets are significantly associated with changes in sales, while changes in the newspaper budget are not.

For a given predictor variable, the coefficient (b) can be interpreted as the average effect on y of a one-unit increase in that predictor, holding all other predictors fixed.

For example, for a fixed amount of youtube and newspaper advertising budget, spending an additional 1,000 dollars on facebook advertising leads to an increase in sales of approximately 0.1885 * 1000 ≈ 189 sales units, on average.

The youtube coefficient suggests that for every 1,000-dollar increase in the youtube advertising budget, holding all other predictors constant, we can expect an increase of approximately 0.0458 * 1000 ≈ 46 sales units, on average.

We found that newspaper is not significant in the multiple regression model. This means that, for a fixed amount of youtube and facebook advertising budget, changes in the newspaper advertising budget will not significantly affect sales units.

As the newspaper variable is not significant, it is possible to remove it from the model:

model2  <- lm(sales ~ youtube + facebook, data = marketing)
summary(model2)
## 
## Call:
## lm(formula = sales ~ youtube + facebook, data = marketing)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5572  -1.0502   0.2906   1.4049   3.3994 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.50532    0.35339   9.919   <2e-16 ***
## youtube      0.04575    0.00139  32.909   <2e-16 ***
## facebook     0.18799    0.00804  23.382   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.018 on 197 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8962 
## F-statistic: 859.6 on 2 and 197 DF,  p-value: < 2.2e-16

Finally, our model equation can be written as follows: sales = 3.505 + 0.0458*youtube + 0.188*facebook.

The confidence intervals of the model coefficients can be extracted as follows:

confint(model2)
##                  2.5 %     97.5 %
## (Intercept) 2.80841159 4.20222820
## youtube     0.04301292 0.04849671
## facebook    0.17213877 0.20384969

Model accuracy assessment

As we have seen in simple linear regression, the overall quality of the model can be assessed by examining the R-squared (R2) and Residual Standard Error (RSE).

R-squared:

In multiple linear regression, the R2 is the squared correlation between the observed values of the outcome variable (y) and the fitted (i.e., predicted) values of y. Because it is a squared correlation, R2 is always positive and ranges from zero to one.
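A quick check of this relationship, using model2 and the marketing data from above:

cor(marketing$sales, fitted(model2))^2  # ~0.897, equal to summary(model2)$r.squared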

R2 represents the proportion of variance, in the outcome variable y, that may be predicted by knowing the value of the x variables. An R2 value close to 1 indicates that the model explains a large portion of the variance in the outcome variable.

A problem with R2 is that it will always increase as more variables are added to the model, even if those variables are only weakly associated with the response (James et al. 2014). A solution is to adjust R2 to take into account the number of predictor variables.

Adjusted R Square

The "Adjusted R Square" value in the summary output corrects R2 for the number of x variables included in the prediction model.
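Concretely, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A hand computation for model2 (n = 200, p = 2):

r2 <- summary(model2)$r.squared
n <- nrow(marketing)
p <- 2
1 - (1 - r2) * (n - 1) / (n - p - 1)  # ~0.896, the "Adjusted R-squared" reported by summary()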

Residual Standard Error (RSE), or sigma:

The RSE estimate gives a measure of prediction error: the lower the RSE, the more accurate the model (on the data at hand). The error rate can be estimated by dividing the RSE by the mean of the outcome variable:

sigma(model2)/mean(marketing$sales)
## [1] 0.1199045

These model objects are messy. Why?

1. Extracting coefficients takes multiple steps: data.frame(coef(summary(lmfit)))

2. Information is stored in row names, so results from multiple models can't be combined

3. Column names are inconvenient (accessing with $"Pr(>|t|)" converts to Pr...t..)

4. Some information is computed in the print method, not stored in the object
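To see the messiness concretely (using model2 from above):

messy <- data.frame(coef(summary(model2)))
names(messy)     # "Estimate" "Std..Error" "t.value" "Pr...t.." -- mangled column names
rownames(messy)  # the term names are trapped in row names, not in a column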

Thus, broom’s tidy() method does the work for us!

broom’s three methods

broom defines tidying methods for extracting three kinds of statistics from an object:

augment(): observation-level statistics
tidy(): component-level statistics
glance(): model-level statistics

broom takes model objects and turns them into tidy data frames that can be used with tidy tools such as dplyr, ggplot2, and the rest of the tidyverse.

The tidy() method: component-level statistics - each row is a coefficient

library(broom)
tidy(model2) # each row is a coefficient
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   3.51     0.353        9.92 4.57e-19
## 2 youtube       0.0458   0.00139     32.9  5.44e-82
## 3 facebook      0.188    0.00804     23.4  9.78e-59
# One function to call; information is stored in columns, never in row names,
# with convenient column names, including the p-value column
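Because tidy() returns an ordinary tibble, dplyr verbs apply directly; for example, keeping only the significant terms of the full model:

tidy(model) %>% filter(p.value < 0.05)  # newspaper drops out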

The augment() method: observation-level statistics

each row is an observation from the original data

augment(model2) # note that new columns start with .
## # A tibble: 200 × 9
##    sales youtube facebook .fitted  .resid    .hat .sigma   .cooksd .std.resid
##    <dbl>   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>     <dbl>      <dbl>
##  1 26.5    276.     45.4    24.7   1.85   0.0140    2.02 0.00406       0.925 
##  2 12.5     53.4    47.2    14.8  -2.33   0.0188    2.02 0.00871      -1.17  
##  3 11.2     20.6    55.1    14.8  -3.64   0.0295    2.01 0.0341       -1.83  
##  4 22.2    182.     49.6    21.1   1.06   0.0124    2.02 0.00117       0.528 
##  5 15.5    217.     13.0    15.9  -0.389  0.00951   2.02 0.000120     -0.194 
##  6  8.64    10.4    58.7    15.0  -6.37   0.0347    1.97 0.124        -3.22  
##  7 14.2     69      39.4    14.1   0.0981 0.0129    2.02 0.0000105     0.0490
##  8 15.8    144.     23.5    14.5   1.31   0.00576   2.02 0.000823      0.653 
##  9  5.76    10.3     2.52    4.45  1.31   0.0271    2.02 0.00401       0.658 
## 10 12.7    240.      3.12   15.1  -2.34   0.0171    2.02 0.00797      -1.17  
## # … with 190 more rows
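Because .fitted and .resid sit alongside the original data, observation-level diagnostics become a single ggplot call, e.g. a residuals-versus-fitted plot:

ggplot(augment(model2), aes(.fitted, .resid)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = 2)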

The glance() method: model-level statistics - one row for the model

glance(model2)
## # A tibble: 1 × 12
##   r.squared adj.r.squa…¹ sigma stati…²  p.value    df logLik   AIC   BIC devia…³
##       <dbl>        <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1     0.897        0.896  2.02    860. 4.83e-98     2  -423.  853.  867.    802.
## # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
## #   variable names ¹​adj.r.squared, ²​statistic, ³​deviance
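One-row summaries stack naturally, so comparing the two models fitted above is a short pipeline (the model labels are our own):

bind_rows(
    glance(model)  %>% mutate(model = "with newspaper"),
    glance(model2) %>% mutate(model = "without newspaper")
) %>% select(model, r.squared, adj.r.squared, sigma, AIC, BIC)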

broom works across many kinds of model objects

Nonlinear least squares: before

n <- nls(mpg ~ k * e ^ wt, data = mtcars, start = list(k = 1, e = 2))
summary(n)
## 
## Formula: mpg ~ k * e^wt
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## k 49.65969    3.78876   13.11 5.96e-14 ***
## e  0.74559    0.01986   37.54  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.668 on 30 degrees of freedom
## 
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 2.043e-06

Nonlinear least squares: after

tidy(n) # each row is one estimated parameter
## # A tibble: 2 × 5
##   term  estimate std.error statistic  p.value
##   <chr>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 k       49.7      3.79        13.1 5.96e-14
## 2 e        0.746    0.0199      37.5 8.86e-27
augment(n) # each row is an observation from the original data 
## # A tibble: 32 × 4
##      mpg    wt .fitted .resid
##    <dbl> <dbl>   <dbl>  <dbl>
##  1  21    2.62    23.0 -2.01 
##  2  21    2.88    21.4 -0.352
##  3  22.8  2.32    25.1 -2.33 
##  4  21.4  3.22    19.3  2.08 
##  5  18.7  3.44    18.1  0.611
##  6  18.1  3.46    18.0  0.117
##  7  14.3  3.57    17.4 -3.11 
##  8  24.4  3.19    19.5  4.93 
##  9  22.8  3.15    19.7  3.10 
## 10  19.2  3.44    18.1  1.11 
## # … with 22 more rows
glance(n) # one row for the model
## # A tibble: 1 × 9
##   sigma isConv     finTol logLik   AIC   BIC deviance df.residual  nobs
##   <dbl> <lgl>       <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
## 1  2.67 TRUE   0.00000204  -75.8  158.  162.     214.          30    32

K-means clustering: before

m <- as.matrix(iris[, -5])
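# note: kmeans() uses random starting centers; call set.seed() first for reproducible clusters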
k <- kmeans(m, 3)
k
## K-means clustering with 3 clusters of sizes 50, 38, 62
## 
## Cluster means:
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     5.006000    3.428000     1.462000    0.246000
## 2     6.850000    3.073684     5.742105    2.071053
## 3     5.901613    2.748387     4.393548    1.433871
## 
## Clustering vector:
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
##  [75] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2
## [112] 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2
## [149] 2 3
## 
## Within cluster sum of squares by cluster:
## [1] 15.15100 23.87947 39.82097
##  (between_SS / total_SS =  88.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

K-means clustering: after

tidy(k) # each row is one cluster
## # A tibble: 3 × 7
##   Sepal.Length Sepal.Width Petal.Length Petal.Width  size withinss cluster
##          <dbl>       <dbl>        <dbl>       <dbl> <int>    <dbl> <fct>  
## 1         5.01        3.43         1.46       0.246    50     15.2 1      
## 2         6.85        3.07         5.74       2.07     38     23.9 2      
## 3         5.90        2.75         4.39       1.43     62     39.8 3
head(augment(k, m)) # each row is one assignment
## # A tibble: 6 × 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width .cluster
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>   
## 1          5.1         3.5          1.4         0.2 1       
## 2          4.9         3            1.4         0.2 1       
## 3          4.7         3.2          1.3         0.2 1       
## 4          4.6         3.1          1.5         0.2 1       
## 5          5           3.6          1.4         0.2 1       
## 6          5.4         3.9          1.7         0.4 1
glance(k) # one row describing the entire clustering operation
## # A tibble: 1 × 4
##   totss tot.withinss betweenss  iter
##   <dbl>        <dbl>     <dbl> <int>
## 1  681.         78.9      603.     3
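Because glance() returns one tidy row per fit, comparing several values of k (e.g. for an "elbow" plot of total within-cluster sum of squares) is a short pipeline; a sketch:

kclusts <- purrr::map_dfr(1:6, function(kk) glance(kmeans(m, centers = kk)) %>% mutate(k = kk))
ggplot(kclusts, aes(k, tot.withinss)) + geom_line() + geom_point()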

Why are tidy models useful?

1. ggplot2 can visualize tidy model data:

td <- tidy(model2, conf.int = TRUE)
ggplot(td, aes(estimate, term, color = term))+
  geom_point()+
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high))

2. Multiple models can be combined and compared, for example across:

• different parameters
• different methods
• bootstrap replicates
• subgroup models (within each country, gene, …)
• ensemble voting
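For instance, bootstrap replicates: refit the model on resampled rows, tidy each fit, and stack the results. A minimal sketch, reusing the marketing model from above (the replicate count of 100 is arbitrary):

set.seed(123)
boots <- purrr::map_dfr(1:100, function(i) {
    idx <- sample(nrow(marketing), replace = TRUE)
    tidy(lm(sales ~ youtube + facebook, data = marketing[idx, ])) %>%
        mutate(replicate = i)
})
boots %>% group_by(term) %>% summarise(sd = sd(estimate))  # bootstrap spread per coefficient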

augmented <- augment(model2)
ggplot(augmented, aes(facebook, .fitted)) + geom_line()

dotwhisker: Dot-and-Whisker Plots of Regression Results

Basic Use: Plotting Results from One or More Regression Models

Generating dot-and-whisker plots from model objects produced by the most commonly used regression functions is straightforward. To make a basic dot-and-whisker plot of any single model object of a class supported by parameters::parameters(), simply pass it to dwplot().

#Package preload
#install.packages("dotwhisker")
library(dotwhisker)
library(dplyr)
# draw a dot-and-whisker plot
dwplot(model2)

By default, the whiskers span the 95% confidence interval. To change the width of the confidence interval, specify a ci argument to pass to parameters::parameters():

dwplot(model2, ci = .60) +   # use 60% confidence intervals
    theme(legend.position = "none") 

# run a regression compatible with tidy
m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)

# draw a dot-and-whisker plot
dwplot(m1)

Plotting the results of more than one regression model is just as easy.

Just pass the model objects to dwplot as a list. The dodge_size argument is used to adjust the space between the estimates of one variable when multiple models are presented in a single plot. Its default value of .4 will usually be fine, but, depending on the dimensions of the desired plot, more pleasing results may be achieved by setting dodge_size to lower values when the plotted results include a relatively small number of predictors or to higher values when many models appear on the same plot.

m2 <- update(m1, . ~ . + hp) # add another predictor
m3 <- update(m2, . ~ . + am) # and another 

dwplot(list(m1, m2, m3))
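To illustrate the dodge_size argument described above (the value .8 is arbitrary):

dwplot(list(m1, m2, m3), dodge_size = .8)  # more space between each variable's estimates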

Model intercepts are rarely theoretically interesting (see Kastellec and Leoni 2007, 765), so they are excluded by dwplot by default. They are easy to include if desired, however, by setting the show_intercept argument to TRUE.

dwplot(list(m1, m2, m3), show_intercept = TRUE)

Users are free to customize the order of the models and variables presented with the model_order and vars_order arguments. Moreover, the output of dwplot is a ggplot object: add or change any ggplot layers after calling dwplot to achieve the desired presentation. Users can also pass a named character vector to relabel_predictors(), a dotwhisker function that conveniently renames the predictors. Note that both vars_order and relabel_predictors change the order in which variables are presented; when both are used, the latter overrides the former.

dwplot(list(m1, m2, m3),
       vline = geom_vline(
           xintercept = 0,
           colour = "grey60",
           linetype = 2
       ),
       vars_order = c("am", "cyl", "disp", "gear", "hp", "wt"),
       model_order = c("Model 2", "Model 1", "Model 3")
       ) %>% # plot line at zero _behind_ coefs
    relabel_predictors(
        c(
            am = "Manual",
            cyl = "Cylinders",
            disp = "Displacement",
            wt = "Weight",
            gear = "Gears",
            hp = "Horsepower"
        )
    ) +
    theme_bw(base_size = 4) + 
    # `base_size` is set small here to fit the document;
    # no need to set `base_size` in most usage
    xlab("Coefficient Estimate") + ylab("") +
    geom_vline(xintercept = 0,
               colour = "grey60",
               linetype = 2) +
    ggtitle("Predicting Gas Mileage") +
    theme(
        plot.title = element_text(face = "bold"),
        legend.position = c(0.007, 0.01),
        legend.justification = c(0, 0),
        legend.background = element_rect(colour = "grey80"),
        legend.title = element_blank()
    ) 

Advanced Use: Decoration and Modification

Plotting Results Stored in a Tidy Data Frame

# regression compatible with tidy
m1_df <- broom::tidy(m1) # create data.frame of regression results
m1_df # a tidy data.frame available for dwplot
## # A tibble: 5 × 5
##   term        estimate std.error statistic       p.value
##   <chr>          <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept) 43.5        4.86       8.96  0.00000000142
## 2 wt          -3.79       1.08      -3.51  0.00161      
## 3 cyl         -1.78       0.614     -2.91  0.00722      
## 4 disp         0.00694    0.0120     0.578 0.568        
## 5 gear        -0.490      0.790     -0.621 0.540
dwplot(m1_df) # same as dwplot(m1)

Using tidy can be helpful when one wishes to omit certain model estimates from the plot. To illustrate, we drop the intercept (although this is in fact done by dwplot automatically by default):

m1_df <-
    broom::tidy(m1) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 1")
m2_df <-
    broom::tidy(m2) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 2")

two_models <- rbind(m1_df, m2_df)

dwplot(two_models)

# Transform cyl to factor variable in the data
m_factor <-
    lm(mpg ~ wt + cyl + disp + gear, data = mtcars %>% mutate(cyl = factor(cyl)))

# Remove all model estimates whose terms start with "cyl"
m_factor_df <- broom::tidy(m_factor) %>%
    filter(!grepl('^cyl', term))

dwplot(m_factor_df)

# Run the model on subsets of the data, save the results as a tidy data frame,
# make a model variable, and relabel the predictors
by_trans <- mtcars %>%
    group_by(am) %>%                                    # group data by trans
    do(broom::tidy(lm(mpg ~ wt + cyl + disp + gear, data = .))) %>%
  # run model on each grp
    rename(model = am) %>%                       # make model variable
    relabel_predictors(c(
        wt = "Weight",
        # relabel predictors
        cyl = "Cylinders",
        disp = "Displacement",
        gear = "Gear"
    ))

by_trans
## # A tibble: 10 × 6
## # Groups:   model [2]
##    model term         estimate std.error statistic  p.value
##    <dbl> <fct>           <dbl>     <dbl>     <dbl>    <dbl>
##  1     0 Weight       -2.81       1.27      -2.22  0.0434  
##  2     0 Cylinders    -1.30       0.599     -2.17  0.0473  
##  3     0 Displacement  0.00692    0.0129     0.534 0.601   
##  4     0 Gear          1.26       1.81       0.696 0.498   
##  5     0 (Intercept)  30.7        7.41       4.15  0.000986
##  6     1 Weight       -7.53       2.77      -2.71  0.0265  
##  7     1 Cylinders     0.198      1.70       0.116 0.910   
##  8     1 Displacement -0.0146     0.0315    -0.464 0.655   
##  9     1 Gear         -1.08       2.14      -0.506 0.627   
## 10     1 (Intercept)  48.4       11.1        4.34  0.00247
dwplot(by_trans,
       vline = geom_vline(
           xintercept = 0,
           colour = "grey60",
           linetype = 2
       )) + # plot line at zero _behind_ coefs
    theme_bw(base_size = 4) + xlab("Coefficient Estimate") + ylab("") +
    ggtitle("Predicting Gas Mileage by Transmission Type") +
    theme(
        plot.title = element_text(face = "bold"),
        legend.position = c(0.007, 0.01),
        legend.justification = c(0, 0),
        legend.background = element_rect(colour = "grey80"),
        legend.title.align = .5
    ) +
    scale_colour_grey(
        start = .3,
        end = .7,
        name = "Transmission",
        breaks = c(0, 1),
        labels = c("Automatic", "Manual")
    )

dwplot(
    by_trans,
    vline = geom_vline(
        xintercept = 0,
        colour = "grey60",
        linetype = 2
    ),
    # plot line at zero _behind_ coefs
    dot_args = list(aes(shape = model)),
    whisker_args = list(aes(linetype = model))
) +
    theme_bw(base_size = 4) + xlab("Coefficient Estimate") + ylab("") +
    ggtitle("Predicting Gas Mileage by Transmission Type") +
    theme(
        plot.title = element_text(face = "bold"),
        legend.position = c(0.007, 0.01),
        legend.justification = c(0, 0),
        legend.background = element_rect(colour = "grey80"),
        legend.title.align = .5
    ) +
    scale_colour_grey(
        start = .1,
        end = .1,
        # if start and end same value, use same colour for all models
        name = "Model",
        breaks = c(0, 1),
        labels = c("Automatic", "Manual")
    ) +
    scale_shape_discrete(
        name = "Model",
        breaks = c(0, 1),
        labels = c("Automatic", "Manual")
    ) +
    guides(
        shape = guide_legend("Model"), 
        colour = guide_legend("Model")
    ) # Combine the legends for shape and color