Libraries & Data

We’ll be using a couple of new packages today, so install car, ppcor, and GGally if you don’t have them on your machine already. We’ll also be revisiting Winter’s data on iconicity that we saw in an earlier chapter.

library(tidyverse)
library(broom)
library(car)
library(ppcor)
library(GGally)

icon <- read_csv("perry_winter_2017_iconicity.csv")

-- Column specification ----------------------------------------------------------------------------------
cols(
  Word = col_character(),
  POS = col_character(),
  SER = col_double(),
  CorteseImag = col_double(),
  Conc = col_double(),
  Syst = col_double(),
  Freq = col_double(),
  Iconicity = col_double()
)
d <- read_csv("isbell_under_rev.csv")

-- Column specification ----------------------------------------------------------------------------------
cols(
  Speaker = col_character(),
  Gender = col_character(),
  L1 = col_character(),
  EIT = col_double(),
  Comprehensibility = col_double(),
  Accentedness = col_double(),
  SA_Comprehensibility = col_double(),
  SA_Accentedness = col_double(),
  Satisfaction = col_double(),
  Value = col_double(),
  Kor_Use = col_double(),
  LivingSK = col_double(),
  SKKorSchool = col_double(),
  OtherKorSchool = col_double(),
  TotalKorSchool = col_double(),
  Age = col_double(),
  AO = col_double()
)

You might notice some odd error messages - the car package requires some additional packages that have functions that overlap names of some tidyverse packages. That’s okay - we can force R to use a specific package’s function when needed.

Prelude: looking at partial correlations and regressions

We’re going to play with some data from a study I have under review (and your HW last week) to examine how partial correlations relate to multiple regressions.

Let’s look at some simple correlations between 3 variables first: Self-assessed comprehensibility, (listener-assessed) comprehensibility, and EIT scores.

d %>% dplyr::select(SA_Comprehensibility, Comprehensibility, EIT) %>%
  cor() -> corrs

round(corrs, 2)
                     SA_Comprehensibility Comprehensibility  EIT
SA_Comprehensibility                 1.00              0.54 0.51
Comprehensibility                    0.54              1.00 0.70
EIT                                  0.51              0.70 1.00

EIT and Comprehensibility are rather strongly correlated (~.70) and both are moderately correlated with SA_Comprehensibility (~.50).

Now we’ll use the pcor() function from ppcor to examine the partial correlations of the 3 variables.

d %>% dplyr::select(SA_Comprehensibility, Comprehensibility, EIT) %>%
  pcor() -> partials

round(partials$estimate, 2)
                     SA_Comprehensibility Comprehensibility  EIT
SA_Comprehensibility                 1.00              0.29 0.23
Comprehensibility                    0.29              1.00 0.59
EIT                                  0.23              0.59 1.00

You’ll notice that these partial correlation coefficients are all smaller - they account for the correlation between each member of the pair and the third variable prior to calculating the relationship between the pair.

Now for a regression. We can actually standardize the variables within the regression equation itself, though it’s generally better to follow Winter’s practice of creating new variables.

d_mod <- lm(scale(SA_Comprehensibility) ~ scale(Comprehensibility) + scale(EIT), data = d)

tidy(d_mod) %>% dplyr::select(term, estimate) %>% mutate(estimate = round(estimate, 2))

The regression coefficients are not exactly the same as partial correlation estimates, but they are similar. In calculating both, there is a common numerator but different denominators, which leads to some differences in the final values.

The point here is that like univariate regression, multiple regression runs on the same kind of logic and principles as (partial) correlation.

Transformations

We’ll do some transformations next:

icon <- icon %>% mutate(Log10Freq = log10(Freq),
                        SER_z = scale(SER),
                        CorteseImag_z = scale(CorteseImag),
                        Syst_z = scale(Syst),
                        Freq_z = scale(Log10Freq))

Exploration

The ggpairs() function from GGally can be very helpful for examining the distributions of and correlations among several variables.

icon %>% dplyr::select(Iconicity, SER, CorteseImag, Syst, Log10Freq) %>%
  ggpairs(axisLabels = "none", progress = F,
          lower = list(continuous = wrap("points", alpha = 0.3)))

An unstandardized multiple regression model

Now we’ll specify and run (‘fit’) the model from Ch. 6 based on unstandardized variables.

icon_mdl <- lm(Iconicity ~ SER + CorteseImag + Syst + Log10Freq, data = icon)

summary(icon_mdl)

Call:
lm(formula = Iconicity ~ SER + CorteseImag + Syst + Log10Freq, 
    data = icon)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.07601 -0.71411 -0.03824  0.67337  2.76066 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.54476    0.18795   8.219 6.43e-16 ***
SER           0.49713    0.04012  12.391  < 2e-16 ***
CorteseImag  -0.26328    0.02500 -10.531  < 2e-16 ***
Syst        401.52431  262.90268   1.527    0.127    
Log10Freq    -0.25163    0.03741  -6.725 2.97e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.002 on 984 degrees of freedom
  (2012 observations deleted due to missingness)
Multiple R-squared:  0.2125,    Adjusted R-squared:  0.2093 
F-statistic: 66.36 on 4 and 984 DF,  p-value: < 2.2e-16

Remember, there are tools for more ‘cleanly’ extracting model info from the broom package.

glance(icon_mdl)$r.squared
[1] 0.2124559
tidy(icon_mdl) %>% dplyr::select(term, estimate)

A standardized multiple regression model

Note: the Iconicity variable is pre-standardized.

icon_mdl_z <- lm(Iconicity ~ SER_z + CorteseImag_z, data = icon)

summary(icon_mdl_z)

Call:
lm(formula = Iconicity ~ SER_z + CorteseImag_z, data = icon)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2164 -0.7381 -0.1050  0.6143  3.0554 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.12971    0.03319  34.036   <2e-16 ***
SER_z          0.57965    0.04040  14.349   <2e-16 ***
CorteseImag_z -0.33641    0.03535  -9.517   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.032 on 1088 degrees of freedom
  (1910 observations deleted due to missingness)
Multiple R-squared:  0.1658,    Adjusted R-squared:  0.1643 
F-statistic: 108.1 on 2 and 1088 DF,  p-value: < 2.2e-16

Assumptions

We start checking our assumptions by looking at residuals.

A histogram:

hist(resid(icon_mdl))

Next we look at a “QQ Plot”, which roughly plots the mean of the standardized outcome variable and the mean of standardized predicted values at various windows (quantiles). We’re looking for a straight line.

qqnorm(resid(icon_mdl))

Next comes a plot of the residuals and predicted values. We hope to not see any patterns - that is, we hope to see a largely formless cloud or blob.

plot(fitted(icon_mdl), resid(icon_mdl))

Now we’ll take a closer look at the predictor variables to see if collinearity might be a concern. We can use the vif() function from the car package. Values greater than 10 are really bad, and values greater than ~4 are concerning.

vif(icon_mdl)
        SER CorteseImag        Syst   Log10Freq 
   1.148597    1.143599    1.015054    1.020376 
LS0tDQp0aXRsZTogIk11bHRpcGxlIFJlZ3Jlc3Npb24iDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KDQojIExpYnJhcmllcyAmIERhdGENCg0KV2UnbGwgYmUgdXNpbmcgYSBjb3VwbGUgb2YgbmV3IHBhY2thZ2VzIHRvZGF5LCBzbyBpbnN0YWxsIGBjYXJgLCBgcHBjb3JgLCBhbmQgYEdHYWxseWAgaWYgeW91IGRvbid0IGhhdmUgdGhlbSBvbiB5b3VyIG1hY2hpbmUgYWxyZWFkeS4gV2UnbGwgYWxzbyBiZSByZXZpc2l0aW5nIFdpbnRlcidzIGRhdGEgb24gaWNvbmljaXR5IHRoYXQgd2Ugc2F3IGluIGFuIGVhcmxpZXIgY2hhcHRlci4NCg0KYGBge3J9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxpYnJhcnkoYnJvb20pDQpsaWJyYXJ5KGNhcikNCmxpYnJhcnkocHBjb3IpDQpsaWJyYXJ5KEdHYWxseSkNCg0KaWNvbiA8LSByZWFkX2NzdigicGVycnlfd2ludGVyXzIwMTdfaWNvbmljaXR5LmNzdiIpDQpkIDwtIHJlYWRfY3N2KCJpc2JlbGxfdW5kZXJfcmV2LmNzdiIpDQpgYGANCg0KWW91IG1pZ2h0IG5vdGljZSBzb21lIG9kZCBlcnJvciBtZXNzYWdlcyAtIHRoZSBjYXIgcGFja2FnZSByZXF1aXJlcyBzb21lIGFkZGl0aW9uYWwgcGFja2FnZXMgdGhhdCBoYXZlIGZ1bmN0aW9ucyB0aGF0IG92ZXJsYXAgbmFtZXMgb2Ygc29tZSB0aWR5dmVyc2UgcGFja2FnZXMuIFRoYXQncyBva2F5IC0gd2UgY2FuIGZvcmNlIFIgdG8gdXNlIGEgc3BlY2lmaWMgcGFja2FnZSdzIGZ1bmN0aW9uIHdoZW4gbmVlZGVkLg0KDQojIFByZWx1ZGU6IGxvb2tpbmcgYXQgcGFydGlhbCBjb3JyZWxhdGlvbnMgYW5kIHJlZ3Jlc3Npb25zDQoNCldlJ3JlIGdvaW5nIHRvIHBsYXkgd2l0aCBzb21lIGRhdGEgZnJvbSBhIHN0dWR5IEkgaGF2ZSB1bmRlciByZXZpZXcgKGFuZCB5b3VyIEhXIGxhc3Qgd2VlaykgdG8gZXhhbWluZSBob3cgcGFydGlhbCBjb3JyZWxhdGlvbnMgcmVsYXRlIHRvIG11bHRpcGxlIHJlZ3Jlc3Npb25zLg0KDQpMZXQncyBsb29rIGF0IHNvbWUgc2ltcGxlIGNvcnJlbGF0aW9ucyBiZXR3ZWVuIDMgdmFyaWFibGVzIGZpcnN0OiBTZWxmLWFzc2Vzc2VkIGNvbXByZWhlbnNpYmlsaXR5LCAobGlzdGVuZXItYXNzZXNzZWQpIGNvbXByZWhlbnNpYmlsaXR5LCBhbmQgRUlUIHNjb3Jlcy4NCg0KYGBge3J9DQpkICU+JSBkcGx5cjo6c2VsZWN0KFNBX0NvbXByZWhlbnNpYmlsaXR5LCBDb21wcmVoZW5zaWJpbGl0eSwgRUlUKSAlPiUNCiAgY29yKCkgLT4gY29ycnMNCg0Kcm91bmQoY29ycnMsIDIpDQpgYGANCkVJVCBhbmQgQ29tcHJlaGVuc2liaWxpdHkgYXJlIHJhdGhlciBzdHJvbmdseSBjb3JyZWxhdGVkICh+LjcwKSBhbmQgYm90aCBhcmUgbW9kZXJhdGVseSBjb3JyZWxhdGVkIHdpdGggU0FfQ29tcHJlaGVuc2liaWxpdHkgKH4uNTApLg0KDQpOb3cgd2UnbGwgdXNlIHRoZSBgcGNvcigpYCBmdW5jdGlvbiBmcm9tIGBwcGNvcmAgdG8gZXhhbWluZSB0aGUgcGFydGlhbCBjb3JyZWxhdGlvbnMgb2YgdGhlIDMgdmFyaWFibGVzLg0KYGBge3J9DQpkICU+JSBkcGx5cjo6c2VsZWN0KFNBX0NvbXByZWhlbnNpYmlsaXR5LCBDb21wcmVoZW5zaWJpbGl0eSwgRUlUKSAlPiUNCiAgcGNvcigpIC0+IHBhcnRpYWxzDQoNCnJvdW5kKHBhcnRpYWxzJGVzdGltYXRlLCAyKQ0KYGBgDQpZb3UnbGwgbm90aWNlIHRoYXQgdGhlc2UgcGFydGlhbCBjb3JyZWxhdGlvbiBjb2VmZmljaWVudHMgYXJlIGFsbCBzbWFsbGVyIC0gdGhleSBhY2NvdW50IGZvciB0aGUgY29ycmVsYXRpb24gYmV0d2VlbiBlYWNoIG1lbWJlciBvZiB0aGUgcGFpciBhbmQgdGhlIHRoaXJkIHZhcmlhYmxlIHByaW9yIHRvIGNhbGN1bGF0aW5nIHRoZSByZWxhdGlvbnNoaXAgYmV0d2VlbiB0aGUgcGFpci4NCg0KTm93IGZvciBhIHJlZ3Jlc3Npb24uIFdlIGNhbiBhY3R1YWxseSBzdGFuZGFyZGl6ZSB0aGUgdmFyaWFibGVzIHdpdGhpbiB0aGUgcmVncmVzc2lvbiBlcXVhdGlvbiBpdHNlbGYsIHRob3VnaCBpdCdzIGdlbmVyYWxseSBiZXR0ZXIgdG8gZm9sbG93IFdpbnRlcidzIHByYWN0aWNlIG9mIGNyZWF0aW5nIG5ldyB2YXJpYWJsZXMuDQoNCmBgYHtyfQ0KZF9tb2QgPC0gbG0oc2NhbGUoU0FfQ29tcHJlaGVuc2liaWxpdHkpIH4gc2NhbGUoQ29tcHJlaGVuc2liaWxpdHkpICsgc2NhbGUoRUlUKSwgZGF0YSA9IGQpDQoNCnRpZHkoZF9tb2QpICU+JSBkcGx5cjo6c2VsZWN0KHRlcm0sIGVzdGltYXRlKSAlPiUgbXV0YXRlKGVzdGltYXRlID0gcm91bmQoZXN0aW1hdGUsIDIpKQ0KYGBgDQoNClRoZSByZWdyZXNzaW9uIGNvZWZmaWNpZW50cyBhcmUgbm90IGV4YWN0bHkgdGhlIHNhbWUgYXMgcGFydGlhbCBjb3JyZWxhdGlvbiBlc3RpbWF0ZXMsIGJ1dCB0aGV5IGFyZSBzaW1pbGFyLiBJbiBjYWxjdWxhdGluZyBib3RoLCB0aGVyZSBpcyBhIGNvbW1vbiBudW1lcmF0b3IgYnV0IGRpZmZlcmVudCBkZW5vbWluYXRvcnMsIHdoaWNoIGxlYWRzIHRvIHNvbWUgZGlmZmVyZW5jZXMgaW4gdGhlIGZpbmFsIHZhbHVlcy4gDQoNClRoZSBwb2ludCBoZXJlIGlzIHRoYXQgbGlrZSB1bml2YXJpYXRlIHJlZ3Jlc3Npb24sIG11bHRpcGxlIHJlZ3Jlc3Npb24gcnVucyBvbiB0aGUgc2FtZSBraW5kIG9mIGxvZ2ljIGFuZCBwcmluY2lwbGVzIGFzIChwYXJ0aWFsKSBjb3JyZWxhdGlvbi4NCg0KIyBUcmFuc2Zvcm1hdGlvbnMNCg0KV2UnbGwgZG8gc29tZSB0cmFuc2Zvcm1hdGlvbnMgbmV4dDoNCg0KYGBge3J9DQppY29uIDwtIGljb24gJT4lIG11dGF0ZShMb2cxMEZyZXEgPSBsb2cxMChGcmVxKSwNCiAgICAgICAgICAgICAgICAgICAgICAgIFNFUl96ID0gc2NhbGUoU0VSKSwNCiAgICAgICAgICAgICAgICAgICAgICAgIENvcnRlc2VJbWFnX3ogPSBzY2FsZShDb3J0ZXNlSW1hZyksDQogICAgICAgICAgICAgICAgICAgICAgICBTeXN0X3ogPSBzY2FsZShTeXN0KSwNCiAgICAgICAgICAgICAgICAgICAgICAgIEZyZXFfeiA9IHNjYWxlKExvZzEwRnJlcSkpDQpgYGANCg0KIyBFeHBsb3JhdGlvbg0KDQpUaGUgYGdncGFpcnMoKWAgZnVuY3Rpb24gZnJvbSBgR0dhbGx5YCBjYW4gYmUgdmVyeSBoZWxwZnVsIGZvciBleGFtaW5pbmcgdGhlIGRpc3RyaWJ1dGlvbnMgb2YgYW5kIGNvcnJlbGF0aW9ucyBhbW9uZyBzZXZlcmFsIHZhcmlhYmxlcy4NCg0KYGBge3IgbWVzc2FnZSA9IEZ9DQppY29uICU+JSBkcGx5cjo6c2VsZWN0KEljb25pY2l0eSwgU0VSLCBDb3J0ZXNlSW1hZywgU3lzdCwgTG9nMTBGcmVxKSAlPiUNCiAgZ2dwYWlycyhheGlzTGFiZWxzID0gIm5vbmUiLCBwcm9ncmVzcyA9IEYsDQogICAgICAgICAgbG93ZXIgPSBsaXN0KGNvbnRpbnVvdXMgPSB3cmFwKCJwb2ludHMiLCBhbHBoYSA9IDAuMykpKQ0KYGBgDQoNCg0KIyBBbiB1bnN0YW5kYXJkaXplZCBtdWx0aXBsZSByZWdyZXNzaW9uIG1vZGVsDQoNCk5vdyB3ZSdsbCBzcGVjaWZ5IGFuZCBydW4gKCdmaXQnKSB0aGUgbW9kZWwgZnJvbSBDaC4gNiBiYXNlZCBvbiB1bnN0YW5kYXJkaXplZCB2YXJpYWJsZXMuDQoNCmBgYHtyfQ0KaWNvbl9tZGwgPC0gbG0oSWNvbmljaXR5IH4gU0VSICsgQ29ydGVzZUltYWcgKyBTeXN0ICsgTG9nMTBGcmVxLCBkYXRhID0gaWNvbikNCg0Kc3VtbWFyeShpY29uX21kbCkNCmBgYA0KDQpSZW1lbWJlciwgdGhlcmUgYXJlIHRvb2xzIGZvciBtb3JlICdjbGVhbmx5JyBleHRyYWN0aW5nIG1vZGVsIGluZm8gZnJvbSB0aGUgYGJyb29tYCBwYWNrYWdlLg0KDQpgYGB7cn0NCmdsYW5jZShpY29uX21kbCkkci5zcXVhcmVkDQpgYGANCmBgYHtyfQ0KdGlkeShpY29uX21kbCkgJT4lIGRwbHlyOjpzZWxlY3QodGVybSwgZXN0aW1hdGUpDQpgYGANCg0KIyBBIHN0YW5kYXJkaXplZCBtdWx0aXBsZSByZWdyZXNzaW9uIG1vZGVsDQoNCk5vdGU6IHRoZSBJY29uaWNpdHkgdmFyaWFibGUgaXMgcHJlLXN0YW5kYXJkaXplZC4NCmBgYHtyfQ0KaWNvbl9tZGxfeiA8LSBsbShJY29uaWNpdHkgfiBTRVJfeiArIENvcnRlc2VJbWFnX3osIGRhdGEgPSBpY29uKQ0KDQpzdW1tYXJ5KGljb25fbWRsX3opDQpgYGANCiMgQXNzdW1wdGlvbnMNCg0KV2Ugc3RhcnQgY2hlY2tpbmcgb3VyIGFzc3VtcHRpb25zIGJ5IGxvb2tpbmcgYXQgcmVzaWR1YWxzLiANCg0KQSBoaXN0b2dyYW06DQoNCmBgYHtyfQ0KaGlzdChyZXNpZChpY29uX21kbCkpDQpgYGANCk5leHQgd2UgbG9vayBhdCBhICJRUSBQbG90Iiwgd2hpY2ggcm91Z2hseSBwbG90cyB0aGUgbWVhbiBvZiB0aGUgc3RhbmRhcmRpemVkIG91dGNvbWUgdmFyaWFibGUgYW5kIHRoZSBtZWFuIG9mIHN0YW5kYXJkaXplZCBwcmVkaWN0ZWQgdmFsdWVzIGF0IHZhcmlvdXMgd2luZG93cyAocXVhbnRpbGVzKS4gV2UncmUgbG9va2luZyBmb3IgYSBzdHJhaWdodCBsaW5lLg0KDQpgYGB7cn0NCnFxbm9ybShyZXNpZChpY29uX21kbCkpDQpgYGANCk5leHQgY29tZXMgYSBwbG90IG9mIHRoZSByZXNpZHVhbHMgYW5kIHByZWRpY3RlZCB2YWx1ZXMuIFdlIGhvcGUgdG8gbm90IHNlZSBhbnkgcGF0dGVybnMgLSB0aGF0IGlzLCB3ZSBob3BlIHRvIHNlZSBhIGxhcmdlbHkgZm9ybWxlc3MgY2xvdWQgb3IgYmxvYi4NCg0KYGBge3J9DQpwbG90KGZpdHRlZChpY29uX21kbCksIHJlc2lkKGljb25fbWRsKSkNCmBgYA0KTm93IHdlJ2xsIHRha2UgYSBjbG9zZXIgbG9vayBhdCB0aGUgcHJlZGljdG9yIHZhcmlhYmxlcyB0byBzZWUgaWYgY29sbGluZWFyaXR5IG1pZ2h0IGJlIGEgY29uY2Vybi4gV2UgY2FuIHVzZSB0aGUgYHZpZigpYCBmdW5jdGlvbiBmcm9tIHRoZSBgY2FyYCBwYWNrYWdlLiBWYWx1ZXMgZ3JlYXRlciB0aGFuIDEwIGFyZSByZWFsbHkgYmFkLCBhbmQgdmFsdWVzIGdyZWF0ZXIgdGhhbiB+NCBhcmUgY29uY2VybmluZy4NCg0KYGBge3J9DQp2aWYoaWNvbl9tZGwpDQpgYGANCg0K