Getting ready:
library(tidyverse)
library(broom)
senses <- read_csv("winter_2016_senses_valence.csv")
First some histograms:
senses %>% ggplot(aes(x = Val))+
geom_histogram(binwidth = .25)+
theme_bw()+
facet_wrap(~Modality, nrow = 1)
And some dotplots:
senses %>% ggplot(aes(x = Modality, y = Val, shape = Modality, color = Modality)) +
geom_point(alpha = .3)+
stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean", size= 0.3, geom = "crossbar") +
theme_bw()
`fun.y` is deprecated. Use `fun` instead.`fun.ymin` is deprecated. Use `fun.min` instead.`fun.ymax` is deprecated. Use `fun.max` instead.
Finally, some summary stats by modality:
senses %>% group_by(Modality) %>% summarise(mean = mean(Val),
sd = sd(Val),
min = min(Val),
max = max(Val))
Let’s create a smaller dataset with only Taste and Smell modalities:
chem <- filter(senses, Modality %in% c('Taste', 'Smell'))
table(chem$Modality)
Smell Taste
25 47
Now for a regression model:
chem_mdl <- lm(Val ~ Modality, data = chem)
summary(chem_mdl)
Call:
lm(formula = Val ~ Modality, data = chem)
Residuals:
Min 1Q Median 3Q Max
-0.99315 -0.20870 0.04343 0.19115 0.62788
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.47101 0.06297 86.889 < 2e-16 ***
ModalityTaste 0.33711 0.07793 4.326 4.95e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3148 on 70 degrees of freedom
Multiple R-squared: 0.2109, Adjusted R-squared: 0.1997
F-statistic: 18.71 on 1 and 70 DF, p-value: 4.951e-05
We can infer that Smell is the reference category. Let’s play with a couple different ways to do treatment coding:
chem <- chem %>% mutate(Mod01 = if_else(Modality == "Taste", 1, 0),
Mod10 = if_else(Modality == "Smell", 1, 0),
Modality = factor(Modality),
ModRe = relevel(Modality, ref = 'Taste'))
And now for some models…
chem_mdl01 <- lm(Val ~ Mod01, data = chem)
chem_mdl10 <- lm(Val ~ Mod10, data = chem)
chem_mdlFac <- lm(Val ~ Modality, data = chem)
chem_mdlRe <- lm(Val ~ ModRe, data = chem)
Take a peek at the summaries and coefficient values…
#fit summary
glance(chem_mdl01)$r.squared
[1] 0.2109257
glance(chem_mdl10)$r.squared
[1] 0.2109257
glance(chem_mdlFac)$r.squared
[1] 0.2109257
glance(chem_mdlRe)$r.squared
[1] 0.2109257
#coefficients
tidy(chem_mdl01)$estimate
[1] 5.4710116 0.3371123
tidy(chem_mdl10)$estimate
[1] 5.8081239 -0.3371123
tidy(chem_mdlFac)$estimate
[1] 5.4710116 0.3371123
tidy(chem_mdlRe)$estimate
[1] 5.8081239 -0.3371123
Now let’s try Modality with sum contrasts:
chem <- mutate(chem, ModSum = Modality)
contrasts(chem$ModSum) <- contr.sum(2)
chem_mdlSum <- lm(Val ~ ModSum, data = chem)
summary(chem_mdlSum)
Call:
lm(formula = Val ~ ModSum, data = chem)
Residuals:
Min 1Q Median 3Q Max
-0.99315 -0.20870 0.04343 0.19115 0.62788
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.63957 0.03897 144.729 < 2e-16 ***
ModSum1 -0.16856 0.03897 -4.326 4.95e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3148 on 70 degrees of freedom
Multiple R-squared: 0.2109, Adjusted R-squared: 0.1997
F-statistic: 18.71 on 1 and 70 DF, p-value: 4.951e-05
Note that R^2, F-stat, etc. are the same even though intercept and slope differ.
t.test(Val ~ Modality, data = chem, var.equal = T)
Two Sample t-test
data: Val by Modality
t = -4.3257, df = 70, p-value = 4.951e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4925442 -0.1816804
sample estimates:
mean in group Smell mean in group Taste
5.471012 5.808124
Same p-value as regression model!
Now we’ll take a look at the whole dataset, which has 5 categories for sense modalities.
sense_all <- lm(Val ~ Modality, data = senses)
summary(sense_all)
Call:
lm(formula = Val ~ Modality, data = senses)
Residuals:
Min 1Q Median 3Q Max
-0.99315 -0.16482 -0.02158 0.15920 1.15734
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.57966 0.01889 295.308 < 2e-16 ***
ModalitySmell -0.10865 0.05643 -1.925 0.0549 .
ModalitySound -0.17447 0.03758 -4.643 4.66e-06 ***
ModalityTaste 0.22846 0.04314 5.296 1.96e-07 ***
ModalityTouch -0.04523 0.03737 -1.210 0.2269
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2659 on 400 degrees of freedom
Multiple R-squared: 0.1455, Adjusted R-squared: 0.137
F-statistic: 17.03 on 4 and 400 DF, p-value: 6.616e-13
Now for the ANOVA approach:
aov_senses <- aov(Val ~ Modality, data = senses)
summary(aov_senses)
Df Sum Sq Mean Sq F value Pr(>F)
Modality 4 4.815 1.2036 17.03 6.62e-13 ***
Residuals 400 28.274 0.0707
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Same model fit - F-stat and p.val are identical.
The emmeans package is helpful for extracting model contrasts of interest when working with categorical variables.
library(emmeans)
We’ll first use the emmeans() function to get the model-based estimates for each category, then use that information to make pairwise comparisons among the categories.
sense_all_emm <- emmeans(sense_all, "Modality")
sense_all_emm
Modality emmean SE df lower.CL upper.CL
Sight 5.58 0.0189 400 5.54 5.62
Smell 5.47 0.0532 400 5.37 5.58
Sound 5.41 0.0325 400 5.34 5.47
Taste 5.81 0.0388 400 5.73 5.88
Touch 5.53 0.0322 400 5.47 5.60
Confidence level used: 0.95
And then…
pairs(sense_all_emm)
contrast estimate SE df t.ratio p.value
Sight - Smell 0.1087 0.0564 400 1.925 0.3055
Sight - Sound 0.1745 0.0376 400 4.643 <.0001
Sight - Taste -0.2285 0.0431 400 -5.296 <.0001
Sight - Touch 0.0452 0.0374 400 1.210 0.7454
Smell - Sound 0.0658 0.0623 400 1.056 0.8287
Smell - Taste -0.3371 0.0658 400 -5.122 <.0001
Smell - Touch -0.0634 0.0622 400 -1.020 0.8461
Sound - Taste -0.4029 0.0506 400 -7.965 <.0001
Sound - Touch -0.1292 0.0458 400 -2.824 0.0397
Taste - Touch 0.2737 0.0504 400 5.427 <.0001
P value adjustment: tukey method for comparing a family of 5 estimates
You can do the same thing with the ANOVA model, too!