Getting ready: load packages and some data.
library(tidyverse)
-- Attaching packages -------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.4 v dplyr 1.0.7
v tidyr 1.1.3 v stringr 1.4.0
v readr 2.0.1 v forcats 0.5.1
-- Conflicts ----------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(effsize)
library(pwr)
chem <- read_csv('winter_2016_senses_valence.csv') %>%
filter(Modality %in% c('Taste', 'Smell'))
Rows: 405 Columns: 3
-- Column specification -------------------------------------------
Delimiter: ","
chr (2): Word, Modality
dbl (1): Val
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
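Before moving on, it can't hurt to check that the filter did what we wanted; a quick tally of the modalities (output not shown here) should list only Taste and Smell:
# sanity check: only Taste and Smell should remain after filtering
chem %>% count(Modality)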
We’re going to simulate a large set of data (n = 100,000) to act as a population, and then take some small samples (n = 30) from it.
set.seed(100)
x <- round(rnorm(100000, mean = 50, sd = 10), 0)
mean(x)
[1] 50.01609
sd(x)
[1] 9.995032
range(x)
[1] 5 94
hist(x)
Now to sample:
d <- tibble(df01 = sample(x, 30),
df02 = sample(x, 30),
df03 = sample(x, 30),
df04 = sample(x, 30),
df05 = sample(x, 30),
df06 = sample(x, 30),
df07 = sample(x, 30),
df08 = sample(x, 30),
df09 = sample(x, 30),
df10 = sample(x, 30))
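As an aside, typing out ten sample() calls gets tedious; the same tibble can be built programmatically with purrr (attached with the tidyverse). This is just a sketch of an alternative, not what was run above, so the actual draws will differ:
# draw 10 samples of 30 and name the columns df01 ... df10
d <- map(1:10, function(i) sample(x, 30)) %>%
  set_names(sprintf("df%02d", 1:10)) %>%
  as_tibble()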
And now the plot: each sample mean with its 95% confidence interval, with a dashed line at the population mean of 50.
d %>% pivot_longer(df01:df10, names_to = "sample", values_to = "score") %>%
  group_by(sample) %>%
  summarise(mean = mean(score),
            n = n(),
            sd = sd(score),
            se = sd/sqrt(n),
            CI = 1.96*se) %>%
  ggplot(aes(y = mean, x = sample))+
  geom_hline(yintercept = 50, linetype = 2)+
  geom_pointrange(aes(ymin = mean-CI, ymax = mean+CI),
                  color = "blue")+
  scale_y_continuous(limits = c(35, 65), breaks = c(40, 50, 60))+
  theme_bw()+
  coord_flip()
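Since we know the population mean is (roughly) 50, we can also count how many of the ten intervals capture it, something along these lines:
# how many of the 10 confidence intervals contain the population mean of 50?
d %>% pivot_longer(df01:df10, names_to = "sample", values_to = "score") %>%
  group_by(sample) %>%
  summarise(mean = mean(score),
            se = sd(score)/sqrt(n())) %>%
  mutate(captured = (mean - 1.96*se) < 50 & 50 < (mean + 1.96*se)) %>%
  summarise(n_captured = sum(captured))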
Next, let's look at sampling variability through the lens of t-tests. Here are four tests on pairs of random samples drawn from the very same distribution, so the null hypothesis is true by construction and any 'significant' result is a false positive:
set.seed(42)
t.test(rnorm(10), rnorm(10))
Welch Two Sample t-test
data: rnorm(10) and rnorm(10)
t = 1.2268, df = 13.421, p-value = 0.241
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5369389 1.9584459
sample estimates:
mean of x mean of y
0.5472968 -0.1634567
t.test(rnorm(10), rnorm(10))
Welch Two Sample t-test
data: rnorm(10) and rnorm(10)
t = 0.36582, df = 17.977, p-value = 0.7188
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.8814711 1.2531201
sample estimates:
mean of x mean of y
-0.1780795 -0.3639041
t.test(rnorm(10), rnorm(10))
Welch Two Sample t-test
data: rnorm(10) and rnorm(10)
t = -0.082117, df = 16.257, p-value = 0.9356
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.0340503 0.9568318
sample estimates:
mean of x mean of y
-0.02021535 0.01839391
t.test(rnorm(10), rnorm(10))
Welch Two Sample t-test
data: rnorm(10) and rnorm(10)
t = 2.3062, df = 17.808, p-value = 0.03335
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.06683172 1.44707262
sample estimates:
mean of x mean of y
0.5390768 -0.2178754
Notice that the fourth test came out significant (p = 0.033) even though both samples were drawn from the same distribution. Now the same four comparisons, but with a true difference in means of 1:
set.seed(42)
t.test(rnorm(10, mean = 1), rnorm(10, mean = 0))
Welch Two Sample t-test
data: rnorm(10, mean = 1) and rnorm(10, mean = 0)
t = 2.9527, df = 13.421, p-value = 0.01089
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.4630611 2.9584459
sample estimates:
mean of x mean of y
1.5472968 -0.1634567
t.test(rnorm(10, mean = 1), rnorm(10, mean = 0))
Welch Two Sample t-test
data: rnorm(10, mean = 1) and rnorm(10, mean = 0)
t = 2.3345, df = 17.977, p-value = 0.03137
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1185289 2.2531201
sample estimates:
mean of x mean of y
0.8219205 -0.3639041
t.test(rnorm(10, mean = 1), rnorm(10, mean = 0))
Welch Two Sample t-test
data: rnorm(10, mean = 1) and rnorm(10, mean = 0)
t = 2.0448, df = 16.257, p-value = 0.05742
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.03405031 1.95683178
sample estimates:
mean of x mean of y
0.97978465 0.01839391
t.test(rnorm(10, mean = 1), rnorm(10, mean = 0))
Welch Two Sample t-test
data: rnorm(10, mean = 1) and rnorm(10, mean = 0)
t = 5.3528, df = 17.808, p-value = 4.513e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.066832 2.447073
sample estimates:
mean of x mean of y
1.5390768 -0.2178754
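With the true effect present, three of the four tests reach significance but one (p = 0.057) does not; that's power at work. Rather than eyeballing four tests at a time, we can simulate many of them and tally how often p < .05. A rough sketch (the exact proportions will bounce around from run to run):
# false positive rate: both samples from the same distribution
mean(replicate(1000, t.test(rnorm(10), rnorm(10))$p.value < .05))
# power: a true difference in means of 1
mean(replicate(1000, t.test(rnorm(10, mean = 1), rnorm(10, mean = 0))$p.value < .05))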
Next, some power calculations with the pwr package: the sample sizes needed for 80% power at alpha = .05 for a correlation, a two-sample t-test, a balanced one-way ANOVA, and a multiple regression.
pwr.r.test(r = .30, sig.level = .05, power = .80, alternative = "two.sided")
approximate correlation power calculation (arctangh transformation)
n = 84.07364
r = 0.3
sig.level = 0.05
power = 0.8
alternative = two.sided
pwr.t.test(d = 1, power = .8, type = "two.sample", alternative = "two.sided")
Two-sample t test power calculation
n = 16.71472
d = 1
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
pwr.anova.test(k = 3, f = .25, power = .8)
Balanced one-way analysis of variance power calculation
k = 3
n = 52.3966
f = 0.25
sig.level = 0.05
power = 0.8
NOTE: n is number in each group
pwr.f2.test(u = 6, f2 = 0.15, power = .8) #whole model, based on R2
Multiple regression power calculation
u = 6
v = 90.30998
f2 = 0.15
sig.level = 0.05
power = 0.8
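These functions can also be run the other way around: supply the n you actually have and leave power out, and pwr solves for the power you'd achieve. For example (illustrative numbers, not from our data):
# achieved power with 20 participants per group and a medium effect of d = 0.5
pwr.t.test(n = 20, d = 0.5, sig.level = .05, type = "two.sample", alternative = "two.sided")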
Back to the chemical senses data. First, a summary of valence by modality:
chem %>% group_by(Modality) %>%
summarise(mean = mean(Val),
n = n(),
sd = sd(Val),
se = sd/sqrt(n),
CI = 1.96*se)
And the same summary piped straight into a plot of the means with their 95% confidence intervals:
chem %>% group_by(Modality) %>%
summarise(mean = mean(Val),
n = n(),
sd = sd(Val),
se = sd/sqrt(n),
CI = 1.96*se) %>%
ggplot(aes(x = Modality, y = mean))+
geom_point()+
geom_errorbar(aes(ymin = mean-CI, ymax = mean+CI))+
theme_bw()
Or we can try a different way, using stat_summary(); the "mean_cl_normal" summary function it calls needs the Hmisc package, so we load that first.
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:dplyr’:
src, summarize
The following objects are masked from ‘package:base’:
format.pval, units
chem %>% ggplot(aes(x = Modality, y = Val))+
  stat_summary(fun.data = "mean_cl_normal",
               size = 0.3, geom = "pointrange")+
  theme_bw()
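If you'd rather not lean on the normal approximation for the intervals, ggplot2 also provides a bootstrap version (again via Hmisc); a sketch with everything else kept the same:
# bootstrapped 95% CIs instead of normal-theory ones
chem %>% ggplot(aes(x = Modality, y = Val))+
  stat_summary(fun.data = "mean_cl_boot",
               size = 0.3, geom = "pointrange")+
  theme_bw()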
How big is the difference between the two modalities? Cohen's d from the effsize package:
cohen.d(Val ~ Modality, data = chem)
Cohen's d
d estimate: -1.070784 (large)
95 percent confidence interval:
lower upper
-1.5955866 -0.5459824
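To see where that number comes from, Cohen's d is just the mean difference divided by the pooled standard deviation, which we can reproduce by hand from the group summaries (a sketch; the sign depends on which group comes first, and effsize's value may differ very slightly depending on the pooling formula it uses):
# Cohen's d by hand: mean difference over pooled SD
chem %>% group_by(Modality) %>%
  summarise(m = mean(Val), s = sd(Val), n = n()) %>%
  summarise(d = diff(m) / sqrt(sum((n - 1) * s^2) / (sum(n) - 2)))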
We can also ask the equivalence question: is the difference small enough to count as negligible? TOSTER's TOSTtwo() runs the two one-sided tests (TOST) from summary statistics (means, SDs, and ns per group), so we store those in chem_summary and feed them in:
library(TOSTER)
chem_summary <- chem %>% group_by(Modality) %>%
summarise(mean = mean(Val),
n = n(),
sd = sd(Val),
se = sd/sqrt(n),
CI = 1.96*se)
TOSTtwo(m1 = chem_summary$mean[1], m2 = chem_summary$mean[2],
sd1 = chem_summary$sd[1], sd2 = chem_summary$sd[2],
n1 = chem_summary$n[1], n2 = chem_summary$n[2],
low_eqbound_d = -.4, high_eqbound_d = .4, var.equal = F)
TOST results:
t-value lower bound: -2.60 p-value lower bound: 0.994
t-value upper bound: -5.78 p-value upper bound: 0.0000003
degrees of freedom : 44.94
Equivalence bounds (Cohen's d):
low eqbound: -0.4
high eqbound: 0.4
Equivalence bounds (raw scores):
low eqbound: -0.128
high eqbound: 0.128
TOST confidence interval:
lower bound 90% CI: -0.472
upper bound 90% CI: -0.202
NHST confidence interval:
lower bound 95% CI: -0.499
upper bound 95% CI: -0.175
Equivalence Test Result:
The equivalence test was non-significant, t(44.94) = -2.600, p = 0.994, given equivalence bounds of -0.128 and 0.128 (on a raw scale) and an alpha of 0.05.
Null Hypothesis Test Result:
The null hypothesis test was significant, t(44.94) = -4.192, p = 0.000128, given an alpha of 0.05.
Based on the equivalence test and the null-hypothesis test combined, we can conclude that the observed effect is statistically different from zero and statistically not equivalent to zero.
We’ll now turn to some of the examples from Winter, Ch. 11.
library(broom)
icon <- read_csv('perry_winter_2017_iconicity.csv')
Rows: 3001 Columns: 8
-- Column specification -------------------------------------------
Delimiter: ","
chr (2): Word, POS
dbl (6): SER, CorteseImag, Conc, Syst, Freq, Iconicity
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
icon <- icon %>% mutate(SER_z = scale(SER),
CorteseImag_z = scale(CorteseImag),
Syst_z = scale(Syst),
Freq_z = scale(Freq))
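One small thing to be aware of: scale() returns a one-column matrix rather than a plain vector. That's fine for lm() below, but if it ever trips up other tidyverse functions, a little helper (z_score here is our own illustrative function, not part of any package) gives the same z-scores as ordinary numeric columns:
# z-score by hand, returning a plain numeric vector
z_score <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
icon <- icon %>% mutate(SER_z = z_score(SER),
                        CorteseImag_z = z_score(CorteseImag),
                        Syst_z = z_score(Syst),
                        Freq_z = z_score(Freq))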
Now to run a model and clean up the output…
icon_mdl_z <- lm(Iconicity ~ SER_z + CorteseImag_z +
                   Syst_z + Freq_z, data = icon)
tidy(icon_mdl_z) %>%
  mutate(p.value = base::format.pval(p.value, 4),
         estimate = round(estimate, 2),
         std.error = round(std.error, 2),
         statistic = round(statistic, 2))
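Since broom is loaded anyway, glance() gives the model-level numbers (R squared, F statistic, and so on) in the same tidy one-row format, if you want them alongside the coefficient table:
# model-level summary statistics as a one-row tibble
glance(icon_mdl_z)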
Plotting time! We'll plot each standardized coefficient with its 95% confidence interval:
mycoefs <- tidy(icon_mdl_z, conf.int = TRUE) %>%
  filter(term != '(Intercept)')
mycoefs %>% ggplot(aes(x = reorder(term, estimate), y = estimate))+
  geom_point()+
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = .2)+
  geom_hline(yintercept = 0, linetype = 2)+
  labs(x = NULL, y = 'Beta coefficient estimate')+
  coord_flip()+
  theme_minimal()