Alejandro de la Vega
10/8/2013
Last time we covered:
t.test(french_fries$potato, french_fries$grassy)
Welch Two Sample t-test
data: french_fries$potato and french_fries$grassy
t = 43.4, df = 879, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.004 6.573
sample estimates:
mean of x mean of y
6.9525 0.6642
t.test(extra~group, data=sleep)
Welch Two Sample t-test
data: extra by group
t = -1.861, df = 17.78, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3655 0.2055
sample estimates:
mean in group 1 mean in group 2
0.75 2.33
t.test(extra~group, data=sleep, paired=TRUE)
Paired t-test
data: extra by group
t = -4.062, df = 9, p-value = 0.002833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4599 -0.7001
sample estimates:
mean of the differences
-1.58
cor.test(french_fries$potato, french_fries$grassy)
Pearson's product-moment correlation
data: french_fries$potato and french_fries$grassy
t = 3.434, df = 693, p-value = 0.0006295
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.05553 0.20179
sample estimates:
cor
0.1294
cor.test(~potato + grassy, data=french_fries)
Pearson's product-moment correlation
data: potato and grassy
t = 3.434, df = 693, p-value = 0.0006295
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.05553 0.20179
sample estimates:
cor
0.1294
x <- cor.test(~potato + grassy, data=french_fries)
str(x)
List of 9
$ statistic : Named num 3.43
..- attr(*, "names")= chr "t"
$ parameter : Named num 693
..- attr(*, "names")= chr "df"
$ p.value : num 0.000629
$ estimate : Named num 0.129
..- attr(*, "names")= chr "cor"
$ null.value : Named num 0
..- attr(*, "names")= chr "correlation"
$ alternative: chr "two.sided"
$ method : chr "Pearson's product-moment correlation"
$ data.name : chr "potato and grassy"
$ conf.int : atomic [1:2] 0.0555 0.2018
..- attr(*, "conf.level")= num 0.95
- attr(*, "class")= chr "htest"
x$statistic
t
3.434
x$p.value
[1] 0.0006295
x$estimate
cor
0.1294
rcorr(as.matrix(fries))
grassy potato buttery
grassy 1.00 0.13 0.10
potato 0.13 1.00 0.41
buttery 0.10 0.41 1.00
n
grassy potato buttery
grassy 696 695 692
potato 695 696 692
buttery 692 692 696
P
grassy potato buttery
grassy 0.0006 0.0092
potato 0.0006 0.0000
buttery 0.0092 0.0000
m <- lm(choice~condition, data=trustData)
m
Call:
lm(formula = choice ~ condition, data = trustData)
Coefficients:
(Intercept) conditionT conditionU
0.7061 -0.0249 -0.3137
levels(trustData$condition)
[1] "N" "T" "U"
summary(m)
Call:
lm(formula = choice ~ condition, data = trustData)
Residuals:
Min 1Q Median 3Q Max
-0.706 -0.681 0.294 0.319 0.608
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7061 0.0298 23.72 < 2e-16 ***
conditionT -0.0249 0.0474 -0.53 0.6
conditionU -0.3137 0.0603 -5.20 2.9e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.466 on 481 degrees of freedom
Multiple R-squared: 0.0557, Adjusted R-squared: 0.0518
F-statistic: 14.2 on 2 and 481 DF, p-value: 1.02e-06
summary(glm(choice~condition, family=binomial, data=trustData))
Call:
glm(formula = choice ~ condition, family = binomial, data = trustData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.565 -1.512 0.834 0.876 1.368
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.877 0.140 6.25 4.1e-10 ***
conditionT -0.117 0.220 -0.53 0.59
conditionU -1.314 0.270 -4.87 1.1e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 628.69 on 483 degrees of freedom
Residual deviance: 602.86 on 481 degrees of freedom
AIC: 608.9
Number of Fisher Scoring iterations: 4
sum_m$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8766 0.1402 6.251 4.089e-10
conditionT -0.1171 0.2201 -0.532 5.947e-01
conditionU -1.3138 0.2697 -4.871 1.112e-06
sum_m$aic
[1] 608.9
Sort of like transform
Problem: Want to code linearly Untrust > Neutral > Trust
sub condition value delay choice
415 1 U 30 14 1
114 3 N 22 21 1
363 9 T 14 14 1
475 10 U 18 14 0
448 6 U 14 90 0
405 10 T 22 42 1
trustData$linear <- recode(trustData$condition,
"'N' = 0; 'U' = -1; 'T' = 1", as.factor.result=F)
sub condition value delay choice linear
415 1 U 30 14 1 -1
114 3 N 22 21 1 0
363 9 T 14 14 1 1
475 10 U 18 14 0 -1
448 6 U 14 90 0 -1
405 10 T 22 42 1 1
summary(lm(choice~linear, data=trustData))
Call:
lm(formula = choice ~ linear, data = trustData)
Residuals:
Min 1Q Median 3Q Max
-0.741 -0.628 0.259 0.372 0.486
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6277 0.0221 28.38 < 2e-16 ***
linear 0.1136 0.0315 3.61 0.00034 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.473 on 482 degrees of freedom
Multiple R-squared: 0.0263, Adjusted R-squared: 0.0243
F-statistic: 13 on 1 and 482 DF, p-value: 0.000338
ddply(trustData,.(linear), summarize, mean = mean(choice))
linear mean
1 -1 0.3924
2 0 0.7061
3 1 0.6813
trustData$UvO <- recode(trustData$condition,
"'N' = 1; 'U' = -2; 'T' = 1", as.factor.result=F)
summary(lm(choice~UvO, data=trustData))
Call:
lm(formula = choice ~ UvO, data = trustData)
Residuals:
Min 1Q Median 3Q Max
-0.696 -0.696 0.304 0.304 0.608
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5950 0.0233 25.54 < 2e-16 ***
UvO 0.1013 0.0191 5.31 1.7e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.466 on 482 degrees of freedom
Multiple R-squared: 0.0552, Adjusted R-squared: 0.0532
F-statistic: 28.2 on 1 and 482 DF, p-value: 1.7e-07
summary(aov(choice ~ linear + Error(sub/linear), data=trustData))
Error: sub
Df Sum Sq Mean Sq
linear 1 2.27 2.27
Error: sub:linear
Df Sum Sq Mean Sq
linear 1 6.49 6.49
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
linear 1 0.9 0.919 4.37 0.037 *
Residuals 480 100.9 0.210
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lme(choice ~ linear, random = ~ 1 | sub, trustData))
Linear mixed-effects model fit by REML
Data: trustData
AIC BIC logLik
648.8 665.5 -320.4
Random effects:
Formula: ~1 | sub
(Intercept) Residual
StdDev: 0.1339 0.4579
Fixed effects: choice ~ linear
Value Std.Error DF t-value p-value
(Intercept) 0.6168 0.04757 473 12.968 0
linear 0.1830 0.03609 473 5.071 0
Correlation:
(Intr)
linear -0.125
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.8012 -1.1688 0.5713 0.7159 1.5473
Number of Observations: 484
Number of Groups: 10
library(lme4)
summary(lmer(choice ~ linear + (1 | sub), family = binomial, trustData))
Generalized linear mixed model fit by the Laplace approximation
Formula: choice ~ linear + (1 | sub)
Data: trustData
AIC BIC logLik deviance
609 621 -301 603
Random effects:
Groups Name Variance Std.Dev.
sub (Intercept) 0.312 0.559
Number of obs: 484, groups: sub, 10
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.520 0.204 2.55 0.011 *
linear 0.781 0.166 4.70 2.6e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
linear -0.113
plot(mtcars$mpg, mtcars$qsec)
library(ggplot2)
plot1 <- ggplot(mtcars, aes(mpg, qsec))
plot1 + geom_point()
Scatterplot
Defaults:
plot1 + geom_line()
ggplot(mtcars, aes(mpg, qsec)) + geom_smooth(method=lm)
ggplot(mtcars, aes(mpg, qsec)) +
geom_smooth(method = lm, se = F, size = 2) +
geom_point(size = 5)
ggplot(mtcars, aes(mpg, qsec, color=factor(cyl))) +
geom_point(size = 5)
ggplot(mtcars, aes(cyl, qsec)) + stat_summary(fun.y = mean, geom = "bar")
ggplot(mtcars, aes(cyl, qsec, fill = factor(gear))) + stat_summary(fun.y = mean, geom = "bar", position = "dodge")
ggplot(mtcars, aes(cyl, qsec, fill=factor(cyl))) +
stat_summary(fun.y = mean, geom = "bar") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.3)
ggplot(mtcars, aes(cyl, qsec, color = factor(cyl))) +
stat_summary(fun.y = mean, geom = "point", size = 10) +
stat_summary(fun.data = mean_cl_boot, geom = "pointrange", size = 3)
ggplot(ToothGrowth, aes(factor(dose), len, color = supp, group=supp)) +
stat_summary(fun.y = mean, geom="point", size=5) +
stat_summary(fun.y=mean, geom="line", size=3)
ggplot(mtcars, aes(cyl, qsec, color = factor(cyl))) +
stat_summary(fun.y = mean, geom = "point", size = 10) +
geom_violin(alpha=0.2, fill = "black")
ggplot(mtcars, aes(qsec)) + geom_histogram()
ggplot(mtcars, aes(qsec, fill = factor(cyl))) +
geom_density(alpha = 0.5)
p2 + theme_bw() + xlab("Quarter Second Time") + theme(legend.position="none",panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.title.x = element_text(size=15,vjust=.3))
p2 = ggplot(mtcars, aes(mpg, qsec, color=factor(cyl))) + geom_point(size =4) + geom_smooth(method=lm, se=F, size=2)
p2
p2 + facet_grid(. ~ cyl)
ggplot(mtcars, aes(mpg, qsec, size=wt, color=wt)) + geom_point()