Michael Crawford
June 4th, 2015
x <- c(23, 33, 14, 15, 42, 28, 33, 45, 23, 34, 39,
21, 36, 23, 34, 36, 25, 9, 11, 19, 35, 24,
31, 29, 16, 23, 34, 24, 38, 15, 13, 35, 28)
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.00 21.00 28.00 26.91 34.00 45.00
mean(x)
[1] 26.90909
sd(x)
[1] 9.494616
quantile(x,c(.1,.9),type = 1)
10% 90%
14 38
hist(x)
library(TeachingDemos)
z.test(x,mu = mean(x),sd = sd(x),conf.level = 0.9)
One Sample z-test
data: x
z = 0, n = 33.0000, Std. Dev. = 9.4946, Std. Dev. of the sample
mean = 1.6528, p-value = 1
alternative hypothesis: true mean is not equal to 26.90909
90 percent confidence interval:
24.19048 29.62771
sample estimates:
mean of x
26.90909
Note we had to load the 'TeachingDemos' library because r does not have z-tests by default since normally t-tests are used
z.test(x,mu = mean(x),sd = sd(x), conf.level = 0.9, alternative = 'greater')
One Sample z-test
data: x
z = 0, n = 33.0000, Std. Dev. = 9.4946, Std. Dev. of the sample
mean = 1.6528, p-value = 0.5
alternative hypothesis: true mean is greater than 26.90909
90 percent confidence interval:
24.79094 Inf
sample estimates:
mean of x
26.90909
All you have to do is change 'greater' to 'less' to get the other side of the interval
z.test(x,mu = mean(x),sd = sd(x), conf.level = 0.9, alternative = 'less')
RISC_I <- c(140,120,176,288,992,144,2736,2796,752,17720,96)
t.test(RISC_I,mu = mean(RISC_I),conf.level = 0.9)
One Sample t-test
data: RISC_I
t = 0, df = 10, p-value = 1
alternative hypothesis: true mean is not equal to 2360
90 percent confidence interval:
-478.3372 5198.3372
sample estimates:
mean of x
2360
Z8002 <- c(126,180,141,374,1091,302,1368,1398,602,17720,240)
t.test(RISC_I,Z8002,conf.level=0.9,paired = TRUE)
Paired t-test
data: RISC_I and Z8002
t = 1.2531, df = 10, p-value = 0.2387
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
-98.11371 537.75007
sample estimates:
mean of the differences
219.8182
Note that in this case the p-value is the alpha with which we can have confidence. So this is saying we can only be sure with a confidence of 76% (i.e. at a confidence of 0.76 the interval will no longer pass through 0)
Data Frames (along with vectors) are one of the fundamental data structures in R. Think of it as an excel worksheet where you have rows and columns.
DiskIOs <- c(14,16,27,42,39,50,83)
CPUTime <- c(2,5,7,9,10,13,20)
data <- data.frame(DiskIOs,CPUTime)
data
DiskIOs CPUTime
1 14 2
2 16 5
3 27 7
4 42 9
5 39 10
6 50 13
7 83 20
One of the nice things about them is you can do operations on columns, rows or even the entire data frame.
log10(data$DiskIOs)
[1] 1.146128 1.204120 1.431364 1.623249 1.591065 1.698970 1.919078
log10(data)
DiskIOs CPUTime
1 1.146128 0.3010300
2 1.204120 0.6989700
3 1.431364 0.8450980
4 1.623249 0.9542425
5 1.591065 1.0000000
6 1.698970 1.1139434
7 1.919078 1.3010300
fit <- lm(DiskIOs ~ CPUTime, data)
fit$coefficients
(Intercept) CPUTime
1.137500 3.985417
confint(fit,level=0.9)
5 % 95 %
(Intercept) -5.556274 7.831274
CPUTime 3.369950 4.600884
predict(fit,data.frame(CPUTime=c(40)),
interval='prediction',level=0.9)
fit lwr upr
1 160.5542 139.5045 181.6038
predict(fit,data.frame(CPUTime=c(40)),
interval='confidence',level=0.9)
fit lwr upr
1 160.5542 141.4449 179.6634
summary(fit)
Call:
lm(formula = DiskIOs ~ CPUTime, data = data)
Residuals:
1 2 3 4 5 6 7
4.892 -5.065 -2.035 4.994 -1.992 -2.948 2.154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1375 3.3219 0.342 0.746
CPUTime 3.9854 0.3054 13.048 4.72e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.381 on 5 degrees of freedom
Multiple R-squared: 0.9715, Adjusted R-squared: 0.9658
F-statistic: 170.3 on 1 and 5 DF, p-value: 4.716e-05
plot(DiskIOs ~ CPUTime,data=data)
abline(fit)
plot(data$CPUTime,fit$residuals, xlab="CPU Time", ylab="Residuals")
data <- read.csv("table20_1.csv")
data
Processor Size
1 R 144
2 R 120
3 R 176
4 R 288
5 R 144
6 V 101
7 V 144
8 V 211
9 V 288
10 V 72
11 Z 130
12 Z 180
13 Z 141
14 Z 374
15 Z 302
data.anova <- aov(Size~Processor,data)
summary(data.anova)
Df Sum Sq Mean Sq F value Pr(>F)
Processor 2 10992 5496 0.699 0.516
Residuals 12 94365 7864
plot(TukeyHSD(data.anova,'Processor',conf.level=.9))
library(agricolae)
HSD.test(data.anova,"Processor",alpha=0.1,
console=TRUE)
Study: data.anova ~ "Processor"
HSD Test for Size
Mean Square Error: 7863.767
Processor, means
Size std r Min Max
R 174.4 66.54923 5 120 288
V 163.2 87.19920 5 72 288
Z 225.4 107.51186 5 130 374
alpha: 0.1 ; Df Error: 12
Critical Value of Studentized Range: 3.203947
Honestly Significant Difference: 127.062
Means with the same letter are not significantly different.
Groups, Treatments and means
a Z 225.4
a R 174.4
a V 163.2
data <- read.csv("table18_11.csv")
data
Processor Workload Time
1 A I 41.16
2 A I 39.02
3 A I 42.56
4 A J 51.50
5 A J 52.50
6 A J 50.50
7 B I 63.17
8 B I 59.25
9 B I 64.23
10 B J 48.08
11 B J 48.98
12 B J 47.10
data.anova <- aov(Time~Processor*Workload,data)
summary(data.anova)
Df Sum Sq Mean Sq F value Pr(>F)
Processor 1 239.1 239.1 80.086 1.93e-05 ***
Workload 1 9.6 9.6 3.213 0.111
Processor:Workload 1 459.4 459.4 153.853 1.67e-06 ***
Residuals 8 23.9 3.0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(TukeyHSD(data.anova,'Workload',
conf.level=.9))
HSD.test(data.anova,"Workload",alpha=0.1,
console=TRUE)
Study: data.anova ~ "Workload"
HSD Test for Time
Mean Square Error: 2.9861
Workload, means
Time std r Min Max
I 51.56500 11.839498 6 39.02 64.23
J 49.77667 2.077861 6 47.10 52.50
alpha: 0.1 ; Df Error: 8
Critical Value of Studentized Range: 2.629798
Honestly Significant Difference: 1.855235
Means with the same letter are not significantly different.
Groups, Treatments and means
a I 51.56
a J 49.78
HSD.test(data.anova,"Processor",alpha=0.1,
console=TRUE)
Study: data.anova ~ "Processor"
HSD Test for Time
Mean Square Error: 2.9861
Processor, means
Time std r Min Max
A 46.20667 5.940931 6 39.02 52.50
B 55.13500 7.955272 6 47.10 64.23
alpha: 0.1 ; Df Error: 8
Critical Value of Studentized Range: 2.629798
Honestly Significant Difference: 1.855235
Means with the same letter are not significantly different.
Groups, Treatments and means
a B 55.14
b A 46.21
plot(TukeyHSD(data.anova,"Processor:Workload",
conf.level=.9))
tx <- with(data,interaction(Processor,Workload))
HSD.test(aov(Time~tx,data),"tx",alpha=0.1,
console=TRUE)
Study: aov(Time ~ tx, data) ~ "tx"
HSD Test for Time
Mean Square Error: 2.9861
tx, means
Time std r Min Max
A.I 40.91333 1.7828442 3 39.02 42.56
A.J 51.50000 1.0000000 3 50.50 52.50
B.I 62.21667 2.6233058 3 59.25 64.23
B.J 48.05333 0.9402836 3 47.10 48.98
alpha: 0.1 ; Df Error: 8
Critical Value of Studentized Range: 3.834156
Honestly Significant Difference: 3.825263
Means with the same letter are not significantly different.
Groups, Treatments and means
a B.I 62.22
b A.J 51.5
b B.J 48.05
c A.I 40.91
?sum
example(sum)
install.packages("agricolae")
Due Date: Thursday June 11th, 2015