RADI 5007 – HOMEWORK SET #2 - 2014
1.Run the following analysis on the data in “Prob2_1.txt”:
• Read the data into a table and show them.
setwd("/Users/mishaque/Desktop/Statistics Course/HW")
Prob2_1<-read.table("Prob2_1.txt",header=T)
Prob2_1
## Observation Subject Task Valence Recall
## 1 1 Jim Free Neg 8
## 2 2 Jim Free Neu 9
## 3 3 Jim Free Pos 5
## 4 4 Jim Cued Neg 7
## 5 5 Jim Cued Neu 9
## 6 6 Jim Cued Pos 10
## 7 7 Victor Free Neg 12
## 8 8 Victor Free Neu 13
## 9 9 Victor Free Pos 14
## 10 10 Victor Cued Neg 16
## 11 11 Victor Cued Neu 13
## 12 12 Victor Cued Pos 14
## 13 13 Faye Free Neg 13
## 14 14 Faye Free Neu 13
## 15 15 Faye Free Pos 12
## 16 16 Faye Cued Neg 15
## 17 17 Faye Cued Neu 16
## 18 18 Faye Cued Pos 14
## 19 19 Ron Free Neg 12
## 20 20 Ron Free Neu 14
## 21 21 Ron Free Pos 15
## 22 22 Ron Cued Neg 17
## 23 23 Ron Cued Neu 18
## 24 24 Ron Cued Pos 20
## 25 25 Jason Free Neg 6
## 26 26 Jason Free Neu 7
## 27 27 Jason Free Pos 9
## 28 28 Jason Cued Neg 4
## 29 29 Jason Cued Neu 9
## 30 30 Jason Cued Pos 10
• Report the means and the number of subjects/cell.
summary(Prob2_1)
## Observation Subject Task Valence Recall
## Min. : 1.00 Faye :6 Cued:15 Neg:10 Min. : 4.0
## 1st Qu.: 8.25 Jason :6 Free:15 Neu:10 1st Qu.: 9.0
## Median :15.50 Jim :6 Pos:10 Median :12.5
## Mean :15.50 Ron :6 Mean :11.8
## 3rd Qu.:22.75 Victor:6 3rd Qu.:14.0
## Max. :30.00 Max. :20.0
aov.Prob2_1=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),Prob2_1)
summary(aov.Prob2_1)
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 4 349 87.3
##
## Error: Subject:Task
## Df Sum Sq Mean Sq F value Pr(>F)
## Task 1 30.0 30.00 7.35 0.054 .
## Residuals 4 16.3 4.08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Valence 2 9.8 4.90 1.46 0.29
## Residuals 8 26.9 3.36
##
## Error: Subject:Task:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence 2 1.4 0.70 0.29 0.76
## Residuals 8 19.3 2.41
print(model.tables(aov.Prob2_1,"means",digits=3))
## Tables of means
## Grand mean
##
## 11.8
##
## Task
## Task
## Cued Free
## 12.8 10.8
##
## Valence
## Valence
## Neg Neu Pos
## 11.0 12.1 12.3
##
## Task:Valence
## Valence
## Task Neg Neu Pos
## Cued 11.8 13.0 13.6
## Free 10.2 11.2 11.0
attach(Prob2_1)
tapply(Recall,list(Task),length)
## Cued Free
## 15 15
tapply(Recall,list(Valence),length)
## Neg Neu Pos
## 10 10 10
tapply(Recall,list(Task,Valence),length)
## Neg Neu Pos
## Cued 5 5 5
## Free 5 5 5
• Produce a graphical summary (boxplot) of the means of the 6 cells.
attach(Prob2_1)
## The following objects are masked from Prob2_1 (position 3):
##
## Observation, Recall, Subject, Task, Valence
boxplot(Recall~Task*Valence,xlab="Task.Valence",ylab="Recall",main="Recall vs Task+Valence")
2.a. Using R, determine if there are significant differences between the various groups and whether there are significant interactions between the treatments. Show your results and explain them.
setwd("/Users/mishaque/Desktop/Statistics Course/HW")
schiz<-read.table("schiz.txt", header=T)
schiz
## Dosage Therapy Behavior
## 1 absent Bmod 25
## 2 absent Bmod 22
## 3 absent Bmod 23
## 4 absent Bmod 25
## 5 absent Bmod 24
## 6 absent Pyscho 27
## 7 absent Pyscho 25
## 8 absent Pyscho 26
## 9 absent Pyscho 24
## 10 absent Pyscho 25
## 11 absent Group 20
## 12 absent Group 25
## 13 absent Group 19
## 14 absent Group 21
## 15 absent Group 18
## 16 absent Nondir 19
## 17 absent Nondir 22
## 18 absent Nondir 20
## 19 absent Nondir 17
## 20 absent Nondir 22
## 21 low Bmod 25
## 22 low Bmod 21
## 23 low Bmod 20
## 24 low Bmod 22
## 25 low Bmod 21
## 26 low Pyscho 25
## 27 low Pyscho 23
## 28 low Pyscho 19
## 29 low Pyscho 24
## 30 low Pyscho 20
## 31 low Group 22
## 32 low Group 21
## 33 low Group 25
## 34 low Group 26
## 35 low Group 22
## 36 low Nondir 23
## 37 low Nondir 27
## 38 low Nondir 23
## 39 low Nondir 26
## 40 low Nondir 28
## 41 high Bmod 22
## 42 high Bmod 30
## 43 high Bmod 26
## 44 high Bmod 28
## 45 high Bmod 20
## 46 high Pyscho 28
## 47 high Pyscho 27
## 48 high Pyscho 24
## 49 high Pyscho 21
## 50 high Pyscho 25
## 51 high Group 27
## 52 high Group 25
## 53 high Group 28
## 54 high Group 29
## 55 high Group 24
## 56 high Nondir 20
## 57 high Nondir 24
## 58 high Nondir 21
## 59 high Nondir 24
## 60 high Nondir 26
names(schiz)
## [1] "Dosage" "Therapy" "Behavior"
summary(schiz)
## Dosage Therapy Behavior
## absent:20 Bmod :15 Min. :17.0
## high :20 Group :15 1st Qu.:21.0
## low :20 Nondir:15 Median :24.0
## Pyscho:15 Mean :23.5
## 3rd Qu.:25.2
## Max. :30.0
attach(schiz)
anova(lm(Behavior~Dosage*Therapy))
## Analysis of Variance Table
##
## Response: Behavior
## Df Sum Sq Mean Sq F value Pr(>F)
## Dosage 2 66.5 33.3 5.69 0.0061 **
## Therapy 3 14.9 5.0 0.85 0.4755
## Dosage:Therapy 6 156.8 26.1 4.47 0.0011 **
## Residuals 48 280.8 5.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#There is a significant difference between the dosage groups (p =0.006) but not between the therapy groups (p=0.475). In others words, dosage significantly affects the behavior (results) but therapy does not. There is significant interaction between dosage and therapy (p=0.001).
b. Make a box & whiskers graph depicting the results. What conclusions can you draw from your analysis?
attach(schiz)
## The following objects are masked from schiz (position 3):
##
## Behavior, Dosage, Therapy
schiz$Dosage<-as.factor(schiz$Dosage)
schiz<-data.frame(schiz)
schiz$Dosage<-ordered(schiz$Dosage,levels=c("absent","low","high"))
boxplot(Behavior~Therapy*Dosage,data=schiz,par(mar=c(12,5,4,2)+0.1),main="Behavior vs Therapy.Dosage",ylab="Behavior",las=2)
mtext("Therapy.Dosage",side=1,line=7)
#The boxplot shows dosage appears to significantly affect behavior. Higher dosages generally result in higher behavioral values.
3.a. Use R to determine if there is a significant between-groups difference in the Dandruff.txt data.
setwd("/Users/mishaque/Desktop/Statistics Course/HW")
dand<-read.table("dand.txt",header=T)
dand
## Groups MeasuredValue
## 1 A 26
## 2 A 25
## 3 A 29
## 4 A 21
## 5 A 20
## 6 A 18
## 7 B 24
## 8 B 17
## 9 B 16
## 10 B 13
## 11 B 21
## 12 B 19
## 13 C 32
## 14 C 34
## 15 C 29
## 16 C 19
## 17 C 27
## 18 C 28
## 19 D 23
## 20 D 29
## 21 D 26
## 22 D 20
## 23 D 24
## 24 D 25
attach(dand)
anova(lm(MeasuredValue~Groups))
## Analysis of Variance Table
##
## Response: MeasuredValue
## Df Sum Sq Mean Sq F value Pr(>F)
## Groups 3 297 99.2 5.79 0.0051 **
## Residuals 20 343 17.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#There is a significant between-groups difference, with p=0.005107.
b. Create a vertical box and whisker graph of the Dandruff.txt data.
attach(dand)
## The following objects are masked from dand (position 3):
##
## Groups, MeasuredValue
boxplot(MeasuredValue~Groups,main="Measured Value of Dandruff vs Group", xlab="Groups",ylab="Measured Value")
4.Data on the antennae lengths of a sample of 32 woodlice were used to determine the expected frequency at each of the lengths encountered as shown in the table, below.
a. Plot the cumulative frequencies (observed and estimated) against each other and interpret what it tells you about normality of the data.
obs<-c(2,5,9,12,16,23,26,29,31,32)
est<-c(1.3,3.7,7.5,12.4,17.8,22.7,26.5,28.9,30.2,30.8)
plot(obs,est,main="Estimated vs Observed Cumulative Frequencies", ylab="Estimated Cumulative Frequency", xlab="Cumulative Observed Frequency")
#Upon visual inspection, the data appears to have a somewhat sigmoidal distribution. It does seem to be fairly normally distributed though.
b. Analyze the data more formally with the Kolomogorov-Smirnov test and explain your results.
ks.test(obs,est,"pnorm")
##
## Two-sample Kolmogorov-Smirnov test
##
## data: obs and est
## D = 0.2, p-value = 0.9945
## alternative hypothesis: two-sided
#The K-S test is a nonparametric test for the equality of probability distributions that is used here to compare two samples. The null hypothesis of the K-S statistic is that the samples are drawn from the same distribution. With a p value of 0.9945, the null hypothesis is not rejected. The observed and expected cumulative frequencies are concluded to come from the same distribution.
5.An experiment is performed and the following quantities are measured: A = 1.0 m ± 0.2 m, B = 2.0 m ± 0.2 m, C = 2.5 m/s ± 0.5 m/s, D = 0.10 s ± 0.01 s. If, for example, A+B = 3.0 m + 0.4 m, then determine the correct answers for the following expressions:
a. B – A = z
z = B-A = 2.0 m – 1.0 m = 1.0 m
dz = sqrt(dB^2 + dA^2)
= sqrt(0.2^2 + 0.2^2)
= 0.28 m
z±dz= 1.0 m ± 0.28 m
b. C × D = z
z = C x D = 2.5 m/s x 0.10 s = 0.25 m
dz = z(sqrt((dC/C)^2 + (dD/D)^2))
= 0.25 (sqrt((0.5/2.5)^2 + (0.01/0.10)^2))
= 0.056 m
z±dz = 0.25 m ± 0.056 m
c. B / D = z
z = B/D = 2.0 m / 0.10 m = 20 m/s
dz = z(sqrt((dB/B)^2 + (dD/D)^2))
= 20 (sqrt((0.2/2.0)^2 + (0.01/0.10)^2))
= 2.8 m/s
z±dz = 20 m/s ± 2.8 m/s
d.3 × A = z
z = 3A = 3(1.0 m) = 3.0 m
dz = 3 dA
= 3 (0.2 m)
= 0.6 m
z±dz = 3.0 m ± 0.6 m
e.(A×B)½ = z
z = (A)^(0.5) B^(0.5) = sqrt(1.0 m x 2.0 m) = 1.41 m
dz = z sqrt((n dA/A)^2 + (n dB/B)^2)
= 1.41 sqrt((0.5 x 0.2/1.0)^2 + (0.5 x 0.2/2.0)^2)
= 0.158 m
z±dz = 1.41 m ± 0.16 m
6.In a trial comparing a binary outcome between two groups, find the required number of patients to find an increase in the success rate from 60% to 75% with a power of 90% using a two-tailed t-test . What happens if we reduce the power requirement to 80%?
power.prop.test(p1=0.6, p2=0.75, sig.level=0.05, power=0.9)
##
## Two-sample comparison of proportions power calculation
##
## n = 202.8
## p1 = 0.6
## p2 = 0.75
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
power.prop.test(p1=0.6, p2=0.75, sig.level=0.05, power=0.8)
##
## Two-sample comparison of proportions power calculation
##
## n = 151.9
## p1 = 0.6
## p2 = 0.75
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
#Reducing the power requirement to 80% decreases the number of patients by 51 (203-152). This can save time, resources, and money at the expense of power.
7.Plot the density of the non-central t distribution for ncp=3 and df=25 and compare it with the distribution of t+3, where t has a central t distribution with df=25.
curve(dt(x-3,25),from=0,to=5, ylab="dt")
curve(dt(x,25,3),add=T,col=2)
text(3,.1, "Red: df=25, ncp=3")
text(3,.175, "Black: df=25, x+3")
8.Assume that the within-population standard deviations all equal 9. Set the Type 1 error rate at α = .05. Presume that the population means will have the following values: 17.5, 19, 25, 20.5. You intend to run 80 subjects in all, with equal n's across all 4 groups. Compute your power to reject the null hypothesis under these conditions.
m<-c(17.5,19,25,20.5)
group<-c(1,2,3,4)
data<-data.frame(group,m)
data
## group m
## 1 1 17.5
## 2 2 19.0
## 3 3 25.0
## 4 4 20.5
var(m)
## [1] 10.5
power.anova.test(groups=4,n=20,sig.level=0.05,between.var=10.5,within.var=81)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 20
## between.var = 10.5
## within.var = 81
## sig.level = 0.05
## power = 0.6155
##
## NOTE: n is number in each group
#Power = 61.6%
UsE the ISwR data for the last two problems:
9.The zelazo data are in the form of vectors, one for each of the four groups. Convert the data to a form suitable for the use of lm, and calculate the relevant test. Consider t-tests comparing selected groups or obtained by combining.
library(ISwR)
attach(zelazo)
data(zelazo)
walk<-unlist(zelazo)
group<-factor(rep(1:4,c(6,6,6,5)),labels=names(zelazo))
summary(lm(walk~group))
##
## Call:
## lm(formula = walk ~ group)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.708 -0.850 -0.350 0.638 3.625
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.125 0.619 16.36 1.2e-12 ***
## grouppassive 1.250 0.875 1.43 0.170
## groupnone 1.583 0.875 1.81 0.086 .
## groupctr.8w 2.225 0.918 2.42 0.026 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.52 on 19 degrees of freedom
## Multiple R-squared: 0.253, Adjusted R-squared: 0.135
## F-statistic: 2.14 on 3 and 19 DF, p-value: 0.129
t.test(zelazo$active,zelazo$ctr.8w)
##
## Welch Two Sample t-test
##
## data: zelazo$active and zelazo$ctr.8w
## t = -3.045, df = 8.663, p-value = 0.01453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.8879 -0.5621
## sample estimates:
## mean of x mean of y
## 10.12 12.35
t.test(zelazo$active,unlist(zelazo[-1]))
##
## Welch Two Sample t-test
##
## data: zelazo$active and unlist(zelazo[-1])
## t = -2.386, df = 9.086, p-value = 0.04058
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.22071 -0.08811
## sample estimates:
## mean of x mean of y
## 10.12 11.78
10.The ashina trial was designed to have 80% statistical power if the true treatment difference was 15% and the standard deviation of differences within a person was 20%. Comment on the sample size chosen. (The power calculation was originally done using the approximate formula. The imbalance between group sizes is due to the use of an open randomization procedure.)
ashina
## vas.active vas.plac grp
## 1 -167 -102 1
## 2 -127 -39 1
## 3 -58 32 1
## 4 -103 28 1
## 5 -35 16 1
## 6 -164 -42 1
## 7 -3 -27 1
## 8 25 -30 1
## 9 -61 -47 1
## 10 -45 8 1
## 11 -38 12 2
## 12 29 11 2
## 13 2 -9 2
## 14 -18 -1 2
## 15 -74 3 2
## 16 -72 -36 2
power.t.test(power=0.8,delta=0.3,sd=0.2)
##
## Two-sample t test power calculation
##
## n = 8.06
## delta = 0.3
## sd = 0.2
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
power.t.test(power=0.8,delta=0.3,sd=0.2, alt="one.sided")
##
## Two-sample t test power calculation
##
## n = 6.299
## delta = 0.3
## sd = 0.2
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
##
## NOTE: n is number in *each* group
#approx formula
(qnorm(.975)+qnorm(0.8))^2*2*(.2/.3)^2
## [1] 6.977
#The sample size was about 9 (8.06) for a two-sided test and 7 (6.29) for a one-sided test. With the approximate formula for a two-sided test, sample size is calculated to be about 7 (6.98). There is indeed a reduction in power due to use of an open randomization procedure that led to imbalance of group sizes.