Run the following analysis on the data in Prob2_1.txt:
1. Read the data into a table and show them.
2. Report the means and the number of subjects/cell.
3. Produce a graphical summary (boxplot) of the means of the 6 cells.
Data=read.table("C:/Users/Rachel/Documents/R/Prob2_1.txt",header=T)
Data
## Observation Subject Task Valence Recall
## 1 1 Jim Free Neg 8
## 2 2 Jim Free Neu 9
## 3 3 Jim Free Pos 5
## 4 4 Jim Cued Neg 7
## 5 5 Jim Cued Neu 9
## 6 6 Jim Cued Pos 10
## 7 7 Victor Free Neg 12
## 8 8 Victor Free Neu 13
## 9 9 Victor Free Pos 14
## 10 10 Victor Cued Neg 16
## 11 11 Victor Cued Neu 13
## 12 12 Victor Cued Pos 14
## 13 13 Faye Free Neg 13
## 14 14 Faye Free Neu 13
## 15 15 Faye Free Pos 12
## 16 16 Faye Cued Neg 15
## 17 17 Faye Cued Neu 16
## 18 18 Faye Cued Pos 14
## 19 19 Ron Free Neg 12
## 20 20 Ron Free Neu 14
## 21 21 Ron Free Pos 15
## 22 22 Ron Cued Neg 17
## 23 23 Ron Cued Neu 18
## 24 24 Ron Cued Pos 20
## 25 25 Jason Free Neg 6
## 26 26 Jason Free Neu 7
## 27 27 Jason Free Pos 9
## 28 28 Jason Cued Neg 4
## 29 29 Jason Cued Neu 9
## 30 30 Jason Cued Pos 10
aov.data=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),Data)
summary(aov.data)
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 4 349 87.3
##
## Error: Subject:Task
## Df Sum Sq Mean Sq F value Pr(>F)
## Task 1 30.0 30.00 7.35 0.054 .
## Residuals 4 16.3 4.08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Valence 2 9.8 4.90 1.46 0.29
## Residuals 8 26.9 3.36
##
## Error: Subject:Task:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence 2 1.4 0.70 0.29 0.76
## Residuals 8 19.3 2.41
print(model.tables(aov.data,"means"))
## Tables of means
## Grand mean
##
## 11.8
##
## Task
## Task
## Cued Free
## 12.8 10.8
##
## Valence
## Valence
## Neg Neu Pos
## 11.0 12.1 12.3
##
## Task:Valence
## Valence
## Task Neg Neu Pos
## Cued 11.8 13.0 13.6
## Free 10.2 11.2 11.0
boxplot(Recall~Task*Valence,data=Data)
Download the text file labeled, schizophrenic.txt. In it you will find data from a clinical psychologist's study of the effect of a new drug combined with therapy on schizophrenic patients' behavior. The drug has three dosages given (absent, low, high) and the therapy has four types: behavior modification (BMod); psychodynamic (Psycho); group counseling (Group), and nondirective counseling (Nondir).
a. Using R, determine if there are significant differences between the various groups and whether there are significant interactions between the treatments. Show your results and explain them.
b. Make a box & whiskers graph depicting the results. What conclusions can you draw from your analysis?
Data=read.table("C:/Users/Rachel/Documents/R/schizophrenic.txt",header=T)
Data
## Dosage Therapy Behavior
## 1 absent BMod 25
## 2 absent BMod 22
## 3 absent BMod 23
## 4 absent BMod 25
## 5 absent BMod 24
## 6 absent Psycho 27
## 7 absent Psycho 25
## 8 absent Psycho 26
## 9 absent Psycho 24
## 10 absent Psycho 25
## 11 absent Group 20
## 12 absent Group 25
## 13 absent Group 19
## 14 absent Group 21
## 15 absent Group 18
## 16 absent Nondir 19
## 17 absent Nondir 22
## 18 absent Nondir 20
## 19 absent Nondir 17
## 20 absent Nondir 22
## 21 low BMod 25
## 22 low BMod 21
## 23 low BMod 20
## 24 low BMod 22
## 25 low BMod 21
## 26 low Psycho 25
## 27 low Psycho 23
## 28 low Psycho 19
## 29 low Psycho 24
## 30 low Psycho 20
## 31 low Group 22
## 32 low Group 21
## 33 low Group 25
## 34 low Group 26
## 35 low Group 22
## 36 low Nondir 23
## 37 low Nondir 27
## 38 low Nondir 23
## 39 low Nondir 26
## 40 low Nondir 28
## 41 high BMod 22
## 42 high BMod 30
## 43 high BMod 26
## 44 high BMod 28
## 45 high BMod 20
## 46 high Psycho 28
## 47 high Psycho 27
## 48 high Psycho 24
## 49 high Psycho 21
## 50 high Psycho 25
## 51 high Group 27
## 52 high Group 25
## 53 high Group 28
## 54 high Group 29
## 55 high Group 24
## 56 high Nondir 20
## 57 high Nondir 24
## 58 high Nondir 21
## 59 high Nondir 24
## 60 high Nondir 26
aov.data=aov(Behavior~Therapy*Dosage,Data)
summary(aov.data)
## Df Sum Sq Mean Sq F value Pr(>F)
## Therapy 3 14.9 5.0 0.85 0.4755
## Dosage 2 66.5 33.3 5.69 0.0061 **
## Therapy:Dosage 6 156.8 26.1 4.47 0.0011 **
## Residuals 48 280.8 5.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(Behavior~Therapy*Dosage,Data)
"As dose increases, so does behavior. In the absent group, the behavior is low and in the high dose group, the behavior ranks higher."
## [1] "As dose increases, so does behavior. In the absent group, the behavior is low and in the high dose group, the behavior ranks higher."
Download the data in the file named “Dandruff.txt”. These data were collected by a research group investigating the anti-dandruff power of four new shampoo formulas. The measured, dependent variable value is a dandruff measure where the lower the number indicates that less dandruff was measured. The independent variable (Group) codes the four different shampoo formulas using letters.
a. Use R to determine if there is a significant between-groups difference in the Dandruff.txt data.
b. Create a vertical box and whisker graph of the Dandruff.txt data.
data<-read.table("C:/Users/Rachel/Documents/R/Dandruff.txt",header=T)
data
## Groups MeasuredValue
## 1 A 26
## 2 A 25
## 3 A 29
## 4 A 21
## 5 A 20
## 6 A 18
## 7 B 24
## 8 B 17
## 9 B 16
## 10 B 13
## 11 B 21
## 12 B 19
## 13 C 32
## 14 C 34
## 15 C 29
## 16 C 19
## 17 C 27
## 18 C 28
## 19 D 23
## 20 D 29
## 21 D 26
## 22 D 20
## 23 D 24
## 24 D 25
aov.data<-aov(MeasuredValue~Groups,data)
summary(aov.data)
## Df Sum Sq Mean Sq F value Pr(>F)
## Groups 3 297 99.2 5.79 0.0051 **
## Residuals 20 343 17.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(MeasuredValue~Groups,data)
Data on the antennae lengths of a sample of 32 woodlice were used to determine the expected frequency at each of the lengths encountered as shown in the table, below.
a. Plot the cumulative frequencies (observed and estimated) against each other and interpret what it tells you about normality of the data.
b. Analyze the data more formally with the Kolomogorov-Smirnov test and explain your results.
ObservedCF<-c(2,5,9,12,16,23,26,29,31,32)
EstimatedCF<-c(1.3,3.7,7.5,12.4,17.8,22.7,26.5,28.9,30.2,30.8)
plot(ObservedCF,EstimatedCF)
Since the data is approximately a straight line, the distribution seems to be a normal distribution.
ks.test(ObservedCF,EstimatedCF)
##
## Two-sample Kolmogorov-Smirnov test
##
## data: ObservedCF and EstimatedCF
## D = 0.2, p-value = 0.9945
## alternative hypothesis: two-sided
Since P>0.05, the data is considered normally distributed.
An experiment is performed and the following quantities are measured: A = 1.0 m ± 0.2 m, B = 2.0 m ± 0.2 m, C = 2.5 m/s ± 0.5 m/s, D = 0.10 s ± 0.01 s. If, for example, A+B = 3.0 m + 0.4 m, then determine the correct answers for the following expressions:
a. B - A =
A=1
dA=0.2
B=2
dB=0.2
(B-A) + sqrt((dA^2)+(dB^2))
## [1] 1.283
(B-A) - sqrt((dA^2)+(dB^2))
## [1] 0.7172
b. C × D =
C=2.5
dC=0.5
dD=0.01
D=0.1
(c*D) + sqrt((0.01/0.1)^2+(0.5/2.5)^2)
## Error: non-numeric argument to binary operator
(C*D) - sqrt((0.01/0.1)^2+(0.5/2.5)^2)
## [1] 0.02639
c. B / D =
(B/D) + sqrt((0.01/0.1)^2+(0.2/2)^2)
## [1] 20.14
(B/D) - sqrt((0.01/0.1)^2+(0.2/2)^2)
## [1] 19.86
d. 3 × A =
(3*A) + (3*dA)
## [1] 3.6
(3*A) - (3*dA)
## [1] 2.4
e. (A × B)½ =
sqrt(A*B) + (0.5)*((A*B)^(-.5))*sqrt((dA/A)^2+(dB/B)^2)
## [1] 1.493
sqrt(A*B) - (0.5)*((A*B)^(-.5))*sqrt((dA/A)^2+(dB/B)^2)
## [1] 1.335
In a trial comparing a binary outcome between two groups, find the required number of patients to find an increase in the success rate from 60% to 75% with a power of 90% using a two-tailed t-test . What happens if we reduce the power requirement to 80%?
power.prop.test(power=.9,p1=.6,p2=.75)
##
## Two-sample comparison of proportions power calculation
##
## n = 202.8
## p1 = 0.6
## p2 = 0.75
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
power.prop.test(power=.8,p1=.6,p2=.75)
##
## Two-sample comparison of proportions power calculation
##
## n = 151.9
## p1 = 0.6
## p2 = 0.75
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
With a power of 90%, the number of patients needed is 203. For a power of 80%, the number of patients decreases to 152.
Plot the density of the non-central t distribution for ncp=3 and df=25 and compare it with the distribution of t+3, where t has a central t distribution with df=25.
curve(dt(x-3,25),from=0,to=5)
curve(dt(x,25,3),add=TRUE)
Assume that the within-population standard deviations all equal 9. Set the Type 1 error rate at ?? = .05. Presume that the population means will have the following values: 17.5, 19, 25, 20.5. You intend to run 80 subjects in all, with equal n's across all 4 groups. Compute your power to reject the null hypothesis under these conditions.
means<-c(17.5,19,25,20.5)
power.anova.test(groups=4,n=20,between.var=var(means),sig.level=0.05,within.var=81)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 20
## between.var = 10.5
## within.var = 81
## sig.level = 0.05
## power = 0.6155
##
## NOTE: n is number in each group
The zelazo data are in the form of vectors, one for each of the four groups. Convert the data to a form suitable for the use of lm, and calculate the relevant test. Consider t-tests comparing selected groups or obtained by combining.
library(ISwR)
attach(zelazo)
data<-unlist(zelazo)
group<-factor(rep(1:4,c(6,6,6,5)))
labels<-names(zelazo)
summary(lm(data~group))
##
## Call:
## lm(formula = data ~ group)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.708 -0.850 -0.350 0.638 3.625
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.125 0.619 16.36 1.2e-12 ***
## group2 1.250 0.875 1.43 0.170
## group3 1.583 0.875 1.81 0.086 .
## group4 2.225 0.918 2.42 0.026 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.52 on 19 degrees of freedom
## Multiple R-squared: 0.253, Adjusted R-squared: 0.135
## F-statistic: 2.14 on 3 and 19 DF, p-value: 0.129
t.test(zelazo$active,zelazo$ctr.8w)
##
## Welch Two Sample t-test
##
## data: zelazo$active and zelazo$ctr.8w
## t = -3.045, df = 8.663, p-value = 0.01453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.8879 -0.5621
## sample estimates:
## mean of x mean of y
## 10.12 12.35
t.test(zelazo$active,unlist(zelazo[-1]))
##
## Welch Two Sample t-test
##
## data: zelazo$active and unlist(zelazo[-1])
## t = -2.386, df = 9.086, p-value = 0.04058
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.22071 -0.08811
## sample estimates:
## mean of x mean of y
## 10.12 11.78
The ashina trial was designed to have 80% statistical power if the true treatment difference was 15% and the standard deviation of differences within a person was 20%. Comment on the sample size chosen. (The power calculation was originally done using the approximate formula. The imbalance between group sizes is due to the use of an open randomization procedure.)
attach(ashina)
power.t.test(power=.8,delta=0.30,sd=20)
##
## Two-sample t test power calculation
##
## n = 69769
## delta = 0.3
## sd = 20
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
power.t.test(power=0.8,delta=0.3,sd=0.2,alt="one.sided")
##
## Two-sample t test power calculation
##
## n = 6.299
## delta = 0.3
## sd = 0.2
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
##
## NOTE: n is number in *each* group
(qnorm(0.975)+qnorm(0.8))^2*2*(0.2/0.3)^2
## [1] 6.977
power.t.test(n=8,delta=0.3,sd=0.2)
##
## Two-sample t test power calculation
##
## n = 8
## delta = 0.3
## sd = 0.2
## sig.level = 0.05
## power = 0.7965
## alternative = two.sided
##
## NOTE: n is number in *each* group
x<-(0.3)*sqrt(2/8)/sqrt((1/6)+.1)
power.t.test(n=8,delta=x,sd=0.2)
##
## Two-sample t test power calculation
##
## n = 8
## delta = 0.2905
## sd = 0.2
## sig.level = 0.05
## power = 0.7707
## alternative = two.sided
##
## NOTE: n is number in *each* group