RADI 5007 - Homework 2 Rachel Tuohy

Run the following analysis on the data in Prob2_1.txt:
1. Read the data into a table and show them.
2. Report the means and the number of subjects/cell.
3. Produce a graphical summary (boxplot) of the means of the 6 cells.

Data=read.table("C:/Users/Rachel/Documents/R/Prob2_1.txt",header=T)
Data

##    Observation Subject Task Valence Recall
## 1            1     Jim Free     Neg      8
## 2            2     Jim Free     Neu      9
## 3            3     Jim Free     Pos      5
## 4            4     Jim Cued     Neg      7
## 5            5     Jim Cued     Neu      9
## 6            6     Jim Cued     Pos     10
## 7            7  Victor Free     Neg     12
## 8            8  Victor Free     Neu     13
## 9            9  Victor Free     Pos     14
## 10          10  Victor Cued     Neg     16
## 11          11  Victor Cued     Neu     13
## 12          12  Victor Cued     Pos     14
## 13          13    Faye Free     Neg     13
## 14          14    Faye Free     Neu     13
## 15          15    Faye Free     Pos     12
## 16          16    Faye Cued     Neg     15
## 17          17    Faye Cued     Neu     16
## 18          18    Faye Cued     Pos     14
## 19          19     Ron Free     Neg     12
## 20          20     Ron Free     Neu     14
## 21          21     Ron Free     Pos     15
## 22          22     Ron Cued     Neg     17
## 23          23     Ron Cued     Neu     18
## 24          24     Ron Cued     Pos     20
## 25          25   Jason Free     Neg      6
## 26          26   Jason Free     Neu      7
## 27          27   Jason Free     Pos      9
## 28          28   Jason Cued     Neg      4
## 29          29   Jason Cued     Neu      9
## 30          30   Jason Cued     Pos     10

aov.data=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),Data)
summary(aov.data)

## 
## Error: Subject
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4    349    87.3               
## 
## Error: Subject:Task
##           Df Sum Sq Mean Sq F value Pr(>F)  
## Task       1   30.0   30.00    7.35  0.054 .
## Residuals  4   16.3    4.08                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Valence
##           Df Sum Sq Mean Sq F value Pr(>F)
## Valence    2    9.8    4.90    1.46   0.29
## Residuals  8   26.9    3.36               
## 
## Error: Subject:Task:Valence
##              Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence  2    1.4    0.70    0.29   0.76
## Residuals     8   19.3    2.41

print(model.tables(aov.data,"means"))

## Tables of means
## Grand mean
##      
## 11.8 
## 
##  Task 
## Task
## Cued Free 
## 12.8 10.8 
## 
##  Valence 
## Valence
##  Neg  Neu  Pos 
## 11.0 12.1 12.3 
## 
##  Task:Valence 
##       Valence
## Task   Neg  Neu  Pos 
##   Cued 11.8 13.0 13.6
##   Free 10.2 11.2 11.0

boxplot(Recall~Task*Valence,data=Data)

plot of chunk unnamed-chunk-2

Download the text file labeled, schizophrenic.txt. In it you will find data from a clinical psychologist's study of the effect of a new drug combined with therapy on schizophrenic patients' behavior. The drug has three dosages given (absent, low, high) and the therapy has four types: behavior modification (BMod); psychodynamic (Psycho); group counseling (Group), and nondirective counseling (Nondir).
a. Using R, determine if there are significant differences between the various groups and whether there are significant interactions between the treatments. Show your results and explain them.
b. Make a box & whiskers graph depicting the results. What conclusions can you draw from your analysis?

Data=read.table("C:/Users/Rachel/Documents/R/schizophrenic.txt",header=T)
Data

##    Dosage Therapy Behavior
## 1  absent    BMod       25
## 2  absent    BMod       22
## 3  absent    BMod       23
## 4  absent    BMod       25
## 5  absent    BMod       24
## 6  absent  Psycho       27
## 7  absent  Psycho       25
## 8  absent  Psycho       26
## 9  absent  Psycho       24
## 10 absent  Psycho       25
## 11 absent   Group       20
## 12 absent   Group       25
## 13 absent   Group       19
## 14 absent   Group       21
## 15 absent   Group       18
## 16 absent  Nondir       19
## 17 absent  Nondir       22
## 18 absent  Nondir       20
## 19 absent  Nondir       17
## 20 absent  Nondir       22
## 21    low    BMod       25
## 22    low    BMod       21
## 23    low    BMod       20
## 24    low    BMod       22
## 25    low    BMod       21
## 26    low  Psycho       25
## 27    low  Psycho       23
## 28    low  Psycho       19
## 29    low  Psycho       24
## 30    low  Psycho       20
## 31    low   Group       22
## 32    low   Group       21
## 33    low   Group       25
## 34    low   Group       26
## 35    low   Group       22
## 36    low  Nondir       23
## 37    low  Nondir       27
## 38    low  Nondir       23
## 39    low  Nondir       26
## 40    low  Nondir       28
## 41   high    BMod       22
## 42   high    BMod       30
## 43   high    BMod       26
## 44   high    BMod       28
## 45   high    BMod       20
## 46   high  Psycho       28
## 47   high  Psycho       27
## 48   high  Psycho       24
## 49   high  Psycho       21
## 50   high  Psycho       25
## 51   high   Group       27
## 52   high   Group       25
## 53   high   Group       28
## 54   high   Group       29
## 55   high   Group       24
## 56   high  Nondir       20
## 57   high  Nondir       24
## 58   high  Nondir       21
## 59   high  Nondir       24
## 60   high  Nondir       26

aov.data=aov(Behavior~Therapy*Dosage,Data)
summary(aov.data)

##                Df Sum Sq Mean Sq F value Pr(>F)   
## Therapy         3   14.9     5.0    0.85 0.4755   
## Dosage          2   66.5    33.3    5.69 0.0061 **
## Therapy:Dosage  6  156.8    26.1    4.47 0.0011 **
## Residuals      48  280.8     5.9                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

boxplot(Behavior~Therapy*Dosage,Data)

plot of chunk unnamed-chunk-4

"As dose increases, so does behavior. In the absent group, the behavior is low and in the high dose group, the behavior ranks higher."

## [1] "As dose increases, so does behavior. In the absent group, the behavior is low and in the high dose group, the behavior ranks higher."

Download the data in the file named “Dandruff.txt”. These data were collected by a research group investigating the anti-dandruff power of four new shampoo formulas. The measured, dependent variable value is a dandruff measure where the lower the number indicates that less dandruff was measured. The independent variable (Group) codes the four different shampoo formulas using letters.
a. Use R to determine if there is a significant between-groups difference in the Dandruff.txt data.
b. Create a vertical box and whisker graph of the Dandruff.txt data.

data<-read.table("C:/Users/Rachel/Documents/R/Dandruff.txt",header=T)
 data

##    Groups MeasuredValue
## 1       A            26
## 2       A            25
## 3       A            29
## 4       A            21
## 5       A            20
## 6       A            18
## 7       B            24
## 8       B            17
## 9       B            16
## 10      B            13
## 11      B            21
## 12      B            19
## 13      C            32
## 14      C            34
## 15      C            29
## 16      C            19
## 17      C            27
## 18      C            28
## 19      D            23
## 20      D            29
## 21      D            26
## 22      D            20
## 23      D            24
## 24      D            25

aov.data<-aov(MeasuredValue~Groups,data)
summary(aov.data)

##             Df Sum Sq Mean Sq F value Pr(>F)   
## Groups       3    297    99.2    5.79 0.0051 **
## Residuals   20    343    17.1                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

boxplot(MeasuredValue~Groups,data)

plot of chunk unnamed-chunk-6

Data on the antennae lengths of a sample of 32 woodlice were used to determine the expected frequency at each of the lengths encountered as shown in the table, below.
a. Plot the cumulative frequencies (observed and estimated) against each other and interpret what it tells you about normality of the data.
b. Analyze the data more formally with the Kolomogorov-Smirnov test and explain your results.

ObservedCF<-c(2,5,9,12,16,23,26,29,31,32)
EstimatedCF<-c(1.3,3.7,7.5,12.4,17.8,22.7,26.5,28.9,30.2,30.8)

plot(ObservedCF,EstimatedCF)

plot of chunk unnamed-chunk-8
Since the data is approximately a straight line, the distribution seems to be a normal distribution.

ks.test(ObservedCF,EstimatedCF)

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  ObservedCF and EstimatedCF
## D = 0.2, p-value = 0.9945
## alternative hypothesis: two-sided

Since P>0.05, the data is considered normally distributed.

An experiment is performed and the following quantities are measured: A = 1.0 m ± 0.2 m, B = 2.0 m ± 0.2 m, C = 2.5 m/s ± 0.5 m/s, D = 0.10 s ± 0.01 s. If, for example, A+B = 3.0 m + 0.4 m, then determine the correct answers for the following expressions:
a. B - A =

A=1
dA=0.2
B=2
dB=0.2
(B-A) + sqrt((dA^2)+(dB^2))

## [1] 1.283

(B-A) - sqrt((dA^2)+(dB^2))

## [1] 0.7172

b. C × D =

C=2.5
dC=0.5
dD=0.01
D=0.1
(c*D) + sqrt((0.01/0.1)^2+(0.5/2.5)^2)

## Error: non-numeric argument to binary operator

(C*D) - sqrt((0.01/0.1)^2+(0.5/2.5)^2)

## [1] 0.02639

c. B / D =

(B/D) + sqrt((0.01/0.1)^2+(0.2/2)^2)

## [1] 20.14

(B/D) - sqrt((0.01/0.1)^2+(0.2/2)^2)

## [1] 19.86

d. 3 × A =

(3*A) + (3*dA)

## [1] 3.6

(3*A) - (3*dA)

## [1] 2.4

e. (A × B)½ =

sqrt(A*B) + (0.5)*((A*B)^(-.5))*sqrt((dA/A)^2+(dB/B)^2)

## [1] 1.493

sqrt(A*B) - (0.5)*((A*B)^(-.5))*sqrt((dA/A)^2+(dB/B)^2)

## [1] 1.335

In a trial comparing a binary outcome between two groups, find the required number of patients to find an increase in the success rate from 60% to 75% with a power of 90% using a two-tailed t-test . What happens if we reduce the power requirement to 80%?

power.prop.test(power=.9,p1=.6,p2=.75)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 202.8
##              p1 = 0.6
##              p2 = 0.75
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

power.prop.test(power=.8,p1=.6,p2=.75)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 151.9
##              p1 = 0.6
##              p2 = 0.75
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

With a power of 90%, the number of patients needed is 203. For a power of 80%, the number of patients decreases to 152.

Plot the density of the non-central t distribution for ncp=3 and df=25 and compare it with the distribution of t+3, where t has a central t distribution with df=25.

curve(dt(x-3,25),from=0,to=5)
curve(dt(x,25,3),add=TRUE)

plot of chunk unnamed-chunk-16
Assume that the within-population standard deviations all equal 9. Set the Type 1 error rate at ?? = .05. Presume that the population means will have the following values: 17.5, 19, 25, 20.5. You intend to run 80 subjects in all, with equal n's across all 4 groups. Compute your power to reject the null hypothesis under these conditions.

means<-c(17.5,19,25,20.5)
power.anova.test(groups=4,n=20,between.var=var(means),sig.level=0.05,within.var=81)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 4
##               n = 20
##     between.var = 10.5
##      within.var = 81
##       sig.level = 0.05
##           power = 0.6155
## 
## NOTE: n is number in each group

The zelazo data are in the form of vectors, one for each of the four groups. Convert the data to a form suitable for the use of lm, and calculate the relevant test. Consider t-tests comparing selected groups or obtained by combining.

library(ISwR)
attach(zelazo)
data<-unlist(zelazo)
group<-factor(rep(1:4,c(6,6,6,5)))
labels<-names(zelazo)
summary(lm(data~group))

## 
## Call:
## lm(formula = data ~ group)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.708 -0.850 -0.350  0.638  3.625 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   10.125      0.619   16.36  1.2e-12 ***
## group2         1.250      0.875    1.43    0.170    
## group3         1.583      0.875    1.81    0.086 .  
## group4         2.225      0.918    2.42    0.026 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.52 on 19 degrees of freedom
## Multiple R-squared:  0.253,  Adjusted R-squared:  0.135 
## F-statistic: 2.14 on 3 and 19 DF,  p-value: 0.129

t.test(zelazo$active,zelazo$ctr.8w)

## 
##  Welch Two Sample t-test
## 
## data:  zelazo$active and zelazo$ctr.8w
## t = -3.045, df = 8.663, p-value = 0.01453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.8879 -0.5621
## sample estimates:
## mean of x mean of y 
##     10.12     12.35

t.test(zelazo$active,unlist(zelazo[-1]))

## 
##  Welch Two Sample t-test
## 
## data:  zelazo$active and unlist(zelazo[-1])
## t = -2.386, df = 9.086, p-value = 0.04058
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.22071 -0.08811
## sample estimates:
## mean of x mean of y 
##     10.12     11.78

The ashina trial was designed to have 80% statistical power if the true treatment difference was 15% and the standard deviation of differences within a person was 20%. Comment on the sample size chosen. (The power calculation was originally done using the approximate formula. The imbalance between group sizes is due to the use of an open randomization procedure.)

 attach(ashina)
 power.t.test(power=.8,delta=0.30,sd=20)

## 
##      Two-sample t test power calculation 
## 
##               n = 69769
##           delta = 0.3
##              sd = 20
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

 power.t.test(power=0.8,delta=0.3,sd=0.2,alt="one.sided")

## 
##      Two-sample t test power calculation 
## 
##               n = 6.299
##           delta = 0.3
##              sd = 0.2
##       sig.level = 0.05
##           power = 0.8
##     alternative = one.sided
## 
## NOTE: n is number in *each* group

(qnorm(0.975)+qnorm(0.8))^2*2*(0.2/0.3)^2

## [1] 6.977

power.t.test(n=8,delta=0.3,sd=0.2)

## 
##      Two-sample t test power calculation 
## 
##               n = 8
##           delta = 0.3
##              sd = 0.2
##       sig.level = 0.05
##           power = 0.7965
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

x<-(0.3)*sqrt(2/8)/sqrt((1/6)+.1)
power.t.test(n=8,delta=x,sd=0.2)

## 
##      Two-sample t test power calculation 
## 
##               n = 8
##           delta = 0.2905
##              sd = 0.2
##       sig.level = 0.05
##           power = 0.7707
##     alternative = two.sided
## 
## NOTE: n is number in *each* group