RADI 5007 – HOMEWORK SET #2 - 2014

Mariam Ishaque

1.Run the following analysis on the data in “Prob2_1.txt”:

• Read the data into a table and show them.

setwd("/Users/mishaque/Desktop/Statistics Course/HW")
Prob2_1<-read.table("Prob2_1.txt",header=T)
Prob2_1

##    Observation Subject Task Valence Recall
## 1            1     Jim Free     Neg      8
## 2            2     Jim Free     Neu      9
## 3            3     Jim Free     Pos      5
## 4            4     Jim Cued     Neg      7
## 5            5     Jim Cued     Neu      9
## 6            6     Jim Cued     Pos     10
## 7            7  Victor Free     Neg     12
## 8            8  Victor Free     Neu     13
## 9            9  Victor Free     Pos     14
## 10          10  Victor Cued     Neg     16
## 11          11  Victor Cued     Neu     13
## 12          12  Victor Cued     Pos     14
## 13          13    Faye Free     Neg     13
## 14          14    Faye Free     Neu     13
## 15          15    Faye Free     Pos     12
## 16          16    Faye Cued     Neg     15
## 17          17    Faye Cued     Neu     16
## 18          18    Faye Cued     Pos     14
## 19          19     Ron Free     Neg     12
## 20          20     Ron Free     Neu     14
## 21          21     Ron Free     Pos     15
## 22          22     Ron Cued     Neg     17
## 23          23     Ron Cued     Neu     18
## 24          24     Ron Cued     Pos     20
## 25          25   Jason Free     Neg      6
## 26          26   Jason Free     Neu      7
## 27          27   Jason Free     Pos      9
## 28          28   Jason Cued     Neg      4
## 29          29   Jason Cued     Neu      9
## 30          30   Jason Cued     Pos     10

• Report the means and the number of subjects/cell.

summary(Prob2_1)

##   Observation      Subject    Task    Valence      Recall    
##  Min.   : 1.00   Faye  :6   Cued:15   Neg:10   Min.   : 4.0  
##  1st Qu.: 8.25   Jason :6   Free:15   Neu:10   1st Qu.: 9.0  
##  Median :15.50   Jim   :6             Pos:10   Median :12.5  
##  Mean   :15.50   Ron   :6                      Mean   :11.8  
##  3rd Qu.:22.75   Victor:6                      3rd Qu.:14.0  
##  Max.   :30.00                                 Max.   :20.0

aov.Prob2_1=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),Prob2_1)
summary(aov.Prob2_1)

## 
## Error: Subject
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4    349    87.3               
## 
## Error: Subject:Task
##           Df Sum Sq Mean Sq F value Pr(>F)  
## Task       1   30.0   30.00    7.35  0.054 .
## Residuals  4   16.3    4.08                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Valence
##           Df Sum Sq Mean Sq F value Pr(>F)
## Valence    2    9.8    4.90    1.46   0.29
## Residuals  8   26.9    3.36               
## 
## Error: Subject:Task:Valence
##              Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence  2    1.4    0.70    0.29   0.76
## Residuals     8   19.3    2.41

print(model.tables(aov.Prob2_1,"means",digits=3))

## Tables of means
## Grand mean
##      
## 11.8 
## 
##  Task 
## Task
## Cued Free 
## 12.8 10.8 
## 
##  Valence 
## Valence
##  Neg  Neu  Pos 
## 11.0 12.1 12.3 
## 
##  Task:Valence 
##       Valence
## Task   Neg  Neu  Pos 
##   Cued 11.8 13.0 13.6
##   Free 10.2 11.2 11.0

attach(Prob2_1)
tapply(Recall,list(Task),length)

## Cued Free 
##   15   15

tapply(Recall,list(Valence),length)

## Neg Neu Pos 
##  10  10  10

tapply(Recall,list(Task,Valence),length)

##      Neg Neu Pos
## Cued   5   5   5
## Free   5   5   5

• Produce a graphical summary (boxplot) of the means of the 6 cells.

attach(Prob2_1)

## The following objects are masked from Prob2_1 (position 3):
## 
##     Observation, Recall, Subject, Task, Valence

boxplot(Recall~Task*Valence,xlab="Task.Valence",ylab="Recall",main="Recall vs Task+Valence")

plot of chunk unnamed-chunk-3

2.a. Using R, determine if there are significant differences between the various groups and whether there are significant interactions between the treatments. Show your results and explain them.

setwd("/Users/mishaque/Desktop/Statistics Course/HW")
schiz<-read.table("schiz.txt", header=T)
schiz

##    Dosage Therapy Behavior
## 1  absent    Bmod       25
## 2  absent    Bmod       22
## 3  absent    Bmod       23
## 4  absent    Bmod       25
## 5  absent    Bmod       24
## 6  absent  Pyscho       27
## 7  absent  Pyscho       25
## 8  absent  Pyscho       26
## 9  absent  Pyscho       24
## 10 absent  Pyscho       25
## 11 absent   Group       20
## 12 absent   Group       25
## 13 absent   Group       19
## 14 absent   Group       21
## 15 absent   Group       18
## 16 absent  Nondir       19
## 17 absent  Nondir       22
## 18 absent  Nondir       20
## 19 absent  Nondir       17
## 20 absent  Nondir       22
## 21    low    Bmod       25
## 22    low    Bmod       21
## 23    low    Bmod       20
## 24    low    Bmod       22
## 25    low    Bmod       21
## 26    low  Pyscho       25
## 27    low  Pyscho       23
## 28    low  Pyscho       19
## 29    low  Pyscho       24
## 30    low  Pyscho       20
## 31    low   Group       22
## 32    low   Group       21
## 33    low   Group       25
## 34    low   Group       26
## 35    low   Group       22
## 36    low  Nondir       23
## 37    low  Nondir       27
## 38    low  Nondir       23
## 39    low  Nondir       26
## 40    low  Nondir       28
## 41   high    Bmod       22
## 42   high    Bmod       30
## 43   high    Bmod       26
## 44   high    Bmod       28
## 45   high    Bmod       20
## 46   high  Pyscho       28
## 47   high  Pyscho       27
## 48   high  Pyscho       24
## 49   high  Pyscho       21
## 50   high  Pyscho       25
## 51   high   Group       27
## 52   high   Group       25
## 53   high   Group       28
## 54   high   Group       29
## 55   high   Group       24
## 56   high  Nondir       20
## 57   high  Nondir       24
## 58   high  Nondir       21
## 59   high  Nondir       24
## 60   high  Nondir       26

names(schiz)

## [1] "Dosage"   "Therapy"  "Behavior"

summary(schiz)

##     Dosage     Therapy      Behavior   
##  absent:20   Bmod  :15   Min.   :17.0  
##  high  :20   Group :15   1st Qu.:21.0  
##  low   :20   Nondir:15   Median :24.0  
##              Pyscho:15   Mean   :23.5  
##                          3rd Qu.:25.2  
##                          Max.   :30.0

attach(schiz)
anova(lm(Behavior~Dosage*Therapy))

## Analysis of Variance Table
## 
## Response: Behavior
##                Df Sum Sq Mean Sq F value Pr(>F)   
## Dosage          2   66.5    33.3    5.69 0.0061 **
## Therapy         3   14.9     5.0    0.85 0.4755   
## Dosage:Therapy  6  156.8    26.1    4.47 0.0011 **
## Residuals      48  280.8     5.8                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#There is a significant difference between the dosage groups (p =0.006) but not between the therapy groups (p=0.475). In others words, dosage significantly affects the behavior (results) but therapy does not. There is significant interaction between dosage and therapy (p=0.001).

b. Make a box & whiskers graph depicting the results. What conclusions can you draw from your analysis?

attach(schiz)

## The following objects are masked from schiz (position 3):
## 
##     Behavior, Dosage, Therapy

schiz$Dosage<-as.factor(schiz$Dosage)
schiz<-data.frame(schiz)
schiz$Dosage<-ordered(schiz$Dosage,levels=c("absent","low","high"))
boxplot(Behavior~Therapy*Dosage,data=schiz,par(mar=c(12,5,4,2)+0.1),main="Behavior vs Therapy.Dosage",ylab="Behavior",las=2)
mtext("Therapy.Dosage",side=1,line=7)

plot of chunk unnamed-chunk-5

#The boxplot shows dosage appears to significantly affect behavior. Higher dosages generally result in higher behavioral values.

3.a. Use R to determine if there is a significant between-groups difference in the Dandruff.txt data.

setwd("/Users/mishaque/Desktop/Statistics Course/HW")
dand<-read.table("dand.txt",header=T)
dand

##    Groups MeasuredValue
## 1       A            26
## 2       A            25
## 3       A            29
## 4       A            21
## 5       A            20
## 6       A            18
## 7       B            24
## 8       B            17
## 9       B            16
## 10      B            13
## 11      B            21
## 12      B            19
## 13      C            32
## 14      C            34
## 15      C            29
## 16      C            19
## 17      C            27
## 18      C            28
## 19      D            23
## 20      D            29
## 21      D            26
## 22      D            20
## 23      D            24
## 24      D            25

attach(dand)
anova(lm(MeasuredValue~Groups))

## Analysis of Variance Table
## 
## Response: MeasuredValue
##           Df Sum Sq Mean Sq F value Pr(>F)   
## Groups     3    297    99.2    5.79 0.0051 **
## Residuals 20    343    17.1                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#There is a significant between-groups difference, with p=0.005107.

b. Create a vertical box and whisker graph of the Dandruff.txt data.

attach(dand)

## The following objects are masked from dand (position 3):
## 
##     Groups, MeasuredValue

boxplot(MeasuredValue~Groups,main="Measured Value of Dandruff vs Group", xlab="Groups",ylab="Measured Value")

plot of chunk unnamed-chunk-7

4.Data on the antennae lengths of a sample of 32 woodlice were used to determine the expected frequency at each of the lengths encountered as shown in the table, below.

a. Plot the cumulative frequencies (observed and estimated) against each other and interpret what it tells you about normality of the data.

obs<-c(2,5,9,12,16,23,26,29,31,32)
est<-c(1.3,3.7,7.5,12.4,17.8,22.7,26.5,28.9,30.2,30.8)
plot(obs,est,main="Estimated vs Observed Cumulative Frequencies", ylab="Estimated Cumulative Frequency", xlab="Cumulative Observed Frequency")

plot of chunk unnamed-chunk-8

#Upon visual inspection, the data appears to have a somewhat sigmoidal distribution. It does seem to be fairly normally distributed though.

b. Analyze the data more formally with the Kolomogorov-Smirnov test and explain your results.

ks.test(obs,est,"pnorm")

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  obs and est
## D = 0.2, p-value = 0.9945
## alternative hypothesis: two-sided

#The K-S test is a nonparametric test for the equality of probability distributions that is used here to compare two samples. The null hypothesis of the K-S statistic is that the samples are drawn from the same distribution. With a p value of 0.9945, the null hypothesis is not rejected. The observed and expected cumulative frequencies are concluded to come from the same distribution.

5.An experiment is performed and the following quantities are measured: A = 1.0 m ± 0.2 m, B = 2.0 m ± 0.2 m, C = 2.5 m/s ± 0.5 m/s, D = 0.10 s ± 0.01 s. If, for example, A+B = 3.0 m + 0.4 m, then determine the correct answers for the following expressions:

a. B – A = z

  z = B-A = 2.0 m – 1.0 m = 1.0 m 
  dz = sqrt(dB^2 + dA^2)
     = sqrt(0.2^2 + 0.2^2)
     = 0.28 m 

  z±dz= 1.0 m ± 0.28 m

b. C × D = z

  z = C x D = 2.5 m/s x 0.10 s = 0.25 m 
  dz = z(sqrt((dC/C)^2 + (dD/D)^2))
     = 0.25 (sqrt((0.5/2.5)^2 + (0.01/0.10)^2))
     = 0.056 m 

  z±dz = 0.25 m ± 0.056 m

c. B / D = z

  z = B/D = 2.0 m / 0.10 m = 20 m/s
  dz = z(sqrt((dB/B)^2 + (dD/D)^2))
     = 20 (sqrt((0.2/2.0)^2 + (0.01/0.10)^2))
     = 2.8 m/s

  z±dz = 20 m/s ± 2.8 m/s

d.3 × A = z

  z = 3A = 3(1.0 m) = 3.0 m 
  dz = 3 dA
     = 3 (0.2 m)
     = 0.6 m 

  z±dz = 3.0 m ± 0.6 m

e.(A×B)^½ = z

  z = (A)^(0.5) B^(0.5) = sqrt(1.0 m x 2.0 m) = 1.41 m 
  dz = z sqrt((n dA/A)^2 + (n dB/B)^2)
     = 1.41 sqrt((0.5 x 0.2/1.0)^2 + (0.5 x 0.2/2.0)^2)
     = 0.158 m 

  z±dz = 1.41 m ± 0.16 m

6.In a trial comparing a binary outcome between two groups, find the required number of patients to find an increase in the success rate from 60% to 75% with a power of 90% using a two-tailed t-test . What happens if we reduce the power requirement to 80%?

power.prop.test(p1=0.6, p2=0.75, sig.level=0.05, power=0.9)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 202.8
##              p1 = 0.6
##              p2 = 0.75
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

power.prop.test(p1=0.6, p2=0.75, sig.level=0.05, power=0.8)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 151.9
##              p1 = 0.6
##              p2 = 0.75
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

#Reducing the power requirement to 80% decreases the number of patients by 51 (203-152). This can save time, resources, and money at the expense of power.

7.Plot the density of the non-central t distribution for ncp=3 and df=25 and compare it with the distribution of t+3, where t has a central t distribution with df=25.

curve(dt(x-3,25),from=0,to=5, ylab="dt")
curve(dt(x,25,3),add=T,col=2)
text(3,.1, "Red: df=25, ncp=3")
text(3,.175, "Black: df=25, x+3")

plot of chunk unnamed-chunk-11

8.Assume that the within-population standard deviations all equal 9. Set the Type 1 error rate at α = .05. Presume that the population means will have the following values: 17.5, 19, 25, 20.5. You intend to run 80 subjects in all, with equal n's across all 4 groups. Compute your power to reject the null hypothesis under these conditions.

m<-c(17.5,19,25,20.5)
group<-c(1,2,3,4)
data<-data.frame(group,m)
data

##   group    m
## 1     1 17.5
## 2     2 19.0
## 3     3 25.0
## 4     4 20.5

var(m)

## [1] 10.5

power.anova.test(groups=4,n=20,sig.level=0.05,between.var=10.5,within.var=81)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 4
##               n = 20
##     between.var = 10.5
##      within.var = 81
##       sig.level = 0.05
##           power = 0.6155
## 
## NOTE: n is number in each group

#Power = 61.6%

UsE the ISwR data for the last two problems:

9.The zelazo data are in the form of vectors, one for each of the four groups. Convert the data to a form suitable for the use of lm, and calculate the relevant test. Consider t-tests comparing selected groups or obtained by combining.

library(ISwR)
attach(zelazo)
data(zelazo)
walk<-unlist(zelazo)
group<-factor(rep(1:4,c(6,6,6,5)),labels=names(zelazo))
summary(lm(walk~group))

## 
## Call:
## lm(formula = walk ~ group)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.708 -0.850 -0.350  0.638  3.625 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    10.125      0.619   16.36  1.2e-12 ***
## grouppassive    1.250      0.875    1.43    0.170    
## groupnone       1.583      0.875    1.81    0.086 .  
## groupctr.8w     2.225      0.918    2.42    0.026 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.52 on 19 degrees of freedom
## Multiple R-squared:  0.253,  Adjusted R-squared:  0.135 
## F-statistic: 2.14 on 3 and 19 DF,  p-value: 0.129

t.test(zelazo$active,zelazo$ctr.8w)

## 
##  Welch Two Sample t-test
## 
## data:  zelazo$active and zelazo$ctr.8w
## t = -3.045, df = 8.663, p-value = 0.01453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.8879 -0.5621
## sample estimates:
## mean of x mean of y 
##     10.12     12.35

t.test(zelazo$active,unlist(zelazo[-1]))

## 
##  Welch Two Sample t-test
## 
## data:  zelazo$active and unlist(zelazo[-1])
## t = -2.386, df = 9.086, p-value = 0.04058
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.22071 -0.08811
## sample estimates:
## mean of x mean of y 
##     10.12     11.78

10.The ashina trial was designed to have 80% statistical power if the true treatment difference was 15% and the standard deviation of differences within a person was 20%. Comment on the sample size chosen. (The power calculation was originally done using the approximate formula. The imbalance between group sizes is due to the use of an open randomization procedure.)

ashina

##    vas.active vas.plac grp
## 1        -167     -102   1
## 2        -127      -39   1
## 3         -58       32   1
## 4        -103       28   1
## 5         -35       16   1
## 6        -164      -42   1
## 7          -3      -27   1
## 8          25      -30   1
## 9         -61      -47   1
## 10        -45        8   1
## 11        -38       12   2
## 12         29       11   2
## 13          2       -9   2
## 14        -18       -1   2
## 15        -74        3   2
## 16        -72      -36   2

power.t.test(power=0.8,delta=0.3,sd=0.2)

## 
##      Two-sample t test power calculation 
## 
##               n = 8.06
##           delta = 0.3
##              sd = 0.2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

power.t.test(power=0.8,delta=0.3,sd=0.2, alt="one.sided")

## 
##      Two-sample t test power calculation 
## 
##               n = 6.299
##           delta = 0.3
##              sd = 0.2
##       sig.level = 0.05
##           power = 0.8
##     alternative = one.sided
## 
## NOTE: n is number in *each* group

#approx formula
(qnorm(.975)+qnorm(0.8))^2*2*(.2/.3)^2

## [1] 6.977

#The sample size was about 9 (8.06) for a two-sided test and 7 (6.29) for a one-sided test. With the approximate formula for a two-sided test, sample size is calculated to be about 7 (6.98). There is indeed a reduction in power due to use of an open randomization procedure that led to imbalance of group sizes.