STA 5206 Assignment#5 Danilo Martinez

7.6 A leading researcher in the study of interstate highway accidents proposes that a major cause of many collisions on the interstates is not the speed of the vehicles but rather the difference in speeds of the vehicles. When some vehicles are traveling slowly while other vehicles are traveling at speeds greatly in excess of the speed limit, the faster-moving vehicles may have to change lanes quickly, which can increase the chance of an accident. Thus, when there is a large variation in the speeds of the vehicles in a given location on the interstate, there may be a larger number of accidents than when the traffic is moving at a more uniform speed. The researcher believes that when the standard deviation in speed of vehicles exceeds 10 mph, the rate of accidents is greatly increased. During a 1-hour period of time, a random sample of 50 vehicles is selected from a section of an interstate known to have a high rate of accidents, and their speeds are recorded using a radar gun. The data are presented here.

56.1 57.0 53.9 50.2 54.2 47.9 78.1 60.2 47.4 68.8
45.5 63.3 59.7 74.3 61.4 58.7 61.2 64.7 64.3 48.2
57.7 72.1 72.0 67.6 47.6 65.9 72.3 55.7 55.0 75.2
62.8 47.0 48.1 62.9 64.0 80.6 51.2 53.7 53.3 58.3
68.2 69.5 51.8 68.8 63.8 61.8 59.3 63.6 54.7 59.9

Estimate the standard deviation in the speeds of the vehicles traveling on the interstate using a 95% confidence interval.

library(BSDA)
library(pwr)
sps <- read.csv(file = "speeds.csv", header = TRUE)
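# Note: chisq.test() on the raw speeds is a goodness-of-fit test; its output is not used in the interval below.
chisq.test(sps)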

## 
##  Chi-squared test for given probabilities
## 
## data:  sps
## X-squared = 62.357, df = 49, p-value = 0.09525

csq<-62.357
dg<-49
alpha<-.05
length(sps$s)

## [1] 50

st<-sd(sps$s)
L = (dg*(st^2))/qchisq(alpha/2,dg,lower.tail = FALSE)
U = (dg*(st^2))/qchisq(alpha/2,dg,lower.tail = TRUE)
sqrt(L)

## [1] 7.322995

sqrt(U)

## [1] 10.92429
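The interval above rests on (n-1)s^2/sigma^2 following a chi-square distribution with n-1 degrees of freedom. As a minimal sketch, the same calculation can be wrapped in a small helper. The names `sd_ci` and `speeds` are hypothetical and not part of the original solution, and the vector is typed in from the problem statement on the assumption that it matches the contents of speeds.csv.

```r
# Hypothetical helper: chi-square confidence interval for a population
# standard deviation, assuming the data are approximately normal.
sd_ci <- function(x, conf.level = 0.95) {
  df <- length(x) - 1
  alpha <- 1 - conf.level
  ss <- df * var(x)                       # (n - 1) * s^2
  c(lower = sqrt(ss / qchisq(1 - alpha / 2, df)),
    upper = sqrt(ss / qchisq(alpha / 2, df)))
}

# Speeds copied from the problem statement (assumed equal to speeds.csv).
speeds <- c(56.1, 57.0, 53.9, 50.2, 54.2, 47.9, 78.1, 60.2, 47.4, 68.8,
            45.5, 63.3, 59.7, 74.3, 61.4, 58.7, 61.2, 64.7, 64.3, 48.2,
            57.7, 72.1, 72.0, 67.6, 47.6, 65.9, 72.3, 55.7, 55.0, 75.2,
            62.8, 47.0, 48.1, 62.9, 64.0, 80.6, 51.2, 53.7, 53.3, 58.3,
            68.2, 69.5, 51.8, 68.8, 63.8, 61.8, 59.3, 63.6, 54.7, 59.9)

ci <- sd_ci(speeds)
ci
```

The interval is not symmetric about the sample standard deviation because the chi-square distribution is skewed.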

Do the data indicate that the standard deviation in vehicle speeds exceeds 10 mph? Use alpha = .05 in reaching your conclusion.

chisquare<-dg*var(sps$s)/100
pval<-1-pchisq(chisquare,dg)
pval

## [1] 0.8809251

With a p-value of 0.8809251, which is greater than alpha = .05, we fail to reject the null hypothesis. There is not enough evidence to suggest that the standard deviation of the vehicle speeds exceeds 10 mph.
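The same decision can be reached by comparing the test statistic with a critical value instead of a p-value. The sketch below shows both routes for an upper-tailed test of H0: sigma <= sigma0. The function name `sd_test_upper` is hypothetical, and the data are simulated only for illustration; they are not the radar-gun speeds.

```r
# Hypothetical helper: upper-tailed chi-square test for a standard deviation.
sd_test_upper <- function(x, sigma0, alpha = 0.05) {
  df <- length(x) - 1
  stat <- df * var(x) / sigma0^2          # (n - 1) s^2 / sigma0^2
  crit <- qchisq(1 - alpha, df)           # upper-tail critical value
  list(statistic = stat,
       critical  = crit,
       p.value   = pchisq(stat, df, lower.tail = FALSE),
       reject    = stat > crit)
}

set.seed(1)                               # arbitrary seed for reproducibility
x <- rnorm(50, mean = 60, sd = 9)         # simulated speeds, illustration only
res <- sd_test_upper(x, sigma0 = 10)
res
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value falls below alpha.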

7.16 The SAT Reasoning Test is an exam taken by most high school students as part of their college admission requirements. A proposal has been made to alter the exam by having the students take the exam on a computer. The exam questions would be selected for the student in the following fashion. For a given section of questions, if the student answers the initial questions posed correctly, then the following questions become increasingly difficult. If the student provides incorrect answers for the initial questions asked in a given section, then the level of difficulty of latter questions does not increase. The final score on the exams will be standardized to take into account the overall difficulty of the questions on each exam. The testing agency wants to compare the scores obtained using the new method of administering the exam to the scores using the current method. A group of 182 high school students is randomly selected to participate in the study with 91 students randomly assigned to each of the two methods of administering the exam. The data are summarized in the following table and boxplots for the math portion of the exam.

Summary Data for SAT Reasoning Exams

Testing Method   Sample Size   Mean     Standard Deviation
Computer         91            484.45   53.77
Conventional     91            487.38   36.94

(Refer to the boxplots.)

Evaluate the two methods of administering the SAT exam. Provide tests of hypotheses and confidence intervals. Are the means and standard deviations of scores for the two methods equivalent? Justify your answer using alpha = .05.

Ho: σ_computer = σ_conventional
Ha: σ_computer ≠ σ_conventional

F Statistic

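# Note: the summary table lists the computer mean as 484.45; the rounded value 484.5 is what produced the output below.
u1<-484.5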
u2<-487.38
s1<-53.77
s2<-36.94
n<-91
dg<-n-1
alpha<-.05
Fstat<- s1^2/s2^2
Fstat

## [1] 2.118782

Critical values of the F distribution (two-sided, alpha = .05)

L = qf(alpha/2,dg,dg)
U = qf(1-(alpha/2),dg,dg)
L

## [1] 0.6597954

U

## [1] 1.515621

Since the F statistic of 2.118782 exceeds the upper critical value of 1.515621, it lies in the rejection region. There is sufficient evidence to reject Ho; the data indicate that there is a difference in the standard deviations of the two methods.
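Since the problem also asks for confidence intervals, the F statistic can be turned into a two-sided p-value and a confidence interval for the ratio of variances. This is a minimal sketch from the summary statistics in the table. The names `p_two_sided` and `ratio_ci` are hypothetical, and normality of the scores in both groups is assumed.

```r
# Summary statistics from the table (computer vs. conventional).
s1 <- 53.77; s2 <- 36.94
n1 <- 91;    n2 <- 91
alpha <- 0.05

Fstat <- s1^2 / s2^2

# Two-sided p-value: twice the smaller tail area of the F distribution.
p_two_sided <- 2 * min(pf(Fstat, n1 - 1, n2 - 1),
                       pf(Fstat, n1 - 1, n2 - 1, lower.tail = FALSE))

# Confidence interval for sigma1^2 / sigma2^2.
ratio_ci <- c(lower = Fstat / qf(1 - alpha / 2, n1 - 1, n2 - 1),
              upper = Fstat / qf(alpha / 2, n1 - 1, n2 - 1))

p_two_sided
ratio_ci
sqrt(ratio_ci)   # interval for the ratio of standard deviations
```

An interval for the variance ratio that excludes 1 agrees with rejecting Ho at the same alpha.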

zsum.test(mean.x = u1,sigma.x = s1,n.x = n,mean.y = u2,sigma.y = s2,n.y = n,mu=0,alternative = "two.sided",conf.level = .95)

## 
##  Two-sample z-Test
## 
## data:  Summarized x and y
## z = -0.42114, p-value = 0.6737
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.28346  10.52346
## sample estimates:
## mean of x mean of y 
##    484.50    487.38

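With a z statistic of -0.42114, which lies between -1.96 and 1.96, and a p-value of 0.6737 greater than alpha = .05, we fail to reject the null hypothesis. The 95% confidence interval for the difference in means, (-16.28346, 10.52346), contains 0, so there is not sufficient evidence of a difference between the mean scores of the two methods.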
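Because the F test suggests unequal standard deviations, an unequal variance (Welch) t-test from the summary statistics is a reasonable cross-check on the z-test. The sketch below uses only base R. The function `welch_summary` is a hypothetical helper, and it uses the table mean 484.45 instead of the rounded 484.5, so its numbers differ slightly from the z-test output above.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics.
welch_summary <- function(m1, s1, n1, m2, s2, n2, conf.level = 0.95) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  # Satterthwaite approximation to the degrees of freedom.
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  tstat <- (m1 - m2) / se
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  list(t = tstat,
       df = df,
       p.value = 2 * pt(-abs(tstat), df),
       conf.int = c(m1 - m2 - half, m1 - m2 + half))
}

res <- welch_summary(484.45, 53.77, 91, 487.38, 36.94, 91)
res
```

With 91 students per group the t and z reference distributions are close, so the two approaches should lead to the same conclusion.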

7.30 A new steroidal treatment for a skin condition in dogs was under evaluation by a veterinary hospital. One of the possible side effects of the treatment is that a dog receiving the treatment may have an allergic reaction to the treatment. This type of allergic reaction manifests itself through an elevation in the resting pulse rate of the dog after the dog has received the treatment for a period of time. A group of 80 dogs of the same breed and age, and all having the skin condition, is randomly assigned to either a placebo treatment or the steroidal treatment. Four days after receiving the treatment, either steroidal or placebo, resting pulse rate measurements are taken on all the dogs. These data are displayed here. Dogs of this age and breed have a fairly constant resting pulse rate of 100 beats a minute. The researchers are interested in testing whether there is a significant difference between the placebo and treatment dogs in terms of both the means and standard deviations of the resting pulse rates.

Placebo Group Pulse Rates 105.1 103.3 102.1 102.3 101.5 100.6 104.5 103.2 101.8 102.1 108.1 103.2 104.0 103.9 105.3 103.6 102.3 103.9 103.0 107.0 102.3 103.5 111.7 101.4 103.0 101.1 103.7 102.3 106.2 100.8 102.1 104.3 104.0 102.2 103.1 104.7 102.3 110.1 103.1 103.4

Treatment Group Pulse Rates 107.6 107.8 110.4 106.6 108.2 113.4 113.5 108.7 108.2 106.0 105.3 107.1 110.3 108.7 107.4 111.1 105.9 106.9 106.4 111.5 106.8 107.8 106.1 106.7 105.0 110.4 105.9 106.4 106.0 106.0 106.9 107.6 107.0 105.8 108.6 109.3 108.5 106.9 107.0 109.2

a. Is there significant evidence of an increase in the mean pulse rates for those dogs receiving the treatment?

plcb <- c(105.1,103.3,102.1,102.3,101.5,100.6,104.5,103.2,101.8,102.1,108.1,103.2,104.0,103.9,105.3,103.6,102.3,103.9,103.0,107.0,102.3,103.5,111.7,101.4,103.0,101.1,103.7,102.3,106.2,100.8,102.1,104.3,104.0,102.2,103.1,104.7,102.3,110.1,103.1,103.4)
trmt<- c(107.6,107.8,110.4,106.6,108.2,113.4,113.5,108.7,108.2,106.0,105.3,107.1,110.3,108.7,107.4,111.1,105.9,106.9,106.4,111.5,106.8,107.8,106.1,106.7,105.0,110.4,105.9,106.4,106.0,106.0,106.9,107.6,107.0,105.8,108.6,109.3,108.5,106.9,107.0,109.2)
z.test(plcb,trmt,alternative="less",mu=0,sigma.x=sd(plcb),sigma.y=sd(trmt), conf.level = 0.95)

## 
##  Two-sample z-Test
## 
## data:  plcb and trmt
## z = -8.6175, p-value < 2.2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##         NA -3.414513
## sample estimates:
## mean of x mean of y 
##  103.6525  107.8725

The z test shows a p-value close to zero, so we reject the null hypothesis. There is significant evidence that the mean pulse rate is higher for the dogs receiving the treatment.

b. Is there significant evidence of a difference in the levels of variability in pulse rate between the placebo and the treatment group of dogs?

var.test(plcb,trmt,ratio = 1,alternative = c("two.sided"),conf.level = .95)

## 
##  F test to compare two variances
## 
## data:  plcb and trmt
## F = 1.2645, num df = 39, denom df = 39, p-value = 0.467
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6688081 2.3908676
## sample estimates:
## ratio of variances 
##           1.264528

The p-value for the F test is .467, which is greater than alpha = .05, so we fail to reject the null hypothesis. There is not sufficient evidence of a difference in the variability of pulse rates between the placebo and treatment groups; the data are consistent with equal variances.

c. Provide a 95% confidence interval on the difference in mean pulse rates between the placebo and treatment groups.

z.test(plcb,trmt,alternative="two.sided",mu=0,sigma.x=sd(plcb),sigma.y=sd(trmt), conf.level = 0.95)

## 
##  Two-sample z-Test
## 
## data:  plcb and trmt
## z = -8.6175, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.179797 -3.260203
## sample estimates:
## mean of x mean of y 
##  103.6525  107.8725

The 95 percent confidence interval for the difference in mean pulse rates (placebo minus treatment) is (-5.179797, -3.260203).

Do the necessary conditions hold for the statistical procedures you applied in parts (a)-(c)? Justify your answer.

Let’s check to see if the data are normally distributed.

qqnorm(plcb,ylab="Heart Rate",xlab="Placebo",main="Normal Probability Plot")
qqline(plcb)

qqnorm(trmt,ylab="Heart Rate",xlab="Treatment",main="Normal Probability Plot")
qqline(trmt)

The data for the placebo and the treatment groups are approximately normally distributed and follow a similar pattern, so we can conclude that the tests that were carried out are valid.
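Normal probability plots are judged by eye, so a numerical check can complement them. The sketch below runs the Shapiro-Wilk test on each group and, since the population standard deviations are not actually known, repeats part (a) with a Welch t-test as a sensitivity check. These calls are an addition for illustration and are not part of the original analysis. The vectors `plcb` and `trmt` are retyped from the problem statement.

```r
plcb <- c(105.1, 103.3, 102.1, 102.3, 101.5, 100.6, 104.5, 103.2, 101.8, 102.1,
          108.1, 103.2, 104.0, 103.9, 105.3, 103.6, 102.3, 103.9, 103.0, 107.0,
          102.3, 103.5, 111.7, 101.4, 103.0, 101.1, 103.7, 102.3, 106.2, 100.8,
          102.1, 104.3, 104.0, 102.2, 103.1, 104.7, 102.3, 110.1, 103.1, 103.4)
trmt <- c(107.6, 107.8, 110.4, 106.6, 108.2, 113.4, 113.5, 108.7, 108.2, 106.0,
          105.3, 107.1, 110.3, 108.7, 107.4, 111.1, 105.9, 106.9, 106.4, 111.5,
          106.8, 107.8, 106.1, 106.7, 105.0, 110.4, 105.9, 106.4, 106.0, 106.0,
          106.9, 107.6, 107.0, 105.8, 108.6, 109.3, 108.5, 106.9, 107.0, 109.2)

# Shapiro-Wilk tests: small p-values indicate departure from normality.
sw_plcb <- shapiro.test(plcb)
sw_trmt <- shapiro.test(trmt)
c(placebo = sw_plcb$p.value, treatment = sw_trmt$p.value)

# Welch t-test for part (a); it does not treat the sample SDs as known.
tt <- t.test(plcb, trmt, alternative = "less")
tt$p.value
```

If the Shapiro-Wilk p-values turn out small, the F test in part (b) is the procedure most sensitive to that departure, while the tests on means are more robust with 40 dogs per group.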

Extra Problem (Read entire problem before you answer)

Step 1. Take a random sample of size n = 20 from N(mean1 = 25, variance1 = 9), and take a second random sample of size 25 from N(mean2 = 25, variance2 = 16). Now assume that the variances are not known.

(a) Use the equal variance t-test formula to test H0: mean1 = mean2, Ha: mean1 ≠ mean2. Save the p-value of your test.

a<-.05
n1<-20
u1<-25
o1<-9
n2<-25
u2<-25
o2<-16
r<-10000
y1<-rep(0,n1)
y2<-rep(0,n2)
tvaryes<-0
tvarno<-0
ychk<-0
nchk<-0

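# Note: rnorm() takes a standard deviation, so passing o1 = 9 and o2 = 16 simulates variances of 81 and 256.
# To match the stated variances, pass sqrt(o1) and sqrt(o2); the output below comes from the calls as written.
y1<-rnorm(n1,u1,o1)
y2<-rnorm(n2,u2,o2)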
t.test(y1,y2,alternative = "two.sided",mu=0,paired = FALSE, var.equal = TRUE,conf.level = 1-a)

## 
##  Two Sample t-test
## 
## data:  y1 and y2
## t = -0.31073, df = 43, p-value = 0.7575
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.012689   8.072104
## sample estimates:
## mean of x mean of y 
##  25.43051  26.90080

p-value = 0.7575

(b) Next use the unequal variance t-test formula to test H0: mean1 = mean2, Ha: mean1 ≠ mean2. Save the p-value of your test.

t.test(y1,y2,alternative = "two.sided",mu=0,paired = FALSE, var.equal = FALSE,conf.level = 1-a)

## 
##  Welch Two Sample t-test
## 
## data:  y1 and y2
## t = -0.33623, df = 33.897, p-value = 0.7388
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.358036   7.417451
## sample estimates:
## mean of x mean of y 
##  25.43051  26.90080

p-value = 0.7388

Step 2. Repeat step 1 10000 times and report the proportion of times you reject null hypothesis under (a) and (b) when alpha = 0.05 is used.

for(i in 1:r)
{
  y1<-rnorm(n1,u1,o1)
  y2<-rnorm(n2,u2,o2)
  tvaryes<-t.test(y1,y2,alternative="two.sided",mu=0,
  paired=FALSE, var.equal = TRUE,conf.level = 1-a)
  tvarno<-t.test(y1,y2,alternative="two.sided",mu=0,
  paired=FALSE, var.equal = FALSE,conf.level = 1-a)
  if (tvaryes$p.value <a)
  {
    ychk<-ychk+1
  }
  if (tvarno$p.value <a)
  {
    nchk<-nchk+1
  }
}
ychk

## [1] 379

nchk

## [1] 470

Equal variance proportion of rejection

ychk/r

## [1] 0.0379

Unequal variance proportion of rejection

nchk/r

## [1] 0.047

The proportion calculated in step 2 is called the empirical type I error rate. Which of the two empirical proportions is closer to 0.05? What can be said about the appropriateness of the two t-test methods?

The empirical proportion for the unequal variance test (0.047) is closer to .05 than the proportion for the equal variance test (0.0379). Therefore, the unequal variance test holds the nominal type I error rate more accurately here. Unless we are sure that the variances are equal, we should not use the equal variance test.
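The simulation can also be written reproducibly with a seed and `replicate()`. This is a minimal sketch and not the code that produced the counts above: it passes standard deviations sqrt(9) and sqrt(16) to `rnorm()` to match the stated variances, uses an arbitrary seed, and runs 2000 replications to keep it quick, so its rates need not match 0.0379 and 0.047. The names `sim_once`, `pvals`, and `rates` are hypothetical.

```r
set.seed(5206)          # arbitrary seed, chosen only for reproducibility
reps <- 2000            # fewer than 10000 to keep the sketch fast
alpha <- 0.05

# One replication: draw both samples under H0 and return both p-values.
sim_once <- function() {
  y1 <- rnorm(20, mean = 25, sd = sqrt(9))
  y2 <- rnorm(25, mean = 25, sd = sqrt(16))
  c(pooled = t.test(y1, y2, var.equal = TRUE)$p.value,
    welch  = t.test(y1, y2, var.equal = FALSE)$p.value)
}

pvals <- replicate(reps, sim_once())   # 2 x reps matrix of p-values
rates <- rowMeans(pvals < alpha)       # empirical type I error rates
rates
```

Each empirical rate has a Monte Carlo standard error of roughly sqrt(0.05 * 0.95 / reps), which helps in judging whether a gap from 0.05 is more than simulation noise.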