Keep It Simple

My goal of this document is to show you how to do your homeworks by using R and more specifically base R (no packages!) I am not going to load any data so I’ll have to be given the statistics, just like the WeBWorK questions. You should notice how unrealistic this is. Of course you would normally have the data loaded…

  1. 45 people are randomly selected and the accuracy of their wristwatches is checked, with positive errors representing watches that are ahead of the correct time and negative errors representing watches that are behind the correct time. The 45 values have a mean of 93sec and a population standard deviation of 172sec. Use a 0.01 significance level to test the claim that the population of all watches has a mean of 0sec (use a two-sided alternative).

To tackle this and any hypothesis test you should state your null and alternative hypothesis. \[ H_0: \mu = 0\\ H_A:\mu\neq 0 \] Then we will use a test statistic \[ z = \frac{\overline x-\mu}{\frac{\sigma}{\sqrt n}} \] We pick this test statistic because the population standard deviation was given.

xbar = 93
sigma = 172
n = 45

z = xbar/(sigma/sqrt(n))
z
## [1] 3.62711

To do the corresponding \(p\) value, I will need to use the pnorm function but that will give me everything to the left of the \(z\) value. Here I want everything to the right and I need to double it because it is a two tailed alternative hypothesis.

2*(1-pnorm(z))
## [1] 0.0002866109

If I was asked to include the confidence interval, I could get the critical \(z\) values by using the qnorm function. Again though I have to be careful about the two tails! Let’s do 95% because that is traditional. I’ll also need the standard error (it is the thing in the denominator in the test statistic)

\[ SE = \frac{\sigma}{\sqrt{n}} \]

alpha = 0.05
zcrit = qnorm(1-alpha/2)
serror = sigma/sqrt(n)
c(zcrit*serror,zcrit*serror)
## [1] 50.25396 50.25396

I used the c to write both values. This is the confidence interval about the mean \(\mu\). Notice this interval does not contain \(\overline x\). So we can reject the null hypothesis.

  1. Golf-course designers have become concerned that old courses are becoming obsolete since new technology has given golfers the ability to hit the ball so far. Designers, therefore, have proposed that new golf courses need to be built expecting that the average golfer can hit the ball more than 245 yards on average. Suppose a random sample of 102 golfers be chosen so that their mean driving distance is 248.7 yards, with a standard deviation of 49.7.

Here my hypothesis is one tailed. I am trying to show that things have improved! \[ H_0: \mu = 245\\ H_A: \mu >245 \] This time, I do not know the population standard deviation only the sample standard deviation so I will use a different test statistic. \[ t = \frac{\overline x-\mu}{\frac{s}{\sqrt n}} \]

mu = 245
xbar = 248.7
s = 49.7
n = 102

t = (xbar-mu)/(s/sqrt(n))
t
## [1] 0.7518746

To use the \(t\)-student distribution, I also need to know the degrees of freedom. For that I am going to use \(n-1\). There are several improvements that can be done to the degrees of freedom but for your homework, \(df = n-1\) will suffice. The function pt will give us the probability

df = n-1
1-pt(t,df)
## [1] 0.2269375

If I want a critical \(t\) for say 90%.

tcrit = qt(.90,df)
tcrit
## [1] 1.28999

We see that we are less than are critical \(t\) so we will fail to reject the null hypothesis.

  1. Two independent samples have been selected, 75 observations from population 1 and 100 observations from population 2. The sample means have been calculated to be \(\overline x_1=7.6\) and \(\overline x_2=8.2\). From previous experience with these populations, it is known that the variances are \(\sigma^2_1=32\) and \(\sigma^2_2=35\).

\[ H_0: \mu_1 = \mu_2\\ H_A: \mu_1\neq\mu_2 \]

Here we are comparing two means. We will use a test statistic of \[ z = \frac{(\overline x_1-\overline x_2)-0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \] This one is \(z\) because I know the population variances! I have assumed that the means are equal here too, that is why I am subtracing zero. So we set it up in R just like the other two.

xbar1 = 7.6
xbar2 = 8.2
sigma1 = 32
sigma2 = 35
n1 = 75
n2 = 100

se = sqrt(sigma1/n1+sigma2/n2)
z = (xbar1-xbar2)/se
se
## [1] 0.8812869
1-pnorm(z)
## [1] 0.7520081

I’ll fail to reject my null hypothesis, I do not have evidence to suggest the alternative that they are not equal.

My 95% confidence interval will be

zcrit = qnorm(1-.05/2)
c((xbar1-xbar2)-zcrit*se,(xbar1-xbar2)+zcrit*se)
## [1] -2.327291  1.127291

WeBWorK 37

Randomly selected 18 student cars have ages with a mean of 7.1 years and a standard deviation of 3.4 years, while randomly selected 17 faculty cars have ages with a mean of 5.2 years and a standard deviation of 3.7 years. Use a 0.05 significance level to test the claim that student cars are older than faculty cars.

Here we are comparing two means but we do not know the population standard deviations only the sample so this is a \(t\). We will use a test statistic of \[ t = \frac{(\overline x_1-\overline x_2)-0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \]

We also need the degrees of freedom. There are several ways to do this. I am going to use the simplest, \[ df = min(n_1-1,n_2-1) \]

xbar1 = 7.1
xbar2 = 5.2
sigma1 = 3.4
sigma2 = 3.7
n1 = 18
n2 = 17
df = min(n1-1,n2-1)

se = sqrt(sigma1**2/n1+sigma2**2/n2)
t = (xbar1-xbar2)/se
t
## [1] 1.579217

This test is one-sided

tcrit = qt(1-.05,df)
tcrit
## [1] 1.745884

But when you compute the confidence interval you need to do it as two sided

tcrit = qt(1-.05/2,df)
c((xbar1-xbar2)-tcrit*se,(xbar1-xbar2)+tcrit*se)
## [1] -0.6505169  4.4505169