My goal of this document is to show you how to do your homeworks by using R and more specifically base R (no packages!) I am not going to load any data so I’ll have to be given the statistics, just like the WeBWorK questions. You should notice how unrealistic this is. Of course you would normally have the data loaded…
To tackle this and any hypothesis test you should state your null and alternative hypothesis. \[ H_0: \mu = 0\\ H_A:\mu\neq 0 \] Then we will use a test statistic \[ z = \frac{\overline x-\mu}{\frac{\sigma}{\sqrt n}} \] We pick this test statistic because the population standard deviation was given.
xbar = 93
sigma = 172
n = 45
z = xbar/(sigma/sqrt(n))
z
## [1] 3.62711
To do the corresponding \(p\) value, I will need to use the pnorm function but that will give me everything to the left of the \(z\) value. Here I want everything to the right and I need to double it because it is a two tailed alternative hypothesis.
2*(1-pnorm(z))
## [1] 0.0002866109
If I was asked to include the confidence interval, I could get the critical \(z\) values by using the qnorm function. Again though I have to be careful about the two tails! Let’s do 95% because that is traditional. I’ll also need the standard error (it is the thing in the denominator in the test statistic)
\[ SE = \frac{\sigma}{\sqrt{n}} \]
alpha = 0.05
zcrit = qnorm(1-alpha/2)
serror = sigma/sqrt(n)
c(zcrit*serror,zcrit*serror)
## [1] 50.25396 50.25396
I used the c to write both values. This is the confidence interval about the mean \(\mu\). Notice this interval does not contain \(\overline x\). So we can reject the null hypothesis.
Here my hypothesis is one tailed. I am trying to show that things have improved! \[ H_0: \mu = 245\\ H_A: \mu >245 \] This time, I do not know the population standard deviation only the sample standard deviation so I will use a different test statistic. \[ t = \frac{\overline x-\mu}{\frac{s}{\sqrt n}} \]
mu = 245
xbar = 248.7
s = 49.7
n = 102
t = (xbar-mu)/(s/sqrt(n))
t
## [1] 0.7518746
To use the \(t\)-student distribution, I also need to know the degrees of freedom. For that I am going to use \(n-1\). There are several improvements that can be done to the degrees of freedom but for your homework, \(df = n-1\) will suffice. The function pt will give us the probability
df = n-1
1-pt(t,df)
## [1] 0.2269375
If I want a critical \(t\) for say 90%.
tcrit = qt(.90,df)
tcrit
## [1] 1.28999
We see that we are less than are critical \(t\) so we will fail to reject the null hypothesis.
\[ H_0: \mu_1 = \mu_2\\ H_A: \mu_1\neq\mu_2 \]
Here we are comparing two means. We will use a test statistic of \[ z = \frac{(\overline x_1-\overline x_2)-0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \] This one is \(z\) because I know the population variances! I have assumed that the means are equal here too, that is why I am subtracing zero. So we set it up in R just like the other two.
xbar1 = 7.6
xbar2 = 8.2
sigma1 = 32
sigma2 = 35
n1 = 75
n2 = 100
se = sqrt(sigma1/n1+sigma2/n2)
z = (xbar1-xbar2)/se
se
## [1] 0.8812869
1-pnorm(z)
## [1] 0.7520081
I’ll fail to reject my null hypothesis, I do not have evidence to suggest the alternative that they are not equal.
My 95% confidence interval will be
zcrit = qnorm(1-.05/2)
c((xbar1-xbar2)-zcrit*se,(xbar1-xbar2)+zcrit*se)
## [1] -2.327291 1.127291
Randomly selected 18 student cars have ages with a mean of 7.1 years and a standard deviation of 3.4 years, while randomly selected 17 faculty cars have ages with a mean of 5.2 years and a standard deviation of 3.7 years. Use a 0.05 significance level to test the claim that student cars are older than faculty cars.
Here we are comparing two means but we do not know the population standard deviations only the sample so this is a \(t\). We will use a test statistic of \[ t = \frac{(\overline x_1-\overline x_2)-0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \]
We also need the degrees of freedom. There are several ways to do this. I am going to use the simplest, \[ df = min(n_1-1,n_2-1) \]
xbar1 = 7.1
xbar2 = 5.2
sigma1 = 3.4
sigma2 = 3.7
n1 = 18
n2 = 17
df = min(n1-1,n2-1)
se = sqrt(sigma1**2/n1+sigma2**2/n2)
t = (xbar1-xbar2)/se
t
## [1] 1.579217
This test is one-sided
tcrit = qt(1-.05,df)
tcrit
## [1] 1.745884
But when you compute the confidence interval you need to do it as two sided
tcrit = qt(1-.05/2,df)
c((xbar1-xbar2)-tcrit*se,(xbar1-xbar2)+tcrit*se)
## [1] -0.6505169 4.4505169