Chapter 8 (cont.)

Below are the code explanations from the in-class handout.

Example 1: Working While Enrolled in School

1

(a)

Let \(p\) be the population proportion of students working while enrolled in school. Here, the population is represented by all students at the school.

\(H_0: p=.488\)

\(H_a: p \ne .488\)

(b)

prop.test(40, 100, p=.488)
## 
##  1-sample proportions test with continuity correction
## 
## data:  40 out of 100
## X-squared = 2.7572, df = 1, p-value = 0.09682
## alternative hypothesis: true p is not equal to 0.488
## 95 percent confidence interval:
##  0.3047801 0.5029964
## sample estimates:
##   p 
## 0.4

The \(P\)-value is \(.09682\), the sample proportion of students who work while enrolled in school is \(\hat{p}=.4\), and the 95% confidence interval is \((30.5\%, 50.3\%)\). To find the 99% confidence interval, we have to specify the confidence level inside prop.test:

prop.test(40, 100, p=.488, conf.level=.99)
## 
##  1-sample proportions test with continuity correction
## 
## data:  40 out of 100
## X-squared = 2.7572, df = 1, p-value = 0.09682
## alternative hypothesis: true p is not equal to 0.488
## 99 percent confidence interval:
##  0.279419 0.533502
## sample estimates:
##   p 
## 0.4
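The numbers printed above can also be pulled out of the test result instead of being read off the screen. The sketch below uses base R's stats::prop.test rather than the formula-aware version used in class, on the assumption that both report the same values when given counts; the name res is only illustrative.

```r
# Base R's prop.test returns an "htest" list whose pieces can be extracted by name.
res <- stats::prop.test(40, 100, p = 0.488, conf.level = 0.99)

res$estimate    # sample proportion
res$p.value     # two-sided P-value
res$statistic   # X-squared
res$conf.int    # 99% confidence interval (lower, upper)

# The interval expressed in percent, rounded to one decimal place
round(100 * res$conf.int, 1)
```

Storing the result this way avoids retyping rounded values in later calculations.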

The 99% confidence interval is \((27.9\%, 53.4\%)\). To find the test statistic, \(z\), we can use xqnorm with half of the \(P\)-value (we use half because we have a two-sided alternative):

xqnorm(.09682/2)
##  P(X <= -1.6604696926881) = 0.04841
##  P(X >  -1.6604696926881) = 0.95159

## [1] -1.66047

So the value of the \(z\) test statistic is \(-1.66\). It is negative because the sample proportion \(.4\) lies below the hypothesized value \(.488\).
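As a cross-check, the same statistic can be computed directly from the counts. This is a minimal sketch in base R, assuming the usual continuity correction of \(1/(2n)\) that the "with continuity correction" line in the output refers to; the variable names are illustrative.

```r
x  <- 40      # students in the sample who work
n  <- 100     # sample size
p0 <- 0.488   # proportion under the null hypothesis

p_hat <- x / n
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error under the null

# The continuity correction shrinks the distance |p_hat - p0| by 1/(2n);
# the sign of z follows the sign of p_hat - p0.
z_cc <- sign(p_hat - p0) * (abs(p_hat - p0) - 1 / (2 * n)) / se0

z_cc      # compare with the xqnorm result above
z_cc^2    # compare with X-squared in the prop.test output

# The same z recovered from half of the two-sided P-value
qnorm(0.09682 / 2)
```

Squaring z_cc reproduces the X-squared value, which is why the statistic can also be recovered as the signed square root of X-squared.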

(c)

Since the \(P\)-value is greater than \(\alpha=.05\), we cannot reject the null hypothesis at the 5% significance level. This means that the data does not provide enough evidence for us to conclude that the percentage of students who work while enrolled is different from 48.8%. We are also 99% confident that the true percentage of students who work at our school is between 27.9% and 53.4%, and we are 95% confident that the true percentage is between 30.5% and 50.3%. So even though we did not have enough evidence to conclude that the true percentage differs from 48.8%, the true percentage might still be below 48.8%: most of each confidence interval lies below 48.8% and only a small part lies above it.

2 (this question is not on the handout)

Suppose that we have a good reason to suspect that the percentage of students who work while enrolled at our school is less than 48.8%. Repeat the significance test with a one-sided alternative.

(a)

The null hypothesis stays the same as for question 1, but the new alternative hypothesis is \(H_a: p<.488\).

(b)

prop.test(40, 100, p=.488, alternative="less")
## 
##  1-sample proportions test with continuity correction
## 
## data:  40 out of 100
## X-squared = 2.7572, df = 1, p-value = 0.04841
## alternative hypothesis: true p is less than 0.488
## 95 percent confidence interval:
##  0.0000000 0.4872158
## sample estimates:
##   p 
## 0.4

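The main change for this test is the \(P\)-value. The new \(P\)-value is \(.04841\), which is half of the \(P\)-value from question 1. The value of the \(z\) test statistic does not change. The interval in this output, \((0, .487)\), is a one-sided bound that goes with the one-sided alternative; the two-sided confidence intervals from question 1 still apply.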
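The halving can be checked from the normal distribution. The sketch below uses base R's pnorm and takes \(z\) as the negative square root of the X-squared value reported in the output; the variable names are illustrative.

```r
z <- -sqrt(2.7572)   # signed z statistic from the reported X-squared

p_less    <- pnorm(z)             # area to the left of z:  H_a: p < .488
p_greater <- 1 - pnorm(z)         # area to the right of z: H_a: p > .488
p_two     <- 2 * pnorm(-abs(z))   # both tails:             H_a: p != .488

c(less = p_less, greater = p_greater, two.sided = p_two)
```

The one-sided \(P\)-value is half of the two-sided one only when the sample proportion falls on the side named in the alternative, as it does here.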

(c)

Since the \(P\)-value is less than \(\alpha=.05\), we can reject the null hypothesis at the 5% significance level. This means that the data provides enough evidence to suggest that the true percentage of students who work at our school is less than 48.8%.

Example 2: Smoking Among Young Adults

1

(a)

Before we can load the data set, we must require the Lock5Data package:

require(Lock5Data)
## Loading required package: Lock5Data

Load the data:

data(StudentSurvey)

(b)

\(H_0: p=.187\)

\(H_a: p<.187\)

(c)

When we have a data set with individual observations, we can use prop.test by specifying the variable we are testing:

prop.test(~Smoke, data=StudentSurvey, p=.187, alternative="less")
## 
##  1-sample proportions test with continuity correction
## 
## data:  StudentSurvey$Smoke  [with success = No]
## X-squared = 1143, df = 1, p-value = 1
## alternative hypothesis: true p is less than 0.187
## 95 percent confidence interval:
##  0.0000000 0.9076287
## sample estimates:
##         p 
## 0.8812155

However, we have to be careful when calling this command. Notice that the output says success = No. This means that the numbers in the output don’t reflect what we are looking for, because we want the response Yes to be treated as a success. We can easily change this by adding an optional argument that specifies the value of the variable we want to be considered a success:

prop.test(~Smoke, data=StudentSurvey, p=.187, alternative="less",success="Yes")
## 
##  1-sample proportions test with continuity correction
## 
## data:  StudentSurvey$Smoke  [with success = Yes]
## X-squared = 10.636, df = 1, p-value = 0.0005546
## alternative hypothesis: true p is less than 0.187
## 95 percent confidence interval:
##  0.0000000 0.1511306
## sample estimates:
##         p 
## 0.1187845

So the \(P\)-value is \(.0005546\) and the sample proportion of smokers is \(\hat{p}=.1188\). To find the value of the \(z\) test statistic, we can use xqnorm with the \(P\)-value. Note that we do not have to divide the \(P\)-value by two here because we obtained the \(P\)-value using a one-sided alternative.

xqnorm(.0005546)
##  P(X <= -3.26125561474464) = 0.0005546
##  P(X >  -3.26125561474464) = 0.9994454

## [1] -3.261256

So \(z=-3.26\). It is negative because the sample proportion \(.1188\) is below the hypothesized value \(.187\), which is the direction the \(<\) alternative looks for. To find the 95% confidence interval, we have to run the test again with the default two-sided alternative:

prop.test(~Smoke, data=StudentSurvey, p=.187,success="Yes")
## 
##  1-sample proportions test with continuity correction
## 
## data:  StudentSurvey$Smoke  [with success = Yes]
## X-squared = 10.636, df = 1, p-value = 0.001109
## alternative hypothesis: true p is not equal to 0.187
## 95 percent confidence interval:
##  0.08819147 0.15771103
## sample estimates:
##         p 
## 0.1187845

The 95% confidence interval is \((8.8\%, 15.8\%)\).
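The results in this part can be cross-checked from counts alone, which avoids the question of which response is treated as a success. The sketch below uses base R's stats::prop.test with the 43 "Yes" responses out of 362 students that appear in the two-way table later in this example; the variable names are illustrative.

```r
x  <- 43      # students who answered "Yes" to Smoke
n  <- 362     # students in the sample
p0 <- 0.187   # proportion under the null hypothesis

res <- stats::prop.test(x, n, p = p0, alternative = "less")

res$estimate   # sample proportion of smokers
res$p.value    # one-sided P-value

# Signed z statistic: negative because the sample proportion is below p0
z <- -sqrt(unname(res$statistic))
z
pnorm(z)       # left-tail area, which equals the one-sided P-value
```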

(d)

Since our \(P\)-value is really small (less than .001), we can reject the null hypothesis at the .1% significance level. This means that the data provides enough evidence to conclude that the true percentage of students who smoke at the school is less than 18.7%. We are also 95% confident that the true percentage is between 8.8% and 15.8%.

2

(a)

\(H_0: p_F-p_M=0\)

\(H_a: p_F-p_M<0\)

(b)

When we use prop.test for a two-sample proportion test, we cannot specify which value of the categorical variable is counted as a success. The value that comes alphabetically first is automatically used as a success. So for our example, the non-smokers are counted as a success. This means that we have to rephrase our hypotheses to be about the proportion of non-smokers, which means that the difference in the alternative will be greater than zero.
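The change of direction follows from taking complements. Writing \(q_F\) and \(q_M\) for the proportions of non-smokers (notation introduced here for illustration, not taken from the handout):

```latex
\[
q_F - q_M = (1 - p_F) - (1 - p_M) = -(p_F - p_M),
\qquad\text{so}\qquad
p_F - p_M < 0 \iff q_F - q_M > 0 .
\]
```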

prop.test(~Smoke|Gender, data=StudentSurvey, alternative="greater")
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  tally(Smoke ~ Gender)
## X-squared = 1.3548, df = 1, p-value = 0.1222
## alternative hypothesis: greater
## 95 percent confidence interval:
##  -0.01563501  1.00000000
## sample estimates:
##    prop 1    prop 2 
## 0.9053254 0.8601036

So the \(P\)-value is .122, the proportion of female smokers in the sample is \(\hat{p}_F=1-.905=.095\) and the proportion of male smokers in the sample is \(\hat{p}_M=1-.86=.14\). We can use the prop.test command with two samples in a different way. First, we need to create a two-way table with our data. We can do this using the tally command. Adding the option margins=T will add the row and column sums as well.

tally(~Smoke&Gender, data=StudentSurvey, margins=T)
##        Gender
## Smoke     F   M Total
##   No    153 166   319
##   Yes    16  27    43
##   Total 169 193   362
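If only the table is available and not the raw data, the counts needed below can be read from it in code. This is a small sketch in base R that re-enters the table by hand; the object name counts is illustrative.

```r
# Re-enter the two-way table (filled column by column: F first, then M)
counts <- matrix(c(153, 16, 166, 27), nrow = 2,
                 dimnames = list(Smoke = c("No", "Yes"), Gender = c("F", "M")))

addmargins(counts)     # adds row and column sums, like margins=T in tally

x <- counts["Yes", ]   # smokers by gender
n <- colSums(counts)   # group sizes by gender
x
n
```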

Let

\(x_1=\) number of female smokers in the sample \(=16\)

\(x_2=\) number of male smokers in the sample \(=27\)

\(n_1=\) total number of females in the sample \(=169\)

\(n_2=\) total number of males in the sample \(=193\)

Now, we can use prop.test(c(\(x_1, x_2\)), c(\(n_1, n_2\))) to carry out the significance test.

prop.test(c(16, 27), c(169, 193), alternative="less")
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(16, 27) out of c(169, 193)
## X-squared = 1.3548, df = 1, p-value = 0.1222
## alternative hypothesis: less
## 95 percent confidence interval:
##  -1.00000000  0.01563501
## sample estimates:
##     prop 1     prop 2 
## 0.09467456 0.13989637

This gives the same \(P\)-value as before, but now the output includes the correct sample proportions. To get the 95% confidence interval, we have to use the command with the default two-sided alternative:

prop.test(c(16, 27), c(169, 193))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(16, 27) out of c(169, 193)
## X-squared = 1.3548, df = 1, p-value = 0.2444
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.11667411  0.02623048
## sample estimates:
##     prop 1     prop 2 
## 0.09467456 0.13989637

So the 95% confidence interval is \((-11.7\%, 2.6\%)\).

(c)

The one-sided \(P\)-value (.1222) is greater than .05, so we cannot reject the null hypothesis. This means that we do not have enough evidence to conclude that the proportion of female students who smoke is lower than the proportion of male students who smoke at the school. We are also 95% confident that the true difference between the percentages of female and male smokers \((p_F-p_M)\) at the school is between -11.7% and 2.6%.
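The two-sample figures can also be reproduced by hand, which shows where each number in the output comes from. The sketch below is in base R and assumes the standard pooled \(z\) test with a continuity correction of \(\frac{1}{2}(1/n_1+1/n_2)\) and an unpooled standard error for the interval; the variable names are illustrative.

```r
x <- c(F = 16, M = 27)     # smokers by gender
n <- c(F = 169, M = 193)   # group sizes

p_hat <- x / n
d     <- unname(p_hat["F"] - p_hat["M"])   # observed difference p_F - p_M

# Test statistic: pooled proportion under H0, with continuity correction
p_pool  <- sum(x) / sum(n)
cc      <- 0.5 * sum(1 / n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z       <- sign(d) * (abs(d) - cc) / se_pool

z^2        # compare with X-squared in the output
pnorm(z)   # one-sided P-value for H_a: p_F - p_M < 0

# 95% confidence interval: unpooled standard error plus the same correction
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci <- d + c(-1, 1) * (qnorm(0.975) * se_unpooled + cc)
ci         # compare with the interval in the two-sided output
```

The test pools the two samples because the null hypothesis says the proportions are equal, while the interval does not assume equality and so uses the separate sample proportions.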