Week 6 Homework

6.8 Elderly Drivers

In January 2011, The Marist Poll published a report stating that 66% of adults nationally think licensed drivers should be required to retake their road test once they reach 65 years of age. It was also reported that interviews were conducted on 1,018 American adults, and that the margin of error was 3% using a 95% confidence level.
a) Verify the margin of error reported by The Marist Poll.
ME = 1.96*sqrt((.66*.34)/1018)
round(ME, 4)
## [1] 0.0291

The margin of error is about 2.9%, which rounds to the reported 3%, so the margin of error reported by The Marist Poll is correct.
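The same calculation can be wrapped in a small function so that the second factor is always computed as 1 - p rather than typed by hand. This is only a sketch; `margin_of_error` is a hypothetical helper name, not something from the textbook.

```r
# Hypothetical helper: margin of error for a single sample proportion
margin_of_error <- function(p_hat, n, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)     # about 1.96 for 95% confidence
  z_star * sqrt(p_hat * (1 - p_hat) / n)  # z* times the standard error
}

me <- margin_of_error(0.66, 1018)
round(me, 3)  # 0.029
```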

b) Based on a 95% confidence interval, does the poll provide convincing evidence that more than 70% of the population think that licensed drivers should be required to retake their road test once they turn 65?
pbar = .66
p0 = .70
n = 1018
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
z
## [1] -2.784994

z = -2.785

alpha = .05
z.alpha = qnorm(1-alpha)
z.alpha
## [1] 1.644854
pval = pnorm(z, lower.tail=FALSE)
pval
## [1] 0.9973236

The 95% confidence interval is 0.66 ± 0.03, or roughly (0.63, 0.69), which lies entirely below 0.70. The one-sided test agrees: the test statistic (-2.785) is NOT greater than the critical value of 1.64 and the p-value (0.997) IS greater than the .05 significance level, so we do not reject the null. The poll does not provide evidence that more than 70% of the population think that licensed drivers should be required to retake their road test once they turn 65.
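Because the question asks for a conclusion based on a confidence interval, the interval itself can also be computed and compared with 0.70. A minimal sketch, using the sample proportion and sample size from the problem (the variable names are my own):

```r
p_hat <- 0.66
n <- 1018

se <- sqrt(p_hat * (1 - p_hat) / n)         # standard error of the sample proportion
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval
round(ci, 3)

# Is the whole interval below 0.70?
ci[2] < 0.70  # TRUE
```

Since the upper bound is below 0.70, the interval gives no support for the claim that more than 70% agree.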

6.16 Is College Worth It?

Among a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school.
a) A newspaper article states that only a minority of the Americans who decide not to go to college do so because they cannot afford it and uses the point estimate from this survey as evidence. Conduct a hypothesis test to determine if these data provide strong evidence supporting this statement.

Ho: >= 50% of American adults who decided not to go to college did so because they could not afford it

Ha: < 50% of American adults who decided not to go to college did so because they could not afford it

pbar1 = .48
p01 = .50
n = 331
z1 = (pbar1-p01)/sqrt(p01*(1-p01)/n)
round(z1, 3)
## [1] -0.728

The test statistic -0.73 is not less than the critical value of -1.64, so we do not reject the null hypothesis, and there is not enough evidence to support the newspaper’s claim.
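The p-value for this lower-tailed test can be computed alongside the test statistic. The sketch below uses the 6.16 values directly (0.48, 0.50 and 331); the variable names are my own.

```r
p_hat  <- 0.48   # sample proportion
p_null <- 0.50   # null value
n      <- 331    # sample size

# The standard error uses the null value because this is a hypothesis test
z <- (p_hat - p_null) / sqrt(p_null * (1 - p_null) / n)
p_value <- pnorm(z)   # lower tail, since Ha: p < 0.50

round(z, 2)
p_value > 0.05  # TRUE, so we do not reject the null
```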

b) Would you expect a confidence interval for the proportion of American adults who decide not to go to college because they cannot afford it to include 0.5? Explain.
SE = sqrt((.48*(1-.48))/331)
SE
## [1] 0.02746049

With a standard error of .027 (2.7%) and a sample proportion of .48 (48%), the 95% confidence interval is .48 ± 1.96 × .027, or (42.6%, 53.4%). Yes, we would expect the interval to include .5, and it does. This is consistent with part (a), where we did not reject the null value of .5.
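The interval can be checked directly. This is a short sketch using the normal critical value from `qnorm(0.975)` (about 1.96):

```r
p_hat <- 0.48
n <- 331

se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval
round(ci, 3)

# Does the interval contain 0.5?
ci[1] < 0.5 && 0.5 < ci[2]  # TRUE
```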

6.24 Heart Transplant Success

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was officially designated a heart transplant candidate, meaning that he was gravely ill and might benefit from a new heart. Patients were randomly assigned into treatment and control groups. Patients in the treatment group received a transplant, and those in the control group did not. The table below displays how many patients survived and died in each group.
# Counts are entered column by column: Control (alive, dead), then Treatment
mymatrix=matrix(c(4,30,24,45), nrow=2)
colnames(mymatrix) <- c("Control", "Treatment")
rownames(mymatrix) <-c("Alive", "Dead")
mymatrix
##       Control Treatment
## Alive       4        24
## Dead       30        45
A hypothesis test would reject the conclusion that the survival rate is the same in each group, and so we might like to calculate a confidence interval. Explain why we cannot construct such an interval using the normal approximation. What might go wrong if we constructed the confidence interval despite this problem?

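We cannot use the normal approximation because the success-failure condition is not met: the control group has only 4 survivors, fewer than the 10 successes and 10 failures needed in each group. If we constructed the interval anyway, the sampling distribution of the difference in proportions might not be nearly normal, so the interval's actual coverage could differ from the stated confidence level. We can still test for a difference using the chi-square distribution, since every expected count in the table is at least 5.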
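The success-failure check can be written out explicitly. A minimal sketch, assuming the usual rule of thumb of at least 10 successes and 10 failures in each group:

```r
counts <- matrix(c(4, 30, 24, 45), nrow = 2,
                 dimnames = list(c("Alive", "Dead"),
                                 c("Control", "Treatment")))

# Each cell is a success or failure count for one group
counts >= 10

# The condition holds only if every cell passes
all(counts >= 10)  # FALSE: only 4 control patients survived
```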

alive = c(4,24)
dead = c(30,45)
heartData<-rbind(alive,dead)
colnames(heartData)=c("control","treatment")
chisq.test(heartData)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  heartData
## X-squared = 4.9891, df = 1, p-value = 0.02551

Chi-Square = 4.99 and p-value = 0.026. Since the p-value is less than .05, we reject Ho and conclude that there is a difference in survival rate between the two groups.

6.32 Full Body Scan

A news article reports that “Americans have differing views on two potentially inconvenient and invasive practices that airports could implement to uncover potential terrorist attacks.” This news piece was based on a survey conducted among a random sample of 1,137 adults nationwide, interviewed by telephone November 7-10, 2010, where one of the questions on the survey was “Some airports are now using ‘full-body’ digital x-ray machines to electronically screen passengers in airport security lines. Do you think these new x-ray machines should or should not be used at airports?” Below is a summary of responses based on party affiliation.
# Counts are entered column by column: Republican, Democrat, Independent
mymatrix1=matrix(c(264,38,16,299,55,15,351,77,22), nrow=3)
colnames(mymatrix1) <- c("Republican", "Democrat", "Independent")
rownames(mymatrix1) <-c("Should", "Should Not", "Don't Know/No Answer")
mymatrix1
##                      Republican Democrat Independent
## Should                      264      299         351
## Should Not                   38       55          77
## Don't Know/No Answer         16       15          22
a) Conduct an appropriate hypothesis test evaluating whether there is a difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports. Assume that all relevant conditions are met.
# Same counts as above, keeping only the Republican and Democrat columns
mymatrix2=matrix(c(264,38,16,299,55,15), nrow=3)
colnames(mymatrix2) <- c("Republican", "Democrat")
rownames(mymatrix2) <-c("Should", "Should Not", "Don't Know/No Answer")
mymatrix2
##                      Republican Democrat
## Should                      264      299
## Should Not                   38       55
## Don't Know/No Answer         16       15

Since we are just looking at the difference between Republicans and Democrats, I removed Independent from the matrix. Now we can run chisq.test to test our hypothesis.

Ho: There is no difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports

Ha: There is a difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports

chisq.test(mymatrix2)
## 
##  Pearson's Chi-squared test
## 
## data:  mymatrix2
## X-squared = 1.5381, df = 2, p-value = 0.4635

Chi-Square = 1.54, p-value = 0.4635. Since p > 0.05, we do not reject the null hypothesis; there is no evidence of a difference between Republicans and Democrats in their opinions on full-body scans.
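The chi-square test above compares all three response categories at once. Since the question asks specifically about the proportion answering "Should", a two-proportion z-test with a pooled standard error is a more direct check. The sketch below is my own addition, with variable names chosen for illustration.

```r
should <- c(Republican = 264, Democrat = 299)          # answered "Should"
totals <- c(Republican = 264 + 38 + 16,
            Democrat   = 299 + 55 + 15)                # all respondents per party

p_hat  <- should / totals                              # sample proportions
p_pool <- sum(should) / sum(totals)                    # pooled proportion under Ho

se <- sqrt(p_pool * (1 - p_pool) * sum(1 / totals))    # pooled standard error
z  <- unname((p_hat["Republican"] - p_hat["Democrat"]) / se)
p_value <- 2 * pnorm(-abs(z))                          # two-sided p-value

round(p_hat, 3)
round(z, 2)
p_value > 0.05  # TRUE, so we do not reject the null
```

Both approaches lead to the same decision: no convincing evidence of a difference between the two parties.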

6.40 True or False

Determine if the statements below are true or false. For each false statement, suggest an alternative wording to make it a true statement.
a) As the degrees of freedom increases, the mean of the chi-square distribution increases.

TRUE

b) If you found $\chi^2 = 10$ with df = 5 you would fail to reject H0 at the 5% significance level.
pchisq(10,5,lower.tail=FALSE)
## [1] 0.07523525

p-value (.075) > .05

TRUE

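c) When finding the p-value of a chi-square test, we always shade the tail areas in both tails.

FALSE

When finding the p-value of a chi-square test, we shade only the upper tail. The statistic is a sum of squared deviations, so it is never negative, and larger values always indicate stronger evidence against H0.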

d) As the degrees of freedom increases, the variability of the chi-square distribution decreases.

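FALSE

As the degrees of freedom increases, the variability of the chi-square distribution increases (its variance is 2 × df).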
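For a chi-square distribution with k degrees of freedom, the mean is k and the variance is 2k, so both the center and the spread grow with the degrees of freedom. The sketch below tabulates these for a few arbitrary choices of df, along with the interquartile range from `qchisq`.

```r
dfs <- c(2, 5, 10, 20)   # arbitrary degrees of freedom for illustration

iqr <- qchisq(0.75, dfs) - qchisq(0.25, dfs)  # interquartile range for each df

data.frame(df   = dfs,
           mean = dfs,            # mean equals df
           sd   = sqrt(2 * dfs),  # variance equals 2 * df
           iqr  = round(iqr, 2))

# The spread increases with df
all(diff(iqr) > 0)  # TRUE
```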

6.48 Coffee And Depression

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption.
# Counts are entered column by column: (Yes, No) for each coffee category.
# The 1Cup/Day "No" count is 16329 so that the table total matches the 50,739 women in the study.
mymatrix3=matrix(c(670,11545,373,6244,905,16329,564,11726,95,2288), nrow=2)
colnames(mymatrix3) <- c("<=1Cup/Week", "2-6Cups/Week", "1Cup/Day", "2-3Cups/Day", ">=4Cups/Day")
rownames(mymatrix3) <-c("Yes", "No")
mymatrix3
##     <=1Cup/Week 2-6Cups/Week 1Cup/Day 2-3Cups/Day >=4Cups/Day
## Yes         670          373      905         564          95
## No        11545         6244    16329       11726        2288
a) What type of test is appropriate for evaluating if there is an association between coffee intake and depression?

Based on the data, conducting a chi-square independence test to evaluate if the variables are dependent or not is most appropriate.

b) Write the hypotheses for the test you identified in part (a).

Ho: Depression in women is independent of coffee consumption

Ha: Depression in women is NOT independent of coffee consumption

c) Calculate the overall proportion of women who do and do not suffer from depression.

Proportion of women who do not suffer from depression = 48,132/50,739 = .949, or 94.9%.

Therefore, 2,607/50,739 = .051, or 5.1%, do suffer from depression.

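d) Expected count for the 2-6 cups/week "Yes" cell and its contribution to the test statistic.
Exp=(2607*6617)/50739
Exp
## [1] 339.9854

The expected count is about 339.98. We can plug this into the chi-square statistic to find this cell's contribution.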

ChiSq = (373-339.98)^2/339.98
ChiSq
## [1] 3.207013

Chi-Sq = 3.207
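The same expected-count formula, (row total × column total) / table total, applies to every cell, and summing all the contributions gives the overall test statistic. The sketch below lets `chisq.test` do this for the whole table. It assumes the 1 cup/day "No" count is 16,329, the value that makes the table total the 50,739 women stated in the problem; the object names are my own.

```r
coffee <- matrix(c(670, 11545, 373, 6244, 905, 16329, 564, 11726, 95, 2288),
                 nrow = 2,
                 dimnames = list(c("Yes", "No"),
                                 c("<=1Cup/Week", "2-6Cups/Week", "1Cup/Day",
                                   "2-3Cups/Day", ">=4Cups/Day")))

sum(coffee)                     # 50739, matching the study size

test <- chisq.test(coffee)      # no continuity correction for tables larger than 2x2

round(test$expected, 1)         # expected counts for every cell
contrib <- (coffee - test$expected)^2 / test$expected
round(contrib, 3)               # each cell's contribution to the statistic

sum(contrib)                    # equals test$statistic
```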

e) The test statistic is $\chi^2 = 20.93$. What is the p-value?
ChiSq=20.93
r=2
c=5
df=(r-1)*(c-1)
round(1-pchisq(ChiSq,df),5)
## [1] 0.00033

P-value = .0003

f) What is the conclusion of the hypothesis test?

With a p-value (.0003) less than .05, we reject the null hypothesis that depression and coffee consumption are independent. The data provide strong evidence of an association between coffee intake and depression.

g) One of the authors of this study was quoted on the NYTimes as saying it was “too early to recommend that women load up on extra coffee” based on just this study. Do you agree with this statement? Explain your reasoning.

Yes, I agree with this statement. This was an observational study, not a randomized experiment, so confounding variables could affect the results, and the association does not show that drinking more coffee causes a change in depression risk.

6.56 Is Yawning Contagious?

An experiment conducted by the MythBusters, a science entertainment TV program on the Discovery Channel, tested if a person can be subconsciously influenced into yawning if another person near them yawns. 50 people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment) and 16 to a group where there wasn’t a person yawning near them (control). The following table shows the results of this experiment.
# Counts are entered column by column: Treatment (yawn, not yawn), then Control
mymatrix4=matrix(c(10,24,4,12), nrow=2)
colnames(mymatrix4) <- c("Treatment", "Control")
rownames(mymatrix4) <-c("Yawn", "NotYawn")
mymatrix4
##         Treatment Control
## Yawn           10       4
## NotYawn        24      12
A simulation was conducted to understand the distribution of the test statistic under the assumption of independence: having someone yawn near another person has no influence on if the other person will yawn. In order to conduct the simulation, a researcher wrote yawn on 14 index cards and not yawn on 36 index cards to indicate whether or not a person yawned. Then he shuffled the cards and dealt them into two groups of size 34 and 16 for treatment and control, respectively. He counted how many participants in each simulated group yawned in an apparent response to a nearby yawning person, and calculated the difference between the simulated proportions of yawning as $\hat{p}_{trtmt,sim} - \hat{p}_{ctrl,sim}$. This simulation was repeated 10,000 times using software to obtain 10,000 differences that are due to chance alone. The histogram shows the distribution of the simulated differences.
a) What are the hypotheses for testing if yawning is contagious, i.e. whether it is more likely for someone to yawn if they see someone else yawning?

Ho: Yawning is independent of seeing someone else yawn (Yawning is not contagious)

Ha: A person is more likely to yawn after seeing someone else yawn (Yawning is contagious)

b) Calculate the observed difference between the yawning rates under the two scenarios.

Control Yawning Rate = 4/16 = .25

Treatment Yawning Rate = 10/34 = .294

There is a difference of .044, or 4.4 percentage points, between the treatment and control groups’ yawning rates.

c) Estimate the p-value using the figure above and determine the conclusion of the hypothesis test.
chisq.test(mymatrix4)
## Warning in chisq.test(mymatrix4): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mymatrix4
## X-squared = 0, df = 1, p-value = 1
Yawn = c(10,4)
NotYawn = c(24,12)
YawnData<-rbind(Yawn,NotYawn)
colnames(YawnData)=c("Treatment","Control")
chisq.test(YawnData)
## Warning in chisq.test(YawnData): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  YawnData
## X-squared = 0, df = 1, p-value = 1

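When running the Chi-Square test for independence we get a p-value of 1 > .05, so we fail to reject the null hypothesis and do not have enough evidence to say yawning is contagious. Note that R warns the chi-squared approximation may be incorrect here: the expected count of yawners in the control group (14 × 16/50 = 4.48) is below 5, so this p-value is only a rough guide.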
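The card-shuffling simulation described in the problem can also be reproduced in R, which avoids relying on the chi-squared approximation. This is only a sketch of that procedure: the seed is arbitrary, the object names are my own, and the result will not exactly match the textbook's histogram. Because the difference in proportions grows with the number of yawners dealt to the treatment group, a simulated difference is at least as large as the observed one exactly when 10 or more yawn cards land in treatment.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

cards  <- rep(c("yawn", "not yawn"), times = c(14, 36))  # 14 yawn, 36 not yawn
n_sims <- 10000

# Shuffle the cards, deal the first 34 to treatment, count the yawners there
sim_trt_yawns <- replicate(n_sims, sum(sample(cards)[1:34] == "yawn"))

# One-sided p-value: share of shuffles with at least the observed 10 yawners
p_sim <- mean(sim_trt_yawns >= 10)

# Exact version of the same probability from the hypergeometric distribution
p_exact <- phyper(9, m = 14, n = 36, k = 34, lower.tail = FALSE)

c(simulated = p_sim, exact = p_exact)
```

Both values come out far above .05, so the conclusion is the same: we fail to reject the null hypothesis and do not have enough evidence to say yawning is contagious.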