6.8 Elderly drivers. In January 2011, The Marist Poll published a report stating that 66% of adults nationally think licensed drivers should be required to retake their road test once they reach 65 years of age. It was also reported that interviews were conducted on 1,018 American adults, and that the margin of error was 3% using a 95% confidence level.

  1. Verify the margin of error reported by The Marist Poll.
    Answer:
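    Given: \(\hat{p} = 0.66\), \(n = 1018\), and \(z^* = 1.96\) for a 95% confidence level.
    \(MoE = z^* \sqrt{\hat{p}(1-\hat{p})/n}\)
    \(MoE = 1.96 \sqrt{0.66 \times 0.34/1018}\)
    \(MoE \approx 0.029\)
    The conservative version with \(p = 0.5\), \(MoE = 0.98/\sqrt{n} = 0.98/\sqrt{1018} \approx 0.031\), gives the same answer after rounding. Both values round to 3%, so the reported margin of error is verified.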

  2. Based on a 95% confidence interval, does the poll provide convincing evidence that more than 70% of the population think that licensed drivers should be required to retake their road test once they turn 65?
    Answer:
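    No. The 95% confidence interval is 0.66 ± 0.03, i.e. (0.63, 0.69). The whole interval lies below 0.70, so the poll does not provide convincing evidence that more than 70% of the population think so.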
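
The margin of error in part 1 and the interval in part 2 can be checked numerically. The following is a minimal sketch in R; the variable names are illustrative and not part of the original exercise.

```r
# Margin of error and 95% confidence interval for the poll proportion
p_hat  <- 0.66
n      <- 1018
z_star <- qnorm(0.975)                              # about 1.96

moe              <- z_star * sqrt(p_hat * (1 - p_hat) / n)  # uses the sample proportion
moe_conservative <- z_star * sqrt(0.5 * 0.5 / n)            # worst case, p = 0.5

ci <- c(lower = p_hat - moe, upper = p_hat + moe)

round(c(moe = moe, moe_conservative = moe_conservative), 4)
round(ci, 3)
ci[["upper"]] < 0.70   # TRUE means 70% lies above the whole interval
```

Both versions of the margin of error round to 3%, and the upper end of the interval stays below 0.70.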


6.16 Is college worth it? Part I. Among a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school.
Given:
Sample proportion, p = 0.48; hypothesized population proportion, P = 0.5; sample size, n = 331.

  1. A newspaper article states that only a minority of the Americans who decide not to go to college do so because they cannot afford it and uses the point estimate from this survey as evidence. Conduct a hypothesis test to determine if these data provide strong evidence supporting this statement.
    Answer:
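    Null Hypothesis, Ho: Half of the Americans who decide not to go to college do so because they cannot afford it, i.e. P = 0.5.
    Alternate Hypothesis, Ha: Only a minority of the Americans who decide not to go to college do so because they cannot afford it, i.e. P < 0.5.
    Conditions: the data come from a simple random sample, and under Ho we expect 331*0.5 = 165.5 successes and 165.5 failures, both at least 10, so the normal approximation is reasonable.
# Standard error under Ho, test statistic, and one-sided p-value
SE <- sqrt(0.5 * 0.5 / 331)
Z <- (0.48 - 0.5) / SE
pnorm(Z)  # lower tail, because Ha is P < 0.5
Z is about -0.73 and the p-value is about 0.23. Since the p-value is greater than 0.05, we fail to reject Ho: the data do not provide strong evidence that only a minority of these Americans decide not to go to college because they cannot afford it.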
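  2. Would you expect a confidence interval for the proportion of American adults who decide not to go to college because they cannot afford it to include 0.5? Explain.
    Answer:
    Yes. The sample proportion 0.48 is less than one standard error (about 0.027) away from 0.5, so a 95% confidence interval, roughly 0.48 ± 0.054 = (0.426, 0.534), includes 0.5.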
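
The one-proportion test and the matching confidence interval for this exercise can be sketched in R as follows. This is an illustration under the stated conditions (simple random sample, normal approximation); the variable names are assumptions and not part of the original solution.

```r
# One-sided z-test for a single proportion, plus a 95% confidence interval
p_hat <- 0.48   # sample proportion
p0    <- 0.5    # null value
n     <- 331

se_null <- sqrt(p0 * (1 - p0) / n)        # standard error under Ho
z       <- (p_hat - p0) / se_null
p_value <- pnorm(z)                       # lower tail: Ha is P < 0.5

se_ci <- sqrt(p_hat * (1 - p_hat) / n)    # standard error based on the sample
ci    <- p_hat + c(-1, 1) * qnorm(0.975) * se_ci

round(c(z = z, p_value = p_value), 3)
round(ci, 3)
ci[1] < p0 && p0 < ci[2]                  # TRUE if the interval includes 0.5
```

The test uses the null value in its standard error, while the interval uses the sample proportion; with a sample this size the two standard errors are nearly identical.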

6.24 Heart transplant success. The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was officially designated a heart transplant candidate, meaning that he was gravely ill and might benefit from a new heart. Patients were randomly assigned into treatment and control groups. Patients in the treatment group received a transplant, and those in the control group did not. The table below displays how many patients survived and died in each group.

obs control treatment
alive 4 24
dead 30 45

A hypothesis test would reject the conclusion that the survival rate is the same in each group, and so we might like to calculate a confidence interval. Explain why we cannot construct such an interval using the normal approximation. What might go wrong if we constructed the confidence interval despite this problem?
Answer:
Null Hypothesis, Ho: The heart transplant did not alter the survival rate, i.e. the proportion of patients alive is the same in the control and treatment groups.
Alternate Hypothesis, Ha: Patients who received a heart transplant were more likely to survive than those who did not.

A confidence interval for a difference of two proportions based on the normal approximation requires the success-failure condition: at least 10 successes and 10 failures in each group. The control group has only 4 survivors, so the condition fails and the sampling distribution of the difference in sample proportions may not be nearly normal. If we constructed the interval anyway, the standard error and the 1.96 multiplier could describe the true sampling variability poorly, so the interval might not capture the true difference 95% of the time.

Because the data are counts of two categorical variables, a chi-square test can still be used to check whether the two groups are the same or different:
Null Hypothesis, Ho: The control and treatment groups have the same survival proportion.
Alternate Hypothesis, Ha: The control and treatment groups have different survival proportions.

obs control treatment
alive 4/103=0.039 24/103=0.233
dead 30/103=0.291 45/103=0.437
alive = c(4,24)
dead = c(30,45)
heartData<-rbind(alive,dead)
colnames(heartData)=c("control","treatment")
chisq.test(heartData)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  heartData
## X-squared = 4.9891, df = 1, p-value = 0.02551

X^2=4.99 and p-value=0.026 (<0.05). Hence the chi-square test result is significant and we can reject Ho. I can thus conclude that there is a difference in survival proportions between the control and treatment groups.

However, the control group has only 4 survivors, fewer than the 10 required by the success-failure condition. The expected counts are all at least 5 (the smallest is 34*28/103 = 9.2), so the chi-square test is usable, but the normal approximation behind a confidence interval is questionable and an interval constructed from it may be unreliable.
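
The two conditions discussed above can be checked directly from the table. The sketch below is illustrative only; the object names are assumptions, and the thresholds (10 for success-failure, 5 for expected counts) are the usual textbook rules of thumb.

```r
# Observed counts, filled column by column: control first, then treatment
heart <- matrix(c(4, 30, 24, 45), nrow = 2,
                dimnames = list(outcome = c("alive", "dead"),
                                group   = c("control", "treatment")))

# Success-failure condition for a normal-approximation confidence interval
success_failure_ok <- all(heart >= 10)

# Expected counts under independence, used by the chi-square test
expected <- outer(rowSums(heart), colSums(heart)) / sum(heart)

success_failure_ok        # FALSE: only 4 survivors in the control group
round(expected, 2)
all(expected >= 5)        # TRUE: the chi-square condition holds
```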


6.32 Full body scan, Part I.
A news article reports that “Americans have differing views on two potentially inconvenient and invasive practices that airports could implement to uncover potential terrorist attacks.” This news piece was based on a survey conducted among a random sample of 1,137 adults nationwide, interviewed by telephone Nov 7-10, 2010, where one of the questions on the survey was “Some airports are now using ‘full-body’ digital x-ray machines to electronically screen passengers in airport security lines. Do you think these new x-ray machines should or should not be used at airports?” Below is a summary of responses based on party affiliation.

  1. Conduct an appropriate hypothesis test evaluating whether there is a difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports. Assume that all relevant conditions are met.
    Answer:
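    Null Hypothesis, Ho: There is no association between party and opinion on using full-body x-ray machines; in particular, the proportion who think the scans should be used is the same for Republicans and Democrats.
    Alternate Hypothesis, Ha: There is an association between party and opinion on using full-body x-ray machines at airports.

Given: n = 1137; both variables are categorical and the data are counts, so a chi-square test of association can be used. The test below uses all three party columns (Republican, Democrat, Independent), so it checks for any association between party and opinion rather than only the Republican versus Democrat difference.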

library(stats)

Rep<-c(264,38,16,318)
Dem<-c(299,55,15,369)
Ind<-c(351,77,22,450)
Tot<-c(914,170,53,1137)
data<-cbind(Rep,Dem,Ind,Tot)
data
##      Rep Dem Ind  Tot
## [1,] 264 299 351  914
## [2,]  38  55  77  170
## [3,]  16  15  22   53
## [4,] 318 369 450 1137
chisq.test(data[1:3,1:3])
## 
##  Pearson's Chi-squared test
## 
## data:  data[1:3, 1:3]
## X-squared = 4.3576, df = 4, p-value = 0.3598
#rownames(data)=c("Should","Should Not","No Answer","Total")

The chi-square statistic is 4.3576 with df = 4.
The p-value is 0.3598.
Since p > 0.05, the test result is not significant and we do not reject the null hypothesis (Ho). The data do not provide evidence of an association between party and opinion on full-body scans.

  2. The conclusion of the test in part (a) may be incorrect, meaning a testing error was made. If an error was made, was it a Type 1 or a Type 2 Error? Explain.
    Answer:
    If a testing error was made, it must have been a Type 2 error: we failed to reject the null hypothesis (Ho), and a Type 2 error is the failure to reject Ho when the alternative hypothesis (Ha) is in fact true.
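
Part (a) asks specifically about Republicans and Democrats, so a two-proportion z-test restricted to those two columns is a natural companion to the chi-square test above. The following sketch uses the "should" counts (264 of 318 Republicans, 299 of 369 Democrats) from the table; the variable names are illustrative assumptions.

```r
# Two-proportion z-test: Republicans vs Democrats who answered "should"
should <- c(Rep = 264, Dem = 299)
total  <- c(Rep = 318, Dem = 369)

p_hat  <- should / total                  # sample proportions per party
p_pool <- sum(should) / sum(total)        # pooled proportion under Ho

se      <- sqrt(p_pool * (1 - p_pool) * sum(1 / total))
z       <- (p_hat[["Rep"]] - p_hat[["Dem"]]) / se
p_value <- 2 * pnorm(-abs(z))             # two-sided: Ha is a difference

round(p_hat, 3)
round(c(z = z, p_value = p_value), 3)
```

The p-value is far above 0.05, so this test also fails to reject Ho, which agrees with the chi-square conclusion.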

6.40 True or false, Part II. Determine if the statements below are true or false. For each false statement, suggest an alternative wording to make it a true statement.

  1. As the degrees of freedom increases, the mean of the chi-square distribution increases.
    Answer:
    TRUE. The mean of a chi-square distribution is equal to its degrees of freedom. Chi-square distributions are positively skewed, with the degree of skew decreasing with increasing df. As the df increases, the chi-square distribution approaches a normal distribution.

  2. If you found chi2 = 10 with df = 5 you would fail to reject H0 at the 5% significance level.
    Answer:

pchisq(10,5,lower.tail=FALSE)
## [1] 0.07523525

TRUE. Since the p-value (0.075) > 0.05, there is not enough evidence to reject Ho. The relationship in question may be due to chance.

  3. When finding the p-value of a chi-square test, we always shade the tail areas in both tails.
    Answer:
    FALSE. The chi-square statistic is a sum of squared terms, so it is never negative and only large values count as evidence against Ho. Alternative wording: when finding the p-value of a chi-square test, we always shade the area in the upper tail.

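  4. As the degrees of freedom increases, the variability of the chi-square distribution decreases.
    Answer:
    FALSE. The variance of a chi-square distribution is twice its degrees of freedom, so the variability increases as df increases, even though the distribution becomes less skewed and closer to normal in shape. Alternative wording: as the degrees of freedom increases, the variability of the chi-square distribution increases.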
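
The statements about the mean and variability of the chi-square distribution can be illustrated by simulation. This is a rough sketch; the seed, the sample size, and the chosen df values are arbitrary assumptions.

```r
# Simulated mean and variance of chi-square draws for several df values
set.seed(1)
dfs <- c(2, 5, 10, 20)

sims <- sapply(dfs, function(k) {
  x <- rchisq(1e5, df = k)
  c(mean = mean(x), var = var(x))
})
colnames(sims) <- paste0("df=", dfs)

theory <- rbind(mean = dfs, var = 2 * dfs)   # theoretical mean = df, variance = 2 * df
colnames(theory) <- colnames(sims)

round(sims, 2)
theory
```

Both the simulated mean and the simulated variance grow with df, and they track the theoretical values df and 2*df.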


6.48 Coffee and Depression. Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption.

  1. What type of test is appropriate for evaluating if there is an association between coffee intake and depression?
    Answer: Both variables are categorical and the data are counts in a two-way table, so a chi-square test of independence is appropriate for evaluating whether coffee intake and depression are associated.

  2. Write the hypotheses for the test you identified in part (a).
    Answer:
    Null Hypothesis, Ho: There is no association between caffeinated coffee consumption and risk of depression in women.
    Alternate Hypothesis, Ha: There is an association between caffeinated coffee consumption and risk of depression in women.

  3. Calculate the overall proportion of women who do and do not suffer from depression.
    Answer:

Yes<-c(670,373,905,564,95,2607)
No<-c(11545,6244,16329,11726,2288,48132)
coffeDep<-rbind(Yes,No)
colnames(coffeDep)<-c("lt1CupPerWeek","bt26CupPerWeek","1CupPerDay","23CupPerDay","4orMoreCupPerDay","Total")

coffeDep
##     lt1CupPerWeek bt26CupPerWeek 1CupPerDay 23CupPerDay 4orMoreCupPerDay
## Yes           670            373        905         564               95
## No          11545           6244      16329       11726             2288
##     Total
## Yes  2607
## No  48132
# Overall proportions: use only the Total column (women in each row / all women)
overallProp <- coffeDep[, "Total"] / sum(coffeDep[, "Total"])
round(overallProp, 4)
##    Yes     No 
## 0.0514 0.9486
Overall, 2607/50739 = 5.14% of the women suffer from depression and 48132/50739 = 94.86% do not.
  4. Identify the expected count for the highlighted cell, and calculate the contribution of this cell to the test statistic, i.e. \((Observed - Expected)^2/Expected\).
    Answer:
    The highlighted cell is depressed women who drink 2-6 cups per week (observed count 373); the column total is 373 + 6244 = 6617.
    \(Expected = (2607 \times 6617)/50739\)
    \(Expected \approx 339.99\)

\((O-E)^2 / E = (373-339.99)^2 / 339.99 \approx 3.21\)

  5. The test statistic is \(X^2 = 20.93\). What is the p-value?
    Answer:
X2=20.93
r=2
c=5
df=(r-1)*(c-1)
paste("P-value at X2=20.93 is:",round(1-pchisq(X2,df),5))
## [1] "P-value at X2=20.93 is: 0.00033"
  6. What is the conclusion of the hypothesis test?
    Answer:
    The p-value (about 0.0003) is well below 0.05, so we reject the null hypothesis. The data provide strong evidence of an association between caffeinated coffee consumption and the occurrence of depression in women.

  7. One of the authors of this study was quoted on the NYTimes as saying it was “too early to recommend that women load up on extra coffee” based on just this study. Do you agree with this statement? Explain your reasoning.
    Answer:
    I agree. The hypothesis test gives reason to believe there is an association, but this is an observational study, so the association cannot be taken as evidence that drinking more coffee reduces depression. Possible confounding variables, bias in the self-reported questionnaire data, and the validity of the data would need to be examined before making a recommendation.
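
The expected count, the cell contribution, and the overall test statistic from the parts above can be reproduced with `chisq.test()` on the table without its Total column. This is a minimal sketch; the shortened column labels are assumptions made for readability.

```r
# Depression by coffee consumption, without the Total column
depression <- rbind(Yes = c(670, 373, 905, 564, 95),
                    No  = c(11545, 6244, 16329, 11726, 2288))
colnames(depression) <- c("lt1PerWeek", "2to6PerWeek", "1PerDay",
                          "2to3PerDay", "4plusPerDay")

test <- chisq.test(depression)   # no continuity correction for tables larger than 2x2

# Expected count and contribution for the highlighted cell
expected_cell <- test$expected["Yes", "2to6PerWeek"]
contribution  <- (depression["Yes", "2to6PerWeek"] - expected_cell)^2 / expected_cell

round(c(expected = expected_cell, contribution = contribution), 2)
test$statistic    # overall X-squared
test$parameter    # degrees of freedom, (2 - 1) * (5 - 1)
test$p.value
```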


6.56 Is yawning contagious? An experiment conducted by the MythBusters, a science entertainment TV program on the Discovery Channel, tested if a person can be subconsciously influenced into yawning if another person near them yawns. 50 people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment) and 16 to a group where there wasn’t a person yawning near them (control). The following table shows the results of this experiment.

obs Treatment Control
Yawn 10 4
Not Yawn 24 12

A simulation was conducted to understand the distribution of the test statistic under the assumption of independence: having someone yawn near another person has no influence on if the other person will yawn. In order to conduct the simulation, a researcher wrote yawn on 14 index cards and not yawn on 36 index cards to indicate whether or not a person yawned. Then he shuffled the cards and dealt them into two groups of size 34 and 16 for treatment and control, respectively.
He counted how many participants in each simulated group yawned in an apparent response to a nearby yawning person, and calculated the difference between the simulated proportions of yawning as ˆptrtmt,sim − ˆpctrl,sim. This simulation was repeated 10,000 times using software to obtain 10,000 differences that are due to chance alone. The histogram shows the distribution of the simulated differences.

  1. What are the hypotheses for testing if yawning is contagious, i.e. whether it is more likely for someone to yawn if they see someone else yawning?
    Answer:
    Null Hypothesis, Ho: Yawning is not contagious, i.e. seeing another person yawn does not change the chance of yawning; the two variables are independent and p_trtmnt = p_control.
    Alternate Hypothesis, Ha: Yawning is contagious, i.e. a person is more likely to yawn after seeing another person yawn; the two variables are dependent and p_trtmnt > p_control.

  2. Calculate the observed difference between the yawning rates under the two scenarios.
    Answer:

p_trtmnt=10/34=0.29 p_control=4/16=0.25

Difference of proportion = p_trtmnt-p_control = 0.04

Yawn<-c(10,4)
NotYawn<-c(24,12)
YawnExpData<-rbind(Yawn,NotYawn)
colnames(YawnExpData)<-c("Treatment","Control")

YawnExpData1<-prop.table(YawnExpData, margin=2)  # column-wise proportions (each group sums to 1)
chisq.test(YawnExpData)
## Warning in chisq.test(YawnExpData): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  YawnExpData
## X-squared = 0, df = 1, p-value = 1
pchisq(0,1,lower.tail=FALSE)
## [1] 1
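  3. Estimate the p-value using the figure above and determine the conclusion of the hypothesis test.
    Answer:
# Exact version of the shuffle: yawners dealt to treatment are hypergeometric (14 yawn, 36 not yawn, 34 dealt)
round(phyper(9, 14, 36, 34, lower.tail=FALSE), 2)  # P(10 or more yawners in treatment)
## [1] 0.51

The p-value is the proportion of simulated differences at least as large as the observed difference of 0.04, which is about 0.51. The p-value is high (>0.05), thus we fail to reject Ho. We do not have enough statistical evidence to say that yawning is contagious.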
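
The index-card simulation described in the exercise can be sketched in R as below. The seed and the variable names are assumptions; the group sizes (34 and 16), the card counts (14 and 36), and the 10,000 repetitions come from the exercise text.

```r
# Randomization test: shuffle 14 "yawn" and 36 "not yawn" cards into two groups
set.seed(2024)
cards  <- rep(c("yawn", "not yawn"), times = c(14, 36))
n_sims <- 10000

sim_diff <- replicate(n_sims, {
  shuffled <- sample(cards)          # shuffle the deck
  trt  <- shuffled[1:34]             # first 34 cards: treatment
  ctrl <- shuffled[35:50]            # last 16 cards: control
  mean(trt == "yawn") - mean(ctrl == "yawn")
})

obs_diff <- 10 / 34 - 4 / 16         # observed difference, about 0.04

# One-sided p-value: share of shuffles at least as extreme as observed
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diff >= obs_diff - 1e-12)

round(obs_diff, 4)
p_value
```

Roughly half of the shuffled differences are at least as large as the observed one, which is why the test fails to reject Ho.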