HW_Chapter1

1.8 Smoking habits of UK residents

Each row of the data matrix represents a single observation (or case): the smoking habits of an individual UK resident.
There are 1691 participants in the survey.
sex, marital, grossIncome and smoke are all categorical variables; grossIncome is an ordinal variable.

age, amtWeekends (after recoding), amtWeekdays (after recoding) are all numeric variables; amtWeekends and amtWeekdays need to be cleaned to extract the numeric data.

age is reported as discrete, and amtWeekends and amtWeekdays are also discrete (I assume one cannot report that one half of a cigarette is smoked in this study).
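
The recoding mentioned above might look like the following minimal sketch. It assumes the raw amounts arrive as character strings with the literal text "NA" for non-smokers; the `raw` data frame, its values, and the `to_count` helper are hypothetical and are not taken from the survey file.

```r
# Hypothetical raw responses; the real survey file may encode these differently.
raw <- data.frame(
  amtWeekends = c("20", "NA", "5", "12"),
  amtWeekdays = c("15", "NA", "5", "10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat the literal string "NA" as missing,
# then convert the remaining strings to integer counts.
to_count <- function(x) {
  x[x %in% "NA"] <- NA
  as.integer(x)
}

clean <- data.frame(lapply(raw, to_count))
str(clean)
```

Storing the counts as integers keeps the discrete nature of the variables explicit.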

1.10 Cheaters, scope of inference.

The population of interest: children between the ages of 5 and 15. The sample: 160 children between the ages of 5 and 15.
The sample was not chosen randomly; therefore, the results cannot be generalized to the population. Causal relationships cannot be established, but there is evidence of associations between variables.

1.28 Reading the paper.

The sample was not composed randomly (only health plan members who volunteered were part of the study); therefore, the results cannot be generalized to the population. Causal relationships cannot be established, but there appears to be evidence of an association between variables.
The statement is not justified; a causal connection cannot be established. A more accurate statement: “The study exposed an association between sleep disorders in children and behavioral issues such as bullying.”

1.36 Exercise and mental health.

Prospective experimental study (with stratified random sampling)
treatment - exercise twice a week
control - no exercise
The study uses blocking: participants were grouped by age, and stratified random sampling drew the sample from within those age groups.
Everyone involved in the study knew the group (exercise vs. no exercise) to which they were assigned; the study does not make use of blinding.
The study is designed so that causal relationships can be tested and established: random assignment to the exercise and no-exercise groups supports a causal conclusion, and random sampling permits the result to be generalized to the population at large.
Studies should not cause harm, and instructing people not to exercise does not contribute to health. Additionally, the sample should be blocked or stratified according to physical fitness, exercise history, sex, and initial mental health assessment. Given sufficient sample size, researchers might consider living arrangements (alone and single, married, divorced, widowed) as an additional barrier that resists/affects improvement in mental health. Duration and type of exercise should be strictly monitored; a waitlist/immediate design might also be effective.

1.48 Stats scores.

It seems as if Q1 and Q3 in the table given in the text are not accurate.

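# Final exam scores for the 20 students
score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
# ggplot() expects a data frame
score <- as.data.frame(score)
library(ggplot2)
library(ggthemes)
# Five-number summary plus the mean
summary(score)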

##      score      
##  Min.   :57.00  
##  1st Qu.:72.75  
##  Median :78.50  
##  Mean   :77.70  
##  3rd Qu.:82.25  
##  Max.   :94.00

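# Horizontal boxplot with the quantiles printed as text labels
ggplot(score, aes(y = score, fill = "")) +
  geom_boxplot() +
  # `fun` and after_stat() replace the deprecated `fun.y` and `..y..` (ggplot2 >= 3.3.0)
  stat_summary(geom = "text", fun = quantile,
               aes(x = .45, label = sprintf("%1.1f", after_stat(y)))) +
  theme_bw() +
  ggtitle("Final Exam Scores", subtitle = "20 Introductory Statistics Students") +
  xlab("") +
  ylab("score") +
  theme(legend.position = "none") +
  theme(axis.text.y = element_blank()) +
  coord_flip()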
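
One possible reason a printed table can disagree with `summary()` is that quartiles have several common definitions. The sketch below compares three of them on the same scores; that the text used one of the alternatives is an assumption, not something confirmed here.

```r
score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
           79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# R's default (type 7) interpolates linearly between order statistics;
# this is the definition summary() uses.
q_default <- quantile(score, probs = c(0.25, 0.75))

# Type 2 averages the two neighbouring values at a discontinuity.
q_type2 <- quantile(score, probs = c(0.25, 0.75), type = 2)

# Tukey's hinges: medians of the lower and upper halves of the data.
hinges <- fivenum(score)[c(2, 4)]

print(q_default)
print(q_type2)
print(hinges)
```

With 20 observations the default gives 72.75 and 82.25, matching the `summary()` output above, while type 2 and the hinges both give 72.5 and 82.5.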

1.50 Mix-and-match.

symmetrical, unimodal (and normal) boxplot 2
symmetrical, bimodal/multimodal boxplot 3
right skewed, unimodal boxplot 1

1.56 Distributions and appropriate statistics, Part II.

Right-skewed, median, IQR
Symmetrical, mean, standard deviation
Right-skewed, median, IQR
Right-skewed, median, IQR
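
The pairing of statistics with shape can be illustrated with a small sketch. The sample `x` below is hypothetical and chosen only to show how one extreme value affects each statistic: the mean and standard deviation respond strongly, while the median and IQR barely move.

```r
# Hypothetical right-skewed sample: most values are small, one is very large.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 50)

# The mean is pulled toward the extreme value; the median is not.
center_all <- c(mean = mean(x), median = median(x))

# The standard deviation is inflated by the extreme value; the IQR is not.
spread_all <- c(sd = sd(x), IQR = IQR(x))

# Drop the extreme value and recompute for comparison.
x_trim <- x[x < 50]
center_trim <- c(mean = mean(x_trim), median = median(x_trim))
spread_trim <- c(sd = sd(x_trim), IQR = IQR(x_trim))

print(rbind(center_all, center_trim))
print(rbind(spread_all, spread_trim))
```

This is why the median and IQR are usually preferred for skewed distributions, and the mean and standard deviation for symmetric ones.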

1.70 Heart transplants.

From the plot, receiving treatment is associated with greater odds of survival. Although some patients who did not receive the treatment survived, the proportion surviving in the treatment group was more than double that in the control group.
The box plot suggests that treatment is associated with a substantial increase in survival time, as measured in days.

rm(list=ls())
library(openintro)
library(plyr)
data(heartTr)

hrtcon<-prop.table(table(heartTr$survived,heartTr$transplant),2)
hrtcon

##        
##           control treatment
##   alive 0.1176471 0.3478261
##   dead  0.8823529 0.6521739

65.2% of those in the treatment group died while 88.2% of those in the control group died.
i. The claim being tested is that an experimental heart transplant program increases lifespan.

ii. We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at zero. Lastly, we calculate the fraction of simulations where the simulated difference in proportions is at most 0.652 - 0.882 = -0.23, that is, at least as extreme as the observed difference. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.

iii. From the simulation results we can reject the null hypothesis and conclude that the transplant is effective.
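
The card-shuffling procedure in part ii can be sketched in R as follows. The counts of dead patients per group (45 of 69 and 30 of 34) are derived from the proportions in the table above; the seed and the helper name `one_shuffle` are arbitrary choices for illustration. With the difference taken as treatment minus control, the observed value is about -0.23, so "at least as extreme" means -0.23 or lower.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

# One card per patient: 28 alive, 75 dead.
outcomes <- c(rep("alive", 28), rep("dead", 75))
n_treat <- 69
n_control <- 34

# Observed difference in proportion dead (treatment - control).
# 45 and 30 are the dead counts implied by the proportions table.
obs_diff <- 45 / n_treat - 30 / n_control

# Hypothetical helper: shuffle the cards, deal them into the two groups,
# and return the difference in proportion dead (treatment - control).
one_shuffle <- function() {
  shuffled <- sample(outcomes)
  treat <- shuffled[seq_len(n_treat)]
  control <- shuffled[-seq_len(n_treat)]
  mean(treat == "dead") - mean(control == "dead")
}

# Repeat 100 times, as in the description.
sim_diffs <- replicate(100, one_shuffle())

# Fraction of shuffles at least as extreme as the observed difference.
p_hat <- mean(sim_diffs <= obs_diff)
print(obs_diff)
print(p_hat)
```

A small `p_hat` indicates that a difference this large rarely arises from shuffling alone; using more than 100 repetitions would give a more stable estimate.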

Stephen Jones

February 9, 2019