Graded 1.8
Q(a) What does each row of the data matrix represent?
A(a) Each row represents a case: the set of observations recorded for one surveyed UK resident.
Q(b) How many participants were included in the survey?
A(b) The data matrix has 1,691 rows, one per respondent, so 1,691 people participated in the survey.
Q(c) Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
A(c) sex is categorical, not ordinal (it appears to have two levels)
A(c) age is numerical and discrete (age in whole years)
A(c) marital is categorical, not ordinal
A(c) grossIncome is categorical and ordinal (income is reported in ordered brackets)
A(c) smoke is categorical, not ordinal
A(c) amtWeekends is numerical and discrete
A(c) amtWeekdays is numerical and discrete
Graded 1.10
Q(a) Identify the population of interest and the sample in this study.
A(a) The population of interest is not made explicit, but it appears to be children between 5 and 15 years old. The sample is the 160 children in that age range who took part in the study.
Q(b) Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships.
A(b) The sampling method is not described, so, to err on the conservative side, I’d say the results should not be generalized to the population. Nor is it clear how children were assigned to the two groups (instructed / not instructed), so it is risky to claim more than an association.
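Random assignment to the two groups is what would support a causal reading. A minimal R sketch of how 160 children could be split at random, assuming (this is not stated in the study description) two equal groups of 80; the group labels and the seed are illustrative choices:

```r
set.seed(1)  # arbitrary seed, only so the split is reproducible

n_children <- 160
# Assumption: equal-sized groups. sample() with no size argument returns a random permutation,
# so each child (identified by position in `group`) is equally likely to land in either group.
group <- sample(rep(c("instructed", "not_instructed"), each = n_children / 2))

# Check the group sizes
table(group)
```

Because the labels are shuffled rather than chosen, any pre-existing trait (age, sex, being an only child) is balanced between the groups on average.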
Graded 1.28 Reading the paper.
Q(a) Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning.
A(a) No. We cannot conclude that smoking causes dementia: this is a retrospective observational study, and there may be confounding variables beyond those the study claims to have accounted for.
Q(b) A friend of yours who read the article says, “The study shows that sleep disorders lead to bullying in school children.” Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study?
A(b) No, it’s not justified. The relationship could conceivably run the other way: bullying could unsettle young minds and lead to sleep problems. A better interpretation is that sleep disorders are associated with bullying and disruptive behavior.
Graded 1.36 Exercise and mental health.
Q(a) What type of study is this?
A(a) A randomized experiment with stratified sampling.
Q(b) What are the treatment and control groups in this study?
A(b) The treatment group is the half of the subjects instructed to exercise; the control group is the other half, instructed not to exercise.
Q(c) Does this study make use of blocking? If so, what is the blocking variable?
A(c) Yes. Subjects are randomly assigned to exercise or not within each age stratum, so age group acts as the blocking variable. The design does not, however, block on whether subjects already exercise, or on the effect that habit has on baseline mental health.
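Blocking on a variable means randomizing separately within each of its levels. A minimal R sketch of that idea; the three strata, their sizes, and the seed are made up for illustration and are not taken from the study:

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical subjects: 3 made-up strata with 4 subjects each
subjects <- data.frame(
  id      = 1:12,
  stratum = rep(c("stratum_1", "stratum_2", "stratum_3"), each = 4)
)

# Blocking: shuffle the arm labels separately inside each stratum, half to each arm
subjects$group <- NA_character_
for (s in unique(subjects$stratum)) {
  rows <- which(subjects$stratum == s)
  arms <- rep(c("exercise", "control"), length.out = length(rows))
  subjects$group[rows] <- sample(arms)
}

# Every stratum ends up with 2 exercise and 2 control subjects
table(subjects$stratum, subjects$group)
```

A simple (unblocked) randomization over all 12 subjects could, by chance, put most of one stratum in a single arm; randomizing within strata rules that out.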
Q(d) Does this study make use of blinding?
A(d) No. Participants in both groups take a mental health exam at the outset and are told by the researcher(s) whether to exercise, so they know which group they are in; the study is not blinded. Nothing suggests that those administering the study are unaware of each participant’s group, so it is not double-blind either.
Q(e) Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large.
A(e) The study as described does not seem to account for confounding variables such as disposition to exercise and baseline mental health, among others, so it does not seem well suited to identifying a causal relationship. The strata do not include anyone younger than 18 or older than 55, so the results should not be extrapolated to a population that includes minors and seniors.
Q(f) Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal?
A(f) Yes, I would have reservations. As noted in A(c), the design does not block on pre-study exercise habits. It also has no way to verify compliance (whether subjects actually did or did not exercise), and it ignores the effect that asking subjects in each stratum to change their behavior (habitual exercisers to stop, non-exercisers to start) may itself have on mental health.
Graded 1.48 Stats scores.
Q Create a box plot of the distribution of these scores. The five number summary provided below may be useful.
A The box plot is produced by the R code below.
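# Exam scores listed in the question
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
# notch = TRUE adds notches at median +/- 1.58 * IQR / sqrt(n), a rough interval for the median
boxplot(scores, notch = TRUE)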

Double-checking the five-number summary, which matches the one given in the question:
fivenum(scores)
## [1] 57.0 72.5 78.5 82.5 94.0
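As a further check on the plot, the 1.5 × IQR rule that boxplot() uses to flag outliers can be computed directly from the fivenum() hinges. A short R sketch; it relies only on base R and the documented default coef = 1.5 of boxplot.stats():

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Hinges (lower and upper quartile) as returned by fivenum(), which boxplot() also uses
hinges <- fivenum(scores)[c(2, 4)]
iqr    <- diff(hinges)                    # 82.5 - 72.5 = 10

# 1.5 * IQR fences; observations beyond them are drawn as individual points
lower_fence <- hinges[1] - 1.5 * iqr      # 57.5
upper_fence <- hinges[2] + 1.5 * iqr      # 97.5
outliers <- scores[scores < lower_fence | scores > upper_fence]
outliers

# boxplot.stats() applies the same rule and reports where the whiskers end
boxplot.stats(scores)$out
boxplot.stats(scores)$stats
```

So the lowest score, 57, falls just below the lower fence of 57.5 and appears as a separate point, with the lower whisker ending at 66.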
Graded 1.50 Mix-and-match.
Describe the distribution in the histograms below and match them to the box plots.
(a) → (2) Unimodal, near-normal distribution, slightly left-skewed. Based on its median of about 60, it matches box plot (2).
(b) → (3) Multimodal, flat distribution. Based on its median of about 50, it matches box plot (3).
(c) → (1) Unimodal, right-skewed distribution with a long right tail starting before 3; it matches box plot (1).
Graded 1.56 Distributions and appropriate statistics, Part II.
Q(a) Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000.
A(b) This sounds like a roughly symmetric distribution, so the mean and standard deviation would summarize its center and variability well.
Q(c) Number of alcoholic drinks consumed by college students in a given week. Assume that most of these students don’t drink since they are under 21 years old, and only a few drink excessively.
A(d) The few executive outliers will make the distribution right-skewed, so the median and IQR, which are robust to outliers, are preferable.
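Why robustness matters here can be seen numerically. A minimal R sketch using hypothetical values (ten typical observations, then the same ten plus two extreme ones in the right tail); the numbers are invented purely for illustration:

```r
# Hypothetical values: ten typical observations, then the same ten plus two extreme ones
typical       <- c(40, 42, 45, 47, 50, 50, 52, 55, 58, 60)
with_outliers <- c(typical, 400, 600)

# Center: the mean is pulled toward the long right tail, the median barely moves
c(mean = mean(typical),       median = median(typical))
c(mean = mean(with_outliers), median = median(with_outliers))

# Spread: the standard deviation inflates far more than the IQR
c(sd = sd(typical),       iqr = IQR(typical))
c(sd = sd(with_outliers), iqr = IQR(with_outliers))
```

Two extreme values more than double the mean and multiply the standard deviation many times over, while the median moves by one unit and the IQR grows only modestly.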
Graded 1.70 Heart transplants.
Q(a) Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning.
A(a) Survival does not appear to be independent of treatment. The observed survival rates were 4 of 34 patients (about 12%) in the control group and 24 of 69 (about 35%) in the treatment group. H0, the independence model, attributes this difference of about 23 percentage points to chance; HA says it is not due to chance. The mosaic plot shows a clearly larger surviving share in the treatment group, so from the plot alone it would be surprising if repeating the experiment supported H0.
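The rates and their difference follow directly from the counts in the answer above (4 of 34 control and 24 of 69 treatment patients survived). A short R sketch of the arithmetic; the variable names are illustrative:

```r
# Survivors and group sizes from the counts above
alive <- c(control = 4, treatment = 24)
n     <- c(control = 34, treatment = 69)

survival <- alive / n       # proportion alive in each group
fatality <- 1 - survival    # proportion dead in each group

# Difference in survival proportions (treatment - control)
diff_surv <- survival["treatment"] - survival["control"]

round(survival, 3)
round(fatality, 3)
round(diff_surv, 3)
```

This gives survival of about 0.118 (control) and 0.348 (treatment), fatality of about 0.882 and 0.652, and a difference of about 0.23.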
Q(b) What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment?
A(c) About 65% of the treatment group died (45 of 69), compared with about 88% of the control group (30 of 34).
Q(d) One approach for investigating whether or not the treatment is effective is to use a randomization technique.
Q(di) What are the claims being tested?
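A(di) H0 (independence model): survival is independent of treatment, and the difference of about 23 percentage points between the control and treatment groups is due to chance. HA (alternative model): the difference is not due to chance; the treatment affects survival.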
Q(dii) The paragraph below describes the setup for such an approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate.
A(dii) We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are at or below -0.23, the observed difference. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
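The card-shuffling procedure can be sketched in R as follows. This is a minimal version under the setup described above (28 alive and 75 dead cards, groups of 69 and 34, 100 repetitions); the seed and variable names are arbitrary, and because the shuffles are random the exact fraction will vary between runs:

```r
set.seed(2024)  # arbitrary seed for reproducibility

# One card per patient: 28 alive, 75 dead
cards   <- c(rep("alive", 28), rep("dead", 75))
n_treat <- 69
n_sims  <- 100

# Observed difference in proportion dead (treatment - control):
# 45 of 69 treatment and 30 of 34 control patients died
obs_diff <- 45 / 69 - 30 / 34   # about -0.23

sim_diffs <- replicate(n_sims, {
  shuffled <- sample(cards)                  # shuffle the deck
  treat    <- shuffled[1:n_treat]            # first 69 cards -> treatment
  control  <- shuffled[-(1:n_treat)]         # remaining 34 cards -> control
  mean(treat == "dead") - mean(control == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference (one-sided)
p_value <- mean(sim_diffs <= obs_diff)
p_value
```

With only 100 shuffles the smallest nonzero fraction is 0.01; running more repetitions gives a finer estimate of how rare the observed difference is under H0.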
Q(diii) What do the simulation results shown below suggest about the effectiveness of the transplant program?
A(diii) It does not appear that any of the 100 simulations produced a difference as extreme as the observed one (about 23 percentage points). That makes the observed result unlikely under H0 and supports HA: the difference is unlikely to be due to chance. Accordingly, the transplant program appears to be effective.