Set up workspace
library(dplyr)
library(psych)
knitr::opts_chunk$set(echo = TRUE)
rm(list = ls())  # clear objects left over from earlier sessions
1.8
- What does each row of the data matrix represent?
- Each row is one participant in the study.
- How many participants were included in the survey?
- Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
- Sex: categorical, nominal
- Age: numerical, discrete
- Marital: categorical, nominal
- Gross income: categorical, ordinal
- Smoker: categorical, nominal
- amtWeekends: numerical, discrete (a count of cigarettes)
- amtWeekdays: numerical, discrete (a count of cigarettes)
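The rows below sketch how these variable types could be stored in R. They are hypothetical: the values and the income bands are invented for illustration and do not come from the survey. Nominal variables are stored with `factor()`, the ordinal variable is an ordered factor, and age is an integer.

```r
# Hypothetical rows (values and income bands are made up, not from the survey)
survey <- data.frame(
  sex = factor(c("Male", "Female", "Female")),            # categorical, nominal
  age = c(38L, 42L, 57L),                                  # numerical, discrete
  marital = factor(c("Single", "Married", "Widowed")),    # categorical, nominal
  grossIncome = factor(c("low", "high", "medium"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE),                    # categorical, ordinal
  smoke = factor(c("No", "Yes", "No"))                     # categorical, nominal
)

str(survey)

# Ordered factors support comparisons that follow the level order
survey$grossIncome[1] < survey$grossIncome[2]   # "low" < "high"
```

For a nominal factor such as `sex`, R returns `NA` with a warning if you try `<`, because its levels have no order.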
1.10
- Identify the population of interest and the sample in this study.
- The population is all children aged 5 to 15. The sample is the 160 children included in this experiment.
- Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships.
- The children in this study come from one specific geographical (and likely socioeconomic) region, so the results can be generalized only to a similar population. No double-blinding is mentioned, but the experimental design is meant to establish a causal link between the intervention and the outcome.
1.28
Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning.
- Because this is an observational study, we cannot conclude a causal relationship.
A friend of yours who read the article says, “The study shows that sleep disorders lead to bullying in school children.” Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study?
- Again, because the study is observational, a causal relationship can’t be established. The most you can say is that sleep disorders are associated with bullying. The study cannot show which variable, if either, “caused” the other, or whether a confounding variable affected both of the observed variables.
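To illustrate how a confounder alone can produce an association, here is a small simulation. It assumes a hypothetical confounder `z` that drives both `x` (standing in for a sleep-disorder score) and `y` (standing in for a bullying score), while `x` and `y` have no effect on each other. The data are simulated, not taken from the study.

```r
set.seed(123)  # arbitrary seed for reproducibility
n <- 1000
z <- rnorm(n)        # hypothetical confounder
x <- z + rnorm(n)    # "sleep disorder" score, driven only by z
y <- z + rnorm(n)    # "bullying" score, driven only by z

# x and y are clearly correlated (about 0.5 in expectation) ...
cor(x, y)

# ... but after removing z's influence from each, the correlation is near 0
partial_r <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
partial_r
```

An observational study that measured only `x` and `y` would see the association, yet neither variable causes the other.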
1.36
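- What type of study is this?
- An experiment.
- What are the treatment and control groups in this study?
- Treatment: participants assigned to exercise. Control: participants assigned not to exercise.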
- Does this study make use of blocking? If so, what is the blocking variable?
- Yes. The blocking variable is age group: 18-30, 31-40, and 41-55 year olds.
- Does this study make use of blinding?
- No. The participants know whether or not they are exercising, and the study does not state that the person conducting the mental health exam is blinded to the treatment/control groupings.
- Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large.
- Apart from the blinding issues, my understanding is that the experiment is designed so that a causal relationship can be established. The caveat is that we would need to know how representative this sample is of any other population the conclusions might be applied to.
- Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal?
- Since it deals with mental health, there are serious ethical concerns. Have the participants given informed consent? Could assigning the control group to no exercise unintentionally harm some participants’ mental health?
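The blocked random assignment described in this exercise, with age group as the blocking variable, could be sketched as follows. The 12 participants are hypothetical, and assigning exactly half of each block to exercise is an assumption made for illustration.

```r
set.seed(42)  # arbitrary seed so the assignment is reproducible

# Hypothetical participants: 4 in each of the three age blocks
participants <- data.frame(
  id = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Randomize separately inside each block: half exercise, half no exercise
participants$group <- NA_character_
for (blk in unique(participants$age_group)) {
  idx <- which(participants$age_group == blk)
  labels <- rep(c("exercise", "no exercise"), length.out = length(idx))
  participants$group[idx] <- sample(labels)  # random permutation of the labels
}

# Each age block should contain 2 exercisers and 2 non-exercisers
table(participants$age_group, participants$group)
```

Randomizing within blocks guarantees that each age group is split evenly between treatment and control, so age cannot be confounded with exercise.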
1.48
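# The twenty scores from exercise 1.48
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(scores, ylab = "Score")  # points beyond 1.5 * IQR from the box are drawn individually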
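The numbers behind the box plot can be checked with base R's `fivenum()` and `boxplot.stats()`, which `boxplot()` uses with the default 1.5 × IQR rule. This sketch recomputes the fences by hand to show why the lowest score is drawn as a separate point.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Tukey five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(scores)
five  # 57, 72.5, 78.5, 82.5, 94

# Fences at 1.5 * (upper hinge - lower hinge) beyond the box
iqr_hinges  <- five[4] - five[2]           # 10
lower_fence <- five[2] - 1.5 * iqr_hinges  # 57.5
upper_fence <- five[4] + 1.5 * iqr_hinges  # 97.5

# boxplot.stats() returns the whisker ends, hinges, median, and outliers
bs <- boxplot.stats(scores)
bs$stats  # 66, 72.5, 78.5, 82.5, 94 (lower whisker stops at 66)
bs$out    # 57, just below the lower fence of 57.5
```

Note that `summary(scores)` uses `quantile()`'s default type 7 and reports quartiles of 72.75 and 82.25, slightly different from the hinges used by the box plot.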

1.56
- Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000.
- Right skewed, given the large jump between the median and the third quartile and the outliers above $6,000,000. Use the median for the typical observation and the IQR for variability.
- Housing prices in a country where 25% of the houses cost below $300,000, 50% of the houses cost below $600,000, 75% of the houses cost below $900,000 and very few houses that cost more than $1,200,000.
- Symmetric. Use the mean for the typical observation and the standard deviation for variability.
- Number of alcoholic drinks consumed by college students in a given week. Assume that most of these students don’t drink since they are under 21 years old, and only a few drink excessively.
- Right skewed, since the data are bounded by 0 on the left and only a few students drink excessively. Use the median for the typical observation and the IQR for variability.
- Annual salaries of the employees at a Fortune 500 company where only a few high level executives earn much higher salaries than all the other employees.
- Right skewed. Most salaries may be roughly symmetric (the executive salaries might even form a small second cluster, making the distribution look bimodal), but those few much higher salaries create a long right tail. Again, use the median for the typical observation and the IQR for variability.
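The reasoning above, preferring the median and IQR for skewed data, can be seen with a small made-up example. The house prices below (in thousands of dollars) are hypothetical. The second vector replaces the most expensive house with a $6.5M outlier.

```r
# Hypothetical house prices in thousands of dollars (made up for illustration)
prices <- c(300, 350, 400, 450, 500, 600, 800, 1000)

# Same data, but the most expensive house is replaced by a $6.5M outlier
prices_outlier <- replace(prices, length(prices), 6500)

# Named so it does not mask dplyr::summarize(), which is loaded above
center_spread <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

rbind(original = center_spread(prices),
      with_outlier = center_spread(prices_outlier))
```

The mean more than doubles (550 to 1237.5) and the standard deviation jumps, while the median (475) and IQR (262.5) do not change at all, which is why they better describe a typical house when a long right tail is present.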
1.70
- Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning.
- Not independent. A larger share of the treatment group appears to have survived than of the control group.
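- What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment?
- Survival appears to be better in the treatment group, suggesting the transplant is effective. Box plots alone, however, cannot show that the difference is statistically significant.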
- What proportion of patients in the treatment group and what proportion of patients in the control group died?
30 / 34  # control group: proportion who died
## [1] 0.8823529
45 / 69  # treatment group: proportion who died
## [1] 0.6521739
- About 88% of the control group died.
- About 65% of the treatment group died.
- (i) What are the claims being tested?
- H0 (independence): survival is independent of whether the patient got a transplant. HA: the transplant improves survival.
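- (ii) Alive on 28 cards and dead on 75 cards: there are 103 patients in total (34 control + 69 treatment), of whom 30 + 45 = 75 died.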
- (iii) Based on the simulation, a difference as large as the observed one would be unlikely if the control and treatment populations did not actually differ. This is evidence against independence and suggests that the transplant was the source of the difference.
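The card-shuffling randomization test described in this exercise can be sketched in R as follows. The group sizes and death counts come from the proportions computed above. The exercise describes 100 shuffles. Using 10,000 here and the seed value are arbitrary choices made for a more stable estimate.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

n_trt <- 69; n_ctl <- 34          # group sizes
dead_trt <- 45; dead_ctl <- 30    # observed deaths
n_dead <- dead_trt + dead_ctl     # 75 "dead" cards
n_total <- n_trt + n_ctl          # 103 cards in total

# Observed difference in death proportions (treatment - control)
obs_diff <- dead_trt / n_trt - dead_ctl / n_ctl

# One deck of cards: TRUE = dead, FALSE = alive
cards <- c(rep(TRUE, n_dead), rep(FALSE, n_total - n_dead))

n_sims <- 10000
sim_diff <- replicate(n_sims, {
  shuffled <- sample(cards)                # shuffle the deck
  trt <- shuffled[1:n_trt]                 # first 69 cards -> "treatment"
  ctl <- shuffled[(n_trt + 1):n_total]     # remaining 34 cards -> "control"
  sum(trt) / n_trt - sum(ctl) / n_ctl
})

# One-sided p-value: share of shuffles at least as extreme as observed
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diff <= obs_diff + 1e-12)

round(obs_diff, 3)  # about -0.23
mean(sim_diff)      # close to 0, as expected under independence
p_value
```

The simulated differences center at 0 because shuffling breaks any link between group and outcome. The small fraction of shuffles reaching the observed difference of about -0.23 is what the answer to (iii) relies on.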