Each row represents one case, observation, or record
1691 participants were included in the survey
| variable | type |
|---|---|
| sex | nominal categorical |
| age | discrete numerical |
| marital | nominal categorical |
| grossIncome | ordinal categorical |
| smoke | nominal categorical |
| amtWeekends | discrete numerical |
| amtWeekdays | discrete numerical |
The population of interest is children between the ages of 5 and 15.
Assuming that the children were randomly selected and randomly assigned to the treatment and control groups, the findings can be generalized at least to the population of children between the ages of 5 and 15 from the same geographical region where the children were selected because it was a controlled randomized experiment. I would not generalize outside of this geographical region because cultural differences could play a role in the outcome.
Because this study is based on observational data and not a randomized controlled experiment we cannot conclude that there is a causal relationship between smoking and dementia later in life, only that there is a correlation.
No, we cannot say that there is a causal relationship because, once again, this is based on observational data, not a controlled randomized experiment. Also the study does not specify what percentage of children with symptoms of sleep disorders have behavioral issues or are identified as bullies, only the other way around. Additionally, if only 1% of students who do not have behavioral or bullying issues have symptoms of sleep disorders and 2% of students who do have these issues have symptoms of sleep disorders that would mean they are “twice as likely”, but it would still be a very small percentage and so you couldn’t say that “sleep disorders lead to bullying in school children.”
This study is a controlled randomized experiment.
The treatment group are those who are instructed to exercise twice a week and the control group are those who are instructed not to exercise.
Yes, this study uses blocking based on age groups.
No, blinding is not used because patients know whether or not they are exercising, so they know if they are in the treatment or control group. Assuming that the researchers who perform the mental health exam also know who is assigned to each group there is no blinding there either. Since the exam is a mental health exam I guess this could cause some bias depending on how the mental health test is conducted if there is any subjectivity to the mental health measures.
Yes, the results of this study can be used to establish a causal relationship between exercise and mental health because participants were randomly assigned to treatment groups and they can be generallized to the population because it was a randomized sample of the population.
I might suggest further blocking by gender and would have questions about the type of mental health exam being used, but otherwise no, I would not have any other reservations about the study proposal.
57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(scores, horizontal = TRUE)
summary(scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
Histogram (a) is a normal distribution that corresponds to boxplot (2) with about 50% of the values falling between about 58 and 62.
Histogram (b) is a uniform distribution that corresponds to boxplot (3). There are no outliers and values are distributed evenly throughout the range of 0 to 100.
Histogram (c) is a positively skewed distribution that corresponds to boxplot (1). The majority of values fall in the lower end of the range between about 0.8 and 2 with a lot of outliers on the upper end of the range above about 3.6.
This is a right skewed distribution since 50 percent of the values fall within the lower 7.5% of the range. The median would best represent a typical observation since the mean would be thrown off by the many outliers in the top end of the range, and variability would be best described by the IQR since the standard deviation is based on the mean and so would also be thrown off by the outliers.
This is a relatively uniform distribution with about the same number of values in each of the 4 quarters of the range. Thus the mean and standard deviation or the median and IQR would both make good representations of the typical observation and variability respectively
The distribution of the number of alcoholic drinks consumed by college students in a given week is likely right skewed since there is most students don’t drink and only a few drink excessively, so the majority of values are at the lower end of the range. A typical observation would best be described by the median and the variability would best be described by the IQR since both or not affected as much by outliers as the mean and standard deviation are.
This one is tougher to judge, but I would guess it is normally distributed with just a few outliers on the upper end of the range and very few also at the lower end of the range. In this case the mean and standard deviation would be the best representations of the typical observation and variability.
No survival seems to be dependent on whether or not the patient received a transplant because the probability of survival increases with a transplant.
The boxplots suggest that transplants are highly effective at prologing life in patients who are designated an official heart transplant candidate.
# str(heartTr)
treat <- subset(heartTr, transplant == 'treatment')
con <- subset(heartTr, transplant == 'control')
prop.table(table(treat$survived))
##
## alive dead
## 0.3478261 0.6521739
prop.table(table(con$survived))
##
## alive dead
## 0.1176471 0.8823529
65.22% of patients in the treatment group died.
88.24% of patients in the control group died.
What are the claims being tested?
The paragraph below describes the set up for such approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate.
table(treat$survived)
##
## alive dead
## 24 45
table(con$survived)
##
## alive dead
## 4 30
We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at . Lastly, we calculate the fraction of simulations where the simulated differences in proportions are . If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.