1.8
- Each row of the data matrix represents the smoking habits of an individual person in the UK.
- 1691 participants were included in the survey.
- sex: Categorical, non-ordinal
age: Numerical, discrete
marital: Categorical, non-ordinal
grossIncome: Categorical, ordinal
smoke: Categorical, non-ordinal
amtWeekends: Numerical, discrete
amtWeekdays: Numerical, discrete
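As a rough illustration of how these variable types could be represented in R, the sketch below builds a tiny data frame. The three rows and the income labels ("low", "middle", "high") are made up for illustration and are not taken from the survey; only the column names come from the answer above.

```r
# Hypothetical three-row excerpt; every value here is invented for illustration
smoking <- data.frame(
  sex = factor(c("Female", "Male", "Female")),   # categorical, non-ordinal
  age = c(42L, 53L, 38L),                        # numerical, discrete
  grossIncome = factor(                          # categorical, ordinal
    c("low", "high", "middle"),
    levels = c("low", "middle", "high"),
    ordered = TRUE
  ),
  amtWeekends = c(12L, NA, 20L)                  # numerical, discrete (NA for a non-smoker)
)

# Inspect how R stores each column
str(smoking)
```

Storing grossIncome as an ordered factor lets comparisons such as `smoking$grossIncome < "high"` respect the ordering, which would not be meaningful for sex or marital.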
1.10
- Population: All children
Sample: 160 children between the ages of 5 and 15
- We are not told how the children were sampled from the population, so we cannot determine whether the results generalize to all children. However, because this is an experiment, we can evaluate whether there is a causal relationship between the variables.
1.28
- No, we cannot infer causation from this observational study. While the variables may be associated, association does not imply causation. We can only infer causation from a randomized experiment, which this was not.
- No, this statement is not justified. The study cannot show that sleep disorders cause children to bully. The best way to describe the conclusion from this study is that there is an association between sleep disorders and behavioral issues.
1.36
- This is an experiment.
- The treatment group consists of the individuals instructed to exercise twice per week. The control group consists of the individuals instructed not to exercise.
- The study does make use of blocking by sampling from the various age groups. The blocking variable is age.
- This study does not make use of blinding, because it is very clear to the participants whether they are exercising or not.
- All necessary precautions have been taken in order to determine whether or not there is a causal relationship between exercise and mental health. Random sampling and random assignment to groups were both used. For this reason, the results could be generalized to the population at large.
- My first reservation would be that twice a week doesn’t seem like enough to produce noticeable effects in mental health - but that comes from personal experience. I also feel that telling the control group that they are not allowed to exercise tips them off to the purpose of the study and can influence their results.
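The combination of blocking on age and random assignment described above could be sketched as follows. The participant table, the age-group labels, and the group sizes are hypothetical and only serve to show the mechanics; they are not the study's actual data.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical participants: three age blocks with four people each
participants <- data.frame(
  id = 1:12,
  age_group = rep(c("younger", "middle", "older"), each = 4)
)

# Within each age block, randomly assign half to treatment and half to control
participants$group <- NA_character_
for (block in unique(participants$age_group)) {
  rows <- which(participants$age_group == block)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)  # random permutation of the labels
}

# Each age block should show an even split between the two groups
table(participants$age_group, participants$group)
```

Randomizing inside each block guarantees that age is balanced across treatment and control, so any difference in outcomes cannot be attributed to one group simply being older.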
1.48
x = c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)
df = data.frame(x)
summary(df)
## x
## Min. :57.00
## 1st Qu.:72.75
## Median :78.50
## Mean :77.70
## 3rd Qu.:82.25
## Max. :94.00
boxplot(df, main = "Box Plot Distribution of Statistics Final Exam Scores", ylab = "Final Exam Scores")
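To check which scores the box plot would flag as outliers, the 1.5 × IQR rule can be applied to the same data. This is a minimal sketch using the quartiles that summary() reports; note that boxplot() itself computes its box from hinges (see fivenum()), which can differ slightly from these quartiles, although both approaches flag the same point here.

```r
x <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)

# Quartiles as reported by summary(): 72.75 and 82.25
q <- quantile(x, c(0.25, 0.75))
iqr <- IQR(x)                       # 82.25 - 72.75 = 9.5

# Fences for the 1.5 * IQR rule
lower_fence <- q[[1]] - 1.5 * iqr   # 58.5
upper_fence <- q[[2]] + 1.5 * iqr   # 96.5

# Scores outside the fences
outliers <- x[x < lower_fence | x > upper_fence]
outliers                            # 57
```

Only the lowest score, 57, falls outside the fences, so it would be drawn as an individual point below the lower whisker.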
1.50
- This is a unimodal distribution that is roughly normal, centered somewhere around 50. Box plot #2 is the match for this.
- This is a multimodal distribution. Box plot #3 is the match for this. You can see how widely the data are spread here.
- This is another unimodal distribution, but right skewed. Box plot #1 is the match for this.
1.56
- Symmetric, left, or right?
I would expect this distribution to be left skewed.
Mean or median?
We should use the median, since this data is not symmetric/normally distributed.
Standard deviation or IQR?
Since we are using the median above, we should use the IQR here: it is built from the quartiles and, like the median, is robust to skew and outliers.
- Symmetric, left, or right?
I would expect this distribution to be left skewed.
Mean or median?
We should use the median, since this data is not symmetric/normally distributed.
Standard deviation or IQR?
Since we are using the median above, we should use the IQR here: it is built from the quartiles and, like the median, is robust to skew and outliers.
- Symmetric, left, or right?
I would think this would be pretty symmetric - with the data normally distributed, thinning out on both ends around no drinks and a high number of drinks.
Mean or median?
We would want to use the mean here since the data is symmetric.
Standard deviation or IQR?
Since we are using the mean above, we should use the standard deviation here, since it takes all data points into account and measures spread around the mean.
- Symmetric, left, or right?
I think this would be right skewed, with the majority of people clustered in one section of the distribution and a long tail toward the higher salaries, produced by the small number of CEOs included in the data.
Mean or median?
We would want to use the median here since the data is not symmetric.
Standard deviation or IQR?
Since we are using the median above, we should use the IQR here: it is built from the quartiles and, like the median, is robust to skew and outliers.
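The reasoning behind preferring the median and IQR for skewed data can be illustrated with a small example. The salaries below (in thousands of dollars) are hypothetical, with one very large value standing in for a CEO; the helper `describe` is also just an illustrative name.

```r
# Hypothetical salaries in thousands of dollars; the last value stands in for a CEO
salaries <- c(40, 45, 50, 55, 60, 1000)

# Summarize center and spread with both pairs of statistics
describe <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), iqr = IQR(v))
}

with_ceo <- describe(salaries)
without_ceo <- describe(salaries[-6])  # same data with the extreme value dropped

round(rbind(with_ceo, without_ceo), 1)
```

Dropping the single extreme value moves the mean and standard deviation a great deal, while the median and IQR barely change. That stability is why the median and IQR describe a skewed distribution more faithfully.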
1.70
- No, I would say there is a large causal relationship between treatment and survival because there is a significant difference between the control and treatment groups.
- The box plots suggest that the heart transplant treatment was wildly successful, extending survival time dramatically for the participants in the treatment group.
- I would say that about 16% of the participants in the control group are alive, and about 84% are dead. I would say that about one third of participants in the treatment group are alive, and about two thirds of them are dead. With this, you could say that the proportion of people alive in the treatment group is about twice the proportion alive in the control group.
- i.
The claim being tested is whether the experimental heart transplant program increased lifespan.
ii.
49
151
100
100
-0.025
Normally distributed
iii.
It is unlikely that the observed difference is due to chance. We should reject the null hypothesis in favor of the alternative. We have sufficient evidence to conclude that the transplant program is effective.
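The card-shuffling procedure from part ii can be sketched in R. The card counts and group sizes below are the ones listed in part ii, and the observed difference is derived from the rough percentages estimated in the third answer (about 67% versus 84% dead), so all of them are this write-up's estimates rather than exact study counts. The seed, the helper name `simulate_difference`, and the choice of 100 repetitions are assumptions made for the sketch.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Card counts from part ii (estimates from this write-up, not exact study counts)
cards <- c(rep("alive", 49), rep("dead", 151))
n_treatment <- 100
n_control <- 100

# One shuffle: deal the cards into two groups and compare the proportions of "dead"
simulate_difference <- function() {
  shuffled <- sample(cards)
  treatment <- shuffled[seq_len(n_treatment)]
  control <- shuffled[n_treatment + seq_len(n_control)]
  mean(treatment == "dead") - mean(control == "dead")
}

# Repeat the shuffle 100 times to build the null distribution
simulated <- replicate(100, simulate_difference())

# Observed difference (treatment - control) implied by the estimated percentages
observed <- 0.67 - 0.84

# Fraction of shuffles with a difference at least as extreme as the observed one
p_value <- mean(simulated <= observed)
p_value
```

A small fraction here would mean that shuffling alone rarely produces a difference as large as the observed one, which is the basis for rejecting the null hypothesis.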