data(iris)
Each row of data represents an observation or participant.
1,691 participants
Population of interest: Children between the ages of 5 and 15. Sample: 160 children between the ages of 5 and 15.
For this study to be generalizable to the population, it would need to be replicated with a sufficiently large sample.
This is an experimental study testing whether giving children a specific instruction not to cheat affects whether they cheat. I think the researchers would need to consider many other factors, such as the upbringing and values of the children, and use appropriate blocking and randomization to control for other factors that may influence the outcome of the experiment.
No, this is an observational study. This type of study cannot prove causality. The study may demonstrate an association between smoking and dementia, but not causation.
Experimental
The treatment group is instructed to exercise twice a week. The control group is instructed NOT to exercise at all.
Stratified randomization is used to ensure that the age groups between 18 and 55 are appropriately represented in the study. I don’t think the study is blocking for any variable such as the initial mental health of the subjects before the start of the experiment.
Does this study make use of blinding? No.
Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large.
For this study to be generalized to the population (ages 18 to 55), it needs to be replicated with a sufficiently large sample. I’m not sure this study can be used to establish causality. One obvious factor is the subject’s initial mental health, which I do not think is being controlled for in this study.
Yes. The initial mental health of the participants may affect the outcome. If the treatment group happens to have more subjects with good mental health and the control group happens to have more subjects with poorer mental health, this imbalance would affect the outcome.
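The concern about unbalanced initial mental health could in principle be addressed by blocking on a baseline measure. The following is only a sketch of that idea, not the design used in the study: the roster, the `baseline` variable, and its two levels are made up for illustration.

```r
# Sketch of blocked random assignment. All data here are hypothetical.
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Made-up roster: 20 subjects, 10 per hypothetical baseline category
subjects <- data.frame(
  id = 1:20,
  baseline = rep(c("good", "poor"), each = 10)
)

# Within each block, shuffle an equal number of treatment and control labels
subjects$arm <- NA_character_
for (b in unique(subjects$baseline)) {
  rows <- which(subjects$baseline == b)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  subjects$arm[rows] <- sample(labels)
}

# Each baseline category is now split evenly between the two arms
table(subjects$baseline, subjects$arm)
```

Because assignment is randomized separately inside each block, neither arm can end up with more "good" or "poor" baseline subjects by chance.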
data <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
summary(data)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
boxplot(data)
Housing prices appear to be skewed to the right. Since there are outliers, the median and IQR would be more robust measures of the center and spread of the data.
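To see why the median and IQR are described as more robust, here is a small sketch using made-up prices (not the data from the exercise). It compares how much each statistic moves when a single extreme value is added.

```r
# Hypothetical prices, invented for illustration only
prices <- c(250000, 300000, 350000, 400000, 450000, 500000, 550000)
prices_with_outlier <- c(prices, 6000000)  # one made-up extreme sale

# Center: the mean jumps from 400000 to 1100000,
# while the median only moves from 400000 to 425000
mean_shift   <- mean(prices_with_outlier) - mean(prices)
median_shift <- median(prices_with_outlier) - median(prices)

# Spread: compare how much the SD and the IQR are inflated by the outlier
sd_ratio  <- sd(prices_with_outlier) / sd(prices)
iqr_ratio <- stats::IQR(prices_with_outlier) / stats::IQR(prices)

c(mean_shift = mean_shift, median_shift = median_shift)
c(sd_ratio = sd_ratio, iqr_ratio = iqr_ratio)
```

`stats::IQR` is called with its namespace so the sketch still works in a session where a variable named `IQR` has been defined.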
IQR <- 1000000 - 350000
IQR
## [1] 650000
top_whisker <- 1000000 + 1.5*IQR
top_whisker
## [1] 1975000
Q3 = 900,000
Some cost more than 1,200,000
top_whisker = 1,875,000
Data is more symmetric. I think it would be more robust to use median and IQR.
IQR <- 900000 - 300000
top_whisker <- 900000 + 1.5*IQR
top_whisker
## [1] 1800000
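The two whisker calculations above follow the same 1.5 × IQR rule, so they can be wrapped in a small helper. The function name `upper_fence` is my own and is only a sketch of the arithmetic.

```r
# Upper limit for a boxplot whisker under the 1.5 * IQR rule.
# q1 and q3 are the first and third quartiles.
upper_fence <- function(q1, q3) {
  q3 + 1.5 * (q3 - q1)
}

fence_a <- upper_fence(350000, 1000000)  # 1975000, matches the first calculation
fence_b <- upper_fence(300000, 900000)   # 1800000, matches the second calculation
fence_a
fence_b
```

Note that this value is only the maximum reach of the whisker. On an actual boxplot the whisker stops at the largest observation that does not exceed it.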
The data is concentrated near zero drinks, with some outliers who drink excessively. Since there is no such thing as a negative number of drinks, all of the values start at zero. I would say the data would be slightly skewed to the right.
Since most of the data would be concentrated near zero with a few outliers, the median and IQR would be appropriate to use.
I think this data would be skewed to the right because of the executive salaries. I think it would be better to use the median and IQR.
If survival is independent of whether or not the patient got a transplant, then the outcomes in the control group and the treatment group should be close to each other. Based on the data, the control group has a survival rate of only about 12%, while the treatment group has a survival rate of about 35%. This is a difference of 23 percentage points. How likely is a difference this large to occur by chance alone if survival is independent of whether or not the patient got a transplant? A hypothesis test should be done.
H0: Survival is independent of whether or not the patient received a heart transplant. This means that the difference in survival rates should be 0.
HA: Survival is not independent of whether or not the patient received a heart transplant. This means that there should be a significant difference in survival rates.
Control group survival: 0.1176471 Treatment group survival: 0.3478261
control_survival <- (34-30)/34
treatment_survival <- (69-45)/69
control_survival
## [1] 0.1176471
treatment_survival
## [1] 0.3478261
treatment_survival - control_survival
## [1] 0.230179
Looking at the boxplot for the treatment group, it looks like 75% of the patients survived no more than about 525 days after the transplant (about 1 year and 160 days). 75% of 69 patients (rounded up to the next integer) is 52 patients, so about 52 of the patients did not survive past 525 days. Now I am wondering how the researchers defined “survival” when they reported that 45 died. They must have defined survival as surviving for some number of days less than 525. Only very few survived beyond the top whisker (close to 1,500 days). So it looks like most patients did not really survive long term (past 1,500 days, or about 4 years).
control_died <- 30/34
treatment_died <- 45/69
control_died
## [1] 0.8823529
treatment_died
## [1] 0.6521739
Experimental heart transplant increases lifespan.
Total Dead: 75
differences in proportions are close to the difference in proportions in the study
About 88% in the control group died, while about 65% in the treatment group died. This is a difference of 23 percentage points. The distribution of simulated differences suggests that a difference this large is highly unlikely to have occurred by chance. The researchers might decide to reject the null hypothesis that there is no difference between the control and treatment groups.
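The simulation idea can be sketched in R as a randomization test. This is my own minimal version, not the simulation from the exercise: it reuses the counts above (34 control patients with 30 deaths, 69 treatment patients with 45 deaths), and the seed, the 10,000 repetitions, and the one-sided comparison are assumptions.

```r
# Randomization test sketch using the counts reported in the study.
set.seed(1)  # arbitrary seed, only for reproducibility

group <- rep(c("control", "treatment"), times = c(34, 69))
# TRUE = survived: 4 of 34 controls and 24 of 69 treated patients
survived <- c(rep(c(TRUE, FALSE), times = c(4, 30)),
              rep(c(TRUE, FALSE), times = c(24, 45)))

# Difference in survival proportions (treatment minus control)
diff_in_survival <- function(outcome, grp) {
  sum(outcome[grp == "treatment"]) / sum(grp == "treatment") -
    sum(outcome[grp == "control"]) / sum(grp == "control")
}

observed_diff <- diff_in_survival(survived, group)

# Under H0 the outcome does not depend on the group, so shuffle the
# outcomes across patients and recompute the difference each time
simulated_diffs <- replicate(10000, diff_in_survival(sample(survived), group))

# Share of shuffles with a difference at least as large as observed
# (small tolerance guards against floating-point ties)
p_value <- mean(simulated_diffs >= observed_diff - 1e-12)

observed_diff
p_value
hist(simulated_diffs, main = "Simulated differences under independence")
abline(v = observed_diff, lty = 2)
```

A small `p_value` means that shuffles rarely produce a difference as large as the observed 0.23, which is the basis for rejecting the independence model.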