Problem 1.8
a.) Each row represents a UK resident who participated in the study.
b.) 1691 participants were included in the study.
c.) Sex = Categorical (Nominal), age = Numerical (Discrete), Marital = Categorical (Nominal), grossIncome = Categorical (Ordinal), Smoke = Categorical (Nominal), amtWeekends = Numerical (Discrete), amtWeekdays = Numerical (Discrete)
Problem 1.10
a.) The population of interest is all children between the ages of 5 and 15; the sample is the 160 children ages 5-15 who took part in the study.
b.) Because this was an experiment, causal relationships can be established from the results. The findings can be generalized to the broader population only if the 160 children are representative of it, for example if they were randomly sampled.
Problem 1.28
a.) Based on the study, you cannot conclude that smoking causes dementia. What the study shows is a correlation between dementia and the amount a participant smoked: the risk of dementia increases with the amount one smokes, but the study describes smoking as a contributing factor rather than a root cause. Many factors contribute to the development of dementia (e.g., genetics, other hazardous activities, injury), and the study does not account for them, so attributing dementia to smoking on the basis of this study would be inaccurate.
b.) The statement is not justified. The study did not isolate or control solely for participants who suffer from sleep disorders and end up bullying/acting disruptive. What was identified in this study was a correlation between lack of sleep and the propensity to bully; the best that could be deduced is that sleep disorders can be a contributing factor to poor behavior/bullying.
Problem 1.36
a.) This is an experiment.
b.) Treatment group: those instructed to exercise twice a week. Control group: those instructed not to exercise.
c.) This study makes use of blocking, with age as the blocking variable.
d.) There is no blinding used in this study.
e.) We can make a causal statement based on the findings of the experiment, and we can cautiously apply conclusions to the population at large.
f.) I would have some reservations:
1.) Without instruction on how long each participant should exercise, the results could be skewed and a confounding variable introduced.
2.) Introductory mental health exams should be administered to establish a baseline, and additional blocking should then be done based on the results.
Problem 1.48
final_exam_scores = c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
summary(final_exam_scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
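The summary above gives the quartiles the box plot is built from. As a rough sketch (not part of the assigned answer), the spread and any points the box plot would draw separately can be checked by hand. The names `iqr`, `lower_fence`, `upper_fence`, and `outliers` are my own, and the 1.5 x IQR rule is the usual box plot convention.

```r
# Scores copied from the problem so the block runs on its own.
final_exam_scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
                       79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles as reported by summary(), and the interquartile range.
q <- unname(quantile(final_exam_scores, c(0.25, 0.75)))
iqr <- IQR(final_exam_scores)          # 82.25 - 72.75 = 9.5

# Fences under the 1.5 x IQR convention.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Scores falling outside the fences.
outliers <- final_exam_scores[final_exam_scores < lower_fence |
                              final_exam_scores > upper_fence]
print(outliers)                        # 57
```

Note that `boxplot()` places its fences using hinges from `fivenum()` rather than these quantiles, so the fences can differ slightly; for these scores both approaches flag only the score of 57.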
boxplot(final_exam_scores)
Problem 1.50
Histogram a matches box plot 2. This distribution is symmetric. Histogram b matches box plot 3. This distribution is uniform. Histogram c matches box plot 1. This distribution is right skewed.
Problem 1.56
a.) Right skewed distribution. Median would best represent a typical observation. IQR would best represent the variability of observations.
b.) Symmetric distribution. Mean would best represent a typical observation. Standard deviation would best represent the variability of observations.
c.) Right skewed distribution. Median would best represent a typical observation. IQR would best represent the variability of observations.
d.) Right skewed distribution. Median would best represent a typical observation. IQR would best represent the variability of observations.
Problem 1.70
a.) Survival is not independent of whether or not a transplant was received. The mosaic plot shows that a greater proportion of patients survived in the treatment group than in the control group, where no transplant was received and the survival rate was very low. If the two variables were independent, the proportion surviving would be about the same whether or not the patient received a transplant, which the mosaic plot shows is not the case.
b.) The box plot suggests that the heart transplant treatment is effective. Survival in the treatment group extends to a maximum of 1799 days, while the maximum in the control group is 1400 days, and that value is an extreme outlier. The median, the mean, and the IQR are all substantially greater for the treatment group than for the control group: the median survival is 207 days with the transplant versus 21 days without, and the mean is 415.4 days versus 96.62 days, a difference of 318.78 days.
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
data("heartTr")
s_time = subset(heartTr,transplant=='treatment',select=survtime)
summary(s_time)
## survtime
## Min. : 5.0
## 1st Qu.: 72.0
## Median : 207.0
## Mean : 415.4
## 3rd Qu.: 630.0
## Max. :1799.0
ns_time = subset(heartTr,transplant=='control',select=survtime)
summary(ns_time$survtime)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.00 21.00 96.62 47.50 1400.00
boxplot(heartTr$survtime~heartTr$transplant)
c.) Approximately 65% of patients died in the treatment group versus 88% for the control group.
# Treatment group: patients who died, all patients, and the proportion who died
t_dead = subset(heartTr,transplant=='treatment'&survived=='dead')
number_treatment = subset(heartTr,transplant=='treatment')
tp_dead = length(t_dead$survived)/length(number_treatment$survived)
# Control group: the same three steps
c_dead = subset(heartTr,transplant=='control'&survived=='dead')
number_control = subset(heartTr,transplant=='control')
cp_dead = length(c_dead$survived)/length(number_control$survived)
print(tp_dead)
## [1] 0.6521739
print(cp_dead)
## [1] 0.8823529
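The same proportions can be cross-checked from a two-way table of counts. This is only a sketch: the counts (45 of 69 treatment patients dead, 30 of 34 control patients dead) are back-calculated from the proportions above and the group sizes used in part d, and the object names `outcomes` and `prop_by_group` are my own. It does not need the openintro package.

```r
# Counts assumed from the write-up: 24 alive / 45 dead in treatment (69 total),
# 4 alive / 30 dead in control (34 total). matrix() fills column by column.
outcomes <- matrix(c(24, 45, 4, 30), nrow = 2,
                   dimnames = list(survived = c("alive", "dead"),
                                   transplant = c("treatment", "control")))

# Proportions within each transplant group (each column sums to 1).
prop_by_group <- prop.table(outcomes, margin = 2)

# Proportion dead in each group, rounded for display.
print(round(prop_by_group["dead", ], 4))
```

With the real data frame loaded, `prop.table(table(heartTr$survived, heartTr$transplant), margin = 2)` should give the same column proportions in one step.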
d.i.) The claims being tested are: Null Hypothesis: Survival is independent of whether or not the patient received the experimental heart transplant. Alternative Hypothesis: Survival is not independent of whether or not the patient received the experimental heart transplant.
d.ii.) We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated difference in proportions is at least as extreme as the difference observed in the study (-0.2302 or less). If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
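The card-shuffling procedure can be sketched in R as follows. This is an illustration under stated assumptions, not the simulation used in the textbook: the seed, the helper `simulate_diff`, and the other object names are my own, and the observed difference is computed from the counts implied by the proportions in part c (45 of 69 and 30 of 34 dead).

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 28 "alive" cards and 75 "dead" cards, as described above.
cards <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69  # the remaining 34 cards form the control group

# Observed difference in the proportion dead (treatment - control).
observed_diff <- 45 / 69 - 30 / 34

# One shuffle: deal the cards into two groups and return the
# difference in the proportion of dead cards (treatment - control).
simulate_diff <- function() {
  shuffled <- sample(cards)
  treatment <- shuffled[1:n_treatment]
  control <- shuffled[(n_treatment + 1):length(shuffled)]
  mean(treatment == "dead") - mean(control == "dead")
}

# Repeat 100 times to build the distribution expected under independence.
sim_diffs <- replicate(100, simulate_diff())

# Fraction of simulated differences at least as extreme as the observed one
# (small tolerance guards against floating-point ties).
frac_extreme <- mean(sim_diffs <= observed_diff + 1e-9)
print(frac_extreme)
```

A small value of `frac_extreme` would indicate that a difference as large as the observed one rarely arises from shuffling alone. With only 100 repetitions the fraction is coarse, so a larger number of repetitions would give a steadier estimate.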
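# Difference in the proportion alive (treatment - control);
# its negative is the difference in the proportion dead (treatment - control)
study_outcome = (1-tp_dead)-(1-cp_dead)
print(study_outcome)
## [1] 0.230179
print(-study_outcome)
## [1] -0.230179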
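d.iii.) In the simulation results, only a few of the simulated differences are as extreme as the study outcome (at or beyond -.2302 or .2302). A difference this large would therefore be unlikely if survival were independent of the transplant, so we reject the null hypothesis: the results suggest that survival is not independent of the transplant treatment.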