What does each row of the data matrix represent?
Each row of the data matrix represents an observation.
How many participants were included in the survey?
A total of 1691 participants were included in the survey.
Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
sex: Categorical
age: Numerical, continuous
marital: Categorical
grossIncome: Categorical, ordinal
smoke: Categorical, nominal
amtWeekends: Numerical, discrete
amtWeekdays: Numerical, discrete
Identify the population of interest and the sample in this study.
The population is all children between the ages of 5 and 15. The sample is the 160 chosen children.
Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships.
The results cannot be generalized to the population for two reasons. First, a sample of 160 children is too small to represent all children. Second, it is unclear whether the study used random sampling.
The findings of the study are useful for identifying causal relationships.
Smokers Found More Prone to Dementia
Based on the study, it is hard to reach that conclusion because the data may be biased. First, the study should have included people who are not health plan members. Also, in the 23 years after the first survey, some participants may have died, and it is not clear how the study dealt with that. In addition, we don't know whether some participants already had dementia when they took the first survey. Finally, smoking is not the only factor that can lead to dementia.
The School Bully Is Sleepy
The statement is not justified. There might be some relationship between sleep disorders and behavioral issues. However, further analysis is needed to confirm a causal relationship.
What type of study is this?
This is an experiment.
What are the treatment and control groups in this study?
The treatment group is the group that exercises twice a week.
The other group is the control group.
Does this study make use of blocking? If so, what is the blocking variable?
Yes, the blocking variable is age.
Does this study make use of blinding?
No.
Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large.
Yes, it is possible to establish a causal relationship.
Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal?
Yes. I don't know the sample size, how long the exercise sessions last, what kind of exercise is involved, and more. Unless those concerns are clarified, I would not fund the proposal.
Below are the final exam scores of twenty introductory statistics students.
57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94.
Create a box plot of the distribution of these scores.
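library(tidyverse)
# Final exam scores of the twenty students
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
# ggplot() needs a data frame, so wrap the vector in one
scores <- as.data.frame(scores)
# A single box plot with the scores on the y-axis
ggplot(scores, aes(y = scores)) +
  geom_boxplot()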
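As a rough numeric check on the box plot, the five-number summary and the 1.5 × IQR fences can be computed in base R. This is only a sketch: the name `exam_scores` is introduced here for illustration, and it assumes that `quantile()` with its default settings matches the quartiles that `geom_boxplot()` draws.

```r
# Hypothetical name for illustration; same 20 scores as in the question.
exam_scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
                 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Minimum, quartiles, median, and maximum (default quantile type 7).
five_num <- quantile(exam_scores, probs = c(0, 0.25, 0.5, 0.75, 1))

# Points more than 1.5 * IQR beyond the quartiles count as outliers.
iqr <- five_num[["75%"]] - five_num[["25%"]]
lower_fence <- five_num[["25%"]] - 1.5 * iqr
upper_fence <- five_num[["75%"]] + 1.5 * iqr
outliers <- exam_scores[exam_scores < lower_fence | exam_scores > upper_fence]

print(five_num)
print(outliers)
```

With these scores, the only value outside the fences is 57, so it should appear as a single point below the lower whisker.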
Describe the distribution in the histograms below and match them to the box plots.
(a) The distribution is approximately symmetric and matches box plot (2).
(b) The distribution is roughly uniform and matches box plot (3).
(c) The distribution is right skewed and unimodal, and matches box plot (1).
(a) Right skewed (median < mean); median and IQR
(b) Symmetric; median and IQR
(c) Left skewed (mean < median); median and IQR
(d) Right skewed (median < mean); median and IQR
Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning.
The mosaic plot shows a higher survival rate in the treatment group than in the control group, so survival does not appear to be independent of whether the patient got a transplant.
What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment?
Comparing the two box plots, we can see that patients in the control group usually died within a few months of entering the program, while patients in the treatment group lived much longer. The median survival time of the treatment group is much larger than that of the control group. Therefore, the box plots suggest that the heart transplant treatment is effective at extending survival.
library(openintro)
data(heartTr)
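# Count survival outcomes in the treatment group and convert to proportions
treat_group <- heartTr %>%
  group_by(transplant) %>%
  count(survived) %>%
  filter(transplant == 'treatment') %>%
  mutate(percentage = n / sum(n))
# Percentage of treatment patients who died (row 2 is "dead")
treat_rate <- round(treat_group$percentage[2] * 100, 2)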
treat_group
## # A tibble: 2 x 4
## # Groups: transplant [1]
## transplant survived n percentage
## <fct> <fct> <int> <dbl>
## 1 treatment alive 24 0.348
## 2 treatment dead 45 0.652
print(paste0("The proportion of patients in the treatment group who died is: ", treat_rate, "%"))
## [1] "The proportion of patients in the treatment group who died is: 65.22%"
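# Same summary for the control group
control_group <- heartTr %>%
  group_by(transplant) %>%
  count(survived) %>%
  filter(transplant == 'control') %>%
  mutate(percentage = n / sum(n))
# Percentage of control patients who died (row 2 is "dead")
control_rate <- round(control_group$percentage[2] * 100, 2)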
control_group
## # A tibble: 2 x 4
## # Groups: transplant [1]
## transplant survived n percentage
## <fct> <fct> <int> <dbl>
## 1 control alive 4 0.118
## 2 control dead 30 0.882
print(paste0("The proportion of patients in the control group who died is: ", control_rate, "%"))
## [1] "The proportion of patients in the control group who died is: 88.24%"
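The observed difference in death rates follows directly from the counts in the two tables. The short sketch below redoes that arithmetic in base R; the variable names are introduced here for illustration only.

```r
# Counts copied from the two summary tables above.
dead_treatment <- 45
n_treatment <- 24 + 45   # alive + dead in the treatment group
dead_control <- 30
n_control <- 4 + 30      # alive + dead in the control group

# Proportion of patients who died in each group.
p_dead_treatment <- dead_treatment / n_treatment
p_dead_control <- dead_control / n_control

# Observed difference in proportions (treatment - control), in percent.
obs_diff_pct <- round(100 * (p_dead_treatment - p_dead_control), 2)
print(obs_diff_pct)
```

The result is -23.02, meaning the death rate in the treatment group is about 23 percentage points lower than in the control group.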
What are the claims being tested?
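The null hypothesis is that survival is independent of whether the patient got a transplant. The alternative, which is the claim being tested, is that a heart transplant increases lifespan.
Second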
We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are at most -23.02%, that is, at least as extreme as the observed difference. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
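The card-shuffling procedure described above can be sketched in base R. This is a minimal illustration rather than the code behind the textbook figure: the seed, the helper `simulate_diff()`, and the other variable names are assumptions made here, and the simulated fraction will vary from run to run with a different seed.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# 28 "alive" cards and 75 "dead" cards, as in the description above.
cards <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69
n_sims <- 100

# One shuffle: deal 69 cards to treatment and the remaining 34 to control,
# then return the difference in proportions of "dead" (treatment - control).
simulate_diff <- function() {
  shuffled <- sample(cards)
  treatment <- shuffled[seq_len(n_treatment)]
  control <- shuffled[-seq_len(n_treatment)]
  mean(treatment == "dead") - mean(control == "dead")
}

sim_diffs <- replicate(n_sims, simulate_diff())

# Observed difference from the study: 45/69 - 30/34 (about -0.2302).
obs_diff <- 45 / 69 - 30 / 34

# Fraction of shuffles at least as extreme as the observed difference.
p_value <- mean(sim_diffs <= obs_diff)
print(p_value)
```

A small fraction here would indicate that a difference as extreme as the observed one rarely arises from shuffling alone.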
What do the simulation results shown below suggest about the effectiveness of the transplant program?
Two of the 100 simulations had a difference at least as extreme as -23.02% (treatment minus control), the difference observed in the study. We conclude that the evidence is sufficiently strong to reject H0 and assert that the transplant program is effective.