1.8)
(a) Each row represents a different UK resident who participated in the survey.
(b) 1691 participants
(c) sex - categorical - 2 levels (male and female)
age - numerical - discrete
marital - categorical - 2 levels (single and married)
grossIncome - categorical - ordinal
smoke - categorical - 2 levels (yes and no)
amtWeekends - numerical - discrete
amtWeekdays - numerical - discrete
1.10) (a) Population of interest - all children between the ages of 5 and 15. The 160 children who took part in the study are the sample.
1.28) (a) We cannot conclude that smoking causes dementia later in life. This was an observational study, so it can show an association but cannot establish a causal relationship.
1.36) (a) experiment using stratified random sampling
(b) treatment group - those individuals who exercise twice a week; control group - those individuals who are instructed not to exercise
(c) Yes, blocking is being used. The blocking variable is age. Individuals are grouped by age, and then half of each age group is randomly assigned to treatment and the other half to control.
(d) This study does not make use of blinding. The patients know whether they are exercising or not.
(e) The results can be used to show a causal relationship between exercise and mental health because this is an experiment in which subjects are randomly assigned to treatment and control. The conclusion can be generalized to the population at large because the participants were selected by stratified random sampling.
(f) The reservations that I have relate to other factors that might be significant but are not taken into account in the study, such as gender, health background, socio-economic background, whether the participant has exercised in the past, and the location where the participant lives. I also think it would be important to specify the type and rigor of the exercise being conducted so that participants’ experiences are uniform and a general conclusion can be drawn.
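The blocked assignment described in this answer can be sketched in R. This is only an illustration under stated assumptions: the participant table, the age-group labels, and the block size of 12 are hypothetical and are not taken from the study.

```r
set.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical participants: 12 people in each of three age blocks (assumed sizes and labels)
participants <- data.frame(
  id = 1:36,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 12)
)

# Within each block, randomly assign half to treatment and half to control
participants$group <- NA_character_
for (block in unique(participants$age_group)) {
  rows <- which(participants$age_group == block)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)  # shuffle labels inside the block
}

# Each age block should be split evenly between the two groups
assignment_table <- table(participants$age_group, participants$group)
print(assignment_table)
```

Randomizing inside each block is what keeps age balanced across the treatment and control groups.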
1.48)
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(scores)
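As a cross-check on the box plot, the five-number summary can be computed directly. This sketch reuses the same `scores` vector and relies on base R's `fivenum()` and `boxplot.stats()`. The variable names `five_num` and `outliers` are only illustrative.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(scores)
print(five_num)

# Points more than 1.5 box lengths beyond the hinges are drawn as outliers
outliers <- boxplot.stats(scores)$out
print(outliers)
```

The hinges are 72.5 and 82.5, so the lower fence is 72.5 - 1.5 * 10 = 57.5, and the score of 57 falls just below it and is plotted as an outlier.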
1.50) (a) unimodal, symmetric, matches picture 2
(b) multimodal, symmetric, matches picture 3
(c) bimodal, right skewed, matches picture 1
1.56) (a) I would expect the data to be right skewed because of the number of houses that cost over $6,000,000. The median would be used to best represent a typical observation. The variability of observations would be best represented using the IQR because it is right skewed.
(b) This is closer to being symmetric, but it skews slightly to the right because of the most expensive houses and because prices are bounded below (every house costs some positive amount). Because of the right skew, I would use the IQR to represent the variability of observations. The median would be used to best represent a typical observation.
(c) I would expect the data to skew right because the minimum value is zero and some people drink a lot. The median would be used to best represent a typical observation, and the variability of observations would best be represented using the IQR because the distribution is right skewed.
(d) I would expect the data to skew right because of the few high-salary employees and because no employee can make less than zero dollars. I would use the median to represent a typical observation and the IQR to best represent the variability.
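The preference for the median and IQR under right skew can be illustrated with a small R sketch. The salary values below are hypothetical (in thousands of dollars) and were chosen only to show the effect of a single very high earner.

```r
# Hypothetical salaries in thousands of dollars; the last value is one very high earner
salaries <- c(30, 35, 40, 45, 50, 55, 60, 400)
without_top <- salaries[salaries < 400]

# Center: the mean is pulled toward the high earner, the median barely moves
center_with <- c(mean = mean(salaries), median = median(salaries))
center_without <- c(mean = mean(without_top), median = median(without_top))
print(rbind(center_with, center_without))

# Spread: the standard deviation inflates, the IQR changes little
spread_with <- c(sd = sd(salaries), iqr = IQR(salaries))
spread_without <- c(sd = sd(without_top), iqr = IQR(without_top))
print(rbind(spread_with, spread_without))
```

With the high earner included, the mean is 89.375 while the median is 47.5. Dropping that one value moves the mean to 45 but the median only to 45, and the IQR shifts from 17.5 to 15 while the standard deviation changes by roughly a factor of ten.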
1.70) (a) Survival appears to be dependent on whether the patient got a transplant. In the mosaic plot, a greater percentage of the people who received treatment lived, compared with the percentage of people who lived among those who did not receive treatment.
(b) The box plots suggest that treatment has a large effect on life expectancy. Half of the people who underwent treatment lived more than about 220 days, whereas in the group that did not receive treatment, a person who lived that long would be considered an outlier.
(c) Proportion of people in the treatment group who died is about 0.67.
Proportion of people in the control group who died is about 0.86.
alive on 116 cards
dead on 284 cards
one group of size 300 representing treatment
one group of size 100 representing control
distribution centered at zero
calculate the fraction of simulations where the simulated differences in proportions are at least as extreme as the difference observed in the actual data.
The simulation suggests that survival is dependent on receiving a transplant. About 33% of patients who received a transplant survived. In the simulation there were no cases in which the survival rate was that high.
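The card-shuffling procedure described in this answer can be sketched in R. This is a minimal illustration, not the textbook's own simulation: it uses the card counts and group sizes listed above, takes the observed difference from the approximate proportions in part (c) (survival of about 0.33 with treatment versus about 0.14 without), and assumes 1000 shuffles. The names `cards`, `sim_diffs`, and `p_value` are hypothetical.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# One card per patient, labeled with the outcome (counts as listed in the answer)
cards <- c(rep("alive", 116), rep("dead", 284))
n_treatment <- 300   # size of the simulated treatment group
n_sim <- 1000        # number of shuffles (an assumption)

# Shuffle the cards, deal them into two groups, and record the difference in survival
sim_diffs <- replicate(n_sim, {
  shuffled <- sample(cards)
  treatment <- shuffled[1:n_treatment]
  control <- shuffled[(n_treatment + 1):length(shuffled)]
  mean(treatment == "alive") - mean(control == "alive")
})

# Observed difference in survival proportions, from the approximate values in part (c)
observed_diff <- 0.33 - 0.14

# Fraction of shuffles with a difference at least as large as the observed one
p_value <- mean(sim_diffs >= observed_diff)
print(p_value)
```

Because the shuffling ignores which group a patient was really in, the simulated differences center near zero, and a small `p_value` indicates that the observed difference would be unusual if survival were independent of treatment.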