1.8 Smoking habits of UK residents.

  1. Each row represents a participant’s demographics, economic status, and smoking habits.
  2. The survey includes 1691 participants.
col_name <- c('sex','age','marital','grossincome','smoke','amt weekends','amt weekdays')
col_type <- c('categorical','numerical','categorical','categorical','categorical','numerical','numerical')
col_subtype <- c('not ordinal','continuous','not ordinal','ordinal','not ordinal','continuous','continuous')

answer1_8_c <- data.frame(col_name = c(col_name),col_type = c(col_type),col_subtype = c(col_subtype))
answer1_8_c
##       col_name    col_type col_subtype
## 1          sex categorical not ordinal
## 2          age   numerical  continuous
## 3      marital categorical not ordinal
## 4  grossincome categorical     ordinal
## 5        smoke categorical not ordinal
## 6 amt weekends   numerical  continuous
## 7 amt weekdays   numerical  continuous
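
The distinction between ordinal and non-ordinal categorical variables can also be expressed directly in R. The following is a minimal sketch, not part of the exercise: the income brackets and the four example values are hypothetical and are not taken from the survey data.

```r
# Hypothetical income brackets (illustrative only, not the survey's actual levels)
income_levels <- c("Under 2,600", "2,600 to 5,200", "5,200 to 10,400")
gross_income <- c("Under 2,600", "5,200 to 10,400", "2,600 to 5,200", "Under 2,600")

# Ordinal categorical variable: an ordered factor with explicit level order
gross_income_f <- factor(gross_income, levels = income_levels, ordered = TRUE)

# Non-ordinal categorical variable: a plain (unordered) factor
sex_f <- factor(c("Female", "Male", "Male", "Female"))

is.ordered(gross_income_f)             # ordered factor
is.ordered(sex_f)                      # unordered factor
gross_income_f[1] < gross_income_f[2]  # comparisons are only defined for ordered factors
```

Storing an ordinal variable as an ordered factor keeps the level order available for comparisons and sorting, which a plain factor does not provide.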

1.10 Cheaters, scope of inference.

a. The population of interest is all children between 5 and 15 years old. The sample is the 160 children between 5 and 15 years old who took part in the study.
b. The results cannot be generalized to the population, for the following reasons:
1. The experiment was designed to explore the causal relationship between honesty and instruction, not age.
2. Differences in the children’s characteristics could interfere with the causal relationship.
3. The sample size is too small.

1.28 Reading the paper.

  1. We cannot conclude that smoking causes dementia later in life, since this is not a randomized controlled experiment. We can only say that there is a correlation between smoking and dementia.

  2. It is not justified to say ‘The study shows that sleep disorders lead to bullying in school children.’ This is because the survey is not a randomized controlled experiment. We can only conclude that there is a correlation between sleep disorders and bullying in school children.

1.36 Exercise and mental health.

(a). Randomized controlled study
(b). Treatment group: Those instructed to exercise twice a week
Control group: Those instructed not to exercise.
(c). This study uses blocking based on age groups.
(d). No, blinding is not used in this study. Participants know which group they are in because they are either instructed to exercise or instructed not to.
(e). Yes, we can draw a conclusion that there is a causal relationship between exercise and mental health because this is a randomized controlled study.
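
The blocked design in (c) can be sketched in R as follows. This is only an illustration under stated assumptions: the participant ids, the age-group labels, and the block sizes are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical participants: 4 people in each of 3 age groups (the blocks)
participants <- data.frame(
  id = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Randomize to treatment/control separately within each age group
participants$group <- NA_character_
for (blk in unique(participants$age_group)) {
  rows <- which(participants$age_group == blk)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)
}

# Each block contributes equally to both groups
table(participants$age_group, participants$group)
```

Randomizing within each block guarantees that every age group is equally represented in the treatment and control groups, so age cannot be confounded with the treatment.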

1.48 Stats scores.

score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
boxplot(score)
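
As a rough check on the plot, the numbers behind the boxplot can be computed directly. This sketch uses base R's `fivenum()` and `boxplot.stats()`; note that `boxplot()` uses hinges, which can differ slightly from the quartiles returned by `quantile()`.

```r
score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(score)
five

# Points beyond 1.5 times the hinge spread from the box are drawn as outliers
outliers <- boxplot.stats(score)$out
outliers
```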

1.50 Mix-and-match.

  1. –> (2): Distribution (a) is a normal distribution
  2. –> (3): Distribution (b) is uniformly distributed, so the boxplot is expected to have a wider 25% - 75% range.
  3. –> (1): Distribution (c) is a right-skewed distribution with a long tail.

1.56 Distributions and appropriate statistics, Part II.

  1. This is a right-skewed distribution, as the median price is only 0.075 of the range. The median would best represent a typical observation, since there are a meaningful number of houses costing more than $6M. The variability would be best represented by the IQR, because the SD is based on the mean and is therefore sensitive to those extreme values.

  2. This is a roughly symmetric, approximately uniform distribution, as the price quartiles are evenly spaced. The mean and the median could both represent a typical observation in the data. Either the IQR or the SD would represent the variability of the observations well.

  3. This is a right-skewed distribution: most college students drink little or nothing, and the right tail reflects the few students who drink excessively. The median would best describe a typical observation and the IQR would best describe the variability, since they are not affected as much by outliers.

  4. This is a right-skewed distribution, assuming that there are fewer higher-level positions than lower-level ones and that salary is positively correlated with position level. The median would best describe a typical observation and the IQR would best describe the variability, since they are not affected as much by outliers.
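
The reasoning above, that the median and IQR are less affected by extreme values than the mean and SD, can be illustrated with a small sketch. The prices below (in thousands of dollars) are hypothetical and chosen only for illustration.

```r
# Hypothetical house prices in thousands of dollars
base <- c(250, 300, 350, 400, 450, 500, 700, 1000, 1200)

# Same data, but the most expensive house is replaced by a $6M house
skewed <- replace(base, length(base), 6000)

# Robust statistics: unchanged by the extreme value
c(median(base), median(skewed))
c(IQR(base), IQR(skewed))

# Non-robust statistics: pulled upward by the extreme value
c(mean(base), mean(skewed))
c(sd(base), sd(skewed))
```

Only one observation changes, yet the mean and SD shift substantially while the median and IQR stay the same.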

1.70 Heart transplants.

  1. Based on the mosaic plot, survival is not independent of whether the patient got a transplant. There appears to be a strong association between survival and receiving a transplant.

  2. Heart transplants appear to prolong patients’ survival time.

# install.packages("openintro")
library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
## 
##     cars, trees
treatment <- subset(heartTr, transplant == 'treatment')
control <- subset(heartTr, transplant == 'control')
prop.table(table(treatment$survived))
## 
##     alive      dead 
## 0.3478261 0.6521739
prop.table(table(control$survived))
## 
##     alive      dead 
## 0.1176471 0.8823529

65% of patients in the treatment group died. 88% of patients in the control group died.

(d-i) The claim being tested is that heart transplants increase lifespan for gravely ill patients with heart problems.

(d-ii) 28, 75, 69, 34, 0, independent

(d-iii) The transplant treatment appears to be effective in increasing the survival rate of patients, since the fraction of simulated differences as extreme as the observed difference is quite low.
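
The card-shuffling procedure in (d-ii) can be sketched as a small randomization simulation. This is an illustration under stated assumptions, not the exercise's own code: the counts (24 alive and 45 dead in treatment, 4 alive and 30 dead in control) are reconstructed from the proportions printed above, and the seed and the number of repetitions `n_sim` are arbitrary choices.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# 28 "alive" cards and 75 "dead" cards, one per patient
cards <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69
n_control <- 34

# Observed difference in the proportion of dead patients (treatment - control)
obs_diff <- 45 / 69 - 30 / 34

# n_sim is an assumption; the exercise describes 100 repetitions
n_sim <- 1000
sim_diff <- replicate(n_sim, {
  shuffled <- sample(cards)  # shuffle the cards
  p_treatment <- mean(shuffled[1:n_treatment] == "dead")
  p_control <- mean(shuffled[(n_treatment + 1):(n_treatment + n_control)] == "dead")
  p_treatment - p_control
})

# Fraction of simulated differences at least as extreme as the observed one
# (small tolerance guards against floating-point rounding)
frac_extreme <- mean(sim_diff <= obs_diff + 1e-9)
frac_extreme
```

If survival were independent of treatment, the simulated differences would be centered at 0, so a small `frac_extreme` suggests the observed difference is unlikely to be due to chance alone.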