Week 2 assignment

1.10

A)Identify the population of interest and the sample in this study.

Population: all children between 5 and 15.

Sample: the 160 children between 5 and 15 in the experiment.

B)The 160 children are not described as randomly selected. If they had been randomly sampled, it would be possible to generalize the results to the wider population. Because the question does not state that they were, we cannot establish that the relationship holds for the wider population.

1.20

A)This is an observational study.

B)No, because this is only an observational study, and an observational study cannot establish a causal relationship.

C)During the finals period, which is more stressful for students, they may tend to sleep less and consume more coffee.

1.30

A)This is an experimental study.

B)Yes, it can, because an experiment was performed with a random sample and a control was introduced. However, there could be a confounding variable: the short stop on the elevator could cause muscle cramps.

1.40

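# Paired productivity and stress values, one pair per plotted point.
prod   <- c(1, 0.8, 0.7, 0.75, 0.8, 0.5, 0.1, 0.1, 0.2, 0.2, 0.4, 0.55, 0.2, 0.4)
stress <- c(0.5, 0.6, 0.65, 0.5, 0.55, 0.4, 1, 0.1, 0.3, 0.9, 0.3, 0.6, 0.2, 0.8)

# Scatter plot of productivity against stress with a smoothed trend line.
library(ggplot2)
prod.df <- data.frame(prod, stress)
ggplot(prod.df, aes(stress, prod)) + geom_point() + geom_smooth()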

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

1.50

A)Box plot 2. The distribution of this data is unimodal and symmetric, so it could be normally distributed.

B)Box plot 3. The distribution of this data may be symmetric but it is not normally distributed; it may be uniform.

C)Box plot 1. The distribution of this data is unimodal and has a right skew.

1.60

mean/median = 1: the shape may be symmetric, as the mean is the same as the median.
mean/median < 1: the shape has a negative skew. In a negatively skewed data set the long left tail pulls the mean down, so the median is typically greater than the mean.
mean/median > 1: the shape has a positive skew. In a positively skewed data set the long right tail pulls the mean up, so the median is typically less than the mean.
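
The three cases can be checked on small samples. This is only a sketch: the vectors below are made-up illustrations rather than data from the exercise, and the ratio is only meaningful when the median is positive.

```r
# Hypothetical samples chosen to illustrate each shape (not from the exercise).
symmetric  <- c(1, 2, 3, 4, 5)
right_skew <- c(1, 2, 3, 4, 20)    # one large value pulls the mean up
left_skew  <- c(1, 17, 18, 19, 20) # one small value pulls the mean down

# Ratio of mean to median for a numeric vector.
mean_median_ratio <- function(x) mean(x) / median(x)

ratios <- c(symmetric  = mean_median_ratio(symmetric),
            right_skew = mean_median_ratio(right_skew),
            left_skew  = mean_median_ratio(left_skew))
ratios
```

The symmetric sample gives a ratio of 1, the right-skewed sample a ratio above 1, and the left-skewed sample a ratio below 1.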

1.70

A)No. The data show that survival is related to whether the patient received a transplant. Based on the mosaic plot, patients who received a transplant have a higher survival rate. Therefore survival is not independent of receiving the transplant.

B)The box plot suggests that the heart transplant treatment increases survival time for patients.

library(openintro)

## Loading required package: airports

## Loading required package: cherryblossom

## Loading required package: usdata

data("heart_transplant")
# Count the patients and the deaths within each group.
patient_control <- sum(heart_transplant$transplant == "control")
patient_control_dead <- sum(heart_transplant$transplant == "control" & 
    heart_transplant$survived == "dead")
patient_treatment <- sum(heart_transplant$transplant == "treatment")
patient_treatment_dead <- sum(heart_transplant$transplant == "treatment" & 
    heart_transplant$survived == "dead")
# Proportion of patients who died in each group.
patient_control_dead_ratio <- patient_control_dead/patient_control
patient_treatment_dead_ratio <- patient_treatment_dead/patient_treatment
patient_control_dead_ratio

## [1] 0.8823529

patient_treatment_dead_ratio

## [1] 0.6521739

patient_Dead <- sum(heart_transplant$survived == "dead")
patient_Dead

## [1] 75

patient_treatment

## [1] 69

patient_control

## [1] 34

patient_treatment_dead_ratio - patient_control_dead_ratio

## [1] -0.230179

2.12

A)Probability that a student chosen at random misses exactly one day = 25/100

Probability that a student chosen at random misses 2 days = 15/100

Probability that a student chosen at random misses 3 or more days = 28/100

Probability that a student chosen at random misses 1 or more days = 0.25+0.15+0.28 = 0.68

Probability that a student chosen at random doesn’t miss any days of school due to sickness this year = 1-0.68 = 0.32

B)0.32+0.25=0.57

C)0.25+0.15+0.28=0.68

D)0.32*0.32=0.1024

E)0.68*0.68=0.4624

F)No. Sickness can be contagious, so one kid missing school due to sickness can make the other kid sick and miss school too.
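
The arithmetic in A) to E) can be reproduced with a short sketch. The variable names are illustrative, and the reading of each part (for example that D and E concern two kids treated as independent) is inferred from the calculations above rather than quoted from the exercise.

```r
# Probabilities given in the answers above.
p_one   <- 0.25  # misses exactly one day
p_two   <- 0.15  # misses exactly two days
p_three <- 0.28  # misses three or more days

p_none         <- 1 - (p_one + p_two + p_three)  # A) no days missed
p_at_most_one  <- p_none + p_one                 # B) none or exactly one day
p_at_least_one <- p_one + p_two + p_three        # C) one or more days

# D) and E) multiply probabilities, which assumes the two kids are
# independent; F) argues that this assumption is doubtful.
p_both_none <- p_none^2
p_both_some <- p_at_least_one^2

round(c(A = p_none, B = p_at_most_one, C = p_at_least_one,
        D = p_both_none, E = p_both_some), 4)
```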

2.18

mat <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
              byrow = TRUE, nrow = 2)
colnames(mat) <- c("Excellent", "Very Good", "Good", "Fair", "Poor")
rownames(mat) <- c("No Coverage", "Coverage")
mat

##             Excellent Very Good   Good   Fair   Poor
## No Coverage    0.0230    0.0364 0.0427 0.0192 0.0050
## Coverage       0.2099    0.3123 0.2410 0.0817 0.0289

# Append the row totals as a "Total" column, then the column totals as a "Total" row.
mat <- cbind(mat, Total = rowSums(mat))
mat <- rbind(mat, Total = colSums(mat))

mat

##             Excellent Very Good   Good   Fair   Poor  Total
## No Coverage    0.0230    0.0364 0.0427 0.0192 0.0050 0.1263
## Coverage       0.2099    0.3123 0.2410 0.0817 0.0289 0.8738
## Total          0.2329    0.3487 0.2837 0.1009 0.0339 1.0001

A)No, because in the given distribution the probability of being in excellent health and having coverage is 0.2099, which is greater than 0.

B)0.023+0.2099=0.2329

C)0.2099/0.8738=0.2402

D)0.023/0.1263=0.1821

E)Having excellent health and having health coverage do not appear to be independent. If they were independent, the probability of having excellent health and having health coverage (0.2099) would equal the probability of having excellent health times the probability of having health coverage (0.2329∗0.8738=0.2035). The numbers are close but not equal. Therefore the two events are not independent.
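
The conditional probabilities and the independence check can be recomputed directly from the joint table. The sketch below rebuilds the table so that it runs on its own; the object names are illustrative and not part of the original solution.

```r
# Joint probabilities copied from the table above.
joint <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                  0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("No Coverage", "Coverage"),
                                c("Excellent", "Very Good", "Good", "Fair", "Poor")))

p_coverage  <- rowSums(joint)             # marginal probability of each coverage status
p_excellent <- sum(joint[, "Excellent"])  # marginal probability of excellent health

# Conditional probability = joint probability / probability of the condition.
p_exc_given_cov    <- joint["Coverage", "Excellent"] / p_coverage[["Coverage"]]
p_exc_given_no_cov <- joint["No Coverage", "Excellent"] / p_coverage[["No Coverage"]]

# Under independence the joint probability would equal this product.
p_if_independent <- p_excellent * p_coverage[["Coverage"]]

round(c(given_coverage    = p_exc_given_cov,
        given_no_coverage = p_exc_given_no_cov,
        joint             = joint["Coverage", "Excellent"],
        if_independent    = p_if_independent), 4)
```

Both conditional probabilities fall between 0 and 1, as they must, and the product under independence differs slightly from the observed joint probability.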

2.24

mat=matrix(c(19.61,33.39,53,20.68,26.32,47,40.29,59.71,100), byrow=TRUE, nrow=3)
colnames(mat)=c("--had degree--", "--No degree--","Total")
rownames(mat)=c("Voted in favor","Voted against","Total")
mat

##                --had degree-- --No degree-- Total
## Voted in favor          19.61         33.39    53
## Voted against           20.68         26.32    47
## Total                   40.29         59.71   100

P(voted in favor | had degree) = 19.61/40.29 = 0.4867
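
The same ratio can be read off the table programmatically. This is a minimal sketch that rebuilds the joint percentages from the table above; the object names and the shortened column labels are illustrative.

```r
# Joint percentages copied from the table above (totals are recomputed below).
votes <- matrix(c(19.61, 33.39,
                  20.68, 26.32),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Voted in favor", "Voted against"),
                                c("Had degree", "No degree")))

# Condition on the "Had degree" column: joint share divided by the column total.
p_favor_given_degree <- votes["Voted in favor", "Had degree"] /
  sum(votes[, "Had degree"])
round(p_favor_given_degree, 4)
```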

2.30

mymat2=matrix(c(13,59,72,15,8,23,28,67,95),nrow=3,byrow=TRUE)
colnames(mymat2)=c("hard","paper","TOTAL")
rownames(mymat2)=c("fiction","nonfiction","TOTAL")
mymat2

##            hard paper TOTAL
## fiction      13    59    72
## nonfiction   15     8    23
## TOTAL        28    67    95

A)P(hc first)∗P(p&f second | hc first)=28/95∗59/94=0.185

B)P(f&hc first)∗P(hc second | f&hc first)+P(f&p first)∗P(hc second | f&p first)=13/95∗27/94+59/95∗28/94=0.224

C)P(f)∗P(hc)=72/95∗28/95=0.223

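D)Picking a hardcover book on the second draw is nearly independent of picking a fiction book first: removing one of the 95 books barely changes the probabilities, which is why the answers to B and C are so close.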
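
The three products can be recomputed from the counts in the table. The sketch below reads A and B as draws without replacement and C as draws with replacement, which is inferred from the denominators (94 versus 95) in the formulas above; the variable names are illustrative.

```r
# Counts copied from the table above.
n_total         <- 95
n_hard          <- 28
n_fiction       <- 72
n_fiction_hard  <- 13
n_fiction_paper <- 59

# A) hardcover first, then paperback fiction, without replacement.
p_a <- (n_hard / n_total) * (n_fiction_paper / (n_total - 1))

# B) fiction first, then hardcover, without replacement.
#    Split on whether the first (fiction) book was itself a hardcover.
p_b <- (n_fiction_hard / n_total) * ((n_hard - 1) / (n_total - 1)) +
       (n_fiction_paper / n_total) * (n_hard / (n_total - 1))

# C) fiction first, then hardcover, with replacement.
p_c <- (n_fiction / n_total) * (n_hard / n_total)

round(c(A = p_a, B = p_b, C = p_c), 3)
```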

2.36

A)Probability of winning nothing/-2 dollar profit = 36/52

Probability of winning 3 dollars/1 dollar profit = 12/52

Probability of winning 5 dollars/3 dollars profit = 3/52

Probability of winning 25 dollars/23 dollars profit = 1/52

Expected profit per game = 36/52∗(−2)+12/52∗1+3/52∗3+1/52∗23 = −28/52 = −0.54 dollars

B)Probability of making money = 16/52. Probability of losing money = 36/52. Losing money is more likely than making money, so I don’t recommend this game as a good way to make money.
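
The expected profit is a probability-weighted sum, so it is easy to verify in R. This sketch uses only the probabilities and profits listed in A); the vector names are illustrative.

```r
# Profit per outcome and its probability, as listed in A).
profit <- c(-2, 1, 3, 23)
prob   <- c(36, 12, 3, 1) / 52

# Sanity check: the outcome probabilities must sum to 1.
stopifnot(isTRUE(all.equal(sum(prob), 1)))

expected_profit <- sum(profit * prob)     # weighted average profit per game
p_make_money    <- sum(prob[profit > 0])  # chance of ending a game ahead

round(c(expected_profit = expected_profit, p_make_money = p_make_money), 3)
```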

2.42

mymat3=matrix(c(48,1,1, 2,.25,.0625), nrow=2, byrow=TRUE)
colnames(mymat3)=c("mean", "SD", "Var")
rownames(mymat3)=c("X, In Box","Y, Scooped")
mymat3

##            mean   SD    Var
## X, In Box    48 1.00 1.0000
## Y, Scooped    2 0.25 0.0625

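A)Mean: 48+3∗2=54. SD: √(1+3∗0.0625)=1.09. The three scoops are separate independent scoops, so their variances add (3∗0.0625); the factor 9 would apply only to a single scoop counted three times.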

B)Mean: 48−2=46. SD: √(1+0.0625)=1.03
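
The means and standard deviations can be checked with a short sketch. It assumes that the box and every scoop are independent of one another, so that variances add whether amounts are added or subtracted; the variable names are illustrative.

```r
# Means and variances from the table above (variance = SD squared).
mean_x <- 48; var_x <- 1^2      # X: ounces in a box
mean_y <- 2;  var_y <- 0.25^2   # Y: ounces in one scoop

# A) one box plus three scoops. ASSUMPTION: the three scoops are
#    independent, so the total variance is var_x + 3 * var_y.
mean_a <- mean_x + 3 * mean_y
sd_a   <- sqrt(var_x + 3 * var_y)

# B) one box minus one scoop. Variances add even for a difference.
mean_b <- mean_x - mean_y
sd_b   <- sqrt(var_x + var_y)

round(c(mean_a = mean_a, sd_a = sd_a, mean_b = mean_b, sd_b = sd_b), 2)
```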

Yifeng Qi

2021/2/12