library(datasets)
data(iris)
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
data("heartTr")
Question 1.8
1.8 (a) Each row represents one observation (case).
1.8 (b) Based on the text, a total of 1,691 participants were included in the survey.
1.8 (c) The following variables are categorical:
sex: not ordinal
marital: not ordinal
smoke: not ordinal
grossIncome: recorded as ranges, so it is categorical and ordinal. Income would normally be a continuous numerical variable if an exact amount were recorded rather than a range.
The following variables are numerical:
age: discrete as recorded, although in theory it could be continuous if age were measured down to milliseconds or nanoseconds
AmtWeekends: discrete, although the way respondents entered the data can make it look categorical
AmtWeekdays: discrete, although the way respondents entered the data can make it look categorical
Question 1.10
1.10 (a) The population is children between the ages of 5 and 15. The sample is the 160 children in the study, who are assumed to represent this population.
1.10 (b) I personally feel that 160 children is a relatively small sample for this population. However, there are other factors to take into account. Is religion part of the upbringing? Is country considered? Is culture considered? If the sample was drawn from a specific county or state, the results can be generalized to that population, but several other factors can influence the results. For example, are children from certain cultures more prone to lying than others? Assuming that a cluster or multistage sample was used, it is possible that the results are representative.
Question 1.28
1.28 (a)
I do not feel that this study can definitively conclude that smoking causes dementia later in life. The 23,123 participants are health plan members who voluntarily took part in a survey. Also, the study only covers people from 50 to 60 years old. We do not know whether these participants were lifelong smokers or whether confounding variables led to 25% of the group developing dementia. To make this even more complex, we do not know what other habits were common from 1978 to 1985, but we do know that drug epidemics occurred in that period (e.g. the war on drugs). We also don’t know how the prior history of these individuals compares with that of people who did not participate in the study.
1.28 (b) This study is even more flawed than the previous one. Parents and teachers will not know whether a student is truly deprived of sleep. They are reporting by proxy, based on what they observe or ask of the children in the study. The children themselves are not the ones answering the questions.
Question 1.36
Question 1.48 Create a Box Plot
# given:
data <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)
data
## [1] 57 66 69 71 72 73 74 77 78 78 79 79 81 81 82 83 83 88 89 94
# use distinct names so the base functions min() and max() are not masked
data_min <- min(data)
data_max <- max(data)
q1 <- 72.5
q3 <- 82.5
iqr <- q3 - q1
iqr
## [1] 10
boxplot(data)
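The quartiles above are entered by hand. As a minimal sketch, they can also be derived in base R with `fivenum()`, which returns Tukey's hinges (the medians of the lower and upper halves). The fence variables `lower_fence` and `upper_fence` are names introduced here for illustration; they apply the usual 1.5 × IQR rule that `boxplot()` uses by default to flag outliers.

```r
# Same scores as in the exercise
data <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)

# fivenum() gives: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(data)
q1 <- five[2]
q3 <- five[4]
iqr <- q3 - q1

# 1.5 * IQR fences; points outside them are drawn as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- data[data < lower_fence | data > upper_fence]

five
outliers
```

Note that `quantile(data, 0.25)` uses a different interpolation rule by default and will not necessarily reproduce the hand-entered value of 72.5, which is why `fivenum()` is used in this sketch.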
Question 1.50 Mix-and-Match: based on the diagrams, the following histograms and boxplots match
Question 1.56
1.56
Question 1.70 (a) Based on the chart, survival does not appear to be independent of whether the patient received a heart transplant. The control group had a much higher death rate than the treatment group (88% vs. 65%). The key point is that survival time increases for patients who receive the treatment.
##
total_control <- subset(heartTr, transplant=="control")
total_treatment <- subset(heartTr, transplant=="treatment")
# total in control
total_control
## id acceptyear age survived survtime prior transplant wait
## 1 15 68 53 dead 1 no control NA
## 2 43 70 43 dead 2 no control NA
## 3 61 71 52 dead 2 no control NA
## 4 75 72 52 dead 2 no control NA
## 5 6 68 54 dead 3 no control NA
## 6 42 70 36 dead 3 no control NA
## 7 54 71 47 dead 3 no control NA
## 9 85 73 47 dead 5 no control NA
## 10 2 68 51 dead 6 no control NA
## 11 103 67 39 dead 6 no control NA
## 12 12 68 53 dead 8 no control NA
## 13 48 71 56 dead 9 no control NA
## 14 102 74 40 alive 11 no control NA
## 15 35 70 43 dead 12 no control NA
## 17 31 69 54 dead 16 no control NA
## 20 5 68 20 dead 18 no control NA
## 21 77 72 41 dead 21 no control NA
## 22 99 73 49 dead 21 no control NA
## 25 101 74 49 alive 31 no control NA
## 26 66 72 53 dead 32 no control NA
## 27 29 69 50 dead 35 no control NA
## 28 17 68 20 dead 36 no control NA
## 29 19 68 59 dead 37 no control NA
## 32 8 68 45 dead 40 no control NA
## 33 44 70 42 dead 40 no control NA
## 36 1 67 30 dead 50 no control NA
## 44 62 71 39 dead 69 no control NA
## 51 9 68 47 dead 85 no control NA
## 55 32 71 41 dead 102 no control NA
## 59 37 71 41 dead 149 no control NA
## 67 27 69 8 dead 263 no control NA
## 73 91 73 47 dead 340 no control NA
## 78 82 71 29 alive 427 no control NA
## 99 26 69 30 alive 1400 no control NA
# total in treatment
total_treatment
## id acceptyear age survived survtime prior transplant wait
## 8 38 70 41 dead 5 no treatment 5
## 16 95 73 40 dead 16 no treatment 2
## 18 3 68 54 dead 16 no treatment 1
## 19 74 72 29 dead 17 no treatment 5
## 23 20 69 55 dead 28 no treatment 1
## 24 70 72 52 dead 30 no treatment 5
## 30 4 68 40 dead 39 no treatment 36
## 31 100 74 35 alive 39 yes treatment 38
## 34 16 68 56 dead 43 no treatment 20
## 35 45 71 36 dead 45 no treatment 1
## 37 22 69 42 dead 51 no treatment 12
## 38 39 70 50 dead 53 no treatment 2
## 39 10 68 42 dead 58 no treatment 12
## 40 35 71 52 dead 61 no treatment 10
## 41 37 70 61 dead 66 no treatment 19
## 42 68 72 45 dead 68 no treatment 3
## 43 60 71 49 dead 68 no treatment 3
## 45 28 69 53 dead 72 no treatment 71
## 46 47 71 47 dead 72 no treatment 21
## 47 32 69 64 dead 77 no treatment 17
## 48 65 72 51 dead 78 no treatment 12
## 49 83 73 53 dead 80 no treatment 32
## 50 13 68 54 dead 81 no treatment 17
## 52 73 72 56 dead 90 no treatment 27
## 53 79 72 53 dead 96 no treatment 67
## 54 36 70 48 dead 100 no treatment 46
## 56 98 73 28 alive 109 no treatment 96
## 57 87 73 46 dead 110 no treatment 60
## 58 97 73 23 alive 131 no treatment 21
## 60 11 68 47 dead 153 no treatment 26
## 61 94 73 43 dead 165 yes treatment 4
## 62 96 73 26 alive 180 no treatment 13
## 63 90 73 52 dead 186 yes treatment 160
## 64 53 71 47 dead 188 no treatment 41
## 65 89 73 51 dead 207 no treatment 139
## 66 24 69 51 dead 219 no treatment 83
## 68 93 73 47 alive 265 no treatment 28
## 69 51 71 48 dead 285 no treatment 32
## 70 67 73 19 dead 285 no treatment 57
## 71 16 68 49 dead 308 no treatment 28
## 72 84 73 42 dead 334 no treatment 37
## 74 92 73 44 alive 340 no treatment 310
## 75 58 71 47 dead 342 yes treatment 21
## 76 88 73 54 alive 370 no treatment 31
## 77 86 73 48 alive 397 no treatment 8
## 79 81 73 52 alive 445 no treatment 6
## 80 80 72 46 alive 482 yes treatment 26
## 81 78 72 48 alive 515 no treatment 210
## 82 76 72 52 alive 545 yes treatment 46
## 83 64 72 48 dead 583 yes treatment 32
## 84 72 72 26 alive 596 no treatment 4
## 85 71 72 47 alive 630 no treatment 31
## 86 69 72 47 alive 670 no treatment 10
## 87 7 68 50 dead 675 no treatment 51
## 88 23 69 58 dead 733 no treatment 3
## 89 63 71 32 alive 841 no treatment 27
## 90 30 69 44 dead 852 no treatment 16
## 91 59 71 41 alive 915 no treatment 78
## 92 56 71 38 alive 941 no treatment 67
## 93 50 71 45 dead 979 yes treatment 83
## 94 46 71 48 dead 995 yes treatment 2
## 95 21 69 43 dead 1032 no treatment 8
## 96 49 71 36 alive 1141 yes treatment 36
## 97 41 70 45 alive 1321 yes treatment 58
## 98 14 68 53 dead 1386 no treatment 37
## 100 40 70 48 alive 1407 yes treatment 41
## 101 34 69 40 alive 1571 no treatment 23
## 102 33 69 48 alive 1586 no treatment 51
## 103 25 69 33 alive 1799 no treatment 25
1.70 (b)
The box plots for the control and treatment groups suggest that the treatment is effective in increasing survival time. The box for the treatment group, which spans the IQR and contains the middle 50% of the data, is wider and sits higher than the box for the control group. The median survival time is also higher for the treatment group than for the control group.
1.70 (c)
Using R below, the proportion of patients who died is 88% in the control group and 65% in the treatment group.
total_control <- nrow(subset(heartTr, transplant=="control"))
total_treatment <- nrow(subset(heartTr, transplant=="treatment"))
### print totals for each
total_control
## [1] 34
total_treatment
## [1] 69
total_control_dead <- nrow(subset(heartTr, transplant=="control" & survived=="dead"))
total_treatment_dead <- nrow(subset(heartTr, transplant=="treatment" & survived=="dead"))
### total_control
total_control_dead
## [1] 30
total_treatment_dead
## [1] 45
# Proportion of the control group who died
prop_dead_control <- total_control_dead / total_control
# Proportion of the treatment group who died
prop_dead_treatment <- total_treatment_dead / total_treatment
# print out prop for control who died
sprintf("%.4f", prop_dead_control)
## [1] "0.8824"
# print out prop for treatment who died
sprintf("%.4f", prop_dead_treatment)
## [1] "0.6522"
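The same proportions can be read off a two-way table in one step. The sketch below is self-contained and rebuilds the table from the counts printed above (30 of 34 dead in control, 45 of 69 dead in treatment) rather than loading the data; the names `counts` and `row_props` are introduced here for illustration. With the data frame loaded, `table(heartTr$transplant, heartTr$survived)` should give the same counts.

```r
# Counts taken from the output above: rows are groups, columns are outcomes
counts <- matrix(c(30, 4,
                   45, 24),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(transplant = c("control", "treatment"),
                                 survived = c("dead", "alive")))

# margin = 1 divides each row by its row total, giving within-group proportions
row_props <- prop.table(counts, margin = 1)
round(row_props, 4)
```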
1.70 (d) (i) What are the claims being tested? The claims are whether survival is independent of treatment, so that the observed difference in death rates is due to the natural randomness of a sample (H0), or whether survival is not independent of treatment, so that the transplant genuinely improves survival (HA).
Blanks:
The simulation results suggest that the difference is statistically significant. The observed difference of about 0.23 lies far in the right tail of the simulated distribution and is very unlikely to be due to chance variation alone.
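A randomization test of this kind can be sketched in base R as follows. This is an illustrative version under stated assumptions, not the textbook's exact procedure: the group and outcome vectors are rebuilt from the counts reported in 1.70 (c), the seed and the number of shuffles (`n_sims`) are arbitrary choices, and `diff_dead()` is a helper name introduced here.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Rebuild the 103 patients from the reported counts:
# 75 dead (30 control, 45 treatment) and 28 alive (4 control, 24 treatment)
outcome <- c(rep("dead", 75), rep("alive", 28))
group <- c(rep("control", 30), rep("treatment", 45),
           rep("control", 4), rep("treatment", 24))

# Difference in death proportions: control minus treatment
diff_dead <- function(grp, out) {
  mean(out[grp == "control"] == "dead") - mean(out[grp == "treatment"] == "dead")
}

obs_diff <- diff_dead(group, outcome)

# Under H0 the group labels are exchangeable, so shuffle them and recompute
n_sims <- 10000
sim_diffs <- replicate(n_sims, diff_dead(sample(group), outcome))

# Share of shuffles at least as extreme as the observed difference
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diffs >= obs_diff - 1e-12)

obs_diff
p_value
```

A small `p_value` means that shuffled labels rarely produce a difference as large as the observed one, which is the sense in which the observed difference is unlikely to be due to chance.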