Graded: 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, 1.70
Problem 1.8
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data(package = 'openintro')
data("heartTr")
data("smoking")
str(smoking)
## 'data.frame': 1691 obs. of 12 variables:
## $ gender : Factor w/ 2 levels "Female","Male": 2 1 2 1 1 1 2 2 2 1 ...
## $ age : int 38 42 40 40 39 37 53 44 40 41 ...
## $ maritalStatus : Factor w/ 5 levels "Divorced","Married",..: 1 4 2 2 2 2 2 4 4 2 ...
## $ highestQualification: Factor w/ 8 levels "A Levels","Degree",..: 6 6 2 2 4 4 2 2 3 6 ...
## $ nationality : Factor w/ 8 levels "British","English",..: 1 1 2 2 1 1 1 2 2 2 ...
## $ ethnicity : Factor w/ 7 levels "Asian","Black",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ grossIncome : Factor w/ 10 levels "10,400 to 15,600",..: 3 9 5 1 3 2 7 1 3 6 ...
## $ region : Factor w/ 7 levels "London","Midlands & East Anglia",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ smoke : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 2 1 2 2 ...
## $ amtWeekends : int NA 12 NA NA NA NA 6 NA 8 15 ...
## $ amtWeekdays : int NA 12 NA NA NA NA 6 NA 8 12 ...
## $ type : Factor w/ 5 levels "","Both/Mainly Hand-Rolled",..: 1 5 1 1 1 1 5 1 4 5 ...
Each row represents a UK resident who was interviewed in the survey.
A total of 1691 UK residents took part in this survey.
Categorical variables: gender, marital status, highest qualification (ordinal), nationality, ethnicity, gross income (ordinal), region, smoke
Numerical variables: age (continuous), amount on weekends (discrete), amount on weekdays (discrete)
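The split between categorical and numerical variables can also be checked in code. The sketch below is only an illustration: `survey` is a hypothetical three-row data frame that copies a few columns and values from the `str(smoking)` output above, so that the example runs without the openintro package.

```r
# Hypothetical mini data frame mirroring a few columns of `smoking`
# (values copied from the first rows of the str() output above).
survey <- data.frame(
  gender      = factor(c("Male", "Female", "Male")),
  age         = c(38L, 42L, 40L),
  smoke       = factor(c("No", "Yes", "No")),
  amtWeekends = c(NA, 12L, NA)
)

# TRUE for numeric (integer or double) columns, FALSE for factors
is_num <- sapply(survey, is.numeric)

numerical_vars   <- names(survey)[is_num]
categorical_vars <- names(survey)[!is_num]

numerical_vars
categorical_vars
```

Note that R cannot tell whether a factor is ordinal or whether a number is continuous or discrete; that part of the classification still has to come from what the variable means.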
Problem 1.10
The population of interest in this study is kids aged between 5 and 15.
The population, kids aged 5 to 15, is very large, and the sample of 160 kids is small by comparison. Whether it is appropriate to generalize depends on whether the results are meant to describe all kids aged 5 to 15 in the US, in the state, or in a single school. Also, 10 years is a wide age range with large differences in maturity: the study combines first graders with high school freshmen and sophomores, so I would not expect the results to generalize accurately to the whole population.
Problem 1.28
Based on the study and its results, I conclude that smoking and dementia are associated. I would not conclude that smoking causes dementia: the study is observational, not experimental, so it cannot establish causation.
The statement is not justified. According to the researchers' conclusion, children who had behavioral issues and those who were identified as bullies were twice as likely to have shown symptoms of sleep disorders. This means that children who have behavioral issues or who are bullies are more likely to have sleep disorder symptoms, which is an association and not evidence that one causes the other.
Problem 1.36
Randomized experiment
Treatment: exercise group. Control: non-exercise group.
This experiment makes use of blocking by the age variable
No, because participants know who is receiving the treatment (exercise) and who is not.
There is a possibility of a causal relationship due to the nature of the experiment. Depending on the size of the sample, one would be able to generalize the result to the population at large.
I would suggest coming up with a reasonable sample size, increasing exercise to 3-5 days per week, categorizing the types of workouts (weight lifting, cardio, yoga, etc.), and tracking average heart rate throughout the workout.
Problem 1.48
data <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)
dataframe <- as.data.frame(data)
boxplot(dataframe, ylab = "Final Exam Scores for Intro Statistics")
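As a check on what the boxplot draws, the five-number summary and any points flagged as outliers can be computed directly. This is a minimal sketch using base R only; the vector is the same set of scores as above, renamed `scores` here for clarity.

```r
# Same 20 final exam scores as in the boxplot above
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(scores)
five_num

# boxplot.stats() applies the 1.5 * IQR rule that boxplot() uses for whiskers;
# $out holds the observations drawn as individual points.
outliers <- boxplot.stats(scores)$out
outliers
```

The hinges are 72.5 and 82.5, so the lower fence sits at 72.5 - 1.5 * 10 = 57.5 and the lowest score, 57, falls just below it.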
Problem 1.50
Symmetric and unimodal; matches boxplot (2)
Approximately uniform; matches boxplot (3)
Right skewed and unimodal; matches boxplot (1)
Problem 1.56
Right skewed, since there is a meaningful number of houses over $6,000,000. Based on the percentages provided, the middle 50% of home prices lie between $350k and $1 million. The mean might not be a good representation of a typical observation because of the houses over $6 million, so I would suggest using the median. Likewise, the standard deviation would not be appropriate for describing the variability because of the skew; the IQR would be a better choice.
The distribution seems fairly symmetric, with the majority of house prices around $600k. The few houses over $1.2 million might skew the distribution slightly to the right, however. Because the distribution is roughly symmetric, the mean and the standard deviation are appropriate measures of the typical price and the variability.
The distribution would be heavily skewed right. Students are not allowed to drink until their junior year, and in some cases their senior year, which suggests that about 75% have 0 drinks per week. The upper 25% drink a decent amount, say 4 drinks per week, with a few excessive drinkers at, say, 8 or more. The median seems to be the best representation of a typical student since most students are underage and unable to drink, and the IQR is the best measure of variability since the few 21-year-olds would not inflate it.
Right-skewed distribution: most employees earn lower salaries, and a few executives form a long right tail with much higher salaries. Because of the skew, the median best represents the typical salary, and the IQR is the better measure of variability since it is not influenced by the high executive salaries.
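The reasoning above, that the median and IQR are less affected by a long right tail than the mean and standard deviation, can be illustrated with a small example. The prices below are hypothetical values made up for this sketch (they are not data from the problem), chosen so that one house costs more than $6 million.

```r
# Hypothetical house prices in dollars (made up for illustration only)
prices <- c(300e3, 350e3, 400e3, 450e3, 500e3, 700e3, 1000e3, 6500e3)

# Same prices with the one house above $6,000,000 removed
trimmed <- prices[prices < 6e6]

# How far does each summary move when the expensive house is dropped?
mean_shift   <- mean(prices) - mean(trimmed)
median_shift <- median(prices) - median(trimmed)
sd_shift     <- sd(prices) - sd(trimmed)
iqr_shift    <- IQR(prices) - IQR(trimmed)

c(mean = mean_shift, median = median_shift, sd = sd_shift, iqr = iqr_shift)
```

In this made-up sample the mean moves by roughly $750k while the median moves by only $25k, which is why the median and IQR are the safer summaries for skewed data.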
Problem 1.70
It looks like more people received the treatment than did not. The mosaic plot suggests that the survival rate improves with treatment, but we cannot judge whether survival is independent of treatment solely from the mosaic plot.
The boxplot suggests that the treatment is not highly effective, since the median survival time is only around 250 days, but that is still longer than the survival time without treatment. The treatment is effective in extending survival for its patients compared to the control group, where almost all patients die within 250 days.
heartTr
## id acceptyear age survived survtime prior transplant wait
## 1 15 68 53 dead 1 no control NA
## 2 43 70 43 dead 2 no control NA
## 3 61 71 52 dead 2 no control NA
## 4 75 72 52 dead 2 no control NA
## 5 6 68 54 dead 3 no control NA
## 6 42 70 36 dead 3 no control NA
## 7 54 71 47 dead 3 no control NA
## 8 38 70 41 dead 5 no treatment 5
## 9 85 73 47 dead 5 no control NA
## 10 2 68 51 dead 6 no control NA
## 11 103 67 39 dead 6 no control NA
## 12 12 68 53 dead 8 no control NA
## 13 48 71 56 dead 9 no control NA
## 14 102 74 40 alive 11 no control NA
## 15 35 70 43 dead 12 no control NA
## 16 95 73 40 dead 16 no treatment 2
## 17 31 69 54 dead 16 no control NA
## 18 3 68 54 dead 16 no treatment 1
## 19 74 72 29 dead 17 no treatment 5
## 20 5 68 20 dead 18 no control NA
## 21 77 72 41 dead 21 no control NA
## 22 99 73 49 dead 21 no control NA
## 23 20 69 55 dead 28 no treatment 1
## 24 70 72 52 dead 30 no treatment 5
## 25 101 74 49 alive 31 no control NA
## 26 66 72 53 dead 32 no control NA
## 27 29 69 50 dead 35 no control NA
## 28 17 68 20 dead 36 no control NA
## 29 19 68 59 dead 37 no control NA
## 30 4 68 40 dead 39 no treatment 36
## 31 100 74 35 alive 39 yes treatment 38
## 32 8 68 45 dead 40 no control NA
## 33 44 70 42 dead 40 no control NA
## 34 16 68 56 dead 43 no treatment 20
## 35 45 71 36 dead 45 no treatment 1
## 36 1 67 30 dead 50 no control NA
## 37 22 69 42 dead 51 no treatment 12
## 38 39 70 50 dead 53 no treatment 2
## 39 10 68 42 dead 58 no treatment 12
## 40 35 71 52 dead 61 no treatment 10
## 41 37 70 61 dead 66 no treatment 19
## 42 68 72 45 dead 68 no treatment 3
## 43 60 71 49 dead 68 no treatment 3
## 44 62 71 39 dead 69 no control NA
## 45 28 69 53 dead 72 no treatment 71
## 46 47 71 47 dead 72 no treatment 21
## 47 32 69 64 dead 77 no treatment 17
## 48 65 72 51 dead 78 no treatment 12
## 49 83 73 53 dead 80 no treatment 32
## 50 13 68 54 dead 81 no treatment 17
## 51 9 68 47 dead 85 no control NA
## 52 73 72 56 dead 90 no treatment 27
## 53 79 72 53 dead 96 no treatment 67
## 54 36 70 48 dead 100 no treatment 46
## 55 32 71 41 dead 102 no control NA
## 56 98 73 28 alive 109 no treatment 96
## 57 87 73 46 dead 110 no treatment 60
## 58 97 73 23 alive 131 no treatment 21
## 59 37 71 41 dead 149 no control NA
## 60 11 68 47 dead 153 no treatment 26
## 61 94 73 43 dead 165 yes treatment 4
## 62 96 73 26 alive 180 no treatment 13
## 63 90 73 52 dead 186 yes treatment 160
## 64 53 71 47 dead 188 no treatment 41
## 65 89 73 51 dead 207 no treatment 139
## 66 24 69 51 dead 219 no treatment 83
## 67 27 69 8 dead 263 no control NA
## 68 93 73 47 alive 265 no treatment 28
## 69 51 71 48 dead 285 no treatment 32
## 70 67 73 19 dead 285 no treatment 57
## 71 16 68 49 dead 308 no treatment 28
## 72 84 73 42 dead 334 no treatment 37
## 73 91 73 47 dead 340 no control NA
## 74 92 73 44 alive 340 no treatment 310
## 75 58 71 47 dead 342 yes treatment 21
## 76 88 73 54 alive 370 no treatment 31
## 77 86 73 48 alive 397 no treatment 8
## 78 82 71 29 alive 427 no control NA
## 79 81 73 52 alive 445 no treatment 6
## 80 80 72 46 alive 482 yes treatment 26
## 81 78 72 48 alive 515 no treatment 210
## 82 76 72 52 alive 545 yes treatment 46
## 83 64 72 48 dead 583 yes treatment 32
## 84 72 72 26 alive 596 no treatment 4
## 85 71 72 47 alive 630 no treatment 31
## 86 69 72 47 alive 670 no treatment 10
## 87 7 68 50 dead 675 no treatment 51
## 88 23 69 58 dead 733 no treatment 3
## 89 63 71 32 alive 841 no treatment 27
## 90 30 69 44 dead 852 no treatment 16
## 91 59 71 41 alive 915 no treatment 78
## 92 56 71 38 alive 941 no treatment 67
## 93 50 71 45 dead 979 yes treatment 83
## 94 46 71 48 dead 995 yes treatment 2
## 95 21 69 43 dead 1032 no treatment 8
## 96 49 71 36 alive 1141 yes treatment 36
## 97 41 70 45 alive 1321 yes treatment 58
## 98 14 68 53 dead 1386 no treatment 37
## 99 26 69 30 alive 1400 no control NA
## 100 40 70 48 alive 1407 yes treatment 41
## 101 34 69 40 alive 1571 no treatment 23
## 102 33 69 48 alive 1586 no treatment 51
## 103 25 69 33 alive 1799 no treatment 25
treatmentgroup <- filter(heartTr,transplant == "treatment")
treatmentgroup
## id acceptyear age survived survtime prior transplant wait
## 1 38 70 41 dead 5 no treatment 5
## 2 95 73 40 dead 16 no treatment 2
## 3 3 68 54 dead 16 no treatment 1
## 4 74 72 29 dead 17 no treatment 5
## 5 20 69 55 dead 28 no treatment 1
## 6 70 72 52 dead 30 no treatment 5
## 7 4 68 40 dead 39 no treatment 36
## 8 100 74 35 alive 39 yes treatment 38
## 9 16 68 56 dead 43 no treatment 20
## 10 45 71 36 dead 45 no treatment 1
## 11 22 69 42 dead 51 no treatment 12
## 12 39 70 50 dead 53 no treatment 2
## 13 10 68 42 dead 58 no treatment 12
## 14 35 71 52 dead 61 no treatment 10
## 15 37 70 61 dead 66 no treatment 19
## 16 68 72 45 dead 68 no treatment 3
## 17 60 71 49 dead 68 no treatment 3
## 18 28 69 53 dead 72 no treatment 71
## 19 47 71 47 dead 72 no treatment 21
## 20 32 69 64 dead 77 no treatment 17
## 21 65 72 51 dead 78 no treatment 12
## 22 83 73 53 dead 80 no treatment 32
## 23 13 68 54 dead 81 no treatment 17
## 24 73 72 56 dead 90 no treatment 27
## 25 79 72 53 dead 96 no treatment 67
## 26 36 70 48 dead 100 no treatment 46
## 27 98 73 28 alive 109 no treatment 96
## 28 87 73 46 dead 110 no treatment 60
## 29 97 73 23 alive 131 no treatment 21
## 30 11 68 47 dead 153 no treatment 26
## 31 94 73 43 dead 165 yes treatment 4
## 32 96 73 26 alive 180 no treatment 13
## 33 90 73 52 dead 186 yes treatment 160
## 34 53 71 47 dead 188 no treatment 41
## 35 89 73 51 dead 207 no treatment 139
## 36 24 69 51 dead 219 no treatment 83
## 37 93 73 47 alive 265 no treatment 28
## 38 51 71 48 dead 285 no treatment 32
## 39 67 73 19 dead 285 no treatment 57
## 40 16 68 49 dead 308 no treatment 28
## 41 84 73 42 dead 334 no treatment 37
## 42 92 73 44 alive 340 no treatment 310
## 43 58 71 47 dead 342 yes treatment 21
## 44 88 73 54 alive 370 no treatment 31
## 45 86 73 48 alive 397 no treatment 8
## 46 81 73 52 alive 445 no treatment 6
## 47 80 72 46 alive 482 yes treatment 26
## 48 78 72 48 alive 515 no treatment 210
## 49 76 72 52 alive 545 yes treatment 46
## 50 64 72 48 dead 583 yes treatment 32
## 51 72 72 26 alive 596 no treatment 4
## 52 71 72 47 alive 630 no treatment 31
## 53 69 72 47 alive 670 no treatment 10
## 54 7 68 50 dead 675 no treatment 51
## 55 23 69 58 dead 733 no treatment 3
## 56 63 71 32 alive 841 no treatment 27
## 57 30 69 44 dead 852 no treatment 16
## 58 59 71 41 alive 915 no treatment 78
## 59 56 71 38 alive 941 no treatment 67
## 60 50 71 45 dead 979 yes treatment 83
## 61 46 71 48 dead 995 yes treatment 2
## 62 21 69 43 dead 1032 no treatment 8
## 63 49 71 36 alive 1141 yes treatment 36
## 64 41 70 45 alive 1321 yes treatment 58
## 65 14 68 53 dead 1386 no treatment 37
## 66 40 70 48 alive 1407 yes treatment 41
## 67 34 69 40 alive 1571 no treatment 23
## 68 33 69 48 alive 1586 no treatment 51
## 69 25 69 33 alive 1799 no treatment 25
filter(treatmentgroup, survived == 'dead')
## id acceptyear age survived survtime prior transplant wait
## 1 38 70 41 dead 5 no treatment 5
## 2 95 73 40 dead 16 no treatment 2
## 3 3 68 54 dead 16 no treatment 1
## 4 74 72 29 dead 17 no treatment 5
## 5 20 69 55 dead 28 no treatment 1
## 6 70 72 52 dead 30 no treatment 5
## 7 4 68 40 dead 39 no treatment 36
## 8 16 68 56 dead 43 no treatment 20
## 9 45 71 36 dead 45 no treatment 1
## 10 22 69 42 dead 51 no treatment 12
## 11 39 70 50 dead 53 no treatment 2
## 12 10 68 42 dead 58 no treatment 12
## 13 35 71 52 dead 61 no treatment 10
## 14 37 70 61 dead 66 no treatment 19
## 15 68 72 45 dead 68 no treatment 3
## 16 60 71 49 dead 68 no treatment 3
## 17 28 69 53 dead 72 no treatment 71
## 18 47 71 47 dead 72 no treatment 21
## 19 32 69 64 dead 77 no treatment 17
## 20 65 72 51 dead 78 no treatment 12
## 21 83 73 53 dead 80 no treatment 32
## 22 13 68 54 dead 81 no treatment 17
## 23 73 72 56 dead 90 no treatment 27
## 24 79 72 53 dead 96 no treatment 67
## 25 36 70 48 dead 100 no treatment 46
## 26 87 73 46 dead 110 no treatment 60
## 27 11 68 47 dead 153 no treatment 26
## 28 94 73 43 dead 165 yes treatment 4
## 29 90 73 52 dead 186 yes treatment 160
## 30 53 71 47 dead 188 no treatment 41
## 31 89 73 51 dead 207 no treatment 139
## 32 24 69 51 dead 219 no treatment 83
## 33 51 71 48 dead 285 no treatment 32
## 34 67 73 19 dead 285 no treatment 57
## 35 16 68 49 dead 308 no treatment 28
## 36 84 73 42 dead 334 no treatment 37
## 37 58 71 47 dead 342 yes treatment 21
## 38 64 72 48 dead 583 yes treatment 32
## 39 7 68 50 dead 675 no treatment 51
## 40 23 69 58 dead 733 no treatment 3
## 41 30 69 44 dead 852 no treatment 16
## 42 50 71 45 dead 979 yes treatment 83
## 43 46 71 48 dead 995 yes treatment 2
## 44 21 69 43 dead 1032 no treatment 8
## 45 14 68 53 dead 1386 no treatment 37
controlgroup <- filter(heartTr,transplant == "control")
controlgroup
## id acceptyear age survived survtime prior transplant wait
## 1 15 68 53 dead 1 no control NA
## 2 43 70 43 dead 2 no control NA
## 3 61 71 52 dead 2 no control NA
## 4 75 72 52 dead 2 no control NA
## 5 6 68 54 dead 3 no control NA
## 6 42 70 36 dead 3 no control NA
## 7 54 71 47 dead 3 no control NA
## 8 85 73 47 dead 5 no control NA
## 9 2 68 51 dead 6 no control NA
## 10 103 67 39 dead 6 no control NA
## 11 12 68 53 dead 8 no control NA
## 12 48 71 56 dead 9 no control NA
## 13 102 74 40 alive 11 no control NA
## 14 35 70 43 dead 12 no control NA
## 15 31 69 54 dead 16 no control NA
## 16 5 68 20 dead 18 no control NA
## 17 77 72 41 dead 21 no control NA
## 18 99 73 49 dead 21 no control NA
## 19 101 74 49 alive 31 no control NA
## 20 66 72 53 dead 32 no control NA
## 21 29 69 50 dead 35 no control NA
## 22 17 68 20 dead 36 no control NA
## 23 19 68 59 dead 37 no control NA
## 24 8 68 45 dead 40 no control NA
## 25 44 70 42 dead 40 no control NA
## 26 1 67 30 dead 50 no control NA
## 27 62 71 39 dead 69 no control NA
## 28 9 68 47 dead 85 no control NA
## 29 32 71 41 dead 102 no control NA
## 30 37 71 41 dead 149 no control NA
## 31 27 69 8 dead 263 no control NA
## 32 91 73 47 dead 340 no control NA
## 33 82 71 29 alive 427 no control NA
## 34 26 69 30 alive 1400 no control NA
filter(controlgroup, survived == 'dead')
## id acceptyear age survived survtime prior transplant wait
## 1 15 68 53 dead 1 no control NA
## 2 43 70 43 dead 2 no control NA
## 3 61 71 52 dead 2 no control NA
## 4 75 72 52 dead 2 no control NA
## 5 6 68 54 dead 3 no control NA
## 6 42 70 36 dead 3 no control NA
## 7 54 71 47 dead 3 no control NA
## 8 85 73 47 dead 5 no control NA
## 9 2 68 51 dead 6 no control NA
## 10 103 67 39 dead 6 no control NA
## 11 12 68 53 dead 8 no control NA
## 12 48 71 56 dead 9 no control NA
## 13 35 70 43 dead 12 no control NA
## 14 31 69 54 dead 16 no control NA
## 15 5 68 20 dead 18 no control NA
## 16 77 72 41 dead 21 no control NA
## 17 99 73 49 dead 21 no control NA
## 18 66 72 53 dead 32 no control NA
## 19 29 69 50 dead 35 no control NA
## 20 17 68 20 dead 36 no control NA
## 21 19 68 59 dead 37 no control NA
## 22 8 68 45 dead 40 no control NA
## 23 44 70 42 dead 40 no control NA
## 24 1 67 30 dead 50 no control NA
## 25 62 71 39 dead 69 no control NA
## 26 9 68 47 dead 85 no control NA
## 27 32 71 41 dead 102 no control NA
## 28 37 71 41 dead 149 no control NA
## 29 27 69 8 dead 263 no control NA
## 30 91 73 47 dead 340 no control NA
69 patients received the treatment and 45 of them died. 34 did not receive the treatment and 30 of them died.
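The counts above translate into proportions as follows. This is a small sketch that hard-codes the four counts from the filtered tables so it runs without the openintro package; the variable names are my own.

```r
# Counts taken from the filtered tables above
dead  <- c(treatment = 45, control = 30)
total <- c(treatment = 69, control = 34)

# Proportion of patients who died in each group
prop_dead <- dead / total
round(prop_dead, 3)

# Observed difference in proportions (treatment - control)
observed_diff <- unname(prop_dead["treatment"] - prop_dead["control"])
round(observed_diff, 3)
```

About 65% of the treatment group died compared with about 88% of the control group, a difference of roughly -0.23.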
The claim being tested is that a heart transplant can increase lifespan.
We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are at least as extreme as the observed difference, 45/69 - 30/34 ≈ -0.23. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
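The card-shuffling procedure can be sketched in R as follows. This is an illustrative version, not the simulation used to produce the textbook's plot: the seed is arbitrary, and the helper `simulate_diff` and the other variable names are my own.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 28 "alive" cards and 75 "dead" cards, one per patient
cards <- c(rep("alive", 28), rep("dead", 75))
n_treatment <- 69
observed_diff <- 45/69 - 30/34  # treatment - control, proportion dead

# One shuffle: deal 69 cards to treatment, the remaining 34 to control,
# and return the difference in the proportion of "dead" cards.
simulate_diff <- function() {
  shuffled  <- sample(cards)
  treatment <- shuffled[1:n_treatment]
  control   <- shuffled[(n_treatment + 1):length(shuffled)]
  mean(treatment == "dead") - mean(control == "dead")
}

# Repeat 100 times, as described above
sim_diffs <- replicate(100, simulate_diff())

# Fraction of simulated differences at or below the observed difference
p_value <- mean(sim_diffs <= observed_diff)
p_value
```

With only 100 repetitions the estimated fraction is coarse (steps of 0.01); increasing the number of repetitions gives a smoother null distribution.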
Based on the results, it seems that the transplant is effective. The more negative the simulated difference in proportions, the smaller the proportion of deaths in the treatment group relative to the control group. The distribution is almost symmetric and unimodal, with its mode at a negative difference in proportions.