d1 <- read.csv("http://faraway.neu.edu/biostats/lab4_dataset1.csv")
d1
## treatment age score
## 1 control 15 4.3567227
## 2 control 23 3.8493535
## 3 control 23 6.9803791
## 4 control 19 1.0894147
## 5 control 11 23.9647335
## 6 control 20 1.3979789
## 7 control 21 4.6481514
## 8 control 20 2.4207560
## 9 control 16 3.4874855
## 10 control 24 59.5708610
## 11 control 19 7.5187178
## 12 control 15 2.0415299
## 13 control 16 1.4324617
## 14 control 16 11.5905151
## 15 control 19 7.2531813
## 16 control 14 5.3759302
## 17 control 21 2.9172397
## 18 control 20 1.6695259
## 19 control 21 2.5213834
## 20 control 21 20.0861155
## 21 control 18 3.9698698
## 22 control 9 1.8507170
## 23 control 20 47.9083889
## 24 control 22 11.3033640
## 25 control 15 5.8203064
## 26 control 15 21.2952288
## 27 control 16 17.9290563
## 28 control 15 3.9779116
## 29 control 19 67.0945262
## 30 control 17 5.7257467
## 31 control 12 1.7780288
## 32 control 22 6.3955370
## 33 control 16 9.0933042
## 34 control 23 74.2901520
## 35 control 16 8.2136908
## 36 control 25 11.6697358
## 37 control 20 6.8404058
## 38 control 16 5.2909571
## 39 control 14 7.1368676
## 40 control 15 16.2426355
## 41 control 11 58.8649012
## 42 control 21 20.6433336
## 43 control 17 24.7273124
## 44 control 21 2.1569099
## 45 control 19 19.7646615
## 46 control 15 9.2066385
## 47 control 14 1.7036108
## 48 control 24 12.4413144
## 49 control 26 6.3043848
## 50 control 16 31.9632662
## 51 control 13 3.4346602
## 52 control 12 4.8056305
## 53 control 28 2.9267439
## 54 control 17 6.1897583
## 55 control 20 11.0453749
## 56 control 18 3.5546330
## 57 control 14 16.9517855
## 58 control 11 2.2076249
## 59 control 24 2.5909266
## 60 control 18 31.2230844
## 61 vaccinated 27 0.9842774
## 62 vaccinated 20 4.1040517
## 63 vaccinated 14 1.8569288
## 64 vaccinated 17 4.0935061
## 65 vaccinated 20 14.7150869
## 66 vaccinated 22 13.2843737
## 67 vaccinated 18 1.9524641
## 68 vaccinated 17 0.2765854
## 69 vaccinated 14 33.0381049
## 70 vaccinated 15 5.2966056
## 71 vaccinated 23 4.6707859
## 72 vaccinated 21 2.6821011
## 73 vaccinated 17 4.5272216
## 74 vaccinated 16 2.3062531
## 75 vaccinated 16 4.1399953
## 76 vaccinated 22 1.8216693
## 77 vaccinated 16 0.6905908
## 78 vaccinated 23 7.2997366
## 79 vaccinated 20 12.4254281
## 80 vaccinated 23 1.9962281
## 81 vaccinated 14 0.7762429
## 82 vaccinated 22 5.1667368
## 83 vaccinated 20 2.5994266
## 84 vaccinated 24 0.4803605
## 85 vaccinated 20 2.7240830
## 86 vaccinated 12 1.4472999
## 87 vaccinated 18 1.9329192
## 88 vaccinated 22 0.8550696
## 89 vaccinated 21 16.4963956
## 90 vaccinated 14 1.9520263
## 91 vaccinated 14 0.5457941
## 92 vaccinated 13 3.3108119
## 93 vaccinated 25 3.5366348
## 94 vaccinated 21 1.0142742
## 95 vaccinated 21 0.1512350
## 96 vaccinated 19 1.4326391
## 97 vaccinated 25 4.8090888
## 98 vaccinated 15 2.5606899
## 99 vaccinated 18 2.4640868
## 100 vaccinated 18 4.7627285
## 101 vaccinated 10 0.8298929
## 102 vaccinated 16 8.1398931
## 103 vaccinated 16 2.7037940
## 104 vaccinated 16 5.5141122
## 105 vaccinated 20 7.6454273
## 106 vaccinated 17 3.3989971
## 107 vaccinated 15 1.1289550
## 108 vaccinated 13 8.6968819
## 109 vaccinated 17 0.3678188
## 110 vaccinated 17 1.5765032
## 111 vaccinated 21 2.1050291
## 112 vaccinated 17 2.3022317
## 113 vaccinated 17 7.5418228
## 114 vaccinated 15 3.1149774
## 115 vaccinated 24 4.0843704
## 116 vaccinated 15 2.5353842
## 117 vaccinated 20 2.1219504
## 118 vaccinated 11 5.4496468
## 119 vaccinated 13 8.5525404
## 120 vaccinated 16 0.2458346
HO: There is no difference in the mean age between the control and treatment groups. HA: There is a significant difference in the mean age between the control and treatment groups.
hist(d1$age)
The distribution seems very close to normal, with a few exceptions
par(mfrow=c(1,2))
hist(d1$age)
hist(log10(d1$age))
The data looks worse with the log. Transofrmation is not necessary in this case.
se = function(x) {
sd(x)/sqrt(length(x))
}
mean.age = aggregate(age~treatment, data=d1, FUN=mean)
mean.age
## treatment age
## 1 control 17.96667
## 2 vaccinated 18.05000
mean(mean.age$age)
## [1] 18.00833
stderr.age = aggregate(age~treatment, data=d1, FUN=se)
ci.upper = mean.age$age + 1.96 * stderr.age$age
ci.lower = mean.age$age - 1.96 * stderr.age$age
H0: The vaccine has no effect on HPV rates. HA: The vaccine has a significant effect on HPV rates.
hist(d1$score)
This is very clearly not normal.
qqnorm(d1$score)
qqline(d1$score, col = "red")
This is clearly not following the qq plot. The distribution is not normal.
qqnorm(log10(d1$score))
qqline(log10(d1$score), col = "red")
While it still isn’t completely normal, it is significantly better.
control = log10(subset(d1, treatment == "control", select = "score"))
vaccinated = log10(subset(d1, treatment == "vaccinated", select = "score"))
t.test(control, vaccinated)
##
## Welch Two Sample t-test
##
## data: control and vaccinated
## t = 5.1839, df = 118, p-value = 9.099e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2725156 0.6094143
## sample estimates:
## mean of x mean of y
## 0.8543715 0.4134065
There is a significant difference in mean HPV scores. The p value of the t test is 9.099e-07, which means at a n alpha of 0.5, the null hypothesis has been rejected.
mean.score = aggregate(score~treatment, data=d1, FUN=mean)
mean.score
## treatment score
## 1 control 13.01176
## 2 vaccinated 4.32061
mean(mean.score$score)
## [1] 8.666184
stderr.score = aggregate(score~treatment, data=d1, FUN=se)
ci.upper = mean.score$score + 1.96 * stderr.score$score
ci.lower = mean.score$score - 1.96 * stderr.score$score
bp = barplot(mean.score$score, names=mean.score$treatment, ylim=c(0, max(ci.upper + 1.9)), ylab="HPV Score")
arrows(y0=ci.lower, y1=ci.upper, x0=bp, x1=bp, angle=90, code=3, legnth=0.1)
## Warning in arrows(y0 = ci.lower, y1 = ci.upper, x0 = bp, x1 = bp, angle = 90, :
## "legnth" is not a graphical parameter
text(x=bp, y=ci.upper, c("a", "b"), pos=3)
#### 1.12
I would appove this vaccine for HPV
avg = mean.score$score[mean.score$treatment == "control"] / mean.score$score[mean.score$treatment == "vaccinated"]
avg
## [1] 3.011556
The mean HPV score in the control group is over 3 times higher than the mean HPV score in the vaccinated group.
# Download the data
d3 <- read.csv("http://faraway.neu.edu/biostats/lab4_dataset3.csv")
head(d3)
## year autism vaccine
## 1 1950 0.8747092 5.079621
## 2 1951 1.0775450 4.979636
## 3 1952 0.9145069 5.272306
## 4 1953 1.4415051 5.080250
## 5 1954 1.2291669 5.694768
## 6 1955 1.0399880 5.906284
par(mfrow=c(1,2))
hist(d3$autism)
hist(d3$vaccine)
People might be convinced because they follow similar distributions.
This doesn’t mean necessarily however, that with a higher vaccination
rate, there is a higher autism rate.
There would be a treatment and placebo control group. Both will be brought into an exam room and injected, and told they are getting the vaccine. I will pick random participants with replacement. Both will have to be injected with some substance, because it could be the injection itself which causes autism. Both groups will come back in after a few months or years in order to check for autism, where the exact same room, examiner, and procedure will be used.
# Download the data
d4 <- read.csv("http://faraway.neu.edu/biostats/lab4_dataset4.csv")
d4
## treatment autism
## 1 control 3.5735024
## 2 control 8.0336850
## 3 control 2.8990161
## 4 control 32.9595418
## 5 control 9.2952895
## 6 control 2.9433006
## 7 control 10.8854720
## 8 control 13.9897470
## 9 control 11.8909945
## 10 control 4.9264153
## 11 control 30.3191998
## 12 control 9.8733898
## 13 control 3.5921806
## 14 control 0.7300079
## 15 control 20.5925821
## 16 control 6.3921226
## 17 control 6.5785196
## 18 control 17.1815513
## 19 control 15.1988717
## 20 control 12.1084229
## 21 control 16.7597030
## 22 control 14.6162847
## 23 control 7.2034853
## 24 control 0.9145239
## 25 control 12.4264311
## 26 control 6.3209610
## 27 control 5.7213483
## 28 control 1.5361014
## 29 control 4.1447810
## 30 control 10.1547498
## 31 control 26.0151628
## 32 control 6.0328062
## 33 control 9.8519717
## 34 control 6.3356661
## 35 control 1.6869808
## 36 control 4.4149894
## 37 control 4.5073529
## 38 control 6.3008630
## 39 control 20.0860465
## 40 control 14.3417627
## 41 control 5.6716291
## 42 control 5.1895050
## 43 control 13.4229157
## 44 control 11.6658200
## 45 control 3.3576600
## 46 control 3.2953252
## 47 control 9.6270993
## 48 control 14.4188002
## 49 control 5.9754164
## 50 control 16.1368863
## 51 control 9.9553080
## 52 control 3.6254326
## 53 control 9.4038548
## 54 control 2.1611423
## 55 control 28.0229468
## 56 control 48.4435837
## 57 control 4.6310264
## 58 control 2.3534101
## 59 control 11.8191326
## 60 control 5.8412534
## 61 vaccine 90.1628753
## 62 vaccine 7.8519350
## 63 vaccine 16.2767769
## 64 vaccine 8.3980720
## 65 vaccine 3.8834611
## 66 vaccine 9.8630189
## 67 vaccine 1.3431819
## 68 vaccine 35.3590673
## 69 vaccine 9.5186529
## 70 vaccine 71.7086707
## 71 vaccine 13.1380096
## 72 vaccine 4.0150651
## 73 vaccine 15.0401960
## 74 vaccine 3.2088171
## 75 vaccine 2.3311614
## 76 vaccine 10.9292888
## 77 vaccine 5.2420263
## 78 vaccine 8.1752014
## 79 vaccine 8.7963892
## 80 vaccine 4.5288999
## 81 vaccine 4.6243289
## 82 vaccine 7.1336383
## 83 vaccine 26.5249818
## 84 vaccine 1.7796793
## 85 vaccine 14.7899247
## 86 vaccine 11.3924445
## 87 vaccine 23.6437743
## 88 vaccine 6.0243891
## 89 vaccine 11.8226692
## 90 vaccine 10.6664019
## 91 vaccine 4.7468440
## 92 vaccine 27.3267973
## 93 vaccine 26.0600272
## 94 vaccine 16.4481605
## 95 vaccine 39.9182437
## 96 vaccine 14.2746670
## 97 vaccine 2.2782504
## 98 vaccine 4.6031212
## 99 vaccine 2.3998048
## 100 vaccine 5.0865478
## 101 vaccine 4.3913352
## 102 vaccine 8.5174404
## 103 vaccine 3.2840531
## 104 vaccine 9.5642173
## 105 vaccine 4.2436144
## 106 vaccine 47.8125075
## 107 vaccine 16.7217033
## 108 vaccine 20.2909349
## 109 vaccine 11.9913476
## 110 vaccine 43.9114928
## 111 vaccine 4.3243574
## 112 vaccine 5.1466976
## 113 vaccine 34.2019356
## 114 vaccine 4.2601469
## 115 vaccine 6.6367292
## 116 vaccine 5.5134583
## 117 vaccine 5.9298987
## 118 vaccine 6.1773334
## 119 vaccine 13.3857182
## 120 vaccine 6.8391915
hist(log10(d4$autism))
d4control = log10(subset(d4, treatment == "control", select = "autism"))
d4vaccinated = log10(subset(d4, treatment == "vaccine", select = "autism"))
t.test(d4control, d4vaccinated)
##
## Welch Two Sample t-test
##
## data: d4control and d4vaccinated
## t = -1.2634, df = 117.48, p-value = 0.209
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.22768101 0.05032468
## sample estimates:
## mean of x mean of y
## 0.8718967 0.9605749
There is no significant difference in autism scores between treatment groups. The p value of the t test is 0.209, which means at a n alpha of 0.5, the null hypothesis has failed to have been rejected.
mean.autism = aggregate(autism~treatment, data=d4, FUN=mean)
mean.autism
## treatment autism
## 1 control 10.30590
## 2 vaccine 14.24099
mean(mean.autism$autism)
## [1] 12.27345
stderr.autism = aggregate(autism~treatment, data=d4, FUN=se)
ci.upper = mean.autism$autism + 1.96 * stderr.autism$autism
ci.lower = mean.autism$autism - 1.96 * stderr.autism$autism
bp = barplot(mean.autism$autism, names=mean.autism$treatment, ylim=c(0, max(ci.upper + 1.9)), ylab="Autism Score")
arrows(y0=ci.lower, y1=ci.upper, x0=bp, x1=bp, angle=90, code=3, legnth=0.1)
## Warning in arrows(y0 = ci.lower, y1 = ci.upper, x0 = bp, x1 = bp, angle = 90, :
## "legnth" is not a graphical parameter
text(x=bp, y=ci.upper, c("a", "b"), pos=3)
I would say that there is no significant relationship between the incidence of autism and vaccination in the study. The numbers are close enough together between the groups that the vaccine has no significant effect on causeing autism on people.