## Warning: package 'tidyverse' was built under R version 4.0.2
## Warning: package 'tibble' was built under R version 4.0.2
## Warning: package 'tidyr' was built under R version 4.0.2
## Warning: package 'dplyr' was built under R version 4.0.2
## Warning: package 'openintro' was built under R version 4.0.2
## Warning: package 'airports' was built under R version 4.0.2
## Warning: package 'cherryblossom' was built under R version 4.0.2
## Warning: package 'usdata' was built under R version 4.0.2
## Warning: package 'infer' was built under R version 4.0.2
## [1] "C:/Users/Jerome/Documents/From_Toshiba_HD_Work_Files/0000_Montgomery_College/Math_217/Week_8/201023_Math217_Lab7_rja"
Get Data
## starting httpd help server ... done
Exercise 1
13,583 observations; 13 variables
Exercise 2
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 29.94 56.25 64.41 67.91 76.20 180.99 1004
yrbss <- yrbss %>%
mutate(physical_3plus = ifelse(yrbss$physically_active_7d > 2, "yes", "no"))
glimpse(yrbss$physical_3plus)
## chr [1:13583] "yes" "no" "yes" "no" "no" "no" "yes" "yes" "yes" "no" "no" ...
table(yrbss$physical_3plus)
##
## no yes
## 4404 8906
Exercise 3
Part of the reason I had trouble w/ this was I misread the question. I thought we were to do 2 boxplots: One of the weights of the persons in physical_3plus and another boxplot of all the weights. Had the question been written “Construct boxplots comparing the weights of the youth who exercised > 2 times/week with those who didn’t” it would have been easier to understand. I’m not sure i could have figured out how to do that boxplot on my own, but I would have understood the question.
Thanks much for the code.
plot1 <- yrbss %>%
na.omit() %>%
ggplot(aes(physical_3plus, weight)) +
ggtitle("Comparison of Weights of Youth Who Do and Do Not Exercise > 2 Times/Week") +
geom_boxplot()
plot1

# This is the mean weight of everyone who has a weight in the file.
# And the mean age.
mean(yrbss$weight, na.rm = TRUE)
## [1] 67.9065
mean(yrbss$age, na.rm = TRUE)
## [1] 16.15704
The boxplot shows the median weight of the “no” group is lower than that of the “yes” group. The mean is also lower.This seems counter-intuitive, unless those who exercise > 2 times/week are real jocks w/ muscle mass.
yrbss %>%
group_by(physical_3plus) %>%
summarise(mean_weight = mean(weight, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
## physical_3plus mean_weight
## <chr> <dbl>
## 1 no 66.7
## 2 yes 68.4
## 3 <NA> 69.9
I calculated the mean age of each of the two groups, thinking perhaps the “no” group was younger, but both groups are all the same age. Since the mean age of both groups is 16, I wanted to see the distribution of the ages in each group. Because the ages were so close, I thought maybe we only had 16 year-olds in the sample. But we don’t.
yrbss %>%
group_by(physical_3plus) %>%
summarise(mean_age = mean(age, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
## physical_3plus mean_age
## <chr> <dbl>
## 1 no 16.3
## 2 yes 16.1
## 3 <NA> 16.1
#What is the distribution of ages in the dataset?
table(yrbss$age)
##
## 12 13 14 15 16 17 18
## 26 18 1368 3098 3203 3473 2320
#what is the distribution of the ages in physical_3plus?
table (yrbss$physical_3plus, yrbss$age)
##
## 12 13 14 15 16 17 18
## no 10 6 383 897 1000 1245 839
## yes 14 12 960 2143 2120 2165 1445
Exercise 4
I assume CDC did proper random sampling (it was probably very complicated random sampling). The observations should be independent of each other. The sample size we are given is far too large to do meaningful inferential statistics. I have no way of knowing whether the sample we have is < 10% of the entire dataset, or if the “sample” we are given is the entire dataset.
Exercise 5
This exercise made no sense at all to me. I have no idea what I did or why. But the code worked, so I’m happy.
obs_diff <- yrbss %>%
specify(weight ~ physical_3plus) %>%
calculate (stat = "diff in means", order = c("yes" , "no"))
## Warning: Removed 1219 rows containing missing values.
null_dist <- yrbss %>%
specify(weight ~ physical_3plus) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in means", order = c("yes", "no"))
## Warning: Removed 1219 rows containing missing values.
ggplot(data=null_dist, aes(x=stat)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Exercise 6
null_dist %>%
get_p_value(obs_stat = obs_diff, direction = "two-sided")
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step. See
## `?get_p_value()` for more information.
## # A tibble: 1 x 1
## p_value
## <dbl>
## 1 0
Exercise 7
Exercise 7 wants us to construct a confidence interval and interpret it. The easiest way to do that is via a t-test. I found the way to do a t-test w/ categorical data in R and ran it. The confidence interval does not cross zero, therefore the difference is statistically significant. The results of the t-test confirm the significant difference. But is this difference “significant,” or is it “significant” because of the large sample size? A 1.77 pound difference could be significant, but maybe it isn’t.
t.test(yrbss$weight ~ yrbss$physical_3plus)
##
## Welch Two Sample t-test
##
## data: yrbss$weight by yrbss$physical_3plus
## t = -5.353, df = 7478.8, p-value = 8.908e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.424441 -1.124728
## sample estimates:
## mean in group no mean in group yes
## 66.67389 68.44847
LS0tDQp0aXRsZTogIk1hdGgyMTdfTGFiNyINCmF1dGhvcjogInJqYSINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCm91dHB1dDogb3BlbmludHJvOjpsYWJfcmVwb3J0DQotLS0NCg0KYGBge3IgbG9hZC1wYWNrYWdlcywgbWVzc2FnZT1GQUxTRX0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KbGlicmFyeShvcGVuaW50cm8pDQpsaWJyYXJ5KGluZmVyKQ0KZ2V0d2QoKQ0KYGBgDQoNCiMjIyBHZXQgRGF0YQ0KDQoNCg0KYGBge3IgY29kZS1jaHVuay1sYWJlbH0NCmRhdGEoeXJic3MpDQo/eXJic3MNCmBgYA0KDQojIyMgRXhlcmNpc2UgMQ0KDQojIyMjIDEzLDU4MyBvYnNlcnZhdGlvbnM7IDEzIHZhcmlhYmxlcw0KDQojIyMgRXhlcmNpc2UgMg0KDQpgYGB7cn0NCnN1bW1hcnkoeXJic3Mkd2VpZ2h0KQ0KeXJic3MgPC0geXJic3MgJT4lDQogIG11dGF0ZShwaHlzaWNhbF8zcGx1cyA9IGlmZWxzZSh5cmJzcyRwaHlzaWNhbGx5X2FjdGl2ZV83ZCA+IDIsICJ5ZXMiLCAibm8iKSkNCg0KDQpgYGANCg0KYGBge3J9DQpnbGltcHNlKHlyYnNzJHBoeXNpY2FsXzNwbHVzKQ0KdGFibGUoeXJic3MkcGh5c2ljYWxfM3BsdXMpDQpgYGANCg0KIyMjIEV4ZXJjaXNlIDMNCiBQYXJ0IG9mIHRoZSByZWFzb24gSSBoYWQgdHJvdWJsZSB3LyB0aGlzIHdhcyBJIG1pc3JlYWQgdGhlIHF1ZXN0aW9uLiBJIHRob3VnaHQgd2Ugd2VyZSB0byBkbyAyIGJveHBsb3RzOiBPbmUgb2YgdGhlIHdlaWdodHMgb2YgdGhlIHBlcnNvbnMgaW4gcGh5c2ljYWxfM3BsdXMgYW5kIGFub3RoZXIgYm94cGxvdCBvZiBhbGwgdGhlIHdlaWdodHMuIEhhZCB0aGUgcXVlc3Rpb24gYmVlbiB3cml0dGVuICJDb25zdHJ1Y3QgYm94cGxvdHMgY29tcGFyaW5nIHRoZSB3ZWlnaHRzIG9mIHRoZSB5b3V0aCB3aG8gZXhlcmNpc2VkID4gMiB0aW1lcy93ZWVrIHdpdGggdGhvc2Ugd2hvIGRpZG4ndCIgaXQgd291bGQgaGF2ZSBiZWVuIGVhc2llciB0byB1bmRlcnN0YW5kLiBJJ20gbm90IHN1cmUgaSBjb3VsZCBoYXZlIGZpZ3VyZWQgb3V0IGhvdyB0byBkbyB0aGF0IGJveHBsb3Qgb24gbXkgb3duLCBidXQgSSB3b3VsZCBoYXZlIHVuZGVyc3Rvb2QgdGhlIHF1ZXN0aW9uLg0KIA0KVGhhbmtzIG11Y2ggZm9yIHRoZSBjb2RlLiANCg0KDQpgYGB7cn0NCnBsb3QxIDwtIHlyYnNzICU+JQ0KICBuYS5vbWl0KCkgJT4lDQogIGdncGxvdChhZXMocGh5c2ljYWxfM3BsdXMsIHdlaWdodCkpICsNCiAgZ2d0aXRsZSgiQ29tcGFyaXNvbiBvZiBXZWlnaHRzIG9mIFlvdXRoIFdobyBEbyBhbmQgRG8gTm90IEV4ZXJjaXNlID4gMiBUaW1lcy9XZWVrIikgKw0KICBnZW9tX2JveHBsb3QoKQ0KcGxvdDENCg0KDQpgYGANCg0KDQoNCg0KYGBge3J9DQojIFRoaXMgaXMgdGhlIG1lYW4gd2VpZ2h0IG9mIGV2ZXJ5b25lIHdobyBoYXMgYSB3ZWlnaHQgaW4gdGhlIGZpbGUuIA0KIyBBbmQgdGhlIG1lYW4gYWdlLiANCm1lYW4oeXJic3Mkd2VpZ2h0LCBuYS5ybSA9IFRSVUUpDQptZWFuKHlyYnNzJGFnZSwgbmEucm0gPSBUUlVFKQ0KDQpgYGANCg0KVGhlIGJveHBsb3Qgc2hvd3MgdGhlIG1lZGlhbiB3ZWlnaHQgb2YgdGhlICJubyIgZ3JvdXAgaXMgbG93ZXIgdGhhbiB0aGF0IG9mIHRoZSAieWVzIiBncm91cC4gVGhlIG1lYW4gaXMgYWxzbyBsb3dlci5UaGlzIHNlZW1zIGNvdW50ZXItaW50dWl0aXZlLCB1bmxlc3MgdGhvc2Ugd2hvIGV4ZXJjaXNlID4gMiB0aW1lcy93ZWVrIGFyZSByZWFsIGpvY2tzIHcvICBtdXNjbGUgbWFzcy4NCg0KYGBge3J9DQogDQoNCnlyYnNzICU+JQ0KICBncm91cF9ieShwaHlzaWNhbF8zcGx1cykgJT4lDQogIHN1bW1hcmlzZShtZWFuX3dlaWdodCA9IG1lYW4od2VpZ2h0LCBuYS5ybSA9IFRSVUUpKQ0KYGBgDQoNCkkgY2FsY3VsYXRlZCB0aGUgbWVhbiBhZ2Ugb2YgZWFjaCBvZiB0aGUgdHdvIGdyb3VwcywgdGhpbmtpbmcgcGVyaGFwcyB0aGUgIm5vIiBncm91cCB3YXMgeW91bmdlciwgYnV0IGJvdGggZ3JvdXBzIGFyZSBhbGwgdGhlIHNhbWUgYWdlLg0KU2luY2UgdGhlIG1lYW4gYWdlIG9mIGJvdGggZ3JvdXBzIGlzIDE2LCBJIHdhbnRlZCB0byBzZWUgdGhlIGRpc3RyaWJ1dGlvbiBvZiB0aGUgYWdlcyBpbiBlYWNoIGdyb3VwLiBCZWNhdXNlIHRoZSBhZ2VzIHdlcmUgc28gY2xvc2UsIEkgdGhvdWdodCBtYXliZSB3ZSBvbmx5IGhhZCAxNiB5ZWFyLW9sZHMgaW4gdGhlIHNhbXBsZS4gQnV0IHdlIGRvbid0Lg0KDQoNCmBgYHtyfQ0KIA0KDQp5cmJzcyAlPiUNCiAgZ3JvdXBfYnkocGh5c2ljYWxfM3BsdXMpICU+JQ0KICBzdW1tYXJpc2UobWVhbl9hZ2UgPSBtZWFuKGFnZSwgbmEucm0gPSBUUlVFKSkNCmBgYA0KDQpgYGB7cn0NCiNXaGF0IGlzIHRoZSBkaXN0cmlidXRpb24gb2YgYWdlcyBpbiB0aGUgZGF0YXNldD8gDQp0YWJsZSh5cmJzcyRhZ2UpDQpgYGANCmBgYHtyfQ0KI3doYXQgaXMgdGhlIGRpc3RyaWJ1dGlvbiBvZiB0aGUgYWdlcyBpbiBwaHlzaWNhbF8zcGx1cz8NCnRhYmxlICh5cmJzcyRwaHlzaWNhbF8zcGx1cywgeXJic3MkYWdlKQ0KYGBgDQoNCg0KIyMjIEV4ZXJjaXNlIDQNCg0KSSBhc3N1bWUgQ0RDIGRpZCBwcm9wZXIgcmFuZG9tIHNhbXBsaW5nIChpdCB3YXMgcHJvYmFibHkgdmVyeSBjb21wbGljYXRlZCByYW5kb20gc2FtcGxpbmcpLg0KVGhlIG9ic2VydmF0aW9ucyBzaG91bGQgYmUgaW5kZXBlbmRlbnQgb2YgZWFjaCBvdGhlci4gDQpUaGUgc2FtcGxlIHNpemUgd2UgYXJlIGdpdmVuIGlzIGZhciB0b28gbGFyZ2UgdG8gZG8gbWVhbmluZ2Z1bCBpbmZlcmVudGlhbCBzdGF0aXN0aWNzLiANCkkgaGF2ZSBubyB3YXkgb2Yga25vd2luZyB3aGV0aGVyIHRoZSBzYW1wbGUgd2UgaGF2ZSBpcyA8IDEwJSBvZiB0aGUgZW50aXJlIGRhdGFzZXQsIG9yIGlmIHRoZSAic2FtcGxlIiB3ZSBhcmUgZ2l2ZW4gaXMgdGhlIGVudGlyZSBkYXRhc2V0LiANCg0KIyMjIEV4ZXJjaXNlIDUNClRoaXMgZXhlcmNpc2UgbWFkZSBubyBzZW5zZSBhdCBhbGwgIHRvIG1lLiBJIGhhdmUgbm8gaWRlYSB3aGF0IEkgZGlkIG9yIHdoeS4gQnV0IHRoZSBjb2RlIHdvcmtlZCwgc28gSSdtIGhhcHB5LiANCg0KDQpgYGB7cn0NCm9ic19kaWZmIDwtIHlyYnNzICU+JQ0KICBzcGVjaWZ5KHdlaWdodCB+IHBoeXNpY2FsXzNwbHVzKSAlPiUNCiAgDQogIGNhbGN1bGF0ZSAoc3RhdCA9ICJkaWZmIGluIG1lYW5zIiwgb3JkZXIgPSBjKCJ5ZXMiICwgIm5vIikpDQogIA0KYGBgIA0KDQoNCmBgYHtyfQ0KbnVsbF9kaXN0IDwtIHlyYnNzICU+JQ0Kc3BlY2lmeSh3ZWlnaHQgfiBwaHlzaWNhbF8zcGx1cykgJT4lDQpoeXBvdGhlc2l6ZShudWxsID0gImluZGVwZW5kZW5jZSIpICU+JQ0KZ2VuZXJhdGUocmVwcyA9IDEwMDAsIHR5cGUgPSAicGVybXV0ZSIpICU+JQ0KY2FsY3VsYXRlKHN0YXQgPSAiZGlmZiBpbiBtZWFucyIsIG9yZGVyID0gYygieWVzIiwgIm5vIikpDQoNCg0KDQpgYGANCg0KYGBge3J9DQpnZ3Bsb3QoZGF0YT1udWxsX2Rpc3QsIGFlcyh4PXN0YXQpKSArDQogIGdlb21faGlzdG9ncmFtKCkNCg0KDQoNCmBgYA0KDQojIyMgRXhlcmNpc2UgNg0KDQpgYGB7cn0NCm51bGxfZGlzdCAlPiUNCiAgZ2V0X3BfdmFsdWUob2JzX3N0YXQgPSBvYnNfZGlmZiwgZGlyZWN0aW9uID0gInR3by1zaWRlZCIpDQpgYGANCg0KIyMjIEV4ZXJjaXNlIDcNCkV4ZXJjaXNlIDcgd2FudHMgdXMgdG8gY29uc3RydWN0IGEgY29uZmlkZW5jZSBpbnRlcnZhbCBhbmQgaW50ZXJwcmV0IGl0LiBUaGUgZWFzaWVzdCB3YXkgdG8gZG8gdGhhdCBpcyB2aWEgYSB0LXRlc3QuIEkgZm91bmQgdGhlIHdheSB0byBkbyBhIHQtdGVzdCAgdy8gY2F0ZWdvcmljYWwgZGF0YSBpbiBSIGFuZCByYW4gaXQuIFRoZSBjb25maWRlbmNlIGludGVydmFsIGRvZXMgbm90IGNyb3NzIHplcm8sIHRoZXJlZm9yZSB0aGUgZGlmZmVyZW5jZSBpcyBzdGF0aXN0aWNhbGx5IHNpZ25pZmljYW50LiBUaGUgcmVzdWx0cyBvZiB0aGUgdC10ZXN0IGNvbmZpcm0gdGhlIHNpZ25pZmljYW50IGRpZmZlcmVuY2UuIEJ1dCBpcyB0aGlzIGRpZmZlcmVuY2UgInNpZ25pZmljYW50LCIgb3IgaXMgaXQgInNpZ25pZmljYW50IiBiZWNhdXNlIG9mIHRoZSBsYXJnZSBzYW1wbGUgc2l6ZT8gQSAxLjc3IHBvdW5kIGRpZmZlcmVuY2UgY291bGQgYmUgc2lnbmlmaWNhbnQsIGJ1dCBtYXliZSBpdCBpc24ndC4gIA0KDQoNCmBgYHtyfQ0KdC50ZXN0KHlyYnNzJHdlaWdodCB+IHlyYnNzJHBoeXNpY2FsXzNwbHVzKQ0KYGBgDQoNCg0KDQoNCg0KDQoNCg0KDQo=