library(tidyverse)
library(openintro)
library(infer)
library(skimr)
data(yrbss)
?yrbss
## starting httpd help server ... done

Exercise 1

What are the cases in this data set? How many cases are there in our sample?

Ans. There are 13583 cases in our dataset.

glimpse(yrbss)
## Rows: 13,583
## Columns: 13
## $ age                      <int> 14, 14, 15, 15, 15, 15, 15, 14, 15, 15, 15, 1~
## $ gender                   <chr> "female", "female", "female", "female", "fema~
## $ grade                    <chr> "9", "9", "9", "9", "9", "9", "9", "9", "9", ~
## $ hispanic                 <chr> "not", "not", "hispanic", "not", "not", "not"~
## $ race                     <chr> "Black or African American", "Black or Africa~
## $ height                   <dbl> NA, NA, 1.73, 1.60, 1.50, 1.57, 1.65, 1.88, 1~
## $ weight                   <dbl> NA, NA, 84.37, 55.79, 46.72, 67.13, 131.54, 7~
## $ helmet_12m               <chr> "never", "never", "never", "never", "did not ~
## $ text_while_driving_30d   <chr> "0", NA, "30", "0", "did not drive", "did not~
## $ physically_active_7d     <int> 4, 2, 7, 0, 2, 1, 4, 4, 5, 0, 0, 0, 4, 7, 7, ~
## $ hours_tv_per_school_day  <chr> "5+", "5+", "5+", "2", "3", "5+", "5+", "5+",~
## $ strength_training_7d     <int> 0, 0, 0, 0, 1, 0, 2, 0, 3, 0, 3, 0, 0, 7, 7, ~
## $ school_night_hours_sleep <chr> "8", "6", "<5", "6", "9", "8", "9", "6", "<5"~

Ex 2

How many observations are we missing weights from?

Ans. We are missing weight from 1004 onbersavtions.

yrbss%>%
  skim()
Data summary
Name Piped data
Number of rows 13583
Number of columns 13
_______________________
Column type frequency:
character 8
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 12 1.00 4 6 0 2 0
grade 79 0.99 1 5 0 5 0
hispanic 231 0.98 3 8 0 2 0
race 2805 0.79 5 41 0 5 0
helmet_12m 311 0.98 5 12 0 6 0
text_while_driving_30d 918 0.93 1 13 0 8 0
hours_tv_per_school_day 338 0.98 1 12 0 7 0
school_night_hours_sleep 1248 0.91 1 3 0 7 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 77 0.99 16.16 1.26 12.00 15.00 16.00 17.00 18.00 ▁▂▅▅▇
height 1004 0.93 1.69 0.10 1.27 1.60 1.68 1.78 2.11 ▁▅▇▃▁
weight 1004 0.93 67.91 16.90 29.94 56.25 64.41 76.20 180.99 ▆▇▂▁▁
physically_active_7d 273 0.98 3.90 2.56 0.00 2.00 4.00 7.00 7.00 ▆▂▅▃▇
strength_training_7d 1176 0.91 2.95 2.58 0.00 0.00 3.00 5.00 7.00 ▇▂▅▂▅

Ex 3

Make a side-by-side violin plots of physical_3plus and weight. Is there a relationship between these two variables? What did you expect and why?

Ans. We can see a relation between physical activity and weight. There are more outliers for the those who don’t exercise than those who do. This indicates a certain relation despite the clusters being fairly close to each other.

yrbss <- yrbss %>% 
  mutate(physical_3plus = if_else(physically_active_7d > 2, "yes", "no"))
ggplot(yrbss, aes(x=weight, y=physical_3plus)) + geom_boxplot() + theme_bw()
## Warning: Removed 1004 rows containing non-finite values (stat_boxplot).

yrbss %>%
  group_by(physical_3plus) %>%
  summarise(mean_weight = mean(weight, na.rm = TRUE))
## # A tibble: 3 x 2
##   physical_3plus mean_weight
##   <chr>                <dbl>
## 1 no                    66.7
## 2 yes                   68.4
## 3 <NA>                  69.9

Ex 4

Are all conditions necessary for inference satisfied? Comment on each. You can compute the group sizes with the summarize command above by defining a new variable with the definition n().

Ans. The conditions for inference are independence and normality. Since the data is gathered from individuals with no connection between them, the data is independent; and through the violin plots above, we know that the data is normally distributed.

yrbss %>% 
  group_by(physical_3plus)%>%
  summarise(n=sum(table(weight)))
## # A tibble: 3 x 2
##   physical_3plus     n
##   <chr>          <int>
## 1 no              4022
## 2 yes             8342
## 3 <NA>             215

Ex 5

Write the hypotheses for testing if the average weights are different for those who exercise at least times a week and those who don’t.

Ans. H_o = people who exercise three or more times a week have the same average weight as those who do not exercise three or more times a week. H_1 = people who exercise three or more times a week do not have the same average weight as those who do not exercise three or more times a week.

obs_diff <- yrbss %>%
  specify(weight ~ physical_3plus) %>%
  calculate(stat = "diff in means", order = c("yes", "no"))
## Warning: Removed 1219 rows containing missing values.
null_dist <- yrbss %>%
  specify(weight ~ physical_3plus) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("yes", "no"))
## Warning: Removed 1219 rows containing missing values.
            ggplot(data = null_dist, aes(x = stat)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ex 6

Add a vertical red line to the plot above, demonstrating where the observed difference in means (obs_diff) falls on the distribution.

  visualize(null_dist) + 
  shade_p_value(obs_stat = obs_diff, direction = "two_sided")
## Warning: F usually corresponds to right-tailed tests. Proceed with caution.

Ex 7

How many of these null_dist permutations have a difference at least as large (or larger) as obs_diff?

Ans. The red line is too far off from the measurements, therefore the answer is none. It can be seen through the code given below.

null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "two_sided")
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step. See
## `?get_p_value()` for more information.
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1       0

Ex 8

What warning message do you get? Why do you think you get this warning message?

Ans. The warning message highlights that the p-value might not be completely accurate, but it is small enough; however, one must be careful while using and reporting the given p value.

Ex 9

Construct and record a confidence interval for the difference between the weights of those who exercise at least three times a week and those who don’t, and interpret this interval in context of the data.

yrbss %>% 
  group_by(physical_3plus) %>% 
  summarise(sd_weight = sd(weight, na.rm = TRUE))
## # A tibble: 3 x 2
##   physical_3plus sd_weight
##   <chr>              <dbl>
## 1 no                  17.6
## 2 yes                 16.5
## 3 <NA>                17.6
yrbss %>% 
  group_by(physical_3plus) %>% 
  summarise(mean_weight = mean(weight, na.rm = TRUE))
## # A tibble: 3 x 2
##   physical_3plus mean_weight
##   <chr>                <dbl>
## 1 no                    66.7
## 2 yes                   68.4
## 3 <NA>                  69.9
yrbss %>% 
  group_by(physical_3plus) %>%
  summarise(n = sum(table(weight)))
## # A tibble: 3 x 2
##   physical_3plus     n
##   <chr>          <int>
## 1 no              4022
## 2 yes             8342
## 3 <NA>             215
meanNA<-66.67389
meanA<-68.44847
sdNA<-17.63805
sdA<-16.47832
nNA<-4022
nA<-8342
z=1.96

upperCINA<-meanNA+z*(sdNA/sqrt(nNA))
lowerCINA<-meanNA-z*(sdNA/sqrt(nNA))

upperCIA<-meanA+z*(sdA/sqrt(nA))
lowerCIA<-meanA-z*(sdA/sqrt(nA))

Through these calculations, the confidence interval comes to be: For those who are active: (68.0949,68.8021) For those who aren’t active: (66.1288, 67.2190)

LS0tDQp0aXRsZTogIkxhYiA5OiBJbmZlcmVuY2UgZm9yIG51bWVyaWNhbCBkYXRhIg0KYXV0aG9yOiAiU21pdCBNZWh0YSINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCm91dHB1dDogb3BlbmludHJvOjpsYWJfcmVwb3J0DQotLS0NCg0KYGBge3IgbG9hZC1wYWNrYWdlcywgbWVzc2FnZT1GQUxTRX0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KbGlicmFyeShvcGVuaW50cm8pDQpsaWJyYXJ5KGluZmVyKQ0KbGlicmFyeShza2ltcikNCmBgYA0KDQpgYGB7ciBkYXRhfQ0KZGF0YSh5cmJzcykNCj95cmJzcw0KYGBgDQojIyMgRXhlcmNpc2UgMQ0KV2hhdCBhcmUgdGhlIGNhc2VzIGluIHRoaXMgZGF0YSBzZXQ/IEhvdyBtYW55IGNhc2VzIGFyZSB0aGVyZSBpbiBvdXIgc2FtcGxlPw0KDQpBbnMuIFRoZXJlIGFyZSAxMzU4MyBjYXNlcyBpbiBvdXIgZGF0YXNldC4NCg0KYGBge3IgZXggMX0NCmdsaW1wc2UoeXJic3MpDQpgYGANCg0KIyBFeCAyDQpIb3cgbWFueSBvYnNlcnZhdGlvbnMgYXJlIHdlIG1pc3Npbmcgd2VpZ2h0cyBmcm9tPw0KDQpBbnMuIFdlIGFyZSBtaXNzaW5nIHdlaWdodCBmcm9tIDEwMDQgb25iZXJzYXZ0aW9ucy4NCmBgYHtyfQ0KeXJic3MlPiUNCiAgc2tpbSgpDQpgYGANCg0KIyBFeCAzDQpNYWtlIGEgc2lkZS1ieS1zaWRlIHZpb2xpbiBwbG90cyBvZiBwaHlzaWNhbF8zcGx1cyBhbmQgd2VpZ2h0LiBJcyB0aGVyZSBhIHJlbGF0aW9uc2hpcCBiZXR3ZWVuIHRoZXNlIHR3byB2YXJpYWJsZXM/IFdoYXQgZGlkIHlvdSBleHBlY3QgYW5kIHdoeT8NCg0KQW5zLg0KV2UgY2FuIHNlZSBhIHJlbGF0aW9uIGJldHdlZW4gcGh5c2ljYWwgYWN0aXZpdHkgYW5kIHdlaWdodC4gVGhlcmUgYXJlIG1vcmUgb3V0bGllcnMgZm9yIHRoZSB0aG9zZSB3aG8gZG9uJ3QgZXhlcmNpc2UgdGhhbiB0aG9zZSB3aG8gZG8uIFRoaXMgaW5kaWNhdGVzIGEgY2VydGFpbiByZWxhdGlvbiBkZXNwaXRlIHRoZSBjbHVzdGVycyBiZWluZyBmYWlybHkgY2xvc2UgdG8gZWFjaCBvdGhlci4NCg0KYGBge3J9DQp5cmJzcyA8LSB5cmJzcyAlPiUgDQogIG11dGF0ZShwaHlzaWNhbF8zcGx1cyA9IGlmX2Vsc2UocGh5c2ljYWxseV9hY3RpdmVfN2QgPiAyLCAieWVzIiwgIm5vIikpDQpnZ3Bsb3QoeXJic3MsIGFlcyh4PXdlaWdodCwgeT1waHlzaWNhbF8zcGx1cykpICsgZ2VvbV9ib3hwbG90KCkgKyB0aGVtZV9idygpDQpgYGANCg0KYGBge3J9DQp5cmJzcyAlPiUNCiAgZ3JvdXBfYnkocGh5c2ljYWxfM3BsdXMpICU+JQ0KICBzdW1tYXJpc2UobWVhbl93ZWlnaHQgPSBtZWFuKHdlaWdodCwgbmEucm0gPSBUUlVFKSkNCmBgYA0KDQojIEV4IDQNCkFyZSBhbGwgY29uZGl0aW9ucyBuZWNlc3NhcnkgZm9yIGluZmVyZW5jZSBzYXRpc2ZpZWQ/IENvbW1lbnQgb24gZWFjaC4gWW91IGNhbiBjb21wdXRlIHRoZSBncm91cCBzaXplcyB3aXRoIHRoZSBzdW1tYXJpemUgY29tbWFuZCBhYm92ZSBieSBkZWZpbmluZyBhIG5ldyB2YXJpYWJsZSB3aXRoIHRoZSBkZWZpbml0aW9uIG4oKS4NCg0KQW5zLg0KVGhlIGNvbmRpdGlvbnMgZm9yIGluZmVyZW5jZSBhcmUgaW5kZXBlbmRlbmNlIGFuZCBub3JtYWxpdHkuIFNpbmNlIHRoZSBkYXRhIGlzIGdhdGhlcmVkIGZyb20gaW5kaXZpZHVhbHMgd2l0aCBubyBjb25uZWN0aW9uIGJldHdlZW4gdGhlbSwgdGhlIGRhdGEgaXMgaW5kZXBlbmRlbnQ7IGFuZCB0aHJvdWdoIHRoZSB2aW9saW4gcGxvdHMgYWJvdmUsIHdlIGtub3cgdGhhdCB0aGUgZGF0YSBpcyBub3JtYWxseSBkaXN0cmlidXRlZC4NCg0KYGBge3IgZXggNH0NCnlyYnNzICU+JSANCiAgZ3JvdXBfYnkocGh5c2ljYWxfM3BsdXMpJT4lDQogIHN1bW1hcmlzZShuPXN1bSh0YWJsZSh3ZWlnaHQpKSkNCmBgYA0KDQojIEV4IDUNCldyaXRlIHRoZSBoeXBvdGhlc2VzIGZvciB0ZXN0aW5nIGlmIHRoZSBhdmVyYWdlIHdlaWdodHMgYXJlIGRpZmZlcmVudCBmb3IgdGhvc2Ugd2hvIGV4ZXJjaXNlIGF0IGxlYXN0IHRpbWVzIGEgd2VlayBhbmQgdGhvc2Ugd2hvIGRvbuKAmXQuDQoNCkFucy4NCkhfbyA9IHBlb3BsZSB3aG8gZXhlcmNpc2UgdGhyZWUgb3IgbW9yZSB0aW1lcyBhIHdlZWsgaGF2ZSB0aGUgc2FtZSBhdmVyYWdlIHdlaWdodCBhcyB0aG9zZSB3aG8gZG8gbm90IGV4ZXJjaXNlIHRocmVlIG9yIG1vcmUgdGltZXMgYSB3ZWVrLg0KSF8xID0gcGVvcGxlIHdobyBleGVyY2lzZSB0aHJlZSBvciBtb3JlIHRpbWVzIGEgd2VlayBkbyBub3QgaGF2ZSB0aGUgc2FtZSBhdmVyYWdlIHdlaWdodCBhcyB0aG9zZSB3aG8gZG8gbm90IGV4ZXJjaXNlIHRocmVlIG9yIG1vcmUgdGltZXMgYSB3ZWVrLiANCg0KYGBge3J9DQpvYnNfZGlmZiA8LSB5cmJzcyAlPiUNCiAgc3BlY2lmeSh3ZWlnaHQgfiBwaHlzaWNhbF8zcGx1cykgJT4lDQogIGNhbGN1bGF0ZShzdGF0ID0gImRpZmYgaW4gbWVhbnMiLCBvcmRlciA9IGMoInllcyIsICJubyIpKQ0KbnVsbF9kaXN0IDwtIHlyYnNzICU+JQ0KICBzcGVjaWZ5KHdlaWdodCB+IHBoeXNpY2FsXzNwbHVzKSAlPiUNCiAgaHlwb3RoZXNpemUobnVsbCA9ICJpbmRlcGVuZGVuY2UiKSAlPiUNCiAgZ2VuZXJhdGUocmVwcyA9IDEwMDAsIHR5cGUgPSAicGVybXV0ZSIpICU+JQ0KICBjYWxjdWxhdGUoc3RhdCA9ICJkaWZmIGluIG1lYW5zIiwgb3JkZXIgPSBjKCJ5ZXMiLCAibm8iKSkNCiAgICAgICAgICAgIGdncGxvdChkYXRhID0gbnVsbF9kaXN0LCBhZXMoeCA9IHN0YXQpKSArDQogIGdlb21faGlzdG9ncmFtKCkNCmBgYA0KDQojIEV4IDYNCkFkZCBhIHZlcnRpY2FsIHJlZCBsaW5lIHRvIHRoZSBwbG90IGFib3ZlLCBkZW1vbnN0cmF0aW5nIHdoZXJlIHRoZSBvYnNlcnZlZCBkaWZmZXJlbmNlIGluIG1lYW5zIChvYnNfZGlmZikgZmFsbHMgb24gdGhlIGRpc3RyaWJ1dGlvbi4NCg0KYGBge3IgZXggNn0NCiAgdmlzdWFsaXplKG51bGxfZGlzdCkgKyANCiAgc2hhZGVfcF92YWx1ZShvYnNfc3RhdCA9IG9ic19kaWZmLCBkaXJlY3Rpb24gPSAidHdvX3NpZGVkIikNCmBgYA0KDQojIEV4IDcNCkhvdyBtYW55IG9mIHRoZXNlIG51bGxfZGlzdCBwZXJtdXRhdGlvbnMgaGF2ZSBhIGRpZmZlcmVuY2UgYXQgbGVhc3QgYXMgbGFyZ2UgKG9yIGxhcmdlcikgYXMgb2JzX2RpZmY/DQoNCkFucy4gVGhlIHJlZCBsaW5lIGlzIHRvbyBmYXIgb2ZmIGZyb20gdGhlIG1lYXN1cmVtZW50cywgdGhlcmVmb3JlIHRoZSBhbnN3ZXIgaXMgbm9uZS4gSXQgY2FuIGJlIHNlZW4gdGhyb3VnaCB0aGUgY29kZSBnaXZlbiBiZWxvdy4NCmBgYHtyfQ0KbnVsbF9kaXN0ICU+JQ0KICBnZXRfcF92YWx1ZShvYnNfc3RhdCA9IG9ic19kaWZmLCBkaXJlY3Rpb24gPSAidHdvX3NpZGVkIikNCmBgYA0KDQojIEV4IDgNCldoYXQgd2FybmluZyBtZXNzYWdlIGRvIHlvdSBnZXQ/IFdoeSBkbyB5b3UgdGhpbmsgeW91IGdldCB0aGlzIHdhcm5pbmcgbWVzc2FnZT8NCg0KQW5zLg0KVGhlIHdhcm5pbmcgbWVzc2FnZSBoaWdobGlnaHRzIHRoYXQgdGhlIHAtdmFsdWUgbWlnaHQgbm90IGJlIGNvbXBsZXRlbHkgYWNjdXJhdGUsIGJ1dCBpdCBpcyBzbWFsbCBlbm91Z2g7IGhvd2V2ZXIsIG9uZSBtdXN0IGJlIGNhcmVmdWwgd2hpbGUgdXNpbmcgYW5kIHJlcG9ydGluZyB0aGUgZ2l2ZW4gcCB2YWx1ZS4NCg0KIyBFeCA5DQpDb25zdHJ1Y3QgYW5kIHJlY29yZCBhIGNvbmZpZGVuY2UgaW50ZXJ2YWwgZm9yIHRoZSBkaWZmZXJlbmNlIGJldHdlZW4gdGhlIHdlaWdodHMgb2YgdGhvc2Ugd2hvIGV4ZXJjaXNlIGF0IGxlYXN0IHRocmVlIHRpbWVzIGEgd2VlayBhbmQgdGhvc2Ugd2hvIGRvbuKAmXQsIGFuZCBpbnRlcnByZXQgdGhpcyBpbnRlcnZhbCBpbiBjb250ZXh0IG9mIHRoZSBkYXRhLg0KDQpgYGB7ciBleCA5fQ0KeXJic3MgJT4lIA0KICBncm91cF9ieShwaHlzaWNhbF8zcGx1cykgJT4lIA0KICBzdW1tYXJpc2Uoc2Rfd2VpZ2h0ID0gc2Qod2VpZ2h0LCBuYS5ybSA9IFRSVUUpKQ0KeXJic3MgJT4lIA0KICBncm91cF9ieShwaHlzaWNhbF8zcGx1cykgJT4lIA0KICBzdW1tYXJpc2UobWVhbl93ZWlnaHQgPSBtZWFuKHdlaWdodCwgbmEucm0gPSBUUlVFKSkNCnlyYnNzICU+JSANCiAgZ3JvdXBfYnkocGh5c2ljYWxfM3BsdXMpICU+JQ0KICBzdW1tYXJpc2UobiA9IHN1bSh0YWJsZSh3ZWlnaHQpKSkNCg0KYGBgDQpgYGB7cn0NCm1lYW5OQTwtNjYuNjczODkNCm1lYW5BPC02OC40NDg0Nw0Kc2ROQTwtMTcuNjM4MDUNCnNkQTwtMTYuNDc4MzINCm5OQTwtNDAyMg0KbkE8LTgzNDINCno9MS45Ng0KDQp1cHBlckNJTkE8LW1lYW5OQSt6KihzZE5BL3NxcnQobk5BKSkNCmxvd2VyQ0lOQTwtbWVhbk5BLXoqKHNkTkEvc3FydChuTkEpKQ0KDQp1cHBlckNJQTwtbWVhbkEreiooc2RBL3NxcnQobkEpKQ0KbG93ZXJDSUE8LW1lYW5BLXoqKHNkQS9zcXJ0KG5BKSkNCmBgYA0KDQpUaHJvdWdoIHRoZXNlIGNhbGN1bGF0aW9ucywgdGhlIGNvbmZpZGVuY2UgaW50ZXJ2YWwgY29tZXMgdG8gYmU6DQpGb3IgdGhvc2Ugd2hvIGFyZSBhY3RpdmU6ICg2OC4wOTQ5LDY4LjgwMjEpDQpGb3IgdGhvc2Ugd2hvIGFyZW4ndCBhY3RpdmU6ICg2Ni4xMjg4LCA2Ny4yMTkwKQ0K