library(tidyverse)
library(openintro)
data('arbuthnot', package='openintro')
head(arbuthnot, n = 3)
## # A tibble: 3 × 3
##    year  boys girls
##   <int> <int> <int>
## 1  1629  5218  4683
## 2  1630  4858  4457
## 3  1631  4422  4102

Exercise 1

We can use the data.frame$variable notation to extract the values of a variable (column). See below.

arbuthnot$girls
##  [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617
## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013
## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822
## [46] 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127 7246 7119 7214 7101
## [61] 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626 7452 7061 7514 7656 7683
## [76] 5738 7779 7417 7687 7623 7380 7288
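
As a minimal sketch of the $ notation, the snippet below rebuilds the first three rows from the preview as a small data frame and extracts the girls column two equivalent ways. The name arbuthnot_head is hypothetical and used only for this illustration; the values are copied from the head() output.

```r
# First three rows of arbuthnot, copied from the head() output above.
# `arbuthnot_head` is a hypothetical name used only for this illustration.
arbuthnot_head <- data.frame(
  year  = c(1629L, 1630L, 1631L),
  boys  = c(5218L, 4858L, 4422L),
  girls = c(4683L, 4457L, 4102L)
)

# `$` returns the column as a plain vector.
girls_dollar <- arbuthnot_head$girls

# `[[` returns the same vector and accepts the column name as a string.
girls_bracket <- arbuthnot_head[["girls"]]

print(girls_dollar)
print(identical(girls_dollar, girls_bracket))
```

The `[[` form may be handy when the column name is stored in a variable, while `$` is shorter for interactive use.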

Exercise 2

The line of best fit shows that, overall, the number of girls baptized trended upward from 1629 to 1710. A closer look shows that from 1640 to about 1655 the number of girls baptized fell. It then rose sharply until about 1690, when it began to plateau.

# Plot girls baptized per year with a linear trend line
ggplot(data = arbuthnot, mapping = aes(x = year, y = girls)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "lm") +
  labs(title = "Trend in girls baptized", x = "Year", y = "No. of girls baptized")
## `geom_smooth()` using formula = 'y ~ x'

Exercise 3

The graph below shows that, in general, the proportion of boys born/baptized trended downward from 1629 to 1710. That is, boys made up a gradually smaller share of all children born/baptized. However, as shown by the red dashed line, the proportion of boys stayed above 50% (0.5) of the total in every year. That is, more boys than girls were born/baptized each year.

# Insert code for Exercise 3 here
arbuthnot <- arbuthnot %>% mutate(total = boys + girls)
arbuthnot <- arbuthnot %>% mutate(boy_ratio = boys/total)

ggplot(data = arbuthnot, mapping = aes(x = year, y = boy_ratio)) +
  geom_point() +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red", linewidth = 1) +
  geom_smooth(method = "lm") +
  labs(title = "Proportion of boys born over time", x = "Year of observation", y = "Proportion of boys") +
  theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using formula = 'y ~ x'
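
The proportion plotted in Exercise 3 can be checked by hand for a single year. The sketch below repeats the boys / (boys + girls) calculation for 1629 using the counts from the head() output; the variable names are hypothetical and exist only for this illustration.

```r
# Counts for 1629, copied from the head() output earlier in this report.
# All variable names here are hypothetical, for illustration only.
boys_1629  <- 5218
girls_1629 <- 4683

# Same arithmetic as the two mutate() calls, applied to one row.
total_1629     <- boys_1629 + girls_1629
boy_ratio_1629 <- boys_1629 / total_1629

print(total_1629)
print(round(boy_ratio_1629, 4))

# TRUE means boys were more than half of all baptisms that year.
print(boy_ratio_1629 > 0.5)
```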

Exercise 4

The observations cover the years 1940 to 2002. The data frame has three columns (variables) and 63 rows (observations). The variables (column names) are year, boys, and girls.

# Insert code for Exercise 4 here
data('present', package='openintro')
present %>% summarize(Start_year = min(year), end_year = max(year), columns = ncol(present), rows = nrow(present))
## # A tibble: 1 × 4
##   Start_year end_year columns  rows
##        <dbl>    <dbl>   <int> <int>
## 1       1940     2002       3    63

Below are the variable (column) names in the present data set.

names(present)
## [1] "year"  "boys"  "girls"

Exercise 5

The summaries of the present and arbuthnot data sets show that the two are similar in how their variables compare: in both data sets, the proportion of boys is higher than 0.5 (50%).

The two data sets differ in the magnitude of the variables. The counts of boys and girls are much higher in the present data set than in the arbuthnot data set. However, this difference in magnitude cannot support any inference, because the two data sets were collected from different places at vastly different times.

present <- present %>% mutate(total = boys + girls)
present <- present %>% mutate(boy_ratio = boys/total)
summary(present)
##       year           boys             girls             total        
##  Min.   :1940   Min.   :1211684   Min.   :1148715   Min.   :2360399  
##  1st Qu.:1956   1st Qu.:1799857   1st Qu.:1711405   1st Qu.:3511262  
##  Median :1971   Median :1924868   Median :1831679   Median :3756547  
##  Mean   :1971   Mean   :1885600   Mean   :1793915   Mean   :3679515  
##  3rd Qu.:1986   3rd Qu.:2058524   3rd Qu.:1965538   3rd Qu.:4023830  
##  Max.   :2002   Max.   :2186274   Max.   :2082052   Max.   :4268326  
##    boy_ratio     
##  Min.   :0.5112  
##  1st Qu.:0.5121  
##  Median :0.5125  
##  Mean   :0.5125  
##  3rd Qu.:0.5130  
##  Max.   :0.5143
summary(arbuthnot)
##       year           boys          girls          total         boy_ratio     
##  Min.   :1629   Min.   :2890   Min.   :2722   Min.   : 5612   Min.   :0.5027  
##  1st Qu.:1649   1st Qu.:4759   1st Qu.:4457   1st Qu.: 9199   1st Qu.:0.5118  
##  Median :1670   Median :6073   Median :5718   Median :11813   Median :0.5157  
##  Mean   :1670   Mean   :5907   Mean   :5535   Mean   :11442   Mean   :0.5170  
##  3rd Qu.:1690   3rd Qu.:7576   3rd Qu.:7150   3rd Qu.:14723   3rd Qu.:0.5210  
##  Max.   :1710   Max.   :8426   Max.   :7779   Max.   :16145   Max.   :0.5362

Exercise 6

Below is a plot of the proportion of boys in the present data set over time. The plot shows that the proportion of boys remains greater than 0.5 (more than 50% of the total) throughout. However, the trend line shows that this proportion decreases over time. This is similar to the arbuthnot data set (its boy_ratio plot is also shown below). In both data sets, the proportion of boys is greater than 50% of the total, but it trends downward over the observation period.

ggplot(data = present, mapping = aes(x = year, y = boy_ratio)) +
  geom_point() +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red", linewidth = 1) +
  geom_smooth(method = "lm") +
  labs(title = "Proportion of boys born over time - present", x = "Year of observation", y = "Proportion of boys") +
  theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using formula = 'y ~ x'

ggplot(data = arbuthnot, mapping = aes(x = year, y = boy_ratio)) +
  geom_point() +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red", linewidth = 1) +
  geom_smooth(method = "lm") +
  labs(title = "Proportion of boys born over time - arbuthnot", x = "Year of observation", y = "Proportion of boys") +
  theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using formula = 'y ~ x'

Exercise 7

Between 1940 and 2002, 1961 was the year with the most total births in the United States.

# Insert code for Exercise 7 here

max_total_year <- present %>%
  arrange(desc(total))

head(max_total_year, n = 5)
## # A tibble: 5 × 5
##    year    boys   girls   total boy_ratio
##   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>
## 1  1961 2186274 2082052 4268326     0.512
## 2  1960 2179708 2078142 4257850     0.512
## 3  1957 2179960 2074824 4254784     0.512
## 4  1959 2173638 2071158 4244796     0.512
## 5  1958 2152546 2051266 4203812     0.512
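
Sorting and reading off the first row works; a more direct route to the peak year is to ask for the position of the maximum. The sketch below shows that idea with base R's which.max() on the five rows printed above. The data frame top_five is a hypothetical stand-in built from that output, not the full 63-row data set.

```r
# The five rows printed above (the largest totals in `present`).
# `top_five` is a hypothetical name; the full data set has 63 rows.
top_five <- data.frame(
  year  = c(1961, 1960, 1957, 1959, 1958),
  total = c(4268326, 4257850, 4254784, 4244796, 4203812)
)

# which.max() gives the row index of the largest total.
peak_row  <- which.max(top_five$total)
peak_year <- top_five$year[peak_row]

print(peak_year)
```

On the full data frame, the same pattern would be present$year[which.max(present$total)], assuming the total column has already been added as in Exercise 5.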