library(tidyverse)
library(openintro)

Exercise 1

arbuthnot$girls
##  [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617
## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013
## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822
## [46] 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127 7246 7119 7214 7101
## [61] 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626 7452 7061 7514 7656 7683
## [76] 5738 7779 7417 7687 7623 7380 7288

Exercise 2

There is a dramatic downward trend in the number of girls baptized from 1640 to 1660. The smallest count on the graph and in the data set is 2,722, in 1650.

From 1660 to 1710 the graph trends upward. The largest number of girl births is 7,779, in 1705.

# Insert code for Exercise 2 here
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_line()
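
The low and high points read off the graph can also be pulled straight from the data. This is a minimal sketch, assuming dplyr 1.0.0 or later (for slice_min() and slice_max()); the names lowest and highest are only illustrative.

```r
library(dplyr)
library(openintro)

# Row with the fewest girls baptized (the low point of the line graph)
lowest <- arbuthnot %>%
  slice_min(girls, n = 1) %>%
  select(year, girls)

# Row with the most girls baptized (the high point of the line graph)
highest <- arbuthnot %>%
  slice_max(girls, n = 1) %>%
  select(year, girls)

lowest
highest
```

Each result should be a one-row tibble giving the year next to the count, which avoids scanning the printed vector by eye.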

Exercise 3

From 1629 to 1710 there are more boy births than girl births in this data. That was difficult to see from the line graph alone, but using R to compare the two columns and create ‘logical’ data shows that boys outnumbered girls in every single year from 1629 to 1710, because the output says ‘TRUE’ for every year. The result surprised me!

# Insert code for Exercise 3 here
arbuthnot <- arbuthnot %>%
  mutate(boy_to_girl_ratio = boys / girls)

arbuthnot$boys + arbuthnot$girls
##  [1]  9901  9315  8524  9584  9997  9855 10034  9522  9160 10311 10150 10850
## [13] 10670 10370  9410  8104  7966  7163  7332  6544  5825  5612  6071  6128
## [25]  6155  6620  7004  7050  6685  6170  5990  6971  8855 10019 10292 11722
## [37]  9972  8997 10938 11633 12335 11997 12510 12563 11895 11851 11775 12399
## [49] 12626 12601 12288 12847 13355 13653 14735 14702 14730 14694 14951 14588
## [61] 14771 15211 15054 14918 15159 13632 13976 14861 15829 16052 15363 14639
## [73] 15616 15687 15448 11851 16145 15369 16066 15862 15220 14928
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls)

arbuthnot <- arbuthnot %>%
  mutate(boy_ratio = boys / total)

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line()

arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)
arbuthnot$more_boys
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
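
Instead of reading 82 printed TRUE values, the logical comparison can be collapsed into a single summary row. This is a sketch only; boys_check and its column names are made up for illustration, and it uses the package copy of the data (openintro::arbuthnot) so it does not depend on the columns added above.

```r
library(dplyr)

boys_check <- openintro::arbuthnot %>%
  mutate(more_boys = boys > girls) %>%
  summarize(
    n_years         = n(),             # number of years in the data
    years_more_boys = sum(more_boys),  # TRUE counts as 1, FALSE as 0
    every_year      = all(more_boys)   # TRUE only if boys > girls in every row
  )

boys_check
```

If years_more_boys equals n_years, boys outnumbered girls in every recorded year.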

Exercise 4

The data set of present-day birth records in the United States covers the years 1940 to 2002. The dimensions of the data frame are 63 by 3. The three variables are year, boys, and girls.

# Insert code for Exercise 4 here
data('present', package = 'openintro')

arbuthnot %>% 
  summarize(min = min(boys), max = max(boys))
## # A tibble: 1 × 2
##     min   max
##   <int> <int>
## 1  2890  8426
View(present)
dim(present)
## [1] 63  3
str(present)
## tibble [63 × 3] (S3: tbl_df/tbl/data.frame)
##  $ year : num [1:63] 1940 1941 1942 1943 1944 ...
##  $ boys : num [1:63] 1211684 1289734 1444365 1508959 1435301 ...
##  $ girls: num [1:63] 1148715 1223693 1364631 1427901 1359499 ...

Exercise 5

Arbuthnot’s data frame is larger: it has 82 rows (1629 to 1710), compared with 63 rows (1940 to 2002) in the present-day U.S. data frame, so Arbuthnot’s data covers a longer period of time.

However, the present-day U.S. counts are much larger in magnitude. Yearly births in the U.S. data are in the millions, while Arbuthnot’s London data records only thousands per year.

# Insert code for Exercise 5 here
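
One way to back up this comparison with numbers is sketched below. It assumes the untouched package copies of both data sets (accessed with openintro::) so that columns added by earlier mutate() calls are not counted; london and us are illustrative names.

```r
# Untouched copies of both data sets, straight from the package
london <- openintro::arbuthnot
us     <- openintro::present

# Rows and columns of each data frame
dim(london)
dim(us)

# First and last year covered by each data set
range(london$year)
range(us$year)

# Smallest and largest yearly totals (boys + girls) in each data set
range(london$boys + london$girls)
range(us$boys + us$girls)
```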

Exercise 6

Yes, Arbuthnot’s observation that boys are born in greater proportion than girls holds up in the United States!

# Insert code for Exercise 6 here
present <- present %>%
  mutate(boy_to_girl_ratio = boys / girls)
 
present$boys + present$girls
##  [1] 2360399 2513427 2808996 2936860 2794800 2735456 3288672 3699940 3535068
## [10] 3559529 3554149 3750850 3846986 3902120 4017362 4047295 4163090 4254784
## [19] 4203812 4244796 4257850 4268326 4167362 4098020 4027490 3760358 3606274
## [28] 3520959 3501564 3600206 3731386 3555970 3258411 3136965 3159958 3144198
## [37] 3167788 3326632 3333279 3494398 3612258 3629238 3680537 3638933 3669141
## [46] 3760561 3756547 3809394 3909510 4040958 4158212 4110907 4065014 4000240
## [55] 3952767 3899589 3891494 3880894 3941553 3959417 4058814 4025933 4021726
present <- present %>%
  mutate(total = boys + girls)

present <- present %>%
  mutate(boy_ratio = boys / total)

ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line()

present <- present %>%
  mutate(more_boys = boys > girls)
present$more_boys
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE
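
The same conclusion can be summarized in one row rather than read off the plot and the printed vector. This is a sketch; us_summary and its column names are hypothetical, and it starts from the package copy of the data.

```r
library(dplyr)

us_summary <- openintro::present %>%
  mutate(boy_ratio = boys / (boys + girls)) %>%
  summarize(
    min_ratio            = min(boy_ratio),   # lowest share of boys in any year
    max_ratio            = max(boy_ratio),   # highest share of boys in any year
    every_year_more_boys = all(boys > girls)
  )

us_summary
```

A min_ratio above 0.5 means boys made up more than half of births even in the year with the lowest proportion of boys.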

Exercise 7

In 1961 we observed the maximum total number of births, 4,268,326. This included 2,186,274 boy births and 2,082,052 girl births.

# Insert code for Exercise 7 here
present %>% 
  arrange(desc(total))
## # A tibble: 63 × 7
##     year    boys   girls boy_to_girl_ratio   total boy_ratio more_boys
##    <dbl>   <dbl>   <dbl>             <dbl>   <dbl>     <dbl> <lgl>    
##  1  1961 2186274 2082052              1.05 4268326     0.512 TRUE     
##  2  1960 2179708 2078142              1.05 4257850     0.512 TRUE     
##  3  1957 2179960 2074824              1.05 4254784     0.512 TRUE     
##  4  1959 2173638 2071158              1.05 4244796     0.512 TRUE     
##  5  1958 2152546 2051266              1.05 4203812     0.512 TRUE     
##  6  1962 2132466 2034896              1.05 4167362     0.512 TRUE     
##  7  1956 2133588 2029502              1.05 4163090     0.513 TRUE     
##  8  1990 2129495 2028717              1.05 4158212     0.512 TRUE     
##  9  1991 2101518 2009389              1.05 4110907     0.511 TRUE     
## 10  1963 2101632 1996388              1.05 4098020     0.513 TRUE     
## # ℹ 53 more rows
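
To return only the peak year instead of the whole sorted table, slice_max() is one alternative to arrange(desc(total)). This is a sketch assuming dplyr 1.0.0 or later; peak_year is an illustrative name.

```r
library(dplyr)

peak_year <- openintro::present %>%
  mutate(total = boys + girls) %>%
  slice_max(total, n = 1) %>%        # keep only the row with the largest total
  select(year, boys, girls, total)

peak_year
```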