library(tidyverse)
library(openintro)
arbuthnot$girls
## [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617
## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013
## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822
## [46] 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127 7246 7119 7214 7101
## [61] 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626 7452 7061 7514 7656 7683
## [76] 5738 7779 7417 7687 7623 7380 7288
There is a dramatic downward trend in the number of girls baptized from 1640 to 1660. The smallest count, both on the graph and in the data set, is 2,722 in 1650.
From 1660 to 1710 the graph trends upward. The largest number of girl baptisms is 7,779, in 1705.
# Insert code for Exercise 2 here
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_line()
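The minimum and maximum quoted for Exercise 2 can also be checked numerically rather than read off the plot. The sketch below is base R only: the counts are copied from the `arbuthnot$girls` output above, and the `year` vector assumes the records run over consecutive years from 1629 to 1710, as described in this report.

```r
# Girl baptism counts copied from the arbuthnot$girls output above.
girls <- c(
  4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952, 4784, 5332,
  5200, 4910, 4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908,
  2959, 3179, 3349, 3382, 3289, 3013, 2781, 3247, 4107, 4803, 4881, 5681,
  4858, 4319, 5322, 5560, 5829, 5719, 6061, 6120, 5822, 5738, 5717, 5847,
  6203, 6033, 6041, 6299, 6533, 6744, 7158, 7127, 7246, 7119, 7214, 7101,
  7167, 7302, 7392, 7316, 7483, 6647, 6713, 7229, 7767, 7626, 7452, 7061,
  7514, 7656, 7683, 5738, 7779, 7417, 7687, 7623, 7380, 7288
)

# Assumption: one record per consecutive year, 1629 through 1710.
year <- 1629:1710
stopifnot(length(girls) == length(year))

lowest  <- which.min(girls)  # position of the smallest count
highest <- which.max(girls)  # position of the largest count

# Year and count for the two extremes.
data.frame(
  extreme = c("minimum", "maximum"),
  year    = year[c(lowest, highest)],
  girls   = girls[c(lowest, highest)]
)
```

With the `openintro` data loaded, the same idea would apply directly to `arbuthnot$year` and `arbuthnot$girls`.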
From 1629 to 1710 there are more boy births than girl births in this data. That was difficult to see from the line graph alone, but after using R to compare the two columns and create ‘logical’ data, the output is ‘TRUE’ for every year, which tells us that boys outnumbered girls in every single year from 1629 to 1710. The result surprised me!
# Insert code for Exercise 3 here
arbuthnot <- arbuthnot %>%
  mutate(boy_to_girl_ratio = boys / girls)
arbuthnot$boys + arbuthnot$girls
## [1] 9901 9315 8524 9584 9997 9855 10034 9522 9160 10311 10150 10850
## [13] 10670 10370 9410 8104 7966 7163 7332 6544 5825 5612 6071 6128
## [25] 6155 6620 7004 7050 6685 6170 5990 6971 8855 10019 10292 11722
## [37] 9972 8997 10938 11633 12335 11997 12510 12563 11895 11851 11775 12399
## [49] 12626 12601 12288 12847 13355 13653 14735 14702 14730 14694 14951 14588
## [61] 14771 15211 15054 14918 15159 13632 13976 14861 15829 16052 15363 14639
## [73] 15616 15687 15448 11851 16145 15369 16066 15862 15220 14928
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls)
arbuthnot <- arbuthnot %>%
  mutate(boy_ratio = boys / total)
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line()
arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)
arbuthnot$more_boys
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
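Rather than scanning 82 printed values by eye, the logical vector can be collapsed into a single answer. The following is a minimal sketch on the first five years only: the girls counts come from the `arbuthnot$girls` output, the boys counts are the printed totals minus girls, and `arbuthnot_head` is a hypothetical name used just for this illustration.

```r
# First five years of the Arbuthnot data (1629 to 1633).
# girls: copied from the arbuthnot$girls output.
# boys:  printed total (boys + girls) minus girls.
arbuthnot_head <- data.frame(
  year  = 1629:1633,
  boys  = c(5218, 4858, 4422, 4994, 5158),
  girls = c(4683, 4457, 4102, 4590, 4839)
)

# Element-wise comparison gives one TRUE/FALSE per year.
more_boys <- arbuthnot_head$boys > arbuthnot_head$girls

all(more_boys)   # TRUE only if boys outnumber girls in every row
sum(!more_boys)  # number of years in which boys did not outnumber girls
```

On the full data frame, `all(arbuthnot$more_boys)` would give the same one-value summary of the output above.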
The data set of present-day birth records in the United States covers the years 1940 to 2002. The dimensions of the data frame are 63 by 3. The three variables are year, boys, and girls.
# Insert code for Exercise 4 here
data('present', package = 'openintro')
arbuthnot %>%
  summarize(min = min(boys), max = max(boys))
## # A tibble: 1 × 2
##     min   max
##   <int> <int>
## 1  2890  8426
View(present)
dim(present)
## [1] 63 3
str(present)
## tibble [63 × 3] (S3: tbl_df/tbl/data.frame)
## $ year : num [1:63] 1940 1941 1942 1943 1944 ...
## $ boys : num [1:63] 1211684 1289734 1444365 1508959 1435301 ...
## $ girls: num [1:63] 1148715 1223693 1364631 1427901 1359499 ...
The Arbuthnot data frame has more rows than the U.S. present data frame (82 versus 63) because Arbuthnot’s records cover a longer period of time.
However, the values in the present U.S. data frame are far larger in magnitude: the yearly number of births in the present-day U.S. is much greater than in Arbuthnot’s data for London.
# Insert code for Exercise 5 here
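One way to put numbers on this comparison is to look at the yearly totals side by side. The sketch below uses base R only; both vectors are copied from the `boys + girls` outputs printed in this report, and the names `arbuthnot_total` and `present_total` are chosen here for illustration.

```r
# Yearly totals (boys + girls) copied from the printed outputs in this report.
arbuthnot_total <- c(
  9901, 9315, 8524, 9584, 9997, 9855, 10034, 9522, 9160, 10311, 10150, 10850,
  10670, 10370, 9410, 8104, 7966, 7163, 7332, 6544, 5825, 5612, 6071, 6128,
  6155, 6620, 7004, 7050, 6685, 6170, 5990, 6971, 8855, 10019, 10292, 11722,
  9972, 8997, 10938, 11633, 12335, 11997, 12510, 12563, 11895, 11851, 11775,
  12399, 12626, 12601, 12288, 12847, 13355, 13653, 14735, 14702, 14730, 14694,
  14951, 14588, 14771, 15211, 15054, 14918, 15159, 13632, 13976, 14861, 15829,
  16052, 15363, 14639, 15616, 15687, 15448, 11851, 16145, 15369, 16066, 15862,
  15220, 14928
)
present_total <- c(
  2360399, 2513427, 2808996, 2936860, 2794800, 2735456, 3288672, 3699940,
  3535068, 3559529, 3554149, 3750850, 3846986, 3902120, 4017362, 4047295,
  4163090, 4254784, 4203812, 4244796, 4257850, 4268326, 4167362, 4098020,
  4027490, 3760358, 3606274, 3520959, 3501564, 3600206, 3731386, 3555970,
  3258411, 3136965, 3159958, 3144198, 3167788, 3326632, 3333279, 3494398,
  3612258, 3629238, 3680537, 3638933, 3669141, 3760561, 3756547, 3809394,
  3909510, 4040958, 4158212, 4110907, 4065014, 4000240, 3952767, 3899589,
  3891494, 3880894, 3941553, 3959417, 4058814, 4025933, 4021726
)

# Number of years covered by each data set.
c(arbuthnot = length(arbuthnot_total), present = length(present_total))

# Smallest and largest yearly total in each data set (columns: min, max).
rbind(arbuthnot = range(arbuthnot_total), present = range(present_total))

# How many times larger the smallest U.S. year is than the largest London year.
min(present_total) / max(arbuthnot_total)
```

Even the lowest U.S. yearly total is more than 100 times the highest London total, which is the difference in magnitude described above.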
Yes, Arbuthnot’s observation about boys being born in greater proportion than girls holds up in the United States!
# Insert code for Exercise 6 here
present <- present %>%
  mutate(boy_to_girl_ratio = boys / girls)
present$boys + present$girls
## [1] 2360399 2513427 2808996 2936860 2794800 2735456 3288672 3699940 3535068
## [10] 3559529 3554149 3750850 3846986 3902120 4017362 4047295 4163090 4254784
## [19] 4203812 4244796 4257850 4268326 4167362 4098020 4027490 3760358 3606274
## [28] 3520959 3501564 3600206 3731386 3555970 3258411 3136965 3159958 3144198
## [37] 3167788 3326632 3333279 3494398 3612258 3629238 3680537 3638933 3669141
## [46] 3760561 3756547 3809394 3909510 4040958 4158212 4110907 4065014 4000240
## [55] 3952767 3899589 3891494 3880894 3941553 3959417 4058814 4025933 4021726
present <- present %>%
  mutate(total = boys + girls)
present <- present %>%
  mutate(boy_ratio = boys / total)
ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line()
present <- present %>%
  mutate(more_boys = boys > girls)
present$more_boys
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE
In 1961 we observed the maximum total number of births, 4,268,326. This included 2,186,274 boy births and 2,082,052 girl births.
# Insert code for Exercise 7 here
present %>%
  arrange(desc(total))
## # A tibble: 63 × 7
##     year    boys   girls boy_to_girl_ratio   total boy_ratio more_boys
##    <dbl>   <dbl>   <dbl>             <dbl>   <dbl>     <dbl> <lgl>
##  1  1961 2186274 2082052              1.05 4268326     0.512 TRUE
##  2  1960 2179708 2078142              1.05 4257850     0.512 TRUE
##  3  1957 2179960 2074824              1.05 4254784     0.512 TRUE
##  4  1959 2173638 2071158              1.05 4244796     0.512 TRUE
##  5  1958 2152546 2051266              1.05 4203812     0.512 TRUE
##  6  1962 2132466 2034896              1.05 4167362     0.512 TRUE
##  7  1956 2133588 2029502              1.05 4163090     0.513 TRUE
##  8  1990 2129495 2028717              1.05 4158212     0.512 TRUE
##  9  1991 2101518 2009389              1.05 4110907     0.511 TRUE
## 10  1963 2101632 1996388              1.05 4098020     0.513 TRUE
## # ℹ 53 more rows
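Sorting the whole table works, but the peak year can also be pulled out directly. The sketch below is base R on five rows (1957 to 1961) copied from the sorted output above; `present_peak` and `peak_row` are hypothetical names used only for this illustration.

```r
# Five rows (1957 to 1961) copied from the arrange(desc(total)) output above.
present_peak <- data.frame(
  year  = c(1957, 1958, 1959, 1960, 1961),
  boys  = c(2179960, 2152546, 2173638, 2179708, 2186274),
  girls = c(2074824, 2051266, 2071158, 2078142, 2082052)
)
present_peak$total <- present_peak$boys + present_peak$girls

# which.max() returns the row position of the largest total,
# so indexing with it extracts the full record for that year.
peak_row <- present_peak[which.max(present_peak$total), ]
peak_row
```

Applied to the full `present` data frame, the same indexing should return the 1961 row shown at the top of the sorted table.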