library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
library(openintro)
## Warning: package 'openintro' was built under R version 4.1.2
Counts of girls baptized:
arbuthnot$girls
## [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617
## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013
## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822
## [46] 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127 7246 7119 7214 7101
## [61] 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626 7452 7061 7514 7656 7683
## [76] 5738 7779 7417 7687 7623 7380 7288
There is a peak around 1660 with the number of girls being baptized increases with a slight decrease after the 1700s. Overall, it’s an upward trend in girls being baptized.
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
geom_line()
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
geom_point()
The boys born over time are slightly above 50% up until early 1700s where there’s a sudden drop. Overall, the birth of boys has significantly increased.
arbuthnot <- arbuthnot %>%
mutate(total = boys + girls)
ggplot(data = arbuthnot, aes(x = year, y = boys/total)) +
geom_line()
The years included in this data set are from 1940 to 2002. The dimensions of the data set are 63, 3 and the column names are “year”, “boys”, “girls”.
data(present)
dim(present)
## [1] 63 3
range(present$year)
## [1] 1940 2002
names(present)
## [1] "year" "boys" "girls"
Comparing the arbuthnot’s and present data set they are completely different. The median for arbuthnot’s data set is 11813 where as the one for present is 3756547.
summary(arbuthnot)
## year boys girls total
## Min. :1629 Min. :2890 Min. :2722 Min. : 5612
## 1st Qu.:1649 1st Qu.:4759 1st Qu.:4457 1st Qu.: 9199
## Median :1670 Median :6073 Median :5718 Median :11813
## Mean :1670 Mean :5907 Mean :5535 Mean :11442
## 3rd Qu.:1690 3rd Qu.:7576 3rd Qu.:7150 3rd Qu.:14723
## Max. :1710 Max. :8426 Max. :7779 Max. :16145
summary(present)
## year boys girls
## Min. :1940 Min. :1211684 Min. :1148715
## 1st Qu.:1956 1st Qu.:1799857 1st Qu.:1711404
## Median :1971 Median :1924868 Median :1831679
## Mean :1971 Mean :1885600 Mean :1793915
## 3rd Qu.:1986 3rd Qu.:2058524 3rd Qu.:1965538
## Max. :2002 Max. :2186274 Max. :2082052
Looking at the graph, there is a decrease in boys being born in the US compared to Arbuthnot’s observation. You can see that between 1940 and 1945 there’s still a peak but after 1950’s there’s a decline. From 1965 to 1980 there’s a slight increase with a huge drop afterwards.
present <- present %>%
mutate(total = boys + girls)
ggplot(data = present, aes(x = year, y = boys/total)) +
geom_line()
Based on this data, the most number of births in the US was in 1961.
present %>%
mutate(total = boys + girls) %>%
arrange(desc(total))
## # A tibble: 63 × 4
## year boys girls total
## <dbl> <dbl> <dbl> <dbl>
## 1 1961 2186274 2082052 4268326
## 2 1960 2179708 2078142 4257850
## 3 1957 2179960 2074824 4254784
## 4 1959 2173638 2071158 4244796
## 5 1958 2152546 2051266 4203812
## 6 1962 2132466 2034896 4167362
## 7 1956 2133588 2029502 4163090
## 8 1990 2129495 2028717 4158212
## 9 1991 2101518 2009389 4110907
## 10 1963 2101632 1996388 4098020
## # … with 53 more rows