Exercise 1

What command would you use to extract just the counts of girls baptized?

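The counts themselves can be extracted with arbuthnot$girls. The following goes one step further and sums that column to give the total number of girls baptized across all years in the arbuthnot dataset: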

data('arbuthnot', package='openintro');
sum(arbuthnot$girls)

[1] 453841
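
For reference, the difference between extracting the column and summing it can be seen on a small example. This is only a sketch: `arb_head` is a hypothetical stand-in typed in by hand from the first three years of the dataset (1629 to 1631), not the full `arbuthnot` data frame.

```r
# Hypothetical stand-in: first three rows of arbuthnot, typed in by hand
arb_head <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(5218, 4858, 4422),
  girls = c(4683, 4457, 4102)
)

# `$` extracts a single column as a plain numeric vector
girls_counts <- arb_head$girls
print(girls_counts)

# sum() collapses that vector into one total
girls_total <- sum(girls_counts)
print(girls_total)
```

The first print shows one count per year, which is what the exercise asks for; the second shows a single number.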

Exercise 2

Is there an apparent trend in the number of girls baptized over the years? How would you describe it?

library(ggplot2)
ggplot(data = arbuthnot, aes(x = year, y = girls)) + 
  geom_line()

The number of girls baptized dropped sharply from about 1640 to 1660, after which it began to increase dramatically. The increase tapered off after around 1690.

Exercise 3

Now, generate a plot of the proportion of boys born over time. What do you see?

library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

data('arbuthnot', package='openintro');
arbuthnot <- arbuthnot %>% mutate(total = boys + girls);

ggplot(data = arbuthnot, aes(x = year, y = boys/total)) + 
  geom_line()

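Looking at the graph, I don’t see any clear trend in the proportion of boys baptized over the years. The proportion fluctuates from year to year, but boys outnumber girls in every year of the dataset, so it always stays above 0.5.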

Exercise 4

What years are included in this data set?

data('present', package='openintro');

present$year

 [1] 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954
[16] 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
[31] 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
[46] 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
[61] 2000 2001 2002

What are the dimensions of the data frame?

data('present', package='openintro');

glimpse(present)

Rows: 63
Columns: 3
$ year  1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950…
$ boys  1211684, 1289734, 1444365, 1508959, 1435301, 1404587, 1691220, 1…
$ girls 1148715, 1223693, 1364631, 1427901, 1359499, 1330869, 1597452, 1…
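
The glimpse() output reports 63 rows and 3 columns. If only the dimensions are needed, base R's dim(), nrow() and ncol() return them directly as numbers. A minimal sketch, assuming a hypothetical three-row stand-in `present_head` copied by hand from the years 1940 to 1942:

```r
# Hypothetical stand-in: first three rows of present, typed in by hand
present_head <- data.frame(
  year  = c(1940, 1941, 1942),
  boys  = c(1211684, 1289734, 1444365),
  girls = c(1148715, 1223693, 1364631)
)

# dim() returns c(number of rows, number of columns)
dims <- dim(present_head)
print(dims)

# nrow() and ncol() return each dimension on its own
n_rows <- nrow(present_head)
n_cols <- ncol(present_head)
```

Called on the full present data frame, the same functions should agree with the row and column counts that glimpse() prints.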

What are the variable (column) names?

data('present', package='openintro');

names(present)

[1] "year"  "boys"  "girls"

Exercise 5

How do these counts compare to Arbuthnot’s? Are they of a similar magnitude?

library(knitr)

data('present', package='openintro');
data('arbuthnot', package='openintro');

kable(present, caption = "Present dataset")

Present dataset
year boys girls
1940 1211684 1148715
1941 1289734 1223693
1942 1444365 1364631
1943 1508959 1427901
1944 1435301 1359499
1945 1404587 1330869
1946 1691220 1597452
1947 1899876 1800064
1948 1813852 1721216
1949 1826352 1733177
1950 1823555 1730594
1951 1923020 1827830
1952 1971262 1875724
1953 2001798 1900322
1954 2059068 1958294
1955 2073719 1973576
1956 2133588 2029502
1957 2179960 2074824
1958 2152546 2051266
1959 2173638 2071158
1960 2179708 2078142
1961 2186274 2082052
1962 2132466 2034896
1963 2101632 1996388
1964 2060162 1967328
1965 1927054 1833304
1966 1845862 1760412
1967 1803388 1717571
1968 1796326 1705238
1969 1846572 1753634
1970 1915378 1816008
1971 1822910 1733060
1972 1669927 1588484
1973 1608326 1528639
1974 1622114 1537844
1975 1613135 1531063
1976 1624436 1543352
1977 1705916 1620716
1978 1709394 1623885
1979 1791267 1703131
1980 1852616 1759642
1981 1860272 1768966
1982 1885676 1794861
1983 1865553 1773380
1984 1879490 1789651
1985 1927983 1832578
1986 1924868 1831679
1987 1951153 1858241
1988 2002424 1907086
1989 2069490 1971468
1990 2129495 2028717
1991 2101518 2009389
1992 2082097 1982917
1993 2048861 1951379
1994 2022589 1930178
1995 1996355 1903234
1996 1990480 1901014
1997 1985596 1895298
1998 2016205 1925348
1999 2026854 1932563
2000 2076969 1981845
2001 2057922 1968011
2002 2057979 1963747

kable(arbuthnot, caption = "Arbuthnot dataset")

Arbuthnot dataset
year boys girls
1629 5218 4683
1630 4858 4457
1631 4422 4102
1632 4994 4590
1633 5158 4839
1634 5035 4820
1635 5106 4928
1636 4917 4605
1637 4703 4457
1638 5359 4952
1639 5366 4784
1640 5518 5332
1641 5470 5200
1642 5460 4910
1643 4793 4617
1644 4107 3997
1645 4047 3919
1646 3768 3395
1647 3796 3536
1648 3363 3181
1649 3079 2746
1650 2890 2722
1651 3231 2840
1652 3220 2908
1653 3196 2959
1654 3441 3179
1655 3655 3349
1656 3668 3382
1657 3396 3289
1658 3157 3013
1659 3209 2781
1660 3724 3247
1661 4748 4107
1662 5216 4803
1663 5411 4881
1664 6041 5681
1665 5114 4858
1666 4678 4319
1667 5616 5322
1668 6073 5560
1669 6506 5829
1670 6278 5719
1671 6449 6061
1672 6443 6120
1673 6073 5822
1674 6113 5738
1675 6058 5717
1676 6552 5847
1677 6423 6203
1678 6568 6033
1679 6247 6041
1680 6548 6299
1681 6822 6533
1682 6909 6744
1683 7577 7158
1684 7575 7127
1685 7484 7246
1686 7575 7119
1687 7737 7214
1688 7487 7101
1689 7604 7167
1690 7909 7302
1691 7662 7392
1692 7602 7316
1693 7676 7483
1694 6985 6647
1695 7263 6713
1696 7632 7229
1697 8062 7767
1698 8426 7626
1699 7911 7452
1700 7578 7061
1701 8102 7514
1702 8031 7656
1703 7765 7683
1704 6113 5738
1705 8366 7779
1706 7952 7417
1707 8379 7687
1708 8239 7623
1709 7840 7380
1710 7640 7288

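The counts are not of a similar magnitude. The present dataset records yearly births in the U.S., with counts between roughly 1.1 and 2.2 million per sex, whereas Arbuthnot's dataset records yearly baptisms, with counts between roughly 2,700 and 8,400 per sex. The present counts are therefore a few hundred times larger.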
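
The difference in scale can be made concrete with a rough ratio of yearly totals. This is only a sketch: `arb_sample` and `pres_sample` are hypothetical stand-ins holding two hand-picked years from each table, so the ratio is indicative rather than a summary of the full datasets.

```r
# Hypothetical stand-ins: two hand-picked years copied from each table
arb_sample <- data.frame(
  year  = c(1650, 1698),
  boys  = c(2890, 8426),
  girls = c(2722, 7626)
)
pres_sample <- data.frame(
  year  = c(1940, 1961),
  boys  = c(1211684, 2186274),
  girls = c(1148715, 2082052)
)

# Yearly totals (boys + girls) for each sample
arb_total  <- arb_sample$boys + arb_sample$girls
pres_total <- pres_sample$boys + pres_sample$girls

# Ratio of the average yearly totals, as a rough measure of scale
scale_ratio <- mean(pres_total) / mean(arb_total)
print(scale_ratio)
```

Replacing the stand-ins with the full data frames would give a ratio based on every year instead of two.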

Exercise 6

Make a plot that displays the proportion of boys born over time. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.?

library(dplyr)
library(ggplot2)

data('present', package='openintro');
present <- present %>% mutate(total = boys + girls);

ggplot(data = present, aes(x = year, y = boys/total)) + 
  geom_line()

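Looking at the graph, the proportion of boys decreases slightly over time, but it stays above 0.5 throughout: boys outnumber girls in every year of the present dataset. Arbuthnot's observation that boys are born in greater proportion than girls therefore still holds in the U.S.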
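
The comparison with Arbuthnot's observation can also be checked numerically rather than read off the plot. A minimal sketch, assuming a hypothetical stand-in `pres_sample` with three years (1940, 1961, 2002) copied by hand from the present table:

```r
# Hypothetical stand-in: three years copied by hand from the present table
pres_sample <- data.frame(
  year  = c(1940, 1961, 2002),
  boys  = c(1211684, 2186274, 2057979),
  girls = c(1148715, 2082052, 1963747)
)

# Proportion of boys among all births in each year
pres_sample$prop_boys <- pres_sample$boys / (pres_sample$boys + pres_sample$girls)

# TRUE only if boys outnumber girls in every listed year
boys_always_ahead <- all(pres_sample$boys > pres_sample$girls)

print(pres_sample[, c("year", "prop_boys")])
print(boys_always_ahead)
```

Running the same two computations on the full present data frame would cover all 63 years instead of three.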

Exercise 7

In what year did we see the most total number of births in the U.S.?

library(dplyr)
library(knitr)

data('present', package='openintro');
present <- present %>% mutate(total = boys + girls);

kable(filter(present, total == max(total)))

year boys girls total
1961 2186274 2082052 4268326

According to the present dataset, 1961 was the year with the highest total number of births in the U.S. (4,268,326).
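
The same lookup can be done in base R with which.max(), which returns the position of the largest value. A minimal sketch, assuming a hypothetical stand-in `pres_peak` with the years 1960 to 1962 copied by hand from the present table:

```r
# Hypothetical stand-in: three years around the peak, typed in by hand
pres_peak <- data.frame(
  year  = c(1960, 1961, 1962),
  boys  = c(2179708, 2186274, 2132466),
  girls = c(2078142, 2082052, 2034896)
)

# Total births per year
pres_peak$total <- pres_peak$boys + pres_peak$girls

# which.max() gives the row index of the largest total
peak <- pres_peak[which.max(pres_peak$total), ]
print(peak)
```

Unlike filtering on total == max(total), which.max() returns only the first row if several years tie for the maximum.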