I tried to load the arbuthnot data as a built-in R data set, but got an error message.
So, I instead read the data in from a CSV file on the OpenIntro website.
download.file("https://www.openintro.org/stat/data/arbuthnot.csv",destfile="arbuthnot.csv")
arbuthnot <- read.csv("arbuthnot.csv",header=TRUE,stringsAsFactors=FALSE)
Exercise 1 - Extract the count of just girls baptized with this command:
arbuthnot$girls
## [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910
## [15] 4617 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382
## [29] 3289 3013 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719
## [43] 6061 6120 5822 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127
## [57] 7246 7119 7214 7101 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626
## [71] 7452 7061 7514 7656 7683 5738 7779 7417 7687 7623 7380 7288
Exercise 2 - The number of girls baptized decreases substantially starting around 1640, stays low until around 1660, and then starts increasing again.
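The dip can also be located numerically rather than read off a plot. The sketch below re-enters the girls counts printed above and assumes the 82 values correspond to the consecutive years 1629 to 1710 (the span named in the plot title later in this report). `girls`, `year`, and `in_dip` are standalone vectors introduced here for illustration, not columns of the original data frame.

```r
# Girls baptized per year, copied from the printed output of arbuthnot$girls.
girls <- c(
  4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952, 4784, 5332, 5200, 4910,
  4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908, 2959, 3179, 3349, 3382,
  3289, 3013, 2781, 3247, 4107, 4803, 4881, 5681, 4858, 4319, 5322, 5560, 5829, 5719,
  6061, 6120, 5822, 5738, 5717, 5847, 6203, 6033, 6041, 6299, 6533, 6744, 7158, 7127,
  7246, 7119, 7214, 7101, 7167, 7302, 7392, 7316, 7483, 6647, 6713, 7229, 7767, 7626,
  7452, 7061, 7514, 7656, 7683, 5738, 7779, 7417, 7687, 7623, 7380, 7288
)

# Assumption: one row per consecutive year, 1629 through 1710.
year <- 1629:1710
stopifnot(length(girls) == length(year))

# Year with the fewest girls baptized.
lowest_year <- year[which.min(girls)]

# Mean yearly count inside and outside the approximate 1640-1660 window.
in_dip <- year >= 1640 & year <= 1660
mean_in_dip <- mean(girls[in_dip])
mean_outside <- mean(girls[!in_dip])

print(lowest_year)
print(c(in_dip = mean_in_dip, outside = mean_outside))
```

Under that year assumption, the lowest count falls in the middle of the window described above, and the mean inside the window is well below the mean outside it.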
Exercise 3 - Make a plot of the proportion of boys over time with this command.
plot(arbuthnot$year,
arbuthnot$boys / (arbuthnot$boys + arbuthnot$girls),
xlab="Year",
ylab="Proportion of baptisms that were boys",
type="l")
We find that the proportion of boys fluctuates over time, but is always above 0.50, so more boys than girls were baptized in every year.
First, we load the dataset “present”.
data(present,package='DATA606')
Now, we can explore this data to answer the questions.
dim(present)
## [1] 63 3
head(present)
## year boys girls
## 1 1940 1211684 1148715
## 2 1941 1289734 1223693
## 3 1942 1444365 1364631
## 4 1943 1508959 1427901
## 5 1944 1435301 1359499
## 6 1945 1404587 1330869
range(present$year)
## [1] 1940 2002
Like the arbuthnot data, this data frame also has 3 columns: “year”, “boys”, and “girls”.
There are 63 rows, corresponding to years 1940 to 2002.
range(arbuthnot$boys + arbuthnot$girls)
## [1] 5612 16145
range(present$boys + present$girls)
## [1] 2360399 4268326
The numbers in present are much larger: total births (boys + girls) range from about 2.4 million to 4.3 million per year, whereas the arbuthnot totals range from about 5,600 to 16,100 per year.
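To put a number on the difference in scale, the two printed ranges can be divided directly. This is a minimal sketch that re-enters the four values from the output above; `arbuthnot_range`, `present_range`, and `scale_factor` are names introduced here for illustration.

```r
# Ranges of total yearly counts, copied from the printed output above.
arbuthnot_range <- c(min = 5612, max = 16145)
present_range <- c(min = 2360399, max = 4268326)

# How many times larger the present totals are at each end of the range.
scale_factor <- present_range / arbuthnot_range
print(round(scale_factor))
```

Both ends of the range come out a few hundred times larger in present than in arbuthnot.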
# Put the plots side by side for Arbuthnot vs. United States so we can compare more clearly.
# Use the same ylim in both plots for a clearer comparison.
par(mfrow=c(1,2))
plot(arbuthnot$year,
arbuthnot$boys/arbuthnot$girls,
type="l",
xlab="Year",
ylab="Boy/girl birth ratio",
main="Arbuthnot birth sex ratio\n(1629-1710)",
ylim=range(c(arbuthnot$boys/arbuthnot$girls,present$boys/present$girls)))
plot(present$year,
present$boys/present$girls,
type="l",
xlab="Year",
ylab="Boy/girl birth ratio",
main="US birth sex ratio\n(1940-2002)",
ylim=range(c(arbuthnot$boys/arbuthnot$girls,present$boys/present$girls)))
In both the Arbuthnot and the United States data, boys outnumber girls in every year. However, the Arbuthnot sex ratios fluctuate much more from year to year.
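Both observations can be checked numerically instead of by eye. The sketch below defines a hypothetical helper, `summarise_ratio`, and applies it only to the six rows of present shown in the head() output above, so the result illustrates the method rather than the full comparison. `present_head` is a name introduced here.

```r
# First six rows of `present`, copied from the head(present) output.
present_head <- data.frame(
  year  = 1940:1945,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301, 1404587),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499, 1330869)
)

# Hypothetical helper: summarise the boy/girl ratio of any data frame
# that has `boys` and `girls` columns.
summarise_ratio <- function(df) {
  ratio <- df$boys / df$girls
  list(
    always_more_boys = all(ratio > 1),   # TRUE if boys exceed girls in every row
    spread = diff(range(ratio))          # max ratio minus min ratio
  )
}

ratio_summary <- summarise_ratio(present_head)
print(ratio_summary)
```

Calling `summarise_ratio(arbuthnot)` and `summarise_ratio(present)` on the full data frames would give two `spread` values to compare directly.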
present$year[which.max(present$boys + present$girls)]
## [1] 1961
Total U.S. births over 1940 to 2002 peaked in 1961.
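To see the runner-up years as well as the single maximum, the rows can be ranked by total births. This sketch uses only the six rows of present printed by head() earlier, so its top year is necessarily one of 1940 to 1945 and will not match the 1961 result from the full data. `present_head`, `ranked`, and `top_year` are names introduced here for illustration.

```r
# First six rows of `present`, copied from the head(present) output.
present_head <- data.frame(
  year  = 1940:1945,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301, 1404587),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499, 1330869)
)

# Total births per year, then rank the years from most to fewest births.
present_head$total <- present_head$boys + present_head$girls
ranked <- present_head[order(present_head$total, decreasing = TRUE),
                       c("year", "total")]
print(ranked)

# Year with the most births in this excerpt.
top_year <- ranked$year[1]
```

The same two lines applied to the full present data frame would list every year in descending order of total births.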