Data 606 - Lab 0

I. What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?

In the U.S. birth data set, present, the years covered can be found with the range function:

range(present$year)
## [1] 1940 2002

and the dimensions of the data frame and its column names can be found using:

dim(present)
## [1] 63  3
names(present)
## [1] "year"  "boys"  "girls"

How do these counts compare to Arbuthnot’s? Are they on a similar scale?

Comparing the present U.S. birth data, which span 1940 to 2002, with Arbuthnot’s London birth data, which span 1629 to 1710, shows that the scales are quite different:

range(present$boys + present$girls) - range(arbuthnot$boys + arbuthnot$girls)
## [1] 2354787 4252181

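The smallest yearly total in the present data exceeds Arbuthnot’s smallest by about 2.35 million births, and the largest exceeds his largest by about 4.25 million, so the counts are not on a similar scale. The variables are similar though, which will make comparisons easy.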
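As a quick check on the magnitudes, the two differences printed above can be converted to millions. This is only a sketch that copies the two numbers from the output; the variable names diffs and diffs_millions are introduced here for illustration.

```r
# Differences in yearly totals (present minus Arbuthnot), copied from the output above:
# first the difference of the minimums, then the difference of the maximums.
diffs <- c(2354787, 4252181)

# Express both differences in millions of births, rounded to two decimals.
diffs_millions <- round(diffs / 1e6, 2)
diffs_millions
## [1] 2.35 4.25
```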

Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response.

Observing the plot of the boy-to-girl ratio:

plot(x = present$year, y = present$boys/present$girls, type = "l", main = "Ratio of Boy-to-Girl Births in the USA from 1940 to 2002", xlab = "Year", ylab = "boy-to-girl ratio")

and determining the range of the ratio:

range(present$boys/present$girls)
## [1] 1.045686 1.058698

we see that, as in Arbuthnot’s data, where the ratio ranged over:

range(arbuthnot$boys/arbuthnot$girls)
## [1] 1.010673 1.156075

the boy-to-girl ratio is greater than 1 in every year, so more boys than girls were born in every year of the U.S. data as well.
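A boy-to-girl ratio can also be read as the share of all births that were boys: if boys/girls = r, then boys/(boys + girls) = r/(1 + r). The sketch below applies that identity to the two endpoints of the U.S. ratio range printed above; the names ratio and share_boys are introduced here for illustration.

```r
# Endpoints of the U.S. boy-to-girl ratio range, copied from the output above.
ratio <- c(1.045686, 1.058698)

# If boys / girls = r, then boys / (boys + girls) = r / (1 + r).
share_boys <- ratio / (1 + ratio)
round(share_boys, 3)
## [1] 0.511 0.514
```

Under this reading, boys made up roughly 51.1% to 51.4% of births at the two extremes of the ratio.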

In what year did we see the most total number of births in the U.S.?

The year with the largest total number of births can be found using:

present$year[present$boys + present$girls == max(present$boys+present$girls)]
## [1] 1961

The total number of births in that year was:

max(present$boys+present$girls)
## [1] 4268326
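An alternative way to get the peak year and its total in one step is which.max, which returns the position of the first maximum. The sketch below uses a small made-up data frame called toy, with invented values that are not taken from the present data, purely to show the pattern.

```r
# Hypothetical toy data: invented values for illustration, NOT the real `present` data.
toy <- data.frame(
  year  = c(2001, 2002, 2003),
  boys  = c(105, 120, 110),
  girls = c(100, 114, 105)
)

# Total births per year, then the row holding the largest total.
toy$total <- toy$boys + toy$girls
peak <- toy[which.max(toy$total), ]

peak$year
## [1] 2002
peak$total
## [1] 234
```

Note that which.max returns only the first maximum, whereas the == comparison used above would return every year tied for the largest total.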