Lab 0 - Introduction to R and RStudio

By Brian Weinfeld

January 27, 2018

#1: What command would you use to extract just the counts of girls baptized? Try it!

arbuthnot$girls
##  [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910
## [15] 4617 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382
## [29] 3289 3013 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719
## [43] 6061 6120 5822 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127
## [57] 7246 7119 7214 7101 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626
## [71] 7452 7061 7514 7656 7683 5738 7779 7417 7687 7623 7380 7288

#2: Is there an apparent trend in the number of girls baptized over the years? How would you describe it?

plot(arbuthnot$year, arbuthnot$girls, type='l', xlab='Year', ylab='Female Baptisms')
title('Female Baptisms by Year (1629-1710)')
abline(lm(arbuthnot$girls ~ arbuthnot$year), col='red')

There is a consistent upward trend in the number of girls baptized over the given time frame, aside from a roughly decade-long drop from approximately 1650 to 1660. One could speculate that some event or series of events during this period depressed the number of girls being baptized.
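One way to check the timing of the dip is to look for the year with the fewest recorded female baptisms. The sketch below rebuilds the series from the output printed under #1 and assumes the 82 values correspond to the consecutive years 1629 through 1710 (the range used in the plot title). The names `girls`, `year`, and `low_year` are illustrative.

```r
# Female baptism counts copied from the printed output of arbuthnot$girls.
girls <- c(
  4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952, 4784, 5332, 5200, 4910,
  4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908, 2959, 3179, 3349, 3382,
  3289, 3013, 2781, 3247, 4107, 4803, 4881, 5681, 4858, 4319, 5322, 5560, 5829, 5719,
  6061, 6120, 5822, 5738, 5717, 5847, 6203, 6033, 6041, 6299, 6533, 6744, 7158, 7127,
  7246, 7119, 7214, 7101, 7167, 7302, 7392, 7316, 7483, 6647, 6713, 7229, 7767, 7626,
  7452, 7061, 7514, 7656, 7683, 5738, 7779, 7417, 7687, 7623, 7380, 7288
)

# Assumption: one value per consecutive year, 1629 to 1710.
year <- 1629:1710
stopifnot(length(girls) == length(year))

# Year with the fewest female baptisms.
low_year <- year[which.min(girls)]
low_year

# Years in which the count fell below 3,500, to see how long the dip lasted.
year[girls < 3500]
```

Under that year assumption, the minimum falls in 1650, which is consistent with the dip described above.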

#3: Now, make a plot of the proportion of boys over time. What do you see? Tip: If you use the up and down arrow keys, you can scroll through your previous commands, your so-called command history. You can also access it by clicking on the history tab in the upper right panel. This will save you a lot of typing in the future.

prop <- function(data){
  data$boys/(data$boys+data$girls)
}
plot(arbuthnot$year, prop(arbuthnot), type = 'l',  xlab='Year', ylab='Proportion Male Baptisms')
title('Proportion of Male Baptisms by Year (1629-1710)')
abline(lm(prop(arbuthnot) ~ arbuthnot$year), col='red')

The proportion of boys baptized during the measured time frame varies from year to year but stays consistently around 51-52% and never falls below 50%.
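The claim that the proportion never dips below 50% can be checked directly rather than read off the plot. This is a minimal sketch, assuming a data frame with `boys` and `girls` columns. The helper name `check_majority` and the three-row `toy` data frame are made up for illustration and are not Arbuthnot's counts.

```r
# Proportion of boys among all recorded children, per row.
prop <- function(data) {
  data$boys / (data$boys + data$girls)
}

# Hypothetical helper: summarise the range of the proportion and
# report whether boys are the majority in every row.
check_majority <- function(data) {
  p <- prop(data)
  list(min = min(p), max = max(p), always_above_half = all(p > 0.5))
}

# Made-up counts, for illustration only.
toy <- data.frame(boys = c(52, 51, 53), girls = c(48, 49, 47))
check_majority(toy)
```

With the lab data loaded, `check_majority(arbuthnot)` would apply the same check to the real counts.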

#1: What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?

cbind(min(present$year), max(present$year))
##      [,1] [,2]
## [1,] 1940 2002
dim(present)
## [1] 63  3
names(present)
## [1] "year"  "boys"  "girls"

The years included in this data set range from 1940 to 2002 inclusive.

The dimensions of the data frame are 63 observations of 3 variables.

The column names are ‘year’, ‘boys’, and ‘girls’.

#2: How do these counts compare to Arbuthnot’s? Are they on a similar scale?

summary(present$boys + present$girls)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 2360399 3511262 3756547 3679515 4023830 4268326
summary(arbuthnot$boys + arbuthnot$girls)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5612    9199   11813   11442   14723   16145

Comparing the summary statistics shows that the present data is roughly two to three orders of magnitude larger than the Arbuthnot data: the mean yearly total is about 3.68 million versus about 11,400.
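The size of the gap can be made concrete by dividing the two means reported by `summary()`. The sketch below uses only the figures printed above; the variable names are illustrative.

```r
# Mean yearly totals copied from the two summary() outputs.
present_mean   <- 3679515
arbuthnot_mean <- 11442

# How many times larger the modern totals are, and the same gap on a log10 scale.
ratio <- present_mean / arbuthnot_mean
ratio
log10(ratio)
```

A `log10` value between 2 and 3 corresponds to a gap of two to three orders of magnitude.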

#3: Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response.

plot(present$year, prop(present), type='l', xlab='Year', ylab='Proportion Male Births')
title('Proportion of Male Births by Year (1940-2002)')

plot(seq_len(nrow(arbuthnot)), prop(arbuthnot), type='l', col='blue', xlab='Year Count', ylab='Proportion Male Births/Baptisms')
lines(seq_len(nrow(present)), prop(present), type='l', col='red')
title('Comparison of Proportion of Male Births/Baptisms by Year')
legend(x='topright', legend=c('Present', 'Arbuthnot'), col=c('red', 'blue'), lty=1)

Yes, every year has a greater proportion of boys than girls. However, the present data shows a much more consistent proportion compared to the relative volatility of the Arbuthnot data.

#4: In what year did we see the most total number of births in the U.S.? You can refer to the help files or the R reference card http://cran.r-project.org/doc/contrib/Short-refcard.pdf to find helpful commands.

present[present$boys + present$girls == max(present$boys + present$girls),]$year
## [1] 1961

The greatest number of births occurred in 1961.
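The same lookup can also be written with `which.max()`, which returns the index of the first maximum. This is a minimal sketch on a made-up data frame: the counts in `toy` are illustrative, not the U.S. figures, and the name `peak_year` is likewise hypothetical.

```r
# Made-up counts with the same column names as `present`.
toy <- data.frame(
  year  = c(2000, 2001, 2002),
  boys  = c(10, 14, 12),
  girls = c(9, 13, 11)
)

# Total births per year, then the year at the position of the largest total.
total <- toy$boys + toy$girls
peak_year <- toy$year[which.max(total)]
peak_year
```

The `== max(...)` filter returns every tied year, while `which.max()` returns only the first, so the two approaches differ only when the maximum is shared by more than one year.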