library(ggplot2)
data('arbuthnot', package='openintro')
data('present', package='openintro')
PresentBirthData = data.frame(present)

Q1 What command would you use to extract just the counts of girls baptized?

PresentBirthData$girls
##  [1] 1148715 1223693 1364631 1427901 1359499 1330869 1597452 1800064 1721216
## [10] 1733177 1730594 1827830 1875724 1900322 1958294 1973576 2029502 2074824
## [19] 2051266 2071158 2078142 2082052 2034896 1996388 1967328 1833304 1760412
## [28] 1717571 1705238 1753634 1816008 1733060 1588484 1528639 1537844 1531063
## [37] 1543352 1620716 1623885 1703131 1759642 1768966 1794861 1773380 1789651
## [46] 1832578 1831679 1858241 1907086 1971468 2028717 2009389 1982917 1951379
## [55] 1930178 1903234 1901014 1895298 1925348 1932563 1981845 1968011 1963747

Q2 Is there an apparent trend in the number of girls baptized over the years? How would you describe it? (To ensure that your lab report is comprehensive, be sure to include the code needed to make the plot as well as your written interpretation.)

ggplot(PresentBirthData, aes(x=year)) + geom_line(aes(y=girls), color ="darkred")

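There is no steady trend, but the number of girls rises overall. It climbs from about 1.15 million in 1940 to a peak of about 2.08 million in 1961, falls to about 1.53 million in 1973, recovers to about 2.03 million in 1990, and then stays between about 1.9 and 2.0 million through 2002.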

Q3 Now, generate a plot of the proportion of boys born over time. What do you see?

ggplot(arbuthnot, aes(x = year, y = boys / (boys + girls))) + geom_line(color = "green")

## The proportion of boys stays above 0.5 in every year, so more boys than girls were baptized each year.

Q4 What years are included in this data set? What are the dimensions of the data frame? What are the variable (column) names?

range(PresentBirthData$year)
## [1] 1940 2002
dim(PresentBirthData)
## [1] 63  3
names(PresentBirthData)
## [1] "year"  "boys"  "girls"

Q5 How do these counts compare to Arbuthnot’s? Are they of a similar magnitude?

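No, they are not of a similar magnitude. The present-day counts are in the millions per year, whereas Arbuthnot's are in the thousands, so they are roughly two to three orders of magnitude larger.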
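The size of the gap can also be estimated numerically rather than by eye. The sketch below is only one way to do it: `magnitude_gap` is a hypothetical helper (not part of the lab or of any package) that takes log10 of the ratio of the two medians, and it assumes the openintro package is installed.

```r
# Hypothetical helper: order-of-magnitude gap between two sets of counts,
# measured as log10 of the ratio of their medians.
magnitude_gap <- function(larger, smaller) {
  log10(median(larger) / median(smaller))
}

# Assumption: the openintro package used at the top of this report is installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  data("arbuthnot", package = "openintro")
  data("present", package = "openintro")
  print(magnitude_gap(present$girls, arbuthnot$girls))
  print(magnitude_gap(present$boys, arbuthnot$boys))
}
```

A result near 2 means the counts differ by a factor of about 100, and a result near 3 by a factor of about 1000.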

Q6 Make a plot that displays the proportion of boys born over time. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response. Hint: You should be able to reuse your code from Exercise 3 above, just replace the dataframe name.

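ggplot(PresentBirthData, aes(x = year, y = boys / (boys + girls))) + geom_line(color = "darkred")

## Yes. The proportion of boys stays above 0.5 in every year (about 0.512 in 1961, for example), so Arbuthnot's observation that boys are born in greater proportion than girls holds up in the U.S.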
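A plot answers the question visually, and a short numerical check can back it up. The sketch below assumes the openintro package is installed. `prop_boys` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: share of boys among all births, one value per row.
prop_boys <- function(df) {
  df$boys / (df$boys + df$girls)
}

# Assumption: the openintro package used at the top of this report is installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  data("present", package = "openintro")
  p <- prop_boys(present)
  print(range(p))      # smallest and largest yearly proportion of boys
  print(all(p > 0.5))  # TRUE only if boys outnumber girls in every year
}
```

The same helper can be applied to `arbuthnot`, since both data frames have `boys` and `girls` columns.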

Q7 In what year did we see the most total number of births in the U.S.? Hint: First calculate the totals and save it as a new variable. Then, sort your dataset in descending order based on the total column. You can do this interactively in the data viewer by clicking on the arrows next to the variable names. To include the sorted result in your report you will need to use two new functions: arrange (for sorting). We can arrange the data in a descending order with another function: desc (for descending order). The sample code is provided below.

# Add the total number of births per year (vectorized, so no loop is needed)
PresentBirthData$total <- PresentBirthData$boys + PresentBirthData$girls

PresentBirthData[which.max(PresentBirthData$total),]
##    year    boys   girls   total
## 22 1961 2186274 2082052 4268326

The year with the highest total number of births is 1961, with 4,268,326 births.
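The hint in the question mentions `arrange` and `desc`, which the code above does not use. A minimal sketch of that approach follows, assuming the dplyr and openintro packages are installed. `sort_by_total` is a hypothetical helper name, not something the lab provides.

```r
library(dplyr)

# Hypothetical helper: add a `total` column and sort by it, largest first.
sort_by_total <- function(df) {
  df %>%
    mutate(total = boys + girls) %>%
    arrange(desc(total))
}

# Assumption: the openintro package used at the top of this report is installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  data("present", package = "openintro")
  # The first row is the year with the most total births.
  print(head(sort_by_total(present), 3))
}
```

Compared with `which.max`, sorting shows the runner-up years as well, which makes it easy to see how close the top years are to each other.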