library(tidyverse)
library(openintro)

Exercise 1

To extract all counts of the births of girls, I would simply use arbuthnot$girls.

arbuthnot$girls
##  [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617
## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013
## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822
## [46] 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127 7246 7119 7214 7101
## [61] 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626 7452 7061 7514 7656 7683
## [76] 5738 7779 7417 7687 7623 7380 7288
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_line(color = 'pink')

Exercise 2

The number of girls baptized generally increases over the period, but the plot resembles a portion of a sine wave, with a trough around 1650. If that wave-like pattern were to continue, the number of girls baptized in the 40 years after the end of the plot would decline gradually, although the data shown here cannot confirm this.

To add the new column, I followed the piping and mutate() steps from the exercise, although I personally find direct column assignment with $ more appealing. I have included that approach in the code block as a comment. Piping also seems much more versatile than attach() and detach().

arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls)

#arbuthnot$total <- arbuthnot$boys + arbuthnot$girls
ggplot(data = arbuthnot, aes(x = year, y = total)) +
  geom_line(color = 'chartreuse')+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

arbuthnot <- arbuthnot %>%
  mutate(boy_to_girl_ratio = boys / girls)
arbuthnot <- arbuthnot %>%
  mutate(boy_ratio = boys / total)
arbuthnot <- arbuthnot %>%
  mutate(girl_ratio = girls / total)

Exercise 3

Around the time that girls saw their lowest baptism numbers, the proportion of boys reached its highest spike, which suggests that the girls’ numbers recovered from whatever decreased them more slowly than the boys’ numbers did.

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'skyblue')+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)

data('present', package = 'openintro')

arbuthnot %>%
  summarize(min = min(boys), max = max(boys))
## # A tibble: 1 × 2
##     min   max
##   <int> <int>
## 1  2890  8426

Exercise 4

The data set covers the years 1629 to 1710. Its dimensions are 82 rows by 8 columns (after adding columns to the data frame). Note: I added a girl_ratio column as well. The column names are: year, boys, girls, total, boy_to_girl_ratio, boy_ratio, girl_ratio, and more_boys.

# First and last year, dimensions, and column names (no attach()/detach() needed)
y1 <- min(arbuthnot$year)
y2 <- max(arbuthnot$year)
dims <- dim(arbuthnot)
names <- colnames(arbuthnot)

y1
## [1] 1629
y2
## [1] 1710
dims
## [1] 82  8
names
## [1] "year"              "boys"              "girls"            
## [4] "total"             "boy_to_girl_ratio" "boy_ratio"        
## [7] "girl_ratio"        "more_boys"

Exercise 5

Present-day birth counts are far higher, so at a glance the two data sets look tremendously different. The ratios offer a much clearer comparison than the counts do. In the present data frame, the proportion of girls is increasing sharply, while that of boys is decreasing. Even so, in both data sets the share of girls never reaches 50% in any year: the maximum is about 49.7% in arbuthnot and about 48.9% in present.

In both samples, the number of boys exceeds that of girls in every single year.

present <- present %>%
  mutate(total = boys + girls)
present <- present %>%
  mutate(boy_to_girl_ratio = boys / girls)
present <- present %>%
  mutate(boy_ratio = boys / total)
present <- present %>%
  mutate(girl_ratio = girls / total)
present <- present %>%
  mutate(more_boys = boys > girls)

present %>%
  summarize(min = min(boys), max = max(boys))
## # A tibble: 1 × 2
##       min     max
##     <dbl>   <dbl>
## 1 1211684 2186274
max(arbuthnot$girl_ratio)
## [1] 0.4973459
max(present$girl_ratio)
## [1] 0.4888335
ggplot(data = present, aes(x = year, y = total)) +
  geom_line(color = 'darkgreen')+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = present, aes(x = year, y = girl_ratio)) +
  geom_line(color = 'pink')+
  ggtitle("Present Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = arbuthnot, aes(x = year, y = girl_ratio)) +
  geom_line(color = 'pink2')+
  ggtitle("Arbuthnot Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Exercise 6

The plots displayed here are the same ones I used in Exercise 5 to better grasp the comparison between the two data sets. As mentioned in Exercise 5, the number of boys always exceeds that of girls.

It is interesting to note that the two plotting strategies, boys / total and boys / girls, yield curves of nearly the same shape and differ mainly in the scale of the y-axis. Dare I say, they are proportionate.

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = arbuthnot, aes(x = year, y = boy_to_girl_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy to Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = present, aes(x = year, y = boy_to_girl_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy to Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Exercise 7

The most births in the US occurred in 1961. This makes sense because that year fell within the baby boom. In the plot below, we can see the sharp increase in births that coincides with the end of WWII.

ggplot(data = present, aes(x = year, y = total)) +
  geom_line(color = 'darkorange1')+
  ggtitle("Present Total Births")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

present[as.numeric(which(present$total == max(present$total))), 1]
## # A tibble: 1 × 1
##    year
##   <dbl>
## 1  1961
present %>%
  arrange(desc(total))
## # A tibble: 63 × 8
##     year    boys   girls   total boy_to_girl_rat… boy_ratio girl_ratio more_boys
##    <dbl>   <dbl>   <dbl>   <dbl>            <dbl>     <dbl>      <dbl> <lgl>    
##  1  1961 2186274 2082052 4268326             1.05     0.512      0.488 TRUE     
##  2  1960 2179708 2078142 4257850             1.05     0.512      0.488 TRUE     
##  3  1957 2179960 2074824 4254784             1.05     0.512      0.488 TRUE     
##  4  1959 2173638 2071158 4244796             1.05     0.512      0.488 TRUE     
##  5  1958 2152546 2051266 4203812             1.05     0.512      0.488 TRUE     
##  6  1962 2132466 2034896 4167362             1.05     0.512      0.488 TRUE     
##  7  1956 2133588 2029502 4163090             1.05     0.513      0.487 TRUE     
##  8  1990 2129495 2028717 4158212             1.05     0.512      0.488 TRUE     
##  9  1991 2101518 2009389 4110907             1.05     0.511      0.489 TRUE     
## 10  1963 2101632 1996388 4098020             1.05     0.513      0.487 TRUE     
## # … with 53 more rows
---
title: "Lab 1: Intro to R"
author: "Shane Hylton"
date: "`r Sys.Date()`"
output: openintro::lab_report
---

```{r load-packages, message=FALSE}
library(tidyverse)
library(openintro)
```

### Exercise 1

To extract all counts of the births of girls, I would simply use `arbuthnot$girls`.

```{r view-girls-counts}
arbuthnot$girls
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_point()+
  geom_smooth()

ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_line(color = 'pink')


```
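
As a small illustration of column extraction, the sketch below builds a toy data frame (the name `toy` is hypothetical; it holds only the first three `girls` counts from the `arbuthnot` output, with the years assumed to run consecutively from 1629) and shows that `$` and `[[ ]]` return the same vector.

```r
# Toy stand-in for arbuthnot: first three years and girls counts only.
toy <- data.frame(
  year  = 1629:1631,
  girls = c(4683, 4457, 4102)
)

# `$` and `[[ ]]` both return the column as a plain vector.
by_dollar  <- toy$girls
by_bracket <- toy[["girls"]]
identical(by_dollar, by_bracket)

# Single brackets keep the data frame structure instead.
class(toy["girls"])
```

The `[[ ]]` form is handy when the column name is stored in a variable, while `$` is shorter for interactive use.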


### Exercise 2

The number of girls baptized generally increases over the period, but the plot resembles a portion of a sine wave, with a trough around 1650. If that wave-like pattern were to continue, the number of girls baptized in the 40 years after the end of the plot would decline gradually, although the data shown here cannot confirm this.

To add the new column, I followed the piping and mutate() steps from the exercise, although I personally find direct column assignment with $ more appealing. I have included that approach in the code block as a comment. Piping also seems much more versatile than attach() and detach().


```{r trend-girls}
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls)

#arbuthnot$total <- arbuthnot$boys + arbuthnot$girls
ggplot(data = arbuthnot, aes(x = year, y = total)) +
  geom_line(color = 'chartreuse')+
  geom_smooth()

arbuthnot <- arbuthnot %>%
  mutate(boy_to_girl_ratio = boys / girls)
arbuthnot <- arbuthnot %>%
  mutate(boy_ratio = boys / total)
arbuthnot <- arbuthnot %>%
  mutate(girl_ratio = girls / total)

```
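
The separate `mutate()` calls in the chunk above could probably be collapsed into one. The sketch below is only an illustration under stated assumptions: `toy` is a hypothetical data frame with made-up counts chosen for easy arithmetic, not the real `arbuthnot` data. It also checks that the piped version and the base R `$` assignment produce the same `total` column.

```r
library(dplyr)

# Hypothetical toy data (made-up counts, not taken from arbuthnot).
toy <- data.frame(year = 1:3, boys = c(60, 55, 52), girls = c(40, 45, 48))

# One mutate() call; later columns can use earlier ones such as `total`.
piped <- toy %>%
  mutate(
    total = boys + girls,
    boy_to_girl_ratio = boys / girls,
    boy_ratio = boys / total,
    girl_ratio = girls / total
  )

# Base R equivalent for the first column, as in the commented-out line.
base <- toy
base$total <- base$boys + base$girls

identical(piped$total, base$total)
```

Because `mutate()` evaluates its arguments in order, `boy_ratio` and `girl_ratio` can refer to `total` within the same call.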


### Exercise 3

Around the time that girls saw their lowest baptism numbers, the proportion of boys reached its highest spike, which suggests that the girls' numbers recovered from whatever decreased them more slowly than the boys' numbers did.

```{r plot-prop-boys-arbuthnot}
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'skyblue')+
  geom_smooth()

arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)

data('present', package = 'openintro')

arbuthnot %>%
  summarize(min = min(boys), max = max(boys))
```


### Exercise 4

The data set covers the years 1629 to 1710.
Its dimensions are 82 rows by 8 columns (after adding columns to the data frame).
Note: I added a girl_ratio column as well.
The column names are: year, boys, girls, total, boy_to_girl_ratio, boy_ratio, girl_ratio, and more_boys.


```{r dim-present}
# First and last year, dimensions, and column names (no attach()/detach() needed)
y1 <- min(arbuthnot$year)
y2 <- max(arbuthnot$year)
dims <- dim(arbuthnot)
names <- colnames(arbuthnot)

y1
y2
dims
names

```
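
A compact way to answer the same three questions is sketched below with base R only. The data frame `toy` and the variable names are hypothetical; it reuses the first three `girls` counts from the `arbuthnot` output, with years assumed to run consecutively from 1629.

```r
# Toy stand-in for arbuthnot: first three years and girls counts only.
toy <- data.frame(year = 1629:1631, girls = c(4683, 4457, 4102))

year_range <- range(toy$year)  # c(min, max) in a single call
dims       <- dim(toy)         # number of rows, then number of columns
col_names  <- names(toy)       # column names as a character vector

year_range
dims
col_names
```

Storing the column names in `col_names` rather than `names` avoids masking the base function of the same name.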


### Exercise 5

Present-day birth counts are far higher, so at a glance the two data sets look tremendously different.
The ratios offer a much clearer comparison than the counts do. In the present data frame, the proportion of girls is increasing sharply, while that of boys is decreasing. Even so, in both data sets the share of girls never reaches 50% in any year: the maximum is about 49.7% in arbuthnot and about 48.9% in present.

In both samples, the number of boys exceeds that of girls in every single year. 



```{r count-compare}
present <- present %>%
  mutate(total = boys + girls)
present <- present %>%
  mutate(boy_to_girl_ratio = boys / girls)
present <- present %>%
  mutate(boy_ratio = boys / total)
present <- present %>%
  mutate(girl_ratio = girls / total)
present <- present %>%
  mutate(more_boys = boys > girls)

present %>%
  summarize(min = min(boys), max = max(boys))

max(arbuthnot$girl_ratio)
max(present$girl_ratio)


ggplot(data = present, aes(x = year, y = total)) +
  geom_line(color = 'darkgreen')+
  geom_smooth()

ggplot(data = present, aes(x = year, y = girl_ratio)) +
  geom_line(color = 'pink')+
  ggtitle("Present Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = arbuthnot, aes(x = year, y = girl_ratio)) +
  geom_line(color = 'pink2')+
  ggtitle("Arbuthnot Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
```
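
The claim that boys outnumber girls in every year can be checked directly rather than read off a plot. The sketch below is a minimal illustration: `toy` is a hypothetical data frame holding just three rows copied from the `present` table printed in this report, so it demonstrates the check rather than reproducing the full result.

```r
# Three rows copied from the `present` table in this report.
toy <- data.frame(
  year  = c(1961, 1960, 1957),
  boys  = c(2186274, 2179708, 2179960),
  girls = c(2082052, 2078142, 2074824)
)
toy$girl_ratio <- toy$girls / (toy$boys + toy$girls)

all(toy$boys > toy$girls)    # TRUE only if boys lead in every row
max(toy$girl_ratio) < 0.5    # the equivalent check on the proportion
sum(toy$boys <= toy$girls)   # number of years where girls match or lead
```

Applied to the full data frames, the same `all()` call on the `more_boys` column would give a one-line answer for each data set.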


### Exercise 6

The plots displayed here are the same ones I used in Exercise 5 to better grasp the comparison between the two data sets. As mentioned in Exercise 5, the number of boys always exceeds that of girls.

It is interesting to note that the two plotting strategies, boys / total and boys / girls, yield curves of nearly the same shape and differ mainly in the scale of the y-axis. Dare I say, they are *proportionate*.

```{r plot-prop-boys-present}

ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = present, aes(x = year, y = boy_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = arbuthnot, aes(x = year, y = boy_to_girl_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Arbuthnot Boy to Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

ggplot(data = present, aes(x = year, y = boy_to_girl_ratio)) +
  geom_line(color = 'aquamarine')+
  ggtitle("Present Boy to Girl Ratio")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()
```
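
The similarity between the two kinds of plot has a simple algebraic reason. Writing $r$ for boys / girls and $p$ for boys / total, dividing the numerator and denominator of boys / (boys + girls) by girls gives $p = r / (1 + r)$, which increases whenever $r$ increases. The sketch below checks this on a hypothetical data frame `toy` built from three rows of the `present` table printed in this report.

```r
# Three rows copied from the `present` table in this report.
toy <- data.frame(
  year  = c(1961, 1960, 1957),
  boys  = c(2186274, 2179708, 2179960),
  girls = c(2082052, 2078142, 2074824)
)

r <- toy$boys / toy$girls                # boy-to-girl ratio
p <- toy$boys / (toy$boys + toy$girls)   # proportion of boys

# p can be recovered from r alone.
all.equal(p, r / (1 + r))

# Because the mapping is increasing, both measures rank the years the same way.
identical(order(r), order(p))
```

The two measures are therefore not strictly proportional, but over the narrow range of values seen here the curves look almost the same apart from the y-axis scale.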


### Exercise 7

The most births in the US occurred in 1961. This makes sense because that year fell within the baby boom. In the plot below, we can see the sharp increase in births that coincides with the end of WWII.

```{r find-max-total}
ggplot(data = present, aes(x = year, y = total)) +
  geom_line(color = 'darkorange1')+
  ggtitle("Present Total Births")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth()

present[as.numeric(which(present$total == max(present$total))), 1]

present %>%
  arrange(desc(total))
```
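
A shorter way to look up the peak year is `which.max()`, which returns the position of the first maximum. The sketch below is an illustration only: `toy` and `peak_year` are hypothetical names, and the three rows are copied from the `present` output in this report.

```r
# Three rows copied from the `present` table, deliberately not sorted by total.
toy <- data.frame(
  year  = c(1957, 1960, 1961),
  total = c(4254784, 4257850, 4268326)
)

# Index of the largest total, then the year stored at that position.
peak_year <- toy$year[which.max(toy$total)]
peak_year
```

If several years tied for the maximum, `which.max()` would report only the first, whereas the `which(total == max(total))` form used above would return all of them.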

