library(stringr)
library(knitr)
data <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv"))
births <- as.data.frame(data)
head(births)
##   year month date_of_month day_of_week births
## 1 1994     1             1           6   8096
## 2 1994     1             2           7   7772
## 3 1994     1             3           1  10142
## 4 1994     1             4           2  11248
## 5 1994     1             5           3  11053
## 6 1994     1             6           4  11406
reces <- read.csv("/Users/christinakasman/Desktop/Real_time_decision_rules.csv")
head(reces)
##   Date.of.release Date.described index declaration our.dates NBER.dates
## 1        May 1968        1967:Q4   3.8                    NA         NA
## 2        Aug 1968        1968:Q1   1.8                    NA         NA
## 3        Nov 1968        1968:Q2   1.2                    NA         NA
## 4        Feb 1969        1968:Q3   2.3                    NA         NA
## 5        May 1969        1968:Q4   6.3                    NA         NA
## 6        Aug 1969        1969:Q1  13.0                    NA         NA
##   hyperlink
## 1          
## 2          
## 3          
## 4          
## 5          
## 6

Research Question: A recession is commonly defined as two consecutive quarters of negative growth in a country's Gross Domestic Product. What are the effects of a U.S. recession? Specifically, do U.S. recessions affect the number of births in the United States? I found one data set with U.S. recession dates and another with the number of births in the U.S., and I plan to see whether the two are correlated. I will focus mainly on the birth data to determine whether any changes are related to a recession. In particular, I will look at the year after a recession to see whether the recession changed parents' decisions to have children.

What are the cases, and how many are there? Each case in the birth data set is one day's count of births; the analysis below aggregates these counts by month and year. In total, 39,722,137 births are recorded from 1994 through 2003.

sum(births$births)
## [1] 39722137
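
The number of cases and the number of births are different quantities: the first is a row count, the second is a column sum. A minimal sketch of the distinction, using only the six rows printed by `head(births)` above (the name `first_rows` is made up for this illustration):

```r
# The six rows shown by head(births) above
first_rows <- data.frame(
  year          = rep(1994, 6),
  month         = rep(1, 6),
  date_of_month = 1:6,
  day_of_week   = c(6, 7, 1, 2, 3, 4),
  births        = c(8096, 7772, 10142, 11248, 11053, 11406)
)

n_cases      <- nrow(first_rows)        # cases: one row per day
total_births <- sum(first_rows$births)  # births summed over those days

n_cases
total_births
```

On the full data set, `nrow(births)` would give the number of cases in the same way.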

Describe the method of data collection. Each state and the National Center for Health Statistics (NCHS) collect the data from standard birth certificate forms.

What type of study is this (observational/experiment)? This is an observational study that uses data recorded from 1994 to 2003.

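If not, provide a citation/link. The birth data come from FiveThirtyEight's data repository on GitHub: https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv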

What is the response variable, and what type is it (numerical/categorical)? The response variable is the number of births, which is numerical.

What is the explanatory variable, and what type is it (numerical/categorical)? The explanatory variable is the years in which a recession occurs - numerical
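
One way the recession information could be attached to the birth data is as a two-level indicator per month. The sketch below is only an illustration under stated assumptions: it takes the 2001 recession window as March 2001 through November 2001 (the NBER peak and trough months, which are not taken from the recession file shown above), and the names `months`, `rec_start`, and `rec_end` are made up for this example.

```r
# All year-month combinations covered by the birth data (1994-2003)
months <- expand.grid(month = 1:12, year = 1994:2003)
months$month_start <- as.Date(sprintf("%d-%02d-01", months$year, months$month))

# Assumed recession window: NBER peak (March 2001) to trough (November 2001)
rec_start <- as.Date("2001-03-01")
rec_end   <- as.Date("2001-11-01")

# Flag each month as inside or outside the assumed window
months$recession <- months$month_start >= rec_start & months$month_start <= rec_end
months$period <- factor(ifelse(months$recession, "recession", "no recession"))

table(months$period)
```

Encoded this way the explanatory variable has two levels, which would suit a comparison of group means.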

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

hist(births$births, main = "Births per day from 1994-2003", xlab = "Births")

#summary statistics for the number of births per day
summary(births$births)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6443    8844   11620   10880   12270   14540
qqnorm(births$births)
qqline(births$births)

#Not a normal distribution
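
One possible reason for the departure from normality is a gap between weekend and weekday births. The sketch below is only suggestive: it uses the six rows printed by `head(births)`, and it assumes that `day_of_week` values 6 and 7 are Saturday and Sunday.

```r
# day_of_week and births from the six rows shown by head(births)
first_rows <- data.frame(
  day_of_week = c(6, 7, 1, 2, 3, 4),
  births      = c(8096, 7772, 10142, 11248, 11053, 11406)
)

# Assumption: 6 and 7 code Saturday and Sunday
first_rows$weekend <- first_rows$day_of_week %in% c(6, 7)

# Mean births per day for weekdays (FALSE) and weekend days (TRUE)
group_means <- tapply(first_rows$births, first_rows$weekend, mean)
group_means
```

If the same gap holds across the full data set, the daily counts would be a mixture of two groups, which a single normal curve would not describe well.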
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
births$date <- paste(births$month, births$year)
births1 <- births %>% group_by(date) %>% summarise(monthlybirths = sum(as.numeric(births)))
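# Note: "%M" is the minutes code, not the month code ("%m"), so this call effectively sorts by year only; months within a year stay in alphabetical order
births1[order(as.Date(births1$date, format="%M %Y")),]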
## # A tibble: 120 x 2
##       date monthlybirths
##      <chr>         <dbl>
##  1  1 1994        320705
##  2 10 1994        330172
##  3 11 1994        319397
##  4 12 1994        326748
##  5  2 1994        301327
##  6  3 1994        339736
##  7  4 1994        317392
##  8  5 1994        330295
##  9  6 1994        329737
## 10  7 1994        345862
## # ... with 110 more rows
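
The rows above run 1, 10, 11, 12, 2, ... within 1994, so the months are in alphabetical rather than calendar order. In `as.Date()` formats, `%M` stands for minutes and `%m` for the month, and a month can only be parsed reliably when a day is supplied as well. A minimal sketch of a chronological sort, using the ten rows printed above (the names `monthly` and `month_start` are made up for this illustration):

```r
# Monthly totals copied from the ten rows printed above
monthly <- data.frame(
  date = c("1 1994", "10 1994", "11 1994", "12 1994", "2 1994",
           "3 1994", "4 1994", "5 1994", "6 1994", "7 1994"),
  monthlybirths = c(320705, 330172, 319397, 326748, 301327,
                    339736, 317392, 330295, 329737, 345862),
  stringsAsFactors = FALSE
)

# Prepend a day so each string is a complete date, and use %m for the month
monthly$month_start <- as.Date(paste("1", monthly$date), format = "%d %m %Y")

# Order rows by the parsed date
monthly_sorted <- monthly[order(monthly$month_start), ]
monthly_sorted$date
```

The same two lines applied to `births1` would order all 120 months chronologically.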
hist(births1$monthlybirths, main = "Births per month from 1994-2003", xlab = "Births")

#Births per month look approximately normally distributed
summary(births1$monthlybirths)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  291500  319700  330600  331000  342800  364200

The next step is to look at the number of births in the year after a recession and compare it against the mean number of births per month.
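
That comparison could be sketched as follows. The overall mean is computed from the totals reported above (39,722,137 births over 120 months). The helper `diff_from_overall` is hypothetical, and the ten 1994 monthly totals printed earlier serve only as stand-in input, since the post-recession monthly totals are not shown in this document.

```r
# Mean monthly births over the full data set, from the totals reported above
overall_mean <- 39722137 / 120

# Hypothetical helper: a period's mean monthly births minus the overall mean
diff_from_overall <- function(monthly_births, baseline = overall_mean) {
  mean(monthly_births) - baseline
}

# Stand-in input: the ten 1994 monthly totals printed earlier
stand_in <- c(320705, 330172, 319397, 326748, 301327,
              339736, 317392, 330295, 329737, 345862)

diff_from_overall(stand_in)
```

For the actual analysis, the stand-in vector would be replaced by the twelve monthly totals for the year following the recession. A negative result would mean fewer births per month than the ten-year average.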