library(stringr)
library(knitr)
data <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv"))
births <- as.data.frame(data)
head(births)
## year month date_of_month day_of_week births
## 1 1994 1 1 6 8096
## 2 1994 1 2 7 7772
## 3 1994 1 3 1 10142
## 4 1994 1 4 2 11248
## 5 1994 1 5 3 11053
## 6 1994 1 6 4 11406
reces <- read.csv("/Users/christinakasman/Desktop/Real_time_decision_rules.csv")
head(reces)
## Date.of.release Date.described index declaration our.dates NBER.dates
## 1 May 1968 1967:Q4 3.8 NA NA
## 2 Aug 1968 1968:Q1 1.8 NA NA
## 3 Nov 1968 1968:Q2 1.2 NA NA
## 4 Feb 1969 1968:Q3 2.3 NA NA
## 5 May 1969 1968:Q4 6.3 NA NA
## 6 Aug 1969 1969:Q1 13.0 NA NA
## hyperlink
## 1
## 2
## 3
## 4
## 5
## 6
Research Question: A recession is commonly defined as two consecutive quarters of negative growth in a country's Gross Domestic Product. What are the effects of a US recession? Specifically, do US recessions affect the number of births in the United States? I found a data set of US recession dates and a data set of US births, and I plan to see whether there is any correlation between the two. I will mainly focus on the birth data to determine whether there are any changes related to a recession. In particular, I will look at the year following a recession to see whether the recession changed parents' decisions to have children.
What are the cases, and how many are there? Each case in the birth data set is the number of births on a single day, which I aggregate below into 120 monthly totals. In total, 39,722,137 births are recorded from 1994-2003.
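The two-consecutive-quarters rule can be illustrated with a minimal sketch. The growth figures below are made up for illustration, and `flag_recession()` is a hypothetical helper, not part of either data set or of the recession file's own dating method.

```r
# Hypothetical quarterly GDP growth rates (percent); illustrative values only,
# not taken from either data set.
growth <- data.frame(
  quarter = 1:8,
  gdp_growth = c(1.2, -0.4, 0.8, -0.3, -1.1, -0.2, 0.5, 1.0)
)

# flag_recession() is a hypothetical helper: it returns TRUE for every quarter
# that belongs to a run of at least two consecutive negative-growth quarters.
flag_recession <- function(g) {
  runs <- rle(g < 0)                               # runs of negative / non-negative growth
  in_recession <- runs$values & runs$lengths >= 2  # keep negative runs of length >= 2
  rep(in_recession, runs$lengths)                  # expand back to one flag per quarter
}

growth$recession <- flag_recession(growth$gdp_growth)
growth
```

The single negative quarter (quarter 2) is not flagged, while the three-quarter run (quarters 4 to 6) is.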
sum(births$births)
## [1] 39722137
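Describe the method of data collection. The data are collected by each state and the National Center for Health Statistics (NCHS) from standard birth certificate forms.
What type of study is this (observational/experiment)? This is an observational study using data recorded from 1994-2003.
If not, provide a citation/link. The birth data come from the FiveThirtyEight data repository: https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv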
What is the response variable, and what type is it (numerical/categorical)? Response variable is number of births - numerical
What is the explanatory variable, and what type is it (numerical/categorical)? Explanatory variable is whether a year falls in or follows a recession - categorical
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
hist(births$births, main = "Births per day from 1994-2003", xlab = "Births")
# Summary statistics for the number of births per day
summary(births$births)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6443 8844 11620 10880 12270 14540
qqnorm(births$births)
qqline(births$births)
#Not a normal distribution
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
births$date <- paste(births$month, births$year)
births1 <- births %>% group_by(date) %>% summarise(monthlybirths = sum(as.numeric(births)))
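# Caution: "%M" is the minutes specifier ("%m" is the month), so the months
# are not sorted chronologically here, as the output below shows.
births1[order(as.Date(births1$date, format="%M %Y")),]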
## # A tibble: 120 x 2
## date monthlybirths
## <chr> <dbl>
## 1 1 1994 320705
## 2 10 1994 330172
## 3 11 1994 319397
## 4 12 1994 326748
## 5 2 1994 301327
## 6 3 1994 339736
## 7 4 1994 317392
## 8 5 1994 330295
## 9 6 1994 329737
## 10 7 1994 345862
## # ... with 110 more rows
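The output above is not in calendar order: October follows January. One way to sort chronologically, sketched here on a small stand-in for `births1` built from a few of the totals printed above, is to prepend a day of the month and parse with `%m`. The names `monthly` and `month_start` are assumptions introduced for this example.

```r
# Small stand-in for births1, using six of the monthly totals printed above.
monthly <- data.frame(
  date = c("1 1994", "10 1994", "11 1994", "12 1994", "2 1994", "3 1994"),
  monthlybirths = c(320705, 330172, 319397, 326748, 301327, 339736),
  stringsAsFactors = FALSE
)

# Prepend a day so the string is a complete date, and use %m (month)
# rather than %M (minutes).
monthly$month_start <- as.Date(paste(1, monthly$date), format = "%d %m %Y")

# Sort by the parsed date.
monthly <- monthly[order(monthly$month_start), ]
monthly
```

Applied to the full `births1`, the same two lines should give January through December within each year.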
hist(births1$monthlybirths, main = "Births per month from 1994-2003", xlab = "Births")
# Births per month appear approximately normally distributed
summary(births1$monthlybirths)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 291500 319700 330600 331000 342800 364200
Next step: compare the number of births in the year after a recession against the overall mean births per month.
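A minimal sketch of that comparison follows. It assumes the recession of interest is the one dated to 2001, so the following year is 2002; that choice, the helper `compare_year()`, and the toy numbers are all assumptions for illustration, not results from the birth data.

```r
# compare_year() is a hypothetical helper, not part of the original analysis.
# monthly: data frame with columns year and monthlybirths.
compare_year <- function(monthly, target_year) {
  overall_mean <- mean(monthly$monthlybirths)
  target_mean <- mean(monthly$monthlybirths[monthly$year == target_year])
  c(
    overall_mean = overall_mean,
    target_mean = target_mean,
    difference = target_mean - overall_mean
  )
}

# Made-up monthly totals for illustration only; the real input would be
# births1 with the year split out of its date column.
toy <- data.frame(
  year = rep(c(2000, 2001, 2002), each = 2),
  monthlybirths = c(340000, 344000, 336000, 338000, 330000, 334000)
)

result <- compare_year(toy, target_year = 2002)
result
```

A negative `difference` would mean fewer births per month in the target year than on average over the whole period. On its own it would not show that the recession caused the change.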