##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
##
## | year| month| date_of_month| day_of_week| births|
## |----:|-----:|-------------:|-----------:|------:|
## | 2000|     1|             1|           6|   9083|
## | 2000|     1|             2|           7|   8006|
## | 2000|     1|             3|           1|  11363|
## | 2000|     1|             4|           2|  13032|
## | 2000|     1|             5|           3|  12558|
## | 2000|     1|             6|           4|  12466|
colnames(birth_data)
## [1] "year" "month" "date_of_month" "day_of_week"
## [5] "births"
## [1] "data.frame"
##   Year Total_Births
## 1 2000      4149598
## 2 2001      4110963
## 3 2002      4099313
## 4 2003      4163060
## 5 2004      4186863
## 6 2005      4211941
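The code that produced the yearly totals above is not shown. The following is a minimal sketch of one way such a table could be built with dplyr, assuming the daily data has the `year` and `births` columns listed by `colnames()`. The names `birth_sample` and `yearly_sample` are hypothetical, and only the six daily rows printed above are used, so the total covers six days rather than a full year.

```r
library(dplyr)

# The six daily rows printed above (illustration only, not the full data set)
birth_sample <- data.frame(
  year          = rep(2000, 6),
  month         = rep(1, 6),
  date_of_month = 1:6,
  day_of_week   = c(6, 7, 1, 2, 3, 4),
  births        = c(9083, 8006, 11363, 13032, 12558, 12466)
)

# Sum daily births within each year, then rename the grouping column
yearly_sample <- birth_sample %>%
  group_by(year) %>%
  summarise(Total_Births = sum(births)) %>%
  rename(Year = year) %>%
  as.data.frame()

print(yearly_sample)
```

Applied to the full daily data, the same pipeline would return one row per year.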
# Regress yearly total births on year
model <- lm(Total_Births ~ Year, data = birthdata)
summary(model)
##
## Call:
## lm(formula = Total_Births ~ Year, data = birthdata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -107161  -87672  -50894   55664  234982
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28337549   14338700   1.976   0.0697 .
## Year          -12054       7144  -1.687   0.1154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 119500 on 13 degrees of freedom
## Multiple R-squared: 0.1796, Adjusted R-squared: 0.1165
## F-statistic: 2.847 on 1 and 13 DF, p-value: 0.1154
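As a rough check on how to read the slope in the summary above, the sketch below recomputes the t statistic, the two-sided p-value, and a 95% confidence interval from the reported (rounded) estimate and standard error. The variable names are hypothetical, and because the inputs are rounded the results should only match the printed output approximately.

```r
# Values copied from the summary output (rounded as printed)
estimate  <- -12054   # slope for Year
std_error <- 7144     # standard error of the slope
df_resid  <- 13       # residual degrees of freedom

# t statistic and two-sided p-value for H0: slope = 0
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval: estimate +/- t quantile * standard error
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

print(c(t = t_stat, p = p_value))
print(ci_95)
```

If the interval contains zero, the data are compatible with no linear trend in yearly births, which is the same conclusion as a p-value above 0.05.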
library(ggplot2)

# Scatterplot of yearly births with the fitted least-squares line
ggplot(data = birthdata, aes(x = Year, y = Total_Births)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
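# Residuals of the fitted model, plotted against the observed yearly births
res <- resid(model)
plot(birthdata$Total_Births, res,
     ylab = "Residuals", xlab = "Births")
abline(h = 0)  # horizontal reference line at zero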
library(performance)  # provides check_model()
check_model(model)
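The study is based on the number of US births per year. The scatterplot does not show a clear linear relationship between year and total births, and the regression output agrees: the estimated slope of about -12,054 births per year is not statistically significant (p = 0.1154), and the model explains only about 18% of the variation in yearly births (R-squared = 0.1796). The residuals also do not appear to be normally distributed: in the diagnostic plots above, the residual points do not fall along the reference line. The variability of the residuals appears to be roughly constant across the years.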
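The normality and constant-variance statements above are read off plots. A minimal sketch of how they could also be checked in base R follows. It refits the model on only the six yearly totals printed earlier (the full data has 15 years), so its numbers will not match the reported model; `birth_subset`, `fit`, and `shapiro_result` are hypothetical names.

```r
# Only the six yearly totals printed in the output above (2000 to 2005)
birth_subset <- data.frame(
  Year         = 2000:2005,
  Total_Births = c(4149598, 4110963, 4099313, 4163060, 4186863, 4211941)
)

fit <- lm(Total_Births ~ Year, data = birth_subset)

# Normality: Q-Q plot of the residuals plus a Shapiro-Wilk test
qqnorm(resid(fit))
qqline(resid(fit))
shapiro_result <- shapiro.test(resid(fit))
print(shapiro_result)

# Constant variance: residuals against fitted values should show
# no funnel shape or other systematic pattern around zero
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With so few observations, both the test and the plots have little power, so they are best treated as a complement to the `check_model()` panels rather than a replacement.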