Source file ⇒ Holiday_Birthdays.Rmd
1. Create a new data table, DailyBirths, that will add all the births for each day across all the states. Plot out daily births vs date.
| date | total |
|---|---|
| 1969-01-01 | 8486 |
| 1969-01-02 | 9002 |
| 1969-01-03 | 9542 |
| 1969-01-04 | 8960 |
| 1969-01-05 | 8390 |
| 1969-01-06 | 9560 |
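A minimal sketch of how a table like DailyBirths could be built, assuming the state-level source table is called Birthdays and has `state`, `date`, and `births` columns. The toy rows below are made up for illustration and are not the real data.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the state-level table (values are invented).
Birthdays <- data.frame(
  state  = rep(c("AK", "AL", "AR"), times = 3),
  date   = as.Date(rep(c("1969-01-01", "1969-01-02", "1969-01-03"), each = 3)),
  births = c(14, 174, 78, 20, 190, 85, 20, 180, 100)
)

# Sum births over all states for each date.
DailyBirths <- Birthdays %>%
  group_by(date) %>%
  summarise(total = sum(births))

# Daily births against date.
p <- ggplot(DailyBirths, aes(x = date, y = total)) +
  geom_point()
```

Printing `p` draws the plot; with the full data set, the same pipeline yields one row per calendar date.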
2. To examine seasonality in birth rates, look at the number of births aggregated over all the years by
a) each week
| Week | total |
|---|---|
| 1 | 1252303 |
| 2 | 1307253 |
| 3 | 1321998 |
| 4 | 1319819 |
| 5 | 1314877 |
| 6 | 1322192 |
b) each month
| Month | total |
|---|---|
| 1 | 5759165 |
| 2 | 5362585 |
| 3 | 5868501 |
| 4 | 5560775 |
| 5 | 5785348 |
| 6 | 5758571 |
c) each Julian day
| Julianday | total |
|---|---|
| 1 | 160369 |
| 2 | 169896 |
| 3 | 180036 |
| 4 | 182854 |
| 5 | 184145 |
| 6 | 186726 |
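The three aggregations above can be sketched with lubridate's `week()`, `month()`, and `yday()` helpers. This is an illustration on a made-up four-row DailyBirths table; the column names Week, Month, and Julianday are chosen to match the tables above.

```r
library(dplyr)
library(lubridate)

# Toy daily totals spanning two years (values are invented).
DailyBirths <- data.frame(
  date  = as.Date(c("1969-01-01", "1969-01-08", "1969-02-01", "1970-01-01")),
  total = c(100, 200, 300, 400)
)

# Derive the calendar components once, then group by each in turn.
WithParts <- DailyBirths %>%
  mutate(
    Week      = week(date),   # week of the year, 1 to 53
    Month     = month(date),  # month of the year, 1 to 12
    Julianday = yday(date)    # day of the year, 1 to 366
  )

ByWeek      <- WithParts %>% group_by(Week)      %>% summarise(total = sum(total))
ByMonth     <- WithParts %>% group_by(Month)     %>% summarise(total = sum(total))
ByJulianday <- WithParts %>% group_by(Julianday) %>% summarise(total = sum(total))
```

Because the grouping ignores the year, 1969-01-01 and 1970-01-01 land in the same Week, Month, and Julianday rows, which is what "aggregated over all the years" asks for.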
3. To examine patterns within the week, look at the number of births by day of the week.
| Week | Wday | total |
|---|---|---|
| 1 | 1 | 160224 |
| 1 | 2 | 183159 |
| 1 | 3 | 188855 |
| 1 | 4 | 185469 |
| 1 | 5 | 182825 |
| 1 | 6 | 185522 |
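One way to total births by day of the week is sketched below, assuming lubridate's default numbering in which 1 is Sunday. It groups by Wday alone; the table above additionally splits by Week, which would correspond to `group_by(Week, Wday)`. The toy values are invented.

```r
library(dplyr)
library(lubridate)

# Two Wednesdays and two Sundays (values are invented).
DailyBirths <- data.frame(
  date  = as.Date(c("1969-01-01", "1969-01-05", "1969-01-08", "1969-01-12")),
  total = c(100, 60, 110, 70)
)

# wday() returns 1 for Sunday through 7 for Saturday by default.
ByWday <- DailyBirths %>%
  mutate(Wday = wday(date)) %>%
  group_by(Wday) %>%
  summarise(total = sum(total))
```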
4. Pick a two-year span of the Birthdays that falls in the 1980s, say, 1980/1981. Extract the data for just this interval, calling it MyTwoYears. Plot the births in this two-year span day by day. Color each date according to its day of the week. Explain the pattern that you see.
| date | wday | total |
|---|---|---|
| 1980-01-01 | Tues | 4576 |
| 1980-01-02 | Wed | 4112 |
| 1980-01-03 | Thurs | 5544 |
| 1980-01-04 | Fri | 4411 |
| 1980-01-05 | Sat | 4725 |
| 1980-01-06 | Sun | 3656 |
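A sketch of the extraction and plot, assuming DailyBirths has `date` and `total` columns as in step 1. The toy rows are invented and chosen so that two of them fall outside the 1980/1981 span.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Toy daily totals straddling the span of interest (values are invented).
DailyBirths <- data.frame(
  date  = as.Date(c("1979-12-31", "1980-01-01", "1981-12-31", "1982-01-01")),
  total = c(9000, 8000, 9500, 8100)
)

# %in% tests membership; comparing with == against c(1980, 1981) would
# recycle the two values along the column and silently drop rows.
MyTwoYears <- DailyBirths %>%
  filter(year(date) %in% c(1980, 1981)) %>%
  mutate(wday = wday(date, label = TRUE))

# One point per day, colored by day of the week.
p <- ggplot(MyTwoYears, aes(x = date, y = total, color = wday)) +
  geom_point()
```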
5. A few days each year don’t follow the pattern in (4). We’re going to examine the hypothesis that these are holidays. You can find a data set listing US federal holidays at “http://tiny.cc/dcf/US-Holidays.csv”.
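A sketch of loading such a file into a Holidays table. The layout used here (columns `holiday`, `date`, `year`, with day-first date strings) is an assumption, and the three rows are illustrative rather than copied from the real file. Inline text stands in for the download so the example runs offline; in practice the URL would be passed to `read.csv()` in place of `text =`.

```r
library(dplyr)
library(lubridate)

# Assumed layout: holiday name, day-first date string, year.
# These rows are illustrative, not taken from the real file.
csv_text <- "holiday,date,year
New Year's Day,1/1/1980,1980
Independence Day,4/7/1980,1980
Christmas Day,25/12/1980,1980"

# Parse the date strings into Date objects so they can be joined
# and plotted against the dates in MyTwoYears.
Holidays <- read.csv(text = csv_text, stringsAsFactors = FALSE) %>%
  mutate(date = dmy(date))
```

If the real file stores dates in a different order, `mdy()` or `ymd()` would replace `dmy()`.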
6. Add a couple of layers to your plot from (4).
a) geom_vline() glyph. You can give a data = argument to geom_vline() to tell it to plot the information from Holidays rather than MyTwoYears.
## Warning in c(1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
## 1986L, : longer object length is not a multiple of shorter object length
b) geom_text() glyph.
## Warning in c(1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
## 1986L, : longer object length is not a multiple of shorter object length
## Warning: Removed 713 rows containing missing values (geom_text).
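A sketch of the two extra layers, assuming Holidays has `holiday` and `date` columns and has already been restricted to the plotted span. The six daily totals are the ones shown in the table for step 4; the label height `y = 6000` is an arbitrary choice for this toy range.

```r
library(ggplot2)

# First six days of the span, as in the table for step 4.
MyTwoYears <- data.frame(
  date  = as.Date("1980-01-01") + 0:5,
  total = c(4576, 4112, 5544, 4411, 4725, 3656)
)

# Holiday rows only, limited to the plotted span.
Holidays <- data.frame(
  holiday = "New Year's Day",
  date    = as.Date("1980-01-01")
)

p <- ggplot(MyTwoYears, aes(x = date, y = total)) +
  geom_point() +
  # A vertical line at each holiday date, drawn from Holidays.
  geom_vline(data = Holidays, aes(xintercept = date)) +
  # A rotated label at each line; inherit.aes = FALSE keeps this layer
  # from looking up MyTwoYears columns that Holidays does not have.
  geom_text(
    data = Holidays,
    aes(x = date, y = 6000, label = holiday),
    angle = 90, hjust = 1, vjust = -0.5,
    inherit.aes = FALSE
  )
```

Giving geom_text() a table that contains only holiday rows means no label rows with a missing holiday name are passed to it. In general, a "longer object length is not a multiple of shorter object length" warning arises when two vectors of incompatible lengths are combined element-wise, for example when `==` is used where `%in%` was intended.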
7. Join MyTwoYears and Holidays.
| date | wday | total | holiday | year |
|---|---|---|---|---|
| 1980-01-01 | Tues | 4576 | New Year’s Day | 1980 |
| 1980-01-02 | Wed | 4112 | NA | NA |
| 1980-01-03 | Thurs | 5544 | NA | NA |
| 1980-01-04 | Fri | 4411 | NA | NA |
| 1980-01-05 | Sat | 4725 | NA | NA |
| 1980-01-06 | Sun | 3656 | NA | NA |
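The join can be sketched with dplyr's `left_join()`, which keeps every row of MyTwoYears and fills the holiday columns with NA where no holiday matches. The column names follow the table above; the result name Joined is hypothetical.

```r
library(dplyr)

# First three days of the span, as in the table above.
MyTwoYears <- data.frame(
  date  = as.Date("1980-01-01") + 0:2,
  wday  = c("Tues", "Wed", "Thurs"),
  total = c(4576, 4112, 5544)
)

Holidays <- data.frame(
  holiday = "New Year's Day",
  date    = as.Date("1980-01-01"),
  year    = 1980
)

# Keep all days; non-holidays get NA in holiday and year.
Joined <- MyTwoYears %>%
  left_join(Holidays, by = "date")
```

An `inner_join()` would instead keep only the holiday rows, which would lose the ordinary days needed for comparison.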
8. Use mutate() to add a variable, is_holiday, that is “yes” or “no” depending on whether the day is a holiday. An appropriate argument to mutate() would be is_holiday = ifelse(is.na(holiday), "no", "yes").
| date | wday | total | holiday | year | is_holiday |
|---|---|---|---|---|---|
| 1980-01-01 | Tues | 4576 | New Year’s Day | 1980 | yes |
| 1980-01-02 | Wed | 4112 | NA | NA | no |
| 1980-01-03 | Thurs | 5544 | NA | NA | no |
| 1980-01-04 | Fri | 4411 | NA | NA | no |
| 1980-01-05 | Sat | 4725 | NA | NA | no |
| 1980-01-06 | Sun | 3656 | NA | NA | no |
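Applied to a small joined table, the step looks like the sketch below. The table name Joined is hypothetical; the rows mirror the first three shown above.

```r
library(dplyr)

# Joined table: holiday is NA on days that are not holidays.
Joined <- data.frame(
  date    = as.Date("1980-01-01") + 0:2,
  total   = c(4576, 4112, 5544),
  holiday = c("New Year's Day", NA, NA)
)

# "no" where holiday is missing, "yes" otherwise.
Joined <- Joined %>%
  mutate(is_holiday = ifelse(is.na(holiday), "no", "yes"))
```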
9. Plot the daily pattern over the two years of MyTwoYears, mapping the size of the symbol to is_holiday. Is your hypothesis in (5) correct? If yes, which holidays do not follow the pattern?
## Warning: Removed 713 rows containing missing values (geom_text).
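A sketch of the final plot, assuming the joined table (here called Joined, a hypothetical name) carries the `wday` and `is_holiday` columns from the earlier steps. The explicit point sizes are an illustrative choice, not part of the assignment.

```r
library(ggplot2)

# First six days of the span with the is_holiday flag from step 8.
Joined <- data.frame(
  date       = as.Date("1980-01-01") + 0:5,
  wday       = c("Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
  total      = c(4576, 4112, 5544, 4411, 4725, 3656),
  is_holiday = c("yes", "no", "no", "no", "no", "no")
)

# Color shows the day of the week; size marks holidays.
p <- ggplot(Joined, aes(x = date, y = total, color = wday, size = is_holiday)) +
  geom_point() +
  # Explicit sizes for the two levels so holidays stand out.
  scale_size_manual(values = c(no = 1, yes = 4))
```

With the sizes set this way, holiday points appear larger, so it is easy to see whether the days that fall below their weekday's usual level are the flagged ones.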