Source file ⇒ Holiday_Birthdays.Rmd

1. Create a new data table, DailyBirths, that adds up all the births for each day across all the states. Plot daily births against date.

| date | total |
|---|---|
| 1969-01-01 | 8486 |
| 1969-01-02 | 9002 |
| 1969-01-03 | 9542 |
| 1969-01-04 | 8960 |
| 1969-01-05 | 8390 |
| 1969-01-06 | 9560 |
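
A minimal sketch of step (1), assuming Birthdays has one row per state and date with columns named state, date, and births (the column names are an assumption, and the small data frame below is a made-up stand-in, not the real data):

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for Birthdays; the column names are assumed.
Birthdays <- data.frame(
  state  = rep(c("AK", "AL"), each = 3),
  date   = rep(as.Date("1969-01-01") + 0:2, times = 2),
  births = c(14, 20, 20, 174, 196, 210)
)

# Add up births over all states for each date.
DailyBirths <- Birthdays %>%
  group_by(date) %>%
  summarise(total = sum(births))

# One point per day; print(p) draws the plot.
p <- ggplot(DailyBirths, aes(x = date, y = total)) +
  geom_point()
```

Grouping by date alone is what collapses the state dimension; any other column kept in group_by() would split the totals again.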

2. To examine seasonality in birth rates, look at the number of births aggregated over all the years by

a) each week

| Week | total |
|---|---|
| 1 | 1252303 |
| 2 | 1307253 |
| 3 | 1321998 |
| 4 | 1319819 |
| 5 | 1314877 |
| 6 | 1322192 |

b) each month

| Month | total |
|---|---|
| 1 | 5759165 |
| 2 | 5362585 |
| 3 | 5868501 |
| 4 | 5560775 |
| 5 | 5785348 |
| 6 | 5758571 |

c) each Julian day

| Julianday | total |
|---|---|
| 1 | 160369 |
| 2 | 169896 |
| 3 | 180036 |
| 4 | 182854 |
| 5 | 184145 |
| 6 | 186726 |
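
The three aggregations in step (2) could be sketched as below. This assumes the lubridate helpers week(), month(), and yday() are an acceptable way to derive the grouping variables; the exercise does not say which functions to use, and the DailyBirths values here are toy numbers.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for DailyBirths from step (1).
DailyBirths <- data.frame(
  date  = as.Date(c("1969-01-01", "1969-01-08", "1969-02-01", "1970-01-01")),
  total = c(10, 20, 30, 40)
)

# a) by week of the year
ByWeek <- DailyBirths %>%
  group_by(Week = week(date)) %>%
  summarise(total = sum(total))

# b) by month
ByMonth <- DailyBirths %>%
  group_by(Month = month(date)) %>%
  summarise(total = sum(total))

# c) by Julian day (day of the year, 1 to 366)
ByJulianDay <- DailyBirths %>%
  group_by(Julianday = yday(date)) %>%
  summarise(total = sum(total))
```

Because the year is not part of any grouping variable, rows from different years that share a week, month, or day of the year are summed together.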

3. To examine patterns within the week, look at the number of births by day of the week.

| Week | Wday | total |
|---|---|---|
| 1 | 1 | 160224 |
| 1 | 2 | 183159 |
| 1 | 3 | 188855 |
| 1 | 4 | 185469 |
| 1 | 5 | 182825 |
| 1 | 6 | 185522 |
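
One possible way to produce a table shaped like the one above is to group by both week and day of the week. The sketch assumes lubridate's wday(), which by default numbers the days 1 (Sunday) through 7 (Saturday); the data are toy values.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for DailyBirths; all three dates are Thursdays.
DailyBirths <- data.frame(
  date  = as.Date(c("1970-01-01", "1970-01-08", "1976-01-01")),
  total = c(10, 20, 30)
)

# Total births for each (week, day-of-week) pair, summed over all years.
ByWeekday <- DailyBirths %>%
  group_by(Week = week(date), Wday = wday(date)) %>%
  summarise(total = sum(total), .groups = "drop")
```

Dropping Week from group_by() would give a single total per day of the week instead.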

4. Pick a two-year span of Birthdays that falls in the 1980s, say, 1980/1981. Extract the data in just this interval, calling it MyTwoYears. Plot the births in this two-year span day by day. Color each date according to its day of the week. Explain the pattern that you see.

| date | wday | total |
|---|---|---|
| 1980-01-01 | Tues | 4576 |
| 1980-01-02 | Wed | 4112 |
| 1980-01-03 | Thurs | 5544 |
| 1980-01-04 | Fri | 4411 |
| 1980-01-05 | Sat | 4725 |
| 1980-01-06 | Sun | 3656 |
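
A sketch of step (4), assuming the daily totals are already in a DailyBirths table with a Date column. The weekday is recomputed here with lubridate, whose labels ("Tue", "Thu") differ slightly from the "Tues" and "Thurs" shown in the table, so the table presumably kept a wday column from the original data instead.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Toy stand-in for DailyBirths; only the middle two dates fall in 1980/1981.
DailyBirths <- data.frame(
  date  = as.Date(c("1979-12-31", "1980-01-01", "1981-12-31", "1982-01-01")),
  total = c(1, 2, 3, 4)
)

# Keep the two chosen years and attach a labelled day of the week.
MyTwoYears <- DailyBirths %>%
  filter(year(date) %in% c(1980, 1981)) %>%
  mutate(wday = wday(date, label = TRUE))

# Color each point by its day of the week; print(p) draws the plot.
p <- ggplot(MyTwoYears, aes(x = date, y = total, color = wday)) +
  geom_point()
```

Using %in% rather than == for the year test matters: comparing a column against a two-element vector with == recycles the vector and silently drops rows.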

5. A few days each year don’t follow the pattern in (4). We’re going to examine the hypothesis that these are holidays. You can find a data set listing US federal holidays at “http://tiny.cc/dcf/US-Holidays.csv”.

6. Add a couple of layers to your plot from (4).

  1. Draw a vertical bar at each date which is a holiday. You’ll use the geom_vline() glyph. You can give a data = argument to geom_vline() to tell it to plot the information from Holidays rather than MyTwoYears.
## Warning in c(1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
## 1986L, : longer object length is not a multiple of shorter object length

  2. Add a text label to each of the vertical bars to identify which holiday it is. Use the geom_text() glyph.
## Warning in c(1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
## 1986L, : longer object length is not a multiple of shorter object length
## Warning: Removed 713 rows containing missing values (geom_text).
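
The two layers in step (6) might look like the sketch below. It assumes Holidays has a Date column named date and a name column named holiday, and that it has already been restricted to the plotted years; the label height of 5500 and the text angle are arbitrary choices for illustration.

```r
library(ggplot2)

# Stand-ins: the first six rows of MyTwoYears and a one-row Holidays table.
MyTwoYears <- data.frame(
  date  = as.Date("1980-01-01") + 0:5,
  wday  = c("Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
  total = c(4576, 4112, 5544, 4411, 4725, 3656)
)
Holidays <- data.frame(
  date    = as.Date("1980-01-01"),
  holiday = "New Year's Day"
)

p <- ggplot(MyTwoYears, aes(x = date, y = total, color = wday)) +
  geom_point() +
  # One vertical line per holiday, read from Holidays instead of MyTwoYears.
  geom_vline(data = Holidays, aes(xintercept = date)) +
  # Label each line; inherit.aes = FALSE avoids looking for wday in Holidays.
  geom_text(
    data = Holidays,
    aes(x = date, label = holiday),
    y = 5500, angle = 90, hjust = 1, vjust = -0.5,
    inherit.aes = FALSE
  )
```

Holidays that fall outside the plotted date range are dropped when the plot is drawn, which is the usual source of a "Removed ... rows containing missing values" warning from geom_text.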

7. Join MyTwoYears and Holidays.

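| date | wday | total | holiday | year |
|---|---|---|---|---|
| 1980-01-01 | Tues | 4576 | New Year’s Day | 1980 |
| 1980-01-02 | Wed | 4112 | NA | NA |
| 1980-01-03 | Thurs | 5544 | NA | NA |
| 1980-01-04 | Fri | 4411 | NA | NA |
| 1980-01-05 | Sat | 4725 | NA | NA |
| 1980-01-06 | Sun | 3656 | NA | NA |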
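
The NA entries in the table suggest a left join, which keeps every day in MyTwoYears whether or not it matches a holiday. A minimal sketch, assuming both tables share a Date column named date (the Holidays column names are an assumption based on the joined output):

```r
library(dplyr)

# Stand-ins for the two tables; Holidays column names are assumed.
MyTwoYears <- data.frame(
  date  = as.Date("1980-01-01") + 0:2,
  wday  = c("Tues", "Wed", "Thurs"),
  total = c(4576, 4112, 5544)
)
Holidays <- data.frame(
  date    = as.Date(c("1980-01-01", "1980-07-04")),
  holiday = c("New Year's Day", "Independence Day"),
  year    = c(1980, 1980)
)

# Every row of MyTwoYears is kept; unmatched days get NA for holiday and year.
Joined <- MyTwoYears %>%
  left_join(Holidays, by = "date")
```

If the date column in the holiday file is read in as text, it would need to be converted to a Date before the join so that the keys compare equal.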

8. Add a variable, is_holiday, that is “yes” or “no” depending on whether the day is a holiday. An appropriate argument to mutate() would be is_holiday = ifelse(is.na(holiday), "no", "yes").

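| date | wday | total | holiday | year | is_holiday |
|---|---|---|---|---|---|
| 1980-01-01 | Tues | 4576 | New Year’s Day | 1980 | yes |
| 1980-01-02 | Wed | 4112 | NA | NA | no |
| 1980-01-03 | Thurs | 5544 | NA | NA | no |
| 1980-01-04 | Fri | 4411 | NA | NA | no |
| 1980-01-05 | Sat | 4725 | NA | NA | no |
| 1980-01-06 | Sun | 3656 | NA | NA | no |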

9. Plot the daily pattern over the two years of MyTwoYears, mapping the size of the symbol to is_holiday. Is your hypothesis in (5) correct? If yes, which holidays do not follow the pattern?

## Warning: Removed 713 rows containing missing values (geom_text).
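
A sketch of the plot in step (9), starting from a joined table like the one in (7). The data frame is a three-row stand-in, and Joined is a hypothetical name for the joined table.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the joined table from step (7).
Joined <- data.frame(
  date    = as.Date("1980-01-01") + 0:2,
  wday    = c("Tues", "Wed", "Thurs"),
  total   = c(4576, 4112, 5544),
  holiday = c("New Year's Day", NA, NA)
)

# Flag holidays, using the mutate() argument given in step (8).
Joined <- Joined %>%
  mutate(is_holiday = ifelse(is.na(holiday), "no", "yes"))

# Holidays get a different symbol size from ordinary days.
p <- ggplot(Joined, aes(x = date, y = total, color = wday, size = is_holiday)) +
  geom_point()
```

ggplot2 warns that mapping size to a discrete variable is not advised; with only two levels the plot is still readable, and the holiday points stand out against the weekday pattern.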