Data Analysis Final Paper

Miguel Seaton

December 13, 2024

Issue Description

I would like to investigate how federal holidays affect people financially. I was also curious whether crime rates rise during holidays. Federal holidays can affect us in many ways: they can shift when payments are received or made, encourage overspending, and add an extra day for spending money rather than earning it. In addition, some businesses require employees to work on federal holidays without overtime or holiday pay.

Questions

Question 1: How do federal holidays impact employees at work before, during and after?

Question 2: Is there an increase in late payments during the federal holidays?

Data Source

https://en.wikipedia.org/wiki/Federal_holidays_in_the_United_States

Documentation

https://github.com/rfordatascience/tidytuesday/blob/main/data/2024/2024-06-18/readme.md

Description of the Data

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'readr' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
federal_holidays <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-06-18/federal_holidays.csv')
## Rows: 11 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): date, date_definition, official_name, details
## dbl  (1): year_established
## date (1): date_established
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Federal_holidays
str(federal_holidays)
## spc_tbl_ [11 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ date            : chr [1:11] "January 1" "January 15–21" "February 15–21" "May 25–31" ...
##  $ date_definition : chr [1:11] "fixed date" "3rd monday" "3rd monday" "last monday" ...
##  $ official_name   : chr [1:11] "New Year's Day" "Birthday of Martin Luther King, Jr." "Washington's Birthday" "Memorial Day" ...
##  $ year_established: num [1:11] 1870 1983 1879 1868 2021 ...
##  $ date_established: Date[1:11], format: "1870-06-28" "1983-11-02" ...
##  $ details         : chr [1:11] "Celebrates the beginning of the Gregorian calendar year. Festivities include counting down to 12:00 midnight on"| __truncated__ "Honors Dr. Martin Luther King Jr., a civil rights leader who was born on January 15, 1929. Some municipalities "| __truncated__ "Honors George Washington, Founding Father, commander of the Continental Army, and the first U.S. president, who"| __truncated__ "Honors U.S. military personnel who have fought and died while serving in the United States Armed Forces. Many m"| __truncated__ ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_character(),
##   ..   date_definition = col_character(),
##   ..   official_name = col_character(),
##   ..   year_established = col_double(),
##   ..   date_established = col_date(format = ""),
##   ..   details = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Summary of federal_holidays
summary(federal_holidays)
##      date           date_definition    official_name      year_established
##  Length:11          Length:11          Length:11          Min.   :1868    
##  Class :character   Class :character   Class :character   1st Qu.:1870    
##  Mode  :character   Mode  :character   Mode  :character   Median :1894    
##                                                           Mean   :1918    
##                                                           3rd Qu.:1954    
##                                                           Max.   :2021    
##                                                                           
##  date_established       details         
##  Min.   :1870-06-28   Length:11         
##  1st Qu.:1927-03-01   Class :character  
##  Median :1983-11-02   Mode  :character  
##  Mean   :1958-08-06                     
##  3rd Qu.:2002-08-25                     
##  Max.   :2021-06-17                     
##  NA's   :8

Cleaning and Preparation

The first step was to load the data needed for my project.

library(tidyverse)

# Load the data
federal_holidays <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-06-18/federal_holidays.csv')
## Rows: 11 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): date, date_definition, official_name, details
## dbl  (1): year_established
## date (1): date_established
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Next, I inspected the imported data to see its structure and what I had to work with.

# Federal_holidays structure
str(federal_holidays)
## spc_tbl_ [11 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ date            : chr [1:11] "January 1" "January 15–21" "February 15–21" "May 25–31" ...
##  $ date_definition : chr [1:11] "fixed date" "3rd monday" "3rd monday" "last monday" ...
##  $ official_name   : chr [1:11] "New Year's Day" "Birthday of Martin Luther King, Jr." "Washington's Birthday" "Memorial Day" ...
##  $ year_established: num [1:11] 1870 1983 1879 1868 2021 ...
##  $ date_established: Date[1:11], format: "1870-06-28" "1983-11-02" ...
##  $ details         : chr [1:11] "Celebrates the beginning of the Gregorian calendar year. Festivities include counting down to 12:00 midnight on"| __truncated__ "Honors Dr. Martin Luther King Jr., a civil rights leader who was born on January 15, 1929. Some municipalities "| __truncated__ "Honors George Washington, Founding Father, commander of the Continental Army, and the first U.S. president, who"| __truncated__ "Honors U.S. military personnel who have fought and died while serving in the United States Armed Forces. Many m"| __truncated__ ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_character(),
##   ..   date_definition = col_character(),
##   ..   official_name = col_character(),
##   ..   year_established = col_double(),
##   ..   date_established = col_date(format = ""),
##   ..   details = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Federal_holidays summary
summary(federal_holidays)
##      date           date_definition    official_name      year_established
##  Length:11          Length:11          Length:11          Min.   :1868    
##  Class :character   Class :character   Class :character   1st Qu.:1870    
##  Mode  :character   Mode  :character   Mode  :character   Median :1894    
##                                                           Mean   :1918    
##                                                           3rd Qu.:1954    
##                                                           Max.   :2021    
##                                                                           
##  date_established       details         
##  Min.   :1870-06-28   Length:11         
##  1st Qu.:1927-03-01   Class :character  
##  Median :1983-11-02   Mode  :character  
##  Mean   :1958-08-06                     
##  3rd Qu.:2002-08-25                     
##  Max.   :2021-06-17                     
##  NA's   :8

Step three was assigning a month to each holiday, which was the most difficult step.

# Assign months based on the holiday names, because the date column was read as text (e.g. "January 15–21") rather than as dates
federal_holidays <- federal_holidays %>%
  mutate(month = case_when(
    official_name == "New Year's Day" ~ 1,
    official_name == "Birthday of Martin Luther King, Jr." ~ 1,
    official_name == "Washington's Birthday" ~ 2,
    official_name == "Memorial Day" ~ 5,
    official_name == "Juneteenth National Independence Day" ~ 6,
    official_name == "Independence Day" ~ 7,
    official_name == "Labor Day" ~ 9,
    official_name == "Columbus Day" ~ 10,
    official_name == "Veterans Day" ~ 11,
    official_name == "Thanksgiving Day" ~ 11,
    official_name == "Christmas Day" ~ 12,
    TRUE ~ NA_real_  # any holiday not listed above gets NA (same double type as the month numbers)
  ))
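
As an alternative to typing each month by hand, the month can be read from the text of the date column. The following is only a minimal sketch in base R, not the method used in this paper. The vector holiday_dates is typed in by hand to follow the format shown in the str() output (for example "January 15–21"), so the real column may differ in small details.

```r
# Date strings typed in by hand (an assumption), following the format that
# str() shows for the date column, e.g. "January 15–21".
holiday_dates <- c("January 1", "January 15–21", "February 15–21", "May 25–31",
                   "June 19", "July 4", "September 1–7", "October 8–14",
                   "November 11", "November 22–28", "December 25")

# Keep only the month name (everything before the first space).
month_name <- sub(" .*$", "", holiday_dates)

# Look up the position of each name in R's built-in month.name vector.
month_number <- match(month_name, month.name)

month_number
# expected: 1 1 2 5 6 7 9 10 11 11 12
```

This gives the same month numbers as the case_when() step, and it would keep working if a holiday were renamed, because it does not depend on official_name.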

In step four, I calculated the number of holidays per month.

# Calculate the frequency of holidays by month
holiday_frequency <- federal_holidays %>%
  count(month) %>%
  arrange(month)

# Print the holiday frequency
print(holiday_frequency)
## # A tibble: 9 × 2
##   month     n
##   <dbl> <int>
## 1     1     2
## 2     2     1
## 3     5     1
## 4     6     1
## 5     7     1
## 6     9     1
## 7    10     1
## 8    11     2
## 9    12     1
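
The tibble above has nine rows because count() only lists months that contain at least one holiday. A minimal sketch in base R, assuming the same month numbers as assigned in step three, shows one way to keep all twelve months so that months with zero holidays appear explicitly. The names holiday_month and holidays_per_month are made up for this example.

```r
# Month numbers of the 11 holidays, as assigned in step three.
holiday_month <- c(1, 1, 2, 5, 6, 7, 9, 10, 11, 11, 12)

# tabulate() counts how often each integer from 1 to nbins occurs,
# so months without a holiday get an explicit 0.
holidays_per_month <- data.frame(
  month = month.abb,
  n = tabulate(holiday_month, nbins = 12)
)

# Months with no federal holiday
holidays_per_month$month[holidays_per_month$n == 0]
# expected: "Mar" "Apr" "Aug"
```

Keeping the zero months in the table makes the gaps in the year visible directly, instead of having to infer them from the missing rows.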

The final step was visualizing the data.

# Plot
ggplot(holiday_frequency, aes(x = month, y = n, color = factor(month))) +
  geom_line(color = "red", linetype = "solid") +
  geom_point(size = 18, alpha = 0.9) +
  geom_text(aes(label = n), vjust = -1, color = "black") +
  scale_x_continuous(breaks = 1:12, 
                     labels = month.abb) +
  scale_color_manual(values = c("#FF6666", "#66B2FF", "#66FF66", "#FFFF66", "#FF66B2", "#66FFFF", "#FF9966", "#9966FF", "#99FF66", "#FFB266", "#B266FF", "#66FFB2")) +
  labs(title = "Federal Holidays by Month",
       x = "Month", 
       y = "Number of Holidays") +
  theme_minimal() +
  theme(legend.position = "none")

Final Results

I grouped the federal holidays by month across the year. After visualizing which months had one, two, or no holidays, I concluded that some months affect individuals more than others, which relates to Questions 1 and 2.

Question 1: How do federal holidays impact employees at work before, during and after? The data shows that March, April and August have no federal holiday, which I believe helps employees keep a consistent work schedule and pay without any known disruptions. Months like January and November have two holidays, which can disrupt employees' pay: a paycheck may arrive earlier than usual, which can throw off payments normally scheduled for certain days. Getting paid after the holiday can likewise interfere with payments, leading to late payments and late fees if no money has been set aside for emergencies. Having November (2 holidays), December (1 holiday) and January (2 holidays) back to back can have a major impact on individuals through limited pay, more time off, and additional purchases beyond the normal monthly payments. Some employees work during federal holidays, which can require more overtime than usual, and fewer employees come in during the holidays, which increases the workload for those who are at work.
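
To see how a holiday lines up with a pay date in a particular year, the rules in the date_definition column (for example "3rd monday") first have to be turned into calendar dates. The sketch below is one possible way to do that in base R. The helper nth_weekday() is hypothetical and was not part of the analysis above.

```r
# Hypothetical helper: date of the n-th given weekday in a month.
# weekday uses 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
nth_weekday <- function(year, month, weekday, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))  # 1st of the month
  first_wday <- as.POSIXlt(first)$wday                  # weekday of the 1st
  offset <- (weekday - first_wday) %% 7                 # days to first match
  first + offset + 7 * (n - 1)
}

# Birthday of Martin Luther King, Jr.: 3rd Monday of January
mlk_2025 <- nth_weekday(2025, 1, weekday = 1, n = 3)

# Thanksgiving Day: 4th Thursday of November
thanksgiving_2024 <- nth_weekday(2024, 11, weekday = 4, n = 4)

format(mlk_2025, "%Y-%m-%d")           # "2025-01-20"
format(thanksgiving_2024, "%Y-%m-%d")  # "2024-11-28"
```

With actual dates in hand, a holiday can be compared against a known pay schedule to see whether a payday falls on, just before, or just after it.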

Question 2: Is there an increase in late payments during the federal holidays? I was not able to obtain this information from the data used. However, I believe that the end of one year and the beginning of the next (November, December, January) can throw off both the timing and the amount of payments received, whether more or less than the more consistent payments during the rest of the year, and that this can lead to more late or missed payments than at other times of the year, such as March and April.