## A. Introduction
The purpose of this analysis is to explore whether Friday the 13th lives up to its reputation as an unlucky day. The dataset comes from OpenIntro.org and is based on data collected in the UK by Scanlon et al. (1993), who recorded traffic volume, shopping activity, and hospital admissions due to transport accidents between October 1989 and November 1992. Each row pairs the count from Friday the 6th (treated as a typical Friday) with the count from Friday the 13th of the same month. To answer the research question (Is Friday the 13th actually an unlucky day?), this analysis focuses on five variables: type (category of observation), sixth (count on the 6th), thirteenth (count on the 13th), diff (sixth minus thirteenth), and location (where the data were collected). Together these variables allow a comparison of activity levels and accident admissions between Friday the 13th and an ordinary Friday.
## B. Data Analysis
To explore whether Friday the 13th is associated with changes in traffic, shopping, or accident activity, I will perform several data analysis steps in R. First, I will inspect the dataset using functions such as head() and summary() to understand the structure and values of each variable. I will then use filter() and select() to focus on the relevant rows and variables. Next, I will calculate average counts for each observation type using group_by(type) and summarize(mean_sixth = mean(sixth), mean_thirteenth = mean(thirteenth)). To visualize the results, I will create a bar chart comparing the mean counts on the 6th and the 13th for each observation type (traffic, shopping, and accident). The chart will show whether the counts differ between the two days. Together, the visual and numerical summaries will help determine whether there is evidence that Friday the 13th is truly “unlucky.”
# Visualize the dataset
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
# Load the dataset
friday <- read.csv("friday.csv")
head(friday)
## type date sixth thirteenth diff location
## 1 traffic 1990, July 139246 138548 698 7 to 8
## 2 traffic 1990, July 134012 132908 1104 9 to 10
## 3 traffic 1991, September 137055 136018 1037 7 to 8
## 4 traffic 1991, September 133732 131843 1889 9 to 10
## 5 traffic 1991, December 123552 121641 1911 7 to 8
## 6 traffic 1991, December 121139 118723 2416 9 to 10
summary(friday)
## type date sixth thirteenth
## Length:61 Length:61 Min. : 3 Min. : 4
## Class :character Class :character 1st Qu.: 3799 1st Qu.: 3848
## Mode :character Mode :character Median : 4942 Median : 4882
## Mean : 24714 Mean : 24448
## 3rd Qu.: 6568 3rd Qu.: 6648
## Max. :139246 Max. :138548
## diff location
## Min. :-774.0 Length:61
## 1st Qu.: -81.0 Class :character
## Median : -3.0 Mode :character
## Mean : 266.3
## 3rd Qu.: 118.0
## Max. :4382.0
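The plan in section B mentions filter() and select(), but the output above does not show them in use. A minimal sketch of that step follows. To keep it runnable without friday.csv, it re-enters the six rows printed by head(friday); the names friday_head and traffic_7_8 are illustrative, and picking the "7 to 8" location is only an example.

```r
library(dplyr)

# The six rows printed by head(friday), re-entered by hand so the sketch runs
# without friday.csv. In the full analysis, use the `friday` data frame instead.
friday_head <- data.frame(
  type = rep("traffic", 6),
  date = c("1990, July", "1990, July", "1991, September",
           "1991, September", "1991, December", "1991, December"),
  sixth = c(139246, 134012, 137055, 133732, 123552, 121139),
  thirteenth = c(138548, 132908, 136018, 131843, 121641, 118723),
  diff = c(698, 1104, 1037, 1889, 1911, 2416),
  location = c("7 to 8", "9 to 10", "7 to 8", "9 to 10", "7 to 8", "9 to 10")
)

# Keep only traffic rows from one location, then keep the columns of interest.
traffic_7_8 <- friday_head |>
  filter(type == "traffic", location == "7 to 8") |>
  select(date, sixth, thirteenth, diff)

traffic_7_8
```

filter() works on rows and select() works on columns, so the order shown here (rows first, then columns) keeps `type` and `location` available for the row condition.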
# First, summarize average counts by observation type
friday_summary <- friday |>
  group_by(type) |>
  summarise(
    mean_sixth = mean(sixth, na.rm = TRUE),
    mean_thirteenth = mean(thirteenth, na.rm = TRUE),
    mean_diff = mean(diff, na.rm = TRUE)
  )
friday_summary
## # A tibble: 3 × 4
## type mean_sixth mean_thirteenth mean_diff
## <chr> <dbl> <dbl> <dbl>
## 1 accident 7.5 10.8 -3.33
## 2 shopping 4971. 5017 -46.5
## 3 traffic 128385. 126550. 1836.
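Because the three observation types sit on very different scales, relative change can be easier to compare than raw mean differences. The sketch below re-enters the rounded means from the tibble above, so the results are approximate, and computes the percent change from the 6th to the 13th. The names summary_tbl and pct_change are illustrative.

```r
# Rounded group means copied from the printed tibble; values are approximate.
summary_tbl <- data.frame(
  type = c("accident", "shopping", "traffic"),
  mean_sixth = c(7.5, 4971, 128385),
  mean_thirteenth = c(10.8, 5017, 126550)
)

# Percent change from the 6th to the 13th, relative to the 6th.
summary_tbl$pct_change <- round(
  100 * (summary_tbl$mean_thirteenth - summary_tbl$mean_sixth) /
    summary_tbl$mean_sixth,
  1
)

summary_tbl
```

A positive pct_change means the mean count was higher on the 13th. A percent change describes the size of a difference only; it says nothing about statistical significance.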
# Bar chart comparing average counts on the 6th vs 13th
# (dplyr, ggplot2, and tidyr are already loaded above)
friday_summary <- friday |>
  group_by(type) |>
  summarize(
    mean_sixth = mean(sixth, na.rm = TRUE),
    mean_thirteenth = mean(thirteenth, na.rm = TRUE)
  )
# Reshape to long format: one row per observation type and day
friday_long <- friday_summary |>
  pivot_longer(
    cols = c(mean_sixth, mean_thirteenth),
    names_to = "day",
    values_to = "mean_count"
  )
# geom_col() plots the values as given, like geom_bar(stat = "identity")
ggplot(friday_long, aes(x = type, y = mean_count, fill = day)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average Counts on Friday the 6th vs Friday the 13th",
    x = "Observation Type",
    y = "Average Count",
    fill = "Day"
  ) +
  theme_minimal()
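With a single shared y-axis, the traffic means (above 100,000) dwarf the shopping and accident means, so differences in the smaller groups are hard to read from the chart. One possible alternative, sketched here with the rounded means from the summary table re-entered by hand, is to give each observation type its own panel and y-axis using facet_wrap(). The object names means, means_long, and p are illustrative.

```r
library(ggplot2)
library(tidyr)

# Rounded group means copied from the summary tibble; values are approximate.
means <- data.frame(
  type = c("accident", "shopping", "traffic"),
  mean_sixth = c(7.5, 4971, 128385),
  mean_thirteenth = c(10.8, 5017, 126550)
)

# Long format: one row per observation type and day.
means_long <- pivot_longer(
  means,
  cols = c(mean_sixth, mean_thirteenth),
  names_to = "day",
  values_to = "mean_count"
)

# One panel per type; scales = "free_y" lets each panel set its own y-axis.
p <- ggplot(means_long, aes(x = day, y = mean_count, fill = day)) +
  geom_col() +
  facet_wrap(~ type, scales = "free_y") +
  labs(
    title = "Average Counts by Observation Type (separate y-axes)",
    x = "Day",
    y = "Average Count",
    fill = "Day"
  ) +
  theme_minimal()

print(p)
```

Free y-axes make within-panel differences visible, at the cost that bar heights can no longer be compared across panels.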
The goal of this analysis was to determine whether Friday the 13th is truly an “unlucky” day, as common superstition suggests. Using the OpenIntro.org data on traffic counts, shopping activity, and hospital admissions due to transport accidents, I compared Friday the 6th with Friday the 13th of the same month. The group means show that traffic and shopping were very similar on the two days: mean traffic was about 1.4% lower on the 13th (126,550 vs. 128,385), and mean shopping counts were about 0.9% higher (5,017 vs. 4,971). Accident admissions were the exception: the mean rose from 7.5 on the 6th to 10.8 on the 13th, an average increase of about 3.3 admissions. That difference is hard to see in the bar chart because all three types share one y-axis and the traffic counts dwarf the others. No formal hypothesis test was run, so these summaries cannot show whether any of the differences are statistically significant, and the accident means are based on small counts. Overall, the descriptive results give no evidence of unusual traffic or shopping behavior on Friday the 13th, while the higher accident admissions would need a formal test before any conclusion about risk can be drawn.
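Whether a difference is statistically significant calls for a formal test rather than a comparison of means. Because each row pairs a 6th with a 13th from the same month and location, a paired t-test is one reasonable option (an assumption here, not necessarily the method used in the original study). The sketch below runs it on only the six traffic rows printed by head(friday), so it illustrates the call and says nothing about the full dataset.

```r
# The six traffic rows printed by head(friday); NOT the full dataset.
sixth <- c(139246, 134012, 137055, 133732, 123552, 121139)
thirteenth <- c(138548, 132908, 136018, 131843, 121641, 118723)

# Paired t-test: tests whether the mean of (sixth - thirteenth) differs from 0.
paired_test <- t.test(sixth, thirteenth, paired = TRUE)

# Mean paired difference and the two-sided p-value.
unname(paired_test$estimate)
paired_test$p.value
```

On the full data, the same call could be applied separately to the rows of each observation type, for example after filter(type == "accident"). With so few pairs, the normality assumption behind the t-test is hard to check, so a result from it should be read cautiously.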
While this dataset offers useful insight, it covers only a limited time period and geographic region in the United Kingdom. Future research could expand the analysis by incorporating additional years of data or examining other measures of risk and behavior, such as crime reports, emergency service calls, or insurance claims. With broader data, researchers could gain a more complete understanding of whether superstitions have any measurable effect on human activity.