Project 1

## A. Introduction

The purpose of this analysis is to explore whether Friday the 13th truly lives up to its reputation as an unlucky day. The dataset comes from OpenIntro.org and is based on data collected in the UK by Scanlon et al. (1993), who recorded traffic volume, shopping activity, and hospital admissions due to transport accidents between October 1989 and November 1992. Each observation compares the count on Friday the 6th (a typical Friday) with the count on Friday the 13th of the same month. To answer the research question (Is Friday the 13th actually an unlucky day?), this analysis focuses on the following variables: type (category of observation), sixth (count on the 6th), thirteenth (count on the 13th), diff (count on the 6th minus count on the 13th), and location (where the data were collected). These variables allow a comparison of activity levels and accident rates to determine whether human behavior or accident frequency differs on Friday the 13th compared with an ordinary Friday.
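The sign convention of diff matters for interpretation: a positive value means the count was lower on Friday the 13th than on Friday the 6th. A minimal check, using only the six traffic rows that head(friday) prints in the next section (the data frame name traffic_head is a hypothetical choice for this sketch), is:

```r
# Minimal sketch: confirm that diff = sixth - thirteenth, using the six
# traffic rows printed by head(friday) (values copied from that output).
# traffic_head is a hypothetical name used only for this check.
traffic_head <- data.frame(
  sixth      = c(139246, 134012, 137055, 133732, 123552, 121139),
  thirteenth = c(138548, 132908, 136018, 131843, 121641, 118723),
  diff       = c(698, 1104, 1037, 1889, 1911, 2416)
)

# TRUE if every recorded diff equals (count on the 6th) - (count on the 13th)
diff_is_sixth_minus_thirteenth <- with(traffic_head, all(diff == sixth - thirteenth))
print(diff_is_sixth_minus_thirteenth)
# [1] TRUE

# On the full dataset the same check would be:
# with(friday, all(diff == sixth - thirteenth, na.rm = TRUE))
```

So a positive mean_diff for a type indicates fewer counts on the 13th, and a negative one indicates more.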

## B. Data Analysis

To explore whether Friday the 13th is associated with changes in traffic, shopping, or accident activity, I will perform several data analysis steps in R. First, I will inspect the dataset with summary() and head() to understand the structure and values of each variable, using filter() and select() where needed to focus on the relevant variables. Next, I will calculate average counts for each observation type using group_by(type) and summarize(mean_sixth = mean(sixth), mean_thirteenth = mean(thirteenth)). To visualize the results, I will create a bar chart comparing the mean counts on the 6th and the 13th for each observation type (traffic, shopping, and accident). This will show whether counts are higher or lower on Friday the 13th. Together, these visual and numerical summaries will help determine whether there is evidence that Friday the 13th is truly “unlucky.”

# Load the packages used in this analysis.
# If a package is missing, install it once from the console, e.g.
# install.packages(c("dplyr", "ggplot2", "tidyr")),
# rather than reinstalling it every time the report is knit.
library(dplyr)
library(ggplot2)
library(tidyr)

# Load the dataset
friday <- read.csv("friday.csv")
head(friday)

##      type             date  sixth thirteenth diff location
## 1 traffic      1990,  July 139246     138548  698   7 to 8
## 2 traffic      1990,  July 134012     132908 1104  9 to 10
## 3 traffic 1991,  September 137055     136018 1037   7 to 8
## 4 traffic 1991,  September 133732     131843 1889  9 to 10
## 5 traffic  1991,  December 123552     121641 1911   7 to 8
## 6 traffic  1991,  December 121139     118723 2416  9 to 10

summary(friday)

##      type               date               sixth          thirteenth    
##  Length:61          Length:61          Min.   :     3   Min.   :     4  
##  Class :character   Class :character   1st Qu.:  3799   1st Qu.:  3848  
##  Mode  :character   Mode  :character   Median :  4942   Median :  4882  
##                                        Mean   : 24714   Mean   : 24448  
##                                        3rd Qu.:  6568   3rd Qu.:  6648  
##                                        Max.   :139246   Max.   :138548  
##       diff          location        
##  Min.   :-774.0   Length:61         
##  1st Qu.: -81.0   Class :character  
##  Median :  -3.0   Mode  :character  
##  Mean   : 266.3                     
##  3rd Qu.: 118.0                     
##  Max.   :4382.0

# Summarize average counts on the 6th and the 13th, plus the mean
# difference (6th minus 13th), for each observation type
friday_summary <- friday |>
  group_by(type) |>
  summarise(
    mean_sixth = mean(sixth, na.rm = TRUE),
    mean_thirteenth = mean(thirteenth, na.rm = TRUE),
    mean_diff = mean(diff, na.rm = TRUE)
  )

friday_summary

## # A tibble: 3 × 4
##   type     mean_sixth mean_thirteenth mean_diff
##   <chr>         <dbl>           <dbl>     <dbl>
## 1 accident        7.5            10.8     -3.33
## 2 shopping     4971.           5017      -46.5 
## 3 traffic    128385.         126550.    1836.
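Because the three observation types are measured on very different scales, the raw mean differences above are hard to compare directly. A minimal sketch that converts them to percent changes, assuming the rounded means printed in friday_summary are precise enough for a rough comparison (the data frame name means is hypothetical):

```r
# Percent change from Friday the 6th to Friday the 13th for each
# observation type, using the rounded group means printed above.
means <- data.frame(
  type            = c("accident", "shopping", "traffic"),
  mean_sixth      = c(7.5, 4971, 128385),
  mean_thirteenth = c(10.8, 5017, 126550)
)

# Relative change puts types with very different scales on a common footing;
# positive values mean higher counts on the 13th.
means$pct_change <- with(means, 100 * (mean_thirteenth - mean_sixth) / mean_sixth)
print(means)
```

With these inputs, the change from the 6th to the 13th is about +44% for accidents, +0.9% for shopping, and -1.4% for traffic. On the real data, the same column could be added directly with friday_summary |> mutate(pct_change = 100 * (mean_thirteenth - mean_sixth) / mean_sixth).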

# Bar chart comparing average counts on the 6th vs the 13th.
# dplyr, ggplot2 and tidyr are already loaded, and friday_summary was
# computed in the previous chunk, so neither step is repeated here.

# Reshape to long format: one row per observation type and day
friday_long <- friday_summary |>
  pivot_longer(cols = c(mean_sixth, mean_thirteenth),
               names_to = "day",
               values_to = "mean_count")

# Traffic counts (~128,000) are far larger than shopping (~5,000) or
# accident (~10) counts, so each type gets its own panel and y-axis;
# on a shared axis the shopping and accident bars would be invisible.
ggplot(friday_long, aes(x = type, y = mean_count, fill = day)) +
  geom_col(position = "dodge") +
  facet_wrap(~ type, scales = "free") +
  scale_fill_discrete(labels = c("Friday the 6th", "Friday the 13th")) +
  labs(
    title = "Average Counts on Friday the 6th vs Friday the 13th",
    x = "Observation Type",
    y = "Average Count",
    fill = "Day"
  ) +
  theme_minimal()

## C. Conclusion

The goal of this analysis was to determine whether Friday the 13th is truly an “unlucky” day, as common superstition suggests. Using OpenIntro.org data that compare traffic counts, shopping activity, and hospital admissions due to transport accidents on Friday the 6th and Friday the 13th, I examined whether behavior or accident frequency differed meaningfully between the two dates. The summary statistics and bar chart show that average traffic and shopping counts are very similar on both days: traffic was about 1.4% lower on Friday the 13th (a mean difference of roughly 1,836) and shopping was less than 1% higher. Accident admissions were higher on Friday the 13th (a mean of 10.8 versus 7.5), but these are small counts, and no formal significance test was performed, so this difference should be interpreted with caution. Overall, these descriptive results do not provide clear evidence that Friday the 13th is associated with higher risk or unusual behavior.
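Whether the differences between the two Fridays are larger than ordinary month-to-month variation could be checked with a formal test. Because each row pairs the 6th and the 13th from the same month and location, a paired t-test per observation type is one natural choice. The sketch below assumes the within-row differences are roughly normally distributed; for the small accident counts, a paired Wilcoxon signed-rank test (wilcox.test(..., paired = TRUE)) may be a safer alternative. The helper name paired_test_by_type is hypothetical, and the demo uses only the six traffic rows printed by head(friday), so its output is not a result of this project.

```r
# Hypothetical helper: run a paired t-test (6th vs 13th) separately for
# each observation type. Pairing uses the within-row differences.
paired_test_by_type <- function(df) {
  groups <- split(df, df$type)
  lapply(groups, function(g) t.test(g$sixth, g$thirteenth, paired = TRUE))
}

# Demo on the six traffic rows printed by head(friday) only;
# with the full data you would call paired_test_by_type(friday).
demo <- data.frame(
  type       = rep("traffic", 6),
  sixth      = c(139246, 134012, 137055, 133732, 123552, 121139),
  thirteenth = c(138548, 132908, 136018, 131843, 121641, 118723)
)

results <- paired_test_by_type(demo)

# Mean difference (6th minus 13th), t statistic and p-value for each type
sapply(results, function(r) c(mean_diff = unname(r$estimate),
                               t = unname(r$statistic),
                               p_value = r$p.value))
```

Run on the full dataset, the helper would return one test per observation type (accident, shopping, traffic), which would let the conclusion state which differences, if any, are statistically distinguishable from zero.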

While this dataset offers useful insight, it covers only a limited time period and geographic region in the United Kingdom. Future research could expand the analysis by incorporating additional years of data or examining other measures of risk and behavior, such as crime reports, emergency service calls, or insurance claims. With broader data, researchers could gain a more complete understanding of whether superstitions have any measurable effect on human activity.


Daniela Ngassiki

2025-11-06