Overview

This project explores whether Friday the 13th is more unlucky than other days of the year. Additionally, this project considers whether superstitions surrounding Friday the 13th affect people’s behavior on that day.

Introduction

The data for this project were collected by researchers T.J. Scanlon, R.N. Luben, F.L. Scanlon, and N. Singleton in the United Kingdom between October 1989 and November 1992, and are available in R as the friday dataset in the openintro package. The data compare traffic volume, shopping patterns, and accident frequency on Friday the 6th and Friday the 13th of the same month over the course of three years.

The three observation types (traffic, shopping, and accident) represent drivers, shoppers, and residents on the given dates. For the traffic observations, the data give the number of cars passing between junctions 7 and 8 and between junctions 9 and 10 of the M25 motorway. For the shopping observations, the data give the number of shoppers at each of nine supermarkets in the southeast of England. Finally, the accident observations give the number of emergency admissions due to transport accidents at an area hospital.

Exploring the Data

# Loading libraries
library(openintro)
library(ggplot2)
library(reshape2)
# Storing dataset into environment
friday <- friday

#Exploring dataset
str(friday)
## 'data.frame':    61 obs. of  6 variables:
##  $ type      : Factor w/ 3 levels "accident","shopping",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ date      : Factor w/ 6 levels "1989,  October",..: 2 2 4 4 3 3 5 5 6 6 ...
##  $ sixth     : int  139246 134012 137055 133732 123552 121139 128293 124631 124609 117584 ...
##  $ thirteenth: int  138548 132908 136018 131843 121641 118723 125532 120249 122770 117263 ...
##  $ diff      : int  698 1104 1037 1889 1911 2416 2761 4382 1839 321 ...
##  $ location  : Factor w/ 12 levels "7 to 8","9 to 10",..: 1 2 1 2 1 2 1 2 1 2 ...
head(friday)
##      type             date  sixth thirteenth diff location
## 1 traffic      1990,  July 139246     138548  698   7 to 8
## 2 traffic      1990,  July 134012     132908 1104  9 to 10
## 3 traffic 1991,  September 137055     136018 1037   7 to 8
## 4 traffic 1991,  September 133732     131843 1889  9 to 10
## 5 traffic  1991,  December 123552     121641 1911   7 to 8
## 6 traffic  1991,  December 121139     118723 2416  9 to 10

The dataset contains 10 traffic observations, 45 shopping observations, and 6 accident observations. Each observation records a count (number of cars, number of shoppers, or number of emergency admissions, respectively) on Friday the 6th and on Friday the 13th, together with the difference between the two (diff = sixth - thirteenth). The location and date of each observation, both categorical, are also recorded.
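As a quick sanity check on the column definitions, the sketch below re-enters the ten traffic rows from the str() output above (the first ten rows of the dataset, all of type traffic) and confirms that diff is sixth minus thirteenth. The vector names are ours, chosen only for illustration.

```r
# Traffic counts copied from the str() output above (first ten rows, all traffic)
sixth      <- c(139246, 134012, 137055, 133732, 123552,
                121139, 128293, 124631, 124609, 117584)
thirteenth <- c(138548, 132908, 136018, 131843, 121641,
                118723, 125532, 120249, 122770, 117263)
diff_shown <- c(698, 1104, 1037, 1889, 1911, 2416, 2761, 4382, 1839, 321)

# The recorded difference should equal sixth minus thirteenth in every row
diff_check <- sixth - thirteenth
all(diff_check == diff_shown)

# A positive difference means fewer cars on the 13th than on the 6th
all(diff_check > 0)
```

Both checks should return TRUE for these ten rows, which also tells us how to read the sign of diff in the tests that follow.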

Organizing and visualizing the data:

#Creating subsets for each category and reshaping each subset for ggplot2

traffic <- subset(friday, type=="traffic")
traffic <- melt(traffic, id.vars=c("type", "date", "location"), measure.vars=c("sixth", "thirteenth")) 

shopping <- subset(friday, type=="shopping")
shopping <- melt(shopping, id.vars=c("type", "date", "location"), measure.vars=c("sixth", "thirteenth")) 

accident <- subset(friday, type=="accident")
accident <- melt(accident, id.vars=c("type", "date", "location"), measure.vars=c("sixth", "thirteenth"))  
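To see what melt() is doing here, the following is a minimal base-R sketch of the same wide-to-long reshape on a toy data frame built from the first two traffic rows shown earlier. It uses stack() rather than reshape2, so it is an analogue and not the call used in this project; the names wide and long are illustrative.

```r
# Two traffic rows from head(friday), kept in "wide" form
wide <- data.frame(
  location   = c("7 to 8", "9 to 10"),
  sixth      = c(139246, 134012),
  thirteenth = c(138548, 132908)
)

# stack() puts both count columns into a single "values" column and records
# the source column name in the factor "ind" (comparable to melt()'s
# "value" and "variable" columns)
long <- stack(wide[c("sixth", "thirteenth")])

# Carry the identifier column along, repeated once per stacked column
long$location <- rep(wide$location, times = 2)

long
```

The long form has one row per (location, Friday) pair, which is the shape ggplot2 needs to draw one box per Friday.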

#side-by-side Traffic boxplot

ggplot(aes(y = value, x = variable), data = traffic) +
 geom_boxplot(color="black", fill="darkorange", alpha=0.6) +
  stat_summary(fun=mean, geom="point", shape=23, size=4) + # mean marker; `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0
 labs(title ="Traffic on Friday the 6th vs. Friday the 13th", x = "Friday", y = "Number of Cars") +
 theme(
  plot.title = element_text(size=18, face="bold", vjust=2, hjust=0.5, lineheight=0.6), panel.background = element_rect(fill = '#FFF8DC'), axis.text.x=element_text(angle=20, size=12, vjust=0.4)
  )

#side-by-side Shopping boxplot

ggplot(aes(y = value, x = variable), data = shopping) +
 geom_boxplot(color="black", fill="magenta", alpha=0.4) +
  stat_summary(fun=mean, geom="point", shape=23, size=4) + # mean marker; `fun` replaces the deprecated `fun.y`
 labs(title ="Shopping on Friday the 6th vs. Friday the 13th", x = "Friday", y = "Number of Shoppers") +
 theme(
  plot.title = element_text(size=18, face="bold", vjust=2, hjust=0.5, lineheight=0.6), panel.background = element_rect(fill = '#FFF8DC'), axis.text.x=element_text(angle=20, size=12, vjust=0.4)
  )

#side-by-side Accident boxplot

ggplot(aes(y = value, x = variable), data = accident) +
 geom_boxplot(color="black", fill="limegreen", alpha=0.5) +
  stat_summary(fun=mean, geom="point", shape=23, size=4) + # mean marker; `fun` replaces the deprecated `fun.y`
 labs(title ="Accidents on Friday the 6th vs. Friday the 13th", x = "Friday", y = "Number of Emergency Room Admissions") +
 theme(
  plot.title = element_text(size=18, face="bold", vjust=2, hjust=0.5, lineheight=0.6), panel.background = element_rect(fill = '#FFF8DC'), axis.text.x=element_text(angle=20, size=12, vjust=0.4)
  )

Analysis

To analyze the data, we conduct a one-sided t-test on the paired differences (the diff column, sixth minus thirteenth) for each of the three categories.
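For reference, the statistic behind each of these tests is the standard one-sample t statistic applied to the differences. Writing \(d_i\) for the i-th value of diff and \(n\) for the number of observations in a category (our notation, not the dataset's):

\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
\]

Under the null hypothesis of no difference, and assuming the differences are roughly normally distributed, \(t\) follows a t distribution with \(n - 1\) degrees of freedom. Because \(\bar{d}\) estimates \(\mu_{sixth} - \mu_{thirteenth}\), a positive \(t\) points toward lower counts on the 13th and a negative \(t\) toward higher counts on the 13th.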

Traffic t-test

\(H_0: \mu_{sixth} = \mu_{thirteenth}\)

\(H_A: \mu_{sixth} > \mu_{thirteenth}\)

The alternative hypothesis theorizes that superstitions about Friday the 13th affect people’s behavior, causing them to stay home rather than go out. Thus, the number of cars on the road on Friday the 6th would be significantly greater than the number of cars on the road on Friday the 13th.

#Reverting back to original subset to prepare for t-test
traffic <- subset(friday, type=="traffic")

#Conducting t-test for traffic observations
t.test(traffic$diff, alternative="greater")$p.value
## [1] 0.0004030922

The p-value for the traffic data (about 0.0004) is well below 0.05, so we reject the null hypothesis in favor of the alternative hypothesis. We can therefore conclude that there are significantly fewer cars on the road on Friday the 13th than on Friday the 6th.
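The traffic test can be reproduced by hand from the ten differences printed in the str() output earlier. The sketch below is a minimal illustration, assuming those ten values are the complete traffic subset; the names d, t_manual, and p_manual are ours.

```r
# Traffic differences (sixth - thirteenth), copied from the str() output above
d <- c(698, 1104, 1037, 1889, 1911, 2416, 2761, 4382, 1839, 321)
n <- length(d)

# One-sample t statistic on the differences: mean over its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Upper-tail p-value with n - 1 degrees of freedom,
# matching alternative = "greater"
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# The built-in test should agree with the manual calculation
res <- t.test(d, alternative = "greater")
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

If the ten values are indeed the full traffic subset, p_manual should match the p-value printed above. Calling t.test() on the sixth and thirteenth columns with paired = TRUE would be an equivalent way to run the same test.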

Shopping t-test

\(H_0: \mu_{sixth} = \mu_{thirteenth}\)

\(H_A: \mu_{sixth} > \mu_{thirteenth}\)

Similar to the traffic theory, this alternative hypothesis posits that superstitions about Friday the 13th cause people to change their normal behavior and not go out to the supermarket. Here, the number of shoppers on Friday the 6th would be significantly greater than the number of shoppers on Friday the 13th.

#Reverting back to original subset to prepare for t-test

shopping <- subset(friday, type=="shopping")

#Conducting t-test for shopping observations

t.test(shopping$diff, alternative="greater")$p.value
## [1] 0.9591933

The p-value for the shopping data (about 0.96) is far above 0.05; we therefore fail to reject the null hypothesis. The data provide no significant evidence of a drop in shopping activity on Friday the 13th.
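A p-value this close to 1 also carries information about direction. Because the t distribution is continuous, the two one-sided p-values for the same statistic sum to 1, so a value above 0.5 for the "greater" alternative means the sample mean of diff is negative. The p-value for the opposite direction follows by subtraction:

\[
p_{\text{less}} = 1 - p_{\text{greater}} = 1 - 0.9591933 = 0.0408067
\]

This is only arithmetic on the reported value. The "less" alternative was not the hypothesis we set out to test, so the result is best read as a description of which way the sample leans rather than as a confirmed finding.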

Accident t-test

\(H_0: \mu_{sixth} = \mu_{thirteenth}\)

\(H_A: \mu_{sixth} < \mu_{thirteenth}\)

This alternative hypothesis presumes that if Friday the 13th is truly an unlucky day on which bad things happen at an unusually high frequency, fewer transportation-related accidents would occur on Friday the 6th than on Friday the 13th.

#Reverting back to original subset to prepare for t-test

accident <- subset(friday, type=="accident")

#Conducting t-test for accident observations

t.test(accident$diff, alternative="less")$p.value
## [1] 0.021097

In this t-test, the p-value for the accident data (about 0.021) is below 0.05, allowing us to reject the null hypothesis in favor of the alternative hypothesis. We can therefore conclude that there is evidence that more transportation-related accidents resulting in emergency admissions occur on Friday the 13th than on Friday the 6th, although this rests on only six observations.

Conclusions

Of our three types of observations, two (traffic and shopping) focus on people’s superstitions about Friday the 13th. We can conclude that there are significantly fewer cars on the road on Friday the 13th, which is consistent with people choosing not to drive on that day because of their superstition. However, the “shopping” category, which has considerably more observations than “traffic,” does not allow for the same conclusion: its one-sided p-value above 0.5 means that, in this sample, more shoppers were counted on average on the 13th than on the 6th. Because of this discrepancy, we cannot conclude that Friday the 13th has a strong bearing on people’s behavior.

The accident category focuses on the ostensible “unluckiness” of Friday the 13th. Here, our t-test leads us to conclude that there is evidence of a higher frequency of transportation-related accidents on Friday the 13th, indeed suggesting that accidents are more likely on that day.

Limitations

The shopping data contain many more observations than the traffic and accident data. While the shopping observations were collected from nine different supermarkets (totaling 45 observations), the traffic observations cover only two stretches of a single motorway, and the accident observations come from a single hospital. While it is perhaps more challenging to collect data from public institutions (versus privately owned supermarkets), data of comparable depth and breadth in each category would have made our results considerably stronger.

It is also difficult to attribute differences in these observations to superstition. On any given Friday, any number of factors can affect whether someone leaves their home: the weather, construction on roadways, sales at the supermarket, proximity to holidays, and so on. The same can be said of transportation-related accidents; when it is raining or snowing, for example, car accidents are more likely.

With regard to the “accidents” category, it is especially difficult to be confident in our conclusion because of the limits of the data. The accident data in this study cover only transportation-related accidents, such as those involving cars, bicycles, and public transit. There are many other kinds of accidents: kitchen accidents, yardwork-related accidents, occupational accidents, slips and falls, and various other personal injuries. By focusing solely on transportation-related accidents, the researchers narrowed their data considerably, possibly skewing results. Overall, this is a dataset with many limitations, and the conclusions we have reached should be treated with caution.


This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Jessica Cook Semester: Spring 2019