library(tidyverse)
library(RColorBrewer)
library(lubridate)
library(plotly)
library(highcharter)
library(leaflet)
In Search of Bigfoot
Source: pabradyphoto/Getty Images
Introduction
This data set contains reported Bigfoot sightings as documented by the Bigfoot Field Researchers Organization (BFRO). With it, I plan to investigate whether there is a significant linear relationship between the number of Bigfoot sightings and weather visibility. I chose this topic because it is interesting to learn more about and because it is personal to me: when I was in elementary school, I was fascinated by this mysterious creature.
The sightings data were collected from individuals who sent in their sightings along with any additional information. The BFRO compiles this information into a report documenting everything, then gives the sighting a classification based on the evidence provided. The BFRO explains exactly how an individual can collect evidence; the classification grades are described in the variable list below.
Variables I will use:
- state: The name of the state the sighting was reported in
- season: The name of the season the sighting was reported in
- latitude: The latitude coordinate of the sighting
- longitude: The longitude coordinate of the sighting
- date: The year, month, and day of the sighting
- classification: The classification grade of the sighting, as defined by the BFRO:
  - Class A: Reports involving clear sightings in circumstances where misinterpretation or misidentification of other animals can be ruled out with greater confidence.
  - Class B: Incidents where a possible Sasquatch was observed at a great distance or in poor lighting conditions, and incidents in any other circumstance that did not afford a clear view of the subject.
  - Class C: Most second-hand reports, and any third-hand reports or stories with an untraceable source, because of the high potential for inaccuracy. These reports are kept in BFRO archives but are very rarely listed publicly in this database.
- conditions: Weather conditions at the time of the sighting
- visibility: Weather visibility at the time of the sighting (in miles)
Background Research
According to Britannica, Bigfoot, also commonly called Sasquatch, is a large, hairy, humanlike creature believed by some people to exist in the northwestern United States and western Canada. The name “Sasquatch” derives from the Salish word se’sxac, which means “wild men.” The creature seems to represent the North American counterpart of the Himalayan region’s mythical monster, the Abominable Snowman, or Yeti. Britannica also notes that Bigfoot is variably described as a primate ranging from 6 to 15 feet tall, standing on two feet. It is important to note that most scientists do not believe in the existence of Bigfoot and believe the creature to be a hoax.
Load libraries
Set working directory and load data
setwd("/Users/bryana/Documents/Data110/Datasets")
sightings <- read_csv("bfro_reports_geocoded.csv")
Clean the data:
Select only the columns needed for the analysis
# No group_by() here: each later summary sets its own grouping
sightings2 <- sightings |>
  select(state, season, latitude, longitude, date, classification, temperature_high, temperature_mid, temperature_low, humidity, cloud_cover, moon_phase, conditions, visibility, wind_speed)
Remove rows that have an NA in any of the selected columns
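Before dropping rows, it can help to see how many values are missing in each column and how many rows would survive. The sketch below uses base R on a small made-up data frame; `toy_sightings` and its values are hypothetical and are not taken from the BFRO file.

```r
# Hypothetical rows for illustration only; the real data has many more columns
toy_sightings <- data.frame(
  state      = c("Washington", "Ohio", NA, "Oregon"),
  visibility = c(10, NA, 8.5, 6.2),
  wind_speed = c(3.1, 2.0, NA, 4.4)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy_sightings))
print(missing_per_column)

# Number of rows with no missing value in any column
rows_kept <- sum(complete.cases(toy_sightings))
print(rows_kept)
```

Running the same two checks on the selected columns would show how much of the data the NA filter removes.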
# drop_na() (from tidyr, loaded with tidyverse) removes rows with an NA in any column
sightings_clean <- sightings2 |>
  drop_na()
Linear regression analysis:
I want to investigate if reports of Bigfoot sightings increase as the visibility increases.
Count the number of sightings for each recorded level of weather visibility
visibility_count <- sightings_clean |>
  group_by(visibility) |>
  summarise(count = n())
Linear regression model using weather visibility to predict the number of sightings.
lm_model <- lm(count ~ visibility, data = visibility_count)
summary(lm_model)
Call:
lm(formula = count ~ visibility, data = visibility_count)
Residuals:
Min 1Q Median 3Q Max
-23.03 -10.99 -7.86 -1.02 849.80
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.4082 5.3248 4.584 6.68e-06 ***
visibility -0.6274 0.2364 -2.654 0.00837 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53.13 on 303 degrees of freedom
Multiple R-squared: 0.02272, Adjusted R-squared: 0.01949
F-statistic: 7.044 on 1 and 303 DF, p-value: 0.008374
plot(lm_model)
The equation for my model is: Bigfoot Sightings = 24.4082 − 0.6274 × Visibility + ϵ
The adjusted R^2 value of 0.01949 indicates that visibility accounts for only about 1.949% of the variation in the number of sightings.
Based on a p-value of 0.008374 and a significance level of 0.05, I can conclude that there is a significant linear relationship between visibility and number of sightings. However, the estimated slope is negative (−0.6274), so the number of sightings tends to decrease slightly as visibility increases, and the low adjusted R^2 means the relationship is weak.
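As a worked check of the fitted equation, the sketch below plugs a few visibility values into the model and recomputes the adjusted R^2 from the reported R^2. All numbers are copied from the `summary(lm_model)` output; the helper name `predict_sightings` is my own and is only for illustration.

```r
# Coefficients and fit statistics copied from the summary(lm_model) output
intercept <- 24.4082
slope     <- -0.6274
r_squared <- 0.02272
n_obs     <- 303 + 2  # residual degrees of freedom + 2 estimated coefficients

# Predicted number of sightings at a given visibility (in miles)
predict_sightings <- function(visibility) intercept + slope * visibility
predicted <- predict_sightings(c(0, 5, 10))
print(predicted)

# Adjusted R-squared for a model with one predictor
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - 2)
print(adj_r_squared)
```

At 10 miles of visibility the equation gives 24.4082 − 0.6274 × 10 = 18.1342 sightings, and the recomputed adjusted R^2 agrees with the reported 0.01949 up to rounding.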
Graphs
Bar graph
I’d like to compare the number of sightings in each season and see which sighting classification is most common in each season.
Count the number of sightings for each season
seasons_count <- sightings_clean %>%
  group_by(season, classification) %>%
  summarize(count = n())
`summarise()` has grouped output by 'season'. You can override using the `.groups` argument.
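Reorder the seasons
season_order <- c("Winter", "Spring", "Summer", "Fall", "Unknown")
# sort the rows to follow season_order so the bars match the axis categories used in the chart
seasons_count <- seasons_count |>
  ungroup() |>
  arrange(classification, match(season, season_order))
# convert the "season" column to a character vector
seasons_count$season <- as.character(seasons_count$season)
Interactive Bar Graph Visualization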
highchart() |>
  hc_title(text = "Bigfoot Sighting Classifications in Each Season") |>
  hc_caption(text = "Data Provided By: Bigfoot Field Researchers Organization") |>
  hc_chart(type = "column") |>
  hc_xAxis(categories = season_order) |>
  hc_yAxis(title = list(text = "Count")) |>
  hc_add_series(data = seasons_count, type = "column", hcaes(x = season, y = count, group = classification), colorByPoint = TRUE) |>
  hc_legend(layout = "vertical", align = "right", verticalAlign = "top", itemMarginTop = 5) |>
  hc_add_theme(hc_theme_darkunica())
Visualization Comments
Based on the graph, summer has the most sightings overall. A potential reason is that summer and winter are popular times for outdoor activities, and with more people outdoors the likelihood of a sighting increases. On the other hand, fall has far fewer sightings than the other seasons. This could be because unpredictable fall weather discourages outdoor activities, or because of animal behavior during certain times of the year.
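The seasonal comparison can also be checked numerically by totalling the counts across classifications. The sketch below uses base R on made-up numbers shaped like `seasons_count`; `toy_counts` and its values are hypothetical and do not come from the BFRO data.

```r
# Hypothetical counts for illustration only; not taken from the BFRO data
toy_counts <- data.frame(
  season         = c("Summer", "Summer", "Fall", "Fall", "Winter", "Winter"),
  classification = c("Class A", "Class B", "Class A", "Class B", "Class A", "Class B"),
  count          = c(40, 60, 10, 15, 20, 30)
)

# Total sightings per season, summed over classifications
season_totals <- aggregate(count ~ season, data = toy_counts, FUN = sum)

# Sort from the season with the most sightings to the one with the fewest
season_totals <- season_totals[order(-season_totals$count), ]
print(season_totals)
```

Applied to the real `seasons_count` table, the first row would be the season with the most sightings.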
Line Graph
I want to investigate whether the number of sightings has increased over the years.
Pull the year from the date format
sightings_clean$year <- year(as.POSIXct(sightings_clean$date, format = "%Y/%m/%d"))
Count how many of each class were in each year
yearly_class <- sightings_clean |>
  group_by(year, classification) |>
  summarize(count = n()) %>%
  spread(key = classification, value = count, fill = 0)
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# Add the classes together to get the total number of sightings per year
yearly_class$total <- rowSums(yearly_class[, c("Class A", "Class B", "Class C")])
Rename the columns for consistency
yearly_class <- yearly_class %>%
  rename(class_a = "Class A", class_b = "Class B", class_c = "Class C")
Reorder the years
yearly_class$year<-factor(yearly_class$year, levels=c("1869", "1921", "1925", "1930", "1938", "1941", "1942", "1944", "1947", "1948", "1949", "1950", "1952", "1953", "1954", "1955", "1956", "1957", "1958", "1959", "1960", "1961", "1962", "1963", "1964", "1965", "1966", "1967", "1968", "1969", "1970", "1971", "1972", "1973", "1974", "1975", "1976", "1977", "1978", "1979", "1980", "1981", "1982", "1983", "1984", "1985", "1986", "1987", "1988", "1989", "1990", "1991", "1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023"
))
Interactive Line Graph
# Request 4 colors from the palette: brewer.pal(3, "Set2")[4] would be NA for the "Combined" series
hchart(yearly_class, "line", hcaes(x = year)) |>
  hc_title(text = "Bigfoot Sighting Classifications Through the Years") |>
  hc_caption(text = "Data Provided By: Bigfoot Field Researchers Organization") |>
  hc_yAxis(title = list(text = "Number of Sightings")) |>
  hc_add_series(name = "Class A", data = yearly_class$class_a, color = brewer.pal(4, "Set2")[1], showInLegend = TRUE) |>
  hc_add_series(name = "Class B", data = yearly_class$class_b, color = brewer.pal(4, "Set2")[2], showInLegend = TRUE) |>
  hc_add_series(name = "Class C", data = yearly_class$class_c, color = brewer.pal(4, "Set2")[3], showInLegend = TRUE) |>
  hc_add_series(name = "Combined", data = yearly_class$total, color = brewer.pal(4, "Set2")[4], showInLegend = TRUE) |>
  hc_xAxis(categories = yearly_class$year, title = list(text = "Years")) |>
  hc_legend(layout = "vertical", align = "right", verticalAlign = "top", itemMarginTop = 5) |>
  hc_add_theme(hc_theme_darkunica())
Visualization Comments
The line graph shows a gradual increase in Bigfoot sightings up until 2004, when they reached an all-time high and then began decreasing. I looked at Google web search trends to see whether there was increased interest in Bigfoot during 2004, but I could not find an explanation for the big spike in that year.
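The peak year can be read off the chart, but it can also be pulled from the yearly table directly. The sketch below uses base R on a made-up table with the same column names as `yearly_class`; `toy_yearly` and its values are hypothetical and only illustrate the lookup.

```r
# Hypothetical yearly counts for illustration only; not taken from the BFRO data
toy_yearly <- data.frame(
  year    = c(2002, 2003, 2004, 2005),
  class_a = c(50, 60, 90, 70),
  class_b = c(55, 65, 95, 60),
  class_c = c(1, 0, 2, 1)
)

# Total sightings per year across the three classes
toy_yearly$total <- rowSums(toy_yearly[, c("class_a", "class_b", "class_c")])

# Row for the year with the highest total
peak_row <- toy_yearly[which.max(toy_yearly$total), ]
print(peak_row)
```

Note that `which.max()` returns only the first maximum, so ties between years would need a separate check.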
Mapping
I want to investigate where in the United States Bigfoot was sighted to find any patterns.
Select only the latitude and longitude for plotting
usa_sightings <- sightings_clean |>
  group_by(latitude, longitude) |>
  select(latitude, longitude)
Create a custom popup for the map
popupusa <- paste0(
"<b>Date: </b>", sightings_clean$date, "<br>",
"<b>State: </b>", sightings_clean$state, "<br>",
"<b>Conditions: </b>", sightings_clean$conditions, "<br>",
"<b>Season: </b>", sightings_clean$season, "<br>",
"<b>Classification: </b>", sightings_clean$classification, "<br>"
)
Interactive Map
leaflet() |>
setView(lng = -95.71, lat = 37.09, zoom = 4) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = sightings_clean,
radius = 25,
color = "#715145",
fillOpacity = 1,
popup = popupusa
)
Assuming "longitude" and "latitude" are longitude and latitude, respectively
Visualization Comments
The map shows a high density of sightings in the East and the Northwest of the country. If Bigfoot were to exist, it would make sense for it to live in wooded areas, and the states with a high density of sightings would be good locations. For example, Washington state has a high density of sightings and is also home to large forests where a creature like Bigfoot could hide. Conversely, a state like North Dakota has few sightings and consists mainly of plains and prairies, which would be terrible places for Bigfoot to hide.
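The density impression from the map can be backed up by counting reports per state. The sketch below uses base R on a made-up `state` column; `toy_states` is hypothetical, and its values only illustrate the counting step.

```r
# Hypothetical state values for illustration only; not taken from the BFRO data
toy_states <- data.frame(
  state = c("Washington", "Washington", "Washington",
            "Ohio", "Ohio", "North Dakota")
)

# Number of reports per state, sorted from most to fewest
state_counts <- sort(table(toy_states$state), decreasing = TRUE)
print(state_counts)
```

Raw counts favor large or populous states, so comparing counts per unit of area or population would be a fairer test of the density claim.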
References
Britannica, T. Editors of Encyclopaedia (2024, April 23). Sasquatch. Encyclopedia Britannica. https://www.britannica.com/topic/Sasquatch