The question I’d like to answer is whether “nature” photo hotspots in Michigan shift seasonally. I predict spring photo spots will be around areas with flowers, such as hiking trails in the Lower Peninsula. Summer photos will likely cluster around the Great Lakes shorelines, beaches, and national parks, because school is out for summer vacation. Fall nature hotspots could be where the fall foliage is most intense, such as scenic drives, walking trails, and forests, possibly concentrated in the Upper Peninsula. Finally, winter nature photos may be concentrated in snowier areas, such as ski resorts, but I also predict lower nature photo activity overall during this season because of Michigan’s harsh winter weather.
First, load the packages and the dataset of Flickr photos.
library(readr)
library(ggplot2)
library(ggthemes)
library(gganimate)
library(foreign)
library(dplyr)
library(rnaturalearth)
#install.packages("rnaturalearthdata")
library(rnaturalearthdata)
#install.packages("gifski")
library(gifski)
#install.packages("lubridate")
library(lubridate)
#library(tidyverse)
#install.packages("wesanderson")
library(wesanderson)
MichFlickr <- read.csv("MichiganFlickr.csv")
Convert the photo upload timestamp to a real date-time and extract the month number (1-12) so each photo can be assigned to a season.
MichFlickr <- MichFlickr %>%
  mutate(
    date  = as_datetime(dateupload),  # Unix timestamp (seconds) -> date-time in UTC
    month = month(date)               # month number, 1-12
  )
Next, add a season column to MichFlickr and assign each photo to a season. Winter: Dec-Feb, Spring: Mar-May, Summer: Jun-Aug, Fall: Sep-Nov.
MichFlickr <- MichFlickr %>%
  mutate(season = case_when(
    month %in% c(12, 1, 2)  ~ "Winter",
    month %in% c(3, 4, 5)   ~ "Spring",
    month %in% c(6, 7, 8)   ~ "Summer",
    month %in% c(9, 10, 11) ~ "Fall"
  ))
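To check the month and season logic on a few known values, here is a base-R sketch (so it runs without lubridate or dplyr) that uses hypothetical Unix timestamps in place of dateupload. It also makes one thing visible: with a single year of data, “Winter” pools January-February 2010 with December 2010, which are not one continuous winter.

```r
# Hypothetical Unix timestamps (seconds since 1970-01-01 UTC), all at midnight UTC:
# 2010-01-01, 2010-04-15, 2010-07-04, 2010-10-10, 2010-12-01
dateupload <- c(1262304000, 1271289600, 1278201600, 1286668800, 1291161600)

# Same interpretation as lubridate::as_datetime(): seconds since the epoch, in UTC
date  <- as.POSIXct(dateupload, origin = "1970-01-01", tz = "UTC")
month <- as.integer(format(date, "%m"))

# Same month-to-season rules as the case_when() above
to_season <- function(m) {
  ifelse(m %in% c(12, 1, 2), "Winter",
  ifelse(m %in% 3:5,         "Spring",
  ifelse(m %in% 6:8,         "Summer", "Fall")))
}
season <- to_season(month)
data.frame(date, month, season)
```

Because as_datetime() works in UTC by default, the last timestamp (midnight UTC on December 1) is still the evening of November 30 in Michigan, so it lands in Winter rather than Fall. Passing tz = "America/Detroit" to as_datetime() would assign months by local time, if that is what matters for the analysis.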
Use the predict_Na column to keep only the photos that are likely of nature (predict_Na > 0.5) and store them in a new dataset called Mich_nature.
Mich_nature <- MichFlickr %>%
  filter(predict_Na > 0.5)
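How many photos survive depends on the 0.5 cutoff. A quick sensitivity check is sketched below with made-up scores standing in for predict_Na (assumed here to be a score between 0 and 1). Note that > is strict, so a photo scored exactly 0.5 is dropped, and filter() also drops rows where predict_Na is NA.

```r
# Made-up scores standing in for predict_Na
predict_Na <- c(0.05, 0.32, 0.49, 0.50, 0.51, 0.76, 0.93)
thresholds <- c(0.3, 0.5, 0.7)

# Number of photos kept at each candidate threshold
kept <- sapply(thresholds, function(t) sum(predict_Na > t))
data.frame(threshold = thresholds, kept = kept)
```

On the real data the same check would be sapply(thresholds, function(t) sum(MichFlickr$predict_Na > t, na.rm = TRUE)).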
Animating! I’m adding Michigan as a background for the data, and points appear at the location where each photo was taken, in the order of the dates they were taken. Photos from different seasons appear in different colors for easy identification.
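The animation code itself is not part of this section. One way such an animation could be built with gganimate is sketched below on made-up points; it is not the code that produced the original gif. The data frame pts, the bounding-box coordinates, and the frame count are all hypothetical, and the Michigan background is left out (county outlines could be added with a geom_polygon() layer, as in the heatmap later on).

```r
library(ggplot2)
library(gganimate)

set.seed(1)
# Made-up points inside a rough Michigan bounding box (hypothetical, not Flickr data)
pts <- data.frame(
  longitude = runif(60, -88, -83),
  latitude  = runif(60, 42, 46.5),
  date      = as.Date("2010-01-01") + sample(0:364, 60, replace = TRUE)
)
m <- as.integer(format(pts$date, "%m"))
pts$season <- factor(
  ifelse(m %in% c(12, 1, 2), "Winter",
  ifelse(m %in% 3:5, "Spring",
  ifelse(m %in% 6:8, "Summer", "Fall"))),
  levels = c("Winter", "Spring", "Summer", "Fall")
)

p <- ggplot(pts, aes(x = longitude, y = latitude, colour = season)) +
  geom_point(size = 2) +
  coord_quickmap() +
  labs(title = "Nature photos: {frame_time}", colour = "Season") +
  transition_time(date) +  # reveal points in date order
  shadow_mark()            # keep earlier points on screen so they accumulate

# Rendering is slow, so it is left commented out here:
# anim <- animate(p, nframes = 100, renderer = gifski_renderer())
# anim_save("mich_nature.gif", anim)
```

transition_time() orders frames by the date column, and shadow_mark() is what makes points accumulate instead of flashing on and off.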
Making a line graph of the same data, with photo counts aggregated by day.
library(tidyr)
daily_counts <- animateMich %>%
  group_by(date) %>%
  summarise(n_photos = n()) %>%
  complete(date = seq.Date(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day"))
ggplot(daily_counts, aes(x = date, y = n_photos)) +
  geom_line(color = "black", linewidth = 1) +
  labs(title = "Daily Nature Photos in Michigan (2010)",
       x = "Date", y = "Number of Photos") +
  theme_minimal()
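Grouping by date gives one row per day only if date is a calendar day; if it still holds full upload timestamps, every distinct timestamp becomes its own group. Also, complete() fills days with no photos with NA rather than 0. A minimal base-R sketch of both fixes, on hypothetical timestamps and a shortened five-day window:

```r
# Hypothetical upload timestamps (UTC) standing in for animateMich$date
ts <- as.POSIXct(c("2010-01-01 09:15:00", "2010-01-01 17:40:00",
                   "2010-01-03 12:00:00"), tz = "UTC")

# Every calendar day in the (shortened) study window
all_days <- seq(as.Date("2010-01-01"), as.Date("2010-01-05"), by = "day")

# Truncate each timestamp to its calendar day, then count photos per day;
# tabulate() returns 0 for days with no photos instead of NA
day_index <- match(as.Date(ts, tz = "UTC"), all_days)
daily <- data.frame(date = all_days,
                    n_photos = tabulate(day_index, nbins = length(all_days)))
daily
```

In the document's dplyr/tidyr style, the equivalent would be grouping by as_date(date) and passing fill = list(n_photos = 0) to complete(). Whether zero is right depends on whether a day with no rows in the data really had no uploads.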
Finding a way to smooth (“filter”) this data and create a more usable line graph: a 7-day moving average.
library(zoo)
daily_counts <- animateMich %>%
  group_by(date) %>%
  summarise(n_photos = n()) %>%
  complete(date = seq.Date(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")) %>%
  mutate(roll_7 = rollmean(n_photos, 7, fill = NA))  # centered 7-day moving average
ggplot(daily_counts, aes(x = date)) +
  geom_line(aes(y = n_photos), color = "grey60", linewidth = 0.8) +
  geom_line(aes(y = roll_7), color = "darkgreen", linewidth = 1.2) +
  labs(title = "Daily Nature Photos in Michigan (2010)",
       subtitle = "Raw counts (grey) with 7-day moving average (green)",
       x = "Date", y = "Number of Photos") +
  theme_minimal()
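To make the smoothing concrete, the sketch below applies a centered 7-day moving average to a toy series with one upload burst (the numbers are made up). It uses base R's stats::filter() with equal weights of 1/7; the stats:: prefix matters because dplyr masks filter(). With its default align = "center", zoo::rollmean(x, 7, fill = NA) should give the same values.

```r
# Toy daily counts (made up): a steady series with one burst on day 6
counts <- c(10, 12, 11, 13, 12, 60, 11, 12, 10, 13, 12)

# Centered 7-day moving average: each value is the mean of that day and the
# 3 days on either side; the first and last 3 days have no full window (NA)
ma7 <- as.numeric(stats::filter(counts, rep(1 / 7, 7), sides = 2))
round(ma7, 2)
```

The burst of 60 is spread across the seven windows that contain it, so no smoothed value exceeds about 18.7. The NA values at both ends are also why geom_line() reports more removed rows for the moving average than for the raw counts.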
Identifying peaks (days with more photos than both the day before and the day after) and labeling them with their dates.
# A peak is a day with more photos than both the day before and the day after
daily_counts <- daily_counts %>%
  mutate(peak = n_photos > lag(n_photos) & n_photos > lead(n_photos))
peak_points <- daily_counts %>% filter(peak)
top_peaks <- peak_points %>%
  arrange(desc(n_photos)) %>%
  slice(1:7)
ggplot(daily_counts, aes(x = date, y = n_photos)) +
  geom_line(color = "black", linewidth = 1) +
  geom_point(data = top_peaks, aes(x = date, y = n_photos), color = "blue", size = 2.5) +
  geom_text(
    data = top_peaks,
    aes(label = format(date, "%b %d")),
    vjust = -0.7,
    size = 3,
    color = "blue"
  ) +
  labs(
    title = "Daily Nature Photos in Michigan (2010)\nTop 7 Peak Days Highlighted",
    x = "Date", y = "Number of Photos"
  ) +
  theme_minimal()
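The peak rule (a day with strictly more photos than both neighbors) can be checked on a short made-up series in base R. One consequence of the strict >: a two-day plateau, like days 4 and 5 below, is never flagged, even though it is a local high.

```r
x <- c(1, 3, 2, 5, 5, 4, 6, 1)   # made-up daily counts

prev <- c(NA, head(x, -1))        # previous day's count (like dplyr::lag)
nxt  <- c(tail(x, -1), NA)        # next day's count (like dplyr::lead)
# Comparisons against the missing neighbor at either end give NA or FALSE;
# which() skips NA values
is_peak <- x > prev & x > nxt

peaks <- which(is_peak)
# Rank the peaks by height, as arrange(desc(n_photos)) does, and keep the top 2
top2 <- peaks[order(x[peaks], decreasing = TRUE)][1:2]
peaks
top2
```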
Testing whether the number of photos differs significantly between seasons.
season_counts <- animateMich %>%
  group_by(season) %>%
  summarise(n_photos = n())
daily_season <- animateMich %>%
  group_by(date, season) %>%
  summarise(n_photos = n(), .groups = "drop")  # drop grouping so aov() sees a plain table
anova_result <- aov(n_photos ~ season, data = daily_season)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## season 3 0.06 0.02064 3.633 0.0123 *
## Residuals 42659 242.32 0.00568
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
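A sketch of the same one-way ANOVA with one observation per calendar day, so that the unit of analysis matches daily counts. The counts are simulated Poisson placeholders with invented rates, not the Flickr data; on the real data, the daily table would come from counting photos by as_date(date) and season. Consecutive days are not independent, so the p-value from a test like this is only approximate.

```r
set.seed(2010)
days   <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
month  <- as.integer(format(days, "%m"))
season <- factor(
  ifelse(month %in% c(12, 1, 2), "Winter",
  ifelse(month %in% 3:5, "Spring",
  ifelse(month %in% 6:8, "Summer", "Fall"))),
  levels = c("Winter", "Spring", "Summer", "Fall")
)

# Invented average photos per day in each season (placeholder values only)
rates    <- c(Winter = 80, Spring = 100, Summer = 130, Fall = 110)
n_photos <- rpois(length(days), lambda = rates[as.character(season)])

daily_sim <- data.frame(date = days, season, n_photos)
fit <- aov(n_photos ~ season, data = daily_sim)
summary(fit)
fit$df.residual   # 365 days - 4 seasons = 361
```

With one row per day, the residual degrees of freedom can never exceed 361 for 2010, which is a quick sanity check on how the input table was built.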
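The p-value (0.0123) shows that mean photo counts differ between at least two seasons, but because there are four groups we need post-hoc tests to find out which pairs differ. Two caveats: the season term explains only a tiny share of the variation (sum of squares 0.06 out of about 242), and the 42,659 residual degrees of freedom are far more than the 365 days in 2010, so each observation here is a distinct date value (apparently a full upload timestamp) rather than a calendar day.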
pairwise.t.test(daily_season$n_photos, daily_season$season,
                p.adjust.method = "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: daily_season$n_photos and daily_season$season
##
## Winter Spring Summer
## Spring 0.174 - -
## Summer 1.000 0.011 -
## Fall 1.000 0.025 1.000
##
## P value adjustment method: bonferroni
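After the Bonferroni adjustment, only two pairs are significantly different at the 0.05 level: spring and summer (p = 0.011) and spring and fall (p = 0.025). Winter does not differ significantly from any other season (spring vs. winter p = 0.174; summer vs. winter and fall vs. winter p = 1.000), so the smallest p-value belongs to spring versus summer, not to the two weather extremes.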
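Reading a Bonferroni table by eye is easy to get wrong, so listing the significant pairs in code can help. The sketch below rebuilds the lower-triangular matrix from the pairwise.t.test() output above (for a live result, pairwise.t.test() returns the same matrix as its p.value element) and keeps the pairs below 0.05.

```r
# Adjusted p-values copied from the pairwise.t.test() output above
p <- matrix(c(0.174, NA,    NA,
              1.000, 0.011, NA,
              1.000, 0.025, 1.000),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Spring", "Summer", "Fall"),
                            c("Winter", "Spring", "Summer")))

# Row/column positions of the significant cells (NA cells are skipped)
idx <- which(p < 0.05, arr.ind = TRUE)
sig <- paste(rownames(p)[idx[, "row"]], "vs", colnames(p)[idx[, "col"]])
sig
```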
# Order the seasons chronologically for the x-axis
animateMich <- animateMich %>%
  mutate(season = factor(season, levels = c("Winter", "Spring", "Summer", "Fall")))
ggplot(animateMich, aes(x = season, fill = season)) +
  geom_bar() +
  scale_fill_manual(values = c(
    "Winter" = "blue",
    "Spring" = "pink",
    "Summer" = "darkgreen",
    "Fall"   = "orange"
  )) +
  labs(title = "Total Nature Photos by Season (2010)",
       x = "Season", y = "Number of Photos") +
  theme_minimal()
Creating a heatmap of photo locations to see whether these concentrations change from season to season.
# mich_county is not created in this excerpt; if it is missing, build Michigan county
# outlines with ggplot2::map_data() (needs the maps package), which supplies long/lat/group
if (!exists("mich_county")) mich_county <- map_data("county", region = "michigan")
ggplot(animateMich, aes(x = longitude, y = latitude)) +
  stat_density_2d(
    aes(fill = after_stat(level)),
    geom = "polygon",
    contour_var = "count",
    bins = 25,
    alpha = 0.9
  ) +
  geom_polygon(
    data = mich_county,
    aes(x = long, y = lat, group = group),
    fill = NA, color = "black", linewidth = 0.3
  ) +
  coord_fixed() +
  scale_fill_viridis_c(option = "C", direction = 1) +
  labs(
    title = "Spatial Concentration of Nature Photos by Season (2010)",
    fill = "Photo Count Density"
  ) +
  facet_wrap(~season) +
  theme_minimal() +
  theme(panel.background = element_rect(fill = "gray95"))
Animating a version of the above map.
I ran an ANOVA on the number of photos taken in each season, and it showed a significant difference among seasons (p = 0.0123). A Bonferroni-adjusted post-hoc test showed that only spring and summer (p = 0.011) and spring and fall (p = 0.025) differ significantly; winter does not differ significantly from any other season. So although nature photos were uploaded to Flickr throughout the entire year, they were not evenly distributed across seasons. I had expected a winter drop-off, because Michigan is well known for its brutal winters, and heavy snow, extreme cold, or ice could reduce outdoor activity. The pairwise tests do not support that, however. Summer tourism to Michigan state parks or the Upper Peninsula may still shape where photos are taken, a pattern we can see in the density “cloud” gif.
I think it’s most likely that these spatial patterns of nature photography are driven by the availability of recreational activities, which depends on both the season and the area. The Upper Peninsula would be harder to reach in the winter: most of Michigan’s population and airports are in the Lower Peninsula, so getting there would mean a lot of driving on potentially unsafe roads. This may explain why much of the winter Flickr activity clusters around Detroit and Ann Arbor but seems to drift slowly north as spring progresses into summer. The line graph, which shows the temporal side of the data, also has several interesting peaks, which may correspond to special seasonal events in nature, such as especially beautiful days of blooms (perhaps cherry blossoms), summer holidays, peak fall foliage colors, or even the first snow.
For a future analysis, it would be interesting to plot the number of nature photos from Flickr against the temperature on each day in 2010, since daily temperature would capture historical weather conditions more accurately than season alone.
The first line graph I made to show seasonal differences in the number of photos used daily counts. It has a lot of spikes, and it may be interesting to look at what was happening on those days. To reduce the noise, I used a rolling average (moving mean): the average of the daily nature photo counts over 7-day windows, plotted in place of the raw counts. This dampens the biggest outliers and gives a line that shows the underlying trend more clearly. It’s possible that Flickr users uploaded a batch of photos on one day even if that’s not the day they were taken, so this averaging should reduce the weight of events like that, while a 7-day window is still short enough to preserve any seasonal patterns.
The gif I made shows photos that are likely of nature taken in Michigan in 2010, with points colored by the season they were taken in. I used kernel density estimation to create the density “cloud”, which moves through the seasons as a time series. I was hoping to see significant clustering of photos in distinct locations depending on the season, and there does seem to be a northward movement of photos during the spring and summer months, which then coalesces into smaller, denser areas during the fall and winter, right around Detroit and Ann Arbor. There also seems to be slightly more activity along the coasts of the Upper Peninsula in the summer months, which I hypothesize may be due to the cultural norm of going to the beach in summer, along with the agreeable weather and access to nature in the far north of the state.
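The density “cloud” comes from two-dimensional kernel density estimation; ggplot2's stat_density_2d() relies on MASS::kde2d() for this, and with contour_var = "count" the density is scaled by the number of points. A minimal sketch of finding a hotspot from such a surface, using made-up points (a dense cluster near Detroit and a sparser one farther north), not the Flickr data:

```r
set.seed(2010)
# Made-up photo locations: 150 points near Detroit, 50 farther north
lon <- c(rnorm(150, -83.1, 0.2), rnorm(50, -85.6, 0.3))
lat <- c(rnorm(150, 42.3, 0.2), rnorm(50, 44.8, 0.3))

# Density estimated on a 50 x 50 grid spanning the range of the points
kd <- MASS::kde2d(lon, lat, n = 50)

# Grid cell with the highest density = the hotspot
# (kd$z rows follow kd$x, columns follow kd$y)
peak    <- which(kd$z == max(kd$z), arr.ind = TRUE)[1, ]
hot_lon <- kd$x[peak[1]]
hot_lat <- kd$y[peak[2]]
c(hot_lon, hot_lat)
```

Running the same calculation separately for each season and comparing the hotspot coordinates would be one way to put a number on the northward drift described above.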