The goal of this project is to add interactivity to my second portfolio project on the accuracy of weather forecasting in cities in the United States. The data spans 16 months, covers 167 cities, and comes from the National Weather Service. Let’s first wrangle the data as we did in the previous project.
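# Packages used below: dplyr (wrangling), lubridate (ymd),
# ggplot2 and ggthemes (theme_clean), plotly (ggplotly)
library(dplyr)
library(lubridate)
library(ggplot2)
library(ggthemes)
library(plotly)

# Data wrangling copied over from portfolio 2
weather <- read.csv("~/Desktop/work/STAT220/portfolio2/data/weather_forecasts.csv")
cities <- read.csv("~/Desktop/work/STAT220/portfolio2/data/forecast_cities.csv")
meanings <- read.csv("~/Desktop/work/STAT220/portfolio2/data/outlook_meanings.csv")

# Attach city-level information to every forecast row
weather_cities <- left_join(weather, cities, by = c("city", "state"))

# Parse dates, convert categorical columns to factors, and compute the signed error
weather_cities <- weather_cities %>%
  mutate(date = ymd(date),
         city = as.factor(city),
         state = as.factor(state),
         high_or_low = as.factor(high_or_low),
         forecast_outlook = as.factor(forecast_outlook),
         koppen = as.factor(koppen),
         forecast_error = forecast_temp - observed_temp)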
# One row per city, state, and high/low forecast type, with summary error measures
weather_cities_analysis <- weather %>%
  # Exclude BUFFALO and RICHMOND
  filter(city != "BUFFALO" & city != "RICHMOND") %>%
  mutate(date = ymd(date),
         city = as.factor(city),
         state = as.factor(state),
         high_or_low = as.factor(high_or_low),
         forecast_outlook = as.factor(forecast_outlook),
         forecast_error = forecast_temp - observed_temp) %>%
  group_by(city, state, high_or_low) %>%
  summarize(error_avg = mean(forecast_error, na.rm = TRUE),
            abs_total_error = sum(abs(forecast_error), na.rm = TRUE),
            .groups = "drop") %>%
  # Add city-level columns (coordinates, distance to coast, and so on)
  left_join(cities, by = c("city", "state")) %>%
  mutate(koppen = as.factor(koppen))
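To see why the summary keeps both a signed average and an absolute total, here is a minimal sketch on made-up numbers (the `toy` data frame and its values are hypothetical, not taken from the National Weather Service data). It reuses the same `forecast_error` definition as above and shows that over- and under-forecasts can cancel in `error_avg` while still adding up in `abs_total_error`.

```r
library(dplyr)

# Hypothetical toy forecasts: city A misses in both directions, city B is always 1 degree low
toy <- data.frame(
  city = c("A", "A", "A", "B", "B", "B"),
  forecast_temp = c(70, 75, 80, 60, 62, 64),
  observed_temp = c(72, 73, 80, 61, 63, 65)
)

toy_summary <- toy %>%
  # Same signed error definition as in the wrangling code
  mutate(forecast_error = forecast_temp - observed_temp) %>%
  group_by(city) %>%
  summarize(error_avg = mean(forecast_error, na.rm = TRUE),
            abs_total_error = sum(abs(forecast_error), na.rm = TRUE),
            .groups = "drop")

print(toy_summary)
```

In this toy case city A has an average error of zero even though it has the larger absolute total, which is why the plots below use the absolute total rather than the average.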
Now that we have our data in the correct format, we can start adding interactivity to the plots. The original version of the first plot showed absolute total forecasting error by state. Using plotly, we can attach a tooltip to each point, so hovering over a point shows its city and state. We no longer have to group the points by state and read the names off the axis. That frees the x axis for something more meaningful, like the distance from the city to the coast.
p1 <- weather_cities_analysis %>%
  filter(high_or_low == "high") %>%
  ggplot(aes(x = distance_to_coast,
             y = abs_total_error,
             text = paste(city, state))) +
  geom_point() +
  scale_y_continuous(limits = c(2000, 8200)) +
  labs(x = "Distance to Coast (Miles)",
       y = "Absolute Total Forecasting Error (Degrees)",
       title = "Total High Temp Forecasting Error by Distance to Coast") +
  theme_clean() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
ggplotly(p1, tooltip = "text")
It looks like the total error tends to increase as cities get farther from the coast.
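One way to back up a visual impression like this is a rank correlation between the two plotted variables. The sketch below is only an illustration: the five value pairs are made up, and in practice the same `cor()` call would be run on the `distance_to_coast` and `abs_total_error` columns of the filtered data.

```r
# Hypothetical values, only to make the sketch runnable
toy <- data.frame(
  distance_to_coast = c(1, 50, 200, 400, 800),
  abs_total_error = c(3000, 3200, 3100, 4000, 5000)
)

# Spearman's rho measures whether error tends to rise with distance,
# without assuming the relationship is linear
rho <- cor(toy$distance_to_coast, toy$abs_total_error,
           method = "spearman", use = "complete.obs")
print(rho)
```

A value near 1 would support an increasing trend, a value near 0 would suggest the pattern in the plot is weak.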
Let’s also try to improve the longitude and latitude graphs. Originally, they were separate scatter plots. Let’s make a map instead, with longitude on the x axis and latitude on the y axis. We can then add tooltips to display important information about the city that is being observed.
p2 <- weather_cities_analysis %>%
  filter(high_or_low == "high") %>%
  # a and b are not real ggplot2 aesthetics; they only carry values for the tooltip
  ggplot(aes(x = lon,
             y = lat,
             text = paste(city, state),
             color = abs_total_error,
             a = distance_to_coast,
             b = elevation)) +
  geom_point() +
  scale_x_continuous(limits = c(-125, -67)) +
  scale_y_continuous(limits = c(25, 50)) +
  labs(x = "Longitude",
       y = "Latitude",
       color = "Absolute Total Error",
       title = "Total High Temp Forecasting Error by Location") +
  theme_clean()
ggplotly(p2, tooltip = c("text", "color", "a", "b"))
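An alternative to the placeholder aesthetics `a` and `b` is to build the whole tooltip as a single string and map it to `text`. Plotly renders `<br>` in hover text as a line break, so each value can get its own labeled line. The sketch below assumes a hypothetical `toy` data frame with made-up values and a hypothetical `tooltip` column; the column names otherwise mirror the ones used above.

```r
library(ggplot2)
library(plotly)

# Hypothetical rows with made-up values, only to make the sketch runnable
toy <- data.frame(
  city = c("A", "B"),
  state = c("XX", "YY"),
  lon = c(-100, -90),
  lat = c(35, 40),
  abs_total_error = c(3000, 4500),
  distance_to_coast = c(10.5, 320.2),
  elevation = c(15, 250)
)

# One string per point; <br> separates the lines shown on hover
toy$tooltip <- paste0(
  toy$city, ", ", toy$state,
  "<br>Absolute total error: ", toy$abs_total_error,
  "<br>Distance to coast: ", toy$distance_to_coast,
  "<br>Elevation: ", toy$elevation
)

p <- ggplot(toy, aes(x = lon, y = lat,
                     color = abs_total_error,
                     text = tooltip)) +
  geom_point()

# Show only the custom string on hover
ggplotly(p, tooltip = "text")
```

This keeps full control over the labels and their order in the tooltip, at the cost of one extra column in the data.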