---
title: "Weather Prediction Errors in the U.S."
subtitle:
author: "Josh Madigan"
output: html_document
editor_options:
  markdown:
    wrap: 72
---
```{r}
#| include: false
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# tidyverse attaches ggplot2, readr, dplyr, and forcats
library(tidyverse)
library(patchwork)

# City metadata (location, climate class) and outlook code descriptions
forecast_cities <- read_delim("~/Stat 220/Assignments/proj2-JoshMadigan/data/forecast_cities.csv", delim = ",")
outlook_meanings <- read_delim("~/Stat 220/Assignments/proj2-JoshMadigan/data/outlook_meanings.csv", delim = ",")

# Forecast and observed weather records
weather_forecasts <- read_csv("data/weather_forecasts.csv")

# Attach city metadata to each forecast, then compute the absolute temperature error
mega_forecast <- weather_forecasts %>%
  left_join(forecast_cities, by = c("city", "state")) %>%
  mutate(temp_error = abs(observed_temp - forecast_temp))
```
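As a minimal sketch of what `temp_error` measures, the toy values below (made up for illustration, not taken from the dataset) show that the absolute difference discards the direction of the miss, and that a missing observation produces a missing error. This is presumably why the later summaries pass `na.rm = TRUE`.

```r
library(dplyr)

# Hypothetical toy values for illustration only; not rows from weather_forecasts.csv
toy <- data.frame(
  observed_temp = c(70, 65, NA),
  forecast_temp = c(68, 69, 50)
)

toy_errors <- toy %>%
  mutate(temp_error = abs(observed_temp - forecast_temp))

# A forecast that is 2 degrees low and one that is 4 degrees high
# give errors of 2 and 4; the row with a missing observation gives NA
toy_errors$temp_error

# mean() would return NA here unless missing errors are dropped explicitly
toy_mean <- mean(toy_errors$temp_error, na.rm = TRUE)
toy_mean
```

Because the sign is dropped, this measure cannot tell whether forecasts run systematically warm or cold; it only captures the size of the miss.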
Inaccurate weather forecasts are a daily occurrence across the country: reports often state one set of conditions while we experience quite different ones. But does this problem affect every region equally? To investigate, I combined observed weather and weather forecasts for various U.S. cities and performed an exploratory data analysis. Specifically, I analyzed the error in temperature predictions, measured as the absolute difference between observed and forecast temperature, by climate classification, latitude, and longitude.
The temperature prediction error by climate classification is shown below. Although the difference in average error between adjacent classifications is relatively small, the extremes are far apart: the largest average error for any climate is more than twice the smallest. As the plot shows, the climates Dfc and BSk have higher average temperature errors, while Am and Aw have lower ones. On a map, the high-error classes correspond to Alaska (Dfc, subarctic) and parts of the Rockies (BSk, cold semi-arid). The low-error classes, Am and Aw, are tropical climates, found in places such as southern Florida and Hawaii.
```{r}
# Average absolute temperature error per Köppen class, sorted from lowest to highest
mega_forecast %>%
  group_by(koppen) %>%
  summarize(avg_temp_error = mean(temp_error, na.rm = TRUE)) %>%
  ggplot(aes(x = fct_reorder(koppen, avg_temp_error), y = avg_temp_error, fill = avg_temp_error)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis_c() +
  labs(
    title = "Forecast Error by Climate Classification",
    x = "Köppen Climate Classification",
    y = "Average Temperature Error"
  )
```
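The "more than twice" comparison can be checked numerically rather than read off the bars. The sketch below uses made-up climate labels and error values (assumptions for illustration only) and computes the ratio of the largest group average to the smallest, using the same `group_by()` and `summarize()` pattern as the plot.

```r
library(dplyr)

# Hypothetical per-observation errors for three made-up climate labels
toy <- data.frame(
  koppen     = c("X1", "X1", "X2", "X2", "X3", "X3"),
  temp_error = c(1, 2, 2, 3, 3, 4)
)

by_climate <- toy %>%
  group_by(koppen) %>%
  summarize(avg_temp_error = mean(temp_error, na.rm = TRUE))

# Ratio of the largest group average to the smallest
error_ratio <- max(by_climate$avg_temp_error) / min(by_climate$avg_temp_error)
error_ratio
```

Applying the same two lines to the real summary table would give the exact ratio behind the claim.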
Average temperature error rises gradually with latitude: locations at higher latitudes have larger average errors than locations at lower latitudes. As with the climate classifications, the gap between the highest and lowest errors is large in proportional terms. On a map, this means that locations farther north have higher average temperature errors than locations farther south.
```{r}
mega_forecast %>%
  mutate(lat_group = cut(lat, breaks = 11)) %>% # NEW ELEMENT: cut() splits latitude into 11 equal-width bins
  group_by(lat_group) %>%
  summarize(avg_temp_error = mean(temp_error, na.rm = TRUE)) %>%
  ggplot(aes(x = lat_group, y = avg_temp_error, fill = avg_temp_error)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis_c() +
  labs(
    title = "Forecast Error by Latitude",
    x = "Latitude",
    y = "Average Temperature Error"
  )
```
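To make the binning step concrete, here is a small sketch of how `cut()` behaves when `breaks` is a single number. The latitudes are hypothetical values chosen for illustration. With a single number, `cut()` divides the range of the data into that many intervals of equal width, not into groups with equal numbers of observations.

```r
# Hypothetical latitudes, for illustration only
lat <- c(20, 25, 30, 35, 40, 45, 50, 55, 60)

# breaks = 3 asks for three equal-width intervals spanning the range of lat
lat_group <- cut(lat, breaks = 3)

levels(lat_group)  # interval labels of the form "(a,b]"
bin_counts <- table(lat_group)
bin_counts         # number of toy latitudes falling in each interval
```

The toy values are evenly spaced, so each bin holds the same number of points. Real city coordinates are not evenly spaced, so some bins in the plots may average over many cities and others over only a few.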
Finally, the plot of average temperature error by longitude looks considerably different from the previous two. Here, one group of longitudes stands far above the rest. The average errors of all the other groups fall within roughly one degree of each other, while the group (-149, -144) is nearly a degree and a half above the next highest. On a map, this band corresponds to eastern Alaska.
```{r}
mega_forecast %>%
  mutate(lon_group = cut(lon, breaks = 20)) %>% # NEW ELEMENT: cut() splits longitude into 20 equal-width bins
  group_by(lon_group) %>%
  summarize(avg_temp_error = mean(temp_error, na.rm = TRUE)) %>%
  ggplot(aes(x = lon_group, y = avg_temp_error, fill = avg_temp_error)) +
  geom_col() +
  scale_fill_viridis_c() +
  coord_flip() +
  labs(
    title = "Forecast Error by Longitude",
    x = "Longitude",
    y = "Average Temperature Error"
  )
```
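One way to quantify how far the top longitude group stands above the rest is to compare its gap over the runner-up with the spread among all the other groups. The sketch below does this on made-up bin labels and averages (assumptions, not values from the data).

```r
library(dplyr)

# Hypothetical binned averages; labels and values are made up
by_lon <- data.frame(
  lon_group      = c("A", "B", "C", "D"),
  avg_temp_error = c(2.0, 2.4, 2.9, 4.3)
)

# The two bins with the highest average error
top_two <- by_lon %>%
  arrange(desc(avg_temp_error)) %>%
  slice_head(n = 2)

# Gap between the highest bin and the runner-up
top_gap <- top_two$avg_temp_error[1] - top_two$avg_temp_error[2]

# Spread (max minus min) among every bin except the highest
rest <- by_lon$avg_temp_error[by_lon$avg_temp_error < max(by_lon$avg_temp_error)]
rest_spread <- diff(range(rest))

c(top_gap = top_gap, rest_spread = rest_spread)
```

A `top_gap` larger than `rest_spread` means the top bin is separated from the pack by more than the pack's own width, which is the pattern the longitude plot shows.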
Putting together the findings from climate classification, latitude, and longitude, we can see that areas in the Northwest of the U.S. experience greater average temperature error. This may be due to proximity to the coast, or it could be related to natural features such as mountain ranges directly to the east of these locations.