http://rpubs.com/AngelML15/1269118
In this analysis, we explore the accuracy of high and low temperature forecasts across 167 US cities over a 16-month period. Using data from the National Weather Service, we investigate which areas struggle the most with accurate weather predictions and explore potential reasons for these discrepancies.
library(tidyverse) # read_csv(), the dplyr verbs, and ggplot2
outlook_meanings <- read_csv("data/outlook_meanings.csv")
weather_forecasts <- read_csv("data/weather_forecasts.csv",
col_types = cols(
date = col_date(),
city = col_factor(),
state = col_factor(),
high_or_low = col_factor(),
forecast_hours_before = col_integer(),
observed_temp = col_integer(),
forecast_temp = col_integer(),
forecast_outlook = col_factor(),
possible_error = col_factor()
))
forecast_cities <- read_csv("data/forecast_cities.csv")
# combined_weather_data and its temp_error column come from a preparation step not shown here
state_error <- combined_weather_data |>
group_by(state) |>
summarize(mean_error = mean(temp_error, na.rm = TRUE)) |>
mutate(state = tolower(state)) |>
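left_join(tibble( # lookup table: state abbreviations -> full names, as used by the map data
state = tolower(state.abb),
name = tolower(state.name)
), by = "state")
# State outlines; map_data("state") needs the maps package installed
us_states <- map_data("state")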
ggplot(data = state_error) +
geom_map(aes(map_id = name, fill = mean_error),
color = "white",
map = us_states) +
expand_limits(x = us_states$long, y = us_states$lat) +
scale_fill_viridis_c(option = "G", direction = -1) +
coord_map() +
theme_void() +
theme(legend.position = "bottom") +
labs(
title = "Average Forecast Error by US State",
fill = "Avg. Temp Error (°F)"
)
To visualize how forecasting accuracy varies across the country, I created a choropleth map of the average temperature forecast error by US state. Some states, particularly in the western US around Montana, show relatively high errors, which suggests that temperatures are harder to predict in these regions. States in the southeastern US tend to have lower errors, suggesting more reliable temperature forecasts there.
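The map relies on a `temp_error` column averaged per state. As a minimal sketch of how such a column could be built, the example below joins a toy forecast table to a toy city table and summarizes by state. The `toy_*` names and all values are made up for illustration, and treating `temp_error` as the absolute difference between forecast and observed temperature is an assumption, not necessarily the definition used in this analysis.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the real tables (all values are invented).
toy_forecasts <- tribble(
  ~city,     ~state, ~high_or_low, ~observed_temp, ~forecast_temp,
  "HELENA",  "MT",   "high",       40L,            46L,
  "HELENA",  "MT",   "low",        20L,            12L,
  "ATLANTA", "GA",   "high",       70L,            71L,
  "ATLANTA", "GA",   "low",        50L,            NA_integer_
)
toy_cities <- tribble(
  ~city,     ~state, ~distance_to_coast,
  "HELENA",  "MT",   500,
  "ATLANTA", "GA",   220
)

# Assumption: temp_error is the absolute forecast miss in degrees F.
toy_combined <- toy_forecasts |>
  left_join(toy_cities, by = c("city", "state")) |>
  mutate(temp_error = abs(forecast_temp - observed_temp))

# Average error per state; na.rm = TRUE skips rows with a missing forecast.
toy_state_error <- toy_combined |>
  group_by(state) |>
  summarize(mean_error = mean(temp_error, na.rm = TRUE))

print(toy_state_error)
```

Using the absolute value keeps over- and under-forecasts from cancelling out when they are averaged; a signed error would instead measure bias.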
state_error <- combined_weather_data |>
group_by(state, high_or_low) |>
summarize(mean_error = mean(temp_error, na.rm = TRUE), .groups = "drop") |>
mutate(state = tolower(state)) |>
left_join(tibble(
state = tolower(state.abb),
name = tolower(state.name)
), by = "state")
ggplot(data = state_error) +
geom_map(aes(map_id = name, fill = mean_error),
color = "white",
map = us_states) +
expand_limits(x = us_states$long, y = us_states$lat) +
scale_fill_viridis_c(option = "G", direction = -1) +
coord_map() +
theme_void() +
facet_wrap(~high_or_low) +
theme(legend.position = "bottom") +
labs(
title = "Average Forecast Error by US State",
fill = "Avg. Temp Error (°F)"
)
When comparing the two choropleth maps of forecast errors for high and low temperatures, I observed that low-temperature forecast errors were generally larger than high-temperature errors, and that they were concentrated in the western half of the US. High-temperature forecast errors, in contrast, were more evenly distributed across states, with no clear regional pattern.
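The visual comparison between the two panels can also be put into numbers. The sketch below is one possible approach, not part of the original analysis: it reshapes a toy per-state summary so that high and low errors sit side by side, then ranks states by the gap. The table `toy_state_error` and its values are invented for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy per-state summary in the same shape as the faceted map data (values invented).
toy_state_error <- tribble(
  ~state, ~high_or_low, ~mean_error,
  "mt",   "high",       2.5,
  "mt",   "low",        4.0,
  "ga",   "high",       1.5,
  "ga",   "low",        2.0
)

# One row per state, with the low-minus-high gap in average error.
toy_gap <- toy_state_error |>
  pivot_wider(names_from = high_or_low, values_from = mean_error) |>
  mutate(low_minus_high = low - high) |>
  arrange(desc(low_minus_high))

print(toy_gap)
```

A positive `low_minus_high` marks a state where low temperatures are forecast less accurately than highs, which is the pattern the maps suggest for the western states.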
forecast_time_error <- combined_weather_data |>
group_by(forecast_hours_before) |>
summarize(mean_error = mean(temp_error, na.rm = TRUE))
ggplot(data = forecast_time_error) +
geom_col(aes(x = factor(forecast_hours_before), y = mean_error, fill = mean_error)) +
labs(title = "Average Forecast Error by Lead Time",
x = "Forecast Hours Before Observation",
y = "Avg. Temp Error (°F)") +
theme(legend.position = "none")
Comparing the average temperature error across lead times shows that forecast errors tend to increase as the forecast is made further ahead of the observation. This is expected, since predicting temperatures further in advance involves greater uncertainty. These findings highlight the value of short-term forecasts, since accuracy declines over longer forecast periods.
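To attach a rough number to this trend, one option is to fit a straight line through the per-lead-time averages. The sketch below does this on invented values; `toy_time_error` is a hypothetical stand-in for `forecast_time_error`, and a linear fit is only an assumption about the shape of the relationship.

```r
library(tibble)

# Toy averages by lead time (values invented for illustration).
toy_time_error <- tibble(
  forecast_hours_before = c(12L, 24L, 36L, 48L),
  mean_error = c(2.0, 2.5, 3.0, 3.5)
)

# Slope of mean error against lead time, in degrees F per hour of lead time.
fit <- lm(mean_error ~ forecast_hours_before, data = toy_time_error)
slope_per_hour <- coef(fit)[["forecast_hours_before"]]

# Rescale to a more readable unit: extra error per additional 12 hours.
slope_per_12h <- slope_per_hour * 12
print(slope_per_12h)
```

With only a handful of lead times, the slope is a descriptive summary rather than a reliable estimate.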
distance_coast_error <- combined_weather_data |>
group_by(distance_to_coast) |>
summarize(mean_error = mean(temp_error, na.rm = TRUE))
ggplot(data = distance_coast_error) +
geom_point(aes(x = distance_to_coast, y = mean_error), color = "blue") + # Scatter plot
labs(title = "Distance to Coast vs Average Forecast Error",
x = "Distance to Coast (miles)",
y = "Avg. Temp Error (°F)") +
theme_minimal()
Lastly, the relationship between distance to the coast and average temperature forecast error is suggestive: as the distance from the coast increases, the mean forecast error tends to rise slightly. However, there is still considerable scatter in the data, indicating that other factors also play a role. Distance to the coast may therefore be one of several factors influencing forecast accuracy.
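A rank correlation is one way to check whether a weak upward trend like this is more than a visual impression. The sketch below uses invented values; `toy_coast_error` is a hypothetical stand-in for `distance_coast_error`, and Spearman's method is chosen here as an assumption because it does not require the relationship to be linear.

```r
library(tibble)

# Toy per-distance averages (values invented for illustration).
toy_coast_error <- tibble(
  distance_to_coast = c(5, 50, 200, 400, 800),
  mean_error = c(2.1, 1.9, 2.6, 2.4, 3.0)
)

# Spearman's rho: +1 means error always rises with distance, 0 means no monotone trend.
rho <- cor(
  toy_coast_error$distance_to_coast,
  toy_coast_error$mean_error,
  method = "spearman"
)
print(rho)
```

On the real data, `cor.test()` with the same `method` argument would add a p-value, which helps judge how much weight the scatter allows the trend to carry.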