Weather forecasting has become increasingly important in daily life, influencing decisions in areas such as transportation, agriculture, and public events. This analysis examines the factors that affect temperature forecast accuracy across different regions of the United States. Using three related datasets that include temperature forecasts and observed values from various cities, we look for patterns in forecast errors and identify potential reasons behind them.
The data were preprocessed by joining the datasets, then cleaning and filtering them to ensure consistency. We created a new variable, “temp_error”, calculated as the absolute difference between the forecasted and observed temperatures.
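As a rough illustration of this preprocessing step, the sketch below joins a forecast table to a city table, drops incomplete rows, and computes “temp_error”. The table names (`forecasts`, `cities`), every column name other than `temp_error`, and all values are assumptions made for the example; they are not the actual data or the exact cleaning rules used in the analysis.

```r
# Toy stand-ins for the real tables. All names except temp_error and all
# values are assumptions made for illustration.
forecasts <- data.frame(
  city          = c("A", "A", "B", "B"),
  high_or_low   = c("high", "low", "high", "low"),
  forecast_temp = c(71, 50, 88, 64),
  observed_temp = c(73, 47, 88, NA)
)
cities <- data.frame(
  city  = c("A", "B"),
  state = c("MT", "FL")
)

# Join forecasts to city metadata on the shared key.
joined <- merge(forecasts, cities, by = "city")

# Keep only rows where both temperatures are present.
complete <- !is.na(joined$forecast_temp) & !is.na(joined$observed_temp)
joined <- joined[complete, ]

# Absolute difference between forecasted and observed temperature.
joined$temp_error <- abs(joined$forecast_temp - joined$observed_temp)
print(joined)
```

Taking the absolute value means that over- and under-forecasts count equally, so the variable measures the size of the miss rather than its direction.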
One key finding was a gap in accuracy between high- and low-temperature forecasts. Forecasts for high temperatures tended to be more accurate than those for low temperatures (mean error of 2.25 versus 2.44 in the table below). This discrepancy could be due to the complex atmospheric conditions that influence nighttime cooling, which make low-temperature predictions more variable.
##   high_or_low mean_temp_error
## 1        high            2.25
## 2         low            2.44
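A summary table of this shape can be produced with a grouped mean. The following is a minimal base R sketch, assuming a data frame with `high_or_low` and `temp_error` columns; the rows are made up and will not reproduce the means reported above.

```r
# Illustrative rows only; these values are invented for the example.
errors <- data.frame(
  high_or_low = c("high", "high", "low", "low", "low"),
  temp_error  = c(1, 3, 2, 4, 3)
)

# Mean absolute error within each level of high_or_low.
summary_tbl <- aggregate(temp_error ~ high_or_low, data = errors, FUN = mean)
names(summary_tbl)[2] <- "mean_temp_error"
print(summary_tbl)
```

The same call works for any other grouping column by changing the right-hand side of the formula.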
Another major finding was the relationship between forecast lead time and accuracy. Grouping the data by “forecast_hours_before,” we observed that forecasts made closer to the observation time were generally more accurate. Specifically, as the forecast horizon increased from 12 to 48 hours, the mean temperature error rose from 2.11 to 2.56. This pattern highlights the difficulty of longer-range prediction and is consistent with the everyday experience that a forecast for the following week may not always be accurate.
##   forecast_hours_before mean_temp_error
## 1                    12            2.11
## 2                    24            2.28
## 3                    36            2.42
## 4                    48            2.56
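To put a number on this trend, the four means from the table above can be compared step by step. The sketch below is only a descriptive check on those summary values, not part of the original analysis; the object names are hypothetical. It suggests the mean error grew by about 0.15 for each additional 12 hours of lead time.

```r
# Mean errors copied from the summary table above.
lead <- data.frame(
  forecast_hours_before = c(12, 24, 36, 48),
  mean_temp_error       = c(2.11, 2.28, 2.42, 2.56)
)

# Increase in mean error between consecutive lead times.
step_increase <- diff(lead$mean_temp_error)
print(round(step_increase, 2))

# Slope of a straight-line fit, rescaled to a 12-hour step. With only four
# summary points this is descriptive, not a significance test.
fit <- lm(mean_temp_error ~ forecast_hours_before, data = lead)
slope_per_12h <- unname(coef(fit)["forecast_hours_before"]) * 12
print(round(slope_per_12h, 2))
```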
To examine how forecast error varies geographically, we tried a type of graph that was new to us, the heat map. The heat map shows the mean temperature error by state and indicates that inland states, particularly in the northern and central regions (such as Montana), had higher forecast errors than coastal states (such as Florida). To keep the heat map readable, we increased the plot dimensions and adjusted the label formatting so that state names were legible and the overall visualization was easy to interpret.
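A state-level heat map of this kind can be sketched with ggplot2 as follows. This is not the plotting code used for the report: the `state_errors` values are placeholders invented for the example, `map_data("state")` requires the maps package to be installed, and `coord_quickmap()` is used here as an approximation that avoids the extra mapproj dependency of `coord_map()`.

```r
library(ggplot2)

# Placeholder values for illustration only; they are not results from
# the analysis.
state_errors <- data.frame(
  region          = c("montana", "florida", "kansas", "california"),
  mean_temp_error = c(3.0, 2.0, 2.6, 2.2)
)

# State outlines as polygon vertices (needs the 'maps' package).
states_map <- map_data("state")

# Left join so that states without a value are kept and drawn in grey.
plot_df <- merge(states_map, state_errors, by = "region", all.x = TRUE)

# merge() reorders rows, so restore the vertex order within each polygon.
plot_df <- plot_df[order(plot_df$group, plot_df$order), ]

p <- ggplot(plot_df, aes(long, lat, group = group, fill = mean_temp_error)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred",
                      na.value = "grey90") +
  coord_quickmap() +
  labs(fill = "Mean temp error") +
  theme_void()
print(p)
```

A sequential two-color gradient suits this variable because the error is non-negative and only its magnitude matters.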
To look more closely, we analyzed the effect of elevation and distance from the coast on forecast accuracy. Our findings showed a clear trend: as elevation increased, the mean temperature error also rose. This relationship may reflect the complex and rapidly changing weather conditions often found in higher-altitude regions. The influence of distance from the coast on forecast error was less pronounced than that of elevation, but it followed a similar pattern. These results agree with the spatial patterns in the state-level heat map. They also fit the pattern we found for high versus low temperature forecasts, since higher-altitude regions usually have lower temperatures. Together, the results reinforce the role of geographic features in temperature forecast accuracy.
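One way to check relationships like these is to compute a rank correlation and to compare mean error across elevation bins. The sketch below assumes a city-level data frame with `elevation`, `distance_to_coast`, and `mean_temp_error` columns; the column names, the bin cut-offs, and all values are hypothetical and chosen only to make the example run.

```r
# Made-up city-level values for illustration; names and units are assumed.
city_errors <- data.frame(
  elevation         = c(5, 40, 210, 650, 1200, 1600),
  distance_to_coast = c(2, 15, 300, 700, 900, 1100),
  mean_temp_error   = c(2.0, 2.1, 2.3, 2.5, 2.8, 3.0)
)

# Spearman rank correlation measures monotonic association without
# assuming a straight-line relationship.
cor_elev  <- cor(city_errors$elevation, city_errors$mean_temp_error,
                 method = "spearman")
cor_coast <- cor(city_errors$distance_to_coast, city_errors$mean_temp_error,
                 method = "spearman")

# Bin elevation (arbitrary cut-offs) and compare mean error per bin.
city_errors$elev_bin <- cut(city_errors$elevation,
                            breaks = c(-Inf, 100, 1000, Inf),
                            labels = c("low", "mid", "high"))
by_bin <- aggregate(mean_temp_error ~ elev_bin, data = city_errors, FUN = mean)
print(by_bin)
```

Because elevation and distance from the coast tend to rise together, a correlation with one does not by itself separate its effect from the other's.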
In conclusion, we identified several key factors influencing temperature forecast accuracy. Longer forecast lead times, low-temperature predictions, high-elevation regions, and certain inland states are associated with greater forecast errors. These patterns suggest where weather models could be improved, especially in regions identified as challenging for accurate temperature forecasting.
# Citations
https://stackoverflow.com/questions/25086500/ggplot2-assign-symbol-fill-based-on-fact
https://ggplot2.tidyverse.org/reference/scale_gradient.html
https://ggplot2.tidyverse.org/reference/coord_map.html
https://r-graph-gallery.com/79-levelplot-with-ggplot2.html
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.data.frame
https://cmu-delphi.github.io/covidcast/covidcastR/reference/name_to_abbr.html