Introduction to the Data

In this project, I use data sourced from the National Weather Service, collected over a 16-month period. The data originally covered 167 US cities, but I restricted it to cities in the continental US, which left 161. The data came as two datasets: one with the forecast and observed high and low temperatures, in degrees Fahrenheit, for these cities, and one with geographic information about major US cities.

The goal of this project is to explore possible explanations for error in weather forecasts. By error, I mean the difference between the forecast and observed temperatures, for both the high and the low temperature. To investigate this for each city, I added a column to the forecast dataset holding the absolute value of the mean difference between the forecast and observed temperature for that city. I then merged the forecast dataset with the city information dataset on city and state and created graphics to explore and represent trends in the error.
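As a rough illustration of the error calculation and merge described above, here is a minimal sketch in Python using only the standard library. It is not the code used in this project: the sample rows, the coordinates, and the names `forecasts`, `city_info`, `city_errors`, and `merge_with_city_info` are all hypothetical, and the sketch handles a single temperature (the high) for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (city, state, forecast_high, observed_high).
# These values are made up for illustration and are not the project's data.
forecasts = [
    ("Austin", "TX", 90, 93),
    ("Austin", "TX", 88, 87),
    ("Boise", "ID", 70, 66),
    ("Boise", "ID", 75, 73),
]

# Hypothetical city information keyed by (city, state); coordinates are illustrative.
city_info = {
    ("Austin", "TX"): {"lat": 30.27, "lon": -97.74},
    ("Boise", "ID"): {"lat": 43.62, "lon": -116.20},
}


def city_errors(rows):
    """Return the absolute value of the mean (forecast - observed) per city."""
    diffs = defaultdict(list)
    for city, state, forecast, observed in rows:
        diffs[(city, state)].append(forecast - observed)
    return {key: abs(mean(values)) for key, values in diffs.items()}


def merge_with_city_info(errors, info):
    """Join the per-city errors with city information on (city, state)."""
    return {
        key: {"error": err, **info[key]}
        for key, err in errors.items()
        if key in info  # keep only cities present in both datasets
    }


errors = city_errors(forecasts)
merged = merge_with_city_info(errors, city_info)
print(merged)
```

Because the mean is taken before the absolute value, over- and under-forecasts within a city partly cancel, so this measures how far off the forecasts are on balance rather than the typical size of a single miss.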

On the whole, the error was small: the mean and median of the per-city error were 2.25 and 2.28 degrees Fahrenheit for the high forecast, and 2.42 and 2.35 degrees Fahrenheit for the low forecast.
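Summary figures of this kind can be computed along the following lines. This is only a sketch: the dictionary `high_errors` and the function `summarize` are hypothetical, and the values are placeholders rather than the project's results.

```python
from statistics import mean, median

# Hypothetical per-city errors in degrees Fahrenheit (placeholder values).
high_errors = {"City A": 1.5, "City B": 2.0, "City C": 4.0}


def summarize(errors):
    """Return the mean and median of the per-city errors."""
    values = list(errors.values())
    return {"mean": mean(values), "median": median(values)}


summary = summarize(high_errors)
print(summary)
```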