R Code to Acquire and Process Hourly Air Temperature Data for Two Cities

This code downloads weather data for two weather stations and prepares it so that the two can be compared. NOAA keeps this information for hundreds of weather stations across the United States and around the world. Broadly, these are the steps the script follows; many of them come from the documentation for the rnoaa package.

The story behind the analysis and the code, along with a physical description of the phenomena, can be found in this LinkedIn article.

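library(rnoaa)      # access to NOAA data, including the Integrated Surface Database (ISD)
library(lubridate)  # date-time parsing
library(dplyr)      # data manipulation (bind_rows, filter, %>%)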

# Station list: ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.txt

## Download the 2016 records for each station
# St. Petersburg, FL station
res1 <- isd(usaf="997353", wban="99999", year=2016)
# Columbia Metropolitan Airport, SC
res2 <- isd(usaf="723100", wban="13883", year=2016)

## Combine the data from the two cities
# unlike rbind(), bind_rows() also works when the two tables do not have identical columns
res_all <- bind_rows(res1, res2)

## Combine date and time into a new column
# a single date-time column is convenient for plotting hourly data
# (ymd_hm() returns date-times in UTC by default)
res_all$date_time <- ymd_hm(
  sprintf("%s %s", as.character(res_all$date), res_all$time)
)
## Remove missing temperature readings, which are coded as 999
res_all <- res_all %>% filter(temperature < 900)

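## Subset to the first 15 days of July 2016 (inclusive)
date1 <- as.Date("2016-07-01")
date2 <- as.Date("2016-07-15")
# compare calendar dates taken from the parsed date-time column,
# which avoids comparing the raw date text directly against Date objects
res_all <- res_all %>%
  filter(as.Date(date_time) >= date1, as.Date(date_time) <= date2)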
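The three processing steps above (merging date and time, dropping missing readings, and trimming to a date range) can be tried without downloading anything. Below is a minimal base-R sketch on a small, made-up data frame shaped like the isd() output; the text formats of the date and time columns and the 999.9 missing-value code are assumptions for illustration, and base R's as.POSIXct() stands in for lubridate's ymd_hm().

```r
# Hypothetical toy data shaped like the isd() output (an assumption):
# 'date' as "YYYYMMDD" text, 'time' as "HHMM" text, numeric 'temperature'
# in degrees C, with 999.9 standing for a missing reading.
obs <- data.frame(
  usaf_station = c("723100", "723100", "723100", "997353", "997353"),
  date         = c("20160630", "20160701", "20160715", "20160701", "20160716"),
  time         = c("2300", "1200", "0600", "1200", "0000"),
  temperature  = c(27.2, 33.9, 999.9, 31.1, 28.3),
  stringsAsFactors = FALSE
)

# Step 1: combine date and time into a single POSIXct column in UTC.
obs$date_time <- as.POSIXct(paste(obs$date, obs$time),
                            format = "%Y%m%d %H%M", tz = "UTC")

# Step 2: drop the missing-value code.
obs <- obs[obs$temperature < 900, ]

# Step 3: keep 1-15 July 2016 inclusive, comparing calendar dates in UTC.
day <- as.Date(obs$date_time, tz = "UTC")
obs <- obs[day >= as.Date("2016-07-01") & day <= as.Date("2016-07-15"), ]

print(obs)
```

With this toy data only the two 1 July readings survive: the 30 June and 16 July rows fall outside the date range, and the 999.9 row is removed as missing.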

The NOAA weather files are extremely data-rich: temperature is just one of the 70 variables recorded, which include pressure, wind direction and speed, and visibility. Precipitation is kept in separate databases by NOAA's NCEI (National Centers for Environmental Information).

Plot of Hourly Temperatures for Columbia, SC, and St. Petersburg, FL

The plot shows the air temperature in the two cities during the first 15 days of July 2016. Columbia, SC, which is about 400 miles north of St. Petersburg, FL, has higher daily maxima and lower daily minima than St. Petersburg, that is, hotter days and cooler nights. This is common at this time of the summer. Temperatures are in degrees Celsius (°C).

The plot is made with the ggplot() function from the ggplot2 R package and uses geom_line() to draw the hourly temperatures across the 15-day range.

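library(ggplot2)
# Stations sort as "723100" (Columbia) then "997353" (St. Pete), matching the colours and labels
ggplot(res_all, aes(x=date_time, y=temperature, group=usaf_station, colour=usaf_station)) +
  geom_line(linewidth=1.0) +  # ggplot2 >= 3.4.0; use size=1.0 in older versions
  scale_color_manual(values=c("#FF6666", "#56B4E9"), name="Weather\nStation",
                     labels=c("Columbia,SC", "St. Pete.,FL")) +
  labs(x="Date (UTC)", y="Air temperature (°C)") +
  theme(legend.key.size=unit(1,"cm"))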
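The hotter-days, cooler-nights pattern in the line plot can also be checked numerically by computing each station's daily maximum, daily minimum, and their difference (the daily temperature range). The sketch below does this in base R with aggregate() and merge(); the column names station, day, and temp and all of the temperatures are hypothetical. With the real data, the same steps would apply to res_all grouped by usaf_station and as.Date(date_time).

```r
# Hypothetical hourly readings (degrees C) for two stations over two days;
# the values are invented purely to illustrate the calculation.
hourly <- data.frame(
  station = rep(c("Columbia", "St. Pete"), each = 4),
  day     = rep(rep(c("2016-07-01", "2016-07-02"), each = 2), times = 2),
  temp    = c(23, 35, 22, 36,   27, 32, 27, 33)
)

# Daily maximum and minimum for each station and day.
daily_max <- aggregate(temp ~ station + day, data = hourly, FUN = max)
daily_min <- aggregate(temp ~ station + day, data = hourly, FUN = min)
daily <- merge(daily_max, daily_min, by = c("station", "day"),
               suffixes = c("_max", "_min"))

# Daily range: how far the temperature swings between night and day.
daily$daily_range <- daily$temp_max - daily$temp_min
print(daily)

# Average daily range per station.
avg_range <- aggregate(daily_range ~ station, data = daily, FUN = mean)
print(avg_range)
```

With these invented numbers, Columbia's average daily range is 13 °C against 5.5 °C for St. Pete: higher maxima and lower minima show up directly as a wider daily swing.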

Summary Statistics of Hourly Temperatures for St. Petersburg, FL, and Columbia, SC

The comparative box plot gives another view of the two cities. Here the R code uses geom_boxplot() to draw the boxes, stat_summary() to add each station's mean as a red point, and geom_text() to print the mean values, which are calculated with R's aggregate() function. Temperatures are in °C.

The box plot tells more of the story about the two distributions. Columbia, SC (red box) has a wider spread, in both its quartiles and its extremes, than St. Petersburg, FL (blue box). Columbia's average temperature (30.13°C) is slightly higher than St. Pete's (29.87°C), while St. Pete's median (the horizontal black line inside each box) is slightly higher than Columbia's.

# Per-station means, rounded to two decimals, used as text labels
means <- aggregate(temperature ~ usaf_station, res_all, FUN = function(x) round(mean(x), digits=2))
ggplot(res_all, aes(x=usaf_station, y=temperature, fill=usaf_station)) + geom_boxplot() +
  scale_fill_manual(name="Weather\nStation", values=c("#FF6666", "#56B4E9"),
                    labels=c("Columbia,SC", "St. Pete.,FL")) + theme(legend.key.size=unit(1,"cm")) +
  # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom="point", color = "red", size = 3) +
  geom_text(data = means, aes(label = temperature, y = temperature + 0.08), vjust=-0.7)
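The mean-versus-median contrast described above can be reproduced with a single aggregate() call whose function returns several statistics at once. The sketch below uses invented temperatures chosen to show the same pattern: one station has the higher mean but the lower median, and also the wider interquartile range. geom_boxplot() should draw its box edges from the same default (type 7) quartiles that quantile() returns, although its whiskers stop at the most extreme points within 1.5 times the IQR of the box rather than at the minimum and maximum.

```r
# Hypothetical, deliberately skewed readings (degrees C) for two stations,
# invented to show how the mean and the median can rank stations differently.
temps <- data.frame(
  station = rep(c("Columbia", "St. Pete"), each = 5),
  temp    = c(27, 28, 29, 33, 36,   29, 29.5, 30, 30, 30.5)
)

# A FUN that returns a named vector makes aggregate() store a matrix column,
# one row per station and one column per statistic.
summ <- aggregate(temp ~ station, data = temps,
                  FUN = function(x) c(mean   = mean(x),
                                      median = median(x),
                                      q1     = unname(quantile(x, 0.25)),
                                      q3     = unname(quantile(x, 0.75))))
print(summ)
```

Here Columbia has a mean of 30.6 °C and a median of 29 °C, while St. Pete has a mean of 29.8 °C and a median of 30 °C; a few hot readings pull Columbia's mean above its median.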

Just for fun, here is the same plot drawn as violin plots, which show more about the shape of each distribution. The only change is that geom_violin() replaces geom_boxplot().

means <- aggregate(temperature ~ usaf_station, res_all, FUN = function(x) round(mean(x), digits=2))
ggplot(res_all, aes(x=usaf_station, y=temperature, fill=usaf_station)) + geom_violin() +
  scale_fill_manual(name="Weather\nStation", values=c("#FF6666", "#56B4E9"),
                    labels=c("Columbia,SC", "St. Pete.,FL")) + theme(legend.key.size=unit(1,"cm")) +
  stat_summary(fun = mean, geom="point", color = "red", size = 3) +
  geom_text(data = means, aes(label = temperature, y = temperature + 0.08), vjust=-0.7)
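A violin is essentially a kernel density estimate of each station's temperatures, mirrored around a vertical axis. geom_violin() computes it through stat_ydensity(). The base-R sketch below applies density() with a Gaussian kernel and the nrd0 bandwidth rule to a handful of invented hourly temperatures, showing the kind of quantity a violin outline is drawn from. ggplot2's exact scaling and trimming may differ, so treat this as an illustration, not a re-implementation.

```r
# Hypothetical hourly temperatures (degrees C) for one station, invented
# for illustration.
temp <- c(24, 25, 25, 26, 27, 29, 31, 33, 34, 35, 34, 32, 29, 27, 26, 25)

# Gaussian kernel density estimate with R's default bandwidth rule (nrd0).
kde <- density(temp, kernel = "gaussian", bw = "nrd0")

# A violin's half-width at each temperature is proportional to this density;
# here it is scaled so the widest point equals 1.
half_width <- kde$y / max(kde$y)

# The estimate integrates to about 1 over its evaluation grid.
area <- sum(kde$y) * diff(kde$x[1:2])
cat("bandwidth:", round(kde$bw, 3), " approx. area:", round(area, 3), "\n")

# Temperature at which the violin is widest (the estimated mode).
mode_temp <- kde$x[which.max(kde$y)]
print(mode_temp)
```

The widest part of a violin marks the most common temperatures, which is the extra information a box plot's quartiles do not show.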