A project I am currently working on involves a model for predicting the number of visitors to a “brick-and-mortar” retail shop. One of the predictors of visitor numbers, among others, is the air temperature on a given day. Quite a few websites collect weather data from weather stations all over the world.
One of them is Weather Underground and, luckily, there is also an R package, rwunderground, that provides an R interface to the website’s API.

Retrieving weather data from Weather Underground requires two things:

  1. An API key from Weather Underground
  2. The weather station ID

How to get the API key

Go to wunderground to get the API key. You can select the free option, the Stratus Plan. This plan allows 500 calls per day and 10 calls per minute. The 500 calls per day will not be a problem, but the 10 calls per minute might be. We will deal with this shortly. Note that you will be redirected to register for an account first.

The weather station ID

Each weather station has a unique station ID. For example, the weather station at the University of Cyprus in Aglandjia, Cyprus is IAGLANDJ2.

To get the station ID, go to the home page and search for the location you need.
I searched for Nicosia and got the option KENTAVROU STATION.

Next, click on CHANGE and a drop-down list of weather stations near the location you selected appears. The station IDs are shown in parentheses.
This is what you want.

Let’s do it!

Let’s get the weather data using R

First, load the necessary libraries

library(rwunderground)  #for the Weather Underground API
## Warning: package 'rwunderground' was built under R version 3.4.3
library(dplyr)          #for data manipulation
library(ggplot2)        #for visualisations
library(lubridate)      #to work with dates

Set the API key using the set_api_key function from the rwunderground library.
Next, save the weather station ID into an object, my_weather_station, so that you can reference it quickly.

set_api_key("put_your_own_api_key_here")     #API key in weather underground page
my_weather_station<-"IAGLANDJ2"              #The uni of cyprus weather station id
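
If you would rather not type the key into a script that may be shared, one option is to read it from an environment variable. This is only a minimal sketch: the helper get_wu_key and the variable name WUNDERGROUND_API_KEY are hypothetical choices for illustration and are not part of rwunderground.

```r
# Hypothetical helper (not part of rwunderground): read the API key from an
# environment variable instead of hard-coding it in the script.
# The variable name WUNDERGROUND_API_KEY is an assumption; any name works.
get_wu_key <- function(var = "WUNDERGROUND_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}

# Usage, once the variable has been set (e.g. in your .Renviron file):
# set_api_key(get_wu_key())
```

The key then stays out of the script itself and you pass the result straight to set_api_key.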

(Note: as the documentation states, “locations can be specified by the airport code, zip code, personal weather station ID or simply by specifying state and city (if in US) or country and city (if outside US). The set_location function will validate locations and format things correctly or you can use a (correctly formatted) string.”)

There are a few functions within the rwunderground package to retrieve weather data. See the rwunderground documentation.

We will use the functions history and history_range.

With history you get the observations recorded during a single day (every five minutes for this station, as the output below shows). Let’s get weather data for Aug 1st, 2017. Dates should be written in YYYYMMDD format.

weather_data<-history(set_location(PWS_id = my_weather_station), date = "20170801")
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170801/q/pws:IAGLANDJ2.json"
head(weather_data)
## # A tibble: 6 x 21
##                  date  temp dew_pt   hum wind_spd wind_gust   dir   vis
##                <dttm> <dbl>  <dbl> <dbl>    <dbl>     <dbl> <chr> <dbl>
## 1 2017-08-01 00:01:00  82.6   71.8    70      0.0       0.0  West    NA
## 2 2017-08-01 00:06:00  82.4   71.2    69      1.1       2.5   WSW    NA
## 3 2017-08-01 00:11:00  82.4   71.2    69      0.9       2.5   WNW    NA
## 4 2017-08-01 00:16:00  82.4   70.9    68      0.2       2.5   WSW    NA
## 5 2017-08-01 00:21:00  82.2   70.2    67      0.0       0.0  West    NA
## 6 2017-08-01 00:26:00  82.2   70.2    67      0.0       0.0   SSW    NA
## # ... with 13 more variables: pressure <dbl>, wind_chill <dbl>,
## #   heat_index <dbl>, precip <dbl>, precip_rate <dbl>, precip_total <dbl>,
## #   cond <chr>, fog <dbl>, rain <dbl>, snow <dbl>, hail <dbl>,
## #   thunder <dbl>, tornado <dbl>

We get weather data including temperature, humidity, wind speed etc.
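
If your dates are stored as Date objects or as "YYYY-MM-DD" strings, base R can produce the YYYYMMDD string used above. A minimal sketch follows; the helper name to_wu_date is hypothetical and not part of rwunderground.

```r
# Hypothetical helper: convert a Date (or a "YYYY-MM-DD" string) into the
# YYYYMMDD character format used for the date arguments above.
to_wu_date <- function(x) {
  format(as.Date(x), "%Y%m%d")
}

to_wu_date("2017-08-01")    # "20170801"
to_wu_date(Sys.Date() - 1)  # yesterday, in the same format
```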

With history_range you get the same kind of data for a specified date range.

Assuming a date range between Aug 1st and Aug 3rd, 2017:

range_of_weather_data<-history_range(set_location(PWS_id = my_weather_station),
                           date_start = "20170801", date_end = "20170803")
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170801/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170802/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170803/q/pws:IAGLANDJ2.json"
head(range_of_weather_data)
## # A tibble: 6 x 21
##                  date  temp dew_pt   hum wind_spd wind_gust   dir   vis
##                <dttm> <dbl>  <dbl> <dbl>    <dbl>     <dbl> <chr> <dbl>
## 1 2017-08-01 00:01:00  82.6   71.8    70      0.0       0.0  West    NA
## 2 2017-08-01 00:06:00  82.4   71.2    69      1.1       2.5   WSW    NA
## 3 2017-08-01 00:11:00  82.4   71.2    69      0.9       2.5   WNW    NA
## 4 2017-08-01 00:16:00  82.4   70.9    68      0.2       2.5   WSW    NA
## 5 2017-08-01 00:21:00  82.2   70.2    67      0.0       0.0  West    NA
## 6 2017-08-01 00:26:00  82.2   70.2    67      0.0       0.0   SSW    NA
## # ... with 13 more variables: pressure <dbl>, wind_chill <dbl>,
## #   heat_index <dbl>, precip <dbl>, precip_rate <dbl>, precip_total <dbl>,
## #   cond <chr>, fog <dbl>, rain <dbl>, snow <dbl>, hail <dbl>,
## #   thunder <dbl>, tornado <dbl>

You can see that history_range makes 3 different calls to the API, one for each day. Should you require a date range of more than 10 days, the execution of the code will halt for 1 minute and then continue fetching more data.

In the project I am working on, I need a full month’s weather data. A way to stay within the limit of 10 calls per minute is to simply break the date range into chunks of no more than 10 days each.

dates_to_collect=c("20170801", "20170809", 
                   "20170810", "20170817",
                   "20170818", "20170825",
                   "20170826", "20170831")

These four pairs of start and end dates will give us four data frames of weather data. I plan to save these into a list.

weatherdata_list<-list()
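
As an aside, the start and end dates do not have to be typed by hand. Here is a minimal sketch in base R that splits any date range into consecutive chunks of at most 10 days. The helper make_date_chunks is hypothetical, and it cuts the month at slightly different points than the hand-written vector above.

```r
# Hypothetical helper (not part of rwunderground): split a date range into
# consecutive chunks of at most `chunk_days` days, formatted as YYYYMMDD.
make_date_chunks <- function(start, end, chunk_days = 10) {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(start <= end, chunk_days >= 1)
  # First day of each chunk
  chunk_starts <- seq(start, end, by = paste(chunk_days, "days"))
  # Last day of each chunk, capped at the overall end date
  chunk_ends <- pmin(chunk_starts + (chunk_days - 1), end)
  data.frame(
    date_start = format(chunk_starts, "%Y%m%d"),
    date_end = format(chunk_ends, "%Y%m%d"),
    stringsAsFactors = FALSE
  )
}

chunks <- make_date_chunks("2017-08-01", "2017-08-31")
print(chunks)
```

Each row of chunks holds one start date and one end date, so row k could supply the date_start and date_end arguments for the k-th request.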

Now, let’s write a small loop to iterate through the 4 pairs of start and end dates and save the data for each date range into the list

i=1 #will help us select the ranges

for (k in 1:4) {
  date_start<-dates_to_collect[i]
  date_end<-dates_to_collect[i+1]

  weatherdata_list[[k]]<-history_range(set_location(PWS_id = my_weather_station),
                           date_start = date_start, date_end = date_end)
  i=i+2 #to get the next range
}
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170801/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170802/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170803/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170804/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170805/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170806/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170807/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170808/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170809/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170810/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170811/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170812/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170813/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170814/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170815/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170816/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170817/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170818/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170819/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170820/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170821/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170822/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170823/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170824/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170825/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170826/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170827/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170828/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170829/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170830/q/pws:IAGLANDJ2.json"
## [1] "Requesting: http://api.wunderground.com/api/6ea8126e87b08519/history_20170831/q/pws:IAGLANDJ2.json"
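
Note that the loop fires the chunks one right after the other. I have not confirmed how the per-minute limit is counted across separate history_range calls, so if you run into it, a pause between chunks may help. Below is a minimal sketch of that idea under that assumption. The helper collect_chunks and the stand-in fake_fetch are hypothetical; the stand-in is only there so that the example runs without an API key.

```r
# Hypothetical helper: call `fetch(start, end)` once per chunk and pause
# between chunks. `fetch` is any function that returns a data frame for one
# date range.
collect_chunks <- function(starts, ends, fetch, pause_seconds = 60) {
  stopifnot(length(starts) == length(ends))
  results <- vector("list", length(starts))
  for (k in seq_along(starts)) {
    results[[k]] <- fetch(starts[k], ends[k])
    # Wait before the next chunk, but not after the last one
    if (k < length(starts) && pause_seconds > 0) {
      Sys.sleep(pause_seconds)
    }
  }
  results
}

# Stand-in fetcher so the sketch runs without calling the API
fake_fetch <- function(start, end) {
  data.frame(date_start = start, date_end = end, stringsAsFactors = FALSE)
}

demo <- collect_chunks(
  starts = c("20170801", "20170810"),
  ends = c("20170809", "20170817"),
  fetch = fake_fetch,
  pause_seconds = 0  # no waiting in the demo
)
length(demo)
```

With the real API, fetch could be a small wrapper such as function(s, e) history_range(set_location(PWS_id = my_weather_station), date_start = s, date_end = e), and the returned list plays the same role as weatherdata_list.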

Put together all elements of weatherdata_list, i.e. the 4 data frames, using dplyr’s bind_rows

weather_data<-dplyr::bind_rows(weatherdata_list)
glimpse(weather_data)
## Observations: 8,796
## Variables: 21
## $ date         <dttm> 2017-08-01 00:01:00, 2017-08-01 00:06:00, 2017-0...
## $ temp         <dbl> 82.6, 82.4, 82.4, 82.4, 82.2, 82.2, 82.2, 82.2, 8...
## $ dew_pt       <dbl> 71.8, 71.2, 71.2, 70.9, 70.2, 70.2, 69.3, 68.9, 6...
## $ hum          <dbl> 70, 69, 69, 68, 67, 67, 65, 64, 64, 63, 61, 59, 5...
## $ wind_spd     <dbl> 0.0, 1.1, 0.9, 0.2, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0,...
## $ wind_gust    <dbl> 0.0, 2.5, 2.5, 2.5, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0,...
## $ dir          <chr> "West", "WSW", "WNW", "WSW", "West", "SSW", "SSW"...
## $ vis          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ pressure     <dbl> 29.84, 29.84, 29.84, 29.84, 29.84, 29.84, 29.83, ...
## $ wind_chill   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ heat_index   <dbl> 87.6, 87.0, 87.0, 86.7, 86.1, 86.1, 85.7, 85.5, 8...
## $ precip       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ precip_rate  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ precip_total <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ cond         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ fog          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ rain         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ snow         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ hail         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ thunder      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ tornado      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

As you can see, there are more than 8,000 observations.

Now, let’s visualise some data using ggplot2 combined with dplyr functionality

weather_data%>%
  mutate(day=lubridate::day(date))%>%             #get the day (1st,2nd, etc...) of the month
  group_by(day)%>%                                #group by the day
  summarise(temp_avg=mean(temp))%>%               #in order to get the average temperature/ day
  ggplot(aes(factor(day), temp_avg, group=1))+    #plot the summarised data
  geom_point()+                      
  geom_line()+
  labs(x="August (day)", y="Mean Temperature (F)",
       title="August", caption = "source: Weather Underground\nUniversity of Cyprus weather station")

Looks like 1st and 25th of August were the hottest days…
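
The temperatures above are in degrees Fahrenheit. If you prefer Celsius, the conversion is a one-liner. A minimal sketch; the function name fahrenheit_to_celsius is my own and not part of any package used here.

```r
# Hypothetical helper: convert temperatures from Fahrenheit to Celsius
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# 82.6 is the first reading in the data above
round(fahrenheit_to_celsius(c(32, 82.6, 212)), 1)
```

In the pipeline above it could be applied as an extra step, e.g. mutate(temp_c = fahrenheit_to_celsius(temp)), before summarising.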

I really hope you enjoyed it!
If you have any thoughts, please share them in the comments.