Solution

1. Load our packages

We will be using the ‘bomrang’ and ‘tidyverse’ package for this excercise.

The bomrang library contains functions to interface with the BOM data, fetching and returning data sets of forecasts, historical and current weather data from weather stations in Australia.

The tidyverse package contains the dplyr package which contains functions for transforming our data.

library(bomrang)

## 
## Data (c) Australian Government Bureau of Meteorology,
## Creative Commons (CC) Attribution 3.0 licence or
## Public Access Licence (PAL) as appropriate.
## See http://www.bom.gov.au/other/copyright.shtml
## 
## 
## If you use bomrang, please cite it.
## See `citation('bomrang')` for the proper citation.

library(tidyverse)

## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0

## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

2. Load our data

Use the get_historical() function from the ‘bomrang’ library to load past daily observations for a particular weather station in Australia.

In this example, we are using two arguments to load the maximum temperatures from a weather station in each capital city.

To retrieve data from a particular weather station we will use the Station ID which is uniquely allocated to each weather station. More infromation on station numbers can be found here.

The type argument is used to specify what weather measurement we would like to retrieve, such as rain, minimum temperature, maximum temperature or solar exposure. As we are interested in measuring maximum temperature we will be using ‘max’.

syd <- get_historical(stationid = "066062", type = "max")

## Data saved as /var/folders/48/xwjgszy56ms4kwz9k_kj45nw0000gn/T//Rtmpzzqcva/IDCJAC0010_066062_1800_Data.csv

mel <- get_historical(stationid = "086282", type = "max")

## Data saved as /var/folders/48/xwjgszy56ms4kwz9k_kj45nw0000gn/T//Rtmpzzqcva/IDCJAC0010_086282_1800_Data.csv

canb <- get_historical(stationid = "070351", type = "max")

## Data saved as /var/folders/48/xwjgszy56ms4kwz9k_kj45nw0000gn/T//Rtmpzzqcva/IDCJAC0010_070351_1800_Data.csv

bris <- get_historical(stationid = "040913", type = "max")

## Data saved as /var/folders/48/xwjgszy56ms4kwz9k_kj45nw0000gn/T//Rtmpzzqcva/IDCJAC0010_040913_1800_Data.csv

Let’s view a sample of our data frames:

head(syd)

##   Product_code Station_number Year Month Day Max_temperature
## 1   IDCJAC0010          66062 1859     1   1            24.4
## 2   IDCJAC0010          66062 1859     1   2            24.4
## 3   IDCJAC0010          66062 1859     1   3            24.2
## 4   IDCJAC0010          66062 1859     1   4            24.7
## 5   IDCJAC0010          66062 1859     1   5            24.6
## 6   IDCJAC0010          66062 1859     1   6            22.2
##   Accum_days_max Quality
## 1             NA       Y
## 2              1       Y
## 3              1       Y
## 4              1       Y
## 5              1       Y
## 6              1       Y

head(mel)

##   Product_code Station_number Year Month Day Max_temperature
## 1   IDCJAC0010          86282 1970     1   1              NA
## 2   IDCJAC0010          86282 1970     1   2              NA
## 3   IDCJAC0010          86282 1970     1   3              NA
## 4   IDCJAC0010          86282 1970     1   4              NA
## 5   IDCJAC0010          86282 1970     1   5              NA
## 6   IDCJAC0010          86282 1970     1   6              NA
##   Accum_days_max Quality
## 1             NA        
## 2             NA        
## 3             NA        
## 4             NA        
## 5             NA        
## 6             NA

head(canb)

##   Product_code Station_number Year Month Day Max_temperature
## 1   IDCJAC0010          70351 2008     1   1              NA
## 2   IDCJAC0010          70351 2008     1   2              NA
## 3   IDCJAC0010          70351 2008     1   3              NA
## 4   IDCJAC0010          70351 2008     1   4              NA
## 5   IDCJAC0010          70351 2008     1   5              NA
## 6   IDCJAC0010          70351 2008     1   6              NA
##   Accum_days_max Quality
## 1             NA        
## 2             NA        
## 3             NA        
## 4             NA        
## 5             NA        
## 6             NA

head(bris)

##   Product_code Station_number Year Month Day Max_temperature
## 1   IDCJAC0010          40913 1999     1   1              NA
## 2   IDCJAC0010          40913 1999     1   2              NA
## 3   IDCJAC0010          40913 1999     1   3              NA
## 4   IDCJAC0010          40913 1999     1   4              NA
## 5   IDCJAC0010          40913 1999     1   5              NA
## 6   IDCJAC0010          40913 1999     1   6              NA
##   Accum_days_max Quality
## 1             NA        
## 2             NA        
## 3             NA        
## 4             NA        
## 5             NA        
## 6             NA

We can see that the Sydney data frame displays the daily maximum recorded temperature for that particular station since 1st January 1859. It also displays the accumulated number of days of that same maximum temperature and the quality of the data, that is, whether the recorded observation has completed the routine quality control process or not.

When viewing the other data frames from the other cities, note that there are NA values, which is important to note when we summarise our data in the future steps.

As we are only displaying the Station ID, it is not easy to interpret which city the observation originated from. To easily interpret the location, let’s add a column called Location using the mutate() function. This will allow us to display the name of the city that weather station was located in:

syd <- mutate(syd, Location="Sydney")
mel <- mutate(mel, Location="Melbourne")
canb <- mutate(canb, Location="Canberra")
bris <- mutate(bris, Location="Brisbane")

3. Combine our dataset

We’d now like to combine the data sets from each of the four major cities to create one dataset allowing us to easily compare the temperatures of each of the major cities.

Use the bind_rows function to join the Sydney, Melbourne, Canberra and Brisbane data frames:

aus <- bind_rows(syd, mel, canb, bris)

4. Rearrange our dataset

Our final output is one table that should summarise the average maximum temperature for each of the major cities by decade, which will require rearranging our data set using the mutate, group_by, summarise and spread functions from dplyr the package in the tidyverse.

Mutate() computes and appends an additional column called Decade at the end of our table. The Decade column calculates the decade that year is in using simple mathematics calculations.

Then, the data set is grouped by the Location then Decade, where we calculate the average maximum temperature for each decade for that particular city. Note that in the summarise() function, we will include an na.rm argument, to specify that we should calcualte the mean of the non-missing values since there are a few decades with NA values.

We then rearrange the Location rows into their own separate columns using the spread().

aus <- mutate(aus,Decade=Year-Year%%10) %>%
  group_by(Location, Decade) %>% 
  summarise(avg_max=mean(Max_temperature,na.rm=T)) %>%
  spread(Location, avg_max)

Let’s now view the final aus dataframe:

aus

## # A tibble: 17 x 5
##    Decade Brisbane Canberra Melbourne Sydney
##     <dbl>    <dbl>    <dbl>     <dbl>  <dbl>
##  1   1850     NA       NA        NA     21.4
##  2   1860     NA       NA        NA     21.1
##  3   1870     NA       NA        NA     21.0
##  4   1880     NA       NA        NA     20.9
##  5   1890     NA       NA        NA     20.8
##  6   1900     NA       NA        NA     21.0
##  7   1910     NA       NA        NA     21.6
##  8   1920     NA       NA        NA     21.9
##  9   1930     NA       NA        NA     21.5
## 10   1940     NA       NA        NA     21.5
## 11   1950     NA       NA        NA     21.7
## 12   1960     NA       NA        NA     21.9
## 13   1970     NA       NA        19.3   22.1
## 14   1980     NA       NA        19.6   22.3
## 15   1990     27.2     NA        19.4   22.2
## 16   2000     26.5     22.1      20.3   22.9
## 17   2010     26.6     21.0      20.6   23.2

Data Munging of Australian Weather Data

Samantha Manila

19/08/2018

Introduction

Objective

Solution

1. Load our packages

2. Load our data

3. Combine our dataset

4. Rearrange our dataset