Introduction

This document presents R code equivalent to the MATLAB code for plotting Australian rainfall data, as described in the FutureLearn course Big Data: Data Visualisation by Queensland University of Technology.

See sections 2.7 - 2.11 of the course.

Data acquisition

Download three daily rainfall datasets (one each for a station in NSW, SA, and WA) from the Climate Data Online webpage of the Bureau of Meteorology (BOM).

Read the data into R

library(ggplot2)
NSW_data <- read.csv("NSW_Data.csv")
SA_data <- read.csv("SA_Data.csv")
WA_data <- read.csv("WA_Data.csv")

Preprocess the data

Shorten each column name to its first word (the text before the first dot)

# Drop everything from the first dot onwards
names(NSW_data) <- sub("\\..*", "", names(NSW_data))
names(SA_data) <- sub("\\..*", "", names(SA_data))
names(WA_data) <- sub("\\..*", "", names(WA_data))
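To see what the renaming step does, here is a minimal sketch on a one-row toy file. The header text in it is an assumption about what the downloaded files look like, and `toy`, `long_names`, and `short_names` are illustrative names only.

```r
# Write a one-row CSV with long headers to a temporary file.
# The header text is an assumption about the download format.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,Month,Day,Rainfall amount (millimetres),Quality",
  "2013,1,1,0.4,Y"
), tmp)

toy <- read.csv(tmp)
long_names <- names(toy)  # read.csv replaces spaces and brackets with dots

# Keep only the text before the first dot
names(toy) <- sub("\\..*", "", names(toy))
short_names <- names(toy)

print(long_names)
print(short_names)
unlink(tmp)
```

Names without a dot, such as Year, pass through unchanged because the pattern does not match them.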

Add a column to identify the station

SA_data$Station <- "SA"
WA_data$Station <- "WA"
NSW_data$Station <- "NSW"

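Add a column for the day of the year

# Number the rows 1..n; assumes one row per day, in date order
SA_data$DoY <- seq_len(nrow(SA_data))
WA_data$DoY <- seq_len(nrow(WA_data))
NSW_data$DoY <- seq_len(nrow(NSW_data))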
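Numbering the rows works when each file holds exactly one complete year in date order. As a hedged alternative, the day of the year can be derived from the date itself, which also copes with leap years and missing rows. This sketch assumes the data has Year, Month, and Day columns; the `toy` data frame is made up for illustration.

```r
# Toy rows for a leap year; the Year/Month/Day column names are assumed
toy <- data.frame(
  Year  = c(2012, 2012, 2012, 2012),
  Month = c(1, 2, 3, 12),
  Day   = c(1, 29, 1, 31)
)

# Build real dates, then extract the day of the year ("%j" gives 001-366)
dates <- as.Date(ISOdate(toy$Year, toy$Month, toy$Day))
toy$DoY <- as.integer(format(dates, "%j"))

print(toy)
```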

Combine the 3 datasets

all_data <- rbind(NSW_data, SA_data, WA_data)
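For data frames, rbind matches columns by name, so the three tables must share the same column names after the renaming step. A small sketch with made-up rows shows both the matching and the error raised when the names differ; `a`, `b`, and `bad` are hypothetical.

```r
# Two toy data frames with the same columns in a different order
a <- data.frame(Rainfall = 1.2, Station = "NSW")
b <- data.frame(Station = "SA", Rainfall = 0.0)

# Columns are matched by name, not by position
both <- rbind(a, b)
print(both)

# A data frame with a different column name cannot be combined
bad <- data.frame(Rain = 3.4, Station = "WA")
outcome <- tryCatch(
  {
    rbind(a, bad)
    "combined"
  },
  error = function(e) "names do not match"
)
print(outcome)
```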

Replace missing (NA) rainfall values with 0

all_data$Rainfall[is.na(all_data$Rainfall)] <- 0

Aggregate to monthly data

# Sum the daily rainfall for every Station/Month combination;
# the result has the columns Station, Month and Rainfall
monthly <- aggregate(Rainfall ~ Station + Month, data = all_data, FUN = sum)
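The zero-filling and aggregation steps can be checked on a tiny made-up table. The sketch below also shows a possible alternative, passing na.rm = TRUE to sum instead of replacing NA with 0; both give the same monthly totals here. All values and the names `toy`, `filled`, `by_fill`, and `by_narm` are illustrative.

```r
# Toy daily data with one missing reading
toy <- data.frame(
  Station  = c("NSW", "NSW", "NSW", "SA", "SA"),
  Month    = c(1, 1, 2, 1, 1),
  Rainfall = c(2.0, NA, 5.0, 1.5, 0.5)
)

# Approach used in this document: replace NA with 0, then sum
filled <- toy
filled$Rainfall[is.na(filled$Rainfall)] <- 0
by_fill <- aggregate(Rainfall ~ Station + Month, data = filled, FUN = sum)

# Alternative: leave the NA in place and let sum() skip it
by_narm <- aggregate(toy$Rainfall,
                     list(Station = toy$Station, Month = toy$Month),
                     sum, na.rm = TRUE)
names(by_narm)[3] <- "Rainfall"

print(by_fill)
print(by_narm)
```

Either way, a missing reading is treated as a dry day, which may understate the true monthly total.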

Plotting the data

Plot the daily rainfall for one station:

ggplot(NSW_data, aes(DoY, Rainfall)) + geom_line()

Plot the daily rainfall for the 3 stations:

ggplot(all_data, aes(DoY, Rainfall, colour = Station)) + geom_line()

Plot the monthly rainfall for the 3 stations as lines:

ggplot(monthly, aes(Month, Rainfall, colour = Station)) + geom_line()

and as stacked bars:

monthly$Month <- as.factor(monthly$Month)
ggplot(monthly, aes(Month, Rainfall, fill = Station)) + geom_col()

and as dodged bars:

ggplot(monthly, aes(Month, Rainfall, fill = Station)) + geom_col(position = "dodge")
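Converting Month to a factor makes the bar charts treat it as a category. As an optional refinement, the factor can carry month abbreviations in calendar order, so the axis reads Jan, Feb, and so on. This is a sketch on made-up totals; `toy_monthly` is hypothetical, and the plot is drawn only if ggplot2 is installed.

```r
# Toy monthly totals; the numbers are made up for illustration
toy_monthly <- data.frame(
  Station  = rep(c("NSW", "SA"), each = 3),
  Month    = rep(c(1, 2, 10), times = 2),
  Rainfall = c(80, 95, 60, 20, 15, 40)
)

# Fix the level order to 1..12 and label with base R's month.abb
toy_monthly$Month <- factor(toy_monthly$Month, levels = 1:12, labels = month.abb)

# Draw the dodged bar chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_monthly, aes(Month, Rainfall, fill = Station)) +
    geom_col(position = "dodge") +
    labs(x = "Month", y = "Rainfall")
  print(p)
}
```

Listing all twelve levels keeps the months in calendar order even when some months have no rows.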

Conclusion

R is an excellent platform for exploratory data visualisation. RStudio provides an integrated development environment similar to MATLAB's. It is free, so students face no expensive license costs once a trial period ends.

With the integrated R Markdown support you can create reproducible documents like this one.