Introduction

This procedure explains how to download and process the Bureau of Meteorology data.

Download and Extract Data

The data is stored in a ZIP file at a fixed FTP location.

# Download and extract data
file <- "ftp://ftp.bom.gov.au/anon2/home/ncc/srds/Scheduled_Jobs/DS036_ColibanWater/DS036Coliban.zip"
download.file(file, destfile="DS036Coliban.zip", mode="wb") # Binary mode keeps the ZIP intact on Windows
unzip("DS036Coliban.zip")
data.files <- dir()

The ZIP file contains 34 files, of which 32 contain data. The DC02D_Notes_5559344761.txt file describes the data files, and the DC02D_StnDet_5559344761.txt file is a control table of the stations.
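
As a quick check on the contents of the archive, the extracted file names can be grouped by type. The sketch below is only an illustration on a hypothetical vector of file names: the Notes and StnDet names and the `_Data_` marker come from this procedure, while the data file names are invented for the example.

```r
# Hypothetical listing; in the procedure this comes from data.files <- dir()
example.files <- c(
  "DC02D_Data_080002_5559344761.txt", # invented data file name
  "DC02D_Data_080013_5559344761.txt", # invented data file name
  "DC02D_Notes_5559344761.txt",
  "DC02D_StnDet_5559344761.txt",
  "DS036Coliban.zip"
)

# Classify each file by the marker in its name
file.type <- ifelse(grepl("_Data_", example.files, fixed = TRUE), "data",
             ifelse(grepl("_Notes_", example.files, fixed = TRUE), "notes",
             ifelse(grepl("_StnDet_", example.files, fixed = TRUE), "stations", "other")))

# Count the files of each type
file.counts <- table(file.type)
print(file.counts)
```

Running the same classification on the real `dir()` output would show whether the number of data files matches the number of stations in the control table.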

Control Table

The control table lists a range of characteristics for each station, including geospatial information, length of record and data quality. The stations currently included are:

stations <- read.csv("DC02D_StnDet_5559344761.txt")
stations$Station.Name <- gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(stations$Station.Name), perl=TRUE) # Capitalisation
stations$Station.Name <- sub("\\s+$", "", stations$Station.Name) # Remove trailing spaces
knitr::kable(stations[,c(2,4)])
Bureau.of.Meteorology.Station.Number  Station.Name
80002                                 Boort
80013                                 Pyramid Hill (Sylvaterre)
80015                                 Echuca Aerodrome
80017                                 Gladfield Hopefield Estate
80020                                 Gunbower Gee Tee Stud
80023                                 Kerang
80027                                 Korong Vale (Burnbank)
80036                                 Mincha
80049                                 Rochester
80103                                 Dingee
80128                                 Charlton
81002                                 Bealiba
81020                                 Inglewood (Post Office)
81041                                 Raywood
81047                                 Tarnagulla
81058                                 Bridgewater (Post Office)
81083                                 Eppalock Reservoir
81085                                 Dunolly
81092                                 Eastville (Bonnie Banks)
81115                                 Wanalta Daen Station
81123                                 Bendigo Airport
88029                                 Heathcote
88042                                 Malmsbury Reservoir
88043                                 Maryborough
88048                                 Newstead
88050                                 Pyalong West (Cavan Park)
88059                                 Trentham (Post Office)
88108                                 Vaughan
88110                                 Castlemaine Prison
88118                                 Harcourt
88123                                 Kyneton
88161                                 Maldon (Stump St)
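
The two clean-up steps applied to Station.Name (capitalising each word, then trimming trailing spaces) can be tried on a few sample strings. This is a minimal sketch that assumes the raw names arrive in upper case with trailing padding; the `raw.names` vector is made up for the illustration and is not read from the control table.

```r
# Sample names in the upper-case, space-padded style assumed for the raw file
raw.names <- c("PYRAMID HILL (SYLVATERRE)   ", "ECHUCA AERODROME", "MALDON (STUMP ST) ")

# Lower-case everything, then upper-case the first letter of each word
clean.names <- gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(raw.names), perl = TRUE)

# Remove trailing whitespace
clean.names <- sub("\\s+$", "", clean.names)

print(clean.names)
```

The `\\U` in the replacement is only understood when `perl = TRUE` is set, which is why that argument appears in the procedure.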

Merge Data

Each .txt data file contains 61 days of climate data (precipitation, maximum temperature and evaporation). Not all variables are available for every station on a daily basis. Refer to the DC02D_Notes_5559344761.txt file for details.

# Keep only the station data files
data.files <- data.files[grepl("_Data_", data.files)]
# Read every station file and stack the results into one data frame
climate.data <- do.call(rbind, lapply(data.files, read.csv))
# Convert the date column from text to Date
climate.data$Day.Month.Year.in.DD.MM.YYYY.format <- as.Date(climate.data$Day.Month.Year.in.DD.MM.YYYY.format, format="%d/%m/%Y")

This results in a data table with 3059 observations. This data is then merged with selected columns of the station list to add geospatial context.

climate.data <- merge(climate.data, stations[,c(2,4,7,8)], by.x="Station.Number", by.y="Bureau.of.Meteorology.Station.Number")
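
To illustrate how stacking and merging behave, the sketch below repeats both steps on toy data. All values, the shortened column names `Date` and `Rain.mm`, and station 99999 are invented for the example; only the key column names match the procedure. It shows that `merge()` performs an inner join by default, so observations from a station missing in the control table would be silently dropped.

```r
# Toy stand-ins for two station files (values invented for illustration)
file.a <- read.csv(text = "Station.Number,Date,Rain.mm
80002,01/03/2016,0.0
80002,02/03/2016,1.2")
file.b <- read.csv(text = "Station.Number,Date,Rain.mm
99999,01/03/2016,3.4")

# Stack the files and parse the DD/MM/YYYY dates
toy.data <- do.call(rbind, list(file.a, file.b))
toy.data$Date <- as.Date(toy.data$Date, format = "%d/%m/%Y")

# Toy control table with a differently named key column
toy.stations <- data.frame(Bureau.of.Meteorology.Station.Number = 80002,
                           Station.Name = "Boort")

# Inner join (the merge() default): station 99999 has no match and is dropped
inner <- merge(toy.data, toy.stations, by.x = "Station.Number",
               by.y = "Bureau.of.Meteorology.Station.Number")

# all.x = TRUE keeps unmatched observations, with NA in the station columns
left <- merge(toy.data, toy.stations, by.x = "Station.Number",
              by.y = "Bureau.of.Meteorology.Station.Number", all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Comparing the row count before and after the merge is a simple way to confirm that no observations were lost.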

Explore Data

library(ggplot2)
ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Maximum.temperature.in.24.hours.after.9am..local.time..in.Degrees.C)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Maximum temperature")
## Warning: Removed 1000 rows containing missing values (geom_path).

ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Precipitation.in.the.24.hours.before.9am..local.time..in.mm)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Precipitation")

ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Evaporation.in.24.hours.before.9am..local.time..in.mm)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Evaporation")
## Warning: Removed 2967 rows containing missing values (geom_path).
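
The warnings above indicate that many maximum temperature and evaporation values are missing. One way to see where the gaps are is to count missing values per variable and per station before plotting. The sketch below does this on toy data; the values and the shortened column names `Max.Temp` and `Rain` are hypothetical and stand in for the long column names of the real data.

```r
# Toy data with shortened, hypothetical column names and invented values
toy <- data.frame(
  Station.Name = c("Boort", "Boort", "Kerang", "Kerang"),
  Max.Temp     = c(31.2, NA, NA, NA),
  Rain         = c(0.0, 1.2, 0.0, NA)
)

# Missing values per variable
missing.by.variable <- colSums(is.na(toy[, c("Max.Temp", "Rain")]))
print(missing.by.variable)

# Missing maximum temperature readings per station
missing.by.station <- tapply(is.na(toy$Max.Temp), toy$Station.Name, sum)
print(missing.by.station)
```

Applied to climate.data, the per-station counts would show which facets in the plots are empty because a station does not record that variable.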