This procedure explains how to download and process the Bureau of Meteorology data.
The data is stored in a ZIP file at a fixed FTP location.
# Download and extract data
file <- "ftp://ftp.bom.gov.au/anon2/home/ncc/srds/Scheduled_Jobs/DS036_ColibanWater/DS036Coliban.zip"
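download.file(file, destfile="DS036Coliban.zip", mode="wb") # Binary mode keeps the ZIP intact on all platforms
unzip("DS036Coliban.zip") # Extract into the working directory
data.files <- dir() # List every file in the working directory, including the extracted ones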
The ZIP file contains 34 files, of which 32 contain data (one per station). The DC02D_Notes_5559344761.txt file describes the data files. The DC02D_StnDet_5559344761.txt file is a control table of the stations.
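The split between data files and metadata files can be checked from the file names alone. The sketch below works on a hypothetical listing: the two `_Data_` names are invented for illustration, and only the Notes and StnDet names come from the archive. It assumes that data files are the ones whose names contain `_Data_`, which is the pattern this procedure filters on further down.

```r
# Hypothetical file listing; the two "_Data_" names are illustrative only
example.files <- c("DC02D_Data_080002_5559344761.txt",
                   "DC02D_Data_080013_5559344761.txt",
                   "DC02D_Notes_5559344761.txt",
                   "DC02D_StnDet_5559344761.txt")

# Flag the files whose names contain the literal string "_Data_"
is.data <- grepl("_Data_", example.files, fixed = TRUE)

n.data <- sum(is.data)                  # Number of data files
meta.files <- example.files[!is.data]   # Notes and station details
```

Running the same two lines on the real `dir()` output gives a count that can be compared with the number of stations in the control table.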
The control table lists characteristics of each station, including geospatial information, length of record keeping and data quality. The currently included stations are:
stations <- read.csv("DC02D_StnDet_5559344761.txt")
stations$Station.Name <- gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(stations$Station.Name), perl=TRUE) # Capitalisation
stations$Station.Name <- sub("\\s+$", "", stations$Station.Name) #Remove trailing spaces
knitr::kable(stations[,c(2,4)])
Bureau.of.Meteorology.Station.Number | Station.Name |
---|---|
80002 | Boort |
80013 | Pyramid Hill (Sylvaterre) |
80015 | Echuca Aerodrome |
80017 | Gladfield Hopefield Estate |
80020 | Gunbower Gee Tee Stud |
80023 | Kerang |
80027 | Korong Vale (Burnbank) |
80036 | Mincha |
80049 | Rochester |
80103 | Dingee |
80128 | Charlton |
81002 | Bealiba |
81020 | Inglewood (Post Office) |
81041 | Raywood |
81047 | Tarnagulla |
81058 | Bridgewater (Post Office) |
81083 | Eppalock Reservoir |
81085 | Dunolly |
81092 | Eastville (Bonnie Banks) |
81115 | Wanalta Daen Station |
81123 | Bendigo Airport |
88029 | Heathcote |
88042 | Malmsbury Reservoir |
88043 | Maryborough |
88048 | Newstead |
88050 | Pyalong West (Cavan Park) |
88059 | Trentham (Post Office) |
88108 | Vaughan |
88110 | Castlemaine Prison |
88118 | Harcourt |
88123 | Kyneton |
88161 | Maldon (Stump St) |
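The two clean-up steps applied to the station names can be tried on a small sample. This is a minimal sketch that assumes the raw names arrive in upper case with trailing padding; `raw.names` and `clean.names` are hypothetical variables, not part of the procedure.

```r
# Sample names in the upper-case, space-padded style assumed for the raw file
raw.names <- c("PYRAMID HILL (SYLVATERRE)   ", "BENDIGO AIRPORT ")

# Lower-case everything, then upper-case the first letter after each word boundary
clean.names <- gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(raw.names), perl = TRUE)

# Strip trailing whitespace
clean.names <- sub("\\s+$", "", clean.names)
```

Because the pattern treats every word boundary as the start of a word, a letter that follows an apostrophe is also capitalised, so a possessive such as "king's" would become "King'S". None of the station names listed above are affected.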
Each data file contains 61 days of climate data (precipitation, maximum temperature and evaporation). Not all variables are available for every station on a daily basis. Refer to the DC02D_Notes_5559344761.txt file for details.
data.files <- data.files[grepl("_Data_", data.files)]
climate.data <- data.frame()
for (data.file in data.files) {
  station.data <- read.csv(data.file) # Read one station file
  climate.data <- rbind(climate.data, station.data) # Append it to the combined table
}
climate.data$Day.Month.Year.in.DD.MM.YYYY.format <- as.Date(climate.data$Day.Month.Year.in.DD.MM.YYYY.format, format="%d/%m/%Y")
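An alternative to growing the data frame inside a loop is to read all files into a list and bind them in one call. The sketch below illustrates that pattern together with the date conversion. It is not the procedure's own code: the CSV text stands in for real files, and `csv.texts`, `Date` and `Max.Temp` are hypothetical names chosen to keep the example short.

```r
# Two hypothetical station extracts, given as CSV text instead of files
csv.texts <- c(
  "Station.Number,Date,Max.Temp\n80002,01/02/2016,31.5\n80002,02/02/2016,",
  "Station.Number,Date,Max.Temp\n80013,01/02/2016,30.1\n80013,02/02/2016,29.4"
)

# Read each extract into its own data frame, then bind them all at once
station.list <- lapply(csv.texts, function(txt) read.csv(text = txt))
combined <- do.call(rbind, station.list)

# Convert the DD/MM/YYYY text column to Date objects
combined$Date <- as.Date(combined$Date, format = "%d/%m/%Y")
```

The empty field in the first extract is read as `NA`, which is how days without a reading end up as missing values in the combined table.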
This results in a data table with 3059 observations. The data is then merged with selected columns of the station list to add geospatial context.
climate.data <- merge(climate.data, stations[,c(2,4,7,8)], by.x="Station.Number", by.y="Bureau.of.Meteorology.Station.Number" )
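The behaviour of `merge` with differently named key columns can be seen on toy data. In this sketch, `obs` and `stn` are hypothetical tables and station 99999 is invented to show what happens to an observation without a matching station.

```r
# Hypothetical observations and station table with differently named key columns
obs <- data.frame(Station.Number = c(80002, 80002, 99999),
                  Max.Temp = c(31.5, 29.0, 25.0))
stn <- data.frame(Bureau.of.Meteorology.Station.Number = c(80002, 80013),
                  Station.Name = c("Boort", "Pyramid Hill (Sylvaterre)"))

# The default merge is an inner join: rows without a match on either side are dropped
inner <- merge(obs, stn, by.x = "Station.Number",
               by.y = "Bureau.of.Meteorology.Station.Number")

# all.x = TRUE keeps every observation and fills the station fields with NA
left <- merge(obs, stn, by.x = "Station.Number",
              by.y = "Bureau.of.Meteorology.Station.Number", all.x = TRUE)
```

Comparing `nrow()` before and after the merge is a quick way to confirm that no observations were dropped because of an unmatched station number.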
library(ggplot2)
ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Maximum.temperature.in.24.hours.after.9am..local.time..in.Degrees.C)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Maximum temperature")
## Warning: Removed 1000 rows containing missing values (geom_path).
ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Precipitation.in.the.24.hours.before.9am..local.time..in.mm)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Precipitation")
ggplot(climate.data, aes(x=Day.Month.Year.in.DD.MM.YYYY.format, y=Evaporation.in.24.hours.before.9am..local.time..in.mm)) + geom_line() + facet_wrap(~Station.Name) + labs(title="Evaporation")
## Warning: Removed 2967 rows containing missing values (geom_path).
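The warnings report rows where the plotted variable is missing. One way to see which stations are responsible is to count missing values per station. The sketch below does this on toy data; `toy` and its `Evaporation` column are hypothetical stand-ins for the merged table and its much longer column names.

```r
# Hypothetical merged data: NA marks a day without a reading
toy <- data.frame(Station.Name = c("Boort", "Boort", "Kerang", "Kerang"),
                  Evaporation = c(NA, NA, 5.2, NA))

# Count the missing evaporation readings for each station
missing.by.station <- tapply(is.na(toy$Evaporation), toy$Station.Name, sum)
```

A station whose count equals its number of days has no readings at all for that variable and shows up as an empty panel in the faceted plot.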