library(readr)
library(readxl)
library(dplyr)
library(ggplot2)
library(deductive)
library(validate)
library(lubridate)
library(Hmisc)
library(tidyr)
library(outliers)
Using the ‘rbind’ function I combined 12 weather data sets. The combined data set was then merged with a 13th to add seasons as a variable. ‘left_join’ function was used and they were merged by the date variable. However, prior to merging data sets I changed the date format so they matched. I removed the first empty column and changed column names to remove unwanted characters. I wish I also removed the spaces in the column names at this stage… I’ve then proceeded to create an extra variable for month as a factor by duplicating the date variable, separating day, month and year and then dropping day and year columns. I then converted the month number to a factor and added labels. I changed the data type for 22 variables. I believed I would achieve more accurate alternatives for missing characters if I completed this step before tidying the data, so I’ve gone ahead and replaced missing data with the mean for numeric and mode for categorical variables. I’ve then gone and tidied up the data based on the variables that had multiple values stored in rows. In hindsight I should’ve created separate data frames for each of the variables i.e. a separate data frame for MSL pressure, cloud, humidity, wind direction, wind speed and temperature. I eventually ended up with 105408 observations. I’ve used box plots to easily search for an identify outliers. Once outliers were identified, I created subsets of the data and used the capping method to replace outliers. Another hindsight moment but I think there’s an opportunity to search for and replace outliers before tidying the data. However, by doing this we would obviously lose the original values completely. Hope you enjoy the following 25 pages.
13 Data sets overall
# Import data and skip the rows that don't form part of the data set
vicsep <- read_csv("VIC_IDCJDW3050.201909.csv", skip = 7)
vicoct <- read_csv("VIC_IDCJDW3050.201910.csv", skip = 7)
vicnov <- read_csv("VIC_IDCJDW3050.201911.csv", skip = 7)
vicdec <- read_csv("VIC_IDCJDW3033.201912.csv", skip = 4)
vicjan <- read_csv("VIC_IDCJDW3050.202001.csv", skip = 7)
vicfeb <- read_csv("VIC_IDCJDW3050.202002.csv", skip = 7)
vicmar <- read_csv("VIC_IDCJDW3050.202003.csv", skip = 7)
vicapr <- read_csv("VIC_IDCJDW3050.202004.csv", skip = 7)
vicmay <- read_csv("VIC_IDCJDW3050.202005.csv", skip = 7)
vicjun <- read_csv("VIC_IDCJDW3050.202006.csv", skip = 7)
vicjul <- read_csv("VIC_IDCJDW3050.202007.csv", skip = 7)
vicaug <- read_csv("VIC_IDCJDW3050.202008.csv", skip = 7)
# Example of one data set as they all have the same variables
head(vicsep)
# Import seasons data set
seasons <- read_csv("seasons.csv")
head(seasons)
# Combine the data sets using rbind
viccomb <- rbind(vicsep, vicoct, vicnov, vicdec, vicjan, vicfeb, vicmar, vicapr, vicmay, vicjun, vicjul, vicaug)
# Drop the first column that won't be used
viccomb$X1 <- NULL
# Add the City variable
viccomb$City <- c("Melbourne")
head(viccomb)
# As we can see from the header of 'seasons' & 'viccomb' data sets the date formats don't match. This variable is key for our merge & we need to convert them into the same format before merging
class(seasons$Date)
## [1] "character"
class(viccomb$Date)
## [1] "character"
seasons$Date <- as.Date(seasons$Date, format = "%d/%m/%Y")
viccomb$Date <- as.Date(viccomb$Date)
# Merge 'seasons' to 'viccomb'
vicseason <- left_join(viccomb, seasons, by = "Date")
head(vicseason)
# Change column names to remove characters
colnames(vicseason)[2] <- "Minimum temperature (C)"
colnames(vicseason)[3] <- "Maximum temperature (C)"
colnames(vicseason)[10] <- "9am Temperature (C)"
colnames(vicseason)[16] <- "3pm Temperature (C)"
# Add in Month variable
vicseason$Month = vicseason$Date
vsm <- vicseason %>% separate(Month, into = c("year", "Month", "day"), sep = "-")
vsm$year <- NULL
vsm$day <- NULL
head(vsm)
We can obtain the data types from the headers and will need to make some changes to make the data suitable for analysis. Below table highlights current formations and required formats in bold. As you’ll see, we need to change the data types for 22 variables, keeping in mind we’ve already made the date changes.
| Variable | Current Data Type | Required Data Type |
|---|---|---|
| Date | date | date |
| Minimum temperature (C) | double | numeric |
| Maximum temperature (C) | double | numeric |
| Rainfall (mm) | double | numeric |
| Evaporation (mm) | double | numeric |
| Sunshine (hours) | double | numeric |
| Direction of maximum wind gust | character | factor |
| Speed of maximum wind gust (km/h) | number | factor |
| Time of maximum wind gust | time | time |
| 9am Temperature (C) | double | numeric |
| 9am relative humidity (%) | double | numeric |
| 9am cloud amount (oktas) | double | numeric |
| 9am wind direction | character | factor |
| 9am wind speed (km/h) | character | numeric |
| 9am MSL pressure (hPa) | double | numeric |
| 3pm Temperature (C) | double | numeric |
| 3pm relative humidity (%) | double | numeric |
| 3pm cloud amount (oktas) | double | numeric |
| 3pm wind direction | character | factor |
| 3pm wind speed (km/h) | character | numeric |
| 3pm MSL pressure (hPa) | double | numeric |
| City | character | factor |
| Season | double | factor |
| Month | character | factor |
# Below code is to change data types and apply labels where appropriate
vsm$`Minimum temperature (C)` <- as.numeric(vsm$`Minimum temperature (C)`)
vsm$`Maximum temperature (C)` <- as.numeric(vsm$`Maximum temperature (C)`)
vsm$`Rainfall (mm)` <- as.numeric(vsm$`Rainfall (mm)`)
vsm$`Evaporation (mm)` <- as.numeric(vsm$`Evaporation (mm)`)
vsm$`Sunshine (hours)` <- as.numeric(vsm$`Sunshine (hours)`)
vsm$`Direction of maximum wind gust` <- as.factor(vsm$`Direction of maximum wind gust`)
vsm$`9am cloud amount (oktas)` <- as.numeric(vsm$`9am cloud amount (oktas)`)
vsm$`Speed of maximum wind gust (km/h)` <- as.numeric(vsm$`Speed of maximum wind gust (km/h)`)
vsm$`9am Temperature (C)` <- as.numeric(vsm$`9am Temperature (C)`)
vsm$`9am relative humidity (%)` <- as.numeric(vsm$`9am relative humidity (%)`)
vsm$`9am wind direction` <- as.factor(vsm$`9am wind direction`)
vsm$`9am wind speed (km/h)` <- as.numeric(vsm$`9am wind speed (km/h)`)
vsm$`9am MSL pressure (hPa)` <- as.numeric(vsm$`9am MSL pressure (hPa)`)
vsm$`3pm cloud amount (oktas)` <- as.numeric(vsm$`3pm cloud amount (oktas)`)
vsm$`3pm Temperature (C)` <- as.numeric(vsm$`3pm Temperature (C)`)
vsm$`3pm relative humidity (%)` <- as.numeric(vsm$`3pm relative humidity (%)`)
vsm$`3pm wind direction` <- as.factor(vsm$`3pm wind direction`)
vsm$`3pm wind speed (km/h)` <- as.numeric(vsm$`3pm wind speed (km/h)`)
vsm$`3pm MSL pressure (hPa)` <- as.numeric(vsm$`3pm MSL pressure (hPa)`)
vsm$Season <- as.factor(vsm$Season)
vsm$Season <- factor(vsm$Season, levels = c("1","2","3","4"),
labels = c("Spring", "Summer", "Winter", "Autumn"), ordered = TRUE)
levels(vsm$Season)
## [1] "Spring" "Summer" "Winter" "Autumn"
vsm$City <- as.factor(vsm$City)
vsm$Month <- as.factor(vsm$Month)
vsm$Month <- factor(vsm$Month, levels = c("09","10","11","12","01","02","03","04","05","06","07","08"),
labels = c("September", "October", "November","December", "January", "February",
"March", "April", "May", "June", "July", "August"), ordered = TRUE)
levels(vsm$Month)
## [1] "September" "October" "November" "December" "January" "February"
## [7] "March" "April" "May" "June" "July" "August"
Mixing it up here but for more accurate imputations I’ve decided to scan for missing values before tidying the data. This allows for more accurate mean estimates as when we tidy the data we’ll be joining variable from 9am to 3pm and min and max values.
# First lets check which columns are missing values
colSums(is.na(vsm))
## Date Minimum temperature (C)
## 0 0
## Maximum temperature (C) Rainfall (mm)
## 0 0
## Evaporation (mm) Sunshine (hours)
## 33 32
## Direction of maximum wind gust Speed of maximum wind gust (km/h)
## 5 5
## Time of maximum wind gust 9am Temperature (C)
## 5 1
## 9am relative humidity (%) 9am cloud amount (oktas)
## 1 35
## 9am wind direction 9am wind speed (km/h)
## 13 13
## 9am MSL pressure (hPa) 3pm Temperature (C)
## 2 0
## 3pm relative humidity (%) 3pm cloud amount (oktas)
## 0 35
## 3pm wind direction 3pm wind speed (km/h)
## 2 2
## 3pm MSL pressure (hPa) City
## 0 0
## Season Month
## 0 0
# 14 columns have missing values. I'll start off first with the numeric variables and replace missing with mean values.
vsm$`Evaporation (mm)`<- impute(vsm$`Evaporation (mm)`, fun = mean)
vsm$`Sunshine (hours)`<- impute(vsm$`Sunshine (hours)`, fun = mean)
vsm$`Speed of maximum wind gust (km/h)`<- impute(vsm$`Speed of maximum wind gust (km/h)`, fun = mean)
vsm$`9am Temperature (C)` <- impute(vsm$`9am Temperature (C)`, fun = mean)
vsm$`9am relative humidity (%)` <- impute(vsm$`9am relative humidity (%)`, fun = mean)
vsm$`9am cloud amount (oktas)` <- impute(vsm$`9am cloud amount (oktas)`, fun = mean)
vsm$`9am wind speed (km/h)`<- impute(vsm$`9am wind speed (km/h)`, fun = mean)
vsm$`9am MSL pressure (hPa)`<- impute(vsm$`9am MSL pressure (hPa)`, fun = mean)
vsm$`3pm cloud amount (oktas)` <- impute(vsm$`3pm cloud amount (oktas)`, fun = mean)
vsm$`3pm wind speed (km/h)`<- impute(vsm$`3pm wind speed (km/h)`, fun = mean)
# Now to replace the missing categorical/factor variables with the mode
vsm$`Direction of maximum wind gust` <- impute(vsm$`Direction of maximum wind gust`, fun = mode)
vsm$`9am wind direction` <- impute(vsm$`9am wind direction`, fun = mode)
vsm$`3pm wind direction`<- impute(vsm$`3pm wind direction`, fun = mode)
vsm$`Time of maximum wind gust`<- impute(vsm$`Time of maximum wind gust`, fun = mode)
# I'll be using the below function from Dr. Anil Dolgun's Module 5 course material to Check every numerical column whether they have infinite or NaN or NA values using a function called is.specialorNA
is.specialorNA <- function(x){
if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))
}
# Now to apply this function to the data set.
sapply(vsm, function(x) sum( is.na(x) ))
## Date Minimum temperature (C)
## 0 0
## Maximum temperature (C) Rainfall (mm)
## 0 0
## Evaporation (mm) Sunshine (hours)
## 0 0
## Direction of maximum wind gust Speed of maximum wind gust (km/h)
## 0 0
## Time of maximum wind gust 9am Temperature (C)
## 5 0
## 9am relative humidity (%) 9am cloud amount (oktas)
## 0 0
## 9am wind direction 9am wind speed (km/h)
## 0 0
## 9am MSL pressure (hPa) 3pm Temperature (C)
## 0 0
## 3pm relative humidity (%) 3pm cloud amount (oktas)
## 0 0
## 3pm wind direction 3pm wind speed (km/h)
## 0 0
## 3pm MSL pressure (hPa) City
## 0 0
## Season Month
## 0 0
head(vsm)
I get a bit nervous with this step as I go from 366 to 105408 observations, but here we go. Wind speed, temperature (c), wind direction, humidity, cloud amount and MSL pressure variables all have multiple values stored in rows, therefore the data set is not tidy.
# Below codes will aim to ensure each variable has it's own column by grouping every time the variable is measured and bring it into 2 columns being the categorical variable and the value. For example, combining the max speed of wind, 9am wind speed and 3pm wind speed:
vsmgather <- vsm %>% gather ('Speed of maximum wind gust (km/h)', '9am wind speed (km/h)',
'3pm wind speed (km/h)', key = "Wind Measurement", value = "Wind km/h")
vsmgather1 <- vsmgather %>% gather ('Minimum temperature (C)','Maximum temperature (C)','9am Temperature (C)',
'3pm Temperature (C)', key = "Temperature Measurement", value = "Degrees Celcius(C)")
vsmgather2 <- vsmgather1 %>% gather ('Direction of maximum wind gust', '9am wind direction','3pm wind direction',
'3pm wind direction',key = 'Wind Direction', value = 'Direction')
vsmgather3 <- vsmgather2 %>% gather('9am relative humidity (%)','3pm relative humidity (%)', key = "Humidity", value = "%")
vsmgather4 <- vsmgather3 %>% gather('9am cloud amount (oktas)','3pm cloud amount (oktas)', key = "Cloud Amount", value = "Oktas")
vsmgather5 <- vsmgather4 %>% gather('9am MSL pressure (hPa)','3pm MSL pressure (hPa)', key = "MSL Pressure", value = "hPA")
# AS each line
Below we’ll create a column to calculate the Temperature normalisation by dividing the temperature value by the mean temperature value
# Mutate and divide the newly created 'Degrees Celcius (C)' variable by mean
vsmtidy <- vsmgather5 %>% mutate(Temp_norm = `Degrees Celcius(C)`/mean(`Degrees Celcius(C)`))
head(vsmtidy)
Below we’ll scan the variables for any outliers. We have 3 univariate outliers and 5 multivariate outliers. All scans will be done using box plots.
#During the imputation of missing values some of the data types changed to double. In order to create the box plots I'll need to change these back to numeric.
vsmtidy$`Sunshine (hours)` <- as.numeric(vsmtidy$`Sunshine (hours)`)
vsmtidy$`Wind km/h` <- as.numeric(vsmtidy$`Wind km/h`)
vsmtidy$`Degrees Celcius(C)` <- as.numeric(vsmtidy$`Degrees Celcius(C)`)
vsmtidy$`%` <- as.numeric(vsmtidy$`%`)
vsmtidy$hPA <- as.numeric(vsmtidy$hPA)
vsmtidy$Oktas <- as.numeric(vsmtidy$Oktas)
# Now to check for Univariate outliers in Rainfall. As rainfall is best measured by monthly rainfall as opposed to daily I've created a dataframe with monthly totals. If i used daily figures a lot of values would end up with too many outliers and the data would be useless. Therefore by using using monthly values we end up with a more accurate picture.
#Rainfall box plots
monthrain <- vsm %>% group_by(Month) %>% summarise(Monthly_Rainfall = sum(`Rainfall (mm)`, na.rm = TRUE))
monthrain$Monthly_Rainfall %>% boxplot(main="Rainfall", ylab="mm", col = "lightblue")
#Good start, no outliers for rainfall
#Evaporation box plot, similar to rainfall we're using monthly totals.
monthevaporation <- vsm %>% group_by(Month) %>% summarise(Monthly_Evap = sum(`Evaporation (mm)`, na.rm = TRUE))
monthevaporation$Monthly_Evap %>% boxplot(main="Evaporation", ylab="mm", col = "cyan")
#No outliers for evaporation
#sunshine box plot
vsmtidy$`Sunshine (hours)` %>% boxplot(main="Sunshine", ylab="hours", col = "coral")
#No outliers
#Now to check the Multivariate outliers
#Wind box plot
boxplot(vsmtidy$`Wind km/h` ~ vsmtidy$`Wind Measurement`, main="Wind by km/h", ylab = "km/h", xlab = "Measurement", col = "pink")
#Outliers detected we'll deal with them shortly
#Temperature box plot
boxplot(vsmtidy$`Degrees Celcius(C)` ~ vsmtidy$`Temperature Measurement`, main="Temperature by Degree Celcius", ylab = "Temperature (C)", xlab = "Temperature Measurement", col = "yellow")
#Outliers detected
#Humidity box plot
boxplot(vsmtidy$`%` ~ vsmtidy$Humidity, main="Humidity time by Humidity %", ylab = "Humidity %", xlab = "Humidity time", col = "orange")
#outliers detected
#MSL Pressure
boxplot(vsmtidy$hPA ~ vsmtidy$`MSL Pressure`, main="MSL Pressure by hPa", ylab = "hPA", xlab = "MSL Pressure", col= "magenta")
#Outliers detected are minimal, no action .
#Cloud box plots
boxplot(vsmtidy$Oktas ~ vsmtidy$`Cloud Amount`, main="Cloud measurement by Oktas", ylab = "Oktas", xlab = "Time of Cloud Measurement", col = "purple")
#No outliers
#I am going to use the method of capping to replace outliers. These outliers aren't mistakes therefore using capping ensures we keep our results close enough to actuals without having outliers skewing the outcomes. Again, I'm using a function Dr. Anil Dolgun's Data Wrangling module 6 notes. Which also indicate, the function was taken from https://stackoverflow.com/questions/13339685/how-to-replace-outliers-with-the-5th-and-95th-percentile-values-in-r?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
cap <- function(x){
quantiles <- quantile( x, c(.05, 0.25, 0.75, .95 ) )
x[ x < quantiles[2] - 1.5*IQR(x) ] <- quantiles[1]
x[ x > quantiles[3] + 1.5*IQR(x) ] <- quantiles[4]
x
}
# Now to replace the outliers I'll need to break the multivariate outliers down by the qualitative variable. First I'll start by creating data frame for Wind
wind <- vsmtidy %>% select(Date, `Wind Measurement`,`Wind km/h`)
#Now to filter variable '3pm Wind Speed'
wind3 <- wind %>% filter(wind$`Wind Measurement` == "3pm wind speed (km/h)")
#apply the capping function to a subset of the 3pm wind data
wind3_capped <- wind3$`Wind km/h` %>% cap()
#repeat the above steps to other variables as well
#9am Wind Speed
wind9 <- wind %>% filter(wind$`Wind Measurement` == "9am wind speed (km/h)")
wind9_capped <- wind9$`Wind km/h` %>% cap()
#Max Wind Speed
windmax <- wind %>% filter(wind$`Wind Measurement` == "Speed of maximum wind gust (km/h)")
windmax_capped <- windmax$`Wind km/h` %>% cap()
#Below subsets and summaries have been set up to draw a comparison between original values and the capped values.
winduncapped <- cbind(windmax$`Wind km/h`, wind9$`Wind km/h`, wind3$`Wind km/h`)
windcapped <- cbind(windmax_capped, wind9_capped, wind3_capped)
summary(winduncapped)
## V1 V2 V3
## Min. :13.00 Min. : 2.00 Min. : 2.0
## 1st Qu.:28.00 1st Qu.: 7.00 1st Qu.: 9.0
## Median :33.00 Median : 9.00 Median :13.0
## Mean :34.83 Mean :10.17 Mean :13.8
## 3rd Qu.:43.00 3rd Qu.:13.00 3rd Qu.:17.0
## Max. :85.00 Max. :33.00 Max. :33.0
summary(windcapped)
## windmax_capped wind9_capped wind3_capped
## Min. :13.00 Min. : 2.00 Min. : 2.00
## 1st Qu.:28.00 1st Qu.: 7.00 1st Qu.: 9.00
## Median :33.00 Median : 9.00 Median :13.00
## Mean :34.71 Mean :10.09 Mean :13.77
## 3rd Qu.:43.00 3rd Qu.:13.00 3rd Qu.:17.00
## Max. :61.00 Max. :22.00 Max. :28.00
boxplot(winduncapped, col="red")
boxplot(windcapped, col="maroon")
# We'll follow the same steps for the Temperature outliers
temp <- vsmtidy %>% select(Date, `Temperature Measurement`,`Degrees Celcius(C)`)
#3pm Temperature
temp3 <- temp %>% filter(temp$`Temperature Measurement` == "3pm Temperature (C)")
temp3_capped <- temp3$`Degrees Celcius(C)` %>% cap()
#9am Temperature
temp9 <-temp %>% filter(temp$`Temperature Measurement`== "9am Temperature (C)")
temp9_capped <- temp9$`Degrees Celcius(C)` %>% cap()
#Max Temperature
tempmax <- temp %>% filter(temp$`Temperature Measurement`== "Maximum temperature (C)")
tempmax_capped <- tempmax$`Degrees Celcius(C)` %>% cap()
#Min Temperature
tempmin <- temp %>% filter(temp$`Temperature Measurement`== "Minimum temperature (C)")
tempmin_capped <- tempmin$`Degrees Celcius(C)` %>% cap()
#Comparison with uncapped and capped data
tempuncapped <- cbind(tempmax$`Degrees Celcius(C)`,tempmin$`Degrees Celcius(C)`, temp3$`Degrees Celcius(C)`, temp9$`Degrees Celcius(C)`)
tempcapped <- cbind(tempmax_capped, tempmin_capped, temp3_capped, temp9_capped)
summary(tempuncapped)
## V1 V2 V3 V4
## Min. :10.30 Min. : 1.70 Min. : 7.20 Min. : 2.50
## 1st Qu.:15.30 1st Qu.: 8.20 1st Qu.:14.30 1st Qu.:10.80
## Median :18.60 Median :10.85 Median :17.45 Median :13.65
## Mean :19.94 Mean :11.00 Mean :18.37 Mean :13.98
## 3rd Qu.:23.00 3rd Qu.:13.60 3rd Qu.:20.90 3rd Qu.:16.50
## Max. :43.50 Max. :23.50 Max. :42.20 Max. :33.10
summary(tempcapped)
## tempmax_capped tempmin_capped temp3_capped temp9_capped
## Min. :10.30 Min. : 1.70 Min. : 7.20 Min. : 2.50
## 1st Qu.:15.30 1st Qu.: 8.20 1st Qu.:14.30 1st Qu.:10.80
## Median :18.60 Median :10.85 Median :17.45 Median :13.65
## Mean :19.74 Mean :10.98 Mean :18.15 Mean :13.83
## 3rd Qu.:23.00 3rd Qu.:13.60 3rd Qu.:20.90 3rd Qu.:16.50
## Max. :34.30 Max. :21.50 Max. :30.60 Max. :24.20
boxplot(tempuncapped, col="red")
boxplot(tempcapped, col="maroon")
# We'll follow the same steps for the Temperature outliers
humidity <- vsmtidy %>% select(Date, Humidity,'%')
#3pm humidity
humidity3 <- humidity %>% filter(humidity$Humidity == "3pm relative humidity (%)")
humidity3_capped <- humidity3$'%' %>% cap()
# 9am humidity
humidity9 <- humidity %>% filter(humidity$Humidity == "9am relative humidity (%)")
humidity9_capped <- humidity9$'%' %>% cap()
#Comparison with uncapped and capped data
humidityuncapped <- cbind(humidity9$'%',humidity3$'%')
humiditycapped <- cbind(humidity9_capped, humidity3_capped)
summary(humidityuncapped)
## V1 V2
## Min. : 17.00 Min. : 13.00
## 1st Qu.: 65.00 1st Qu.: 47.00
## Median : 74.00 Median : 56.00
## Mean : 73.72 Mean : 57.11
## 3rd Qu.: 83.00 3rd Qu.: 66.00
## Max. :100.00 Max. :100.00
summary(humiditycapped)
## humidity9_capped humidity3_capped
## Min. : 38.00 Min. :19.00
## 1st Qu.: 65.00 1st Qu.:47.00
## Median : 74.00 Median :56.00
## Mean : 74.09 Mean :57.13
## 3rd Qu.: 83.00 3rd Qu.:66.00
## Max. :100.00 Max. :92.00
boxplot(humidityuncapped, col="red")
boxplot(humiditycapped, col="maroon")
Dolgun, A 2020, ‘Module 6 Scan: Outliers’, Module notes, MATH2349, RMIT University, http://rare-phoenix-161610.appspot.com/secured/Module_06.html#multivariate_outlier_detection_methods
Dolgun, A 2020, ‘Module 5 Scan: Missing Values’, Module notes, MATH2349, RMIT University, http://rare-phoenix-161610.appspot.com/secured/Module_05.html
Baglin, J 2020, ‘R Bootcamp - Course 4: R Markdown’, Module notes, MATH1324, RMIT University, https://astral-theory-157510.appspot.com/secured/RBootcamp_Course_04.html#creating_an_r_markdown_document_in_r_studio
RStudio, R Markdown from RStudio https://rmarkdown.rstudio.com/lesson-1.html
Modern Data Science with R, 2nd edition, Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton https://beanumber.github.io/mdsr2e/ch-dataII.html
Bureau of Meteorology, Notes to accompany Daily Weather Observations, http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml
Bureau of Meteorology, Melbourne (Olympic Park), Victoria October 2020 Daily Weather Observations, http://www.bom.gov.au/climate/dwo/IDCJDW3033.latest.shtml