Data Wrangling Assessment Task 3: Dataset challenge

Introduction:

Australia is a country that is synonymous with consistently warm weather and the fantastic beaches that surround our great nation. It may surprise some in the international community to learn that Australia is in fact a diverse landscape of many climates and environments. In particular, Australia is not commonly known for its ski fields; these are generally associated with countries such as Canada, New Zealand & Japan.

On mainland Australia, an area of 7,938 km² is covered in snow for 30 days per year on average. Whilst this only makes up 0.1% of the total land mass of Australia, it is still a significant portion of land.
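As a quick sanity check of the 0.1% figure, the share can be computed directly. This is a minimal sketch: the total land area used below (roughly 7.69 million km²) is an assumed approximate value and does not come from the datasets used in this report.

```r
# Snow-covered area quoted above (km^2)
snow_area_km2 <- 7938

# ASSUMPTION: approximate total land area of Australia (km^2);
# not taken from the report's data sources
australia_area_km2 <- 7.69e6

# Share of the land mass that is snow covered, as a percentage
snow_share_pct <- snow_area_km2 / australia_area_km2 * 100
round(snow_share_pct, 1)
```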

The snow-covered areas of Australia include various ski resorts that attract a vast range of tourists from all backgrounds. These include Mt Hotham, Mt Buller & Falls Creek in Victoria, Perisher Blue & Thredbo in NSW and Ben Lomond & Mt Mawson in Tasmania.

The purpose of this report is to use publicly available datasets to determine whether there is any correlation between the depth of natural snowfall in the Australian ski fields and weather data, in order to identify a trend between declining snowfalls and global warming.

R Markdown will be used to analyse and present the relevant information. To begin, the following packages must be loaded in RStudio:

LOAD THE REQUIRED PACKAGES

#Load the required packages
library(rvest)
library(readr)

## 
## Attaching package: 'readr'

## The following object is masked from 'package:rvest':
## 
##     guess_encoding

library(readxl)
library(data.table)
library(tidyverse)

## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──

## ✔ ggplot2 3.3.6     ✔ dplyr   1.0.9
## ✔ tibble  3.1.7     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ✔ purrr   0.3.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between()        masks data.table::between()
## ✖ dplyr::filter()         masks stats::filter()
## ✖ dplyr::first()          masks data.table::first()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
## ✖ dplyr::last()           masks data.table::last()
## ✖ purrr::transpose()      masks data.table::transpose()

library(magrittr)

## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

library(knitr)

#Might not need later
library(stringr)
require(lubridate)

## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(writexl)
library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(ggplot2)
library(here)

## here() starts at C:/Users/cal_m/OneDrive/Documents/MATH2405

Hypothesis

The two hypotheses that this report aims to test are as follows:

There is a direct relationship between snowfall levels and weather factors such as temperature
There is a downward trend in Snow Fall depths at Mt Hotham due to increasing global temperatures

Data Description

Mt Hotham Snow-Fall

This report utilises a dataset made available by the Victorian State Government titled “Victorian Alpine Resorts - Daily Snow Depth Records Mount Hotham”, which can be accessed from the following website:

https://discover.data.vic.gov.au/dataset/victorian-alpine-resorts-daily-snow-depth-records-mount-hotham

This contains the following information:

Natural snow depth at Mt Hotham from 1993 to 2020
Man-made snow depth (snow-making machine in use) from 2006 to 2020
10 Year average of snow depth

For the purposes of this report, the only information required will be the natural snow depth in centimetres for each day of the 28 seasons of data (1993 to 2020).

What causes snowfall?

Geographical location, elevation and wind velocity are examples of environmental factors that may affect the climate of a specific region. Low temperature is a key element, as water vapor must fall below zero degrees Celsius to freeze. But what else is truly important in the formation of snow?

An article published by X. Zhang & X. Li titled “Environmental factors influencing snowfall and snowfall prediction in the Tianshan Mountains, Northwest China” examined this in great detail. Whilst Northwest China is geographically very different from south-eastern Australia, the article states that “relative humidity, temperature and longitude were identified as three of the most important variables influencing snowfall”.

For the purposes of this report, the following variables will be examined in conjunction with snowfall data:

Mean temperature
Max temperature
Relative humidity

This weather information is readily accessible from Long Paddock, a Queensland Government service that publishes weather data and aims to make climate data easy for data scientists to access. It can be accessed from the following website:

https://www.longpaddock.qld.gov.au/silo/

For the purposes of this report, it will be assumed that the longitude is a constant factor, so it will not be taken into account.

IMPORT DATA FROM APPROVED DATASOURCES

#**DATA FOR HOTHAM SNOWFALL**

SNOW <- "https://arcc.vic.gov.au/wp-content/uploads/2021/06/Data-2020-Daily-Snow-Depth-Records_Hotham.csv"  #create an object for the URL

SNOW_data <- read.csv(SNOW) #use read.csv function to create a data frame from the csv link embedded in the URL


#**DATA FOR HOTHAM WEATHER**

SILO <- "https://www.longpaddock.qld.gov.au/cgi-bin/silo/PatchedPointDataset.php?station=83085&format=csv&start=19930101&finish=20201231&username=noemail@net.com&password=BoMonly&comment=rxnhg"  #create an object for the URL

SILO_data <- read.csv(SILO) #use read.csv function to create a data frame from the csv link embedded in the URL

Understand

For simplicity, the data sets utilized will henceforth be referred to as SNOW_data (Mt Hotham snowfall dataset) & SILO_data (Weather dataset).

The following code chunk contains the function str() - this is used to display data structure, variables and attributes of a dataset. See below results for detailed explanation.

SNOW_data

#Display str() information for SNOW_data
str(SNOW_data)

## 'data.frame':    162 obs. of  99 variables:
##  $ MOUNT.HOTHAM                                          : chr  "" "" "5-Jun" "6-Jun" ...
##  $ X                                                     : chr  "" "Day" "5/6" "6/6" ...
##  $ X.1                                                   : chr  "" "" "5-Jun" "" ...
##  $ X.2                                                   : chr  "" "10 Year Average Snow Making Depth" "0.00" "0.00" ...
##  $ X.3                                                   : chr  "" "10 Year Average Natural Snow Depth" "6" "0" ...
##  $ X.4                                                   : chr  "Daily data" "2020 Natural Snow Depth" "0" "0" ...
##  $ X.5                                                   : chr  "" "2020 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.6                                                   : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.7                                                   : chr  "" "2019 Natural Snow Depth" "56" "" ...
##  $ X.8                                                   : chr  "" "2019 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.9                                                   : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.10                                                  : chr  "Daily data" "2018 Natural Snow Depth" "" "" ...
##  $ X.11                                                  : chr  "" "2018 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.12                                                  : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ SNOW.DEPTHS...NATURAL.FROM.1993..SNOW.MAKING.FROM.2006: chr  "" "2017 Natural Snow Depth" "" "" ...
##  $ X.13                                                  : chr  "" "2017 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.14                                                  : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.15                                                  : chr  "" "2016 Natural Snow Depth" "" "" ...
##  $ X.16                                                  : chr  "" "2016 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.17                                                  : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.18                                                  : chr  "" "2015 Natural Snow Depth" "" "0" ...
##  $ X.19                                                  : chr  "" "2015 Average depth in snow-making area (36 ha)" "" "0" ...
##  $ X.20                                                  : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.21                                                  : chr  "" "2014 Natural Snow Depth" "" "" ...
##  $ X.22                                                  : chr  "" "2014 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.23                                                  : logi  NA NA NA NA NA NA ...
##  $ X.24                                                  : chr  "" "2013 Natural Snow Depth" "" "" ...
##  $ X.25                                                  : chr  "" "2013 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.26                                                  : chr  "" "" "" "" ...
##  $ X.27                                                  : chr  "" "2012 Natural Snow Depth" "" "" ...
##  $ X.28                                                  : chr  "" "2012 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.29                                                  : chr  "" "" "" "" ...
##  $ X.30                                                  : chr  "" "2011 Natural Snow Depth" "" "" ...
##  $ X.31                                                  : chr  "" "2011 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.32                                                  : chr  "" "" "" "" ...
##  $ X.33                                                  : chr  "" "2010 Natural Snow Depth" "" "" ...
##  $ X.34                                                  : chr  "" "2010 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.35                                                  : chr  "" "" "" "" ...
##  $ X.36                                                  : chr  "" "2009 Natural Snow Depth" "0" "0" ...
##  $ X.37                                                  : chr  "" "2009 Average depth in snow-making area (36 ha)" "0" "0" ...
##  $ X.38                                                  : chr  "" "" "" "" ...
##  $ X.39                                                  : chr  "" "2008 Natural Snow Depth" "" "" ...
##  $ X.40                                                  : chr  "" "2008 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.41                                                  : chr  "" "" "" "" ...
##  $ X.42                                                  : chr  "" "2007 Natural Snow Depth" "" "" ...
##  $ X.43                                                  : chr  "" "2007 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.44                                                  : logi  NA NA NA NA NA NA ...
##  $ X.45                                                  : chr  "" "2006 Natural Snow Depth" "" "" ...
##  $ X.46                                                  : chr  "" "2006 Average depth in snow-making area (36 ha)" "" "" ...
##  $ X.47                                                  : chr  "" "+ means that snowmaking depth is = or > than natural snow depth" "" "" ...
##  $ X.48                                                  : int  NA 2005 NA NA NA NA NA 0 0 0 ...
##  $ X.49                                                  : chr  "" "BLANK" "" "" ...
##  $ X.50                                                  : chr  "" "BLANK" "" "" ...
##  $ X.51                                                  : int  NA 2004 0 0 0 0 0 0 0 0 ...
##  $ X.52                                                  : int  NA 2003 0 11 31 35 33 31 30 29 ...
##  $ X.53                                                  : int  NA 2002 NA NA 2 11 2 0 0 0 ...
##  $ X.54                                                  : int  NA 2001 0 0 0 0 0 0 0 0 ...
##  $ X.55                                                  : int  NA 2000 75 85 82 80 81 84 80 74 ...
##  $ X.56                                                  : int  NA 1999 0 0 0 0 0 0 0 0 ...
##  $ X.57                                                  : int  NA 1998 0 0 0 20 18 15 15 12 ...
##  $ X.58                                                  : num  NA 1997 0 0 0 ...
##  $ X.59                                                  : int  NA 1996 0 0 0 4 3 2 0 0 ...
##  $ X.60                                                  : int  NA 1995 0 0 0 0 0 0 14 17 ...
##  $ X.61                                                  : int  NA 1994 0 0 0 0 0 2 4 6 ...
##  $ X.62                                                  : int  NA 1993 0 0 0 0 0 0 16 67 ...
##  $ X.63                                                  : logi  NA NA NA NA NA NA ...
##  $ X.64                                                  : logi  NA NA NA NA NA NA ...
##  $ X.65                                                  : logi  NA NA NA NA NA NA ...
##  $ X.66                                                  : logi  NA NA NA NA NA NA ...
##  $ X.67                                                  : logi  NA NA NA NA NA NA ...
##  $ X.68                                                  : logi  NA NA NA NA NA NA ...
##  $ X.69                                                  : logi  NA NA NA NA NA NA ...
##  $ X.70                                                  : logi  NA NA NA NA NA NA ...
##  $ X.71                                                  : logi  NA NA NA NA NA NA ...
##  $ X.72                                                  : logi  NA NA NA NA NA NA ...
##  $ X.73                                                  : logi  NA NA NA NA NA NA ...
##  $ X.74                                                  : logi  NA NA NA NA NA NA ...
##  $ X.75                                                  : logi  NA NA NA NA NA NA ...
##  $ X.76                                                  : logi  NA NA NA NA NA NA ...
##  $ X.77                                                  : logi  NA NA NA NA NA NA ...
##  $ X.78                                                  : logi  NA NA NA NA NA NA ...
##  $ X.79                                                  : logi  NA NA NA NA NA NA ...
##  $ X.80                                                  : logi  NA NA NA NA NA NA ...
##  $ X.81                                                  : logi  NA NA NA NA NA NA ...
##  $ X.82                                                  : logi  NA NA NA NA NA NA ...
##  $ X.83                                                  : logi  NA NA NA NA NA NA ...
##  $ X.84                                                  : logi  NA NA NA NA NA NA ...
##  $ X.85                                                  : logi  NA NA NA NA NA NA ...
##  $ X.86                                                  : logi  NA NA NA NA NA NA ...
##  $ X.87                                                  : logi  NA NA NA NA NA NA ...
##  $ X.88                                                  : logi  NA NA NA NA NA NA ...
##  $ X.89                                                  : logi  NA NA NA NA NA NA ...
##  $ X.90                                                  : logi  NA NA NA NA NA NA ...
##  $ X.91                                                  : logi  NA NA NA NA NA NA ...
##  $ X.92                                                  : logi  NA NA NA NA NA NA ...
##  $ X.93                                                  : logi  NA NA NA NA NA NA ...
##  $ X.94                                                  : logi  NA NA NA NA NA NA ...
##  $ X.95                                                  : logi  NA NA NA NA NA NA ...
##  $ X.96                                                  : logi  NA NA NA NA NA NA ...

Description of SNOW_data

Structure:

This is a data frame structure that consists of 162 observations and 99 variables.

Chr format:

Many of the variables are in character format. This will need to be corrected, as the chr data type does not allow statistical calculations.
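The effect of the character format can be sketched on a small toy vector. The values below are hypothetical and only mimic the kind of entries seen in the import (blank cells, digits stored as text and a non-numeric marker).

```r
# Toy character vector mimicking an imported snow-depth column (hypothetical values)
depth_chr <- c("", "56", "0", "+")

# as.numeric() returns NA for entries it cannot parse as a number;
# suppressWarnings() hides the coercion warning raised by non-numeric text
depth_num <- suppressWarnings(as.numeric(depth_chr))

depth_num

# Once numeric, summary statistics can be calculated
mean(depth_num, na.rm = TRUE)
```

Note that the missing entries are excluded from the mean here with `na.rm = TRUE`, whereas replacing them with zero (as done later in this report) would lower the average.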

Headings:

The variable headings have not come through correctly in the import and will require rectification - if left in their current state then the dataset becomes difficult to understand.
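The heading problem typically arises when a CSV file carries a title row above the real header row. The sketch below reproduces this on a hypothetical three-column extract and shows one possible alternative, the `skip` argument of `read.csv()`. It is an illustration under that assumption only and is not the approach taken in this report, which renames the headers manually.

```r
# Toy CSV text with a title row above the real header row (hypothetical contents)
raw_csv <- "MOUNT HOTHAM,,Daily data\n,Day,2020 Natural Snow Depth\n5-Jun,5/6,0\n6-Jun,6/6,3"

# Default import: the title row is treated as the header,
# so the blank title cell receives a placeholder name
toy_default <- read.csv(text = raw_csv)
names(toy_default)

# Alternative: skip the title row so the second row supplies the headings
toy_skipped <- read.csv(text = raw_csv, skip = 1)
names(toy_skipped)
```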

Important variables:

As previously stated, the only important information from this data set is the date variable and the Natural snow depth variables for each year.

Log format:

The variables ‘X.63’ to ‘X.96’ are shown as logical data types but appear to consist solely of NA values, so they should be removed.
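One generic way to drop columns that contain nothing but NA is sketched below on a hypothetical toy data frame. This report instead removes them by subsetting on column position, so the snippet is an alternative for illustration only.

```r
# Toy data frame with two useful columns and two all-NA columns (hypothetical values)
toy <- data.frame(
  Day  = c("5/6", "6/6", "7/6"),
  X.1  = c(0, 11, 31),
  X.63 = c(NA, NA, NA),
  X.64 = c(NA, NA, NA)
)

# Keep only the columns that contain at least one non-missing value
keep <- colSums(!is.na(toy)) > 0
toy_clean <- toy[, keep, drop = FALSE]

names(toy_clean)
```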

Tidiness:

Upon review, this is a substantially untidy dataset - this will be re-formatted to conform with tidy data principles in a later section of this report.

SNOW_data: RENAME HEADERS, CORRECTION TO DATA TYPE, REMOVE UNNECESSARY VARIABLES

#Create numeric observation
SNOW_data2 <- rbind(SNOW_data, 1:99)

#Convert new ROW to header to assist in Sub-setting - this way it is easy to refer to a specific variable whilst the appropriate headers are a work-in-progress
colnames(SNOW_data2) <- SNOW_data2[163, ]

#Subset so we are left with Date column & natural snow depth for each year
SNOW_data2 <- SNOW_data2[, c("2", "6", "9", "12", "15", "18", "21", "24", "27", "30", "33", "36", "39", "42", "45", "48", "51", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65")]

#Create new list - to be used as column headers
chr_years <- as.list(c("DDMM", "2020", "2019", "2018", "2017", "2016", "2015", "2014", "2013", "2012", "2011", "2010", "2009", "2008", "2007", "2006", "2005", "2004", "2003", "2002", "2001", "2000", "1999", "1998", "1997", "1996", "1995", "1994", "1993"))

#Insert new list as a Row in Dataset
SNOW_data2 <- rbind(SNOW_data2, chr_years)

#Change column headings to new Row
colnames(SNOW_data2) <- SNOW_data2[164, ]

#Remove first 2 Rows
SNOW_data2 <- SNOW_data2[-1:-2, ]

#Convert the yearly snow-depth columns to numeric - note: the 'suppressWarnings()' function is utilized as it is my intention to replace blank and non-numeric entries with NAs through coercion. If this function wasn't used then a number of unnecessary warnings would appear.
year_cols <- setdiff(colnames(SNOW_data2), "DDMM")
SNOW_data2[year_cols] <- suppressWarnings(lapply(SNOW_data2[year_cols], as.numeric))

#Round the years that contain decimal values to whole numbers
round_cols <- c("1997", "2017", "2018", "2019")
SNOW_data2[round_cols] <- lapply(SNOW_data2[round_cols], round)

#Convert NAs to Zero for the whole dataset
SNOW_data2[is.na(SNOW_data2)] <- 0

#Subset to remove unnecessary rows
SNOW_data2 <- SNOW_data2[-127:-162 , ]

#Display the updated Header names for each variable
colnames(SNOW_data2)

##  [1] "DDMM" "2020" "2019" "2018" "2017" "2016" "2015" "2014" "2013" "2012"
## [11] "2011" "2010" "2009" "2008" "2007" "2006" "2005" "2004" "2003" "2002"
## [21] "2001" "2000" "1999" "1998" "1997" "1996" "1995" "1994" "1993"

#Display updated str() information for SNOW_data
str(SNOW_data2)

## 'data.frame':    126 obs. of  29 variables:
##  $ DDMM: chr  "5/6" "6/6" "7/6" "8/6" ...
##  $ 2020: num  0 0 0 0 0 0 0 0 0 3 ...
##  $ 2019: num  56 0 52 49 43 33 30 27 14 14 ...
##  $ 2018: num  0 0 0 0 5 3 1 5 3 19 ...
##  $ 2017: num  0 0 30 30 26 25 25 25 23 22 ...
##  $ 2016: num  0 0 0 0 0 0 2 6 0 0 ...
##  $ 2015: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 2014: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 2013: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 2012: num  0 0 0 0 35 33 33 33 30 29 ...
##  $ 2011: num  0 0 0 0 0 0 27 26 24 21 ...
##  $ 2010: num  0 0 0 0 0 0 0 8 13 13 ...
##  $ 2009: num  0 0 0 10 19 38 38 35 33 33 ...
##  $ 2008: num  0 0 0 0 0 0 0 0 10 15 ...
##  $ 2007: num  0 0 0 23 23 19 19 21 22 23 ...
##  $ 2006: num  0 0 0 0 0 0 5 5 5 5 ...
##  $ 2005: num  0 0 0 0 0 0 0 0 0 4 ...
##  $ 2004: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 2003: num  0 11 31 35 33 31 30 29 25 53 ...
##  $ 2002: num  0 0 2 11 2 0 0 0 10 27 ...
##  $ 2001: num  0 0 0 0 0 0 0 0 0 4 ...
##  $ 2000: num  75 85 82 80 81 84 80 74 74 73 ...
##  $ 1999: num  0 0 0 0 0 0 0 0 3 30 ...
##  $ 1998: num  0 0 0 20 18 15 15 12 25 24 ...
##  $ 1997: num  0 0 0 0 0 2 0 0 0 20 ...
##  $ 1996: num  0 0 0 4 3 2 0 0 0 0 ...
##  $ 1995: num  0 0 0 0 0 0 14 17 17 16 ...
##  $ 1994: num  0 0 0 0 0 2 4 6 5 5 ...
##  $ 1993: num  0 0 0 0 0 0 16 67 81 79 ...

SILO_data

#Display str() information for SILO_data
str(SILO_data)

## 'data.frame':    10227 obs. of  13 variables:
##  $ station          : int  83085 83085 83085 83085 83085 83085 83085 83085 83085 83085 ...
##  $ YYYY.MM.DD       : chr  "1993-01-01" "1993-01-02" "1993-01-03" "1993-01-04" ...
##  $ daily_rain       : num  0 0.6 3.6 5.4 0 0 0 0.5 0.1 0 ...
##  $ daily_rain_source: int  25 25 25 25 25 25 25 25 25 25 ...
##  $ max_temp         : num  17.8 20.7 15.5 11.1 11.8 11.5 12.1 16.5 15.4 18.9 ...
##  $ max_temp_source  : int  0 0 0 0 0 0 25 25 25 25 ...
##  $ min_temp         : num  12.1 12.1 9.8 -1.9 2.8 2.1 3.8 4.7 7.6 8 ...
##  $ min_temp_source  : int  0 0 0 0 0 0 25 25 25 25 ...
##  $ rh_tmax          : num  56.5 29.1 69.9 43.2 62.9 56 61.7 46.4 66.3 48.6 ...
##  $ rh_tmax_source   : int  26 26 26 26 26 26 26 26 26 26 ...
##  $ rh_tmin          : num  81.5 50.3 100 100 100 100 100 100 100 98.9 ...
##  $ rh_tmin_source   : int  26 26 26 26 26 26 26 26 26 26 ...
##  $ metadata         : chr  "name=MOUNT HOTHAM                            " "latitude= -36.9772" "longitude= 147.1342" "elevation=1849.0 m" ...

Description of SILO_data

Structure:

This is also a data frame structure and consists of 10,227 observations and 13 variables.

Headings:

Units will be added to the headings of the numeric variables to make it clear that the temperature figures are measured in Celsius. The following describes each current variable in this dataset and summarises the action to be taken in the upcoming chunk.

‘station’: integer - all values are from the same weather station - will be removed

‘YYYY.MM.DD’: character - this is in a tidy structure for conversion to date format

‘daily_rain’: numeric - change to ‘Rainfall_mm’

‘daily_rain_source’: integer - this information is irrelevant - will be removed

‘max_temp’: numeric - change to ‘Max_Temp_cels’

‘max_temp_source’: integer - this information is irrelevant - will be removed

‘min_temp’: numeric - change to ‘Min_Temp_cels’

‘min_temp_source’: integer - this information is irrelevant - will be removed

‘rh_tmax’: numeric - change to ‘Relat_Humid_max’

‘rh_tmax_source’: integer - this information is irrelevant - will be removed

‘rh_tmin’: numeric - change to ‘Relat_Humid_min’

‘rh_tmin_source’: integer - this information is irrelevant - will be removed

‘metadata’: character - this information is irrelevant - will be removed
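The keep, rename and remove actions listed above can also be expressed in a single dplyr pipeline. The sketch below is an alternative for illustration, not the chunk used in this report: `toy_silo` is a hypothetical cut-down stand-in for SILO_data that reuses the first two rows shown in the str() output.

```r
library(dplyr)

# Toy stand-in for SILO_data with a subset of its columns
# (values copied from the first two rows of the str() output above)
toy_silo <- data.frame(
  station         = c(83085L, 83085L),
  YYYY.MM.DD      = c("1993-01-01", "1993-01-02"),
  max_temp        = c(17.8, 20.7),
  max_temp_source = c(0L, 0L),
  rh_tmax         = c(56.5, 29.1)
)

# select() keeps only the named columns and renames them in one step;
# mutate() then converts the date text to the Date class
toy_tidy <- toy_silo %>%
  select(Date = YYYY.MM.DD, Max_Temp_cels = max_temp, Relat_Humid_max = rh_tmax) %>%
  mutate(Date = as.Date(Date))

str(toy_tidy)
```

Any column not named in `select()` (here `station` and `max_temp_source`) is dropped, which mirrors the "will be removed" actions in the list.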

Tidiness:

This data set is significantly tidier and conforms with the tidy data principles effectively.

No further manipulation to this dataset will be required due to the following:

Each distinct variable is contained within its own column
It is in a long-format so that each specific day has a unique row

The second point stipulated is important as the Date variable will be utilized in the following chunk to combine the two datasets.
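Why one row per day matters for the merge can be sketched with two hypothetical toy tables. The snippet assumes the dplyr package; `inner_join()` is the function used later in this report, while the `anti_join()` check is an optional extra added here for illustration.

```r
library(dplyr)

# Toy weather table: one row per calendar day (hypothetical values)
toy_weather <- data.frame(
  Date          = as.Date(c("2020-06-04", "2020-06-05", "2020-06-06")),
  Max_Temp_cels = c(3.1, 1.9, 4.0)
)

# Toy snow table: only days within the ski season (hypothetical values)
toy_snow <- data.frame(
  Date          = as.Date(c("2020-06-05", "2020-06-06")),
  Snow_Depth_cm = c(0, 3)
)

# inner_join() keeps only the dates present in both tables
toy_joined <- inner_join(toy_weather, toy_snow, by = "Date")

# anti_join() lists snow records with no matching weather day (ideally zero rows)
unmatched <- anti_join(toy_snow, toy_weather, by = "Date")

nrow(toy_joined)
nrow(unmatched)
```

Because each date appears only once in each table, every snow record matches exactly one weather record and no rows are duplicated by the join.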

SILO_data: RENAME HEADERS, CORRECTION TO DATA TYPE, REMOVE UNNECESSARY VARIABLES

#Bring forward data backup
SILO_data2 <- SILO_data

SILO_data3 <-
  data.frame(
    'Date' = as.Date(c(SILO_data2$YYYY.MM.DD)),
    'Rainfall_mm' = c(SILO_data2$daily_rain),
    'Max_Temp_cels' = c(SILO_data2$max_temp),
    'Min_Temp_cels' = c(SILO_data2$min_temp),
    'Relat_Humid_max' = c(SILO_data2$rh_tmax),
    'Relat_Humid_min' = c(SILO_data2$rh_tmin)
  )

#Display updated structure
str(SILO_data3)

## 'data.frame':    10227 obs. of  6 variables:
##  $ Date           : Date, format: "1993-01-01" "1993-01-02" ...
##  $ Rainfall_mm    : num  0 0.6 3.6 5.4 0 0 0 0.5 0.1 0 ...
##  $ Max_Temp_cels  : num  17.8 20.7 15.5 11.1 11.8 11.5 12.1 16.5 15.4 18.9 ...
##  $ Min_Temp_cels  : num  12.1 12.1 9.8 -1.9 2.8 2.1 3.8 4.7 7.6 8 ...
##  $ Relat_Humid_max: num  56.5 29.1 69.9 43.2 62.9 56 61.7 46.4 66.3 48.6 ...
##  $ Relat_Humid_min: num  81.5 50.3 100 100 100 100 100 100 100 98.9 ...

Tidy & Manipulate Data - Part 1

SNOW_data

This dataset is particularly untidy and will require significant manipulation in order for it to be merged with the much tidier SILO_data.

Untidy Date format

This dataset contains a date format that violates the tidy data principles in a few different ways. The key principle in effect here is that each individual variable must be located in its own column of the dataset.

The variable ‘DDMM’ contains the day and month joined by a ‘/’. This makes conversion to a date format difficult, as the two values must first be separated to remove the ‘/’.
The snowfall depths are spread out so that each year has its own variable. This is untidy because every column contains the same kind of information (how deep the snow is) and should therefore be held in a single variable.

This will be rectified in the following chunk using the pivot_longer function. As a result, all snowfall depth data will be contained within a single variable, keyed by the specific date.
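The reshaping step can be sketched on a hypothetical two-year extract of the wide table. The snippet assumes the tidyr package and uses `cols = -DDMM` as shorthand for "every column except DDMM" rather than listing each year.

```r
library(tidyr)

# Toy wide table: one column per year (hypothetical values)
toy_wide <- data.frame(
  DDMM   = c("5/6", "6/6"),
  `2020` = c(0, 3),
  `2019` = c(56, 0),
  check.names = FALSE
)

# Every column except DDMM is gathered into Year/value pairs
toy_long <- pivot_longer(
  toy_wide,
  cols      = -DDMM,
  names_to  = "Year",
  values_to = "Snow_Depth_cm"
)

toy_long
```

Each original row produces one output row per year column, so the two-row, two-year toy table becomes four rows.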

Day and Month values must be combined with Year value in order for the date to be re-formatted into a usable data type.
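A compact base R sketch of this date assembly is shown below on hypothetical values. It swaps in `strsplit()` and `sprintf()` for the separate() and str_pad() calls used in the chunk that follows, so it should be read as an equivalent illustration rather than the report's own code.

```r
# Toy day/month strings and years (hypothetical values)
ddmm <- c("5/6", "6/6", "12/10")
year <- c("2020", "2019", "1993")

# Split "D/M" into its two parts
parts <- strsplit(ddmm, "/", fixed = TRUE)
day   <- as.integer(sapply(parts, `[`, 1))
month <- as.integer(sapply(parts, `[`, 2))

# sprintf() zero-pads day and month to two digits, giving a fixed-width DDMMYYYY string
date_chr <- sprintf("%02d%02d%s", day, month, year)

# Parse the fixed-width string into the Date class
snow_date <- as.Date(date_chr, format = "%d%m%Y")
snow_date
```

The zero-padding matters because an unpadded string such as "1112020" could be read as either 1 November or 11 January.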

PIVOT THE DATASET TO LONG FORMAT & RE-FORMAT DATE VARIABLE

#Create dataset backup to ensure integrity of data in manipulation process
SNOW_data3 <- SNOW_data2

#Pivot the dataset so it's in a Long Format using the pivot_longer function
SNOW_data4 <- 
  SNOW_data3 %>%
  pivot_longer(cols = c("2020", "2019", "2018", "2017", "2016", "2015", "2014", "2013", "2012", "2011", "2010", "2009", "2008", "2007", "2006", "2005", "2004", "2003", "2002", "2001", "2000", "1999", "1998", "1997", "1996", "1995", "1994", "1993"),
               names_to = "Year",
               values_to = "Snow_depth (m)")

#Create a separate dataset for 'DDMM' variable
SNOW_dates_ddmm <-
  data.frame('SNOW_dd_mm' = SNOW_data4$DDMM)

#Separate days and months so they each have a unique variable
SNOW_dates <- 
  SNOW_dates_ddmm %>% separate(SNOW_dd_mm, c("Day", "Month"), sep = "/")

#Mutate to bring in the Year column from the prev dataset
SNOW_dates <-
  SNOW_dates %>% mutate(
    Year = c(SNOW_data4$Year)
  )

#Change all variables to numeric (double)
SNOW_dates$Year <- as.double(SNOW_dates$Year)
SNOW_dates$Month <- as.double(SNOW_dates$Month)
SNOW_dates$Day <- as.double(SNOW_dates$Day)

#Separate Day data
day_df <- tibble(Day = c(SNOW_dates$Day))

#Format all values so they have 2x numbers, if single-digit then 'pad' the value with '0' (Result is DD)
day_df <- 
  day_df %>% 
  mutate(
    "Day" = str_pad(day_df$Day, width = 2, pad = '0'))

#Separate Month data
mon_df <- tibble(Month = c(SNOW_dates$Month))

#Format all values so they have 2x numbers, if single-digit then 'pad' the value with '0' (Result is MM)
mon_df <-
  mon_df %>%
    mutate(
      "Month" = str_pad(mon_df$Month, width = 2, pad = '0'))

#Change Day variable to character
SNOW_dates2 <-
  SNOW_dates %>%
  mutate("Day" = c(as.character(day_df$Day)))

#Change Month variable to character  
SNOW_dates2 <-
  SNOW_dates2 %>% 
  mutate("Month" = c(as.character(mon_df$Month)))

#Create new Data Frame combining the tidied Day, Month & Year into a single DDMMYYYY string
dates <-
  data.frame(DDMMYYYY = paste0(SNOW_dates2$Day, SNOW_dates2$Month, SNOW_dates2$Year))

#Subset to remove original date variables 'DDMM' & 'Year'
SNOW_data4 <-
  subset(SNOW_data4, select = c('Snow_depth (m)'))

#Add revised date column to original snow depth data
SNOW_data5 <-
  SNOW_data4 %>%
    mutate(Date = as.Date(dates$DDMMYYYY, "%d%m%Y"))

#Move 'Date' variable to the Left
SNOW_data5 <-
  data.frame('Date' = c(SNOW_data5$Date),
             'Snow_Depth_cm' = c(SNOW_data5$`Snow_depth (m)`))

#Check structure
str(SNOW_data5)

## 'data.frame':    3528 obs. of  2 variables:
##  $ Date         : Date, format: "2020-06-05" "2019-06-05" ...
##  $ Snow_Depth_cm: num  0 56 0 0 0 0 0 0 0 0 ...

We can now see that we are left with just two vital variables: Date in the appropriate date format and Snow Depth in numeric format.

SILO_data

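The Date variable of SILO_data was already converted with as.Date() when SILO_data3 was created. The following chunk re-applies the conversion with an explicit format to confirm that the variable is in the Date data type required for the merge.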

RE-FORMAT DATE VARIABLE

#Bring forward back up
SILO_data4 <- SILO_data3

#Create separate data frame for Date variable
dates2 <- data.frame(
  'Date' = c(SILO_data4$Date))

dates3 <- data.frame(
  'Date' = c(as.Date(dates2$Date, format = "%Y-%m-%d")))

#Add revised date column to original SILO data
SILO_data4 <-
  SILO_data4 %>%
    mutate('Date' = c(dates3$Date))

#Display structure information
str(SILO_data4)

## 'data.frame':    10227 obs. of  6 variables:
##  $ Date           : Date, format: "1993-01-01" "1993-01-02" ...
##  $ Rainfall_mm    : num  0 0.6 3.6 5.4 0 0 0 0.5 0.1 0 ...
##  $ Max_Temp_cels  : num  17.8 20.7 15.5 11.1 11.8 11.5 12.1 16.5 15.4 18.9 ...
##  $ Min_Temp_cels  : num  12.1 12.1 9.8 -1.9 2.8 2.1 3.8 4.7 7.6 8 ...
##  $ Relat_Humid_max: num  56.5 29.1 69.9 43.2 62.9 56 61.7 46.4 66.3 48.6 ...
##  $ Relat_Humid_min: num  81.5 50.3 100 100 100 100 100 100 100 98.9 ...

MERGE DATASETS

#Bring forward datasets
pre_merge_SILO <- SILO_data4
pre_merge_SNOW <- SNOW_data5

#Use 'inner_join' function to connect the datasets based on Date variable
FRESH_data <-
  inner_join(pre_merge_SILO, pre_merge_SNOW, by = "Date")

#Display structure information
str(FRESH_data)

## 'data.frame':    3528 obs. of  7 variables:
##  $ Date           : Date, format: "1993-06-05" "1993-06-06" ...
##  $ Rainfall_mm    : num  0 0 0 0.2 0 2 1.7 14.8 3.5 0.2 ...
##  $ Max_Temp_cels  : num  1.9 4 3 4.2 -0.5 -3 1.4 -1.2 0.6 4.8 ...
##  $ Min_Temp_cels  : num  -8.5 -1.3 -1.6 -2 -2.6 -4.2 -7.5 -5.3 -5.2 -3.3 ...
##  $ Relat_Humid_max: num  60 55.4 59.4 69.1 100 91.9 66.6 73.3 67.4 57 ...
##  $ Relat_Humid_min: num  100 81 82.8 100 100 100 100 99.5 100 100 ...
##  $ Snow_Depth_cm  : num  0 0 0 0 0 0 16 67 81 79 ...
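
Because an inner join silently drops any date that is missing from either table, it can be worth checking what is lost. The following is a minimal sketch on two invented miniature tables, using base R `merge()` as a stand-in for `inner_join()`; the table names and values are assumptions for illustration only.

```r
# Hypothetical miniature versions of the two tables (values invented)
silo <- data.frame(
  Date = as.Date(c("1993-06-05", "1993-06-06", "1993-06-07")),
  Max_Temp_cels = c(1.9, 4.0, 3.0)
)
snow <- data.frame(
  Date = as.Date(c("1993-06-06", "1993-06-07", "1993-06-08")),
  Snow_Depth_cm = c(0, 16, 67)
)

# Base R counterpart of an inner join: keep only dates present in both tables
joined <- merge(silo, snow, by = "Date")

# Dates that the join drops from each side
silo_only <- silo$Date[!silo$Date %in% snow$Date]
snow_only <- snow$Date[!snow$Date %in% silo$Date]

# Duplicate dates would multiply rows in the join, so count them as well
n_dupes <- sum(duplicated(silo$Date)) + sum(duplicated(snow$Date))
print(joined)
```

In the merge above, the joined result has the same 3528 rows as the snow data, which suggests every snow date found a match in the SILO data.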

Tidy & Manipulate Data - Part 2

Now that the datasets have been appropriately tidied and merged together in a concise format, we can begin to expand on the data to derive further information.

As previously discussed, the key weather factors that will be examined are temperature and relative humidity as these generally lead to snow fall. In order to review this in conjunction with the snowfall data, a new variable - ‘SNOW_factor’ - will be inserted into the dataset.

To do this, the following will be created from the dataset, and then combined into a single ‘SNOW_factor’ variable.

In theory, the lower the ‘SNOW_factor’, the higher the snowfall depth recorded.

Average Temperature

This is simply the average between the Max and Min temperatures for a given day - these two variables will be added together and divided by 2 to provide an average value.

In theory, a lower average temperature will be more conducive to snowfall than a high average.

Relative Humidity factor

Note that the relative humidity variables already included have a maximum value of 100, which suggests that this data is intended to be a percentage of humidity rather than a plain numeric value. So that it can be treated as a proportion, the values will all be divided by 100.

In theory, a HIGHER relative humidity is more conducive to snowfall.

CREATE A VARIABLE FOR SNOW_factor

#Add new variable to Data set
FRESH_data2 <- 
  FRESH_data %>%
  mutate(
    Ave_Temp_cels = ((FRESH_data$Max_Temp_cels + FRESH_data$Min_Temp_cels)/2)
  )

#Manipulate relative humidity variables to be a quotient (divide by 100)
FRESH_data2 <-
  FRESH_data2 %>%
  mutate(Relat_Humid_max = FRESH_data$Relat_Humid_max / 100)

FRESH_data2 <-
  FRESH_data2 %>%
  mutate(Relat_Humid_min = FRESH_data$Relat_Humid_min / 100)

#Create SNOW_factor variable by multiplying 'Ave_Temp_cels' and 'Relat_Humid_max', and add to Data set
FRESH_data3 <- 
  FRESH_data2 %>%
  mutate(SNOW_factor = Relat_Humid_max * Ave_Temp_cels)
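
To make the arithmetic concrete, here is a small worked sketch of the same calculation on three invented days, written in base R. The column `Humid_frac` is a hypothetical name used only in this example.

```r
# Hypothetical daily readings (values invented for illustration)
days <- data.frame(
  Max_Temp_cels   = c(1.0, -1.0, 6.0),
  Min_Temp_cels   = c(-5.0, -7.0, 2.0),
  Relat_Humid_max = c(80, 100, 50)   # percentages, as in the SILO data
)

# Average temperature: midpoint of the daily maximum and minimum
days$Ave_Temp_cels <- (days$Max_Temp_cels + days$Min_Temp_cels) / 2

# Humidity as a proportion between 0 and 1
days$Humid_frac <- days$Relat_Humid_max / 100

# SNOW_factor: humidity proportion multiplied by average temperature
days$SNOW_factor <- days$Humid_frac * days$Ave_Temp_cels
print(days)
```

The three days give SNOW_factor values of -1.6, -4.0 and 2.0. One property worth keeping in mind from the arithmetic: humidity only pushes the factor lower when the average temperature is below zero. On a day with an average above zero, higher humidity raises the factor instead.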

Scan Data - Part 1 (Missing data)

In the initial stages of preparing the Mt Hotham snowfall data (‘SNOW_data’) for analysis, a number of blank variables were present within the data. As this information was required to be transformed into numeric, the blank cells were coerced into NA values. The NA values were then replaced in the data set with the value ‘0’.

The issue here is that there are certain ‘0’ values present throughout the dataset that have an effect on the output information. ‘0’ is an expected result for snowfall depths very early in the season (e.g. June), but such values should not appear in the middle of an average season.

An example of this error is present within the variable for 2005 snow data, which is shown as follows:

#The following is used to display all Row numbers that contain '0'. 

#Note '+ 2' is added to this formula as SNOW_data2 has Rows 1 & 2 hidden.

which(SNOW_data2$`2005` == 0) + 2

##  [1]   3   4   5   6   7   8   9  10  11 107 110 111 112 113 123 124 125 126 127
## [20] 128

How can there be a recorded snow depth in observations 106 and 108 but a 0 value in 107? Likewise for 110-113. The snow depth cannot simply disappear completely one day and re-appear the next, so this must be an error in the data.

In order to fix this, all ‘0’ value observations will be removed from the updated dataset.

CODE TO REMOVE ‘0’ VALUES

#Remove '0' value observations by keeping only the rows where Snow Depth is not equal to 0
SCAN_data <-
  FRESH_data3 %>%
  filter(Snow_Depth_cm != 0)
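
As a possible refinement, the suspicious zeros described above (those sitting between non-zero readings) could be flagged on their own, leaving the genuine zeros at the start and end of a season untouched. The sketch below is an alternative to the filter used in this report, not a description of it. It assumes a single season's readings in date order, and the values are invented.

```r
# Hypothetical depth series for one season, in date order (values invented).
# The leading and trailing zeros are plausible; the zeros at positions 5, 8 and 9
# sit between non-zero readings and mimic the suspect gaps.
depth <- c(0, 0, 12, 40, 0, 45, 50, 0, 0, 60, 0)

# Run-length encode the "is zero" indicator: consecutive runs of TRUE/FALSE
runs <- rle(depth == 0)
run_id <- seq_along(runs$lengths)

# A run is interior if it is neither the first nor the last run.
# Runs alternate, so an interior zero run always has snow on both sides.
interior <- run_id > 1 & run_id < length(run_id)

# Expand the per-run flag back to one flag per observation
suspect <- rep(runs$values & interior, runs$lengths)

# Mark only the suspect readings as missing rather than deleting rows
depth_clean <- replace(depth, suspect, NA)
print(depth_clean)
```

Marking the values as `NA` keeps the rows in place, so functions such as `mean(..., na.rm = TRUE)` can skip them later without the dates being lost.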

Scan Data - Part 2 (Outlier data)

When observing the boxplot diagram for the Rainfall_mm variable, it is clear that there are only 4 values out of a possible 3,044 observations that have greater than 80 mm of rainfall in a single day.

However this isn’t a significant anomaly, so no further data manipulation will be required. The datasets created already contain a limited range of potential values so adverse outlier values are immaterial.

There are some outlier years within the dataset that will be dealt with in the Transformation section of this report. The year 2000 is a good example of this, as it has a very poor SNOW_factor score whilst having a very high average snow depth for the year. This goes against the general principles that I have applied throughout this report, so such years will be removed from the analysis.

A potential cause of this is that the snowfall in 2000 came within a much smaller period of time, and the rest of the year had higher temperatures that adversely impacted its SNOW_factor.

ANALYSIS OF DATA TO IDENTIFY OUTLIERS

#Display mean information
mean(SCAN_data$Rainfall_mm)

## [1] 4.170861

mean(SCAN_data$Relat_Humid_max)

## [1] 0.7337986

mean(SCAN_data$Snow_Depth_cm)

## [1] 81.64619

mean(SCAN_data$Ave_Temp_cels)

## [1] -0.6180683

mean(SCAN_data$SNOW_factor)

## [1] -0.6936285

#Display boxplot information for visual representation
boxplot(SCAN_data$Rainfall_mm)

boxplot(SCAN_data$Relat_Humid_max)

boxplot(SCAN_data$Snow_Depth_cm)

boxplot(SCAN_data$Ave_Temp_cels)

boxplot(SCAN_data$SNOW_factor)
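
The points that a boxplot draws as outliers can also be listed and counted directly, which avoids reading them off the picture. This is a minimal sketch on invented rainfall values; `boxplot.stats()` is the base R function that `boxplot()` relies on for its statistics.

```r
# Hypothetical rainfall readings in mm (values invented for illustration)
rain <- c(0, 0.2, 1.5, 3, 4.8, 0, 2.2, 95, 6.1, 0.4)

# Values beyond 1.5 x IQR from the hinges, i.e. the points drawn as outliers
flagged <- boxplot.stats(rain)$out

# Count of days above a fixed threshold, such as the 80 mm discussed above
n_over_80 <- sum(rain > 80)

print(flagged)
print(n_over_80)
```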

Transform

Now that the datasets are in a tidy format, and some missing values have been eliminated, we can look to transform the data in an attempt to provide a clearer understanding to viewers.

The approach adopted in this section is to take the average snowfall depth and SNOW_factor grouped by each individual year. By doing so, we can compare the relationship between Snowfall depth and SNOW_factor to test whether this newly created variable is verifiable with the initial hypothesis.

Firstly, in order to review the distribution of the yearly averages, the following summary has been created and applied to a Histogram.

SUMMARISE & REVIEW HISTOGRAM

#Create new Dates data frame in order to add the variable 'Year' back into the dataset
dates4 <-
  data.frame(
    'Year' = c(SNOW_dates$Year),
    'Date' = c(SNOW_data5$Date)
  )

#Create new data set and use inner-join to connect the year with each observation (using 'Date' as the joining variable)
SUM_Stat <-
  SCAN_data %>%
  inner_join(dates4)

## Joining, by = "Date"

#Create new data set that is grouped by 'Year'
ave_snowfall_per_year <-
  SUM_Stat %>%
  group_by(Year) %>%
  summarise(
    ave_SNOW_factor = mean(SNOW_factor, na.rm = TRUE),
    ave_SNOW_depth = mean(Snow_Depth_cm, na.rm = TRUE),
  )

#Present Histogram for new summarized variables
hist(ave_snowfall_per_year$ave_SNOW_factor,
     xlab = "Average SNOW Factor",
     main = bquote("Histogram of Average SNOW Factor"),
     sub = "Prior to transformation")

hist(ave_snowfall_per_year$ave_SNOW_depth,
     xlab = "Average SNOW Depth",
     main = bquote("Histogram of average SNOW Depth"),
     sub = "Prior to transformation")
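
As an aside, the year can also be read straight from the Date column rather than joined back in from a separate data frame. The sketch below shows that approach together with a base R counterpart of `group_by()` and `summarise()`. The data frame `obs` and its values are invented for illustration.

```r
# Hypothetical merged observations (values invented for illustration)
obs <- data.frame(
  Date = as.Date(c("1993-07-01", "1993-08-01", "1994-07-01", "1994-08-01")),
  Snow_Depth_cm = c(40, 60, 80, 100),
  SNOW_factor   = c(-1.0, -2.0, -3.0, -5.0)
)

# Pull the four-digit year out of each Date
obs$Year <- as.integer(format(obs$Date, "%Y"))

# Mean depth and mean SNOW_factor for each year
yearly <- aggregate(cbind(Snow_Depth_cm, SNOW_factor) ~ Year,
                    data = obs, FUN = mean)
print(yearly)
```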

Analysis

These histograms appear reasonable; however, neither distribution shows a clear shape at this stage.

In order to gain a clearer picture of the variables, they should first be normalized so they are comparable.

The following section will change the scaling and display of the average snowfall & average SNOW_factor variables so they can be normalized. This will allow for greater clarity when comparing the information as the scaling will be more uniform.

The methods used below are Mean Centering & Box-Cox transformation. Prior to plotting the information, the better option of the two methods will first be considered.

TRANSFORMATION OF DATA

#Bring forward data backup
Transformed_data <- SUM_Stat

#Setup variables for ease of access
a1 <- ave_snowfall_per_year$ave_SNOW_depth
b1 <- ave_snowfall_per_year$ave_SNOW_factor
c1 <- ave_snowfall_per_year$Year

#**SNOW_Depth**

#Mean centering
a1_mean_cent <-
  scale(a1, center = TRUE, scale = FALSE)
hist(a1_mean_cent,
     xlab = "Average Snow Depth",
     main = bquote("Histogram of Average Snow Depth"),
     sub = "Mean Centered Transformation")

#Box Cox
a1_box_cox <- BoxCox(a1, lambda = "auto")
lambda <- attr(a1_box_cox, which = "lambda")
hist(a1_box_cox,
     xlab = "Average Snow Depth",
     main = bquote("Histogram of Average Snow Depth"),
     sub = "Box-Cox Transformation")

#**SNOW_Factor**

#Mean centering
b1_mean_cent <-
  scale(b1, center = TRUE, scale = FALSE)
hist(b1_mean_cent,
     xlab = "Average Snow Factor",
     main = bquote("Histogram of Average Snow Factor"),
     sub = "Mean Centered Transformation")

#Box Cox
b1_box_cox <- BoxCox(b1, lambda = lambda)
lambda2 <- attr(b1_box_cox, which = "lambda")
hist(b1_box_cox,
     xlab = "Average Snow Factor",
     main = bquote("Histogram of Average Snow Factor"),
     sub = "Box-Cox Transformation")
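
For readers unfamiliar with the two transformations, the sketch below writes them out by hand in base R. The helper `box_cox` is a hypothetical function written for this illustration. It implements the textbook one-parameter form and is not the `BoxCox()` function called above, which also handles automatic selection of lambda. The depth values are invented.

```r
# Textbook one-parameter Box-Cox transform, defined for strictly positive values.
# 'box_cox' is a hypothetical helper for illustration only.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Hypothetical yearly average depths in cm (values invented)
depth <- c(40, 55, 62, 70, 85, 120)

# Mean centering subtracts the mean: values shift, their spacing is unchanged
centred <- depth - mean(depth)

# Box-Cox changes the spacing: large values are compressed more than small ones
bc_half <- box_cox(depth, lambda = 0.5)
bc_zero <- box_cox(depth, lambda = 0)   # lambda = 0 is the natural log

print(round(cbind(depth, centred, bc_half, bc_zero), 2))
```

The textbook form is only defined for positive inputs, which may matter for SNOW_factor given that its overall mean in this dataset is negative.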

Further Analysis

When comparing the display of the Mean Centered averages to the Box-Cox transformation, the Box-Cox information appears to be more suitable for further plotting.

For Snow Depth, the Box-Cox transformation is nicely improved in comparison to the original display, and is comparably better than the mean centered histogram as the distribution appears to have greater normality.

Conversely, the histograms for SNOW Factor do appear much better when Mean Centering is utilised. Therefore, in order to display the best piece of information to depict the relationship, the Box-Cox variable for Snow Depth will be plotted against the Mean Centered variable for SNOW_Factor.
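
Judging normality from a histogram is subjective, so a numeric check may help support the comparison. The sketch below applies the Shapiro-Wilk test from base R to invented yearly averages, with a log transform (Box-Cox with lambda = 0) standing in for the automatically chosen lambda. It illustrates the idea only and is not a re-analysis of the report's data.

```r
# Hypothetical yearly averages (values invented for illustration)
depth <- c(40, 55, 62, 70, 85, 120, 48, 66, 73, 91)

centred <- depth - mean(depth)   # mean centered
logged  <- log(depth)            # Box-Cox with lambda = 0

# Shapiro-Wilk: W closer to 1 and a larger p-value are more consistent
# with a normal distribution
sw_raw     <- shapiro.test(depth)
sw_centred <- shapiro.test(centred)
sw_logged  <- shapiro.test(logged)

print(c(raw_W     = unname(sw_raw$statistic),
        centred_W = unname(sw_centred$statistic),
        logged_W  = unname(sw_logged$statistic)))
```

Mean centering only shifts the values, so its W statistic matches the raw data. Any difference between the raw and mean centered histograms therefore comes from how the bins fall, not from a change in the shape of the distribution.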

DISPLAY INFORMATION IN PLOT FORMAT

#Create plot graph
Factor_Depth_relationship1 <- plot(a1_box_cox, b1_mean_cent, pch = 20, col = "20",
             xlab = 'Snow Depth (Box-Cox)',
             ylab = 'Snow Factor (Mean-centered)',
             main = bquote('Plot relationship between Snow Depth & Snow Factor'),
             sub = ('Transformed'))

#This plot output shows an outlier (note the top right corner of the graph - this is the year 2000) - this will be subset from the data set by creating a new data frame with the box cox figures for Depth & Factor

#Create separate data frame
OUTLIER_data <- data.frame(
  SNOW_Factor_meancent = c(b1_mean_cent),
  SNOW_Depth_boxcox = c(a1_box_cox),
  Year = c(ave_snowfall_per_year$Year))

#Inspect str to locate outlier observations
str(OUTLIER_data$SNOW_Depth_boxcox)

##  num [1:28] 8.36 9.89 11.3 13.48 7.81 ...

#Note: upon review of the results, the outlier observation (the year 2000) is in Row 8

REMOVE OUTLIER AND RE-PREPARE PLOT GRAPH

#Subset to remove identified observations
OUTLIER_data <-
  OUTLIER_data[-c(8), ]

#Create plot graph without the outlier
Factor_Depth_relationship2 <- plot(OUTLIER_data$SNOW_Factor_meancent, OUTLIER_data$SNOW_Depth_boxcox, pch = 20, col = "20",
                                  main = bquote('Relationship between average yearly snow depth and SNOW_Factor'),
                                  sub = ('Excluding 2000'),
                                  xlab = ('Snow Factor'),
                                  ylab = ('Normalised Snow Depth'))
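
Two small additions could make this step more robust: removing the outlier by its Year value rather than its row position, and putting a number on the relationship with a correlation coefficient. The sketch below uses an invented yearly summary; the data frame `yearly` and its values are assumptions for illustration.

```r
# Hypothetical yearly summary (values invented for illustration)
yearly <- data.frame(
  Year        = 1998:2003,
  SNOW_factor = c(-1.2, -0.4, 0.9, -0.8, -0.1, -1.5),
  Snow_depth  = c(95, 60, 110, 80, 55, 100)
)

# Drop the outlier by its Year value, which still works if the rows are reordered
no_outlier <- subset(yearly, Year != 2000)

# Pearson correlation with and without the excluded year
r_all      <- cor(yearly$SNOW_factor, yearly$Snow_depth)
r_filtered <- cor(no_outlier$SNOW_factor, no_outlier$Snow_depth)

print(c(all_years = r_all, excluding_2000 = r_filtered))
```

Reporting both values shows how much a single excluded year changes the strength of the relationship.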

Analysis based on Year

It is clear from this output that there is a definitive relationship between the level of snow fall received and the snow factor utilized.

Now that this is clear, the same variables will now be compared with each year to see if there are any downward trends that could point towards climate change impact.

CREATE PLOT DIAGRAM TO SHOW SNOW FACTOR PER YEAR & SNOW DEPTH PER YEAR

#Bring forward relevant variables
a2 <- OUTLIER_data$SNOW_Depth_boxcox
b2 <- OUTLIER_data$SNOW_Factor_meancent
c2 <- OUTLIER_data$Year

#Create plot graph for Snow Depth per year
Snow_depth_yearly <- plot(c2, a2, pch = 20, col = "red",
                          xlab = ('Year'),
                          ylab = ('Normalised Average Snowfall'),
                          main = bquote("Average Yearly Snow Depth"))

#Create plot graph for Snow Factor per year
Snow_factor_yearly <- plot(c2, b2, pch = 20, col = "blue",
                          xlab = ('Year'),
                          ylab = ('Normalised Average SNOW_Factor'),
                          main = bquote('Average Yearly SNOW_Factor'))
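
The two yearly plots can be stacked in one figure with `par(mfrow = ...)`, and a straight-line fit gives a simple numeric measure of any trend. The sketch below uses invented yearly values; the names `snow_depth` and `snow_factor` are hypothetical stand-ins for the transformed variables above.

```r
# Hypothetical yearly values (invented for illustration)
year        <- 1993:2002
snow_depth  <- c(8.1, 9.6, 11.0, 12.9, 7.5, 9.0, 10.4, 12.2, 8.7, 9.3)
snow_factor <- c(0.3, -0.1, -0.4, -0.9, 0.5, 0.2, -0.2, -0.6, 0.4, 0.1)

# Fit a straight line through each series; the 'year' slope is the change per year
depth_fit  <- lm(snow_depth ~ year)
factor_fit <- lm(snow_factor ~ year)
slope_depth  <- coef(depth_fit)[["year"]]
slope_factor <- coef(factor_fit)[["year"]]

# Stack two panels in one figure so they share the Year axis
old_par <- par(mfrow = c(2, 1))
plot(year, snow_depth, pch = 20, col = "red",
     xlab = "Year", ylab = "Snow depth", main = "Average Yearly Snow Depth")
abline(depth_fit, lty = 2)     # dashed trend line
plot(year, snow_factor, pch = 20, col = "blue",
     xlab = "Year", ylab = "SNOW_factor", main = "Average Yearly SNOW_Factor")
abline(factor_fit, lty = 2)
par(old_par)                   # restore the previous plot layout

# p-value for the slope of the depth trend
p_depth <- summary(depth_fit)$coefficients["year", "Pr(>|t|)"]
print(c(slope_depth = slope_depth, slope_factor = slope_factor, p_depth = p_depth))
```

A slope close to zero with a large p-value would be consistent with no clear trend over the period, though a straight line is only a rough summary of a short series.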

Conclusion

Hypothesis 1

There is a relationship between snowfall, relative humidity & temperature (replicate the results of X. Zhang & X. Li)

It is evident from the analysis that there is a definitive relationship between snowfall, relative humidity and temperature related data. This is an important point to provide evidence for as the initial theory was developed based on the unique climate of North Western China.

Now that it has been established that these metrics are suitable for the Australian environment, they may be used in future data analysis for the impacts of rising temperatures and associated snowfall.

Hypothesis 2

There is a downward trend in Snow Fall depths at Mt Hotham due to increasing global temperatures

From this analysis, it is not immediately apparent that climate change has had any impact on the level of snow depth recorded at Mt Hotham over the last 30 years, so further analysis must be undertaken.

It has also become apparent throughout the process of this analysis that climate change science is generally measured over a much broader scope: 30 years’ worth of information may not be a long enough time period in which to see any clear trend.

Summary

Whilst the impacts of climate change have the potential to be vast, the data analytics that underpin the science-based evidence rely upon minor changes over a longer period of time. This is because the impacts on temperature have a compounding effect, meaning that the effects are not easily measurable within a short period of time.

Fortunately though, this data analysis and the corresponding code may help to form the basis of further analysis in the future: a larger set of snowfall data could be applied to this analysis so the results are readily available. This type of science is vital given the looming impacts of global warming on our natural ecosystems as the planet we inhabit starts to struggle with mankind. Hopefully this report may help to continue the discussion so real change can be made in the way humans interact with mother Earth.

Reflective journal

My initial plan

My plan from the outset was to keep my options broad and my topic un-selected whilst looking for good datasets.

By far the most complex part of this assignment is actually finding good usable data from an online source that is free to use. To make things even harder, I had to find multiple datasets that fit the criteria.

By keeping my options broad, I can tailor the contents of the report/presentation to whatever suits the best 2 datasets.

In our final online webinar, Tam mentioned that a good source for untidy data was Government websites, so this is where I started.

I eventually stumbled across snowfall data for Mt Hotham, Falls Creek, Mt Buller and other snow fields in Australia. These datasets are in a really terrible layout, so they are a good opportunity to satisfy the tidy section of the report.

After I found this, I started thinking about what kind of data I could combine this with. I found average visitor number information, so I thought it would be cool to see if the most popular days correlated with high levels of snowfall.

I also thought of the climate change angle to my report - if temperatures are increasing due to global warming then surely snowfall is dropping too???

The key question that I aimed to answer

Is there any measurable impact of global warming on the snowfall levels in Australian ski fields?

Challenges

The Bureau of Meteorology website forces data scrapes into a Secured format so read.csv and tabular data scraping is not working. At this stage I am reconsidering the weather aspect to this report as if I can’t get good data from BOM then what chance do I have?

My new favourite website! SILO is a QLD government website that allows for open access to detailed weather data for all of Australia. This is absolutely perfect and even allows the data to be filtered to just give temperature and humidity.

This Mt Hotham snowfall data is really untidy.

Cols 2006 - 2020 work fine, but for some reason 1993 - 2005 are not working. The data type for these columns must be different. Note also that the column names for 1993-2005 are all numeric, whereas 2006-2020 contains the words Natural Snow Depth, which forces them to be Chr.

Having a lot of trouble with the date integers being in single-digit format for day and month. I needed to extract the date data in order to create a single 8 digit value of DDMMYYYY.

This is particularly difficult with SNOW_data as the chr value was really inconsistent - sometimes it was DD/M, D/MM or even D/M

I couldn’t just add a ‘0’ to the variable as then I would be adding 0 to double-digit values too.

I could explore the following: IF single digit variable in ‘Day’ <- add 0 at the start & IF single digit variable in ‘Month’ <- add 0 to the start

Update: This idea didn’t work

When researching I found the str_pad function and I got it to work! This allows you to specify the number of digits required in a value (2) and to ‘pad’ any missing values with a 0. Nice :)

I found another awesome function that has got the date format into the right order - it’s a combination of as.Date, do.call & paste functions. I don’t really understand how this is working but it gets me the right answer.

I am not good at making graphs. Future unit of Data Visualization would be handy right now!!!

Conclusion

I can’t really see any upward trend in temperature and no downward trend in snow fall depth - so can’t really prove global warming here… I need more data to be able to see a longer term picture.

The hypothesis used to create SNOW_factor seems reasonably correct though which is great.

Insights

A date variable in a dataset is a great way to combine data if the date format is correct - the date that something occurred is pretty universal so can apply to many different datasets.

I feel much more proficient with using R - I always learn best when I am practically trying to do something so using real world data and trying to solve an actual problem allowed me to be creative to find the right answers and to master what I have learnt in Data Wrangling.

I really wanted to combine the two plot diagrams so they could be directly compared but I couldn’t work it out. I tried creating separate vectors for each plot and creating a new plot with the two original plots (pardon the plots) which didn’t work.

I also tried simply creating a plot and saving based on (plot 1 + plot 2) but this also didn’t work.

I will just present the two separate plots in the video presentation - not ideal but running short on time here..

Loom is also awesome! I have never used this before but got it to work well. Had some troubles with getting my camera and screen share on the same screen (html blocks camera access - annoying), but got this to work by publishing my report to RPubs.

Presentation link

Bibliography

Cohen, M., 2022. Why does humidity affect snow? | SiOWfa14 Science in Our World: Certainty and Cont. [online] Sites.psu.edu. Available at: https://sites.psu.edu/siowfa14/2014/10/24/why-does-humidity-affect-snow/#:~:text=If%20the%20air%20is%20more%20humid%20then%20more,high%20and%20temperature%20is%20low%20snow%20is%20formed. [Accessed 8 August 2022].

Zhang, X., Li, X., Li, L., Zhang, S. and Qin, Q., 2022. Environmental factors influencing snowfall and snowfall prediction in the Tianshan Mountains, Northwest China. [online] Springer Link. Available at: https://link.springer.com/article/10.1007/s40333-018-0110-2 [Accessed 8 August 2022].

Migiro, G., 2022. Does It Snow In Australia?. [online] WorldAtlas. Available at: https://www.worldatlas.com/articles/does-it-snow-in-australia.html [Accessed 8 August 2022].

Wiki User, A., 2022. Does Australia have more snow-covered land than Switzerland? - Answers. [online] Answers. Available at: https://www.answers.com/Q/Does_Australia_have_more_snow-covered_land_than_Switzerland [Accessed 8 August 2022].