Winter is not my favourite season, and it is not about the short days. It is not my favourite because it is also flu season. I hate being sick. As an individual, there is only so much you can do to avoid getting sick. When you are stuck in heated, overcrowded trains with an orchestra of coughs and sneezes, you can’t help thinking your turn is next. Last winter, I managed to give the flu a miss. This year, though … not so lucky. I am recovering from my second episode.
I am curious whether this is down to a substantial temperature difference between winter 2016 and winter 2017 in Sydney. Is winter 2017 colder than winter 2016?
So I decided to gather some weather data and do some exploration. I have a few companions:
* R
* Datacamp
* Stackoverflow
* MDSI community
I am using the weatherData R package to fetch weather data. This package returns a clean data frame ready for processing. It lets you supply the city (SYD) and the time interval you are interested in.
getWeatherForDate is the function within the package that retrieves the weather data. Two data frames are generated in the following steps:
* sydney_winter_2016 contains Winter 2016 weather data (1st June to 31st July 2016)
* sydney_winter_2017 contains Winter 2017 weather data (1st June to 31st July 2017)
library("weatherData")
sydney_winter_2016 <- getWeatherForDate("SYD", "2016-06-01", end_date = "2016-07-31", opt_detailed = TRUE, opt_all_columns = TRUE)
## [1] 71
## [1] "TimeAEST" "TemperatureC"
## [3] "Dew_PointC" "Humidity"
## [5] "Sea_Level_PressurehPa" "VisibilityKm"
## [7] "Wind_Direction" "Wind_SpeedKm_h"
## [9] "Gust_SpeedKm_h" "Precipitationmm"
## [11] "Events" "Conditions"
## [13] "WindDirDegrees" "DateUTC"
## [1] 35 14
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 12:00 AM 13 12 96 1028 25 NW 13.0 Mostly Cloudy 310
## 2 1:00 AM 12 11 93 1028 NA WNW 14.8 300
## 3 1:30 AM 12 12 100 1028 10 NW 11.1 - N/A Mostly Cloudy 320
## 4 2:00 AM 12 11 94 1028 10 WNW 13.0 - N/A Mostly Cloudy 290
## 5 3:00 AM 13 12 91 1028 30 NW 9.3 Mostly Cloudy 320
## 6 3:30 AM 12 11 94 1027 10 NW 13.0 - N/A Mostly Cloudy 310
## V14
## 1 2016-05-31 14:00:00
## 2 2016-05-31 15:00:00
## 3 2016-05-31 15:30:00
## 4 2016-05-31 16:00:00
## 5 2016-05-31 17:00:00
## 6 2016-05-31 17:30:00
## [1] 71
## [1] "TimeAEST" "TemperatureC"
## [3] "Dew_PointC" "Humidity"
## [5] "Sea_Level_PressurehPa" "VisibilityKm"
## [7] "Wind_Direction" "Wind_SpeedKm_h"
## [9] "Gust_SpeedKm_h" "Precipitationmm"
## [11] "Events" "Conditions"
## [13] "WindDirDegrees" "DateUTC"
## [1] 35 14
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 12:00 AM 12 8 70 1019 30 North 9.3 NA Partly Cloudy 350
## 2 12:30 AM 12 9 82 1018 -9999 North 11.1 - N/A NA Clear 350
## 3 1:00 AM 12 9 82 1018 -9999 NNW 13.0 - N/A NA Clear 340
## 4 2:00 AM 12 9 77 1018 NA North 9.3 NA 0
## 5 2:30 AM 13 10 82 1017 -9999 North 11.1 - N/A NA Clear 350
## 6 3:00 AM 12 9 82 1017 -9999 WNW 11.1 - N/A NA Clear 290
## V14
## 1 2016-07-30 14:00:00
## 2 2016-07-30 14:30:00
## 3 2016-07-30 15:00:00
## 4 2016-07-30 16:00:00
## 5 2016-07-30 16:30:00
## 6 2016-07-30 17:00:00
## [1] "Time" "TimeAEST"
## [3] "TemperatureC" "Dew_PointC"
## [5] "Humidity" "Sea_Level_PressurehPa"
## [7] "VisibilityKm" "Wind_Direction"
## [9] "Wind_SpeedKm_h" "Gust_SpeedKm_h"
## [11] "Precipitationmm" "Events"
## [13] "Conditions" "WindDirDegrees"
## [15] "DateUTC"
sydney_winter_2017 <- getWeatherForDate("SYD", "2017-06-01", end_date = "2017-07-31", opt_detailed = TRUE, opt_all_columns = TRUE)
## [1] 65
## [1] "TimeAEST" "TemperatureC"
## [3] "Dew_PointC" "Humidity"
## [5] "Sea_Level_PressurehPa" "VisibilityKm"
## [7] "Wind_Direction" "Wind_SpeedKm_h"
## [9] "Gust_SpeedKm_h" "Precipitationmm"
## [11] "Events" "Conditions"
## [13] "WindDirDegrees" "DateUTC"
## [1] 32 14
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 12:00 AM 10 2 48 1030 40 WSW 24.1 NA Partly Cloudy 250
## 2 12:30 AM 9 2 62 1030 -9999 West 20.4 - N/A NA Clear 270
## 3 1:00 AM 9 2 62 1030 10 WSW 22.2 - N/A NA Scattered Clouds 250
## 4 2:00 AM 9 2 53 1030 NA West 24.1 NA 270
## 5 2:30 AM 9 2 62 1030 -9999 West 20.4 - N/A NA Clear 270
## 6 3:00 AM 9 3 66 1030 -9999 West 20.4 - N/A NA Clear 270
## V14
## 1 2017-05-31 14:00:00
## 2 2017-05-31 14:30:00
## 3 2017-05-31 15:00:00
## 4 2017-05-31 16:00:00
## 5 2017-05-31 16:30:00
## 6 2017-05-31 17:00:00
## [1] 72
## [1] "TimeAEST" "TemperatureC"
## [3] "Dew_PointC" "Humidity"
## [5] "Sea_Level_PressurehPa" "VisibilityKm"
## [7] "Wind_Direction" "Wind_SpeedKm_h"
## [9] "Gust_SpeedKm_h" "Precipitationmm"
## [11] "Events" "Conditions"
## [13] "WindDirDegrees" "DateUTC"
## [1] 36 14
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 12:00 AM 17 8 42 1017 30 South 7.4 Partly Cloudy 180
## 2 1:00 AM 18 7 38 1018 NA North 5.6 0
## 3 2:00 AM 17 6 38 1018 NA South 7.4 190
## 4 2:34 AM 17 8 55 1018 -9999 SE 27.8 - N/A Clear 130
## 5 3:00 AM 17 11 68 1017 -9999 East 20.4 - N/A Clear 90
## 6 4:00 AM 16 12 73 1018 NA SE 13.0 140
## V14
## 1 2017-07-30 14:00:00
## 2 2017-07-30 15:00:00
## 3 2017-07-30 16:00:00
## 4 2017-07-30 16:34:00
## 5 2017-07-30 17:00:00
## 6 2017-07-30 18:00:00
## [1] "Time" "TimeAEST"
## [3] "TemperatureC" "Dew_PointC"
## [5] "Humidity" "Sea_Level_PressurehPa"
## [7] "VisibilityKm" "Wind_Direction"
## [9] "Wind_SpeedKm_h" "Gust_SpeedKm_h"
## [11] "Precipitationmm" "Events"
## [13] "Conditions" "WindDirDegrees"
## [15] "DateUTC"
Before analysing the data, we need to get familiar with the contents of the data frames.
sydney_winter_2016 has 4,443 observations and sydney_winter_2017 has 4,237 observations. Both data frames have 15 columns with identical names.
library("dplyr")
glimpse(sydney_winter_2016)
## Observations: 4,443
## Variables: 15
## $ Time <dttm> 2016-06-01 00:00:00, 2016-06-01 00:00:0...
## $ TimeAEST <chr> "12:00 AM", "12:00 AM", "1:00 AM", "1:00...
## $ TemperatureC <dbl> 13, 13, 12, 12, 12, 12, 12, 13, 13, 13, ...
## $ Dew_PointC <dbl> 12, 12, 11, 11, 12, 11, 11, 12, 12, 12, ...
## $ Humidity <int> 96, 94, 93, 94, 100, 93, 94, 94, 91, 94,...
## $ Sea_Level_PressurehPa <int> 1028, 1028, 1028, 1028, 1028, 1028, 1028...
## $ VisibilityKm <dbl> 25, 10, NA, 10, 10, NA, 10, 10, 30, 10, ...
## $ Wind_Direction <chr> "NW", "NW", "WNW", "WNW", "NW", "WNW", "...
## $ Wind_SpeedKm_h <chr> "13", "13", "14.8", "14.8", "11.1", "13"...
## $ Gust_SpeedKm_h <chr> "", "-", "", "-", "-", "", "-", "-", "",...
## $ Precipitationmm <chr> "", "N/A", "", "N/A", "N/A", "", "N/A", ...
## $ Events <chr> "", "", "", "", "", "", "", "", "", "", ...
## $ Conditions <chr> "Mostly Cloudy", "Mostly Cloudy", "", "M...
## $ WindDirDegrees <int> 310, 310, 300, 300, 320, 290, 290, 350, ...
## $ DateUTC <chr> "2016-05-31 14:00:00", "2016-05-31 14:00...
glimpse(sydney_winter_2017)
## Observations: 4,237
## Variables: 15
## $ Time <dttm> 2017-06-01 00:00:00, 2017-06-01 00:00:0...
## $ TimeAEST <chr> "12:00 AM", "12:00 AM", "12:30 AM", "1:0...
## $ TemperatureC <dbl> 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 9, 8, 8,...
## $ Dew_PointC <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2...
## $ Humidity <int> 48, 58, 62, 51, 62, 62, 53, 62, 62, 57, ...
## $ Sea_Level_PressurehPa <int> 1030, 1030, 1030, 1030, 1030, 1030, 1030...
## $ VisibilityKm <dbl> 40, -9999, -9999, NA, 10, -9999, NA, -99...
## $ Wind_Direction <chr> "WSW", "WSW", "West", "WSW", "WSW", "Wes...
## $ Wind_SpeedKm_h <chr> "24.1", "24.1", "20.4", "22.2", "22.2", ...
## $ Gust_SpeedKm_h <chr> "", "-", "-", "", "-", "-", "", "-", "-"...
## $ Precipitationmm <chr> "", "N/A", "N/A", "", "N/A", "N/A", "", ...
## $ Events <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ Conditions <chr> "Partly Cloudy", "Clear", "Clear", "", "...
## $ WindDirDegrees <int> 250, 250, 270, 250, 250, 260, 270, 270, ...
## $ DateUTC <chr> "2017-05-31 14:00:00", "2017-05-31 14:00...
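If you would rather check the "same columns" claim in code than by eye, something like the following should do. It is only a sketch: winter_a and winter_b are made-up toy data frames standing in for sydney_winter_2016 and sydney_winter_2017, so it runs without fetching anything.

```r
# Toy stand-ins for the two fetched data frames (values are made up for illustration)
winter_a <- data.frame(
  Time = as.POSIXct(c("2016-06-01 00:00:00", "2016-06-01 01:00:00"), tz = "UTC"),
  TemperatureC = c(13, 12)
)
winter_b <- data.frame(
  Time = as.POSIXct(c("2017-06-01 00:00:00", "2017-06-01 00:30:00",
                      "2017-06-01 01:00:00"), tz = "UTC"),
  TemperatureC = c(10, 9, 9)
)

# Number of rows and columns in each data frame
dim(winter_a)
dim(winter_b)

# TRUE only when both have the same column names in the same order
same_columns <- identical(names(winter_a), names(winter_b))
same_columns

# First class of every column, side by side, to spot type mismatches
col_types <- data.frame(
  winter_a = sapply(winter_a, function(x) class(x)[1]),
  winter_b = sapply(winter_b, function(x) class(x)[1])
)
col_types
```

Swap in the real data frames for the toy ones to run the same checks on the fetched data.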
Looks like Time and TemperatureC are perfect for what I need. Do we have missing values? Hint: search for NA’s in the output. There are no missing values in either column of interest, so great!!
summary(sydney_winter_2016)
## Time TimeAEST TemperatureC
## Min. :2016-06-01 00:00:00 Length:4443 Min. : 6.00
## 1st Qu.:2016-06-16 08:45:00 Class :character 1st Qu.:12.00
## Median :2016-07-01 12:00:00 Mode :character Median :15.00
## Mean :2016-07-01 11:28:01 Mean :14.36
## 3rd Qu.:2016-07-16 16:45:00 3rd Qu.:17.00
## Max. :2016-07-31 23:30:00 Max. :27.00
##
## Dew_PointC Humidity Sea_Level_PressurehPa VisibilityKm
## Min. :-11.000 Min. : 5.0 Min. : 992 Min. :-9999
## 1st Qu.: 2.000 1st Qu.: 43.0 1st Qu.:1010 1st Qu.:-9999
## Median : 6.000 Median : 58.0 Median :1017 Median : 3
## Mean : 6.479 Mean : 59.5 Mean :1017 Mean :-4905
## 3rd Qu.: 11.000 3rd Qu.: 77.0 3rd Qu.:1023 3rd Qu.: 10
## Max. : 18.000 Max. :100.0 Max. :1039 Max. : 40
## NA's :1 NA's :976
## Wind_Direction Wind_SpeedKm_h Gust_SpeedKm_h
## Length:4443 Length:4443 Length:4443
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Precipitationmm Events Conditions WindDirDegrees
## Length:4443 Length:4443 Length:4443 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.:230.0
## Mode :character Mode :character Mode :character Median :280.0
## Mean :253.1
## 3rd Qu.:320.0
## Max. :360.0
## NA's :2
## DateUTC
## Length:4443
## Class :character
## Mode :character
##
##
##
##
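Instead of scanning the summary for NA’s, you can count them per column. The sketch below uses a small made-up data frame called weather (a hypothetical stand-in for the fetched data). It also treats -9999 in VisibilityKm as a placeholder for "not recorded", which is my assumption from the summary output rather than something I have confirmed for the package.

```r
# Toy stand-in for a fetched data frame (made-up values)
weather <- data.frame(
  TemperatureC = c(13, 12, 12, 13),
  VisibilityKm = c(25, NA, -9999, 10)
)

# Number of NA values in each column
na_counts <- colSums(is.na(weather))
na_counts

# Assumption: -9999 is a placeholder for a missing reading, so recode it as NA.
# which() skips the rows where the comparison itself is NA.
weather$VisibilityKm[which(weather$VisibilityKm == -9999)] <- NA

# Recount after recoding the placeholder
na_after <- colSums(is.na(weather))
na_after
```

TemperatureC is untouched by the recoding, so it does not change anything for the comparison in this post, but it would matter if you wanted to use the visibility column.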
Let’s plot temperature variations in these two datasets. There were more periods during June 2016 when the average temperature (the smoothed line) was above 15°C. The same was not observed for 2017. Going by these plots, winter 2017 is colder than winter 2016.
library("ggplot2")
ggplot(sydney_winter_2016, aes(Time, TemperatureC)) + geom_line(colour="blue") + geom_smooth(colour="darkblue") + labs(x="Winter2016")
ggplot(sydney_winter_2017, aes(Time, TemperatureC)) + geom_line(colour="red") + geom_smooth(colour="darkred") + labs(x="Winter2017")
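The plots can be backed up with a couple of numbers per year. Here is a rough sketch on made-up readings (temps is a hypothetical data frame, not the real Sydney data), using base R so it runs on its own.

```r
# Toy stand-in for both winters combined (made-up readings)
temps <- data.frame(
  year = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  TemperatureC = c(12, 16, 17, 14, 9, 13, 16, 11)
)

# Mean temperature per year
yearly_mean <- tapply(temps$TemperatureC, temps$year, mean)
yearly_mean

# Share of readings above 15 degrees C per year
# (the mean of a logical vector is the proportion of TRUE values)
share_above_15 <- tapply(temps$TemperatureC > 15, temps$year, mean)
share_above_15
```

One caveat: the real observations are not evenly spaced (some are hourly, some half-hourly), so a share of readings is only an approximation of a share of time.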
In the following graph, we are comparing minimum, maximum and median temperature on a weekly basis. There were more weeks during 2016 where the maximum temperature was above 15°C in comparison to 2017. However, week 30 in 2017 was the warmest (close to 20°C).
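library("lubridate")
# plyr is not loaded here: nothing below uses it, and loading it after dplyr masks dplyr verbs such as mutate()
rbind(sydney_winter_2016, sydney_winter_2017) %>%
  # year() comes from lubridate; %W is the week of the year with Monday as the first day
  mutate(year = year(Time), weekno = strftime(Time, format = "%W")) %>%
  # one box per week, one panel per year
  ggplot(mapping = aes(x = weekno, y = TemperatureC)) + geom_boxplot() + facet_wrap(~year, nrow = 2)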
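To read the weekly minimum, median and maximum as numbers rather than off the boxes, a summary table along these lines could help. This is a sketch on made-up readings (temps is hypothetical) using base R's aggregate; the same grouping could be done with dplyr's group_by and summarise.

```r
# Toy stand-in for the combined data (made-up readings)
temps <- data.frame(
  Time = as.POSIXct(c("2016-06-01 06:00:00", "2016-06-01 14:00:00",
                      "2016-06-08 06:00:00", "2016-06-08 14:00:00",
                      "2017-06-01 06:00:00", "2017-06-01 14:00:00"), tz = "UTC"),
  TemperatureC = c(9, 17, 8, 15, 7, 14)
)

# Year and week of the year (%W: weeks start on Monday)
temps$year <- format(temps$Time, "%Y")
temps$weekno <- format(temps$Time, "%W")

# Minimum, median and maximum temperature for each year and week
weekly <- aggregate(
  TemperatureC ~ year + weekno, data = temps,
  FUN = function(x) c(min = min(x), median = median(x), max = max(x))
)

# Sort by year, then week, for easier reading
weekly <- weekly[order(weekly$year, weekly$weekno), ]
weekly
```

The statistics come back as a small matrix inside the TemperatureC column, so weekly$TemperatureC[, "max"] gives the weekly maxima.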