Context

Winter is not my favourite season, and not because of the short days. It is because winter is also flu season. I hate being sick, and as an individual there is only so much you can do to avoid it. When you are stuck in heated, overcrowded trains with an orchestra of coughs and sneezes, you can’t help thinking your turn is next. Last winter, I managed to give the flu a miss. This year though … not so lucky. I am recovering from my second episode.

I am curious whether this is down to a substantial temperature difference between winter 2016 and winter 2017 in Sydney. Was winter 2017 colder than winter 2016?

So I decided to gather some weather data and do some exploration. I have a few companions:
* R
* DataCamp
* Stack Overflow
* MDSI community

Data Source

I am using the weatherData R package to fetch weather data. This package returns a clean data frame ready for processing. It lets you supply a station code for the city (SYD for Sydney) and the time interval you are interested in.

Acquiring weather data

getWeatherForDate is the function within the package that retrieves weather data. The following steps generate two data frames:
* sydney_winter_2016 contains winter 2016 weather data (1 June to 31 July 2016)
* sydney_winter_2017 contains winter 2017 weather data (1 June to 31 July 2017)

library("weatherData")
# Fetch detailed (sub-daily) observations with every available column
sydney_winter_2016 <- getWeatherForDate("SYD", "2016-06-01", end_date = "2016-07-31", opt_detailed = TRUE, opt_all_columns = TRUE)
## [1] 71
##  [1] "TimeAEST"              "TemperatureC"         
##  [3] "Dew_PointC"            "Humidity"             
##  [5] "Sea_Level_PressurehPa" "VisibilityKm"         
##  [7] "Wind_Direction"        "Wind_SpeedKm_h"       
##  [9] "Gust_SpeedKm_h"        "Precipitationmm"      
## [11] "Events"                "Conditions"           
## [13] "WindDirDegrees"        "DateUTC"              
## [1] 35 14
##         V1 V2 V3  V4   V5 V6  V7   V8 V9 V10 V11           V12 V13
## 1 12:00 AM 13 12  96 1028 25  NW 13.0            Mostly Cloudy 310
## 2  1:00 AM 12 11  93 1028 NA WNW 14.8                          300
## 3  1:30 AM 12 12 100 1028 10  NW 11.1  - N/A     Mostly Cloudy 320
## 4  2:00 AM 12 11  94 1028 10 WNW 13.0  - N/A     Mostly Cloudy 290
## 5  3:00 AM 13 12  91 1028 30  NW  9.3            Mostly Cloudy 320
## 6  3:30 AM 12 11  94 1027 10  NW 13.0  - N/A     Mostly Cloudy 310
##                   V14
## 1 2016-05-31 14:00:00
## 2 2016-05-31 15:00:00
## 3 2016-05-31 15:30:00
## 4 2016-05-31 16:00:00
## 5 2016-05-31 17:00:00
## 6 2016-05-31 17:30:00
## [1] 71
##  [1] "TimeAEST"              "TemperatureC"         
##  [3] "Dew_PointC"            "Humidity"             
##  [5] "Sea_Level_PressurehPa" "VisibilityKm"         
##  [7] "Wind_Direction"        "Wind_SpeedKm_h"       
##  [9] "Gust_SpeedKm_h"        "Precipitationmm"      
## [11] "Events"                "Conditions"           
## [13] "WindDirDegrees"        "DateUTC"              
## [1] 35 14
##         V1 V2 V3 V4   V5    V6    V7   V8 V9 V10 V11           V12 V13
## 1 12:00 AM 12  8 70 1019    30 North  9.3         NA Partly Cloudy 350
## 2 12:30 AM 12  9 82 1018 -9999 North 11.1  - N/A  NA         Clear 350
## 3  1:00 AM 12  9 82 1018 -9999   NNW 13.0  - N/A  NA         Clear 340
## 4  2:00 AM 12  9 77 1018    NA North  9.3         NA                 0
## 5  2:30 AM 13 10 82 1017 -9999 North 11.1  - N/A  NA         Clear 350
## 6  3:00 AM 12  9 82 1017 -9999   WNW 11.1  - N/A  NA         Clear 290
##                   V14
## 1 2016-07-30 14:00:00
## 2 2016-07-30 14:30:00
## 3 2016-07-30 15:00:00
## 4 2016-07-30 16:00:00
## 5 2016-07-30 16:30:00
## 6 2016-07-30 17:00:00
##  [1] "Time"                  "TimeAEST"             
##  [3] "TemperatureC"          "Dew_PointC"           
##  [5] "Humidity"              "Sea_Level_PressurehPa"
##  [7] "VisibilityKm"          "Wind_Direction"       
##  [9] "Wind_SpeedKm_h"        "Gust_SpeedKm_h"       
## [11] "Precipitationmm"       "Events"               
## [13] "Conditions"            "WindDirDegrees"       
## [15] "DateUTC"
# Same request for the matching period in 2017
sydney_winter_2017 <- getWeatherForDate("SYD", "2017-06-01", end_date = "2017-07-31", opt_detailed = TRUE, opt_all_columns = TRUE)
## [1] 65
##  [1] "TimeAEST"              "TemperatureC"         
##  [3] "Dew_PointC"            "Humidity"             
##  [5] "Sea_Level_PressurehPa" "VisibilityKm"         
##  [7] "Wind_Direction"        "Wind_SpeedKm_h"       
##  [9] "Gust_SpeedKm_h"        "Precipitationmm"      
## [11] "Events"                "Conditions"           
## [13] "WindDirDegrees"        "DateUTC"              
## [1] 32 14
##         V1 V2 V3 V4   V5    V6   V7   V8 V9 V10 V11              V12 V13
## 1 12:00 AM 10  2 48 1030    40  WSW 24.1         NA    Partly Cloudy 250
## 2 12:30 AM  9  2 62 1030 -9999 West 20.4  - N/A  NA            Clear 270
## 3  1:00 AM  9  2 62 1030    10  WSW 22.2  - N/A  NA Scattered Clouds 250
## 4  2:00 AM  9  2 53 1030    NA West 24.1         NA                  270
## 5  2:30 AM  9  2 62 1030 -9999 West 20.4  - N/A  NA            Clear 270
## 6  3:00 AM  9  3 66 1030 -9999 West 20.4  - N/A  NA            Clear 270
##                   V14
## 1 2017-05-31 14:00:00
## 2 2017-05-31 14:30:00
## 3 2017-05-31 15:00:00
## 4 2017-05-31 16:00:00
## 5 2017-05-31 16:30:00
## 6 2017-05-31 17:00:00
## [1] 72
##  [1] "TimeAEST"              "TemperatureC"         
##  [3] "Dew_PointC"            "Humidity"             
##  [5] "Sea_Level_PressurehPa" "VisibilityKm"         
##  [7] "Wind_Direction"        "Wind_SpeedKm_h"       
##  [9] "Gust_SpeedKm_h"        "Precipitationmm"      
## [11] "Events"                "Conditions"           
## [13] "WindDirDegrees"        "DateUTC"              
## [1] 36 14
##         V1 V2 V3 V4   V5    V6    V7   V8 V9 V10 V11           V12 V13
## 1 12:00 AM 17  8 42 1017    30 South  7.4            Partly Cloudy 180
## 2  1:00 AM 18  7 38 1018    NA North  5.6                            0
## 3  2:00 AM 17  6 38 1018    NA South  7.4                          190
## 4  2:34 AM 17  8 55 1018 -9999    SE 27.8  - N/A             Clear 130
## 5  3:00 AM 17 11 68 1017 -9999  East 20.4  - N/A             Clear  90
## 6  4:00 AM 16 12 73 1018    NA    SE 13.0                          140
##                   V14
## 1 2017-07-30 14:00:00
## 2 2017-07-30 15:00:00
## 3 2017-07-30 16:00:00
## 4 2017-07-30 16:34:00
## 5 2017-07-30 17:00:00
## 6 2017-07-30 18:00:00
##  [1] "Time"                  "TimeAEST"             
##  [3] "TemperatureC"          "Dew_PointC"           
##  [5] "Humidity"              "Sea_Level_PressurehPa"
##  [7] "VisibilityKm"          "Wind_Direction"       
##  [9] "Wind_SpeedKm_h"        "Gust_SpeedKm_h"       
## [11] "Precipitationmm"       "Events"               
## [13] "Conditions"            "WindDirDegrees"       
## [15] "DateUTC"

Knowing your data

Before analysing the data, we need to get familiar with the contents of the data frames.

How many observations and columns are there?

sydney_winter_2016 has 4,443 observations and sydney_winter_2017 has 4,237 observations. Both data frames have 15 columns with identical names.

library("dplyr")
glimpse(sydney_winter_2016)
## Observations: 4,443
## Variables: 15
## $ Time                  <dttm> 2016-06-01 00:00:00, 2016-06-01 00:00:0...
## $ TimeAEST              <chr> "12:00 AM", "12:00 AM", "1:00 AM", "1:00...
## $ TemperatureC          <dbl> 13, 13, 12, 12, 12, 12, 12, 13, 13, 13, ...
## $ Dew_PointC            <dbl> 12, 12, 11, 11, 12, 11, 11, 12, 12, 12, ...
## $ Humidity              <int> 96, 94, 93, 94, 100, 93, 94, 94, 91, 94,...
## $ Sea_Level_PressurehPa <int> 1028, 1028, 1028, 1028, 1028, 1028, 1028...
## $ VisibilityKm          <dbl> 25, 10, NA, 10, 10, NA, 10, 10, 30, 10, ...
## $ Wind_Direction        <chr> "NW", "NW", "WNW", "WNW", "NW", "WNW", "...
## $ Wind_SpeedKm_h        <chr> "13", "13", "14.8", "14.8", "11.1", "13"...
## $ Gust_SpeedKm_h        <chr> "", "-", "", "-", "-", "", "-", "-", "",...
## $ Precipitationmm       <chr> "", "N/A", "", "N/A", "N/A", "", "N/A", ...
## $ Events                <chr> "", "", "", "", "", "", "", "", "", "", ...
## $ Conditions            <chr> "Mostly Cloudy", "Mostly Cloudy", "", "M...
## $ WindDirDegrees        <int> 310, 310, 300, 300, 320, 290, 290, 350, ...
## $ DateUTC               <chr> "2016-05-31 14:00:00", "2016-05-31 14:00...
glimpse(sydney_winter_2017)
## Observations: 4,237
## Variables: 15
## $ Time                  <dttm> 2017-06-01 00:00:00, 2017-06-01 00:00:0...
## $ TimeAEST              <chr> "12:00 AM", "12:00 AM", "12:30 AM", "1:0...
## $ TemperatureC          <dbl> 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 9, 8, 8,...
## $ Dew_PointC            <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2...
## $ Humidity              <int> 48, 58, 62, 51, 62, 62, 53, 62, 62, 57, ...
## $ Sea_Level_PressurehPa <int> 1030, 1030, 1030, 1030, 1030, 1030, 1030...
## $ VisibilityKm          <dbl> 40, -9999, -9999, NA, 10, -9999, NA, -99...
## $ Wind_Direction        <chr> "WSW", "WSW", "West", "WSW", "WSW", "Wes...
## $ Wind_SpeedKm_h        <chr> "24.1", "24.1", "20.4", "22.2", "22.2", ...
## $ Gust_SpeedKm_h        <chr> "", "-", "-", "", "-", "-", "", "-", "-"...
## $ Precipitationmm       <chr> "", "N/A", "N/A", "", "N/A", "N/A", "", ...
## $ Events                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ Conditions            <chr> "Partly Cloudy", "Clear", "Clear", "", "...
## $ WindDirDegrees        <int> 250, 250, 270, 250, 250, 260, 270, 270, ...
## $ DateUTC               <chr> "2017-05-31 14:00:00", "2017-05-31 14:00...
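The same comparison can be done in code rather than by reading the two glimpse() outputs side by side. Below is a minimal sketch using two tiny made-up data frames (winter_a and winter_b are hypothetical stand-ins, not the real weather data); it checks the dimensions, the column names and the column classes.

```r
# Toy stand-ins for the two data frames (hypothetical values, not real weather data)
winter_a <- data.frame(
  Time = as.POSIXct(c("2016-06-01 00:00:00", "2016-06-01 01:00:00"), tz = "UTC"),
  TemperatureC = c(13, 12)
)
winter_b <- data.frame(
  Time = as.POSIXct(c("2017-06-01 00:00:00", "2017-06-01 00:30:00",
                      "2017-06-01 01:00:00"), tz = "UTC"),
  TemperatureC = c(10, 9, 9)
)

# Rows and columns of each data frame
dim(winter_a)
dim(winter_b)

# TRUE when both frames have the same column names in the same order
same_columns <- identical(names(winter_a), names(winter_b))
same_columns

# Column classes side by side; worth checking before combining the frames
column_classes <- data.frame(
  column  = names(winter_a),
  class_a = sapply(winter_a, function(x) class(x)[1]),
  class_b = sapply(winter_b, function(x) class(x)[1]),
  row.names = NULL
)
column_classes
```

Swapping in sydney_winter_2016 and sydney_winter_2017 should give the same kind of report for the real data.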

Any missing data?

Time and TemperatureC look like exactly what I need. Do they have missing values? Hint: search for NA’s in the output. Neither column of interest has any missing values, so great!

summary(sydney_winter_2016)
##       Time                       TimeAEST          TemperatureC  
##  Min.   :2016-06-01 00:00:00   Length:4443        Min.   : 6.00  
##  1st Qu.:2016-06-16 08:45:00   Class :character   1st Qu.:12.00  
##  Median :2016-07-01 12:00:00   Mode  :character   Median :15.00  
##  Mean   :2016-07-01 11:28:01                      Mean   :14.36  
##  3rd Qu.:2016-07-16 16:45:00                      3rd Qu.:17.00  
##  Max.   :2016-07-31 23:30:00                      Max.   :27.00  
##                                                                  
##    Dew_PointC         Humidity     Sea_Level_PressurehPa  VisibilityKm  
##  Min.   :-11.000   Min.   :  5.0   Min.   : 992          Min.   :-9999  
##  1st Qu.:  2.000   1st Qu.: 43.0   1st Qu.:1010          1st Qu.:-9999  
##  Median :  6.000   Median : 58.0   Median :1017          Median :    3  
##  Mean   :  6.479   Mean   : 59.5   Mean   :1017          Mean   :-4905  
##  3rd Qu.: 11.000   3rd Qu.: 77.0   3rd Qu.:1023          3rd Qu.:   10  
##  Max.   : 18.000   Max.   :100.0   Max.   :1039          Max.   :   40  
##                                    NA's   :1             NA's   :976    
##  Wind_Direction     Wind_SpeedKm_h     Gust_SpeedKm_h    
##  Length:4443        Length:4443        Length:4443       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Precipitationmm       Events           Conditions        WindDirDegrees 
##  Length:4443        Length:4443        Length:4443        Min.   :  0.0  
##  Class :character   Class :character   Class :character   1st Qu.:230.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :280.0  
##                                                           Mean   :253.1  
##                                                           3rd Qu.:320.0  
##                                                           Max.   :360.0  
##                                                           NA's   :2      
##    DateUTC         
##  Length:4443       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
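Counting NA values directly is a quick alternative to scanning the summary output by eye. The sketch below uses a small made-up data frame (the values are hypothetical). It also treats -9999 in VisibilityKm as a "no reading" placeholder, which is my assumption based on the minimum and mean shown in the summary, not something the package documentation is quoted on here.

```r
# Toy data frame (hypothetical values) mimicking three of the columns
weather <- data.frame(
  Time = as.POSIXct("2016-06-01 00:00:00", tz = "UTC") + 3600 * 0:4,
  TemperatureC = c(13, 12, 12, 13, 12),
  VisibilityKm = c(25, NA, -9999, 10, -9999)
)

# Number of NA values in each column
na_counts <- colSums(is.na(weather))
na_counts

# Assumption: -9999 means "no reading", so recode it as NA.
# which() skips the rows where the comparison itself is NA.
weather$VisibilityKm[which(weather$VisibilityKm == -9999)] <- NA

# Recount after recoding the placeholder
na_after <- colSums(is.na(weather))
na_after
```

Time and TemperatureC stay at zero in both counts, while the visibility column gains missing values once the placeholder is recoded. That column is not used in this analysis, so it does not affect the temperature comparison.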

Daily Temperature Comparison

Let’s plot temperature variations in these two datasets. There were more periods during June 2016 where the average temperature was above 15°C. The same was not observed for 2017. On this evidence, winter 2017 was colder than winter 2016.

library("ggplot2")
ggplot(sydney_winter_2016, aes(Time, TemperatureC)) + geom_line(colour="blue") + geom_smooth(colour="darkblue") + labs(x="Winter2016")

ggplot(sydney_winter_2017, aes(Time, TemperatureC)) + geom_line(colour="red") + geom_smooth(colour="darkred") + labs(x="Winter2017")
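The "above 15°C" observation can be backed with a number by averaging the readings per day and counting the days over the threshold in each year. This is a minimal sketch on made-up readings (all values are hypothetical); the .groups argument assumes dplyr 1.0.0 or later.

```r
library("dplyr")

# Toy readings (hypothetical values): two days in each year, two readings per day
readings <- data.frame(
  Time = as.POSIXct(c("2016-06-01 06:00:00", "2016-06-01 14:00:00",
                      "2016-06-02 06:00:00", "2016-06-02 14:00:00",
                      "2017-06-01 06:00:00", "2017-06-01 14:00:00",
                      "2017-06-02 06:00:00", "2017-06-02 14:00:00"), tz = "UTC"),
  TemperatureC = c(12, 20, 10, 16, 9, 17, 8, 15)
)

# Average temperature for each calendar day
daily_means <- readings %>%
  mutate(day  = as.Date(Time, tz = "UTC"),
         year = format(Time, "%Y")) %>%
  group_by(year, day) %>%
  summarise(mean_temp = mean(TemperatureC), .groups = "drop")

# Number of days per year whose average is above 15 degrees
days_above_15 <- daily_means %>%
  group_by(year) %>%
  summarise(days_above = sum(mean_temp > 15), .groups = "drop")
days_above_15
```

With the real data, rbind(sydney_winter_2016, sydney_winter_2017) would take the place of readings. The time zone passed to as.Date() would then need to match the one stored in the Time column, otherwise readings near midnight land on the wrong day.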

Weekly Temperature Variation Comparison

The following graph compares the minimum, maximum and median temperature on a weekly basis. There were more days during 2016 where the maximum temperature was above 15°C in comparison to 2017. However, week 30 of 2017 was the warmest (close to 20°C).

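# plyr is not needed here; loading it after dplyr masks dplyr functions such as mutate()
library("lubridate")
# Stack both winters, tag each reading with its year and week number,
# then draw one box per week, with one panel per year
rbind(sydney_winter_2016, sydney_winter_2017) %>%
  mutate(year = year(Time), weekno = strftime(Time, format = "%W")) %>%
  group_by(year, weekno) %>%
  ggplot(mapping = aes(x = weekno, y = TemperatureC)) + geom_boxplot() + facet_wrap(~year, nrow = 2)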
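The weekly minimum, median and maximum that the boxplots show can also be produced as a table, which makes individual weeks easier to compare. Below is a minimal sketch on made-up readings (all values are hypothetical); the column names min_temp, median_temp and max_temp are my own, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library("dplyr")

# Toy readings (hypothetical values): two weeks in 2016 and one week in 2017
readings <- data.frame(
  Time = as.POSIXct(c("2016-06-06 06:00:00", "2016-06-07 14:00:00", "2016-06-08 10:00:00",
                      "2016-06-13 06:00:00", "2016-06-14 14:00:00", "2016-06-15 10:00:00",
                      "2017-06-05 06:00:00", "2017-06-06 14:00:00", "2017-06-07 10:00:00"),
                    tz = "UTC"),
  TemperatureC = c(8, 18, 13, 7, 16, 12, 6, 15, 11)
)

# One row per year and week, with the three statistics the boxplot displays
weekly_stats <- readings %>%
  mutate(year   = format(Time, "%Y"),
         weekno = format(Time, "%W")) %>%   # %W: week of year, weeks start on Monday
  group_by(year, weekno) %>%
  summarise(min_temp    = min(TemperatureC),
            median_temp = median(TemperatureC),
            max_temp    = max(TemperatureC),
            .groups = "drop")
weekly_stats
```

Sorting this table by max_temp is one way to check which week was the warmest.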