Energy Consumption
Note that I am setting all NA values to 0.
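As a minimal sketch of this cleaning step, the snippet below replaces every NA with 0 and corrects the misspelled column name. The data frame `energy` and its values are made up for illustration; only the column names come from the output in this report.

```r
# Toy data frame standing in for the real data (values are made up).
energy <- data.frame(
  Month = c("Jan", "Feb", "Mar"),
  Consumtion = c(100, NA, 120),
  AverageHighTemp = c(17, 20, NA)
)

# Replace every NA in the data frame with 0.
energy[is.na(energy)] <- 0

# Fix the misspelled column name.
names(energy)[names(energy) == "Consumtion"] <- "consumption"

print(names(energy))
```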
## [1] "Month" "Consumtion" "Year"
## [4] "Region" "AverageHighTemp" "AverageLowTemp"
## [7] "StudentsHolidaysDays"
## [1] "Month" "consumption" "Year"
## [4] "Region" "AverageHighTemp" "AverageLowTemp"
## [7] "StudentsHolidaysDays"
## [1] "tbl_df" "tbl" "data.frame"
## # A tibble: 6 × 7
## Month consumption Year Region AverageHighTemp AverageLowTemp
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 Jan 6722071 2021 T 17 5
## 2 Feb 5356671 2021 T 20 6
## 3 Mar 4167616 2021 T 24 10
## 4 Apr 5231828 2021 T 30 15
## 5 May 5356671 2021 T 34 19
## 6 Jun 4282131 2021 T 37 23
## # ℹ 1 more variable: StudentsHolidaysDays <dbl>
## [1] "numeric"
First, I am going to make 5 different dataframes, one for each Region (I am excluding “Tay” since there are data for only 1 year). Then, I will check for correlations between Energy Consumption and the other variables.
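One way to build the per-region dataframes is sketched below with base R. The toy data and the object names `energy`, `kept` and `by_region` are assumptions for illustration, not the code actually used for this report.

```r
# Toy data with made-up values; only the column names follow the report.
energy <- data.frame(
  Region = c("T", "T", "H", "H", "Tay"),
  consumption = c(10, 12, 20, 22, 5)
)

# Drop the region that has only one year of data.
kept <- energy[energy$Region != "Tay", ]

# One data frame per remaining region, stored in a named list.
by_region <- split(kept, kept$Region)

print(names(by_region))
```

Keeping the dataframes in a named list makes it easy to apply the same analysis to every region with `lapply()`.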
Note that:

- “Consumption” NEVER correlates with “StudentsHolidaysDays”, so I will exclude this variable from the downstream analysis.
- In regions H and D, there is a significant correlation between “Consumption” and the two temperature variables.
- In regions T, U and W, there is NOT a significant correlation between “Consumption” and the two temperature variables. Notably, the correlation is lowest in region T.
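The correlation check for a single region can be sketched as follows. The report does not say which correlation method was used, so Pearson (the default of `cor.test()`) is an assumption here, and the twelve monthly values in `region_df` are made up.

```r
# Made-up monthly values for one region (illustration only).
region_df <- data.frame(
  consumption = c(67, 54, 42, 52, 54, 43, 40, 41, 45, 50, 58, 65),
  AverageHighTemp = c(17, 20, 24, 30, 34, 37, 39, 38, 35, 30, 23, 18),
  AverageLowTemp = c(5, 6, 10, 15, 19, 23, 25, 25, 22, 17, 11, 6)
)

predictors <- c("AverageHighTemp", "AverageLowTemp")

# Correlation coefficient and p-value of consumption against each predictor.
results <- t(sapply(predictors, function(v) {
  test <- cor.test(region_df$consumption, region_df[[v]])
  c(r = unname(test$estimate), p_value = test$p.value)
}))

print(round(results, 3))
```

A predictor would be called significant when its `p_value` falls below the chosen threshold (0.05 is a common choice).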
The next step is to convert the dataframes to time series. Note that the conversion is not perfect (Year alone is not a date), but it is fine for our purposes.
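A minimal sketch of such a conversion with base R's `ts()` is shown below. It assumes the rows are already ordered by year and month, that the series is monthly, and that it starts in January 2021; the values are synthetic.

```r
# Synthetic monthly values for two years (illustration only).
monthly_values <- 100 + 20 * cos(2 * pi * (0:23) / 12)

# Monthly time series: frequency = 12, assumed to start in January 2021.
consumption_ts <- ts(monthly_values, start = c(2021, 1), frequency = 12)

print(consumption_ts)
```

Because `ts()` only needs a start point and a frequency, no real date column is required, which sidesteps the missing-date issue mentioned above.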
Now, I plot the data as seasonal plots, in which the Energy Consumption variable for each year is plotted against the months. Note that the RED line is 2003, the BLUE line is 2001, and the LIGHT GREEN line is 2002. The bottom line(s) are the other variables (not interesting at this point).
As you can see, region T behaves quite differently from the other regions over the year.
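The idea behind a seasonal plot (one line per year, plotted against the months) can be sketched in base R as below. The series is synthetic and the colours are base R defaults, so they will not match the colours described above; dedicated functions such as `ggseasonplot()` from the forecast package produce a similar figure.

```r
# Synthetic monthly series covering three full years (illustration only).
consumption_ts <- ts(100 + 20 * cos(2 * pi * (0:35) / 12) + (0:35) / 4,
                     start = c(2021, 1), frequency = 12)

# Reshape to a 12 x 3 matrix: one row per month, one column per year.
by_year <- matrix(consumption_ts, nrow = 12)

# One line per year, months on the x axis.
matplot(1:12, by_year, type = "l", lty = 1, col = 1:3, xaxt = "n",
        xlab = "Month", ylab = "consumption")
axis(1, at = 1:12, labels = month.abb)
legend("top", legend = 2021:2023, col = 1:3, lty = 1)
```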
Finally, I will plot Energy Consumption as subseries plots: for monthly data, all the January values are plotted together, then all the February values, and so on.
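A subseries plot of this kind can be sketched with base R's `monthplot()`, using a synthetic series. This sketch only draws the plot and computes the per-month means; it does not run the seasonality test whose results are reported in this section.

```r
# Synthetic monthly series covering three full years (illustration only).
consumption_ts <- ts(100 + 20 * cos(2 * pi * (0:35) / 12) + (0:35) / 4,
                     start = c(2021, 1), frequency = 12)

# Subseries plot: all January values, then all February values, and so on.
# The horizontal segment in each panel marks that month's mean.
monthplot(consumption_ts, ylab = "consumption")

# The same per-month means, computed explicitly.
monthly_means <- tapply(consumption_ts, cycle(consumption_ts), mean)
print(round(monthly_means, 1))
```

Comparing the spread of the values inside each month panel is what reveals differences in per-month variability between regions.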
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE (pval: 0.001)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE (pval: 0.011)
Again, T behaves differently from the other regions, and it also shows lower variability within each month.
To CONCLUDE:
Region T shows no correlation with the temperature variables, whereas the other regions partially do. I have no clue as to what drives Energy Consumption in T.