Energy Consumption

Note that I am setting all missing values (NA) to 0.
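A minimal sketch of this cleaning step in base R, on a toy data frame. The object name `energy` and the values are hypothetical; only the column names follow the listings below, where `Consumtion` appears to be renamed to `consumption`.

```r
# Toy data frame (hypothetical values) with a few of the report's columns.
energy <- data.frame(
  Month = c("Jan", "Feb", "Mar"),
  Consumtion = c(6722071, NA, 4167616),
  AverageHighTemp = c(17, 20, NA)
)

# Fix the misspelled column name.
names(energy)[names(energy) == "Consumtion"] <- "consumption"

# Replace every NA in the data frame with 0.
energy[is.na(energy)] <- 0

print(energy)
```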

## [1] "Month"                "Consumtion"           "Year"                
## [4] "Region"               "AverageHighTemp"      "AverageLowTemp"      
## [7] "StudentsHolidaysDays"
## [1] "Month"                "consumption"          "Year"                
## [4] "Region"               "AverageHighTemp"      "AverageLowTemp"      
## [7] "StudentsHolidaysDays"
## [1] "tbl_df"     "tbl"        "data.frame"
## # A tibble: 6 × 7
##   Month consumption  Year Region AverageHighTemp AverageLowTemp
##   <chr>       <dbl> <dbl> <chr>            <dbl>          <dbl>
## 1 Jan       6722071  2021 T                   17              5
## 2 Feb       5356671  2021 T                   20              6
## 3 Mar       4167616  2021 T                   24             10
## 4 Apr       5231828  2021 T                   30             15
## 5 May       5356671  2021 T                   34             19
## 6 Jun       4282131  2021 T                   37             23
## # ℹ 1 more variable: StudentsHolidaysDays <dbl>
## [1] "numeric"

First, I am going to make 5 different dataframes, one for each Region (I am excluding “Tay” since there are data for only 1 year). Then, I will check for correlations between Energy Consumption and the other variables.
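The split by Region could look roughly like this in base R. This is a sketch on toy data: the names `energy`, `kept` and `by_region` are hypothetical, and the numbers are made up for illustration.

```r
# Toy data (hypothetical values); only the column names follow the report.
energy <- data.frame(
  Region = c("T", "T", "H", "D", "U", "W", "Tay"),
  consumption = c(6722071, 5356671, 310, 420, 530, 640, 750),
  AverageHighTemp = c(17, 20, 15, 14, 16, 18, 19)
)

# Drop the region with a single year of data,
# then split into one data frame per remaining region.
kept <- energy[energy$Region != "Tay", ]
by_region <- split(kept, kept$Region)

names(by_region)
```

Keeping the pieces in a named list means the same check can later be applied to every region with `lapply()`.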

Note that:

- “Consumption” NEVER correlates with “StudentsHolidaysDays”, thus I will exclude this variable from the downstream analysis.
- In Regions H and D, there is a significant correlation between “Consumption” and the two temperature variables.
- In Regions T, U and W, there is NOT a significant correlation between “Consumption” and the two temperature variables. Notably, the correlation is lowest in Region T.
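A correlation check of this kind can be sketched with `cor.test()` from base R. The Pearson coefficient is an assumption here (the report does not state which coefficient was used), and the two regions `A` and `B` are synthetic, not the real data.

```r
# Synthetic monthly data for two hypothetical regions, for illustration only.
high_temp <- c(17, 20, 24, 30, 34, 37, 39, 38, 35, 30, 23, 18)
regions <- list(
  A = data.frame(consumption = 1000 + 50 * high_temp + rep(c(-20, 20), 6),
                 AverageHighTemp = high_temp),
  B = data.frame(consumption = c(5, 9, 2, 7, 4, 8, 3, 6, 1, 10, 12, 11) * 100,
                 AverageHighTemp = high_temp)
)

# Pearson correlation test of consumption against temperature, one region at a time.
results <- lapply(regions, function(d) {
  test <- cor.test(d$consumption, d$AverageHighTemp)
  c(r = unname(test$estimate), p_value = test$p.value)
})

# One row per region: correlation coefficient and p-value.
do.call(rbind, results)
```

A region counts as "significantly correlated" when its p-value falls below the chosen threshold, commonly 0.05.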

The next step is to convert each dataframe into a time series. Note that the conversion is not perfect (Year alone is not a date), but it is ok for our purposes.
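One way to do such a conversion is `ts()` from base R, which only needs a start point and a frequency rather than real dates. In this sketch the values are hypothetical, and the start of January 2021 is an assumption based on the first rows printed above.

```r
# Toy monthly values (hypothetical), ordered January to December for two years.
consumption <- c(67, 54, 42, 52, 54, 43, 40, 41, 45, 50, 58, 65,
                 69, 55, 44, 53, 56, 45, 41, 43, 46, 52, 60, 66)

# frequency = 12 marks the data as monthly; start = c(2021, 1) means January 2021.
consumption_ts <- ts(consumption, start = c(2021, 1), frequency = 12)

print(consumption_ts)
```

Because `ts()` assigns periods purely by position, the rows must already be sorted by year and month before the conversion.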

Now, I plot the data as seasonal plots. Here, the Energy Consumption variable for each year is plotted against the months. Note that the RED line is 2003, the BLUE line is 2001, and the LIGHT GREEN line is 2002. The bottom line(s) are the other variables (not interesting at this point).
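A seasonal plot can be sketched in base R by reshaping the series into one column per year and drawing one line per column. This is not necessarily the plotting function used for the figures in this report; the series is synthetic and the names `consumption_ts` and `by_year` are hypothetical.

```r
# Synthetic monthly series covering three years (illustration only).
values <- 50 + 10 * cos(2 * pi * (1:36) / 12) + rep(c(0, 2, 4), each = 12)
consumption_ts <- ts(values, start = c(2021, 1), frequency = 12)

# Reshape to a 12 x 3 matrix: one row per month, one column per year.
by_year <- matrix(consumption_ts, nrow = 12)
colnames(by_year) <- start(consumption_ts)[1] + 0:2

# Draw one line per year against the month index.
year_cols <- c("blue", "green", "red")
matplot(by_year, type = "l", lty = 1, col = year_cols,
        xaxt = "n", xlab = "Month", ylab = "consumption")
axis(1, at = 1:12, labels = month.abb)
legend("top", legend = colnames(by_year), col = year_cols, lty = 1)
```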

As you can see, Region T follows a quite different pattern over the year compared with the other regions.

Finally, I will plot Energy Consumption as a subseries plot: for monthly data, all the January values are plotted together, then all the February values, and so on.
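For reference, base R provides `monthplot()` for this kind of subseries plot. The sketch below uses a synthetic series and is not necessarily the function that produced the figures and test output in this report; `consumption_ts` and `month_means` are hypothetical names.

```r
# Synthetic monthly series covering three years (illustration only).
values <- 50 + 10 * cos(2 * pi * (1:36) / 12) + rep(c(0, 2, 4), each = 12)
consumption_ts <- ts(values, start = c(2021, 1), frequency = 12)

# One mini-series per calendar month, each with a horizontal line at its mean.
monthplot(consumption_ts, ylab = "consumption", main = "Subseries plot (toy data)")

# The same per-month means, computed explicitly.
month_means <- tapply(as.numeric(consumption_ts),
                      as.integer(cycle(consumption_ts)), mean)
print(month_means)
```

The spread of each mini-series around its mean line gives a quick view of the variability within each month.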

## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE  (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE  (pval: 0.001)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE  (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE  (pval: 0)
## Results of statistical testing
## Presence of trend not tested.
## Evidence of seasonality: TRUE  (pval: 0.011)

Again, T behaves differently from the other regions, and it also shows lower variability within each month.

To CONCLUDE:

Region T shows no correlation with the temperature variables, whereas the other Regions partially do. I have no clue as to what drives Energy Consumption in T.