Loading required packages and data:

setwd("~/Desktop/Ubiqum/R_task5_electricity") 
pacman::p_load(rmarkdown, dplyr, tidyr, ggplot2, lubridate, labeling, scales, tseries, forecast)
Consumption <- read.csv("household_power_consumption.txt", header = TRUE, sep = ";", na.strings = c("NA","?"), stringsAsFactors = FALSE)

Goal:

The aim of this document is to show clients of the Sub-meters company the benefits of sub-meters through data insights. Our hypothesis is that these clients will save electricity by using the sub-meters. The data we have is the electric power consumption of one French household, sampled once per minute over a period of 4 years. Three sub-meters in different rooms together account for around 50% of the household's electricity consumption.

head(Consumption)
##         Date     Time Global_active_power Global_reactive_power Voltage
## 1 16/12/2006 17:24:00               4.216                 0.418  234.84
## 2 16/12/2006 17:25:00               5.360                 0.436  233.63
## 3 16/12/2006 17:26:00               5.374                 0.498  233.29
## 4 16/12/2006 17:27:00               5.388                 0.502  233.74
## 5 16/12/2006 17:28:00               3.666                 0.528  235.68
## 6 16/12/2006 17:29:00               3.520                 0.522  235.02
##   Global_intensity Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1             18.4              0              1             17
## 2             23.0              0              1             16
## 3             23.0              0              2             17
## 4             23.0              0              1             17
## 5             15.8              0              1             17
## 6             15.0              0              2             17

Cleaning the data

  1. Missing values (NA) = 1.25% of the data, most of them during holidays. -> Replaced by 0
  2. Adjustment for daylight saving time in France
  3. Is the data correct? -> Checked with physics
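
The code for step 1 is not shown in this report. A minimal sketch of how the share of missing rows could be measured and the NAs replaced by 0, using a small made-up data frame (`toy` and its values are assumptions, not the household data):

```r
# Toy data frame standing in for Consumption (values are made up)
toy <- data.frame(
  Global_active_power = c(4.2, NA, 5.4, 3.7),
  Sub_metering_1      = c(0, NA, 1, 0)
)

# Share of rows with at least one missing value, in percent
na_share <- mean(!complete.cases(toy)) * 100

# Replace every NA with 0, as described in step 1
toy[is.na(toy)] <- 0
```

Replacing with 0 assumes the house was empty during those periods; interpolation would be an alternative if that assumption does not hold.
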
# Build DateTime from the raw Date and Time strings before converting Date
Consumption$DateTime <- as.POSIXct(paste(Consumption$Date, Consumption$Time),
                                   format = "%d/%m/%Y %H:%M:%S", tz = "GMT")
Consumption$Date <- as.Date(Consumption$Date, "%d/%m/%Y")
# We add the timezone change for France for every year (2007 shown here).
# if_else() keeps the POSIXct class, which ifelse() would drop.
Consumption <- Consumption %>%
  mutate(DateTime = if_else(DateTime >= as_datetime("2007-03-25 02:00:00") &
                              DateTime <= as_datetime("2007-10-28 01:59:00"),
                            DateTime + 3600, DateTime))
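
The adjustment above is written out for 2007 only. A possible way to derive the same window for any year is sketched below. It assumes the summer-time window runs from the last Sunday of March to the last Sunday of October, with the same clock times as the 2007 code. The helpers `last_sunday`, `dst_bounds` and `shift_dst` are hypothetical names, not part of the original analysis.

```r
# Hypothetical helper: date of the last Sunday of a month (months 1 to 11 only)
last_sunday <- function(year, month) {
  stopifnot(month < 12)
  # First day of the following month
  first_next <- as.Date(sprintf("%d-%02d-01", year, month + 1))
  # The last Sunday is one of the 7 days before it
  candidates <- first_next - 1:7
  candidates[as.POSIXlt(candidates)$wday == 0]  # wday 0 = Sunday
}

# Summer-time window for one year, mirroring the clock times used for 2007
dst_bounds <- function(year) {
  list(
    start = as.POSIXct(paste(last_sunday(year, 3), "02:00:00"), tz = "GMT"),
    end   = as.POSIXct(paste(last_sunday(year, 10), "01:59:00"), tz = "GMT")
  )
}

# Add one hour to every timestamp that falls inside its year's window
shift_dst <- function(x) {
  years <- unique(na.omit(as.integer(format(x, "%Y"))))
  for (yr in years) {
    b <- dst_bounds(yr)
    in_dst <- !is.na(x) & x >= b$start & x <= b$end
    x[in_dst] <- x[in_dst] + 3600
  }
  x
}
```

With these helpers, the 2007 step would become `Consumption$DateTime <- shift_dst(Consumption$DateTime)` and would cover the other years in the same call.
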
ggplot() +
  geom_col(data=anual.week,
           aes(x=week_days, y=Mean_GC, fill=Month),
           position= "dodge") +
  labs(x="Week Days", y="Average Global Consumption", title="Average Consumption by Weekdays") +
  theme_linedraw(base_size = 11, base_family = "") +
  theme(plot.title = element_text(hjust = 0.5, face="bold"))

  1. Electricity consumption in winter is almost double that in summer.
  2. In winter, consumption on weekends is around 10% higher than on weekdays.
  3. In summer, consumption is the same on weekends as on weekdays.
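
The table behind the weekday chart (`anual.week`) is not built in this report. A minimal sketch of how a table with the same columns (`Month`, `week_days`, `Mean_GC`) could be produced from daily values is given below. The data frame `toy` and its random values are made up for illustration.

```r
# Made-up daily consumption for one year (random values, not the household data)
set.seed(42)
toy <- data.frame(Date = seq(as.Date("2007-01-01"), as.Date("2007-12-31"), by = "day"))
toy$Global_active_power <- runif(nrow(toy), min = 0.5, max = 2)

# Month and weekday labels built without relying on the system locale
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy$Month <- factor(month.abb[as.POSIXlt(toy$Date)$mon + 1], levels = month.abb)
toy$week_days <- factor(day_names[as.POSIXlt(toy$Date)$wday + 1],
                        levels = day_names[c(2:7, 1)])  # Monday first

# Mean consumption for every Month x weekday combination
weekly_means <- aggregate(Global_active_power ~ Month + week_days, data = toy, FUN = mean)
names(weekly_means)[3] <- "Mean_GC"
```
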
ggplot() +
  geom_col(data=tbl_year3,
           aes(x=Year, y=Consump, fill=Submeter),
           position="dodge") +
  labs(x="Year", y="Consumption (Watt-hours)", title="Total Consumption by Sub-meters") +
  scale_y_continuous(labels=scales::comma) +
  theme_linedraw(base_size = 11, base_family = "") +
  theme(plot.title = element_text(hjust = 0.5, face="bold")) 

  1. S1 consumption decreases by 30%
  2. S2 consumption decreases by 41%
  3. S3 consumption increases by 10%
  4. Consumption not measured by the sub-meters decreases by 26%
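
Percentages like these come from comparing a first and a last yearly total. A short sketch of that arithmetic follows. The totals are made up and chosen only so the result is easy to verify, and `pct_change` is a hypothetical helper name.

```r
# Hypothetical helper: percentage change from a first to a last value
pct_change <- function(first, last) 100 * (last - first) / first

# Made-up yearly totals for two sub-meters (not the household's figures)
yearly <- data.frame(
  Submeter   = c("S1", "S3"),
  first_year = c(1000, 1000),
  last_year  = c(700, 1100)
)
yearly$change_pct <- pct_change(yearly$first_year, yearly$last_year)
print(yearly)
```

A negative value is a reduction and a positive value is an increase.
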
ggplot() + 
  geom_col(data=tbl2_2007,
           aes(x=month, y=Consump, fill=Submeter),
           position="dodge" ) +
  labs(x="2007", y="Average Consumption", title="Average Consumption 2007 by Month") +
  theme_linedraw(base_size = 11, base_family = "") +
  theme(plot.title = element_text(hjust = 0.5, face="bold"))

ggplot() + 
  geom_col(data=tbl2_2008,
           aes(x=month, y=Consump, fill=Submeter),
           position="dodge" ) +
  labs(x="2008", y="Average Consumption", title="Average Consumption 2008 by Month") +
  theme_linedraw(base_size = 11, base_family = "") +
  theme(plot.title = element_text(hjust = 0.5, face="bold"))

Consumption shows seasonality over the year: it is higher in winter and much lower in summer.

ggplot() +
  geom_line(data=x2007_sub_EL,
            aes(x=DateTime, y=Consump, color=Power),
            alpha = 0.4, size = 4) +
  labs(x="Time", y="Watt-hours", title="Power Consumption on January 15th 2007") +
  theme_linedraw(base_size = 11, base_family = "")

This graph shows how electricity consumption evolves over one day.

We verify that the data is consistent with physics: Power = Intensity x Voltage
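
As a rough illustration of this check, the sketch below applies the formula to the six rows printed by head(Consumption) earlier in this document. Intensity x Voltage gives apparent power, so it should only approximate Global_active_power when the power factor is close to 1. The 5% tolerance is an assumption made here, not a figure from the original analysis.

```r
# First six rows, copied from the head(Consumption) output
sample_rows <- data.frame(
  Global_active_power = c(4.216, 5.360, 5.374, 5.388, 3.666, 3.520),        # kW
  Voltage             = c(234.84, 233.63, 233.29, 233.74, 235.68, 235.02),  # V
  Global_intensity    = c(18.4, 23.0, 23.0, 23.0, 15.8, 15.0)               # A
)

# Power implied by Intensity x Voltage, converted from W to kW
sample_rows$implied_kw <- sample_rows$Voltage * sample_rows$Global_intensity / 1000

# Relative gap between implied and reported power
sample_rows$rel_gap <- abs(sample_rows$implied_kw - sample_rows$Global_active_power) /
  sample_rows$Global_active_power

print(round(sample_rows, 3))

# Assumed tolerance of 5%
within_tolerance <- all(sample_rows$rel_gap < 0.05)
```

For these six rows the gap stays below 3%, so `within_tolerance` is TRUE.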

Time Series Analysis and Predictions

Best model: ARIMA

  1. Why predictions by month?
    • Most useful information: we pay the bills monthly
    • More accurate predictions (tested in models)
  2. Best model for forecasting: ARIMA
    • Lowest RMSE compared with LM and Holt-Winters: RMSE = 2.1
    • Lowest AIC among the different ARIMA models and compared with the ARMA model
    • We add seasonality for predictions of less than one year.
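
The model-selection code is not included in this report. The sketch below shows one way such a comparison could be run with the forecast package: fit a seasonal and a non-seasonal ARIMA on a training window, then compare AIC and hold-out RMSE. The monthly series is simulated, so the numbers it prints say nothing about the household data, and all object names are assumptions.

```r
library(forecast)

# Simulated monthly series with a yearly cycle (made-up data)
set.seed(123)
n <- 48
monthly <- ts(100 + 30 * cos(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 5),
              start = c(2007, 1), frequency = 12)

# Hold out the last 12 months to measure forecast error
train <- window(monthly, end = c(2009, 12))
test  <- window(monthly, start = c(2010, 1))

# Seasonal and non-seasonal candidates, each selected by auto.arima()
fit_seasonal <- auto.arima(train, seasonal = TRUE)
fit_plain    <- auto.arima(train, seasonal = FALSE)

# Root mean squared error of the forecast on the hold-out months
rmse <- function(fit) {
  fc <- forecast(fit, h = length(test))
  sqrt(mean((as.numeric(test) - as.numeric(fc$mean))^2))
}

comparison <- data.frame(
  model = c("seasonal ARIMA", "non-seasonal ARIMA"),
  AIC   = c(fit_seasonal$aic, fit_plain$aic),
  RMSE  = c(rmse(fit_seasonal), rmse(fit_plain))
)
print(comparison)
```

AIC values are only comparable between models fitted to the same data with the same order of differencing, so the hold-out RMSE is the safer yardstick when the two candidates differ in that respect.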

autoplot(forecastS1) + 
  ggtitle("Forecast 1 Year Consumption S1-Kitchen") + 
  xlab("Time") + 
  ylab("Consumption (Watt-hours)") + 
  theme_linedraw(base_size = 11, base_family = "") + 
  theme(plot.title = element_text(hjust = 0.5))

Predicted consumption for sub-meter 1 (kitchen)

autoplot(forecastS2) + 
  ggtitle("Forecast 1 Year Consumption S2-Laundry Room") + 
  xlab("Time") + 
  ylab("Consumption (Watt-hours)") + 
  theme_linedraw(base_size = 11, base_family = "") + 
  theme(plot.title = element_text(hjust = 0.5))

Predicted consumption for sub-meter 2 (laundry room)

autoplot(forecastS3) + 
  ggtitle("Forecast 1 Year Consumption S3-Water heater & Air Conditioner") + 
  xlab("Time") + 
  ylab("Consumption (Watt-hours)") + 
  theme_linedraw(base_size = 11, base_family = "") + 
  theme(plot.title = element_text(hjust = 0.5))

Predicted consumption for sub-meter 3 (water heater & air conditioner)

autoplot(foreArimaMonth) + 
  ggtitle("Forecast 1 Year Consumption Active Energy") + 
  xlab("Time") + 
  ylab("Average Consumption (Watt-hours)") + 
  theme_linedraw(base_size = 11, base_family = "") + 
  theme(plot.title = element_text(hjust = 0.5))

For short-term predictions, seasonality is important. Example: a household predicting its next electricity bill.

Best model, shown in the graph above: ARIMA

autoplot(foreArimaMonth2) + 
  ggtitle("Forecast 3 Years Total Consumption Active Energy") + 
  xlab("Time") + 
  ylab("Average Consumption (Watt-hours)") + 
  theme_linedraw(base_size = 11, base_family = "") + 
  theme(plot.title = element_text(hjust = 0.5))

For long-term predictions, we prefer not to add seasonality. Example: a company predicting its electricity expenses for the coming years.

Best model, shown in the graph above: ARIMA

Conclusions

Clients save energy and money from the first year of using the sub-meters.

With the data analysis of the submeters clients will be able to:

  1. Past consumption:
    • Check how energy use evolves for each sub-meter over different periods of time
    • Check the costs for any period of time
  2. Future predictions:
    • Predict their monthly bills (total energy or per sub-meter)
    • Predict their energy expenses over the years