Background:

The data that we have is the electric power consumption in one French household with a one-minute sampling from Dec 2006 to Nov 2010. We have been contacted by a law firm that needs help analyzing archived data records for their client. The client claims to have not been occupying a specific residence at the time of an undisclosed event during the Summer of 2008.

Objective:

Our goal is to perform an in-depth analysis of the power consumption dataset via data visualization , time series regression modeling and provide evidence to law firm to support client’s claim. Also provide the client with useful correlations or predictions and five energy savings suggestions based on insights that we glean from our analysis.

Source data:

The data was gathered in a single household located in Sceaux, France between December 2006 and November 2010 (47 months).It contains over 2 million measurements of electric power consumption from 3 sub-meters in one house with a one-minute sampling rate over a period of almost 4 years.

Source data description

Sub-meters info:

Load required libraries

library(readr)
library(dplyr)
library(RMySQL)
library(lubridate)
library(ggplot2)
library(fpp2)
library(tidyr)
library(devtools)
library(plotly)
library(tidyverse)
library(ggthemes)
library(extrafont)
library(reshape2)
library(kableExtra)

Load Source data

### Establish database connection
con = dbConnect(MySQL(), user='deepAnalytics', password='Sqltask1234!', dbname='dataanalytics2018', host='data-analytics-2018.cbrosir2cswx.us-east-1.rds.amazonaws.com')

## List the tables in the database
dbListTables(con)
## [1] "iris"    "yr_2006" "yr_2007" "yr_2008" "yr_2009" "yr_2010"
## Get the column names of the table

dbListFields(con,'yr_2006')
##  [1] "id"                    "Date"                  "Time"                 
##  [4] "Global_active_power"   "Global_reactive_power" "Global_intensity"     
##  [7] "Voltage"               "Sub_metering_1"        "Sub_metering_2"       
## [10] "Sub_metering_3"
## Select required columns from all the tables)
yr_2006 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2006")
yr_2007 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2007")
yr_2008 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2008")
yr_2009 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2009")
yr_2010 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2010")

Understand data

## Check the structure of dataframe
str(yr_2006)
str(yr_2007)
str(yr_2008)
str(yr_2009)
str(yr_2010)

## Check head and tail of all the dataframes
head(yr_2006) # Starts from Dec 2016
tail(yr_2006)

head(yr_2010)
tail(yr_2010) # Ends on Nov 26,2011

## Check summary
summary(yr_2007)
summary(yr_2008)
summary(yr_2009)

Load data

submeter_data<- bind_rows(yr_2007, yr_2008, yr_2009)

Data Preprocessing

Check missing data and nulls

Date_minutes_count <- submeter_data %>% count(Date)
Date_minutes_missing <-Date_minutes_count %>%  filter(n !=1440)

missing_data_freq <-data.frame(table(Date_minutes_missing$n))

missing_data <- Date_minutes_missing %>% filter (n <1000)

kable(missing_data,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
Results of all Random Forest models
Date n
2007-04-28 21
2007-04-30 576
2009-06-13 30
2009-06-15 985
2009-08-13 549

Observation:

Each date should have 1440 observations (24 hrs*60 mins) as the data is taken every minute.The source data is checked for nulls and 53 dates have missing data. Out of 53 dates, 40 dates have less than 5 minutes of data missing per day. 7 dates have less than 2 hours of data missing per day.Remaining 5 dates have more than 2 hours of data missing per day. Since the missing data is negligible,imputation is not done.

Date and Time manipulation

### Create a new column with Date and Time combined
submeter_data<-cbind(submeter_data,paste(submeter_data$Date,submeter_data$Time), stringsAsFactors=FALSE)
colnames(submeter_data)[6]<- "DateTime"

## Reordering the datetime column as 1st column
submeter_data <- submeter_data[,c(6,1,2,3,4,5)]


## Convert the DateTime column from characters to date class in R to make it as a time series data
submeter_data$DateTime <- as.POSIXct(submeter_data$DateTime, "%Y/%m/%d %H:%M:%S")
attr(submeter_data$DateTime, "tzone") <- "Europe/Paris"
str(submeter_data)
## 'data.frame':    1569894 obs. of  6 variables:
##  $ DateTime      : POSIXct, format: "2007-01-01 01:00:00" "2007-01-01 01:01:00" ...
##  $ Date          : chr  "2007-01-01" "2007-01-01" "2007-01-01" "2007-01-01" ...
##  $ Time          : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Sub_metering_1: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3: num  0 0 0 0 0 0 0 0 0 0 ...
### Drop date and time columns and create new date and time as timezone has changed
submeter_data$Date <-NULL
submeter_data$Time <-NULL


### Extract date, time,year, quarter, month, day, hour, minute etc for data analysis.
submeter_data$Date <- date(submeter_data$DateTime)
submeter_data$Time <- format(submeter_data$DateTime,'%H:%M:%S')
submeter_data$year <- year(submeter_data$DateTime)
submeter_data$month <- month(submeter_data$DateTime,label=TRUE)
submeter_data$quarter <- quarter(submeter_data$Date,with_year = TRUE)
submeter_data$week <- isoweek(submeter_data$Date)
submeter_data$day <- day(submeter_data$Date)
submeter_data$hour <- as.numeric(format(submeter_data$DateTime,'%H'))
submeter_data$dayname <- wday(submeter_data$DateTime,label=TRUE)
submeter_data$year_day <-yday(submeter_data$DateTime)
submeter_data$minute <-minute(submeter_data$DateTime)

## Filter 2007,2008,2009 data again as some Dec 31st 2009 rows have become 2010 Jan 1 
submeter_data <-submeter_data %>% filter(year %in% c(2007,2008,2009)) 


## Understand the final data
summary(submeter_data)
##     DateTime                   Sub_metering_1   Sub_metering_2  
##  Min.   :2007-01-01 01:00:00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:2007-10-03 08:24:15   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :2008-07-01 21:35:30   Median : 0.000   Median : 0.000  
##  Mean   :2008-07-02 03:24:05   Mean   : 1.159   Mean   : 1.343  
##  3rd Qu.:2009-03-31 13:47:45   3rd Qu.: 0.000   3rd Qu.: 1.000  
##  Max.   :2009-12-31 23:59:00   Max.   :82.000   Max.   :78.000  
##                                                                 
##  Sub_metering_3        Date                Time                year     
##  Min.   : 0.000   Min.   :2007-01-01   Length:1569834     Min.   :2007  
##  1st Qu.: 0.000   1st Qu.:2007-10-03   Class :character   1st Qu.:2007  
##  Median : 1.000   Median :2008-07-01   Mode  :character   Median :2008  
##  Mean   : 6.216   Mean   :2008-07-01                      Mean   :2008  
##  3rd Qu.:17.000   3rd Qu.:2009-03-31                      3rd Qu.:2009  
##  Max.   :31.000   Max.   :2009-12-31                      Max.   :2009  
##                                                                         
##      month           quarter          week            day             hour     
##  Oct    :134054   Min.   :2007   Min.   : 1.00   Min.   : 1.00   Min.   : 0.0  
##  May    :133914   1st Qu.:2007   1st Qu.:13.00   1st Qu.: 8.00   1st Qu.: 5.0  
##  Jan    :133856   Median :2008   Median :27.00   Median :16.00   Median :12.0  
##  Dec    :133846   Mean   :2008   Mean   :26.62   Mean   :15.71   Mean   :11.5  
##  Jul    :133783   3rd Qu.:2009   3rd Qu.:40.00   3rd Qu.:23.00   3rd Qu.:18.0  
##  Mar    :133736   Max.   :2009   Max.   :53.00   Max.   :31.00   Max.   :23.0  
##  (Other):766645                                                                
##  dayname         year_day         minute     
##  Sun:221579   Min.   :  1.0   Min.   : 0.00  
##  Mon:224448   1st Qu.: 91.0   1st Qu.:14.25  
##  Tue:226049   Median :184.0   Median :30.00  
##  Wed:225980   Mean   :183.4   Mean   :29.50  
##  Thu:225184   3rd Qu.:275.0   3rd Qu.:44.00  
##  Fri:224634   Max.   :366.0   Max.   :59.00  
##  Sat:221960
nrow(submeter_data)
## [1] 1569834
sum(submeter_data$Sub_metering_1)
## [1] 1819989
sum(submeter_data$Sub_metering_2)
## [1] 2108410
sum(submeter_data$Sub_metering_3)
## [1] 9757744

Summary of Date and Time Manipulation:

  1. Only Date, Time, Sub_metering_1, Sub_metering_2, Sub_metering_3 are chosen for analysis.
  2. Year 2006 has data only from December and Year 2010 has data till November. For Analysisand forecasting, only years with 12 months data are considered. The source data is filtered and only 2007-2009 data are chosen.
  3. DateTime column is created with Date and Time column from the source and the time zoneis changed to Paris. DateTime column is converted to POSIXct format required by R for timeseries forecasting.
  4. Date and Time columns are deleted and new columns are created from DateTime column for Exploratory data analysis and are as follows – Date, Time, year, month, quarter, week,day, hour, dayname, year_day, minute.

Data Preparation for EDA

Summarize submeter readings by hour,day,week, month

sub_monthly_total <-submeter_data %>% group_by(year,month)%>%
  summarise(sub3_monthly_total= sum(Sub_metering_3),
            sub2_monthly_total= sum(Sub_metering_2),
            sub1_monthly_total= sum(Sub_metering_1))


sub_weekly_total <-submeter_data %>% group_by(year,week)%>%
  summarise(sub3_weekly_total= sum(Sub_metering_3),
            sub2_weekly_total= sum(Sub_metering_2),
            sub1_weekly_total= sum(Sub_metering_1))

sub_daily_total <-submeter_data %>% group_by(year,month,week,Date,day)%>%
  summarise(sub3_daily_total= sum(Sub_metering_3),
            sub2_daily_total= sum(Sub_metering_2),
            sub1_daily_total= sum(Sub_metering_1))


sub_hourly_total <-submeter_data %>% group_by(year,Date,week,dayname,hour)%>%
  summarise(sub3_hourly_total= sum(Sub_metering_3),
            sub2_hourly_total= sum(Sub_metering_2),
            sub1_hourly_total= sum(Sub_metering_1))

Create a new dataframe with Submeter columns as rows to compare all the 3 submeters across years

Meter_data <- submeter_data %>%
  gather(All_Meter, Watt_hr, "Sub_metering_1", "Sub_metering_2", "Sub_metering_3")  

Let’s get the proportion of total consumption by each submeter

  1. Across different years
  2. Different months in year 2009
  3. Different quarters in Year 2009
  4. Different weeks
  5. Different days
  6. Different hours

Proportion of total energy consumption by 3 submeters

Proportion of total power use over an entire year by each sub-meter.

Meter_data %>% 
  group_by(year,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(year),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Year", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energy consumption by each submeter")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Proportion of total power usage across months by each submeter in a particular year

Meter_data %>% 
  filter (year==2009) %>% 
  group_by(month,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Months", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energy consumption by each submeter/ month in 2009")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Proportion of total power use across weeks by each sub-meter.

Meter_data %>%
  group_by(week, All_Meter) %>%
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=week,y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Weeks", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energyconsumption by each submeter/ Week in 3 years")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Proportion of total power use different quarters of a year by each sub-meter.

#### Compare Submeters usage at diff quarters in 2009
Meter_data %>% 
  filter(year==2009) %>% 
  group_by(quarter, All_Meter) %>% summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(quarter),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Quarters", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energy consumption by each submeter/ Quarter in 2009")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Proportion of total use at various times of day by each sub-meter.

Meter_data %>% 
 # filter(Date=="2009-05-15") %>% 
  group_by(hour, All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(hour),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Time in a day", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energy by each submeter/ hour in 3 years")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Proportion of total power usage by each submeter across days

Meter_data %>% 
 # filter(Date=="2009-05-15") %>% 
  group_by(dayname, All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(dayname),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
  geom_col(color="black")+ theme_bw()+
  labs(x="Time in a day", y="Total Energy usage in Kwh")+
  ggtitle ("Proportion of Total Energy by each submeter/ day in 3 years")+
  theme(text = element_text(size = 12))+
   scale_fill_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Let’s do comparison of total Submeter usage by each submeter - Across months of all years - Across weeks of all years - Across days of all years - Across hours of all years

Comparison of submeter usage across months for a year

Meter_data %>% 
  #filter(year==2007)%>%
  group_by(month,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
  geom_line(size=1.25)+ theme_classic()+
  labs(x="Months", y="Total Energy usage in Kwh")+
  ggtitle ("Energy consumed by each submeter across months in 3 years")+
  theme(text = element_text(size = 12))+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Comparison of submeter usage across weeks for 2009

Meter_data %>% 
 # filter(year==2009)%>%
  group_by(week,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=week,y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
  geom_line(size=1.25)+ theme_classic()+
  labs(x="Weeks", y="Total Energy usage in Kwh")+
  ggtitle ("Energy consumed by each submeter across weeks in 3 years")+
  theme(text = element_text(size = 12))+
  scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Comparison of submeter usage across days for a particular year

Meter_data %>% 
  #filter(year==2007)%>%
  group_by(dayname,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(dayname),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
  geom_line(size=1.25)+ theme_classic()+
  labs(x="Days of the week", y="Total Energy usage in Kwh")+
  ggtitle ("Energy consumed by each submeter across days in 3 years")+
  theme(text = element_text(size = 12))+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Comparison of submeter usage across hours

Meter_data %>% 
  #filter(year==2007)%>%
  group_by(hour,All_Meter) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=hour,y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
  geom_line(size=1.25)+ theme_classic()+
  labs(x="Hours of the day", y="Total Energy usage in Kwh")+
  ggtitle ("Energy consumed by each submeter across hours in 3 years")+
  theme(text = element_text(size = 12))+
  scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Meter_data %>% 
  group_by(month,All_Meter,year) %>% 
  summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
  ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
  facet_grid(rows=vars(year))+
  geom_line(size=1.25)+ theme_classic()+
  labs(x="Months", y="Total Energy usage in Kwh")+
  ggtitle ("Energy consumed by each submeter across months in 3 years")+
  theme(text = element_text(size = 12))+
  theme_bw()+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC')) 

Detailed Analysis of typical power usage by each submeter across years

Submeter 1 -Kitchen detailed Analysis:

  1. Plot Submeter 1 usage across months of the year for each year
  2. Plot Submeter 1 usage across weeks for each year
  3. Plot Submeter 1 usage across days of the week for each year
  4. Plot Submeter 1 usage across hours of each day for all years
  5. Plot Submeter 1 usage across a weekday and weekend for each year
  6. Plot Submeter 1 usage across days in Summer and Winter days

Plot Submeter 1 usage across months of the year

sub_monthly_total %>%
  ggplot( aes(x=month,y=sub1_monthly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of monthly total Submeter 1 readings(Kwh) across different years ")+
  labs(x="Months", y="Total Submeter 1/ month in kwh")+
   scale_color_discrete(name = 'Year')

Plot Submeter 1 usage across weeks for each year

sub_weekly_total %>%
  ggplot( aes(x=week,y=sub1_weekly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of weekly total Submeter 1 readings(Kwh) across different years ")+
  labs(x="Weeks", y="Total Submeter 1/ Week in kwh")+
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
   scale_color_discrete(name = 'Year')

Analyze preferred day for dishwasher and cooking

 submeter_data %>% group_by(dayname,year) %>% summarise(Sub1total_day =sum(Sub_metering_1))%>%
   ggplot(aes(x=dayname,y=Sub1total_day,group=factor(year)))+
   geom_line(aes(color=factor(year)),size=1)+theme_bw()+
   labs(x="Days of the week",y="Submeter total/ year for that day")+
   ggtitle("Analyzing Submeter1-usage to find preferred day for dishwasher")+
   scale_color_discrete(name = 'Year')

Analyze the time of the day kitchen appliances are mostly used

submeter_data %>% group_by(dayname,hour) %>% summarise(Sub1total_hour =sum(Sub_metering_1))%>%
   ggplot(aes(x=hour,y=Sub1total_hour,color=factor(dayname),group=(dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Analyzing the time of the day kitchen appliances are mostly used")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))

Lets go further and check which time of the day is preferred on weekday vs weekend

 submeter_data %>% filter(dayname %in% c("Wed","Sun")) %>% group_by(dayname,hour,year) %>% summarise(Sub1total_hour =sum(Sub_metering_1))%>%
   ggplot(aes(x=hour,y=Sub1total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Comparing weekday vs weekend usage of kitchen appliances")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
  scale_colour_brewer(palette = "Set2")

Comparison of SUbmeter1 usage between summer and winter

#### Get typical submeter 1 usage per day for summer weeks

summer_weeks_sub1 <- submeter_data %>%
                  filter( week >=23 & week<=30)%>%
                group_by(dayname)%>%
                summarise(summer_sub1_usage= sum(Sub_metering_1)/1000)

#### Get typical submeter1 usage per day for winter weeks

winter_weeks_sub1 <- submeter_data %>%
  filter( week >=1 & week<=8)%>%
  group_by(dayname)%>%
  summarise(winter_sub1_usage= sum(Sub_metering_1)/1000)

Comparison of submeter 1 usage in Summer and Winter

 ggplot(data=summer_weeks_sub1)+
   geom_line(aes(x=dayname, y=summer_sub1_usage, group=1,colour='Summer'),size=1.25) +
   geom_line(data = winter_weeks_sub1, aes(x=dayname, y=winter_sub1_usage, group=1, color='Winter'),size=1.25)+
   scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
   labs(colour='Season') + theme_classic()+
   labs(x='Day of the Week', y='Submeter usage in kWh') +
   ggtitle('Submeter1 usage in Summer vs Winter') +
   theme(panel.border=element_rect(colour='black', fill=NA))+
 theme(axis.line = element_line(size=1, colour = "black"),
       panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
       panel.border = element_blank(), panel.background = element_blank()) +
   theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
         text=element_text(family="Tahoma"),
         axis.text.x=element_text(colour="black", size = 10),
         axis.text.y=element_text(colour="black", size = 10),
         legend.key=element_rect(fill="white", colour="white"))

Submeter 2 detailed Analysis:

  1. Plot Submeter 2 usage across months of the year for each year4
  2. Plot Submeter 2 usage across weeks of each year
  3. Plot Submeter 2 usage across days of the week for each year
  4. Plot Submeter 2 usage across hours of each day for all years
  5. Plot Submeter 2 usage across a weekday and weekend for each year
  6. Plot Submeter 2 usage across days in Summer and Winter days

Submeter2 monthly total across years

sub_monthly_total %>%
  ggplot( aes(x=month,y=sub2_monthly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of monthly total Submeter 2 readings(Kwh) across different years ")+
  labs(x="Months", y="Total Submeter 2/ month in kwh")+
   scale_color_discrete(name = 'Year')

Compare Submeter 2 usage across weeks of each year

sub_weekly_total %>%
  ggplot( aes(x=week,y=sub2_weekly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of weekly total Submeter 2 readings(Kwh) across different years ")+
  labs(x="Weeks", y="Total Submeter 2/ Week in kwh")+
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
   scale_color_discrete(name = 'Year')

Find preferred day for laundry

 submeter_data %>% group_by(dayname,year) %>% summarise(Sub2total_day =sum(Sub_metering_2))%>%
   ggplot(aes(x=dayname,y=Sub2total_day,group=factor(year)))+
   geom_line(aes(color=factor(year)),size=1)+theme_bw()+
   labs(x="Days of the week",y="Submeter2 total/ year for that day")+
   ggtitle("Analyzing Submeter2-usage to find preferred day for Laundry ")+
   scale_color_discrete(name = 'Year')

Lets go further and check which time of the day laundry is preferred

 submeter_data %>% group_by(dayname,hour) %>% summarise(Sub2total_hour =sum(Sub_metering_2))%>%
   ggplot(aes(x=hour,y=Sub2total_hour,color=factor(dayname),group=(dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Analyzing the time of the day preferred for laundry")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))

Compare laundry usage between weekday and weekend

 submeter_data %>% filter(dayname %in% c("Wed","Sun")) %>%group_by(dayname,hour,year) %>% summarise(Sub2total_hour =sum(Sub_metering_2))%>%
   ggplot(aes(x=hour,y=Sub2total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Comparing weekday vs weekend for laundry")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
   scale_colour_brewer(palette = "Set2")

Get laundry usage in Summer and winter

#### Get typical submeter 2 usage per day for summer weeks

summer_weeks_sub2 <- submeter_data %>%
                  filter(week >=23 & week<=30)%>%
                group_by(dayname)%>%
                summarise(summer_sub2_usage= sum(Sub_metering_2)/1000)

#### Get typical submeter2 usage per day for winter weeks

winter_weeks_sub2 <- submeter_data %>%
  filter( week >=1 & week<=8)%>%
  group_by(dayname)%>%
  summarise(winter_sub2_usage= sum(Sub_metering_2)/1000)

Comparison of Submeter 2 usage in Summer and Winter of 2009

 ggplot(data=summer_weeks_sub2)+
   geom_line(aes(x=dayname, y=summer_sub2_usage, group=1,colour='Summer'),size=1.25) +
   geom_line(data = winter_weeks_sub2, aes(x=dayname, y=winter_sub2_usage, group=1, color='Winter'),size=1.25)+
   scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
   labs(colour='Season') + theme_classic()+
   labs(x='Day of the Week', y='Submeter usage in kWh') +
   ggtitle('Submeter2 usage in Summer vs Winter') +
   theme(panel.border=element_rect(colour='black', fill=NA))+
 theme(axis.line = element_line(size=1, colour = "black"),
       panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
       panel.border = element_blank(), panel.background = element_blank()) +
   theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
         text=element_text(family="Tahoma"),
         axis.text.x=element_text(colour="black", size = 10),
         axis.text.y=element_text(colour="black", size = 10),
         legend.key=element_rect(fill="white", colour="white"))

Submeter 3 detailed Analysis:

  1. Plot Submeter 3 usage across months of the year for each year
  2. Plot Submeter 3 usage across weeks
  3. Plot Submeter 3 usage across days of the week for each year
  4. Plot Submeter 3 usage across hours of each day for all years
  5. Plot Submeter 3 usage across a weekday and weekend for each year
  6. Plot Submeter 3 usage across days in Summer and Winter days

Submeter3 monthly total across years

sub_monthly_total %>%
  ggplot( aes(x=month,y=sub3_monthly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of monthly total Submeter 3 readings(Kwh) across different years ")+
  labs(x="Months", y="Total Submeter 3/ month in kwh")+
   scale_color_discrete(name = 'Year')

Comparison of AC and Water heater usage across weeks

sub_weekly_total %>%
  ggplot( aes(x=week,y=sub3_weekly_total/1000,group=factor(year)))+
  geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
  ggtitle("Comparison of weekly total Submeter 3 readings(Kwh) across different years ")+
  labs(x="Weeks", y="Total Submeter 3/ Week in kwh")+
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
   scale_color_discrete(name = 'Year')

Analyze the days for high consumption of water heater and AC

 submeter_data %>% group_by(dayname,year) %>% summarise(Sub3total_day =sum(Sub_metering_3))%>%
   ggplot(aes(x=dayname,y=Sub3total_day,group=factor(year)))+
   geom_line(aes(color=factor(year)),size=1)+theme_bw()+
   labs(x="Days of the week",y="Submeter3 total/ year for that day")+
   ggtitle("Analyzing Submeter3- to check AC and heater usage ")+
   scale_color_discrete(name = 'Year')

Analyze the time of the day AC and Water heater are mostly used

 submeter_data %>% group_by(dayname,hour) %>% summarise(Sub3total_hour=sum(Sub_metering_3)/1000)%>%
   ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),group=(dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total in Kwh/ hour ")+
   ggtitle("Analyzing the time of the day AC and waterheater is mostly used")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))

Analyze weekends alone

submeter_data %>% filter(dayname %in% c("Sat","Sun")) %>%group_by(dayname,hour,year) %>% summarise(Sub3total_hour =sum(Sub_metering_3))%>%
   ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Analyzing the time of the weekend AC and waterheater is mostly used")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))

Compare weekday with Weekend

 submeter_data %>% filter(dayname %in% c("Tue","Sat")) %>%group_by(dayname,hour,year) %>% summarise(Sub3total_hour =sum(Sub_metering_3))%>%
   ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
   geom_line(size=1)+theme_bw()+geom_point()+
   labs(x="Time of the day",y="Submeter total/ hour ")+
   ggtitle("Comparing weekday and weekend for AC and waterheater usage")+
   scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
   scale_colour_brewer(palette = "Set2")

Get Submeter 3 usage in summer and winter

#### Get typical submeter 3 usage per day for summer weeks

summer_weeks <- submeter_data %>%
                  filter( week >=23 & week<=30)%>%
                group_by(dayname)%>%
                summarise(summer_sub3_usage= sum(Sub_metering_3)/1000)

#### Get typical submeter3 usage per day for winter weeks

winter_weeks <- submeter_data %>%
  filter( week >=1 & week<=8)%>%
  group_by(dayname)%>%
  summarise(winter_sub3_usage= sum(Sub_metering_3)/1000)

Plot summer vs winter for submeter 3

 ggplot(data=summer_weeks)+
   geom_line(aes(x=dayname, y=summer_sub3_usage, group=1,colour='Summer'),size=1.25) +
   geom_line(data = winter_weeks, aes(x=dayname, y=winter_sub3_usage, group=1, color='Winter'),size=1.25)+
   scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
   labs(colour='Season') + theme_classic()+
   labs(x='Day of the Week', y='Submeter usage in kWh') +
   ggtitle('Submeter3 usage in Summer vs Winter') +
   theme(panel.border=element_rect(colour='black', fill=NA))+
 theme(axis.line = element_line(size=1, colour = "black"),
       panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
       panel.border = element_blank(), panel.background = element_blank()) +
   theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
         text=element_text(family="Tahoma"),
         axis.text.x=element_text(colour="black", size = 10),
         axis.text.y=element_text(colour="black", size = 10),
         legend.key=element_rect(fill="white", colour="white"))

Summary of EDA:

  1. There are 1,569,834 observations in the final dataset that has data from 2007-2009
  2. Max value for Submeter 1 is 82, 78 for Submeter 2 and 31 for Submeter 3
  3. Min value for all 3 submeters is 0
  4. Energy consumption by Submeter 3- AC and Water heater is the highest
  5. Energy consumption by Submeter 1 – Kitchen is the least
  6. Submeter 3 contributes to 71% of the total power consumption whereas Submeter 1 and 2 contributes 13% and 16% respectively
  7. 91.6% of the submeter 1 values are 0 , 71.3% of the Submeter 2 values are 0 and only 46.7% of Submeter 3 values are 0.
  8. Submeter 1&2 Consumption decreased over years and Submeter 3 Consumption increased over years. S3 consumption increased 11.8% from 2008-2009 while S2 reduced 10.5% in the same period.
  9. Energy consumption by Kitchen Appliances, laundry room , water heater and AC is higher on weekends compared to Weekdays
  10. Energy consumption is less during night and high during day for obvious reasons
  11. Energy consumption by Kitchen Appliances (S1) is almost zero from 3 am to 8 am
  12. Dishwasher runs at night between 9 pm-11 pm most of the days
  13. Water heater and AC power consumption is highest on Saturday followed by Friday and lowest on Thursdays
  14. Kitchen is mostly used on Weekend (Sat, Sun) followed by Wednesday- mid week
  15. Laundry is mostly done on Sunday followed by Wednesday and Saturday
  16. Laundry power consumption is high in the afternoon from 1 pm- 4 pm followed by 9-10 pm at night
  17. Water heater and AC consumption is high during the day from 8 am-3 pm may be because of shower, getting ready for the day in the morning. The next high consumption is from 9-11 pm.

Evidence for client’s claim

In order to find out whether the client’s claim of not occupying the residence for a certain period of time in Summer of 2008 is valid, the weekly submeter 3 total is compared against the summer weeks of 2007 and 2009 to understand typical power usage.

sub_weekly_total%>% filter(week>=23 & week <=37) %>%
  ggplot( aes(x = week, y = sub3_weekly_total)) +
  facet_wrap(~year)+
  geom_col(aes(fill=factor(year)))+ theme_classic()+
  labs(x="Weeks",y="Submeter3 usage/Week")+
  ggtitle("Weekly Submeter3 usage across years")+
   scale_fill_discrete(name = 'Year')

It can be noted that weeks 32-34 of 2008 have unusually lower weekly total. They correspond to Aug 2008.

Evidence for Client’s claim - Lets Take August month- Wk 31-36 of 2008

Meter_data %>% 
  filter(year==2008 & week>=31 &week <=36) %>% 
  group_by(All_Meter,day,year) %>% 
  summarise(Weekly_usage=sum(Watt_hr)/1000)%>%
  ggplot(aes(x = day, y = Weekly_usage)) +
  geom_line(aes(color=factor(All_Meter)),size=1.25)+ theme_classic()+
  xlab("August Month")+
  ylab("Submeter usage/day in Kwh")+
  ggtitle ("Submeter usage in Summer- August 2008")+
  scale_x_continuous(breaks=seq(1,31,4))+
   scale_color_discrete(name = 'Submeter:   ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))

Power consumption- Aug 2008

houseMonth <- filter(submeter_data, year == 2008 & month=="Aug" & minute==0)
plot_ly(houseMonth, x = ~houseMonth$DateTime, y = ~houseMonth$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~houseMonth$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
  add_trace(y = ~houseMonth$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
  layout(title = "Power Consumption Aug 2008",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Power (watt-hours)"))
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

The above plot shows the daily total of all 3 submeters for Aug 2008. It can be noted that there is a sudden drop in the total from Aug 9th till 28th of August. This proves that the Client’s claim is true.

Visualize Submeters for a time period

To gain maximum information from Visualization, subsetting the data and adjusting the granularity is important. Another goal of subsetting is to focus on periods of time that highlights patterns of power usage. Granularity describes the frequency of observations within a time series dataset.

The power consumption changes during the course of the day, week, month. Therefore these 3 time periods can be considered to visualize the submeter data. Since the source data has data for every minute, we must reduce the granularity to gain information. It can be every 10 mins/ 30 mins/ 1 h/ 4hrs/ twice daily etc.

The following time period and granularity are chosen for plotting.

  1. Plot power consumption by all submeters in a day – Jan 9th 2008 with data collected every 10 mins throughout the day (144 Observations)
  2. Plot power consumption by all submeters in a week – 30th week of 2007 (Summer month July)with data collected every 30 mins throughout the day ( 336 observations)
  3. Plot power consumption by all submeters in a week - 50th week of 2009 (Winter month Dec)with data collected every hour throughout the day (168 observations)
  4. Plot power consumption by all submeters in a month – Jan 2008 (Winter month) with data collected every 4 hrs throughout the day (185 observations)

Visualize a single day with data collected every 10 mins

houseDay<- filter(submeter_data, year == 2008 & month == "Jan" & day == 9 & (minute == 0 | minute == 10 | minute == 20 | minute == 30 | minute == 40 | minute == 50))
plot_ly(houseDay, x = ~houseDay$DateTime, y = ~houseDay$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~houseDay$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
  add_trace(y = ~houseDay$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
  layout(title = "Power Consumption January 9th, 2008",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Power (watt-hours)"))

Insights:

  1. The above plot shows the power consumption of all 3 sub meters on Jan 9th with data taken every 10 mins throughout the day
  2. Laundry room power consumption is even throughout the day
  3. Water heater and AC consumption is more from 6 am-3 pm and after 9 pm and its around 18-19 watt hours
  4. Kitchen appliances- either cooking/ dishwasher is used from 5:30-6:30 pm and consumed the maximum power of almost 38 watt hours

Visualize a single week with data collected every 30 mins

houseWeek25 <- filter(submeter_data, year == 2007 & week == 30 &(minute==0|minute==30))
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations 
plot_ly(houseWeek25, x = ~houseWeek25$DateTime, y = ~houseWeek25$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~houseWeek25$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
  add_trace(y = ~houseWeek25$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
  layout(title = "Power Consumption Week 30 July 2007",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Power (watt-hours)"))

Insights:

  1. The above plot shows the power consumption by all 3 submeters during week 30 of 2007- July 23, Mon- July 29, Sun with data collected every 30 mins
  2. Water heater and AC consumption is evenly distributed throughout the week at around 17 watt hours
  3. Laundry usage was highest on Saturday-July 28th with highest being 70 watt hours. Looks like the laundry was done on every alternate days- Mon, Thu, Sat.
  4. Kitchen is also used on alternate days- 23rd,26th and 28th and consumed almost 36-37 watt hours.

Visualize a single week with data collected every 1 h

houseWeek50 <- filter(submeter_data, year == 2009 & week == 50 & minute==0)
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations 
plot_ly(houseWeek50, x = ~houseWeek50$DateTime, y = ~houseWeek50$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~houseWeek50$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
  add_trace(y = ~houseWeek50$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
  layout(title = "Power Consumption Week 50 Dec 2009",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Power (watt-hours)"))

Insights:

  1. The above plot shows the power consumption by all submeters in the week 50 of 2009 with data collected every hour throughout the week
  2. Since its winter, the Water heater and AC consumption spikes to 28-30 watt hours throughout the week
  3. Laundry is done only once in this week and done on Saturday and it consumed the highest power of 71 watt hours
  4. Kitchen is used only on Dec 10-13, Thu-Sun with highest consumption being 40 watt hours

Visualize submeter data taken every 4 hrs for a month

houseMonthJan <- filter(submeter_data, year == 2008 & month=="Jan" & minute==0 & (hour == 0 | hour == 4 | hour==8|hour==12 |hour==16| hour==20))
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations 
plot_ly(houseMonthJan, x = ~houseMonthJan$DateTime, y = ~houseMonthJan$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~houseMonthJan$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
  add_trace(y = ~houseMonthJan$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
  layout(title = "Power Consumption Jan 2008",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Power (watt-hours)"))

Insights:

  1. The above plot shows the power consumption by all 3 submeters in Jan 2008 with data collected every 4 hrs throughout the month (185 observations)
  2. Water heater and AC consumption is evenly distributed with 18-19 watt hours on an average
  3. Laundry is done on 3 Tuesdays- Jan 15,22,29 with highest consumption being 38 watt hours.
  4. Kitchen is mostly used on Saturdays – Jan 12,19,26 consuming 36-39 watt hours

Forecasting

Visualize and forecast time series

Below are the 4 time series objects created using ts() function and visualization of the same.

  1. Submeter 3 data – Collected every Monday at 8:01 pm with a frequency of 52 observations/year – Weekly time series
  2. Submeter 2 data- Collected every hour from Sep 1 2008 to Sep 30 2008 with frequency of 30-Hourly time series
  3. Submeter 1 data- Collected monthly total usage from Jan 2007- Dec 2009 with frequency of 12 observations/ year- Monthly time series
  4. Submeter 3 data – Collected daily total usage from Jan 2007- Dec 2009 -with frequency of 365 observations/ year.

Weekly time series for submeter3 collected every monday for 3 years and forecast next 20 weeks

houseweek_mon <- filter(submeter_data, dayname=="Mon" & hour == 20 & minute == 1)
houseweek_ts <-ts(houseweek_mon$Sub_metering_3, frequency=52, start=c(2007,1))
plot(houseweek_ts, xlab = "Weeks from 2007-2010", ylab = "Watt Hours", main = "Weekly time series of Submeter 3")+ theme_classic()

## NULL
## Fit tslm model
fit_weekly <- tslm(houseweek_ts ~ trend + season) 
summary(fit_weekly)
## 
## Call:
## tslm(formula = houseweek_ts ~ trend + season)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.0000  -4.2882  -0.3333   2.0452  14.0452 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.84427    3.58071   5.542 2.27e-07 ***
## trend        -0.02651    0.01277  -2.075 0.040430 *  
## season2     -12.74608    5.25762  -2.424 0.017063 *  
## season3     -12.71957    5.25686  -2.420 0.017272 *  
## season4     -18.35972    5.25613  -3.493 0.000703 ***
## season5     -12.66655    5.25543  -2.410 0.017700 *  
## season6     -18.30670    5.25476  -3.484 0.000725 ***
## season7     -12.61353    5.25412  -2.401 0.018141 *  
## season8      -7.25368    5.25352  -1.381 0.170322    
## season9     -18.22717    5.25294  -3.470 0.000759 ***
## season10    -12.20066    5.25240  -2.323 0.022133 *  
## season11    -12.50749    5.25189  -2.382 0.019060 *  
## season12    -12.48098    5.25141  -2.377 0.019298 *  
## season13    -18.12114    5.25096  -3.451 0.000808 ***
## season14    -18.09463    5.25054  -3.446 0.000821 ***
## season15    -12.40145    5.25015  -2.362 0.020032 *  
## season16    -11.70827    5.24979  -2.230 0.027880 *  
## season17    -17.34843    5.24946  -3.305 0.001305 ** 
## season18    -17.65525    5.24917  -3.363 0.001079 ** 
## season19    -11.62874    5.24890  -2.215 0.028909 *  
## season20    -11.60223    5.24867  -2.211 0.029261 *  
## season21    -17.24239    5.24847  -3.285 0.001389 ** 
## season22     -6.21588    5.24830  -1.184 0.238970    
## season23    -14.18937    5.24816  -2.704 0.008012 ** 
## season24    -17.49620    5.24805  -3.334 0.001188 ** 
## season25    -11.46969    5.24797  -2.186 0.031092 *  
## season26     -7.10984    5.24792  -1.355 0.178420    
## season27    -17.08333    5.24791  -3.255 0.001529 ** 
## season28    -17.39016    5.24792  -3.314 0.001268 ** 
## season29    -17.36365    5.24797  -3.309 0.001288 ** 
## season30    -17.00380    5.24805  -3.240 0.001605 ** 
## season31    -16.97729    5.24816  -3.235 0.001632 ** 
## season32    -17.28412    5.24830  -3.293 0.001354 ** 
## season33    -11.25761    5.24847  -2.145 0.034286 *  
## season34    -17.23110    5.24867  -3.283 0.001400 ** 
## season35    -17.20459    5.24890  -3.278 0.001423 ** 
## season36    -16.84475    5.24917  -3.209 0.001771 ** 
## season37    -13.81824    5.24946  -2.632 0.009772 ** 
## season38    -11.12506    5.24979  -2.119 0.036459 *  
## season39    -10.76522    5.25015  -2.050 0.042835 *  
## season40    -11.40537    5.25054  -2.172 0.032110 *  
## season41     -7.37886    5.25096  -1.405 0.162929    
## season42    -16.68569    5.25141  -3.177 0.001957 ** 
## season43    -10.65918    5.25189  -2.030 0.044952 *  
## season44    -16.96600    5.25240  -3.230 0.001657 ** 
## season45    -10.93949    5.25294  -2.083 0.039746 *  
## season46    -16.91298    5.25352  -3.219 0.001714 ** 
## season47     -5.55314    5.25412  -1.057 0.293001    
## season48    -16.85996    5.25476  -3.209 0.001774 ** 
## season49    -16.83345    5.25543  -3.203 0.001805 ** 
## season50    -13.80694    5.25613  -2.627 0.009921 ** 
## season51     -5.11377    5.25686  -0.973 0.332920    
## season52    -11.08726    5.25762  -2.109 0.037363 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.871 on 104 degrees of freedom
## Multiple R-squared:  0.3831, Adjusted R-squared:  0.07461 
## F-statistic: 1.242 on 52 and 104 DF,  p-value: 0.1747
## Create the forecast for sub-meter 3. Forecast ahead 20 time periods 
forecastfit_weekly <- forecast(fit_weekly, h=20,level=c(80,90))
## Plot the forecast for sub-meter 3. 
plot(forecastfit_weekly, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Weeks", main="20 weekly Mondays forecast of Submeter3 readings in 2010")

plot(forecastfit_weekly, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Weeks", main="20 weeks forecast of submeter3 from lm model",start(c(2010)),shadecols="pink")

Hourly time series for submeter 2 for Sep 2008 and forecast next 72 hours

house_hourly <- submeter_data %>% filter(month=='Sep' & year==2008 &minute==0) 
house_hourly_ts <-ts(house_hourly$Sub_metering_2, frequency=30)
plot(house_hourly_ts,  xlab = "Hours", ylab = "Watt Hours", main = "Hourly time series of Sub-meter 2- Sep 2008")+ theme_classic()

## NULL
## Fit tslm model
fit_hourly <- tslm(house_hourly_ts ~ trend+season ) 
summary(fit_hourly)$r.squared
## [1] 0.032314
summary(fit_hourly)$adj.r.squared
## [1] -0.009820365
## Create the forecast for sub-meter 2. 
forecastfit_hourly <- forecast(fit_hourly, h=72)
## Plot the forecast for sub-meter 3. 
plot(forecastfit_hourly,ylim = c(0, 40), ylab= "Watt-Hours", xlab="Hours", main="Forecast of hourly time series of submeter2",shadecols = "pink")

plot(forecastfit_hourly,ylim = c(0, 40), ylab= "Watt-Hours", xlab="Hours", main="Forecast of hourly time series of submeter2",start(24),shadecols = "pink")

Monthly time series- Forecast Submeter 1 total for3 years and forecast submeter total for next 10 months

## Submeter 1 monthly time series
sub1_monthly_total <- submeter_data %>% group_by(year,month) %>% summarise(Totalpermonth= sum(Sub_metering_1)/1000)
sub1_monthly_total_ts <- ts(sub1_monthly_total$Totalpermonth,frequency=12,start=c(2007,1))
plot(sub1_monthly_total_ts, main="Monthly time series of Submeter 1 total",ylab="Submeter1 usage/month")

fit_monthly <-  tslm(sub1_monthly_total_ts ~ trend + season)
forecast_monthly <- forecast(fit_monthly, h=10,level=c(80,90))
plot(forecast_monthly,ylim = c(0, 80), ylab= "Submeter total in Kwh", xlab="Months", main="Forecast total submeter1 for next 10 months with tslm model",shadecols="pink")

plot(forecast_monthly,ylim = c(0, 80), ylab= "Submeter total in Kwh", xlab="Months", main="Forecast total submeter1 for next 10 months with tslm model",start(2010),shadecols="pink")

Daily time series- Forecast Submeter 3 total for3 years and forecast submeter total for next 150 days

## Submeter 3 daily series
sub3_daily_total <- submeter_data %>% group_by(year,year_day) %>% summarise(Totalperday= sum(Sub_metering_3)/1000)
sub3_daily_total_ts <- ts(sub3_daily_total$Totalperday,frequency=365,start=c(2007,1))
plot(sub3_daily_total_ts,main="Daily time series of Submeter 3 total",ylab="Submeter3 usage/day") 

fit_daily<-  tslm(sub3_daily_total_ts ~ trend + season)
forecast_daily <- forecast(fit_daily, h=150,level=c(80,90))
plot(forecast_daily,xlab="Days",ylab="Submeter usage/day",main="150 days Forecast of daily submeter3 total ")

plot(forecast_daily,xlab="Days",ylab="Submeter usage/day",main="150 days Forecast of daily submeter3 total ",start(2010),shadecols="pink")

COMPARISON OF METRICS OF ALL THE ABOVE 4 LINEAR MODELS

Storing metrics of 4 models in a dataframe

Linear_model_results <- data.frame(rbind(accuracy(fit_weekly),accuracy(fit_hourly),accuracy(fit_monthly),accuracy(fit_daily)))

Linear_model_results ['R2'] <- data.frame(rbind(summary(fit_weekly)$r.squared,summary(fit_hourly)$r.squared,summary(fit_monthly)$r.squared, summary(fit_daily)$r.squared))


Linear_model_results ['Adjusted_R2'] <- data.frame(rbind(summary(fit_weekly)$adj.r.squared,summary(fit_hourly)$adj.r.squared,summary(fit_monthly)$adj.r.squared, summary(fit_daily)$adj.r.squared))

Linear_model_results['Model_desc'] <- data.frame(rbind("Forecast of Monday 8:01 pm every week","Forecast every hour for a month- submeter2","Forecast of monthly total Submeter1 for 10 months","Forecast of total submeter3 for 150 days" ))

Linear_model_results['Model_name'] <- data.frame(rbind("fit_weekly","fit_hourly","fit_monthly","fit_daily"))
Model_comparison <- Linear_model_results %>% select(Model_desc,Model_name, R2,Adjusted_R2, RMSE)

kable(Model_comparison,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
Results of all Random Forest models
Model_desc Model_name R2 Adjusted_R2 RMSE
Training.set Forecast of Monday 8:01 pm every week fit_weekly 0.3831 0.0746 5.5924
Training.set.1 Forecast every hour for a month- submeter2 fit_hourly 0.0323 -0.0098 4.7615
Training.set.2 Forecast of monthly total Submeter1 for 10 months fit_monthly 0.5796 0.3602 9.4306
Training.set.3 Forecast of total submeter3 for 150 days fit_daily 0.5173 0.2753 2.4273

Decompose time series

1. Decompose the weekly time series - submeter3

decomp_weekly <- decompose(houseweek_ts)
plot(decomp_weekly)

#### 2. Decompose the hourly time series - submeter2

decomp_hourly<- decompose(house_hourly_ts)
plot(decomp_hourly)

3. Decompose the monthly time series- submeter1

decomp_monthly <- decompose(sub1_monthly_total_ts)
plot(decomp_monthly)

4. Decompose the daily time series - submeter3

decomp_daily <- decompose(sub3_daily_total_ts)
plot(decomp_daily)

Comparison of decomposition summary

Seasonal_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$seasonal), summary(decomp_hourly$seasonal),summary(decomp_monthly$seasonal),summary(decomp_daily$seasonal)))
Seasonal_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts seasonal","Hourly ts seasonal ","Monthly ts seasonal","Daily ts seasonal"))
kable(Seasonal_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
Results of all Random Forest models
Min. X1st.Qu. Median Mean X3rd.Qu. Max. Timeseries
-3.7014 -3.1101 -2.8120 0.0931 5.5053 14.6207 Weekly ts seasonal
-0.9420 -0.7246 -0.4858 0.0000 0.9291 2.4059 Hourly ts seasonal
-31.6092 -5.1400 1.1778 0.0000 4.5897 19.1106 Monthly ts seasonal
-7.0325 -1.6698 0.1302 -0.0033 1.6551 6.7177 Daily ts seasonal
Trend_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$trend), summary(decomp_hourly$trend),summary(decomp_monthly$trend),summary(decomp_daily$trend)))
Trend_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts trend","Hourly ts trend ","Monthly ts trend","Daily ts trend"))
kable(Trend_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
Results of all Random Forest models
Min. X1st.Qu. Median Mean X3rd.Qu. Max. NA.s Timeseries
1.8654 2.4808 3.0577 3.3295 3.8173 6.6635 52 Weekly ts trend
0.0167 0.2667 0.4167 0.9812 1.4333 4.0333 30 Hourly ts trend
46.0098 48.8336 50.4856 50.4622 52.6139 53.7758 12 Monthly ts trend
8.1182 8.6760 8.8914 8.8620 9.0618 9.7742 364 Daily ts trend
Random_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$random), summary(decomp_hourly$random),summary(decomp_monthly$random),summary(decomp_daily$random)))
Random_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts Random","Hourly ts Random ","Monthly ts Random","Daily ts Random"))
kable(Random_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
Results of all Random Forest models
Min. X1st.Qu. Median Mean X3rd.Qu. Max. NA.s Timeseries
-10.5726 -1.5341 -0.1784 -0.2216 1.0909 10.1293 52 Weekly ts Random
-5.8559 -1.3370 -0.2870 0.0028 0.4447 65.5608 30 Hourly ts Random
-18.3063 -5.2336 -0.4250 -0.4250 4.3835 17.4562 12 Monthly ts Random
-6.5524 -1.2683 0.1072 0.1072 1.4828 6.7669 364 Daily ts Random

Function for plotting Actual vs fitted Holt Winters forecast

library(ggplot2)
library(reshape)


HWplot<-function(ts_object,  n.ahead=4,  CI=.95,  error.ribbon='green', line.size=1){
  
  hw_object<-HoltWinters(ts_object)
  
  forecast<-predict(hw_object,  n.ahead=n.ahead,  prediction.interval=T,  level=CI)
  
  
  for_values<-data.frame(time=round(time(forecast),  3),  value_forecast=as.data.frame(forecast)$fit,  dev=as.data.frame(forecast)$upr-as.data.frame(forecast)$fit)
  
  fitted_values<-data.frame(time=round(time(hw_object$fitted),  3),  value_fitted=as.data.frame(hw_object$fitted)$xhat)
  
  actual_values<-data.frame(time=round(time(hw_object$x),  3),  Actual=c(hw_object$x))
  
  
  graphset<-merge(actual_values,  fitted_values,  by='time',  all=TRUE)
  graphset<-merge(graphset,  for_values,  all=TRUE,  by='time')
  graphset[is.na(graphset$dev),  ]$dev<-0
  
  graphset$Fitted<-c(rep(NA,  NROW(graphset)-(NROW(for_values) + NROW(fitted_values))),  fitted_values$value_fitted,  for_values$value_forecast)
  
  
  graphset.melt<-melt(graphset[, c('time', 'Actual', 'Fitted')], id='time')
  
  p<-ggplot(graphset.melt,  aes(x=time,  y=value)) + theme_bw()+geom_ribbon(data=graphset, aes(x=time, y=Fitted, ymin=Fitted-dev,  ymax=Fitted + dev),  alpha=.2,  fill=error.ribbon) + geom_line(aes(colour=variable), size=line.size) + geom_vline(aes(xintercept=max(actual_values$time)),  lty=2) + xlab('Time') + ylab('Value') + theme(legend.position='bottom') + scale_colour_hue('')
  return(p)
  
}

Holt winters forecasting

Forecast without trend and seasonality

houseweek_tsadjusted <- houseweek_ts- decomp_weekly$seasonal
plot(houseweek_tsadjusted)

## Lets check whether seasonality has been removed
plot(decompose(houseweek_tsadjusted))

## Fitting Holt Winters model
## beta is for trend and gamma is for seasonal. 
## Setting both parameters to FALSE , the Holtwinters does Exponential Smoothing
fit_weekly_hw <- HoltWinters(houseweek_ts,beta=FALSE, gamma=FALSE)
plot(fit_weekly_hw,ylim = c(0, 25))

## Forecast using HW model
forecast_weekly_hw <- forecast(fit_weekly_hw,h=25,level=c(10,25))
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")

## Zoom on the forecasted part
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")

Consider Trend and Seasonality for Holt Winters model - Weekly time series

## beta and gamma are not specified and it will be automatically computed by R
fit_weekly_hw <-HoltWinters(houseweek_ts)
plot(fit_weekly_hw,ylim = c(0, 25))

## Alpha, beta, gamma is choosen by R- 0.06,002 and 0.72 are choosem
## Can see that recent seasonality is given more importance

## Forecast using HW model
forecast_weekly_hw <- forecast(fit_weekly_hw,h=20,level=c(10,25))
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")

## Zoom on the forecasted part
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")

## Plot Actual vs fitted
HWplot(houseweek_ts, n.ahead = 25,error.ribbon = "red")

Holt Winters model - hourly time series

## beta and gamma are not specified and it will be automatically computed by R
fit_hourly_hw <-HoltWinters(house_hourly_ts)
plot(fit_hourly_hw,ylim = c(0, 40))

## Check the model parameters
fit_hourly_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = house_hourly_ts)
## 
## Smoothing parameters:
##  alpha: 0.01552357
##  beta : 0
##  gamma: 0.005444683
## 
## Coefficients:
##             [,1]
## a    0.589554228
## b   -0.002651094
## s1  -0.355729322
## s2  -0.391695318
## s3   0.615181543
## s4  -0.375690028
## s5  -0.394920621
## s6   0.528912531
## s7  -0.181360401
## s8  -0.182575888
## s9   1.605280905
## s10 -0.346558476
## s11 -0.366030801
## s12 -0.322279732
## s13 -0.127749163
## s14 -0.255403336
## s15 -0.342872768
## s16 -0.254728347
## s17 -0.419972044
## s18  0.818952922
## s19 -0.369233920
## s20 -0.251494958
## s21  0.500859684
## s22 -0.390115645
## s23 -0.395492273
## s24  1.368757276
## s25 -0.428930208
## s26 -0.408908514
## s27  1.404398062
## s28 -0.187099159
## s29 -0.091820787
## s30  0.695595446
## Alpha, beta, gamma is choosen by R- 0.015,0 and 0.005 are choosem

# Check Sum of squared error (SSE) metrics
fit_hourly_hw$SSE
## [1] 17187.7
## Forecast using HW model
forecast_hourly_hw <- forecast(fit_hourly_hw,h=72,level=c(10,25))
plot(forecast_hourly_hw, ylim = c(0, 50), ylab= "Watt-Hours", xlab="Time - Sub-meter 2", main="Holt Winters forecast for next 24 hrs- Submeter 2")

## Zoom on the forecasted part
plot(forecast_hourly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(25) ,main="Holt Winters forecast for next 24 hrs- Submeter 2",shadecols="pink")

# Plot residuals
plot(forecast_hourly_hw$residuals)

## Plot Actual vs fitted

HWplot(house_hourly_ts, n.ahead = 72,error.ribbon = "red")

Holt Winters model - Monthly time series

## beta and gamma are not specified and it will be automatically computed by R
fit_monthly_hw <-HoltWinters(sub1_monthly_total_ts)
plot(fit_monthly_hw,ylim = c(0, 100))

## Check the model parameters
fit_monthly_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = sub1_monthly_total_ts)
## 
## Smoothing parameters:
##  alpha: 0
##  beta : 0
##  gamma: 0.1578098
## 
## Coefficients:
##            [,1]
## a    45.6411943
## b    -0.3770460
## s1   10.6711466
## s2  -10.9136261
## s3   11.7957596
## s4   -2.2818286
## s5    0.3641721
## s6   12.5299996
## s7  -13.2888449
## s8  -21.6040477
## s9    0.2742031
## s10  -7.3688571
## s11   1.2534914
## s12  16.1428707
## Alpha, beta, gamma is choosen by R- 0,0 and 0.402 are choosen
## Seasonality is given importance

# Check Sum of squared error (SSE) metrics
fit_monthly_hw$SSE
## [1] 4546.921
## Forecast using HW model
forecast_monthly_hw <- forecast(fit_monthly_hw,h=10,level=c(10,25))
plot(forecast_monthly_hw, ylim = c(0, 100), ylab= "Watt-Hours", xlab="Time - Sub-meter 1", main="Holt Winters forecast for next 10  months- Submeter 1",shadecols="pink")

## Zoom on the forecasted part
plot(forecast_monthly_hw, ylim = c(0, 100), ylab= "Watt-Hours", xlab="Time - Sub-meter 1",start(2010) ,main="Holt Winters forecast for next 10  months- Submeter 1",shadecols="pink")

# Plot residuals
plot(forecast_monthly_hw$residuals)

## Plot Actual vs fitted

HWplot(sub1_monthly_total_ts, n.ahead = 10,error.ribbon = "red")

Holt Winters model - daily time series

## beta and gamma are not specified and it will be automatically computed by R
fit_daily_hw <-HoltWinters(sub3_daily_total_ts)
plot(fit_daily_hw,ylim = c(0, 20))

## Check the model parameters
fit_daily_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = sub3_daily_total_ts)
## 
## Smoothing parameters:
##  alpha: 0.1623532
##  beta : 0
##  gamma: 0.5957045
## 
## Coefficients:
##              [,1]
## a    11.737787919
## b     0.002221964
## s1    5.809819211
## s2   -1.306074613
## s3    2.831117234
## s4    3.432433932
## s5    1.141457474
## s6    1.710508408
## s7    3.114814669
## s8    2.908980865
## s9    1.459777774
## s10   1.309740454
## s11   1.454446656
## s12  -0.494744682
## s13   2.650137978
## s14   3.367313256
## s15   3.334487640
## s16   0.730608695
## s17   2.266600316
## s18   5.211499991
## s19   1.074540158
## s20  -1.740252860
## s21   1.625256802
## s22  -1.581444471
## s23   0.374599511
## s24  -1.193825317
## s25  -0.162197699
## s26   3.165873697
## s27  -0.200650910
## s28  -0.462969414
## s29   4.050399423
## s30  -1.502160981
## s31   0.940061792
## s32   1.220300568
## s33   1.385548655
## s34   0.132065413
## s35   3.140869797
## s36   4.764664094
## s37   0.145401804
## s38  -0.195036917
## s39   2.553436385
## s40   2.648989568
## s41  -1.879787516
## s42   0.131688808
## s43   1.697245811
## s44   2.134204270
## s45  -0.813876916
## s46   4.618181089
## s47  -2.306859827
## s48  -3.973281144
## s49  -2.557548924
## s50  -1.143501645
## s51  -4.549561305
## s52   0.207783214
## s53  -1.305652680
## s54  -0.414778249
## s55   4.085944798
## s56   0.469505410
## s57  -4.801257101
## s58  -4.938980592
## s59  -1.602826482
## s60  -3.115781892
## s61  -4.888067968
## s62  -5.880328075
## s63  -3.471379278
## s64  -2.557201783
## s65  -3.670906665
## s66  -1.442683844
## s67  -0.810916137
## s68  -0.125395642
## s69  -3.743176172
## s70  -1.430080150
## s71   0.703751782
## s72   2.244658321
## s73  -2.546404092
## s74   5.041981565
## s75  -2.979300809
## s76  -0.862733276
## s77   1.060308305
## s78   2.061826476
## s79  -2.359738720
## s80  -1.669250716
## s81   1.850947705
## s82   0.305420386
## s83   0.939955425
## s84  -0.177067203
## s85   2.413512520
## s86  -0.574893641
## s87   0.905154328
## s88   2.446987194
## s89   4.171465490
## s90   1.442256918
## s91   1.872440898
## s92   2.089025165
## s93   2.921756757
## s94  -1.492267723
## s95   3.882573927
## s96   2.588499821
## s97   0.072538250
## s98   0.289365637
## s99   3.717797534
## s100  2.652583650
## s101  1.078031365
## s102  6.469433805
## s103  4.360944537
## s104  0.761337215
## s105  0.867715733
## s106 -0.727024092
## s107  2.377659667
## s108 -0.096555383
## s109  2.154857717
## s110  2.254648463
## s111  2.731841272
## s112 -1.009912893
## s113  3.449803408
## s114 -1.848537817
## s115 -1.454283148
## s116  1.965359265
## s117 -1.712528138
## s118 -3.718975291
## s119 -3.093082644
## s120 -0.831205277
## s121  0.248743221
## s122 -2.549376953
## s123 -2.513072242
## s124 -4.373293906
## s125 -2.055665043
## s126 -0.388478379
## s127 -0.460134762
## s128 -3.144371934
## s129  2.987921412
## s130  0.651216507
## s131  0.416360659
## s132 -0.986461620
## s133  2.279062728
## s134 -1.044111608
## s135  0.802137039
## s136  0.569282939
## s137  0.724803764
## s138 -1.104390688
## s139  0.176630724
## s140 -0.103840025
## s141  1.111927763
## s142  1.150295506
## s143 -2.371757495
## s144  4.390894971
## s145  2.092043590
## s146 -1.603378348
## s147 -0.016647006
## s148  2.118347445
## s149 -1.297934755
## s150  0.217866409
## s151  1.472655637
## s152 -0.837434110
## s153  2.168171991
## s154 -1.219683602
## s155  1.166425462
## s156 -2.452565574
## s157  1.367359002
## s158 -0.711236419
## s159  0.763453320
## s160  0.524373853
## s161  0.031714038
## s162  0.221884935
## s163  0.137012685
## s164 -2.113666146
## s165 -4.396906714
## s166 -0.225045763
## s167  4.351504727
## s168  0.202814270
## s169  2.064188626
## s170  0.904687500
## s171  1.760599246
## s172  1.861392195
## s173  3.588204972
## s174  1.304315992
## s175  4.544627404
## s176  1.596133168
## s177  2.193467989
## s178  2.291434673
## s179  0.251414812
## s180 -1.027586630
## s181 -1.838152629
## s182 -1.022156409
## s183 -1.427452335
## s184 -1.519104683
## s185 -2.917126160
## s186 -1.643306129
## s187 -2.989540556
## s188 -2.256090251
## s189 -1.563813720
## s190 -2.943527387
## s191 -1.802665067
## s192 -1.848601516
## s193 -3.999360021
## s194 -3.569952175
## s195 -2.936339051
## s196 -3.218500460
## s197 -3.732785805
## s198 -3.539623281
## s199 -3.233261403
## s200 -3.803264639
## s201 -3.725547070
## s202 -2.337636751
## s203 -3.671098775
## s204 -2.424641617
## s205 -4.541066736
## s206 -4.888700724
## s207 -4.886560343
## s208 -3.455037088
## s209 -3.796114486
## s210 -3.212630375
## s211 -5.207222595
## s212 -3.939459882
## s213 -2.361209677
## s214 -4.458908691
## s215 -3.825826312
## s216 -3.772812067
## s217 -3.677607312
## s218 -2.010652551
## s219 -2.792524390
## s220 -2.398830846
## s221 -2.381568838
## s222 -2.281054550
## s223  0.062555598
## s224 -2.595347280
## s225 -1.910075463
## s226 -1.594855172
## s227  1.273635659
## s228 -1.540582795
## s229 -0.540857358
## s230 -1.963028407
## s231 -1.531470017
## s232 -1.596915809
## s233 -1.679644636
## s234 -1.006259864
## s235 -1.812458006
## s236 -0.628165562
## s237 -2.581782521
## s238 -1.316421187
## s239 -3.450461606
## s240 -1.746208601
## s241  0.022070573
## s242 -1.384813255
## s243 -0.763038949
## s244 -0.652429639
## s245  2.095307907
## s246 -2.398442918
## s247  1.630348861
## s248  1.138082159
## s249 -0.975556931
## s250  0.755073784
## s251  0.223256150
## s252  0.905546674
## s253 -1.155274002
## s254  0.646331111
## s255  0.763376955
## s256 -1.559445897
## s257  0.345246719
## s258 -2.195768687
## s259 -0.007394597
## s260 -0.853732553
## s261 -1.567812282
## s262  0.897315364
## s263  0.969209131
## s264 -0.281687060
## s265 -2.278662930
## s266 -0.434527419
## s267 -1.052452093
## s268 -0.448507112
## s269  0.644892570
## s270 -3.001887766
## s271  1.130842498
## s272 -1.267321697
## s273 -1.352969565
## s274 -1.833124279
## s275  1.933969781
## s276  0.748743645
## s277 -1.630079330
## s278  0.280890558
## s279 -1.426738752
## s280 -0.217408542
## s281 -0.259088920
## s282  1.387010807
## s283  1.102670865
## s284 -0.816436178
## s285  0.404649526
## s286 -1.215457753
## s287 -0.715538774
## s288 -0.793610755
## s289  0.199675645
## s290  2.650472398
## s291  0.185978133
## s292 -0.166379628
## s293  0.645343541
## s294 -1.089764574
## s295  2.488717218
## s296  1.219991163
## s297  2.440958394
## s298  3.128070329
## s299 -2.323145579
## s300 -2.863367280
## s301 -4.109612030
## s302 -5.342051105
## s303 -3.601319911
## s304 -0.204963528
## s305 -1.726728596
## s306 -3.303607647
## s307 -5.094139617
## s308 -1.455637000
## s309 -3.051870447
## s310 -0.150181405
## s311 -2.149266876
## s312  0.991623665
## s313 -0.949078177
## s314  3.117949146
## s315  0.234028908
## s316  2.245681816
## s317  0.882834700
## s318  5.416831475
## s319  0.702822316
## s320  2.484051934
## s321  3.571990763
## s322  2.941454753
## s323  2.065234070
## s324  2.391887162
## s325  1.484811005
## s326  3.299355965
## s327  0.261229159
## s328  1.269079006
## s329  1.693399847
## s330  1.095677399
## s331  2.800429966
## s332  3.149241387
## s333  0.357664583
## s334  3.197551502
## s335  3.424832946
## s336  3.354882896
## s337  1.409328176
## s338  0.808783700
## s339  3.021700544
## s340  3.166743275
## s341  1.212136450
## s342  4.916253596
## s343  0.549488737
## s344  3.712317736
## s345  1.966150000
## s346 -1.063865870
## s347  2.496886446
## s348  1.174329412
## s349  2.525320401
## s350  0.319707267
## s351  3.733156068
## s352  2.054241167
## s353  3.532174933
## s354  0.970970787
## s355  2.741084646
## s356  6.216555916
## s357  2.602401976
## s358  4.024252650
## s359  2.682054596
## s360  2.141569349
## s361  0.911566882
## s362  4.015617219
## s363  0.500741078
## s364  1.282279868
## s365  5.225810959
## Alpha, beta, gamma is choosen by R- 0.16,0 and 0.59 are choosen
## Seasonality is given importance

# Check Sum of squared error (SSE) metrics
fit_daily_hw$SSE
## [1] 7603.579
## Forecast using HW model
forecast_daily_hw <- forecast(fit_daily_hw,h=150,level=c(10,25))
plot(forecast_daily_hw, ylim = c(0, 25), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for next 150 days- Submeter 3",shadecols="pink")

## Zoom on the forecasted part
plot(forecast_daily_hw, ylim = c(0, 25), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for next 150 days- Submeter 3",shadecols="pink")

# Plot residuals
plot(forecast_daily_hw$residuals)

## Plot Actual vs fitted
HWplot(sub3_daily_total_ts, n.ahead = 150,error.ribbon = "red")

Summary:

Evidence has been found to support client’s claim that they have not been occupying the residence in the Summer of 2008. From our analysis, it is found that the client has been away from Aug 9th-28th 2008 as there is no activity in Kitchen/ laundry and Submeter 3- AC and Water heater is bare minimum. This is quite different from Aug 2007/2009 and compared to other months.

Suggestions for saving energy

  1. During the summer, heat and humidity-generating activities such as cooking, or dishwashing should be done at dawn or dusk to avoid excess heat creation.
  2. In hot weather, raise your thermostat to 80 degrees or higher if you are leaving your home for more than four hours.
  3. Turn down electric and gas heaters when you are on vacation.
  4. Air dry your dishes instead of using your dishwasher’s heated drying setting and only run your dishwasher when it is fully loaded.
  5. Turn off the lights when you leave a room to save both energy and money.
  6. By switching to high-efficiency air conditioners, you would be reducing your air conditioning usage by up to 50%. To maintain your energy savings, regularly clean and replace your unit’s filters.
  7. Avoid using hot water whenever possible. Doing your laundry on either the cold or the warm cycle will save you a tremendous amount of electricity.
  8. Ensure your home is properly insulated and sealed by identifying where your home has leaks