The data that we have is the electric power consumption in one French household with a one-minute sampling from Dec 2006 to Nov 2010. We have been contacted by a law firm that needs help analyzing archived data records for their client. The client claims to have not been occupying a specific residence at the time of an undisclosed event during the Summer of 2008.
Our goal is to perform an in-depth analysis of the power consumption dataset via data visualization , time series regression modeling and provide evidence to law firm to support client’s claim. Also provide the client with useful correlations or predictions and five energy savings suggestions based on insights that we glean from our analysis.
The data was gathered in a single household located in Sceaux, France between December 2006 and November 2010 (47 months).It contains over 2 million measurements of electric power consumption from 3 sub-meters in one house with a one-minute sampling rate over a period of almost 4 years.
Source data description
Sub-meters info:
library(readr)
library(dplyr)
library(RMySQL)
library(lubridate)
library(ggplot2)
library(fpp2)
library(tidyr)
library(devtools)
library(plotly)
library(tidyverse)
library(ggthemes)
library(extrafont)
library(reshape2)
library(kableExtra)
### Establish database connection
con = dbConnect(MySQL(), user='deepAnalytics', password='Sqltask1234!', dbname='dataanalytics2018', host='data-analytics-2018.cbrosir2cswx.us-east-1.rds.amazonaws.com')
## List the tables in the database
dbListTables(con)
## [1] "iris" "yr_2006" "yr_2007" "yr_2008" "yr_2009" "yr_2010"
## Get the column names of the table
dbListFields(con,'yr_2006')
## [1] "id" "Date" "Time"
## [4] "Global_active_power" "Global_reactive_power" "Global_intensity"
## [7] "Voltage" "Sub_metering_1" "Sub_metering_2"
## [10] "Sub_metering_3"
## Select required columns from all the tables)
yr_2006 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2006")
yr_2007 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2007")
yr_2008 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2008")
yr_2009 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2009")
yr_2010 <-dbGetQuery(con,"SELECT Date,Time,Sub_metering_1,Sub_metering_2,Sub_metering_3 FROM yr_2010")
## Check the structure of dataframe
str(yr_2006)
str(yr_2007)
str(yr_2008)
str(yr_2009)
str(yr_2010)
## Check head and tail of all the dataframes
head(yr_2006) # Starts from Dec 2016
tail(yr_2006)
head(yr_2010)
tail(yr_2010) # Ends on Nov 26,2011
## Check summary
summary(yr_2007)
summary(yr_2008)
summary(yr_2009)
submeter_data<- bind_rows(yr_2007, yr_2008, yr_2009)
Date_minutes_count <- submeter_data %>% count(Date)
Date_minutes_missing <-Date_minutes_count %>% filter(n !=1440)
missing_data_freq <-data.frame(table(Date_minutes_missing$n))
missing_data <- Date_minutes_missing %>% filter (n <1000)
kable(missing_data,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
| Date | n |
|---|---|
| 2007-04-28 | 21 |
| 2007-04-30 | 576 |
| 2009-06-13 | 30 |
| 2009-06-15 | 985 |
| 2009-08-13 | 549 |
Each date should have 1440 observations (24 hrs*60 mins) as the data is taken every minute.The source data is checked for nulls and 53 dates have missing data. Out of 53 dates, 40 dates have less than 5 minutes of data missing per day. 7 dates have less than 2 hours of data missing per day.Remaining 5 dates have more than 2 hours of data missing per day. Since the missing data is negligible,imputation is not done.
### Create a new column with Date and Time combined
submeter_data<-cbind(submeter_data,paste(submeter_data$Date,submeter_data$Time), stringsAsFactors=FALSE)
colnames(submeter_data)[6]<- "DateTime"
## Reordering the datetime column as 1st column
submeter_data <- submeter_data[,c(6,1,2,3,4,5)]
## Convert the DateTime column from characters to date class in R to make it as a time series data
submeter_data$DateTime <- as.POSIXct(submeter_data$DateTime, "%Y/%m/%d %H:%M:%S")
attr(submeter_data$DateTime, "tzone") <- "Europe/Paris"
str(submeter_data)
## 'data.frame': 1569894 obs. of 6 variables:
## $ DateTime : POSIXct, format: "2007-01-01 01:00:00" "2007-01-01 01:01:00" ...
## $ Date : chr "2007-01-01" "2007-01-01" "2007-01-01" "2007-01-01" ...
## $ Time : chr "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
## $ Sub_metering_1: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Sub_metering_2: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Sub_metering_3: num 0 0 0 0 0 0 0 0 0 0 ...
### Drop date and time columns and create new date and time as timezone has changed
submeter_data$Date <-NULL
submeter_data$Time <-NULL
### Extract date, time,year, quarter, month, day, hour, minute etc for data analysis.
submeter_data$Date <- date(submeter_data$DateTime)
submeter_data$Time <- format(submeter_data$DateTime,'%H:%M:%S')
submeter_data$year <- year(submeter_data$DateTime)
submeter_data$month <- month(submeter_data$DateTime,label=TRUE)
submeter_data$quarter <- quarter(submeter_data$Date,with_year = TRUE)
submeter_data$week <- isoweek(submeter_data$Date)
submeter_data$day <- day(submeter_data$Date)
submeter_data$hour <- as.numeric(format(submeter_data$DateTime,'%H'))
submeter_data$dayname <- wday(submeter_data$DateTime,label=TRUE)
submeter_data$year_day <-yday(submeter_data$DateTime)
submeter_data$minute <-minute(submeter_data$DateTime)
## Filter 2007,2008,2009 data again as some Dec 31st 2009 rows have become 2010 Jan 1
submeter_data <-submeter_data %>% filter(year %in% c(2007,2008,2009))
## Understand the final data
summary(submeter_data)
## DateTime Sub_metering_1 Sub_metering_2
## Min. :2007-01-01 01:00:00 Min. : 0.000 Min. : 0.000
## 1st Qu.:2007-10-03 08:24:15 1st Qu.: 0.000 1st Qu.: 0.000
## Median :2008-07-01 21:35:30 Median : 0.000 Median : 0.000
## Mean :2008-07-02 03:24:05 Mean : 1.159 Mean : 1.343
## 3rd Qu.:2009-03-31 13:47:45 3rd Qu.: 0.000 3rd Qu.: 1.000
## Max. :2009-12-31 23:59:00 Max. :82.000 Max. :78.000
##
## Sub_metering_3 Date Time year
## Min. : 0.000 Min. :2007-01-01 Length:1569834 Min. :2007
## 1st Qu.: 0.000 1st Qu.:2007-10-03 Class :character 1st Qu.:2007
## Median : 1.000 Median :2008-07-01 Mode :character Median :2008
## Mean : 6.216 Mean :2008-07-01 Mean :2008
## 3rd Qu.:17.000 3rd Qu.:2009-03-31 3rd Qu.:2009
## Max. :31.000 Max. :2009-12-31 Max. :2009
##
## month quarter week day hour
## Oct :134054 Min. :2007 Min. : 1.00 Min. : 1.00 Min. : 0.0
## May :133914 1st Qu.:2007 1st Qu.:13.00 1st Qu.: 8.00 1st Qu.: 5.0
## Jan :133856 Median :2008 Median :27.00 Median :16.00 Median :12.0
## Dec :133846 Mean :2008 Mean :26.62 Mean :15.71 Mean :11.5
## Jul :133783 3rd Qu.:2009 3rd Qu.:40.00 3rd Qu.:23.00 3rd Qu.:18.0
## Mar :133736 Max. :2009 Max. :53.00 Max. :31.00 Max. :23.0
## (Other):766645
## dayname year_day minute
## Sun:221579 Min. : 1.0 Min. : 0.00
## Mon:224448 1st Qu.: 91.0 1st Qu.:14.25
## Tue:226049 Median :184.0 Median :30.00
## Wed:225980 Mean :183.4 Mean :29.50
## Thu:225184 3rd Qu.:275.0 3rd Qu.:44.00
## Fri:224634 Max. :366.0 Max. :59.00
## Sat:221960
nrow(submeter_data)
## [1] 1569834
sum(submeter_data$Sub_metering_1)
## [1] 1819989
sum(submeter_data$Sub_metering_2)
## [1] 2108410
sum(submeter_data$Sub_metering_3)
## [1] 9757744
sub_monthly_total <-submeter_data %>% group_by(year,month)%>%
summarise(sub3_monthly_total= sum(Sub_metering_3),
sub2_monthly_total= sum(Sub_metering_2),
sub1_monthly_total= sum(Sub_metering_1))
sub_weekly_total <-submeter_data %>% group_by(year,week)%>%
summarise(sub3_weekly_total= sum(Sub_metering_3),
sub2_weekly_total= sum(Sub_metering_2),
sub1_weekly_total= sum(Sub_metering_1))
sub_daily_total <-submeter_data %>% group_by(year,month,week,Date,day)%>%
summarise(sub3_daily_total= sum(Sub_metering_3),
sub2_daily_total= sum(Sub_metering_2),
sub1_daily_total= sum(Sub_metering_1))
sub_hourly_total <-submeter_data %>% group_by(year,Date,week,dayname,hour)%>%
summarise(sub3_hourly_total= sum(Sub_metering_3),
sub2_hourly_total= sum(Sub_metering_2),
sub1_hourly_total= sum(Sub_metering_1))
Meter_data <- submeter_data %>%
gather(All_Meter, Watt_hr, "Sub_metering_1", "Sub_metering_2", "Sub_metering_3")
Let’s get the proportion of total consumption by each submeter
Meter_data %>%
group_by(year,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(year),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Year", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energy consumption by each submeter")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
filter (year==2009) %>%
group_by(month,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Months", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energy consumption by each submeter/ month in 2009")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
group_by(week, All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=week,y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Weeks", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energyconsumption by each submeter/ Week in 3 years")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
#### Compare Submeters usage at diff quarters in 2009
Meter_data %>%
filter(year==2009) %>%
group_by(quarter, All_Meter) %>% summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(quarter),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Quarters", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energy consumption by each submeter/ Quarter in 2009")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
# filter(Date=="2009-05-15") %>%
group_by(hour, All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(hour),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Time in a day", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energy by each submeter/ hour in 3 years")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
# filter(Date=="2009-05-15") %>%
group_by(dayname, All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(dayname),y=Total_usage_kwh,group=All_Meter,fill=All_Meter))+
geom_col(color="black")+ theme_bw()+
labs(x="Time in a day", y="Total Energy usage in Kwh")+
ggtitle ("Proportion of Total Energy by each submeter/ day in 3 years")+
theme(text = element_text(size = 12))+
scale_fill_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Let’s do comparison of total Submeter usage by each submeter - Across months of all years - Across weeks of all years - Across days of all years - Across hours of all years
Meter_data %>%
#filter(year==2007)%>%
group_by(month,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
geom_line(size=1.25)+ theme_classic()+
labs(x="Months", y="Total Energy usage in Kwh")+
ggtitle ("Energy consumed by each submeter across months in 3 years")+
theme(text = element_text(size = 12))+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
# filter(year==2009)%>%
group_by(week,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=week,y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
geom_line(size=1.25)+ theme_classic()+
labs(x="Weeks", y="Total Energy usage in Kwh")+
ggtitle ("Energy consumed by each submeter across weeks in 3 years")+
theme(text = element_text(size = 12))+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
#filter(year==2007)%>%
group_by(dayname,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(dayname),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
geom_line(size=1.25)+ theme_classic()+
labs(x="Days of the week", y="Total Energy usage in Kwh")+
ggtitle ("Energy consumed by each submeter across days in 3 years")+
theme(text = element_text(size = 12))+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
#filter(year==2007)%>%
group_by(hour,All_Meter) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=hour,y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
geom_line(size=1.25)+ theme_classic()+
labs(x="Hours of the day", y="Total Energy usage in Kwh")+
ggtitle ("Energy consumed by each submeter across hours in 3 years")+
theme(text = element_text(size = 12))+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
Meter_data %>%
group_by(month,All_Meter,year) %>%
summarise(Total_usage_kwh=sum(Watt_hr)/1000)%>%
ggplot(aes(x=factor(month),y=Total_usage_kwh,group=All_Meter,color=All_Meter))+
facet_grid(rows=vars(year))+
geom_line(size=1.25)+ theme_classic()+
labs(x="Months", y="Total Energy usage in Kwh")+
ggtitle ("Energy consumed by each submeter across months in 3 years")+
theme(text = element_text(size = 12))+
theme_bw()+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
sub_monthly_total %>%
ggplot( aes(x=month,y=sub1_monthly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of monthly total Submeter 1 readings(Kwh) across different years ")+
labs(x="Months", y="Total Submeter 1/ month in kwh")+
scale_color_discrete(name = 'Year')
sub_weekly_total %>%
ggplot( aes(x=week,y=sub1_weekly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of weekly total Submeter 1 readings(Kwh) across different years ")+
labs(x="Weeks", y="Total Submeter 1/ Week in kwh")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,year) %>% summarise(Sub1total_day =sum(Sub_metering_1))%>%
ggplot(aes(x=dayname,y=Sub1total_day,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1)+theme_bw()+
labs(x="Days of the week",y="Submeter total/ year for that day")+
ggtitle("Analyzing Submeter1-usage to find preferred day for dishwasher")+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,hour) %>% summarise(Sub1total_hour =sum(Sub_metering_1))%>%
ggplot(aes(x=hour,y=Sub1total_hour,color=factor(dayname),group=(dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Analyzing the time of the day kitchen appliances are mostly used")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))
submeter_data %>% filter(dayname %in% c("Wed","Sun")) %>% group_by(dayname,hour,year) %>% summarise(Sub1total_hour =sum(Sub_metering_1))%>%
ggplot(aes(x=hour,y=Sub1total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Comparing weekday vs weekend usage of kitchen appliances")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
scale_colour_brewer(palette = "Set2")
#### Get typical submeter 1 usage per day for summer weeks
summer_weeks_sub1 <- submeter_data %>%
filter( week >=23 & week<=30)%>%
group_by(dayname)%>%
summarise(summer_sub1_usage= sum(Sub_metering_1)/1000)
#### Get typical submeter1 usage per day for winter weeks
winter_weeks_sub1 <- submeter_data %>%
filter( week >=1 & week<=8)%>%
group_by(dayname)%>%
summarise(winter_sub1_usage= sum(Sub_metering_1)/1000)
ggplot(data=summer_weeks_sub1)+
geom_line(aes(x=dayname, y=summer_sub1_usage, group=1,colour='Summer'),size=1.25) +
geom_line(data = winter_weeks_sub1, aes(x=dayname, y=winter_sub1_usage, group=1, color='Winter'),size=1.25)+
scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
labs(colour='Season') + theme_classic()+
labs(x='Day of the Week', y='Submeter usage in kWh') +
ggtitle('Submeter1 usage in Summer vs Winter') +
theme(panel.border=element_rect(colour='black', fill=NA))+
theme(axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank()) +
theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
text=element_text(family="Tahoma"),
axis.text.x=element_text(colour="black", size = 10),
axis.text.y=element_text(colour="black", size = 10),
legend.key=element_rect(fill="white", colour="white"))
sub_monthly_total %>%
ggplot( aes(x=month,y=sub2_monthly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of monthly total Submeter 2 readings(Kwh) across different years ")+
labs(x="Months", y="Total Submeter 2/ month in kwh")+
scale_color_discrete(name = 'Year')
sub_weekly_total %>%
ggplot( aes(x=week,y=sub2_weekly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of weekly total Submeter 2 readings(Kwh) across different years ")+
labs(x="Weeks", y="Total Submeter 2/ Week in kwh")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,year) %>% summarise(Sub2total_day =sum(Sub_metering_2))%>%
ggplot(aes(x=dayname,y=Sub2total_day,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1)+theme_bw()+
labs(x="Days of the week",y="Submeter2 total/ year for that day")+
ggtitle("Analyzing Submeter2-usage to find preferred day for Laundry ")+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,hour) %>% summarise(Sub2total_hour =sum(Sub_metering_2))%>%
ggplot(aes(x=hour,y=Sub2total_hour,color=factor(dayname),group=(dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Analyzing the time of the day preferred for laundry")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))
submeter_data %>% filter(dayname %in% c("Wed","Sun")) %>%group_by(dayname,hour,year) %>% summarise(Sub2total_hour =sum(Sub_metering_2))%>%
ggplot(aes(x=hour,y=Sub2total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Comparing weekday vs weekend for laundry")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
scale_colour_brewer(palette = "Set2")
#### Get typical submeter 2 usage per day for summer weeks
summer_weeks_sub2 <- submeter_data %>%
filter(week >=23 & week<=30)%>%
group_by(dayname)%>%
summarise(summer_sub2_usage= sum(Sub_metering_2)/1000)
#### Get typical submeter2 usage per day for winter weeks
winter_weeks_sub2 <- submeter_data %>%
filter( week >=1 & week<=8)%>%
group_by(dayname)%>%
summarise(winter_sub2_usage= sum(Sub_metering_2)/1000)
ggplot(data=summer_weeks_sub2)+
geom_line(aes(x=dayname, y=summer_sub2_usage, group=1,colour='Summer'),size=1.25) +
geom_line(data = winter_weeks_sub2, aes(x=dayname, y=winter_sub2_usage, group=1, color='Winter'),size=1.25)+
scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
labs(colour='Season') + theme_classic()+
labs(x='Day of the Week', y='Submeter usage in kWh') +
ggtitle('Submeter2 usage in Summer vs Winter') +
theme(panel.border=element_rect(colour='black', fill=NA))+
theme(axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank()) +
theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
text=element_text(family="Tahoma"),
axis.text.x=element_text(colour="black", size = 10),
axis.text.y=element_text(colour="black", size = 10),
legend.key=element_rect(fill="white", colour="white"))
sub_monthly_total %>%
ggplot( aes(x=month,y=sub3_monthly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of monthly total Submeter 3 readings(Kwh) across different years ")+
labs(x="Months", y="Total Submeter 3/ month in kwh")+
scale_color_discrete(name = 'Year')
sub_weekly_total %>%
ggplot( aes(x=week,y=sub3_weekly_total/1000,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1.25)+theme_classic()+
ggtitle("Comparison of weekly total Submeter 3 readings(Kwh) across different years ")+
labs(x="Weeks", y="Total Submeter 3/ Week in kwh")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 5))+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,year) %>% summarise(Sub3total_day =sum(Sub_metering_3))%>%
ggplot(aes(x=dayname,y=Sub3total_day,group=factor(year)))+
geom_line(aes(color=factor(year)),size=1)+theme_bw()+
labs(x="Days of the week",y="Submeter3 total/ year for that day")+
ggtitle("Analyzing Submeter3- to check AC and heater usage ")+
scale_color_discrete(name = 'Year')
submeter_data %>% group_by(dayname,hour) %>% summarise(Sub3total_hour=sum(Sub_metering_3)/1000)%>%
ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),group=(dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total in Kwh/ hour ")+
ggtitle("Analyzing the time of the day AC and waterheater is mostly used")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))
submeter_data %>% filter(dayname %in% c("Sat","Sun")) %>%group_by(dayname,hour,year) %>% summarise(Sub3total_hour =sum(Sub_metering_3))%>%
ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Analyzing the time of the weekend AC and waterheater is mostly used")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))
submeter_data %>% filter(dayname %in% c("Tue","Sat")) %>%group_by(dayname,hour,year) %>% summarise(Sub3total_hour =sum(Sub_metering_3))%>%
ggplot(aes(x=hour,y=Sub3total_hour,color=factor(dayname),shape=factor(year),group=interaction(year,dayname)))+
geom_line(size=1)+theme_bw()+geom_point()+
labs(x="Time of the day",y="Submeter total/ hour ")+
ggtitle("Comparing weekday and weekend for AC and waterheater usage")+
scale_x_continuous(breaks = scales::pretty_breaks(n = 12))+
scale_colour_brewer(palette = "Set2")
#### Get typical submeter 3 usage per day for summer weeks
summer_weeks <- submeter_data %>%
filter( week >=23 & week<=30)%>%
group_by(dayname)%>%
summarise(summer_sub3_usage= sum(Sub_metering_3)/1000)
#### Get typical submeter3 usage per day for winter weeks
winter_weeks <- submeter_data %>%
filter( week >=1 & week<=8)%>%
group_by(dayname)%>%
summarise(winter_sub3_usage= sum(Sub_metering_3)/1000)
ggplot(data=summer_weeks)+
geom_line(aes(x=dayname, y=summer_sub3_usage, group=1,colour='Summer'),size=1.25) +
geom_line(data = winter_weeks, aes(x=dayname, y=winter_sub3_usage, group=1, color='Winter'),size=1.25)+
scale_colour_manual(values=c('Winter'='blue', 'Summer'='red')) +
labs(colour='Season') + theme_classic()+
labs(x='Day of the Week', y='Submeter usage in kWh') +
ggtitle('Submeter3 usage in Summer vs Winter') +
theme(panel.border=element_rect(colour='black', fill=NA))+
theme(axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"), panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank()) +
theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
text=element_text(family="Tahoma"),
axis.text.x=element_text(colour="black", size = 10),
axis.text.y=element_text(colour="black", size = 10),
legend.key=element_rect(fill="white", colour="white"))
In order to find out whether the client’s claim of not occupying the residence for a certain period of time in Summer of 2008 is valid, the weekly submeter 3 total is compared against the summer weeks of 2007 and 2009 to understand typical power usage.
sub_weekly_total%>% filter(week>=23 & week <=37) %>%
ggplot( aes(x = week, y = sub3_weekly_total)) +
facet_wrap(~year)+
geom_col(aes(fill=factor(year)))+ theme_classic()+
labs(x="Weeks",y="Submeter3 usage/Week")+
ggtitle("Weekly Submeter3 usage across years")+
scale_fill_discrete(name = 'Year')
It can be noted that weeks 32-34 of 2008 have unusually lower weekly total. They correspond to Aug 2008.
Meter_data %>%
filter(year==2008 & week>=31 &week <=36) %>%
group_by(All_Meter,day,year) %>%
summarise(Weekly_usage=sum(Watt_hr)/1000)%>%
ggplot(aes(x = day, y = Weekly_usage)) +
geom_line(aes(color=factor(All_Meter)),size=1.25)+ theme_classic()+
xlab("August Month")+
ylab("Submeter usage/day in Kwh")+
ggtitle ("Submeter usage in Summer- August 2008")+
scale_x_continuous(breaks=seq(1,31,4))+
scale_color_discrete(name = 'Submeter: ', labels = c('Kitchen', 'Laundry', 'Water Heater & AC'))
houseMonth <- filter(submeter_data, year == 2008 & month=="Aug" & minute==0)
plot_ly(houseMonth, x = ~houseMonth$DateTime, y = ~houseMonth$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~houseMonth$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
add_trace(y = ~houseMonth$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
layout(title = "Power Consumption Aug 2008",
xaxis = list(title = "Time"),
yaxis = list (title = "Power (watt-hours)"))
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
The above plot shows the daily total of all 3 submeters for Aug 2008. It can be noted that there is a sudden drop in the total from Aug 9th till 28th of August. This proves that the Client’s claim is true.
To gain maximum information from Visualization, subsetting the data and adjusting the granularity is important. Another goal of subsetting is to focus on periods of time that highlights patterns of power usage. Granularity describes the frequency of observations within a time series dataset.
The power consumption changes during the course of the day, week, month. Therefore these 3 time periods can be considered to visualize the submeter data. Since the source data has data for every minute, we must reduce the granularity to gain information. It can be every 10 mins/ 30 mins/ 1 h/ 4hrs/ twice daily etc.
The following time period and granularity are chosen for plotting.
houseDay<- filter(submeter_data, year == 2008 & month == "Jan" & day == 9 & (minute == 0 | minute == 10 | minute == 20 | minute == 30 | minute == 40 | minute == 50))
plot_ly(houseDay, x = ~houseDay$DateTime, y = ~houseDay$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~houseDay$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
add_trace(y = ~houseDay$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
layout(title = "Power Consumption January 9th, 2008",
xaxis = list(title = "Time"),
yaxis = list (title = "Power (watt-hours)"))
houseWeek25 <- filter(submeter_data, year == 2007 & week == 30 &(minute==0|minute==30))
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations
plot_ly(houseWeek25, x = ~houseWeek25$DateTime, y = ~houseWeek25$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~houseWeek25$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
add_trace(y = ~houseWeek25$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
layout(title = "Power Consumption Week 30 July 2007",
xaxis = list(title = "Time"),
yaxis = list (title = "Power (watt-hours)"))
houseWeek50 <- filter(submeter_data, year == 2009 & week == 50 & minute==0)
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations
plot_ly(houseWeek50, x = ~houseWeek50$DateTime, y = ~houseWeek50$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~houseWeek50$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
add_trace(y = ~houseWeek50$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
layout(title = "Power Consumption Week 50 Dec 2009",
xaxis = list(title = "Time"),
yaxis = list (title = "Power (watt-hours)"))
houseMonthJan <- filter(submeter_data, year == 2008 & month=="Jan" & minute==0 & (hour == 0 | hour == 4 | hour==8|hour==12 |hour==16| hour==20))
## Plot sub-meter 1, 2 and 3 with title, legend and labels - All observations
plot_ly(houseMonthJan, x = ~houseMonthJan$DateTime, y = ~houseMonthJan$Sub_metering_1, name = 'Kitchen', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~houseMonthJan$Sub_metering_2, name = 'Laundry Room', mode = 'lines') %>%
add_trace(y = ~houseMonthJan$Sub_metering_3, name = 'Water Heater & AC', mode = 'lines') %>%
layout(title = "Power Consumption Jan 2008",
xaxis = list(title = "Time"),
yaxis = list (title = "Power (watt-hours)"))
Below are the 4 time series objects created using ts() function and visualization of the same.
houseweek_mon <- filter(submeter_data, dayname=="Mon" & hour == 20 & minute == 1)
houseweek_ts <-ts(houseweek_mon$Sub_metering_3, frequency=52, start=c(2007,1))
plot(houseweek_ts, xlab = "Weeks from 2007-2010", ylab = "Watt Hours", main = "Weekly time series of Submeter 3")+ theme_classic()
## NULL
## Fit tslm model
fit_weekly <- tslm(houseweek_ts ~ trend + season)
summary(fit_weekly)
##
## Call:
## tslm(formula = houseweek_ts ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.0000 -4.2882 -0.3333 2.0452 14.0452
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.84427 3.58071 5.542 2.27e-07 ***
## trend -0.02651 0.01277 -2.075 0.040430 *
## season2 -12.74608 5.25762 -2.424 0.017063 *
## season3 -12.71957 5.25686 -2.420 0.017272 *
## season4 -18.35972 5.25613 -3.493 0.000703 ***
## season5 -12.66655 5.25543 -2.410 0.017700 *
## season6 -18.30670 5.25476 -3.484 0.000725 ***
## season7 -12.61353 5.25412 -2.401 0.018141 *
## season8 -7.25368 5.25352 -1.381 0.170322
## season9 -18.22717 5.25294 -3.470 0.000759 ***
## season10 -12.20066 5.25240 -2.323 0.022133 *
## season11 -12.50749 5.25189 -2.382 0.019060 *
## season12 -12.48098 5.25141 -2.377 0.019298 *
## season13 -18.12114 5.25096 -3.451 0.000808 ***
## season14 -18.09463 5.25054 -3.446 0.000821 ***
## season15 -12.40145 5.25015 -2.362 0.020032 *
## season16 -11.70827 5.24979 -2.230 0.027880 *
## season17 -17.34843 5.24946 -3.305 0.001305 **
## season18 -17.65525 5.24917 -3.363 0.001079 **
## season19 -11.62874 5.24890 -2.215 0.028909 *
## season20 -11.60223 5.24867 -2.211 0.029261 *
## season21 -17.24239 5.24847 -3.285 0.001389 **
## season22 -6.21588 5.24830 -1.184 0.238970
## season23 -14.18937 5.24816 -2.704 0.008012 **
## season24 -17.49620 5.24805 -3.334 0.001188 **
## season25 -11.46969 5.24797 -2.186 0.031092 *
## season26 -7.10984 5.24792 -1.355 0.178420
## season27 -17.08333 5.24791 -3.255 0.001529 **
## season28 -17.39016 5.24792 -3.314 0.001268 **
## season29 -17.36365 5.24797 -3.309 0.001288 **
## season30 -17.00380 5.24805 -3.240 0.001605 **
## season31 -16.97729 5.24816 -3.235 0.001632 **
## season32 -17.28412 5.24830 -3.293 0.001354 **
## season33 -11.25761 5.24847 -2.145 0.034286 *
## season34 -17.23110 5.24867 -3.283 0.001400 **
## season35 -17.20459 5.24890 -3.278 0.001423 **
## season36 -16.84475 5.24917 -3.209 0.001771 **
## season37 -13.81824 5.24946 -2.632 0.009772 **
## season38 -11.12506 5.24979 -2.119 0.036459 *
## season39 -10.76522 5.25015 -2.050 0.042835 *
## season40 -11.40537 5.25054 -2.172 0.032110 *
## season41 -7.37886 5.25096 -1.405 0.162929
## season42 -16.68569 5.25141 -3.177 0.001957 **
## season43 -10.65918 5.25189 -2.030 0.044952 *
## season44 -16.96600 5.25240 -3.230 0.001657 **
## season45 -10.93949 5.25294 -2.083 0.039746 *
## season46 -16.91298 5.25352 -3.219 0.001714 **
## season47 -5.55314 5.25412 -1.057 0.293001
## season48 -16.85996 5.25476 -3.209 0.001774 **
## season49 -16.83345 5.25543 -3.203 0.001805 **
## season50 -13.80694 5.25613 -2.627 0.009921 **
## season51 -5.11377 5.25686 -0.973 0.332920
## season52 -11.08726 5.25762 -2.109 0.037363 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.871 on 104 degrees of freedom
## Multiple R-squared: 0.3831, Adjusted R-squared: 0.07461
## F-statistic: 1.242 on 52 and 104 DF, p-value: 0.1747
## Create the forecast for sub-meter 3. Forecast ahead 20 time periods
forecastfit_weekly <- forecast(fit_weekly, h=20,level=c(80,90))
## Plot the forecast for sub-meter 3.
plot(forecastfit_weekly, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Weeks", main="20 weekly Mondays forecast of Submeter3 readings in 2010")
plot(forecastfit_weekly, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Weeks", main="20 weeks forecast of submeter3 from lm model",start(c(2010)),shadecols="pink")
house_hourly <- submeter_data %>% filter(month=='Sep' & year==2008 &minute==0)
house_hourly_ts <-ts(house_hourly$Sub_metering_2, frequency=30)
plot(house_hourly_ts, xlab = "Hours", ylab = "Watt Hours", main = "Hourly time series of Sub-meter 2- Sep 2008")+ theme_classic()
## NULL
## Fit tslm model
fit_hourly <- tslm(house_hourly_ts ~ trend+season )
summary(fit_hourly)$r.squared
## [1] 0.032314
summary(fit_hourly)$adj.r.squared
## [1] -0.009820365
## Create the forecast for sub-meter 2.
forecastfit_hourly <- forecast(fit_hourly, h=72)
## Plot the forecast for sub-meter 3.
plot(forecastfit_hourly,ylim = c(0, 40), ylab= "Watt-Hours", xlab="Hours", main="Forecast of hourly time series of submeter2",shadecols = "pink")
plot(forecastfit_hourly,ylim = c(0, 40), ylab= "Watt-Hours", xlab="Hours", main="Forecast of hourly time series of submeter2",start(24),shadecols = "pink")
## Submeter 1 monthly time series
sub1_monthly_total <- submeter_data %>% group_by(year,month) %>% summarise(Totalpermonth= sum(Sub_metering_1)/1000)
sub1_monthly_total_ts <- ts(sub1_monthly_total$Totalpermonth,frequency=12,start=c(2007,1))
plot(sub1_monthly_total_ts, main="Monthly time series of Submeter 1 total",ylab="Submeter1 usage/month")
fit_monthly <- tslm(sub1_monthly_total_ts ~ trend + season)
forecast_monthly <- forecast(fit_monthly, h=10,level=c(80,90))
plot(forecast_monthly,ylim = c(0, 80), ylab= "Submeter total in Kwh", xlab="Months", main="Forecast total submeter1 for next 10 months with tslm model",shadecols="pink")
plot(forecast_monthly,ylim = c(0, 80), ylab= "Submeter total in Kwh", xlab="Months", main="Forecast total submeter1 for next 10 months with tslm model",start(2010),shadecols="pink")
## Submeter 3 daily series
sub3_daily_total <- submeter_data %>% group_by(year,year_day) %>% summarise(Totalperday= sum(Sub_metering_3)/1000)
sub3_daily_total_ts <- ts(sub3_daily_total$Totalperday,frequency=365,start=c(2007,1))
plot(sub3_daily_total_ts,main="Daily time series of Submeter 3 total",ylab="Submeter3 usage/day")
fit_daily<- tslm(sub3_daily_total_ts ~ trend + season)
forecast_daily <- forecast(fit_daily, h=150,level=c(80,90))
plot(forecast_daily,xlab="Days",ylab="Submeter usage/day",main="150 days Forecast of daily submeter3 total ")
plot(forecast_daily,xlab="Days",ylab="Submeter usage/day",main="150 days Forecast of daily submeter3 total ",start(2010),shadecols="pink")
Linear_model_results <- data.frame(rbind(accuracy(fit_weekly),accuracy(fit_hourly),accuracy(fit_monthly),accuracy(fit_daily)))
Linear_model_results ['R2'] <- data.frame(rbind(summary(fit_weekly)$r.squared,summary(fit_hourly)$r.squared,summary(fit_monthly)$r.squared, summary(fit_daily)$r.squared))
Linear_model_results ['Adjusted_R2'] <- data.frame(rbind(summary(fit_weekly)$adj.r.squared,summary(fit_hourly)$adj.r.squared,summary(fit_monthly)$adj.r.squared, summary(fit_daily)$adj.r.squared))
Linear_model_results['Model_desc'] <- data.frame(rbind("Forecast of Monday 8:01 pm every week","Forecast every hour for a month- submeter2","Forecast of monthly total Submeter1 for 10 months","Forecast of total submeter3 for 150 days" ))
Linear_model_results['Model_name'] <- data.frame(rbind("fit_weekly","fit_hourly","fit_monthly","fit_daily"))
Model_comparison <- Linear_model_results %>% select(Model_desc,Model_name, R2,Adjusted_R2, RMSE)
kable(Model_comparison,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
| Model_desc | Model_name | R2 | Adjusted_R2 | RMSE | |
|---|---|---|---|---|---|
| Training.set | Forecast of Monday 8:01 pm every week | fit_weekly | 0.3831 | 0.0746 | 5.5924 |
| Training.set.1 | Forecast every hour for a month- submeter2 | fit_hourly | 0.0323 | -0.0098 | 4.7615 |
| Training.set.2 | Forecast of monthly total Submeter1 for 10 months | fit_monthly | 0.5796 | 0.3602 | 9.4306 |
| Training.set.3 | Forecast of total submeter3 for 150 days | fit_daily | 0.5173 | 0.2753 | 2.4273 |
decomp_weekly <- decompose(houseweek_ts)
plot(decomp_weekly)
#### 2. Decompose the hourly time series - submeter2
decomp_hourly<- decompose(house_hourly_ts)
plot(decomp_hourly)
decomp_monthly <- decompose(sub1_monthly_total_ts)
plot(decomp_monthly)
decomp_daily <- decompose(sub3_daily_total_ts)
plot(decomp_daily)
Seasonal_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$seasonal), summary(decomp_hourly$seasonal),summary(decomp_monthly$seasonal),summary(decomp_daily$seasonal)))
Seasonal_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts seasonal","Hourly ts seasonal ","Monthly ts seasonal","Daily ts seasonal"))
kable(Seasonal_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
| Min. | X1st.Qu. | Median | Mean | X3rd.Qu. | Max. | Timeseries |
|---|---|---|---|---|---|---|
| -3.7014 | -3.1101 | -2.8120 | 0.0931 | 5.5053 | 14.6207 | Weekly ts seasonal |
| -0.9420 | -0.7246 | -0.4858 | 0.0000 | 0.9291 | 2.4059 | Hourly ts seasonal |
| -31.6092 | -5.1400 | 1.1778 | 0.0000 | 4.5897 | 19.1106 | Monthly ts seasonal |
| -7.0325 | -1.6698 | 0.1302 | -0.0033 | 1.6551 | 6.7177 | Daily ts seasonal |
Trend_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$trend), summary(decomp_hourly$trend),summary(decomp_monthly$trend),summary(decomp_daily$trend)))
Trend_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts trend","Hourly ts trend ","Monthly ts trend","Daily ts trend"))
kable(Trend_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
| Min. | X1st.Qu. | Median | Mean | X3rd.Qu. | Max. | NA.s | Timeseries |
|---|---|---|---|---|---|---|---|
| 1.8654 | 2.4808 | 3.0577 | 3.3295 | 3.8173 | 6.6635 | 52 | Weekly ts trend |
| 0.0167 | 0.2667 | 0.4167 | 0.9812 | 1.4333 | 4.0333 | 30 | Hourly ts trend |
| 46.0098 | 48.8336 | 50.4856 | 50.4622 | 52.6139 | 53.7758 | 12 | Monthly ts trend |
| 8.1182 | 8.6760 | 8.8914 | 8.8620 | 9.0618 | 9.7742 | 364 | Daily ts trend |
Random_decomposition_summary <-data.frame(rbind(summary(decomp_weekly$random), summary(decomp_hourly$random),summary(decomp_monthly$random),summary(decomp_daily$random)))
Random_decomposition_summary['Timeseries']<- data.frame(rbind("Weekly ts Random","Hourly ts Random ","Monthly ts Random","Daily ts Random"))
kable(Random_decomposition_summary,format = "html",caption= "Results of all Random Forest models",digits=4)%>% kable_styling(bootstrap_options = "striped", full_width = F)
| Min. | X1st.Qu. | Median | Mean | X3rd.Qu. | Max. | NA.s | Timeseries |
|---|---|---|---|---|---|---|---|
| -10.5726 | -1.5341 | -0.1784 | -0.2216 | 1.0909 | 10.1293 | 52 | Weekly ts Random |
| -5.8559 | -1.3370 | -0.2870 | 0.0028 | 0.4447 | 65.5608 | 30 | Hourly ts Random |
| -18.3063 | -5.2336 | -0.4250 | -0.4250 | 4.3835 | 17.4562 | 12 | Monthly ts Random |
| -6.5524 | -1.2683 | 0.1072 | 0.1072 | 1.4828 | 6.7669 | 364 | Daily ts Random |
library(ggplot2)
library(reshape)
HWplot<-function(ts_object, n.ahead=4, CI=.95, error.ribbon='green', line.size=1){
hw_object<-HoltWinters(ts_object)
forecast<-predict(hw_object, n.ahead=n.ahead, prediction.interval=T, level=CI)
for_values<-data.frame(time=round(time(forecast), 3), value_forecast=as.data.frame(forecast)$fit, dev=as.data.frame(forecast)$upr-as.data.frame(forecast)$fit)
fitted_values<-data.frame(time=round(time(hw_object$fitted), 3), value_fitted=as.data.frame(hw_object$fitted)$xhat)
actual_values<-data.frame(time=round(time(hw_object$x), 3), Actual=c(hw_object$x))
graphset<-merge(actual_values, fitted_values, by='time', all=TRUE)
graphset<-merge(graphset, for_values, all=TRUE, by='time')
graphset[is.na(graphset$dev), ]$dev<-0
graphset$Fitted<-c(rep(NA, NROW(graphset)-(NROW(for_values) + NROW(fitted_values))), fitted_values$value_fitted, for_values$value_forecast)
graphset.melt<-melt(graphset[, c('time', 'Actual', 'Fitted')], id='time')
p<-ggplot(graphset.melt, aes(x=time, y=value)) + theme_bw()+geom_ribbon(data=graphset, aes(x=time, y=Fitted, ymin=Fitted-dev, ymax=Fitted + dev), alpha=.2, fill=error.ribbon) + geom_line(aes(colour=variable), size=line.size) + geom_vline(aes(xintercept=max(actual_values$time)), lty=2) + xlab('Time') + ylab('Value') + theme(legend.position='bottom') + scale_colour_hue('')
return(p)
}
houseweek_tsadjusted <- houseweek_ts- decomp_weekly$seasonal
plot(houseweek_tsadjusted)
## Lets check whether seasonality has been removed
plot(decompose(houseweek_tsadjusted))
## Fitting Holt Winters model
## beta is for trend and gamma is for seasonal.
## Setting both parameters to FALSE , the Holtwinters does Exponential Smoothing
fit_weekly_hw <- HoltWinters(houseweek_ts,beta=FALSE, gamma=FALSE)
plot(fit_weekly_hw,ylim = c(0, 25))
## Forecast using HW model
forecast_weekly_hw <- forecast(fit_weekly_hw,h=25,level=c(10,25))
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")
## Zoom on the forecasted part
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")
## beta and gamma are not specified and it will be automatically computed by R
fit_weekly_hw <-HoltWinters(houseweek_ts)
plot(fit_weekly_hw,ylim = c(0, 25))
## Alpha, beta, gamma is choosen by R- 0.06,002 and 0.72 are choosem
## Can see that recent seasonality is given more importance
## Forecast using HW model
forecast_weekly_hw <- forecast(fit_weekly_hw,h=20,level=c(10,25))
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")
## Zoom on the forecasted part
plot(forecast_weekly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for 20 weeks- Submeter3",shadecols="pink")
## Plot Actual vs fitted
HWplot(houseweek_ts, n.ahead = 25,error.ribbon = "red")
## beta and gamma are not specified and it will be automatically computed by R
fit_hourly_hw <-HoltWinters(house_hourly_ts)
plot(fit_hourly_hw,ylim = c(0, 40))
## Check the model parameters
fit_hourly_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = house_hourly_ts)
##
## Smoothing parameters:
## alpha: 0.01552357
## beta : 0
## gamma: 0.005444683
##
## Coefficients:
## [,1]
## a 0.589554228
## b -0.002651094
## s1 -0.355729322
## s2 -0.391695318
## s3 0.615181543
## s4 -0.375690028
## s5 -0.394920621
## s6 0.528912531
## s7 -0.181360401
## s8 -0.182575888
## s9 1.605280905
## s10 -0.346558476
## s11 -0.366030801
## s12 -0.322279732
## s13 -0.127749163
## s14 -0.255403336
## s15 -0.342872768
## s16 -0.254728347
## s17 -0.419972044
## s18 0.818952922
## s19 -0.369233920
## s20 -0.251494958
## s21 0.500859684
## s22 -0.390115645
## s23 -0.395492273
## s24 1.368757276
## s25 -0.428930208
## s26 -0.408908514
## s27 1.404398062
## s28 -0.187099159
## s29 -0.091820787
## s30 0.695595446
## Alpha, beta, gamma is choosen by R- 0.015,0 and 0.005 are choosem
# Check Sum of squared error (SSE) metrics
fit_hourly_hw$SSE
## [1] 17187.7
## Forecast using HW model
forecast_hourly_hw <- forecast(fit_hourly_hw,h=72,level=c(10,25))
plot(forecast_hourly_hw, ylim = c(0, 50), ylab= "Watt-Hours", xlab="Time - Sub-meter 2", main="Holt Winters forecast for next 24 hrs- Submeter 2")
## Zoom on the forecasted part
plot(forecast_hourly_hw, ylim = c(0, 20), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(25) ,main="Holt Winters forecast for next 24 hrs- Submeter 2",shadecols="pink")
# Plot residuals
plot(forecast_hourly_hw$residuals)
## Plot Actual vs fitted
HWplot(house_hourly_ts, n.ahead = 72,error.ribbon = "red")
## beta and gamma are not specified and it will be automatically computed by R
fit_monthly_hw <-HoltWinters(sub1_monthly_total_ts)
plot(fit_monthly_hw,ylim = c(0, 100))
## Check the model parameters
fit_monthly_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = sub1_monthly_total_ts)
##
## Smoothing parameters:
## alpha: 0
## beta : 0
## gamma: 0.1578098
##
## Coefficients:
## [,1]
## a 45.6411943
## b -0.3770460
## s1 10.6711466
## s2 -10.9136261
## s3 11.7957596
## s4 -2.2818286
## s5 0.3641721
## s6 12.5299996
## s7 -13.2888449
## s8 -21.6040477
## s9 0.2742031
## s10 -7.3688571
## s11 1.2534914
## s12 16.1428707
## Alpha, beta, gamma is choosen by R- 0,0 and 0.402 are choosen
## Seasonality is given importance
# Check Sum of squared error (SSE) metrics
fit_monthly_hw$SSE
## [1] 4546.921
## Forecast using HW model
forecast_monthly_hw <- forecast(fit_monthly_hw,h=10,level=c(10,25))
plot(forecast_monthly_hw, ylim = c(0, 100), ylab= "Watt-Hours", xlab="Time - Sub-meter 1", main="Holt Winters forecast for next 10 months- Submeter 1",shadecols="pink")
## Zoom on the forecasted part
plot(forecast_monthly_hw, ylim = c(0, 100), ylab= "Watt-Hours", xlab="Time - Sub-meter 1",start(2010) ,main="Holt Winters forecast for next 10 months- Submeter 1",shadecols="pink")
# Plot residuals
plot(forecast_monthly_hw$residuals)
## Plot Actual vs fitted
HWplot(sub1_monthly_total_ts, n.ahead = 10,error.ribbon = "red")
## beta and gamma are not specified and it will be automatically computed by R
fit_daily_hw <-HoltWinters(sub3_daily_total_ts)
plot(fit_daily_hw,ylim = c(0, 20))
## Check the model parameters
fit_daily_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = sub3_daily_total_ts)
##
## Smoothing parameters:
## alpha: 0.1623532
## beta : 0
## gamma: 0.5957045
##
## Coefficients:
## [,1]
## a 11.737787919
## b 0.002221964
## s1 5.809819211
## s2 -1.306074613
## s3 2.831117234
## s4 3.432433932
## s5 1.141457474
## s6 1.710508408
## s7 3.114814669
## s8 2.908980865
## s9 1.459777774
## s10 1.309740454
## s11 1.454446656
## s12 -0.494744682
## s13 2.650137978
## s14 3.367313256
## s15 3.334487640
## s16 0.730608695
## s17 2.266600316
## s18 5.211499991
## s19 1.074540158
## s20 -1.740252860
## s21 1.625256802
## s22 -1.581444471
## s23 0.374599511
## s24 -1.193825317
## s25 -0.162197699
## s26 3.165873697
## s27 -0.200650910
## s28 -0.462969414
## s29 4.050399423
## s30 -1.502160981
## s31 0.940061792
## s32 1.220300568
## s33 1.385548655
## s34 0.132065413
## s35 3.140869797
## s36 4.764664094
## s37 0.145401804
## s38 -0.195036917
## s39 2.553436385
## s40 2.648989568
## s41 -1.879787516
## s42 0.131688808
## s43 1.697245811
## s44 2.134204270
## s45 -0.813876916
## s46 4.618181089
## s47 -2.306859827
## s48 -3.973281144
## s49 -2.557548924
## s50 -1.143501645
## s51 -4.549561305
## s52 0.207783214
## s53 -1.305652680
## s54 -0.414778249
## s55 4.085944798
## s56 0.469505410
## s57 -4.801257101
## s58 -4.938980592
## s59 -1.602826482
## s60 -3.115781892
## s61 -4.888067968
## s62 -5.880328075
## s63 -3.471379278
## s64 -2.557201783
## s65 -3.670906665
## s66 -1.442683844
## s67 -0.810916137
## s68 -0.125395642
## s69 -3.743176172
## s70 -1.430080150
## s71 0.703751782
## s72 2.244658321
## s73 -2.546404092
## s74 5.041981565
## s75 -2.979300809
## s76 -0.862733276
## s77 1.060308305
## s78 2.061826476
## s79 -2.359738720
## s80 -1.669250716
## s81 1.850947705
## s82 0.305420386
## s83 0.939955425
## s84 -0.177067203
## s85 2.413512520
## s86 -0.574893641
## s87 0.905154328
## s88 2.446987194
## s89 4.171465490
## s90 1.442256918
## s91 1.872440898
## s92 2.089025165
## s93 2.921756757
## s94 -1.492267723
## s95 3.882573927
## s96 2.588499821
## s97 0.072538250
## s98 0.289365637
## s99 3.717797534
## s100 2.652583650
## s101 1.078031365
## s102 6.469433805
## s103 4.360944537
## s104 0.761337215
## s105 0.867715733
## s106 -0.727024092
## s107 2.377659667
## s108 -0.096555383
## s109 2.154857717
## s110 2.254648463
## s111 2.731841272
## s112 -1.009912893
## s113 3.449803408
## s114 -1.848537817
## s115 -1.454283148
## s116 1.965359265
## s117 -1.712528138
## s118 -3.718975291
## s119 -3.093082644
## s120 -0.831205277
## s121 0.248743221
## s122 -2.549376953
## s123 -2.513072242
## s124 -4.373293906
## s125 -2.055665043
## s126 -0.388478379
## s127 -0.460134762
## s128 -3.144371934
## s129 2.987921412
## s130 0.651216507
## s131 0.416360659
## s132 -0.986461620
## s133 2.279062728
## s134 -1.044111608
## s135 0.802137039
## s136 0.569282939
## s137 0.724803764
## s138 -1.104390688
## s139 0.176630724
## s140 -0.103840025
## s141 1.111927763
## s142 1.150295506
## s143 -2.371757495
## s144 4.390894971
## s145 2.092043590
## s146 -1.603378348
## s147 -0.016647006
## s148 2.118347445
## s149 -1.297934755
## s150 0.217866409
## s151 1.472655637
## s152 -0.837434110
## s153 2.168171991
## s154 -1.219683602
## s155 1.166425462
## s156 -2.452565574
## s157 1.367359002
## s158 -0.711236419
## s159 0.763453320
## s160 0.524373853
## s161 0.031714038
## s162 0.221884935
## s163 0.137012685
## s164 -2.113666146
## s165 -4.396906714
## s166 -0.225045763
## s167 4.351504727
## s168 0.202814270
## s169 2.064188626
## s170 0.904687500
## s171 1.760599246
## s172 1.861392195
## s173 3.588204972
## s174 1.304315992
## s175 4.544627404
## s176 1.596133168
## s177 2.193467989
## s178 2.291434673
## s179 0.251414812
## s180 -1.027586630
## s181 -1.838152629
## s182 -1.022156409
## s183 -1.427452335
## s184 -1.519104683
## s185 -2.917126160
## s186 -1.643306129
## s187 -2.989540556
## s188 -2.256090251
## s189 -1.563813720
## s190 -2.943527387
## s191 -1.802665067
## s192 -1.848601516
## s193 -3.999360021
## s194 -3.569952175
## s195 -2.936339051
## s196 -3.218500460
## s197 -3.732785805
## s198 -3.539623281
## s199 -3.233261403
## s200 -3.803264639
## s201 -3.725547070
## s202 -2.337636751
## s203 -3.671098775
## s204 -2.424641617
## s205 -4.541066736
## s206 -4.888700724
## s207 -4.886560343
## s208 -3.455037088
## s209 -3.796114486
## s210 -3.212630375
## s211 -5.207222595
## s212 -3.939459882
## s213 -2.361209677
## s214 -4.458908691
## s215 -3.825826312
## s216 -3.772812067
## s217 -3.677607312
## s218 -2.010652551
## s219 -2.792524390
## s220 -2.398830846
## s221 -2.381568838
## s222 -2.281054550
## s223 0.062555598
## s224 -2.595347280
## s225 -1.910075463
## s226 -1.594855172
## s227 1.273635659
## s228 -1.540582795
## s229 -0.540857358
## s230 -1.963028407
## s231 -1.531470017
## s232 -1.596915809
## s233 -1.679644636
## s234 -1.006259864
## s235 -1.812458006
## s236 -0.628165562
## s237 -2.581782521
## s238 -1.316421187
## s239 -3.450461606
## s240 -1.746208601
## s241 0.022070573
## s242 -1.384813255
## s243 -0.763038949
## s244 -0.652429639
## s245 2.095307907
## s246 -2.398442918
## s247 1.630348861
## s248 1.138082159
## s249 -0.975556931
## s250 0.755073784
## s251 0.223256150
## s252 0.905546674
## s253 -1.155274002
## s254 0.646331111
## s255 0.763376955
## s256 -1.559445897
## s257 0.345246719
## s258 -2.195768687
## s259 -0.007394597
## s260 -0.853732553
## s261 -1.567812282
## s262 0.897315364
## s263 0.969209131
## s264 -0.281687060
## s265 -2.278662930
## s266 -0.434527419
## s267 -1.052452093
## s268 -0.448507112
## s269 0.644892570
## s270 -3.001887766
## s271 1.130842498
## s272 -1.267321697
## s273 -1.352969565
## s274 -1.833124279
## s275 1.933969781
## s276 0.748743645
## s277 -1.630079330
## s278 0.280890558
## s279 -1.426738752
## s280 -0.217408542
## s281 -0.259088920
## s282 1.387010807
## s283 1.102670865
## s284 -0.816436178
## s285 0.404649526
## s286 -1.215457753
## s287 -0.715538774
## s288 -0.793610755
## s289 0.199675645
## s290 2.650472398
## s291 0.185978133
## s292 -0.166379628
## s293 0.645343541
## s294 -1.089764574
## s295 2.488717218
## s296 1.219991163
## s297 2.440958394
## s298 3.128070329
## s299 -2.323145579
## s300 -2.863367280
## s301 -4.109612030
## s302 -5.342051105
## s303 -3.601319911
## s304 -0.204963528
## s305 -1.726728596
## s306 -3.303607647
## s307 -5.094139617
## s308 -1.455637000
## s309 -3.051870447
## s310 -0.150181405
## s311 -2.149266876
## s312 0.991623665
## s313 -0.949078177
## s314 3.117949146
## s315 0.234028908
## s316 2.245681816
## s317 0.882834700
## s318 5.416831475
## s319 0.702822316
## s320 2.484051934
## s321 3.571990763
## s322 2.941454753
## s323 2.065234070
## s324 2.391887162
## s325 1.484811005
## s326 3.299355965
## s327 0.261229159
## s328 1.269079006
## s329 1.693399847
## s330 1.095677399
## s331 2.800429966
## s332 3.149241387
## s333 0.357664583
## s334 3.197551502
## s335 3.424832946
## s336 3.354882896
## s337 1.409328176
## s338 0.808783700
## s339 3.021700544
## s340 3.166743275
## s341 1.212136450
## s342 4.916253596
## s343 0.549488737
## s344 3.712317736
## s345 1.966150000
## s346 -1.063865870
## s347 2.496886446
## s348 1.174329412
## s349 2.525320401
## s350 0.319707267
## s351 3.733156068
## s352 2.054241167
## s353 3.532174933
## s354 0.970970787
## s355 2.741084646
## s356 6.216555916
## s357 2.602401976
## s358 4.024252650
## s359 2.682054596
## s360 2.141569349
## s361 0.911566882
## s362 4.015617219
## s363 0.500741078
## s364 1.282279868
## s365 5.225810959
## Alpha, beta, gamma is choosen by R- 0.16,0 and 0.59 are choosen
## Seasonality is given importance
# Check Sum of squared error (SSE) metrics
fit_daily_hw$SSE
## [1] 7603.579
## Forecast using HW model
forecast_daily_hw <- forecast(fit_daily_hw,h=150,level=c(10,25))
plot(forecast_daily_hw, ylim = c(0, 25), ylab= "Watt-Hours", xlab="Time - Sub-meter 3", main="Holt Winters forecast for next 150 days- Submeter 3",shadecols="pink")
## Zoom on the forecasted part
plot(forecast_daily_hw, ylim = c(0, 25), ylab= "Watt-Hours", xlab="Time - Sub-meter 3",start(2010) ,main="Holt Winters forecast for next 150 days- Submeter 3",shadecols="pink")
# Plot residuals
plot(forecast_daily_hw$residuals)
## Plot Actual vs fitted
HWplot(sub3_daily_total_ts, n.ahead = 150,error.ribbon = "red")
Evidence has been found to support client’s claim that they have not been occupying the residence in the Summer of 2008. From our analysis, it is found that the client has been away from Aug 9th-28th 2008 as there is no activity in Kitchen/ laundry and Submeter 3- AC and Water heater is bare minimum. This is quite different from Aug 2007/2009 and compared to other months.