Big news about a new project! Hi,

I have exciting news!

A large regional residential developer is designing a large ‘Smart Home’ apartment housing development and is looking for evidence or positive reasons for adopting the use of electrical sub-metering devices used for power management in Smart Homes.

Installing these sub-meters could be a big step towards the developer’s goal of offering highly efficient Smart Homes that provide owners with power usage analytics - a large piece of their planned marketing efforts.

We have been asked to find enough evidence to help support the developer’s marketing claims of sub-meters providing owners with ‘useful’ power usage analytics; we need to do this by performing an ‘analytical deep dive’ of sub-metering generated data and producing high quality visualizations that support a positive narrative or story around your findings (you will create this narrative).

We have also been asked to demonstrate that we can predict future energy consumption from the same data; this would help further support their adoption of sub-meters from the viewpoint of possible energy consumption reduction and a monetary savings as a result.,

As a starting point, they have provided us with a very large data set that contains 47 months of energy usage data from these devices. Our job over the next few weeks will be to analyze this data to determine what kind of analytics and visualizations can be created that would empower Smart Home owners with greater understanding and control of their power usage.

Since you are new at IOT Analytics, I’ll explain our onboarding process for new clients.

Conduct research to become informed on the client’s business. In this case, you will need to get up to speed on the domains of Smart Homes, sub-meters and household power consumption.
Identify any analytic skill/knowledge gaps foreseen for the project and plug those gaps with self-learning. With a data set this big, data munging and sub-setting will be essential to the analytic process. Working with DateTime and Time Series also needs to be mastered.
Perform an initial exploration of the data. This exploration should be used to understand any potential issues, conduct early preprocessing, note summary statistics and identify any early recommendations about improvements to future data collection.
Hold a project kick-off meeting with the client to close the deal. This meeting will center around a presentation that contains all of the project details and our initial exploration of the data.
Your initial task is to work through the onboarding process and produce a PowerPoint presentation that will be delivered to the home developer’s management team during the kick off meeting. This report will give them confidence in our process and convince them that this project is relevant to their business needs.

It is important to remember that we do not know what the data will show, and neither does the developer; it is your job to tell them how we will conduct the analysis and what they are likely to gain, so be precise and accurate with your findings and any initial recommendations. However, keep in mind that you will be presenting your report to business rather than technical people.

Good luck,

Kathy

VP, IOT Analytics

Dataset Information

The dataset contains 47 months of energy usage data from sub-meters in a house located in Sceaux, France. The data includes measurements of electric power consumption from three sub-meters with a one-minute sampling rate.

library(RMySQL)

## Loading required package: DBI

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2     ✔ tibble  3.2.1
## ✔ purrr   1.0.1     ✔ tidyr   1.3.0
## ✔ readr   2.1.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(ggplot2)
library(ggfortify)
library(fracdiff)
library(plotly)
library(tidyquant)

## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## 
## Attaching package: 'PerformanceAnalytics'
## 
## The following object is masked from 'package:graphics':
## 
##     legend
## 
## Loading required package: quantmod
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(lubridate)
library(forecast)

## Registered S3 methods overwritten by 'forecast':
##   method                 from     
##   autoplot.Arima         ggfortify
##   autoplot.acf           ggfortify
##   autoplot.ar            ggfortify
##   autoplot.bats          ggfortify
##   autoplot.decomposed.ts ggfortify
##   autoplot.ets           ggfortify
##   autoplot.forecast      ggfortify
##   autoplot.stl           ggfortify
##   autoplot.ts            ggfortify
##   fitted.ar              ggfortify
##   fortify.ts             ggfortify
##   residuals.ar           ggfortify

library(zoo)
library(dplyr)
library(fpp2)

## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ fma       2.5     ✔ expsmooth 2.3

library(shiny)
library(textshaping)
library(inline)

## 
## Attaching package: 'inline'
## 
## The following object is masked from 'package:shiny':
## 
##     code

library(pryr)

## 
## Attaching package: 'pryr'
## 
## The following objects are masked from 'package:purrr':
## 
##     compose, partial
## 
## The following object is masked from 'package:dplyr':
## 
##     where

library(stats)

## Create a database connection 
#con = dbConnect(MySQL(), user='deepAnalytics', password='Sqltask1234!', dbname='dataanalytics2018', #host='data-analytics-2018.cbrosir2cswx.us-east-1.rds.amazonaws.com')

## Create a database connection 
#y2007 <- dbGetQuery(con, "SELECT * FROM yr_2007")
#y2008 <- dbGetQuery(con, "SELECT * FROM yr_2008")
#y2009 <- dbGetQuery(con, "SELECT * FROM yr_2009")

2nd Option: Download the Data:

# Load required libraries
library(shiny)
library(DBI)
library(dplyr)

# Define UI for the Shiny app
ui <- fluidPage(
  # Application title
  titlePanel("Database Connection"),
  
  # Sidebar layout
  sidebarLayout(
    # Sidebar panel
    sidebarPanel(
      # Database connection inputs
      textInput("user_input", "Username", value = ""),
      passwordInput("password_input", "Password", value = ""),
      textInput("dbname_input", "Database Name", value = ""),
      textInput("host_input", "Host", value = ""),
      
      # Action button to trigger connection and data retrieval
      actionButton("connect_button", "Connect to Database")
    ),
    
    # Main panel
    mainPanel(
      # Output for status message
      verbatimTextOutput("status_output"),
      
      # Output for downloaded data
      downloadButton("download_button", "Download Data")
    )
  )
)

# Define server logic for the Shiny app
server <- function(input, output) {
  # Reactive values for storing the database connection and retrieved data
  con <- reactiveValues()
  data <- reactiveValues()
  
  # Connect to the database when the "Connect to Database" button is clicked
  observeEvent(input$connect_button, {
    # Retrieve user inputs
    user <- input$user_input
    password <- input$password_input
    dbname <- input$dbname_input
    host <- input$host_input
    
    # Create a database connection
    con$connection <- dbConnect(MySQL(), user = user, password = password, dbname = dbname, host = host)
    
    # Check if the connection was successful
    if (!is.null(con$connection)) {
      output$status_output <- renderPrint("Database connection successful.")
    } else {
      output$status_output <- renderPrint("Database connection failed.")
    }
  })
  
  # Retrieve data from the database and store it when the connection is successful
  observeEvent(con$connection, {
    if (!is.null(con$connection)) {
      # Retrieve data from the database
      y2007 <- dbGetQuery(con$connection, "SELECT * FROM yr_2007")
      y2008 <- dbGetQuery(con$connection, "SELECT * FROM yr_2008")
      y2009 <- dbGetQuery(con$connection, "SELECT * FROM yr_2009")
      
      # Combine the data into a single dataframe
      data$df <- bind_rows(y2007, y2008, y2009)
    }
  })
  
  # Download the data as a CSV file
  output$download_button <- downloadHandler(
    filename = "data.csv",
    content = function(file) {
      if (!is.null(data$df)) {
        write.csv(data$df, file, row.names = FALSE)
      }
    }
  )
}

# Run the Shiny app
#shinyApp(ui = ui, server = server)

3rd option: Load dataset.

Smart Home Power Consumption Analysis

This project aims to analyze and forecast power consumption data from a large ‘Smart Home’ apartment housing development. The goal is to provide evidence for the adoption of electrical sub-metering devices used for power management in Smart Homes, as well as empower Smart Home owners with greater understanding and control of their power usage.

Project Overview

We have been tasked with conducting an analytical deep dive into sub-metering generated data and producing high-quality visualizations that support a positive narrative around the benefits of sub-meters. Additionally, we need to demonstrate the ability to predict future energy consumption based on the same data.

# Read data.csv
data <- read.csv("data.csv")

# Convert "Date" column to datetime
data$Date <- as.POSIXct(data$Date)

# Select specific years
y2007 <- data[format(data$Date, "%Y") == "2007", ]
y2008 <- data[format(data$Date, "%Y") == "2008", ]
y2009 <- data[format(data$Date, "%Y") == "2009", ]

Investigate each new data frame.

## 2007
str(y2007)

## 'data.frame':    521669 obs. of  10 variables:
##  $ id                   : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date                 : POSIXct, format: "2007-01-01" "2007-01-01" ...
##  $ Time                 : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Global_active_power  : num  2.58 2.55 2.55 2.55 2.55 ...
##  $ Global_reactive_power: num  0.136 0.1 0.1 0.1 0.1 0.1 0.096 0 0 0 ...
##  $ Global_intensity     : num  10.6 10.4 10.4 10.4 10.4 10.4 10.4 10.2 10.2 10.2 ...
##  $ Voltage              : num  242 242 242 242 242 ...
##  $ Sub_metering_1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3       : int  0 0 0 0 0 0 0 0 0 0 ...

summary(y2007)

##        id              Date                            Time          
##  Min.   :     1   Min.   :2007-01-01 00:00:00.00   Length:521669     
##  1st Qu.:130423   1st Qu.:2007-04-01 00:00:00.00   Class :character  
##  Median :264606   Median :2007-07-03 00:00:00.00   Mode  :character  
##  Mean   :263456   Mean   :2007-07-02 11:19:19.42                     
##  3rd Qu.:395178   3rd Qu.:2007-10-02 00:00:00.00                     
##  Max.   :525600   Max.   :2007-12-31 00:00:00.00                     
##  Global_active_power Global_reactive_power Global_intensity    Voltage     
##  Min.   : 0.082      Min.   :0.0000        Min.   : 0.400   Min.   :223.5  
##  1st Qu.: 0.278      1st Qu.:0.0000        1st Qu.: 1.200   1st Qu.:236.9  
##  Median : 0.504      Median :0.1000        Median : 2.400   Median :239.7  
##  Mean   : 1.117      Mean   :0.1174        Mean   : 4.764   Mean   :239.4  
##  3rd Qu.: 1.548      3rd Qu.:0.1860        3rd Qu.: 6.400   3rd Qu.:241.8  
##  Max.   :10.670      Max.   :1.1480        Max.   :46.400   Max.   :252.1  
##  Sub_metering_1   Sub_metering_2   Sub_metering_3  
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 0.000   Median : 0.000   Median : 0.000  
##  Mean   : 1.232   Mean   : 1.638   Mean   : 5.795  
##  3rd Qu.: 0.000   3rd Qu.: 1.000   3rd Qu.:17.000  
##  Max.   :78.000   Max.   :78.000   Max.   :20.000

head(y2007)

##   id       Date     Time Global_active_power Global_reactive_power
## 1  1 2007-01-01 00:00:00               2.580                 0.136
## 2  2 2007-01-01 00:01:00               2.552                 0.100
## 3  3 2007-01-01 00:02:00               2.550                 0.100
## 4  4 2007-01-01 00:03:00               2.550                 0.100
## 5  5 2007-01-01 00:04:00               2.554                 0.100
## 6  6 2007-01-01 00:05:00               2.550                 0.100
##   Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1             10.6  241.97              0              0              0
## 2             10.4  241.75              0              0              0
## 3             10.4  241.64              0              0              0
## 4             10.4  241.71              0              0              0
## 5             10.4  241.98              0              0              0
## 6             10.4  241.83              0              0              0

tail(y2007)

##            id       Date     Time Global_active_power Global_reactive_power
## 521664 525595 2007-12-31 23:54:00               1.648                 0.102
## 521665 525596 2007-12-31 23:55:00               1.746                 0.204
## 521666 525597 2007-12-31 23:56:00               1.732                 0.210
## 521667 525598 2007-12-31 23:57:00               1.732                 0.210
## 521668 525599 2007-12-31 23:58:00               1.684                 0.144
## 521669 525600 2007-12-31 23:59:00               1.628                 0.072
##        Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 521664              6.8  241.56              0              0             18
## 521665              7.2  242.41              0              0             18
## 521666              7.2  242.42              0              0             18
## 521667              7.2  242.50              0              0             18
## 521668              7.0  242.18              0              0             18
## 521669              6.6  241.79              0              0             18

## 2008
str(y2008)

## 'data.frame':    526905 obs. of  10 variables:
##  $ id                   : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date                 : POSIXct, format: "2008-01-01" "2008-01-01" ...
##  $ Time                 : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Global_active_power  : num  1.62 1.63 1.62 1.61 1.61 ...
##  $ Global_reactive_power: num  0.07 0.072 0.072 0.07 0.07 0 0 0 0 0 ...
##  $ Global_intensity     : num  6.6 6.6 6.6 6.6 6.6 6.4 6.4 6.4 6.4 6.4 ...
##  $ Voltage              : num  241 242 242 241 241 ...
##  $ Sub_metering_1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3       : int  18 18 18 18 18 17 18 18 18 18 ...

summary(y2008)

##        id              Date                            Time          
##  Min.   :     1   Min.   :2008-01-01 00:00:00.00   Length:526905     
##  1st Qu.:131732   1st Qu.:2008-04-01 00:00:00.00   Class :character  
##  Median :263461   Median :2008-07-01 00:00:00.00   Mode  :character  
##  Mean   :263474   Mean   :2008-07-01 11:39:11.64                     
##  3rd Qu.:395191   3rd Qu.:2008-10-01 00:00:00.00                     
##  Max.   :527040   Max.   :2008-12-31 00:00:00.00                     
##  Global_active_power Global_reactive_power Global_intensity    Voltage     
##  Min.   : 0.076      Min.   :0.0000        Min.   : 0.200   Min.   :224.6  
##  1st Qu.: 0.300      1st Qu.:0.0460        1st Qu.: 1.400   1st Qu.:238.9  
##  Median : 0.566      Median :0.0940        Median : 2.400   Median :240.7  
##  Mean   : 1.072      Mean   :0.1171        Mean   : 4.552   Mean   :240.6  
##  3rd Qu.: 1.518      3rd Qu.:0.1840        3rd Qu.: 6.400   3rd Qu.:242.5  
##  Max.   :10.348      Max.   :1.3900        Max.   :44.600   Max.   :250.9  
##  Sub_metering_1  Sub_metering_2   Sub_metering_3  
##  Min.   : 0.00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.00   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 0.00   Median : 0.000   Median : 1.000  
##  Mean   : 1.11   Mean   : 1.256   Mean   : 6.034  
##  3rd Qu.: 0.00   3rd Qu.: 1.000   3rd Qu.:17.000  
##  Max.   :80.00   Max.   :76.000   Max.   :31.000

head(y2008)

##        id       Date     Time Global_active_power Global_reactive_power
## 521670  1 2008-01-01 00:00:00               1.620                 0.070
## 521671  2 2008-01-01 00:01:00               1.626                 0.072
## 521672  3 2008-01-01 00:02:00               1.622                 0.072
## 521673  4 2008-01-01 00:03:00               1.612                 0.070
## 521674  5 2008-01-01 00:04:00               1.612                 0.070
## 521675  6 2008-01-01 00:05:00               1.546                 0.000
##        Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 521670              6.6  241.25              0              0             18
## 521671              6.6  241.74              0              0             18
## 521672              6.6  241.52              0              0             18
## 521673              6.6  240.82              0              0             18
## 521674              6.6  240.80              0              0             18
## 521675              6.4  240.66              0              0             17

tail(y2008)

##             id       Date     Time Global_active_power Global_reactive_power
## 1048569 527035 2008-12-31 23:54:00               0.484                 0.064
## 1048570 527036 2008-12-31 23:55:00               0.484                 0.064
## 1048571 527037 2008-12-31 23:56:00               0.482                 0.064
## 1048572 527038 2008-12-31 23:57:00               0.482                 0.064
## 1048573 527039 2008-12-31 23:58:00               0.480                 0.064
## 1048574 527040 2008-12-31 23:59:00               0.482                 0.062
##         Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1048569              2.2  248.15              0              0              0
## 1048570              2.2  247.69              0              0              0
## 1048571              2.2  247.35              0              0              0
## 1048572              2.2  246.99              0              0              0
## 1048573              2.2  246.52              0              0              0
## 1048574              2.2  246.97              0              0              0

## 2009
str(y2009)

## 'data.frame':    521320 obs. of  10 variables:
##  $ id                   : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date                 : POSIXct, format: "2009-01-01" "2009-01-01" ...
##  $ Time                 : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Global_active_power  : num  0.484 0.484 0.482 0.482 0.482 0.57 0.59 0.588 0.586 0.586 ...
##  $ Global_reactive_power: num  0.062 0.062 0.062 0.06 0.062 0 0.078 0.078 0.078 0.078 ...
##  $ Global_intensity     : num  2.2 2.2 2.2 2.2 2.2 2.6 2.6 2.6 2.6 2.6 ...
##  $ Voltage              : num  248 248 248 248 247 ...
##  $ Sub_metering_1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3       : int  0 0 0 0 0 0 0 0 0 0 ...

summary(y2009)

##        id              Date                            Time          
##  Min.   :     1   Min.   :2009-01-01 00:00:00.00   Length:521320     
##  1st Qu.:130398   1st Qu.:2009-04-01 00:00:00.00   Class :character  
##  Median :264038   Median :2009-07-03 00:00:00.00   Mode  :character  
##  Mean   :262890   Mean   :2009-07-02 01:54:31.30                     
##  3rd Qu.:395266   3rd Qu.:2009-10-02 00:00:00.00                     
##  Max.   :525600   Max.   :2009-12-31 00:00:00.00                     
##  Global_active_power Global_reactive_power Global_intensity    Voltage     
##  Min.   : 0.122      Min.   :0.0000        Min.   : 0.400   Min.   :223.2  
##  1st Qu.: 0.318      1st Qu.:0.0520        1st Qu.: 1.400   1st Qu.:240.1  
##  Median : 0.622      Median :0.1060        Median : 2.800   Median :241.9  
##  Mean   : 1.079      Mean   :0.1314        Mean   : 4.555   Mean   :241.9  
##  3rd Qu.: 1.514      3rd Qu.:0.2060        3rd Qu.: 6.200   3rd Qu.:243.6  
##  Max.   :11.122      Max.   :1.2400        Max.   :48.400   Max.   :254.2  
##  Sub_metering_1   Sub_metering_2   Sub_metering_3  
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 0.000   Median : 0.000   Median : 1.000  
##  Mean   : 1.137   Mean   : 1.136   Mean   : 6.823  
##  3rd Qu.: 0.000   3rd Qu.: 1.000   3rd Qu.:18.000  
##  Max.   :82.000   Max.   :77.000   Max.   :31.000

head(y2009)

##         id       Date     Time Global_active_power Global_reactive_power
## 1048575  1 2009-01-01 00:00:00               0.484                 0.062
## 1048576  2 2009-01-01 00:01:00               0.484                 0.062
## 1048577  3 2009-01-01 00:02:00               0.482                 0.062
## 1048578  4 2009-01-01 00:03:00               0.482                 0.060
## 1048579  5 2009-01-01 00:04:00               0.482                 0.062
## 1048580  6 2009-01-01 00:05:00               0.570                 0.000
##         Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1048575              2.2  247.86              0              0              0
## 1048576              2.2  247.72              0              0              0
## 1048577              2.2  247.75              0              0              0
## 1048578              2.2  247.52              0              0              0
## 1048579              2.2  246.94              0              0              0
## 1048580              2.6  246.94              0              0              0

tail(y2009)

##             id       Date     Time Global_active_power Global_reactive_power
## 1569889 525595 2009-12-31 23:54:00               1.704                 0.128
## 1569890 525596 2009-12-31 23:55:00               1.746                 0.158
## 1569891 525597 2009-12-31 23:56:00               1.786                 0.234
## 1569892 525598 2009-12-31 23:57:00               1.784                 0.232
## 1569893 525599 2009-12-31 23:58:00               1.792                 0.236
## 1569894 525600 2009-12-31 23:59:00               1.792                 0.238
##         Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1569889              7.0  239.43              0              0             18
## 1569890              7.2  239.95              0              0             18
## 1569891              7.4  240.09              0              0             19
## 1569892              7.4  239.99              0              0             18
## 1569893              7.4  240.62              0              0             18
## 1569894              7.4  240.82              0              0             19

1. Preprocessing DATA

Combine all dataframes

df<-(bind_rows(y2007, y2008, y2009))

str(df)

## 'data.frame':    1569894 obs. of  10 variables:
##  $ id                   : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date                 : POSIXct, format: "2007-01-01" "2007-01-01" ...
##  $ Time                 : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Global_active_power  : num  2.58 2.55 2.55 2.55 2.55 ...
##  $ Global_reactive_power: num  0.136 0.1 0.1 0.1 0.1 0.1 0.096 0 0 0 ...
##  $ Global_intensity     : num  10.6 10.4 10.4 10.4 10.4 10.4 10.4 10.2 10.2 10.2 ...
##  $ Voltage              : num  242 242 242 242 242 ...
##  $ Sub_metering_1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3       : int  0 0 0 0 0 0 0 0 0 0 ...

summary(df)

##        id              Date                            Time          
##  Min.   :     1   Min.   :2007-01-01 00:00:00.00   Length:1569894    
##  1st Qu.:130851   1st Qu.:2007-10-03 00:00:00.00   Class :character  
##  Median :264035   Median :2008-07-01 00:00:00.00   Mode  :character  
##  Mean   :263274   Mean   :2008-07-01 14:19:46.66                     
##  3rd Qu.:395212   3rd Qu.:2009-03-31 00:00:00.00                     
##  Max.   :527040   Max.   :2009-12-31 00:00:00.00                     
##  Global_active_power Global_reactive_power Global_intensity    Voltage     
##  Min.   : 0.076      Min.   :0.0000        Min.   : 0.200   Min.   :223.2  
##  1st Qu.: 0.300      1st Qu.:0.0460        1st Qu.: 1.400   1st Qu.:238.7  
##  Median : 0.566      Median :0.1000        Median : 2.600   Median :240.8  
##  Mean   : 1.089      Mean   :0.1219        Mean   : 4.624   Mean   :240.6  
##  3rd Qu.: 1.524      3rd Qu.:0.1920        3rd Qu.: 6.400   3rd Qu.:242.8  
##  Max.   :11.122      Max.   :1.3900        Max.   :48.400   Max.   :254.2  
##  Sub_metering_1   Sub_metering_2   Sub_metering_3  
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 0.000   Median : 0.000   Median : 1.000  
##  Mean   : 1.159   Mean   : 1.343   Mean   : 6.216  
##  3rd Qu.: 0.000   3rd Qu.: 1.000   3rd Qu.:17.000  
##  Max.   :82.000   Max.   :78.000   Max.   :31.000

head(df)

##   id       Date     Time Global_active_power Global_reactive_power
## 1  1 2007-01-01 00:00:00               2.580                 0.136
## 2  2 2007-01-01 00:01:00               2.552                 0.100
## 3  3 2007-01-01 00:02:00               2.550                 0.100
## 4  4 2007-01-01 00:03:00               2.550                 0.100
## 5  5 2007-01-01 00:04:00               2.554                 0.100
## 6  6 2007-01-01 00:05:00               2.550                 0.100
##   Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1             10.6  241.97              0              0              0
## 2             10.4  241.75              0              0              0
## 3             10.4  241.64              0              0              0
## 4             10.4  241.71              0              0              0
## 5             10.4  241.98              0              0              0
## 6             10.4  241.83              0              0              0

tail(df)

##             id       Date     Time Global_active_power Global_reactive_power
## 1569889 525595 2009-12-31 23:54:00               1.704                 0.128
## 1569890 525596 2009-12-31 23:55:00               1.746                 0.158
## 1569891 525597 2009-12-31 23:56:00               1.786                 0.234
## 1569892 525598 2009-12-31 23:57:00               1.784                 0.232
## 1569893 525599 2009-12-31 23:58:00               1.792                 0.236
## 1569894 525600 2009-12-31 23:59:00               1.792                 0.238
##         Global_intensity Voltage Sub_metering_1 Sub_metering_2 Sub_metering_3
## 1569889              7.0  239.43              0              0             18
## 1569890              7.2  239.95              0              0             18
## 1569891              7.4  240.09              0              0             19
## 1569892              7.4  239.99              0              0             18
## 1569893              7.4  240.62              0              0             18
## 1569894              7.4  240.82              0              0             19

Combine date and time columns using as.POSIXct. Change the order of columns

df$datetime <- as.POSIXct(paste(df$Date, df$Time), format = "%Y-%m-%d %H:%M:%S")
df <- df[,c(ncol(df), 1:(ncol(df)-1))]

## Analize by year (Lubridate)
df$year <- year(df$datetime)

This archive contains 2075259 measurements gathered in a house located in

Sceaux (7km of Paris, France) between December 2006 and November 2010 (47 months).

http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

Checking for Missing Values

sum(is.na(df))

## [1] 360

df<-na.omit(df)

Checking for Duplicate Values

duplicated_rows <- subset(df, duplicated(df))
print(duplicated_rows)

##  [1] datetime              id                    Date                 
##  [4] Time                  Global_active_power   Global_reactive_power
##  [7] Global_intensity      Voltage               Sub_metering_1       
## [10] Sub_metering_2        Sub_metering_3        year                 
## <0 rows> (or 0-length row.names)

Types of variables

str(df)

## 'data.frame':    1569714 obs. of  12 variables:
##  $ datetime             : POSIXct, format: "2007-01-01 00:00:00" "2007-01-01 00:01:00" ...
##  $ id                   : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date                 : POSIXct, format: "2007-01-01" "2007-01-01" ...
##  $ Time                 : chr  "00:00:00" "00:01:00" "00:02:00" "00:03:00" ...
##  $ Global_active_power  : num  2.58 2.55 2.55 2.55 2.55 ...
##  $ Global_reactive_power: num  0.136 0.1 0.1 0.1 0.1 0.1 0.096 0 0 0 ...
##  $ Global_intensity     : num  10.6 10.4 10.4 10.4 10.4 10.4 10.4 10.2 10.2 10.2 ...
##  $ Voltage              : num  242 242 242 242 242 ...
##  $ Sub_metering_1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sub_metering_3       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ year                 : num  2007 2007 2007 2007 2007 ...
##  - attr(*, "na.action")= 'omit' Named int [1:180] 119637 119638 119639 119640 119641 119642 119643 119644 119645 119646 ...
##   ..- attr(*, "names")= chr [1:180] "119637" "119638" "119639" "119640" ...

summary(df)

##     datetime                            id        
##  Min.   :2007-01-01 00:00:00.00   Min.   :     1  
##  1st Qu.:2007-10-03 06:54:15.00   1st Qu.:130896  
##  Median :2008-07-01 20:35:30.00   Median :264065  
##  Mean   :2008-07-02 02:35:23.74   Mean   :263290  
##  3rd Qu.:2009-03-31 13:17:45.00   3rd Qu.:395227  
##  Max.   :2009-12-31 23:59:00.00   Max.   :527040  
##       Date                            Time           Global_active_power
##  Min.   :2007-01-01 00:00:00.00   Length:1569714     Min.   : 0.076     
##  1st Qu.:2007-10-03 00:00:00.00   Class :character   1st Qu.: 0.300     
##  Median :2008-07-01 00:00:00.00   Mode  :character   Median : 0.566     
##  Mean   :2008-07-01 14:35:36.68                      Mean   : 1.089     
##  3rd Qu.:2009-03-31 00:00:00.00                      3rd Qu.: 1.524     
##  Max.   :2009-12-31 00:00:00.00                      Max.   :11.122     
##  Global_reactive_power Global_intensity    Voltage      Sub_metering_1  
##  Min.   :0.0000        Min.   : 0.200   Min.   :223.2   Min.   : 0.000  
##  1st Qu.:0.0460        1st Qu.: 1.400   1st Qu.:238.7   1st Qu.: 0.000  
##  Median :0.1000        Median : 2.600   Median :240.8   Median : 0.000  
##  Mean   :0.1219        Mean   : 4.623   Mean   :240.6   Mean   : 1.159  
##  3rd Qu.:0.1920        3rd Qu.: 6.400   3rd Qu.:242.8   3rd Qu.: 0.000  
##  Max.   :1.3900        Max.   :48.400   Max.   :254.2   Max.   :82.000  
##  Sub_metering_2   Sub_metering_3        year     
##  Min.   : 0.000   Min.   : 0.000   Min.   :2007  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:2007  
##  Median : 0.000   Median : 1.000   Median :2008  
##  Mean   : 1.343   Mean   : 6.216   Mean   :2008  
##  3rd Qu.: 1.000   3rd Qu.:17.000   3rd Qu.:2009  
##  Max.   :78.000   Max.   :31.000   Max.   :2009

Which sub-meter is using the most power? The least? Is there anything to learn from the max and min?

From the mean of each sub-meter:

Sub-meter_3 uses the most at 6.458 mean

Sub_meter_1 uses the least at mean 1.122

2. DATA EXPLORATION

Explore by year, month, day using Lubridate:

df$year <- year(df$datetime)
df$month <-month(df$datetime)
df$week <-week(df$datetime)
df$day<- day(df$datetime)
df$hour<-hour(df$datetime)
df$minute<-minute(df$datetime)
df$weekday <- weekdays(df$datetime)
df$weekday <- factor(df$weekday, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

df$season <- df %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer"))

# Now, let's create a sample dataframe with 1000 rows from the larger dataframe
set.seed(123) # set seed for reproducibility
sample_size <- 20000
sample_indices <- sample(1:nrow(df), sample_size, replace = FALSE)
df_sample <- df[sample_indices, ]

head(df_sample)

##                    datetime     id       Date     Time Global_active_power
## 1237698 2009-05-12 09:13:00 189194 2009-05-12 09:13:00               1.374
## 134118  2007-04-04 03:22:00 134123 2007-04-04 03:22:00               0.318
## 1172718 2009-03-28 06:10:00 124211 2009-03-28 06:10:00               0.354
## 685405  2008-04-23 17:00:00 163741 2008-04-23 17:00:00               0.426
## 1275074 2009-06-07 08:12:00 226573 2009-06-07 08:12:00               0.222
## 1413965 2009-09-14 17:04:00 369665 2009-09-14 17:04:00               0.368
##         Global_reactive_power Global_intensity Voltage Sub_metering_1
## 1237698                 0.000              5.6  239.61              0
## 134118                  0.110              1.4  241.87              0
## 1172718                 0.100              1.6  243.52              0
## 685405                  0.048              2.0  243.82              0
## 1275074                 0.000              0.8  240.12              0
## 1413965                 0.256              1.8  243.51              0
##         Sub_metering_2 Sub_metering_3 year month week day hour minute   weekday
## 1237698              0             18 2009     5   19  12    9     13   Tuesday
## 134118               0              0 2007     4   14   4    3     22 Wednesday
## 1172718              1              0 2009     3   13  28    6     10  Saturday
## 685405               0              1 2008     4   17  23   17      0 Wednesday
## 1275074              0              0 2009     6   23   7    8     12    Sunday
## 1413965              0              0 2009     9   37  14   17      4    Monday
##             season.datetime season.id season.Date season.Time
## 1237698 2009-05-12 09:13:00    189194  2009-05-12    09:13:00
## 134118  2007-04-04 03:22:00    134123  2007-04-04    03:22:00
## 1172718 2009-03-28 06:10:00    124211  2009-03-28    06:10:00
## 685405  2008-04-23 17:00:00    163741  2008-04-23    17:00:00
## 1275074 2009-06-07 08:12:00    226573  2009-06-07    08:12:00
## 1413965 2009-09-14 17:04:00    369665  2009-09-14    17:04:00
##         season.Global_active_power season.Global_reactive_power
## 1237698                      1.374                        0.000
## 134118                       0.318                        0.110
## 1172718                      0.354                        0.100
## 685405                       0.426                        0.048
## 1275074                      0.222                        0.000
## 1413965                      0.368                        0.256
##         season.Global_intensity season.Voltage season.Sub_metering_1
## 1237698                     5.6         239.61                     0
## 134118                      1.4         241.87                     0
## 1172718                     1.6         243.52                     0
## 685405                      2.0         243.82                     0
## 1275074                     0.8         240.12                     0
## 1413965                     1.8         243.51                     0
##         season.Sub_metering_2 season.Sub_metering_3 season.year season.month
## 1237698                     0                    18        2009            5
## 134118                      0                     0        2007            4
## 1172718                     1                     0        2009            3
## 685405                      0                     1        2008            4
## 1275074                     0                     0        2009            6
## 1413965                     0                     0        2009            9
##         season.week season.day season.hour season.minute season.weekday
## 1237698          19         12           9            13        Tuesday
## 134118           14          4           3            22      Wednesday
## 1172718          13         28           6            10       Saturday
## 685405           17         23          17             0      Wednesday
## 1275074          23          7           8            12         Sunday
## 1413965          37         14          17             4         Monday
##         season.season
## 1237698        Spring
## 134118         Spring
## 1172718        Spring
## 685405         Spring
## 1275074        Summer
## 1413965          Fall

write.csv(df_sample, file = "df_sample.csv", row.names = FALSE)

Using mean:

by_year <- df %>% group_by(year)

by_year %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 3 × 4
##    year Sub_metering_1 Sub_metering_2 Sub_metering_3
##   <dbl>          <dbl>          <dbl>          <dbl>
## 1  2007           1.23           1.64           5.80
## 2  2008           1.11           1.26           6.03
## 3  2009           1.14           1.14           6.82

by_month <- df %>% group_by(month)

by_month %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 12 × 4
##    month Sub_metering_1 Sub_metering_2 Sub_metering_3
##    <dbl>          <dbl>          <dbl>          <dbl>
##  1     1          1.44           1.61            7.25
##  2     2          1.10           1.41            6.72
##  3     3          1.41           1.74            6.71
##  4     4          1.14           1.29            6.30
##  5     5          1.29           1.35            6.21
##  6     6          1.27           1.29            5.85
##  7     7          0.812          1.07            4.26
##  8     8          0.557          0.828           3.74
##  9     9          1.21           1.28            6.23
## 10    10          1.06           1.53            6.42
## 11    11          1.31           1.41            7.07
## 12    12          1.32           1.31            7.87

by_week<- df %>% group_by(week)

by_week %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 53 × 4
##     week Sub_metering_1 Sub_metering_2 Sub_metering_3
##    <dbl>          <dbl>          <dbl>          <dbl>
##  1     1          1.03            1.73           6.15
##  2     2          1.74            1.25           7.50
##  3     3          1.67            1.81           7.91
##  4     4          1.44            1.61           7.17
##  5     5          1.33            1.59           8.25
##  6     6          1.47            1.66           7.69
##  7     7          1.08            1.25           7.02
##  8     8          0.990           1.53           5.94
##  9     9          0.658           1.19           3.75
## 10    10          1.64            2.08           6.51
## # ℹ 43 more rows

by_day<- df %>% group_by(day)

by_day %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 31 × 4
##      day Sub_metering_1 Sub_metering_2 Sub_metering_3
##    <int>          <dbl>          <dbl>          <dbl>
##  1     1          1.27            1.00           5.93
##  2     2          0.844           1.27           5.90
##  3     3          1.14            1.14           5.47
##  4     4          1.10            1.98           6.25
##  5     5          1.27            1.60           6.26
##  6     6          1.08            1.29           6.02
##  7     7          1.22            1.47           6.45
##  8     8          1.43            1.49           6.90
##  9     9          1.08            1.17           6.21
## 10    10          1.28            1.22           6.43
## # ℹ 21 more rows

by_hour<- df %>% group_by(day)

by_hour %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 31 × 4
##      day Sub_metering_1 Sub_metering_2 Sub_metering_3
##    <int>          <dbl>          <dbl>          <dbl>
##  1     1          1.27            1.00           5.93
##  2     2          0.844           1.27           5.90
##  3     3          1.14            1.14           5.47
##  4     4          1.10            1.98           6.25
##  5     5          1.27            1.60           6.26
##  6     6          1.08            1.29           6.02
##  7     7          1.22            1.47           6.45
##  8     8          1.43            1.49           6.90
##  9     9          1.08            1.17           6.21
## 10    10          1.28            1.22           6.43
## # ℹ 21 more rows

by_minute<- df %>% group_by(day)

by_minute %>% summarise(
  Sub_metering_1 = mean(Sub_metering_1),
  Sub_metering_2 = mean(Sub_metering_2),
  Sub_metering_3 = mean(Sub_metering_3)
)

## # A tibble: 31 × 4
##      day Sub_metering_1 Sub_metering_2 Sub_metering_3
##    <int>          <dbl>          <dbl>          <dbl>
##  1     1          1.27            1.00           5.93
##  2     2          0.844           1.27           5.90
##  3     3          1.14            1.14           5.47
##  4     4          1.10            1.98           6.25
##  5     5          1.27            1.60           6.26
##  6     6          1.08            1.29           6.02
##  7     7          1.22            1.47           6.45
##  8     8          1.43            1.49           6.90
##  9     9          1.08            1.17           6.21
## 10    10          1.28            1.22           6.43
## # ℹ 21 more rows

#LEGEND: #SUB_METER 1-> KITCHEN #SUB_METER_2-> LAUNDRY ROOM #SUB_METER_3-> WATER HEATER & AC

#1. SUM ALL 3 SUBMETERS

df$total_sub_meter<-df$Sub_metering_1 + df$Sub_metering_2 + df$Sub_metering_3

#2.WEEKLY SUB_METER_MEANS

#2.1 WEEKLY/WEEKDAY/MONTHLY/HOURLY MEAN SUB_METER_1 KITCHEN for 2007 & 2008 & 2009

mean_per_week_kitchen <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, week) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, week)

mean_per_weekday_kitchen <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, weekday)

mean_per_month_kitchen <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, month, week, day, weekday) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, month, week, day, weekday)

mean_per_hour_kitchen <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, hour)

2.2. WEEKLY/WEEKDAY/MONTHLY MEAN SUB_METER_2 LAUNDRY ROOM for 2007 & 2008 & 2009

mean_per_week_laundry <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, week) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year)

mean_per_weekday_laundry<- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year, weekday) 

mean_per_month_laundry <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, month, week, day, weekday) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year, month, week, day, weekday) 

mean_per_hour_laundry <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, hour) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year, hour)

2.3. WEEKLY/WEEKDAY/MONTHLY MEAN SUB_METER_3 (’Water Heater & AC) for 2007, 2008 & 2009, decomposed

mean_per_week_water <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, week) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, week)

mean_per_weekday_water <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, weekday) 

mean_per_month_water <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, month, week, day, weekday) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, month, week, day, weekday) 

mean_per_water <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, weekday, hour) 

mean_per_hour_water <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, hour) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, hour)

2.4. TOTAL SUB_METER MEAN PER DAY

mean_per_week_total <-df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, week) %>%
  summarise(mean = mean(total_sub_meter), .groups = "drop") %>%
  arrange(year, week)

mean_per_weekday_total <- df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(Sub_metering_1+Sub_metering_2+Sub_metering_3), .groups = "drop") %>%
  arrange(year, weekday) 

mean_per_month_total <-df %>%
  filter(year %in% c(2007, 2008, 2009)) %>%
  group_by(year, month, week, day, weekday) %>%
  summarise(mean = mean(total_sub_meter), .groups = "drop") %>%
  arrange(year, month, week, day, weekday)

2.5. TOTAL SUB_METER MEAN PER YEAR

mean_2007 <- df %>%
  filter(year %in% c(2007)) %>%
  group_by(weekday) %>%
  summarise('2007' = mean(total_sub_meter), .groups = "drop") %>%
  arrange(weekday) 

mean_2008 <- df %>%
  filter(year %in% c(2008)) %>%
  group_by(weekday) %>%
  summarise('2008' = mean(total_sub_meter), .groups = "drop") %>%
  arrange(weekday) 

mean_2009 <- df %>%
  filter(year %in% c(2009)) %>%
  group_by(year, weekday) %>%
  summarise('2009' = mean(total_sub_meter), .groups = "drop") %>%
  arrange(year, weekday)

#3. PER SEASON and SUB-METER AND WEEKDAY

#3.1. SUB-METER 1 - KITCHEN

season_1 <-df %>%
  group_by(year, month, weekday) %>%
  summarise(mean = mean(Sub_metering_1)) %>%
  arrange(year, month)

## `summarise()` has grouped output by 'year', 'month'. You can override using the
## `.groups` argument.

season_1<- season_1 %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer")) 

season_1 <- subset(season_1, select = - c(month))   
season_1

## # A tibble: 252 × 4
## # Groups:   year [3]
##     year weekday    mean season
##    <dbl> <fct>     <dbl> <chr> 
##  1  2007 Monday    0.455 Winter
##  2  2007 Tuesday   0.677 Winter
##  3  2007 Wednesday 1.07  Winter
##  4  2007 Thursday  0.623 Winter
##  5  2007 Friday    0.824 Winter
##  6  2007 Saturday  2.54  Winter
##  7  2007 Sunday    3.05  Winter
##  8  2007 Monday    0.826 Winter
##  9  2007 Tuesday   0.563 Winter
## 10  2007 Wednesday 1.76  Winter
## # ℹ 242 more rows

season_1_1 <-season_1 %>%
  filter(season %in% c("Winter")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_1_2 <-season_1 %>%
  filter(season %in% c("Spring")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_1_3 <-season_1 %>%
  filter(season %in% c("Summer")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_1_4 <-season_1 %>%
  filter(season %in% c("Fall")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_kitchen<-season_1_1 %>% 
  left_join(season_1_2, by = c("year","weekday")) %>%
  left_join(season_1_3, by = c("year","weekday")) %>%
  left_join(season_1_4, by = c("year","weekday"))

seasonal_kitchen <- season_kitchen %>%
  rename(mean_per_kitchen_winter = mean.x,
         mean_per_kitchen_spring = mean.y,
         mean_per_kitchen_summer = mean.x.x,
         mean_per_kitchen_fall = mean.y.y)

#3.1. SUB-METER 2 - LAUNDRY

season_2 <-df %>%
  group_by(year, month, weekday) %>%
  summarise(mean = mean(Sub_metering_2)) %>%
  arrange(year, month)

## `summarise()` has grouped output by 'year', 'month'. You can override using the
## `.groups` argument.

season_2<- season_2 %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer")) 

season_2 <- subset(season_2, select = - c(month))   
season_2

## # A tibble: 252 × 4
## # Groups:   year [3]
##     year weekday    mean season
##    <dbl> <fct>     <dbl> <chr> 
##  1  2007 Monday    0.880 Winter
##  2  2007 Tuesday   1.08  Winter
##  3  2007 Wednesday 2.63  Winter
##  4  2007 Thursday  1.80  Winter
##  5  2007 Friday    0.245 Winter
##  6  2007 Saturday  2.17  Winter
##  7  2007 Sunday    3.81  Winter
##  8  2007 Monday    1.08  Winter
##  9  2007 Tuesday   1.52  Winter
## 10  2007 Wednesday 1.73  Winter
## # ℹ 242 more rows

season_2_1 <-season_2 %>%
  filter(season %in% c("Winter")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_2_2 <-season_2 %>%
  filter(season %in% c("Spring")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_2_3 <-season_2 %>%
  filter(season %in% c("Summer")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_2_4 <-season_2 %>%
  filter(season %in% c("Fall")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_laundry<-season_2_1 %>% 
  left_join(season_2_2, by = c("year","weekday")) %>%
  left_join(season_2_3, by = c("year","weekday")) %>%
  left_join(season_2_4, by = c("year","weekday"))

seasonal_laundry <- season_laundry %>%
  rename(mean_per_laundry_winter = mean.x,
         mean_per_laundry_spring = mean.y,
         mean_per_laundry_summer = mean.x.x,
         mean_per_laundry_fall = mean.y.y)

#3.3. SUB-METER 3 - WATER

season_3 <-df %>%
  group_by(year, month, weekday) %>%
  summarise(mean = mean(Sub_metering_3)) %>%
  arrange(year, month)

## `summarise()` has grouped output by 'year', 'month'. You can override using the
## `.groups` argument.

season_3<- season_3 %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer")) 

season_3 <- subset(season_3, select = - c(month))   
season_3

## # A tibble: 252 × 4
## # Groups:   year [3]
##     year weekday    mean season
##    <dbl> <fct>     <dbl> <chr> 
##  1  2007 Monday     9.06 Winter
##  2  2007 Tuesday    5.02 Winter
##  3  2007 Wednesday  8.14 Winter
##  4  2007 Thursday   8.34 Winter
##  5  2007 Friday     5.43 Winter
##  6  2007 Saturday   7.09 Winter
##  7  2007 Sunday     8.58 Winter
##  8  2007 Monday     6.13 Winter
##  9  2007 Tuesday    3.89 Winter
## 10  2007 Wednesday  6.44 Winter
## # ℹ 242 more rows

season_3_1 <-season_3 %>%
  filter(season %in% c("Winter")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_3_2 <-season_3 %>%
  filter(season %in% c("Spring")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_3_3 <-season_3 %>%
  filter(season %in% c("Summer")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_3_4 <-season_3 %>%
  filter(season %in% c("Fall")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean)) %>%
  arrange(year, weekday)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

season_water<-season_3_1 %>% 
  left_join(season_3_2, by = c("year","weekday")) %>%
  left_join(season_3_3, by = c("year","weekday")) %>%
  left_join(season_3_4, by = c("year","weekday"))

seasonal_water <- season_water %>%
  rename(mean_per_water_winter = mean.x,
         mean_per_water_spring = mean.y,
         mean_per_water_summer = mean.x.x,
         mean_per_water_fall = mean.y.y)

#3. PER HOUR and SUB-METER AND WEEKDAY

#3.1. SUB-METER 1 - KITCHEN

hour_kitchen <-df %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, weekday, hour) 

hour_kitchen<- hour_kitchen %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_kitchen <-hour_kitchen %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_midday_kitchen <-hour_kitchen %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_afternoon_kitchen <-hour_kitchen %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_noon_kitchen <-hour_kitchen %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_night_kitchen <-hour_kitchen %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_kitchen<-hour_morning_kitchen %>% 
  left_join(hour_midday_kitchen, by = c("year","weekday")) %>%
  left_join(hour_afternoon_kitchen, by = c("year","weekday")) %>%
  left_join(hour_noon_kitchen, by = c("year","weekday")) %>%
  left_join(hour_night_kitchen, by = c("year","weekday"))
  
hour_kitchen <- hour_kitchen %>%
  rename(morning_kitchen = mean.x,
         midday_kitchen = mean.y,
         afternoon_kitchen = mean.x.x,
         noon_kitchen = mean.y.y,
         night_kitchen = mean)

#3.2. SUB-METER 2 - LAUNDRY

hour_laundry <-df %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year, weekday, hour) 

hour_laundry<- hour_laundry %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_laundry <-hour_laundry %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_midday_laundry <-hour_laundry %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_afternoon_laundry <-hour_laundry %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_noon_laundry <-hour_laundry %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_night_laundry <-hour_laundry %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_laundry<-hour_morning_laundry %>% 
  left_join(hour_midday_laundry, by = c("year","weekday")) %>%
  left_join(hour_afternoon_laundry, by = c("year","weekday")) %>%
  left_join(hour_noon_laundry, by = c("year","weekday")) %>%
  left_join(hour_night_laundry, by = c("year","weekday"))

hour_laundry <- hour_laundry %>%
  rename(morning_laundry = mean.x,
         midday_laundry = mean.y,
         afternoon_laundry = mean.x.x,
         noon_laundry = mean.y.y,
         night_laundry = mean)

#3.3. SUB-METER 3 - WATER

hour_water <-df %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, weekday, hour) 

hour_water<- hour_water %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_water <-hour_water %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_midday_water <-hour_water %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_afternoon_water <-hour_water %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_noon_water <-hour_water %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_night_water <-hour_water %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_water<-hour_morning_water %>% 
  left_join(hour_midday_water, by = c("year","weekday")) %>%
  left_join(hour_afternoon_water, by = c("year","weekday")) %>%
  left_join(hour_noon_water, by = c("year","weekday")) %>%
  left_join(hour_night_water, by = c("year","weekday"))

hour_water <- hour_water %>%
  rename(morning_water = mean.x,
         midday_water = mean.y,
         afternoon_water = mean.x.x,
         noon_water = mean.y.y,
         night_water = mean)

#3.1. SUB-METER 1 - KITCHEN

hour_week_kitchen <-df %>%
  group_by(year, week, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, week, hour) 

hour_week_kitchen<- hour_week_kitchen %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_week_kitchen <-hour_week_kitchen %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, week) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, week) 

hour_midday_week_kitchen <-hour_week_kitchen %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, week) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, week) 

hour_afternoon_week_kitchen <-hour_week_kitchen %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, week) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, week) 

hour_noon_week_kitchen <-hour_week_kitchen %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, week) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, week) 

hour_night_week_kitchen<-kitchen <-hour_week_kitchen %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, week) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, week) 

hour_week_kitchen<-hour_morning_week_kitchen %>% 
  left_join(hour_midday_week_kitchen, by = c("year","week")) %>%
  left_join(hour_afternoon_week_kitchen, by = c("year","week")) %>%
  left_join(hour_noon_week_kitchen, by = c("year","week")) %>%
  left_join(hour_night_week_kitchen, by = c("year","week"))

hour_week_kitchen <- hour_week_kitchen %>%
  rename(morning_week_kitchen = mean.x,
         midday_weeK_kitchen = mean.y,
         afternoon_week_kitchen = mean.x.x,
         noon_week_kitchen = mean.y.y,
         night_week_kitchen = mean)

#3.2. SUB-METER 2 - LAUNDRY

hour_laundry <-df %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_2), .groups = "drop") %>%
  arrange(year, weekday, hour) 

hour_laundry<- hour_laundry %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_laundry <-hour_laundry %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_midday_laundry <-hour_laundry %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_afternoon_laundry <-hour_laundry %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_noon_laundry <-hour_laundry %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_night_laundry <-hour_laundry %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_laundry<-hour_morning_laundry %>% 
  left_join(hour_midday_laundry, by = c("year","weekday")) %>%
  left_join(hour_afternoon_laundry, by = c("year","weekday")) %>%
  left_join(hour_noon_laundry, by = c("year","weekday")) %>%
  left_join(hour_night_laundry, by = c("year","weekday"))

hour_laundry <- hour_laundry %>%
  rename(morning_laundry = mean.x,
         midday_laundry = mean.y,
         afternoon_laundry = mean.x.x,
         noon_laundry = mean.y.y,
         night_laundry = mean)

#3.3. SUB-METER 3 - WATER

hour_water <-df %>%
  group_by(year, weekday, hour) %>%
  summarise(mean = mean(Sub_metering_3), .groups = "drop") %>%
  arrange(year, weekday, hour) 

hour_water<- hour_water %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:15  ~ "MidDay",
    hour %in%  15:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Noon",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

hour_morning_water <-hour_water %>%
  filter(timezone%in% c("Morning")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_midday_water <-hour_water %>%
  filter(timezone %in% c("MidDay")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_afternoon_water <-hour_water %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_noon_water <-hour_water %>%
  filter(timezone %in% c("Noon")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_night_water <-hour_water %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, weekday) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, weekday) 

hour_water<-hour_morning_water %>% 
  left_join(hour_midday_water, by = c("year","weekday")) %>%
  left_join(hour_afternoon_water, by = c("year","weekday")) %>%
  left_join(hour_noon_water, by = c("year","weekday")) %>%
  left_join(hour_night_water, by = c("year","weekday"))

hour_water <- hour_water %>%
  rename(morning_water = mean.x,
         midday_water = mean.y,
         afternoon_water = mean.x.x,
         noon_water = mean.y.y,
         night_water = mean)

#4.1. SUB-METER 1 - KITCHEN BY HOUR AND MONTH

month_kitchen <-df %>%
  group_by(year, month, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, month, hour) 

month_kitchen<- month_kitchen %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Dusk",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

month_kitchen<- month_kitchen %>%
  mutate(season = case_when(
                month %in%  9:11 ~ "Fall",
                month %in%  c(12, 1, 2)  ~ "Winter",
                month %in%  3:5  ~ "Spring",
                TRUE ~ "Summer")) 

month_morning_kitchen <-month_kitchen %>%
  filter(timezone %in% c("Morning")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_afternoon_kitchen <-month_kitchen %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_noon_kitchen <-month_kitchen %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_dusk_kitchen <-month_kitchen %>%
  filter(timezone %in% c("Dusk")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_night_kitchen <-month_kitchen %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_kitchen<-month_morning_kitchen %>% 
  left_join(month_afternoon_kitchen, by = c("year","month")) %>%
  left_join(month_dusk_kitchen, by = c("year","month")) %>%
  left_join(month_night_kitchen, by = c("year","month")) 

month_kitchen <- month_kitchen %>%
  rename(morning_kitchen = mean.x,
         afternoon_kitchen = mean.y,
         dusk_kitchen = mean.x.x,
         night_kitchen = mean.y.y)

#3.4. SUB-METER 2 - LAUNDRY

month_laundry <-df %>%
  group_by(year, month, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, month, hour) 

month_laundry<- month_laundry %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Dusk",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

month_laundry<- month_laundry %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer")) 

month_morning_laundry <-month_laundry %>%
  filter(timezone %in% c("Morning")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_afternoon_laundry <-month_laundry %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_dusk_laundry <-month_laundry %>%
  filter(timezone %in% c("Dusk")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_night_laundry <-month_laundry %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_laundry<-month_morning_laundry %>% 
  left_join(month_afternoon_laundry, by = c("year","month")) %>%
  left_join(month_dusk_laundry, by = c("year","month")) %>%
  left_join(month_night_laundry, by = c("year","month")) 

month_laundry <- month_laundry %>%
  rename(morning_laundry = mean.x,
         afternoon_laundry = mean.y,
         dusk_laundry = mean.x.x,
         night_laundry = mean.y.y)

#3.3. SUB-METER 3 - WATER

month_water <-df %>%
  group_by(year, month, hour) %>%
  summarise(mean = mean(Sub_metering_1), .groups = "drop") %>%
  arrange(year, month, hour) 

month_water<- month_water %>%
  mutate(timezone = case_when(
    hour %in%  6:13 ~ "Morning",
    hour %in%  13:19  ~ "Afternoon",
    hour %in%  19:23  ~ "Dusk",
    hour %in%  c(23, 0, 1, 2, 3, 4, 5, 6)  ~ "Night")) 

month_water<- month_water %>%
  mutate(season = case_when(
    month %in%  9:11 ~ "Fall",
    month %in%  c(12, 1, 2)  ~ "Winter",
    month %in%  3:5  ~ "Spring",
    TRUE ~ "Summer")) 

month_morning_water <-month_water %>%
  filter(timezone %in% c("Morning")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_afternoon_water <-month_water %>%
  filter(timezone %in% c("Afternoon")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_dusk_water <-month_water %>%
  filter(timezone %in% c("Dusk")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_night_water <-month_water %>%
  filter(timezone %in% c("Night")) %>%
  group_by(year, month) %>%
  summarise(mean = mean(mean), .groups = "drop") %>%
  arrange(year, month) 

month_water<-month_morning_water %>% 
  left_join(month_afternoon_water, by = c("year","month")) %>%
  left_join(month_dusk_water, by = c("year","month")) %>%
  left_join(month_night_water, by = c("year","month")) 

month_water <- month_water %>%
  rename(morning_water = mean.x,
         afternoon_water = mean.y,
         dusk_water = mean.x.x,
         night_water = mean.y.y)

4. JOINING DATAFRAME AND CHANGING NAMES

total <- mean_per_week_water %>% 
  left_join(mean_per_week_kitchen, by = c("year", "week")) %>%
  left_join(mean_per_week_laundry, by = c("year", "week")) %>%
  left_join(mean_per_week_total, by = c("year", "week"))

total <- total %>%
  rename(mean_per_week_water = mean.x,
         mean_per_week_kitchen = mean.y,
         mean_per_week_laundry = mean.x.x,
         mean_per_week_total = mean.y.y)

total

## # A tibble: 159 × 6
##     year  week mean_per_week_water mean_per_week_kitchen mean_per_week_laundry
##    <dbl> <dbl>               <dbl>                 <dbl>                 <dbl>
##  1  2007     1                5.38                 0.581                  1.75
##  2  2007     2                8.27                 1.33                   1.88
##  3  2007     3                8.09                 2.32                   1.99
##  4  2007     4                7.15                 1.14                   1.48
##  5  2007     5                9.45                 1.15                   1.74
##  6  2007     6                7.12                 1.50                   1.74
##  7  2007     7                7.18                 1.44                   2.03
##  8  2007     8                6.21                 0.856                  1.56
##  9  2007     9                2.45                 0.146                  1.06
## 10  2007    10                6.55                 1.89                   3.09
## # ℹ 149 more rows
## # ℹ 1 more variable: mean_per_week_total <dbl>

total_month<-mean_per_month_water %>% 
  left_join(mean_per_month_kitchen, by = c("year", "month")) %>%
  left_join(mean_per_month_laundry, by = c("year", "month")) %>%
  left_join(mean_per_month_total, by = c("year", "month"))

## Warning in left_join(., mean_per_month_kitchen, by = c("year", "month")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## Warning in left_join(., mean_per_month_laundry, by = c("year", "month")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## Warning in left_join(., mean_per_month_total, by = c("year", "month")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

total_month <- total_month %>%
  rename(mean_per_month_water = mean.x,
         mean_per_month_kitchen = mean.y,
         mean_per_month_laundry = mean.x.x,
         mean_per_month_total= mean.y.y)


total_month <- mean_per_month_water %>% 
  left_join(mean_per_month_kitchen, by = c("year", "month", "week", "day", "weekday")) %>%
  left_join(mean_per_month_laundry, by = c("year", "month", "week", "day", "weekday")) %>%
  left_join(mean_per_month_total, by = c("year", "month", "week", "day", "weekday"))


total_month <- total_month %>%
  rename(mean_per_month_water = mean.x,
         mean_per_month_kitchen = mean.y,
         mean_per_month_laundry = mean.x.x,
         mean_per_month_total= mean.y.y)

total_month

## # A tibble: 1,094 × 9
##     year month  week   day weekday   mean_per_month_water mean_per_month_kitchen
##    <dbl> <dbl> <dbl> <int> <fct>                    <dbl>                  <dbl>
##  1  2007     1     1     1 Monday                    4.08                  0    
##  2  2007     1     1     2 Tuesday                   4.56                  0    
##  3  2007     1     1     3 Wednesday                 3.31                  0    
##  4  2007     1     1     4 Thursday                  7.57                  0.730
##  5  2007     1     1     5 Friday                    5.28                  1.03 
##  6  2007     1     1     6 Saturday                  3.94                  0.928
##  7  2007     1     1     7 Sunday                    8.90                  1.38 
##  8  2007     1     2     8 Monday                   12.2                   0    
##  9  2007     1     2     9 Tuesday                   6.80                  1.17 
## 10  2007     1     2    10 Wednesday                 7.65                  0.535
## # ℹ 1,084 more rows
## # ℹ 2 more variables: mean_per_month_laundry <dbl>, mean_per_month_total <dbl>

total_year<-mean_2007 %>% 
  left_join(mean_2008, by = c("weekday")) %>%
  left_join(mean_2009, by = c("weekday")) 


total_year

## # A tibble: 7 × 5
##   weekday   `2007` `2008`  year `2009`
##   <fct>      <dbl>  <dbl> <dbl>  <dbl>
## 1 Monday      8.04   7.04  2009   7.47
## 2 Tuesday     7.63   9.37  2009   8.38
## 3 Wednesday   8.86   8.31  2009   9.82
## 4 Thursday    7.40   6.41  2009   7.86
## 5 Friday      8.30   8.79  2009   8.51
## 6 Saturday   10.2    9.93  2009  11.8 
## 7 Sunday     10.3    8.92  2009   9.92

total_season<-season_1 %>% 
  left_join(season_2, by = c("year","season", "weekday")) %>%
  left_join(season_3, by = c("year", "season", "weekday"))

## Warning in left_join(., season_2, by = c("year", "season", "weekday")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## Warning in left_join(., season_3, by = c("year", "season", "weekday")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

total_season <- total_season %>%
  rename(mean_per_season_kitchen = mean.x,
         mean_per_season_laundry = mean.y,
         mean_per_season_water = mean)

total_season

## # A tibble: 2,268 × 6
## # Groups:   year [3]
##     year weekday mean_per_season_kitchen season mean_per_season_laundry
##    <dbl> <fct>                     <dbl> <chr>                    <dbl>
##  1  2007 Monday                    0.455 Winter                   0.880
##  2  2007 Monday                    0.455 Winter                   0.880
##  3  2007 Monday                    0.455 Winter                   0.880
##  4  2007 Monday                    0.455 Winter                   1.08 
##  5  2007 Monday                    0.455 Winter                   1.08 
##  6  2007 Monday                    0.455 Winter                   1.08 
##  7  2007 Monday                    0.455 Winter                   0.916
##  8  2007 Monday                    0.455 Winter                   0.916
##  9  2007 Monday                    0.455 Winter                   0.916
## 10  2007 Tuesday                   0.677 Winter                   1.08 
## # ℹ 2,258 more rows
## # ℹ 1 more variable: mean_per_season_water <dbl>

total_hour<-mean_per_hour_kitchen %>% 
  left_join(mean_per_hour_laundry, by = c("year", "hour")) %>%
  left_join(mean_per_hour_water, by = c("year", "hour"))

total_hour <- total_hour %>%
  rename(mean_per_hour_kitchen = mean.x,
         mean_per_hour_laundry = mean.y,
         mean_per_hour_water = mean)

total_hour

## # A tibble: 72 × 5
##     year  hour mean_per_hour_kitchen mean_per_hour_laundry mean_per_hour_water
##    <dbl> <int>                 <dbl>                 <dbl>               <dbl>
##  1  2007     0                0.522                  0.809               3.13 
##  2  2007     1                0.223                  0.381               2.12 
##  3  2007     2                0.125                  0.339               1.28 
##  4  2007     3                0.0230                 0.322               0.720
##  5  2007     4                0                      0.316               0.785
##  6  2007     5                0                      0.295               1.03 
##  7  2007     6                0.0271                 0.316               3.35 
##  8  2007     7                0.187                  0.552               8.83 
##  9  2007     8                1.72                   1.15               11.4  
## 10  2007     9                1.61                   1.24               11.2  
## # ℹ 62 more rows

total_weekday<-mean_per_weekday_total %>% 
  left_join(mean_per_weekday_kitchen, by = c("year", "weekday")) %>%
  left_join(mean_per_weekday_laundry, by = c("year", "weekday")) %>%
  left_join(mean_per_weekday_water, by = c("year", "weekday"))

total_weekday <- total_weekday %>%
  rename(mean_per_weekday_total = mean.x,
         mean_per_weekday_kitchen = mean.y,
         mean_per_weekday_laundry = mean.x.x,
         mean_per_weekday_water= mean.y.y)

total_weekday

## # A tibble: 21 × 6
##     year weekday   mean_per_weekday_total mean_per_weekday_kitchen
##    <dbl> <fct>                      <dbl>                    <dbl>
##  1  2007 Monday                      8.04                    0.930
##  2  2007 Tuesday                     7.63                    0.893
##  3  2007 Wednesday                   8.86                    1.12 
##  4  2007 Thursday                    7.40                    0.951
##  5  2007 Friday                      8.30                    0.930
##  6  2007 Saturday                   10.2                     1.79 
##  7  2007 Sunday                     10.3                     2.05 
##  8  2008 Monday                      7.04                    0.710
##  9  2008 Tuesday                     9.37                    0.927
## 10  2008 Wednesday                   8.31                    1.13 
## # ℹ 11 more rows
## # ℹ 2 more variables: mean_per_weekday_laundry <dbl>,
## #   mean_per_weekday_water <dbl>

5.1. WATER HEATER & AC WEEKLY TIMESERIES

weekly_water <- ts(mean_per_week_water$mean, frequency=52.25, start=c(2007,1))
weekly_water<-(weekly_water)
autoplot(weekly_water) + xlab("Week") + ylab("Mean") +
  ggtitle("Water Heater & AC")

boxplot(mean_per_week_water ~ week, 
        data=total, 
        main="Water & AC",
        ylab="Mean", xlab="week")

class(total)

## [1] "tbl_df"     "tbl"        "data.frame"

5.1.1. WATER HEATER & AC WEEKLDAY TIMESERIES

weekday_water<-ts(mean_per_weekday_kitchen$mean, frequency=52.25, start=c(2007, 1))
plot(weekday_water)

autoplot(weekday_water) + xlab("Day of the Week") + ylab("Mean") +
  ggtitle("Water Heater & AC")

5.2. KITCHEN WEEKLY TIMESERIES

weekly_kitchen <- ts(mean_per_week_kitchen$mean, frequency=52.25, start=c(2007,1))

plot(weekly_kitchen)

autoplot(weekly_kitchen) + xlab("Week") + ylab("Mean") +
  ggtitle("Kitchen")

5.3. LAUNDRY ROOM TIMESERIES

weekly_laundry<- ts(mean_per_week_laundry$mean, frequency=52.25, start=c(2007,1))

plot(weekly_laundry)

autoplot(weekly_laundry) + xlab("Week") + ylab("Mean") +
  ggtitle("Laundry Room")

5.4. TOTAL WEEKLY TIMESERIES

weekly_total<- ts(mean_per_week_total$mean, frequency=52.25, start=c(2007,1))
plot(weekly_total)

5.5. TOTAL POWER CONSUMPTION BY YEAR

by_year<-plot_ly(total_year, x = ~total_year$weekday, y = ~total_year$"2007", 
        name = '2007', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~total_year$"2008" , name = '2008', mode = 'lines') %>%
  add_trace(y = ~total_year$"2009", name = '2009', mode = 'lines') %>%
  layout(title = "Total Power Consumption by Year",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

by_year

5.6. 2007 CONSUMPTION BY SUBMETER weekday

weekdayy_2007 <- filter(total_weekday, (year == 2007))
weekday_2007<-plot_ly(weekdayy_2007, x = ~weekdayy_2007$weekday, y = ~weekdayy_2007$"mean_per_weekday_water", 
         name = 'mean_per_weekday_water ', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2007$"mean_per_weekday_kitchen" , name = 'mean_per_weekday_kitchen ', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2007$"mean_per_weekday_laundry", name = 'mean_per_weekday_laundry ', mode = 'lines') %>%
  layout(title = "Power Consumption by Sub-meter 2007",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
weekday_2007

5.6. 2008 CONSUMPTION BY SUBMETER weekday

weekdayy_2008 <- filter(total_weekday, (year == 2008))
weekday_2008<-plot_ly(weekdayy_2008, x = ~weekdayy_2008$weekday, y = ~weekdayy_2008$"mean_per_weekday_water", 
                      name = 'mean_per_weekday_water ', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2008$"mean_per_weekday_kitchen" , name = 'mean_per_weekday_kitchen ', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2008$"mean_per_weekday_laundry", name = 'mean_per_weekday_laundry ', mode = 'lines') %>%
  layout(title = "Power Consumption by Sub-meter 2008",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
weekday_2008

5.6. 2009 CONSUMPTION BY SUBMETER weekday

weekdayy_2009 <- filter(total_weekday, (year == 2009))
weekday_2009<-plot_ly(weekdayy_2009, x = ~weekdayy_2009$weekday, y = ~weekdayy_2009$"mean_per_weekday_water", 
                      name = 'mean_per_weekday_water ', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2009$"mean_per_weekday_kitchen" , name = 'mean_per_weekday_kitchen ', mode = 'lines') %>%
  add_trace(y = ~weekdayy_2009$"mean_per_weekday_laundry", name = 'mean_per_weekday_laundry ', mode = 'lines') %>%
  layout(title = "Power Consumption by Sub-meter 2009",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
weekday_2009

5.6. PLOT BY SEASON (HUE), SUB_METER1, YEAR

kitchen_season <- filter(seasonal_kitchen, (year == 2007))
kitchen_season_2007<-plot_ly(kitchen_season, x = ~kitchen_season$weekday, y = ~kitchen_season$"mean_per_kitchen_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~kitchen_season$"mean_per_kitchen_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~kitchen_season$"mean_per_kitchen_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~kitchen_season$"mean_per_kitchen_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption kitchen 2007",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

kitchen_seasonn <- filter(seasonal_kitchen, (year == 2008))
kitchen_season_2008<-plot_ly(kitchen_seasonn, x = ~kitchen_seasonn$weekday, y = ~kitchen_seasonn$"mean_per_kitchen_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonn$"mean_per_kitchen_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonn$"mean_per_kitchen_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonn$"mean_per_kitchen_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption kitchen 2008",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

kitchen_seasonnn <- filter(seasonal_kitchen, (year == 2009))
kitchen_season_2009<-plot_ly(kitchen_seasonnn, x = ~kitchen_seasonnn$weekday, y = ~kitchen_seasonnn$"mean_per_kitchen_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonnn$"mean_per_kitchen_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonnn$"mean_per_kitchen_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~kitchen_seasonnn$"mean_per_kitchen_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption kitchen 2009",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
kitchen_season_2007

5.6. PLOT BY SEASON (HUE), SUB_METER2, YEAR

laundry_season <- filter(seasonal_laundry, (year == 2007))
laundry_season_2007<-plot_ly(laundry_season, x = ~laundry_season$weekday, y = ~laundry_season$"mean_per_laundry_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~laundry_season$"mean_per_laundry_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~laundry_season$"mean_per_laundry_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~laundry_season$"mean_per_laundry_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption LAUNDRY 2007",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

laundry_seasonn <- filter(seasonal_laundry, (year == 2008))
laundry_season_2008<-plot_ly(laundry_seasonn, x = ~laundry_seasonn$weekday, y = ~laundry_seasonn$"mean_per_laundry_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonn$"mean_per_laundry_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonn$"mean_per_laundry_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonn$"mean_per_laundry_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption LAUNDRY 2008",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

laundry_seasonnn <- filter(seasonal_laundry, (year == 2009))
laundry_season_2009<-plot_ly(laundry_seasonnn, x = ~laundry_seasonnn$weekday, y = ~laundry_seasonnn$"mean_per_laundry_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonnn$"mean_per_laundry_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonnn$"mean_per_laundry_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~laundry_seasonnn$"mean_per_laundry_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption LAUNDRY 2009",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
laundry_season_2007

5.7. PLOT BY SEASON (HUE), SUB_METER3, YEAR

water_season <- filter(seasonal_water, (year == 2007))
water_season_2007<-plot_ly(water_season, x = ~water_season$weekday, y = ~water_season$"mean_per_water_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~water_season$"mean_per_water_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~water_season$"mean_per_water_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~water_season$"mean_per_water_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption WATER 2007",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

water_seasonn <- filter(seasonal_water, (year == 2008))
water_season_2008<-plot_ly(water_seasonn, x = ~water_seasonn$weekday, y = ~water_seasonn$"mean_per_water_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~water_seasonn$"mean_per_water_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~water_seasonn$"mean_per_water_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~water_seasonn$"mean_per_water_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption WATER 2008",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))

water_seasonnn <- filter(seasonal_water, (year == 2009))
water_season_2009<-plot_ly(water_seasonnn, x = ~water_seasonnn$weekday, y = ~water_seasonnn$"mean_per_water_winter", 
                             name = 'WINTER', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~water_seasonnn$"mean_per_water_spring" , name = 'SPRING', mode = 'lines') %>%
  add_trace(y = ~water_seasonnn$"mean_per_water_summer", name = 'SUMMER', mode = 'lines') %>%
  add_trace(y = ~water_seasonnn$"mean_per_water_fall", name = 'FALL', mode = 'lines') %>%
  layout(title = "Power Consumption WATER 2009",
         xaxis = list(title = "Weekday"),
         yaxis = list (title = "Power (watt-hours)"))
water_season_2007

#6. PLOT BY HOUR

hour_207 <- filter(total_hour, (year == 2007))
hour_2007<-plot_ly(hour_207, x = ~hour_207$hour, y = ~hour_207$"mean_per_hour_kitchen", 
                           name = 'KITCHEN', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hour_207$"mean_per_hour_laundry" , name = 'LAUNDRY', mode = 'lines') %>%
  add_trace(y = ~hour_207$"mean_per_hour_water", name = 'WATER', mode = 'lines') %>%
  layout(title = "Power Consumption WATER by hour 2007",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hour_208 <- filter(total_hour, (year == 2008))
hour_2008<-plot_ly(hour_208, x = ~hour_208$hour, y = ~hour_208$"mean_per_hour_kitchen", 
                   name = 'KITCHEN', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hour_208$"mean_per_hour_laundry" , name = 'LAUNDRY', mode = 'lines') %>%
  add_trace(y = ~hour_208$"mean_per_hour_water", name = 'WATER', mode = 'lines') %>%
  layout(title = "Power Consumption by hour 2008",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hour_209 <- filter(total_hour, (year == 2009))
hour_2009<-plot_ly(hour_209, x = ~hour_209$hour, y = ~hour_209$"mean_per_hour_kitchen", 
                   name = 'KITCHEN', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hour_209$"mean_per_hour_laundry" , name = 'LAUNDRY', mode = 'lines') %>%
  add_trace(y = ~hour_209$"mean_per_hour_water", name = 'WATER', mode = 'lines') %>%
  layout(title = "Power Consumption by hour 2009",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))
hour_2007

hour_2008

hour_2009

5.6. PLOT BY HOUR (HUE), SUB_METER1, YEAR

hourly_kitchen <- filter(hour_kitchen, (year == 2007))
ly_2007<-plot_ly(hourly_kitchen, x = ~hourly_kitchen$weekday, y = ~hourly_kitchen$"morning_kitchen", 
                   name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourly_kitchen$"midday_kitchen" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourly_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourly_kitchen$"noon_kitchen", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourly_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2007",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyy_kitchen <- filter(hour_kitchen, (year == 2008))
ly_2008<-plot_ly(hourlyy_kitchen, x = ~hourlyy_kitchen$weekday, y = ~hourlyy_kitchen$"morning_kitchen", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyy_kitchen$"midday_kitchen" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyy_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_kitchen$"noon_kitchen", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2008",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyyy_kitchen <- filter(hour_kitchen, (year == 2009))
ly_2009<-plot_ly(hourlyyy_kitchen, x = ~hourlyyy_kitchen$weekday, y = ~hourlyyy_kitchen$"morning_kitchen", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_kitchen$"midday_kitchen" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_kitchen$"noon_kitchen", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2009",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))
ly_2007

ly_2008

ly_2009

5.7. PLOT BY HOUR (HUE), SUB_METER2, YEAR

hourly_laundry <- filter(hour_laundry, (year == 2007))
lyy_2007<-plot_ly(hourly_laundry, x = ~hourly_laundry$weekday, y = ~hourly_laundry$"morning_laundry", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourly_laundry$"midday_laundry" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourly_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourly_laundry$"noon_laundry", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourly_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Laundry Room 2007",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyy_laundry <- filter(hour_laundry, (year == 2008))
lyy_2008<-plot_ly(hourlyy_laundry, x = ~hourlyy_laundry$weekday, y = ~hourlyy_laundry$"morning_laundry", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyy_laundry$"midday_laundry" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyy_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_laundry$"noon_laundry", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Laundry Room 2008",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyyy_laundry <- filter(hour_laundry, (year == 2009))
lyy_2009<-plot_ly(hourlyyy_laundry, x = ~hourlyyy_laundry$weekday, y = ~hourlyyy_laundry$"morning_laundry", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_laundry$"midday_laundry" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_laundry$"noon_laundry", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Laundry Room 2009",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

lyy_2007

lyy_2008

lyy_2009

5.7. PLOT BY HOUR (HUE), SUB_METER3, YEAR

hourly_water <- filter(hour_water, (year == 2007))
lyyy_2007<-plot_ly(hourly_water, x = ~hourly_water$weekday, y = ~hourly_water$"morning_water", 
                  name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourly_water$"midday_water" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourly_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourly_water$"noon_water", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourly_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Water Heater & AC 2007",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyy_water <- filter(hour_water, (year == 2008))
lyyy_2008<-plot_ly(hourlyy_water, x = ~hourlyy_water$weekday, y = ~hourlyy_water$"morning_water", 
                  name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyy_water$"midday_water" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyy_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_water$"noon_water", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyy_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Water Heater & AC 2008",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

hourlyyy_water <- filter(hour_water, (year == 2009))
lyyy_2009<-plot_ly(hourlyyy_water, x = ~hourlyyy_water$weekday, y = ~hourlyyy_water$"morning_water", 
                  name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_water$"midday_water" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_water$"noon_water", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourlyyy_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Water Heater & AC 2009",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))

lyyy_2007

lyyy_2008

lyyy_2009

5.8. PLOT BY HOUR (HUE), SUB_METER1, MONTH

monthly_kitchen <- filter(month_kitchen, (year == 2007))
thly_2007<-plot_ly(monthly_kitchen , x = ~monthly_kitchen$month, y = ~monthly_kitchen$"morning_kitchen", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"dusk_kitchen", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2007",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_kitchen <- filter(month_kitchen, (year == 2008))
thly_2008<-plot_ly(monthly_kitchen , x = ~monthly_kitchen$month, y = ~monthly_kitchen$"morning_kitchen", 
                   name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"dusk_kitchen", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2008",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_kitchen <- filter(month_kitchen, (year == 2009))
thly_2009<-plot_ly(monthly_kitchen , x = ~monthly_kitchen$month, y = ~monthly_kitchen$"morning_kitchen", 
                   name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"afternoon_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"dusk_kitchen", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_kitchen$"night_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2009",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))
thly_2007

thly_2008

thly_2009

5.8. PLOT BY HOUR (HUE), SUB_METER2, MONTH

monthly_laundry <- filter(month_laundry, (year == 2007))
thlyy_2007<-plot_ly(monthly_laundry , x = ~monthly_laundry$month, y = ~monthly_laundry$"morning_laundry", 
                   name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"dusk_laundry", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour laundry 2007",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_laundry <- filter(month_laundry, (year == 2008))
thlyy_2008<-plot_ly(monthly_laundry , x = ~monthly_laundry$month, y = ~monthly_laundry$"morning_laundry", 
                    name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"dusk_laundry", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour laundry 2008",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_laundry <- filter(month_laundry, (year == 2009))
thlyy_2009<-plot_ly(monthly_laundry , x = ~monthly_laundry$month, y = ~monthly_laundry$"morning_laundry", 
                    name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"afternoon_laundry", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"dusk_laundry", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_laundry$"night_laundry", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour laundry 2009",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))
thlyy_2007

thlyy_2008

thlyy_2009

5.8. PLOT BY HOUR (HUE), SUB_METER3, MONTH

monthly_water <- filter(month_water, (year == 2007))
thlyyy_2007<-plot_ly(monthly_water , x = ~monthly_water$month, y = ~monthly_water$"morning_water", 
                    name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"dusk_water", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour water 2007",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_water <- filter(month_water, (year == 2008))
thlyyy_2008<-plot_ly(monthly_water , x = ~monthly_water$month, y = ~monthly_water$"morning_water", 
                    name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"dusk_water", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour water 2008",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))

monthly_water <- filter(month_water, (year == 2009))
thlyyy_2009<-plot_ly(monthly_water , x = ~monthly_water$month, y = ~monthly_water$"morning_water", 
                    name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"afternoon_water", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"dusk_water", name = 'Dusk', mode = 'lines') %>%
  add_trace(y = ~monthly_water$"night_water", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour water 2009",
         xaxis = list(title = "Month"),
         yaxis = list (title = "Power (watt-hours)"))
thlyyy_2007

thlyyy_2008

thlyyy_2009

5.7. HOUR BY WEEK

hourly_week_kitchen <- filter(hour_week_kitchen, (year == 2007))
ly_week_2007<-plot_ly(hourly_week_kitchen, x = ~hourly_week_kitchen$week, y = ~hourly_week_kitchen$"morning_week_kitchen", 
                 name = 'Morning', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~hourly_week_kitchen$"midday_weeK_kitchen" , name = 'MidDay', mode = 'lines') %>%
  add_trace(y = ~hourly_week_kitchen$"afternoon_week_kitchen", name = 'Afternoon', mode = 'lines') %>%
  add_trace(y = ~hourly_week_kitchen$"noon_week_kitchen", name = 'Noon', mode = 'lines') %>%
  add_trace(y = ~hourly_week_kitchen$"night_week_kitchen", name = 'Night', mode = 'lines') %>%
  layout(title = "Power Consumption by hour Kitchen 2007",
         xaxis = list(title = "hour"),
         yaxis = list (title = "Power (watt-hours)"))
ly_week_2007

5.5. PLOTLY (PLOTS OF THREE SUB-METERS AND TOTAL TOGETHER)

Plot all of three sub-meters year 2007

2007 - PER WEEK

all_2007 <- filter(total, (year == 2007))
## Plot sub-meter 1, 2 and 3 with title, legend and labels 
year_2007<-plot_ly(all_2007, x = ~all_2007$week, y = ~all_2007$mean_per_week_water, name = 'Water Heater & AC', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~all_2007$mean_per_week_kitchen , name = 'Kitchen', mode = 'lines') %>%
  add_trace(y = ~all_2007$mean_per_week_laundry, name = 'Laundry RoomC', mode = 'lines') %>%
  layout(title = "Power Consumption 2007",
         xaxis = list(title = "Week"),
         yaxis = list (title = "Power (watt-hours)"))
year_2007

2008 - PER WEEK

all_2008 <- filter(total, (year == 2008))
## Plot sub-meter 1, 2 and 3 with title, legend and labels 
year_2008<-plot_ly(all_2008, x = ~all_2008$week, y = ~all_2008$mean_per_week_water, name = 'Water Heater & AC', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~all_2008$mean_per_week_kitchen , name = 'Kitchen', mode = 'lines') %>%
  add_trace(y = ~all_2008$mean_per_week_laundry, name = 'Laundry Room', mode = 'lines') %>%
  layout(title = "Power Consumption 2008",
         xaxis = list(title = "Week"),
         yaxis = list (title = "Power (watt-hours)"))
year_2008

2009 - PER WEEK

all_2009 <- filter(total, (year == 2009))
## Plot sub-meter 1, 2 and 3 with title, legend and labels 
year_2009<-plot_ly(all_2009, x = ~all_2009$week, y = ~all_2009$mean_per_week_water, name = 'Water Heater & AC', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~all_2009$mean_per_week_kitchen , name = 'Kitchen', mode = 'lines') %>%
  add_trace(y = ~all_2009$mean_per_week_laundry, name = 'Laundry Room', mode = 'lines') %>%
  layout(title = "Power Consumption 2009",
         xaxis = list(title = "Week"),
         yaxis = list (title = "Power (watt-hours)"))
year_2009

Data Insights and Recommendations

Based on the analysis of the power consumption data, the following insights and recommendations were derived:

AC power consumption peaks around noon, suggesting the need to adjust temperature settings when leaving home.
A decreasing trend in Water Heater & AC consumption was observed in 2008, indicating a potential for power savings in the future.
Power consumption in the kitchen varies, likely due to dishwasher usage.
Adding outdoor temperature data to the dataset would provide insights based on temperature changes.
Separating Water Heater and AC into two different sub-meters would make it easier to understand power consumption for these appliances.
Generating a consumption comparison between the current month and the previous month can help homeowners identify behaviors that lead to lower power usage.
Developing a reminder system to alert homeowners when power consumption reaches a certain threshold would encourage energy-saving habits.
Creating an app for homeowners to monitor power consumption in real-time can promote awareness and enable informed decisions.
Generating reports on the amount of power saved through the use of sub-meters would showcase the benefits to the utility company.

5.6. DECOMPOSITION OF TIMESERIES

decomp_water <- decompose(weekly_water)
plot(decomp_water)

decomp_kitchen<- decompose(weekly_kitchen)
plot(decomp_kitchen)

decomp_laundry<- decompose(weekly_laundry)
plot(decomp_laundry)

decomp_total<- decompose(weekly_total)
plot(decomp_total)

weekly <- total[, c("mean_per_week_total", "mean_per_week_water",
                    "mean_per_week_kitchen", "mean_per_week_laundry")]
datacheck<- total[, c("mean_per_week_total")]

Forecasting and Analysis Summary

The forecasting and analysis process included cleaning and preprocessing the dataset, followed by exploratory data analysis. The findings can be summarized as follows:

Linear models were a good fit for monthly data but performed poorly for daily frequency predictions.
Holt-Winters approach showed promise in predicting monthly and hourly consumption trends, but the wide confidence intervals limited their usefulness.

Unknown

The accuracy of predictions can be improved by incorporating additional variables such as the number of residents, outdoor temperature, electricity price, and household income changes.
Alternative modeling approaches, such as random forest or gradient boosting, could be explored for more accurate predictions.

#TAKING AWAY THE TREND FOR TOTAL SUB_METER:

weekly_total <- total[, c("mean_per_week_total")]
weekly_total_frequency<-ts(weekly_total, start=c(2007, 1), frequency=52)
DY<-diff(weekly_total_frequency)
autoplot(DY)

ggseasonplot(DY)

ggsubseriesplot(DY)

fit<-snaive(DY)
print(summary(fit))

## 
## Forecast method: Seasonal naive method
## 
## Model Information:
## Call: snaive(y = DY) 
## 
## Residual sd: 2.9364 
## 
## Error measures:
##                         ME     RMSE      MAE      MPE     MAPE MASE      ACF1
## Training set -0.0006972522 2.936427 2.114023 275.0587 785.2804    1 -0.279661
## 
## Forecasts:
##          Point Forecast      Lo 80      Hi 80      Lo 95     Hi 95
## 2010.058    0.852302032 -2.9108804  4.6154844  -4.902989  6.607593
## 2010.077    1.880832888 -1.8823495  5.6440153  -3.874458  7.636124
## 2010.096   -1.691765873 -5.4549483  2.0714165  -7.447057  4.063525
## 2010.115    1.318258809 -2.4449236  5.0814412  -4.437032  7.073550
## 2010.135    0.938388016 -2.8247944  4.7015704  -4.816903  6.693679
## 2010.154   -4.567146891 -8.3303293 -0.8039645 -10.322438  1.188144
## 2010.173    0.171809589 -3.5913728  3.9349920  -5.583481  5.927100
## 2010.192    1.347579607 -2.4156028  5.1107620  -4.407711  7.102870
## 2010.212    1.122856901 -2.6403255  4.8860393  -4.632434  6.878148
## 2010.231   -0.413461127 -4.1766435  3.3497213  -6.168752  5.341830
## 2010.250   -1.488523000 -5.2517054  2.2746594  -7.243814  4.266768
## 2010.269    0.591057765 -3.1721246  4.3542402  -5.164233  6.346349
## 2010.288   -0.246811734 -4.0099941  3.5163707  -6.002103  5.508479
## 2010.308    2.350598841 -1.4125835  6.1137812  -3.404692  8.105890
## 2010.327   -2.661213920 -6.4243963  1.1019685  -8.416505  3.094077
## 2010.346   -0.126289683 -3.8894721  3.6368927  -5.881580  5.629001
## 2010.365    0.847916667 -2.9152657  4.6110991  -4.907374  6.603207
## 2010.385   -0.524796252 -4.2879786  3.2383861  -6.280087  5.230495
## 2010.404    0.631542284 -3.1316401  4.3947247  -5.123749  6.386833
## 2010.423   -0.777805304 -4.5409877  2.9853771  -6.533096  4.977486
## 2010.442   -0.662075648 -4.4252580  3.1011067  -6.417366  5.093215
## 2010.462    0.175496032 -3.5876864  3.9386784  -5.579795  5.930787
## 2010.481   -0.543146562 -4.3063290  3.2200358  -6.298437  5.212144
## 2010.500    0.134217990 -3.6289644  3.8974004  -5.621073  5.889509
## 2010.519   -0.418154762 -4.1813372  3.3450276  -6.173446  5.337136
## 2010.538   -0.274900794 -4.0380832  3.4882816  -6.030192  5.480390
## 2010.558   -2.178714193 -5.9418966  1.5844682  -7.934005  3.576577
## 2010.577   -0.705908823 -4.4690912  3.0572736  -6.461200  5.049382
## 2010.596    0.542261905 -3.2209205  4.3054443  -5.213029  6.297553
## 2010.615   -0.833928571 -4.5971110  2.9292538  -6.589219  4.921362
## 2010.635    0.741567460 -3.0216149  4.5047498  -5.013723  6.496858
## 2010.654    1.650652469 -2.1125299  5.4138349  -4.104638  7.405943
## 2010.673    0.855200706 -2.9079817  4.6183831  -4.900090  6.610492
## 2010.692   -0.048511905 -3.8116943  3.7146705  -5.803803  5.706779
## 2010.712    0.901884921 -2.8612975  4.6650673  -4.853406  6.657176
## 2010.731    1.685409609 -2.0777728  5.4485920  -4.069881  7.440700
## 2010.750   -0.022512784 -3.7856952  3.7406696  -5.777804  5.732778
## 2010.769   -0.681006877 -4.4441893  3.0821755  -6.436298  5.074284
## 2010.788   -0.272564552 -4.0357469  3.4906178  -6.027855  5.482726
## 2010.808    1.145347036 -2.6178354  4.9085294  -4.609944  6.900638
## 2010.827    0.061101377 -3.7020810  3.8242838  -5.694189  5.816392
## 2010.846   -0.295039683 -4.0582221  3.4681427  -6.050330  5.460251
## 2010.865   -0.051289683 -3.8144721  3.7118927  -5.806580  5.704001
## 2010.885    0.003553219 -3.7596292  3.7667356  -5.751738  5.758844
## 2010.904    1.628490432 -2.1346920  5.3916728  -4.126800  7.383781
## 2010.923   -0.510317460 -4.2734998  3.2528649  -6.265608  5.244973
## 2010.942   -0.981150794 -4.7443332  2.7820316  -6.736442  4.774140
## 2010.962    0.059226190 -3.7039562  3.8224086  -5.696065  5.814517
## 2010.981    0.368924925 -3.3942575  4.1321073  -5.386366  6.124216
## 2011.000   -0.628151116 -4.3913335  3.1350313  -6.383442  5.127140
## 2011.019    4.067361111  0.3041787  7.8305435  -1.687930  9.822652
## 2011.038   -0.073611111 -3.8367935  3.6895713  -5.828902  5.681680
## 2011.058    0.852302032 -4.4696415  6.1742456  -7.286908  8.991512
## 2011.077    1.880832888 -3.4411107  7.2027765  -6.258377 10.020043
## 2011.096   -1.691765873 -7.0137094  3.6301777  -9.830976  6.447444
## 2011.115    1.318258809 -4.0036848  6.6402024  -6.820952  9.457469
## 2011.135    0.938388016 -4.3835556  6.2603316  -7.200822  9.077598
## 2011.154   -4.567146891 -9.8890905  0.7547967 -12.706357  3.572063
## 2011.173    0.171809589 -5.1501340  5.4937532  -7.967401  8.311020
## 2011.192    1.347579607 -3.9743640  6.6695232  -6.791631  9.486790
## 2011.212    1.122856901 -4.1990867  6.4448005  -7.016353  9.262067
## 2011.231   -0.413461127 -5.7354047  4.9084824  -8.552671  7.725749
## 2011.250   -1.488523000 -6.8104666  3.8334206  -9.627733  6.650687
## 2011.269    0.591057765 -4.7308858  5.9130013  -7.548153  8.730268
## 2011.288   -0.246811734 -5.5687553  5.0751318  -8.386022  7.892399
## 2011.308    2.350598841 -2.9713447  7.6725424  -5.788611 10.489809
## 2011.327   -2.661213920 -7.9831575  2.6607297 -10.800424  5.477996
## 2011.346   -0.126289683 -5.4482333  5.1956539  -8.265500  8.012921
## 2011.365    0.847916667 -4.4740269  6.1698602  -7.291294  8.987127
## 2011.385   -0.524796252 -5.8467398  4.7971473  -8.664007  7.614414
## 2011.404    0.631542284 -4.6904013  5.9534859  -7.507668  8.770753
## 2011.423   -0.777805304 -6.0997489  4.5441383  -8.917016  7.361405
## 2011.442   -0.662075648 -5.9840192  4.6598679  -8.801286  7.477135
## 2011.462    0.175496032 -5.1464475  5.4974396  -7.963714  8.314706
## 2011.481   -0.543146562 -5.8650901  4.7787970  -8.682357  7.596064
## 2011.500    0.134217990 -5.1877256  5.4561616  -8.004992  8.273428
## 2011.519   -0.418154762 -5.7400983  4.9037888  -8.557365  7.721056
## 2011.538   -0.274900794 -5.5968444  5.0470428  -8.414111  7.864310
## 2011.558   -2.178714193 -7.5006578  3.1432294 -10.317925  5.960496
## 2011.577   -0.705908823 -6.0278524  4.6160347  -8.845119  7.433301
## 2011.596    0.542261905 -4.7796817  5.8642055  -7.596948  8.681472
## 2011.615   -0.833928571 -6.1558721  4.4880150  -8.973139  7.305282
## 2011.635    0.741567460 -4.5803761  6.0635110  -7.397643  8.880778
## 2011.654    1.650652469 -3.6712911  6.9725960  -6.488558  9.789863
## 2011.673    0.855200706 -4.4667429  6.1771443  -7.284010  8.994411
## 2011.692   -0.048511905 -5.3704555  5.2734317  -8.187722  8.090698
## 2011.712    0.901884921 -4.4200587  6.2238285  -7.237325  9.041095
## 2011.731    1.685409609 -3.6365340  7.0073532  -6.453801  9.824620
## 2011.750   -0.022512784 -5.3444564  5.2994308  -8.161723  8.116698
## 2011.769   -0.681006877 -6.0029504  4.6409367  -8.820217  7.458203
## 2011.788   -0.272564552 -5.5945081  5.0493790  -8.411775  7.866646
## 2011.808    1.145347036 -4.1765965  6.4672906  -6.993863  9.284557
## 2011.827    0.061101377 -5.2608422  5.3830449  -8.078109  8.200312
## 2011.846   -0.295039683 -5.6169833  5.0269039  -8.434250  7.844171
## 2011.865   -0.051289683 -5.3732333  5.2706539  -8.190500  8.087921
## 2011.885    0.003553219 -5.3183904  5.3254968  -8.135657  8.142764
## 2011.904    1.628490432 -3.6934531  6.9504340  -6.510720  9.767701
## 2011.923   -0.510317460 -5.8322610  4.8116261  -8.649528  7.628893
## 2011.942   -0.981150794 -6.3030944  4.3407928  -9.120361  7.158060
## 2011.962    0.059226190 -5.2627174  5.3811698  -8.079984  8.198437
## 2011.981    0.368924925 -4.9530186  5.6908685  -7.770285  8.508135
## 2012.000   -0.628151116 -5.9500947  4.6937925  -8.767361  7.511059
## 2012.019    4.067361111 -1.2545825  9.3893047  -4.071849 12.206571
## 2012.038   -0.073611111 -5.3955547  5.2483325  -8.212821  8.065599

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 21.867, df = 32, p-value = 0.9109
## 
## Model df: 0.   Total lags used: 32

residuals: normal and centered around 0

ACF: CHECK

viewer <- function() {
  
  a <- readline(prompt = "Select from the following: Year / Seasonality / Week / Hour / Month")
  
  if (a == "Year") {    
    b <- readline(prompt = "By which year? (2007 / 2008 / 2009) ")
    if (b == "2007") {
      print(year_2007)
    } else if (b == "2008") {
      print(year_2008)
    } else if (b == "2009") {
      print(year_2009)
    }
  }
  
  if (a == "Seasonality") { 
    c <- readline(prompt = "By which sub-meter? (1 - Kitchen, 2 - Laundry or 3 - Water & AC) ")
    d <- readline(prompt = "By which year? (2007, 2008 or 2009) ")
    if (c == "1" && d == "2007") {
      print(kitchen_season_2007)
    } else if (c == "2" && d == "2007") {
      print(laundry_season_2007)
    } else if (c == "3" && d == "2007") {
      print(water_season_2007)
    } else if (c == "1" && d == "2008") {
      print(kitchen_season_2008)
    } else if (c == "2" && d == "2008") {
      print(laundry_season_2008)
    } else if (c == "3" && d == "2008") {
      print(water_season_2008)
    } else if (c == "1" && d == "2009") {
      print(kitchen_season_2009)
    } else if (c == "2" && d == "2009") {
      print(laundry_season_2009)
    } else if (c == "3" && d == "2009") {
      print(water_season_2009)
    }
  }
  
  if (a == "Week") {
    d <- readline(prompt="By what year? (2007 / 2008 / 2009) ")
    if (d == "2007") {
      print(weekday_2007)
    } else if (d == "2008") {
      print(weekday_2008)
    } else if (d == "2009") {
      print(weekday_2009)
    }
  }
  
  if(a == "Hour") { 
    e <- readline(prompt="By what year? (2007 / 2008 / 2009) ")
    f <- readline(prompt="By which Sub-meter? (1-Kitchen, 2-Laundry or 3-Water) ")
    if (e == "2007") {
      print(hour_2007)
    } else if (e == "2008") {
      print(hour_2008)
    } else if (e == "2009") {
      print(hour_2009)
    }  
    if (e == "2007") {
      if (f=="1-Kitchen") {
        print(ly_2007)
      } else if (f=="2-Laundry") {
        print(lyy_2007)
      } else if (f=="3-Water") {
        print(lyyy_2007)
      }
    } else if (e == "2008") {
      if (f=="1-Kitchen") {
        print(ly_2008)
      } else if (f=="2-Laundry") {
        print(lyy_2008)
      } else if (f=="3-Water") {
        print(lyyy_2008)
      }
    } else if (e == "2009")
      if (f=="1-Kitchen") {
        print(ly_2009)
      } else if (f=="2-Laundry") {
        print(lyy_2009)
      } else if (f=="3-Water") {
        print(lyyy_2009)
      }
  }
  
  if (a == "Month") { 
    g <- readline(prompt = "By which sub-meter? (1 - Kitchen, 2 - Laundry or 3 - Water & AC) ")
    h <- readline(prompt = "By which year? (2007, 2008 or 2009) ")
    if (g == "1 - Kitchen" && h == "2007") {
      print(thly_2007)
    } else if (g == "2 - Laundry" && h == "2007") {
      print(thlyy_2007)
    } else if (g == "3 - Water & AC" && h == "2007") {
      print(thlyyy_2007)
    } else if (g == "1 - Kitchen" && h == "2008") {
      print(thly_2008)
    } else if (g == "2 - Laundry" && h == "2008") {
      print(thlyy_2008)
    } else if (g == "3 - Water & AC" && h == "2008") {
      print(thlyyy_2008)
    } else if (g == "1 - Kitchen" && h == "2009") {
      print(thly_2009)
    } else if (g == "2 - Laundry" && h == "2009") {
      print(thlyy_2009)
    } else if (g == "3 - Water & AC" && h == "2009") {
      print(thlyyy_2009)
    }
  }
}

viewer()

v## Conclusion

This project aims to provide insights and forecasting capabilities for a sub-metering company operating in the Smart Home sector. By analyzing power consumption data, we can empower Smart Home owners with better understanding and control of their energy usage. The recommendations provided will assist the developer in offering highly efficient Smart Homes that provide owners with valuable power usage analytics.

The project also proposes the development of a dashboard application that enables homeowners to manage their power consumption effectively. The dashboard app will have the following features:

Visualization of Power Consumption: The app will provide intuitive visualizations that allow homeowners to monitor their power consumption patterns over time. This includes daily, monthly, and yearly views, as well as comparisons between different sub-meters and appliances.
Real-time Monitoring: Homeowners will be able to track their power consumption in real-time, allowing them to identify any sudden spikes or unusual patterns. This feature promotes awareness and encourages energy-saving habits.
Customizable Alerts: The dashboard app will allow homeowners to set custom thresholds for power consumption. When the consumption exceeds the specified threshold, the app will send alerts and reminders to prompt energy-saving actions.
Recommendations and Tips: Based on the analysis of power consumption data, the app will provide personalized recommendations and energy-saving tips to homeowners. This will help them make informed decisions and optimize their power usage.
Usage Comparison: The app will enable homeowners to compare their current power consumption with previous periods, such as the previous month or year. This feature will help identify behaviors or changes that lead to lower power usage and encourage sustainable practices.

Overall, the dashboard app will serve as a powerful tool for homeowners to manage their power consumption efficiently, promote energy-saving behaviors, and contribute to a more sustainable and environmentally friendly Smart Home ecosystem.

#LEGEND: #SUB_METER 1-> KITCHEN #SUB_METER_2-> LAUNDRY ROOM #SUB_METER_3-> WATER HEATER & AC

Time Series Analysis

2023-06-14

Dataset Information

2nd Option: Download the Data:

3rd option: Load dataset.

Smart Home Power Consumption Analysis

Project Overview

Investigate each new data frame.

1. Preprocessing DATA

Combine all dataframes

Combine date and time columns using as.POSIXct. Change the order of columns

This archive contains 2075259 measurements gathered in a house located in

Sceaux (7km of Paris, France) between December 2006 and November 2010 (47 months).

http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

Checking for Missing Values

Checking for Duplicate Values

Types of variables

Which sub-meter is using the most power? The least? Is there anything to learn from the max and min?

From the mean of each sub-meter:

Sub-meter_3 uses the most at 6.458 mean

Sub_meter_1 uses the least at mean 1.122

2. DATA EXPLORATION

Explore by year, month, day using Lubridate:

Using mean:

2.2. WEEKLY/WEEKDAY/MONTHLY MEAN SUB_METER_2 LAUNDRY ROOM for 2007 & 2008 & 2009

2.3. WEEKLY/WEEKDAY/MONTHLY MEAN SUB_METER_3 (’Water Heater & AC) for 2007, 2008 & 2009, decomposed

2.4. TOTAL SUB_METER MEAN PER DAY

2.5. TOTAL SUB_METER MEAN PER YEAR

4. JOINING DATAFRAME AND CHANGING NAMES

5.1. WATER HEATER & AC WEEKLY TIMESERIES

5.1.1. WATER HEATER & AC WEEKLDAY TIMESERIES

5.2. KITCHEN WEEKLY TIMESERIES

5.3. LAUNDRY ROOM TIMESERIES

5.4. TOTAL WEEKLY TIMESERIES

5.5. TOTAL POWER CONSUMPTION BY YEAR

5.6. 2007 CONSUMPTION BY SUBMETER weekday

5.6. 2008 CONSUMPTION BY SUBMETER weekday

5.6. 2009 CONSUMPTION BY SUBMETER weekday

5.6. PLOT BY SEASON (HUE), SUB_METER1, YEAR

5.6. PLOT BY SEASON (HUE), SUB_METER2, YEAR

5.7. PLOT BY SEASON (HUE), SUB_METER3, YEAR

5.6. PLOT BY HOUR (HUE), SUB_METER1, YEAR

5.7. PLOT BY HOUR (HUE), SUB_METER2, YEAR

5.7. PLOT BY HOUR (HUE), SUB_METER3, YEAR

5.8. PLOT BY HOUR (HUE), SUB_METER1, MONTH

5.8. PLOT BY HOUR (HUE), SUB_METER2, MONTH

5.8. PLOT BY HOUR (HUE), SUB_METER3, MONTH

5.7. HOUR BY WEEK

5.5. PLOTLY (PLOTS OF THREE SUB-METERS AND TOTAL TOGETHER)

Plot all of three sub-meters year 2007

2007 - PER WEEK

2008 - PER WEEK

2009 - PER WEEK

Data Insights and Recommendations

5.6. DECOMPOSITION OF TIMESERIES

Forecasting and Analysis Summary

residuals: normal and centered around 0

ACF: CHECK