HOUSEHOLD ENERGY OPTIMIZATION: the process of managing and controlling how energy is used in the home to reduce waste, improve efficiency, and lower electricity costs.
This involves analyzing energy-consuming appliances to identify excessive or unnecessary usage (finding out where and when the household uses more electricity than it needs) and using energy-saving devices (appliances or equipment designed to use less electricity while performing the same function as regular devices), which also helps prevent energy shortages.
QUESTIONS TO SOLVE UNDER THIS TOPIC:
1. How can wasteful or unnecessary energy consumption patterns within the household be detected?
2. How does energy usage vary across different time periods (hourly, weekly, or monthly)?
3. How do environmental factors such as temperature and humidity influence energy consumption levels?
4. Are there differences in energy consumption patterns between weekdays and weekends?
5. What recommendations can be made to help households reduce energy waste and improve energy efficiency?
6. Which households have the highest energy usage?
The main aim of this project is to analyze and optimize energy usage within the household in order to minimize energy consumption, reduce electricity costs, and improve overall energy efficiency. The specific objectives are:
To detect wasteful or unnecessary energy consumption patterns within the household.
To examine energy usage patterns across different time periods (hourly, weekly, or monthly).
To analyze environmental factors (such as temperature or humidity) that influence energy consumption levels.
To analyze differences in energy consumption patterns between weekdays and weekends.
To provide recommendations for households on reducing energy waste and improving efficiency.
THE EXPLANATION OF THIS DATASET: I got the energy optimization dataset from Kaggle. It contains timestamped measurements of household environmental conditions, appliance energy usage, and energy-related weather variables (temperature, humidity, and wind speed). It is designed to support analysis of consumption patterns, environmental impacts, and opportunities for reducing energy waste. The dataset has 15,788 observations, with timestamps that fall on 10-minute marks (for example 12:30 and 17:20).
The key variables I will be working with in this dataset: ID, date, appliances, lights, t1-t9, rh_1-rh_9, t_out, rh_out
library(tidyverse) # data manipulation
library(lubridate) # working with dates and times
library(ggplot2) # plotting
library(reshape2) # reshaping and aggregation
library(plotly) # interactive plots
# Load the dataset: read.csv() imports a CSV file into a data frame
Energy_optm <- read.csv("C:/Users/user/Downloads/train (1).csv")
# Look at the first six rows of the dataset with head()
head(Energy_optm)
## ID date lights t1 rh_1 t2 rh_2 t3
## 1 2133 2016-01-26 12:30:00 0 19.89000 45.50 19.20000 45.09000 20.39000
## 2 19730 2016-05-27 17:20:00 0 25.56667 46.56 25.89000 42.02571 27.20000
## 3 3288 2016-02-03 13:00:00 0 22.50000 44.43 21.53333 42.59000 21.96333
## 4 7730 2016-03-05 09:20:00 0 19.79000 38.06 17.20000 40.93333 20.60000
## 5 8852 2016-03-13 04:20:00 0 20.60000 35.29 17.10000 39.79000 20.29000
## 6 425 2016-01-14 15:50:00 0 21.53333 40.00 21.07500 38.99750 21.32333
## rh_3 t4 rh_4 t5 rh_5 t6 rh_6 t7 rh_7
## 1 44.29000 19.10 46.70000 17.51111 53.00000 11.1000000 98.43333 17.50 43.50
## 2 41.16333 24.70 45.59000 23.20000 52.40000 24.7966667 1.00000 24.50 44.50
## 3 44.55500 22.00 40.46667 19.10000 55.32667 6.5300000 61.46333 19.29 34.32
## 4 37.16333 18.39 37.00000 18.29000 42.26000 2.7900000 79.93333 18.10 32.00
## 5 37.00000 19.50 34.50000 18.20000 49.00000 -0.6666667 68.53000 20.70 33.59
## 6 41.43333 18.76 42.36333 17.10000 53.50000 5.3666667 85.53000 17.39 37.90
## t8 rh_8 t9 rh_9 t_out press_mm_hg rh_out windspeed
## 1 18.11111 50.00000 17.16667 48.70000 10.3000000 761.9000 85.50000 7.500000
## 2 24.70000 50.07400 23.20000 46.79000 22.7333333 755.2000 55.66667 3.333333
## 3 20.56667 41.33111 18.60000 45.53000 6.6000000 760.2000 64.00000 8.000000
## 4 20.50000 42.59000 18.39000 40.72333 2.1000000 741.5333 94.33333 1.000000
## 5 22.70000 39.26000 18.92667 40.09000 -0.8666667 768.2667 92.33333 1.666667
## 6 17.89000 44.70000 17.10000 44.96667 5.4166667 747.5667 79.83333 6.000000
## visibility tdewpoint rv1 rv2 appliances
## 1 23.50000 7.950000 39.24086 39.24086 3.912023
## 2 23.66667 13.333333 43.09681 43.09681 4.605170
## 3 40.00000 0.200000 42.05466 42.05466 4.248495
## 4 48.66667 1.233333 12.61586 12.61586 3.688879
## 5 34.00000 -1.933333 10.89793 10.89793 3.688879
## 6 40.00000 2.166667 20.28818 20.28818 3.688879
# Open the dataset in the spreadsheet-style viewer for a clearer look (interactive sessions only)
View(Energy_optm)
dim(Energy_optm) # number of rows and columns in the dataset
## [1] 15788 30
# Convert the date column from text to a proper date-time format
Energy_optm$date <- as.POSIXct(Energy_optm$date, format = "%Y-%m-%d %H:%M:%S")
# Extract the useful time features
Energy_optm$hour <- format(Energy_optm$date, "%H") # hour of the day
Energy_optm$day <- format(Energy_optm$date, "%A") # weekday name
Energy_optm$month <- format(Energy_optm$date, "%Y-%m") # year and month
# Label each row as a weekend or weekday (wday() returns 1 for Sunday and 7 for Saturday by default)
Energy_optm <- Energy_optm %>%
  mutate(
    day_type = ifelse(wday(date) %in% c(1, 7), "Weekend", "Weekday")
  )
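The weekend flag above depends on lubridate's wday() numbering Sunday as 1, which holds unless the `lubridate.week.start` option has been changed. As a cross-check, here is a minimal base R sketch of the same idea. The helper name `is_weekend` is hypothetical, and the three timestamps are copied from the data preview.

```r
# Base R sketch of the weekend flag; `is_weekend` is a hypothetical helper name.
is_weekend <- function(x) {
  # POSIXlt stores the weekday as 0 (Sunday) through 6 (Saturday)
  as.POSIXlt(x)$wday %in% c(0, 6)
}

# Three timestamps that appear in the data preview above
stamps <- as.POSIXct(
  c("2016-01-26 12:30:00", "2016-03-05 09:20:00", "2016-03-13 04:20:00"),
  tz = "UTC"
)
day_type <- ifelse(is_weekend(stamps), "Weekend", "Weekday")
day_type
```

The first timestamp is a Tuesday and the other two fall on a Saturday and a Sunday, so the result should agree with the day_type column.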
# Use str() to check the structure of the data
str(Energy_optm) # gives a compact overview of every column and its type
## 'data.frame': 15788 obs. of 34 variables:
## $ ID : int 2133 19730 3288 7730 8852 425 10277 15028 16841 1890 ...
## $ date : POSIXct, format: "2016-01-26 12:30:00" "2016-05-27 17:20:00" ...
## $ lights : int 0 0 0 0 0 0 0 0 0 30 ...
## $ t1 : num 19.9 25.6 22.5 19.8 20.6 ...
## $ rh_1 : num 45.5 46.6 44.4 38.1 35.3 ...
## $ t2 : num 19.2 25.9 21.5 17.2 17.1 ...
## $ rh_2 : num 45.1 42 42.6 40.9 39.8 ...
## $ t3 : num 20.4 27.2 22 20.6 20.3 ...
## $ rh_3 : num 44.3 41.2 44.6 37.2 37 ...
## $ t4 : num 19.1 24.7 22 18.4 19.5 ...
## $ rh_4 : num 46.7 45.6 40.5 37 34.5 ...
## $ t5 : num 17.5 23.2 19.1 18.3 18.2 ...
## $ rh_5 : num 53 52.4 55.3 42.3 49 ...
## $ t6 : num 11.1 24.797 6.53 2.79 -0.667 ...
## $ rh_6 : num 98.4 1 61.5 79.9 68.5 ...
## $ t7 : num 17.5 24.5 19.3 18.1 20.7 ...
## $ rh_7 : num 43.5 44.5 34.3 32 33.6 ...
## $ t8 : num 18.1 24.7 20.6 20.5 22.7 ...
## $ rh_8 : num 50 50.1 41.3 42.6 39.3 ...
## $ t9 : num 17.2 23.2 18.6 18.4 18.9 ...
## $ rh_9 : num 48.7 46.8 45.5 40.7 40.1 ...
## $ t_out : num 10.3 22.733 6.6 2.1 -0.867 ...
## $ press_mm_hg: num 762 755 760 742 768 ...
## $ rh_out : num 85.5 55.7 64 94.3 92.3 ...
## $ windspeed : num 7.5 3.33 8 1 1.67 ...
## $ visibility : num 23.5 23.7 40 48.7 34 ...
## $ tdewpoint : num 7.95 13.33 0.2 1.23 -1.93 ...
## $ rv1 : num 39.2 43.1 42.1 12.6 10.9 ...
## $ rv2 : num 39.2 43.1 42.1 12.6 10.9 ...
## $ appliances : num 3.91 4.61 4.25 3.69 3.69 ...
## $ hour : chr "12" "17" "13" "09" ...
## $ day : chr "Tuesday" "Friday" "Wednesday" "Saturday" ...
## $ month : chr "2016-01" "2016-05" "2016-02" "2016-03" ...
## $ day_type : chr "Weekday" "Weekday" "Weekday" "Weekend" ...
summary(Energy_optm) # min, 1st quartile, median, mean, 3rd quartile and max for numeric columns; length and class for character columns
## ID date lights t1
## Min. : 1 Min. :2016-01-11 17:10:00 Min. : 0.000 Min. :16.79
## 1st Qu.: 4923 1st Qu.:2016-02-14 21:27:30 1st Qu.: 0.000 1st Qu.:20.78
## Median : 9908 Median :2016-03-20 12:15:00 Median : 0.000 Median :21.60
## Mean : 9873 Mean :2016-03-20 06:26:24 Mean : 3.809 Mean :21.69
## 3rd Qu.:14821 3rd Qu.:2016-04-23 15:12:30 3rd Qu.: 0.000 3rd Qu.:22.60
## Max. :19734 Max. :2016-05-27 18:00:00 Max. :70.000 Max. :26.26
## rh_1 t2 rh_2 t3
## Min. :27.02 Min. :16.10 Min. :20.46 Min. :17.20
## 1st Qu.:37.40 1st Qu.:18.82 1st Qu.:37.90 1st Qu.:20.79
## Median :39.66 Median :20.00 Median :40.50 Median :22.10
## Mean :40.27 Mean :20.35 Mean :40.43 Mean :22.27
## 3rd Qu.:43.06 3rd Qu.:21.50 3rd Qu.:43.29 3rd Qu.:23.29
## Max. :57.42 Max. :29.86 Max. :54.77 Max. :29.24
## rh_3 t4 rh_4 t5
## Min. :28.77 Min. :15.10 Min. :27.66 Min. :15.33
## 1st Qu.:36.90 1st Qu.:19.53 1st Qu.:35.59 1st Qu.:18.29
## Median :38.56 Median :20.63 Median :38.46 Median :19.39
## Mean :39.25 Mean :20.85 Mean :39.05 Mean :19.60
## 3rd Qu.:41.76 3rd Qu.:22.10 3rd Qu.:42.19 3rd Qu.:20.63
## Max. :50.16 Max. :26.20 Max. :51.09 Max. :25.80
## rh_5 t6 rh_6 t7
## Min. :30.17 Min. :-6.030 Min. : 1.00 Min. :15.39
## 1st Qu.:45.43 1st Qu.: 3.595 1st Qu.:29.99 1st Qu.:18.70
## Median :49.08 Median : 7.300 Median :55.30 Median :20.06
## Mean :50.95 Mean : 7.914 Mean :54.64 Mean :20.27
## 3rd Qu.:53.70 3rd Qu.:11.263 3rd Qu.:83.30 3rd Qu.:21.60
## Max. :96.32 Max. :28.290 Max. :99.90 Max. :25.96
## rh_7 t8 rh_8 t9
## Min. :23.23 Min. :16.31 Min. :29.60 Min. :14.89
## 1st Qu.:31.50 1st Qu.:20.79 1st Qu.:39.09 1st Qu.:18.00
## Median :34.90 Median :22.10 Median :42.43 Median :19.39
## Mean :35.41 Mean :22.03 Mean :42.96 Mean :19.49
## 3rd Qu.:39.02 3rd Qu.:23.39 3rd Qu.:46.56 3rd Qu.:20.60
## Max. :51.33 Max. :27.23 Max. :58.78 Max. :24.50
## rh_9 t_out press_mm_hg rh_out
## Min. :29.17 Min. :-5.000 Min. :729.3 Min. : 24.00
## 1st Qu.:38.53 1st Qu.: 3.633 1st Qu.:750.9 1st Qu.: 70.33
## Median :40.93 Median : 6.933 Median :756.1 Median : 84.00
## Mean :41.57 Mean : 7.418 Mean :755.5 Mean : 79.82
## 3rd Qu.:44.36 3rd Qu.:10.417 3rd Qu.:760.9 3rd Qu.: 91.67
## Max. :53.33 Max. :26.100 Max. :772.3 Max. :100.00
## windspeed visibility tdewpoint rv1
## Min. : 0.000 Min. : 1.00 Min. :-6.6000 Min. : 0.006033
## 1st Qu.: 2.000 1st Qu.:29.00 1st Qu.: 0.9333 1st Qu.:12.510037
## Median : 3.667 Median :40.00 Median : 3.4333 Median :24.912220
## Mean : 4.031 Mean :38.33 Mean : 3.7814 Mean :25.027694
## 3rd Qu.: 5.500 3rd Qu.:40.00 3rd Qu.: 6.6000 3rd Qu.:37.665543
## Max. :14.000 Max. :66.00 Max. :15.4000 Max. :49.996530
## rv2 appliances hour day
## Min. : 0.006033 Min. :2.303 Length:15788 Length:15788
## 1st Qu.:12.510037 1st Qu.:3.912 Class :character Class :character
## Median :24.912220 Median :4.094 Mode :character Mode :character
## Mean :25.027694 Mean :4.305
## 3rd Qu.:37.665543 3rd Qu.:4.605
## Max. :49.996530 Max. :6.985
## month day_type
## Length:15788 Length:15788
## Class :character Class :character
## Mode :character Mode :character
##
##
##
# Check whether there are any missing values
# is.na() marks missing (NA) values; any() reports whether at least one was found
any(is.na(Energy_optm))
## [1] FALSE
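A single TRUE or FALSE does not say which columns are affected. Here is a small sketch of counting missing values per column. The data frame `toy` is made up for illustration and is not the Kaggle file.

```r
# Count missing values per column; `toy` is made-up illustration data.
toy <- data.frame(
  lights     = c(0, 30, NA, 10),
  t1         = c(19.9, NA, 22.5, NA),
  appliances = c(3.91, 4.61, 4.25, 3.69)
)

anyNA(toy)                            # TRUE if at least one value is missing
na_per_column <- colSums(is.na(toy))  # number of NA values in each column
na_per_column
```

Running the same two calls on Energy_optm would show at a glance whether any column needs cleaning.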
While cleaning the data I found that the rows are not in date order, so I sorted them by date, which makes time-based plots easier to read and interpret.
# Sort the rows by date in ascending order using order(); the result is stored in `sorted`
sorted <- Energy_optm[order(Energy_optm$date), ]
Since the goal of energy optimization is to find where energy is wasted or used unnecessarily in the household, I first calculated the total amount of energy consumed across the whole dataset.
# Total energy used by appliances across the dataset
total_appliances <- sum(Energy_optm$appliances, na.rm = TRUE)
total_appliances
## [1] 67971.71
# Total energy used by lights across the dataset
total_lights <- sum(Energy_optm$lights, na.rm = TRUE)
total_lights
## [1] 60130
# Total energy used (appliances plus lights)
total_energy <- total_appliances + total_lights
total_energy
## [1] 128101.7
# New column: the sum of lights and appliances for each row
Energy_optm$total_energy_used <- Energy_optm$lights + Energy_optm$appliances
tibble(Energy_optm$ID, Energy_optm$total_energy_used)
## # A tibble: 15,788 × 2
## `Energy_optm$ID` `Energy_optm$total_energy_used`
## <int> <dbl>
## 1 2133 3.91
## 2 19730 4.61
## 3 3288 4.25
## 4 7730 3.69
## 5 8852 3.69
## 6 425 3.69
## 7 10277 4.09
## 8 15028 4.09
## 9 16841 4.50
## 10 1890 34.6
## # ℹ 15,778 more rows
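The appliances values in the preview look like natural logarithms of round Wh readings (for example, exp(4.605170) is about 100), while lights is in plain units. If that reading of the data is right, adding the two columns directly mixes scales. Here is a minimal sketch of converting back before summing, under that assumption.

```r
# Values copied from the `appliances` column in the preview above
log_appliances <- c(3.912023, 4.605170, 4.248495, 3.688879)

# Assumption: the column holds natural logarithms of the original Wh readings
appliances_wh <- exp(log_appliances)
round(appliances_wh)

# With both terms in Wh, a combined total has a consistent unit
lights_wh <- c(0, 0, 0, 0)   # lights values for the same four rows
total_wh  <- appliances_wh + lights_wh
total_wh
```

If the assumption holds for the whole column, applying exp() to Energy_optm$appliances before building total_energy_used would keep the "(Wh)" axis labels in the later plots accurate.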
View(Energy_optm) # to check if the new variable is added
ggplot(Energy_optm, aes(x = appliances)) +
  geom_histogram(fill = "red") + # fill the bars with red
  labs(
    title = "Distribution of Household Energy Consumption", # the title
    x = "Energy Consumption", # x-axis label
    y = "Number of observations" # y-axis label (the histogram counts rows)
  ) +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
The histogram shows how the appliance energy readings are distributed.
Most readings are moderate, while a few are unusually high or low.
Understanding this distribution helps identify wasteful consumption.
ggplot(Energy_optm, aes(y = appliances)) + # map appliances to the y-axis
  geom_boxplot(fill = "yellow") + # draw the boxplot
  labs(
    title = "Appliances Outliers (Unusual and Wasteful)", # title for the graph
    y = "Appliances (Wh)" # y-axis label
  )
Energy is wasted when appliances are left on when they are not needed,
for example lights, heaters, AC units, or other appliances running longer
than necessary. The lowest usage occurs when appliances are switched off
or idle, typically at night.
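The points a boxplot draws as outliers are usually those that fall more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly, so the unusual readings can be listed rather than only seen. The helper name `flag_outliers` is hypothetical and the readings are made up for illustration.

```r
# Hypothetical helper: flag values outside the usual 1.5 * IQR boxplot fences.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]                        # interquartile range
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Made-up readings for illustration only
usage <- c(3.7, 3.9, 3.9, 4.1, 4.1, 4.2, 4.6, 6.9)
flag_outliers(usage)
usage[flag_outliers(usage)]
```

Here only the reading of 6.9 lies outside the fences. An outlier found this way is unusual, not necessarily wasteful, so it is a starting point for a closer look rather than a verdict.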
# Create a Plotly boxplot to identify unusual or wasteful energy consumption
plot_ly(
  data = Energy_optm, # use the Energy_optm dataset
  y = ~appliances, # plot the appliances energy usage on the y-axis
  type = "box", # specify that this is a boxplot
  boxpoints = "outliers", # show only outlier points (unusually high or low usage)
  marker = list(size = 6), # set the size of the outlier points
  color = I("yellow"), # I() applies the literal colour instead of treating "yellow" as a data value
  line = list(width = 2), # adjust the thickness of the boxplot lines
  name = "Appliances" # name the box in case of multiple boxplots
) %>%
  # Customize the layout of the plot
  plotly::layout(
    title = "Appliances Outliers (Unusual and Wasteful)", # title of the chart
    yaxis = list(title = "Appliances (Wh)"), # label for the y-axis
    showlegend = FALSE # hide the legend
  )
# Rebuild the time features from the date column with lubridate.
# These replace the character versions created earlier with numeric values.
Energy_optm <- Energy_optm %>%
  mutate(
    hour = hour(date), # hour of the day (0-23)
    day = day(date), # day of the month (replaces the weekday name)
    week = week(date), # week of the year
    month = month(date) # month number
  )
# Calculate the average energy used per hour
hourly <- Energy_optm %>%
  group_by(hour) %>% # group the data by hour
  summarise(avg_energy = mean(total_energy_used, na.rm = TRUE)) # mean of total energy used in each hour
# Plot the average energy usage by hour
ggplot(hourly, aes(x = hour, y = avg_energy, fill = avg_energy)) +
  geom_bar(stat = "identity") +
  labs(
    title = "The Average Energy Usage by Hour",
    x = "Hours", # x-axis label
    y = "Energy consumption (Wh)" # y-axis label
  ) +
  scale_fill_gradient(low = "green", high = "red") +
  theme_minimal()
This plot shows how energy is used across the hours of the day. Taking human activity as an example, usage is low in the morning when people are out of the house, moderate in the afternoon when some people are home, and high in the evening when most people are home and using several appliances such as the AC, television, water heater, space heater, and small electronic devices.
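To name the peak and quiet hours instead of reading them off the bars, the hourly averages can be queried directly. This is a base R sketch on made-up readings. The column names mirror the analysis, but the numbers are for illustration only.

```r
# Made-up hourly readings for illustration; column names mirror the analysis above
toy <- data.frame(
  hour   = c(3, 3, 8, 8, 18, 18),
  energy = c(4, 6, 10, 14, 30, 50)
)

# Mean energy per hour, then the hours with the highest and lowest averages
hourly_avg <- aggregate(energy ~ hour, data = toy, FUN = mean)
peak_hour  <- hourly_avg$hour[which.max(hourly_avg$energy)]
quiet_hour <- hourly_avg$hour[which.min(hourly_avg$energy)]
c(peak = peak_hour, quiet = quiet_hour)
```

The same two which.max() and which.min() lookups would work on the `hourly` summary built above, using its avg_energy column.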
weekly <- Energy_optm %>%
  group_by(week) %>% # group the data by week
  summarise(avg_energy = mean(total_energy_used, na.rm = TRUE))
ggplot(weekly, aes(x = week, y = avg_energy, fill = avg_energy)) +
  geom_bar(stat = "identity") +
  labs(
    title = "The Average Energy Usage by Week", # title for the graph
    x = "Week of the year", # x-axis label
    y = "Energy consumption (Wh)" # y-axis label
  ) +
  scale_fill_gradient(low = "green", high = "red") + # gradient highlights high (red) vs low (green) usage
  theme_minimal()
monthly <- Energy_optm %>%
  group_by(month) %>% # group the data by month
  summarise(avg_energy = mean(total_energy_used, na.rm = TRUE))
# Create a bar plot comparing energy usage between months
ggplot(monthly, aes(x = month, y = avg_energy, fill = avg_energy)) +
  geom_bar(stat = "identity") +
  labs(
    title = "The Average Energy Usage by Month",
    x = "Month", # x-axis label
    y = "Energy consumption (Wh)" # y-axis label
  ) +
  scale_fill_gradient(low = "green", high = "red") + # gradient highlights high (red) vs low (green) usage
  theme_minimal()
February had the highest energy consumption, likely because cold weather
led to increased use of heating systems. May had the lowest, likely
because temperatures are moderate and comfortable, so heating is rarely
needed and cooling is not yet in full use.
avg_energy_by_day <- Energy_optm %>%
  group_by(day_type) %>% # group the data by day type
  summarise(avg_usage = mean(appliances, na.rm = TRUE)) %>% # mean appliances usage for each group
  arrange(desc(avg_usage)) # order from highest to lowest average usage
# Create a bar plot comparing energy usage between day types
ggplot(avg_energy_by_day, aes(x = day_type, y = avg_usage, fill = avg_usage)) +
  geom_bar(stat = "identity") + # bar height represents the average usage
  labs(
    title = "Energy Consumption Patterns Between Weekdays and Weekends", # the title
    x = "Day type", # label for x-axis
    y = "Energy Usage (Appliances)" # label for y-axis
  ) +
  scale_fill_gradient(low = "green", high = "red") + # gradient highlights high (red) vs low (green) usage
  theme_minimal()
Saturdays are often days when all family members are home. More people
at home usually means more use of appliances such as lights, fans, air
conditioning, washing machines, ovens, televisions, heaters, and other
electronics.
# The effect of temperature on energy usage
# (t1 is one of the numbered sensor readings; the outdoor temperature is in t_out)
ggplot(Energy_optm, aes(t1, appliances)) +
  geom_point(alpha = 0.4) + # add each point
  geom_smooth(color = "red") + # add the temperature trend line
  labs(
    title = "Effect of Temperature on Energy Usage",
    x = "Temperature (°C)", # label for x-axis
    y = "Energy Usage (Wh)" # label for the y-axis
  )
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Points below the trend line show lower energy usage and points above it
show higher energy usage at a given temperature. The expected pattern is
that appliance use rises at both extremes: high temperatures increase
cooling (AC) and low temperatures increase heating. The plot above uses
t1, while the temperature measured outside is recorded in t_out.
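A pattern where usage rises at both ends of the temperature range is easier to confirm with averages per temperature band than by eye. This is a minimal sketch using cut() and aggregate(). The values are made up and shaped only to illustrate the technique, and the 10-degree bands are an arbitrary choice.

```r
# Made-up values for illustration; in the analysis these would be t_out and appliances
toy <- data.frame(
  t_out      = c(-3, 1, 4, 8, 12, 15, 19, 23),
  appliances = c(4.8, 4.6, 4.2, 4.0, 4.0, 4.1, 4.4, 4.7)
)

# Group outdoor temperature into 10-degree bands and average usage in each band
toy$band <- cut(toy$t_out, breaks = c(-10, 0, 10, 20, 30))
band_avg <- aggregate(appliances ~ band, data = toy, FUN = mean)
band_avg
```

If the real data follow the expected pattern, the coldest and warmest bands would show higher averages than the bands in between.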
# The effect of humidity on energy usage
ggplot(Energy_optm, aes(rh_1, appliances)) +
  geom_point(alpha = 0.4) + # add the scatter points
  geom_smooth(color = "blue") + # add the trend line to show the general relationship
  labs(
    title = "Effect of Humidity on Energy Usage", # the title for the plot
    x = "Humidity (%)", # label for x-axis
    y = "Energy Usage (Wh)" # label for the y-axis
  )
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Low humidity often occurs during cold weather. When it is cold and dry,
households may use heating systems more, which increases energy usage,
while high humidity can increase the use of cooling or dehumidifying equipment.
# The effect of wind speed on energy usage
ggplot(Energy_optm, aes(windspeed, appliances)) +
  geom_point(alpha = 0.4) + # add the scatter points
  geom_smooth(color = "blue") + # add the trend line to show the general relationship
  labs(
    title = "Effect of Wind Speed on Energy Usage", # the title for the plot
    x = "Wind speed (m/s)", # label for x-axis
    y = "Energy Usage (Wh)" # label for the y-axis
  )
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Wind speed contributes to household energy usage, but its effect appears
smaller than that of temperature and humidity. For example, in warm weather
wind can help cool the house naturally, reducing AC use.
top10_households <- Energy_optm %>%
  group_by(ID) %>% # group by household ID
  summarise(total_usage = sum(total_energy_used, na.rm = TRUE)) %>% # total energy used
  arrange(desc(total_usage)) %>% # sort from highest to lowest
  slice_head(n = 10) # keep the first ten rows
top10_households
## # A tibble: 10 × 2
## ID total_usage
## <int> <dbl>
## 1 10 75.4
## 2 11 66.4
## 3 12 56.1
## 4 2197 55.9
## 5 1313 55.8
## 6 2888 54.9
## 7 8933 54.5
## 8 8934 54.4
## 9 7 54.1
## 10 1837 46.3
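Each total in this table is below the largest value a single row could reach (the maximum lights value of 70 plus the maximum appliances value of about 6.99), so it may be that every ID labels one observation rather than one household. A quick way to check is sketched here on made-up IDs. In the analysis the input would be Energy_optm$ID.

```r
# Made-up IDs for illustration; in the analysis this would be Energy_optm$ID
ids <- c(2133, 19730, 3288, 7730, 3288)

has_repeats <- anyDuplicated(ids) > 0   # TRUE if any ID occurs more than once
rows_per_id <- table(ids)               # number of rows for each ID
has_repeats
max(rows_per_id)                        # largest number of rows sharing one ID
```

If the real column has no repeated IDs, the ranking compares individual readings, and the household-level interpretation would need a different grouping variable.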
# Plot the top 10 households
ggplot(top10_households, aes(x = reorder(ID, total_usage), y = total_usage, fill = total_usage)) +
  geom_bar(stat = "identity") + # draw bars with fill mapped to total_usage
  coord_flip() + # flip coordinates to make the bars horizontal
  labs(
    title = "Top 10 Households with Highest Energy Usage", # title and axis labels
    x = "Household ID",
    y = "Total Energy Usage (Wh)"
  ) +
  scale_fill_gradient(low = "orange", high = "red") + # colour gradient from orange (low) to red (high)
  theme_minimal(base_size = 14)
The top ten households may use the most energy because they need more
heating or cooling, use inefficient appliances, or do not schedule
their appliance use well. Larger household size and poor building
insulation can also increase energy use. In addition, weather
conditions such as temperature and wind, together with daily usage habits,
can further raise consumption. The dataset does not record these factors
directly, so they are possible explanations rather than measured causes.
After visualizing the data, the results show how energy is used over time (by hour, week, and month). They also suggest that temperature and humidity affect energy use, with higher consumption during extreme weather. These visualizations help identify wasteful energy use and provide insights that can guide households to reduce unnecessary consumption and improve energy efficiency.
My advice as a data analyst to the household is to turn off appliances such as the AC, lights, and other devices when not in use; to make the most of natural light during the day; to maintain moderate indoor temperature and humidity to reduce HVAC load; and to reduce usage during peak hours (evenings and weekends), when appliances consume the most energy.