This project explores the “Individual household electric power consumption Data Set” from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets.
It is part of the Exploratory Data Analysis Course on Coursera.
The data set contains measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
The following descriptions of the 9 variables in the dataset are taken from the UCI web site:
Date: Date in format dd/mm/yyyy
Time: time in format hh:mm:ss
Global_active_power: household global minute-averaged active power (in kilowatt)
Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
Voltage: minute-averaged voltage (in volt)
Global_intensity: household global minute-averaged current intensity (in ampere)
Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
data <- read.table("household_power_consumption.txt", sep = ";", header = TRUE)
data$Date <- as.Date(data$Date,"%d/%m/%Y")
data <- subset(data, Date == "2007-02-01" | Date == "2007-02-02")
# Getting date and time
datetime <- strptime(paste(data$Date,as.character(data$Time)),"%Y-%m-%d %H:%M:%S")
# plot 1
hist(as.numeric(as.character(data$Global_active_power)),
main = "Global Active Power", xlab = "Global Active Power (kilowatts)", col = "red")
# plot 2
plot(datetime, as.numeric(as.character(data$Global_active_power)),
type = "l", xlab = "", ylab = "Global Active Power (kilowatts)")
# plot 3
plot(datetime, as.numeric(as.character(data$Sub_metering_1)),
type = "l", xlab = "", ylab = "Energy sub metering")
lines(datetime, as.numeric(as.character(data$Sub_metering_2)),
type = "l", col = "red")
lines(datetime, data$Sub_metering_3,
type = "l", col = "blue")
legend("topright", "(x,y)",
legend=c("Sub_metering_1", "Sub_metering_2", "Sub_metering_3"),
col=c("black", "red", "blue"), lty=1, cex = 0.6)
# plot 4
par(mfrow=c(2,2), mar = c(4,4,3,2)+0.1)
## similar to plot 2
plot(datetime, as.numeric(as.character(data$Global_active_power)),
type = "l", xlab = "", ylab = "Global Active Power")
## voltage plot
plot(datetime, as.numeric(as.character(data$Voltage)),
type = "l", xlab = "datetime", ylab = "Voltage")
## similar to plot 3
plot(datetime, as.numeric(as.character(data$Sub_metering_1)),
type = "l", xlab = "", ylab = "Energy sub metering")
lines(datetime, as.numeric(as.character(data$Sub_metering_2)),
type = "l", col = "red")
lines(datetime, data$Sub_metering_3,
type = "l", col = "blue")
legend("topright", "(x,y)",
legend=c("Sub_metering_1", "Sub_metering_2", "Sub_metering_3"),
col = c("black", "red", "blue"), lty = 1, cex = 0.5, bty = "n")
## global reactive power plot
plot(datetime, as.numeric(as.character(data$Global_reactive_power)),
type = "l", xlab = "datetime", ylab = "Global_reactive_power")