Contributors:
Peter Fernandes
Arushi Arora
Project 2 requires creating three tidy datasets, using either the untidy datasets from the Week 5 discussion or datasets of our own choosing. Each source dataset must be wide and untidy, so that we can read it from a CSV file and then transform and tidy it. We used three of the datasets from the discussion and transformed and tidied each one. We then analysed the data with plots built using the ggplot2 library.
Analysis:
Calculate the average of energy consumption and CO2 emissions.
knitr::opts_chunk$set(eval = TRUE, results = FALSE,
fig.show = "hide", message = FALSE)
if (!require("tidyr")) install.packages('tidyr')
## Loading required package: tidyr
if (!require("dplyr")) install.packages('dplyr')
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require("stringr")) install.packages('stringr')
## Loading required package: stringr
if (!require("DT")) install.packages('DT')
## Loading required package: DT
if (!require("ggplot2")) install.packages('ggplot2')
## Loading required package: ggplot2
csv <- read.csv("https://raw.githubusercontent.com/petferns/607-Project2/main/energy.csv", na.strings = c("", "NA"))
head(csv)
names(csv)[1] <- "Category"
names(csv)[2] <- "Building and industry consumption"
names(csv)[3] <- "Building and industry Emission"
names(csv)[4] <- "Transportation consumption"
names(csv)[5] <- "Transportation Emission"
head(csv)
csv <- csv[-c(1),]
csv
csv$Counties <- gsub(".*\\(+(.*)+\\)","\\1",csv$Category)
csv
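The extraction step above relies on a capture group that keeps the text found inside parentheses. A minimal sketch of that idea follows. It uses a simplified pattern and made-up labels, both of which are assumptions for illustration; the real values in energy.csv may look different.

```r
# Hypothetical labels; not taken from energy.csv.
labels <- c("Region A (Albany)", "Region B (Bronx)", "Statewide")

# Capture the text between "(" and ")" and replace the whole string with it.
# Strings without parentheses do not match and are returned unchanged.
counties <- sub(".*\\((.*)\\).*", "\\1", labels)
print(counties)
```

Because unmatched strings pass through untouched, it is worth checking the new column for rows that still hold the full original label.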
csv1<- csv %>% gather("Type", "Count", 2:5)
csv1
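The tidyr documentation marks `gather()` as superseded by `pivot_longer()`. A minimal sketch of the same wide-to-long reshape with the newer function is shown below, on a toy table whose values are made up; only the column names mirror the ones used in this report.

```r
library(tidyr)

# Toy wide table; the numbers are invented for illustration only.
wide <- data.frame(
  Category = c("A", "B"),
  `Transportation consumption` = c(10, 20),
  `Transportation Emission` = c(1, 2),
  check.names = FALSE
)

# Columns 2 and 3 are stacked into a Type/Count pair, one row per value.
long <- pivot_longer(wide, cols = 2:3, names_to = "Type", values_to = "Count")
print(long)
```

The two functions return the same set of rows but in a different order: `gather()` stacks one source column after another, while `pivot_longer()` keeps the rows of each original record together.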
csv1$Utils <- sub("^.+ ", "", csv1$Type)
csv1$Division <- str_extract(csv1$Type, "[^ ]+")
csv1$Division
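To make the two string operations above concrete, here is a small sketch that applies them to the four column names assigned earlier. It uses base R `regmatches()` in place of `str_extract()` so that it runs without stringr; that substitution, and the variable names `utils_col` and `division_col`, are choices made for this illustration.

```r
# The four column names assigned earlier in this report.
type <- c("Building and industry consumption",
          "Building and industry Emission",
          "Transportation consumption",
          "Transportation Emission")

# Drop everything up to the last space, keeping only the final word.
utils_col <- sub("^.+ ", "", type)

# Keep the first run of non-space characters (base-R stand-in for str_extract).
division_col <- regmatches(type, regexpr("[^ ]+", type))

print(data.frame(type, utils_col, division_col))
```

Two details follow from the patterns: the division for the first two names comes out as "Building" rather than "Building and industry", and the final words keep their original capitalisation ("consumption" versus "Emission").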
The plot below shows the average consumption and emissions for each segment.
csv1$Count <- as.numeric(csv1$Count)
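# Average Count within each Type; grouping by Category as well would leave
# one row per group, so the "average" would simply equal the raw Count.
overall_dt <- csv1 %>% group_by(Type) %>% summarise(overall_avg = mean(Count, na.rm = TRUE))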
overall_dt
ggplot(overall_dt, aes(x = Type, y = overall_avg, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  theme(axis.text.x = element_text(angle = 90))
The plot shows that average consumption is much higher than average emissions across NY.
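One point to watch when averaging a column converted with `as.numeric()`: any entry that cannot be parsed becomes `NA`, and `mean()` then returns `NA` unless `na.rm = TRUE` is set. The sketch below shows this on made-up values; whether energy.csv actually contains such entries is not something we verified here.

```r
library(dplyr)

# Made-up values; "n/a" stands in for any entry as.numeric() cannot parse.
toy <- data.frame(
  Utils = c("consumption", "consumption", "Emission", "Emission"),
  Count = c("100", "n/a", "10", "30")
)
toy$Count <- suppressWarnings(as.numeric(toy$Count))  # "n/a" becomes NA

# One row per group, ignoring missing values when averaging.
avg <- toy %>%
  group_by(Utils) %>%
  summarise(overall_avg = mean(Count, na.rm = TRUE))
print(avg)
```

Using `summarise()` rather than `mutate()` returns one row per group, which gives `geom_bar(stat = "identity")` exactly one bar height per category.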
# One average per Utils value, ignoring missing counts.
overall_dt <- csv1 %>% group_by(Utils) %>% summarise(overall_avg = mean(Count, na.rm = TRUE))
overall_dt
ggplot(overall_dt, aes(x = Utils, y = overall_avg, fill = Utils)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  theme(axis.text.x = element_text(angle = 90))