Work in Progress
Data Exploration
The data consists of variables related to energy usage from Con Edison's new analytics website. The primary variables in this dataset are date, usage, and units.
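# Summary statistics for USAGE; describe() appears to be psych::describe(), judging by the output columns
# Store the result under its own name so the describe() function is not masked
usage_summary <- describe(df1$USAGE)
kable(usage_summary, "html", escape = FALSE) %>%
  kable_styling("striped", full_width = TRUE) %>%
  column_spec(1, bold = TRUE) %>%
  scroll_box(width = "100%", height = "700px")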
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 10467 | 0.1189357 | 0.1110006 | 0.08 | 0.0966627 | 0.059304 | 0.02 | 0.82 | 0.8 | 2.11646 | 5.002591 | 0.001085 |
The dates in this dataset need to be binned, for example by week and by month, to make some of the analysis easier.
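# cut() labels each bin by its first day; as.Date() turns those labels back into dates
df1$Month <- as.Date(cut(df1$DATE, breaks = "month"))
df1$week <- as.Date(cut(df1$DATE, breaks = "week", start.on.monday = FALSE)) # weeks start on Sunday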
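To make the binning concrete, here is a minimal sketch on a handful of made-up dates. The `dates` vector is hypothetical and not taken from the Con Edison data; it only illustrates how `cut()` assigns each date to the first day of its month or week.

```r
# Toy dates for illustration only (not from the Con Edison data)
dates <- as.Date(c("2019-05-30", "2019-06-01", "2019-06-15", "2019-07-02"))

# cut() returns a factor whose labels are the first day of each bin;
# as.Date() converts those labels back into Date values
month_bin <- as.Date(cut(dates, breaks = "month"))
week_bin  <- as.Date(cut(dates, breaks = "week", start.on.monday = FALSE))

# Each row shows the original date next to its month bin and (Sunday-based) week bin
print(data.frame(dates, month_bin, week_bin))
```

Every date within the same calendar month gets the same `month_bin` value, which is what lets a plot or summary group on that column.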
Binning the date data allows for easy ggplot visualizations, as shown below.
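ggplot(data = df1,
       aes(Month, USAGE)) + # refer to columns by name inside aes(), not df1$...
  stat_summary(fun = sum, # total usage per month; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
               geom = "bar", fill = "lightblue") +
  theme_dark()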
Energy usage increases steadily from month to month. July is incomplete: when this data was pulled, it covered only the beginning of the month.
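Because a partial month is not directly comparable with complete ones, it can help to look at how many days each month covers and at usage per day. The sketch below uses a hypothetical stand-in data frame, `toy`, with the same `DATE`, `USAGE`, and `Month` column names as above; the values are invented and the approach is only one possible check, not part of the original analysis.

```r
# Hypothetical stand-in for df1: one reading per day, with July cut off early
toy <- data.frame(DATE = seq(as.Date("2019-05-01"), as.Date("2019-07-03"), by = "day"))
toy$USAGE <- 0.1
toy$Month <- as.Date(cut(toy$DATE, breaks = "month"))

# Distinct days observed and total usage in each month
days  <- tapply(toy$DATE, toy$Month, function(d) length(unique(d)))
total <- tapply(toy$USAGE, toy$Month, sum)

monthly <- data.frame(Month = names(days),
                      days  = as.vector(days),
                      usage = as.vector(total))

# Usage per observed day puts a partial month on the same scale as full months
monthly$usage_per_day <- monthly$usage / monthly$days
print(monthly)
```

In this toy data the last month covers only 3 days, so its total is small even though the daily usage is unchanged.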
ggplot(data = df1,
       aes(week, USAGE)) +
  stat_summary(fun = sum, # adds up all observations for the week
               geom = "bar", fill = "lightblue") +
  theme_dark()
ggplot(df1) +
  geom_boxplot(aes(x = Month, y = USAGE, group = Month, fill = as.factor(Month)))