Electricity Data for 2018

Nicholas Schettini

July 10, 2018

Work in Progress

Data Exploration

The data consists of variables related to energy usage from Con Edison's new analytics website. The primary variables in this dataset are date, usage, and units.

# Store the summary under its own name so it does not mask describe()
usage_stats <- describe(df1$USAGE)

kable(usage_stats, "html", escape = FALSE) %>%
  kable_styling("striped", full_width = TRUE) %>%
  column_spec(1, bold = TRUE) %>%
  scroll_box(width = "100%", height = "700px")
   vars     n      mean        sd median   trimmed      mad  min  max range    skew kurtosis       se
X1    1 10467 0.1189357 0.1110006   0.08 0.0966627 0.059304 0.02 0.82   0.8 2.11646 5.002591 0.001085
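Several columns in this summary are robust or derived statistics. The sketch below shows how three of them can be computed in base R. The values in x are hypothetical, not taken from the Con Edison data, and the 10% trim is an assumption about the default used by describe().

```r
# Hypothetical usage readings, for illustration only
x <- c(0.02, 0.05, 0.08, 0.08, 0.10, 0.12, 0.15, 0.20, 0.40, 0.82)

# Trimmed mean: drop the lowest and highest 10% of values, then average
trimmed <- mean(x, trim = 0.1)

# Median absolute deviation, scaled by 1.4826 (the base R default)
robust_spread <- mad(x)

# Standard error of the mean
std_error <- sd(x) / sqrt(length(x))

c(trimmed = trimmed, mad = robust_spread, se = std_error)
```

A mean well above the median, as in the table above, is consistent with the positive skew reported there.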

The dates in this dataset have to be binned, for example by week and by month, to make some analyses easier.

# Each date is replaced by the first day of its month or week (weeks start on Sunday)
df1$Month <- as.Date(cut(df1$DATE, breaks = "month"))
df1$week <- as.Date(cut(df1$DATE, breaks = "week", start.on.monday = FALSE))
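To see what the binning does, here is a minimal sketch on a few hypothetical dates (not taken from the dataset). It assumes the input is already of class Date, as cut() needs for the "month" and "week" breaks.

```r
# Hypothetical sample dates, for illustration only
dates <- as.Date(c("2018-01-03", "2018-01-17", "2018-02-05", "2018-02-06"))

# Each date maps to the first day of its month
month_bin <- as.Date(cut(dates, breaks = "month"))

# Each date maps to the Sunday that starts its week
week_bin <- as.Date(cut(dates, breaks = "week", start.on.monday = FALSE))

data.frame(date = dates, month = month_bin, week = week_bin)
```

Dates that fall in the same month or week share a bin value, which is what lets the plots group and sum them.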

Binning the date data allows for easy ggplot visualizations, as shown below.

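ggplot(data = df1,
  aes(Month, USAGE)) + # refer to columns by name inside aes(), not df1$...
  stat_summary(fun = sum, # adds up all observations for the month; fun replaces fun.y, deprecated in ggplot2 3.3.0
    geom = "bar", fill = "lightblue") +
  theme_dark()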

Energy usage increases steadily from month to month. The July total is incomplete because the data covers only the beginning of that month.
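The bar heights are sums of USAGE within each month bin, so the same totals can be checked numerically. The sketch below uses a small hypothetical data frame named toy with the same DATE and USAGE column names; the values are made up for illustration.

```r
# Hypothetical readings, for illustration only
toy <- data.frame(
  DATE = as.Date(c("2018-05-01", "2018-05-15", "2018-06-02",
                   "2018-06-20", "2018-07-01")),
  USAGE = c(0.10, 0.20, 0.25, 0.30, 0.05)
)

# Bin by month, then sum usage within each bin
toy$Month <- as.Date(cut(toy$DATE, breaks = "month"))
monthly <- aggregate(USAGE ~ Month, data = toy, FUN = sum)

monthly
```

In this toy example the July total is small only because it contains a single reading, which mirrors the partial month in the real data.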

ggplot(data = df1,
  aes(week, USAGE)) + # refer to columns by name inside aes(), not df1$...
  stat_summary(fun = sum, # adds up all observations for the week; fun replaces fun.y, deprecated in ggplot2 3.3.0
    geom = "bar", fill = "lightblue") +
  theme_dark()

# One box per month, showing the distribution of individual usage readings
ggplot(df1) +
  geom_boxplot(aes(x = Month, y = USAGE, group = Month, fill = as.factor(Month)))