Použijeme databázu spotreby elektriny (2017, 10-minútové intervaly) s meteorologickými premennými. Cieľ: vysvetliť spotrebu elektriny v zóne 1 pomocou počasia a časových efektov.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(lmtest)
library(sandwich)
library(car)
## Loading required package: carData
rm(list = ls())
Budeme modelovať PowerConsumption_Zone1 (spotreba v zóne 1) v závislosti od: - teploty, vlhkosti, rýchlosti vetra, - difúzneho slnečného žiarenia, - časových efektov (hodina dňa, deň v týždni).
Predpokladáme, že časové premenné zachytia pravidelný denný a týždenný cyklus spotreby elektriny a meteorologické premenné budú mať dodatočný vysvetľujúci vplyv.
udaje <- read.csv("powerconsumption.csv", sep = ",", header = TRUE)
udaje$Datetime <- as.POSIXct(udaje$Datetime, format = "%m/%d/%Y %H:%M")
str(udaje)
## 'data.frame': 52416 obs. of 9 variables:
## $ Datetime : POSIXct, format: "2017-01-01 00:00:00" "2017-01-01 00:10:00" ...
## $ Temperature : num 6.56 6.41 6.31 6.12 5.92 ...
## $ Humidity : num 73.8 74.5 74.5 75 75.7 76.9 77.7 78.2 78.1 77.3 ...
## $ WindSpeed : num 0.083 0.083 0.08 0.083 0.081 0.081 0.08 0.085 0.081 0.082 ...
## $ GeneralDiffuseFlows : num 0.051 0.07 0.062 0.091 0.048 0.059 0.048 0.055 0.066 0.062 ...
## $ DiffuseFlows : num 0.119 0.085 0.1 0.096 0.085 0.108 0.096 0.093 0.141 0.111 ...
## $ PowerConsumption_Zone1: num 34056 29815 29128 28229 27336 ...
## $ PowerConsumption_Zone2: num 16129 19375 19007 18361 17872 ...
## $ PowerConsumption_Zone3: num 20241 20131 19668 18899 18442 ...
summary(udaje)
## Datetime Temperature Humidity WindSpeed
## Min. :2017-01-01 00:00:00 Min. : 3.247 Min. :11.34 Min. :0.050
## 1st Qu.:2017-04-01 23:57:30 1st Qu.:14.410 1st Qu.:58.31 1st Qu.:0.078
## Median :2017-07-01 23:55:00 Median :18.780 Median :69.86 Median :0.086
## Mean :2017-07-01 23:55:00 Mean :18.810 Mean :68.26 Mean :1.959
## 3rd Qu.:2017-09-30 23:52:30 3rd Qu.:22.890 3rd Qu.:81.40 3rd Qu.:4.915
## Max. :2017-12-30 23:50:00 Max. :40.010 Max. :94.80 Max. :6.483
## GeneralDiffuseFlows DiffuseFlows PowerConsumption_Zone1
## Min. : 0.004 Min. : 0.011 Min. :13896
## 1st Qu.: 0.062 1st Qu.: 0.122 1st Qu.:26311
## Median : 5.035 Median : 4.456 Median :32266
## Mean : 182.697 Mean : 75.028 Mean :32345
## 3rd Qu.: 319.600 3rd Qu.:101.000 3rd Qu.:37309
## Max. :1163.000 Max. :936.000 Max. :52204
## PowerConsumption_Zone2 PowerConsumption_Zone3
## Min. : 8560 Min. : 5935
## 1st Qu.:16981 1st Qu.:13129
## Median :20823 Median :16415
## Mean :21043 Mean :17835
## 3rd Qu.:24714 3rd Qu.:21624
## Max. :37409 Max. :47598
udaje_model <- udaje[, c("Datetime",
"PowerConsumption_Zone1",
"Temperature", "Humidity", "WindSpeed",
"GeneralDiffuseFlows", "DiffuseFlows")]
udaje_model$Hour <- as.integer(format(udaje_model$Datetime, "%H"))
udaje_model$DayOfWeek <- factor(format(udaje_model$Datetime, "%a"),
levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
udaje_model <- subset(udaje_model,
Datetime >= as.POSIXct("2017-06-01 00:00:00") &
Datetime <= as.POSIXct("2017-06-30 23:50:00"))
colSums(is.na(udaje_model))
## Datetime PowerConsumption_Zone1 Temperature
## 0 0 0
## Humidity WindSpeed GeneralDiffuseFlows
## 0 0 0
## DiffuseFlows Hour DayOfWeek
## 0 0 0
vars <- c("PowerConsumption_Zone1", "Temperature", "Humidity", "WindSpeed",
"GeneralDiffuseFlows", "DiffuseFlows")
par(mfrow = c(2, 3))
for (v in vars) {
boxplot(udaje_model[[v]], main = v, col = "lightblue")
}
par(mfrow = c(1, 1))
model1 <- lm(PowerConsumption_Zone1 ~ Temperature + Humidity + WindSpeed +
GeneralDiffuseFlows + DiffuseFlows +
factor(Hour) + DayOfWeek,
data = udaje_model)
summary(model1)
##
## Call:
## lm(formula = PowerConsumption_Zone1 ~ Temperature + Humidity +
## WindSpeed + GeneralDiffuseFlows + DiffuseFlows + factor(Hour) +
## DayOfWeek, data = udaje_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13732.4 -1228.7 19.2 1470.4 7187.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.770e+04 6.895e+02 69.176 < 2e-16 ***
## Temperature -1.875e+02 2.804e+01 -6.688 2.56e-11 ***
## Humidity -5.302e+01 3.352e+00 -15.818 < 2e-16 ***
## WindSpeed -2.844e+02 1.919e+01 -14.819 < 2e-16 ***
## GeneralDiffuseFlows 1.980e+00 3.552e-01 5.574 2.64e-08 ***
## DiffuseFlows -3.643e+00 4.106e-01 -8.872 < 2e-16 ***
## factor(Hour)1 -1.324e+03 2.567e+02 -5.157 2.62e-07 ***
## factor(Hour)2 -8.817e+02 2.570e+02 -3.431 0.000607 ***
## factor(Hour)3 -6.175e+03 2.577e+02 -23.965 < 2e-16 ***
## factor(Hour)4 -1.199e+04 2.583e+02 -46.409 < 2e-16 ***
## factor(Hour)5 -1.655e+04 2.589e+02 -63.910 < 2e-16 ***
## factor(Hour)6 -1.698e+04 2.591e+02 -65.529 < 2e-16 ***
## factor(Hour)7 -1.590e+04 2.661e+02 -59.733 < 2e-16 ***
## factor(Hour)8 -1.381e+04 3.006e+02 -45.941 < 2e-16 ***
## factor(Hour)9 -1.144e+04 3.474e+02 -32.920 < 2e-16 ***
## factor(Hour)10 -9.447e+03 3.696e+02 -25.562 < 2e-16 ***
## factor(Hour)11 -7.655e+03 3.712e+02 -20.620 < 2e-16 ***
## factor(Hour)12 -6.561e+03 3.795e+02 -17.288 < 2e-16 ***
## factor(Hour)13 -5.888e+03 3.831e+02 -15.367 < 2e-16 ***
## factor(Hour)14 -5.350e+03 3.668e+02 -14.587 < 2e-16 ***
## factor(Hour)15 -5.289e+03 3.488e+02 -15.165 < 2e-16 ***
## factor(Hour)16 -5.390e+03 3.294e+02 -16.364 < 2e-16 ***
## factor(Hour)17 -4.018e+03 3.166e+02 -12.690 < 2e-16 ***
## factor(Hour)18 -1.297e+03 3.077e+02 -4.213 2.57e-05 ***
## factor(Hour)19 3.778e+03 2.749e+02 13.741 < 2e-16 ***
## factor(Hour)20 5.556e+03 2.634e+02 21.096 < 2e-16 ***
## factor(Hour)21 5.157e+03 2.595e+02 19.876 < 2e-16 ***
## factor(Hour)22 4.926e+03 2.576e+02 19.124 < 2e-16 ***
## factor(Hour)23 2.986e+03 2.568e+02 11.629 < 2e-16 ***
## DayOfWeekTue -4.510e+02 1.446e+02 -3.119 0.001829 **
## DayOfWeekWed 1.452e+02 1.480e+02 0.981 0.326639
## DayOfWeekThu 1.139e+02 1.419e+02 0.803 0.421980
## DayOfWeekFri 3.682e+02 1.414e+02 2.603 0.009275 **
## DayOfWeekSat 1.329e+03 1.445e+02 9.197 < 2e-16 ***
## DayOfWeekSun -7.597e+02 1.440e+02 -5.276 1.38e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2435 on 4285 degrees of freedom
## Multiple R-squared: 0.8902, Adjusted R-squared: 0.8893
## F-statistic: 1021 on 34 and 4285 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(model1)
par(mfrow = c(1, 1))
res1 <- residuals(model1)
jarque.bera.test(res1)
##
## Jarque Bera Test
##
## data: res1
## X-squared = 1187.6, df = 2, p-value < 2.2e-16
outlierTest(model1)
## rstudent unadjusted p-value Bonferroni p
## 25459 -5.685671 1.3897e-08 6.0033e-05
## 25460 -5.421674 6.2293e-08 2.6910e-04
## 25603 -5.119478 3.1982e-07 1.3816e-03
## 25461 -5.079365 3.9480e-07 1.7056e-03
## 25604 -4.874049 1.1329e-06 4.8941e-03
## 25462 -4.709636 2.5601e-06 1.1060e-02
model2 <- lm(PowerConsumption_Zone1 ~ Temperature + Humidity + WindSpeed +
I(log1p(GeneralDiffuseFlows)) + I(log1p(DiffuseFlows)) +
factor(Hour) + DayOfWeek,
data = udaje_model)
summary(model2)
##
## Call:
## lm(formula = PowerConsumption_Zone1 ~ Temperature + Humidity +
## WindSpeed + I(log1p(GeneralDiffuseFlows)) + I(log1p(DiffuseFlows)) +
## factor(Hour) + DayOfWeek, data = udaje_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13673.6 -1229.7 23.5 1471.0 7188.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47442.32 679.80 69.789 < 2e-16 ***
## Temperature -175.44 27.57 -6.363 2.18e-10 ***
## Humidity -52.36 3.36 -15.585 < 2e-16 ***
## WindSpeed -282.95 19.20 -14.737 < 2e-16 ***
## I(log1p(GeneralDiffuseFlows)) 488.64 96.11 5.084 3.85e-07 ***
## I(log1p(DiffuseFlows)) -771.21 72.87 -10.583 < 2e-16 ***
## factor(Hour)1 -1318.24 256.72 -5.135 2.95e-07 ***
## factor(Hour)2 -872.53 256.95 -3.396 0.000691 ***
## factor(Hour)3 -6164.95 257.63 -23.929 < 2e-16 ***
## factor(Hour)4 -11968.97 258.20 -46.354 < 2e-16 ***
## factor(Hour)5 -16515.61 259.14 -63.732 < 2e-16 ***
## factor(Hour)6 -16439.43 337.58 -48.699 < 2e-16 ***
## factor(Hour)7 -14934.54 453.37 -32.942 < 2e-16 ***
## factor(Hour)8 -12797.71 520.90 -24.568 < 2e-16 ***
## factor(Hour)9 -10443.56 556.63 -18.762 < 2e-16 ***
## factor(Hour)10 -8394.69 570.70 -14.710 < 2e-16 ***
## factor(Hour)11 -6356.62 575.51 -11.045 < 2e-16 ***
## factor(Hour)12 -5197.98 577.35 -9.003 < 2e-16 ***
## factor(Hour)13 -4516.20 580.21 -7.784 8.77e-15 ***
## factor(Hour)14 -4052.55 571.55 -7.090 1.56e-12 ***
## factor(Hour)15 -4106.16 562.12 -7.305 3.29e-13 ***
## factor(Hour)16 -4350.30 550.03 -7.909 3.27e-15 ***
## factor(Hour)17 -3012.67 534.24 -5.639 1.82e-08 ***
## factor(Hour)18 -293.25 516.47 -0.568 0.570197
## factor(Hour)19 4786.46 451.68 10.597 < 2e-16 ***
## factor(Hour)20 5986.12 307.41 19.473 < 2e-16 ***
## factor(Hour)21 5125.05 259.31 19.764 < 2e-16 ***
## factor(Hour)22 4904.81 257.53 19.045 < 2e-16 ***
## factor(Hour)23 2979.92 256.78 11.605 < 2e-16 ***
## DayOfWeekTue -430.79 144.72 -2.977 0.002930 **
## DayOfWeekWed 196.60 148.22 1.326 0.184779
## DayOfWeekThu 125.31 141.51 0.886 0.375922
## DayOfWeekFri 396.80 141.02 2.814 0.004920 **
## DayOfWeekSat 1331.95 144.37 9.226 < 2e-16 ***
## DayOfWeekSun -771.53 143.96 -5.359 8.79e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2435 on 4285 degrees of freedom
## Multiple R-squared: 0.8902, Adjusted R-squared: 0.8893
## F-statistic: 1021 on 34 and 4285 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(model2)
par(mfrow = c(1, 1))
res2 <- residuals(model2)
jarque.bera.test(res2)
##
## Jarque Bera Test
##
## data: res2
## X-squared = 1211.7, df = 2, p-value < 2.2e-16
outlierTest(model2)
## rstudent unadjusted p-value Bonferroni p
## 25459 -5.661457 1.5991e-08 0.00006908
## 25460 -5.392039 7.3413e-08 0.00031715
## 25603 -5.105855 3.4360e-07 0.00148430
## 25461 -5.047628 4.6589e-07 0.00201260
## 25604 -4.861572 1.2063e-06 0.00521110
## 25462 -4.679678 2.9619e-06 0.01279600
coeftest(model2, vcov. = vcovHC(model2, type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47442.3217 598.7108 79.2408 < 2.2e-16 ***
## Temperature -175.4384 24.1693 -7.2587 4.617e-13 ***
## Humidity -52.3578 2.9974 -17.4679 < 2.2e-16 ***
## WindSpeed -282.9507 22.4675 -12.5938 < 2.2e-16 ***
## I(log1p(GeneralDiffuseFlows)) 488.6411 82.5838 5.9169 3.536e-09 ***
## I(log1p(DiffuseFlows)) -771.2077 66.4586 -11.6043 < 2.2e-16 ***
## factor(Hour)1 -1318.2374 326.1968 -4.0412 5.410e-05 ***
## factor(Hour)2 -872.5329 363.4400 -2.4008 0.016403 *
## factor(Hour)3 -6164.9508 330.7945 -18.6368 < 2.2e-16 ***
## factor(Hour)4 -11968.9716 273.4080 -43.7770 < 2.2e-16 ***
## factor(Hour)5 -16515.6105 239.4738 -68.9663 < 2.2e-16 ***
## factor(Hour)6 -16439.4311 285.7244 -57.5360 < 2.2e-16 ***
## factor(Hour)7 -14934.5411 372.4581 -40.0972 < 2.2e-16 ***
## factor(Hour)8 -12797.7136 429.3244 -29.8090 < 2.2e-16 ***
## factor(Hour)9 -10443.5591 453.2561 -23.0412 < 2.2e-16 ***
## factor(Hour)10 -8394.6861 464.2972 -18.0804 < 2.2e-16 ***
## factor(Hour)11 -6356.6240 467.1706 -13.6066 < 2.2e-16 ***
## factor(Hour)12 -5197.9775 470.7731 -11.0414 < 2.2e-16 ***
## factor(Hour)13 -4516.2052 484.7641 -9.3163 < 2.2e-16 ***
## factor(Hour)14 -4052.5488 479.3877 -8.4536 < 2.2e-16 ***
## factor(Hour)15 -4106.1628 470.6623 -8.7242 < 2.2e-16 ***
## factor(Hour)16 -4350.3018 458.6032 -9.4860 < 2.2e-16 ***
## factor(Hour)17 -3012.6718 448.9005 -6.7112 2.182e-11 ***
## factor(Hour)18 -293.2530 457.3322 -0.6412 0.521411
## factor(Hour)19 4786.4585 411.4654 11.6327 < 2.2e-16 ***
## factor(Hour)20 5986.1194 284.1544 21.0664 < 2.2e-16 ***
## factor(Hour)21 5125.0494 263.6862 19.4362 < 2.2e-16 ***
## factor(Hour)22 4904.8127 262.5919 18.6785 < 2.2e-16 ***
## factor(Hour)23 2979.9242 285.9998 10.4193 < 2.2e-16 ***
## DayOfWeekTue -430.7847 170.4356 -2.5276 0.011522 *
## DayOfWeekWed 196.6000 158.8200 1.2379 0.215829
## DayOfWeekThu 125.3096 147.1084 0.8518 0.394363
## DayOfWeekFri 396.7987 150.0967 2.6436 0.008232 **
## DayOfWeekSat 1331.9508 141.7264 9.3980 < 2.2e-16 ***
## DayOfWeekSun -771.5285 166.1685 -4.6430 3.536e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Spotreba elektriny je výrazne ovplyvnená časovými efektmi, pričom meteorologické premenné poskytujú dodatočnú vysvetľujúcu informáciu. Pri veľkom počte pozorovaní je potrebné interpretovať testy normality rezíduí spolu s diagnostickými grafmi.