Project Title: Meter Readings from a Railway Station (part 2)

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

setwd("C:/Users/user/Desktop/tarsha systems summer internship/datasets")
met1.df <- read.csv("1.csv")
# Keep the 16 meter columns analysed below (W.Total through No.Of.Intrruptions)
sub.df <- met1.df[, c(1,5,9,13,20,24,25,26,27,28,29,36,37,38,39,40)]

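Forecasting the next 3 values

We use the forecast() function from the forecast package to predict the next 3 values of each variable. First, we convert the numeric columns of the dataset into an R time series object with ts(). Because sub.df has several columns, the result is a multivariate time series with one series per variable.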

library(forecast)
myts <- ts(sub.df)
forecast(myts, 3)
## W.Total
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       4365.869 1770.950 6960.788 397.2834 8334.455
## 10482       4365.869 1698.642 7033.097 286.6965 8445.042
## 10483       4365.869 1628.242 7103.496 179.0296 8552.709
## 
## VAr.Total
##       Point Forecast     Lo 80    Hi 80     Lo 95     Hi 95
## 10481      -1141.739 -2472.715 189.2370 -3177.291  893.8131
## 10482      -1141.739 -2517.629 234.1510 -3245.981  962.5032
## 10483      -1141.739 -2561.123 277.6445 -3312.499 1029.0208
## 
## P.F
##       Point Forecast      Lo 80     Hi 80      Lo 95     Hi 95
## 10481      0.1060602 -0.2430591 0.4551795 -0.4278717 0.6399921
## 10482      0.1060602 -0.2575750 0.4696955 -0.4500719 0.6621923
## 10483      0.1060602 -0.2715333 0.4836538 -0.4714193 0.6835397
## 
## VA.Total
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481         5365.4 3232.608 7498.193 2103.576 8627.225
## 10482         5365.4 3195.945 7534.856 2047.505 8683.296
## 10483         5365.4 3159.892 7570.909 1992.366 8738.435
## 
## Amps.Ave.
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       7.936767 4.651888 11.22165 2.912977 12.96056
## 10482       7.936767 4.601398 11.27214 2.835759 13.03778
## 10483       7.936767 4.551660 11.32187 2.759693 13.11384
## 
## Frequency
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       49.98045 49.34948 50.61141 49.01547 50.94543
## 10482       49.98045 49.34948 50.61141 49.01546 50.94543
## 10483       49.98045 49.34948 50.61141 49.01546 50.94543
## 
## Wh.Rec.
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       11700000 11691561 11708439 11687094 11712906
## 10482       11700000 11684735 11715265 11676654 11723346
## 10483       11700000 11676845 11723155 11664588 11735412
## 
## VAh.Rec.
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       15403297 15383782 15422812 15373451 15433143
## 10482       15404767 15383400 15426133 15372089 15437444
## 10483       15406236 15383165 15429307 15370952 15441520
## 
## VArh.I.Rec.
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       528997.5 528937.9 529057.2 528906.3 529088.7
## 10482       529312.6 529202.1 529423.2 529143.5 529481.7
## 10483       529621.4 529453.9 529789.0 529365.2 529877.7
## 
## VArh.C.Rec.
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       -8630275 -8639493 -8621058 -8644372 -8616178
## 10482       -8630882 -8640153 -8621612 -8645060 -8616704
## 10483       -8631489 -8640814 -8622164 -8645750 -8617228
## 
## Neutral.Current
##       Point Forecast      Lo 80    Hi 80      Lo 95    Hi 95
## 10481      0.3230546 -0.4748936 1.121003 -0.8973019 1.543411
## 10482      0.3230546 -0.5419039 1.188013 -0.9997852 1.645894
## 10483      0.3230546 -0.6040834 1.250193 -1.0948806 1.740990
## 
## Rising.Demand
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       4783.708 4081.888 5485.527 3710.367 5857.048
## 10482       4793.128 3887.345 5698.912 3407.852 6178.405
## 10483       4800.673 3722.367 5878.979 3151.547 6449.800
## 
## Maximum.Demand
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       8790.703 8686.287 8895.119 8631.012 8950.394
## 10482       8790.703 8685.485 8895.921 8629.786 8951.620
## 10483       8790.703 8684.689 8896.717 8628.569 8952.837
## 
## RPM
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       1499.413 1480.484 1518.342 1470.464 1528.363
## 10482       1499.413 1480.484 1518.342 1470.464 1528.363
## 10483       1499.413 1480.484 1518.342 1470.464 1528.363
## 
## Load.Hours.Received
##       Point Forecast      Lo 80      Hi 80      Lo 95      Hi 95
## 10481     3049055482 2407024932 3691086032 2067154488 4030956475
## 10482     3049055482 2148095545 3950015418 1671156165 4426954799
## 10483     3049055482 1948485453 4149625510 1365878873 4732232090
## 
## No.Of.Intrruptions
##       Point Forecast   Lo 80   Hi 80   Lo 95   Hi 95
## 10481        5112300 5098993 5125606 5091949 5132650
## 10482        5112729 5095059 5130399 5085705 5139753
## 10483        5113158 5092002 5134314 5080803 5145513
plot(forecast(myts, 3))
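When forecast() is given a multivariate ts object, it fits a separate model to each column, which is why the output above is split by variable. For most variables the point forecast stays constant over the 3 steps, which is what a model without a trend or seasonal term (such as simple exponential smoothing) produces. The sketch below illustrates the per-column idea in base R only. It assumes simple exponential smoothing via HoltWinters() as a stand-in for the ets() models chosen by forecast(), and it uses a small simulated data frame whose columns `a` and `b` are hypothetical.

```r
# Simulated stand-in for sub.df: two hypothetical numeric columns
set.seed(1)
demo.df <- data.frame(a = 50 + rnorm(200, sd = 0.3),
                      b = 8 + rnorm(200, sd = 2))

# Convert to a multivariate time series (one series per column)
demo.ts <- ts(demo.df)

# Fit simple exponential smoothing (no trend, no seasonality) to each column
# separately and forecast 3 steps ahead
h <- 3
point.fc <- sapply(colnames(demo.ts), function(col) {
  fit <- HoltWinters(demo.ts[, col], beta = FALSE, gamma = FALSE)
  as.numeric(predict(fit, n.ahead = h))
})

print(point.fc)  # rows = forecast steps, columns = variables
```

Each column of `point.fc` holds one value repeated 3 times, matching the flat forecasts seen above for variables such as W.Total and Amps.Ave.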

Exponential smoothing models for forecasting:

The HoltWinters() function (from R's built-in stats package) and the ets() function (from the forecast package) can both be used to fit exponential smoothing models.

Let us apply them to single variables to predict total active power (W.Total) and average current (Amps.Ave.).

  1. Using the ets() function for automated forecasting:
myts1 <- ts(sub.df$W.Total)
fit <- ets(myts1)
accuracy(fit)
##                      ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set 0.04446492 2024.826 1297.163 -Inf  Inf 0.8568332 -0.02051748
forecast(fit, 5)
##       Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 10481       4365.869 1770.950 6960.788 397.28336 8334.455
## 10482       4365.869 1698.642 7033.097 286.69653 8445.042
## 10483       4365.869 1628.242 7103.496 179.02961 8552.709
## 10484       4365.869 1559.608 7172.130  74.06283 8657.675
## 10485       4365.869 1492.613 7239.126 -28.39729 8760.136
plot(forecast(fit, 5))
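The training-set accuracy of the ets() model shows MPE = -Inf and MAPE = Inf. Both measures divide each error by the actual value, so they become infinite as soon as the series contains a zero reading that the model does not fit exactly, which is most likely the case for W.Total here. ME, RMSE, MAE and MASE do not divide by the actual values and stay finite. A small illustration with hypothetical numbers, assuming the standard definitions MPE = mean(100 e / y) and MAPE = mean(|100 e / y|) with error e = y - fitted value:

```r
# Percentage-error measures, with error = actual - fitted
mpe  <- function(actual, fitted) mean(100 * (actual - fitted) / actual)
mape <- function(actual, fitted) mean(abs(100 * (actual - fitted) / actual))

actual <- c(0, 4000, 4500)   # hypothetical readings, one of them zero
fitted <- c(50, 3900, 4600)  # hypothetical fitted values

mpe(actual, fitted)    # -50 / 0 gives -Inf, so the mean is -Inf
mape(actual, fitted)   # |-Inf| is Inf, so the mean is Inf

# Leaving out the zero reading gives finite values again
keep <- actual != 0
mape(actual[keep], fitted[keep])
```

For series that can legitimately be zero, scale-based measures such as RMSE, MAE or MASE are the more useful ones to compare.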

  2. Using the HoltWinters() function to fit an exponential smoothing model:
par(mfrow=c(1,1))
myts2 <- ts(sub.df$Amps.Ave.)
fit1 <- HoltWinters(myts2, beta=FALSE, gamma=FALSE)
forecast(fit1, 3)
##       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 10481       7.936228 4.650865 11.22159 2.911698 12.96076
## 10482       7.936228 4.600239 11.27222 2.834273 13.03818
## 10483       7.936228 4.550370 11.32209 2.758006 13.11445
plot(forecast(fit1, 3))
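With beta = FALSE and gamma = FALSE, HoltWinters() fits simple exponential smoothing: there is no trend or seasonal component, only a level updated as level_t = alpha * y_t + (1 - alpha) * level_(t-1), and every future point forecast equals the final level. This is why the three HoltWinters() forecasts for Amps.Ave. are identical, and close to the earlier ets() forecasts for the same variable (7.936228 versus 7.936767). The sketch below reproduces the recursion by hand on a simulated series. The series and the fixed alpha = 0.3 are assumptions for illustration (HoltWinters() estimates alpha when it is not supplied), and the starting level is taken as the first observation, which is what HoltWinters() uses in this no-trend case.

```r
set.seed(42)
y <- ts(8 + rnorm(100, sd = 2))  # hypothetical stand-in for Amps.Ave.

alpha <- 0.3
hw <- HoltWinters(y, alpha = alpha, beta = FALSE, gamma = FALSE)

# Manual simple exponential smoothing: start the level at the first
# observation, then update it with each later observation
level <- y[1]
for (t in 2:length(y)) {
  level <- alpha * y[t] + (1 - alpha) * level
}

all.equal(level, coef(hw)[["a"]])        # final level matches HoltWinters()
fc <- as.numeric(predict(hw, n.ahead = 3))
fc                                       # three identical point forecasts
```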

Multiple linear regression (1): total active power (W.Total)

fit2 <- lm(sub.df$W.Total ~ ., data=sub.df) 
summary(fit2)
## 
## Call:
## lm(formula = sub.df$W.Total ~ ., data = sub.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9484.5   -88.2    -0.2    82.5  1004.7 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -5.899e+01  2.403e+02  -0.245  0.80612    
## VAr.Total            5.838e-01  3.964e-03 147.274  < 2e-16 ***
## P.F                  1.613e+03  1.407e+01 114.633  < 2e-16 ***
## VA.Total             1.052e+00  1.637e-02  64.271  < 2e-16 ***
## Amps.Ave.           -5.228e+01  1.098e+01  -4.762 1.95e-06 ***
## Frequency            3.771e+04  2.247e+05   0.168  0.86674    
## Wh.Rec.              1.379e-04  8.971e-05   1.537  0.12439    
## VAh.Rec.             3.901e-05  1.197e-04   0.326  0.74444    
## VArh.I.Rec.         -5.339e-04  1.333e-04  -4.005 6.25e-05 ***
## VArh.C.Rec.          2.776e-04  1.069e-04   2.597  0.00942 ** 
## Neutral.Current      3.251e+01  2.813e+00  11.557  < 2e-16 ***
## Rising.Demand        1.568e-01  2.176e-03  72.047  < 2e-16 ***
## Maximum.Demand       2.978e-02  1.016e-02   2.931  0.00338 ** 
## RPM                 -1.258e+03  7.491e+03  -0.168  0.86665    
## Load.Hours.Received -5.201e-09  1.815e-09  -2.866  0.00417 ** 
## No.Of.Intrruptions   5.486e-05  1.044e-05   5.255 1.51e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 230 on 10464 degrees of freedom
## Multiple R-squared:  0.9899, Adjusted R-squared:  0.9899 
## F-statistic: 6.843e+04 on 15 and 10464 DF,  p-value: < 2.2e-16

Variables marked with three stars (***) in the last column have p-values below 0.001, i.e. their coefficients are statistically significantly different from zero at the 0.1% level. By this criterion, the variables affecting total active power are VAr.Total (total reactive power), P.F (power factor), VA.Total (total apparent power), Amps.Ave. (average current), VArh.I.Rec. (reactive inductive energy received), Neutral.Current, Rising.Demand and No.Of.Intrruptions. The high multiple R-squared (0.9899) and the low residual standard error (230) suggest that this is a good model for determining total active power.
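As the legend under each table shows, *** corresponds to p < 0.001, ** to p < 0.01 and * to p < 0.05. Instead of reading the stars off the printed table, the same selection can be made from the coefficient matrix returned by summary(). A sketch on simulated data, where the columns x1, x2, x3 and y are hypothetical and only x1 and x2 actually enter y:

```r
set.seed(7)
n <- 500
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
demo$y <- 3 * demo$x1 - 2 * demo$x2 + rnorm(n)   # x3 has no effect on y

fit <- lm(y ~ ., data = demo)        # '.' means every other column of demo
coefs <- summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)

# Predictors with p < 0.001 (the '***' rows), leaving out the intercept
p <- coefs[-1, "Pr(>|t|)"]
sig <- names(p)[p < 0.001]
sig
```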

Multiple linear regression (2): total reactive power (VAr.Total)

fit2 <- lm(sub.df$VAr.Total ~ ., data=sub.df) 
summary(fit2)
## 
## Call:
## lm(formula = sub.df$VAr.Total ~ ., data = sub.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1550.0  -102.8   -21.8    58.8 13487.6 
## 
## Coefficients:
##                       Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)          4.430e+02  3.381e+02    1.310 0.190109    
## W.Total              1.156e+00  7.846e-03  147.274  < 2e-16 ***
## P.F                 -2.916e+03  8.479e+00 -343.941  < 2e-16 ***
## VA.Total            -1.854e+00  2.028e-02  -91.439  < 2e-16 ***
## Amps.Ave.            5.395e+02  1.454e+01   37.117  < 2e-16 ***
## Frequency           -2.677e+05  3.162e+05   -0.847 0.397249    
## Wh.Rec.             -5.652e-04  1.261e-04   -4.481 7.49e-06 ***
## VAh.Rec.             1.143e-04  1.683e-04    0.679 0.497020    
## VArh.I.Rec.          1.122e-03  1.874e-04    5.989 2.18e-09 ***
## VArh.C.Rec.         -5.795e-04  1.504e-04   -3.854 0.000117 ***
## Neutral.Current     -6.855e+01  3.927e+00  -17.458  < 2e-16 ***
## Rising.Demand       -8.349e-02  3.655e-03  -22.844  < 2e-16 ***
## Maximum.Demand      -1.018e-01  1.427e-02   -7.135 1.04e-12 ***
## RPM                  8.923e+03  1.054e+04    0.847 0.397231    
## Load.Hours.Received  6.630e-09  2.553e-09    2.596 0.009431 ** 
## No.Of.Intrruptions  -6.139e-05  1.469e-05   -4.178 2.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 323.6 on 10464 degrees of freedom
## Multiple R-squared:  0.9666, Adjusted R-squared:  0.9666 
## F-statistic: 2.021e+04 on 15 and 10464 DF,  p-value: < 2.2e-16

The affecting variables (***) in this case are W.Total (total active power), P.F (power factor), VA.Total (total apparent power), Amps.Ave. (average current), Wh.Rec. (active energy received), VArh.I.Rec. (reactive inductive energy received), VArh.C.Rec. (reactive capacitive energy received), Maximum.Demand, Neutral.Current, Rising.Demand and No.Of.Intrruptions. The multiple R-squared (0.9666) is the lowest of the four models, and the residuals include some very large values (maximum 13487.6), so this model could be used to determine total reactive power, but it is not recommended.
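Each regression in this section follows the same pattern, response ~ . on one data frame. A sketch of fitting that pattern for several responses in a loop and collecting the R-squared and the residual standard error side by side; the simulated columns v1, v2 and v3 are hypothetical, and in practice the column names of sub.df would take their place:

```r
set.seed(3)
n <- 300
demo <- data.frame(v1 = rnorm(n, 100, 10))
demo$v2 <- 0.5 * demo$v1 + rnorm(n, sd = 2)
demo$v3 <- demo$v1 - demo$v2 + rnorm(n, sd = 5)

responses <- c("v1", "v2", "v3")
fit.table <- do.call(rbind, lapply(responses, function(v) {
  # Build "v ~ ." so that '.' expands to all other columns of demo
  s <- summary(lm(as.formula(paste(v, "~ .")), data = demo))
  data.frame(response = v, r.squared = s$r.squared, rse = s$sigma,
             stringsAsFactors = FALSE)
}))
fit.table
```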

Multiple linear regression (3): total apparent power (VA.Total)

fit3 <- lm(sub.df$VA.Total ~ ., data=sub.df) 
summary(fit3)
## 
## Call:
## lm(formula = sub.df$VA.Total ~ ., data = sub.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -903.7  -51.6    0.1   51.3 3837.0 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.035e+02  1.215e+02  -0.851 0.394569    
## W.Total              2.690e-01  4.185e-03  64.271  < 2e-16 ***
## VAr.Total           -2.395e-01  2.619e-03 -91.439  < 2e-16 ***
## P.F                 -6.740e+02  8.416e+00 -80.081  < 2e-16 ***
## Amps.Ave.            4.918e+02  2.788e+00 176.391  < 2e-16 ***
## Frequency            2.752e+04  1.136e+05   0.242 0.808607    
## Wh.Rec.             -1.724e-04  4.533e-05  -3.803 0.000144 ***
## VAh.Rec.            -3.271e-05  6.050e-05  -0.541 0.588803    
## VArh.I.Rec.          5.059e-04  6.728e-05   7.519 5.99e-14 ***
## VArh.C.Rec.         -2.847e-04  5.400e-05  -5.272 1.38e-07 ***
## Neutral.Current      3.626e+00  1.431e+00   2.534 0.011299 *  
## Rising.Demand       -3.615e-03  1.345e-03  -2.687 0.007213 ** 
## Maximum.Demand      -1.762e-02  5.137e-03  -3.429 0.000608 ***
## RPM                 -9.172e+02  3.788e+03  -0.242 0.808671    
## Load.Hours.Received  4.512e-10  9.179e-10   0.492 0.623013    
## No.Of.Intrruptions  -1.949e-05  5.282e-06  -3.691 0.000225 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 116.3 on 10464 degrees of freedom
## Multiple R-squared:  0.9957, Adjusted R-squared:  0.9957 
## F-statistic: 1.632e+05 on 15 and 10464 DF,  p-value: < 2.2e-16

The affecting variables (***) in this case are W.Total (total active power), VAr.Total (total reactive power), P.F (power factor), Amps.Ave. (average current), Wh.Rec. (active energy received), VArh.I.Rec. (reactive inductive energy received), VArh.C.Rec. (reactive capacitive energy received), Maximum.Demand and No.Of.Intrruptions. The high multiple R-squared (0.9957) and the low residual standard error (116.3) suggest that this is a good model for determining total apparent power.
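In each regression above, Frequency and RPM have very large standard errors and t values of equal size with opposite signs (here 0.242 and -0.242). This pattern points to collinearity: the earlier forecasts suggest that RPM is close to 30 times Frequency (1499.413 versus 30 x 49.98045 = 1499.41), so the model cannot separate their effects. A common check is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. A base-R sketch on simulated data, where the columns freq, rpm and load and the factor 30 linking rpm to freq are hypothetical stand-ins:

```r
set.seed(11)
n <- 1000
demo <- data.frame(freq = 50 + rnorm(n, sd = 0.05))
demo$rpm  <- round(30 * demo$freq, 2)   # hypothetical: speed tied to frequency
demo$load <- rnorm(n, 5000, 500)        # hypothetical unrelated predictor

# VIF: 1 / (1 - R^2) from regressing each predictor on the others
vif <- sapply(names(demo), function(v) {
  r2 <- summary(lm(as.formula(paste(v, "~ .")), data = demo))$r.squared
  1 / (1 - r2)
})
round(vif, 1)
```

Very large VIFs for freq and rpm, and a VIF near 1 for load, show which predictors carry redundant information; dropping one of a collinear pair usually stabilises the remaining coefficient without changing the fit much.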

Multiple linear regression (4): average current (Amps.Ave.)

fit4<- lm(sub.df$Amps.Ave. ~ ., data=sub.df) 
summary(fit4)
## 
## Call:
## lm(formula = sub.df$Amps.Ave. ~ ., data = sub.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1688 -0.1165  0.0047  0.1092  2.5197 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.100e-01  2.138e-01   0.982 0.325970    
## W.Total             -4.136e-05  8.686e-06  -4.762 1.95e-06 ***
## VAr.Total            2.156e-04  5.810e-06  37.117  < 2e-16 ***
## P.F                  6.427e-01  1.772e-02  36.264  < 2e-16 ***
## VA.Total             1.522e-03  8.627e-06 176.391  < 2e-16 ***
## Frequency           -6.242e+01  1.999e+02  -0.312 0.754847    
## Wh.Rec.              2.993e-07  7.975e-08   3.752 0.000176 ***
## VAh.Rec.             5.522e-08  1.064e-07   0.519 0.603851    
## VArh.I.Rec.         -8.151e-07  1.184e-07  -6.884 6.17e-12 ***
## VArh.C.Rec.          4.785e-07  9.501e-08   5.037 4.81e-07 ***
## Neutral.Current     -1.401e-02  2.515e-03  -5.572 2.58e-08 ***
## Rising.Demand       -7.229e-05  2.259e-06 -31.998  < 2e-16 ***
## Maximum.Demand       2.723e-05  9.037e-06   3.013 0.002592 ** 
## RPM                  2.080e+00  6.663e+00   0.312 0.754865    
## Load.Hours.Received  1.169e-12  1.615e-12   0.724 0.469060    
## No.Of.Intrruptions   1.541e-08  9.296e-09   1.658 0.097321 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2046 on 10464 degrees of freedom
## Multiple R-squared:  0.9943, Adjusted R-squared:  0.9943 
## F-statistic: 1.219e+05 on 15 and 10464 DF,  p-value: < 2.2e-16

The affecting variables (***) in this case are W.Total (total active power), VAr.Total (total reactive power), P.F (power factor), VA.Total (total apparent power), Wh.Rec. (active energy received), VArh.I.Rec. (reactive inductive energy received), VArh.C.Rec. (reactive capacitive energy received), Neutral.Current and Rising.Demand. The high multiple R-squared (0.9943) and the small residual standard error (0.2046 A, compared with a forecast level of about 7.9 A) suggest that this is a very good model for determining average current.
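The residual standard error is measured in the units of the response, so its raw size cannot be compared across the four models: 0.2046 for average current looks tiny next to 116.3 for apparent power mainly because current is on a much smaller scale. A scale-free alternative is the RSE divided by the standard deviation of the response, which is tied to the adjusted R-squared by (RSE / sd(y))^2 = 1 - adjusted R-squared. Using the reported values, this gives roughly 0.10, 0.18, 0.066 and 0.075 for models (1) to (4). A sketch verifying the identity on simulated data (the data frame and its columns are hypothetical):

```r
set.seed(5)
n <- 200
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- 2 + 1.5 * demo$x1 + rnorm(n, sd = 0.5)

s <- summary(lm(y ~ ., data = demo))

rel.rse <- s$sigma / sd(demo$y)              # RSE as a fraction of the spread of y
c(relative.rse = rel.rse,
  from.adj.r2  = sqrt(1 - s$adj.r.squared))  # the two values agree
```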