Project Title: Time series analysis and forecasting on average power factor

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

setwd("C:/Users/user/Desktop/tarsha systems summer internship/project/data_3plants")
met1.df <- read.csv("plant1-time sort.csv")
str(met1.df)
## 'data.frame':    1048575 obs. of  10 variables:
##  $ dataid        : int  52733463 52728738 52761108 52758796 52719389 52763320 52765532 52751860 52754172 52756484 ...
##  $ paramrefid    : int  45831866 45831867 45831868 45831869 45831870 45831871 45831872 45831873 45831874 45831875 ...
##  $ timestamp     : Factor w/ 16760 levels "2017-10-27 18:15:00+00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ rawvalue      : num  0.00011 0 0.00173 0 49.91 ...
##  $ processedvalue: num  0.00011 0 0.00173 0 49.91 ...
##  $ meterid       : int  45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 ...
##  $ tagid         : int  3006 3002 3306 3302 2252 2210 2213 2004 2005 2006 ...
##  $ interval      : int  15 15 15 15 15 15 15 15 15 15 ...
##  $ profileid     : int  1008 1008 1008 1008 1008 1008 1008 1008 1008 1008 ...
##  $ tagname       : Factor w/ 30 levels "Active Energy Export Time Integral 5",..: 1 2 3 4 5 6 7 16 17 18 ...
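# Keep only the timestamp and raw value of the "Avg Power Factor" readings
sub1.df <- subset(met1.df, tagname == "Avg Power Factor", select = c(timestamp, rawvalue))
View(sub1.df)
# Keep the strictly positive readings as a numeric vector (compare with the number 0, not the string "0")
sub.df <- subset(sub1.df$rawvalue, sub1.df$rawvalue > 0)
View(sub.df)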

The average power factor is 0.41 (the maximum possible value is 1).

TIME SERIES ANALYSIS

powerfactor <- as.ts(sub.df)
class(powerfactor)
## [1] "ts"
start(powerfactor)
## [1] 1 1
end(powerfactor)
## [1] 22923     1
plot(powerfactor)
abline(reg=lm(powerfactor~time(powerfactor)))

plot(log(powerfactor))

plot(diff(log(powerfactor)))

In the plot of diff(log(powerfactor)), the mean and variance do not appear to vary with time, so the differenced log series looks stationary. Since the series was differenced once to make it stationary, d = 1.
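The visual check above can be backed up with a rough numeric comparison. The following is a minimal sketch, not the analysis run on the plant data: the series `x` is a hypothetical simulated stand-in for `powerfactor`, and splitting the series into two halves is just one simple way to look for drift in the mean or variance.

```r
set.seed(1)

# Hypothetical stand-in for powerfactor: the exponential of a random walk,
# so the log of the series is non-stationary before differencing
x <- ts(exp(cumsum(rnorm(500, sd = 0.05))))

# First difference of the log series (this is the d = 1 step)
dlx <- diff(log(x))

# Compare mean and variance across the two halves of the differenced series
half <- length(dlx) %/% 2
first <- dlx[1:half]
second <- dlx[(half + 1):length(dlx)]
round(c(mean(first), mean(second)), 4)
round(c(var(first), var(second)), 4)
```

If the two halves give similar means and variances, that is consistent with (but does not prove) stationarity; a formal unit-root test is still advisable.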

ARIMA MODEL

library(tseries)
adf.test(diff(log(powerfactor)), alternative=c("stationary","explosive"), k=0)
## Warning in adf.test(diff(log(powerfactor)), alternative = c("stationary", :
## p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(log(powerfactor))
## Dickey-Fuller = -193.68, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary

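The Dickey-Fuller statistic of -193.68 with a p-value below 0.01 rejects the null hypothesis of a unit root, so the differenced log series can be treated as stationary for time series modelling. Because k=0, no lagged difference terms are included and the test reduces to the plain Dickey-Fuller test.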
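To see how the test behaves before and after differencing, here is a small sketch on simulated data. The random walk `rw` is a hypothetical example, not the plant data, and the test is called with the default lag order of `adf.test` (trunc((n - 1)^(1/3))) rather than k=0.

```r
library(tseries)
set.seed(42)

# Hypothetical random walk: contains a unit root by construction
rw <- cumsum(rnorm(300))

# ADF test on the level series and on its first difference.
# suppressWarnings() hides the "p-value smaller/greater than printed p-value" notes.
adf_rw <- suppressWarnings(adf.test(rw, alternative = "stationary"))
adf_diff <- suppressWarnings(adf.test(diff(rw), alternative = "stationary"))

adf_rw$p.value    # p-value for the undifferenced series
adf_diff$p.value  # p-value for the differenced series
```

The p-values reported by adf.test are interpolated from a table, so a very strong rejection is reported as 0.01 together with a warning, as in the output above.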

par(mfrow=c(1,2))
acf(powerfactor)
pacf(powerfactor)

par(mfrow=c(1,1))

The dashed blue lines in the plots above mark the approximate 95% significance bounds; values that extend beyond them are significantly different from zero. The PACF cuts off after the 3rd lag, which suggests an AR(3) component (p = 3).
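The PACF cut-off rule can be illustrated on a series whose order is known. This is a sketch under stated assumptions: the AR coefficients below are hypothetical values chosen only to give a stationary AR(3) process, and the bound is the usual large-sample approximation that the plots draw as dashed lines.

```r
set.seed(123)

# Hypothetical stationary AR(3) process (coefficients are illustrative only)
sim <- arima.sim(model = list(ar = c(0.5, -0.2, 0.2)), n = 2000)

# Sample partial autocorrelations without drawing the plot
p <- pacf(sim, plot = FALSE)

# Approximate 95% significance bound
bound <- qnorm(0.975) / sqrt(length(sim))

# Lags whose partial autocorrelation falls outside the bound
which(abs(p$acf[, 1, 1]) > bound)
```

For an AR(p) process the theoretical PACF is zero beyond lag p, so the significant lags should be concentrated at lags 1 to 3, with only occasional chance exceedances at higher lags.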

acf(log(powerfactor))

par(mfrow=c(1,2))
acf(diff(log(powerfactor)))
pacf(diff(log(powerfactor)))

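# ARIMA(0,1,3) on the log series: d = 1 and three MA terms (an AR(3) model would be order c(3, 1, 0)).
# powerfactor has frequency 1, so the seasonal period defaults to 1.
fit <- arima(log(powerfactor), order = c(0, 1, 3), seasonal = list(order = c(0, 1, 3)))
pred <- predict(fit, n.ahead = 5*4)
# Back-transform the log-scale forecasts with exp() rather than the approximation 2.718^x
ts.plot(powerfactor, exp(pred$pred), log = "y", lty = c(1,3))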

library(forecast)
plot(forecast(fit,100),ylim=c(-10,10))
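
Because the model is fitted to log(powerfactor), the forecasts are on the log scale and need to be exponentiated before they can be read as power factor values. The sketch below shows one way to do this with approximate 95% limits. It uses a hypothetical simulated series `y` and a simpler ARIMA(0,1,1) model rather than the fit above, so the numbers it produces say nothing about the plant data.

```r
set.seed(7)

# Hypothetical log-scale series from an ARIMA(0,1,1) process
log_y <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.4), n = 400, sd = 0.02)
y <- exp(log_y)  # positive series standing in for powerfactor

# Fit on the log scale and forecast 20 steps ahead
fit_log <- arima(log(y), order = c(0, 1, 1))
pred_log <- predict(fit_log, n.ahead = 20)

# Back-transform the point forecasts and approximate 95% limits
point <- exp(pred_log$pred)
lower <- exp(pred_log$pred - 1.96 * pred_log$se)
upper <- exp(pred_log$pred + 1.96 * pred_log$se)

ts.plot(y, point, lower, upper, lty = c(1, 3, 2, 2))
```

Note that exponentiating a log-scale forecast gives the median rather than the mean of the forecast distribution on the original scale, and the back-transformed limits are asymmetric around the point forecast.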