Project Title: Time series analysis and forecasting on average power factor
NAME: ASWATHY GUNADEEP
EMAIL: aswathygunadeep@gmail.com
COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
setwd("C:/Users/user/Desktop/tarsha systems summer internship/project/data_3plants")
met1.df <- read.csv("plant1-time sort.csv")
str(met1.df)
## 'data.frame': 1048575 obs. of 10 variables:
## $ dataid : int 52733463 52728738 52761108 52758796 52719389 52763320 52765532 52751860 52754172 52756484 ...
## $ paramrefid : int 45831866 45831867 45831868 45831869 45831870 45831871 45831872 45831873 45831874 45831875 ...
## $ timestamp : Factor w/ 16760 levels "2017-10-27 18:15:00+00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ rawvalue : num 0.00011 0 0.00173 0 49.91 ...
## $ processedvalue: num 0.00011 0 0.00173 0 49.91 ...
## $ meterid : int 45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 45830385 ...
## $ tagid : int 3006 3002 3306 3302 2252 2210 2213 2004 2005 2006 ...
## $ interval : int 15 15 15 15 15 15 15 15 15 15 ...
## $ profileid : int 1008 1008 1008 1008 1008 1008 1008 1008 1008 1008 ...
## $ tagname : Factor w/ 30 levels "Active Energy Export Time Integral 5",..: 1 2 3 4 5 6 7 16 17 18 ...
# Keep only the timestamp and raw value of the "Avg Power Factor" tag
sub1.df <- subset(met1.df, tagname == "Avg Power Factor", select = c(timestamp, rawvalue))
View(sub1.df)
# Keep strictly positive readings; compare against the number 0, not the string "0"
sub.df <- subset(sub1.df$rawvalue, sub1.df$rawvalue > 0)
View(sub.df)
The average power factor is 0.41 (the maximum possible value is 1).
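As a rough illustration of how such an average can be obtained, the sketch below filters out zero readings and takes the mean. The vector `pf_demo` is a made-up stand-in for the rawvalue column, not the plant data.

```r
# Hypothetical readings standing in for the rawvalue column (not the plant data)
pf_demo <- c(0, 0.30, 0.50, 0, 0.40)

# Keep strictly positive readings, using a numeric comparison
pf_pos <- pf_demo[pf_demo > 0]

mean(pf_pos)        # average of the positive readings
max(pf_pos) <= 1    # a power factor should never exceed 1
```

On the real data the same two calls would be applied to sub.df.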
TIME SERIES ANALYSIS
powerfactor <- as.ts(sub.df)
class(powerfactor)
## [1] "ts"
start(powerfactor)
## [1] 1 1
end(powerfactor)
## [1] 22923 1
plot(powerfactor)
abline(reg=lm(powerfactor~time(powerfactor)))
plot(log(powerfactor))
plot(diff(log(powerfactor)))
The mean and variance of the differenced log series do not vary with time, so it is a stationary series. Since we differenced the series once to make it stationary, d=1.
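The link between one round of differencing and d=1 can be sketched on simulated data. The series `log_x` below is an assumed random walk, chosen only because it is non-stationary in level and stationary after one difference; it is not the plant data.

```r
set.seed(1)

# Simulated log-series with a unit root: cumulative sum of white noise
log_x <- ts(cumsum(rnorm(500, sd = 0.05)))

# First difference: each value minus the previous one
d1 <- diff(log_x)

length(log_x) - length(d1)   # one observation is lost per difference
var(log_x)                   # spread of the level series
var(d1)                      # spread of the differenced series

# Compare the two panels: the level wanders, the difference stays around zero
par(mfrow = c(1, 2))
plot(log_x)
plot(d1)
par(mfrow = c(1, 1))
```

If a second difference were needed before the series looked stable, d would be 2 instead.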
ARIMA MODEL
library(tseries)
adf.test(diff(log(powerfactor)), alternative=c("stationary","explosive"), k=0)
## Warning in adf.test(diff(log(powerfactor)), alternative = c("stationary", :
## p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(powerfactor))
## Dickey-Fuller = -193.68, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
The test rejects the unit-root null hypothesis (the p-value is below the printed 0.01), so the differenced log series can be treated as stationary for time series modelling.
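With k=0 the test has no lagged difference terms, so it amounts to a plain Dickey-Fuller regression. The sketch below computes such a statistic by hand on simulated data. The regression form (intercept plus linear trend) is stated as an assumption about what adf.test fits, and the names `y`, `ylag` and `df_stat` are illustrative only.

```r
set.seed(42)

# Simulated stationary series: the first difference of a random walk
y <- diff(cumsum(rnorm(300)))

# Dickey-Fuller regression: change in y on the lagged level, with trend
dy   <- diff(y)
ylag <- y[-length(y)]
tt   <- seq_along(dy)
fit_df <- lm(dy ~ ylag + tt)

# The test statistic is the t-ratio of the lagged level
df_stat <- summary(fit_df)$coefficients["ylag", "t value"]
df_stat
```

The statistic is compared with Dickey-Fuller critical values (roughly -3.4 at the 5% level for the trend case), not with the usual t table. A value far below that threshold points to a stationary series.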
par(mfrow=c(1,2))
acf(powerfactor)
pacf(powerfactor)
par(mfrow=c(1,1))
The blue dashed lines mark the significance bounds: bars that extend beyond them are significantly different from zero. The PACF cuts off after the 3rd lag, which suggests that the series is approximately an AR(3) process.
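The cut-off rule can be checked on a series whose order is known. The sketch below simulates an AR(3) process with assumed coefficients (0.4, -0.2, 0.2) and lists the lags whose partial autocorrelation lies outside the approximate 95% bounds. None of these numbers come from the plant data.

```r
set.seed(123)

# Simulated AR(3) series with assumed coefficients
ar3 <- arima.sim(model = list(ar = c(0.4, -0.2, 0.2)), n = 5000)

# Partial autocorrelations for lags 1 to 10, without plotting
p <- as.numeric(pacf(ar3, lag.max = 10, plot = FALSE)$acf)

# Approximate 95% bounds, the same ones drawn as blue dashed lines
bound <- qnorm(0.975) / sqrt(length(ar3))

which(abs(p) > bound)   # lags with significant partial autocorrelation
```

For a true AR(3) process the first three lags are expected to stand out, while later lags should mostly fall inside the bounds, apart from the occasional chance exceedance.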
acf(log(powerfactor))
par(mfrow=c(1,2))
acf(diff(log(powerfactor)))
pacf(diff(log(powerfactor)))
# order = c(p, d, q) for the non-seasonal part, followed by the seasonal order
fit <- arima(log(powerfactor), order = c(0, 1, 3), seasonal = list(order = c(0, 1, 3)))
pred <- predict(fit, n.ahead = 5*4)
# Back-transform the log-scale forecast with exp() rather than the approximation 2.718
ts.plot(powerfactor, exp(pred$pred), log = "y", lty = c(1,3))
library(forecast)
plot(forecast(fit,100),ylim=c(-10,10))
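Because the model is fitted to the log of the series, its forecasts are on the log scale and have to be mapped back with the exponential. The sketch below shows this on a simulated series, including approximate 95% limits. The series `log_y`, the ARIMA(0,1,1) order and the 1.96 multiplier are all assumptions made for illustration.

```r
set.seed(7)

# Simulated log-series fluctuating around log(0.4) (not the plant data)
log_y <- ts(log(0.4) + cumsum(rnorm(400, sd = 0.02)))

# Fit a small ARIMA model on the log scale
fit_demo <- arima(log_y, order = c(0, 1, 1))

# Forecast 20 steps ahead; pred and se are both on the log scale
pr <- predict(fit_demo, n.ahead = 20)

# Back-transform the point forecast and approximate 95% limits
point <- exp(pr$pred)
lower <- exp(pr$pred - 1.96 * pr$se)
upper <- exp(pr$pred + 1.96 * pr$se)

ts.plot(exp(log_y), point, lower, upper, lty = c(1, 3, 2, 2))
```

The limits are built on the log scale and then exponentiated, so they are always positive and are not symmetric around the point forecast on the original scale.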