Project Title: Meter Readings from meters connected to 3 power plants in Jaisalmer, Rajasthan

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

setwd("C:/Users/user/Desktop/tarsha systems summer internship/project/data_3plants")
met1.df <- read.csv("plant1.csv")

SUMMARY OF THE VARIABLES OF THE DATASET

str(met1.df)
## 'data.frame':    1048575 obs. of  10 variables:
##  $ dataid        : int  70298689 70298690 70298691 70298692 70298693 70298694 70298695 70298696 70298697 70298698 ...
##  $ paramrefid    : int  45832289 45832289 45832289 45832289 45832289 45832289 45832289 45832289 45832289 45832289 ...
##  $ timestamp     : Factor w/ 16760 levels "2017-10-27 18:15:00+00",..: 385 386 387 388 389 390 391 392 393 394 ...
##  $ rawvalue      : num  0.00033 0.00033 0.00033 0.00034 0.00033 0.00033 0.00033 0.00033 0.00034 0.00033 ...
##  $ processedvalue: num  0.00033 0.00033 0.00033 0.00034 0.00033 0.00033 0.00033 0.00033 0.00034 0.00033 ...
##  $ meterid       : int  45829248 45829248 45829248 45829248 45829248 45829248 45829248 45829248 45829248 45829248 ...
##  $ tagid         : int  3006 3006 3006 3006 3006 3006 3006 3006 3006 3006 ...
##  $ interval      : int  15 15 15 15 15 15 15 15 15 15 ...
##  $ profileid     : int  1008 1008 1008 1008 1008 1008 1008 1008 1008 1008 ...
##  $ tagname       : Factor w/ 30 levels "Active Energy Export Time Integral 5",..: 1 1 1 1 1 1 1 1 1 1 ...
attach(met1.df)
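The str() output above shows that timestamp is stored as a factor, so it cannot be used for time arithmetic as it stands. A minimal sketch of converting it to a date-time is given here, assuming every label follows the "YYYY-MM-DD HH:MM:SS+00" pattern seen in the first level. The two sample values and the names ts_raw, ts_chr, ts and gap_min are illustrative only and are not part of the original analysis.

```r
# Hypothetical sample in the same format as the timestamp column.
ts_raw <- factor(c("2017-10-27 18:15:00+00", "2017-10-27 18:30:00+00"))

# Work on the text labels and drop the trailing "+00" offset (assumed to mean UTC).
ts_chr <- sub("\\+00$", "", as.character(ts_raw))
ts <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Gap between consecutive readings, in minutes.
gap_min <- as.numeric(diff(ts), units = "mins")
print(gap_min)
```

With a parsed timestamp, readings could be grouped by hour or by day, which is relevant to the day and night differences mentioned in the problem statement.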

VISUALIZATION: CORRGRAM

Problem Statement: We are trying to find the factors that drive energy usage in 3 plants located in the Jaisalmer district of Rajasthan. The first two plants have 4 meters each, while the third has 21 meters. This is the analysis for the 1st plant. Weather also plays a crucial role: Jaisalmer, being an arid desert region, is prone to temperature extremes, and the temperature varies greatly from day to night in both summer and winter. The maximum summer temperature is around 49 °C (120 °F), while the minimum is 25 °C (77 °F). Also, the meter readings combine solar and wind generation units.

library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
library(corrplot)
## corrplot 0.84 loaded
par(mfrow=c(1,1))
# keep the numeric columns only (drops timestamp and tagname)
sub.df <- met1.df[, c(1,2,4,5,6,7,8,9)]
corrplot.mixed(corr=cor(sub.df, use="complete.obs"), 
               upper = "pie", tl.pos="lt", main="corrgram")

In the corrgram, blue indicates that two variables are positively correlated and red that they are negatively correlated. Some conclusions that can be drawn from the graph above: there is a strong relation between tagid and rawvalue; there is a strong relation between tagid and processedvalue; there is a strong relation between meterid and paramrefid; rawvalue and processedvalue are almost the same; profileid and interval are almost the same.
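Strong pairs can also be read off the correlation matrix itself instead of the picture. The sketch below does this on a small made-up data frame; the columns x, y, z, the name toy and the 0.9 cutoff are assumptions chosen for illustration, not values from the plant data.

```r
# Toy data (hypothetical values); y is an exact linear function of x.
toy <- data.frame(x = 1:20,
                  y = 2 * (1:20) + 3,
                  z = rep(c(1, -1), 10))

cm <- cor(toy, use = "complete.obs")

# Look only at the upper triangle so that each pair is listed once.
idx <- which(upper.tri(cm) & abs(cm) > 0.9, arr.ind = TRUE)
strong <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                     var2 = colnames(cm)[idx[, "col"]],
                     r    = cm[idx])
print(strong)
```

Applied to the numeric columns of met1.df, the same steps would list the pairs that the corrgram shows as the darkest cells.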

I 1. dataid and paramrefid distribution

par(mfrow=c(1,2))
with(met1.df, boxplot(dataid,col="red",main="dataid"))
with(met1.df, boxplot(paramrefid,col="brown", main="parameter refid"))

par(mfrow=c(1,1))

Since no outliers are observed in the distribution of paramrefid, its values are not strongly skewed and lie close together.

  2. rawvalue and processedvalue distribution
par(mfrow=c(1,2))
with(met1.df, plot(rawvalue, col="yellowgreen", cex=0.4, main="rawvalue",log="xy"))
## Warning in xy.coords(x, y, xlabel, ylabel, log): 399144 y values <= 0
## omitted from logarithmic plot
with(met1.df, plot(processedvalue, col="darkcyan", cex=0.4, main="processedvalue",log="xy"))
## Warning in xy.coords(x, y, xlabel, ylabel, log): 399144 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))

The graphs are almost exactly the same. The largest readings exceed 10000 units. Because the tags are recorded in different units (for example amperes, kWh and kW), these magnitudes cannot be compared across tags from this plot alone, so each tagid is examined separately in section III.
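The warnings above report 399144 omitted values out of 1048575 readings, which is about 38% of the data, because zero and negative values cannot be drawn on a logarithmic axis. A minimal sketch of counting those values before plotting is given here. The vector vals is made up for illustration, and n_dropped, share_dropped and pos are hypothetical names.

```r
# Hypothetical readings; zeros and negatives cannot be shown on a log axis.
vals <- c(0, 0.00033, 0.5, 0, 70.7, -1, 65535)

n_dropped <- sum(vals <= 0, na.rm = TRUE)   # points a log plot would omit
share_dropped <- n_dropped / length(vals)   # the same count as a fraction

# Plot only the positive readings, keeping their original index on the x axis.
pos <- which(vals > 0)
plot(pos, vals[pos], log = "y", xlab = "index", ylab = "value")
```

Reporting the omitted share next to each log plot makes it clear how much of the data the picture leaves out.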

  3. meterid and tagid distribution
mytable <- table(meterid)
mytable1 <- table(tagid)
par(mfrow=c(1,2))
pie(table(meterid),main="meterid", col=c("violet","cyan","darkseagreen"))
pie(table(tagid), cex=0.5, col=c("pink","violet","cyan","darkcyan","red","orange","blue","skyblue","turquoise","darkgreen","green"))

par(mfrow=c(1,1))
prop.table(mytable)*100
## meterid
## 45829248 45830379 45830385 
## 34.52514 34.52514 30.94972
prop.table(mytable1)*100
## tagid
##       2004       2005       2006       2110       2111       2112 
## 4.72183678 4.72183678 4.72183678 3.70436068 3.12347710 3.12347710 
##       2210       2213       2252       3001       3002       3005 
## 4.72183678 4.72183678 4.72183678 0.03337863 4.72183678 0.03337863 
##       3006       3011       3110       3111       3115       3116 
## 4.72183678 4.72183678 0.03337863 4.72183678 0.03337863 4.72183678 
##       3120       3121       3125       3126       3130       3131 
## 0.03337863 4.72183678 0.03337863 4.72183678 4.75521541 4.75521541 
##       3301       3302       3305       3306       8003       8004 
## 0.03337863 4.72183678 0.03337863 4.72183678 4.72183678 4.72183678

There are 30 distinct tagids used for the different meter readings, and only 3 meters (3 meterids) appear in the data for this plant.

  4. intervals at which the meter readings were taken
par(mfrow=c(1,1))
mytable <- table(interval)
prop.table(mytable)*100
## interval
##         15       1440 
## 99.6662137  0.3337863
barplot(mytable,main="intervals at which the meter readings were taken (in minutes)", ylim=c(0,12e+05),col=c("magenta","lightgreen"))

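Most of the readings from the meters (3 meters in plant 1, 3 meters in plant 2 and 21 meters in plant 3) were taken at an interval of 15 minutes. In plant 1, 99.67% of the readings are at 15 minutes and the remaining 0.33% are at 1440 minutes (once a day). The readings may be either cumulative for the day or an average.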

  5. profileid distribution
library(lattice)
barchart(table(profileid),main="profileid",xlab="count",ylab="profileid")

II 1. tagid and processedvalue

library(lattice)
plot(processedvalue ~ tagid, data=met1.df, log="xy", col="orange",main="tagid and processedvalue",cex=0.7)
## Warning in xy.coords(x, y, xlabel, ylabel, log): 399144 y values <= 0
## omitted from logarithmic plot

Most of the readings were taken for tagids in the range 3000-3300.

  2. profileid and interval of readings
barchart(interval ~ profileid,main="profileid and interval of readings")

  3. rawvalue and processedvalue are the same.
plot(rawvalue ~ processedvalue, log="xy",main="rawvalue and processedvalue", col="yellow")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 399144 x values <= 0
## omitted from logarithmic plot
## Warning in xy.coords(x, y, xlabel, ylabel, log): 399144 y values <= 0
## omitted from logarithmic plot

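The graph shows a linear relationship, which is consistent with the two values being the same. A straight line alone would also appear on a log-log plot if one column were a constant multiple of the other, so this is confirmed with tests in section IV.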

III For each tagid, we split the data into subsets.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==2004)
sub1.df <- subset(met1.df, tagid==2005)
plot(sub.df$rawvalue, log="xy",main="tagid=2004",col="red",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 958 y values <= 0 omitted
## from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=2005",col="orange", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 967 y values <= 0 omitted
## from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 3.389
mean(sub.df$rawvalue)
## [1] 0.5649939
max(sub1.df$rawvalue)
## [1] 3.398
mean(sub1.df$rawvalue)
## [1] 0.5595764

For tagid=2004, the average current measured goes up to 3.389 amperes, and for tagid=2005 it goes up to 3.398 amperes. The mean currents are almost the same for both (about 0.56 amperes).

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==2006)
sub1.df <- subset(met1.df, tagid==2110)
plot(sub.df$rawvalue, log="xy",main="tagid=2006",col="blue",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 984 y values <= 0 omitted
## from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=2110",col="skyblue", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 244 y values <= 0 omitted
## from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 3.399
mean(sub.df$rawvalue)
## [1] 0.5622253
max(sub1.df$rawvalue)
## [1] 70.69
mean(sub1.df$rawvalue)
## [1] 63.84467

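For tagid=2006, the average current measured goes up to only 3.399 amperes. For tagid=2110, however, the average voltage measured reaches a maximum of 70.7 volts, with a mean of 63.8 volts, which may be hazardous. Voltage alone does not measure energy usage, so it should be read together with the current and energy tags before deciding whether measures to reduce usage are needed.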

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==2111)
sub1.df <- subset(met1.df, tagid==2112)
plot(sub.df$rawvalue, log="xy",main="tagid=2111",col="darkcyan",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 238 y values <= 0 omitted
## from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=2112",col="cyan", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 238 y values <= 0 omitted
## from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 70.94
mean(sub.df$rawvalue)
## [1] 64.32853
max(sub1.df$rawvalue)
## [1] 70.72
mean(sub1.df$rawvalue)
## [1] 64.06652

For tagid=2111, the average voltage measured goes up to 70.9 volts, and for tagid=2112 the maximum is 70.7 volts. Because the values are this high, they might come from a solar plant. The mean values are also high (about 64 volts for both).

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==2210)
sub1.df <- subset(met1.df, tagid==2213)
plot(sub.df$rawvalue, log="xy",main="tagid=2210",col="darkseagreen",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 26589 y values <= 0
## omitted from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=2213",col="green", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 22255 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 1
mean(sub.df$rawvalue)
## [1] 0.4171192
max(sub1.df$rawvalue)
## [1] 0.209
mean(sub1.df$rawvalue)
## [1] 0.0369612

For tagid=2210, the value measured is a dimensionless quantity (power factor), which has a maximum value of 1. Power factor does not affect energy consumption much. For tagid=2213, the maximum is 0.209 and the mean is about 0.037.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==2252)
sub1.df <- subset(met1.df, tagid==3001)
plot(sub.df$rawvalue, log="xy",main="tagid=2252",col="violet",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 320 y values <= 0 omitted
## from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=3001",col="pink", ylab="value")

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 50.28
mean(sub.df$rawvalue)
## [1] 49.64319
max(sub1.df$rawvalue)
## [1] 760.9999
mean(sub1.df$rawvalue)
## [1] 371.8952

The frequency measured for tagid=2252 averages 49.6 Hz, with a maximum of 50.3 Hz, and the average active energy measured for tagid=3001 is 372 kWh, cumulative for the whole day.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3002)
sub1.df <- subset(met1.df, tagid==3005)
plot(sub.df$rawvalue, log="xy",main="tagid=3002",col="red",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 26646 y values <= 0
## omitted from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=3005",col="brown", ylab="value")

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 0.15998
mean(sub.df$rawvalue)
## [1] 0.02436947
max(sub1.df$rawvalue)
## [1] 5.5187
mean(sub1.df$rawvalue)
## [1] 2.907322

The maximum active energy measured for tagid=3002 is much lower, at 0.16 kWh, cumulative for the whole day; for tagid=3005 it is 5.5 kWh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3006)
sub1.df <- subset(met1.df, tagid==3011)
plot(sub.df$rawvalue, log="xy",main="tagid=3006",col="orange",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 22323 y values <= 0
## omitted from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=3011",col="yellow", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 26963 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 0.00097
mean(sub.df$rawvalue)
## [1] 0.0001744971
max(sub1.df$rawvalue)
## [1] 0.15998
mean(sub1.df$rawvalue)
## [1] 0.02419497

The maximum active energy measured for tagid=3006 is much lower still, at 0.00097 kWh, cumulative for the whole day; for tagid=3011 it is 0.16 kWh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3110)
sub1.df <- subset(met1.df, tagid==3111)
plot(sub.df$rawvalue, log="xy",main="tagid=3110",col="burlywood",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3111",col="darkolivegreen", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 32854 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 64.1161
mean(sub.df$rawvalue)
## [1] 25.3804
max(sub1.df$rawvalue)
## [1] 0.01408
mean(sub1.df$rawvalue)
## [1] 0.001484545

The maximum cumulative reactive energy measured for tagid=3110 is 64 kvarh; for tagid=3111 it is far lower, at 0.014 kvarh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3115)
sub1.df <- subset(met1.df, tagid==3116)
plot(sub.df$rawvalue, log="xy",main="tagid=3115",col="grey",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3116",col="thistle", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 22149 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 78.5634
mean(sub.df$rawvalue)
## [1] 38.01008
max(sub1.df$rawvalue)
## [1] 0.01072
mean(sub1.df$rawvalue)
## [1] 0.002538193

The maximum cumulative reactive energy measured for tagid=3115 is 78.6 kvarh; for tagid=3116 it is far lower, at 0.011 kvarh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3120)
sub1.df <- subset(met1.df, tagid==3121)
plot(sub.df$rawvalue, log="xy",main="tagid=3120",col="lightblue",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3121",col="hotpink", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 49397 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 0.2387
mean(sub.df$rawvalue)
## [1] 0.1508291
max(sub1.df$rawvalue)
## [1] 5e-05
mean(sub1.df$rawvalue)
## [1] 3.251737e-08

The data is a bit skewed. The maximum cumulative reactive energy measured for tagid=3120 is 0.24 kvarh; for tagid=3121 it is far lower, at 0.00005 kvarh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3125)
sub1.df <- subset(met1.df, tagid==3126)
plot(sub.df$rawvalue, log="xy",main="tagid=3125",col="red",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3126",col="yellow", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 41758 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 14.4457
mean(sub.df$rawvalue)
## [1] 4.123199
max(sub1.df$rawvalue)
## [1] 0.01043
mean(sub1.df$rawvalue)
## [1] 0.0003459571

The maximum cumulative reactive energy measured is 14 kvarh for tagid=3125; for tagid=3126 it is far lower, at 0.01 kvarh.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3130)
sub1.df <- subset(met1.df, tagid==3131)
plot(sub.df$rawvalue, log="xy",main="tagid=3130",col="green",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 31477 y values <= 0
## omitted from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=3131",col="cyan", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 43869 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 52.2515
mean(sub.df$rawvalue)
## [1] 0.1561168
max(sub1.df$rawvalue)
## [1] 8.1932
mean(sub1.df$rawvalue)
## [1] 0.03302984
par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3301)
sub1.df <- subset(met1.df, tagid==3302)
plot(sub.df$rawvalue, log="xy",main="tagid=3301",col="magenta",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3302",col="wheat", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 26446 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))

For tagid=3130 and tagid=3131, the maximum values are 52 kvarh and 8 kvarh respectively.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==3305)
sub1.df <- subset(met1.df, tagid==3306)
plot(sub.df$rawvalue, log="xy",main="tagid=3305",col="maroon",ylab="value")
plot(sub1.df$rawvalue, log="xy", main="tagid=3306",col="yellow", ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 22149 y values <= 0
## omitted from logarithmic plot

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 78.8812
mean(sub.df$rawvalue)
## [1] 38.44312
max(sub1.df$rawvalue)
## [1] 0.01072
mean(sub1.df$rawvalue)
## [1] 0.002544449

The maximum cumulative energy measured is 78.9 kVAh for tagid=3305 and 0.01 kVAh for tagid=3306.

par(mfrow=c(1,2))
sub.df <- subset(met1.df, tagid==8003)
sub1.df <- subset(met1.df, tagid==8004)
plot(sub.df$rawvalue, log="xy",main="tagid=8003",col="cyan",ylab="value")
## Warning in xy.coords(x, y, xlabel, ylabel, log): 320 y values <= 0 omitted
## from logarithmic plot
plot(sub1.df$rawvalue, log="xy", main="tagid=8004",col="orange", ylab="value")

par(mfrow=c(1,1))
max(sub.df$rawvalue)
## [1] 63
mean(sub.df$rawvalue)
## [1] 47.46922
max(sub1.df$rawvalue)
## [1] 65535
mean(sub1.df$rawvalue)
## [1] 65535

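The values recorded are high: a maximum of 63 units for tagid=8003 and 65535 units for tagid=8004. For tagid=8004 the maximum and the mean are both 65535, so every reading is the same. Since 65535 is the largest unsigned 16-bit integer, this may be a placeholder or a saturated register value rather than a real measurement.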
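The per-tag maxima and means in this section were obtained by repeating subset(), max() and mean() for each pair of tagids. As a sketch, the same summary can be built in one pass with tapply(). The data frame mini below is a made-up miniature of met1.df with only the two columns needed, and the names mini and by_tag are hypothetical.

```r
# Hypothetical miniature of met1.df with the two columns used here.
mini <- data.frame(tagid    = c(2004, 2004, 2110, 2110, 8004),
                   rawvalue = c(0.5, 3.389, 60, 70.69, 65535))

# tapply() groups rawvalue by tagid; the groups come back sorted by tagid.
by_tag <- data.frame(
  tagid = sort(unique(mini$tagid)),
  max   = as.vector(tapply(mini$rawvalue, mini$tagid, max)),
  mean  = as.vector(tapply(mini$rawvalue, mini$tagid, mean))
)
print(by_tag)
```

A single table like this makes it harder for a number to be attached to the wrong tagid, which is easy to do when the same two variable names are reused for every pair.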

IV Pearson’s correlation test and Chi-squared test to confirm similarity.

  1. rawvalue and processedvalue are the same.
chisq.test(sub.df$rawvalue,sub.df$processedvalue)
## Warning in chisq.test(sub.df$rawvalue, sub.df$processedvalue): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  sub.df$rawvalue and sub.df$processedvalue
## X-squared = 1188300, df = 576, p-value < 2.2e-16
cor(met1.df$rawvalue,met1.df$processedvalue)
## [1] 1

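The very small p-value rejects independence between the two columns (tested here on the tagid=8003 subset still held in sub.df), and the correlation of 1 over the full data shows a perfect linear relationship, so the two appear to be the same.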
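A correlation of 1 shows that two columns move together exactly, but it does not by itself show that they hold identical numbers. A direct element-wise comparison settles the question. The sketch below uses made-up vectors; toy_raw, toy_proc and toy_scaled are hypothetical names and not columns of met1.df.

```r
# Hypothetical columns: toy_proc equals toy_raw, toy_scaled is 1000 times larger.
toy_raw    <- c(0.00033, 0.5, 70.69, 0)
toy_proc   <- toy_raw
toy_scaled <- toy_raw * 1000

r_scaled <- cor(toy_raw, toy_scaled)   # perfectly correlated although the values differ

# all.equal() compares values within a small numeric tolerance.
same_proc   <- isTRUE(all.equal(toy_raw, toy_proc))
same_scaled <- isTRUE(all.equal(toy_raw, toy_scaled))

# Largest element-wise gap; 0 means the columns are identical.
max_diff <- max(abs(toy_raw - toy_proc))
```

On the real data the equivalent check would compare met1.df$rawvalue with met1.df$processedvalue in the same way.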

  2. profileid and interval of measure are the same.
chisq.test(met1.df$profileid, met1.df$interval)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  met1.df$profileid and met1.df$interval
## X-squared = 1048300, df = 1, p-value < 2.2e-16
cor(met1.df$profileid,met1.df$interval)
## [1] 1

Since the p-value is very small and the correlation is 1, each profileid corresponds to exactly one interval. The two columns hold different numbers but carry the same information.
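The one-to-one correspondence can also be checked directly with a cross-tabulation. In the sketch below, the vectors toy_pid and toy_ivl are made up: the pairing of 1008 with 15 comes from the str() output, while the profileid value 1009 for the 1440-minute interval is an assumption used only to complete the example.

```r
# Hypothetical values: each profileid maps to exactly one interval.
toy_pid <- c(1008, 1008, 1008, 1009)
toy_ivl <- c(15, 15, 15, 1440)

xt <- table(toy_pid, toy_ivl)
print(xt)

# A one-to-one mapping has exactly one non-zero cell in every row and column.
one_to_one <- all(rowSums(xt > 0) == 1) && all(colSums(xt > 0) == 1)
print(one_to_one)
```

If every row and column of the table has a single non-zero cell, one of the two columns is redundant for modelling.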