Introduction

The Individual Household Electric Power Consumption Data Set, available from the UCI Machine Learning Repository, contains 2,075,259 instances and 9 attributes. The sheer number of instances makes analysis both more precise and more challenging. The data record the power consumption of a single household over a period of four years at a one-minute sampling rate.

Objective

Our objective is to analyse the power consumption of an individual house over a period of four years, from December 2006 to November 2010. We can find the peak usage periods as well as the intervals when no electricity was consumed at all, plot various graphs to explore these patterns, and build predictive models on top of them. The problem can be approached with either regression or clustering.

Description of the Dataset

1. date: date in the format dd/mm/yyyy

2. time: time in the format hh:mm:ss

3. global_active_power: household global minute-averaged active power (in kilowatts). This is the real power actually consumed by the household, i.e. the power drawn by appliances other than those mapped to the sub-meters; it is sometimes called wattful power.

4. global_reactive_power: household global minute-averaged reactive power (in kilowatts). Reactive power bounces back and forth between source and load without being consumed; it is the imaginary component of power, sometimes called wattless power.

5. voltage: minute-averaged voltage (in volts)

6. global_intensity: household global minute-averaged current intensity (in amperes), i.e. the magnitude (strength) of the current drawn.

7. sub_metering_1: energy sub-metering No. 1 (in watt-hours of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (the hot plates are gas powered, not electric).

8. sub_metering_2: energy sub-metering No. 2 (in watt-hours of active energy). It corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator and a light.

9. sub_metering_3: energy sub-metering No. 3 (in watt-hours of active energy). It corresponds to an electric water heater and an air conditioner.
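
The three sub-meters do not cover everything in the house. Per the UCI documentation, the active energy consumed each minute (in watt-hours) by equipment not captured by any sub-meter can be recovered from the raw columns; a minimal sketch, assuming a data frame elec_raw with the original UCI column names:

unmetered <- elec_raw$Global_active_power * 1000 / 60 -    # kW to Wh per minute
  (elec_raw$Sub_metering_1 + elec_raw$Sub_metering_2 + elec_raw$Sub_metering_3)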

Models we can use

With this dataset we can use both regression and clustering.

1. Regression: using the fitted relationships we can predict what the power consumption will be for the next hour. Since electricity consumption is a numeric quantity, a regression model is a natural choice.

2. Clustering: we can also group the data into smaller segments, e.g. the electricity consumption for each three-month window can be clustered, giving a more effective predictive model. A minimal clustering sketch follows this list.
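
As an illustration, k-means can partition the minute-level readings into a few usage regimes. This is only a sketch, assuming elec holds the numeric columns used later in this report (GAP, GRP, GI):

set.seed(1)
# scale the three main load variables and look for 3 consumption regimes
km <- kmeans(scale(elec[, c("GAP", "GRP", "GI")]), centers = 3)
table(km$cluster)  # number of observations per cluster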

Reading the CSV file

elec = read.csv("C:\\Users\\Damodaran\\Desktop\\Project\\el7.csv", header = T)
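
Note that this reads a pre-processed extract. The raw UCI file is semicolon-separated and marks missing values with '?'; if working from it directly, something along these lines would be needed (a sketch, file name as distributed by UCI):

elec_full <- read.csv("household_power_consumption.txt", sep = ";",
                      na.strings = "?", header = TRUE)  # '?' becomes NA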

Processing of data

e1 <- as.numeric(elec$GAP)  # global active power (kW)
e2 <- as.numeric(elec$GRP)  # global reactive power (kW)
e3 <- as.numeric(elec$V)    # voltage (V)
e4 <- as.numeric(elec$GI)   # global intensity (A)
e5 <- as.numeric(elec$SB1)  # sub-metering 1 (Wh)
e6 <- as.numeric(elec$SB2)  # sub-metering 2 (Wh)
e7 <- as.numeric(elec$SB3)  # sub-metering 3 (Wh)
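
One caveat worth flagging: if read.csv() had imported any of these columns as a factor (which happens when a column contains non-numeric markers such as '?'), as.numeric() would silently return the factor level codes rather than the measurements. The defensive idiom is:

e1 <- as.numeric(as.character(elec$GAP))  # safe even if GAP was read as a factor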

Linear Model 1

elec1 <- lm(e1 ~ e2, data = elec)  # GAP modelled on GRP alone

Linear Model 2

elec2 <- lm(e1 ~ e3, data = elec)  # GAP modelled on voltage

Linear Model 3

elec3 <- lm(e1 ~ e2 + e4, data = elec)  # GAP modelled on GRP and GI
predict(elec3)  # fitted values for the modelling sample (predict() takes 'newdata =', not 'data =')
##         1         2         3         4         5         6         7 
## 1.9062868 1.4205145 1.5544700 1.5734990 1.1987225 1.1988253 1.4267200 
##         8         9        10        11        12        13        14 
## 0.9218618 0.8561097 0.9810192 1.0321137 0.8934448 0.7662382 0.7224313 
##        15        16        17        18        19        20        21 
## 0.6326043 0.7782531 0.7616811 0.9021542 1.0465592 1.1534633 1.0616692 
##        22        23        24        25        26        27        28 
## 1.1757962 1.4067941 1.5270307 1.7061855 1.4812157 1.4638150 1.3569702 
##        29        30        31        32        33        34        35 
## 0.9903034 1.2115269 1.2640589 1.2515183 0.9513844 1.0171626 1.0154945 
##        36        37        38        39        40        41        42 
## 1.0316662 0.9626875 0.8353368 0.7587419 0.3120323 0.2284210 0.9467264 
##        43        44        45        46        47        48        49 
## 1.0208919 1.1902717 1.0763046 1.2830739 1.4994149 1.3331603 1.1059714 
##        50 
## 0.3255995
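
To score genuinely new observations, the predictor values go through the newdata argument; a sketch with made-up values for e2 and e4:

predict(elec3, newdata = data.frame(e2 = 0.1, e4 = 5))  # hypothetical GRP and GI values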

Linear Model 4

elec4 <- lm(e1 ~ e5, data = elec)  # GAP modelled on sub-metering 1

Linear Model 5

elec5 <- lm(e1 ~ e2 + e3 + e4 + e5 + e6 + e7, data = elec)  # GAP on all six remaining variables
predict(elec5)  # fitted values for the modelling sample
##         1         2         3         4         5         6         7 
## 1.9106981 1.4230348 1.5544939 1.5770576 1.1982317 1.1979302 1.4289168 
##         8         9        10        11        12        13        14 
## 0.9232866 0.8468575 0.9727961 1.0184516 0.8845931 0.7643597 0.7213192 
##        15        16        17        18        19        20        21 
## 0.6248089 0.7870404 0.7585461 0.9023536 1.0436105 1.1533984 1.0650861 
##        22        23        24        25        26        27        28 
## 1.1773826 1.4091932 1.5260788 1.7055702 1.4736253 1.4630099 1.3570237 
##        29        30        31        32        33        34        35 
## 0.9913401 1.2064013 1.2611042 1.2579314 0.9583490 1.0181287 1.0265394 
##        36        37        38        39        40        41        42 
## 1.0330573 0.9629119 0.8362823 0.7606452 0.3160334 0.2290452 0.9473016 
##        43        44        45        46        47        48        49 
## 1.0251390 1.1952974 1.0768755 1.2770249 1.4971378 1.3363224 1.1058671 
##        50 
## 0.3307081
AIC(elec1,elec2,elec3,elec4,elec5)
##       df        AIC
## elec1  3   39.36251
## elec2  3   33.86504
## elec3  4 -324.44541
## elec4  3   13.59473
## elec5  8 -333.55132
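
Lower AIC is better, so elec5 (all six predictors) gives the best fit-complexity trade-off, with elec3 next. To pull the winner out programmatically:

aics <- AIC(elec1, elec2, elec3, elec4, elec5)
rownames(aics)[which.min(aics$AIC)]  # "elec5"
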
summary(elec5)
## 
## Call:
## lm(formula = e1 ~ e2 + e3 + e4 + e5 + e6 + e7, data = elec)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0227189 -0.0039755 -0.0001129  0.0048365  0.0142636 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.288e-03  8.200e-03   0.645 0.522457    
## e2          -2.049e-01  6.915e-02  -2.964 0.004942 ** 
## e3          -8.458e-05  4.824e-05  -1.753 0.086666 .  
## e4           2.407e-01  2.215e-03 108.681  < 2e-16 ***
## e5          -1.666e-02  4.650e-03  -3.583 0.000859 ***
## e6           3.143e-03  3.401e-03   0.924 0.360586    
## e7           4.520e-03  1.688e-03   2.677 0.010479 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007915 on 43 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 1.537e+04 on 6 and 43 DF,  p-value: < 2.2e-16

LM 1 Plot

plot(elec1)

LM 2 Plot

plot(elec2)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

LM 3 Plot

plot(elec3)

LM 4 Plot

plot(elec4)

LM 5 Plot

plot(elec5)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

Fit Distribution for GAP

n <- 50               # number of simulated data sets
m <- 50               # observations per data set
set.seed(1)
mu <- 1.105833218     # meanlog parameter used for GAP
sig <- 0.332370008    # sdlog parameter used for GAP

x <- matrix(data = rlnorm(n * m, mu, sig), nrow = m)  # one simulated data set per column

library(fitdistrplus)
## Warning: package 'fitdistrplus' was built under R version 3.2.2
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.2.2
## Fit a log-normal distribution to each of the 50 data sets
f <- apply(x, 2,  fitdist, "lnorm")

## Plotting the results 
for(i in 1:n) {        # braces needed: without them only the last fit is plotted
  par(mar = rep(2, 4))
  plot(f[[i]])
}
apply((sapply(f, "[[", "estimate")),1, summary)
##         meanlog  sdlog
## Min.      1.010 0.2736
## 1st Qu.   1.067 0.3179
## Median    1.103 0.3355
## Mean      1.103 0.3400
## 3rd Qu.   1.134 0.3588
## Max.      1.238 0.4094
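
The same machinery can be pointed at the observed sample directly rather than at simulated draws; a sketch using the GAP column itself:

f_gap <- fitdist(elec$GAP, "lnorm")   # fit a log-normal to the observed GAP values
f_gap$estimate                        # meanlog and sdlog
gofstat(f_gap)                        # goodness-of-fit statistics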

Fit Distribution for GI

n2 <- 50
m2 <- 50
set.seed(1)
mu2 <- 4.707599183
sig2 <- 1.361056881

x <- matrix(data = rlnorm(n2 * m2, mu2, sig2), nrow = m2)

library(fitdistrplus)
## Fit a log-normal distribution to each of the 50 data sets
f <- apply(x, 2,  fitdist, "lnorm")

## Plotting the results
for(i in 1:n2) {
  par(mar = rep(2, 4))
  plot(f[[i]])
}

Fit Distribution for SB1

n1 <- 50
m1 <- 50
set.seed(1)
mu1 <- 1.182446393
sig1 <- 0.37163882

x <- matrix(data = rlnorm(n1 * m1, mu1, sig1), nrow = m1)

library(fitdistrplus)
## Fit a log-normal distribution to each of the 50 data sets
f <- apply(x, 2,  fitdist, "lnorm")

## Plotting the results 
for(i in 1:n1) {       # braces needed: without them only the last fit is plotted
  par(mar = rep(2, 4))
  plot(f[[i]])
}
apply((sapply(f, "[[", "estimate")),1, summary)
##         meanlog  sdlog
## Min.      1.075 0.3059
## 1st Qu.   1.139 0.3554
## Median    1.179 0.3752
## Mean      1.179 0.3801
## 3rd Qu.   1.214 0.4012
## Max.      1.331 0.4577

Cullen-Frey Graph

library(fitdistrplus)
descdist(elec$GAP, discrete = FALSE)
## summary statistics
## ------
## min:  0.2377343   max:  1.901295 
## median:  1.051411 
## mean:  1.090364 
## estimated sd:  0.3434032 
## estimated skewness:  -0.2428006 
## estimated kurtosis:  3.427105
descdist(elec$GRP, discrete = FALSE)
## summary statistics
## ------
## min:  0.02034721   max:  0.1583681 
## median:  0.1180869 
## mean:  0.1157809 
## estimated sd:  0.0242234 
## estimated skewness:  -0.9999193 
## estimated kurtosis:  6.638711
descdist(elec$GI, discrete = FALSE)
## summary statistics
## ------
## min:  1.088763   max:  8.029956 
## median:  4.475469 
## mean:  4.640668 
## estimated sd:  1.414515 
## estimated skewness:  -0.2754431 
## estimated kurtosis:  3.592836
descdist(elec$V, discrete = FALSE)
## summary statistics
## ------
## min:  1.754989   max:  242.9772 
## median:  240.4766 
## mean:  235.201 
## estimated sd:  33.73183 
## estimated skewness:  -7.04262 
## estimated kurtosis:  52.7244
descdist(elec$SB1, discrete = FALSE)
## summary statistics
## ------
## min:  0.08412427   max:  1.822049 
## median:  1.204736 
## mean:  1.16623 
## estimated sd:  0.38161 
## estimated skewness:  -0.9238148 
## estimated kurtosis:  4.256636
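
descdist() can also bootstrap the skewness-kurtosis point to show its sampling uncertainty on the Cullen and Frey graph; a sketch:

descdist(elec$GAP, discrete = FALSE, boot = 100)  # overlay 100 bootstrap samples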

Lognormal Distribution for GAP

Trans1 = rlnorm(50,0.043689632,0.478378172)
grid = seq(0,10,.1)
plot(grid,dlnorm(grid,0.043689632,0.478378172),type="l",xlab="Trans1",ylab="f(Trans1)")
lines(density(Trans1),col="red")

Lognormal Distribution for GRP

Trans2 = rlnorm(50,0.929118275,1.691495047)
grid = seq(0,3,.1)
plot(grid,dlnorm(grid,0.929118275,1.691495047),type="l",xlab="Trans2",ylab="f(Trans2)")
lines(density(Trans2),col="red")
elecs1 <- lm(e1 ~ Trans2, data = elec)
df2=data.frame(elec$GAP,Trans2)
elecs2 <- lm(elec.GAP ~ Trans2, data = df2)
AIC(elecs1)
## [1] 39.9985

Lognormal Distribution for GI

Trans3 = rlnorm(50,0.672799479,0.133876276)
grid = seq(0,3,.1)
plot(grid,dlnorm(grid,0.672799479,0.133876276),type="l",xlab="Trans3",ylab="f(Trans3)")
lines(density(Trans3),col="red")
elecs3 <- lm(Trans1 ~ Trans2 + Trans3, data = elec)
AIC(elecs1, elecs2, elecs3)
##        df      AIC
## elecs1  3 39.99850
## elecs2  3 39.99850
## elecs3  4 63.71217

Beta Transformation Distribution for V

BetaPara <- function(mu = 2.380148282, var = 0.244274368) {
  # method-of-moments estimates for a beta distribution:
  #   alpha = ((1 - mu)/var - 1/mu) * mu^2,  beta = alpha * (1/mu - 1)
  alpha <- ((1 - mu) / var - 1 / mu) * mu^2
  beta <- alpha * (1 / mu - 1)
  list(alpha = alpha, beta = beta)
}
BetaPara()
## $alpha
## [1] -34.38795
## 
## $beta
## [1] 19.94013
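
A caution on these numbers: a valid beta distribution needs alpha > 0 and beta > 0, which requires the mean to lie in (0, 1). The plug-in mean 2.380148282 is outside that range, so alpha comes out negative; the magnitudes are what get passed to Beta_Trans() below. Rescaling voltage to (0, 1) first would sidestep this (a sketch):

v01 <- (elec$V - min(elec$V)) / (max(elec$V) - min(elec$V))  # min-max rescale to [0, 1]
BetaPara(mu = mean(v01), var = var(v01))                     # method-of-moments on the rescaled data
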
Beta_Trans <- function(a, b, asp = if(isLim) 1, ylim = if(isLim) c(0,1.1)) {  # plot the dbeta/pbeta/qbeta curves for shape parameters a, b
  if(isLim <- a == 0 || b == 0 || a == Inf || b == Inf) {
    eps <- 1e-10
    x <- c(0, eps, (1:7)/16, 1/2+c(-eps,0,eps), (9:15)/16, 1-eps, 1)
  } else {
    x <- seq(0, 1, length = 1025)
  }
  fx <- cbind(dbeta(x, a,b), pbeta(x, a,b), qbeta(x, a,b))
  f <- fx; f[fx == Inf] <- 1e100
  matplot(x, f, ylab="", type="l", ylim=ylim, asp=asp,
          main = sprintf("Beta Transformation of Voltage (x, a=%g, b=%g)", a,b))
  abline(0,1,     col="gray", lty=3)
  abline(h = 0:1, col="gray", lty=3)
  legend("top", paste0(c("d","p","q"), "beta(x, a,b)"),
         col=1:3, lty=1:3, bty = "n")
  invisible(cbind(x, fx))
}
Beta_Trans(34.38795,19.94013)

Cubist

el<-read.csv("C:/Users/Damodaran/Desktop/Project/el.txt")
p1 <- as.numeric(el$date)                   # date and time (p1, p2) are converted
p2 <- as.numeric(el$Time)                   # but not used in the model frame below
p3 <- as.numeric(el$Global_active_power)    # the outcome to be modelled
p4 <- as.numeric(el$Global_reactive_power)
p5 <- as.numeric(el$Global_intensity)
p6 <- as.numeric(el$Sub_metering_1)
p7 <- as.numeric(el$Sub_metering_2)
p8 <- as.numeric(el$Sub_metering_3)

df7 <- data.frame(p3, p4, p5, p6, p7, p8)

library(Cubist)
## Warning: package 'Cubist' was built under R version 3.2.2
## Loading required package: lattice
set.seed(1)
inTrain <- sample(1:nrow(df7), floor(.8 * nrow(df7)))   # 80/20 train/test split
trainingPredictors <- df7[inTrain, -1]                  # drop column 1 (p3, the outcome)
testPredictors <- df7[-inTrain, -1]
trainingOutcome <- df7$p3[inTrain]
testOutcome <- df7$p3[-inTrain]
modelTree <- cubist(x = trainingPredictors, y = trainingOutcome)
modelTree
## 
## Call:
## cubist.default(x = trainingPredictors, y = trainingOutcome)
## 
## Number of samples: 1152 
## Number of predictors: 5 
## 
## Number of committees: 1 
## Number of rules: 17

set.seed(1)
committeeModel <- cubist(x = trainingPredictors, 
                         y = trainingOutcome,
                         committees = 5)
summary(committeeModel)
## 
## Call:
## cubist.default(x = trainingPredictors, y = trainingOutcome, committees = 5)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Sun Oct 18 05:28:07 2015
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 1152 cases (6 attributes) from undefined.data
## 
## Model 1:
## 
##   Rule 1/1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0063]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = 0.0072 + 0.252 p5
## 
##   Rule 1/2: [129 cases, mean 0.4276, range 0.342 to 0.508, est err 0.0080]
## 
##     if
##  p5 > 1.4
##  p5 <= 2.2
##     then
##  outcome = 0.1272 + 0.158 p5 + 0.14 p4
## 
##   Rule 1/3: [64 cases, mean 0.5580, range 0.522 to 0.636, est err 0.0081]
## 
##     if
##  p5 > 2.2
##  p5 <= 3.2
##     then
##  outcome = 0.2528 + 0.118 p5
## 
##   Rule 1/4: [91 cases, mean 1.4112, range 1.25 to 1.632, est err 0.0106]
## 
##     if
##  p4 <= 0.122
##  p5 > 3.2
##  p5 <= 7
##     then
##  outcome = 0.0748 + 0.229 p5 - 0.19 p4
## 
##   Rule 1/5: [97 cases, mean 1.5166, range 1.164 to 2.202, est err 0.0186]
## 
##     if
##  p4 > 0.122
##  p5 > 3.2
##  p5 <= 9.6
##     then
##  outcome = -0.1876 + 0.243 p5 + 0.012 p8 - 0.2 p4
## 
##   Rule 1/6: [17 cases, mean 1.9286, range 1.164 to 2.314, est err 0.0566]
## 
##     if
##  p5 > 7
##  p5 <= 9.6
##     then
##  outcome = -0.014 + 0.24 p5
## 
##   Rule 1/7: [133 cases, mean 2.4072, range 2.36 to 2.468, est err 0.0136]
## 
##     if
##  p4 <= 0.116
##  p5 > 9.6
##  p5 <= 10
##     then
##  outcome = -0.0364 + 0.248 p5
## 
##   Rule 1/8: [65 cases, mean 2.4892, range 2.444 to 2.558, est err 0.0232]
## 
##     if
##  p4 <= 0.08
##  p5 > 10
##  p5 <= 10.4
##     then
##  outcome = 0.8772 + 0.157 p5
## 
##   Rule 1/9: [40 cases, mean 2.4901, range 2.444 to 2.568, est err 0.0162]
## 
##     if
##  p4 > 0.08
##  p4 <= 0.116
##  p5 > 10
##  p5 <= 10.4
##     then
##  outcome = -1.3102 + 0.371 p5
## 
##   Rule 1/10: [42 cases, mean 2.5448, range 2.306 to 2.654, est err 0.0197]
## 
##     if
##  p4 > 0.212
##  p5 > 9.6
##  p5 <= 10.8
##     then
##  outcome = -0.3934 + 0.271 p5 + 0.32 p4
## 
##   Rule 1/11: [22 cases, mean 2.5498, range 2.48 to 2.64, est err 0.0054]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 9.6
##  p7 > 0
##     then
##  outcome = 0.0504 + 0.194 p5 + 3.78 p4
## 
##   Rule 1/12: [120 cases, mean 2.5513, range 2.44 to 3.016, est err 0.0103]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 9.6
##  p5 <= 12.8
##  p7 <= 0
##     then
##  outcome = 0.226 + 0.196 p5 + 2 p4
## 
##   Rule 1/13: [104 cases, mean 2.5763, range 2.504 to 2.916, est err 0.0145]
## 
##     if
##  p4 <= 0.116
##  p5 > 10.4
##  p5 <= 12.8
##     then
##  outcome = 0.4455 + 0.197 p5 + 0.28 p4
## 
##   Rule 1/14: [95 cases, mean 2.5802, range 2.442 to 2.746, est err 0.0158]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.212
##  p5 > 9.6
##     then
##  outcome = 0.5866 + 0.196 p5 - 0.59 p4
## 
##   Rule 1/15: [72 cases, mean 2.6807, range 2.6 to 2.768, est err 0.0132]
## 
##     if
##  p4 > 0.13
##  p5 > 10.8
##     then
##  outcome = 0.951 + 0.155 p5
## 
##   Rule 1/16: [28 cases, mean 3.4744, range 3.39 to 3.522, est err 0.0152]
## 
##     if
##  p4 <= 0.108
##  p5 > 12.8
##     then
##  outcome = 0.7032 + 0.192 p5
## 
##   Rule 1/17: [13 cases, mean 3.5183, range 3.482 to 3.558, est err 0.0102]
## 
##     if
##  p4 > 0.108
##  p5 > 12.8
##     then
##  outcome = 2.8515 + 5.79 p4
## 
## Model 2:
## 
##   Rule 2/1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0066]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = -0.0222 + 0.273 p5 - 0.47 p4
## 
##   Rule 2/2: [168 cases, mean 0.4531, range 0.342 to 0.558, est err 0.0098]
## 
##     if
##  p5 > 1.4
##  p5 <= 2.4
##     then
##  outcome = 0.3592 + 0.64 p4 + 0.008 p5
## 
##   Rule 2/3: [49 cases, mean 0.5216, range 0.342 to 0.558, est err 0.0097]
## 
##     if
##  p4 > 0.178
##  p5 <= 2.4
##     then
##  outcome = -0.1326 + 0.248 p5 + 0.3 p4
## 
##   Rule 2/4: [25 cases, mean 0.5895, range 0.534 to 0.636, est err 0.0153]
## 
##     if
##  p5 > 2.4
##  p5 <= 3.2
##     then
##  outcome = 0.0292 + 0.192 p5
## 
##   Rule 2/5: [35 cases, mean 1.3738, range 1.262 to 1.448, est err 0.0217]
## 
##     if
##  p4 > 0.12
##  p5 > 3.2
##  p5 <= 6
##     then
##  outcome = 0.6372 + 0.027 p8 + 0.97 p4 + 0.021 p5
## 
##   Rule 2/6: [108 cases, mean 1.4627, range 1.164 to 1.774, est err 0.0269]
## 
##     if
##  p5 > 3.2
##  p5 <= 7.4
##  p8 <= 17
##     then
##  outcome = 0.1016 + 0.224 p5 - 0.19 p4
## 
##   Rule 2/7: [40 cases, mean 1.5671, range 1.164 to 1.774, est err 0.0285]
## 
##     if
##  p4 > 0.12
##  p5 > 6
##  p5 <= 7.4
##  p8 <= 17
##     then
##  outcome = -0.7336 + 0.2 p5 + 0.059 p8
## 
##   Rule 2/8: [119 cases, mean 2.0331, range 1.254 to 3.558, est err 0.0165]
## 
##     if
##  p8 > 17
##     then
##  outcome = 0.0264 + 0.24 p5 - 0.14 p4
## 
##   Rule 2/9: [250 cases, mean 2.4342, range 1.976 to 2.542, est err 0.0166]
## 
##     if
##  p4 <= 0.124
##  p5 > 7.4
##  p5 <= 10.2
##     then
##  outcome = 0.0013 + 0.244 p5 - 0.22 p4
## 
##   Rule 2/10: [55 cases, mean 2.5005, range 2.188 to 2.6, est err 0.0197]
## 
##     if
##  p4 > 0.146
##  p5 > 7.4
##  p5 <= 10.6
##     then
##  outcome = -0.216 + 0.261 p5
## 
##   Rule 2/11: [135 cases, mean 2.5198, range 2.188 to 2.632, est err 0.0226]
## 
##     if
##  p4 > 0.124
##  p5 > 7.4
##  p5 <= 10.6
##     then
##  outcome = 0.2122 + 0.223 p5
## 
##   Rule 2/12: [29 cases, mean 2.5198, range 2.476 to 2.556, est err 0.0124]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.146
##  p5 > 7.4
##  p5 <= 10.4
##     then
##  outcome = 2.2024 + 5.1 p4 - 0.038 p5 + 0.03 p7
## 
##   Rule 2/13: [32 cases, mean 2.5291, range 2.472 to 2.568, est err 0.0229]
## 
##     if
##  p4 <= 0.124
##  p5 > 10.2
##  p5 <= 10.4
##     then
##  outcome = 2.4786 + 0.42 p4 + 0.003 p5
## 
##   Rule 2/14: [74 cases, mean 2.5454, range 2.504 to 2.59, est err 0.0135]
## 
##     if
##  p4 <= 0.124
##  p5 > 10.4
##  p5 <= 10.6
##     then
##  outcome = 2.5263 + 0.4 p4
## 
##   Rule 2/15: [16 cases, mean 2.6053, range 2.556 to 2.632, est err 0.0331]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.146
##  p5 > 10.4
##  p5 <= 10.6
##     then
##  outcome = 1.3377 + 9.37 p4
## 
##   Rule 2/16: [74 cases, mean 2.6068, range 2.554 to 2.666, est err 0.0129]
## 
##     if
##  p4 <= 0.124
##  p5 > 10.6
##  p5 <= 11.2
##     then
##  outcome = 2.0905 + 0.043 p5 + 0.42 p4
## 
##   Rule 2/17: [104 cases, mean 2.6646, range 2.59 to 2.768, est err 0.0142]
## 
##     if
##  p4 > 0.124
##  p5 > 10.6
##     then
##  outcome = 0.7093 + 0.176 p5
## 
##   Rule 2/18: [19 cases, mean 2.6788, range 2.616 to 2.768, est err 0.0191]
## 
##     if
##  p4 > 0.254
##  p5 > 10.6
##     then
##  outcome = 1.0882 + 0.155 p5 - 0.41 p4
## 
##   Rule 2/19: [13 cases, mean 3.2563, range 2.764 to 3.458, est err 0.0587]
## 
##     if
##  p4 <= 0.124
##  p5 > 11.2
##  p5 <= 14.2
##     then
##  outcome = 0.3122 + 0.221 p5 - 0.11 p4
## 
##   Rule 2/20: [32 cases, mean 3.5057, range 3.468 to 3.558, est err 0.0151]
## 
##     if
##  p5 > 14.2
##     then
##  outcome = 2.2352 + 0.087 p5
## 
## Model 3:
## 
##   Rule 3/1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0063]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = 0.0072 + 0.252 p5
## 
##   Rule 3/2: [124 cases, mean 0.4245, range 0.342 to 0.486, est err 0.0100]
## 
##     if
##  p5 > 1.4
##  p5 <= 2
##     then
##  outcome = 0.0634 + 0.2 p5
## 
##   Rule 3/3: [69 cases, mean 0.5540, range 0.494 to 0.636, est err 0.0081]
## 
##     if
##  p5 > 2
##  p5 <= 3.2
##     then
##  outcome = 0.2462 + 0.12 p5
## 
##   Rule 3/4: [51 cases, mean 1.0155, range 0.43 to 1.774, est err 0.0192]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 <= 9.6
##     then
##  outcome = -0.6118 + 0.241 p5 + 4.97 p4
## 
##   Rule 3/5: [118 cases, mean 1.4655, range 1.25 to 2.314, est err 0.0162]
## 
##     if
##  p4 <= 0.13
##  p5 > 3.2
##  p5 <= 9.6
##     then
##  outcome = 0.0502 + 0.233 p5 - 0.23 p4
## 
##   Rule 3/6: [132 cases, mean 1.9423, range 1.164 to 2.556, est err 0.0218]
## 
##     if
##  p4 > 0.13
##  p5 > 3.2
##  p5 <= 10.4
##     then
##  outcome = 0.1333 + 0.236 p5 - 0.35 p4 - 0.002 p8
## 
##   Rule 3/7: [27 cases, mean 2.1147, range 1.164 to 2.524, est err 0.0457]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.138
##  p5 > 3.2
##  p5 <= 10.4
##     then
##  outcome = -0.0132 + 0.248 p5
## 
##   Rule 3/8: [133 cases, mean 2.4072, range 2.36 to 2.468, est err 0.0147]
## 
##     if
##  p4 <= 0.13
##  p5 > 9.6
##  p5 <= 10
##     then
##  outcome = -0.5151 + 0.297 p5
## 
##   Rule 3/9: [231 cases, mean 2.4427, range 2.36 to 2.568, est err 0.0184]
## 
##     if
##  p4 <= 0.112
##  p5 > 9.6
##  p5 <= 10.4
##     then
##  outcome = 0.1217 + 0.231 p5
## 
##   Rule 3/10: [71 cases, mean 2.4823, range 2.44 to 2.558, est err 0.0114]
## 
##     if
##  p4 > 0.112
##  p4 <= 0.13
##  p5 > 10
##  p5 <= 10.4
##     then
##  outcome = -0.8562 + 0.29 p5 + 2.98 p4 + 0.02 p7
## 
##   Rule 3/11: [180 cases, mean 2.5851, range 2.504 to 2.8, est err 0.0191]
## 
##     if
##  p4 <= 0.13
##  p5 > 10.4
##  p5 <= 11.6
##     then
##  outcome = 0.4137 + 0.2 p5 + 0.34 p4
## 
##   Rule 3/12: [77 cases, mean 2.6012, range 2.538 to 2.652, est err 0.0128]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 10.4
##  p5 <= 11.6
##     then
##  outcome = -0.0867 + 0.21 p5 + 3.21 p4 + 0.03 p7
## 
##   Rule 3/13: [91 cases, mean 2.6338, range 2.534 to 2.748, est err 0.0191]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.244
##  p5 > 10.4
##     then
##  outcome = 0.5958 + 0.193 p5 - 0.37 p4
## 
##   Rule 3/14: [33 cases, mean 2.6584, range 2.554 to 2.768, est err 0.0149]
## 
##     if
##  p4 > 0.244
##  p5 > 10.4
##     then
##  outcome = 0.855 + 0.164 p5
## 
##   Rule 3/15: [43 cases, mean 3.4640, range 2.916 to 3.558, est err 0.0290]
## 
##     if
##  p5 > 11.6
##     then
##  outcome = -1.0448 + 0.312 p5
## 
##   Rule 3/16: [16 cases, mean 3.4846, range 3.016 to 3.558, est err 0.0407]
## 
##     if
##  p4 > 0.106
##  p5 > 11.6
##     then
##  outcome = -3.0919 + 0.377 p5 + 9.79 p4
## 
## Model 4:
## 
##   Rule 4/1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0066]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = -0.0222 + 0.273 p5 - 0.47 p4
## 
##   Rule 4/2: [120 cases, mean 0.4265, range 0.366 to 0.606, est err 0.0070]
## 
##     if
##  p4 <= 0.178
##  p5 > 1.4
##  p5 <= 3.2
##     then
##  outcome = 0.1566 + 0.135 p5 + 0.32 p4
## 
##   Rule 4/3: [52 cases, mean 0.5238, range 0.342 to 0.572, est err 0.0096]
## 
##     if
##  p4 > 0.178
##  p4 <= 0.282
##  p5 <= 3.2
##     then
##  outcome = -0.0677 + 0.204 p5 + 0.46 p4
## 
##   Rule 4/4: [21 cases, mean 0.5928, range 0.534 to 0.636, est err 0.0156]
## 
##     if
##  p4 > 0.282
##  p5 <= 3.2
##     then
##  outcome = 0.0108 + 0.197 p5
## 
##   Rule 4/5: [52 cases, mean 1.3607, range 1.25 to 1.474, est err 0.0125]
## 
##     if
##  p4 <= 0.092
##  p5 > 3.2
##  p5 <= 6.2
##     then
##  outcome = -0.0084 + 0.246 p5
## 
##   Rule 4/6: [54 cases, mean 1.4595, range 1.32 to 1.514, est err 0.0060]
## 
##     if
##  p4 > 0.092
##  p4 <= 0.136
##  p5 > 3.2
##  p5 <= 6.2
##     then
##  outcome = -0.3318 + 0.274 p5 + 1.11 p4
## 
##   Rule 4/7: [82 cases, mean 1.7977, range 1.354 to 2.768, est err 0.0174]
## 
##     if
##  p4 > 0.14
##  p8 > 12
##     then
##  outcome = -0.0017 + 0.239 p5
## 
##   Rule 4/8: [228 cases, mean 2.3921, range 1.566 to 2.542, est err 0.0172]
## 
##     if
##  p4 <= 0.116
##  p5 > 6.2
##  p5 <= 10.2
##     then
##  outcome = -0.0761 + 0.252 p5 - 0.24 p4
## 
##   Rule 4/9: [21 cases, mean 2.4233, range 1.164 to 2.686, est err 0.0572]
## 
##     if
##  p4 > 0.136
##  p4 <= 0.14
##  p5 > 3.2
##     then
##  outcome = 0.0165 + 0.242 p5
## 
##   Rule 4/10: [82 cases, mean 2.4368, range 1.52 to 2.558, est err 0.0144]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.136
##  p5 > 6.2
##  p5 <= 10.4
##     then
##  outcome = -0.6724 + 0.263 p5 + 3.72 p4
## 
##   Rule 4/11: [29 cases, mean 2.5108, range 1.164 to 2.632, est err 0.0720]
## 
##     if
##  p4 > 0.136
##  p4 <= 0.154
##  p5 > 3.2
##  p8 <= 12
##     then
##  outcome = 1.1399 + 0.29 p5 - 10.99 p4
## 
##   Rule 4/12: [34 cases, mean 2.5280, range 2.472 to 2.568, est err 0.0289]
## 
##     if
##  p4 <= 0.136
##  p5 > 10.2
##  p5 <= 10.4
##     then
##  outcome = 2.4722 + 0.44 p4 + 0.004 p5
## 
##   Rule 4/13: [127 cases, mean 2.5647, range 1.164 to 2.746, est err 0.0301]
## 
##     if
##  p4 > 0.136
##  p5 > 3.2
##  p8 <= 12
##     then
##  outcome = 0.0617 + 0.235 p5 - 0.019 p8
## 
##   Rule 4/14: [113 cases, mean 2.5717, range 2.504 to 2.666, est err 0.0152]
## 
##     if
##  p4 <= 0.118
##  p5 > 10.4
##  p5 <= 11.2
##     then
##  outcome = 1.9963 + 0.05 p5 + 0.5 p4
## 
##   Rule 4/15: [81 cases, mean 2.6061, range 2.538 to 2.668, est err 0.0131]
## 
##     if
##  p4 > 0.118
##  p4 <= 0.136
##  p5 > 10.4
##  p5 <= 11.2
##     then
##  outcome = 0.6291 + 0.142 p5 + 3.4 p4
## 
##   Rule 4/16: [45 cases, mean 3.4337, range 2.764 to 3.558, est err 0.0249]
## 
##     if
##  p4 <= 0.136
##  p5 > 11.2
##     then
##  outcome = 0.8916 + 0.17 p5 + 0.008 p8
## 
## Model 5:
## 
##   Rule 5/1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0063]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = 0.0072 + 0.252 p5
## 
##   Rule 5/2: [124 cases, mean 0.4245, range 0.342 to 0.486, est err 0.0096]
## 
##     if
##  p5 > 1.4
##  p5 <= 2
##     then
##  outcome = 0.1674 + 0.135 p5 + 0.15 p4
## 
##   Rule 5/3: [69 cases, mean 0.5540, range 0.494 to 0.636, est err 0.0081]
## 
##     if
##  p5 > 2
##  p5 <= 3.2
##     then
##  outcome = 0.2294 + 0.127 p5
## 
##   Rule 5/4: [193 cases, mean 1.4853, range 1.164 to 2.314, est err 0.0179]
## 
##     if
##  p5 > 3.2
##  p5 <= 9.6
##     then
##  outcome = 0.0676 + 0.23 p5
## 
##   Rule 5/5: [341 cases, mean 2.4827, range 2.36 to 2.8, est err 0.0180]
## 
##     if
##  p4 <= 0.116
##  p5 > 9.6
##  p5 <= 11.6
##     then
##  outcome = 0.5215 + 0.191 p5 + 0.17 p4
## 
##   Rule 5/6: [28 cases, mean 2.5315, range 2.306 to 2.654, est err 0.0318]
## 
##     if
##  p4 > 0.228
##  p5 > 9.6
##  p5 <= 10.8
##     then
##  outcome = -0.8173 + 0.322 p5
## 
##   Rule 5/7: [109 cases, mean 2.5416, range 2.306 to 2.654, est err 0.0203]
## 
##     if
##  p4 > 0.13
##  p5 > 9.6
##  p5 <= 10.8
##     then
##  outcome = -0.0587 + 0.259 p5 - 0.64 p4
## 
##   Rule 5/8: [141 cases, mean 2.5478, range 2.44 to 2.652, est err 0.0122]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 9.6
##  p5 <= 11.6
##     then
##  outcome = 0.0855 + 0.202 p5 + 2.64 p4 + 0.03 p7
## 
##   Rule 5/9: [72 cases, mean 2.6807, range 2.6 to 2.768, est err 0.0158]
## 
##     if
##  p4 > 0.13
##  p5 > 10.8
##     then
##  outcome = 1.6261 + 0.095 p5
## 
##   Rule 5/10: [43 cases, mean 3.4640, range 2.916 to 3.558, est err 0.0301]
## 
##     if
##  p5 > 11.6
##     then
##  outcome = -1.1178 + 0.317 p5
## 
## 
## Evaluation on training data (1152 cases):
## 
##     Average  |error|             0.0120
##     Relative |error|               0.01
##     Correlation coefficient        1.00
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##     97%    98%    p5
##     75%    63%    p4
##      8%     7%    p8
##      2%     5%    p7
## 
## 
## Time: 0.1 secs
cmPred <- predict(committeeModel, testPredictors)
## RMSE
sqrt(mean((cmPred - testOutcome)^2))
## [1] 0.01897161

## R^2
cor(cmPred, testOutcome)^2
## [1] 0.9995707

############### Instance-based (kNN) correction of the rule predictions
instancePred <- predict(committeeModel, testPredictors, neighbors = 5)
## RMSE
sqrt(mean((instancePred - testOutcome)^2))
## [1] 0.01923994
## R^2
cor(instancePred, testOutcome)^2
## [1] 0.9995595
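
Side by side, the plain committee predictions edge out the instance-corrected ones on this test split (RMSE 0.0190 vs 0.0192), so the neighbour adjustment buys nothing here. A quick comparison:

c(committee = sqrt(mean((cmPred - testOutcome)^2)),
  neighbors = sqrt(mean((instancePred - testOutcome)^2)))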

summary(modelTree)
## 
## Call:
## cubist.default(x = trainingPredictors, y = trainingOutcome)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Sun Oct 18 05:28:07 2015
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 1152 cases (6 attributes) from undefined.data
## 
## Model:
## 
##   Rule 1: [60 cases, mean 0.3406, range 0.204 to 0.368, est err 0.0063]
## 
##     if
##  p5 <= 1.4
##     then
##  outcome = 0.0072 + 0.252 p5
## 
##   Rule 2: [129 cases, mean 0.4276, range 0.342 to 0.508, est err 0.0080]
## 
##     if
##  p5 > 1.4
##  p5 <= 2.2
##     then
##  outcome = 0.1272 + 0.158 p5 + 0.14 p4
## 
##   Rule 3: [64 cases, mean 0.5580, range 0.522 to 0.636, est err 0.0081]
## 
##     if
##  p5 > 2.2
##  p5 <= 3.2
##     then
##  outcome = 0.2528 + 0.118 p5
## 
##   Rule 4: [91 cases, mean 1.4112, range 1.25 to 1.632, est err 0.0106]
## 
##     if
##  p4 <= 0.122
##  p5 > 3.2
##  p5 <= 7
##     then
##  outcome = 0.0748 + 0.229 p5 - 0.19 p4
## 
##   Rule 5: [97 cases, mean 1.5166, range 1.164 to 2.202, est err 0.0186]
## 
##     if
##  p4 > 0.122
##  p5 > 3.2
##  p5 <= 9.6
##     then
##  outcome = -0.1876 + 0.243 p5 + 0.012 p8 - 0.2 p4
## 
##   Rule 6: [17 cases, mean 1.9286, range 1.164 to 2.314, est err 0.0566]
## 
##     if
##  p5 > 7
##  p5 <= 9.6
##     then
##  outcome = -0.014 + 0.24 p5
## 
##   Rule 7: [133 cases, mean 2.4072, range 2.36 to 2.468, est err 0.0136]
## 
##     if
##  p4 <= 0.116
##  p5 > 9.6
##  p5 <= 10
##     then
##  outcome = -0.0364 + 0.248 p5
## 
##   Rule 8: [65 cases, mean 2.4892, range 2.444 to 2.558, est err 0.0232]
## 
##     if
##  p4 <= 0.08
##  p5 > 10
##  p5 <= 10.4
##     then
##  outcome = 0.8772 + 0.157 p5
## 
##   Rule 9: [40 cases, mean 2.4901, range 2.444 to 2.568, est err 0.0162]
## 
##     if
##  p4 > 0.08
##  p4 <= 0.116
##  p5 > 10
##  p5 <= 10.4
##     then
##  outcome = -1.3102 + 0.371 p5
## 
##   Rule 10: [42 cases, mean 2.5448, range 2.306 to 2.654, est err 0.0197]
## 
##     if
##  p4 > 0.212
##  p5 > 9.6
##  p5 <= 10.8
##     then
##  outcome = -0.3934 + 0.271 p5 + 0.32 p4
## 
##   Rule 11: [22 cases, mean 2.5498, range 2.48 to 2.64, est err 0.0054]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 9.6
##  p7 > 0
##     then
##  outcome = 0.0504 + 0.194 p5 + 3.78 p4
## 
##   Rule 12: [120 cases, mean 2.5513, range 2.44 to 3.016, est err 0.0103]
## 
##     if
##  p4 > 0.116
##  p4 <= 0.13
##  p5 > 9.6
##  p5 <= 12.8
##  p7 <= 0
##     then
##  outcome = 0.226 + 0.196 p5 + 2 p4
## 
##   Rule 13: [104 cases, mean 2.5763, range 2.504 to 2.916, est err 0.0145]
## 
##     if
##  p4 <= 0.116
##  p5 > 10.4
##  p5 <= 12.8
##     then
##  outcome = 0.4455 + 0.197 p5 + 0.28 p4
## 
##   Rule 14: [95 cases, mean 2.5802, range 2.442 to 2.746, est err 0.0158]
## 
##     if
##  p4 > 0.13
##  p4 <= 0.212
##  p5 > 9.6
##     then
##  outcome = 0.5866 + 0.196 p5 - 0.59 p4
## 
##   Rule 15: [72 cases, mean 2.6807, range 2.6 to 2.768, est err 0.0132]
## 
##     if
##  p4 > 0.13
##  p5 > 10.8
##     then
##  outcome = 0.951 + 0.155 p5
## 
##   Rule 16: [28 cases, mean 3.4744, range 3.39 to 3.522, est err 0.0152]
## 
##     if
##  p4 <= 0.108
##  p5 > 12.8
##     then
##  outcome = 0.7032 + 0.192 p5
## 
##   Rule 17: [13 cases, mean 3.5183, range 3.482 to 3.558, est err 0.0102]
## 
##     if
##  p4 > 0.108
##  p5 > 12.8
##     then
##  outcome = 2.8515 + 5.79 p4
## 
## 
## Evaluation on training data (1152 cases):
## 
##     Average  |error|             0.0130
##     Relative |error|               0.02
##     Correlation coefficient        1.00
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##    100%    99%    p5
##     77%    60%    p4
##     12%           p7
##             8%    p8
## 
## 
## Time: 0.0 secs
modelTree$usage
##   Conditions Model Variable
## 1        100    99       p5
## 2         77    60       p4
## 3         12     0       p7
## 4          0     8       p8
## 5          0     0       p6