Advanced regression models (using R very very quickly)
lmOloessO and smooth.splineOglmOgamOgamlssOVarious ways of checking your models
Model selection
Neural networks vs linear models
STATLAB.csv below.Create basic linear models using lm().
Call:
lm(formula = famit ~ famib, data = Data)
Coefficients:
(Intercept) famib
100.031 0.657
Use the summary() function to see the estimations.
Call:
lm(formula = famit ~ famib, data = Data)
Residuals:
Min 1Q Median 3Q Max
-91.311 -36.062 -8.762 32.448 169.409
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 100.0306 11.8130 8.468 3.31e-14 ***
famib 0.6570 0.1587 4.140 6.01e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 51.58 on 138 degrees of freedom
Multiple R-squared: 0.1105, Adjusted R-squared: 0.104
F-statistic: 17.14 on 1 and 138 DF, p-value: 0.00006013
Analysis of Variance Table
Response: famit
Df Sum Sq Mean Sq F value Pr(>F)
famib 1 45600 45600 17.14 0.00006013 ***
Residuals 138 367137 2660
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residuals apparently OK, but the Q-Q plot shows a bit of a divergence near the tails
Preform shapiro-wilk test
Shapiro-Wilk normality test
data: linmod$residuals
W = 0.97538, p-value = 0.01241
p<- ggplot(data=Data, aes(x=famit)) +
geom_histogram(aes(y =..density..), position="identity",
breaks=seq(min(Data$famit)-1, max(Data$famit)+1, by = 20),
alpha=.5, color="light blue", fill="red") +
geom_density(alpha=.3, fill="brown", col="black") +
labs(title="Histogram of famit")
ggplotly(p)The response variable does not have a normal shape histogram
We shall look at non-parametric models shortly
p<- ggplot(data = Data, aes(x = famib, y = famit)) +
geom_point(color='red') +
geom_smooth(method = "lm", se = TRUE)
ggplotly(p)autoplot()library(ggfortify)
autoplot(linmod, colour="red", which = 1:6, ncol = 2, label.size = 3, title="Model checking plots")p<- ggplot(data = Data, aes(x = famib, y = famit, color=sex)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)
ggplotly(p)
Call:
lm(formula = famit ~ famib + sex, data = Data)
Coefficients:
(Intercept) famib sex2
94.8289 0.6711 8.7064
Call:
lm(formula = famit ~ famib + sex, data = Data)
Residuals:
Min 1Q Median 3Q Max
-95.378 -32.596 -7.268 31.908 164.780
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 94.8289 12.9202 7.340 1.72e-11 ***
famib 0.6711 0.1593 4.212 4.56e-05 ***
sex2 8.7064 8.7570 0.994 0.322
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 51.58 on 137 degrees of freedom
Multiple R-squared: 0.1169, Adjusted R-squared: 0.104
F-statistic: 9.064 on 2 and 137 DF, p-value: 0.000201
Analysis of Variance Table
Response: famit
Df Sum Sq Mean Sq F value Pr(>F)
famib 1 45600 45600 17.1388 0.00006039 ***
sex 1 2630 2630 0.9885 0.3219
Residuals 137 364507 2661
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: famit ~ famib
Model 2: famit ~ famib + sex
Res.Df RSS Df Sum of Sq F Pr(>F)
1 138 367137
2 137 364507 1 2630 0.9885 0.3219
Call:
lm(formula = famit ~ famib + I(famib^2) + I(famib^3), data = Data)
Coefficients:
(Intercept) famib I(famib^2) I(famib^3)
43.29121857 2.97336055 -0.02761250 0.00009696
Call:
lm(formula = famit ~ famib + I(famib^2) + I(famib^3), data = Data)
Residuals:
Min 1Q Median 3Q Max
-90.974 -36.443 -7.469 31.488 167.917
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.29121857 54.06485200 0.801 0.425
famib 2.97336055 2.08264280 1.428 0.156
I(famib^2) -0.02761250 0.02434115 -1.134 0.259
I(famib^3) 0.00009696 0.00008606 1.127 0.262
Residual standard error: 51.71 on 136 degrees of freedom
Multiple R-squared: 0.1188, Adjusted R-squared: 0.09939
F-statistic: 6.113 on 3 and 136 DF, p-value: 0.0006228
Call:
lm(formula = famit ~ famib + sex + famib * sex, data = Data)
Coefficients:
(Intercept) famib sex2 famib:sex2
88.2815 0.7626 24.9666 -0.2372
Call:
lm(formula = famit ~ famib + sex + famib * sex, data = Data)
Residuals:
Min 1Q Median 3Q Max
-99.264 -32.680 -5.625 32.370 166.721
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 88.2815 15.7938 5.590 0.00000012 ***
famib 0.7626 0.2037 3.744 0.000266 ***
sex2 24.9666 24.1301 1.035 0.302661
famib:sex2 -0.2372 0.3279 -0.723 0.470707
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 51.67 on 136 degrees of freedom
Multiple R-squared: 0.1202, Adjusted R-squared: 0.1008
F-statistic: 6.196 on 3 and 136 DF, p-value: 0.0005615
loess()loess.model <- loess(famit ~ famib, data = Data, span=0.5, degree=1) # 50% smoothing span and degree 1 polynomial
summary(loess.model)Call:
loess(formula = famit ~ famib, data = Data, span = 0.5, degree = 1)
Number of Observations: 140
Equivalent Number of Parameters: 4.73
Residual Standard Error: 51.05
Trace of smoother matrix: 5.56 (exact)
Control settings:
span : 0.5
degree : 1
family : gaussian
surface : interpolate cell = 0.2
normalize: TRUE
parametric: FALSE
drop.square: FALSE
Call:
loess(formula = famit ~ famib, data = Data, span = 0.9, degree = 2)
Number of Observations: 140
Equivalent Number of Parameters: 4.36
Residual Standard Error: 51.03
Trace of smoother matrix: 4.75 (exact)
Control settings:
span : 0.9
degree : 2
family : gaussian
surface : interpolate cell = 0.2
normalize: TRUE
parametric: FALSE
drop.square: FALSE
plot(famit ~ famib, data = Data, col="pink", pch=19)
hat1 <- predict(loess.model)
lines(Data$famib[order(Data$famib)], hat1[order(Data$famib)], col="red", lwd=2)
hat2 <- predict(loess.model2)
lines(Data$famib[order(Data$famib)], hat2[order(Data$famib)], col="blue", lwd=2)
legend(20, 350, legend=c("span=0.5, degree=1", "span=0.9, degree=2"),col=c("red", "blue"),lty=1, cex=0.8,
title="Line types", text.font=4, bg='lightblue')smooth.spline()Call:
smooth.spline(x = Data$famit ~ Data$famib, df = 7)
Smoothing Parameter spar= 0.7922674 lambda= 0.001018668 (12 iterations)
Equivalent Degrees of Freedom (Df): 6.999176
Penalized Criterion (RSS): 147331.1
GCV: 2689.888
plot(famit ~ famib, data = Data, col="pink", pch=19)
abline(linmod, col="blue", lwd=2)
lines(Data$famib[order(Data$famib)], hat1[order(Data$famib)], col="red", lwd=2)
lines(spline.model, col="green", lwd=2)
legend(20, 350, legend=c("linear model", "loess span=0.5, degree=1", "smooth spline, df=7"),col=c("blue", "red", "green"),lty=1, cex=0.8,
title="Line types", text.font=4, bg='lightblue')glm() and gam()
Call:
glm(formula = famit ~ famib, family = Gamma(), data = Data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0736 -0.2830 -0.0493 0.2172 0.8766
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.008817606 0.000520140 16.952 < 2e-16 ***
famib -0.000026780 0.000006262 -4.276 0.0000352 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.1209083)
Null deviance: 19.751 on 139 degrees of freedom
Residual deviance: 17.793 on 138 degrees of freedom
AIC: 1492.1
Number of Fisher Scoring iterations: 5
library(gam)
Call: gam(formula = famit ~ s(famib, df = 5), family = gaussian(),
data = Data)
Deviance Residuals:
Min 1Q Median 3Q Max
-99.05 -37.12 -4.34 31.84 167.00
(Dispersion Parameter for gaussian family taken to be 2565.676)
Null Deviance: 412737 on 139 degrees of freedom
Residual Deviance: 343800.7 on 134 degrees of freedom
AIC: 1504.167
Number of Local Scoring Iterations: 2
Anova for Parametric Effects
Df Sum Sq Mean Sq F value Pr(>F)
s(famib, df = 5) 1 45600 45600 17.773 0.00004548 ***
Residuals 134 343801 2566
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova for Nonparametric Effects
Npar Df Npar F Pr(F)
(Intercept)
s(famib, df = 5) 4 2.2739 0.06456 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gamlss()library(gamlss)GAMLSS-RS iteration 1: Global Deviance = 1499.361
GAMLSS-RS iteration 2: Global Deviance = 1499.361
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = famit ~ famib, family = NO(), data = Data)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 100.0306 11.7284 8.529 2.44e-14 ***
famib 0.6570 0.1576 4.170 5.37e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.93592 0.05976 65.86 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 140
Degrees of Freedom for the fit: 3
Residual Deg. of Freedom: 137
at cycle: 2
Global Deviance: 1499.361
AIC: 1505.361
SBC: 1514.186
******************************************************************
******************************************************************
Summary of the Quantile Residuals
mean = 1.00614e-16
variance = 1.007194
coef. of skewness = 0.5688844
coef. of kurtosis = 3.224292
Filliben correlation coefficient = 0.9880375
******************************************************************
Shapiro-Wilk normality test
data: Gaml$residuals
W = 0.97538, p-value = 0.01241
GAMLSS-RS iteration 1: Global Deviance = 1489.07
GAMLSS-RS iteration 2: Global Deviance = 1487.675
GAMLSS-RS iteration 3: Global Deviance = 1487.005
GAMLSS-RS iteration 4: Global Deviance = 1486.604
GAMLSS-RS iteration 5: Global Deviance = 1486.324
GAMLSS-RS iteration 6: Global Deviance = 1486.115
GAMLSS-RS iteration 7: Global Deviance = 1485.951
GAMLSS-RS iteration 8: Global Deviance = 1485.822
GAMLSS-RS iteration 9: Global Deviance = 1485.718
GAMLSS-RS iteration 10: Global Deviance = 1485.633
GAMLSS-RS iteration 11: Global Deviance = 1485.578
GAMLSS-RS iteration 12: Global Deviance = 1485.578
******************************************************************
Family: c("GIG", "Generalised Inverse Gaussian")
Call: gamlss(formula = famit ~ famib, family = GIG(), data = Data)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.672463 0.081517 57.319 < 2e-16 ***
famib 0.004342 0.001095 3.963 0.000119 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.917 93.331 0.021 0.984
------------------------------------------------------------------
Nu link function: identity
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.0556 0.9437 8.536 2.44e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 140
Degrees of Freedom for the fit: 4
Residual Deg. of Freedom: 136
at cycle: 12
Global Deviance: 1485.578
AIC: 1493.578
SBC: 1505.344
******************************************************************
******************************************************************
Summary of the Quantile Residuals
mean = 0.0007077885
variance = 1.006735
coef. of skewness = -0.1802651
coef. of kurtosis = 2.911699
Filliben correlation coefficient = 0.9972805
******************************************************************
Shapiro-Wilk normality test
data: Gaml$residuals
W = 0.99436, p-value = 0.8607
GAMLSS-RS iteration 1: Global Deviance = 1485.289
GAMLSS-RS iteration 2: Global Deviance = 1484.983
GAMLSS-RS iteration 3: Global Deviance = 1484.853
GAMLSS-RS iteration 4: Global Deviance = 1484.796
GAMLSS-RS iteration 5: Global Deviance = 1484.77
GAMLSS-RS iteration 6: Global Deviance = 1484.759
GAMLSS-RS iteration 7: Global Deviance = 1484.753
GAMLSS-RS iteration 8: Global Deviance = 1484.751
GAMLSS-RS iteration 9: Global Deviance = 1484.75
GAMLSS-RS iteration 10: Global Deviance = 1484.749
******************************************************************
Family: c("GG", "generalised Gamma Lopatatsidis-Green")
Call: gamlss(formula = famit ~ famib, family = GG(), data = Data)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.698616 0.082202 57.159 < 2e-16 ***
famib 0.004439 0.001083 4.099 0.0000711 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.07366 0.06868 -15.63 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Nu link function: identity
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.5778 0.6844 2.306 0.0227 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 140
Degrees of Freedom for the fit: 4
Residual Deg. of Freedom: 136
at cycle: 10
Global Deviance: 1484.749
AIC: 1492.749
SBC: 1504.516
******************************************************************
******************************************************************
Summary of the Quantile Residuals
mean = 0.0009268872
variance = 1.006849
coef. of skewness = -0.002718162
coef. of kurtosis = 2.819451
Filliben correlation coefficient = 0.9983572
******************************************************************
Shapiro-Wilk normality test
data: Gaml$residuals
W = 0.99649, p-value = 0.9844
Data from https://www.kaggle.com/kemical/kickstarter-projects
library(dplyr)CountS<-count(Main,state, sort = T)
names(CountS)[2]<-"Count"
p<- ggplot(CountS, aes(x = reorder(state,-Count), y = Count)) +
geom_bar(stat = "identity", fill = "red", colour = "light blue", width = 0.7,
alpha=.5) +
xlab('Category')+ ylab('Number of Cases')+
theme(axis.text.x=element_text(angle=-45, hjust=0.001)) # "angle" will tilt the labels
ggplotly(p)CountCS <- count(Main, main_category, state, sort = F)
names(CountCS)[3] <- "Count"
p <- ggplot(CountCS, aes(x = reorder(main_category, -Count), y = Count, fill = state)) +
geom_bar(stat = "identity", width = 0.7) + xlab("Category") + ylab("Number of Cases") +
theme(axis.text.x = element_text(angle = -90, hjust = 0.001)) # 'angle' will tilt the labels
ggplotly(p)Main$Successful<-ifelse(Main$state=="successful"|Main$state=="live",1,0)
TS<-sample(1:nrow(Main), round(nrow(Main)/5))
MainTr<-Main[-TS,]
MainTe<-Main[TS,]
model <- glm(Successful ~main_category ,family=binomial(link='logit'),data=MainTr)
summary(model)
Call:
glm(formula = Successful ~ main_category, family = binomial(link = "logit"),
data = MainTr)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4011 -0.9505 -0.7686 1.3261 1.7681
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.34291 0.01352 -25.360 <2e-16 ***
main_categoryComics 0.53705 0.02544 21.107 <2e-16 ***
main_categoryCrafts -0.74721 0.03059 -24.428 <2e-16 ***
main_categoryDance 0.85497 0.03981 21.479 <2e-16 ***
main_categoryDesign -0.22392 0.01904 -11.760 <2e-16 ***
main_categoryFashion -0.72517 0.02171 -33.399 <2e-16 ***
main_categoryFilm & Video -0.16241 0.01632 -9.953 <2e-16 ***
main_categoryFood -0.72269 0.02122 -34.063 <2e-16 ***
main_categoryGames -0.21732 0.01833 -11.854 <2e-16 ***
main_categoryJournalism -0.90417 0.04103 -22.038 <2e-16 ***
main_categoryMusic 0.23040 0.01672 13.778 <2e-16 ***
main_categoryPhotography -0.44096 0.02687 -16.409 <2e-16 ***
main_categoryPublishing -0.42471 0.01812 -23.442 <2e-16 ***
main_categoryTechnology -0.98503 0.02037 -48.349 <2e-16 ***
main_categoryTheater 0.74523 0.02569 29.007 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 396435 on 302928 degrees of freedom
Residual deviance: 383985 on 302914 degrees of freedom
AIC: 384015
Number of Fisher Scoring iterations: 4
Analysis of Deviance Table
Model: binomial, link: logit
Response: Successful
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 302928 396435
main_category 14 12450 302914 383985 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
## llh llhNull G2 McFadden
## -191992.72941354 -198217.57463312 12449.69043915 0.03140410
## r2ML r2CU
## 0.04026466 0.05517053
fitted.results <- predict(model, newdata = select(MainTe, main_category), type = "response")
DiffD <- data.frame(Diff = MainTe$Successful - fitted.results)
mean(DiffD$Diff)## [1] -0.00180279
library(olsrr)
Call:
lm(formula = Weight ~ ., data = Data.fr)
Residuals:
Min 1Q Median 3Q Max
-4.3884 -2.7489 -0.6242 2.3178 8.5864
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -25.47398 36.46371 -0.699 0.4855
Height 9.32140 3.60675 2.584 0.0103 *
Pre -0.12786 0.06159 -2.076 0.0390 *
FacA.2 -11.36542 0.74355 -15.285 <2e-16 ***
DisD.2 0.71256 0.62631 1.138 0.2564
DisD.3 0.81611 0.67930 1.201 0.2308
DisD.4 1.23373 0.82184 1.501 0.1346
Day 0.08625 0.72441 0.119 0.9053
Err 1.00248 0.01052 95.272 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.312 on 241 degrees of freedom
Multiple R-squared: 0.9838, Adjusted R-squared: 0.9833
F-statistic: 1833 on 8 and 241 DF, p-value: < 2.2e-16
Shapiro-Wilk normality test
data: model$residuals
W = 0.92068, p-value = 2.767e-10
olsrrVariables with \(p\) value less than pent will enter into the model
Variables with \(p\) more than prem will be removed from the model
Stepwise Selection Method
---------------------------
Candidate Terms:
1. Height
2. Pre
3. Fac
4. Dis
5. Day
6. Err
We are selecting variables based on p value...
Stepwise Selection: Step 1
- Err added
Model Summary
---------------------------------------------------------------
R 0.845 RMSE 13.713
R-Squared 0.715 Coef. Var 37.449
Adj. R-Squared 0.714 MSE 188.043
Pred R-Squared 0.710 MAE 11.325
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
------------------------------------------------------------------------
Regression 116856.198 1 116856.198 621.432 0.0000
Residual 46634.737 248 188.043
Total 163490.936 249
------------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 38.010 0.869 43.736 0.000 36.298 39.721
Err 1.074 0.043 0.845 24.929 0.000 0.990 1.159
----------------------------------------------------------------------------------------
Stepwise Selection: Step 2
- Height added
Model Summary
--------------------------------------------------------------
R 0.973 RMSE 5.987
R-Squared 0.946 Coef. Var 16.349
Adj. R-Squared 0.945 MSE 35.840
Pred R-Squared 0.945 MAE 5.223
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 154638.542 2 77319.271 2157.367 0.0000
Residual 8852.394 247 35.840
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) -26.039 2.009 -12.962 0.000 -29.995 -22.082
Err 1.013 0.019 0.797 53.594 0.000 0.976 1.051
Height 8.529 0.263 0.483 32.469 0.000 8.012 9.046
--------------------------------------------------------------------------------------------
Model Summary
--------------------------------------------------------------
R 0.973 RMSE 5.987
R-Squared 0.946 Coef. Var 16.349
Adj. R-Squared 0.945 MSE 35.840
Pred R-Squared 0.945 MAE 5.223
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 154638.542 2 77319.271 2157.367 0.0000
Residual 8852.394 247 35.840
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) -26.039 2.009 -12.962 0.000 -29.995 -22.082
Err 1.013 0.019 0.797 53.594 0.000 0.976 1.051
Height 8.529 0.263 0.483 32.469 0.000 8.012 9.046
--------------------------------------------------------------------------------------------
Stepwise Selection: Step 3
- Fac added
Model Summary
--------------------------------------------------------------
R 0.992 RMSE 3.309
R-Squared 0.984 Coef. Var 9.036
Adj. R-Squared 0.983 MSE 10.948
Pred R-Squared 0.983 MAE 2.801
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 160797.703 3 53599.234 4895.757 0.0000
Residual 2693.233 246 10.948
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) -22.845 1.118 -20.427 0.000 -25.048 -20.642
Err 1.005 0.010 0.790 96.061 0.000 0.984 1.025
Height 8.871 0.146 0.502 60.801 0.000 8.583 9.158
FacA.2 -10.084 0.425 -0.195 -23.719 0.000 -10.922 -9.247
--------------------------------------------------------------------------------------------
Model Summary
--------------------------------------------------------------
R 0.992 RMSE 3.309
R-Squared 0.984 Coef. Var 9.036
Adj. R-Squared 0.983 MSE 10.948
Pred R-Squared 0.983 MAE 2.801
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 160797.703 3 53599.234 4895.757 0.0000
Residual 2693.233 246 10.948
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) -22.845 1.118 -20.427 0.000 -25.048 -20.642
Err 1.005 0.010 0.790 96.061 0.000 0.984 1.025
Height 8.871 0.146 0.502 60.801 0.000 8.583 9.158
FacA.2 -10.084 0.425 -0.195 -23.719 0.000 -10.922 -9.247
--------------------------------------------------------------------------------------------
No more variables to be added/removed.
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.992 RMSE 3.309
R-Squared 0.984 Coef. Var 9.036
Adj. R-Squared 0.983 MSE 10.948
Pred R-Squared 0.983 MAE 2.801
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 160797.703 3 53599.234 4895.757 0.0000
Residual 2693.233 246 10.948
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) -22.845 1.118 -20.427 0.000 -25.048 -20.642
Err 1.005 0.010 0.790 96.061 0.000 0.984 1.025
Height 8.871 0.146 0.502 60.801 0.000 8.583 9.158
FacA.2 -10.084 0.425 -0.195 -23.719 0.000 -10.922 -9.247
--------------------------------------------------------------------------------------------
Stepwise Selection Summary
-----------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
-----------------------------------------------------------------------------------------
1 Err addition 0.715 0.714 4005.5300 2022.6293 13.7129
2 Height addition 0.946 0.945 563.0430 1609.2148 5.9866
3 Fac addition 0.984 0.983 3.5330 1313.7284 3.3088
-----------------------------------------------------------------------------------------
Variables with \(p\) value less than pent=1 will enter into the model
Variables with \(p\) more than prem=1 will be removed from the model
Stepwise Selection Method
---------------------------
Candidate Terms:
1. Height
2. Pre
3. Fac
4. Dis
5. Day
6. Err
We are selecting variables based on p value...
Variables Entered/Removed:
- Err added
- Height added
- Fac added
- Pre added
- Dis added
- Day added
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.992 RMSE 3.312
R-Squared 0.984 Coef. Var 9.045
Adj. R-Squared 0.983 MSE 10.969
Pred R-Squared 0.983 MAE 2.760
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
-------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------------
Regression 160847.424 8 20105.928 1832.989 0.0000
Residual 2643.512 241 10.969
Total 163490.936 249
-------------------------------------------------------------------------
Parameter Estimates
-------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-------------------------------------------------------------------------------------------
(Intercept) -25.474 36.464 -0.699 0.485 -97.302 46.354
Err 1.002 0.011 0.789 95.272 0.000 0.982 1.023
Height 9.321 3.607 0.528 2.584 0.010 2.217 16.426
FacA.2 -11.365 0.744 -0.220 -15.285 0.000 -12.830 -9.901
Pre -0.128 0.062 -0.034 -2.076 0.039 -0.249 -0.007
DisD.2 0.713 0.626 0.012 1.138 0.256 -0.521 1.946
DisD.3 0.816 0.679 0.013 1.201 0.231 -0.522 2.154
DisD.4 1.234 0.822 0.021 1.501 0.135 -0.385 2.853
Day 0.086 0.724 0.024 0.119 0.905 -1.341 1.513
-------------------------------------------------------------------------------------------
Stepwise Selection Summary
-----------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
-----------------------------------------------------------------------------------------
1 Err addition 0.715 0.714 4005.5300 2022.6293 13.7129
2 Height addition 0.946 0.945 563.0430 1609.2148 5.9866
3 Fac addition 0.984 0.983 3.5330 1313.7284 3.3088
4 Pre addition 0.984 0.983 3.4740 1313.6235 3.3016
5 Dis addition 0.984 0.983 3.0140 1317.0846 3.3052
6 Day addition 0.984 0.983 5.0000 1319.0699 3.3119
-----------------------------------------------------------------------------------------
Variables with \(p\) value less than pent= 10^{-150} will enter into the model
Variables with \(p\) more than prem= 0.5 will be removed from the model
Smallest possible number in R is .Machine$double.xmin which is 2.225073910^{-308}
Stepwise Selection Method
---------------------------
Candidate Terms:
1. Height
2. Pre
3. Fac
4. Dis
5. Day
6. Err
We are selecting variables based on p value...
Variables Entered/Removed:
- Err added
No more variables to be added/removed.
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.845 RMSE 13.713
R-Squared 0.715 Coef. Var 37.449
Adj. R-Squared 0.714 MSE 188.043
Pred R-Squared 0.710 MAE 11.325
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
------------------------------------------------------------------------
Regression 116856.198 1 116856.198 621.432 0.0000
Residual 46634.737 248 188.043
Total 163490.936 249
------------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 38.010 0.869 43.736 0.000 36.298 39.721
Err 1.074 0.043 0.845 24.929 0.000 0.990 1.159
----------------------------------------------------------------------------------------
Stepwise Selection Summary
-----------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
-----------------------------------------------------------------------------------------
1 Err addition 0.715 0.714 4005.5300 2022.6293 13.7129
-----------------------------------------------------------------------------------------
library(relaimpo)# Calculate Relative Importance for Each Predictor
calc.relimp(model,type=c("lmg","last","first"),
rela=TRUE)Response variable: Weight
Total response variance: 656.5901
Analysis based on 250 observations
8 Regressors:
Some regressors combined in groups:
Group Dis : DisD.2 DisD.3 DisD.4
Relative importance of 6 (groups of) regressors assessed:
Dis Height Pre Fac Day Err
Proportion of variance explained by model: 98.38%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg last first
Dis 0.001595766 0.000263138186 0.0003205256
Height 0.150713910 0.000716370913 0.2289530014
Pre 0.007474442 0.000462176589 0.0053172473
Fac 0.029149724 0.025058770079 0.0203080482
Day 0.149484405 0.000001520218 0.2275498091
Err 0.661581754 0.973498024015 0.5175513684
Average coefficients for different model sizes:
1group 2groups 3groups 4groups 5groups
Height 9.9271262 12.4585380 13.8021218 13.63350255 12.03185259
Pre 0.3188422 0.2976070 0.2157637 0.07908466 -0.06262494
Fac -8.6555234 -10.0418802 -11.1093732 -11.97687392 -12.29966957
DisD.2 0.5128425 0.2309913 0.1408727 0.36166228 0.69564134
DisD.3 -0.6224845 -0.5992575 -0.4245367 0.06597999 0.64790985
DisD.4 -0.8760861 -1.3050856 -1.3468825 -0.56998870 0.61965646
Day -1.9879969 -1.0098538 -0.2827304 0.12297230 0.22513025
Err 1.0744190 1.0499086 1.0303313 1.01603189 1.00635427
6groups
Height 9.32140490
Pre -0.12785520
Fac -11.36541593
DisD.2 0.71255929
DisD.3 0.81610944
DisD.4 1.23373058
Day 0.08624536
Err 1.00247867
# Bootstrap Measures of Relative Importance (1000 samples)
boot <- boot.relimp(model, b = 100, type = c("lmg",
"last", "first"), rank = TRUE,
diff = TRUE, rela = TRUE)
booteval.relimp(boot) # print resultResponse variable: Weight
Total response variance: 656.5901
Analysis based on 250 observations
8 Regressors:
Some regressors combined in groups:
Group Dis : DisD.2 DisD.3 DisD.4
Relative importance of 6 (groups of) regressors assessed:
Dis Height Pre Fac Day Err
Proportion of variance explained by model: 98.38%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg last first
Dis 0.001595766 0.000263138186 0.0003205256
Height 0.150713910 0.000716370913 0.2289530014
Pre 0.007474442 0.000462176589 0.0053172473
Fac 0.029149724 0.025058770079 0.0203080482
Day 0.149484405 0.000001520218 0.2275498091
Err 0.661581754 0.973498024015 0.5175513684
Average coefficients for different model sizes:
1group 2groups 3groups 4groups 5groups
Height 9.9271262 12.4585380 13.8021218 13.63350255 12.03185259
Pre 0.3188422 0.2976070 0.2157637 0.07908466 -0.06262494
Fac -8.6555234 -10.0418802 -11.1093732 -11.97687392 -12.29966957
DisD.2 0.5128425 0.2309913 0.1408727 0.36166228 0.69564134
DisD.3 -0.6224845 -0.5992575 -0.4245367 0.06597999 0.64790985
DisD.4 -0.8760861 -1.3050856 -1.3468825 -0.56998870 0.61965646
Day -1.9879969 -1.0098538 -0.2827304 0.12297230 0.22513025
Err 1.0744190 1.0499086 1.0303313 1.01603189 1.00635427
6groups
Height 9.32140490
Pre -0.12785520
Fac -11.36541593
DisD.2 0.71255929
DisD.3 0.81610944
DisD.4 1.23373058
Day 0.08624536
Err 1.00247867
Confidence interval information ( 100 bootstrap replicates, bty= perc ):
Relative Contributions with confidence intervals:
Lower Upper
percentage 0.95 0.95 0.95
Dis.lmg 0.0016 ____EF 0.0023 0.0178
Height.lmg 0.1507 _BC___ 0.1145 0.1870
Pre.lmg 0.0075 ____EF 0.0047 0.0221
Fac.lmg 0.0291 ___D__ 0.0142 0.0540
Day.lmg 0.1495 _BC___ 0.1136 0.1854
Err.lmg 0.6616 A_____ 0.5800 0.7226
Dis.last 0.0003 __CDEF 0.0000 0.0018
Height.last 0.0007 __CDEF 0.0001 0.0025
Pre.last 0.0005 __CDEF 0.0000 0.0017
Fac.last 0.0251 _B____ 0.0176 0.0384
Day.last 0.0000 ___DEF 0.0000 0.0006
Err.last 0.9735 A_____ 0.9565 0.9811
Dis.first 0.0003 ___DEF 0.0008 0.0261
Height.first 0.2290 _BC___ 0.1720 0.2681
Pre.first 0.0053 ___DEF 0.0000 0.0380
Fac.first 0.0203 ___DEF 0.0000 0.0628
Day.first 0.2275 _BC___ 0.1701 0.2660
Err.first 0.5176 A_____ 0.4330 0.5965
Letters indicate the ranks covered by bootstrap CIs.
(Rank bootstrap confidence intervals always obtained by percentile method)
CAUTION: Bootstrap confidence intervals can be somewhat liberal.
Differences between Relative Contributions:
Lower Upper
difference 0.95 0.95 0.95
Dis-Height.lmg -0.1491 * -0.1834 -0.1011
Dis-Pre.lmg -0.0059 -0.0158 0.0080
Dis-Fac.lmg -0.0276 * -0.0499 -0.0062
Dis-Day.lmg -0.1479 * -0.1812 -0.1001
Dis-Err.lmg -0.6600 * -0.7183 -0.5722
Height-Pre.lmg 0.1432 * 0.1004 0.1798
Height-Fac.lmg 0.1216 * 0.0736 0.1681
Height-Day.lmg 0.0012 -0.0012 0.0035
Height-Err.lmg -0.5109 * -0.6104 -0.4008
Pre-Fac.lmg -0.0217 * -0.0397 -0.0074
Pre-Day.lmg -0.1420 * -0.1781 -0.0992
Pre-Err.lmg -0.6541 * -0.7133 -0.5663
Fac-Day.lmg -0.1203 * -0.1670 -0.0728
Fac-Err.lmg -0.6324 * -0.7015 -0.5327
Day-Err.lmg -0.5121 * -0.6108 -0.4027
Dis-Height.last -0.0005 -0.0017 0.0012
Dis-Pre.last -0.0002 -0.0011 0.0011
Dis-Fac.last -0.0248 * -0.0376 -0.0173
Dis-Day.last 0.0003 -0.0002 0.0017
Dis-Err.last -0.9732 * -0.9808 -0.9558
Height-Pre.last 0.0003 -0.0012 0.0015
Height-Fac.last -0.0243 * -0.0360 -0.0167
Height-Day.last 0.0007 -0.0002 0.0019
Height-Err.last -0.9728 * -0.9804 -0.9543
Pre-Fac.last -0.0246 * -0.0367 -0.0174
Pre-Day.last 0.0005 -0.0002 0.0016
Pre-Err.last -0.9730 * -0.9809 -0.9548
Fac-Day.last 0.0251 * 0.0175 0.0379
Fac-Err.last -0.9484 * -0.9635 -0.9179
Day-Err.last -0.9735 * -0.9810 -0.9561
Dis-Height.first -0.2286 * -0.2636 -0.1521
Dis-Pre.first -0.0050 -0.0263 0.0205
Dis-Fac.first -0.0200 -0.0603 0.0174
Dis-Day.first -0.2272 * -0.2619 -0.1510
Dis-Err.first -0.5172 * -0.5924 -0.4096
Height-Pre.first 0.2236 * 0.1447 0.2660
Height-Fac.first 0.2086 * 0.1317 0.2615
Height-Day.first 0.0014 -0.0021 0.0045
Height-Err.first -0.2886 * -0.4263 -0.1757
Pre-Fac.first -0.0150 -0.0427 0.0079
Pre-Day.first -0.2222 * -0.2641 -0.1429
Pre-Err.first -0.5122 * -0.5834 -0.4101
Fac-Day.first -0.2072 * -0.2603 -0.1307
Fac-Err.first -0.4972 * -0.5721 -0.3831
Day-Err.first -0.2900 * -0.4271 -0.1779
* indicates that CI for difference does not include 0.
CAUTION: Bootstrap confidence intervals can be somewhat liberal.
Call:
glm(formula = Weight ~ ., data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-11.6230 -3.6645 0.0582 3.3049 13.0276
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.74076 59.04151 -0.063 0.950
Height 5.69671 5.83728 0.976 0.330
Pre 0.48497 0.04988 9.723 <2e-16 ***
Day -0.61044 1.17242 -0.521 0.603
Err 1.01651 0.01753 57.989 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 23.68618)
Null deviance: 129393.2 on 199 degrees of freedom
Residual deviance: 4618.8 on 195 degrees of freedom
AIC: 1207.5
Number of Fisher Scoring iterations: 2
library(neuralnet)Read more https://www.r-bloggers.com/fitting-a-neural-network-in-r-neuralnet-package/
Is similar to linear regression, but probably much more powerful
pr.nn <- compute(nn,test_[,-2])
pr.nn_ <- pr.nn$net.result*(max(Main$Weight)-min(Main$Weight))+min(Main$Weight)
test.r <- (test_$Weight)*(max(Main$Weight)-min(Main$Weight))+min(Main$Weight)
MSE.nn <- sum((test.r - pr.nn_)^2)/nrow(test_)
print(paste(MSE.lm,MSE.nn))[1] "32.2173825421678 16.2889592065124"
par(mfrow=c(1,2))
plot(test$Weight,pr.nn_,col='red',main='Real vs predicted NN',pch=19,cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend='NN',pch=19,col='red', bty='n')
plot(test$Weight,pr.model,col='blue',main='Real vs predicted lm',pch=19, cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend='LM',pch=19,col='blue', bty='n', cex=.95)plot(test$Weight,pr.nn_,col='red',main='Real vs predicted NN',pch=19)
points(test$Weight,pr.model,col='blue',pch=19)
abline(0,1,lwd=2)
legend('bottomright',legend=c('NN','LM'),pch=19,col=c('red','blue'))library(NeuralNetTools) to Check for Importance