I am going to demonstrate machine learning using all the algorithms listed below:
1.0 LINEAR REGRESSION
2.0 RIDGE REGRESSION
3.0 LASSO REGRESSION
4.0 ELASTIC NET REGRESSION
5.0 MULTINOMIAL REGRESSION
6.0 DECISION TREE REGRESSION
7.0 RANDOM FOREST
8.0 SUPPORT VECTOR MACHINE
9.0 LINEAR DISCRIMINANT ANALYSIS
10.0 BINARY LOGISTIC REGRESSION
11.0 ORDERED LOGISTIC REGRESSION
1.0 LINEAR REGRESSION
An insurance company in Arizona was in a dilemma: was it charging its clients too much or too little? They decided to consult Mugo, a data scientist, who used a Kaggle dataset to walk them through the insights below. They thanked him very much. Here is what he told them.
GENERAL OBJECTIVE
To determine the effect of associated factors on insurance charges.
SPECIFIC OBJECTIVES
1.0 to determine whether age has an effect on insurance charges
2.0 to determine whether sex has an effect on insurance charges
3.0 to determine whether smoking has an effect on insurance charges
4.0 to determine whether the number of children has an effect on insurance charges
5.0 to determine whether region has an effect on insurance charges
NULL HYPOTHESIS
1.0 age has no effect on insurance charges
2.0 sex has no effect on insurance charges
3.0 smoking has no effect on insurance charges
4.0 region has no effect on insurance charges
5.0 number of children has no effect on insurance charges
library(readr)
insurance <- read_csv("C:/Users/USER/Desktop/insurance.csv")
## Rows: 1338 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sex, smoker, region
## dbl (4): age, bmi, children, charges
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(insurance)
## # A tibble: 6 × 7
## age sex bmi children smoker region charges
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 19 female 27.9 0 yes southwest 16885.
## 2 18 male 33.8 1 no southeast 1726.
## 3 28 male 33 3 no southeast 4449.
## 4 33 male 22.7 0 no northwest 21984.
## 5 32 male 28.9 0 no northwest 3867.
## 6 31 female 25.7 0 no southeast 3757.
summary(insurance)
## age sex bmi children
## Min. :18.00 Length:1338 Min. :15.96 Min. :0.000
## 1st Qu.:27.00 Class :character 1st Qu.:26.30 1st Qu.:0.000
## Median :39.00 Mode :character Median :30.40 Median :1.000
## Mean :39.21 Mean :30.66 Mean :1.095
## 3rd Qu.:51.00 3rd Qu.:34.69 3rd Qu.:2.000
## Max. :64.00 Max. :53.13 Max. :5.000
## smoker region charges
## Length:1338 Length:1338 Min. : 1122
## Class :character Class :character 1st Qu.: 4740
## Mode :character Mode :character Median : 9382
## Mean :13270
## 3rd Qu.:16640
## Max. :63770
names(insurance)
## [1] "age" "sex" "bmi" "children" "smoker" "region" "charges"
insurance$children<-as.factor(insurance$children)
insurance$region<-as.factor(insurance$region)
insurance$sex<-as.factor(insurance$sex)
insurance$smoker<-as.factor(insurance$smoker)
levels(insurance$children)
## [1] "0" "1" "2" "3" "4" "5"
class(insurance$children)
## [1] "factor"
levels(insurance$sex)
## [1] "female" "male"
str(insurance)
## spc_tbl_ [1,338 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ age : num [1:1338] 19 18 28 33 32 31 46 37 37 60 ...
## $ sex : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 2 1 ...
## $ bmi : num [1:1338] 27.9 33.8 33 22.7 28.9 ...
## $ children: Factor w/ 6 levels "0","1","2","3",..: 1 2 4 1 1 1 2 4 3 1 ...
## $ smoker : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ region : Factor w/ 4 levels "northeast","northwest",..: 4 3 3 2 2 3 3 2 1 2 ...
## $ charges : num [1:1338] 16885 1726 4449 21984 3867 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. sex = col_character(),
## .. bmi = col_double(),
## .. children = col_double(),
## .. smoker = col_character(),
## .. region = col_character(),
## .. charges = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
explore the data
library(psych)
pairs.panels(insurance[1:4],gap=0,bg=c("red","green","blue"))
as shown in the panel plot above, there is no multicollinearity problem. However, the numeric variables are not normally distributed: the null hypothesis of the Shapiro-Wilk test is that a variable's distribution does not differ from a normal distribution, and since the p-values are < 0.05 we reject it for each variable.
The solution is to split each variable into categories (binning).
#shapiro.test(insurance$age)
#shapiro.test(insurance$bmi)
#boxplot(insurance$charges)
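a compact way to run these normality checks (a sketch; shapiro.test() is from base R's stats package and handles up to 5000 observations)
# Shapiro-Wilk test on each numeric column; p < 0.05 rejects normality
sapply(insurance[c("age","bmi","charges")],function(x) shapiro.test(x)$p.value)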
binning
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
insurance<-insurance%>%mutate(bmis=cut(bmi,breaks =c(15,25,35,45,55)))
insurance<-insurance%>%mutate(ages=cut(age,breaks =c(15,35,55,75)))
tail(insurance)
## # A tibble: 6 × 9
## age sex bmi children smoker region charges bmis ages
## <dbl> <fct> <dbl> <fct> <fct> <fct> <dbl> <fct> <fct>
## 1 52 female 44.7 3 no southwest 11412. (35,45] (35,55]
## 2 50 male 31.0 3 no northwest 10601. (25,35] (35,55]
## 3 18 female 31.9 0 no northeast 2206. (25,35] (15,35]
## 4 18 female 36.8 0 no southeast 1630. (35,45] (15,35]
## 5 21 female 25.8 0 no southwest 2008. (25,35] (15,35]
## 6 61 female 29.1 0 yes northwest 29141. (25,35] (55,75]
levels(insurance$bmis)
## [1] "(15,25]" "(25,35]" "(35,45]" "(45,55]"
levels(insurance$region)
## [1] "northeast" "northwest" "southeast" "southwest"
levels(insurance$sex)
## [1] "female" "male"
levels(insurance$smoker)
## [1] "no" "yes"
lm<-lm(charges~ages+children+bmis+sex+smoker+region,data=insurance)
summary(lm)
##
## Call:
## lm(formula = charges ~ ages + children + bmis + sex + smoker +
## region, data = insurance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14832 -3030 -1106 1669 30821
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1738.6 573.6 3.031 0.00249 **
## ages(35,55] 4544.6 384.2 11.829 < 2e-16 ***
## ages(55,75] 9388.4 509.2 18.438 < 2e-16 ***
## children1 1002.0 448.1 2.236 0.02553 *
## children2 2324.7 497.6 4.672 3.29e-06 ***
## children3 1897.5 576.0 3.294 0.00101 **
## children4 3463.4 1290.1 2.685 0.00735 **
## children5 1615.5 1518.8 1.064 0.28766
## bmis(25,35] 2669.7 464.5 5.748 1.12e-08 ***
## bmis(35,45] 5728.1 560.5 10.220 < 2e-16 ***
## bmis(45,55] 6166.7 1489.8 4.139 3.71e-05 ***
## sexmale -125.5 345.1 -0.364 0.71613
## smokeryes 23724.1 429.6 55.227 < 2e-16 ***
## regionnorthwest -406.6 495.0 -0.821 0.41154
## regionsoutheast -779.4 495.2 -1.574 0.11574
## regionsouthwest -838.5 496.2 -1.690 0.09126 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6283 on 1322 degrees of freedom
## Multiple R-squared: 0.7338, Adjusted R-squared: 0.7308
## F-statistic: 243 on 15 and 1322 DF, p-value: < 2.2e-16
results: apart from region, all the explanatory variables have p-values < 0.05, so we deem them statistically significant; for each we reject the null hypothesis.
the objectives have been achieved and the related questions answered, except for region.
About 73% of the variation in charges is explained by the explanatory variables in the model (adjusted R-squared = 0.7308).
Smokers are charged 23,724.1 more than non-smokers, holding the other variables in the model constant; males are charged 125.5 less than females holding other factors constant, though this difference is not significant.
Holding other factors constant, those with 1, 2, 3, 4, and 5 children are charged 1,002, 2,324, 1,897, 3,463, and 1,615 more, respectively, than the reference group (those with 0 children).
Holding other factors constant in the model, the 35-55 and 55-75 age groups are charged 4,544 and 9,388 more, respectively, than the reference age group.
Since region is not significant, I drop it from the model. The p-value of 0.3049 from the LRT below means we fail to reject the null hypothesis that there is no statistically significant difference between the two models, hence we drop region; you may keep it if it improves the model's predictive performance.
lm2<-lm(charges~ages+children+bmis+sex+smoker,data=insurance)
anova(lm,lm2,test="LRT")
## Analysis of Variance Table
##
## Model 1: charges ~ ages + children + bmis + sex + smoker + region
## Model 2: charges ~ ages + children + bmis + sex + smoker
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 1322 5.2190e+10
## 2 1325 5.2333e+10 -3 -143097306 0.3049
plot(lm)
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loaded glmnet 4.1-8
library(mlbench)
library(psych)
library(Amelia)
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
data(BostonHousing)
data<-BostonHousing
library(reticulate)
data cleaning: check for missing values
missmap(BostonHousing,col = c("white","black"),y.at = 1,y.labels = '',legend = TRUE)
explore the data
head(data)
## crim zn indus chas nox rm age dis rad tax ptratio b lstat
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
variables
str(data)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : num 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ b : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
remove the factor column (chas) and the response (medv)
head(data[c(-4,-14)])
## crim zn indus nox rm age dis rad tax ptratio b lstat
## 1 0.00632 18 2.31 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
correlation
pairs.panels(data[c(-4,-14)])
partitioning
set.seed(123)
ind<-sample(2,nrow(data),replace=T,prob=c(0.7,0.3))
train<-data[ind==1,]
test<-data[ind==2,]
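a quick sanity check on the split (a sketch; counts vary slightly with the seed)
# row counts of the roughly 70/30 partition
c(train=nrow(train),test=nrow(test))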
custom cross-validation control
custom<-trainControl(method = "repeatedcv",number = 10,repeats = 5,verboseIter = T)
multiple linear regression
set.seed(1234)
lm<-train(medv~.,train,method="lm",trControl=custom)
## + Fold01.Rep1: intercept=TRUE
## - Fold01.Rep1: intercept=TRUE
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Fitting final model on full training set
lm
## Linear Regression
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 4.975392 0.7293561 3.579993
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
multiple regression results
lm$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 4.975392 0.7293561 3.579993 1.042083 0.08418269 0.5764493
summary
summary(lm)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8373 -2.9216 -0.6721 2.1082 27.2939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.185451 6.039394 6.323 7.86e-10 ***
## crim -0.111750 0.035301 -3.166 0.001683 **
## zn 0.054660 0.016150 3.385 0.000794 ***
## indus 0.003361 0.074173 0.045 0.963887
## chas1 2.528007 1.000831 2.526 0.011982 *
## nox -22.848609 4.524133 -5.050 7.12e-07 ***
## rm 4.291549 0.503590 8.522 4.79e-16 ***
## age -0.004883 0.016287 -0.300 0.764497
## dis -1.789839 0.247693 -7.226 3.15e-12 ***
## rad 0.316714 0.079995 3.959 9.12e-05 ***
## tax -0.011560 0.004490 -2.574 0.010450 *
## ptratio -1.041622 0.156402 -6.660 1.07e-10 ***
## b 0.009382 0.003259 2.879 0.004235 **
## lstat -0.434205 0.060133 -7.221 3.26e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.88 on 349 degrees of freedom
## Multiple R-squared: 0.7463, Adjusted R-squared: 0.7368
## F-statistic: 78.95 on 13 and 349 DF, p-value: < 2.2e-16
plot
plot(lm$finalModel)
RIDGE REGRESSION
set.seed(1234)
ridge<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=0,lambda=seq(0.0001,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=0, lambda=1
## - Fold01.Rep1: alpha=0, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 0, lambda = 0.5 on full training set
ridge
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000100 4.970605 0.7311616 3.505536
## 0.250075 4.970605 0.7311616 3.505536
## 0.500050 4.970605 0.7311616 3.505536
## 0.750025 4.974643 0.7310088 3.504826
## 1.000000 4.990820 0.7303565 3.506125
##
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 0.50005.
plotting
plot(ridge)
plotting further
plot(ridge$finalModel,xvar="lambda",label=T)
PLOT III
plot(ridge$finalModel,xvar="dev",label=T)
PLOT VARIABLE IMPORTANCE
plot(varImp(ridge,scale=F))
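the five-point lambda grid above is coarse; a finer grid (a sketch, reusing the same custom control; ridge2 is an illustrative name) may pin down the optimal lambda more precisely
set.seed(1234)
# 50 candidate lambdas instead of 5; alpha = 0 keeps it a pure ridge penalty
ridge2<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=0,lambda=seq(0.0001,1,length=50)),trControl=custom)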
LASSO REGRESSION
set.seed(123)
lasso<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=seq(0.0001,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=1, lambda=1
## - Fold01.Rep1: alpha=1, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 1e-04 on full training set
lasso
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 326, 327, 326, 327, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000100 4.979856 0.7346555 3.567965
## 0.250075 5.142243 0.7166092 3.597452
## 0.500050 5.379660 0.6923277 3.770186
## 0.750025 5.526544 0.6786284 3.896402
## 1.000000 5.598585 0.6761963 3.959097
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 1e-04.
plot
plot(lasso)
plot
plot(lasso$finalModel,xvar="lambda",label=T)
PLOT
plot(lasso$finalModel,xvar="dev",label=T)
PLOT
plot(varImp(lasso,scale=F))
ELASTIC NET REGRESSION
set.seed(1234)
en<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=seq(0,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=1, lambda=1
## - Fold01.Rep1: alpha=1, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 0 on full training set
en
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.00 4.971737 0.7296652 3.572301
## 0.25 5.114796 0.7165778 3.589798
## 0.50 5.340177 0.6970873 3.756935
## 0.75 5.491257 0.6841587 3.888663
## 1.00 5.564690 0.6816318 3.955273
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.
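note that alpha was fixed at 1 in the grid above, so this "elastic net" is effectively a second lasso; a sketch of a grid that also tunes alpha between ridge (0) and lasso (1), assuming the same custom control (en2 is an illustrative name)
set.seed(1234)
# tune both the mixing parameter alpha and the penalty lambda
en2<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=seq(0,1,length=5),lambda=seq(0.0001,1,length=5)),trControl=custom)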
plot
plot(en)
plot
plot(en$finalModel,xvar="lambda",label=T)
PLOT
plot(en$finalModel,xvar="dev",label=T)
plot(varImp(en))
compare models
model_list<-list(linearmodel=lm,Ridge=ridge,Lasso=lasso,Elasticnet=en)
res<-resamples(model_list)
res
##
## Call:
## resamples.default(x = model_list)
##
## Models: linearmodel, Ridge, Lasso, Elasticnet
## Number of resamples: 50
## Performance metrics: MAE, RMSE, Rsquared
## Time estimates for: everything, final model fit
summary
summary(res)
##
## Call:
## summary.resamples(object = res)
##
## Models: linearmodel, Ridge, Lasso, Elasticnet
## Number of resamples: 50
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 2.677284 3.187491 3.482063 3.579993 3.877766 5.247107 0
## Ridge 2.272557 3.049638 3.411169 3.505536 3.806774 5.202456 0
## Lasso 2.604518 3.245134 3.619309 3.567965 3.855642 4.994723 0
## Elasticnet 2.642375 3.157806 3.477071 3.572301 3.875828 5.239103 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 3.389000 4.115437 4.771370 4.975392 5.664986 7.702679 0
## Ridge 2.868358 4.069039 4.755433 4.970605 5.622595 7.940918 0
## Lasso 3.482731 4.287392 4.793794 4.979856 5.284133 8.242338 0
## Elasticnet 3.346567 4.098251 4.765642 4.971737 5.653089 7.717937 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 0.5577033 0.6877298 0.7252159 0.7293561 0.7874223 0.8934893 0
## Ridge 0.5275743 0.6867336 0.7430211 0.7311616 0.7837230 0.8956834 0
## Lasso 0.4700928 0.6701740 0.7468853 0.7346555 0.8093734 0.8928662 0
## Elasticnet 0.5558059 0.6891828 0.7246773 0.7296652 0.7881560 0.8938194 0
xyplot
xyplot(res,metric="RMSE")
BEST MODEL
best<-en$finalModel
coef(best,s=en$bestTune$lambda)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 37.592179762
## crim -0.109348699
## zn 0.053511177
## indus .
## chas1 2.515006344
## nox -22.527533084
## rm 4.307031899
## age -0.004354252
## dis -1.762718351
## rad 0.302126999
## tax -0.010922566
## ptratio -1.034782961
## b 0.009306407
## lstat -0.434020577
prediction
p1<-predict(en,train)
head(p1)
##         1         3         6         7         9        10 
## 30.029355 30.396753 24.561699 22.921030 12.522902 18.865567
RMSE
sqrt(mean((train$medv-p1)^2))  # residuals are squared before averaging
test prediction
p2<-predict(en,test)
sqrt(mean((test$medv-p2)^2))
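RMSE is scale-dependent, so a complementary test-set R-squared helps (a sketch: the squared correlation between observed and predicted medv)
# test-set R-squared
cor(test$medv,p2)^2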
BINARY LOGISTIC REGRESSION (fitted via multinom)
library(tidyverse)
library(readr)
library(nnet)
Admission_Predict <- read_csv("C:/Users/USER/Desktop/Admission_Predict.csv")
## Rows: 400 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): gre, sop, cgpa, admitted
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
attach(Admission_Predict)
Admission_Predict$admitted<-as.factor(Admission_Predict$admitted)
data<-Admission_Predict
str(data)
## spc_tbl_ [400 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ gre : num [1:400] 337 324 316 322 314 330 321 308 302 323 ...
## $ sop : num [1:400] 4.5 4 3 3.5 2 4.5 3 3 2 3.5 ...
## $ cgpa : num [1:400] 9.65 8.87 8 8.67 8.21 9.34 8.2 7.9 8 8.6 ...
## $ admitted: Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. gre = col_double(),
## .. sop = col_double(),
## .. cgpa = col_double(),
## .. admitted = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
explore the data
summary(data)
## gre sop cgpa admitted
## Min. :290.0 Min. :1.0 Min. :6.800 0:181
## 1st Qu.:308.0 1st Qu.:2.5 1st Qu.:8.170 1:219
## Median :317.0 Median :3.5 Median :8.610
## Mean :316.8 Mean :3.4 Mean :8.599
## 3rd Qu.:325.0 3rd Qu.:4.0 3rd Qu.:9.062
## Max. :340.0 Max. :5.0 Max. :9.920
table(data$admitted)
##
## 0 1
## 181 219
data partitioning
set.seed(123)
pd<-sample(2,nrow(data),replace = TRUE,prob = c(0.8,0.2))
train<-data[pd==1,]
test<-data[pd==2,]
MODEL
m1<-multinom(admitted~.,data=train)
## # weights: 5 (4 variable)
## initial value 225.272834
## iter 10 value 155.133415
## final value 155.131838
## converged
m1
## Call:
## multinom(formula = admitted ~ ., data = train)
##
## Coefficients:
## (Intercept) gre sop cgpa
## -40.1574370 0.1192646 0.4172153 0.1479590
##
## Residual Deviance: 310.2637
## AIC: 318.2637
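exponentiating the coefficients gives odds ratios, which are easier to read (a sketch)
# multiplicative change in the odds of admission per unit increase in each predictor
exp(coef(m1))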
misclassification rate
p<-predict(m1,train)
p
## [1] 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1
## [38] 1 1 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0
## [75] 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0
## [112] 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0
## [149] 0 0 1 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0
## [186] 0 1 1 1 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0
## [223] 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 1 0 0
## [260] 0 1 1 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0
## [297] 1 1 0 0 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1 0
## Levels: 0 1
table
ta<-table(p,train$admitted)
ta
##
## p 0 1
## 0 108 34
## 1 38 145
accuracy (correct classification)
sum(diag(ta))/sum(ta)
## [1] 0.7784615
how many were admitted vs. not admitted
table(train$admitted)
##
## 0 1
## 146 179
model performance evaluation
library(ROCR)
pred<-predict(m1,data,type="prob")
head(pred)
## 1 2 3 4 5 6
## 0.9657758 0.8123789 0.4913724 0.7288536 0.3409223 0.9212373
head(data)
## # A tibble: 6 × 4
## gre sop cgpa admitted
## <dbl> <dbl> <dbl> <fct>
## 1 337 4.5 9.65 1
## 2 324 4 8.87 1
## 3 316 3 8 1
## 4 322 3.5 8.67 1
## 5 314 2 8.21 0
## 6 330 4.5 9.34 1
histogram
hist(pred)
roc_curve
pred<-prediction(pred,data$admitted)
pred
## A prediction instance
## with 400 data points
eval<-performance(pred,"acc")
plot(eval)
abline(h=0.79,v=0.60)
identify best cut off and accuracy
max<-which.max(slot(eval,"y.values")[[1]])
acc<-slot(eval,"y.values")[[1]][max]
max
## [1] 192
acc
## [1] 0.7925
cutoff<-slot(eval,"x.values")[[1]][max]
cutoff
## 334
## 0.5994509
print(c(accuracy=acc,cutoffp=cutoff))
## accuracy cutoffp.334
## 0.7925000 0.5994509
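applying that cutoff to the predicted probabilities gives the confusion matrix at the optimal threshold (a sketch; the probabilities are recomputed because pred now holds the prediction object)
prob<-predict(m1,data,type="prob")
# classify with the optimal cutoff and tabulate against the actual labels
table(Predicted=ifelse(prob>=cutoff,1,0),Actual=data$admitted)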
performance(pred,"tpr")
## A performance instance
## 'Cutoff' vs. 'True positive rate' (alpha: 'none')
## with 398 data points
roc<-performance(pred,"tpr","fpr")
plot(roc)
abline(a=0,b=1)
plot(roc,colorize=T,main="Roc Curve",ylab="sensitivity",xlab="1-specificity")
abline(a=0,b=1)
area under the curve
auc<-performance(pred,"auc")
auc<-unlist(slot(auc,"y.values"))
auc
## [1] 0.8513459
auc<-round(auc,2)
auc
## [1] 0.85
#legend(.6,.2,legend=paste("AUC =",auc),title="AUC")
MULTINOMIAL LOGISTIC REGRESSION
Maternal_Health_Risk_Data_Set <- read_csv("Maternal Health Risk Data Set.csv")
## Rows: 1014 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): RiskLevel
## dbl (6): Age, SystolicBP, DiastolicBP, BS, BodyTemp, HeartRate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
maternal_risk<-Maternal_Health_Risk_Data_Set
head(maternal_risk)
## # A tibble: 6 × 7
## Age SystolicBP DiastolicBP BS BodyTemp HeartRate RiskLevel
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 25 130 80 15 98 86 high risk
## 2 35 140 90 13 98 70 high risk
## 3 29 90 70 8 100 80 high risk
## 4 30 140 85 7 98 70 high risk
## 5 35 120 60 6.1 98 76 low risk
## 6 23 140 80 7.01 98 70 high risk
maternal_risk$RiskLevel<-as.factor(maternal_risk$RiskLevel)
names(maternal_risk)<-str_replace_all(names(maternal_risk),"\\s","_")
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='high risk']<-1
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='mid risk']<-2
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='low risk']<-3
names(maternal_risk)
## [1] "Age" "SystolicBP" "DiastolicBP" "BS" "BodyTemp"
## [6] "HeartRate" "RiskLevel"
str(maternal_risk)
## spc_tbl_ [1,014 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Age : num [1:1014] 25 35 29 30 35 23 23 35 32 42 ...
## $ SystolicBP : num [1:1014] 130 140 90 140 120 140 130 85 120 130 ...
## $ DiastolicBP: num [1:1014] 80 90 70 85 60 80 70 60 90 80 ...
## $ BS : num [1:1014] 15 13 8 7 6.1 7.01 7.01 11 6.9 18 ...
## $ BodyTemp : num [1:1014] 98 98 100 98 98 98 98 102 98 98 ...
## $ HeartRate : num [1:1014] 86 70 80 70 76 70 78 86 70 70 ...
## $ RiskLevel : Factor w/ 3 levels "high risk","low risk",..: 1 1 1 1 2 1 3 1 3 1 ...
## - attr(*, "spec")=
## .. cols(
## .. Age = col_double(),
## .. SystolicBP = col_double(),
## .. DiastolicBP = col_double(),
## .. BS = col_double(),
## .. BodyTemp = col_double(),
## .. HeartRate = col_double(),
## .. RiskLevel = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
mymodel<-multinom(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
## # weights: 24 (14 variable)
## initial value 1113.992861
## iter 10 value 919.635048
## iter 20 value 817.505685
## iter 30 value 780.847881
## iter 40 value 780.842087
## final value 780.838740
## converged
summary(mymodel)
## Call:
## multinom(formula = RiskLevel ~ Age + SystolicBP + DiastolicBP +
## BS + BodyTemp + HeartRate, data = maternal_risk)
##
## Coefficients:
## (Intercept) Age SystolicBP DiastolicBP BS BodyTemp
## low risk 100.0386 0.01939081 -0.06613182 -0.02202248 -0.7487849 -0.8074419
## mid risk 52.1469 0.01653015 -0.00778542 -0.05880622 -0.4038215 -0.4035825
## HeartRate
## low risk -0.06495778
## mid risk -0.04168854
##
## Std. Errors:
## (Intercept) Age SystolicBP DiastolicBP BS BodyTemp
## low risk 0.0001341594 0.01089657 0.010765949 0.01358323 0.08027454 0.01507466
## mid risk 0.0001270224 0.01025968 0.009867095 0.01263257 0.04493161 0.01312267
## HeartRate
## low risk 0.01456100
## mid risk 0.01313263
##
## Residual Deviance: 1561.677
## AIC: 1589.677
prediction
head(predict(mymodel,maternal_risk))
## [1] high risk high risk low risk mid risk mid risk mid risk
## Levels: high risk low risk mid risk
prediction in terms of probability
head(predict(mymodel,maternal_risk,type="prob"))
## high risk low risk mid risk
## 1 0.91038118 0.002803312 0.08681551
## 2 0.78607232 0.015382745 0.19854494
## 3 0.15001274 0.486422100 0.36356516
## 4 0.15895036 0.281647825 0.55940182
## 5 0.02522402 0.425887276 0.54888870
## 6 0.14485164 0.248306646 0.60684171
exercise: calculate the predicted probabilities for the 4th, 10th, and 18th patients
solution
predict(mymodel,maternal_risk[c(4,10,18),],type="prob")
## high risk low risk mid risk
## 1 0.1589504 0.281647825 0.5594018
## 2 0.9306159 0.001191721 0.0681924
## 3 0.4033055 0.241628846 0.3550657
confusion matrix
cm<-table(predict(mymodel),maternal_risk$RiskLevel)
cm
##
## high risk low risk mid risk
## high risk 198 13 54
## low risk 14 321 164
## mid risk 60 72 118
accuracy of the model in percentage
(sum(diag(cm))/sum(cm))*100
## [1] 62.82051
two-tailed z test
z<-summary(mymodel)$coefficients/summary(mymodel)$standard.errors
p<-(1-pnorm(abs(z),0,1))*2
p
DECISION TREE
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
##
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
##
## boundary
##
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
##
## where
tree<-ctree(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk,controls=ctree_control(mincriterion=0.99,minsplit=500)) # require 99% confidence for a split and at least 500 obs in a node before splitting
plot(tree)
predict
head(predict(tree,maternal_risk,type='prob'))
## [[1]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[2]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[3]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[4]]
## [1] 0.8888889 0.0000000 0.1111111
##
## [[5]]
## [1] 0.02 0.54 0.44
##
## [[6]]
## [1] 0.8888889 0.0000000 0.1111111
head(predict(tree,maternal_risk))
## [1] high risk high risk high risk high risk low risk high risk
## Levels: high risk low risk mid risk
decision tree with rpart
library(rpart)
tree1<-rpart(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
library(rpart.plot)
rpart.plot(tree1,extra = 1)
rpart.plot(tree1,extra = 6)
## Warning: extra=6 but the response has 3 levels (only the 2nd level is
## displayed)
rpart.plot(tree1,extra = 10)
## Warning: extra=10 but the response has 3 levels (only the 2nd level is
## displayed)
prediction
head(predict(tree1,maternal_risk))
## high risk low risk mid risk
## 1 0.75285171 0.03041825 0.2167300
## 2 0.75285171 0.03041825 0.2167300
## 3 0.75285171 0.03041825 0.2167300
## 4 0.88888889 0.00000000 0.1111111
## 5 0.08474576 0.55932203 0.3559322
## 6 0.88888889 0.00000000 0.1111111
confusion matrix
tab<-table(predict(tree),maternal_risk$RiskLevel)
tab
##
## high risk low risk mid risk
## high risk 238 8 62
## low risk 9 359 170
## mid risk 25 39 104
accuracy of the model
(sum(diag(tab))/sum(tab))*100
## [1] 69.13215
RANDOM FOREST
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:psych':
##
## outlier
rf<-randomForest(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
rf
##
## Call:
## randomForest(formula = RiskLevel ~ Age + SystolicBP + DiastolicBP + BS + BodyTemp + HeartRate, data = maternal_risk)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 14.6%
## Confusion matrix:
## high risk low risk mid risk class.error
## high risk 248 10 14 0.08823529
## low risk 8 342 56 0.15763547
## mid risk 19 41 276 0.17857143
confusion matrix
tab<-table(predict(rf),maternal_risk$RiskLevel)
tab
##
## high risk low risk mid risk
## high risk 248 8 19
## low risk 10 342 41
## mid risk 14 56 276
accuracy of the model
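following the same pattern as the earlier models; the value follows arithmetically from the confusion matrix above
(sum(diag(tab))/sum(tab))*100
## [1] 85.40434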
SUPPORT_VECTOR_MACHINE"
library(ggplot2)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
qplot(Petal.Length,Petal.Width,data=iris,color=Species)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
MODEL
library(e1071)
m1<-svm(Species~.,data=iris)
summary(m1)
##
## Call:
## svm(formula = Species ~ ., data = iris)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(m1,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(m1,iris)
table(pred,iris$Species)
##
## pred setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
tab1<-table(predicted=pred,Actual=iris$Species)
tab1
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
misclassification
1-sum(diag(tab1))/sum(tab1)
## [1] 0.02666667
model2 (note: svm() expects lowercase kernel="linear"; the misspelled Kernel="Linear" below is silently ignored, so a radial kernel is still used, as the summary confirms)
m2<-svm(Species~.,data=iris,Kernel="Linear")
summary
summary(m2)
##
## Call:
## svm(formula = Species ~ ., data = iris, Kernel = "Linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(m2,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(m2,iris)
tab2<-table(predicted=pred,Actual=iris$Species)
tab2
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
misclassification
1-sum(diag(tab2))/sum(tab2)
## [1] 0.02666667
fine tuning: hyper-parameter optimization
set.seed(123)
t<-tune(svm,Species~.,data=iris,ranges = list(epsilon=seq(0,1,0.1),cost=2^(2:9)))
plot(t)
summary
summary(t)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## epsilon cost
## 0 4
##
## - best performance: 0.04
##
## - Detailed performance results:
## epsilon cost error dispersion
## 1 0.0 4 0.04000000 0.04661373
## 2 0.1 4 0.04000000 0.04661373
## 3 0.2 4 0.04000000 0.04661373
## 4 0.3 4 0.04000000 0.04661373
## 5 0.4 4 0.04000000 0.04661373
## 6 0.5 4 0.04000000 0.04661373
## 7 0.6 4 0.04000000 0.04661373
## 8 0.7 4 0.04000000 0.04661373
## 9 0.8 4 0.04000000 0.04661373
## 10 0.9 4 0.04000000 0.04661373
## 11 1.0 4 0.04000000 0.04661373
## 12 0.0 8 0.04666667 0.06324555
## 13 0.1 8 0.04666667 0.06324555
## 14 0.2 8 0.04666667 0.06324555
## 15 0.3 8 0.04666667 0.06324555
## 16 0.4 8 0.04666667 0.06324555
## 17 0.5 8 0.04666667 0.06324555
## 18 0.6 8 0.04666667 0.06324555
## 19 0.7 8 0.04666667 0.06324555
## 20 0.8 8 0.04666667 0.06324555
## 21 0.9 8 0.04666667 0.06324555
## 22 1.0 8 0.04666667 0.06324555
## 23 0.0 16 0.04666667 0.04499657
## 24 0.1 16 0.04666667 0.04499657
## 25 0.2 16 0.04666667 0.04499657
## 26 0.3 16 0.04666667 0.04499657
## 27 0.4 16 0.04666667 0.04499657
## 28 0.5 16 0.04666667 0.04499657
## 29 0.6 16 0.04666667 0.04499657
## 30 0.7 16 0.04666667 0.04499657
## 31 0.8 16 0.04666667 0.04499657
## 32 0.9 16 0.04666667 0.04499657
## 33 1.0 16 0.04666667 0.04499657
## 34 0.0 32 0.04666667 0.04499657
## 35 0.1 32 0.04666667 0.04499657
## 36 0.2 32 0.04666667 0.04499657
## 37 0.3 32 0.04666667 0.04499657
## 38 0.4 32 0.04666667 0.04499657
## 39 0.5 32 0.04666667 0.04499657
## 40 0.6 32 0.04666667 0.04499657
## 41 0.7 32 0.04666667 0.04499657
## 42 0.8 32 0.04666667 0.04499657
## 43 0.9 32 0.04666667 0.04499657
## 44 1.0 32 0.04666667 0.04499657
## 45 0.0 64 0.05333333 0.06126244
## 46 0.1 64 0.05333333 0.06126244
## 47 0.2 64 0.05333333 0.06126244
## 48 0.3 64 0.05333333 0.06126244
## 49 0.4 64 0.05333333 0.06126244
## 50 0.5 64 0.05333333 0.06126244
## 51 0.6 64 0.05333333 0.06126244
## 52 0.7 64 0.05333333 0.06126244
## 53 0.8 64 0.05333333 0.06126244
## 54 0.9 64 0.05333333 0.06126244
## 55 1.0 64 0.05333333 0.06126244
## 56 0.0 128 0.06000000 0.05837300
## 57 0.1 128 0.06000000 0.05837300
## 58 0.2 128 0.06000000 0.05837300
## 59 0.3 128 0.06000000 0.05837300
## 60 0.4 128 0.06000000 0.05837300
## 61 0.5 128 0.06000000 0.05837300
## 62 0.6 128 0.06000000 0.05837300
## 63 0.7 128 0.06000000 0.05837300
## 64 0.8 128 0.06000000 0.05837300
## 65 0.9 128 0.06000000 0.05837300
## 66 1.0 128 0.06000000 0.05837300
## 67 0.0 256 0.06666667 0.06285394
## 68 0.1 256 0.06666667 0.06285394
## 69 0.2 256 0.06666667 0.06285394
## 70 0.3 256 0.06666667 0.06285394
## 71 0.4 256 0.06666667 0.06285394
## 72 0.5 256 0.06666667 0.06285394
## 73 0.6 256 0.06666667 0.06285394
## 74 0.7 256 0.06666667 0.06285394
## 75 0.8 256 0.06666667 0.06285394
## 76 0.9 256 0.06666667 0.06285394
## 77 1.0 256 0.06666667 0.06285394
## 78 0.0 512 0.07333333 0.06629526
## 79 0.1 512 0.07333333 0.06629526
## 80 0.2 512 0.07333333 0.06629526
## 81 0.3 512 0.07333333 0.06629526
## 82 0.4 512 0.07333333 0.06629526
## 83 0.5 512 0.07333333 0.06629526
## 84 0.6 512 0.07333333 0.06629526
## 85 0.7 512 0.07333333 0.06629526
## 86 0.8 512 0.07333333 0.06629526
## 87 0.9 512 0.07333333 0.06629526
## 88 1.0 512 0.07333333 0.06629526
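Besides summary(t), the tune object stores the grid results directly; a small sketch of pulling them out programmatically:
t$best.parameters    # epsilon and cost of the winning model
t$best.performance   # its cross-validated error
head(t$performances) # the full grid as a data frame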
BEST MODEL
model<-t$best.model
summary(model)
##
## Call:
## best.tune(METHOD = svm, train.x = Species ~ ., data = iris, ranges = list(epsilon = seq(0,
## 1, 0.1), cost = 2^(2:9)))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 4
##
## Number of Support Vectors: 37
##
## ( 6 17 14 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(model,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(model,iris)
tab3<-table(predicted=pred,Actual=iris$Species)
tab3
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 0
## virginica 0 2 50
misclassification
1-sum(diag(tab3))/sum(tab3)
## [1] 0.01333333
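Both misclassification rates above are computed on the same data used for tuning, so they are optimistic. A hedged sketch of a held-out evaluation that reuses the cost value the tuner picked (assumes e1071 is loaded):
set.seed(123)
idx<-sample(2,nrow(iris),replace=TRUE,prob=c(0.8,0.2))
iris_train<-iris[idx==1,]
iris_test<-iris[idx==2,]
fit<-svm(Species~.,data=iris_train,kernel="radial",cost=4)
tab_ho<-table(predicted=predict(fit,iris_test),Actual=iris_test$Species)
1-sum(diag(tab_ho))/sum(tab_ho)   # held-out misclassification rate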
LINEAR DISCRIMINANT ANALYSIS
library(psych)
pairs.panels(iris[1:4],gap=0,bg=c("red","green","blue")[iris$Species],pch=21)
PARTITIONING
set.seed(555)
ind<-sample(2,nrow(iris),replace = TRUE,prob = c(0.8,0.2))
train<-iris[ind==1,]
test<-iris[ind==2,]
modelling
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
linear<-lda(Species~.,train)
linear
## Call:
## lda(Species ~ ., data = train)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3559322 0.3220339 0.3220339
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa 4.983333 3.407143 1.464286 0.252381
## versicolor 5.955263 2.755263 4.273684 1.334211
## virginica 6.618421 2.965789 5.563158 2.013158
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 1.143465 0.2050828
## Sepal.Width 1.251756 -2.3225224
## Petal.Length -2.434279 0.6721985
## Petal.Width -2.713041 -2.5114909
##
## Proportion of trace:
## LD1 LD2
## 0.9925 0.0075
attributes (the proportion of trace above shows LD1 alone carries 99.25% of the between-group separation)
attributes(linear)
## $names
## [1] "prior" "counts" "means" "scaling" "lev" "svd" "N"
## [8] "call" "terms" "xlevels"
##
## $class
## [1] "lda"
linear$count
## setosa versicolor virginica
## 42 38 38
linear$prior
## setosa versicolor virginica
## 0.3559322 0.3220339 0.3220339
linear$scaling
## LD1 LD2
## Sepal.Length 1.143465 0.2050828
## Sepal.Width 1.251756 -2.3225224
## Petal.Length -2.434279 0.6721985
## Petal.Width -2.713041 -2.5114909
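The scaling matrix is what turns centred predictor values into discriminant scores. A sketch of reproducing the LD scores by hand, assuming predict.lda centres at the prior-weighted mean of the group means (which matches MASS's implementation as far as I know):
X<-as.matrix(train[,1:4])
ctr<-colSums(linear$prior*linear$means)        # prior-weighted grand mean
scores<-scale(X,center=ctr,scale=FALSE)%*%linear$scaling
head(scores)   # should match head(predict(linear,train)$x)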
stacked histogram of Discriminant Function values
p<-predict(linear,train)
p
## $class
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa setosa setosa setosa setosa
## [19] setosa setosa setosa setosa setosa setosa
## [25] setosa setosa setosa setosa setosa setosa
## [31] setosa setosa setosa setosa setosa setosa
## [37] setosa setosa setosa setosa setosa setosa
## [43] versicolor versicolor versicolor versicolor versicolor versicolor
## [49] versicolor versicolor versicolor versicolor versicolor versicolor
## [55] versicolor versicolor versicolor versicolor versicolor versicolor
## [61] virginica versicolor versicolor versicolor versicolor versicolor
## [67] versicolor versicolor versicolor versicolor versicolor versicolor
## [73] versicolor versicolor versicolor versicolor versicolor versicolor
## [79] versicolor versicolor virginica virginica virginica virginica
## [85] virginica virginica virginica virginica virginica virginica
## [91] virginica virginica virginica virginica virginica virginica
## [97] virginica virginica virginica virginica virginica virginica
## [103] virginica virginica virginica versicolor virginica virginica
## [109] virginica virginica virginica virginica virginica virginica
## [115] virginica virginica virginica virginica
## Levels: setosa versicolor virginica
##
## $posterior
## setosa versicolor virginica
## 1 1.000000e+00 9.570171e-23 3.467391e-43
## 3 1.000000e+00 4.148630e-20 8.077736e-40
## 4 1.000000e+00 5.508665e-17 1.259714e-35
## 5 1.000000e+00 6.905823e-23 3.239328e-43
## 6 1.000000e+00 1.989014e-21 1.584854e-40
## 8 1.000000e+00 1.191699e-20 2.095622e-40
## 9 1.000000e+00 7.044587e-16 2.678402e-34
## 12 1.000000e+00 1.070802e-18 1.183246e-37
## 13 1.000000e+00 2.353548e-19 2.700003e-39
## 14 1.000000e+00 4.000902e-20 3.520246e-40
## 16 1.000000e+00 6.787907e-28 7.388311e-49
## 17 1.000000e+00 1.538362e-25 3.575203e-46
## 18 1.000000e+00 1.002833e-21 1.434635e-41
## 19 1.000000e+00 3.121659e-23 1.873737e-43
## 21 1.000000e+00 1.908362e-20 2.913916e-40
## 22 1.000000e+00 6.924301e-21 6.108942e-40
## 23 1.000000e+00 3.792552e-25 3.499021e-46
## 24 1.000000e+00 2.160663e-15 1.057841e-32
## 25 1.000000e+00 1.298356e-15 2.032785e-33
## 26 1.000000e+00 3.327927e-17 3.399044e-36
## 27 1.000000e+00 1.395337e-17 9.256824e-36
## 29 1.000000e+00 1.326246e-22 3.711511e-43
## 30 1.000000e+00 5.030246e-17 1.387733e-35
## 31 1.000000e+00 6.970974e-17 1.485436e-35
## 32 1.000000e+00 1.842844e-20 7.492207e-40
## 33 1.000000e+00 7.916134e-27 2.933146e-48
## 34 1.000000e+00 7.905464e-29 9.176394e-51
## 35 1.000000e+00 2.252033e-18 1.230656e-37
## 36 1.000000e+00 1.590517e-22 3.058317e-43
## 37 1.000000e+00 1.263948e-25 2.806406e-47
## 38 1.000000e+00 1.913068e-23 3.662364e-44
## 39 1.000000e+00 1.642225e-17 2.073054e-36
## 40 1.000000e+00 4.105279e-21 4.479896e-41
## 41 1.000000e+00 2.729968e-22 2.600843e-42
## 42 1.000000e+00 1.010682e-12 1.446954e-30
## 43 1.000000e+00 1.014790e-18 8.268463e-38
## 44 1.000000e+00 3.808628e-16 3.164798e-33
## 45 1.000000e+00 2.225498e-17 5.408310e-35
## 46 1.000000e+00 2.584287e-17 4.622119e-36
## 48 1.000000e+00 1.284173e-18 9.750050e-38
## 49 1.000000e+00 7.483600e-24 1.630794e-44
## 50 1.000000e+00 4.495726e-21 4.066621e-41
## 51 6.837393e-18 9.999464e-01 5.356221e-05
## 52 1.239301e-19 9.993679e-01 6.321340e-04
## 53 4.902141e-22 9.975222e-01 2.477847e-03
## 54 4.647975e-22 9.997431e-01 2.569214e-04
## 55 1.286113e-22 9.977254e-01 2.274627e-03
## 56 2.985366e-23 9.972594e-01 2.740564e-03
## 57 1.415505e-22 9.814186e-01 1.858141e-02
## 58 5.638331e-14 9.999999e-01 1.209049e-07
## 60 5.061565e-21 9.992694e-01 7.305519e-04
## 61 5.496318e-18 9.999989e-01 1.054475e-06
## 62 4.504370e-20 9.992490e-01 7.510365e-04
## 63 2.740898e-17 9.999995e-01 4.781694e-07
## 64 7.122616e-24 9.924874e-01 7.512623e-03
## 65 7.395146e-14 9.999987e-01 1.250812e-06
## 67 1.455021e-24 9.573374e-01 4.266264e-02
## 68 3.213591e-16 9.999990e-01 1.005720e-06
## 69 1.306777e-26 9.855505e-01 1.444951e-02
## 70 2.557212e-17 9.999973e-01 2.728496e-06
## 71 6.164746e-29 1.429533e-01 8.570467e-01
## 72 2.930603e-16 9.999951e-01 4.912004e-06
## 73 1.668151e-28 8.607639e-01 1.392361e-01
## 74 1.957680e-22 9.993960e-01 6.039622e-04
## 75 2.378311e-17 9.999866e-01 1.336160e-05
## 77 2.903408e-22 9.991927e-01 8.072534e-04
## 78 9.503086e-27 7.675201e-01 2.324799e-01
## 79 2.660810e-23 9.918416e-01 8.158404e-03
## 80 4.045835e-11 1.000000e+00 1.004847e-08
## 87 6.624103e-21 9.988996e-01 1.100428e-03
## 88 1.812604e-22 9.998063e-01 1.937170e-04
## 89 2.157571e-18 9.999166e-01 8.336390e-05
## 90 7.522454e-21 9.998342e-01 1.658480e-04
## 91 2.450117e-23 9.988442e-01 1.155832e-03
## 92 3.070771e-22 9.974931e-01 2.506885e-03
## 93 7.757766e-18 9.999919e-01 8.065305e-06
## 95 3.107085e-21 9.996111e-01 3.888813e-04
## 98 2.822355e-18 9.999653e-01 3.469698e-05
## 99 2.211578e-10 1.000000e+00 1.042401e-08
## 100 3.870206e-19 9.999199e-01 8.014712e-05
## 103 1.317172e-42 3.220037e-05 9.999678e-01
## 105 7.840059e-47 1.126925e-06 9.999989e-01
## 106 3.874149e-49 7.203430e-07 9.999993e-01
## 107 1.269734e-34 1.678549e-02 9.832145e-01
## 109 1.596966e-42 2.380285e-04 9.997620e-01
## 110 4.976922e-47 1.355850e-07 9.999999e-01
## 111 2.529196e-32 1.305030e-02 9.869497e-01
## 112 1.079614e-37 1.848637e-03 9.981514e-01
## 114 3.654120e-41 1.519895e-04 9.998480e-01
## 115 2.838221e-46 8.031545e-07 9.999992e-01
## 117 3.931949e-36 3.866017e-03 9.961340e-01
## 118 6.707402e-46 6.999584e-07 9.999993e-01
## 119 9.802945e-59 2.189326e-09 1.000000e+00
## 120 2.744397e-33 2.404620e-01 7.595380e-01
## 121 5.869706e-43 7.215299e-06 9.999928e-01
## 122 2.529668e-38 4.399863e-04 9.995600e-01
## 123 1.159049e-49 1.222596e-06 9.999988e-01
## 124 3.669876e-31 1.346388e-01 8.653612e-01
## 125 2.299196e-40 5.391577e-05 9.999461e-01
## 126 4.231977e-37 2.041033e-03 9.979590e-01
## 128 2.181013e-30 1.035716e-01 8.964284e-01
## 130 1.720483e-32 1.075489e-01 8.924511e-01
## 131 1.382413e-41 2.315330e-04 9.997685e-01
## 132 4.314830e-37 4.013577e-04 9.995986e-01
## 133 4.450738e-46 2.643006e-06 9.999974e-01
## 134 7.440809e-29 6.706163e-01 3.293837e-01
## 135 1.454401e-36 2.351838e-02 9.764816e-01
## 138 4.212482e-36 2.988751e-03 9.970112e-01
## 139 1.143640e-29 1.478430e-01 8.521570e-01
## 140 3.443215e-36 1.278875e-03 9.987211e-01
## 141 3.340946e-45 1.368035e-06 9.999986e-01
## 142 3.455844e-35 1.162388e-03 9.988376e-01
## 143 6.868742e-39 6.188908e-04 9.993811e-01
## 144 1.884646e-46 7.646829e-07 9.999992e-01
## 145 7.845946e-47 2.218301e-07 9.999998e-01
## 146 1.223600e-38 1.487672e-04 9.998512e-01
## 147 1.567893e-35 1.040148e-02 9.895985e-01
## 149 5.158270e-42 5.612656e-06 9.999944e-01
##
## $x
## LD1 LD2
## 1 7.9280301 -0.289614784
## 3 7.3385453 0.257888972
## 4 6.6121675 0.604072636
## 5 7.9388592 -0.542375299
## 6 7.4988801 -1.457737530
## 8 7.4450802 -0.010650971
## 9 6.3765512 0.960340708
## 12 6.9729593 0.015552326
## 13 7.2304168 1.061270666
## 14 7.3889680 0.757069727
## 16 8.9546532 -2.691913594
## 17 8.4725916 -1.726616936
## 18 7.6567260 -0.540763873
## 19 7.9880481 -0.912811372
## 21 7.4156104 0.205821840
## 22 7.3923452 -1.189197587
## 23 8.4551847 -0.893287812
## 24 6.1334831 -0.376898019
## 25 6.2426756 0.217211880
## 26 6.7009500 0.985577834
## 27 6.6590442 -0.445729297
## 29 7.9172010 -0.036854268
## 30 6.6082616 0.459548526
## 31 6.5974326 0.712309042
## 32 7.3598580 -0.430916041
## 33 8.8213062 -1.344250997
## 34 9.2616450 -1.833347345
## 35 6.9552070 0.665597467
## 36 7.9250127 0.252193952
## 37 8.6288440 -0.274801527
## 38 8.0958167 -0.311734487
## 39 6.7451547 0.660868618
## 40 7.5594267 0.009857306
## 41 7.7858074 -0.628492001
## 42 5.7119682 2.055993476
## 43 6.9955058 0.196364141
## 44 6.2416116 -1.180279713
## 45 6.5438093 -1.152570420
## 46 6.6878087 0.558972489
## 48 6.9807709 0.304600547
## 49 8.1636463 -0.645882855
## 50 7.5633325 0.154381416
## 51 -1.5636819 0.001265223
## 52 -2.0342092 -0.507373231
## 53 -2.5613638 0.096299798
## 54 -2.4301602 1.564521343
## 55 -2.6639929 0.509363851
## 56 -2.7927288 0.780375962
## 57 -2.7815400 -0.876843132
## 58 -0.4731563 1.492127750
## 60 -2.3003737 0.255618618
## 61 -1.3463678 2.576084683
## 62 -2.1260092 -0.347069693
## 63 -1.1696911 2.652762233
## 64 -2.9683271 0.513447445
## 65 -0.5910488 -0.077363216
## 67 -3.1993323 -0.206934970
## 68 -1.0159342 1.517704338
## 69 -3.5146579 1.774132600
## 70 -1.2794266 1.555603470
## 71 -4.1501375 -1.161702328
## 72 -1.1182034 0.526309813
## 73 -3.9984962 1.366763567
## 74 -2.5508945 1.247997861
## 75 -1.3802720 0.557241959
## 77 -2.5365051 0.956477474
## 78 -3.7012684 -0.147542844
## 79 -2.8671219 0.107350377
## 80 0.2051110 1.326129192
## 87 -2.3032010 -0.079156459
## 88 -2.4890998 1.997466965
## 89 -1.6830127 0.026483802
## 90 -2.1798091 1.100016866
## 91 -2.7570410 1.387793122
## 92 -2.5997237 0.213975355
## 93 -1.4402900 1.180438548
## 95 -2.3019673 0.790460369
## 98 -1.6089650 0.516225406
## 99 0.3396918 0.848083423
## 100 -1.8190173 0.511496556
## 103 -6.5199495 -0.465127429
## 105 -7.2339047 -0.906546031
## 106 -7.6522122 0.107952915
## 107 -5.1682437 0.308470106
## 109 -6.5458733 1.300328070
## 110 -7.2266217 -2.708289235
## 111 -4.7369503 -1.339291290
## 112 -5.6927263 0.187050417
## 114 -6.2845233 0.055188312
## 115 -7.1232943 -1.558436630
## 117 -5.4184048 -0.103609230
## 118 -7.0511932 -1.913485953
## 119 -9.3114598 0.756831522
## 120 -4.9604903 2.069215302
## 121 -6.5540438 -1.607386340
## 122 -5.7799153 -0.729296532
## 123 -7.7603407 0.911334609
## 124 -4.5620572 0.148811823
## 125 -6.1149531 -1.378356955
## 126 -5.5847676 -0.088456511
## 128 -4.4152235 -0.588961446
## 130 -4.8056549 0.743906440
## 131 -6.3715088 0.697639759
## 132 -5.5496084 -1.571830775
## 133 -7.1117466 -0.596989534
## 134 -4.1098253 0.804446554
## 135 -5.5347047 1.815182822
## 138 -5.4075758 -0.356369746
## 139 -4.2861422 -0.676689574
## 140 -5.4063276 -1.074495478
## 141 -6.9357885 -1.734519596
## 142 -5.2186520 -1.778453210
## 143 -5.8919495 -0.070438948
## 144 -7.1552460 -1.493454915
## 145 -7.2001693 -2.382953310
## 146 -5.8159485 -1.519997674
## 147 -5.3271403 0.429387063
## 149 -6.3738345 -2.417108310
histogram
ldahist(data=p$x[,1],g=train$Species)
biplot
library(devtools)
## Loading required package: usethis
library(ggord)
ggord(linear,train$Species)
library(klaR)
partimat(Species~.,data=train,method="lda")
partimat(Species~.,data=train,method="qda")
confusion matrix
p1<-predict(linear,train)$class
tab<-table(predicted=p1,Actual=train$Species)
tab
## Actual
## predicted setosa versicolor virginica
## setosa 42 0 0
## versicolor 0 37 1
## virginica 0 1 37
accuracy
sum(diag(tab))/sum(tab)
## [1] 0.9830508
FOR TEST DATA
p2<-predict(linear,test)$class
t2<-table(predicted=p2,Actual=test$Species)
t2
## Actual
## predicted setosa versicolor virginica
## setosa 8 0 0
## versicolor 0 11 0
## virginica 0 1 12
accuracy
sum(diag(t2))/sum(t2)
## [1] 0.96875
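MASS::lda can also cross-validate itself; a hedged sketch using the built-in leave-one-out option as a sanity check on the accuracies above:
cv<-lda(Species~.,data=train,CV=TRUE)   # CV=TRUE returns leave-one-out predictions
tab_cv<-table(predicted=cv$class,Actual=train$Species)
sum(diag(tab_cv))/sum(tab_cv)           # LOO-CV accuracy estimate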
BINARY LOGISTIC REGRESSION
telecom_churn <- read_csv("C:/Users/USER/Desktop/telecom_churn.csv")
## Rows: 3333 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (11): Churn, AccountWeeks, ContractRenewal, DataPlan, DataUsage, CustSer...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
churn<-telecom_churn
churn$Churn<-as.factor(churn$Churn)
names(churn)
## [1] "Churn" "AccountWeeks" "ContractRenewal" "DataPlan"
## [5] "DataUsage" "CustServCalls" "DayMins" "DayCalls"
## [9] "MonthlyCharge" "OverageFee" "RoamMins"
str(churn)
## spc_tbl_ [3,333 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ AccountWeeks : num [1:3333] 128 107 137 84 75 118 121 147 117 141 ...
## $ ContractRenewal: num [1:3333] 1 1 1 0 0 0 1 0 1 0 ...
## $ DataPlan : num [1:3333] 1 1 0 0 0 0 1 0 0 1 ...
## $ DataUsage : num [1:3333] 2.7 3.7 0 0 0 0 2.03 0 0.19 3.02 ...
## $ CustServCalls : num [1:3333] 1 1 0 2 3 0 3 0 1 0 ...
## $ DayMins : num [1:3333] 265 162 243 299 167 ...
## $ DayCalls : num [1:3333] 110 123 114 71 113 98 88 79 97 84 ...
## $ MonthlyCharge : num [1:3333] 89 82 52 57 41 57 87.3 36 63.9 93.2 ...
## $ OverageFee : num [1:3333] 9.87 9.78 6.06 3.1 7.42 ...
## $ RoamMins : num [1:3333] 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
## - attr(*, "spec")=
## .. cols(
## .. Churn = col_double(),
## .. AccountWeeks = col_double(),
## .. ContractRenewal = col_double(),
## .. DataPlan = col_double(),
## .. DataUsage = col_double(),
## .. CustServCalls = col_double(),
## .. DayMins = col_double(),
## .. DayCalls = col_double(),
## .. MonthlyCharge = col_double(),
## .. OverageFee = col_double(),
## .. RoamMins = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(churn)
## # A tibble: 6 × 11
## Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls DayMins
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 128 1 1 2.7 1 265.
## 2 0 107 1 1 3.7 1 162.
## 3 0 137 1 0 0 0 243.
## 4 0 84 0 0 0 2 299.
## 5 0 75 0 0 0 3 167.
## 6 0 118 0 0 0 0 223.
## # ℹ 4 more variables: DayCalls <dbl>, MonthlyCharge <dbl>, OverageFee <dbl>,
## # RoamMins <dbl>
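Before modelling it is worth checking the class balance, since churn data is typically imbalanced and that colours every accuracy figure that follows; a quick sketch:
table(churn$Churn)               # counts of stayers (0) vs churners (1)
prop.table(table(churn$Churn))   # roughly 86% vs 14%, matching the error rates seen below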
DATA PARTITIONING
ind<-sample(2,nrow(churn),replace=TRUE,prob=c(0.8,0.2))
train<-churn[ind==1,]
test<-churn[ind==2,]
modelling (note: the model below is fit on the full churn data rather than on train, so the train and test error rates that follow are not strictly out-of-sample)
mymodel<-glm(Churn~MonthlyCharge+OverageFee,data=churn,family="binomial")
summary(mymodel)
##
## Call:
## glm(formula = Churn ~ MonthlyCharge + OverageFee, family = "binomial",
## data = churn)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.192084 0.246627 -12.943 < 2e-16 ***
## MonthlyCharge 0.008515 0.003046 2.795 0.00519 **
## OverageFee 0.090140 0.020466 4.404 1.06e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2758.3 on 3332 degrees of freedom
## Residual deviance: 2721.7 on 3330 degrees of freedom
## AIC: 2727.7
##
## Number of Fisher Scoring iterations: 4
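The estimates above are on the log-odds scale. A hedged sketch of converting them to odds ratios for interpretation (confint() profiles the likelihood, so it prints a message and can take a moment):
exp(cbind(OR=coef(mymodel),confint(mymodel)))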
PREDICTION
p1<-predict(mymodel,train,type='response')
head(p1)
## 1 2 3 4 5 6
## 0.17586805 0.16626793 0.09947492 0.08111335 0.10209668 0.15283956
head(train)
## # A tibble: 6 × 11
## Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls DayMins
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 128 1 1 2.7 1 265.
## 2 0 107 1 1 3.7 1 162.
## 3 0 137 1 0 0 0 243.
## 4 0 84 0 0 0 2 299.
## 5 0 75 0 0 0 3 167.
## 6 0 118 0 0 0 0 223.
## # ℹ 4 more variables: DayCalls <dbl>, MonthlyCharge <dbl>, OverageFee <dbl>,
## # RoamMins <dbl>
confusion matrix
pred1<-ifelse(p1>0.5,1,0)
table(pred1)
## pred1
## 0
## 2679
tab1<-table(predict=pred1,actual=train$Churn)
misclassification error (note: at the 0.5 cut-off the model never predicts churn, so it is effectively the majority-class error rate, i.e. the share of churners in train)
1-sum(diag(tab1))/sum(tab1)
## [1] 0.1433371
test data
p2<-predict(mymodel,test,type='response')
pred2<-ifelse(p2>0.5,1,0)
tab2<-table(predicted=pred2,actual=test$Churn)
tab2
## actual
## predicted 0 1
## 0 555 99
misclassification error
1-sum(diag(tab2))/sum(tab2)
## [1] 0.1513761
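Since the 0.5 threshold never flags a churner, a threshold-free view of the model helps. A hedged ROC sketch, assuming the pROC package is installed:
library(pROC)
r<-roc(response=test$Churn,predictor=p2)   # p2 holds the test probabilities
auc(r)    # area under the ROC curve
plot(r)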
ORDERED LOGISTIC REGRESSION
milknew <- read_csv("C:/Users/USER/Desktop/milknew.csv")
## Rows: 1059 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Grade
## dbl (7): pH, Temprature, Taste, Odor, Fat, Turbidity, Colour
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
milknew$Grade<-as.factor(milknew$Grade)
milknew$Taste<-as.factor(milknew$Taste)
milknew$Odor<-as.factor(milknew$Odor)
milknew$Fat<-as.factor(milknew$Fat)
milknew$Turbidity<-as.factor(milknew$Turbidity)
milknew$Colour<-as.factor(milknew$Colour)
str(milknew)
## spc_tbl_ [1,059 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ pH : num [1:1059] 6.6 6.6 8.5 9.5 6.6 6.6 5.5 4.5 8.1 6.7 ...
## $ Temprature: num [1:1059] 35 36 70 34 37 37 45 60 66 45 ...
## $ Taste : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 1 2 2 ...
## $ Odor : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 1 2 1 2 ...
## $ Fat : Factor w/ 2 levels "0","1": 2 1 2 1 1 2 2 2 2 1 ...
## $ Turbidity : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 2 2 1 ...
## $ Colour : Factor w/ 9 levels "240","245","246",..: 8 7 3 9 9 9 6 6 9 4 ...
## $ Grade : Factor w/ 3 levels "high","low","medium": 1 1 2 2 3 1 2 2 2 3 ...
## - attr(*, "spec")=
## .. cols(
## .. pH = col_double(),
## .. Temprature = col_double(),
## .. Taste = col_double(),
## .. Odor = col_double(),
## .. Fat = col_double(),
## .. Turbidity = col_double(),
## .. Colour = col_double(),
## .. Grade = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
head(milknew)
## # A tibble: 6 × 8
## pH Temprature Taste Odor Fat Turbidity Colour Grade
## <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 6.6 35 1 0 1 0 254 high
## 2 6.6 36 0 1 0 1 253 high
## 3 8.5 70 1 1 1 1 246 low
## 4 9.5 34 1 1 0 1 255 low
## 5 6.6 37 0 0 0 0 255 medium
## 6 6.6 37 1 1 1 1 255 high
names(milknew)
## [1] "pH" "Temprature" "Taste" "Odor" "Fat"
## [6] "Turbidity" "Colour" "Grade"
summary(milknew)
## pH Temprature Taste Odor Fat Turbidity Colour
## Min. :3.00 Min. :34.00 0:480 0:601 0:348 0:539 255 :628
## 1st Qu.:6.50 1st Qu.:38.00 1:579 1:458 1:711 1:520 250 :146
## Median :6.70 Median :41.00 245 :115
## Mean :6.63 Mean :44.23 247 : 48
## 3rd Qu.:6.80 3rd Qu.:45.00 246 : 44
## Max. :9.50 Max. :90.00 240 : 32
## (Other): 46
## Grade
## high :256
## low :429
## medium:374
##
##
##
##
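One caveat before modelling: as.factor() ordered the grades alphabetically (high < low < medium), and polr takes the factor's level order as the ordinal scale. A sketch of setting the natural order instead (not applied in the run below; the polr output that follows was generated with the alphabetical ordering):
milknew$Grade<-factor(milknew$Grade,
                      levels=c("low","medium","high"),
                      ordered=TRUE)   # natural ordering for polr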
partitioning
ind<-sample(2,nrow(milknew),replace=T,prob=c(0.8,0.2))
train<-milknew[ind==1,]
test<-milknew[ind==2,]
modelling
library(MASS)
model<-polr(Grade~Temprature + pH+Taste+Odor+Fat+Turbidity+Colour,data=train)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = Grade ~ Temprature + pH + Taste + Odor + Fat +
## Turbidity + Colour, data = train)
##
## Coefficients:
## Value Std. Error t value
## Temprature -0.04308 8.579e-03 -5.021e+00
## pH -0.17298 5.200e-02 -3.327e+00
## Taste1 -1.46961 1.841e-01 -7.984e+00
## Odor1 -2.98636 2.688e-01 -1.111e+01
## Fat1 -0.95355 2.009e-01 -4.746e+00
## Turbidity1 0.19053 2.710e-01 7.031e-01
## Colour245 -10.00488 2.886e-01 -3.467e+01
## Colour246 -11.61656 3.569e-01 -3.255e+01
## Colour247 1.78725 9.263e-05 1.929e+04
## Colour248 -8.86921 4.454e-01 -1.991e+01
## Colour250 -12.44678 2.247e-01 -5.540e+01
## Colour253 -33.74576 1.517e-10 -2.225e+11
## Colour254 -18.49326 3.082e-06 -6.000e+06
## Colour255 -11.57078 1.390e-01 -8.327e+01
##
## Intercepts:
## Value Std. Error t value
## high|low -1.901630e+01 5.527000e-01 -3.440910e+01
## low|medium -1.597480e+01 4.999000e-01 -3.195590e+01
##
## Residual Deviance: 1212.989
## AIC: 1244.989
p values (two-sided, computed below from the t values via the normal approximation p = 2*(1-pnorm(|t|)))
ctable<-coef(summary(model))
##
## Re-fitting to get Hessian
ctable
## Value Std. Error t value
## Temprature -0.0430763 8.579486e-03 -5.020848e+00
## pH -0.1729803 5.199521e-02 -3.326851e+00
## Taste1 -1.4696066 1.840798e-01 -7.983529e+00
## Odor1 -2.9863624 2.688305e-01 -1.110872e+01
## Fat1 -0.9535484 2.009220e-01 -4.745864e+00
## Turbidity1 0.1905287 2.709774e-01 7.031168e-01
## Colour245 -10.0048780 2.885778e-01 -3.466960e+01
## Colour246 -11.6165643 3.569180e-01 -3.254687e+01
## Colour247 1.7872452 9.263105e-05 1.929423e+04
## Colour248 -8.8692105 4.453870e-01 -1.991349e+01
## Colour250 -12.4467785 2.246867e-01 -5.539614e+01
## Colour253 -33.7457602 1.516871e-10 -2.224696e+11
## Colour254 -18.4932576 3.082265e-06 -5.999893e+06
## Colour255 -11.5707791 1.389623e-01 -8.326558e+01
## high|low -19.0163403 5.526548e-01 -3.440907e+01
## low|medium -15.9747577 4.998998e-01 -3.195592e+01
p<-(1-pnorm(abs(ctable[,"t value"])))*2
(ctable<-cbind(ctable,"p value"=p))
## Value Std. Error t value p value
## Temprature -0.0430763 8.579486e-03 -5.020848e+00 5.14e-07
## pH -0.1729803 5.199521e-02 -3.326851e+00 8.78e-04
## Taste1 -1.4696066 1.840798e-01 -7.983529e+00 1.33e-15
## Odor1 -2.9863624 2.688305e-01 -1.110872e+01 0.00e+00
## Fat1 -0.9535484 2.009220e-01 -4.745864e+00 2.08e-06
## Turbidity1 0.1905287 2.709774e-01 7.031168e-01 4.82e-01
## Colour245 -10.0048780 2.885778e-01 -3.466960e+01 0.00e+00
## Colour246 -11.6165643 3.569180e-01 -3.254687e+01 0.00e+00
## Colour247 1.7872452 9.263105e-05 1.929423e+04 0.00e+00
## Colour248 -8.8692105 4.453870e-01 -1.991349e+01 0.00e+00
## Colour250 -12.4467785 2.246867e-01 -5.539614e+01 0.00e+00
## Colour253 -33.7457602 1.516871e-10 -2.224696e+11 0.00e+00
## Colour254 -18.4932576 3.082265e-06 -5.999893e+06 0.00e+00
## Colour255 -11.5707791 1.389623e-01 -8.326558e+01 0.00e+00
## high|low -19.0163403 5.526548e-01 -3.440907e+01 0.00e+00
## low|medium -15.9747577 4.998998e-01 -3.195592e+01 0.00e+00
Every term except Turbidity1 is significant at the 5% level (p values shown to three significant figures; the zeros are floating-point underflow for very large |t|, where the true p values are far below any conventional threshold). The extreme t statistics on some Colour dummies, together with the earlier glm.fit warning about fitted probabilities of 0 or 1, suggest quasi-complete separation, so those coefficients deserve caution.
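A hedged closing sketch: polr coefficients are log-odds, so exponentiate them for odds ratios, and evaluate on the held-out split as in the earlier sections:
exp(coef(model))                 # odds ratios
pred_ord<-predict(model,test)    # default type is the predicted class
(tab_ord<-table(predicted=pred_ord,actual=test$Grade))
sum(diag(tab_ord))/sum(tab_ord)  # held-out accuracy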
TO BE CONTINUED