I am going to demonstrate machine learning using all the algorithms listed below:
1.0 LINEAR REGRESSION
2.0 RIDGE REGRESSION
3.0 LASSO REGRESSION
4.0 ELASTIC NET REGRESSION
5.0 MULTINOMIAL REGRESSION
6.0 DECISION TREE REGRESSION
7.0 RANDOM FOREST
8.0 SUPPORT VECTOR MACHINE
9.0 LINEAR DISCRIMINANT ANALYSIS
10.0 BINARY LOGISTIC REGRESSION
11.0 ORDERED LOGISTIC REGRESSION
1.0 LINEAR REGRESSION
An insurance company in Arizona was in a dilemma: was it charging its clients too much or too little? They decided to consult Mugo, a data scientist, who used a Kaggle dataset to walk them through the insights below. They thanked him very much. Here is what he told them.
GENERAL OBJECTIVE
To determine the effect of associated factors on insurance charges.
SPECIFIC OBJECTIVES
1.0 to determine whether age has an effect on insurance charges
2.0 to determine whether sex has an effect on insurance charges
3.0 to determine whether smoking has an effect on insurance charges
4.0 to determine whether the number of children has an effect on insurance charges
5.0 to determine whether region has an effect on insurance charges
NULL HYPOTHESIS
1.0 age has no effect on insurance charges
2.0 sex has no effect on insurance charges
3.0 smoking has no effect on insurance charges
4.0 region has no effect on insurance charges
5.0 number of children has no effect on insurance charges
library(readr)
insurance <- read_csv("C:/Users/USER/Desktop/insurance.csv")
## Rows: 1338 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sex, smoker, region
## dbl (4): age, bmi, children, charges
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(insurance)
## # A tibble: 6 × 7
## age sex bmi children smoker region charges
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 19 female 27.9 0 yes southwest 16885.
## 2 18 male 33.8 1 no southeast 1726.
## 3 28 male 33 3 no southeast 4449.
## 4 33 male 22.7 0 no northwest 21984.
## 5 32 male 28.9 0 no northwest 3867.
## 6 31 female 25.7 0 no southeast 3757.
summary(insurance)
## age sex bmi children
## Min. :18.00 Length:1338 Min. :15.96 Min. :0.000
## 1st Qu.:27.00 Class :character 1st Qu.:26.30 1st Qu.:0.000
## Median :39.00 Mode :character Median :30.40 Median :1.000
## Mean :39.21 Mean :30.66 Mean :1.095
## 3rd Qu.:51.00 3rd Qu.:34.69 3rd Qu.:2.000
## Max. :64.00 Max. :53.13 Max. :5.000
## smoker region charges
## Length:1338 Length:1338 Min. : 1122
## Class :character Class :character 1st Qu.: 4740
## Mode :character Mode :character Median : 9382
## Mean :13270
## 3rd Qu.:16640
## Max. :63770
names(insurance)
## [1] "age" "sex" "bmi" "children" "smoker" "region" "charges"
insurance$children<-as.factor(insurance$children)
insurance$region<-as.factor(insurance$region)
insurance$sex<-as.factor(insurance$sex)
insurance$smoker<-as.factor(insurance$smoker)
levels(insurance$children)
## [1] "0" "1" "2" "3" "4" "5"
class(insurance$children)
## [1] "factor"
levels(insurance$sex)
## [1] "female" "male"
str(insurance)
## spc_tbl_ [1,338 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ age : num [1:1338] 19 18 28 33 32 31 46 37 37 60 ...
## $ sex : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 2 1 ...
## $ bmi : num [1:1338] 27.9 33.8 33 22.7 28.9 ...
## $ children: Factor w/ 6 levels "0","1","2","3",..: 1 2 4 1 1 1 2 4 3 1 ...
## $ smoker : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ region : Factor w/ 4 levels "northeast","northwest",..: 4 3 3 2 2 3 3 2 1 2 ...
## $ charges : num [1:1338] 16885 1726 4449 21984 3867 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. sex = col_character(),
## .. bmi = col_double(),
## .. children = col_double(),
## .. smoker = col_character(),
## .. region = col_character(),
## .. charges = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
explore the data
library(psych)
pairs.panels(insurance[1:4],gap=0,bg=c("red","green","blue"))
as shown in the panel plot above, there is no multicollinearity problem. However, the numeric variables are not normally distributed: the null hypothesis of the Shapiro-Wilk test is that a variable's distribution does not differ from a normal distribution, and since the p-values are < 0.05 we reject it for each variable.
The solution is to split each variable into categories (binning).
#shapiro.test(insurance$age)
#shapiro.test(insurance$bmi)
#boxplot(insurance$charges)
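a compact way to run these normality checks (a sketch; shapiro.test() is from base R's stats package and handles up to 5000 observations)
# Shapiro-Wilk test on each numeric column; p < 0.05 rejects normality
sapply(insurance[c("age","bmi","charges")],function(x) shapiro.test(x)$p.value)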
binning
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
insurance<-insurance%>%mutate(bmis=cut(bmi,breaks =c(15,25,35,45,55)))
insurance<-insurance%>%mutate(ages=cut(age,breaks =c(15,35,55,75)))
tail(insurance)
## # A tibble: 6 × 9
## age sex bmi children smoker region charges bmis ages
## <dbl> <fct> <dbl> <fct> <fct> <fct> <dbl> <fct> <fct>
## 1 52 female 44.7 3 no southwest 11412. (35,45] (35,55]
## 2 50 male 31.0 3 no northwest 10601. (25,35] (35,55]
## 3 18 female 31.9 0 no northeast 2206. (25,35] (15,35]
## 4 18 female 36.8 0 no southeast 1630. (35,45] (15,35]
## 5 21 female 25.8 0 no southwest 2008. (25,35] (15,35]
## 6 61 female 29.1 0 yes northwest 29141. (25,35] (55,75]
levels(insurance$bmis)
## [1] "(15,25]" "(25,35]" "(35,45]" "(45,55]"
levels(insurance$region)
## [1] "northeast" "northwest" "southeast" "southwest"
levels(insurance$sex)
## [1] "female" "male"
levels(insurance$smoker)
## [1] "no" "yes"
lm<-lm(charges~ages+children+bmis+sex+smoker+region,data=insurance)
summary(lm)
##
## Call:
## lm(formula = charges ~ ages + children + bmis + sex + smoker +
## region, data = insurance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14832 -3030 -1106 1669 30821
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1738.6 573.6 3.031 0.00249 **
## ages(35,55] 4544.6 384.2 11.829 < 2e-16 ***
## ages(55,75] 9388.4 509.2 18.438 < 2e-16 ***
## children1 1002.0 448.1 2.236 0.02553 *
## children2 2324.7 497.6 4.672 3.29e-06 ***
## children3 1897.5 576.0 3.294 0.00101 **
## children4 3463.4 1290.1 2.685 0.00735 **
## children5 1615.5 1518.8 1.064 0.28766
## bmis(25,35] 2669.7 464.5 5.748 1.12e-08 ***
## bmis(35,45] 5728.1 560.5 10.220 < 2e-16 ***
## bmis(45,55] 6166.7 1489.8 4.139 3.71e-05 ***
## sexmale -125.5 345.1 -0.364 0.71613
## smokeryes 23724.1 429.6 55.227 < 2e-16 ***
## regionnorthwest -406.6 495.0 -0.821 0.41154
## regionsoutheast -779.4 495.2 -1.574 0.11574
## regionsouthwest -838.5 496.2 -1.690 0.09126 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6283 on 1322 degrees of freedom
## Multiple R-squared: 0.7338, Adjusted R-squared: 0.7308
## F-statistic: 243 on 15 and 1322 DF, p-value: < 2.2e-16
results: apart from region, all the explanatory variables have p-values < 0.05, so we deem them statistically significant; for each we reject the null hypothesis.
the objectives have been achieved and the related questions answered, except for region.
About 73% of the variation in charges is explained by the explanatory variables in the model (adjusted R-squared = 0.7308).
Smokers are charged 23,724.1 more than non-smokers, holding the other variables in the model constant; males are charged 125.5 less than females holding other factors constant, though this difference is not significant.
Holding other factors constant, those with 1, 2, 3, 4, and 5 children are charged 1,002, 2,324, 1,897, 3,463, and 1,615 more, respectively, than the reference group (those with 0 children).
Holding other factors constant in the model, the 35-55 and 55-75 age groups are charged 4,544 and 9,388 more, respectively, than the reference age group.
Since region is not significant, I drop it from the model. The p-value of 0.3049 from the LRT below means we fail to reject the null hypothesis that there is no statistically significant difference between the two models, hence we drop region; you may keep it if it improves the model's predictive performance.
lm2<-lm(charges~ages+children+bmis+sex+smoker,data=insurance)
anova(lm,lm2,test="LRT")
## Analysis of Variance Table
##
## Model 1: charges ~ ages + children + bmis + sex + smoker + region
## Model 2: charges ~ ages + children + bmis + sex + smoker
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 1322 5.2190e+10
## 2 1325 5.2333e+10 -3 -143097306 0.3049
plot(lm)
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loaded glmnet 4.1-8
library(mlbench)
library(psych)
library(Amelia)
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
data(BostonHousing)
data<-BostonHousing
library(reticulate)
data cleaning: check for missing values
missmap(BostonHousing,col = c("white","black"),y.at = 1,y.labels = '',legend = TRUE)
explore the data
head(data)
## crim zn indus chas nox rm age dis rad tax ptratio b lstat
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
variables
str(data)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : num 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ b : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
remove the factor column (chas) and the response (medv)
head(data[c(-4,-14)])
## crim zn indus nox rm age dis rad tax ptratio b lstat
## 1 0.00632 18 2.31 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
correlation
pairs.panels(data[c(-4,-14)])
partitioning
set.seed(123)
ind<-sample(2,nrow(data),replace=T,prob=c(0.7,0.3))
train<-data[ind==1,]
test<-data[ind==2,]
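a quick sanity check on the split (a sketch; counts vary slightly with the seed)
# row counts of the roughly 70/30 partition
c(train=nrow(train),test=nrow(test))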
custom cross-validation control
custom<-trainControl(method = "repeatedcv",number = 10,repeats = 5,verboseIter = T)
multiple linear regression
set.seed(1234)
lm<-train(medv~.,train,method="lm",trControl=custom)
## + Fold01.Rep1: intercept=TRUE
## - Fold01.Rep1: intercept=TRUE
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Fitting final model on full training set
lm
## Linear Regression
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 4.975392 0.7293561 3.579993
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
multiple regression results
lm$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 4.975392 0.7293561 3.579993 1.042083 0.08418269 0.5764493
summary
summary(lm)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8373 -2.9216 -0.6721 2.1082 27.2939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.185451 6.039394 6.323 7.86e-10 ***
## crim -0.111750 0.035301 -3.166 0.001683 **
## zn 0.054660 0.016150 3.385 0.000794 ***
## indus 0.003361 0.074173 0.045 0.963887
## chas1 2.528007 1.000831 2.526 0.011982 *
## nox -22.848609 4.524133 -5.050 7.12e-07 ***
## rm 4.291549 0.503590 8.522 4.79e-16 ***
## age -0.004883 0.016287 -0.300 0.764497
## dis -1.789839 0.247693 -7.226 3.15e-12 ***
## rad 0.316714 0.079995 3.959 9.12e-05 ***
## tax -0.011560 0.004490 -2.574 0.010450 *
## ptratio -1.041622 0.156402 -6.660 1.07e-10 ***
## b 0.009382 0.003259 2.879 0.004235 **
## lstat -0.434205 0.060133 -7.221 3.26e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.88 on 349 degrees of freedom
## Multiple R-squared: 0.7463, Adjusted R-squared: 0.7368
## F-statistic: 78.95 on 13 and 349 DF, p-value: < 2.2e-16
plot
plot(lm$finalModel)
RIDGE REGRESSION
set.seed(1234)
ridge<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=0,lambda=seq(0.0001,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=0, lambda=1
## - Fold01.Rep1: alpha=0, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 0, lambda = 0.5 on full training set
ridge
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000100 4.970605 0.7311616 3.505536
## 0.250075 4.970605 0.7311616 3.505536
## 0.500050 4.970605 0.7311616 3.505536
## 0.750025 4.974643 0.7310088 3.504826
## 1.000000 4.990820 0.7303565 3.506125
##
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 0.50005.
plotting
plot(ridge)
plotting further
plot(ridge$finalModel,xvar="lambda",label=T)
PLOT III
plot(ridge$finalModel,xvar="dev",label=T)
PLOT VARIABLE IMPORTANCE
plot(varImp(ridge,scale=F))
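the five-point lambda grid above is coarse; a finer grid (a sketch, reusing the same custom control; ridge2 is an illustrative name) may pin down the optimal lambda more precisely
set.seed(1234)
# 50 candidate lambdas instead of 5; alpha = 0 keeps it a pure ridge penalty
ridge2<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=0,lambda=seq(0.0001,1,length=50)),trControl=custom)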
LASSO REGRESSION
set.seed(123)
lasso<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=seq(0.0001,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=1, lambda=1
## - Fold01.Rep1: alpha=1, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 1e-04 on full training set
lasso
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 326, 327, 326, 327, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000100 4.979856 0.7346555 3.567965
## 0.250075 5.142243 0.7166092 3.597452
## 0.500050 5.379660 0.6923277 3.770186
## 0.750025 5.526544 0.6786284 3.896402
## 1.000000 5.598585 0.6761963 3.959097
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 1e-04.
plot
plot(lasso)
plot
plot(lasso$finalModel,xvar="lambda",label=T)
PLOT
plot(lasso$finalModel,xvar="dev",label=T)
PLOT
plot(varImp(lasso,scale=F))
ELASTIC NET REGRESSION
set.seed(1234)
en<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=seq(0,1,length=5)),trControl=custom)
## + Fold01.Rep1: alpha=1, lambda=1
## - Fold01.Rep1: alpha=1, lambda=1
## ... (remaining folds of reps 1-5 omitted) ...
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 0 on full training set
en
## glmnet
##
## 363 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 326, 327, 326, 327, 326, 327, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.00 4.971737 0.7296652 3.572301
## 0.25 5.114796 0.7165778 3.589798
## 0.50 5.340177 0.6970873 3.756935
## 0.75 5.491257 0.6841587 3.888663
## 1.00 5.564690 0.6816318 3.955273
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.
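note that alpha was fixed at 1 in the grid above, so this "elastic net" is effectively a second lasso; a sketch of a grid that also tunes alpha between ridge (0) and lasso (1), assuming the same custom control (en2 is an illustrative name)
set.seed(1234)
# tune both the mixing parameter alpha and the penalty lambda
en2<-train(medv~.,train,method="glmnet",tuneGrid=expand.grid(alpha=seq(0,1,length=5),lambda=seq(0.0001,1,length=5)),trControl=custom)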
plot
plot(en)
plot
plot(en$finalModel,xvar="lambda",label=T)
PLOT
plot(en$finalModel,xvar="dev",label=T)
plot(varImp(en))
compare models
model_list<-list(linearmodel=lm,Ridge=ridge,Lasso=lasso,Elasticnet=en)
res<-resamples(model_list)
res
##
## Call:
## resamples.default(x = model_list)
##
## Models: linearmodel, Ridge, Lasso, Elasticnet
## Number of resamples: 50
## Performance metrics: MAE, RMSE, Rsquared
## Time estimates for: everything, final model fit
summary
summary(res)
##
## Call:
## summary.resamples(object = res)
##
## Models: linearmodel, Ridge, Lasso, Elasticnet
## Number of resamples: 50
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 2.677284 3.187491 3.482063 3.579993 3.877766 5.247107 0
## Ridge 2.272557 3.049638 3.411169 3.505536 3.806774 5.202456 0
## Lasso 2.604518 3.245134 3.619309 3.567965 3.855642 4.994723 0
## Elasticnet 2.642375 3.157806 3.477071 3.572301 3.875828 5.239103 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 3.389000 4.115437 4.771370 4.975392 5.664986 7.702679 0
## Ridge 2.868358 4.069039 4.755433 4.970605 5.622595 7.940918 0
## Lasso 3.482731 4.287392 4.793794 4.979856 5.284133 8.242338 0
## Elasticnet 3.346567 4.098251 4.765642 4.971737 5.653089 7.717937 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## linearmodel 0.5577033 0.6877298 0.7252159 0.7293561 0.7874223 0.8934893 0
## Ridge 0.5275743 0.6867336 0.7430211 0.7311616 0.7837230 0.8956834 0
## Lasso 0.4700928 0.6701740 0.7468853 0.7346555 0.8093734 0.8928662 0
## Elasticnet 0.5558059 0.6891828 0.7246773 0.7296652 0.7881560 0.8938194 0
xyplot
xyplot(res,metric="RMSE")
BEST MODEL
best<-en$finalModel
coef(best,s=en$bestTune$lambda)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 37.592179762
## crim -0.109348699
## zn 0.053511177
## indus .
## chas1 2.515006344
## nox -22.527533084
## rm 4.307031899
## age -0.004354252
## dis -1.762718351
## rad 0.302126999
## tax -0.010922566
## ptratio -1.034782961
## b 0.009306407
## lstat -0.434020577
prediction
p1<-predict(en,train)
head(p1)
##         1         3         6         7         9        10 
## 30.029355 30.396753 24.561699 22.921030 12.522902 18.865567
RMSE
sqrt(mean((train$medv-p1)^2))  # residuals are squared before averaging
test prediction
p2<-predict(en,test)
sqrt(mean((test$medv-p2)^2))
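RMSE is scale-dependent, so a complementary test-set R-squared helps (a sketch: the squared correlation between observed and predicted medv)
# test-set R-squared
cor(test$medv,p2)^2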
BINARY LOGISTIC REGRESSION (fitted via multinom)
library(tidyverse)
library(readr)
library(nnet)
Admission_Predict <- read_csv("C:/Users/USER/Desktop/Admission_Predict.csv")
## Rows: 400 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): gre, sop, cgpa, admitted
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
attach(Admission_Predict)
Admission_Predict$admitted<-as.factor(Admission_Predict$admitted)
data<-Admission_Predict
str(data)
## spc_tbl_ [400 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ gre : num [1:400] 337 324 316 322 314 330 321 308 302 323 ...
## $ sop : num [1:400] 4.5 4 3 3.5 2 4.5 3 3 2 3.5 ...
## $ cgpa : num [1:400] 9.65 8.87 8 8.67 8.21 9.34 8.2 7.9 8 8.6 ...
## $ admitted: Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. gre = col_double(),
## .. sop = col_double(),
## .. cgpa = col_double(),
## .. admitted = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
explore the data
summary(data)
## gre sop cgpa admitted
## Min. :290.0 Min. :1.0 Min. :6.800 0:181
## 1st Qu.:308.0 1st Qu.:2.5 1st Qu.:8.170 1:219
## Median :317.0 Median :3.5 Median :8.610
## Mean :316.8 Mean :3.4 Mean :8.599
## 3rd Qu.:325.0 3rd Qu.:4.0 3rd Qu.:9.062
## Max. :340.0 Max. :5.0 Max. :9.920
table(data$admitted)
##
## 0 1
## 181 219
data partitioning
set.seed(123)
pd<-sample(2,nrow(data),replace = TRUE,prob = c(0.8,0.2))
train<-data[pd==1,]
test<-data[pd==2,]
MODEL
m1<-multinom(admitted~.,data=train)
## # weights: 5 (4 variable)
## initial value 225.272834
## iter 10 value 155.133415
## final value 155.131838
## converged
m1
## Call:
## multinom(formula = admitted ~ ., data = train)
##
## Coefficients:
## (Intercept) gre sop cgpa
## -40.1574370 0.1192646 0.4172153 0.1479590
##
## Residual Deviance: 310.2637
## AIC: 318.2637
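exponentiating the coefficients gives odds ratios, which are easier to read (a sketch)
# multiplicative change in the odds of admission per unit increase in each predictor
exp(coef(m1))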
misclassification rate
p<-predict(m1,train)
p
## [1] 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1
## [38] 1 1 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0
## [75] 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0
## [112] 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0
## [149] 0 0 1 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0
## [186] 0 1 1 1 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0
## [223] 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 1 0 0
## [260] 0 1 1 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0
## [297] 1 1 0 0 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1 0
## Levels: 0 1
table
ta<-table(p,train$admitted)
ta
##
## p 0 1
## 0 108 34
## 1 38 145
accuracy (correct classification)
sum(diag(ta))/sum(ta)
## [1] 0.7784615
how many were admitted vs. not admitted
table(train$admitted)
##
## 0 1
## 146 179
model performance evaluation
library(ROCR)
pred<-predict(m1,data,type="prob")
head(pred)
## 1 2 3 4 5 6
## 0.9657758 0.8123789 0.4913724 0.7288536 0.3409223 0.9212373
head(data)
## # A tibble: 6 × 4
## gre sop cgpa admitted
## <dbl> <dbl> <dbl> <fct>
## 1 337 4.5 9.65 1
## 2 324 4 8.87 1
## 3 316 3 8 1
## 4 322 3.5 8.67 1
## 5 314 2 8.21 0
## 6 330 4.5 9.34 1
histogram
hist(pred)
roc_curve
pred<-prediction(pred,data$admitted)
pred
## A prediction instance
## with 400 data points
eval<-performance(pred,"acc")
plot(eval)
abline(h=0.79,v=0.60)
identify best cut off and accuracy
max<-which.max(slot(eval,"y.values")[[1]])
acc<-slot(eval,"y.values")[[1]][max]
max
## [1] 192
acc
## [1] 0.7925
cutoff<-slot(eval,"x.values")[[1]][max]
cutoff
## 334
## 0.5994509
print(c(accuracy=acc,cutoffp=cutoff))
## accuracy cutoffp.334
## 0.7925000 0.5994509
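applying that cutoff to the predicted probabilities gives the confusion matrix at the optimal threshold (a sketch; the probabilities are recomputed because pred now holds the prediction object)
prob<-predict(m1,data,type="prob")
# classify with the optimal cutoff and tabulate against the actual labels
table(Predicted=ifelse(prob>=cutoff,1,0),Actual=data$admitted)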
performance(pred,"tpr")
## A performance instance
## 'Cutoff' vs. 'True positive rate' (alpha: 'none')
## with 398 data points
roc<-performance(pred,"tpr","fpr")
plot(roc)
abline(a=0,b=1)
plot(roc,colorize=T,main="Roc Curve",ylab="sensitivity",xlab="1-specificity")
abline(a=0,b=1)
area under the curve
auc<-performance(pred,"auc")
auc<-unlist(slot(auc,"y.values"))
auc
## [1] 0.8513459
auc<-round(auc,2)
auc
## [1] 0.85
#legend(.6,.2,legend=paste("AUC =",auc),title="AUC")
MULTINOMIAL LOGISTIC REGRESSION
Maternal_Health_Risk_Data_Set <- read_csv("Maternal Health Risk Data Set.csv")
## Rows: 1014 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): RiskLevel
## dbl (6): Age, SystolicBP, DiastolicBP, BS, BodyTemp, HeartRate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
maternal_risk<-Maternal_Health_Risk_Data_Set
head(maternal_risk)
## # A tibble: 6 × 7
## Age SystolicBP DiastolicBP BS BodyTemp HeartRate RiskLevel
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 25 130 80 15 98 86 high risk
## 2 35 140 90 13 98 70 high risk
## 3 29 90 70 8 100 80 high risk
## 4 30 140 85 7 98 70 high risk
## 5 35 120 60 6.1 98 76 low risk
## 6 23 140 80 7.01 98 70 high risk
maternal_risk$RiskLevel<-as.factor(maternal_risk$RiskLevel)
names(maternal_risk)<-str_replace_all(names(maternal_risk),"\\s","_")
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='high risk']<-1
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='mid risk']<-2
#maternal_risk$RiskLevel[maternal_risk$RiskLevel=='low risk']<-3
names(maternal_risk)
## [1] "Age" "SystolicBP" "DiastolicBP" "BS" "BodyTemp"
## [6] "HeartRate" "RiskLevel"
str(maternal_risk)
## spc_tbl_ [1,014 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Age : num [1:1014] 25 35 29 30 35 23 23 35 32 42 ...
## $ SystolicBP : num [1:1014] 130 140 90 140 120 140 130 85 120 130 ...
## $ DiastolicBP: num [1:1014] 80 90 70 85 60 80 70 60 90 80 ...
## $ BS : num [1:1014] 15 13 8 7 6.1 7.01 7.01 11 6.9 18 ...
## $ BodyTemp : num [1:1014] 98 98 100 98 98 98 98 102 98 98 ...
## $ HeartRate : num [1:1014] 86 70 80 70 76 70 78 86 70 70 ...
## $ RiskLevel : Factor w/ 3 levels "high risk","low risk",..: 1 1 1 1 2 1 3 1 3 1 ...
## - attr(*, "spec")=
## .. cols(
## .. Age = col_double(),
## .. SystolicBP = col_double(),
## .. DiastolicBP = col_double(),
## .. BS = col_double(),
## .. BodyTemp = col_double(),
## .. HeartRate = col_double(),
## .. RiskLevel = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
mymodel<-multinom(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
## # weights: 24 (14 variable)
## initial value 1113.992861
## iter 10 value 919.635048
## iter 20 value 817.505685
## iter 30 value 780.847881
## iter 40 value 780.842087
## final value 780.838740
## converged
summary(mymodel)
## Call:
## multinom(formula = RiskLevel ~ Age + SystolicBP + DiastolicBP +
## BS + BodyTemp + HeartRate, data = maternal_risk)
##
## Coefficients:
## (Intercept) Age SystolicBP DiastolicBP BS BodyTemp
## low risk 100.0386 0.01939081 -0.06613182 -0.02202248 -0.7487849 -0.8074419
## mid risk 52.1469 0.01653015 -0.00778542 -0.05880622 -0.4038215 -0.4035825
## HeartRate
## low risk -0.06495778
## mid risk -0.04168854
##
## Std. Errors:
## (Intercept) Age SystolicBP DiastolicBP BS BodyTemp
## low risk 0.0001341594 0.01089657 0.010765949 0.01358323 0.08027454 0.01507466
## mid risk 0.0001270224 0.01025968 0.009867095 0.01263257 0.04493161 0.01312267
## HeartRate
## low risk 0.01456100
## mid risk 0.01313263
##
## Residual Deviance: 1561.677
## AIC: 1589.677
prediction
head(predict(mymodel,maternal_risk))
## [1] high risk high risk low risk mid risk mid risk mid risk
## Levels: high risk low risk mid risk
prediction in terms of probability
head(predict(mymodel,maternal_risk,type="prob"))
## high risk low risk mid risk
## 1 0.91038118 0.002803312 0.08681551
## 2 0.78607232 0.015382745 0.19854494
## 3 0.15001274 0.486422100 0.36356516
## 4 0.15895036 0.281647825 0.55940182
## 5 0.02522402 0.425887276 0.54888870
## 6 0.14485164 0.248306646 0.60684171
exercise: calculate the predicted probabilities for the 4th, 10th, and 18th patients
solution
predict(mymodel,maternal_risk[c(4,10,18),],type="prob")
## high risk low risk mid risk
## 1 0.1589504 0.281647825 0.5594018
## 2 0.9306159 0.001191721 0.0681924
## 3 0.4033055 0.241628846 0.3550657
confusion matrix
cm<-table(predict(mymodel),maternal_risk$RiskLevel)
cm
##
## high risk low risk mid risk
## high risk 198 13 54
## low risk 14 321 164
## mid risk 60 72 118
accuracy of the model in percentage
(sum(diag(cm))/sum(cm))*100
## [1] 62.82051
two-tailed z test
z<-summary(mymodel)$coefficients/summary(mymodel)$standard.errors
p<-(1-pnorm(abs(z),0,1))*2
p
DECISION TREE
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
##
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
##
## boundary
##
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
##
## where
tree<-ctree(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk,controls=ctree_control(mincriterion=0.99,minsplit=500)) # require 99% confidence for a split and at least 500 obs in a node before splitting
plot(tree)
predict
head(predict(tree,maternal_risk,type='prob'))
## [[1]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[2]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[3]]
## [1] 0.75285171 0.03041825 0.21673004
##
## [[4]]
## [1] 0.8888889 0.0000000 0.1111111
##
## [[5]]
## [1] 0.02 0.54 0.44
##
## [[6]]
## [1] 0.8888889 0.0000000 0.1111111
head(predict(tree,maternal_risk))
## [1] high risk high risk high risk high risk low risk high risk
## Levels: high risk low risk mid risk
decision tree with rpart
library(rpart)
tree1<-rpart(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
library(rpart.plot)
rpart.plot(tree1,extra = 1)
rpart.plot(tree1,extra = 6)
## Warning: extra=6 but the response has 3 levels (only the 2nd level is
## displayed)
rpart.plot(tree1,extra = 10)
## Warning: extra=10 but the response has 3 levels (only the 2nd level is
## displayed)
prediction
head(predict(tree1,maternal_risk))
## high risk low risk mid risk
## 1 0.75285171 0.03041825 0.2167300
## 2 0.75285171 0.03041825 0.2167300
## 3 0.75285171 0.03041825 0.2167300
## 4 0.88888889 0.00000000 0.1111111
## 5 0.08474576 0.55932203 0.3559322
## 6 0.88888889 0.00000000 0.1111111
confusion matrix
tab<-table(predict(tree),maternal_risk$RiskLevel)
tab
##
## high risk low risk mid risk
## high risk 238 8 62
## low risk 9 359 170
## mid risk 25 39 104
accuracy of the model
(sum(diag(tab))/sum(tab))*100
## [1] 69.13215
RANDOM FOREST
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:psych':
##
## outlier
rf<-randomForest(RiskLevel~ Age+SystolicBP+DiastolicBP+BS+BodyTemp+HeartRate,data = maternal_risk)
rf
##
## Call:
## randomForest(formula = RiskLevel ~ Age + SystolicBP + DiastolicBP + BS + BodyTemp + HeartRate, data = maternal_risk)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 14.6%
## Confusion matrix:
## high risk low risk mid risk class.error
## high risk 248 10 14 0.08823529
## low risk 8 342 56 0.15763547
## mid risk 19 41 276 0.17857143
confusion matrix
tab<-table(predict(rf),maternal_risk$RiskLevel)
tab
##
## high risk low risk mid risk
## high risk 248 8 19
## low risk 10 342 41
## mid risk 14 56 276
accuracy of the model
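following the same pattern as the earlier models; the value follows arithmetically from the confusion matrix above
(sum(diag(tab))/sum(tab))*100
## [1] 85.40434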
SUPPORT_VECTOR_MACHINE"
library(ggplot2)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
qplot(Petal.Length,Petal.Width,data=iris,color=Species)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
MODEL
library(e1071)
m1<-svm(Species~.,data=iris)
summary(m1)
##
## Call:
## svm(formula = Species ~ ., data = iris)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(m1,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(m1,iris)
table(pred,iris$Species)
##
## pred setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
tab1<-table(predicted=pred,Actual=iris$Species)
tab1
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
misclassification
1-sum(diag(tab1))/sum(tab1)
## [1] 0.02666667
model2 (note: svm() expects lowercase kernel="linear"; the misspelled Kernel="Linear" below is silently ignored, so a radial kernel is still used, as the summary confirms)
m2<-svm(Species~.,data=iris,Kernel="Linear")
summary
summary(m2)
##
## Call:
## svm(formula = Species ~ ., data = iris, Kernel = "Linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(m2,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(m2,iris)
tab2<-table(predicted=pred,Actual=iris$Species)
tab2
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
misclassification
1-sum(diag(tab2))/sum(tab2)
## [1] 0.02666667
fine tuning: hyper-parameter optimization
set.seed(123)
t<-tune(svm,Species~.,data=iris,ranges = list(epsilon=seq(0,1,0.1),cost=2^(2:9)))
plot(t)
summary
summary(t)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## epsilon cost
## 0 4
##
## - best performance: 0.04
##
## - Detailed performance results:
## epsilon cost error dispersion
## 1 0.0 4 0.04000000 0.04661373
## 2 0.1 4 0.04000000 0.04661373
## 3 0.2 4 0.04000000 0.04661373
## 4 0.3 4 0.04000000 0.04661373
## 5 0.4 4 0.04000000 0.04661373
## 6 0.5 4 0.04000000 0.04661373
## 7 0.6 4 0.04000000 0.04661373
## 8 0.7 4 0.04000000 0.04661373
## 9 0.8 4 0.04000000 0.04661373
## 10 0.9 4 0.04000000 0.04661373
## 11 1.0 4 0.04000000 0.04661373
## 12 0.0 8 0.04666667 0.06324555
## 13 0.1 8 0.04666667 0.06324555
## 14 0.2 8 0.04666667 0.06324555
## 15 0.3 8 0.04666667 0.06324555
## 16 0.4 8 0.04666667 0.06324555
## 17 0.5 8 0.04666667 0.06324555
## 18 0.6 8 0.04666667 0.06324555
## 19 0.7 8 0.04666667 0.06324555
## 20 0.8 8 0.04666667 0.06324555
## 21 0.9 8 0.04666667 0.06324555
## 22 1.0 8 0.04666667 0.06324555
## 23 0.0 16 0.04666667 0.04499657
## 24 0.1 16 0.04666667 0.04499657
## 25 0.2 16 0.04666667 0.04499657
## 26 0.3 16 0.04666667 0.04499657
## 27 0.4 16 0.04666667 0.04499657
## 28 0.5 16 0.04666667 0.04499657
## 29 0.6 16 0.04666667 0.04499657
## 30 0.7 16 0.04666667 0.04499657
## 31 0.8 16 0.04666667 0.04499657
## 32 0.9 16 0.04666667 0.04499657
## 33 1.0 16 0.04666667 0.04499657
## 34 0.0 32 0.04666667 0.04499657
## 35 0.1 32 0.04666667 0.04499657
## 36 0.2 32 0.04666667 0.04499657
## 37 0.3 32 0.04666667 0.04499657
## 38 0.4 32 0.04666667 0.04499657
## 39 0.5 32 0.04666667 0.04499657
## 40 0.6 32 0.04666667 0.04499657
## 41 0.7 32 0.04666667 0.04499657
## 42 0.8 32 0.04666667 0.04499657
## 43 0.9 32 0.04666667 0.04499657
## 44 1.0 32 0.04666667 0.04499657
## 45 0.0 64 0.05333333 0.06126244
## 46 0.1 64 0.05333333 0.06126244
## 47 0.2 64 0.05333333 0.06126244
## 48 0.3 64 0.05333333 0.06126244
## 49 0.4 64 0.05333333 0.06126244
## 50 0.5 64 0.05333333 0.06126244
## 51 0.6 64 0.05333333 0.06126244
## 52 0.7 64 0.05333333 0.06126244
## 53 0.8 64 0.05333333 0.06126244
## 54 0.9 64 0.05333333 0.06126244
## 55 1.0 64 0.05333333 0.06126244
## 56 0.0 128 0.06000000 0.05837300
## 57 0.1 128 0.06000000 0.05837300
## 58 0.2 128 0.06000000 0.05837300
## 59 0.3 128 0.06000000 0.05837300
## 60 0.4 128 0.06000000 0.05837300
## 61 0.5 128 0.06000000 0.05837300
## 62 0.6 128 0.06000000 0.05837300
## 63 0.7 128 0.06000000 0.05837300
## 64 0.8 128 0.06000000 0.05837300
## 65 0.9 128 0.06000000 0.05837300
## 66 1.0 128 0.06000000 0.05837300
## 67 0.0 256 0.06666667 0.06285394
## 68 0.1 256 0.06666667 0.06285394
## 69 0.2 256 0.06666667 0.06285394
## 70 0.3 256 0.06666667 0.06285394
## 71 0.4 256 0.06666667 0.06285394
## 72 0.5 256 0.06666667 0.06285394
## 73 0.6 256 0.06666667 0.06285394
## 74 0.7 256 0.06666667 0.06285394
## 75 0.8 256 0.06666667 0.06285394
## 76 0.9 256 0.06666667 0.06285394
## 77 1.0 256 0.06666667 0.06285394
## 78 0.0 512 0.07333333 0.06629526
## 79 0.1 512 0.07333333 0.06629526
## 80 0.2 512 0.07333333 0.06629526
## 81 0.3 512 0.07333333 0.06629526
## 82 0.4 512 0.07333333 0.06629526
## 83 0.5 512 0.07333333 0.06629526
## 84 0.6 512 0.07333333 0.06629526
## 85 0.7 512 0.07333333 0.06629526
## 86 0.8 512 0.07333333 0.06629526
## 87 0.9 512 0.07333333 0.06629526
## 88 1.0 512 0.07333333 0.06629526
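Besides summary(t), the tune object stores the grid results directly; a small sketch of pulling them out programmatically:
t$best.parameters    # epsilon and cost of the winning model
t$best.performance   # its cross-validated error
head(t$performances) # the full grid as a data frame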
BEST MODEL
model<-t$best.model
summary(model)
##
## Call:
## best.tune(METHOD = svm, train.x = Species ~ ., data = iris, ranges = list(epsilon = seq(0,
## 1, 0.1), cost = 2^(2:9)))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 4
##
## Number of Support Vectors: 37
##
## ( 6 17 14 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
plot
plot(model,data=iris,Petal.Width~Petal.Length,slice=list(Sepal.Width=3,Sepal.Length=4))
confusion matrix
pred<-predict(model,iris)
tab3<-table(predicted=pred,Actual=iris$Species)
tab3
## Actual
## predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 0
## virginica 0 2 50
misclassification
1-sum(diag(tab3))/sum(tab3)
## [1] 0.01333333
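Both misclassification rates above are computed on the same data used for tuning, so they are optimistic. A hedged sketch of a held-out evaluation that reuses the cost value the tuner picked (assumes e1071 is loaded):
set.seed(123)
idx<-sample(2,nrow(iris),replace=TRUE,prob=c(0.8,0.2))
iris_train<-iris[idx==1,]
iris_test<-iris[idx==2,]
fit<-svm(Species~.,data=iris_train,kernel="radial",cost=4)
tab_ho<-table(predicted=predict(fit,iris_test),Actual=iris_test$Species)
1-sum(diag(tab_ho))/sum(tab_ho)   # held-out misclassification rate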
LINEAR DISCRIMINANT ANALYSIS
library(psych)
pairs.panels(iris[1:4],gap=0,bg=c("red","green","blue")[iris$Species],pch=21)
PARTITIONING
set.seed(555)
ind<-sample(2,nrow(iris),replace = TRUE,prob = c(0.8,0.2))
train<-iris[ind==1,]
test<-iris[ind==2,]
modelling
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
linear<-lda(Species~.,train)
linear
## Call:
## lda(Species ~ ., data = train)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3559322 0.3220339 0.3220339
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa 4.983333 3.407143 1.464286 0.252381
## versicolor 5.955263 2.755263 4.273684 1.334211
## virginica 6.618421 2.965789 5.563158 2.013158
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 1.143465 0.2050828
## Sepal.Width 1.251756 -2.3225224
## Petal.Length -2.434279 0.6721985
## Petal.Width -2.713041 -2.5114909
##
## Proportion of trace:
## LD1 LD2
## 0.9925 0.0075
attributes (the proportion of trace above shows LD1 alone carries 99.25% of the between-group separation)
attributes(linear)
## $names
## [1] "prior" "counts" "means" "scaling" "lev" "svd" "N"
## [8] "call" "terms" "xlevels"
##
## $class
## [1] "lda"
linear$count
## setosa versicolor virginica
## 42 38 38
linear$prior
## setosa versicolor virginica
## 0.3559322 0.3220339 0.3220339
linear$scaling
## LD1 LD2
## Sepal.Length 1.143465 0.2050828
## Sepal.Width 1.251756 -2.3225224
## Petal.Length -2.434279 0.6721985
## Petal.Width -2.713041 -2.5114909
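The scaling matrix is what turns centred predictor values into discriminant scores. A sketch of reproducing the LD scores by hand, assuming predict.lda centres at the prior-weighted mean of the group means (which matches MASS's implementation as far as I know):
X<-as.matrix(train[,1:4])
ctr<-colSums(linear$prior*linear$means)        # prior-weighted grand mean
scores<-scale(X,center=ctr,scale=FALSE)%*%linear$scaling
head(scores)   # should match head(predict(linear,train)$x)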
stacked histogram of Discriminant Function values
p<-predict(linear,train)
p
## $class
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa setosa setosa setosa setosa
## [19] setosa setosa setosa setosa setosa setosa
## [25] setosa setosa setosa setosa setosa setosa
## [31] setosa setosa setosa setosa setosa setosa
## [37] setosa setosa setosa setosa setosa setosa
## [43] versicolor versicolor versicolor versicolor versicolor versicolor
## [49] versicolor versicolor versicolor versicolor versicolor versicolor
## [55] versicolor versicolor versicolor versicolor versicolor versicolor
## [61] virginica versicolor versicolor versicolor versicolor versicolor
## [67] versicolor versicolor versicolor versicolor versicolor versicolor
## [73] versicolor versicolor versicolor versicolor versicolor versicolor
## [79] versicolor versicolor virginica virginica virginica virginica
## [85] virginica virginica virginica virginica virginica virginica
## [91] virginica virginica virginica virginica virginica virginica
## [97] virginica virginica virginica virginica virginica virginica
## [103] virginica virginica virginica versicolor virginica virginica
## [109] virginica virginica virginica virginica virginica virginica
## [115] virginica virginica virginica virginica
## Levels: setosa versicolor virginica
##
## $posterior
## setosa versicolor virginica
## 1 1.000000e+00 9.570171e-23 3.467391e-43
## 3 1.000000e+00 4.148630e-20 8.077736e-40
## 4 1.000000e+00 5.508665e-17 1.259714e-35
## 5 1.000000e+00 6.905823e-23 3.239328e-43
## 6 1.000000e+00 1.989014e-21 1.584854e-40
## 8 1.000000e+00 1.191699e-20 2.095622e-40
## 9 1.000000e+00 7.044587e-16 2.678402e-34
## 12 1.000000e+00 1.070802e-18 1.183246e-37
## 13 1.000000e+00 2.353548e-19 2.700003e-39
## 14 1.000000e+00 4.000902e-20 3.520246e-40
## 16 1.000000e+00 6.787907e-28 7.388311e-49
## 17 1.000000e+00 1.538362e-25 3.575203e-46
## 18 1.000000e+00 1.002833e-21 1.434635e-41
## 19 1.000000e+00 3.121659e-23 1.873737e-43
## 21 1.000000e+00 1.908362e-20 2.913916e-40
## 22 1.000000e+00 6.924301e-21 6.108942e-40
## 23 1.000000e+00 3.792552e-25 3.499021e-46
## 24 1.000000e+00 2.160663e-15 1.057841e-32
## 25 1.000000e+00 1.298356e-15 2.032785e-33
## 26 1.000000e+00 3.327927e-17 3.399044e-36
## 27 1.000000e+00 1.395337e-17 9.256824e-36
## 29 1.000000e+00 1.326246e-22 3.711511e-43
## 30 1.000000e+00 5.030246e-17 1.387733e-35
## 31 1.000000e+00 6.970974e-17 1.485436e-35
## 32 1.000000e+00 1.842844e-20 7.492207e-40
## 33 1.000000e+00 7.916134e-27 2.933146e-48
## 34 1.000000e+00 7.905464e-29 9.176394e-51
## 35 1.000000e+00 2.252033e-18 1.230656e-37
## 36 1.000000e+00 1.590517e-22 3.058317e-43
## 37 1.000000e+00 1.263948e-25 2.806406e-47
## 38 1.000000e+00 1.913068e-23 3.662364e-44
## 39 1.000000e+00 1.642225e-17 2.073054e-36
## 40 1.000000e+00 4.105279e-21 4.479896e-41
## 41 1.000000e+00 2.729968e-22 2.600843e-42
## 42 1.000000e+00 1.010682e-12 1.446954e-30
## 43 1.000000e+00 1.014790e-18 8.268463e-38
## 44 1.000000e+00 3.808628e-16 3.164798e-33
## 45 1.000000e+00 2.225498e-17 5.408310e-35
## 46 1.000000e+00 2.584287e-17 4.622119e-36
## 48 1.000000e+00 1.284173e-18 9.750050e-38
## 49 1.000000e+00 7.483600e-24 1.630794e-44
## 50 1.000000e+00 4.495726e-21 4.066621e-41
## 51 6.837393e-18 9.999464e-01 5.356221e-05
## 52 1.239301e-19 9.993679e-01 6.321340e-04
## 53 4.902141e-22 9.975222e-01 2.477847e-03
## 54 4.647975e-22 9.997431e-01 2.569214e-04
## 55 1.286113e-22 9.977254e-01 2.274627e-03
## 56 2.985366e-23 9.972594e-01 2.740564e-03
## 57 1.415505e-22 9.814186e-01 1.858141e-02
## 58 5.638331e-14 9.999999e-01 1.209049e-07
## 60 5.061565e-21 9.992694e-01 7.305519e-04
## 61 5.496318e-18 9.999989e-01 1.054475e-06
## 62 4.504370e-20 9.992490e-01 7.510365e-04
## 63 2.740898e-17 9.999995e-01 4.781694e-07
## 64 7.122616e-24 9.924874e-01 7.512623e-03
## 65 7.395146e-14 9.999987e-01 1.250812e-06
## 67 1.455021e-24 9.573374e-01 4.266264e-02
## 68 3.213591e-16 9.999990e-01 1.005720e-06
## 69 1.306777e-26 9.855505e-01 1.444951e-02
## 70 2.557212e-17 9.999973e-01 2.728496e-06
## 71 6.164746e-29 1.429533e-01 8.570467e-01
## 72 2.930603e-16 9.999951e-01 4.912004e-06
## 73 1.668151e-28 8.607639e-01 1.392361e-01
## 74 1.957680e-22 9.993960e-01 6.039622e-04
## 75 2.378311e-17 9.999866e-01 1.336160e-05
## 77 2.903408e-22 9.991927e-01 8.072534e-04
## 78 9.503086e-27 7.675201e-01 2.324799e-01
## 79 2.660810e-23 9.918416e-01 8.158404e-03
## 80 4.045835e-11 1.000000e+00 1.004847e-08
## 87 6.624103e-21 9.988996e-01 1.100428e-03
## 88 1.812604e-22 9.998063e-01 1.937170e-04
## 89 2.157571e-18 9.999166e-01 8.336390e-05
## 90 7.522454e-21 9.998342e-01 1.658480e-04
## 91 2.450117e-23 9.988442e-01 1.155832e-03
## 92 3.070771e-22 9.974931e-01 2.506885e-03
## 93 7.757766e-18 9.999919e-01 8.065305e-06
## 95 3.107085e-21 9.996111e-01 3.888813e-04
## 98 2.822355e-18 9.999653e-01 3.469698e-05
## 99 2.211578e-10 1.000000e+00 1.042401e-08
## 100 3.870206e-19 9.999199e-01 8.014712e-05
## 103 1.317172e-42 3.220037e-05 9.999678e-01
## 105 7.840059e-47 1.126925e-06 9.999989e-01
## 106 3.874149e-49 7.203430e-07 9.999993e-01
## 107 1.269734e-34 1.678549e-02 9.832145e-01
## 109 1.596966e-42 2.380285e-04 9.997620e-01
## 110 4.976922e-47 1.355850e-07 9.999999e-01
## 111 2.529196e-32 1.305030e-02 9.869497e-01
## 112 1.079614e-37 1.848637e-03 9.981514e-01
## 114 3.654120e-41 1.519895e-04 9.998480e-01
## 115 2.838221e-46 8.031545e-07 9.999992e-01
## 117 3.931949e-36 3.866017e-03 9.961340e-01
## 118 6.707402e-46 6.999584e-07 9.999993e-01
## 119 9.802945e-59 2.189326e-09 1.000000e+00
## 120 2.744397e-33 2.404620e-01 7.595380e-01
## 121 5.869706e-43 7.215299e-06 9.999928e-01
## 122 2.529668e-38 4.399863e-04 9.995600e-01
## 123 1.159049e-49 1.222596e-06 9.999988e-01
## 124 3.669876e-31 1.346388e-01 8.653612e-01
## 125 2.299196e-40 5.391577e-05 9.999461e-01
## 126 4.231977e-37 2.041033e-03 9.979590e-01
## 128 2.181013e-30 1.035716e-01 8.964284e-01
## 130 1.720483e-32 1.075489e-01 8.924511e-01
## 131 1.382413e-41 2.315330e-04 9.997685e-01
## 132 4.314830e-37 4.013577e-04 9.995986e-01
## 133 4.450738e-46 2.643006e-06 9.999974e-01
## 134 7.440809e-29 6.706163e-01 3.293837e-01
## 135 1.454401e-36 2.351838e-02 9.764816e-01
## 138 4.212482e-36 2.988751e-03 9.970112e-01
## 139 1.143640e-29 1.478430e-01 8.521570e-01
## 140 3.443215e-36 1.278875e-03 9.987211e-01
## 141 3.340946e-45 1.368035e-06 9.999986e-01
## 142 3.455844e-35 1.162388e-03 9.988376e-01
## 143 6.868742e-39 6.188908e-04 9.993811e-01
## 144 1.884646e-46 7.646829e-07 9.999992e-01
## 145 7.845946e-47 2.218301e-07 9.999998e-01
## 146 1.223600e-38 1.487672e-04 9.998512e-01
## 147 1.567893e-35 1.040148e-02 9.895985e-01
## 149 5.158270e-42 5.612656e-06 9.999944e-01
##
## $x
## LD1 LD2
## 1 7.9280301 -0.289614784
## 3 7.3385453 0.257888972
## 4 6.6121675 0.604072636
## 5 7.9388592 -0.542375299
## 6 7.4988801 -1.457737530
## 8 7.4450802 -0.010650971
## 9 6.3765512 0.960340708
## 12 6.9729593 0.015552326
## 13 7.2304168 1.061270666
## 14 7.3889680 0.757069727
## 16 8.9546532 -2.691913594
## 17 8.4725916 -1.726616936
## 18 7.6567260 -0.540763873
## 19 7.9880481 -0.912811372
## 21 7.4156104 0.205821840
## 22 7.3923452 -1.189197587
## 23 8.4551847 -0.893287812
## 24 6.1334831 -0.376898019
## 25 6.2426756 0.217211880
## 26 6.7009500 0.985577834
## 27 6.6590442 -0.445729297
## 29 7.9172010 -0.036854268
## 30 6.6082616 0.459548526
## 31 6.5974326 0.712309042
## 32 7.3598580 -0.430916041
## 33 8.8213062 -1.344250997
## 34 9.2616450 -1.833347345
## 35 6.9552070 0.665597467
## 36 7.9250127 0.252193952
## 37 8.6288440 -0.274801527
## 38 8.0958167 -0.311734487
## 39 6.7451547 0.660868618
## 40 7.5594267 0.009857306
## 41 7.7858074 -0.628492001
## 42 5.7119682 2.055993476
## 43 6.9955058 0.196364141
## 44 6.2416116 -1.180279713
## 45 6.5438093 -1.152570420
## 46 6.6878087 0.558972489
## 48 6.9807709 0.304600547
## 49 8.1636463 -0.645882855
## 50 7.5633325 0.154381416
## 51 -1.5636819 0.001265223
## 52 -2.0342092 -0.507373231
## 53 -2.5613638 0.096299798
## 54 -2.4301602 1.564521343
## 55 -2.6639929 0.509363851
## 56 -2.7927288 0.780375962
## 57 -2.7815400 -0.876843132
## 58 -0.4731563 1.492127750
## 60 -2.3003737 0.255618618
## 61 -1.3463678 2.576084683
## 62 -2.1260092 -0.347069693
## 63 -1.1696911 2.652762233
## 64 -2.9683271 0.513447445
## 65 -0.5910488 -0.077363216
## 67 -3.1993323 -0.206934970
## 68 -1.0159342 1.517704338
## 69 -3.5146579 1.774132600
## 70 -1.2794266 1.555603470
## 71 -4.1501375 -1.161702328
## 72 -1.1182034 0.526309813
## 73 -3.9984962 1.366763567
## 74 -2.5508945 1.247997861
## 75 -1.3802720 0.557241959
## 77 -2.5365051 0.956477474
## 78 -3.7012684 -0.147542844
## 79 -2.8671219 0.107350377
## 80 0.2051110 1.326129192
## 87 -2.3032010 -0.079156459
## 88 -2.4890998 1.997466965
## 89 -1.6830127 0.026483802
## 90 -2.1798091 1.100016866
## 91 -2.7570410 1.387793122
## 92 -2.5997237 0.213975355
## 93 -1.4402900 1.180438548
## 95 -2.3019673 0.790460369
## 98 -1.6089650 0.516225406
## 99 0.3396918 0.848083423
## 100 -1.8190173 0.511496556
## 103 -6.5199495 -0.465127429
## 105 -7.2339047 -0.906546031
## 106 -7.6522122 0.107952915
## 107 -5.1682437 0.308470106
## 109 -6.5458733 1.300328070
## 110 -7.2266217 -2.708289235
## 111 -4.7369503 -1.339291290
## 112 -5.6927263 0.187050417
## 114 -6.2845233 0.055188312
## 115 -7.1232943 -1.558436630
## 117 -5.4184048 -0.103609230
## 118 -7.0511932 -1.913485953
## 119 -9.3114598 0.756831522
## 120 -4.9604903 2.069215302
## 121 -6.5540438 -1.607386340
## 122 -5.7799153 -0.729296532
## 123 -7.7603407 0.911334609
## 124 -4.5620572 0.148811823
## 125 -6.1149531 -1.378356955
## 126 -5.5847676 -0.088456511
## 128 -4.4152235 -0.588961446
## 130 -4.8056549 0.743906440
## 131 -6.3715088 0.697639759
## 132 -5.5496084 -1.571830775
## 133 -7.1117466 -0.596989534
## 134 -4.1098253 0.804446554
## 135 -5.5347047 1.815182822
## 138 -5.4075758 -0.356369746
## 139 -4.2861422 -0.676689574
## 140 -5.4063276 -1.074495478
## 141 -6.9357885 -1.734519596
## 142 -5.2186520 -1.778453210
## 143 -5.8919495 -0.070438948
## 144 -7.1552460 -1.493454915
## 145 -7.2001693 -2.382953310
## 146 -5.8159485 -1.519997674
## 147 -5.3271403 0.429387063
## 149 -6.3738345 -2.417108310
histogram
ldahist(data=p$x[,1],g=train$Species)
biplot
library(devtools)
## Loading required package: usethis
library(ggord)
ggord(linear,train$Species)
library(klaR)
partimat(Species~.,data=train,method="lda")
partimat(Species~.,data=train,method="qda")
confusion matrix
p1<-predict(linear,train)$class
tab<-table(predicted=p1,Actual=train$Species)
tab
## Actual
## predicted setosa versicolor virginica
## setosa 42 0 0
## versicolor 0 37 1
## virginica 0 1 37
accuracy
sum(diag(tab))/sum(tab)
## [1] 0.9830508
FOR TEST DATA
p2<-predict(linear,test)$class
t2<-table(predicted=p2,Actual=test$Species)
t2
## Actual
## predicted setosa versicolor virginica
## setosa 8 0 0
## versicolor 0 11 0
## virginica 0 1 12
accuracy
sum(diag(t2))/sum(t2)
## [1] 0.96875
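MASS::lda can also cross-validate itself; a hedged sketch using the built-in leave-one-out option as a sanity check on the accuracies above:
cv<-lda(Species~.,data=train,CV=TRUE)   # CV=TRUE returns leave-one-out predictions
tab_cv<-table(predicted=cv$class,Actual=train$Species)
sum(diag(tab_cv))/sum(tab_cv)           # LOO-CV accuracy estimate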
BINARY LOGISTIC REGRESSION
telecom_churn <- read_csv("C:/Users/USER/Desktop/telecom_churn.csv")
## Rows: 3333 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (11): Churn, AccountWeeks, ContractRenewal, DataPlan, DataUsage, CustSer...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
churn<-telecom_churn
churn$Churn<-as.factor(churn$Churn)
names(churn)
## [1] "Churn" "AccountWeeks" "ContractRenewal" "DataPlan"
## [5] "DataUsage" "CustServCalls" "DayMins" "DayCalls"
## [9] "MonthlyCharge" "OverageFee" "RoamMins"
str(churn)
## spc_tbl_ [3,333 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ AccountWeeks : num [1:3333] 128 107 137 84 75 118 121 147 117 141 ...
## $ ContractRenewal: num [1:3333] 1 1 1 0 0 0 1 0 1 0 ...
## $ DataPlan : num [1:3333] 1 1 0 0 0 0 1 0 0 1 ...
## $ DataUsage : num [1:3333] 2.7 3.7 0 0 0 0 2.03 0 0.19 3.02 ...
## $ CustServCalls : num [1:3333] 1 1 0 2 3 0 3 0 1 0 ...
## $ DayMins : num [1:3333] 265 162 243 299 167 ...
## $ DayCalls : num [1:3333] 110 123 114 71 113 98 88 79 97 84 ...
## $ MonthlyCharge : num [1:3333] 89 82 52 57 41 57 87.3 36 63.9 93.2 ...
## $ OverageFee : num [1:3333] 9.87 9.78 6.06 3.1 7.42 ...
## $ RoamMins : num [1:3333] 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
## - attr(*, "spec")=
## .. cols(
## .. Churn = col_double(),
## .. AccountWeeks = col_double(),
## .. ContractRenewal = col_double(),
## .. DataPlan = col_double(),
## .. DataUsage = col_double(),
## .. CustServCalls = col_double(),
## .. DayMins = col_double(),
## .. DayCalls = col_double(),
## .. MonthlyCharge = col_double(),
## .. OverageFee = col_double(),
## .. RoamMins = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(churn)
## # A tibble: 6 × 11
## Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls DayMins
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 128 1 1 2.7 1 265.
## 2 0 107 1 1 3.7 1 162.
## 3 0 137 1 0 0 0 243.
## 4 0 84 0 0 0 2 299.
## 5 0 75 0 0 0 3 167.
## 6 0 118 0 0 0 0 223.
## # ℹ 4 more variables: DayCalls <dbl>, MonthlyCharge <dbl>, OverageFee <dbl>,
## # RoamMins <dbl>
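Before modelling it is worth checking the class balance, since churn data is typically imbalanced and that colours every accuracy figure that follows; a quick sketch:
table(churn$Churn)               # counts of stayers (0) vs churners (1)
prop.table(table(churn$Churn))   # roughly 86% vs 14%, matching the error rates seen below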
DATA PARTITIONING
ind<-sample(2,nrow(churn),replace=TRUE,prob=c(0.8,0.2))
train<-churn[ind==1,]
test<-churn[ind==2,]
modelling (note: the model below is fit on the full churn data rather than on train, so the train and test error rates that follow are not strictly out-of-sample)
mymodel<-glm(Churn~MonthlyCharge+OverageFee,data=churn,family="binomial")
summary(mymodel)
##
## Call:
## glm(formula = Churn ~ MonthlyCharge + OverageFee, family = "binomial",
## data = churn)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.192084 0.246627 -12.943 < 2e-16 ***
## MonthlyCharge 0.008515 0.003046 2.795 0.00519 **
## OverageFee 0.090140 0.020466 4.404 1.06e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2758.3 on 3332 degrees of freedom
## Residual deviance: 2721.7 on 3330 degrees of freedom
## AIC: 2727.7
##
## Number of Fisher Scoring iterations: 4
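The estimates above are on the log-odds scale. A hedged sketch of converting them to odds ratios for interpretation (confint() profiles the likelihood, so it prints a message and can take a moment):
exp(cbind(OR=coef(mymodel),confint(mymodel)))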
PREDICTION
p1<-predict(mymodel,train,type='response')
head(p1)
## 1 2 3 4 5 6
## 0.17586805 0.16626793 0.09947492 0.08111335 0.10209668 0.15283956
head(train)
## # A tibble: 6 × 11
## Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls DayMins
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 128 1 1 2.7 1 265.
## 2 0 107 1 1 3.7 1 162.
## 3 0 137 1 0 0 0 243.
## 4 0 84 0 0 0 2 299.
## 5 0 75 0 0 0 3 167.
## 6 0 118 0 0 0 0 223.
## # ℹ 4 more variables: DayCalls <dbl>, MonthlyCharge <dbl>, OverageFee <dbl>,
## # RoamMins <dbl>
confusion matrix
pred1<-ifelse(p1>0.5,1,0)
table(pred1)
## pred1
## 0
## 2679
tab1<-table(predict=pred1,actual=train$Churn)
misclassification error (note: at the 0.5 cut-off the model never predicts churn, so it is effectively the majority-class error rate, i.e. the share of churners in train)
1-sum(diag(tab1))/sum(tab1)
## [1] 0.1433371
test data
p2<-predict(mymodel,test,type='response')
pred2<-ifelse(p2>0.5,1,0)
tab2<-table(predicted=pred2,actual=test$Churn)
tab2
## actual
## predicted 0 1
## 0 555 99
misclassification error
1-sum(diag(tab2))/sum(tab2)
## [1] 0.1513761
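Since the 0.5 threshold never flags a churner, a threshold-free view of the model helps. A hedged ROC sketch, assuming the pROC package is installed:
library(pROC)
r<-roc(response=test$Churn,predictor=p2)   # p2 holds the test probabilities
auc(r)    # area under the ROC curve
plot(r)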
ORDERED LOGISTIC REGRESSION
milknew <- read_csv("C:/Users/USER/Desktop/milknew.csv")
## Rows: 1059 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Grade
## dbl (7): pH, Temprature, Taste, Odor, Fat, Turbidity, Colour
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
milknew$Grade<-as.factor(milknew$Grade)
milknew$Taste<-as.factor(milknew$Taste)
milknew$Odor<-as.factor(milknew$Odor)
milknew$Fat<-as.factor(milknew$Fat)
milknew$Turbidity<-as.factor(milknew$Turbidity)
milknew$Colour<-as.factor(milknew$Colour)
str(milknew)
## spc_tbl_ [1,059 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ pH : num [1:1059] 6.6 6.6 8.5 9.5 6.6 6.6 5.5 4.5 8.1 6.7 ...
## $ Temprature: num [1:1059] 35 36 70 34 37 37 45 60 66 45 ...
## $ Taste : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 1 2 2 ...
## $ Odor : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 1 2 1 2 ...
## $ Fat : Factor w/ 2 levels "0","1": 2 1 2 1 1 2 2 2 2 1 ...
## $ Turbidity : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 2 2 1 ...
## $ Colour : Factor w/ 9 levels "240","245","246",..: 8 7 3 9 9 9 6 6 9 4 ...
## $ Grade : Factor w/ 3 levels "high","low","medium": 1 1 2 2 3 1 2 2 2 3 ...
## - attr(*, "spec")=
## .. cols(
## .. pH = col_double(),
## .. Temprature = col_double(),
## .. Taste = col_double(),
## .. Odor = col_double(),
## .. Fat = col_double(),
## .. Turbidity = col_double(),
## .. Colour = col_double(),
## .. Grade = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
head(milknew)
## # A tibble: 6 × 8
## pH Temprature Taste Odor Fat Turbidity Colour Grade
## <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 6.6 35 1 0 1 0 254 high
## 2 6.6 36 0 1 0 1 253 high
## 3 8.5 70 1 1 1 1 246 low
## 4 9.5 34 1 1 0 1 255 low
## 5 6.6 37 0 0 0 0 255 medium
## 6 6.6 37 1 1 1 1 255 high
names(milknew)
## [1] "pH" "Temprature" "Taste" "Odor" "Fat"
## [6] "Turbidity" "Colour" "Grade"
summary(milknew)
## pH Temprature Taste Odor Fat Turbidity Colour
## Min. :3.00 Min. :34.00 0:480 0:601 0:348 0:539 255 :628
## 1st Qu.:6.50 1st Qu.:38.00 1:579 1:458 1:711 1:520 250 :146
## Median :6.70 Median :41.00 245 :115
## Mean :6.63 Mean :44.23 247 : 48
## 3rd Qu.:6.80 3rd Qu.:45.00 246 : 44
## Max. :9.50 Max. :90.00 240 : 32
## (Other): 46
## Grade
## high :256
## low :429
## medium:374
##
##
##
##
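One caveat before modelling: as.factor() ordered the grades alphabetically (high < low < medium), and polr takes the factor's level order as the ordinal scale. A sketch of setting the natural order instead (not applied in the run below; the polr output that follows was generated with the alphabetical ordering):
milknew$Grade<-factor(milknew$Grade,
                      levels=c("low","medium","high"),
                      ordered=TRUE)   # natural ordering for polr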
partitioning
ind<-sample(2,nrow(milknew),replace=T,prob=c(0.8,0.2))
train<-milknew[ind==1,]
test<-milknew[ind==2,]
modelling
library(MASS)
model<-polr(Grade~Temprature + pH+Taste+Odor+Fat+Turbidity+Colour,data=train)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = Grade ~ Temprature + pH + Taste + Odor + Fat +
## Turbidity + Colour, data = train)
##
## Coefficients:
## Value Std. Error t value
## Temprature -0.04308 8.579e-03 -5.021e+00
## pH -0.17298 5.200e-02 -3.327e+00
## Taste1 -1.46961 1.841e-01 -7.984e+00
## Odor1 -2.98636 2.688e-01 -1.111e+01
## Fat1 -0.95355 2.009e-01 -4.746e+00
## Turbidity1 0.19053 2.710e-01 7.031e-01
## Colour245 -10.00488 2.886e-01 -3.467e+01
## Colour246 -11.61656 3.569e-01 -3.255e+01
## Colour247 1.78725 9.263e-05 1.929e+04
## Colour248 -8.86921 4.454e-01 -1.991e+01
## Colour250 -12.44678 2.247e-01 -5.540e+01
## Colour253 -33.74576 1.517e-10 -2.225e+11
## Colour254 -18.49326 3.082e-06 -6.000e+06
## Colour255 -11.57078 1.390e-01 -8.327e+01
##
## Intercepts:
## Value Std. Error t value
## high|low -1.901630e+01 5.527000e-01 -3.440910e+01
## low|medium -1.597480e+01 4.999000e-01 -3.195590e+01
##
## Residual Deviance: 1212.989
## AIC: 1244.989
p values (two-sided, computed below from the t values via the normal approximation p = 2*(1-pnorm(|t|)))
ctable<-coef(summary(model))
##
## Re-fitting to get Hessian
ctable
## Value Std. Error t value
## Temprature -0.0430763 8.579486e-03 -5.020848e+00
## pH -0.1729803 5.199521e-02 -3.326851e+00
## Taste1 -1.4696066 1.840798e-01 -7.983529e+00
## Odor1 -2.9863624 2.688305e-01 -1.110872e+01
## Fat1 -0.9535484 2.009220e-01 -4.745864e+00
## Turbidity1 0.1905287 2.709774e-01 7.031168e-01
## Colour245 -10.0048780 2.885778e-01 -3.466960e+01
## Colour246 -11.6165643 3.569180e-01 -3.254687e+01
## Colour247 1.7872452 9.263105e-05 1.929423e+04
## Colour248 -8.8692105 4.453870e-01 -1.991349e+01
## Colour250 -12.4467785 2.246867e-01 -5.539614e+01
## Colour253 -33.7457602 1.516871e-10 -2.224696e+11
## Colour254 -18.4932576 3.082265e-06 -5.999893e+06
## Colour255 -11.5707791 1.389623e-01 -8.326558e+01
## high|low -19.0163403 5.526548e-01 -3.440907e+01
## low|medium -15.9747577 4.998998e-01 -3.195592e+01
p<-(1-pnorm(abs(ctable[,"t value"])))*2
(ctable<-cbind(ctable,"p value"=p))
## Value Std. Error t value p value
## Temprature -0.0430763 8.579486e-03 -5.020848e+00 5.14e-07
## pH -0.1729803 5.199521e-02 -3.326851e+00 8.78e-04
## Taste1 -1.4696066 1.840798e-01 -7.983529e+00 1.33e-15
## Odor1 -2.9863624 2.688305e-01 -1.110872e+01 0.00e+00
## Fat1 -0.9535484 2.009220e-01 -4.745864e+00 2.08e-06
## Turbidity1 0.1905287 2.709774e-01 7.031168e-01 4.82e-01
## Colour245 -10.0048780 2.885778e-01 -3.466960e+01 0.00e+00
## Colour246 -11.6165643 3.569180e-01 -3.254687e+01 0.00e+00
## Colour247 1.7872452 9.263105e-05 1.929423e+04 0.00e+00
## Colour248 -8.8692105 4.453870e-01 -1.991349e+01 0.00e+00
## Colour250 -12.4467785 2.246867e-01 -5.539614e+01 0.00e+00
## Colour253 -33.7457602 1.516871e-10 -2.224696e+11 0.00e+00
## Colour254 -18.4932576 3.082265e-06 -5.999893e+06 0.00e+00
## Colour255 -11.5707791 1.389623e-01 -8.326558e+01 0.00e+00
## high|low -19.0163403 5.526548e-01 -3.440907e+01 0.00e+00
## low|medium -15.9747577 4.998998e-01 -3.195592e+01 0.00e+00
Every term except Turbidity1 is significant at the 5% level (p values shown to three significant figures; the zeros are floating-point underflow for very large |t|, where the true p values are far below any conventional threshold). The extreme t statistics on some Colour dummies, together with the earlier glm.fit warning about fitted probabilities of 0 or 1, suggest quasi-complete separation, so those coefficients deserve caution.
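A hedged closing sketch: polr coefficients are log-odds, so exponentiate them for odds ratios, and evaluate on the held-out split as in the earlier sections:
exp(coef(model))                 # odds ratios
pred_ord<-predict(model,test)    # default type is the predicted class
(tab_ord<-table(predicted=pred_ord,actual=test$Grade))
sum(diag(tab_ord))/sum(tab_ord)  # held-out accuracy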
TO BE CONTINUED