rm(list = ls())  # clear the workspace

# plyr is loaded before dplyr/tidyverse so that dplyr's verbs are not masked
library(plyr)
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2, purrr, ...
library(mlr)
library(caret)
library(gmodels)
library(e1071)
library(caTools)
library(class)
library(GGally)
library(parallelMap)
library(parallel)
library(rpart.plot)
library(ISLR)
library(tree)
library(corrplot)
library(factoextra)
library(umap)
library(Rtsne)
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, and (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk-factor symbol associated with their price; if a car is more (or less) risky, this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling". A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe.
The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc.) and represents the average loss per car per year.
Categorical variables:
Quantitative variables:
getwd()
## [1] "C:/Users/skirmantas/OneDrive/Desktop"
setwd("C:/Users/skirmantas/OneDrive/Desktop")
bands <- read.csv2("C:/Users/skirmantas/OneDrive/Desktop/Duomenys/duomenys1.csv", header = TRUE, sep = ";", dec = ".")
# Histograms of all numeric variables, one facet per variable
bands %>%
  keep(is.numeric) %>%   # purrr::keep - retain numeric columns only
  gather() %>%           # long format: key = variable name, value = value
  ggplot(aes(value)) +
  facet_wrap(~ key, scales = "free") +
  geom_histogram(bins = 20)
Variables:
They are removed because they carry no useful information: their values are concentrated on only a few levels.
A bar chart is drawn for the categorical variable.
ggplot(bands, aes(x = as.factor(bandtype))) +
  geom_bar(color = "red", fill = rgb(0.7, 0.4, 0.5, 0.6)) +
  ggtitle("Band type") +
  xlab("Class") + ylab("Value")
The chart shows that the observations are not concentrated in a single level of the binary variable.
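The class counts behind the bar chart can also be checked numerically; a quick sketch:
table(bands$bandtype)   # frequency of each level of the target variable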
The cleaned data are read in.
getwd()
## [1] "C:/Users/skirmantas/OneDrive/Desktop"
setwd("C:/Users/skirmantas/OneDrive/Desktop")
bands <- read.csv2("C:/Users/skirmantas/OneDrive/Desktop/Duomenys/duomenys22.csv", header = TRUE, sep = ";", dec = ".")
Missing values are removed.
anyNA(bands)   # does the data frame contain any missing values?
## [1] TRUE
bands[bands == "?"] <- NA   # "?" marks missing values in the raw file
bands <- na.omit(bands)     # drop every row that contains a missing value
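The removed values could also be located per column before dropping them; a minimal sketch (informative only if run before na.omit()):
colSums(is.na(bands))   # number of missing values in each column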
The class label is converted to a factor and the remaining variables to the integer type.
bands$bandtype <- as.factor(bands$bandtype)
bands$unitnumber <- as.integer(bands$unitnumber)
bands$press <- as.integer(bands$press)
bands$platingtank <- as.integer(bands$platingtank)
bands$proofcut <- as.integer(bands$proofcut)
bands$viscosity <- as.integer(bands$viscosity)
bands$caliper <- as.integer(bands$caliper)
bands$roughness <- as.integer(bands$roughness)
bands$bladepressure <- as.integer(bands$bladepressure)
bands$speed <- as.integer(bands$speed)
str(bands)
## 'data.frame': 205 obs. of 20 variables:
## $ press : int 824 827 827 827 815 815 815 827 827 816 ...
## $ unitnumber : int 2 9 9 2 2 9 2 2 9 9 ...
## $ platingtank : int 0 0 0 0 0 0 1 0 0 0 ...
## $ proofcut : int 30 40 50 50 37 37 35 60 52 40 ...
## $ viscosity : int 43 42 45 45 44 44 44 43 43 46 ...
## $ caliper : int 0 0 0 0 0 0 0 0 0 0 ...
## $ temperature : num 16.3 14.5 15 15.2 16 16.5 16 15.8 16.6 15.9 ...
## $ humifity : int 70 74 76 72 90 91 80 57 58 78 ...
## $ roughness : int 0 1 1 1 0 0 1 0 0 0 ...
## $ bladepressure: int 25 25 30 25 28 30 32 24 20 34 ...
## $ narnish : num 1.2 8 8 5.9 20 21.7 7 1.2 7.9 2.4 ...
## $ speed : int 2200 2100 2150 2150 2050 2050 1400 1480 1480 2000 ...
## $ ink : num 58.1 57.5 57.5 58.8 45 43.5 58.1 58.1 56.2 61 ...
## $ solvent : num 40.7 34.5 34.5 35.3 35 34.8 34.9 40.7 36 36.6 ...
## $ Voltage : num 4 0 0 0 0 0 0 0 0 0 ...
## $ hardener : num 1 0.7 1 1 0.8 0.8 1 1.7 0.7 1.3 ...
## $ durometer : int 30 35 35 35 35 35 28 33 33 35 ...
## $ density : int 40 40 40 40 40 40 40 40 40 40 ...
## $ spaceratio : num 96.9 107.4 107.4 107.4 107.4 ...
## $ bandtype : Factor w/ 2 levels "band","noband": 2 1 2 2 2 2 2 2 2 2 ...
## - attr(*, "na.action")= 'omit' Named int [1:8] 56 77 161 182 210 211 212 213
## ..- attr(*, "names")= chr [1:8] "56" "77" "161" "182" ...
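The column-by-column conversions above can be written more compactly with dplyr; a sketch assuming the same column names (int_cols is a helper introduced here):
# compact equivalent of the conversions above
int_cols <- c("unitnumber", "press", "platingtank", "proofcut", "viscosity",
              "caliper", "roughness", "bladepressure", "speed")
bands <- bands %>%
  mutate(bandtype = as.factor(bandtype),
         across(all_of(int_cols), as.integer))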
# 90/10 train/test split (kept for reference; the mlr resampling strategies below create their own splits)
smp_size <- floor(0.9 * nrow(bands))
set.seed(10)
train_ind <- sample(seq_len(nrow(bands)), size = smp_size)
bands_train <- bands[train_ind, ]
bands_test <- bands[-train_ind, ]
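For an imbalanced label such as bandtype a class-stratified split is often preferable; a sketch using caret::createDataPartition (caret is already loaded):
set.seed(10)
# sample within each level of bandtype so both subsets keep the class proportions
in_train <- caret::createDataPartition(bands$bandtype, p = 0.9, list = FALSE)
bands_train_strat <- bands[in_train, ]
bands_test_strat <- bands[-in_train, ]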
LDA model
bands <- bands[, -6]   # drop the 6th column (caliper), whose values are concentrated on a few levels
bandsTask <- makeClassifTask(data = bands, target = "bandtype")
bands <- as_tibble(bands)   # note: tibbles trigger a conversion warning in later makeClassifTask() calls
lda <- makeLearner("classif.lda")
holdout <- makeResampleDesc(method = "Holdout", split = 3/6, stratify = TRUE)   # 50/50 stratified split
set.seed(123)
holdoutCV_lda <- resample(learner = lda, task = bandsTask, resampling = holdout, measures = list(mmce, acc))
## Resampling: holdout
## Measures: mmce acc
## [Resample] iter 1: 0.1747573 0.8252427
##
## Aggregated Result: mmce.test.mean=0.1747573,acc.test.mean=0.8252427
##
The accuracy and error estimates of the LDA model show that about 83 percent of the observations are classified correctly.
calculateConfusionMatrix(holdoutCV_lda$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.50/0.61 0.50/0.13 0.50
## noband 0.09/0.39 0.91/0.87 0.09
## -err.- 0.39 0.13 0.17
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 11 11 11
## noband 7 74 7
## -err.- 7 11 18
Of the 103 test observations, 18 are misclassified.
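The accuracy reported above can be recovered directly from the absolute confusion matrix; a quick check:
(11 + 74) / (11 + 11 + 7 + 74)   # correct / total = 0.8252427, matching acc.test.mean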
bandsTask <- makeClassifTask(data = bands, target = "bandtype")
## Warning in makeTask(type = type, data = data, weights = weights, blocking =
## blocking, : Provided data is not a pure data.frame but from class tbl_df, hence
## it will be converted.
kFold <- makeResampleDesc(method = "RepCV", folds = 10, stratify = TRUE)
set.seed(10)
kfold_ldaCV <- resample(learner = lda, task = bandsTask, resampling = kFold, measures = list(mlr::mmce, mlr::acc))
## Resampling: repeated cross-validation
## Measures: mmce acc
## [Resample] iter 1: 0.0952381 0.9047619
## [Resample] iter 2: 0.1000000 0.9000000
## [Resample] iter 3: 0.1904762 0.8095238
## ... (iterations 4 through 99 omitted for brevity) ...
## [Resample] iter 100: 0.0000000 1.0000000
##
## Aggregated Result: mmce.test.mean=0.1446407,acc.test.mean=0.8553593
##
After repeated 10-fold cross-validation the LDA model classifies about 86 percent of the observations correctly, roughly 3 percentage points better than the single-holdout estimate.
calculateConfusionMatrix(kfold_ldaCV$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.45/0.76 0.55/0.13 0.55
## noband 0.04/0.24 0.96/0.87 0.04
## -err.- 0.24 0.13 0.14
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 194 236 236
## noband 61 1559 61
## -err.- 61 236 297
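The absolute counts sum to 2050 rather than 205 because RepCV pools the predictions of all 10 repeats; a quick check:
194 + 236 + 61 + 1559   # = 2050 = 205 observations x 10 repeats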
LOO validation
LOO <- makeResampleDesc(method = "LOO")
set.seed(50)
lda <- makeLearner("classif.lda")
lda_LOO <- resample(learner = lda, task = bandsTask, resampling = LOO,
measures = list(mmce, acc))
## Resampling: LOO
## Measures: mmce acc
## [Resample] iter 1: 0.0000000 1.0000000
## [Resample] iter 2: 1.0000000 0.0000000
## [Resample] iter 3: 0.0000000 1.0000000
## ... (one iteration per held-out observation; iterations 4 through 204 omitted for brevity) ...
## [Resample] iter 205: 0.0000000 1.0000000
##
## Aggregated Result: mmce.test.mean=0.1414634,acc.test.mean=0.8585366
##
lda_LOO$aggr
## mmce.test.mean acc.test.mean
## 0.1414634 0.8585366
Leave-one-out validation classifies about 86 percent of the observations correctly.
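In mlr, LOO is simply n-fold cross-validation with one observation per fold; an equivalent description would be:
LOO_equiv <- makeResampleDesc(method = "CV", iters = nrow(bands))   # 205 folds of size 1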
bandsTask <- makeClassifTask(data = bands , target = "bandtype")
## Warning in makeTask(type = type, data = data, weights = weights, blocking =
## blocking, : Provided data is not a pure data.frame but from class tbl_df, hence
## it will be converted.
knnParamSpace <- makeParamSet(makeDiscreteParam("k", values = 1:10))
gridSearch <- makeTuneControlGrid()
set.seed(10)
holdout <- makeResampleDesc(method = "Holdout", split = 3/5, stratify = TRUE)
tunedKCv <- tuneParams("classif.knn", task = bandsTask, resampling = holdout, par.set = knnParamSpace, control = gridSearch)
## [Tune] Started tuning learner classif.knn for parameter set:
## Type len Def Constr Req Tunable Trafo
## k discrete - - 1,2,3,4,5,6,7,8,9,10 - TRUE -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: k=1
## [Tune-y] 1: mmce.test.mean=0.0722892; time: 0.0 min
## [Tune-x] 2: k=2
## [Tune-y] 2: mmce.test.mean=0.1566265; time: 0.0 min
## [Tune-x] 3: k=3
## [Tune-y] 3: mmce.test.mean=0.1686747; time: 0.0 min
## [Tune-x] 4: k=4
## [Tune-y] 4: mmce.test.mean=0.1566265; time: 0.0 min
## [Tune-x] 5: k=5
## [Tune-y] 5: mmce.test.mean=0.1445783; time: 0.0 min
## [Tune-x] 6: k=6
## [Tune-y] 6: mmce.test.mean=0.1325301; time: 0.0 min
## [Tune-x] 7: k=7
## [Tune-y] 7: mmce.test.mean=0.1566265; time: 0.0 min
## [Tune-x] 8: k=8
## [Tune-y] 8: mmce.test.mean=0.1807229; time: 0.0 min
## [Tune-x] 9: k=9
## [Tune-y] 9: mmce.test.mean=0.1807229; time: 0.0 min
## [Tune-x] 10: k=10
## [Tune-y] 10: mmce.test.mean=0.1807229; time: 0.0 min
## [Tune] Result: k=1 : mmce.test.mean=0.0722892
knnTuningData <- generateHyperParsEffectData(tunedKCv)
plotHyperParsEffect(knnTuningData, x = "k", y = "mmce.test.mean", plot.type = "line") + theme_bw()
The plot of mmce against k shows the lowest misclassification error at k = 1; larger values of k perform noticeably worse on this holdout split. Because 1-NN is prone to overfitting, k = 10 is nevertheless used for the models below.
tunedKCv
## Tune result:
## Op. pars: k=1
## mmce.test.mean=0.0722892
With the tuned value k = 1, the nearest-neighbour method misclassifies only about 7 percent of the holdout observations (mmce ≈ 0.072), i.e. about 93 percent are classified correctly.
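The same classifier can be fit directly with class::knn; a minimal sketch using the 90/10 split created earlier (note that the predictors are not rescaled here, although scaling is usually advisable for KNN):
set.seed(10)
pred_cols <- setdiff(names(bands_train), "bandtype")   # numeric predictors only
knn_pred <- knn(train = bands_train[, pred_cols],
                test = bands_test[, pred_cols],
                cl = bands_train$bandtype,
                k = 10)
mean(knn_pred == bands_test$bandtype)   # holdout accuracy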
knn <- makeLearner("classif.knn", par.vals = list("k" = 10))
holdoutNoStrat <- makeResampleDesc(method = "Holdout", split = 0.5, stratify = FALSE)
set.seed(10)
holdoutNoStratCV <- resample(learner = knn, task = bandsTask, resampling = holdoutNoStrat, measures = list(mmce, acc))
## Resampling: holdout
## Measures: mmce acc
## [Resample] iter 1: 0.2038835 0.7961165
##
## Aggregated Result: mmce.test.mean=0.2038835,acc.test.mean=0.7961165
##
With k = 10 and a non-stratified 50/50 holdout split, the model classifies about 80 percent of the observations correctly.
calculateConfusionMatrix(holdoutNoStratCV$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.30/0.58 0.70/0.18 0.70
## noband 0.06/0.42 0.94/0.82 0.06
## -err.- 0.42 0.18 0.20
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 7 16 16
## noband 5 75 5
## -err.- 5 16 21
set.seed(10)
holdoutCV <- resample(learner = knn, task = bandsTask, resampling = holdout, measures = list(mlr::mmce, mlr::acc))
## Resampling: holdout
## Measures: mmce acc
## [Resample] iter 1: 0.1807229 0.8192771
##
## Aggregated Result: mmce.test.mean=0.1807229,acc.test.mean=0.8192771
##
With the stratified holdout split, the KNN accuracy improves by about 2 percentage points, to roughly 82 percent.
holdoutCV$aggr
## mmce.test.mean acc.test.mean
## 0.1807229 0.8192771
calculateConfusionMatrix(holdoutCV$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.28/0.71 0.72/0.17 0.72
## noband 0.03/0.29 0.97/0.83 0.03
## -err.- 0.29 0.17 0.18
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 5 13 13
## noband 2 63 2
## -err.- 2 13 15
kfold <- makeResampleDesc(method = "RepCV", folds = 10)
set.seed(10)
kfoldCV <- resample(learner = knn, task = bandsTask, resampling = kfold , measures = list(mlr::mmce, mlr::acc))
kfoldCV$aggr
## mmce.test.mean acc.test.mean
## 0.1545238 0.8454762
With repeated 10-fold cross-validation, the KNN accuracy rises to about 85 percent, roughly 5 percentage points above the non-stratified holdout estimate.
calculateConfusionMatrix(kfoldCV$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.50/0.68 0.50/0.12 0.50
## noband 0.06/0.32 0.94/0.88 0.06
## -err.- 0.32 0.12 0.15
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 214 216 216
## noband 101 1519 101
## -err.- 101 216 317
bandsT <- as_tibble(bands)
bandsTask <- makeClassifTask(data = bandsT, target = "bandtype")
## Warning in makeTask(type = type, data = data, weights = weights, blocking =
## blocking, : Provided data is not a pure data.frame but from class tbl_df, hence
## it will be converted.
cvForTuning <- makeResampleDesc("Holdout", split = 0.9)
kernels <- c("polynomial", "radial", "sigmoid")
svmParamSpace <- makeParamSet(makeDiscreteParam("kernel", values = kernels),
makeIntegerParam("degree", lower = 1, upper = 3),
makeNumericParam("cost", lower = 0.1, upper = 10),
makeNumericParam("gamma", lower = 0.1, 10))
randSearch <- makeTuneControlRandom(maxit = 10)
outer <- makeResampleDesc("CV", iters = 3)
svmWrapper <- makeTuneWrapper("classif.svm", resampling = cvForTuning,
par.set = svmParamSpace, control = randSearch)
cvWithTuning <- resample(learner = svmWrapper, task = bandsTask, resampling = outer, measures = list(mmce, acc))
## Resampling: cross-validation
## Measures: mmce acc
## [Tune] Started tuning learner classif.svm for parameter set:
## Type len Def Constr Req Tunable Trafo
## kernel discrete - - polynomial,radial,sigmoid - TRUE -
## degree integer - - 1 to 3 - TRUE -
## cost numeric - - 0.1 to 10 - TRUE -
## gamma numeric - - 0.1 to 10 - TRUE -
## With control class: TuneControlRandom
## Imputation value: 1
## [Tune-x] 1: kernel=polynomial; degree=1; cost=7.42; gamma=0.767
## [Tune-y] 1: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 2: kernel=sigmoid; degree=2; cost=4.89; gamma=5.05
## [Tune-y] 2: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 3: kernel=sigmoid; degree=2; cost=4.77; gamma=4.6
## [Tune-y] 3: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 4: kernel=sigmoid; degree=3; cost=0.117; gamma=5.8
## [Tune-y] 4: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 5: kernel=sigmoid; degree=1; cost=4.81; gamma=2.58
## [Tune-y] 5: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 6: kernel=sigmoid; degree=3; cost=5.73; gamma=1.29
## [Tune-y] 6: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 7: kernel=sigmoid; degree=2; cost=6.13; gamma=3.05
## [Tune-y] 7: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 8: kernel=polynomial; degree=3; cost=6.89; gamma=3.3
## [Tune-y] 8: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 9: kernel=polynomial; degree=2; cost=0.641; gamma=9.73
## [Tune-y] 9: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 10: kernel=radial; degree=1; cost=8.98; gamma=4.04
## [Tune-y] 10: mmce.test.mean=0.0714286; time: 0.0 min
## [Tune] Result: kernel=polynomial; degree=3; cost=6.89; gamma=3.3 : mmce.test.mean=0.0000000
## [Resample] iter 1: 0.0294118 0.9705882
## [Tune] Started tuning learner classif.svm for parameter set:
## Type len Def Constr Req Tunable Trafo
## kernel discrete - - polynomial,radial,sigmoid - TRUE -
## degree integer - - 1 to 3 - TRUE -
## cost numeric - - 0.1 to 10 - TRUE -
## gamma numeric - - 0.1 to 10 - TRUE -
## With control class: TuneControlRandom
## Imputation value: 1
## [Tune-x] 1: kernel=polynomial; degree=1; cost=9.56; gamma=9.87
## [Tune-y] 1: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 2: kernel=radial; degree=2; cost=3.85; gamma=7.61
## [Tune-y] 2: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 3: kernel=polynomial; degree=1; cost=0.598; gamma=7.97
## [Tune-y] 3: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 4: kernel=radial; degree=3; cost=4.8; gamma=3.3
## [Tune-y] 4: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 5: kernel=radial; degree=1; cost=3.92; gamma=3.1
## [Tune-y] 5: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 6: kernel=polynomial; degree=3; cost=2.11; gamma=8.43
## [Tune-y] 6: mmce.test.mean=0.0714286; time: 0.0 min
## [Tune-x] 7: kernel=sigmoid; degree=3; cost=0.268; gamma=6.14
## [Tune-y] 7: mmce.test.mean=0.0714286; time: 0.0 min
## [Tune-x] 8: kernel=sigmoid; degree=2; cost=6.01; gamma=0.416
## [Tune-y] 8: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 9: kernel=polynomial; degree=2; cost=0.347; gamma=5.02
## [Tune-y] 9: mmce.test.mean=0.0714286; time: 0.0 min
## [Tune-x] 10: kernel=radial; degree=2; cost=8.84; gamma=9.06
## [Tune-y] 10: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune] Result: kernel=radial; degree=1; cost=3.92; gamma=3.1 : mmce.test.mean=0.0000000
## [Resample] iter 2: 0.1014493 0.8985507
## [Tune] Started tuning learner classif.svm for parameter set:
## Type len Def Constr Req Tunable Trafo
## kernel discrete - - polynomial,radial,sigmoid - TRUE -
## degree integer - - 1 to 3 - TRUE -
## cost numeric - - 0.1 to 10 - TRUE -
## gamma numeric - - 0.1 to 10 - TRUE -
## With control class: TuneControlRandom
## Imputation value: 1
## [Tune-x] 1: kernel=radial; degree=1; cost=4.09; gamma=8.01
## [Tune-y] 1: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 2: kernel=polynomial; degree=1; cost=2.56; gamma=9.82
## [Tune-y] 2: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 3: kernel=polynomial; degree=1; cost=9.98; gamma=0.313
## [Tune-y] 3: mmce.test.mean=0.1428571; time: 0.0 min
## [Tune-x] 4: kernel=radial; degree=1; cost=4.49; gamma=7.17
## [Tune-y] 4: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 5: kernel=sigmoid; degree=3; cost=8.08; gamma=1.66
## [Tune-y] 5: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 6: kernel=radial; degree=3; cost=1.43; gamma=7.86
## [Tune-y] 6: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 7: kernel=radial; degree=3; cost=1.77; gamma=1.33
## [Tune-y] 7: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 8: kernel=sigmoid; degree=3; cost=4.48; gamma=7.06
## [Tune-y] 8: mmce.test.mean=0.2857143; time: 0.0 min
## [Tune-x] 9: kernel=polynomial; degree=3; cost=9.37; gamma=2.66
## [Tune-y] 9: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune-x] 10: kernel=radial; degree=1; cost=9.75; gamma=0.479
## [Tune-y] 10: mmce.test.mean=0.0000000; time: 0.0 min
## [Tune] Result: kernel=polynomial; degree=3; cost=9.37; gamma=2.66 : mmce.test.mean=0.0000000
## [Resample] iter 3: 0.1176471 0.8823529
##
## Aggregated Result: mmce.test.mean=0.0828360,acc.test.mean=0.9171640
##
calculateConfusionMatrix(cvWithTuning$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.70/0.88 0.30/0.08 0.30
## noband 0.02/0.12 0.98/0.92 0.02
## -err.- 0.12 0.08 0.08
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 30 13 13
## noband 4 158 4
## -err.- 4 13 17
Of the 205 observations, the SVM model classifies 188 correctly.
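The hyperparameters chosen in each outer fold can be inspected by passing extract = getTuneResult to resample(); a sketch re-using the objects above:
cvWithTuning <- resample(learner = svmWrapper, task = bandsTask,
                         resampling = outer, measures = list(mmce, acc),
                         extract = getTuneResult)   # keep the inner-loop tuning result of each fold
cvWithTuning$extract   # best kernel/degree/cost/gamma per outer fold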
bandsTask <- makeClassifTask(data = bands, target = "bandtype")
## Warning in makeTask(type = type, data = data, weights = weights, blocking =
## blocking, : Provided data is not a pure data.frame but from class tbl_df, hence
## it will be converted.
kernels <- c("polynomial", "radial", "sigmoid")
svmParamSpace <- makeParamSet(
makeDiscreteParam("kernel", values = kernels),
makeIntegerParam("degree", lower = 1, upper = 3),
makeNumericParam("cost", lower = 0.1, upper = 10),
makeNumericParam("gamma", lower = 0.1, 10))
set.seed(10)
randSearch <- makeTuneControlRandom(maxit = 10)
cvForTuning <- makeResampleDesc("Holdout", split = 1/3)
library(parallelMap)
library(parallel)
parallelStartSocket(cpus = detectCores())
## Starting parallelization in mode=socket with cpus=8.
set.seed(10)
tunedSvmPars <- tuneParams("classif.svm", task = bandsTask,
resampling = cvForTuning,
par.set = svmParamSpace,
control = randSearch)
## [Tune] Started tuning learner classif.svm for parameter set:
## Type len Def Constr Req Tunable Trafo
## kernel discrete - - polynomial,radial,sigmoid - TRUE -
## degree integer - - 1 to 3 - TRUE -
## cost numeric - - 0.1 to 10 - TRUE -
## gamma numeric - - 0.1 to 10 - TRUE -
## With control class: TuneControlRandom
## Imputation value: 1
## Exporting objects to slaves for mode socket: .mlr.slave.options
## Mapping in parallel: mode = socket; level = mlr.tuneParams; cpus = 8; elements = 10.
## [Tune] Result: kernel=polynomial; degree=3; cost=9.58; gamma=9.81 : mmce.test.mean=0.1313869
parallelStop()
## Stopped parallelization. All cleaned up.
tunedSvmPars
## Tune result:
## Op. pars: kernel=polynomial; degree=3; cost=9.58; gamma=9.81
## mmce.test.mean=0.1313869
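The tuned hyperparameters can be plugged into a final learner and trained on the full task; a sketch using mlr's setHyperPars():
tunedSvm <- setHyperPars(makeLearner("classif.svm"), par.vals = tunedSvmPars$x)
tunedSvmModel <- train(tunedSvm, bandsTask)   # final SVM fitted on all observations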
logReg <- makeLearner("classif.logreg", predict.type = "prob")
logRegWrapper <- makeImputeWrapper("classif.logreg")   # logistic regression wrapped so missing values are imputed before fitting
holdout <- makeResampleDesc(method = "Holdout", split = 4/5, stratify = TRUE)
set.seed(123)
logRegwithImpute <- resample(logRegWrapper, bandsTask,
resampling = holdout,
measures = list(acc, fpr, fnr))
## Resampling: holdout
## Measures: acc fpr fnr
## [Resample] iter 1: 0.8571429 0.0606061 0.4444444
##
## Aggregated Result: acc.test.mean=0.8571429,fpr.test.mean=0.0606061,fnr.test.mean=0.4444444
##
calculateConfusionMatrix(logRegwithImpute$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.56/0.71 0.44/0.11 0.44
## noband 0.06/0.29 0.94/0.89 0.06
## -err.- 0.29 0.11 0.14
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 5 4 4
## noband 2 31 2
## -err.- 2 4 6
The logistic regression model classifies about 86 percent of the observations correctly.
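The fitted coefficients can be turned into odds ratios by fitting on the full task and exponentiating; a sketch:
logRegModel <- train(logReg, bandsTask)           # fit on all observations
logRegCoefs <- coef(getLearnerModel(logRegModel))
exp(logRegCoefs)                                  # odds ratio per one-unit increase in each predictor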
kFold <- makeResampleDesc(method = "CV", iters = 10)
set.seed(123)
logRegwithImpute <- resample(logRegWrapper, bandsTask,
resampling = kFold,
measures = list(acc, fpr, fnr))
## Resampling: cross-validation
## Measures: acc fpr fnr
## [Resample] iter 1: 0.9523810 0.0625000 0.0000000
## [Resample] iter 2: 0.9523810 0.0000000 0.2500000
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## [Resample] iter 3: 0.8500000 0.0625000 0.5000000
## [Resample] iter 4: 0.9000000 0.0714286 0.1666667
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## [Resample] iter 5: 0.8000000 0.1764706 0.3333333
## [Resample] iter 6: 0.9523810 0.0000000 0.2500000
## [Resample] iter 7: 0.9000000 0.0000000 0.3333333
## [Resample] iter 8: 0.9047619 0.1111111 0.0000000
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## [Resample] iter 9: 0.8571429 0.0000000 0.5000000
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## [Resample] iter 10: 0.7000000 0.2222222 1.0000000
##
## Aggregated Result: acc.test.mean=0.8769048,fpr.test.mean=0.0706232,fnr.test.mean=0.3333333
##
logRegwithImpute$aggr
## acc.test.mean fpr.test.mean fnr.test.mean
## 0.87690476 0.07062325 0.33333333
calculateConfusionMatrix(logRegwithImpute$pred, relative = TRUE)
## Relative confusion matrix (normalized by row/column):
## predicted
## true band noband -err.-
## band 0.70/0.71 0.30/0.08 0.30
## noband 0.07/0.29 0.93/0.92 0.07
## -err.- 0.29 0.08 0.12
##
##
## Absolute confusion matrix:
## predicted
## true band noband -err.-
## band 30 13 13
## noband 12 150 12
## -err.- 12 13 25
After 10-fold cross-validation, the logistic regression accuracy improves by about 2 percentage points, to roughly 88 percent.
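Finally, the models can be compared on identical folds with mlr's benchmark(); a sketch re-using the learners defined above:
set.seed(123)
bench <- benchmark(learners = list(lda, knn, logReg),
                   tasks = bandsTask,
                   resamplings = kFold,
                   measures = list(mlr::mmce, mlr::acc))
bench   # aggregated mmce/acc per learner on the same folds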