- Install the caret package, run names(getModelInfo()), and review the
list of available methods. Perform a graphical exploratory data
analysis with the featurePlot() function on the data set from the
caret package help file:
## [1] "The points show the values in the data set; the distance between points reflects the step size between the generated values for factors A and B"
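
A minimal sketch of this task, assuming caret is installed. The simulated two-class data (classes A and B) is modelled on the example in the featurePlot() help page; the seed, the object names, and the choice of the "pairs" and "box" plot types are assumptions made for illustration, not the code that produced the output above.

```r
# Sketch only: assumes the caret package is installed.
library(caret)

# Method names that caret::train() accepts.
methods <- names(getModelInfo())
head(methods)

# Simulated two-class data in the spirit of the featurePlot() help example
# (seed, dimensions, and column names are illustrative assumptions).
set.seed(42)
x <- matrix(rnorm(50 * 5), ncol = 5)
colnames(x) <- paste0("X", 1:5)
y <- factor(rep(c("A", "B"), 25))

# Scatterplot matrix coloured by class, then per-feature box plots.
p_pairs <- featurePlot(x = x, y = y, plot = "pairs",
                       auto.key = list(columns = 2))
p_box <- featurePlot(x = x, y = y, plot = "box")
print(p_pairs)
print(p_box)
```

featurePlot() returns a lattice object, so the plots can be stored and printed later or saved to a file.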

- Using functions from the FSelector package, determine the importance
of the features for a classification task. Use the data(iris) data
set. Draw conclusions.
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## [1] "Petal.Width" "Species"
## [1] "Petal.Width" "Species" "Sepal.Length" "Sepal.Width"
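
One way to rank the iris features for classification is sketched below, assuming FSelector (and the Java runtime it depends on) is installed. Information gain is only one of the filters the package offers, and using Species as the class variable and keeping the top two attributes are assumptions for illustration; this is not necessarily the call that produced the output above.

```r
# Sketch only: assumes the FSelector package is installed.
library(FSelector)

data(iris)

# Information gain of every predictor with respect to the class label.
weights <- information.gain(Species ~ ., data = iris)
print(weights)

# Keep the two highest-scoring attributes (k = 2 is an arbitrary choice).
top2 <- cutoff.k(weights, k = 2)
print(top2)

# Build a model formula from the selected attributes.
f <- as.simple.formula(top2, "Species")
print(f)
```

With Species on the left-hand side of the formula, only the four measurement columns can appear among the selected attributes.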
##    V1 V2 V3 V4   V5 V6 V7 V8    V9  V10 V11   V12 V13
## 5   1  5  1  5 5760  3 51 54 45.32 1450  25 57.02  60
## 6   1  6  2  6 5720  4 69 35 49.64 1568  15 53.78  60
## 7   1  7  3  4 5790  6 19 45 46.40 2631 -33 54.14 100
## 8   1  8  4  4 5790  3 25 55 52.70  554 -28 64.76 250
## 9   1  9  5  6 5700  3 73 41 48.02 2083  23 52.52 120
## 12  1 12  1  6 5720  3 44 51 54.32  111   9 63.14 150
## After 11 iterations, +0.4 secs:
## confirmed 9 attributes: V1, V10, V11, V12, V13 and 4 more;
## rejected 1 attribute: V3;
## still have 2 attributes left.
## After 15 iterations, +0.53 secs:
## rejected 1 attribute: V2;
## still have 1 attribute left.
## After 18 iterations, +0.62 secs:
## rejected 1 attribute: V6;
## no more attributes left.
##       meanImp  medianImp     minImp     maxImp   normHits  decision
## V1   9.294488  9.4828445  7.5481248 10.1250310 1.00000000 Confirmed
## V2   0.751229  0.4090303 -0.9443525  2.8436619 0.05555556  Rejected
## V3  -1.683898 -1.4729796 -3.0250587 -0.1911083 0.00000000  Rejected
## V5   9.075544  9.3149825  7.8228538  9.8149683 1.00000000 Confirmed
## V6   1.399087  1.4392659 -0.3476106  3.7737265 0.11111111  Rejected
## V7  11.396456 11.4072583 10.1960760 12.3952003 1.00000000 Confirmed
## V8  16.716411 16.8055075 15.8156346 17.5830431 1.00000000 Confirmed
## V9  19.484823 19.4327618 18.2069628 20.7035009 1.00000000 Confirmed
## V10 10.156480 10.3140914  8.9555323 11.1066703 1.00000000 Confirmed
## V11 12.274802 12.2261779 11.2625527 14.4806432 1.00000000 Confirmed
## V12 14.604610 14.6729560 13.3411125 16.1665801 1.00000000 Confirmed
## V13  9.311753  9.4047466  7.9339483 10.4615078 1.00000000 Confirmed

- Using the discretize() function from the arules package, convert a
continuous variable into a categorical one with different methods:
"interval" (equal interval width), "frequency" (equal frequency),
"cluster" (clustering), and "fixed" (the interval boundaries are given
explicitly). Use the iris data set. Draw conclusions.

## Warning in discretize(x, categories = 3): Parameter categories is deprecated.
## Use breaks instead! Also, the default method is now frequency!
##
## [0.1,0.867) [0.867,1.6)   [1.6,2.5]
##          50          48          52
## Warning in discretize(x, categories = 3, onlycuts = TRUE): Parameter categories
## is deprecated. Use breaks instead! Also, the default method is now frequency!
## Warning in discretize(x, "frequency", categories = 3): Parameter categories is
## deprecated. Use breaks instead! Also, the default method is now frequency!
##
## [0.1,0.867) [0.867,1.6)   [1.6,2.5]
##          50          48          52
## Warning in discretize(x, method = "frequency", categories = 3, onlycuts = TRUE):
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!
## Warning in discretize(x, "cluster", categories = 3): Parameter categories is
## deprecated. Use breaks instead! Also, the default method is now frequency!
##
##  [0.1,0.792) [0.792,1.71)   [1.71,2.5]
##           50           54           46
## Warning in discretize(x, method = "cluster", categories = 3, onlycuts = TRUE):
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!
## Warning in discretize(x, "fixed", categories = c(-Inf, 0.8, Inf)): Parameter
## categories is deprecated. Use breaks instead! Also, the default method is now
## frequency!
##
## [-Inf,0.8) [0.8, Inf]
##         50        100
## Warning in discretize(x, "fixed", categories = c(-Inf, 0.8, Inf), labels =
## c("small", : Parameter categories is deprecated. Use breaks instead! Also, the
## default method is now frequency!
##
## small large
##    50   100
## Warning in discretize(x, method = "fixed", categories = c(-Inf, 0.8, Inf), :
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!

## Warning in discretize(iris[, i], "frequency", categories = 3): Parameter
## categories is deprecated. Use breaks instead! Also, the default method is now
## frequency!
##     items                      transactionID
## [1] {Sepal.Length=[4.3,5.4),
##      Sepal.Width=[3.2,4.4],
##      Petal.Length=[1,2.63),
##      Petal.Width=[0.1,0.867),
##      Species=setosa}                       1
##   Sepal.Length=[4.3,5.4) Sepal.Length=[5.4,6.3) Sepal.Length=[6.3,7.9]
## 1                   TRUE                  FALSE                  FALSE
## 2                   TRUE                  FALSE                  FALSE
## 3                   TRUE                  FALSE                  FALSE
##   Sepal.Width=[2,2.9) Sepal.Width=[2.9,3.2) Sepal.Width=[3.2,4.4]
## 1               FALSE                 FALSE                  TRUE
## 2               FALSE                  TRUE                 FALSE
## 3               FALSE                 FALSE                  TRUE
##   Petal.Length=[1,2.63) Petal.Length=[2.63,4.9) Petal.Length=[4.9,6.9]
## 1                  TRUE                   FALSE                  FALSE
## 2                  TRUE                   FALSE                  FALSE
## 3                  TRUE                   FALSE                  FALSE
##   Petal.Width=[0.1,0.867) Petal.Width=[0.867,1.6) Petal.Width=[1.6,2.5]
## 1                    TRUE                   FALSE                 FALSE
## 2                    TRUE                   FALSE                 FALSE
## 3                    TRUE                   FALSE                 FALSE
##   Species=setosa Species=versicolor Species=virginica
## 1           TRUE              FALSE             FALSE
## 2           TRUE              FALSE             FALSE
## 3           TRUE              FALSE             FALSE
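
The warnings above indicate that the `categories` argument is deprecated and that the default method is now "frequency", so a call without an explicit method no longer produces equal-width intervals. A minimal sketch of the four methods with the current `breaks` argument follows, assuming arules is installed. Taking `x` to be `iris$Petal.Width` is an assumption based on the interval ranges printed above, and the seed and object names are illustrative.

```r
# Sketch only: assumes the arules package is installed.
library(arules)

# Assumed input: the interval ranges in the output match Petal.Width.
x <- iris$Petal.Width

# Name the method explicitly; the default is now "frequency".
by_interval  <- discretize(x, method = "interval",  breaks = 3)
by_frequency <- discretize(x, method = "frequency", breaks = 3)

# "cluster" relies on k-means, so fix the seed for repeatable cut points.
set.seed(1)
by_cluster <- discretize(x, method = "cluster", breaks = 3)

# "fixed" takes the interval boundaries themselves.
by_fixed <- discretize(x, method = "fixed",
                       breaks = c(-Inf, 0.8, Inf),
                       labels = c("small", "large"))

print(table(by_interval))
print(table(by_frequency))
print(table(by_cluster))
print(table(by_fixed))
```

Comparing the tables side by side shows how the choice of method moves the cut points and therefore the number of observations in each category.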

- Install the Boruta package and perform feature selection for the data("Ozone") data set. Build a boxplot and draw conclusions.
##    V1 V2 V3 V4   V5 V6 V7 V8    V9  V10 V11   V12 V13
## 5   1  5  1  5 5760  3 51 54 45.32 1450  25 57.02  60
## 6   1  6  2  6 5720  4 69 35 49.64 1568  15 53.78  60
## 7   1  7  3  4 5790  6 19 45 46.40 2631 -33 54.14 100
## 8   1  8  4  4 5790  3 25 55 52.70  554 -28 64.76 250
## 9   1  9  5  6 5700  3 73 41 48.02 2083  23 52.52 120
## 12  1 12  1  6 5720  3 44 51 54.32  111   9 63.14 150
## After 11 iterations, +0.38 secs:
## confirmed 9 attributes: V1, V10, V11, V12, V13 and 4 more;
## rejected 2 attributes: V2, V3;
## still have 1 attribute left.
## After 24 iterations, +0.8 secs:
## rejected 1 attribute: V6;
## no more attributes left.
##        meanImp  medianImp     minImp     maxImp  normHits  decision
## V1   9.2376317  9.3690906  7.7809921 10.9181168 1.0000000 Confirmed
## V2   0.8045046  0.6383691 -0.3798369  1.6192373 0.0000000  Rejected
## V3  -1.0230047 -0.7932765 -3.2862552 -0.2142909 0.0000000  Rejected
## V5   8.9640613  9.0156440  7.3523676 10.2679577 1.0000000 Confirmed
## V6   1.4857544  1.7786131 -0.4079408  2.7575345 0.1666667  Rejected
## V7  11.7176861 11.5714677 10.7545221 12.8265972 1.0000000 Confirmed
## V8  17.3007484 17.2037485 16.4797082 18.5801832 1.0000000 Confirmed
## V9  19.3796531 19.0872733 17.6308316 21.5291697 1.0000000 Confirmed
## V10 10.1809311 10.1059919  8.7317178 11.9690188 1.0000000 Confirmed
## V11 12.1930274 12.1082348 11.0385579 13.2786622 1.0000000 Confirmed
## V12 14.6859226 14.7348337 13.6111690 15.8171778 1.0000000 Confirmed
## V13  9.6555053  9.6605230  8.6051787 10.7951540 1.0000000 Confirmed
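
A minimal sketch of a Boruta run that yields a table of this shape, assuming the Boruta and mlbench packages are installed. Treating V4 as the response is inferred from its absence in the importance table, and dropping incomplete rows with na.omit() is inferred from the gaps in the row names; the seed, the object names, and the plot settings are assumptions for illustration.

```r
# Sketch only: assumes the Boruta and mlbench packages are installed.
library(mlbench)
library(Boruta)

data("Ozone")
ozone <- na.omit(Ozone)  # Boruta does not accept missing values
head(ozone)

# V4 is assumed to be the response; every other column is a candidate.
set.seed(1)
res <- Boruta(V4 ~ ., data = ozone, doTrace = 0)
print(res)

# Per-attribute importance statistics and the final decision.
stats <- attStats(res)
print(stats)

# Boxplot of importance across iterations, shadow attributes included.
plot(res, las = 2, cex.axis = 0.7)

# Names of the attributes that were confirmed as important.
confirmed <- getSelectedAttributes(res, withTentative = FALSE)
print(confirmed)
```

Because Boruta is built on random forests, importance values and the number of iterations vary slightly from run to run unless the seed is fixed, which is consistent with the small differences between the two runs shown in this report.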
