1. Install the caret package, run names(getModelInfo()), and review the list of available methods (the models that caret supports). Perform a graphical exploratory data analysis with the featurePlot() function on the dataset from the caret package help file:
## [1] "The points represent the values in the dataset; the distance between points reflects the step size between the generated values for factors A and B"
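
The code that produced this output is not shown in the report. A minimal sketch of the task might look as follows; the simulated predictors and the two-level label `A`/`B` are assumptions modelled on the kind of example given on the featurePlot() help page, not the author's exact data.

```r
library(caret)

# Model identifiers known to caret (usable as `method` in train())
models <- names(getModelInfo())
print(head(models))

# Assumed data: five simulated numeric predictors and a class label "A"/"B"
set.seed(1)
x <- matrix(rnorm(50 * 5), ncol = 5)
colnames(x) <- LETTERS[1:5]
y <- factor(rep(c("A", "B"), 25))

# Scatterplot matrix of the predictors, coloured by class
p_pairs <- featurePlot(x = x, y = y, plot = "pairs",
                       auto.key = list(columns = 2))

# Box plots of each predictor, split by class
p_box <- featurePlot(x = x, y = y, plot = "box")

print(p_pairs)
print(p_box)
```

featurePlot() returns a lattice object, so the plot appears only when the object is printed (automatically at the console, explicitly inside a script or function).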

  1. Using functions from the FSelector package, determine the importance of the features for a classification task. Use the data(iris) dataset. Draw conclusions.
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## [1] "Petal.Width" "Species"
## [1] "Petal.Width"  "Species"      "Sepal.Length" "Sepal.Width"
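
The report does not show which FSelector functions were called. One possible sketch, assuming `Species` is the class label and information gain is the importance measure, is given below; it is not guaranteed to reproduce the attribute lists printed above.

```r
library(FSelector)

data(iris)

# Score every predictor against the class label Species
ig <- information.gain(Species ~ ., data = iris)
print(ig)

# Keep the two highest-scoring predictors
top2 <- cutoff.k(ig, k = 2)
print(top2)

# Build a model formula that uses only the selected predictors
f <- as.simple.formula(top2, "Species")
print(f)
```

Other filters from the same package, such as gain.ratio() or chi.squared(), take the same formula and data arguments, so the rankings can be compared by swapping the scoring function.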
##    V1 V2 V3 V4   V5 V6 V7 V8    V9  V10 V11   V12 V13
## 5   1  5  1  5 5760  3 51 54 45.32 1450  25 57.02  60
## 6   1  6  2  6 5720  4 69 35 49.64 1568  15 53.78  60
## 7   1  7  3  4 5790  6 19 45 46.40 2631 -33 54.14 100
## 8   1  8  4  4 5790  3 25 55 52.70  554 -28 64.76 250
## 9   1  9  5  6 5700  3 73 41 48.02 2083  23 52.52 120
## 12  1 12  1  6 5720  3 44 51 54.32  111   9 63.14 150
##  1. run of importance source...
##  2. run of importance source...
##  3. run of importance source...
##  4. run of importance source...
##  5. run of importance source...
##  6. run of importance source...
##  7. run of importance source...
##  8. run of importance source...
##  9. run of importance source...
##  10. run of importance source...
##  11. run of importance source...
## After 11 iterations, +0.4 secs:
##  confirmed 9 attributes: V1, V10, V11, V12, V13 and 4 more;
##  rejected 1 attribute: V3;
##  still have 2 attributes left.
##  12. run of importance source...
##  13. run of importance source...
##  14. run of importance source...
##  15. run of importance source...
## After 15 iterations, +0.53 secs:
##  rejected 1 attribute: V2;
##  still have 1 attribute left.
##  16. run of importance source...
##  17. run of importance source...
##  18. run of importance source...
## After 18 iterations, +0.62 secs:
##  rejected 1 attribute: V6;
##  no more attributes left.
##       meanImp  medianImp     minImp     maxImp   normHits  decision
## V1   9.294488  9.4828445  7.5481248 10.1250310 1.00000000 Confirmed
## V2   0.751229  0.4090303 -0.9443525  2.8436619 0.05555556  Rejected
## V3  -1.683898 -1.4729796 -3.0250587 -0.1911083 0.00000000  Rejected
## V5   9.075544  9.3149825  7.8228538  9.8149683 1.00000000 Confirmed
## V6   1.399087  1.4392659 -0.3476106  3.7737265 0.11111111  Rejected
## V7  11.396456 11.4072583 10.1960760 12.3952003 1.00000000 Confirmed
## V8  16.716411 16.8055075 15.8156346 17.5830431 1.00000000 Confirmed
## V9  19.484823 19.4327618 18.2069628 20.7035009 1.00000000 Confirmed
## V10 10.156480 10.3140914  8.9555323 11.1066703 1.00000000 Confirmed
## V11 12.274802 12.2261779 11.2625527 14.4806432 1.00000000 Confirmed
## V12 14.604610 14.6729560 13.3411125 16.1665801 1.00000000 Confirmed
## V13  9.311753  9.4047466  7.9339483 10.4615078 1.00000000 Confirmed

  1. Using the discretize() function from the arules package, convert a continuous variable into a categorical one with different methods: "interval" (equal interval width), "frequency" (equal frequency), "cluster" (clustering), and "fixed" (the categories specify the interval boundaries). Use the iris dataset. Draw conclusions.
## Warning in discretize(x, categories = 3): Parameter categories is deprecated.
## Use breaks instead! Also, the default method is now frequency!
## 
## [0.1,0.867) [0.867,1.6)   [1.6,2.5] 
##          50          48          52
## Warning in discretize(x, categories = 3, onlycuts = TRUE): Parameter categories
## is deprecated. Use breaks instead! Also, the default method is now frequency!
## Warning in discretize(x, "frequency", categories = 3): Parameter categories is
## deprecated. Use breaks instead! Also, the default method is now frequency!
## 
## [0.1,0.867) [0.867,1.6)   [1.6,2.5] 
##          50          48          52
## Warning in discretize(x, method = "frequency", categories = 3, onlycuts = TRUE):
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!
## Warning in discretize(x, "cluster", categories = 3): Parameter categories is
## deprecated. Use breaks instead! Also, the default method is now frequency!
## 
##  [0.1,0.792) [0.792,1.71)   [1.71,2.5] 
##           50           54           46
## Warning in discretize(x, method = "cluster", categories = 3, onlycuts = TRUE):
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!
## Warning in discretize(x, "fixed", categories = c(-Inf, 0.8, Inf)): Parameter
## categories is deprecated. Use breaks instead! Also, the default method is now
## frequency!
## 
## [-Inf,0.8) [0.8, Inf] 
##         50        100
## Warning in discretize(x, "fixed", categories = c(-Inf, 0.8, Inf), labels =
## c("small", : Parameter categories is deprecated. Use breaks instead! Also, the
## default method is now frequency!
## 
## small large 
##    50   100
## Warning in discretize(x, method = "fixed", categories = c(-Inf, 0.8, Inf), :
## Parameter categories is deprecated. Use breaks instead! Also, the default method
## is now frequency!

## Warning in discretize(iris[, i], "frequency", categories = 3): Parameter
## categories is deprecated. Use breaks instead! Also, the default method is now
## frequency!
##     items                      transactionID
## [1] {Sepal.Length=[4.3,5.4),                
##      Sepal.Width=[3.2,4.4],                 
##      Petal.Length=[1,2.63),                 
##      Petal.Width=[0.1,0.867),               
##      Species=setosa}                       1
##   Sepal.Length=[4.3,5.4) Sepal.Length=[5.4,6.3) Sepal.Length=[6.3,7.9]
## 1                   TRUE                  FALSE                  FALSE
## 2                   TRUE                  FALSE                  FALSE
## 3                   TRUE                  FALSE                  FALSE
##   Sepal.Width=[2,2.9) Sepal.Width=[2.9,3.2) Sepal.Width=[3.2,4.4]
## 1               FALSE                 FALSE                  TRUE
## 2               FALSE                  TRUE                 FALSE
## 3               FALSE                 FALSE                  TRUE
##   Petal.Length=[1,2.63) Petal.Length=[2.63,4.9) Petal.Length=[4.9,6.9]
## 1                  TRUE                   FALSE                  FALSE
## 2                  TRUE                   FALSE                  FALSE
## 3                  TRUE                   FALSE                  FALSE
##   Petal.Width=[0.1,0.867) Petal.Width=[0.867,1.6) Petal.Width=[1.6,2.5]
## 1                    TRUE                   FALSE                 FALSE
## 2                    TRUE                   FALSE                 FALSE
## 3                    TRUE                   FALSE                 FALSE
##   Species=setosa Species=versicolor Species=virginica
## 1           TRUE              FALSE             FALSE
## 2           TRUE              FALSE             FALSE
## 3           TRUE              FALSE             FALSE
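
The warnings above state that the `categories` argument is deprecated in favour of `breaks` and that the default method is now "frequency"; this is why the call without an explicit method returns the same table as the "frequency" call. A sketch of the four methods with the current argument names follows. It assumes that `x` is `iris$Petal.Width`, which matches the value range in the tables above but is not stated in the report.

```r
library(arules)

data(iris)
x <- iris$Petal.Width  # assumed variable (range 0.1 to 2.5)

# Equal-width intervals: the method must be named explicitly
by_interval <- discretize(x, method = "interval", breaks = 3)

# Equal-frequency intervals (the current default)
by_frequency <- discretize(x, method = "frequency", breaks = 3)

# Intervals derived from k-means clustering; seed fixed for repeatability
set.seed(1)
by_cluster <- discretize(x, method = "cluster", breaks = 3)

# User-defined boundaries with readable labels
by_fixed <- discretize(x, method = "fixed",
                       breaks = c(-Inf, 0.8, Inf),
                       labels = c("small", "large"))

print(table(by_interval))
print(table(by_frequency))
print(table(by_cluster))
print(table(by_fixed))
```

Comparing the four tables shows how strongly the category sizes depend on the method: equal-width bins can be unbalanced, whereas equal-frequency bins are balanced by construction.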
  1. Install the Boruta package and perform feature selection on the data("Ozone") dataset. Draw a boxplot and draw conclusions.
##    V1 V2 V3 V4   V5 V6 V7 V8    V9  V10 V11   V12 V13
## 5   1  5  1  5 5760  3 51 54 45.32 1450  25 57.02  60
## 6   1  6  2  6 5720  4 69 35 49.64 1568  15 53.78  60
## 7   1  7  3  4 5790  6 19 45 46.40 2631 -33 54.14 100
## 8   1  8  4  4 5790  3 25 55 52.70  554 -28 64.76 250
## 9   1  9  5  6 5700  3 73 41 48.02 2083  23 52.52 120
## 12  1 12  1  6 5720  3 44 51 54.32  111   9 63.14 150
##  1. run of importance source...
##  2. run of importance source...
##  3. run of importance source...
##  4. run of importance source...
##  5. run of importance source...
##  6. run of importance source...
##  7. run of importance source...
##  8. run of importance source...
##  9. run of importance source...
##  10. run of importance source...
##  11. run of importance source...
## After 11 iterations, +0.38 secs:
##  confirmed 9 attributes: V1, V10, V11, V12, V13 and 4 more;
##  rejected 2 attributes: V2, V3;
##  still have 1 attribute left.
##  12. run of importance source...
##  13. run of importance source...
##  14. run of importance source...
##  15. run of importance source...
##  16. run of importance source...
##  17. run of importance source...
##  18. run of importance source...
##  19. run of importance source...
##  20. run of importance source...
##  21. run of importance source...
##  22. run of importance source...
##  23. run of importance source...
##  24. run of importance source...
## After 24 iterations, +0.8 secs:
##  rejected 1 attribute: V6;
##  no more attributes left.
##        meanImp  medianImp     minImp     maxImp  normHits  decision
## V1   9.2376317  9.3690906  7.7809921 10.9181168 1.0000000 Confirmed
## V2   0.8045046  0.6383691 -0.3798369  1.6192373 0.0000000  Rejected
## V3  -1.0230047 -0.7932765 -3.2862552 -0.2142909 0.0000000  Rejected
## V5   8.9640613  9.0156440  7.3523676 10.2679577 1.0000000 Confirmed
## V6   1.4857544  1.7786131 -0.4079408  2.7575345 0.1666667  Rejected
## V7  11.7176861 11.5714677 10.7545221 12.8265972 1.0000000 Confirmed
## V8  17.3007484 17.2037485 16.4797082 18.5801832 1.0000000 Confirmed
## V9  19.3796531 19.0872733 17.6308316 21.5291697 1.0000000 Confirmed
## V10 10.1809311 10.1059919  8.7317178 11.9690188 1.0000000 Confirmed
## V11 12.1930274 12.1082348 11.0385579 13.2786622 1.0000000 Confirmed
## V12 14.6859226 14.7348337 13.6111690 15.8171778 1.0000000 Confirmed
## V13  9.6555053  9.6605230  8.6051787 10.7951540 1.0000000 Confirmed
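
The call that produced this trace is not included in the report. The sketch below assumes the `Ozone` data come from the mlbench package, that rows with missing values are dropped, and that `V4` is the response (it is the only column absent from the importance table). Because Boruta relies on random forests, the importance values and the number of iterations vary between runs.

```r
library(mlbench)
library(Boruta)

data("Ozone", package = "mlbench")
ozone <- na.omit(Ozone)  # drop rows with missing values
print(head(ozone))

# Seed fixed only to make this sketch repeatable
set.seed(1)
b <- Boruta(V4 ~ ., data = ozone, doTrace = 0)

# Importance statistics and the final decision for each attribute
print(attStats(b))
print(getSelectedAttributes(b))

# Box plot of attribute importance across the random forest runs
plot(b, las = 2, cex.axis = 0.7)
```

In the resulting box plot, confirmed attributes are drawn in green, rejected ones in red, tentative ones in yellow, and the shadow attributes used as the reference in blue. Setting `doTrace = 2` prints the per-iteration log shown above.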