Lab Report: Data Analysis

Task 1: caret

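library(caret)  # featurePlot(); caret also attaches lattice

set.seed(123)
# 50 observations of 5 standard-normal features
x <- matrix(rnorm(50 * 5), ncol = 5)
# class labels alternate A, B, A, B, ... independently of x
y <- factor(rep(c("A", "B"), 25))
# scatterplot matrix of all feature pairs, coloured by class
featurePlot(x = x, y = y, plot = "pairs", auto.key = list(columns = 2))

# per-class density of each feature; every panel gets its own axis scales
featurePlot(x = x, y = y, plot = "density",
            scales = list(x = list(relation = "free"),
                          y = list(relation = "free")),
            auto.key = list(columns = 2))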
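The plots above are a visual check only. The sketch below adds a numeric complement that is not part of the original task. It runs a Welch two-sample t-test per feature to compare the class means of A and B. Because the features are generated independently of the labels, most p-values should exceed 0.05, although an occasional small one can appear by chance. The block regenerates the same simulated data so that it runs on its own. The names `p_values` and `V1`…`V5` are illustrative.

```r
# Same simulated data as in Task 1: features independent of the class labels
set.seed(123)
x <- matrix(rnorm(50 * 5), ncol = 5)
y <- factor(rep(c("A", "B"), 25))

# Welch two-sample t-test for each feature (column) between classes A and B
p_values <- apply(x, 2, function(feature) {
  t.test(feature[y == "A"], feature[y == "B"])$p.value
})
names(p_values) <- paste0("V", seq_len(ncol(x)))
print(round(p_values, 3))
```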

Task 2: FSelector

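library(FSelector)  # information.gain(), chi.squared()

data(iris)
# information gain of each attribute with respect to Species
weights_ig <- information.gain(Species ~ ., data = iris)
print(weights_ig)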

#>              attr_importance
#> Sepal.Length       0.4521286
#> Sepal.Width        0.2672750
#> Petal.Length       0.9402853
#> Petal.Width        0.9554360
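To show what the information-gain score measures, here is a minimal base-R sketch of $IG = H(\text{Species}) - H(\text{Species} \mid \text{feature})$ in nats. It assumes each numeric feature is first cut into 3 equal-width bins. FSelector applies its own internal discretization to continuous attributes, so these numbers will not match the table above, but the ranking of petal versus sepal features should be similar. The helpers `entropy()` and `info_gain()` are hypothetical names introduced here.

```r
# Entropy of a discrete variable, in nats (natural logarithm)
entropy <- function(v) {
  p <- table(v) / length(v)
  p <- p[p > 0]           # drop empty levels: 0 * log(0) is taken as 0
  -sum(p * log(p))
}

# Information gain H(class) - H(class | feature) for one numeric feature,
# after binning it into `bins` equal-width intervals with cut()
info_gain <- function(feature, class, bins = 3) {
  binned <- cut(feature, breaks = bins)
  # conditional entropy: class entropy within each bin, weighted by bin size
  h_cond <- sum(sapply(split(class, binned), function(cl) {
    if (length(cl) == 0) return(0)
    length(cl) / length(class) * entropy(cl)
  }))
  entropy(class) - h_cond
}

data(iris)
ig <- sapply(iris[, 1:4], info_gain, class = iris$Species)
print(round(ig, 4))
```

The upper bound is $\log 3 \approx 1.099$ nats, which is reached when the bins separate the three species perfectly.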

# chi-squared-based association of each attribute with Species
weights_chi <- chi.squared(Species ~ ., data = iris)
print(weights_chi)

#>              attr_importance
#> Sepal.Length       0.6288067
#> Sepal.Width        0.4922162
#> Petal.Length       0.9346311
#> Petal.Width        0.9432359
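The FSelector documentation describes the `chi.squared` score as Cramér's V between each attribute and the class; it is worth confirming this on the package help page. The base-R sketch below computes $V = \sqrt{\chi^2 / (n\,(\min(r, c) - 1))}$ from a contingency table of a binned feature against Species. As in the previous sketch, the 3 equal-width bins are an assumption and differ from FSelector's own discretization, so only the ordering is expected to agree. `cramers_v()` is a hypothetical helper.

```r
# Cramér's V between a binned numeric feature and a factor:
# V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(feature, class, bins = 3) {
  tab <- table(cut(feature, breaks = bins), class)
  # drop empty bins so they do not add all-zero rows to the table
  tab <- tab[rowSums(tab) > 0, , drop = FALSE]
  # suppress the warning chisq.test() gives when expected counts are small
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

data(iris)
v <- sapply(iris[, 1:4], cramers_v, class = iris$Species)
print(round(v, 4))
```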

Task 3: Discretization

library(arules)  # discretize()

x <- iris$Petal.Length
# equal-width intervals
table(discretize(x, method = "interval", breaks = 3))

#> 
#>    [1,2.97) [2.97,4.93)  [4.93,6.9] 
#>          50          54          46
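The equal-width rule behind `method = "interval"` can be reproduced with base R's `cut()`. This sketch assumes that arules splits the observed range $[\min, \max]$ into 3 intervals of identical width, which the interval labels above are consistent with. It uses left-closed intervals to match the `[a,b)` labels.

```r
x <- iris$Petal.Length

# Equal-width binning: split [min, max] into 3 intervals of identical width
breaks_width <- seq(min(x), max(x), length.out = 4)
# right = FALSE gives [a, b) intervals; include.lowest closes the last one at max(x)
bins_width <- cut(x, breaks = breaks_width, right = FALSE, include.lowest = TRUE)
print(round(breaks_width, 2))
print(table(bins_width))
```

Here the width is $(6.9 - 1)/3 \approx 1.97$. That gives the breaks 2.97 and 4.93, and the resulting counts should match the 50 / 54 / 46 shown above.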

# equal-frequency intervals (breaks at sample quantiles)
table(discretize(x, method = "frequency", breaks = 3))

#> 
#>   [1,2.63) [2.63,4.9)  [4.9,6.9] 
#>         50         49         51
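Equal-frequency binning places the breaks at the 1/3 and 2/3 sample quantiles. The base-R sketch below assumes R's default `quantile()` rule. Because five flowers share Petal.Length 4.9 and the break falls exactly on that value, the bins cannot hold exactly 50 observations each.

```r
x <- iris$Petal.Length

# Equal-frequency binning: breaks at the 0, 1/3, 2/3 and 1 sample quantiles
breaks_freq <- quantile(x, probs = seq(0, 1, length.out = 4))
bins_freq <- cut(x, breaks = breaks_freq, right = FALSE, include.lowest = TRUE)
print(round(breaks_freq, 2))
print(table(bins_freq))
```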

# intervals derived from k-means clustering of the values
table(discretize(x, method = "cluster", breaks = 3))

#> 
#>    [1,2.88) [2.88,4.96)  [4.96,6.9] 
#>          50          54          46
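Cluster-based binning groups the values with one-dimensional k-means and then derives breaks from the clusters. The sketch below assumes that each inner break lies halfway between two neighbouring cluster centres. That midpoint rule is an assumption, and this is not necessarily how arules computes its breaks, so the break values may differ slightly from the table above. Several random starts (`nstart = 10`) make the clustering stable.

```r
x <- iris$Petal.Length

# Cluster-based binning: 1-D k-means with 3 centres and several random starts
set.seed(123)
km <- kmeans(x, centers = 3, nstart = 10)
centres <- sort(as.vector(km$centers))
# assumed rule: each inner break halfway between two neighbouring centres
breaks_km <- c(min(x), (head(centres, -1) + tail(centres, -1)) / 2, max(x))
bins_km <- cut(x, breaks = breaks_km, right = FALSE, include.lowest = TRUE)
print(round(breaks_km, 2))
print(table(bins_km))
```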

# user-defined break points
table(discretize(x, method = "fixed", breaks = c(-Inf, 2, 4, 6, Inf)))

#> 
#> [-Inf,2)    [2,4)    [4,6) [6, Inf] 
#>       50       11       78       11
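With fixed breaks there is nothing to estimate, so base R gives the same result directly. `cut(..., right = FALSE)` reproduces the left-closed `[a,b)` intervals shown above. `findInterval()` is an alternative that returns a bin index instead of a factor.

```r
x <- iris$Petal.Length

# Fixed (user-chosen) breaks; right = FALSE makes each interval closed on the left
bins_fixed <- cut(x, breaks = c(-Inf, 2, 4, 6, Inf), right = FALSE)
print(table(bins_fixed))

# findInterval() counts how many inner breaks are <= x; +1 gives bin 1..4
idx <- findInterval(x, c(2, 4, 6)) + 1
print(tabulate(idx, nbins = 4))
```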

Task 4: Boruta

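library(Boruta)

data(Ozone, package = "mlbench")
# Boruta does not accept missing values, so drop incomplete rows
ozone_clean <- na.omit(Ozone)
set.seed(123)
# V4 (daily maximum ozone reading) is the target; all other columns are candidates
boruta_result <- Boruta(V4 ~ ., data = ozone_clean, doTrace = 0)
print(boruta_result)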

#> Boruta performed 24 iterations in 1.535146 secs.
#>  9 attributes confirmed important: V1, V10, V11, V12, V13 and 4 more;
#>  3 attributes confirmed unimportant: V2, V3, V6;
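Boruta's core idea is to compare each attribute's importance with that of "shadow" attributes. Shadows are randomly permuted copies of the real attributes, so they carry no information about the target. An attribute that beats the best shadow in many iterations is confirmed, and one that rarely does is rejected. The toy sketch below illustrates only this comparison step and is not a reimplementation of the package. It makes three substitutions: the absolute Pearson correlation with the target replaces random-forest importance, the built-in `mtcars` data (target `mpg`) replaces Ozone, and raw "hits" are counted without Boruta's statistical test.

```r
# Toy version of Boruta's shadow-attribute comparison (NOT the Boruta package):
# importance = |Pearson correlation with the target| instead of random-forest importance
data(mtcars)
target <- mtcars$mpg
features <- mtcars[, setdiff(names(mtcars), "mpg")]

# importance of the real features does not change between iterations
imp_real <- abs(sapply(features, cor, y = target))

set.seed(123)
n_iter <- 20
hits <- setNames(integer(ncol(features)), names(features))

for (i in seq_len(n_iter)) {
  # shadow attributes: every column permuted independently,
  # which destroys any relation to the target
  shadows <- as.data.frame(lapply(features, sample))
  imp_shadow <- abs(sapply(shadows, cor, y = target))
  # a "hit" means the real feature beats the best shadow in this iteration
  hits <- hits + (imp_real > max(imp_shadow))
}
print(hits)
```

Boruta itself turns such hit counts into confirmed or rejected decisions with a statistical test over the iterations and leaves undecided attributes as tentative.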

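# importance boxplots: green = confirmed, red = rejected, blue = shadow attributes
plot(boruta_result, las = 2)

# importance of each attribute across Boruta's iterations
plotImpHistory(boruta_result)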

Conclusions

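Task 1: The pairs and density plots show no separation between classes A and B. This is expected: the features are standard-normal noise, and the labels simply alternate independently of the features.
Task 2: Petal.Length and Petal.Width are the most important features under both criteria (information gain 0.94 and 0.96, chi-squared 0.93 and 0.94). The sepal features score much lower.
Task 3: All four methods put the 50 setosa flowers (Petal.Length below 2) into the first interval. The cluster method places its breaks according to natural groups in the data; here its counts (50 / 54 / 46) coincide with those of equal-width binning.
Task 4: Boruta confirmed 9 of the 12 predictors of V4 as important and rejected 3 (V2, V3, V6).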

Author: Tanya

Date: 2026-04-25
