BaiKTKN3

Question 1 (5 points)

Using the mtcars dataset in R, build a linear regression model to predict mpg from hp and wt. Evaluate the model with the coefficient of determination R² and the RMSE.

1. Load the data

# Use the dataset that ships with R
data(mtcars)

# View the first 6 rows
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

2. Build the linear regression model

The model predicts mpg from hp and wt:

# Fit the model
model <- lm(mpg ~ hp + wt, data = mtcars)

# View the model results
summary(model)

## 
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
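
To illustrate how the fitted coefficients are used, the sketch below predicts mpg for one hypothetical car and checks that `predict()` agrees with the regression equation written out by hand. The values hp = 110 and wt = 2.62 and the names `new_car`, `pred_new`, and `manual` are assumptions chosen only for this example.

```r
data(mtcars)
model <- lm(mpg ~ hp + wt, data = mtcars)

# Hypothetical car (illustrative values): 110 horsepower, weight 2.62 (1000 lbs)
new_car <- data.frame(hp = 110, wt = 2.62)

# Prediction through predict()
pred_new <- predict(model, newdata = new_car)

# The same prediction from the equation b0 + b1 * hp + b2 * wt
b <- coef(model)
manual <- b["(Intercept)"] + b["hp"] * new_car$hp + b["wt"] * new_car$wt

pred_new
all.equal(unname(pred_new), unname(manual))
```

Both negative slopes in the summary mean that the predicted mpg falls as either horsepower or weight increases, holding the other variable fixed.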

3. Compute R-squared (R²)

# Extract R-squared from the summary
r_squared <- summary(model)$r.squared
r_squared

## [1] 0.8267855
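
As a cross-check, R² can also be computed directly from its definition, 1 - SSE/SST. The sketch below is a minimal illustration; the names `sse`, `sst`, and `r_squared_manual` are introduced here and are not part of the original solution.

```r
data(mtcars)
model <- lm(mpg ~ hp + wt, data = mtcars)

# Residual sum of squares: variation left unexplained by the model
sse <- sum(residuals(model)^2)

# Total sum of squares: variation of mpg around its mean
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r_squared_manual <- 1 - sse / sst

# Should agree with the value reported by summary()
all.equal(r_squared_manual, summary(model)$r.squared)
```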

4. Compute RMSE

# Predict mpg values
pred <- predict(model, mtcars)

# Compute RMSE on the data the model was fitted to
rmse <- sqrt(mean((mtcars$mpg - pred)^2))
rmse

## [1] 2.468854
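
The RMSE of 2.469 differs from the residual standard error of 2.593 in the model summary because the two divide the same sum of squared residuals by different quantities: n for the RMSE, and n minus the number of estimated coefficients for the residual standard error. A short sketch of that relationship, with `sse`, `n`, `p`, and `rse` as names introduced for illustration:

```r
data(mtcars)
model <- lm(mpg ~ hp + wt, data = mtcars)

sse <- sum(residuals(model)^2)
n <- nrow(mtcars)          # 32 observations
p <- length(coef(model))   # 3 estimated coefficients, including the intercept

rmse <- sqrt(sse / n)          # in-sample RMSE
rse  <- sqrt(sse / (n - p))    # residual standard error (29 degrees of freedom)

c(rmse = rmse, rse = rse, summary_sigma = summary(model)$sigma)
```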

Question 2 (5 points)

Using the iris dataset, build a logistic regression model to classify Species (considering only the two classes setosa and versicolor). Compute the model's confusion matrix and accuracy.

1. Prepare the data

# Load the data
data(iris)

# Keep only 2 classes: setosa and versicolor
iris2 <- subset(iris, Species != "virginica")

# Check the class counts (the unused virginica level still appears, with count 0)
table(iris2$Species)

## 
##     setosa versicolor  virginica 
##         50         50          0

2. Build the logistic model

Convert Species to a binary variable:

# Re-create the factor so the unused virginica level is dropped, leaving two classes
iris2$Species <- factor(iris2$Species)

# Fit the logistic model
model_logit <- glm(Species ~ Sepal.Length + Sepal.Width +
                   Petal.Length + Petal.Width,
                   data = iris2,
                   family = binomial)

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(model_logit)

## 
## Call:
## glm(formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + 
##     Petal.Width, family = binomial, data = iris2)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)       6.556 601950.324       0        1
## Sepal.Length     -9.879 194223.245       0        1
## Sepal.Width      -7.418  92924.451       0        1
## Petal.Length     19.054 144515.981       0        1
## Petal.Width      25.033 216058.936       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.3863e+02  on 99  degrees of freedom
## Residual deviance: 1.3166e-09  on 95  degrees of freedom
## AIC: 10
## 
## Number of Fisher Scoring iterations: 25
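
The two warnings, the very large standard errors, the residual deviance near zero, and the 25 Fisher scoring iterations are the usual signs of complete separation: setosa and versicolor can be split without error, so the maximum likelihood estimates do not converge to finite values and the coefficients and p-values above should not be interpreted. The sketch below checks this for a single predictor, Petal.Length; the names `max_setosa`, `min_versicolor`, and `separated` are introduced here for illustration.

```r
data(iris)
iris2 <- droplevels(subset(iris, Species != "virginica"))

# Range of Petal.Length within each class
tapply(iris2$Petal.Length, iris2$Species, range)

# Complete separation on this variable: every setosa value lies
# below every versicolor value
max_setosa     <- max(iris2$Petal.Length[iris2$Species == "setosa"])
min_versicolor <- min(iris2$Petal.Length[iris2$Species == "versicolor"])
separated <- max_setosa < min_versicolor
separated
```

The fitted model can still be used for classification, which is all this question asks for, but the summary table is not a reliable basis for inference about individual predictors.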

3. Predict and classify

# Predicted probabilities (probability of the second factor level, versicolor)
prob <- predict(model_logit, iris2, type = "response")

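# Convert probabilities to class labels (threshold 0.5)
pred_class <- ifelse(prob > 0.5, "versicolor", "setosa")

# Use the same levels as the actual labels so the confusion matrix stays 2 x 2
pred_class <- factor(pred_class, levels = levels(iris2$Species))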

4. Confusion matrix

conf_matrix <- table(Predicted = pred_class,
                     Actual = iris2$Species)

conf_matrix

##             Actual
## Predicted    setosa versicolor
##   setosa         50          0
##   versicolor      0         50

5. Compute accuracy

accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
accuracy

## [1] 1
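
The accuracy of 1 is measured on the same 100 rows used to fit the model, so it may be optimistic. One common way to get a less biased estimate is a held-out test set. The sketch below is only an illustration under assumptions not in the original solution: a 70/30 split, an arbitrary seed of 42, and the names `train`, `test`, `fit`, and `accuracy_test`.

```r
data(iris)
iris2 <- droplevels(subset(iris, Species != "virginica"))

set.seed(42)                                  # arbitrary seed, for reproducibility
train_idx <- sample(nrow(iris2), size = 70)   # 70 rows to train, 30 to test
train <- iris2[train_idx, ]
test  <- iris2[-train_idx, ]

# The classes are separable, so glm() is expected to repeat its convergence
# warnings; they are suppressed here only to keep the output short.
fit <- suppressWarnings(
  glm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
      data = train, family = binomial)
)

# Classify the held-out rows with the same 0.5 threshold
prob_test <- predict(fit, newdata = test, type = "response")
pred_test <- factor(ifelse(prob_test > 0.5, "versicolor", "setosa"),
                    levels = levels(test$Species))

conf_test <- table(Predicted = pred_test, Actual = test$Species)
accuracy_test <- sum(diag(conf_test)) / sum(conf_test)

conf_test
accuracy_test
```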