Q2. Problem Answer Report

Q1: Check the first 3 rows of this dataset.

head(state.x77, 3)

        Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama       3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska         365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona       2212   4530        1.8    70.55    7.8    58.1    15 113417

Q2: Create a new data frame using the variables “Murder”, “Population”, “Illiteracy”, “Income” and “Frost”, and check it.

# state.x77 is a matrix, so convert it to a data frame before selecting columns by name
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
head(sub.df)

           Murder Population Illiteracy Income Frost
Alabama      15.1       3615        2.1   3624    20
Alaska       11.3        365        1.5   6315   152
Arizona       7.8       2212        1.8   4530    15
Arkansas     10.1       2110        1.9   3378    65
California   10.3      21198        1.1   5114    20
Colorado      6.8       2541        0.7   4884   166

Q3: Conduct a multiple regression to predict the dependent variable “Murder” using all other variables as the independent variables and show the results.

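# "." on the right-hand side means every other column in sub.df
mlr <- lm(Murder ~ ., data = sub.df)
# Print small numbers in fixed notation instead of scientific notation
options(scipen = 999)
summary(mlr)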


Call:
lm(formula = Murder ~ ., data = sub.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7960 -1.6495 -0.0811  1.4815  7.6210 

Coefficients:
              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 1.23456341 3.86611474   0.319    0.7510    
Population  0.00022368 0.00009052   2.471    0.0173 *  
Illiteracy  4.14283659 0.87435319   4.738 0.0000219 ***
Income      0.00006442 0.00068370   0.094    0.9253    
Frost       0.00058131 0.01005366   0.058    0.9541    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared:  0.567, Adjusted R-squared:  0.5285 
F-statistic: 14.73 on 4 and 45 DF,  p-value: 0.00000009133
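
The summary above marks only Population and Illiteracy as significant at the 5% level. As a supplementary check (not part of the original answer), the same conclusion can be read off 95% confidence intervals. This is a minimal sketch that rebuilds the model so it runs on its own; the name `covers.zero` is introduced here purely for illustration.

```r
# Rebuild the Q2 data frame and the Q3 model so this block is self-contained
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
mlr <- lm(Murder ~ ., data = sub.df)

# 95% confidence intervals for every coefficient
ci <- confint(mlr, level = 0.95)
print(ci)

# TRUE where the interval contains zero, i.e. the coefficient is
# not distinguishable from zero at the 5% level
covers.zero <- ci[, 1] < 0 & ci[, 2] > 0
print(covers.zero)
```

An interval that contains zero corresponds to a two-sided p-value above 0.05 in the coefficient table, so this view should agree with the significance stars.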

Q4: Use the stepwise method to select independent variables and conduct multiple regression with the selected independent variables.

mlr.step <- step(mlr, direction = "both")

Start:  AIC=97.75
Murder ~ Population + Illiteracy + Income + Frost

             Df Sum of Sq    RSS     AIC
- Frost       1     0.021 289.19  95.753
- Income      1     0.057 289.22  95.759
<none>                    289.17  97.749
- Population  1    39.238 328.41 102.111
- Illiteracy  1   144.264 433.43 115.986

Step:  AIC=95.75
Murder ~ Population + Illiteracy + Income

             Df Sum of Sq    RSS     AIC
- Income      1     0.057 289.25  93.763
<none>                    289.19  95.753
+ Frost       1     0.021 289.17  97.749
- Population  1    43.658 332.85 100.783
- Illiteracy  1   236.196 525.38 123.605

Step:  AIC=93.76
Murder ~ Population + Illiteracy

             Df Sum of Sq    RSS     AIC
<none>                    289.25  93.763
+ Income      1     0.057 289.19  95.753
+ Frost       1     0.021 289.22  95.759
- Population  1    48.517 337.76  99.516
- Illiteracy  1   299.646 588.89 127.311

summary(mlr.step)


Call:
lm(formula = Murder ~ Population + Illiteracy, data = sub.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7652 -1.6561 -0.0898  1.4570  7.6758 

Coefficients:
              Estimate Std. Error t value      Pr(>|t|)    
(Intercept) 1.65154974 0.81011208   2.039       0.04713 *  
Population  0.00022419 0.00007984   2.808       0.00724 ** 
Illiteracy  4.08073664 0.58481561   6.978 0.00000000883 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.481 on 47 degrees of freedom
Multiple R-squared:  0.5668,    Adjusted R-squared:  0.5484 
F-statistic: 30.75 on 2 and 47 DF,  p-value: 0.000000002893
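
The AIC column in the step() trace can be reproduced by hand. For a linear model, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * k, where n is the number of observations, RSS is the residual sum of squares, and k is the number of estimated coefficients. The sketch below checks this for the full model; `step_aic` is a hypothetical helper written for this illustration, not a function from R.

```r
# Rebuild the Q2 data frame and the Q3 model so this block is self-contained
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
mlr <- lm(Murder ~ ., data = sub.df)

# Hypothetical helper: AIC as reported in the step() trace for an lm fit
step_aic <- function(fit) {
  n   <- nobs(fit)               # number of observations (50 states)
  rss <- sum(residuals(fit)^2)   # residual sum of squares
  k   <- length(coef(fit))       # intercept plus slopes
  n * log(rss / n) + 2 * k
}

manual.aic  <- step_aic(mlr)
builtin.aic <- extractAIC(mlr)[2]  # second element is the AIC value
print(c(manual = manual.aic, builtin = builtin.aic))
```

Note that this value differs from AIC(mlr) by an additive constant that depends only on n, so both functions rank models fitted to the same data in the same order.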

Q5: Compare the results in Q3 and Q4.

summary(mlr)


Call:
lm(formula = Murder ~ ., data = sub.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7960 -1.6495 -0.0811  1.4815  7.6210 

Coefficients:
              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 1.23456341 3.86611474   0.319    0.7510    
Population  0.00022368 0.00009052   2.471    0.0173 *  
Illiteracy  4.14283659 0.87435319   4.738 0.0000219 ***
Income      0.00006442 0.00068370   0.094    0.9253    
Frost       0.00058131 0.01005366   0.058    0.9541    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared:  0.567, Adjusted R-squared:  0.5285 
F-statistic: 14.73 on 4 and 45 DF,  p-value: 0.00000009133

summary(mlr.step)


Call:
lm(formula = Murder ~ Population + Illiteracy, data = sub.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7652 -1.6561 -0.0898  1.4570  7.6758 

Coefficients:
              Estimate Std. Error t value      Pr(>|t|)    
(Intercept) 1.65154974 0.81011208   2.039       0.04713 *  
Population  0.00022419 0.00007984   2.808       0.00724 ** 
Illiteracy  4.08073664 0.58481561   6.978 0.00000000883 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.481 on 47 degrees of freedom
Multiple R-squared:  0.5668,    Adjusted R-squared:  0.5484 
F-statistic: 30.75 on 2 and 47 DF,  p-value: 0.000000002893

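Item                      Q3 (all variables)                      Q4 (stepwise selection)
Variables used            Population, Illiteracy, Income, Frost   Population, Illiteracy
Significant variables     Population, Illiteracy                  Population, Illiteracy
R-squared                 0.567                                   0.5668
Adjusted R-squared        0.5285                                  0.5484
Residual standard error   2.535 (45 df)                           2.481 (47 df)
AIC                       97.75                                   93.76
Model characteristics     Includes non-significant Income, Frost  Simpler and easier to interpret

Dropping Income and Frost leaves R-squared essentially unchanged (0.567 vs. 0.5668), while adjusted R-squared rises and both the residual standard error and AIC fall, so the stepwise model explains Murder about as well with fewer variables.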
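
Because the Q4 model is nested inside the Q3 model, the comparison can also be made formally with a partial F-test. The following is a minimal sketch, not part of the original answer; it refits the reduced model directly from the formula that step() selected, and the names `full`, `reduced`, `cmp`, and `cmp.stats` are illustrative.

```r
# Rebuild the Q2 data frame and both models so this block is self-contained
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
full    <- lm(Murder ~ ., data = sub.df)
reduced <- lm(Murder ~ Population + Illiteracy, data = sub.df)

# Partial F-test: the null hypothesis is that the Income and Frost
# coefficients are both zero
cmp <- anova(reduced, full)
print(cmp)

# Side-by-side fit statistics for the two models
fits <- list(full = full, reduced = reduced)
cmp.stats <- data.frame(
  adj.r.squared = sapply(fits, function(m) summary(m)$adj.r.squared),
  sigma         = sapply(fits, sigma),
  AIC           = sapply(fits, function(m) extractAIC(m)[2])
)
print(cmp.stats)
```

A large p-value in the anova() output indicates that Income and Frost add no detectable explanatory power, which supports keeping the smaller model.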

2025-07-08
