Q1: Check the first 3 rows of this dataset.
head(state.x77, 3)
        Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama       3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska         365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona       2212   4530        1.8    70.55    7.8    58.1    15 113417
Q2: Create a new data frame using the variables “Murder”, “Population”,
“Illiteracy”, “Income”, and “Frost”, and check it.
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
head(sub.df)
Murder Population Illiteracy Income Frost
Alabama 15.1 3615 2.1 3624 20
Alaska 11.3 365 1.5 6315 152
Arizona 7.8 2212 1.8 4530 15
Arkansas 10.1 2110 1.9 3378 65
California 10.3 21198 1.1 5114 20
Colorado 6.8 2541 0.7 4884 166
Q3: Conduct a multiple regression to predict the dependent variable
“Murder” using all other variables as the independent variables and show
the results.
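mlr <- lm(Murder ~ ., data = sub.df)  # "." means every other column in sub.df
options(scipen = 999)  # print small p-values in fixed rather than scientific notation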
summary(mlr)
Call:
lm(formula = Murder ~ ., data = sub.df)
Residuals:
Min 1Q Median 3Q Max
-4.7960 -1.6495 -0.0811 1.4815 7.6210
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.23456341 3.86611474 0.319 0.7510
Population 0.00022368 0.00009052 2.471 0.0173 *
Illiteracy 4.14283659 0.87435319 4.738 0.0000219 ***
Income 0.00006442 0.00068370 0.094 0.9253
Frost 0.00058131 0.01005366 0.058 0.9541
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared: 0.567, Adjusted R-squared: 0.5285
F-statistic: 14.73 on 4 and 45 DF, p-value: 0.00000009133
Q4: Use the stepwise method to select independent variables, then conduct a
multiple regression with the selected independent variables.
mlr.step <- step(mlr, direction = "both")
Start: AIC=97.75
Murder ~ Population + Illiteracy + Income + Frost
             Df Sum of Sq    RSS     AIC
- Frost       1     0.021 289.19  95.753
- Income      1     0.057 289.22  95.759
<none>                    289.17  97.749
- Population  1    39.238 328.41 102.111
- Illiteracy  1   144.264 433.43 115.986
Step: AIC=95.75
Murder ~ Population + Illiteracy + Income
             Df Sum of Sq    RSS     AIC
- Income      1     0.057 289.25  93.763
<none>                    289.19  95.753
+ Frost       1     0.021 289.17  97.749
- Population  1    43.658 332.85 100.783
- Illiteracy  1   236.196 525.38 123.605
Step: AIC=93.76
Murder ~ Population + Illiteracy
             Df Sum of Sq    RSS     AIC
<none>                    289.25  93.763
+ Income      1     0.057 289.19  95.753
+ Frost       1     0.021 289.22  95.759
- Population  1    48.517 337.76  99.516
- Illiteracy  1   299.646 588.89 127.311
summary(mlr.step)
Call:
lm(formula = Murder ~ Population + Illiteracy, data = sub.df)
Residuals:
Min 1Q Median 3Q Max
-4.7652 -1.6561 -0.0898 1.4570 7.6758
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.65154974 0.81011208 2.039 0.04713 *
Population 0.00022419 0.00007984 2.808 0.00724 **
Illiteracy 4.08073664 0.58481561 6.978 0.00000000883 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.481 on 47 degrees of freedom
Multiple R-squared: 0.5668, Adjusted R-squared: 0.5484
F-statistic: 30.75 on 2 and 47 DF, p-value: 0.000000002893
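The AIC values in the step() trace can be reproduced by hand. The following is a minimal sketch, assuming the same state.x77 data and the full model from Q3. For lm fits, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The names n, rss, edf, aic.manual, and aic.step are illustrative and not part of the original answer.

```r
# Rebuild the data frame and the full model from Q2 and Q3.
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
mlr <- lm(Murder ~ ., data = sub.df)

n   <- nrow(sub.df)            # number of observations (50 states)
rss <- sum(residuals(mlr)^2)   # residual sum of squares, the RSS column in the trace
edf <- length(coef(mlr))       # estimated coefficients: intercept + 4 predictors

# AIC as used by step() for linear models (up to an additive constant).
aic.manual <- n * log(rss / n) + 2 * edf

# extractAIC() returns c(edf, AIC); the second element is what step() prints.
aic.step <- extractAIC(mlr)[2]

print(c(manual = aic.manual, extractAIC = aic.step))
```

Both values should agree with the "Start: AIC=97.75" line of the trace. Dropping a variable lowers edf by one, so a removal reduces the AIC whenever the resulting increase in n * log(RSS / n) is smaller than 2, which is why Frost and Income are removed while Population and Illiteracy stay.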
Q5: Compare the results in Q3 and Q4.
summary(mlr)
Call:
lm(formula = Murder ~ ., data = sub.df)
Residuals:
Min 1Q Median 3Q Max
-4.7960 -1.6495 -0.0811 1.4815 7.6210
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.23456341 3.86611474 0.319 0.7510
Population 0.00022368 0.00009052 2.471 0.0173 *
Illiteracy 4.14283659 0.87435319 4.738 0.0000219 ***
Income 0.00006442 0.00068370 0.094 0.9253
Frost 0.00058131 0.01005366 0.058 0.9541
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared: 0.567, Adjusted R-squared: 0.5285
F-statistic: 14.73 on 4 and 45 DF, p-value: 0.00000009133
summary(mlr.step)
Call:
lm(formula = Murder ~ Population + Illiteracy, data = sub.df)
Residuals:
Min 1Q Median 3Q Max
-4.7652 -1.6561 -0.0898 1.4570 7.6758
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.65154974 0.81011208 2.039 0.04713 *
Population 0.00022419 0.00007984 2.808 0.00724 **
Illiteracy 4.08073664 0.58481561 6.978 0.00000000883 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.481 on 47 degrees of freedom
Multiple R-squared: 0.5668, Adjusted R-squared: 0.5484
F-statistic: 30.75 on 2 and 47 DF, p-value: 0.000000002893
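Item | Q3 (full model) | Q4 (stepwise model)
Variables used | Population, Illiteracy, Income, Frost | Population, Illiteracy
Significant variables | Population, Illiteracy | Population, Illiteracy
R-squared | 0.567 | 0.5668
Adjusted R-squared | 0.5285 | 0.5484
Model characteristics | May include unnecessary variables | Concise and easy to interpret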
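The comparison can also be made directly in R rather than by reading the two summaries side by side. The sketch below is one possible way to do it, under the assumption that the stepwise model is refit with the formula selected in Q4 instead of calling step() again. The names comparison and nested.test are illustrative. Note that AIC() uses a different additive constant than the values printed by step(), so only the ordering of the two models is comparable with the trace.

```r
# Rebuild both models from Q3 and Q4.
df <- as.data.frame(state.x77)
sub.df <- df[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
mlr      <- lm(Murder ~ ., data = sub.df)                        # Q3: full model
mlr.step <- lm(Murder ~ Population + Illiteracy, data = sub.df)  # Q4: selected model

# Collect the fit statistics of both models in one table.
comparison <- data.frame(
  model         = c("Q3 full", "Q4 stepwise"),
  predictors    = c(length(coef(mlr)) - 1, length(coef(mlr.step)) - 1),
  r.squared     = c(summary(mlr)$r.squared, summary(mlr.step)$r.squared),
  adj.r.squared = c(summary(mlr)$adj.r.squared, summary(mlr.step)$adj.r.squared),
  aic           = c(AIC(mlr), AIC(mlr.step))
)
print(comparison)

# Partial F-test for nested models: do Income and Frost jointly improve the fit?
nested.test <- anova(mlr.step, mlr)
print(nested.test)
```

Plain R-squared can never decrease when predictors are added, so adjusted R-squared and AIC are the fairer basis for choosing between the two models. A large p-value in the partial F-test would indicate that Income and Frost add no detectable explanatory power beyond Population and Illiteracy.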