Economic and unemployment data were recorded.
There are 16 rows of data. The data include:
I, the index;
A1, the percentage price deflation;
A2, the GNP in millions of dollars;
A3, the number of unemployed;
A4, the number of people employed by the military;
A5, the number of people over 14;
A6, the year;
Y, the number of people employed.
A3-A5 are in units of thousands.
We seek a model of the form:
\(Y = X_0 + A_1X_1 + A_2X_2 + A_3X_3 + A_4X_4 + A_5X_5 + A_6X_6\)
# Load data
data <- data.frame(A1=c(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9),
A2=c(234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112, 397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894),
A3=c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578, 2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
A4=c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350, 3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
A5=c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219, 117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
A6=c(1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962),
Y=c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761, 66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561))
# Print the data frame to check the values
data
## A1 A2 A3 A4 A5 A6 Y
## 1 83.0 234289 2356 1590 107608 1947 60323
## 2 88.5 259426 2326 1456 108632 1948 61122
## 3 88.2 258054 3682 1616 109773 1949 60171
## 4 89.5 284599 3351 1650 110929 1950 61187
## 5 96.2 328975 2099 3099 112075 1951 63221
## 6 98.1 346999 1932 3594 113270 1952 63639
## 7 99.0 365385 1870 3547 115094 1953 64989
## 8 100.0 363112 3578 3350 116219 1954 63761
## 9 101.2 397469 2904 3048 117388 1955 66019
## 10 104.6 419180 2822 2857 118734 1956 67857
## 11 108.4 442769 2936 2798 120445 1957 68169
## 12 110.8 444546 4681 2637 121950 1958 66513
## 13 112.6 482704 3813 2552 123366 1959 68655
## 14 114.2 502601 3931 2514 125368 1960 69564
## 15 115.7 518173 4806 2572 127852 1961 69331
## 16 116.9 554894 4007 2827 130081 1962 70561
# Correlation of each predictor (A1-A6) with the response Y
cor(x = data[, 1:6], y = data$Y)
## [,1]
## A1 0.9708914
## A2 0.9835925
## A3 0.5024692
## A4 0.4572380
## A5 0.9604640
## A6 0.9713535
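GNP in millions of dollars (A2) has the strongest correlation with the number of people employed (0.984), followed by the year (A6, 0.971) and the percentage price deflation (A1, 0.971). The number of unemployed (A3) and the number of people employed by the military (A4) are only moderately correlated with Y. These are pairwise correlations, so they measure association with Y rather than each variable's contribution in a joint model.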
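The ranking of predictors by correlation can be reproduced by sorting the correlation vector. This is a minimal sketch using the same data as above; the names `r` and `ranked` are introduced here for illustration only.

```r
# Same data as above (predictors A1-A6 and response Y)
data <- data.frame(
  A1 = c(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9),
  A2 = c(234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112, 397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894),
  A3 = c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578, 2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
  A4 = c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350, 3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
  A5 = c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219, 117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
  A6 = 1947:1962,
  Y  = c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761, 66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)
)

# cor() returns a 6 x 1 matrix; take its only column as a named vector
r <- cor(x = data[, 1:6], y = data$Y)[, 1]

# Sort from the strongest to the weakest correlation with Y
ranked <- sort(r, decreasing = TRUE)
print(round(ranked, 4))
```

Sorting keeps the variable names attached to the values, so the order of the names gives the ranking directly.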
# Fit a multiple linear regression model with A1
model1 <- lm(Y ~ A1 + A3 + A4 + A5, data = data)
summary(model1)
##
## Call:
## lm(formula = Y ~ A1 + A3 + A4 + A5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -652.19 -256.11 -0.53 121.31 1221.57
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13699.5644 6872.1758 1.993 0.071599 .
## A1 206.3701 81.2103 2.541 0.027422 *
## A3 -1.2427 0.2775 -4.479 0.000933 ***
## A4 -0.5971 0.3319 -1.799 0.099484 .
## A5 0.3079 0.1235 2.493 0.029897 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 565.9 on 11 degrees of freedom
## Multiple R-squared: 0.981, Adjusted R-squared: 0.974
## F-statistic: 141.7 on 4 and 11 DF, p-value: 2.203e-09
# Fit a multiple linear regression model with A2
model2 <- lm(Y ~ A2 + A3 + A4 + A5, data = data)
summary(model2)
##
## Call:
## lm(formula = Y ~ A2 + A3 + A4 + A5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -571.73 -326.50 53.64 200.67 965.66
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.235e+04 2.173e+04 3.790 0.00300 **
## A2 6.193e-02 1.594e-02 3.885 0.00254 **
## A3 -5.232e-01 2.909e-01 -1.798 0.09961 .
## A4 -5.919e-01 2.593e-01 -2.283 0.04333 *
## A5 -3.221e-01 2.409e-01 -1.337 0.20818
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 462.9 on 11 degrees of freedom
## Multiple R-squared: 0.9873, Adjusted R-squared: 0.9826
## F-statistic: 213.2 on 4 and 11 DF, p-value: 2.427e-10
# Fit a multiple linear regression model with A6
model3 <- lm(Y ~ A3 + A4 + A5 + A6, data = data)
summary(model3)
##
## Call:
## lm(formula = Y ~ A3 + A4 + A5 + A6, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -410.89 -140.32 2.97 82.50 563.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.440e+06 3.392e+05 -7.193 1.77e-05 ***
## A3 -1.501e+00 1.512e-01 -9.931 7.92e-07 ***
## A4 -9.340e-01 1.851e-01 -5.047 0.000374 ***
## A5 -2.262e-01 1.175e-01 -1.925 0.080511 .
## A6 1.299e+03 1.807e+02 7.189 1.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 298.7 on 11 degrees of freedom
## Multiple R-squared: 0.9947, Adjusted R-squared: 0.9928
## F-statistic: 516.1 on 4 and 11 DF, p-value: 1.971e-12
# Fit a multiple linear regression model with A1 and A6
model4 <- lm(Y ~ A1 + A3 + A4 + A6, data = data)
summary(model4)
##
## Call:
## lm(formula = Y ~ A1 + A3 + A4 + A6, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -577.91 -119.08 46.45 127.11 745.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.832e+06 3.135e+05 -5.844 0.000112 ***
## A1 -7.826e+00 7.015e+01 -0.112 0.913186
## A3 -1.475e+00 1.774e-01 -8.313 4.53e-06 ***
## A4 -7.692e-01 1.952e-01 -3.940 0.002313 **
## A6 9.746e+02 1.641e+02 5.941 9.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 345.1 on 11 degrees of freedom
## Multiple R-squared: 0.9929, Adjusted R-squared: 0.9903
## F-statistic: 385.8 on 4 and 11 DF, p-value: 9.65e-12
# Fit a multiple linear regression model with A1 and A2
model5 <- lm(Y ~ A1 + A2 + A4 + A5, data = data)
summary(model5)
##
## Call:
## lm(formula = Y ~ A1 + A2 + A4 + A5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -560.25 -411.22 38.98 213.25 876.63
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.203e+05 1.778e+04 6.766 3.09e-05 ***
## A1 -1.372e+02 9.081e+01 -1.511 0.158943
## A2 9.665e-02 1.700e-02 5.686 0.000141 ***
## A4 -4.682e-01 2.636e-01 -1.777 0.103277
## A5 -6.581e-01 1.778e-01 -3.701 0.003495 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 479.1 on 11 degrees of freedom
## Multiple R-squared: 0.9864, Adjusted R-squared: 0.9814
## F-statistic: 198.8 on 4 and 11 DF, p-value: 3.546e-10
# Fit a multiple linear regression model with A2 and A6
model6 <- lm(Y ~ A2 + A4 + A5 + A6, data = data)
summary(model6)
##
## Call:
## lm(formula = Y ~ A2 + A4 + A5 + A6, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -568.31 -423.41 -19.85 214.36 1001.00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.101e+05 6.922e+05 0.448 0.662901
## A2 8.243e-02 1.664e-02 4.954 0.000433 ***
## A4 -4.878e-01 2.897e-01 -1.684 0.120355
## A5 -5.929e-01 2.278e-01 -2.603 0.024568 *
## A6 -1.053e+02 3.631e+02 -0.290 0.777194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 524.5 on 11 degrees of freedom
## Multiple R-squared: 0.9837, Adjusted R-squared: 0.9777
## F-statistic: 165.5 on 4 and 11 DF, p-value: 9.574e-10
# Fit a multiple linear regression model with all six predictors
model7 <- lm(Y ~ A1 + A2 + A3 + A4 + A5 + A6, data = data)
summary(model7)
##
## Call:
## lm(formula = Y ~ A1 + A2 + A3 + A4 + A5 + A6, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -409.7 -158.0 -27.5 101.5 455.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.475e+06 8.880e+05 -3.914 0.003544 **
## A1 1.479e+01 8.472e+01 0.175 0.865281
## A2 -3.575e-02 3.341e-02 -1.070 0.312495
## A3 -2.020e+00 4.873e-01 -4.146 0.002499 **
## A4 -1.033e+00 2.138e-01 -4.831 0.000933 ***
## A5 -4.912e-02 2.255e-01 -0.218 0.832448
## A6 1.826e+03 4.542e+02 4.019 0.003023 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 304 on 9 degrees of freedom
## Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
## F-statistic: 332.3 on 6 and 9 DF, p-value: 4.853e-10
# Compare models based on AIC (lower is better)
model_compare <- data.frame(Model = c("Model 1", "Model 2", "Model 3", "Model 4", "Model 5", "Model 6", "Model 7"),
AIC = c(AIC(model1), AIC(model2), AIC(model3), AIC(model4), AIC(model5), AIC(model6), AIC(model7)))
model_compare
## Model AIC
## 1 Model 1 254.2422
## 2 Model 2 247.8095
## 3 Model 3 233.7882
## 4 Model 4 238.4141
## 5 Model 5 248.9150
## 6 Model 6 251.8111
## 7 Model 7 235.1489
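To make the AIC values in the table easier to interpret, the sketch below recomputes the AIC of Model 3 by hand from its residual sum of squares and compares it with `AIC()`. It assumes the usual Gaussian log-likelihood of a linear model; the names `n`, `rss`, `k`, `loglik`, and `aic_manual` are illustrative and not part of the original analysis.

```r
# Columns needed for Model 3 (same values as the data above)
data <- data.frame(
  A3 = c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578, 2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
  A4 = c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350, 3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
  A5 = c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219, 117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
  A6 = 1947:1962,
  Y  = c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761, 66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)
)
model3 <- lm(Y ~ A3 + A4 + A5 + A6, data = data)

n   <- nrow(data)                 # number of observations
rss <- sum(residuals(model3)^2)   # residual sum of squares
k   <- length(coef(model3)) + 1   # regression coefficients plus the error variance

# Maximized Gaussian log-likelihood of the fitted linear model
loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
aic_manual <- -2 * loglik + 2 * k

print(c(manual = aic_manual, builtin = AIC(model3)))
```

Because every model here is fitted to the same 16 observations of Y, differences in AIC come only from the residual sum of squares and the number of parameters, which is why the six-predictor Model 7 is penalized relative to Model 3.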
The best model by AIC is Model 3, with an AIC of 233.7882.
The second-best model is Model 7, with an AIC of 235.1489. The adjusted R-squared values give the same ranking:
Model 3 has an adjusted R-squared of 0.9928;
Model 7 has an adjusted R-squared of 0.9925.
The overall best model is therefore Model 3, with the linear regression equation:
\(Y = -2.440 \times 10^{6} - 1.501A_3 - 0.9340A_4 - 0.2262A_5 + 1299A_6\)
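As an illustration of how the equation is used, the sketch below evaluates it term by term with the unrounded coefficients of Model 3 and checks the result against `predict()`. The names `b`, `y_hat`, and `y_pred` are introduced here for illustration only.

```r
# Columns needed for Model 3 (same values as the data above)
data <- data.frame(
  A3 = c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578, 2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
  A4 = c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350, 3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
  A5 = c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219, 117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
  A6 = 1947:1962,
  Y  = c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761, 66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)
)
model3 <- lm(Y ~ A3 + A4 + A5 + A6, data = data)

# Unrounded coefficient estimates
b <- coef(model3)

# Evaluate the regression equation term by term
y_hat <- b[["(Intercept)"]] + b[["A3"]] * data$A3 + b[["A4"]] * data$A4 +
  b[["A5"]] * data$A5 + b[["A6"]] * data$A6

# predict() applies the same equation to the supplied rows
y_pred <- predict(model3, newdata = data)
print(all.equal(y_hat, unname(y_pred)))
```

For actual predictions it is probably safer to use `coef(model3)` or `predict()` than the rounded equation: the intercept and the year term are both of order 10^6 and nearly cancel, so rounding them to four significant figures can shift a prediction by hundreds of units.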