Economic and unemployment data were recorded.

There are 16 rows of data.  The data include:

  I,  the index;
  A1, the percentage price deflation;
  A2, the GNP in millions of dollars;
  A3, the number of unemployed;
  A4, the number of people employed by the military;
  A5, the number of people over 14;
  A6, the year;
  Y,  the number of people employed.

A3-A5 are in units of thousands.

We seek a model of the form:

\(Y = X_0 + A_1X_1 + A_2X_2 + A_3X_3 + A_4X_4 + A_5X_5 + A_6X_6\)

Load Data

# Load data
data <- data.frame(A1=c(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9),
                   A2=c(234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112, 397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894),
                   A3=c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578, 2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
                   A4=c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350, 3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
                   A5=c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219, 117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
                   A6=c(1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962),
                   Y=c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761, 66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561))
data
##       A1     A2   A3   A4     A5   A6     Y
## 1   83.0 234289 2356 1590 107608 1947 60323
## 2   88.5 259426 2326 1456 108632 1948 61122
## 3   88.2 258054 3682 1616 109773 1949 60171
## 4   89.5 284599 3351 1650 110929 1950 61187
## 5   96.2 328975 2099 3099 112075 1951 63221
## 6   98.1 346999 1932 3594 113270 1952 63639
## 7   99.0 365385 1870 3547 115094 1953 64989
## 8  100.0 363112 3578 3350 116219 1954 63761
## 9  101.2 397469 2904 3048 117388 1955 66019
## 10 104.6 419180 2822 2857 118734 1956 67857
## 11 108.4 442769 2936 2798 120445 1957 68169
## 12 110.8 444546 4681 2637 121950 1958 66513
## 13 112.6 482704 3813 2552 123366 1959 68655
## 14 114.2 502601 3931 2514 125368 1960 69564
## 15 115.7 518173 4806 2572 127852 1961 69331
## 16 116.9 554894 4007 2827 130081 1962 70561

Check how strongly, and in which direction, each feature correlates with the number of people employed

cor(x=data[,c(1:6)],y=data[,c(7)])
##         [,1]
## A1 0.9708914
## A2 0.9835925
## A3 0.5024692
## A4 0.4572380
## A5 0.9604640
## A6 0.9713535
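
As a check on what `cor()` reports, the Pearson coefficient for a single pair can be reproduced from its definition, covariance divided by the product of the standard deviations. The sketch below does this for A2 against Y; the names `a2`, `y` and `r_manual` are illustrative and not part of the original analysis.

```r
# GNP (A2) and number of people employed (Y), copied from the data above
a2 <- c(234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
        397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894)
y  <- c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
        66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)

# Pearson correlation from its definition
r_manual <- cov(a2, y) / (sd(a2) * sd(y))

# Should agree with the built-in cor() up to floating-point error
all.equal(r_manual, cor(a2, y))
r_manual
```

A high pairwise correlation only describes a linear association between two columns; it does not by itself say how much a feature contributes once the other features are in the model.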

All six features are positively correlated with the number of people employed. The GNP in millions of dollars (A2) has the strongest correlation (0.984), followed by the year (A6, 0.971) and the percentage price deflation (A1, 0.971).

Fit different models by varying the inclusion of A1, A2 and A6

# Fit a multiple linear regression model with A1
model1 <- lm(Y ~ A1 + A3 + A4 + A5, data = data)
summary(model1)
## 
## Call:
## lm(formula = Y ~ A1 + A3 + A4 + A5, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -652.19 -256.11   -0.53  121.31 1221.57 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13699.5644  6872.1758   1.993 0.071599 .  
## A1            206.3701    81.2103   2.541 0.027422 *  
## A3             -1.2427     0.2775  -4.479 0.000933 ***
## A4             -0.5971     0.3319  -1.799 0.099484 .  
## A5              0.3079     0.1235   2.493 0.029897 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 565.9 on 11 degrees of freedom
## Multiple R-squared:  0.981,  Adjusted R-squared:  0.974 
## F-statistic: 141.7 on 4 and 11 DF,  p-value: 2.203e-09
# Fit a multiple linear regression model with A2
model2 <- lm(Y ~ A2 + A3 + A4 + A5, data = data)
summary(model2)
## 
## Call:
## lm(formula = Y ~ A2 + A3 + A4 + A5, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -571.73 -326.50   53.64  200.67  965.66 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  8.235e+04  2.173e+04   3.790  0.00300 **
## A2           6.193e-02  1.594e-02   3.885  0.00254 **
## A3          -5.232e-01  2.909e-01  -1.798  0.09961 . 
## A4          -5.919e-01  2.593e-01  -2.283  0.04333 * 
## A5          -3.221e-01  2.409e-01  -1.337  0.20818   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 462.9 on 11 degrees of freedom
## Multiple R-squared:  0.9873, Adjusted R-squared:  0.9826 
## F-statistic: 213.2 on 4 and 11 DF,  p-value: 2.427e-10
# Fit a multiple linear regression model with A6
model3 <- lm(Y ~ A3 + A4 + A5 + A6, data = data)
summary(model3)
## 
## Call:
## lm(formula = Y ~ A3 + A4 + A5 + A6, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -410.89 -140.32    2.97   82.50  563.32 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.440e+06  3.392e+05  -7.193 1.77e-05 ***
## A3          -1.501e+00  1.512e-01  -9.931 7.92e-07 ***
## A4          -9.340e-01  1.851e-01  -5.047 0.000374 ***
## A5          -2.262e-01  1.175e-01  -1.925 0.080511 .  
## A6           1.299e+03  1.807e+02   7.189 1.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 298.7 on 11 degrees of freedom
## Multiple R-squared:  0.9947, Adjusted R-squared:  0.9928 
## F-statistic: 516.1 on 4 and 11 DF,  p-value: 1.971e-12
# Fit a multiple linear regression model with A1 and A6
model4 <- lm(Y ~ A1 + A3 + A4 + A6, data = data)
summary(model4)
## 
## Call:
## lm(formula = Y ~ A1 + A3 + A4 + A6, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -577.91 -119.08   46.45  127.11  745.08 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.832e+06  3.135e+05  -5.844 0.000112 ***
## A1          -7.826e+00  7.015e+01  -0.112 0.913186    
## A3          -1.475e+00  1.774e-01  -8.313 4.53e-06 ***
## A4          -7.692e-01  1.952e-01  -3.940 0.002313 ** 
## A6           9.746e+02  1.641e+02   5.941 9.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 345.1 on 11 degrees of freedom
## Multiple R-squared:  0.9929, Adjusted R-squared:  0.9903 
## F-statistic: 385.8 on 4 and 11 DF,  p-value: 9.65e-12
# Fit a multiple linear regression model with A1 and A2
model5 <- lm(Y ~ A1 + A2 + A4 + A5, data = data)
summary(model5)
## 
## Call:
## lm(formula = Y ~ A1 + A2 + A4 + A5, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -560.25 -411.22   38.98  213.25  876.63 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.203e+05  1.778e+04   6.766 3.09e-05 ***
## A1          -1.372e+02  9.081e+01  -1.511 0.158943    
## A2           9.665e-02  1.700e-02   5.686 0.000141 ***
## A4          -4.682e-01  2.636e-01  -1.777 0.103277    
## A5          -6.581e-01  1.778e-01  -3.701 0.003495 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 479.1 on 11 degrees of freedom
## Multiple R-squared:  0.9864, Adjusted R-squared:  0.9814 
## F-statistic: 198.8 on 4 and 11 DF,  p-value: 3.546e-10
# Fit a multiple linear regression model with A2 and A6
model6 <- lm(Y ~ A2 + A4 + A5 + A6, data = data)
summary(model6)
## 
## Call:
## lm(formula = Y ~ A2 + A4 + A5 + A6, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -568.31 -423.41  -19.85  214.36 1001.00 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.101e+05  6.922e+05   0.448 0.662901    
## A2           8.243e-02  1.664e-02   4.954 0.000433 ***
## A4          -4.878e-01  2.897e-01  -1.684 0.120355    
## A5          -5.929e-01  2.278e-01  -2.603 0.024568 *  
## A6          -1.053e+02  3.631e+02  -0.290 0.777194    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 524.5 on 11 degrees of freedom
## Multiple R-squared:  0.9837, Adjusted R-squared:  0.9777 
## F-statistic: 165.5 on 4 and 11 DF,  p-value: 9.574e-10
# Fit a multiple linear regression model with all six features
model7 <- lm(Y ~ A1 + A2 + A3 + A4 + A5 + A6, data = data)
summary(model7)
## 
## Call:
## lm(formula = Y ~ A1 + A2 + A3 + A4 + A5 + A6, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -409.7 -158.0  -27.5  101.5  455.9 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.475e+06  8.880e+05  -3.914 0.003544 ** 
## A1           1.479e+01  8.472e+01   0.175 0.865281    
## A2          -3.575e-02  3.341e-02  -1.070 0.312495    
## A3          -2.020e+00  4.873e-01  -4.146 0.002499 ** 
## A4          -1.033e+00  2.138e-01  -4.831 0.000933 ***
## A5          -4.912e-02  2.255e-01  -0.218 0.832448    
## A6           1.826e+03  4.542e+02   4.019 0.003023 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 304 on 9 degrees of freedom
## Multiple R-squared:  0.9955, Adjusted R-squared:  0.9925 
## F-statistic: 332.3 on 6 and 9 DF,  p-value: 4.853e-10

Pick the best model using the AIC

# Compare models based on AIC
model_compare <- data.frame(Model = c("Model 1", "Model 2", "Model 3","Model 4","Model 5","Model 6","Model 7"),
                            AIC = c(AIC(model1), AIC(model2), AIC(model3),AIC(model4),AIC(model5),AIC(model6),AIC(model7)))

model_compare
##     Model      AIC
## 1 Model 1 254.2422
## 2 Model 2 247.8095
## 3 Model 3 233.7882
## 4 Model 4 238.4141
## 5 Model 5 248.9150
## 6 Model 6 251.8111
## 7 Model 7 235.1489
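
The comparison can also be written more compactly, and the AIC of one model reproduced by hand. The sketch below assumes the same data and formulas as above; the names `formulas`, `fits`, `aics` and `aic_manual` are illustrative. For a Gaussian linear model, AIC = n(log 2π + log(RSS/n) + 1) + 2k, where k counts the regression coefficients plus the error variance, and a lower value is preferred.

```r
# Data as loaded above
data <- data.frame(
  A1 = c(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
         101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9),
  A2 = c(234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
         397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894),
  A3 = c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578,
         2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
  A4 = c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
         3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
  A5 = c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
         117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
  A6 = 1947:1962,
  Y  = c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
         66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)
)

# The seven model formulas used above, collected in a named list
formulas <- list(
  model1 = Y ~ A1 + A3 + A4 + A5,
  model2 = Y ~ A2 + A3 + A4 + A5,
  model3 = Y ~ A3 + A4 + A5 + A6,
  model4 = Y ~ A1 + A3 + A4 + A6,
  model5 = Y ~ A1 + A2 + A4 + A5,
  model6 = Y ~ A2 + A4 + A5 + A6,
  model7 = Y ~ A1 + A2 + A3 + A4 + A5 + A6
)

# Fit every model and extract its AIC
fits <- lapply(formulas, lm, data = data)
aics <- sapply(fits, AIC)

# Reproduce the AIC of one model by hand
fit <- fits$model3
n   <- nobs(fit)
rss <- sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1   # coefficients plus the error variance
aic_manual <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * k

# Should agree with AIC() up to floating-point error
all.equal(aic_manual, unname(aics["model3"]))

# Models ordered from lowest (preferred) to highest AIC
sort(aics)
```

Keeping the formulas in one list makes it harder for a comment and a formula to drift apart when a model is changed.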

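The best model by AIC is Model 3, which has the lowest AIC, 233.7882. A lower AIC indicates a better trade-off between goodness of fit and the number of parameters.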

The second-best model is Model 7, with an AIC of 235.1489. The adjusted R-squared values point the same way:

Model 3 has an adjusted R-squared of 0.9928.

Model 7 has an adjusted R-squared of 0.9925.
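
These adjusted values can be recovered, to rounding, from the R-squared values reported in the summaries, using the standard formula with n = 16 observations and p predictors:

\(\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}\)

For Model 3 (p = 4): \(1 - (1 - 0.9947)\cdot\frac{15}{11} \approx 0.9928\)

For Model 7 (p = 6): \(1 - (1 - 0.9955)\cdot\frac{15}{9} = 0.9925\)

Model 7 has the slightly higher raw R-squared, but the penalty for its two extra predictors brings its adjusted value below that of Model 3.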

The overall best model is therefore Model 3, with the linear regression equation:

\(Y = -2.440 \times 10^{6} - 1.501A_3 - 0.9340A_4 - 0.2262A_5 + 1299A_6\)
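
The coefficients in this equation are rounded as printed by `summary()`. The intercept and the A6 term are both large and nearly cancel, so that rounding could shift a prediction by hundreds of units. A minimal sketch of evaluating the equation with the full-precision estimates from `coef()` follows; the names `b`, `y_hat` and `y_rounded` are illustrative.

```r
# Columns used by Model 3, copied from the data above
data <- data.frame(
  A3 = c(2356, 2326, 3682, 3351, 2099, 1932, 1870, 3578,
         2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007),
  A4 = c(1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
         3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827),
  A5 = c(107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
         117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081),
  A6 = 1947:1962,
  Y  = c(60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
         66019, 67857, 68169, 66513, 68655, 69564, 69331, 70561)
)

model3 <- lm(Y ~ A3 + A4 + A5 + A6, data = data)
b <- coef(model3)   # full-precision coefficient estimates

# Evaluate the regression equation term by term
y_hat <- b[["(Intercept)"]] + b[["A3"]] * data$A3 + b[["A4"]] * data$A4 +
         b[["A5"]] * data$A5 + b[["A6"]] * data$A6

# Same equation with the coefficients rounded as printed in the summary
y_rounded <- -2.440e6 - 1.501 * data$A3 - 0.9340 * data$A4 -
             0.2262 * data$A5 + 1299 * data$A6

# The term-by-term evaluation should match the fitted values of the model
all.equal(y_hat, as.numeric(fitted(model3)))

# Largest gap between the rounded and full-precision predictions
max(abs(y_rounded - y_hat))
```

For predictions on new rows, `predict(model3, newdata = ...)` uses the full-precision coefficients and avoids retyping the equation.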