ANLY 505 - Nested Model

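### Simulate predictors and response ###
set.seed(1001)

# Three predictors, each drawn with replacement from 1:4 (n = 100)
x1 = sample(1:4, size = 100, replace = TRUE)
x2 = sample(1:4, size = 100, replace = TRUE)
x3 = sample(1:4, size = 100, replace = TRUE)
# Response: two binomial counts (success prob. 0.3 and 0.9) plus a Poisson count (mean x3/5)
y = rbinom(100, x1, .3) + rbinom(100, x2, .9) + rpois(100, x3/5)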

### Full model with all parameters ###
linreg = lm(y ~ x1 + x2 + x3)
summary(linreg)

## 
## Call:
## lm(formula = y ~ x1 + x2 + x3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0334 -0.8558 -0.0677  0.4114  4.0576 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.85818    0.40328   2.128   0.0359 *  
## x1           0.09936    0.09540   1.042   0.3003    
## x2           0.89235    0.08823  10.114   <2e-16 ***
## x3           0.03083    0.09088   0.339   0.7352    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.048 on 96 degrees of freedom
## Multiple R-squared:  0.5253, Adjusted R-squared:  0.5104 
## F-statistic: 35.41 on 3 and 96 DF,  p-value: 1.699e-15

In the full model, x2 is significant (p-value < 0.05), while x1 and x3 are not (p-values of 0.3003 and 0.7352, both > 0.05).
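
The same screen can be done programmatically instead of by reading the printed table. The following is a minimal sketch: `significant_terms` is a hypothetical helper name that is not part of the original analysis, the 0.05 cutoff is the one used in the text, and the built-in `mtcars` data appears only so that the block runs on its own.

```r
# Hypothetical helper: names of the non-intercept terms whose t-test
# p-value falls below alpha in a fitted lm object.
significant_terms <- function(fit, alpha = 0.05) {
  p <- coef(summary(fit))[, "Pr(>|t|)"]   # p-value column of the coefficient table
  p <- p[names(p) != "(Intercept)"]       # the intercept is not a candidate predictor
  names(p)[p < alpha]
}

# Stand-alone illustration on a built-in data set (an assumption for the demo)
demo_fit <- lm(mpg ~ wt, data = mtcars)
significant_terms(demo_fit)
```

Given the p-values printed above, calling `significant_terms(linreg)` would be expected to return only `x2`.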

Task: find a model nested in linreg that improves the F-statistic.

### Model with significant parameters ###
linreg2 = lm(y ~ x2)
summary(linreg2)

## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9607 -0.8593  0.0393  0.2421  3.9379 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.16350    0.24184   4.811 5.44e-06 ***
## x2           0.89860    0.08727  10.297  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.044 on 98 degrees of freedom
## Multiple R-squared:  0.5197, Adjusted R-squared:  0.5148 
## F-statistic:   106 on 1 and 98 DF,  p-value: < 2.2e-16

### Model with insignificant parameters ###
linreg3 = lm(y ~ x1 + x3)
summary(linreg3)

## 
## Call:
## lm(formula = y ~ x1 + x3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5589 -1.1847 -0.2686  1.0688  4.1651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.7308     0.5122   5.331 6.36e-07 ***
## x1            0.1452     0.1362   1.066    0.289    
## x3            0.1309     0.1292   1.013    0.314    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.499 on 97 degrees of freedom
## Multiple R-squared:  0.01944,    Adjusted R-squared:  -0.0007822 
## F-statistic: 0.9613 on 2 and 97 DF,  p-value: 0.386

Fitted alone, x2 remains significant, and the overall F-statistic rises from 35.41 in the full model to 106. The model with only x1 and x3 is not significant (F = 0.9613, p-value = 0.386).
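
The change in the overall F-statistic follows from its relation to R-squared: F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of predictors and n the number of observations. Dropping x1 and x3 barely lowers R² (0.5253 to 0.5197) but reduces k from 3 to 1, so F increases. The sketch below recomputes F from the rounded values printed in the summaries, so it agrees with the reported F only up to rounding; `overall_f` is an illustrative name, not part of the original analysis.

```r
# Illustrative helper: overall F statistic from R-squared,
# number of predictors k, and sample size n.
overall_f <- function(r2, k, n) {
  (r2 / k) / ((1 - r2) / (n - k - 1))
}

n <- 100                                    # observations in the simulation
f_full <- overall_f(0.5253,  k = 3, n = n)  # summary(linreg) reports 35.41
f_x2   <- overall_f(0.5197,  k = 1, n = n)  # summary(linreg2) reports 106
f_x1x3 <- overall_f(0.01944, k = 2, n = n)  # summary(linreg3) reports 0.9613

round(c(full = f_full, x2_only = f_x2, x1_x3 = f_x1x3), 2)
```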

### Comparison of full model containing both significant and insignificant parameters with the model containing insignificant parameters ###
anova(linreg,linreg3)

## Analysis of Variance Table
## 
## Model 1: y ~ x1 + x2 + x3
## Model 2: y ~ x1 + x3
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     96 105.48                                  
## 2     97 217.87 -1   -112.39 102.29 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the full model with the reduced model that contains only the insignificant predictors (x1 and x3) tests whether x2 can be dropped. The partial F-test rejects this (F = 102.29, p-value < 2.2e-16), so x2 is a significant coefficient.
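
The F value in the table can be reproduced by hand from the two residual sums of squares: the increase in RSS per dropped parameter, divided by the residual mean square of the full model. The following sketch uses the rounded RSS and degrees of freedom printed by `anova()`; `partial_f` is an illustrative helper name, not part of the original analysis.

```r
# Illustrative helper: partial F statistic for two nested linear models.
partial_f <- function(rss_reduced, df_reduced, rss_full, df_full) {
  extra_ss_per_df <- (rss_reduced - rss_full) / (df_reduced - df_full)
  full_mean_sq    <- rss_full / df_full
  extra_ss_per_df / full_mean_sq
}

# Values taken from the anova(linreg, linreg3) table
f_drop_x2 <- partial_f(rss_reduced = 217.87, df_reduced = 97,
                       rss_full = 105.48, df_full = 96)
f_drop_x2  # close to the 102.29 reported by anova()
```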

### Comparison of full model containing both significant and insignificant parameters with the model containing significant parameters ###
anova(linreg,linreg2)

## Analysis of Variance Table
## 
## Model 1: y ~ x1 + x2 + x3
## Model 2: y ~ x2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     96 105.48                           
## 2     98 106.72 -2   -1.2388 0.5637 0.5709

Comparing the full model with the reduced model that contains only x2 tests whether x1 and x3 can be dropped together. The partial F-test does not reject this (F = 0.5637, p-value = 0.5709), so x1 and x3 are not significant. They barely reduce the residual error (RSS of 105.48 versus 106.72) and are not needed in the model.
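
The p-value for this comparison is the upper tail of an F distribution with 2 and 96 degrees of freedom, evaluated at the observed F. A minimal sketch using the rounded F value from the table (so the result matches the reported p-value only up to rounding):

```r
# Values taken from the anova(linreg, linreg2) table
f_value <- 0.5637   # observed partial F
df_num  <- 2        # parameters dropped (x1 and x3)
df_den  <- 96       # residual degrees of freedom of the full model

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df1 = df_num, df2 = df_den, lower.tail = FALSE)
p_value  # close to the 0.5709 reported by anova()

# Equivalent decision at the 5% level: compare F with the critical value
f_crit <- qf(0.95, df1 = df_num, df2 = df_den)
f_value < f_crit  # TRUE means x1 and x3 can be dropped
```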

Dhruva Kumar Chepuri

2020-04-20