Prepare data

file <- "assignment3.sas7bdat"
data <- read_sas(file)
summary(data)
##      IMALE            IEDUC          PAQAGE        IMARRIED    
##  Min.   :0.0000   Min.   : 1.0   Min.   :18.0   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:12.0   1st Qu.:42.0   1st Qu.:0.000  
##  Median :0.0000   Median :13.0   Median :57.0   Median :1.000  
##  Mean   :0.3827   Mean   :13.3   Mean   :54.8   Mean   :0.592  
##  3rd Qu.:1.0000   3rd Qu.:16.0   3rd Qu.:67.0   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :18.0   Max.   :98.0   Max.   :1.000  
##  NA's   :78       NA's   :126    NA's   :51     NA's   :71     
##     IADJINCM         NCLINDEP           CHF                MI         
##  Min.   :   625   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.: 12374   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median : 17678   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   : 22922   Mean   :0.1401   Mean   :0.06577   Mean   :0.03258  
##  3rd Qu.: 31820   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :100000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##  NA's   :405      NA's   :37                                          
##      IWHITE           sevhyp             PSQG             PSQT       
##  Min.   :0.0000   Min.   :0.00000   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:1.0000   1st Qu.:0.00000   1st Qu.: 50.00   1st Qu.: 57.50  
##  Median :1.0000   Median :0.00000   Median : 66.67   Median : 67.50  
##  Mean   :0.7916   Mean   :0.03349   Mean   : 63.92   Mean   : 67.06  
##  3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.: 79.17   3rd Qu.: 77.50  
##  Max.   :1.0000   Max.   :1.00000   Max.   :100.00   Max.   :100.00  
##  NA's   :117      NA's   :59        NA's   :41       NA's   :42      
##       PSQI              PSQF            PSQTI             PSQA       
##  Min.   :  3.571   Min.   :  0.00   Min.   :  0.00   Min.   :  6.25  
##  1st Qu.: 64.286   1st Qu.: 53.12   1st Qu.: 50.00   1st Qu.: 62.50  
##  Median : 75.000   Median : 68.75   Median : 75.00   Median : 72.92  
##  Mean   : 72.006   Mean   : 65.39   Mean   : 65.02   Mean   : 71.08  
##  3rd Qu.: 82.143   3rd Qu.: 81.25   3rd Qu.: 75.00   3rd Qu.: 81.25  
##  Max.   :100.000   Max.   :100.00   Max.   :100.00   Max.   :100.00  
##  NA's   :100       NA's   :96       NA's   :71       NA's   :55      
##     severity    
##  Min.   :1.000  
##  1st Qu.:2.000  
##  Median :3.000  
##  Mean   :2.839  
##  3rd Qu.:3.000  
##  Max.   :4.000  
##  NA's   :142
data$IMALE <- as.factor(data$IMALE)
data$IMARRIED <- as.factor(data$IMARRIED)
data$NCLINDEP <- as.factor(data$NCLINDEP)
data$CHF <- as.factor(data$CHF)
data$MI <- as.factor(data$MI)
data$IWHITE <- as.factor(data$IWHITE)
data$sevhyp <- as.factor(data$sevhyp)
#data$severity <- as.factor(data$severity)
#data$severity <- ordered(data$severity,levels=c("1","2","3","4"))

Examine pattern of missing values

Two plots are generated to understand the distribution of missing values, including their combination. The first plot is scaled so that cell height is proportional to N, while the second is not. Combined, the plots illustrate that most cases are complete. Of cases with missing values, 73% are missing income alone. Very few cases have combinations of missing values.

Summary tables from the previous section indicate a low incidence of missing values amongst variables of interest:

  1. 41 cases are missing the response variable (PSQG) (1%). In this revision, cases missing the response variable are deleted.

  2. 0 are missing pain severity (4%)

  3. 37 37 are missing clinical depression (1%)

  4. Overall, 73.91% of cases are complete.

aggr(data, prop = T, col=("blue"),numbers = T,bars=TRUE,only.miss=TRUE,combined=TRUE,sortCombs=TRUE,varheight=TRUE)

aggr(data, prop = T, col=("blue"),numbers = T,bars=TRUE,only.miss=TRUE,combined=TRUE,sortCombs=TRUE,sortVars=TRUE,labels=TRUE)

## 
##  Variables sorted by number of missings: 
##  Variable Count
##  IADJINCM   392
##  severity   129
##     IEDUC   123
##    IWHITE   111
##     IMALE    77
##  IMARRIED    69
##      PSQI    64
##      PSQF    62
##    sevhyp    56
##    PAQAGE    51
##  NCLINDEP    37
##     PSQTI    33
##      PSQA    16
##      PSQT     3
##       CHF     0
##        MI     0
##      PSQG     0

Margin plots to look for sign of non-random missingness

Three plots are generated to look for signs that missingness is conditional on the value of key variables. Quality scores appear independent of the missingness of income, pain severity, and depression, and vice-versa.

Given that income has the highest incidence of missing data, I think it is important to assess for non-random missingness (even if will only be included as potential confound). Marginplot for income by education conditoned on missing value status suggest a non-random association, specifically non-response bias for individiuals of low socioeconomic status.

# Key response and predictor variables
marginplot(data[c(11, 6)], col = c("lightblue", "red", "orange"))

marginplot(data[c(6, 17)], col = c("lightblue", "red", "orange"))

marginplot(data[c(11, 17)], col = c("lightblue", "red", "orange"))

# Income (highest incidence of missing data)
marginplot(data[c(11, 5)], col = c("lightblue", "red", "orange"))

prop.table(table(data$NCLINDEP,is.na(data$severity)),2)
##    
##         FALSE      TRUE
##   0 0.8593699 0.8818898
##   1 0.1406301 0.1181102
marginplot(data[c(2, 5)], col = c("lightblue", "red", "orange"),numbers=T)

t.test(IEDUC~is.na(IADJINCM),data=data)
## 
##  Welch Two Sample t-test
## 
## data:  IEDUC by is.na(IADJINCM)
## t = 7.4599, df = 371.93, p-value = 6.17e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.915084 1.570172
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            13.44602            12.20339

Variable transformation

Before imputation, will examine the distribution of variables to see whether it might be reasonable to test some variable transformations during the modeling.

data %>%
  keep(is.numeric) %>% 
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 873 rows containing non-finite values (stat_bin).

The distributions for income, and to a lesser extent education, are skewed. During modeling, I will include a log transformed income variable as well as a square root for education

data$logIADJINCM <- log(data$IADJINCM)
data$sqrIEDUC <- (data$IEDUC^2)
ggplot(data=data,aes(data$logIADJINCM))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 392 rows containing non-finite values (stat_bin).

ggplot(data=data,aes(data$sqrIEDUC))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 123 rows containing non-finite values (stat_bin).

Impute mean (or mode) value

One largely discredited technqiue for managing missing data is to impute a measure of central tendency (e.g. median, or mode for categorical data). A second approach, complete case analysis, will also be used later.

# Mode function from https://www.tutorialspoint.com/r/r_mean_median_mode.htm
getmode <- function(v) { 
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Impute function, I created
impute.mean <- function(x) {
  if (class(x) == "numeric") {
    x[is.na(x)] <- median(x,na.rm=TRUE) # Use median if numeric
    return(as.numeric(x))
  }
  else {
    x[is.na(x)] <- getmode(x) # Otherwise use mode
    return(as.factor(x))
  }
}
# I thought I'd be able to use apply but it coereces all data to the same data type, not sure if there is an equivalent function that doesn't do that
data.mean <- data
data.mean$IMALE <- impute.mean(data$IMALE)
data.mean$IEDUC <- impute.mean(data$IEDUC)
data.mean$PAQAGE <- impute.mean(data$PAQAGE)
data.mean$IMARRIED <- impute.mean(data$IMARRIED)
data.mean$IADJINCM <- impute.mean(data$IADJINCM)
data.mean$NCLINDEP <- impute.mean(data$NCLINDEP)
data.mean$CHF <- impute.mean(data$CHF)
data.mean$MI <- impute.mean(data$MI)
data.mean$IWHITE <- impute.mean(data$IWHITE)
data.mean$sevhyp <- impute.mean(data$sevhyp)
data.mean$PSQG <- impute.mean(data$PSQG)
data.mean$PSQT <- impute.mean(data$PSQT)
data.mean$PSQI <- impute.mean(data$PSQI)
data.mean$PSQF <- impute.mean(data$PSQF)
data.mean$PSQTI <- impute.mean(data$PSQTI)
data.mean$PSQA <- impute.mean(data$PSQA)
data.mean$severity <- impute.mean(data$severity)
data.mean$logIADJINCM <- impute.mean(data$IADJINCM)
data.mean$sqrIEDUC <- impute.mean(data$sqrIEDUC)
# Sanity check
summary(data)
##   IMALE          IEDUC           PAQAGE      IMARRIED       IADJINCM     
##  0   :1952   Min.   : 1.00   Min.   :18.00   0   :1293   Min.   :   625  
##  1   :1214   1st Qu.:12.00   1st Qu.:42.00   1   :1881   1st Qu.: 12374  
##  NA's:  77   Median :13.00   Median :57.00   NA's:  69   Median : 17678  
##              Mean   :13.33   Mean   :54.71               Mean   : 23015  
##              3rd Qu.:16.00   3rd Qu.:67.00               3rd Qu.: 31820  
##              Max.   :18.00   Max.   :98.00               Max.   :100000  
##              NA's   :123     NA's   :51                  NA's   :392     
##  NCLINDEP    CHF      MI        IWHITE      sevhyp          PSQG       
##  0   :2758   0:3032   0:3137   0   : 645   0   :3082   Min.   :  0.00  
##  1   : 448   1: 211   1: 106   1   :2487   1   : 105   1st Qu.: 50.00  
##  NA's:  37                     NA's: 111   NA's:  56   Median : 66.67  
##                                                        Mean   : 63.92  
##                                                        3rd Qu.: 79.17  
##                                                        Max.   :100.00  
##                                                                        
##       PSQT             PSQI              PSQF            PSQTI       
##  Min.   :  0.00   Min.   :  3.571   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 57.50   1st Qu.: 64.286   1st Qu.: 53.12   1st Qu.: 50.00  
##  Median : 67.50   Median : 75.000   Median : 68.75   Median : 75.00  
##  Mean   : 67.06   Mean   : 71.986   Mean   : 65.39   Mean   : 65.01  
##  3rd Qu.: 77.50   3rd Qu.: 82.143   3rd Qu.: 81.25   3rd Qu.: 75.00  
##  Max.   :100.00   Max.   :100.000   Max.   :100.00   Max.   :100.00  
##  NA's   :3        NA's   :64        NA's   :62       NA's   :33      
##       PSQA           severity      logIADJINCM        sqrIEDUC    
##  Min.   :  6.25   Min.   :1.000   Min.   : 6.438   Min.   :  1.0  
##  1st Qu.: 62.50   1st Qu.:2.000   1st Qu.: 9.423   1st Qu.:144.0  
##  Median : 72.92   Median :3.000   Median : 9.780   Median :169.0  
##  Mean   : 71.09   Mean   :2.838   Mean   : 9.759   Mean   :186.4  
##  3rd Qu.: 81.25   3rd Qu.:3.000   3rd Qu.:10.368   3rd Qu.:256.0  
##  Max.   :100.00   Max.   :4.000   Max.   :11.513   Max.   :324.0  
##  NA's   :16       NA's   :129     NA's   :392      NA's   :123
summary(data.mean)
##  IMALE        IEDUC           PAQAGE      IMARRIED    IADJINCM     
##  0:2029   Min.   : 1.00   Min.   :18.00   0:1293   Min.   :   625  
##  1:1214   1st Qu.:12.00   1st Qu.:43.00   1:1950   1st Qu.: 12500  
##           Median :13.00   Median :57.00            Median : 17678  
##           Mean   :13.32   Mean   :54.74            Mean   : 22370  
##           3rd Qu.:16.00   3rd Qu.:66.50            3rd Qu.: 25981  
##           Max.   :18.00   Max.   :98.00            Max.   :100000  
##  NCLINDEP CHF      MI       IWHITE   sevhyp        PSQG       
##  0:2795   0:3032   0:3137   0: 645   0:3138   Min.   :  0.00  
##  1: 448   1: 211   1: 106   1:2598   1: 105   1st Qu.: 50.00  
##                                               Median : 66.67  
##                                               Mean   : 63.92  
##                                               3rd Qu.: 79.17  
##                                               Max.   :100.00  
##       PSQT             PSQI              PSQF            PSQTI       
##  Min.   :  0.00   Min.   :  3.571   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 57.50   1st Qu.: 64.286   1st Qu.: 53.12   1st Qu.: 50.00  
##  Median : 67.50   Median : 75.000   Median : 68.75   Median : 75.00  
##  Mean   : 67.06   Mean   : 72.045   Mean   : 65.46   Mean   : 65.11  
##  3rd Qu.: 77.50   3rd Qu.: 82.143   3rd Qu.: 81.25   3rd Qu.: 75.00  
##  Max.   :100.00   Max.   :100.000   Max.   :100.00   Max.   :100.00  
##       PSQA           severity      logIADJINCM        sqrIEDUC    
##  Min.   :  6.25   Min.   :1.000   Min.   :   625   Min.   :  1.0  
##  1st Qu.: 62.50   1st Qu.:2.000   1st Qu.: 12500   1st Qu.:144.0  
##  Median : 72.92   Median :3.000   Median : 17678   Median :169.0  
##  Mean   : 71.09   Mean   :2.845   Mean   : 22370   Mean   :185.7  
##  3rd Qu.: 81.25   3rd Qu.:3.000   3rd Qu.: 25981   3rd Qu.:256.0  
##  Max.   :100.00   Max.   :4.000   Max.   :100000   Max.   :324.0
table(complete.cases(data.mean))
## 
## TRUE 
## 3243

Create regression model for the patient satisfaction score

From Bair et al., the goal of the study is to determine the effect of depression and pain severity on patient satisfaction. I will model effects on general satisfaction (PSQG). The dependent variables include depression and pain, with other variables included as potential confounds.

As one approach to variable selection I tried using the regsubsets function of the leaps package. This function calculates model BIC for all combinations of included variables, displayed as a heatmap. With this approach, we can see that optimal model BIC is achieved with education, age, depression and severity.

This procedure could be repeated for each of the datasets (complete, imputed mean, and each frame of multiple imputation). However, given the low incidence of missing data in this dataset, I do not think this is worthwhile.

data.mean <- data.mean[-c(12:16)]
data <- data[-c(12:16)]
complete.leaps <- regsubsets(PSQG~.,data=data,nbest=10)
complete.leaps$nn # number of cases used, see that n << nrow(data)
## [1] 2464
mean.leaps <- regsubsets(PSQG~.,data=data.mean,nbest=10)
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax,
## force.in = force.in, : 1 linear dependencies found
## Reordering variables and trying again:
mean.leaps$nn # number of cases used, now n = nrow(data)
## [1] 3243
plot(complete.leaps, scale="bic")

plot(mean.leaps,scale="bic")

mean.m1 <- lm(PSQG~severity,data=data.mean)
mean.m2 <- lm(PSQG~severity+NCLINDEP,data=data.mean)
mean.m3 <- lm(PSQG~severity+NCLINDEP+PAQAGE,data=data.mean)
mean.m4 <- lm(PSQG~severity+NCLINDEP+PAQAGE+sqrIEDUC,data=data.mean)
multiplot(mean.m1,mean.m2,mean.m3,mean.m4,intercept=FALSE)

summary(mean.m4)
## 
## Call:
## lm(formula = PSQG ~ severity + NCLINDEP + PAQAGE + sqrIEDUC, 
##     data = data.mean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.899 -12.039   1.201  13.058  51.901 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 50.808385   2.108512  24.097  < 2e-16 ***
## severity     3.296399   0.394969   8.346  < 2e-16 ***
## NCLINDEP1   -6.778905   1.012516  -6.695 2.53e-11 ***
## PAQAGE       0.185665   0.023161   8.016 1.51e-15 ***
## sqrIEDUC    -0.029563   0.004631  -6.384 1.97e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.83 on 3238 degrees of freedom
## Multiple R-squared:  0.08519,    Adjusted R-squared:  0.08406 
## F-statistic: 75.38 on 4 and 3238 DF,  p-value: < 2.2e-16
plot(mean.m4)

complete <- data[complete.cases(data),]
complete.m4 <- lm(PSQG~severity+NCLINDEP+PAQAGE+sqrIEDUC,data=complete)
summary(complete.m4)
## 
## Call:
## lm(formula = PSQG ~ severity + NCLINDEP + PAQAGE + sqrIEDUC, 
##     data = complete)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.452 -12.087   1.116  12.985  51.804 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51.837253   2.400351  21.596  < 2e-16 ***
## severity     3.419606   0.455716   7.504 8.61e-14 ***
## NCLINDEP1   -6.695808   1.120323  -5.977 2.61e-09 ***
## PAQAGE       0.167743   0.026417   6.350 2.56e-10 ***
## sqrIEDUC    -0.031355   0.005197  -6.034 1.84e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.84 on 2459 degrees of freedom
## Multiple R-squared:  0.08515,    Adjusted R-squared:  0.08366 
## F-statistic: 57.22 on 4 and 2459 DF,  p-value: < 2.2e-16
multiplot(complete.m4,mean.m4,intercept=FALSE)

Model with multiple imputation

We have been asked to use multiple imputation as an alternative to complete case analysis and other simpler techniques. For the imputation, I will use the mice package. I have chosen predictive mean matching ‘pmm’ as the imputation method. Alternatives based on linear regression are probably not appropriate, as many of the included variables for modeling are discrete variables.

hmm…. this will be interesting, as the multiple imputation function in SAS using a conditional expectation model and

linear regression for multiple imputation. So, the results from SAS may be different.

As a check of the imputed values, I have compared observed vs. imputed using the density plot functions. These plots show substantial overlap in the distributions. As expected, under the pmm method, peaks and valleys in distributions are preserved (e.g. severity).

data.mi <- mice(data,m=100,method="pmm",seed=1,print=FALSE)
densityplot(data.mi)

Try alternative regression-based approaches to imputation

The default method for mice is “pmm” (predictive mean matching), which can be used for any data type (continuous, binary, ordered categorical and non-ordered categorical. This dataset contains continuous, binary and ordered categorical data (one could argue education as ordered categorical, but I dont think that’s useful here). An alternative approach will be to use logistic regression to impute binary variables and linear regression to impute continuous variables.

Plots of imputed vs. observed variables show fair overlap, but peaks and valleys are no longer preserved.

method <- data.mi$method # Export the variable-wise list of methods [easy to work with this list to specify method for each variable]. A few variables will have an empty string "", these are variables with no missing data.
method["IMALE"] <- "logreg.boot" # Logistic regression with bootstrap (logistric regression alone would just reproduce the same value)
method["IEDUC"] <- "norm" # Predicted values from linear regression
method["PAQAGE"] <- "norm"
method["IMARRIED"] <- "logreg.boot"
method["IADJINCM"] <-"norm"
method["NCLINDEP"] <- "logreg.boot"
method["IWHITE"] <- "logreg.boot"
method["sevhyp"] <- "logreg.boot"
method["severity"] <- "norm" # 
method["logIADJINCM"] <- "norm"
method["sqrIEDUC"] <- "norm"
data.mir <- mice(data,m=100,method=method,seed=1,print=FALSE)
densityplot(data.mir)

mi.m4 <- with(data.mi,lm(PSQG~severity+NCLINDEP+PAQAGE+sqrIEDUC))
mir.m4 <- with(data.mir,lm(PSQG~severity+NCLINDEP+PAQAGE+sqrIEDUC))
pool.m4 <- pool(mi.m4)
pool.r.squared(mi.m4)
##            est      lo 95     hi 95 fmi
## R^2 0.08647058 0.06861455 0.1059654 NaN
pool.m4r <- pool(mir.m4)
pool.r.squared(mir.m4)
##            est    lo 95    hi 95 fmi
## R^2 0.08654106 0.068696 0.106021 NaN
summary(pool.m4)
##                estimate   std.error statistic       df      p.value
## (Intercept) 50.45725749 2.127652702 23.714988 2985.775 0.000000e+00
## severity     3.29207221 0.396960932  8.293189 2897.911 0.000000e+00
## NCLINDEP1   -6.73217810 1.019196340 -6.605379 3126.409 4.642242e-11
## PAQAGE       0.18848177 0.023311604  8.085320 3108.480 8.881784e-16
## sqrIEDUC    -0.02825746 0.004641059 -6.088582 2972.425 1.277804e-09
summary(pool.m4r)
##                estimate   std.error statistic       df      p.value
## (Intercept) 50.55244212 2.121009210 23.834146 3035.762 0.000000e+00
## severity     3.27158533 0.396182486  8.257774 2945.325 2.220446e-16
## NCLINDEP1   -6.73877829 1.021378135 -6.597731 3093.538 4.881473e-11
## PAQAGE       0.18849941 0.023252510  8.106626 3139.272 8.881784e-16
## sqrIEDUC    -0.02842697 0.004637299 -6.130071 2968.803 9.878707e-10
summary(mean.m4)
## 
## Call:
## lm(formula = PSQG ~ severity + NCLINDEP + PAQAGE + sqrIEDUC, 
##     data = data.mean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.899 -12.039   1.201  13.058  51.901 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 50.808385   2.108512  24.097  < 2e-16 ***
## severity     3.296399   0.394969   8.346  < 2e-16 ***
## NCLINDEP1   -6.778905   1.012516  -6.695 2.53e-11 ***
## PAQAGE       0.185665   0.023161   8.016 1.51e-15 ***
## sqrIEDUC    -0.029563   0.004631  -6.384 1.97e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.83 on 3238 degrees of freedom
## Multiple R-squared:  0.08519,    Adjusted R-squared:  0.08406 
## F-statistic: 75.38 on 4 and 3238 DF,  p-value: < 2.2e-16
summary(complete.m4)
## 
## Call:
## lm(formula = PSQG ~ severity + NCLINDEP + PAQAGE + sqrIEDUC, 
##     data = complete)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.452 -12.087   1.116  12.985  51.804 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51.837253   2.400351  21.596  < 2e-16 ***
## severity     3.419606   0.455716   7.504 8.61e-14 ***
## NCLINDEP1   -6.695808   1.120323  -5.977 2.61e-09 ***
## PAQAGE       0.167743   0.026417   6.350 2.56e-10 ***
## sqrIEDUC    -0.031355   0.005197  -6.034 1.84e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.84 on 2459 degrees of freedom
## Multiple R-squared:  0.08515,    Adjusted R-squared:  0.08366 
## F-statistic: 57.22 on 4 and 2459 DF,  p-value: < 2.2e-16