Mirna Assignment: Generalized Linear Models

In this part, we import the data from spss file into R, and convert our categorical variables into factors; in addition, we create a graphical summary of the response variable anaema.

library(aod)
library(ggplot2)
library(Hmisc)
## Loading required package: survival
## Loading required package: splines
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:aod':
## 
##     rats
## 
## Loading required package: Formula
## Hmisc library by Frank E Harrell Jr
## 
## Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
## to see overall documentation.
## 
## 
## Attaching package: 'Hmisc'
## 
## The following object is masked from 'package:survival':
## 
##     untangle.specials
## 
## The following object is masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
library(lattice)
mydata <- spss.get("/Users/Sofia/Desktop/anaemia.sav")  #import .sav data 
## Loading required package: foreign
## Warning: duplicated levels in factors are deprecated
View(mydata)
summary(mydata)
## Warning: duplicated levels in factors are deprecated
##     age                       region              educ      ceb     
##  15-19:139   Central             :200   no education:173   0  :134  
##  20-24:263   Eastern             :210   primary     :495   1  :144  
##  25-29:193   Northern            :108   secondary   :129   2-3:220  
##  30-34:112   Western             :242                      2-3:  0  
##  35-39: 65   Not de jure resident: 37                      4-5:161  
##  40-44: 23                                                 4-5:  0  
##  45-49:  2                                                 6+ :138  
##        wt                anaema   
##  Min.   : 37.7   not anaemic:493  
##  1st Qu.: 52.0   anaemic    :304  
##  Median : 56.8                    
##  Mean   : 57.6                    
##  3rd Qu.: 62.1                    
##  Max.   :152.6                    
## 
### Convert to factor
mydata$age <- factor(mydata$age)
mydata$region <- factor(mydata$region)
mydata$edu <- factor(mydata$educ)
mydata$ceb <- factor(mydata$ceb)
## Warning: duplicated levels in factors are deprecated
mydata$anaema <- factor(mydata$anaema)
weight <- as.numeric(mydata$wt)

xtabs(~anaema + factor(age), data = mydata)
##              factor(age)
## anaema        15-19 20-24 25-29 30-34 35-39 40-44 45-49
##   not anaemic    83   169   117    64    43    16     1
##   anaemic        56    94    76    48    22     7     1
xtabs(~anaema + factor(region), data = mydata)
##              factor(region)
## anaema        Central Eastern Northern Western Not de jure resident
##   not anaemic     130     117       63     167                   16
##   anaemic          70      93       45      75                   21
xtabs(~anaema + factor(educ), data = mydata)
##              factor(educ)
## anaema        no education primary secondary
##   not anaemic           95     302        96
##   anaemic               78     193        33
xtabs(~anaema + factor(ceb), data = mydata)
##              factor(ceb)
## anaema          0   1 2-3 4-5  6+
##   not anaemic  81  93 134 102  83
##   anaemic      53  51  86  59  55

Plots we created for as graphical summaries of our variables

barchart(mydata$anaema, horizontal = F)

plot of chunk multipleplots

histogram(weight)

plot of chunk multipleplots

bwplot(anaema ~ wt, data = mydata)

plot of chunk multipleplots

bwplot(anaema ~ wt | age, data = mydata)

plot of chunk multipleplots

bwplot(anaema ~ wt | educ, data = mydata)

plot of chunk multipleplots

bwplot(anaema ~ wt | region, data = mydata)

plot of chunk multipleplots

bwplot(anaema ~ wt | ceb, data = mydata)

plot of chunk multipleplots

Analysis

mylogit <- glm(anaema ~ wt + educ + region + ceb + age, family = "binomial", 
    data = mydata)
summary(mylogit)
## 
## Call:
## glm(formula = anaema ~ wt + educ + region + ceb + age, family = "binomial", 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.452  -0.993  -0.796   1.274   1.892  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 0.74649    0.59686    1.25  0.21105    
## wt                         -0.01266    0.00942   -1.34  0.17899    
## educprimary                -0.29834    0.18961   -1.57  0.11562    
## educsecondary              -0.98489    0.28527   -3.45  0.00056 ***
## regionEastern               0.30273    0.20979    1.44  0.14903    
## regionNorthern              0.07350    0.25802    0.28  0.77574    
## regionWestern              -0.38131    0.21521   -1.77  0.07642 .  
## regionNot de jure resident  0.70770    0.37521    1.89  0.05928 .  
## ceb1                       -0.26528    0.27517   -0.96  0.33502    
## ceb2-3                     -0.24200    0.30079   -0.80  0.42108    
## ceb4-5                     -0.61067    0.36654   -1.67  0.09571 .  
## ceb6+                      -0.34540    0.43203   -0.80  0.42400    
## age20-24                   -0.00945    0.26770   -0.04  0.97183    
## age25-29                    0.25480    0.33171    0.77  0.44241    
## age30-34                    0.46875    0.40075    1.17  0.24213    
## age35-39                   -0.11237    0.47101   -0.24  0.81144    
## age40-44                   -0.23305    0.61901   -0.38  0.70656    
## age45-49                    0.66134    1.50672    0.44  0.66071    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1059.6  on 796  degrees of freedom
## Residual deviance: 1021.7  on 779  degrees of freedom
## AIC: 1058
## 
## Number of Fisher Scoring iterations: 4
1 - pchisq(1021.7, 779)  ###Likelihood ratio test p-value
## [1] 9.418e-09

1- in the output above, the first thing we see is the call, this is R reminding us what the model we ran was, what options we specified, etc.

2- Next we see the deviance residuals, which are a measure of model fit. This part of output shows the distribution of the deviance residuals for individual cases used in the model. Below we discuss how to use summaries of the deviance statistic to assess model fit. 3- The next part of the output shows the coefficients, their standard errors, the z-statistic (sometimes called a Wald z-statistic), and the associated p-values.The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable. We can see that education is highly signficant, and ceb is significant at 10%.

4- For a one unit increase in wt, the log odds of being anaemic decreases by 0.013

5- The indicator variables for education, region, ceb, and age have a slightly different interpretation. For example, having attended a secondary school, versus no education, changes the log odds of admission by -0.985.

anova(mylogit, test = "Chisq")  ##weight, education and region seem to be significant, 
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: anaema
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
## NULL                     796       1060            
## wt      1     5.05       795       1055   0.0247 * 
## educ    2    10.46       793       1044   0.0054 **
## region  4    14.86       789       1029   0.0050 **
## ceb     4     2.11       785       1027   0.7159   
## age     6     5.45       779       1022   0.4881   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(mylogit, test = "Chisq")  ## we can drop wt, ceb, and age!!
## Single term deletions
## 
## Model:
## anaema ~ wt + educ + region + ceb + age
##        Df Deviance  AIC   LRT Pr(>Chi)   
## <none>        1022 1058                  
## wt      1     1024 1058  1.88   0.1704   
## educ    2     1034 1066 12.64   0.0018 **
## region  4     1038 1066 16.35   0.0026 **
## ceb     4     1025 1053  3.56   0.4687   
## age     6     1027 1051  5.45   0.4881   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(mylogit, ~.^2, test = "Chisq")
## Single term additions
## 
## Model:
## anaema ~ wt + educ + region + ceb + age
##             Df Deviance  AIC   LRT Pr(>Chi)
## <none>             1022 1058               
## wt:educ      2     1019 1059  2.74     0.25
## wt:region    4     1019 1063  2.38     0.67
## wt:ceb       4     1018 1062  4.12     0.39
## wt:age       6     1016 1064  5.63     0.47
## educ:region  8     1012 1064  9.70     0.29
## educ:ceb     8     1018 1070  4.14     0.84
## educ:age    10     1009 1065 12.86     0.23
## region:ceb  16     1010 1078 11.99     0.74
## region:age  20     1007 1083 14.94     0.78
## ceb:age     13     1005 1067 16.31     0.23
search <- step(mylogit, ~.^2)
## Start:  AIC=1058
## anaema ~ wt + educ + region + ceb + age
## 
##               Df Deviance  AIC
## - age          6     1027 1051
## - ceb          4     1025 1053
## - wt           1     1024 1058
## <none>               1022 1058
## + wt:educ      2     1019 1059
## + wt:ceb       4     1018 1062
## + wt:region    4     1019 1063
## + educ:region  8     1012 1064
## + wt:age       6     1016 1064
## + educ:age    10     1009 1065
## - region       4     1038 1066
## - educ         2     1034 1066
## + ceb:age     13     1005 1067
## + educ:ceb     8     1018 1070
## + region:ceb  16     1010 1078
## + region:age  20     1007 1083
## 
## Step:  AIC=1051
## anaema ~ wt + educ + region + ceb
## 
##               Df Deviance  AIC
## - ceb          4     1029 1045
## - wt           1     1029 1051
## <none>               1027 1051
## + wt:educ      2     1024 1052
## + wt:ceb       4     1023 1055
## + wt:region    4     1025 1057
## + age          6     1022 1058
## - region       4     1042 1058
## + educ:region  8     1018 1058
## - educ         2     1039 1059
## + educ:ceb     8     1023 1063
## + region:ceb  16     1015 1071
## 
## Step:  AIC=1045
## anaema ~ wt + educ + region
## 
##               Df Deviance  AIC
## <none>               1029 1045
## - wt           1     1031 1045
## + wt:educ      2     1026 1046
## + wt:region    4     1027 1051
## + ceb          4     1027 1051
## - region       4     1044 1052
## - educ         2     1040 1052
## + educ:region  8     1020 1052
## + age          6     1025 1053
newlogit <- (glm(anaema ~ wt + educ + region, family = "binomial", data = mydata))
summary(newlogit)
## 
## Call:
## glm(formula = anaema ~ wt + educ + region, family = "binomial", 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.431  -0.997  -0.823   1.273   1.954  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                  0.5302     0.5622    0.94   0.3457   
## wt                          -0.0130     0.0093   -1.40   0.1623   
## educprimary                 -0.2740     0.1843   -1.49   0.1370   
## educsecondary               -0.8612     0.2675   -3.22   0.0013 **
## regionEastern                0.2920     0.2075    1.41   0.1594   
## regionNorthern               0.0772     0.2551    0.30   0.7621   
## regionWestern               -0.3246     0.2107   -1.54   0.1234   
## regionNot de jure resident   0.7534     0.3682    2.05   0.0407 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1059.6  on 796  degrees of freedom
## Residual deviance: 1029.3  on 789  degrees of freedom
## AIC: 1045
## 
## Number of Fisher Scoring iterations: 4
anova(newlogit, test = "Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: anaema
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
## NULL                     796       1060            
## wt      1     5.05       795       1055   0.0247 * 
## educ    2    10.46       793       1044   0.0054 **
## region  4    14.86       789       1029   0.0050 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 - pchisq(1029.3, 789)  ###Likelihood ratio test p-value
## [1] 1.497e-08
add1(newlogit, ~.^2, test = "Chisq")  ##no need to add intercation effects to the model.
## Single term additions
## 
## Model:
## anaema ~ wt + educ + region
##             Df Deviance  AIC  LRT Pr(>Chi)
## <none>             1029 1045              
## wt:educ      2     1026 1046 3.29     0.19
## wt:region    4     1027 1051 2.44     0.65
## educ:region  8     1020 1052 8.85     0.36
mylogit <- (glm(anaema ~ wt + educ + region, family = "binomial", data = mydata))
summary(mylogit)
## 
## Call:
## glm(formula = anaema ~ wt + educ + region, family = "binomial", 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.431  -0.997  -0.823   1.273   1.954  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                  0.5302     0.5622    0.94   0.3457   
## wt                          -0.0130     0.0093   -1.40   0.1623   
## educprimary                 -0.2740     0.1843   -1.49   0.1370   
## educsecondary               -0.8612     0.2675   -3.22   0.0013 **
## regionEastern                0.2920     0.2075    1.41   0.1594   
## regionNorthern               0.0772     0.2551    0.30   0.7621   
## regionWestern               -0.3246     0.2107   -1.54   0.1234   
## regionNot de jure resident   0.7534     0.3682    2.05   0.0407 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1059.6  on 796  degrees of freedom
## Residual deviance: 1029.3  on 789  degrees of freedom
## AIC: 1045
## 
## Number of Fisher Scoring iterations: 4

In the output above, we found ceb and age not to be significant, so we remove them from the model; also, we checked to see if interations between variables is significant, and didn't find any significane!

We can obtain the confidence intervals for the coefficient variables. Note that for logistic models, confidence intervals are based on the profiled log-likelihood function. We can also obtain confidence intervals based on just the standard errors by using the default method.

## CIs using profiled log-likelihood
confint(mylogit)
## Waiting for profiling to be done...
##                               2.5 %   97.5 %
## (Intercept)                -0.54848  1.65825
## wt                         -0.03175  0.00475
## educprimary                -0.63515  0.08811
## educsecondary              -1.39365 -0.34303
## regionEastern              -0.11407  0.70023
## regionNorthern             -0.42544  0.57599
## regionWestern              -0.73871  0.08815
## regionNot de jure resident  0.03558  1.48757
## CIs using standard errors
confint.default(mylogit)
##                               2.5 %    97.5 %
## (Intercept)                -0.57178  1.632100
## wt                         -0.03122  0.005232
## educprimary                -0.63526  0.087188
## educsecondary              -1.38557 -0.336920
## regionEastern              -0.11472  0.698768
## regionNorthern             -0.42272  0.577151
## regionWestern              -0.73762  0.088314
## regionNot de jure resident  0.03176  1.475026

We can test for an overall effect of educ using the wald.test function of the aod library. The order in which the coefficients are given in the table of coefficients is the same as the order of the terms in the model. This is important because the wald.test function refers to the coefficients by their order in the model. We use the wald.test function. b supplies the coefficients, while Sigma supplies the variance covariance matrix of the error terms, finally Terms tells R which terms in the model are to be tested.

wald.test(b = coef(mylogit), Sigma = vcov(mylogit), Terms = 3:4)
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 10.4, df = 2, P(> X2) = 0.0054

The chi-squared test statistic of 12.1, with two degrees of freedom is associated with a p-value of 0.0024 indicating that the overall effect of education is statistically significant.

We can use the wald test on the remaining categorical variables.

wald.test(b = coef(mylogit), Sigma = vcov(mylogit), Terms = 5:8)
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 14.6, df = 4, P(> X2) = 0.0057

We can also test additional hypotheses about the differences in the coefficients for the different levels of education. Below we test that the coefficient for primary education is equal to the coefficient for secondary education. The first line of code below creates a vector l that defines the test we want to perform. In this case, we want to test the difference (subtraction) of the terms for educ=primary and educ= secondary (i.e., the 3rd and 4th terms in the model). To contrast these two terms, we multiply one of them by 1, and the other by -1. The other terms in the model are not involved in the test, so they are multiplied by 0. The second line of code below uses L=l to tell R that we wish to base the test on the vector l (rather than using the Terms option as we did above).

l <- cbind(0, 0, 1, -1, 0, 0, 0, 0)
wald.test(b = coef(mylogit), Sigma = vcov(mylogit), L = l)
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 6.6, df = 1, P(> X2) = 0.01

The chi-squared test statistic of 6.6 with 1 degree of freedom is associated with a p-value of 0.01, indicating that the difference between the coefficient for primary education and the coefficient for secondary education is statistically significant.

You can also exponentiate the coefficients and interpret them as odds-ratios. R will do this computation for you. To get the exponentiated coefficients, you tell R that you want to exponentiate (exp), and that the object you want to exponentiate is called coefficients and it is part of mylogit (coef(mylogit)). We can use the same logic to get odds ratios and their confidence intervals, by exponentiating the confidence intervals from before. To put it all in one table, we use cbind to bind the coefficients and confidence intervals column-wise.

## odds ratios only
exp(coef(mylogit))
##                (Intercept)                         wt 
##                     1.6992                     0.9871 
##                educprimary              educsecondary 
##                     0.7603                     0.4226 
##              regionEastern             regionNorthern 
##                     1.3391                     1.0803 
##              regionWestern regionNot de jure resident 
##                     0.7228                     2.1242

## odds ratios and 95% CI
exp(cbind(OR = coef(mylogit), confint(mylogit)))
## Waiting for profiling to be done...
##                                OR  2.5 % 97.5 %
## (Intercept)                1.6992 0.5778 5.2501
## wt                         0.9871 0.9687 1.0048
## educprimary                0.7603 0.5299 1.0921
## educsecondary              0.4226 0.2482 0.7096
## regionEastern              1.3391 0.8922 2.0142
## regionNorthern             1.0803 0.6535 1.7789
## regionWestern              0.7228 0.4777 1.0922
## regionNot de jure resident 2.1242 1.0362 4.4263

Now we can say that for a one unit increase in wt, the odds of being anaemic (versus not being anaemic) increase by a factor of 0.987 . Note that while R produces it, the odds ratio for the intercept is not generally interpreted.

Checking assumptions of logistic regression

Generalized linear models are freed from the assumption that residuals are normally distributed with equal variance, but the method nevertheless makes important assumptions that should be checked.

Residual plots and other diagnostic plots for glm objects might looks strange and have “stripes” because the data are discrete. Residual plots can nevertheless help to spot severe outliers. Leverage plots can help determine whether any outlying data points exert excessive influence on the parameter estimates. The normal quantile plots are probably not of much use.


plot(mylogit)  # deviance residuals, q-q, leverage

plot of chunk unnamed-chunk-9 plot of chunk unnamed-chunk-9 plot of chunk unnamed-chunk-9 plot of chunk unnamed-chunk-9

Deviance residuals indicate the contribution of each point to the total deviance, which measures overall fit of the model to data. Use jitter to reduce overlap of points in the plot.


plot(predict(mylogit), jitter(residuals(mylogit), amount = 0.05))

plot of chunk unnamed-chunk-10

Pearson residuals are calculated from the working data on the logit or log scale, but correct for differences between data points in their associated “weight” (inverse of expected variance).


plot(predict(mylogit), jitter(residuals(mylogit, type = "pearson"), amount = 0.05))

plot of chunk unnamed-chunk-11

Leverage indicates the influence that each data point has on the parameters of the fitted model. A data point has high leverage if removing it from the analysis changes the parameter estimates substantially.


plot(hatvalues(mylogit), jitter(residuals(mylogit, type = "pearson"), amount = 0.05))

plot of chunk unnamed-chunk-12

Logistic regression assume that the “dispersion parameter” is 1. What this means is that the variance of the residuals are as expected from the binomial or Poisson distributions specified by the link function. In the case of count data, for example, a dispersion parameter of 1 means that the variance of the explanatory variable within each group should be approximately equal to the mean for the group. However, count data are notorious for having higher than expected variances (dispersion parameter > 1). One way to check is to tabulate the mean and variance for each group and see how they compare. Another way is to estimate the dispersion parameter directly by refitting the data to a quasi-likelihood model that includes such a parameter. For binary and count data (logistic) the corresponding quasi-likelihood models are as follows.

mylogit1 <- glm(anaema ~ wt + educ + region, family = quasibinomial(link = "logit"), 
    data = mydata)
summary(mylogit1)
## 
## Call:
## glm(formula = anaema ~ wt + educ + region, family = quasibinomial(link = "logit"), 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.431  -0.997  -0.823   1.273   1.954  
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 0.53016    0.56523    0.94   0.3486   
## wt                         -0.01299    0.00935   -1.39   0.1649   
## educprimary                -0.27403    0.18529   -1.48   0.1395   
## educsecondary              -0.86124    0.26895   -3.20   0.0014 **
## regionEastern               0.29202    0.20864    1.40   0.1620   
## regionNorthern              0.07721    0.25644    0.30   0.7634   
## regionWestern              -0.32465    0.21183   -1.53   0.1258   
## regionNot de jure resident  0.75339    0.37016    2.04   0.0422 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.011)
## 
##     Null deviance: 1059.6  on 796  degrees of freedom
## Residual deviance: 1029.3  on 789  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

If the estimated dispersion parameter is much greater than 1 (in our case it's 1), then you should repeat your entire analysis using the quasibinomial link function. This will lead to more reliable standard errors for parameters, and different P-values in tests.

Model selection

Compute log-likelihoods for fitted model.

logLik(mylogit)  # log-likelihood of the model fit
## 'log Lik.' -514.6 (df=8)

stepAIC

stepAIC is a command in the MASS library that will automatically carry out a restricted search for the “best” model, as measured by either AIC or BIC (Bayesian Information criterion) (but not AICc, unfortunately). stepAIC is accepts both categorical and numerical variables. It does not search all possible subsets of variables, but rather it searches in a specific sequence determined by the order of the variables in the formula of the model object.

library(MASS)
mylogit2 <- stepAIC(mylogit, upper = ~., lower = ~1, direction = "both")
## Start:  AIC=1045
## anaema ~ wt + educ + region
## 
##          Df Deviance  AIC
## <none>          1029 1045
## - wt      1     1031 1045
## - region  4     1044 1052
## - educ    2     1040 1052
summary(mylogit2)
## 
## Call:
## glm(formula = anaema ~ wt + educ + region, family = "binomial", 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.431  -0.997  -0.823   1.273   1.954  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                  0.5302     0.5622    0.94   0.3457   
## wt                          -0.0130     0.0093   -1.40   0.1623   
## educprimary                 -0.2740     0.1843   -1.49   0.1370   
## educsecondary               -0.8612     0.2675   -3.22   0.0013 **
## regionEastern                0.2920     0.2075    1.41   0.1594   
## regionNorthern               0.0772     0.2551    0.30   0.7621   
## regionWestern               -0.3246     0.2107   -1.54   0.1234   
## regionNot de jure resident   0.7534     0.3682    2.05   0.0407 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1059.6  on 796  degrees of freedom
## Residual deviance: 1029.3  on 789  degrees of freedom
## AIC: 1045
## 
## Number of Fisher Scoring iterations: 4
plot(mylogit2)

plot of chunk unnamed-chunk-15 plot of chunk unnamed-chunk-15 plot of chunk unnamed-chunk-15 plot of chunk unnamed-chunk-15

anova(mylogit2)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: anaema
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev
## NULL                     796       1060
## wt      1     5.05       795       1055
## educ    2    10.46       793       1044
## region  4    14.86       789       1029

Now, to interpret the output of stepAIC. Start at the top and follow the sequence of steps that the procedure has carried out. First, it fits the full model you gave it and prints the AIC value. Then it lists the AIC values that result when it drops out each term, one at a time, leaving all other variables in the model. (It starts with the highest-order interactions, if you have included any.) It picks the best model of the bunch tested (the one with the lowest AIC) and then starts again. At each iteration it may also add variables one at a time to see if AIC is lowered further. The process continues until neither adding nor removing any single term results in a lower AIC.