Introduction

Today, I decided to blog about an R package memisc, which is available at CRAN, that provides tools for the management of survey data, as well as the creation of tables of summary statistics and model estimates.

One of the aims of this package is to make life easier for R users who deal with survey data sets. It provides an infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, and production of code books. Further, it provides functionality to produce tables and data frames of arbitrary descriptive statistics and publication-ready tables of regression model estimates.

Examples

Let’ see how we can use the memisc package to provide comparison statistics between linear model through examples.

first we create the models

library(memisc)  # mtcars 
# lm0 <- lm(hp ~ wt, mtcars)
# lm1 <- lm(qsec ~ hp, mtcars)
# lm2 <- lm(qsec ~ wt, mtcars)

lm0 <- lm(sr ~ pop15 + pop75,              data = LifeCycleSavings)
lm1 <- lm(sr ~                 dpi + ddpi, data = LifeCycleSavings)
lm2 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

Then, we print out the table

mtable123 <- mtable("Model 1"=lm0,"Model 2"=lm1,"Model 3"=lm2,
                    summary.stats=c("sigma","R-squared","F","p","N"))

mtable123
## 
## Calls:
## Model 1: lm(formula = sr ~ pop15 + pop75, data = LifeCycleSavings)
## Model 2: lm(formula = sr ~ dpi + ddpi, data = LifeCycleSavings)
## Model 3: lm(formula = sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
## 
## ================================================
##                 Model 1    Model 2    Model 3   
## ------------------------------------------------
##   (Intercept)  30.628***   6.360***  28.566***  
##                (7.409)    (1.252)    (7.355)    
##   pop15        -0.471**              -0.461**   
##                (0.147)               (0.145)    
##   pop75        -1.934                -1.691     
##                (1.041)               (1.084)    
##   dpi                      0.001     -0.000     
##                           (0.001)    (0.001)    
##   ddpi                     0.529*     0.410*    
##                           (0.210)    (0.196)    
## ------------------------------------------------
##   sigma         3.931      4.189      3.803     
##   R-squared     0.262      0.162      0.338     
##   F             8.332      4.528      5.756     
##   p             0.001      0.016      0.001     
##   N            50         50         50         
## ================================================
##   Significance: *** = p < 0.001;   
##                 ** = p < 0.01; * = p < 0.05

memisc also allows you to give the variables labels for clarity purposes. So let’s do that!

(mtable123 <- relabel(mtable123,
                      "(Intercept)" = "Constant",
                      pop15 = "Percentage of population under 15",
                      pop75 = "Percentage of population over 75",
                      dpi = "Real per-capita disposable income",
                      ddpi = "Growth rate of real per-capita disp. income"
))
## 
## Calls:
## Model 1: lm(formula = sr ~ pop15 + pop75, data = LifeCycleSavings)
## Model 2: lm(formula = sr ~ dpi + ddpi, data = LifeCycleSavings)
## Model 3: lm(formula = sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
## 
## ================================================================================
##                                                 Model 1    Model 2    Model 3   
## --------------------------------------------------------------------------------
##   Constant                                     30.628***   6.360***  28.566***  
##                                                (7.409)    (1.252)    (7.355)    
##   Percentage of population under 15            -0.471**              -0.461**   
##                                                (0.147)               (0.145)    
##   Percentage of population over 75             -1.934                -1.691     
##                                                (1.041)               (1.084)    
##   Real per-capita disposable income                        0.001     -0.000     
##                                                           (0.001)    (0.001)    
##   Growth rate of real per-capita disp. income              0.529*     0.410*    
##                                                           (0.210)    (0.196)    
## --------------------------------------------------------------------------------
##   sigma                                         3.931      4.189      3.803     
##   R-squared                                     0.262      0.162      0.338     
##   F                                             8.332      4.528      5.756     
##   p                                             0.001      0.016      0.001     
##   N                                            50         50         50         
## ================================================================================
##   Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05

The ftable in memisc package creates ‘flat’ contingency tables. Similar to the usual contingency tables, these contain the counts of each combination of the levels of the variables (factors) involved. This information is then re-arranged as a matrix whose rows and columns correspond to unique combinations of the levels of the row and column variables (as specified by row.vars and col.vars, respectively). Let’s see that in axample!

## Start with a contingency table.
ftable(Titanic, row.vars = 1:3)
##                    Survived  No Yes
## Class Sex    Age                   
## 1st   Male   Child            0   5
##              Adult          118  57
##       Female Child            0   1
##              Adult            4 140
## 2nd   Male   Child            0  11
##              Adult          154  14
##       Female Child            0  13
##              Adult           13  80
## 3rd   Male   Child           35  13
##              Adult          387  75
##       Female Child           17  14
##              Adult           89  76
## Crew  Male   Child            0   0
##              Adult          670 192
##       Female Child            0   0
##              Adult            3  20
ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
##              Survived  No Yes
## Class Sex                    
## 1st   Male            118  62
##       Female            4 141
## 2nd   Male            154  25
##       Female           13  93
## 3rd   Male            422  88
##       Female          106  90
## Crew  Male            670 192
##       Female            3  20
ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
##              Survived  No Yes
## Sex    Class                 
## Male   1st            118  62
##        2nd            154  25
##        3rd            422  88
##        Crew           670 192
## Female 1st              4 141
##        2nd             13  93
##        3rd            106  90
##        Crew             3  20
## Start with a data frame.
x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
x
##           gear  3  4  5
## cyl vs am              
## 4   0  0        0  0  0
##        1        0  0  1
##     1  0        1  2  0
##        1        0  6  1
## 6   0  0        0  0  0
##        1        0  2  1
##     1  0        2  2  0
##        1        0  0  0
## 8   0  0       12  0  0
##        1        0  0  2
##     1  0        0  0  0
##        1        0  0  0
ftable(x, row.vars = c(2, 4))
##         cyl  4     6     8   
##         am   0  1  0  1  0  1
## vs gear                      
## 0  3         0  0  0  0 12  0
##    4         0  0  0  2  0  0
##    5         0  1  0  1  0  2
## 1  3         1  0  2  0  0  0
##    4         2  6  2  0  0  0
##    5         0  1  0  0  0  0
## Start with expressions, use table()'s "dnn" to change labels
ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
       dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
##           Cylinders     4     6     8   
##           Transmission  0  1  0  1  0  1
## V/S Gears                               
## 0   3                   0  0  0  0 12  0
##     4                   0  0  0  2  0  0
##     5                   0  1  0  1  0  2
## 1   3                   1  0  2  0  0  0
##     4                   2  6  2  0  0  0
##     5                   0  1  0  0  0  0

The generic function collect collects several objects of the same mode into one object, using their names, rownames, colnames and/or dimnames. There are methods for atomic vectors, arrays (including matrices), and data frames. For example

a <- c(a=1,b=2)
  b <- c(a=10,c=30)
  collect(a,b)
##    a  b
## a  1 10
## b  2 NA
## c NA 30

The function recode substitutes old values of a factor or a numeric vector by new ones, just like the recoding facilities in some commercial statistical packages.

x <- as.item(sample(1:6,20,replace=TRUE),
        labels=c( a=1,
                  b=2,
                  c=3,
                  d=4,
                  e=5,
                  f=6))
print(x)
##  [1] f d b b b d b e e f b b f e e c a a f e
# A recoded version of x is returned
# containing the values 1, 2, 3, which are
# labelled as "A", "B", "C".
recode(x,
  A = 1 <- range(min,2),
  B = 2 <- 3:4,
  C = 3 <- range(5,max), # this last comma is ignored
  )
## 
## Item (measurement: nominal, type: integer, length = 20) 
## 
##  [1:20] C B A A A B A C C C A A C C C B A A C C