Introduction
Today, I decided to blog about an R package memisc
, which is available at CRAN, that provides tools for the management of survey data, as well as the creation of tables of summary statistics and model estimates.
One of the aims of this package is to make life easier for R users who deal with survey data sets. It provides an infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, and production of code books. Further, it provides functionality to produce tables and data frames of arbitrary descriptive statistics and publication-ready tables of regression model estimates.
Examples
Let’ see how we can use the memisc
package to provide comparison statistics between linear model through examples.
first we create the models
library(memisc) # mtcars
# lm0 <- lm(hp ~ wt, mtcars)
# lm1 <- lm(qsec ~ hp, mtcars)
# lm2 <- lm(qsec ~ wt, mtcars)
lm0 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
lm1 <- lm(sr ~ dpi + ddpi, data = LifeCycleSavings)
lm2 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
Then, we print out the table
mtable123 <- mtable("Model 1"=lm0,"Model 2"=lm1,"Model 3"=lm2,
summary.stats=c("sigma","R-squared","F","p","N"))
mtable123
##
## Calls:
## Model 1: lm(formula = sr ~ pop15 + pop75, data = LifeCycleSavings)
## Model 2: lm(formula = sr ~ dpi + ddpi, data = LifeCycleSavings)
## Model 3: lm(formula = sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
##
## ================================================
## Model 1 Model 2 Model 3
## ------------------------------------------------
## (Intercept) 30.628*** 6.360*** 28.566***
## (7.409) (1.252) (7.355)
## pop15 -0.471** -0.461**
## (0.147) (0.145)
## pop75 -1.934 -1.691
## (1.041) (1.084)
## dpi 0.001 -0.000
## (0.001) (0.001)
## ddpi 0.529* 0.410*
## (0.210) (0.196)
## ------------------------------------------------
## sigma 3.931 4.189 3.803
## R-squared 0.262 0.162 0.338
## F 8.332 4.528 5.756
## p 0.001 0.016 0.001
## N 50 50 50
## ================================================
## Significance: *** = p < 0.001;
## ** = p < 0.01; * = p < 0.05
memisc also allows you to give the variables labels for clarity purposes. So let’s do that!
(mtable123 <- relabel(mtable123,
"(Intercept)" = "Constant",
pop15 = "Percentage of population under 15",
pop75 = "Percentage of population over 75",
dpi = "Real per-capita disposable income",
ddpi = "Growth rate of real per-capita disp. income"
))
##
## Calls:
## Model 1: lm(formula = sr ~ pop15 + pop75, data = LifeCycleSavings)
## Model 2: lm(formula = sr ~ dpi + ddpi, data = LifeCycleSavings)
## Model 3: lm(formula = sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
##
## ================================================================================
## Model 1 Model 2 Model 3
## --------------------------------------------------------------------------------
## Constant 30.628*** 6.360*** 28.566***
## (7.409) (1.252) (7.355)
## Percentage of population under 15 -0.471** -0.461**
## (0.147) (0.145)
## Percentage of population over 75 -1.934 -1.691
## (1.041) (1.084)
## Real per-capita disposable income 0.001 -0.000
## (0.001) (0.001)
## Growth rate of real per-capita disp. income 0.529* 0.410*
## (0.210) (0.196)
## --------------------------------------------------------------------------------
## sigma 3.931 4.189 3.803
## R-squared 0.262 0.162 0.338
## F 8.332 4.528 5.756
## p 0.001 0.016 0.001
## N 50 50 50
## ================================================================================
## Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
The ftable
in memisc package creates ‘flat’ contingency tables. Similar to the usual contingency tables, these contain the counts of each combination of the levels of the variables (factors) involved. This information is then re-arranged as a matrix whose rows and columns correspond to unique combinations of the levels of the row and column variables (as specified by row.vars and col.vars, respectively). Let’s see that in axample!
## Start with a contingency table.
ftable(Titanic, row.vars = 1:3)
## Survived No Yes
## Class Sex Age
## 1st Male Child 0 5
## Adult 118 57
## Female Child 0 1
## Adult 4 140
## 2nd Male Child 0 11
## Adult 154 14
## Female Child 0 13
## Adult 13 80
## 3rd Male Child 35 13
## Adult 387 75
## Female Child 17 14
## Adult 89 76
## Crew Male Child 0 0
## Adult 670 192
## Female Child 0 0
## Adult 3 20
ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
## Survived No Yes
## Class Sex
## 1st Male 118 62
## Female 4 141
## 2nd Male 154 25
## Female 13 93
## 3rd Male 422 88
## Female 106 90
## Crew Male 670 192
## Female 3 20
ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
## Survived No Yes
## Sex Class
## Male 1st 118 62
## 2nd 154 25
## 3rd 422 88
## Crew 670 192
## Female 1st 4 141
## 2nd 13 93
## 3rd 106 90
## Crew 3 20
## Start with a data frame.
x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
x
## gear 3 4 5
## cyl vs am
## 4 0 0 0 0 0
## 1 0 0 1
## 1 0 1 2 0
## 1 0 6 1
## 6 0 0 0 0 0
## 1 0 2 1
## 1 0 2 2 0
## 1 0 0 0
## 8 0 0 12 0 0
## 1 0 0 2
## 1 0 0 0 0
## 1 0 0 0
ftable(x, row.vars = c(2, 4))
## cyl 4 6 8
## am 0 1 0 1 0 1
## vs gear
## 0 3 0 0 0 0 12 0
## 4 0 0 0 2 0 0
## 5 0 1 0 1 0 2
## 1 3 1 0 2 0 0 0
## 4 2 6 2 0 0 0
## 5 0 1 0 0 0 0
## Start with expressions, use table()'s "dnn" to change labels
ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
## Cylinders 4 6 8
## Transmission 0 1 0 1 0 1
## V/S Gears
## 0 3 0 0 0 0 12 0
## 4 0 0 0 2 0 0
## 5 0 1 0 1 0 2
## 1 3 1 0 2 0 0 0
## 4 2 6 2 0 0 0
## 5 0 1 0 0 0 0
The generic function collect
collects several objects of the same mode into one object, using their names, rownames, colnames and/or dimnames. There are methods for atomic vectors, arrays (including matrices), and data frames. For example
a <- c(a=1,b=2)
b <- c(a=10,c=30)
collect(a,b)
## a b
## a 1 10
## b 2 NA
## c NA 30
The function recode
substitutes old values of a factor or a numeric vector by new ones, just like the recoding facilities in some commercial statistical packages.
x <- as.item(sample(1:6,20,replace=TRUE),
labels=c( a=1,
b=2,
c=3,
d=4,
e=5,
f=6))
print(x)
## [1] f d b b b d b e e f b b f e e c a a f e
# A recoded version of x is returned
# containing the values 1, 2, 3, which are
# labelled as "A", "B", "C".
recode(x,
A = 1 <- range(min,2),
B = 2 <- 3:4,
C = 3 <- range(5,max), # this last comma is ignored
)
##
## Item (measurement: nominal, type: integer, length = 20)
##
## [1:20] C B A A A B A C C C A A C C C B A A C C