As an applied economics student majoring in econometrics, I often find myself looking for datasets on which to apply the techniques I learn in class.
Not long ago, one of my teachers introduced me to a book entitled Introductory Econometrics: A Modern Approach, 6e, by Jeffrey M. Wooldridge.
The book is full of examples, so every topic can be put into practice. Even better, all the datasets used in those examples are publicly available, and there is an R package that bundles all of them.
The R package I introduce in this article is {wooldridge},
written by Justin M. Shea [aut, cre] and Kenneth H. Brown [ctb].
You can read the official documentation at.
The official title of the package is “111 Data Sets for Econometrics”: it contains 111 datasets for regression analysis and other kinds of econometric modelling.
Installing the package is easy: just run `install.packages("wooldridge")`
in an R console.
Let’s now load the package and review what it contains (the installation line is commented out because it only needs to run once):
```r
# install.packages("wooldridge")
library(wooldridge)
```
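If you rerun your scripts often, a common pattern is to install the package only when it is missing, so later runs skip the download. The following is a small sketch using base R’s `requireNamespace()`; it is a general R idiom, not something taken from the package documentation.

```r
# Install wooldridge from CRAN only if it cannot already be loaded,
# then attach it so that its datasets are on the search path.
if (!requireNamespace("wooldridge", quietly = TRUE)) {
  install.packages("wooldridge")
}
library(wooldridge)
```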
As its title says, the package contains 111 datasets for econometrics.
Let’s print the list of all the datasets:
```r
ls("package:wooldridge")
## [1] "admnrev" "affairs" "airfare" "alcohol"
## [5] "apple" "approval" "athlet1" "athlet2"
## [9] "attend" "audit" "barium" "beauty"
## [13] "benefits" "beveridge" "big9salary" "bwght"
## [17] "bwght2" "campus" "card" "catholic"
## [21] "cement" "census2000" "ceosal1" "ceosal2"
## [25] "charity" "consump" "corn" "countymurders"
## [29] "cps78_85" "cps91" "crime1" "crime2"
## [33] "crime3" "crime4" "discrim" "driving"
## [37] "earns" "econmath" "elem94_95" "engin"
## [41] "expendshares" "ezanders" "ezunem" "fair"
## [45] "fertil1" "fertil2" "fertil3" "fish"
## [49] "fringe" "gpa1" "gpa2" "gpa3"
## [53] "happiness" "hprice1" "hprice2" "hprice3"
## [57] "hseinv" "htv" "infmrt" "injury"
## [61] "intdef" "intqrt" "inven" "jtrain"
## [65] "jtrain2" "jtrain3" "k401k" "k401ksubs"
## [69] "kielmc" "lawsch85" "loanapp" "lowbrth"
## [73] "mathpnl" "meap00_01" "meap01" "meap93"
## [77] "meapsingle" "minwage" "mlb1" "mroz"
## [81] "murder" "nbasal" "nyse" "okun"
## [85] "openness" "pension" "phillips" "pntsprd"
## [89] "prison" "prminwge" "rdchem" "rdtelec"
## [93] "recid" "rental" "return" "saving"
## [97] "sleep75" "slp75_81" "smoke" "traffic1"
## [101] "traffic2" "twoyear" "volat" "vote1"
## [105] "vote2" "voucher" "wage1" "wage2"
## [109] "wagepan" "wageprc" "wine"
```
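The names alone are rather cryptic. As a sketch, base R’s `data()` function can also list each dataset together with its one-line title from the package documentation (the exact titles depend on the installed version of the package):

```r
library(wooldridge)

# data(package = ...) returns a "packageIQR" object; its `results`
# matrix has one row per dataset, with "Item" and "Title" columns.
info <- data(package = "wooldridge")$results[, c("Item", "Title")]
head(info)
```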
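As an example, here are the first 10 rows of 12 randomly selected columns of the `kielmc` dataset:
```r
set.seed(101)  # make the random column selection reproducible
# sample() on a data frame draws columns (here 12 of them);
# head() then keeps the first 10 rows
knitr::kable(head(sample(kielmc, 12), 10))
```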
rooms | ldist | y81nrinc | y81 | agesq | lintstsq | lrprice | y81ldist | larea | age | nbh | baths |
---|---|---|---|---|---|---|---|---|---|---|---|
7 | 9.277999 | 0 | 0 | 2304 | 47.71770 | 11.00210 | 0 | 7.414573 | 48 | 4 | 1 |
6 | 9.305651 | 0 | 0 | 6889 | 47.71770 | 10.59663 | 0 | 7.867871 | 83 | 4 | 2 |
6 | 9.350102 | 0 | 0 | 3364 | 47.71770 | 10.43412 | 0 | 7.042286 | 58 | 4 | 1 |
5 | 9.384294 | 0 | 0 | 121 | 47.71770 | 11.06507 | 0 | 7.035269 | 11 | 4 | 1 |
5 | 9.400961 | 0 | 0 | 2304 | 57.77368 | 10.69195 | 0 | 7.532624 | 48 | 4 | 1 |
6 | 9.210339 | 0 | 0 | 6084 | 57.77368 | 10.73640 | 0 | 7.484369 | 78 | 4 | 3 |
6 | 9.367344 | 0 | 0 | 484 | 57.77368 | 10.93311 | 0 | 7.438384 | 22 | 4 | 2 |
6 | 9.230143 | 0 | 0 | 6084 | 57.77368 | 10.55841 | 0 | 7.349874 | 78 | 4 | 2 |
8 | 9.259131 | 0 | 0 | 1764 | 57.77368 | 11.01040 | 0 | 7.403670 | 42 | 4 | 2 |
5 | 9.305651 | 0 | 0 | 1681 | 57.77368 | 10.91509 | 0 | 7.274479 | 41 | 4 | 2 |
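The sampled columns hint at what `kielmc` is used for: `y81` flags 1981 observations and `y81nrinc` is an interaction term, which suits a difference-in-differences regression on house prices. The sketch below assumes that `nearinc` (not among the sampled columns) is the dummy for houses near the incinerator and that `y81nrinc` equals `y81 * nearinc`; check `?kielmc` before relying on that interpretation.

```r
library(wooldridge)

# Difference-in-differences sketch (assumed variable meanings):
# log real price on a 1981 dummy, a "near incinerator" dummy,
# and their interaction.
fit <- lm(lrprice ~ y81 + nearinc + y81nrinc, data = kielmc)

# Under these assumptions, the coefficient on y81nrinc is the
# difference-in-differences estimate.
summary(fit)$coefficients
```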
Let’s write a function that returns the dimensions of each dataset:
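```r
# Build a data frame with the number of observations (rows) and
# variables (columns) of each dataset listed in `names`
print_dimension <- function(names) {
  nb_rows <- numeric(0)
  nb_cols <- numeric(0)
  for (i in names) {
    # Look the dataset up in the package itself, not in the workspace
    df <- get(i, envir = as.environment("package:wooldridge"))
    nb_rows[i] <- nrow(df)
    nb_cols[i] <- ncol(df)
  }
  data.frame(row.names = names, Num.Obs = nb_rows, Num.Cols = nb_cols)
}

data_list <- ls("package:wooldridge")
knitr::kable(print_dimension(data_list))
```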
Dataset | Num.Obs | Num.Cols |
---|---|---|
admnrev | 153 | 5 |
affairs | 601 | 19 |
airfare | 4596 | 14 |
alcohol | 9822 | 33 |
apple | 660 | 17 |
approval | 78 | 16 |
athlet1 | 118 | 23 |
athlet2 | 30 | 10 |
attend | 680 | 11 |
audit | 241 | 3 |
barium | 131 | 31 |
beauty | 1260 | 17 |
benefits | 1848 | 18 |
beveridge | 135 | 8 |
big9salary | 786 | 30 |
bwght | 1388 | 14 |
bwght2 | 1832 | 23 |
campus | 97 | 7 |
card | 3010 | 34 |
catholic | 7430 | 13 |
cement | 312 | 30 |
census2000 | 29501 | 6 |
ceosal1 | 209 | 12 |
ceosal2 | 177 | 15 |
charity | 4268 | 8 |
consump | 37 | 24 |
corn | 37 | 5 |
countymurders | 37349 | 20 |
cps78_85 | 1084 | 15 |
cps91 | 5634 | 24 |
crime1 | 2725 | 16 |
crime2 | 92 | 34 |
crime3 | 106 | 12 |
crime4 | 630 | 59 |
discrim | 410 | 37 |
driving | 1200 | 56 |
earns | 41 | 14 |
econmath | 856 | 17 |
elem94_95 | 1848 | 14 |
engin | 403 | 17 |
expendshares | 1519 | 13 |
ezanders | 108 | 25 |
ezunem | 198 | 37 |
fair | 21 | 28 |
fertil1 | 1129 | 27 |
fertil2 | 4361 | 27 |
fertil3 | 72 | 24 |
fish | 97 | 20 |
fringe | 616 | 39 |
gpa1 | 141 | 29 |
gpa2 | 4137 | 12 |
gpa3 | 732 | 23 |
happiness | 17137 | 33 |
hprice1 | 88 | 10 |
hprice2 | 506 | 12 |
hprice3 | 321 | 19 |
hseinv | 42 | 14 |
htv | 1230 | 23 |
infmrt | 102 | 12 |
injury | 7150 | 30 |
intdef | 56 | 13 |
intqrt | 124 | 23 |
inven | 37 | 13 |
jtrain | 471 | 30 |
jtrain2 | 445 | 19 |
jtrain3 | 2675 | 20 |
k401k | 1534 | 8 |
k401ksubs | 9275 | 11 |
kielmc | 321 | 25 |
lawsch85 | 156 | 21 |
loanapp | 1989 | 59 |
lowbrth | 100 | 36 |
mathpnl | 3850 | 52 |
meap00_01 | 1692 | 9 |
meap01 | 1823 | 11 |
meap93 | 408 | 17 |
meapsingle | 229 | 18 |
minwage | 612 | 58 |
mlb1 | 353 | 47 |
mroz | 753 | 22 |
murder | 153 | 13 |
nbasal | 269 | 22 |
nyse | 691 | 8 |
okun | 47 | 4 |
openness | 114 | 12 |
pension | 194 | 19 |
phillips | 56 | 7 |
pntsprd | 553 | 12 |
prison | 714 | 45 |
prminwge | 38 | 25 |
rdchem | 32 | 8 |
rdtelec | 29 | 6 |
recid | 1445 | 18 |
rental | 128 | 23 |
return | 142 | 12 |
saving | 100 | 7 |
sleep75 | 706 | 34 |
slp75_81 | 239 | 20 |
smoke | 807 | 10 |
traffic1 | 51 | 13 |
traffic2 | 108 | 48 |
twoyear | 6763 | 23 |
volat | 558 | 17 |
vote1 | 173 | 10 |
vote2 | 186 | 26 |
voucher | 990 | 19 |
wage1 | 526 | 24 |
wage2 | 935 | 17 |
wagepan | 4360 | 44 |
wageprc | 286 | 20 |
wine | 21 | 5 |
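Once the dimensions are in a data frame, they are easy to query, for instance to pick a large dataset to practise on. The sketch below recomputes the table with `vapply()` so that it runs on its own, then sorts it by number of observations; the names `pkg_env` and `dims` are just illustrative.

```r
library(wooldridge)

# Look each dataset up in the attached package environment explicitly,
# so that objects with the same name in the workspace are ignored.
pkg_env <- as.environment("package:wooldridge")
data_list <- ls(pkg_env)

dims <- data.frame(
  Num.Obs  = vapply(data_list, function(n) nrow(get(n, envir = pkg_env)), numeric(1)),
  Num.Cols = vapply(data_list, function(n) ncol(get(n, envir = pkg_env)), numeric(1)),
  row.names = data_list
)

# The five datasets with the most observations
head(dims[order(dims$Num.Obs, decreasing = TRUE), ], 5)
```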