1 Data

Let's import data into R with the base function data(), which loads the specified data sets or, when called without arguments, lists the data sets that are available.

1.1 Time Series Data

Longley’s Economic Regression Data - a macroeconomic data set which provides a well-known example for a highly collinear regression.

A data frame with 7 economic variables, observed yearly from 1947 to 1962 (n = 16).

?data()
data()
data(package = .packages(all.available = TRUE))

?longley
df1 <- longley 
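
As a minimal sketch of the two uses of data() described above (listing and loading), the following lists the data sets in the datasets package and then loads longley explicitly. The variable name `available` is only illustrative.

```r
# List the data sets shipped with the 'datasets' package;
# the returned object has a $results matrix with an "Item" column
available <- data(package = "datasets")
head(available$results[, "Item"])

# Load one data set by name into the workspace
data("longley")

# Check it against the description: 16 yearly observations of 7 variables
dim(longley)
str(longley)
```

Calling data("longley") is optional here because the datasets package is attached by default, but it makes the origin of the object explicit in a report.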

Try to represent it visually:

require(stats); require(graphics)
## give the data set in the form it is used in S-PLUS:

?plot
## Help on topic 'plot' was found in the following packages:
## 
##   Package               Library
##   graphics              /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
##   base                  /Library/Frameworks/R.framework/Resources/library
## 
## 
## Using the first match ...
plot(longley$GNP)

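# clean up the chart

longley.x <- data.matrix(longley[, 1:6])
longley.y <- longley[, "Employed"]
pairs(longley, main = "longley data")

# fit the full model so that fm1 exists (assumed here, as in the ?longley example)
fm1 <- lm(Employed ~ ., data = longley)

# save the graphics settings and use a 2 x 2 layout for the four diagnostic plots
opar <- par(mfrow = c(2, 2))
plot(fm1)

# restore the previous graphics settings
par(opar)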
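
The collinearity mentioned in the description of longley can be checked numerically. The sketch below is one possible check, not part of the original notes: it prints the pairwise correlations of the six regressors and computes variance inflation factors by hand as 1 / (1 - R^2). The names `predictors` and `vif_manual` are hypothetical.

```r
data("longley")

# Pairwise correlations among the six regressors (columns 1 to 6)
round(cor(longley[, 1:6]), 2)

# Variance inflation factor for each regressor:
# regress it on the other five and compute 1 / (1 - R^2)
predictors <- names(longley)[1:6]
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  fit <- lm(reformulate(others, response = v), data = longley)
  1 / (1 - summary(fit)$r.squared)
})

# Large values indicate that a regressor is nearly a linear
# combination of the others
round(vif_manual, 1)
```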

2 AER

??AER searches the installed help for "AER"; the results include the ‘Applied Econometrics with R: Package Vignette and Errata’.

??AER

# install.packages("AER")
library("AER")
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
data("Fertility")

str(Fertility)
## 'data.frame':    254654 obs. of  8 variables:
##  $ morekids: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender1 : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 1 2 1 2 ...
##  $ gender2 : Factor w/ 2 levels "female","male": 1 2 1 1 1 1 2 2 2 1 ...
##  $ age     : int  27 30 27 35 30 26 29 33 29 27 ...
##  $ afam    : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ hispanic: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ other   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ work    : int  0 30 0 0 22 40 0 52 0 0 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr "26 Dec 2005 09:40"
##  - attr(*, "formats")= chr [1:9] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
##  - attr(*, "types")= int [1:9] 251 251 251 251 251 251 251 251 251
##  - attr(*, "val.labels")= chr [1:9] "" "" "" "" ...
##  - attr(*, "var.labels")= chr [1:9] "=1 if mom had more than 2 kids" "=1 if 1st kid was a boy" "=1 if 2nd kid was a boy" "=1 if 1st two kids same sex" ...
##  - attr(*, "version")= int -8
##  - attr(*, "label.table")=List of 9
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
head(Fertility)
##   morekids gender1 gender2 age afam hispanic other work
## 1       no    male  female  27   no       no    no    0
## 2       no  female    male  30   no       no    no   30
## 3       no    male  female  27   no       no    no    0
## 4       no    male  female  35  yes       no    no    0
## 5       no  female  female  30   no       no    no   22
## 6       no    male  female  26   no       no    no   40
#install.packages("psych")
library("psych")  # provides describe()

describe(Fertility)
##           vars      n  mean    sd median trimmed  mad min max range  skew
## morekids*    1 254654  1.38  0.49      1    1.35 0.00   1   2     1  0.49
## gender1*     2 254654  1.51  0.50      2    1.52 0.00   1   2     1 -0.06
## gender2*     3 254654  1.51  0.50      2    1.52 0.00   1   2     1 -0.05
## age          4 254654 30.39  3.39     31   30.65 2.97  21  35    14 -0.59
## afam*        5 254654  1.05  0.22      1    1.00 0.00   1   2     1  4.05
## hispanic*    6 254654  1.07  0.26      1    1.00 0.00   1   2     1  3.25
## other*       7 254654  1.06  0.23      1    1.00 0.00   1   2     1  3.85
## work         8 254654 19.02 21.87      5   17.27 7.41   0  52    52  0.54
##           kurtosis   se
## morekids*    -1.76 0.00
## gender1*     -2.00 0.00
## gender2*     -2.00 0.00
## age          -0.45 0.01
## afam*        14.41 0.00
## hispanic*     8.56 0.00
## other*       12.81 0.00
## work         -1.48 0.04
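
The rows marked with an asterisk are factors, which describe() converts to integer codes before summarizing. The toy example below (hypothetical values standing in for Fertility$morekids) sketches how to read such a row: for a two-level factor coded 1 and 2, the reported mean minus 1 is the share of observations in the second level.

```r
# Toy two-level factor; the values are made up for illustration
morekids <- factor(c("no", "no", "yes", "no", "yes"), levels = c("no", "yes"))

# Mean of the integer codes (1 = "no", 2 = "yes"), as in a starred row
code_mean <- mean(as.integer(morekids))

# Share of "yes" computed directly
share_yes <- mean(morekids == "yes")

# The two agree once the offset of 1 is removed
all.equal(code_mean - 1, share_yes)

# A frequency table is usually easier to read for factors
prop.table(table(morekids))
```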
  • In your homework reports, try to hide the package startup messages and warnings, and use a more presentable command such as stargazer instead of psych::describe.

  • The same chunk without warnings and/or messages is much more readable and presentable. Be sure you understand what each warning message is telling you, though.
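
Two common ways to hide the messages are sketched below, assuming the report is knitted with knitr. The stats package is used only as a stand-in because it ships with R; in a report you would put "AER" in its place. The same effect can be had per chunk with the chunk options message=FALSE and warning=FALSE.

```r
# Option 1: silence the startup messages of a single library() call
# ("stats" is a stand-in that is always installed; use "AER" in practice)
suppressPackageStartupMessages(library("stats"))

# Option 2: in a setup chunk, hide messages and warnings for all later chunks
# (guarded so the code also runs where knitr is not installed)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(message = FALSE, warning = FALSE)
}
```

Hiding warnings globally is convenient for a final report, but it is safer to read them once while developing the analysis.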

??AER

# install.packages("AER")
library("AER")


data("Fertility")

str(Fertility)
## 'data.frame':    254654 obs. of  8 variables:
##  $ morekids: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender1 : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 1 2 1 2 ...
##  $ gender2 : Factor w/ 2 levels "female","male": 1 2 1 1 1 1 2 2 2 1 ...
##  $ age     : int  27 30 27 35 30 26 29 33 29 27 ...
##  $ afam    : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ hispanic: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ other   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ work    : int  0 30 0 0 22 40 0 52 0 0 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr "26 Dec 2005 09:40"
##  - attr(*, "formats")= chr [1:9] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
##  - attr(*, "types")= int [1:9] 251 251 251 251 251 251 251 251 251
##  - attr(*, "val.labels")= chr [1:9] "" "" "" "" ...
##  - attr(*, "var.labels")= chr [1:9] "=1 if mom had more than 2 kids" "=1 if 1st kid was a boy" "=1 if 2nd kid was a boy" "=1 if 1st two kids same sex" ...
##  - attr(*, "version")= int -8
##  - attr(*, "label.table")=List of 9
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
##   ..$ : NULL
head(Fertility)
##   morekids gender1 gender2 age afam hispanic other work
## 1       no    male  female  27   no       no    no    0
## 2       no  female    male  30   no       no    no   30
## 3       no    male  female  27   no       no    no    0
## 4       no    male  female  35  yes       no    no    0
## 5       no  female  female  30   no       no    no   22
## 6       no    male  female  26   no       no    no   40
#install.packages("psych")
library("psych")  # provides describe()

describe(Fertility)
##           vars      n  mean    sd median trimmed  mad min max range  skew
## morekids*    1 254654  1.38  0.49      1    1.35 0.00   1   2     1  0.49
## gender1*     2 254654  1.51  0.50      2    1.52 0.00   1   2     1 -0.06
## gender2*     3 254654  1.51  0.50      2    1.52 0.00   1   2     1 -0.05
## age          4 254654 30.39  3.39     31   30.65 2.97  21  35    14 -0.59
## afam*        5 254654  1.05  0.22      1    1.00 0.00   1   2     1  4.05
## hispanic*    6 254654  1.07  0.26      1    1.00 0.00   1   2     1  3.25
## other*       7 254654  1.06  0.23      1    1.00 0.00   1   2     1  3.85
## work         8 254654 19.02 21.87      5   17.27 7.41   0  52    52  0.54
##           kurtosis   se
## morekids*    -1.76 0.00
## gender1*     -2.00 0.00
## gender2*     -2.00 0.00
## age          -0.45 0.01
## afam*        14.41 0.00
## hispanic*     8.56 0.00
## other*       12.81 0.00
## work         -1.48 0.04
  • You can use either the .Rmd or the .qmd extension (the latter is preferred).
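
For the more presentable summary table suggested above, a minimal sketch with stargazer could look like the following. It uses longley rather than Fertility only because longley ships with R; the title text is an arbitrary choice, and the call is skipped if stargazer is not installed.

```r
data("longley")

# stargazer is a CRAN package; skip the table if it is not available
if (requireNamespace("stargazer", quietly = TRUE)) {
  # A data frame passed to stargazer() yields a summary-statistics table;
  # type = "text" prints it to the console instead of producing LaTeX
  stargazer::stargazer(longley,
                       type = "text",
                       title = "Summary statistics: longley",
                       digits = 2)
}
```

For an HTML or PDF report, type = "html" or the default type = "latex" would replace "text".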