Overview

This series of posts is intended to get the reader up to speed on how to import, format, and use the economic data of Thomas Piketty, Gabriel Zucman, and Emmanuel Saez. Piketty is best known in the US for his seminal 2014 work Capital in the Twenty-First Century, and Saez and Zucman recently released The Triumph of Injustice: How the Rich Dodge Taxes and How to Make Them Pay.

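In this chapter you will locate and download the micro-files, import a subset of them into a single data frame, look at the variable names, and save the result for the next chapter.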

Locating the files on the web

The micro-files are currently available on Gabriel Zucman’s Distributional National Accounts page: http://gabriel-zucman.eu/usdina/

After clicking the micro-files link on that page you will be able to download a zip file of all of the current yearly files. A new set will probably be released every year, so the contents of the zip file may change over time.

Unzip the source files to a folder that you can get to through R. I have mine in a sub-directory of the main R project.
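
If you would rather unzip from R than from your file manager, base R's unzip() can do it. This is only a minimal sketch: the helper name, the zip file name, and the target folder are placeholders I made up, so substitute whatever your download is actually called.

```r
# Hypothetical helper: unzip a downloaded archive into a project sub-directory.
unzip_dina <- function(zip_path, out_dir) {
  if (!file.exists(zip_path)) {
    message("Zip file not found: ", zip_path)
    return(character(0))
  }
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # unzip() returns the paths of the extracted files
  unzip(zip_path, exdir = out_dir)
}

# Placeholder names, not the real download name
extracted <- unzip_dina("dina_download.zip", "Dina")
length(extracted)
```

The helper returns an empty vector instead of failing when the zip file is not where you said it was, which makes a mistyped path easy to spot.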

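Create a new folder called Dina_subset at the same level as the folder you unzipped to, and copy the yearly files you want to work with into it. The listing further down shows one file per decade, from 1968 to 2018. You can change the code you’ll see later if you want a different folder structure.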
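
The folder setup can also be scripted. The sketch below uses base R only and assumes the yearly files are named like usdina1968.dta, as in the listing later in this chapter. The helper name copy_dina_subset() and the demo folder names are hypothetical, and the demo runs on empty stand-in files in a temporary folder rather than on real data.

```r
# Hypothetical helper: copy one file per chosen year into a subset folder.
# Assumes files are named usdina<year>.dta.
copy_dina_subset <- function(src_dir, dest_dir, years) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  wanted <- file.path(src_dir, paste0("usdina", years, ".dta"))
  found <- wanted[file.exists(wanted)]  # silently skip years that are missing
  file.copy(found, dest_dir, overwrite = TRUE)
  basename(found)
}

# Demo on empty stand-in files (not real DINA data)
demo_src <- file.path(tempdir(), "Dina_demo")
dir.create(demo_src, showWarnings = FALSE)
file.create(file.path(demo_src, paste0("usdina", 1966:1970, ".dta")))

demo_dest <- file.path(tempdir(), "Dina_subset_demo")
copied <- copy_dina_subset(demo_src, demo_dest, years = c(1968, 1978))
copied  # only 1968 exists among the stand-ins, so only it is copied
```

On the real files, a call along the lines of copy_dina_subset("Dina", "Dina_subset", seq(1968, 2018, by = 10)) would pick one year per decade, where "Dina" stands for whatever folder you unzipped to.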

These files are large: every year has almost 69,000 records, each of which represents a generic US individual type. For instance, one record could be for a married working man aged 20-64 with x amount of income from various sources and y amount of wealth.

Finally, some code

We will use these libraries in this section:

library(tidyverse)
library(fs)
library(haven)

Get the filenames in Dina_subset

This uses the fs package’s dir_ls() function, which returns the paths as a named vector.

paths <- dir_ls("Dina_subset/") 
paths
## Dina_subset/usdina1968.dta Dina_subset/usdina1978.dta 
## Dina_subset/usdina1988.dta Dina_subset/usdina1998.dta 
## Dina_subset/usdina2008.dta Dina_subset/usdina2018.dta
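
If the folder ever holds anything besides Stata files (a readme, the zip itself), dir_ls() can be told to return only the .dta files through its glob argument. A small sketch on stand-in files in a temporary folder; the folder and file names here are invented for the demo.

```r
library(fs)

# Stand-in folder with empty files (not real DINA data)
demo_dir <- path(tempdir(), "dir_ls_demo")
dir_create(demo_dir)
file_create(path(demo_dir, c("usdina1968.dta", "usdina1978.dta", "notes.txt")))

# glob keeps only the paths that match the wildcard pattern
demo_paths <- dir_ls(demo_dir, glob = "*.dta")
path_file(demo_paths)

# The result carries names, which the .id argument of the purrr
# map functions can pick up
is.null(names(demo_paths))
```

For the real folder the equivalent call would be dir_ls("Dina_subset/", glob = "*.dta").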

Import the files into a single dataframe

This part does a lot in two lines of code. It maps the file paths to the haven package’s read_dta() function, which imports native Stata data files, and stacks the results into one data frame. The .id argument stores each file’s path in a column called filename; extract() then pulls the four-digit year out of that column into a new column called year, which replaces filename.

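# Read every file and stack the rows; .id records the source path in "filename"
dina_df <- map_dfr(paths, read_dta, .id = "filename") %>%
  # Replace filename with the four-digit year found in it
  extract(filename, "year", "(\\d{4})")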
dim(dina_df)
## [1] 321530    146

The combined data frame has 321,530 records with 146 variables each. The year column comes out of extract() as a character string.
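
To see what the import step does without loading the large files, here is the same pattern on a toy named list. The two small data frames stand in for the results of read_dta(); their column names are borrowed from the real data, but the values are invented.

```r
library(purrr)
library(tidyr)

# Toy stand-ins for the imported files: a named list of small data frames
toy_files <- list(
  "Dina_subset/usdina1968.dta" = data.frame(id = 1:2, fiinc = c(10, 20)),
  "Dina_subset/usdina1978.dta" = data.frame(id = 1:3, fiinc = c(30, 40, 50))
)

# .id = "filename" stores each element's name in a new first column
toy_stacked <- map_dfr(toy_files, identity, .id = "filename")

# Pull the four-digit year out of filename; the filename column is dropped
toy_df <- tidyr::extract(toy_stacked, filename, "year", "(\\d{4})")

names(toy_df)
table(toy_df$year)  # rows per year, a quick check that every file was read
```

Running table(dina_df$year) on the real data frame gives the same kind of per-year row count.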

Look at the variable names

names(dina_df)
##   [1] "year"            "id"              "dweght"          "dweghttaxu"     
##   [5] "female"          "ageprim"         "agesec"          "age"            
##   [9] "oldexm"          "oldexf"          "old"             "oldmar"         
##  [13] "married"         "second"          "xkidspop"        "filer"          
##  [17] "fiinc"           "fninc"           "fainc"           "flinc"          
##  [21] "fkinc"           "ptinc"           "plinc"           "pkinc"          
##  [25] "diinc"           "princ"           "peinc"           "poinc"          
##  [29] "hweal"           "fiwag"           "fibus"           "firen"          
##  [33] "fiint"           "fidiv"           "fikgi"           "fnps"           
##  [37] "peninc"          "schcinc"         "scorinc"         "partinc"        
##  [41] "rentinc"         "estinc"          "rylinc"          "othinc"         
##  [45] "flemp"           "flmil"           "flprl"           "fkhou"          
##  [49] "fkequ"           "fkfix"           "fkbus"           "fkpen"          
##  [53] "fkdeb"           "plcon"           "plbel"           "pkpen"          
##  [57] "pkbek"           "hwequ"           "hwfix"           "hwhou"          
##  [61] "hwbus"           "hwpen"           "hwdeb"           "flwag"          
##  [65] "flsup"           "waghealth"       "wagpen"          "fkhoumain"      
##  [69] "fkhourent"       "fkmor"           "fknmo"           "fkprk"          
##  [73] "proprestax"      "propbustax"      "rental"          "rentalhome"     
##  [77] "rentalmort"      "ownerhome"       "ownermort"       "housing"        
##  [81] "partw"           "soleprop"        "scorw"           "equity"         
##  [85] "taxbond"         "muni"            "currency"        "nonmort"        
##  [89] "hwealnokg"       "hwfin"           "hwnfa"           "plpco"          
##  [93] "ploco"           "plpbe"           "plobe"           "plben"          
##  [97] "plpbl"           "plnin"           "pkpbk"           "pknin"          
## [101] "ptnin"           "dicsh"           "inkindinc"       "colexp"         
## [105] "govin"           "npinc"           "prisupen"        "invpen"         
## [109] "peinck"          "peincl"          "prisupenprivate" "prisupgov"      
## [113] "educ"            "colexp2"         "poinc2"          "tax"            
## [117] "ditax"           "ditaf"           "ditas"           "salestax"       
## [121] "corptax"         "estatetax"       "govcontrib"      "ssuicontrib"    
## [125] "othercontrib"    "ssinc_oa"        "ssinc_di"        "uiinc"          
## [129] "ben"             "dicab"           "dicred"          "difoo"          
## [133] "disup"           "divet"           "diwco"           "dicao"          
## [137] "tanfinc"         "othben"          "medicare"        "medicaid"       
## [141] "otherkin"        "pell"            "vethealth"       "corptax0"       
## [145] "corptax60"       "corptax100"

Save RDS for next chapter

saveRDS(dina_df, "Dina_df.RDS")
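
Reading the file back is the mirror image: readRDS() returns the saved object, which you assign to a name of your choice. A quick round trip on a stand-in data frame (invented values, temporary file) shows the idea.

```r
# Stand-in data frame, saved to a temporary file
toy_df <- data.frame(year = c("1968", "1978"), fiinc = c(10, 20))
rds_path <- tempfile(fileext = ".RDS")
saveRDS(toy_df, rds_path)

# readRDS() returns the object rather than restoring it under its old name
toy_back <- readRDS(rds_path)
identical(toy_df, toy_back)
```

For the file saved above, the call would be readRDS("Dina_df.RDS").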

Next up: Renaming the variables
