This is the first in a series of posts to help R users with an interest in economics understand the work of Piketty, Zucman, and Saez on the distribution of national income, wealth, and taxes across the US adult population. These economists argue that income and wealth inequality is growing at an alarming rate and the wealthiest people are now paying the lowest effective tax rate of any income class. This series will take you step-by-step through the models and data transformations needed for full-spectrum analysis of their work. We will be using the data provided by Piketty, Zucman, and Saez which includes yearly files of every distinct US income and wealth cohort packed with over 140 variables, and we will be following the outline of my forthcoming e-book Economics in R: US Distributional National Accounts (Spring 2020).

What are Distributional National Accounts

Distributional National Accounts are a way to see how income, wealth, and taxes are distributed across a population, in this case the US. The ultimate goal of the economists championing this approach is to apply this methodology to every developed country so that comparisons across nations are possible. For each year, every adult person is represented as a member of a cohort and every dollar of national income is accounted for and reconciled to official US data. One way to think of the income distribution is to arrange the income of each person in ascending order then select the middle person to see what the median income was for that year.

I will be using four broad categories in this article:
1. Working-class - Those in the 0-50 percentiles
2. Middle-class - Those in the 50-90 percentiles
3. Upper-middle-class - Those in the 90-99 percentiles
4. Rich - Those in the 1%

Thank goodness for the weighted.median() function in the spatstat package.

In this way you can compare how the working-class are doing compared to the middle-class or the rich. You can also compare these across years to see how income & wealth distribution has changed over time.

Why you should care in six charts

The data is given in yearly “micro-files” from 1962 to 2018 where each record in the file represents a cohort of US adults. Their income, wealth and taxes are broken down to a granular level so an analyst can take a deep dive into any segment of the population. The economists use 2014 as a base year even though 2018 is the latest year and includes the effects of the 2017 Tax Reform.

Unless you are a billionaire these charts should concern you.

Chart 1 - The bottom 90% vs the top 1%

You are going to start to see a recurring theme around the year 1980. This is the year everything started to go off the rails for the average American.

From 1946 to 1966 income growth for both groups are highly correlated. From 1966 to 1980 the income of the bottom 90% was growing faster than the top 1%. After 1980 the bottom 90% income still grew at a steady pace from about $30,000 to $38,000 but the income of the rich went from about $600,000 to about $1.3 million per year. In 1980 it took twenty bottom-90 incomes to equal a single rich person; in 2014 it was 34.21 to 1, indicating that the wealthiest are pulling away from the rest of society.

Chart 2 - The top 0.1% share of national income

The adult population in 2014 was 243 million so 0.1% of that is 243,000 people. Those 243,000 people took 9.3% of all income while the working-class (~120 million people) received 12.5% of national income. In 1980 the share of income going to the top 0.1% was less than 4% of the total so their share has more than doubled.

Chart 3 - The rise of the Top 10%

Again, if we look at 1980 the share of income earned by the upper-middle-class and rich went from 35% to 50% in 2012. It stayed steady at 35% for 40 years before taking off after 1980. Notice that the upper-middle-class and rich income share today has surpassed the highest peak before the Great Depression circa ~1928.

Chart 4 - The big switcheroo

In 1980 the working-class brought in around 20% of all income and the rich brought in about 8%. By 2014 they have completely switched with rich bringing home 20% of national income and the working-class bringing in around 11%. The cross-over year was 1995.

Even to the untrained eye this looks like a transfer of wealth from the bottom 120 million people to the top 243,000 people.

Chart 5 - The stagnating incomes of the bottom 50%

Since 1980 the real average income of someone in the working-class went up $200. Yes, a raise of $200 over 34 years. That translates to a total increase of 1.23% or 0.036% per year. Meanwhile at the country club the income of the rich more than tripled with an average yearly gain of 1.97%.

350In 2012 Nobel Prize winner Joseph Stiglitz calculated that 91% of all GDP gains go to the top 1% and the bottom 99% get to split the remaining 9%.

Chart 6 - What happened to all the tax revenue?

The image below is from an infographic in this excellent NY Times article that uses the data that we will be working with. Look at the effective tax rate of the top 400 go from over 70% in 1950 to the lowest tax rate of any group. This revelation refutes the general assumption that the US has a progressive tax system. The US had a progressive tax system, now it’s basically a flat tax that gets regressive at the top.
It’s interesting that the 95th to 99th percentile changed the least over time and serves as a pivot-point. I highly recommend checking out the NY Times article. I’ll have to figure out how to recreate it in ggplot().

What’s coming up

My goal is to get any R user interested in analyzing this data up and running as quickly as possible.

This series of articles will resolve all of the roadblocks I encountered as I wrangled the data into usable shape.

When I discovered the dataset the first issue I ran into was that they were Stata files. How do I import Stata files? Luckily, the haven package has a function to read Stata files.

Next hurdle, these are pretty big files. Even though they top out at about 68,000 records they have 146 columns each. You don’t necessarily want to load every year, every time so how do I only bring in the years I want.

Then … OMG what are these variable names!! They all need to be replaced with more verbose names with a standard naming convention. These are the source names for the wealth variables.

hwequ + hwfix + hwhou + hwbus + hwpen + hwdeb

Reconcile the data with itself and with official US data. Some of the variables are aggregates of other granular variables. I dissect each one and validate that it matches the source data (and it does) and then check it against official data. We are mainly interested that Adult Population and National Pre-tax Income match.

Create proportions so that I know how much income & wealth comes from each granular source.
Here are all of the granular variables that make up total wealth.
Notice the better names from the source names above.

summ_wealth_net = 
        assets_equity +
        assets_currency +
        assets_housing +
        assets_business +
        assets_pension_lifeins +
        liabilities_household, 
recon_wealth_net = 
        ttl_wealth_net - summ_wealth_net,

Create distributions. This whole brand of economics is about distributions and they did not supply a variable that indicates where each cohort is on the income & wealth distributions so I calculate it and add it to the main dataset.

When we are finished you will have 263 columns and everything you need to start exploring.

End notes

The economists

Distributional national accounts are not a new economic idea but it has gotten renewed interest due to the prolific effort of three French economists, two of which are now working in the US.

Thomas Piketty of the Paris School of Economics released his landmark book Capital in the 21st Century in 2014 that showed in no uncertain terms that capital income is growing much faster than labor income and that the fruits of any economic gains were disproportionally going to the top 1% of income earners compared to the past. Those with high wealth could grow that wealth at much higher rates than inflation creating dynastic wealth. Professor Piketty is releasing his follow-up Capital and Ideology in March 2020 and should be a bestseller even though it is over 1000 pages long; that’s about 30 hours for those of us that prefer to listen. Piketty has been at this for awhile. His first book The Economics of Inequality was originally published in France in 1997 and was finally translated to English in 2015.

Emmanuel Saez and Gabriel Zucman are both professors of economics at the University of California - Berkeley and have recently released The Triumph of Injustice: How the Rich Dodge Taxes and How to Make Them Pay which details the demise of the progressive tax system in the US and elegant solutions that prevent the highest earners from paying a lower tax rate than their help. They are the primary developers of the US datasets that will be used in this series and have been helpful when I reached out to them with questions concerning the data. Professor Zucman has also published The Hidden Wealth of Nations - The Scourge of Tax Havens in 2013.

I think you get the picture of what they are focused on.

The main site for US Distributional National Accounts is hosted by Gabriel Zucman and contains links to the data, the data appendix/dictionary, and their published paper on the subject.

Feel free to connect with me through github, twitter or LinkedIn.