Histogram of incomes given a relative frequency distribution

Heather Geiger

February 16, 2018

We are given the following income distribution.

  1. $1 to $9,999 or loss - 2.2%
  2. $10,000 to $14,999 - 4.7%
  3. $15,000 to $24,999 - 15.8%
  4. $25,000 to $34,999 - 18.3%
  5. $35,000 to $49,999 - 21.2%
  6. $50,000 to $64,999 - 13.9%
  7. $65,000 to $74,999 - 5.8%
  8. $75,000 to $99,999 - 8.4%
  9. $100,000 or more - 9.7%
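
As a quick sanity check, the table above can be written down as an R data frame. This is only a sketch: the name `brackets` and its column names are made up for illustration, the first bracket is assumed to start at $0 (ignoring losses), and each upper bound is rounded up to the next whole boundary so that bracket widths come out as multiples of $5k.

```r
# Hypothetical encoding of the income table (names are illustrative only).
# Assumption: the first bracket starts at 0, and upper bounds are rounded up
# to the next boundary (e.g. 9,999 -> 10,000). The open-ended top bracket
# gets upper = NA.
brackets <- data.frame(
  lower = c(0, 10000, 15000, 25000, 35000, 50000, 65000, 75000, 100000),
  upper = c(10000, 15000, 25000, 35000, 50000, 65000, 75000, 100000, NA),
  pct   = c(2.2, 4.7, 15.8, 18.3, 21.2, 13.9, 5.8, 8.4, 9.7)
)

# The shares should account for the whole population.
total_pct <- sum(brackets$pct)

# Share covered by the brackets that have an upper limit.
plotted_pct <- sum(brackets$pct[!is.na(brackets$upper)])

# Number of $5k-wide intervals inside each bounded bracket (NA for the last).
brackets$n_bins <- (brackets$upper - brackets$lower) / 5000
```

Here `total_pct` confirms that the percentages add up, and `n_bins` gives how many $5k intervals fit inside each bracket.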

To draw a histogram of this, let’s assume that incomes are spread roughly uniformly within each interval.

Let’s exclude the last interval from the plot, since it has no upper limit.

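Let’s model a population of 10,000 people and use $5k income intervals, rounding as necessary to get whole numbers of people. With the last interval excluded, the plot covers 90.3% of that population, or 9,030 people.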

  1. Two $5k intervals here, 2.2% / 2 = 1.1% = 110 people per interval.
  2. One interval here, 4.7% = 470 people.
  3. Two $5k intervals here, 15.8% / 2 = 7.9% = 790 people per interval.
  4. Two $5k intervals here, 18.3% / 2 = 9.15% = 915 people per interval.
  5. Three intervals, 21.2% / 3 = 7.0667% = 706.67 people per interval. Give the middle interval the extra people, so 706, 708, and 706 people.
  6. Three intervals, 13.9% / 3 = 4.6333% = 463.33 people per interval. Again give the middle interval the extra person, so 463, 464, and 463 people.
  7. Two intervals, 5.8% / 2 = 2.9% = 290 people per interval.
  8. Five intervals, 8.4% / 5 = 1.68% = 168 people per interval.
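
The splitting above can also be done in code rather than by hand. The following is a minimal sketch, not the method used to produce the numbers in this document: `split_bracket` is a hypothetical helper that divides a bracket's people evenly across its $5k intervals and hands any remainder to the middle interval, which matches the rounding choice made in items 5 and 6.

```r
# Shares (%) and number of $5k intervals for the eight bounded brackets.
pct    <- c(2.2, 4.7, 15.8, 18.3, 21.2, 13.9, 5.8, 8.4)
n_bins <- c(2, 1, 2, 2, 3, 3, 2, 5)
population <- 10000

# Hypothetical helper: split one bracket's people across k equal intervals,
# giving any remainder to the middle interval.
split_bracket <- function(share_pct, k, n = population) {
  total  <- round(share_pct / 100 * n)  # whole people in the bracket
  counts <- rep(total %/% k, k)         # even share per interval
  mid    <- ceiling(k / 2)              # index of the middle interval
  counts[mid] <- counts[mid] + total %% k
  counts
}

# One count per $5k interval, in increasing order of income.
number_people <- unlist(Map(split_bracket, pct, n_bins))
```

With these inputs, `number_people` has 20 entries, one per $5k interval from $0 to $100,000, and they add up to the 9,030 people who fall below $100,000.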

Let’s make a histogram by repeating the midpoint of each $5k interval once for every person in that interval.

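# Midpoint of each $5k interval, from $0-$4,999 up to $95,000-$99,999
incomes <- seq(from = 2499.5, to = 97499.5, by = 5000)
# People per $5k interval, in the same order as the midpoints
number_people <- c(110, 110, 470, 790, 790, 915, 915, 706, 708, 706,
                   463, 464, 463, 290, 290, 168, 168, 168, 168, 168)

# Repeat each midpoint once per person, then bin on the $5k boundaries
hist(rep(incomes, times = number_people),
     labels = FALSE,
     xlab = "Income", ylab = "Number of people",
     main = "Income distribution modeled based on 10,000 people\nand breaking up larger intervals\ninto equal $5k income brackets",
     breaks = seq(from = 0, to = 100000, by = 5000))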

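We can clearly see that the data is right-skewed: the counts peak in the $25,000 to $34,999 range and tail off toward higher incomes. The excluded $100,000 or more interval would extend that tail even further.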
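
The skew can also be checked numerically. As a rough sketch, and only as a rule of thumb rather than a formal test, a mean that sits above the median is consistent with a right-skewed distribution. The variable names `mean_income` and `median_income` are introduced here for illustration, and both values are computed from the interval midpoints with the $100,000 or more interval left out, so they describe the modeled data rather than true incomes.

```r
# Interval midpoints and counts, repeated here so the block runs on its own.
incomes <- seq(from = 2499.5, to = 97499.5, by = 5000)
number_people <- c(110, 110, 470, 790, 790, 915, 915, 706, 708, 706,
                   463, 464, 463, 290, 290, 168, 168, 168, 168, 168)

# Mean of the midpoints, weighted by the number of people in each interval.
mean_income <- weighted.mean(incomes, w = number_people)

# Median of the expanded data (one midpoint per person).
median_income <- median(rep(incomes, times = number_people))

# TRUE when the mean exceeds the median, as expected for right skew.
mean_income > median_income
```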