The Data

This is a relatively clean data set. Potential areas for tidying include:

  • Some additional binning - The data set consists of three distinct, yet related, data sets.
  • Merged Header - The file has a merged header that will require some clean-up.
  • Filter - Rows that are not needed for the analysis must be filtered out.
  • Missing and Incomplete Data - Incomplete and/or missing data must be handled.
  • Skip Rows - Rows will need to be skipped during data import because of blank space and a large header.
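A merged header usually reaches a CSV file with its label in the first cell of the merged span and blanks in the remaining cells. One common fix is to carry the last non-blank label in the top row forward, then join it to the label in the second header row. Below is a minimal sketch in Python's standard library; the original analysis uses R. The function name `merge_header_rows` and both header rows are hypothetical and were not copied from the source file.

```python
def merge_header_rows(top, bottom, sep=": "):
    """Combine a two-row header whose top row contains merged cells.

    A merged cell exported to CSV typically keeps its label only in the
    first cell of the span. We carry the last non-blank top label forward
    and join it with the bottom label when both are present.
    """
    names = []
    current = ""
    for upper, lower in zip(top, bottom):
        upper, lower = upper.strip(), lower.strip()
        if upper:
            current = upper  # start of a new (possibly merged) group
        if current and lower:
            names.append(f"{current}{sep}{lower}")
        else:
            names.append(current or lower)
    return names


# Hypothetical header rows, for illustration only.
top = ["Income", "All", "High school", "", "Bachelor's or more", "", ""]
bottom = ["", "", "Nongrad", "Graduate", "Total", "Master's", "Doctorate"]
print(merge_header_rows(top, bottom))
```

Forward-filling assumes every blank top cell belongs to the group on its left. A stand-alone column with an empty top cell that follows a merged group would be mislabeled, so check the result against the file.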

My Game Plan

My game plan for this data set is as follows:

  • Import the data
  • Tidy the data
  • Use ggplot to identify correlations and insights

My Question

Identify whether income and educational attainment are correlated.

Transform Data

The steps to transform the data set are set forth below:

  1. Import Data

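Use readr with the skip parameter set to 11 to import the data. The first rows of the imported data are shown below. Note the placeholder column names (X1, X2, X3, X6, X7) left by the merged header.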
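readr's `skip` argument discards the first lines of the file before the header row is parsed. By default, readr also drops blank rows (`skip_empty_rows = TRUE`). The sketch below is a rough standard-library analogue in Python, meant only to illustrate the idea. The in-memory file, its 11 preamble lines, the column name `Characteristic`, and the helper `read_after_preamble` are all hypothetical.

```python
import csv
import io
from itertools import islice


def read_after_preamble(handle, skip):
    """Skip `skip` leading lines, then parse the remainder as CSV.

    The first row after the skipped lines is used as the header, and
    fully blank rows are dropped, similar to readr's defaults.
    """
    reader = csv.reader(islice(handle, skip, None))
    header = next(reader)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return header, rows


# Hypothetical file: 11 lines of title/notes, then the table itself.
preamble = "".join(f"note line {i}\n" for i in range(11))
body = "Characteristic,Total\nWith Earnings,109547\n\nWithout Earnings,17\n"
header, rows = read_after_preamble(io.StringIO(preamble + body), skip=11)
print(header)  # ['Characteristic', 'Total']
print(rows)    # [['With Earnings', '109547'], ['Without Earnings', '17']]
```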

X1                          X2    X3  9th to 12th Nongrad  Graduate (Incl GED)     X6     X7  Total  Bachelor’s Degree  Master’s Degree  Professional Degree  Doctorate Degree
….Total                 109564  2597                 4198                27325  16269  12117  47057              29263            12938                 2074              2782
Without Earnings            17     2                    1                    6      1      5      2                  0                2                    0                 0
With Earnings           109547  2595                 4197                27320  16269  12112  47055              29263            12936                 2074              2782
..$1 to $2,499 or loss     357    24                   17                   96     68     31    121                 95               19                    5                 2
..$2,500 to $4,999          97     1                    7                   39     12     11     27                 20                6                    0                 0
..$5,000 to $7,499         315    16                   41                   89     34     27    106                 66               28                    5                 7
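The counts above hint at how the merged header is laid out. X2 appears to be the overall total, and the column named Total appears to be a subtotal for bachelor's degree or higher. A quick arithmetic check on the ….Total row supports this reading. The values are copied from the output above, and the interpretation is inferred from the numbers rather than stated in the source. The check is written in Python for illustration.

```python
# Values from the "….Total" row of the imported data above.
row = {
    "X2": 109564, "X3": 2597, "9th to 12th Nongrad": 4198,
    "Graduate (Incl GED)": 27325, "X6": 16269, "X7": 12117,
    "Total": 47057, "Bachelor’s Degree": 29263, "Master’s Degree": 12938,
    "Professional Degree": 2074, "Doctorate Degree": 2782,
}

advanced = ["Bachelor’s Degree", "Master’s Degree",
            "Professional Degree", "Doctorate Degree"]
groups = ["X3", "9th to 12th Nongrad", "Graduate (Incl GED)",
          "X6", "X7", "Total"]

# The four degree columns add up exactly to the "Total" column ...
print(sum(row[c] for c in advanced), row["Total"])  # 47057 47057
# ... and the six group columns add up to X2 to within 1 (presumably rounding).
print(sum(row[c] for c in groups), row["X2"])       # 109563 109564
```

If this reading holds, X2 and Total are subtotals. They should be dropped, or kept apart, before plotting, so that people are not counted twice.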
  2. Tidy the Data
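Below is a minimal sketch of how this step could look, written in Python for illustration. In R, the usual tools would be dplyr's `filter()` and tidyr's `pivot_longer()`. The sketch does three things:

- It drops the summary rows (….Total, Without Earnings, With Earnings).
- It drops the columns that appear to be subtotals (X2 and Total).
- It reshapes the table from wide to long, so that each row holds one income bracket, one education level, and one count.

Only the rows shown above are used. The names X3, X6 and X7 are kept as imported, and the helper `tidy` is hypothetical.

```python
columns = ["X1", "X2", "X3", "9th to 12th Nongrad", "Graduate (Incl GED)",
           "X6", "X7", "Total", "Bachelor’s Degree", "Master’s Degree",
           "Professional Degree", "Doctorate Degree"]
# Rows copied from the imported data above, in column order.
raw = [
    ["….Total", 109564, 2597, 4198, 27325, 16269, 12117, 47057, 29263, 12938, 2074, 2782],
    ["Without Earnings", 17, 2, 1, 6, 1, 5, 2, 0, 2, 0, 0],
    ["With Earnings", 109547, 2595, 4197, 27320, 16269, 12112, 47055, 29263, 12936, 2074, 2782],
    ["..$1 to $2,499 or loss", 357, 24, 17, 96, 68, 31, 121, 95, 19, 5, 2],
    ["..$2,500 to $4,999", 97, 1, 7, 39, 12, 11, 27, 20, 6, 0, 0],
    ["..$5,000 to $7,499", 315, 16, 41, 89, 34, 27, 106, 66, 28, 5, 7],
]

SUMMARY_ROWS = {"….Total", "Without Earnings", "With Earnings"}
SUBTOTAL_COLS = {"X2", "Total"}  # assumed: overall total, bachelor's-or-more subtotal


def tidy(raw, columns):
    """Drop summary rows and subtotal columns, then pivot wide -> long."""
    long_rows = []
    for record in raw:
        label = record[0]
        if label in SUMMARY_ROWS:
            continue
        bracket = label.lstrip(".")  # remove the indentation dots
        for name, count in zip(columns[1:], record[1:]):
            if name not in SUBTOTAL_COLS:
                long_rows.append((bracket, name, count))
    return long_rows


tidy_rows = tidy(raw, columns)
print(len(tidy_rows))  # 27 (3 brackets x 9 education columns)
print(tidy_rows[0])    # ('$1 to $2,499 or loss', 'X3', 24)
```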

To answer the question, I’m going to use a plotly bar chart, faceted by educational attainment with facet_wrap.
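The education groups differ greatly in size. The With Earnings row shows 2,074 professional degree holders but 29,263 bachelor's degree holders, so raw counts in one facet are hard to compare with counts in another. One option is to show each bracket as a share of its own group's earners, though this is not necessarily what the original chart does. The sketch below applies that normalization to the $1 to $2,499 row from the output above, using the With Earnings row as the denominator. It is written in Python for illustration, and the helper `shares` is hypothetical.

```python
# Counts copied from the imported data above.
with_earnings = {"X3": 2595, "Graduate (Incl GED)": 27320,
                 "Bachelor’s Degree": 29263, "Doctorate Degree": 2782}
lowest_bracket = {"X3": 24, "Graduate (Incl GED)": 96,
                  "Bachelor’s Degree": 95, "Doctorate Degree": 2}


def shares(counts, totals):
    """Express each group's count as a percentage of that group's earners."""
    return {group: 100 * counts[group] / totals[group] for group in counts}


for group, pct in shares(lowest_bracket, with_earnings).items():
    print(f"{group:>20}: {pct:.2f}%")
```

In ggplot, `facet_wrap(..., scales = "free_y")` is a lighter alternative. It lets each facet use its own y-axis range, at the cost of making heights across facets not directly comparable.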

Answers / Observations

Identify whether income and educational attainment are correlated:

  • The chart above shows a positive correlation between educational attainment and income.
  • High school dropouts and individuals who did not attend high school had the lowest incomes in 2018.
  • The highest incomes go to holders of bachelor’s, master’s, doctorate (PhD), and professional degrees.
  • There are relatively few PhD and professional degree holders, but salaries for these degrees are high ($100K and over).
  • All education levels have some individuals earning more than $100K.