GINI Index for Regional US income

The Census Bureau publishes data sets that track the GINI index at different levels of geographic granularity, such as state, metropolitan statistical area, and congressional district. This exercise cleans up an “untidy” contingency table of GINI by region. Census data can be searched in the American Community Survey by year, under the subject of income, for either individuals or families, and at a chosen geographic level. The Census Bureau provides the following tool for acquiring data sets:

http://factfinder.census.gov/faces/nav/jsf/pages/guided_search.xhtml

GINI is an internationally recognized measure of income dispersion within a specified geographic area. Income inequality has been widely discussed in recent years, and GINI is a key measure of it.
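To make the measure concrete, here is a minimal sketch of how a Gini coefficient can be computed from a vector of individual incomes. It is not how the Census Bureau derives its published figures. The function name `gini_coefficient` and the sample incomes are hypothetical, chosen only for illustration. The coefficient is 0 when everyone has the same income and approaches 1 as income concentrates in a single holder.

```r
# Hypothetical helper: Gini coefficient of a vector of non-negative incomes.
# Uses the sorted-rank formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
# where x is sorted in ascending order and i is the rank.
gini_coefficient <- function(x) {
  x <- sort(x[!is.na(x)])          # drop missing values, sort ascending
  n <- length(x)
  if (n == 0 || sum(x) == 0) {
    return(NA_real_)               # undefined without any income
  }
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Made-up incomes for illustration only
equal_incomes   <- c(10, 10, 10, 10)   # everyone earns the same
one_holder_only <- c(0, 0, 0, 100)     # one person holds everything

gini_coefficient(equal_incomes)        # 0
gini_coefficient(one_holder_only)      # 0.75, i.e. (n - 1) / n for n = 4
```

For a finite sample of size n the maximum is (n - 1) / n rather than exactly 1, which is why the second call returns 0.75.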

Load Data Frame from file

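# packages used below: tidyr (gather), dplyr (%>%, select, arrange,
# group_by, summarise), knitr (kable), ggplot2 (plots)
library(tidyr)
library(dplyr)
library(knitr)
library(ggplot2)

# read the untidy regional table; empty cells become NA
# note: read.csv converts the column name "US Overall" to "US.Overall"
df <- read.csv(
  "/Users/scottkarr/IS607Spring2016/project2/more/GINI-2014-Region-untidy.csv",
  na.strings = "",
  blank.lines.skip = TRUE,
  col.names = c("Quintile", "West", "South", "Midwest", "Northeast", "US Overall"),
  stringsAsFactors = FALSE
)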

Tidy data

# drop rows 1 and 7: they are extraneous, and derived fields
# can be recalculated from the raw data
df <- df[-c(1, 7), ]
# gather() reshapes the data from wide to long format,
# giving one row per Quintile/Region pair
df_tidy <- df %>%
  gather(Region, Gini, -Quintile) %>%
  # order the columns and sort the rows of the final data set
  select(Region, Quintile, Gini) %>%
  arrange(Region, Quintile, Gini)
# present the data as a formatted table
kable(df_tidy, align = 'l')

Region	Quintile	Gini
Midwest	1st quintile	0.08
Midwest	2nd qunitile	0.13
Midwest	3rd quintile	0.19
Midwest	4th quintile	0.29
Midwest	5th quintile	0.31
Northeast	1st quintile	0.12
Northeast	2nd qunitile	0.18
Northeast	3rd quintile	0.24
Northeast	4th quintile	0.32
Northeast	5th quintile	0.13
South	1st quintile	0.32
South	2nd qunitile	0.25
South	3rd quintile	0.18
South	4th quintile	0.14
South	5th quintile	0.11
US.Overall	1st quintile	0.20
US.Overall	2nd qunitile	0.20
US.Overall	3rd quintile	0.20
US.Overall	4th quintile	0.20
US.Overall	5th quintile	0.20
West	1st quintile	0.12
West	2nd qunitile	0.21
West	3rd quintile	0.22
West	4th quintile	0.19
West	5th quintile	0.26
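
The wide-to-long reshaping used above can be seen more easily on a toy table. The sketch below uses `pivot_longer()`, which the tidyr documentation recommends in place of the superseded `gather()`. The data frame `wide` and its values are made up for illustration and are not Census figures.

```r
library(tidyr)

# Toy wide table: one column per region (values are invented)
wide <- data.frame(
  Quintile = c("1st quintile", "2nd quintile"),
  West     = c(0.1, 0.2),
  South    = c(0.3, 0.4),
  stringsAsFactors = FALSE
)

# Every column except Quintile becomes a (Region, Gini) pair,
# so 2 rows x 2 region columns yields 4 rows
long <- pivot_longer(
  wide,
  cols      = -Quintile,
  names_to  = "Region",
  values_to = "Gini"
)

print(long)
```

Each region column is folded into two columns, `Region` and `Gini`, which is the same shape that `gather(Region, Gini, -Quintile)` produces, although the row order differs.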

Analysis - Group by region and calculate statistics

df_tidy_grouped <- group_by(df_tidy, Region)
df_stats <- summarise(df_tidy_grouped, mean_gini = mean(Gini), std_gini = sd(Gini))
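
The summary is computed above but never displayed. As a minimal sketch of the same group-and-summarise pattern, the example below runs it on a small made-up data frame (`toy` and its values are hypothetical) and prints the result. Note that `mean()` and `sd()` need a numeric column; if `Gini` had been read in as character, it would first have to be converted with `as.numeric()`.

```r
library(dplyr)

# Made-up values for illustration only
toy <- data.frame(
  Region = c("A", "A", "B", "B"),
  Gini   = c(0.10, 0.30, 0.20, 0.20),
  stringsAsFactors = FALSE
)

# One row per region with its mean and standard deviation
stats <- toy %>%
  group_by(Region) %>%
  summarise(mean_gini = mean(Gini), std_gini = sd(Gini))

print(stats)
```

Both toy regions have the same mean (0.2), but only region A has a nonzero standard deviation, which shows why the spread is worth reporting alongside the mean.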

Presentation

# histogram of Gini values across all regions and quintiles
ggplot(df_tidy) + geom_histogram(aes(x = Gini))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
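
The message above appears because `geom_histogram()` was called without a bin specification and fell back to 30 bins. A minimal sketch of setting the bin width explicitly follows. The width of 0.05 is an arbitrary choice for illustration, and the toy data frame reuses the five Midwest values from the table above.

```r
library(ggplot2)

# The five Midwest values from the table above
toy <- data.frame(Gini = c(0.08, 0.13, 0.19, 0.29, 0.31))

# An explicit binwidth suppresses the stat_bin() message;
# 0.05 is an assumed width, not a recommendation
p <- ggplot(toy, aes(x = Gini)) +
  geom_histogram(binwidth = 0.05)

print(p)
```

With only a handful of observations per region, a coarser width such as this tends to give a more readable picture than 30 narrow bins.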

# scatterplot of Gini by quintile, one panel per region
ggplot(data = df_tidy, aes(x = Quintile, y = Gini)) +
  geom_point() + facet_wrap( ~ Region )