This is a short exercise to assess income inequality by drawing Lorenz curves and calculating Gini coefficients.

You’ll need to run the code on your machine and you’ll be required to change it a bit, so your data and the resulting pictures and tables will be different from what you’ll see in this tutorial.

We will need the reldist and IC2 packages, plus ggplot2 and dplyr for plotting and data handling…
# install.packages('reldist', dependencies = TRUE)
# install.packages('IC2', dependencies = TRUE)
library(reldist)
library(IC2)
library(ggplot2)
library(dplyr)
…we will abandon the exponential notation (e.g. e+10) for prettier graphs…
options(scipen = 999)
…and we will need some data. If you’re comfortable with R you can use any open data that contains income distribution. Here, we’ll generate some artificial data.

Explain how our data are structured.

For your submission, choose a seed other than 42 and report it.
set.seed(42)
city <- c("A", "B", "C", "D", "E", "F", "G", "H")
income <- sample(1:100000,
                 160,
                 replace = TRUE)
cities <- data.frame(city, income) %>% arrange(city)

What does the following graph show?

Insert the figure with a name and a description.
Based on the graph, in which city do you think income inequality is greatest, and why?

par(mfrow = c(2, 4))  # 2 x 4 grid: one panel per city
for (i in LETTERS[1:8]) {
  # Draw the Lorenz curve of incomes in city i
  cities %>% filter(city == i) %>% .$income %>% curveLorenz(., col = 'red')
  title(paste('City', i))
}
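
To make the plotted quantity concrete, here is a minimal base R sketch of the points behind a Lorenz curve: incomes are sorted, and the cumulative share of income is plotted against the cumulative share of people. The helper name lorenz_points and the toy incomes are made up for illustration; this is the textbook construction, not a description of how IC2 implements curveLorenz.

```r
# Hypothetical helper (not part of IC2): coordinates of an empirical Lorenz curve.
lorenz_points <- function(x) {
  x <- sort(x)                               # order incomes from poorest to richest
  n <- length(x)
  data.frame(
    pop_share    = c(0, seq_len(n) / n),     # cumulative share of people
    income_share = c(0, cumsum(x) / sum(x))  # cumulative share of total income
  )
}

toy <- c(10, 20, 30, 40)  # made-up incomes, for illustration only
pts <- lorenz_points(toy)

plot(pts$pop_share, pts$income_share, type = "l", col = "red",
     xlab = "Cumulative share of population",
     ylab = "Cumulative share of income")
abline(0, 1, lty = 2)  # dashed diagonal: line of perfect equality
```

The dashed diagonal is the curve you would get if everyone had the same income, so it is the natural reference when comparing the panels.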

Calculate and compare Gini indices in all the cities.

Insert and name the table.
What conclusions can you draw from it?
Were your expectations from the previous task confirmed?
ginicities <- aggregate(income ~ city,
                        data = cities,
                        FUN = "gini")
names(ginicities) <- c("city", "gini")
knitr::kable(ginicities %>% arrange(desc(gini)), align = 'l')
|city |gini      |
|:----|:---------|
|H    |0.4549123 |
|C    |0.3505498 |
|B    |0.3184892 |
|G    |0.3037768 |
|E    |0.3022083 |
|D    |0.2759564 |
|A    |0.2710777 |
|F    |0.1637490 |
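
For intuition about what the gini column measures, the following base R sketch computes an unweighted sample Gini coefficient from sorted incomes. The helper gini_sketch is hypothetical and is not the reldist implementation; on unweighted data it is expected to give matching values, but treat that as an assumption to check on your own data.

```r
# Hypothetical helper (not the reldist implementation): unweighted sample Gini.
gini_sketch <- function(x) {
  x <- sort(x)        # order incomes from poorest to richest
  n <- length(x)
  i <- seq_len(n)     # rank of each income
  # Richer ranks get positive weights, poorer ranks negative ones
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

equal_incomes  <- gini_sketch(c(5, 5, 5, 5))    # everyone earns the same: 0
single_earner  <- gini_sketch(c(0, 0, 0, 100))  # one person holds all income: 0.75
print(c(equal_incomes, single_earner))
```

Without a small-sample correction, the largest value this formula can reach with n observations is (n - 1) / n, which is why the second case gives 0.75 rather than 1.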

What does the following graph show?

Insert the figure with a name and a description.
How do the distributions correspond to the inequality measures calculated above?
ggplot(cities,
       aes(income)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  geom_density() +
  theme_minimal() +
  facet_wrap(~ city, ncol = 2)

Play with the data and comment on the results.

Change a couple of observations in your data (note which cities you changed) and repeat the Gini coefficient calculation.
Look at the resulting table and comment on the changes.

cities[26,]
##    city income
## 26    B  26912
cities[26,2] <- 120000

cities <- cities %>% mutate(income = ifelse(city == 'D', income*1.5, income))
|city |gini      |
|:----|:---------|
|H    |0.4549123 |
|C    |0.3505498 |
|B    |0.3352633 |
|G    |0.3037768 |
|E    |0.3022083 |
|D    |0.2759564 |
|A    |0.2710777 |
|F    |0.1637490 |
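
When commenting on the changes, it may help to try both kinds of manipulation on a tiny example first. The sketch below uses a hypothetical helper, gini_mad, which computes the Gini coefficient as the mean absolute difference between all pairs of incomes divided by twice the mean income. The toy incomes are made up and are not taken from the cities data.

```r
# Hypothetical helper: Gini as the mean absolute pairwise difference
# divided by twice the mean income.
gini_mad <- function(x) {
  sum(abs(outer(x, x, "-"))) / (2 * length(x)^2 * mean(x))
}

original <- c(10, 20, 30, 40)          # made-up incomes
outlier  <- replace(original, 4, 400)  # overwrite one observation with a large value
scaled   <- original * 1.5             # multiply every income by the same factor

results <- c(original = gini_mad(original),
             outlier  = gini_mad(outlier),
             scaled   = gini_mad(scaled))
print(results)
```

Comparing the three values shows which manipulation alters the relative differences between incomes and which does not.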