This is a short exercise in assessing income inequality by drawing Lorenz curves and calculating Gini coefficients.

We will need the reldist and IC2 packages to get them…

# install.packages('reldist', dependencies = TRUE)
# install.packages('IC2', dependencies = TRUE)
library(reldist)
library(IC2)
library(ggplot2)
library(dplyr)

…we will switch off scientific notation (e.g. 1e+10) for prettier graphs…

options(scipen = 999)

…and we will need some data.

Q1: What happens here?

Explain how our data is structured.
set.seed(42)
city <- c("A", "B", "C", "D", "E", "F", "G", "H")
income <- sample(1:100000,
                 160,
                 replace = TRUE)
cities <- data.frame(city, income)

(Hint: always run set.seed(42) right before sampling so that you get exactly the same data every time.
Without it, or on a different R version, you may get different numbers!)
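One way to approach Q1 is to inspect the data frame directly. The following is a minimal sketch in base R; it rebuilds the same data so that it runs on its own, and the choice of str(), table() and head() is only a suggestion.

```r
# Rebuild the data exactly as above so the snippet runs standalone
set.seed(42)
city <- c("A", "B", "C", "D", "E", "F", "G", "H")
income <- sample(1:100000, 160, replace = TRUE)
cities <- data.frame(city, income)

str(cities)          # 160 observations of 2 variables
table(cities$city)   # the 8 labels are recycled, so each city gets 20 rows
head(cities, 10)     # rows cycle through A..H and then start again at A
```

Because the city labels are recycled, row 26 belongs to city B, which matters later in Q4.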

Q2: What does the following graph show?

Save the figure, name it and provide a description.
Based on the graph, in which city do you think income inequality is the greatest, and why?
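par(mfrow = c(2, 4))  # 2 x 4 grid, one panel per city
for (i in LETTERS[1:8]) {
  # index via cities$city rather than the standalone `city` vector
  curveLorenz(cities[cities$city == i, "income"], col = "red")
  title(paste("City", i))
}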
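To see what a Lorenz curve actually plots, it can help to compute its points by hand. The sketch below uses base R only; lorenz_points() is a hypothetical helper written for illustration, not part of IC2 or reldist, and the two toy income vectors are made up.

```r
# Hypothetical helper: Lorenz curve coordinates for a numeric income vector
lorenz_points <- function(x) {
  x <- sort(x)  # order people from poorest to richest
  data.frame(
    pop_share    = c(0, seq_along(x) / length(x)),  # cumulative share of people
    income_share = c(0, cumsum(x) / sum(x))         # cumulative share of income
  )
}

equal   <- lorenz_points(rep(10, 4))      # everyone earns the same
unequal <- lorenz_points(c(1, 1, 1, 37))  # one person earns almost everything

# The equal case lies on the diagonal; the unequal case sags below it
plot(equal$pop_share, equal$income_share, type = "l", lty = 2,
     xlab = "Cumulative share of population",
     ylab = "Cumulative share of income")
lines(unequal$pop_share, unequal$income_share, col = "red")
```

The further the red curve sags below the dashed diagonal, the more unequal the distribution.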

Q3: Calculate and compare Gini indices in all the cities.

Insert and name the table.
What conclusions can you draw from it?
Were your expectations from Q2 confirmed? Why or why not?
ginicities <- aggregate(income ~ city,
                        data = cities,
                        FUN = "gini")
names(ginicities) <- c("city", "gini")
knitr::kable(ginicities %>% arrange(desc(gini)), align = 'l')
|city |gini      |
|:----|:---------|
|H    |0.4549123 |
|C    |0.3505498 |
|B    |0.3184892 |
|G    |0.3037768 |
|E    |0.3022083 |
|D    |0.2759564 |
|A    |0.2710777 |
|F    |0.1637490 |
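The Gini index can also be computed by hand, which makes the numbers in the table easier to interpret. The sketch below uses one common unweighted sample formula, G = sum((2i - n - 1) * x_i) / (n * sum(x)) with incomes sorted in ascending order. gini_by_hand() is a hypothetical helper; it is expected, but not verified here, to agree with gini() from reldist for unweighted data.

```r
# Hypothetical helper: unweighted Gini index (no small-sample correction)
gini_by_hand <- function(x) {
  x <- sort(x)     # ascending incomes
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

gini_by_hand(rep(10, 4))      # identical incomes give 0
gini_by_hand(c(0, 0, 0, 40))  # one earner among four gives 0.75
```

A value of 0 means perfect equality, and with this formula the maximum for n people is (n - 1) / n, which approaches 1 as n grows.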

Q4: Play with the data and comment on the results.

Change a couple of observations in your data and repeat the previous step.
Note in which cities you changed the data.
See the resulting table. Explain how and why the results have changed.
cities[26,]
##    city income
## 26    B  24609
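# replace one income in city B with a very high value
cities[26, "income"] <- 120000
# raise every income in city D by 50%
cities$income[cities$city == "D"] <- cities$income[cities$city == "D"] * 1.5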
|city |gini      |
|:----|:---------|
|H    |0.4549123 |
|C    |0.3505498 |
|B    |0.3331599 |
|G    |0.3037768 |
|E    |0.3022083 |
|D    |0.2759564 |
|A    |0.2710777 |
|F    |0.1637490 |
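In the table above, the index for D is unchanged while the index for B has risen. A small toy example can illustrate why. This is only a sketch on made-up numbers, using a hypothetical gini_by_hand() helper rather than gini() from reldist.

```r
# Hypothetical helper: unweighted Gini index (no small-sample correction)
gini_by_hand <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

x <- c(10, 20, 30, 40)           # made-up incomes
g_base   <- gini_by_hand(x)      # 0.25
g_scaled <- gini_by_hand(x * 1.5)  # multiplying everyone by 1.5 changes nothing

y <- x
y[2] <- 200                      # one person becomes much richer
g_outlier <- gini_by_hand(y)     # larger than g_base

c(base = g_base, scaled = g_scaled, outlier = g_outlier)
```

Multiplying every income by the same factor leaves all income shares intact, so the index does not move. Changing a single income shifts the shares and therefore the index.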

Q0 (2 bonus points towards any HA): What does the following graph show?

Save the figure, name it and provide a description.
How do the distributions correspond to the inequality measures calculated above?
ggplot(cities,
       aes(income)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  geom_density() +
  facet_wrap(~ city, ncol = 2)