All graphs should be created with ggplot2.

Load a data frame from the file demography.csv. It contains 2016 population data on Belgorodskaya and Kaluzhskaya oblasts (regions), taken from the Russian Federal State Statistics Service (Rosstat).

df <- read.csv("http://math-info.hse.ru/f/2018-19/pep/demography.csv", encoding = "UTF-8")

Variables:

Problem 1

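Create the following columns in the data set, each as a percentage of the total population popul_total: work_share (the working-age population, wa_total), old_share (the population of retirement age, ret_total) and young_share (the population below working age, young_total).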

Hint: you can add three columns at once via mutate() from dplyr.

library(dplyr)
# Each new column is a group's size as a percentage of the total population
df <- df %>% mutate(work_share = wa_total / popul_total * 100,
                    old_share = ret_total / popul_total * 100,
                    young_share = young_total / popul_total * 100)
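If dplyr is not available, the same three columns can be computed in base R. The sketch below swaps mutate() for base R's transform() and runs it on a small toy data frame whose values are made up for illustration (they are not the Rosstat figures). As a sanity check it assumes that the young, working-age and retirement-age groups together make up popul_total, in which case the three shares sum to 100 in every row.

```r
# Toy data frame with hypothetical counts (not real Rosstat figures)
toy <- data.frame(
  popul_total = c(1000, 2000, 1500),
  wa_total    = c(550, 1200, 800),
  ret_total   = c(300, 500, 450),
  young_total = c(150, 300, 250)
)

# Base R equivalent of the mutate() call: transform() evaluates each
# expression inside the data frame and appends the results as new columns
toy <- transform(toy,
                 work_share  = wa_total / popul_total * 100,
                 old_share   = ret_total / popul_total * 100,
                 young_share = young_total / popul_total * 100)

# Sanity check, assuming the three age groups partition the population:
# the shares should then add up to 100 in every row
all.equal(toy$work_share + toy$old_share + toy$young_share, rep(100, nrow(toy)))
```

On real data, a check that does not return TRUE would suggest the three groups do not exactly partition popul_total.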

Problem 2

Plot a histogram of work_share. Change its fill color and add a rug plot. Add a vertical line at the median value of work_share.

Hint: add one more layer to the graph, geom_vline(xintercept = ), and put the median value after the = sign.

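library(ggplot2)
ggplot(data = df, aes(x = work_share)) +
  geom_histogram(fill = "tomato", color = "black", bins = 10) +  # histogram with 10 bins
  geom_rug() +                                                   # rug: one tick per observation
  geom_vline(xintercept = median(df$work_share),                 # vertical line at the median
             lty = 2, lwd = 1, color = "darkgreen")              # dashed line of width 1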

Note 1. We also changed the number of bins in the histogram. The default is 30 (ggplot2 prints a message suggesting you pick a better binwidth), which is too many for a data set of this size, so the histogram looks poor unless we set the bins argument in geom_histogram().
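Instead of guessing a fixed number such as bins = 10, base R's grDevices package offers rules that derive the number of bins from the data. The sketch below uses a randomly generated stand-in for df$work_share (an assumption, not the real data) and computes Sturges' rule with nclass.Sturges() and the Freedman–Diaconis rule with nclass.FD(); the ggplot2 part runs only if the package is installed.

```r
# Hypothetical stand-in for df$work_share (randomly generated, not real data)
set.seed(1)
work_share <- rnorm(60, mean = 55, sd = 3)

# Two data-driven rules for the number of histogram bins (both in grDevices)
bins_sturges <- nclass.Sturges(work_share)  # ceiling(log2(n) + 1)
bins_fd      <- nclass.FD(work_share)       # bin width based on the interquartile range

# Use one of them instead of a fixed bins value, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(work_share), aes(x = work_share)) +
    geom_histogram(fill = "tomato", color = "black", bins = bins_sturges)
  print(p)
}
```

Sturges' rule depends only on the number of observations, while the Freedman–Diaconis rule also reacts to the spread of the data, so the two can disagree; it is worth looking at both histograms.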

Note 2. We also set the type and width of the vertical median line: lty is the line type (2 means dashed) and lwd is the line width.
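The same plot can also be written with ggplot2's own argument names instead of the base R aliases lty and lwd: linetype = "dashed" corresponds to lty = 2, and linewidth (available from ggplot2 3.4.0; older versions used size) corresponds to lwd. The sketch below also computes the median beforehand with na.rm = TRUE, which only matters if work_share contains missing values; the toy values, including the NA, are made up for illustration.

```r
# Hypothetical stand-in for df, with one missing value (not real data)
toy <- data.frame(work_share = c(52.1, 54.8, 55.3, NA, 57.0, 58.4))

# Compute the median once; without na.rm = TRUE the result would be NA
med <- median(toy$work_share, na.rm = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = work_share)) +
    geom_histogram(fill = "tomato", color = "black", bins = 5, na.rm = TRUE) +
    geom_rug(na.rm = TRUE) +
    # ggplot2 names for lty = 2 and lwd = 1; linewidth needs ggplot2 >= 3.4.0
    geom_vline(xintercept = med, linetype = "dashed", linewidth = 1,
               color = "darkgreen")
  print(p)
}
```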

We will discuss the remaining problems next time.