All graphs should be created with ggplot2.
Load a data frame from the file demography.csv. It contains data on the population of Belgorodskaya and Kaluzhskaya oblasts (regions) in 2016 (taken from the Russian State Statistics Service).
df <- read.csv("http://math-info.hse.ru/f/2018-19/pep/demography.csv", encoding = "UTF-8")
Variables:
region_ru: region (in Russian, Belgorodskaya and Kaluzhskaya oblasts);
district_ru: district (in Russian);
region_code: code of region (1 - Belgorodskaya oblast, 2 - Kaluzhskaya oblast);
empl_total: total number of employed people;
popul_total: population;
urban_total: urban population;
rural_total: rural population;
wa_total: total number of people in working age (from 18 to 55 for females, from 18 to 60 for males);
ret_total: total number of people older than working age;
young_total: total number of people younger than working age.
Create the following columns in the data set:
work_share: a share of people in working age (in %);
old_share: a share of people older than working age (in %);
young_share: a share of people younger than working age (in %).
Hint: you can add three columns at once via mutate() from dplyr.
library(dplyr)
# Add the three age-group shares (in %) in a single mutate() call
df <- df %>%
  mutate(work_share = wa_total / popul_total * 100,
         old_share = ret_total / popul_total * 100,
         young_share = young_total / popul_total * 100)
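As a quick sanity check on the new columns, here is a minimal sketch on a made-up data frame (the district labels and numbers are hypothetical, not taken from demography.csv). It assumes that the three age groups together cover the whole population, in which case the three shares in each row should add up to 100, up to rounding error:

```r
library(dplyr)

# Hypothetical toy data: three made-up districts, not from demography.csv
toy <- data.frame(
  district    = c("A", "B", "C"),
  popul_total = c(1000, 2500, 400),
  wa_total    = c(550, 1400, 200),
  ret_total   = c(270, 650, 130),
  young_total = c(180, 450, 70)
)

# Same mutate() call as in the solution, applied to the toy data
toy <- toy %>%
  mutate(work_share  = wa_total / popul_total * 100,
         old_share   = ret_total / popul_total * 100,
         young_share = young_total / popul_total * 100)

# If the age groups partition the population, share_sum is (about) 100 in every row
toy %>%
  mutate(share_sum = work_share + old_share + young_share) %>%
  select(district, work_share, share_sum)
```

If the same check on the real df gives sums that differ noticeably from 100, the age groups in the file probably do not partition popul_total exactly, and the shares should be interpreted with that in mind.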
Plot a histogram for work_share. Change its fill color and add rugs. Add a vertical line that corresponds to the median value of work_share.
Hint: add one more layer to the graph: geom_vline(xintercept = ) and put the median value after =.
library(ggplot2)
ggplot(data = df, aes(x = work_share)) +
  geom_histogram(fill = "tomato", color = "black", bins = 10) +
  geom_rug() +
  geom_vline(xintercept = median(df$work_share),
             lty = 2, lwd = 1, color = "darkgreen")
Note 1. We also changed the default number of bins in the histogram. The default is 30, which is too many for our data, so the histogram looks ragged unless we set the bins parameter inside geom_histogram().
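To see how the number of bins changes the picture, here is a small sketch on simulated data (the 40 random values are an assumption for illustration, not the real work_share column). It draws the same histogram with 30 bins and with 10 bins, and also shows binwidth as an alternative way to control the bin size:

```r
library(ggplot2)

# Simulated stand-in for work_share: 40 hypothetical districts
set.seed(1)
toy <- data.frame(work_share = rnorm(40, mean = 55, sd = 2))

# 30 bins (the default number): roughly one observation per bin on average
p_default <- ggplot(toy, aes(x = work_share)) +
  geom_histogram(bins = 30, fill = "tomato", color = "black")

# Fewer bins: each bar aggregates more observations
p_bins <- ggplot(toy, aes(x = work_share)) +
  geom_histogram(bins = 10, fill = "tomato", color = "black")

# Alternative: fix the width of each bin (here 1 percentage point)
p_width <- ggplot(toy, aes(x = work_share)) +
  geom_histogram(binwidth = 1, fill = "tomato", color = "black")

print(p_default)
print(p_bins)
print(p_width)
```

Choosing between bins and binwidth is a matter of taste; binwidth is often easier to interpret when the variable has natural units, such as percentage points here.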
Note 2. We also specified a type of the vertical line for the median and its width (lty - line type, lwd - line width).
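The abbreviations lty and lwd come from base R graphics; ggplot2 also accepts spelled-out names. A minimal sketch, assuming ggplot2 3.4.0 or later (where line width is set with linewidth; older versions use size instead) and using made-up values in place of the real data. It also passes na.rm = TRUE to median(), which matters only if the column has missing values:

```r
library(ggplot2)

# Hypothetical values standing in for work_share (one is missing on purpose)
toy <- data.frame(work_share = c(52.1, 53.4, 54.0, 55.2, 55.9, 56.3, 57.8, NA))

# na.rm = TRUE keeps the median defined when some values are missing
med <- median(toy$work_share, na.rm = TRUE)

# Drop the missing row before plotting to avoid a warning from ggplot2
toy_complete <- toy[!is.na(toy$work_share), , drop = FALSE]

p <- ggplot(toy_complete, aes(x = work_share)) +
  geom_histogram(fill = "tomato", color = "black", bins = 5) +
  geom_rug() +
  geom_vline(xintercept = med,
             linetype = "dashed",  # same line type as lty = 2
             linewidth = 1,        # same width as lwd = 1 (ggplot2 >= 3.4.0)
             color = "darkgreen")
print(p)
```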
We will discuss the other problems next time.