Group work & personal contribution

  • Alexandra Shanina was responsible for the graph descriptions and wrote the code for selecting and filtering variables before graph construction.

  • Milena Oleshko created the descriptive tables for all variables used, constructed the scatterplot, and helped to code the other graphs.

  • Victoria Zakharova organized the .Rmd file and constructed the graphs (except the scatterplot).

Overall, we had several group discussions covering a range of aspects, for instance choosing possible variables for the whole project.

Note: we wanted to organize our work in the following way: “what we do” -> “code” -> “graph” -> “description”. We hope we managed it :)

Identification of variable types

As our topic is migration in Germany, we attempted to choose appropriate variables. We are going to work with the following ones:

Variable <- c("gndr","agea", "imsclbn","impcntr", "livecnta", "hhmmb", "wkhtot","ctzcntr", "mainact", "domicil" ) 
### imsclbn and impcntr are ordered answer categories (ordinal), livecnta is a calendar year (interval), hhmmb and wkhtot have a true zero (ratio)
LevelOfMeasure <- c("Nominal", "Ratio", "Ordinal", "Ordinal", "Interval", "Ratio", "Ratio", "Nominal", "Nominal", "Nominal") 
mode_is_it_possible <- c("Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes") 
### median needs at least an ordinal scale, mean needs at least an interval scale
median_is_it_possible <- c("No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No","No") 
mean_is_it_possible <- c("No", "Yes", "No", "No","Yes","Yes","Yes","No","No","No") 
QualOrQuant <- c("Qualitative", "Quantitative", "Quantitative","Quantitative","Quantitative","Quantitative","Quantitative", "Qualitative", "Qualitative", "Qualitative") 
ContinOrDiscrete <- c("None", "Discrete", "Discrete", "Discrete", "Continuous", "Continuous","Continuous", "None","None", "None") 
DescriptionOfVariables <- c("Gender", "Age of respondent, calculated", "When should immigrants obtain rights to social benefits/services", "Allow many/few immigrants from poorer countries outside Europe", "What year you first came to live in country", "Number of people living regularly as member of household", "Total hours normally worked per week in main job overtime included", "Citizen of country", "Main activity last 7 days", "Domicile, respondent's description") 


Variables_discription1 <- data.frame (Variable, LevelOfMeasure, QualOrQuant, ContinOrDiscrete, mode_is_it_possible, median_is_it_possible, mean_is_it_possible) 
knitr::kable(Variables_discription1) 

| Variable | LevelOfMeasure | QualOrQuant | ContinOrDiscrete | mode_is_it_possible | median_is_it_possible | mean_is_it_possible |
|:---|:---|:---|:---|:---|:---|:---|
| gndr | Nominal | Qualitative | None | Yes | No | No |
| agea | Ratio | Quantitative | Discrete | Yes | Yes | Yes |
| imsclbn | Ordinal | Quantitative | Discrete | Yes | Yes | No |
| impcntr | Ordinal | Quantitative | Discrete | Yes | Yes | No |
| livecnta | Interval | Quantitative | Continuous | Yes | Yes | Yes |
| hhmmb | Ratio | Quantitative | Continuous | Yes | Yes | Yes |
| wkhtot | Ratio | Quantitative | Continuous | Yes | Yes | Yes |
| ctzcntr | Nominal | Qualitative | None | Yes | No | No |
| mainact | Nominal | Qualitative | None | Yes | No | No |
| domicil | Nominal | Qualitative | None | Yes | No | No |

Variables_discription2 <- data.frame(Variable, DescriptionOfVariables)
knitr::kable(Variables_discription2)

| Variable | DescriptionOfVariables |
|:---|:---|
| gndr | Gender |
| agea | Age of respondent, calculated |
| imsclbn | When should immigrants obtain rights to social benefits/services |
| impcntr | Allow many/few immigrants from poorer countries outside Europe |
| livecnta | What year you first came to live in country |
| hhmmb | Number of people living regularly as member of household |
| wkhtot | Total hours normally worked per week in main job overtime included |
| ctzcntr | Citizen of country |
| mainact | Main activity last 7 days |
| domicil | Domicile, respondent’s description |
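
The link between a variable's level of measurement and the statistics that make sense for it can be written as a small lookup. This is only a sketch: the helper name `allowed_stats` is hypothetical and not part of our analysis, and it follows the usual textbook convention (mode for every level, median from ordinal upward, mean from interval upward).

```r
# Hypothetical helper (not part of the original analysis):
# which summary statistics are meaningful for a level of measurement.
allowed_stats <- function(level) {
  levels_ordered <- c("Nominal", "Ordinal", "Interval", "Ratio")
  rank <- match(level, levels_ordered)
  if (is.na(rank)) stop("Unknown level of measurement: ", level)
  c(mode = TRUE, median = rank >= 2, mean = rank >= 3)
}

allowed_stats("Ordinal")
allowed_stats("Ratio")
```

Calling it for each entry of a vector of levels would reproduce the three "is it possible" columns without typing them by hand.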

Graphs

As a first step, let us load the data and attach the libraries needed for further work.

Germany <- haven::read_sav("ESS8DE.sav") 

library(haven)
library(ggplot2)
library(dplyr)
library(cowplot)
library(knitr) 

Histogram (for single continuous variable(s))

The first graph is a histogram. We would like to look at the age distribution of people who support migrants’ immediate access to social benefits. Here it is!

### Choose some variables such as age, gender and opinions about the mentioned issue for construction of histogram

## for male
data_hist1 <- Germany %>% 
                select(imsclbn, gndr, agea) %>% 
                  na.omit() %>% 
                    filter(gndr == 1 & imsclbn == 1)

## for female
data_hist2 <- Germany %>% 
                select(imsclbn, gndr, agea) %>% 
                  na.omit() %>% 
                    filter(gndr == 2 & imsclbn == 1)

### Construct two histograms according to gender and depict them on one picture

pic_hist1 <- ggplot() + 
                geom_histogram(data = data_hist1, aes(x = as.numeric(agea)), 
                col = "#6F15A4", fill = "#D5ACEE", alpha = 0.75, binwidth = 1) +
                  geom_vline(aes(xintercept = mean(data_hist1$agea)), color="#490A72", size=0.75)+
                  geom_vline(aes(xintercept = median(data_hist1$agea)), linetype="dashed", 
                  color="#D9137F", size=0.75) +
                    labs(title = "Distribution of men's age", 
                    x = "Age", y = "Frequency") +
                    theme_minimal() 

pic_hist2 <- ggplot() + 
                geom_histogram(data = data_hist2, aes(x = as.numeric(agea)), 
                col = "#D58E22", fill = "#FACB84", alpha = 0.75, binwidth = 1) +
                  geom_vline(aes(xintercept = mean(data_hist2$agea)), color="#490A72", size=0.75)+
                  geom_vline(aes(xintercept = median(data_hist2$agea)), linetype="dashed", 
                  color="#D9137F", size=0.75) +
                    labs(title = "Distribution of women's age", 
                    x = "Age", y = "Frequency") +
                    theme_minimal() 

two_hist <- plot_grid(pic_hist1, pic_hist2, labels = "AUTO")

title <- ggdraw() + draw_label("Age of people who are for the migrants' immediate access to social benefits", fontface='bold', size = 12.5)

plot_grid(title, two_hist, ncol=1, rel_heights=c(0.1, 1))

### Now we create a table with mean and median of ages for more accurate visibility

mean_male_age <-  mean(data_hist1$agea)
median_male_age <- median(data_hist1$agea) 
mean_female_age <-  mean(data_hist2$agea)
median_female_age <- median(data_hist2$agea) 

df_age <- data.frame(mean_male_age, mean_female_age, median_male_age, median_female_age)
kable(df_age)

| mean_male_age | mean_female_age | median_male_age | median_female_age |
|---:|---:|---:|---:|
| 43.35758 | 44.03425 | 43 | 43 |

This graph shows the number of people, by age and gender, who agree with the statement that migrants should have immediate access to social benefits. The variables used were Age of respondent, calculated (agea), Gender (gndr) and When should immigrants obtain rights to social benefits/services (imsclbn).

We were interested in which age groups most frequently answered the question “When should immigrants obtain rights to social benefits/services” by saying that immigrants should receive all social benefits as soon as they come to the country. That is why the answers to this question were filtered and only one answer option was kept: “Immediately on arrival”.

The graph suggests that men aged around thirty, forty and sixty most often chose to grant migrants all social benefits immediately, so it could be said that they are more tolerant towards migrants. Among women, those aged around thirty and fifty tend to agree with the statement “Migrants should obtain rights to social benefits immediately on arrival”. Respondents of both genders aged sixty-five to seventy-five appear more conservative: they are less likely to answer that way. Note that the histogram shows counts, so these patterns also reflect how many respondents of each age are in the sample.
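
To say that one age group is more likely than another to choose an answer, the share within each age group is more informative than the raw count. The sketch below shows one way to compute such shares. It uses made-up values in a hypothetical data frame `toy` (the column names mirror the ESS variables, the numbers do not come from the survey) and an assumed ten-year grouping.

```r
# Made-up data for illustration only (not ESS values).
toy <- data.frame(
  agea    = c(22, 27, 34, 38, 41, 45, 52, 58, 63, 67, 71, 74),
  imsclbn = c(1,  3,  1,  1,  2,  1,  3,  1,  1,  4,  3,  5)
)

# Group ages into ten-year bands: [20,30), [30,40), ...
toy$age_group <- cut(toy$agea, breaks = seq(20, 80, by = 10), right = FALSE)

# Share of each age band that chose answer 1 ("Immediately on arrival")
share_immediate <- tapply(toy$imsclbn == 1, toy$age_group, mean)
round(share_immediate, 2)
```

With the real data, the same steps would be applied before filtering on `imsclbn == 1`, since the respondents who chose other answers are needed for the denominator.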

Barplot (for single categorical variable(s))

Now it is time to create a barplot. For it, we chose the variable on viewpoints about people from the poorer countries outside Europe and split it by respondents’ gender.

### Select the variables we are interested in (opinions as the main one, gender for the division)
### The conditions on codes 7, 8 and 9 are joined with &, because joining them with | keeps every row

data_bar_men <- Germany %>% select(impcntr, gndr) %>% na.omit() %>% 
                    filter((impcntr != 7 & impcntr != 8 & impcntr != 9)
                           & (gndr == 1))

data_bar_women <- Germany %>% select(impcntr, gndr) %>% na.omit() %>% 
                    filter((impcntr != 7 & impcntr != 8 & impcntr != 9)
                           & (gndr == 2))


### Rename the chosen variables for better presentation on graphs

data_bar_men$impcntr <- factor(data_bar_men$impcntr, levels = c(1, 2, 3, 4), labels = c("Allow many \nto come and live \nhere", "Allow some", "Allow a few", "Allow none"))
data_bar_men$gndr <- factor(data_bar_men$gndr, levels = c(1), labels = c("Male"))

data_bar_women$impcntr <- factor(data_bar_women$impcntr, levels = c(1, 2, 3, 4), labels = c("Allow many \nto come and live \nhere", "Allow some", "Allow a few", "Allow none"))
data_bar_women$gndr <- factor(data_bar_women$gndr, levels = c(2), labels = c("Female"))


### Construct two barplots and depict them on one picture 

## theme() is added after theme_bw(), otherwise the complete theme resets the label rotation
barplot1 <- ggplot() + geom_bar(data = data_bar_men, aes(x = as.factor(impcntr)), fill = "#D5ACEE") +
           xlab(" ") +
           ylab("Number of men") +
           theme_bw(base_size = 9) +
           theme(axis.text.x = element_text(angle=90))

barplot2 <- ggplot() + geom_bar(data = data_bar_women, aes(x = as.factor(impcntr)), fill = "#FACB84") +
           xlab(" ") +
           ylab("Number of women") +
           theme_bw(base_size = 9) +
           theme(axis.text.x = element_text(angle=90))

two_bar <- plot_grid(barplot1, barplot2, labels = "AUTO")

title_bar <- ggdraw() + draw_label("Viewpoints about people from the poorer countries outside Europe", fontface='bold', size = 12.5)

plot_grid(title_bar, two_bar, ncol=1, rel_heights=c(0.1, 1))
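
When several special answer codes have to be excluded at once, the way the conditions are joined matters. The sketch below uses a made-up vector `codes` (not ESS data) and assumes, as in the filter step, that 7, 8 and 9 are the codes to drop. It shows that `!=` conditions joined with `|` keep everything, while `&` or `%in%` remove the unwanted codes.

```r
# Made-up answer codes; 7, 8 and 9 are assumed to be the codes to exclude.
codes <- c(1, 2, 7, 3, 8, 4, 9, 2)

# Joined with "|": every value differs from at least one of 7, 8, 9,
# so the condition is TRUE everywhere and nothing is dropped.
keep_or <- codes != 7 | codes != 8 | codes != 9

# Joined with "&" (or, equivalently here, %in%): the special codes are removed.
keep_and <- codes != 7 & codes != 8 & codes != 9
keep_in  <- !(codes %in% c(7, 8, 9))

codes[keep_or]
codes[keep_and]
```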

This bar chart illustrates the attitudes of German men and women towards immigrants from poorer countries outside Europe. We used the variables Gender (gndr) and Allow many/few immigrants from poorer countries outside Europe (impcntr) to build this graph.

Men and women in Germany show the same pattern. Both genders most often choose “Allow some”, so people seem to be rather tolerant towards immigrants. This is also supported by the fact that the smallest number of people answered “Allow none”.

Scatterplot (for bivariate distribution of continuous variables)

Next is a scatterplot. Here we want to look at the correlation between the number of people living in a household and the year of migration.

### We do similar manipulations as we have done before: 
## a) choosing variables
data_sc <- Germany %>% 
                select(livecnta, hhmmb) %>% na.omit()

## b) and construction graph
pic_sc <- ggplot(data_sc, aes(x = as.numeric(livecnta), y = as.numeric(hhmmb))) + 
            geom_point(shape = 1) + geom_smooth(method = lm) +
            ylab("Number of people in the household") +
            xlab("Year in which people came to Germany") +
            ggtitle("Correlation between members of household and year of migration") 
pic_sc

This scatterplot depicts the relation between the year of migrating to Germany and the number of household members. With this graph we wanted to see whether people moved to Germany with their families or started a family there, for arrival years between 1940 and 2000. The variables used in the graph are Number of people living regularly as member of household (hhmmb) and What year you first came to live in country (livecnta).

The graph shows that the number of respondents who came to Germany increases considerably in later years, and the trend line slopes upward. This suggests that more and more people moved to Germany with their families or started a family there, which may indicate that migrants seek not only labor opportunities but also better living conditions.
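
How strong the upward trend is can be checked numerically as well as visually. The sketch below uses made-up values in a hypothetical data frame `toy_sc` (the numbers do not come from the ESS) to compute the correlation coefficient and the slope of the least-squares line, which is the same kind of line that `geom_smooth(method = lm)` draws.

```r
# Made-up arrival years and household sizes (illustration only).
toy_sc <- data.frame(
  livecnta = c(1950, 1960, 1970, 1980, 1990, 2000, 2010),
  hhmmb    = c(1,    2,    2,    3,    2,    4,    3)
)

# Strength and direction of the linear association
r <- cor(toy_sc$livecnta, toy_sc$hhmmb)

# Slope of the fitted line: change in household size per year of arrival
fit   <- lm(hhmmb ~ livecnta, data = toy_sc)
slope <- unname(coef(fit)["livecnta"])

c(correlation = r, slope = slope)
```

A positive slope with a correlation close to zero would mean the line rises but the points are widely scattered around it, so both numbers are worth reporting together.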

Boxplot (for bivariate distribution of a continuous and categorical variable)

Further, let us make things more complicated and construct a boxplot.

### We do similar manipulations as we have done before: 
## a) choosing variables
data_box1 <- Germany %>% 
                select(wkhtot, ctzcntr, mainact) %>% na.omit() 

## b) renaming the chosen variables for better presentation on graphs
data_box1$ctzcntr <- factor(data_box1$ctzcntr, levels = c(1, 2), labels = c("Yes", "No"))

## c) and construction graph
## theme_bw(base_size = 10) is added directly: theme_set() changes the session default and returns the previous theme, so it should not be added to a plot
box1 <- ggplot() +
  geom_boxplot(data = data_box1, aes(x = as.factor(mainact), y = as.numeric(wkhtot))) +
  facet_grid(~ ctzcntr) +
  ylab("Total hours worked per week in main job") + 
  xlab("Main activity last 7 days") +
  ggtitle("Variety of working activity depending on citizenship") +
  theme_bw(base_size = 10)

### Create a table that describes the meaning of the activity numbers
activity_number <- c(1:9)
text_of_description <- c("Paid work", "Education", "Unemployed, looking for job", "Unemployed, not looking for job", "Permanently sick or disabled", "Retired", "Community or military service", "Housework, looking after children, others", "Other")
desc_df <- data.frame(activity_number, text_of_description)

box1

kable(desc_df)

| activity_number | text_of_description |
|---:|:---|
| 1 | Paid work |
| 2 | Education |
| 3 | Unemployed, looking for job |
| 4 | Unemployed, not looking for job |
| 5 | Permanently sick or disabled |
| 6 | Retired |
| 7 | Community or military service |
| 8 | Housework, looking after children, others |
| 9 | Other |

This boxplot gives information about the difference in activities between immigrants and citizens. To make this figure we used “Main activity last 7 days” (mainact), “Total hours normally worked per week in main job overtime included” (wkhtot) and “Citizen of country” (ctzcntr). Respondents without citizenship are treated here as immigrants.

Each box shows the distribution of weekly working hours within an activity group, so the comparisons of group sizes below are tentative. The graph suggests that there are more immigrants who are unemployed and not looking for a job (4), while fewer citizens are without work. Immigrants are more engaged in housework and looking after children (8) than citizens are. Interestingly, few immigrants are in education (2) or in community or military service (7). The graph also shows no retired immigrants (6), which may be a consequence of splitting respondents by citizenship. Moreover, only a small number of immigrants are permanently sick or disabled (5), while many respondents with citizenship fall into this category.

Overall, there is a gap between holders of citizenship and immigrants. This may indicate that immigrants tend to have lower-paid jobs and fewer privileges, although the graph does not measure pay directly. Immigrants are engaged mostly in housework and childcare.
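
Because a boxplot does not show how many respondents stand behind each box, a table of counts next to the medians helps when comparing groups. The sketch below uses made-up values in a hypothetical data frame `toy_box` (column names mirror the ESS variables, the numbers are invented) to compute both.

```r
# Made-up data for illustration only (not ESS values).
toy_box <- data.frame(
  ctzcntr = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No"),
  mainact = c(1,     1,     6,     8,     1,    1,    8),
  wkhtot  = c(40,    38,    10,    20,    45,   35,   15)
)

# Number of respondents in each citizenship-by-activity cell
counts <- table(toy_box$ctzcntr, toy_box$mainact)

# Median weekly hours in each cell (the line inside each box)
medians <- aggregate(wkhtot ~ ctzcntr + mainact, data = toy_box, FUN = median)

counts
medians
```

A cell with a count of zero produces no box at all, which is one way an activity can appear to be missing for a group.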

Stacked barplot (for bivariate distribution of categorical variables)

The last graph is another type of barplot.

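## The conditions on codes 7, 8 and 9 are joined with &, because joining them with | keeps every row
data_st_bar <- Germany %>% 
                select(domicil, impcntr) %>% 
                  na.omit() %>% 
                    filter((impcntr != 7 & impcntr != 8 & impcntr != 9)
                           & (domicil != 7 & domicil != 8 & domicil != 9))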

data_st_bar$impcntr <- factor(data_st_bar$impcntr, levels = c(1, 2, 3, 4), labels = c("Allow many to come and live here", "Allow some", "Allow a few", "Allow none"))

data_st_bar$domicil <- factor(data_st_bar$domicil, levels = c(1, 2, 3, 4, 5), labels = c("A big city", "Suburbs or\n outskirts of \n big city", "Town or \nsmall city", "Country \nvillage", "Farm or home \nin countryside"))

## theme_bw(base_size = 10) is added directly: theme_set() changes the session default and returns the previous theme, so it should not be added to a plot
st_bar <- ggplot() +
  geom_bar(data = data_st_bar, aes(x = as.factor(domicil), fill = as.factor(impcntr))) +
  xlab("Place of living")  +
  ylab("Number of people") +
  theme_bw(base_size = 10) + 
  labs(fill = "Variation of opinions", title = "Viewpoints about people from the poorer countries outside Europe \nexpressed by respondents who differ in places of dwelling") +
  scale_fill_brewer(palette = "Set3")

st_bar

Here is a stacked barplot which demonstrates how residents of different types of area accept migrants from poorer countries. By “accepting” we mean what people answered to the question about people from the poorer countries outside Europe: “Allow many” is acceptance, “Allow none” is refusal. The graph was constructed from these variables: Allow many/few immigrants from poorer countries outside Europe (impcntr) and Domicile, respondent’s description (domicil).

As the barplot shows, people from big cities and suburbs are likely to allow some people from poorer countries, and only a small number of them reject any immigrants from poorer countries, which suggests that people from big cities are tolerant towards immigrants. The largest share of respondents who would allow no immigrants or only a few is concentrated in small towns. It can be said that residents of urban areas are much more tolerant towards immigrants from poorer countries than people who live in towns or the countryside.
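
Since the places of living differ a lot in the number of respondents, comparing shares within each place is a useful complement to comparing bar heights. The sketch below works on a made-up count matrix `toy_counts` (the numbers are invented, not ESS results) and converts it to within-row shares.

```r
# Made-up counts for illustration only (not ESS values).
toy_counts <- matrix(
  c(30, 50, 15,  5,
    10, 40, 35, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    domicil = c("A big city", "Country village"),
    impcntr = c("Allow many", "Allow some", "Allow a few", "Allow none")
  )
)

# Share of each answer within a place of living (each row sums to 1)
shares <- prop.table(toy_counts, margin = 1)
round(shares, 2)
```

In ggplot2, `geom_bar(position = "fill")` draws the same within-group shares directly, so every bar has the same height and only the composition differs.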