Nonresponse patterns to immigration items - do they work in the same manner?

Rpubs link: https://rpubs.com/AnetaPiekut/SMI205_Replication_nonresponse

Github repository: https://github.com/anetapiekut/SMI205_Replication_nonresponse

Replicated paper

Replication project based on paper: Piekut, A. (2019). Survey nonresponse in attitudes towards immigration in Europe. Journal of Ethnic and Migration Studies: 1-26, doi: 10.1080/1369183X.2019.1661773.

Workspace setup

Global R chunk setup:

library(knitr)
## Global options
opts_chunk$set(echo=TRUE,
               cache=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)

Used libraries:

library(essurvey)
library(dplyr)
library(ggplot2)
library(lme4)
library(foreign)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(summarytools)

I’m using rmdformats::html_clean theme, with kate highlight. My YAML settings are:

output:
  rmdformats::html_clean:
    highlight: kate

1. Introduction

Item nonresponse in surveys is a common phenomenon. Many studies have demonstrated that it works differently for questions asking about knowledge than for questions measuring subjective states (Berinsky 1999; Herda 2013). I reproduce part of the analysis conducted by Piekut (2019) to explore this matter further. Her research explored patterns in the nonresponse rate to items measuring attitudes to immigration using European Social Survey (ESS) data from 2014. She analysed all nonresponses together using count models, explaining that in studies based on the ESS all items are often combined into summative indices:

Many previous studies have utilised the ESS impact items to construct an index that captures anti-immigrant attitudes (…). Nonresponse to items (…) will result in poorer reliability of such indices (Piekut 2019: 10).

In this analysis I test whether the nonresponse pattern differs between the two most commonly used questions measuring opinions on the impact of immigration on the respondent’s country: economy and culture (‘imbgeco’ and ‘imbleco’, respectively, in the ESS data). While indeed, as Piekut (2019) acknowledges, many studies construct an average score measuring anti-immigration attitudes on the basis of a few impact items, there are also a number of studies which approach them as two dissimilar constructs: the first representing realistic threat and the second symbolic threat (see Meuleman et al. 2009; and more recently Jedinger, Eisentraut 2020).

Furthermore, past research has pointed to some other dissimilarities between the two questions. While in general age is positively correlated with negative attitudes to immigration, it might be a weaker predictor for the question asking about the impact of immigration on the economy than for the culture one. This is because the economic impacts of immigration are less relevant for people who are no longer active on the labour market (Schotte, Winkler 2018). Similarly, the role of gender in shaping attitudes to immigration is not clear. Previous research on the determinants of attitudes towards immigration indicated that women are more opposed to immigration, although results are not consistent across countries (Chandler, Tsai 2001), with some studies revealing that men display more cultural threat concerns, while women express more economic threat concerns (Markaki, Longhi 2013).

As such, by replicating the same study using the same data but a different method, I test the robustness of the analysis conducted by Piekut (2019), which allows me “to see if the target finding is merely the result of analytic decisions” (Freese & Peterson 2017: 152).

2. Data and methods

2.1. Data

The European Social Survey is a cross-national survey conducted every two years in a number of European countries using random probability sampling (ESS 2014). I use wave 7, collected in 2014/2015, which was also analysed by Piekut (2019), and I select the same key variables for my analysis¹.

library("essurvey")
set_email("a.piekut@sheffield.ac.uk")
round_7 <- import_rounds(7)

The original paper also uses data from the interviewer questionnaire (ESSinterviewer.sav - SPSS format), which I have attached to my main dataset by matching cases on both the respondent’s ‘idno’ and ‘cntry’. As the country variable in the interviewer dataset is coded differently, I have to recode it in my ESS data so that cases in both datasets can be matched.
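The matching step can be sketched on toy data. This is a minimal illustration rather than the chunk used in the project: the two small data frames and the `country_names` lookup are hypothetical, and only the idea of recoding `cntry` and then calling `merge()` on both `idno` and `cntry` follows the description above.

```r
# Toy respondent data with two-letter country codes (hypothetical values)
survey <- data.frame(
  idno  = c(1, 2, 1),
  cntry = c("AT", "AT", "BE"),
  age   = c(30, 45, 52),
  stringsAsFactors = FALSE
)

# Toy interviewer data with full country names (hypothetical values)
interviewer <- data.frame(
  idno   = c(1, 2, 1),
  cntry  = c("Austria", "Austria", "Belgium"),
  intnum = c(11, 12, 21),
  stringsAsFactors = FALSE
)

# Recode the codes to names so both key columns use the same coding
country_names <- c(AT = "Austria", BE = "Belgium")
survey$cntry <- unname(country_names[survey$cntry])

# idno repeats across countries, so both keys are needed for a unique match
merged <- merge(survey, interviewer, by = c("idno", "cntry"))
print(merged)
```

Merging on `idno` alone would pair respondent 1 in Austria with the interviewer record of respondent 1 in Belgium, which is why both keys are used.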

Following the original paper, I also recode all variables. Finally, I compute a new binary variable indicating whether a respondent replied to each question measuring attitudes; these are my two dependent variables. I subset the data again to remove old variables. Summary information on my final list of variables is in Appendix 1. R code is hidden for these data manipulation steps, but my entire R script is available in Appendix 2.

The original sample size was 40,185 respondents. After excluding the Israel subsample (as Piekut (2019) did) and removing cases with missing data for the key variables, it is 37,385.

2.2. Nonresponse to impacts on economy and culture items

As Tables 1 and 2 in the tabs below show, about 5.3% of respondents in the entire ESS sample did not reply to the question asking whether, in their view, immigration has a negative or positive impact on their country’s culture, while 3.3% did not reply to the economy item. However, the proportion of nonresponse varies considerably across countries. For example, for the culture item, only 0.8% of respondents in Belgium replied ‘Don’t know’ or ‘Refuse to answer’, compared with 17.2% in Slovenia. Figures 1 and 2 illustrate this cross-country variation.
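The overall rates can be checked directly from the counts reported in Tables 1 and 2. The short sketch below only redoes that arithmetic; the object names are illustrative.

```r
# Counts taken from Tables 1 and 2
n_total     <- 37385
nonresponse <- c(economy = 1235, culture = 1981)

# Share of respondents who gave no substantive answer, in percent
pct <- round(100 * nonresponse / n_total, 1)
print(pct)
```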

Economy impacts - all

library(summarytools)
freq(round_7x_finalx$nonresponse_econ, style = "rmarkdown", headings = FALSE, caption="Table 1. Frequency distribution of nonresponse to immigration impact on economy question")

Table 1. Frequency distribution of nonresponse to immigration impact on economy question
	Freq	% Valid	% Valid Cum.	% Total	% Total Cum.
Nonresponse	1235	3.30	3.30	3.30	3.30
Response	36150	96.70	100.00	96.70	100.00
<NA>	0			0.00	100.00
Total	37385	100.00	100.00	100.00	100.00

Culture impacts - all

library(summarytools)
freq(round_7x_finalx$nonresponse_cult, style = "rmarkdown", headings = FALSE, caption="Table 2. Frequency distribution of nonresponse to immigration impact on culture question")

Table 2. Frequency distribution of nonresponse to immigration impact on culture question
	Freq	% Valid	% Valid Cum.	% Total	% Total Cum.
Nonresponse	1981	5.30	5.30	5.30	5.30
Response	35404	94.70	100.00	94.70	100.00
<NA>	0			0.00	100.00
Total	37385	100.00	100.00	100.00	100.00

Economy impacts - by country

library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc1 <- round_7x_finalx %>% 
  group_by(cntry,nonresponse_econ) %>% 
  summarise(count=n()) %>% 
  mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc1, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_econ))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
  theme_minimal(base_size = 14) +
  ggtitle("Figure 1. Nonresponse to impact on economy item \n across ESS 2014 countries")

Culture impacts - by country

library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc2 <- round_7x_finalx %>% 
  group_by(cntry,nonresponse_cult) %>% 
  summarise(count=n()) %>% 
  mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc2, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_cult))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
  theme_minimal(base_size = 14) +
  ggtitle("Figure 2. Nonresponse to impact on culture item \n across ESS 2014 countries")

2.3. Methods

Piekut’s (2019) research explored the nonresponse rate across all questions measuring attitudes to immigration, hence she used a count model, as the dependent variable was the number of nonresponses. Due to the hierarchical nature of the data, she conducted a multilevel analysis, with respondents ‘nested in’ interviewers, who were in turn ‘nested in’ countries.
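To make the count outcome concrete, and to contrast it with a binary indicator for a single item, here is a minimal sketch on hypothetical data. The item names and values are invented for illustration and are not ESS variables.

```r
# Hypothetical answers of four respondents to three attitude items (NA = nonresponse)
items <- data.frame(
  item_a = c(5, NA, 7, NA),
  item_b = c(4, NA, 6, 3),
  item_c = c(6, 2, NA, NA)
)

# Count outcome: number of items each respondent left unanswered
nonresponse_count <- unname(rowSums(is.na(items)))

# Binary outcome for a single item: 1 = no answer, 0 = answered
nonresponse_item_a <- as.integer(is.na(items$item_a))

print(nonresponse_count)
print(nonresponse_item_a)
```

A count variable of this kind calls for a count model, whereas a 0/1 indicator per item is naturally modelled with logistic regression.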

In this replication I explore whether the results hold if we look at two (binary) nonresponse variables, so I use two multilevel logistic regressions. In other words, I model the probability of nonresponse for each of these two questions separately, conditional on respondents’ characteristics. These probabilities are allowed to vary between interviewers and between countries (Sommet, Morselli 2017). The Intraclass Correlation Coefficient (ICC) for the null models for both variables is 0.30, meaning that the grouping structure of the hierarchical model explains almost one third of the variance in the dependent variable. As Table 3 illustrates, the size of my final sample is as follows: 37,385 respondents, 2,039 interviewers, 20 countries.

Table 3. Multilevel logistic regression models of nonresponse - baseline / empty model
	Nonresponse: Economy		Nonresponse: Culture
Predictors	Odds Ratios	Conf. Int (95%)	Odds Ratios	Conf. Int (95%)
(Intercept)	0.02 ^***	0.02 – 0.03	0.04 ^***	0.03 – 0.06
Random Effects
σ²	3.29		3.29
τ₀₀	0.82 _intnum		0.83 _intnum
	0.61 _cntry		0.56 _cntry
ICC	0.30		0.30
N	20 _cntry		20 _cntry
	2039 _intnum		2039 _intnum
Observations	37385		37385
Marginal R² / Conditional R²	0.000 / 0.303		0.000 / 0.297
* p<0.05   ** p<0.01   *** p<0.001
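The ICC of 0.30 can be reproduced from the variance components in Table 3. The sketch below assumes the usual latent-variable formulation for logistic multilevel models, in which the residual variance is fixed at π²/3 (the 3.29 shown as σ²), and it uses the rounded τ₀₀ values from the table, so the result is approximate.

```r
# Residual variance of the latent logistic distribution: pi^2 / 3 (about 3.29)
sigma2 <- pi^2 / 3

# Random-intercept variances as reported (rounded) in Table 3
tau <- list(
  economy = c(intnum = 0.82, cntry = 0.61),
  culture = c(intnum = 0.83, cntry = 0.56)
)

# ICC = between-group variance / (between-group variance + residual variance)
icc <- sapply(tau, function(v) sum(v) / (sum(v) + sigma2))
print(round(icc, 2))
```

Because the null models contain no fixed predictors, the same quantity appears as the conditional R² in the last row of the table.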

3. Results

3.1. Comparison with the original paper

Table 4 below displays the results of two multilevel logistic regressions: one for nonresponse to the question on the impact of immigration on the economy (first column), and the other for the question on culture (second column). Overall, the direction of the relationship between respondents’ socio-demographic characteristics and their other responses, on the one hand, and the probability of not responding, on the other, is the same as in the original study. Nonrespondents are older, more likely to be female and to identify as an ethnic minority, less likely to be coping easily on their current income, and less interested in politics. They also have lower political efficacy scores, meaning that those who believe they have more influence on political affairs are more likely to respond to both items. In line with Piekut’s (2019) results, the strongest predictors of nonresponse are nonresponses to other variables, in this instance the items measuring racism and net household income.

The frequency of contact was one of the key variables explored by Piekut (2019), and she found that people with a ‘medium’ amount of contact, that is neither no contact nor everyday interactions, had the lowest nonresponse rate. This pattern is not present when we model both nonresponses separately.

# Model 1: nonresponse to the economy item
# (respondent- and interview-level predictors, random intercepts for country and interviewer)
M1 <- glmer(nonresponse_econ1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
# Model 2: nonresponse to the culture item (same predictors and random effects)
M2 <- glmer(nonresponse_cult1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)

tab_model(M1, M2,
  dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
    pred.labels = c("Intercept", "Ethnic minority", "Female", "Age", "Marital status Ref. Married: </br> Never married", "Separated/Divorced", "Widowed",
                    "Unemployed in last 5 years", "Education Ref. No/Primary: </br> Upper secondary", "Vocational", "Tertiary", 
                    "Subjective income Ref. Comfortable:</br> Coping", "Difficult", "Political interest", "Political efficacy",
                    "Contact frequency Ref. No:</br> Rarely", "Every month", "Every week", "Everyday", "Racism item: Yes", "Racism item: Nonresponse",
                    "Income item: nonresponse", "Interviewer: age", "Interviewer: female", "Respondent: understood", "Respondent: reluctant", "Interview: Someone present"),
    string.ci = "Conf. Int (95%)",
     p.style = "a", title = "Table 4. Multilevel logistic regression models of nonresponse to questions measuring opinions on immigration"
)

Table 4. Multilevel logistic regression models of nonresponse to questions measuring opinions on immigration
	Nonresponse: Economy		Nonresponse: Culture
Predictors	Odds Ratios	Conf. Int (95%)	Odds Ratios	Conf. Int (95%)
Intercept	0.01 ^***	0.00 – 0.01	0.02 ^***	0.01 – 0.03
Ethnic minority	1.18	0.88 – 1.58	1.28 ^*	1.03 – 1.61
Female	1.22 ^**	1.06 – 1.41	1.29 ^***	1.16 – 1.45
Age	1.01 ^***	1.01 – 1.02	1.01 ^***	1.01 – 1.02
Marital status Ref. Married: Never married	1.19	0.98 – 1.44	1.24 ^**	1.07 – 1.45
Separated/Divorced	0.95	0.75 – 1.19	0.93	0.77 – 1.12
Widowed	1.10	0.89 – 1.36	1.11	0.93 – 1.31
Unemployed in last 5 years	0.90	0.73 – 1.11	0.90	0.76 – 1.06
Education Ref. No/Primary: Upper secondary	0.69 ^***	0.58 – 0.82	0.74 ^***	0.65 – 0.85
Vocational	0.65 ^***	0.51 – 0.84	0.84	0.70 – 1.02
Tertiary	0.81	0.64 – 1.01	1.03	0.87 – 1.22
Subjective income Ref. Comfortable: Coping	1.27 ^*	1.05 – 1.53	1.09	0.94 – 1.25
Difficult	1.20	0.96 – 1.50	1.18	0.99 – 1.40
Political interest	1.21 ^***	1.11 – 1.32	1.09 ^*	1.01 – 1.17
Political efficacy	0.89 ^***	0.86 – 0.93	0.89 ^***	0.86 – 0.92
Contact frequency Ref. No: Rarely	0.81 ^*	0.66 – 1.00	0.98	0.83 – 1.15
Every month	0.83	0.65 – 1.06	0.85	0.69 – 1.04
Every week	0.92	0.74 – 1.14	0.85	0.71 – 1.02
Everyday	0.88	0.69 – 1.12	0.89	0.73 – 1.08
Racism item: Yes	1.22 ^*	1.04 – 1.42	1.17 ^*	1.03 – 1.32
Racism item: Nonresponse	2.65 ^***	2.13 – 3.28	2.58 ^***	2.16 – 3.09
Income item: nonresponse	1.47 ^***	1.25 – 1.74	1.88 ^***	1.64 – 2.14
Interviewer: age	1.00	1.00 – 1.01	1.00	1.00 – 1.01
Interviewer: female	1.13	0.95 – 1.35	1.06	0.91 – 1.23
Respondent: understood	0.67 ^***	0.57 – 0.78	0.63 ^***	0.56 – 0.71
Respondent: reluctant	1.55 ^***	1.31 – 1.82	1.53 ^***	1.33 – 1.75
Interview: Someone present	1.05	0.85 – 1.29	1.04	0.87 – 1.23
Random Effects
σ²	3.29		3.29
τ₀₀	0.74 _intnum		0.75 _intnum
	0.29 _cntry		0.32 _cntry
ICC	0.24		0.25
N	20 _cntry		20 _cntry
	2025 _intnum		2025 _intnum
Observations	35421		35421
Marginal R² / Conditional R²	0.118 / 0.327		0.113 / 0.332
* p<0.05   ** p<0.01   *** p<0.001

3.2. Differences between both nonresponses

There are, however, some differences in the strength of the coefficients between the two models. The odds ratio for women is higher in the model predicting the probability of nonresponse to the impact on culture item. While women have 29% higher odds than men of refraining from answering the culture item, the figure is 22% for the economy item (see Figures 3 and 4 below for comparison). On average, keeping all other variables at their means, the probability of nonresponse for women in ESS 2014 is 2.3% and 3.8% for the economy and culture items, respectively.
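The 29% and 22% figures follow directly from the odds ratios for ‘Female’ in Table 4. The sketch below shows that conversion and, as an aside, how an odds ratio translates into a probability. Strictly speaking the percentages are differences in odds rather than in probabilities; the `shift_probability` helper and the 3% baseline are illustrative assumptions, not output of the fitted models.

```r
# Odds ratios for 'Female' as reported in Table 4
or_female <- c(economy = 1.22, culture = 1.29)

# Percentage difference in the odds relative to the reference group (men)
pct_higher_odds <- round((or_female - 1) * 100)
print(pct_higher_odds)

# Hypothetical helper: apply an odds ratio to a baseline probability
shift_probability <- function(p, odds_ratio) {
  odds <- p / (1 - p) * odds_ratio
  odds / (1 + odds)
}

# For a rare outcome, odds and probabilities move almost in step:
# an illustrative 3% baseline with OR = 1.29 gives slightly less than 3% * 1.29
print(shift_probability(0.03, 1.29))
```

This is why, with nonresponse rates of only a few percent, reading an odds ratio as "x% more likely" is a reasonable approximation.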

The age effect is the same in both models: each additional year of age is associated with 1% higher odds of nonresponse to each question. Since nonresponse to the question on culture impacts was higher than to the question on economy impacts, the probability of nonresponse for a 70-year-old is almost 4% for the culture item and 2% for the economy item, while it is about 2% and 1.5%, respectively, for 20-year-olds (see Figures 5 and 6 below for comparison).
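Because odds ratios multiply, a small yearly effect accumulates over a wide age range. The sketch below compounds the rounded odds ratio for age from Table 4 over the 50 years separating a 20-year-old from a 70-year-old. It is a back-of-the-envelope check under the assumption of a linear age effect on the log-odds scale, and the result is approximate because the table reports the odds ratio to two decimals only.

```r
# Odds ratio per additional year of age, as reported (rounded) in Table 4
or_per_year <- 1.01

# Odds ratios multiply, so a 50-year gap (20 vs 70) compounds the yearly effect
or_50_years <- or_per_year^(70 - 20)
print(round(or_50_years, 2))
```

The resulting ratio of roughly 1.6 is broadly in keeping with the predicted probabilities quoted above, which are somewhat less than twice as high for 70-year-olds as for 20-year-olds.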

Gender: Economy item

plot_model(M1, type = "pred", terms = "sex [1, 2]", title="Figure 3. Predicted probabilities of nonresponse to impact on economy item")

Gender: Culture item

plot_model(M2, type = "pred", terms = "sex [1, 2]", title="Figure 4. Predicted probabilities of nonresponse to impact on culture item")

Age: Economy item

plot_model(M1, type = "pred", terms = "age [20, 70]", title="Figure 5. Predicted probabilities of nonresponse to impact on economy item")

Age: Culture item

plot_model(M2, type = "pred", terms = "age [20, 70]", title="Figure 6. Predicted probabilities of nonresponse to impact on culture item")

4. Conclusions

The aim of this replication paper was to test whether the nonresponse pattern is the same for the item asking about the impact of immigration on the country’s economy and the item asking about its impact on culture. The results obtained are largely in line with the article by Piekut (2019). The realistic and symbolic threats – which might be mobilised among those with less favourable attitudes to immigration – are two separate phenomena, yet they are correlated with one another (Meuleman et al. 2009). Hence, the direction of the coefficients was the same for both nonresponses.

Contrary to Piekut (2019), I found that a few independent variables – such as the frequency of contact – were not statistically significant in the models. The proportion of nonresponse for both items was low (3-5%), which results in a very uneven split in the dependent variable and a small number of cases in one category. Detecting patterns in unbalanced data is more difficult, and logistic regression might underestimate the probability of rare events (King, Zeng 2001). As such, the count model chosen by Piekut (2019) seems to be a sensible way of overcoming this problem, while building on the fact that both measures relate to the same latent variable: anti-immigration attitudes.

References

Berinsky, A. J. (1999). The two faces of public opinion. American Journal of Political Science, 43(4), 1209-1230.

Chandler, C. R., & Tsai, Y. M. (2001). Social factors influencing immigration attitudes: an analysis of data from the General Social Survey. The Social Science Journal, 38(2), 177-188.

ESS (2014). ESS Round 7: European Social Survey Round 7 Data. 2014a. Data file edition 2.1. NSD – Norwegian Centre for Research Data, Norway – Data Archive and distributor of ESS data for ESS ERIC.

Herda, D. (2013). Too many immigrants? Examining alternative forms of immigrant population innumeracy. Sociological Perspectives, 56(2), 213-240.

Markaki, Y., & Longhi, S. (2013). What determines attitudes to immigration in European countries? An analysis at the regional level. Migration Studies, 1(3), 311-337.

Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology, 43, 147-165.

Hellwig, T., & Sinno, A. (2017). Different groups, different threats: public attitudes towards immigrants. Journal of Ethnic and Migration Studies, 43(3), 339-358.

Jedinger, A., & Eisentraut, M. (2020). Exploring the Differential Effects of Perceived Threat on Attitudes Toward Ethnic Minority Groups in Germany. Frontiers in Psychology, 10, 2895, https://doi.org/10.3389/fpsyg.2019.02895.

King, G., & Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137-163.

Meuleman, B., Davidov, E., & Billiet, J. (2009). Changing attitudes toward immigration in Europe, 2002–2007: A dynamic group conflict theory approach. Social Science Research, 38(2), 352-365.

Piekut, A. (2019). Survey nonresponse in attitudes towards immigration in Europe. Journal of Ethnic and Migration Studies, 1-26, doi: 10.1080/1369183X.2019.1661773

Sommet, N., & Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30, 203-218.

Schotte, S., & Winkler, H. (2018). Why are the elderly more averse to immigration when they are more likely to benefit? Evidence across countries. International Migration Review, 52(4), 1250-1282.

Endnotes

Appendix

Appendix 1: Dataframe summary

library(summarytools)
dfSummary(round_7x_finalx, plain.ascii = FALSE, style = 'grid', graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp", headings = FALSE, caption="Table 1. Summary of data frame")

Table 1. Summary of data frame
No	Variable	Stats / Values	Freqs (% of Valid)	Missing
1	idno [numeric]	Mean (sd) : 6765561.4 (23466750.9) min < med < max: 1 < 2897 < 100005599 IQR (CV) : 107972 (3.5)	18359 distinct values	0 (0%)
2	cntry [character]	1. Germany 2. Ireland 3. Lithuania 4. United Kingdom 5. Czech Republic 6. Finland 7. Estonia 8. Spain 9. France 10. Netherlands [ 10 others ]	3031 ( 8.1%) 2349 ( 6.3%) 2239 ( 6.0%) 2197 ( 5.9%) 2132 ( 5.7%) 2084 ( 5.6%) 2049 ( 5.5%) 1921 ( 5.1%) 1912 ( 5.1%) 1905 ( 5.1%) 15566 (41.6%)	0 (0%)
3	minority [character]	1. No 2. Yes	34872 (94.5%) 2044 ( 5.5%)	469 (1.25%)
4	sex [numeric]	Min : 1 Mean : 1.5 Max : 2	1 : 17604 (47.1%) 2 : 19759 (52.9%)	22 (0.06%)
5	age [numeric]	Mean (sd) : 49.4 (18.7) min < med < max: 14 < 50 < 114 IQR (CV) : 30 (0.4)	90 distinct values	65 (0.17%)
6	marital_status [character]	1. Married/Union 2. Never married 3. Separated/Divorced 4. Widowed	18568 (50.1%) 11139 (30.1%) 4050 (10.9%) 3294 ( 8.9%)	334 (0.89%)
7	unemployed5yr [numeric]	Min : 0 Mean : 0.1 Max : 1	0 : 32396 (86.7%) 1 : 4989 (13.3%)	0 (0%)
8	education [character]	1. 1-Lower secondary 2. 2-Upper secondary 3. 3-Vocational 4. 4-Tertiary	10419 (27.9%) 13362 (35.7%) 5244 (14.0%) 8360 (22.4%)	0 (0%)
9	subj_income [character]	1. 1-Living comfortably 2. 2-Coping 3. 3-Difficult	12058 (32.5%) 17215 (46.4%) 7801 (21.0%)	311 (0.83%)
10	polit_intr [numeric]	Mean (sd) : 2.6 (0.9) min < med < max: 1 < 3 < 4 IQR (CV) : 1 (0.4)	1 : 4312 (11.6%) 2 : 13751 (36.9%) 3 : 12513 (33.6%) 4 : 6715 (18.0%)	94 (0.25%)
11	polit_efficacy [numeric]	Mean (sd) : 3.6 (2.1) min < med < max: 0 < 3.5 < 10 IQR (CV) : 3 (0.6)	116 distinct values	93 (0.25%)
12	contact_freq [character]	1. 1-Never 2. 2-Rarely 3. 3-Every month 4. 4-Every week 5. 5-Everyday	5300 (14.3%) 7534 (20.3%) 5065 (13.7%) 10323 (27.9%) 8817 (23.8%)	346 (0.93%)
13	racism [character]	1. 1-No 2. 2-Yes 3. 3-nonresponse	14285 (38.2%) 21289 (57.0%) 1811 ( 4.8%)	0 (0%)
14	income_nonresponse [character]	1. Answer 2. Nonresponse	29835 (79.8%) 7550 (20.2%)	0 (0%)
15	intnum [integer]	Mean (sd) : 862 (671.3) min < med < max: 1 < 804 < 2108 IQR (CV) : 1254 (0.8)	2039 distinct values	0 (0%)
16	resp_vo_understood [character]	1. Not very often 2. Very often	11600 (31.2%) 25605 (68.8%)	180 (0.48%)
17	resp_often_reluctant [character]	1. No 2. Yes	31713 (85.2%) 5505 (14.8%)	167 (0.45%)
18	someone_present [character]	1. No 2. Yes	34104 (91.2%) 3281 ( 8.8%)	0 (0%)
19	interviewer_age [integer]	Mean (sd) : 35.8 (12.7) min < med < max: 1 < 38 < 66 IQR (CV) : 17 (0.4)	66 distinct values	169 (0.45%)
20	interviewer_gender [integer]	Min : 1 Mean : 1.6 Max : 2	1 : 13248 (35.5%) 2 : 24015 (64.5%)	122 (0.33%)
21	nonresponse_econ [character]	1. Nonresponse 2. Response	1235 ( 3.3%) 36150 (96.7%)	0 (0%)
22	nonresponse_cult [character]	1. Nonresponse 2. Response	1981 ( 5.3%) 35404 (94.7%)	0 (0%)
23	nonresponse_econ1 [numeric]	Min : 0 Mean : 0 Max : 1	0 : 36150 (96.7%) 1 : 1235 ( 3.3%)	0 (0%)
24	nonresponse_cult1 [numeric]	Min : 0 Mean : 0.1 Max : 1	0 : 35404 (94.7%) 1 : 1981 ( 5.3%)	0 (0%)

Appendix 2. Entire R code used in the project

library(knitr)
## Global options
opts_chunk$set(echo=TRUE,
               cache=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
library(essurvey)
library(dplyr)
library(ggplot2)
library(lme4)
library(foreign)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(summarytools)
library("essurvey")
set_email("a.piekut@sheffield.ac.uk")
round_7 <- import_rounds(7)
round_7x = subset(round_7, select = c(idno, cntry, dweight, pspwght, pweight, blgetmg, imbgeco, imbleco, dfegcon, smegbhw, gndr, agea, maritalb, eisced, polintr, psppsgv, actrolg, psppipl, cptppol, ptcpplt, etapapl, hinctnta, hincfel, uemp5yr))
summary(round_7x)
#ESSinterviewer = read.spss("C:/Users/aneta/Google Drive (a.piekut@sheffield.ac.uk)/Teaching @ SMI/SMI205 - Advanced Research Project/SMI205 Replication - materials in R/AP nonresponse paper/ESS7INTe02_1.spss/ESS7INTe02_1.sav", to.data.frame=TRUE)
ESSinterviewer = read.spss("/Users/anetapiekut/Dropbox/ESS Non-attitudes/ESS7INTe02_1/ESS7INTe02_1.sav", to.data.frame=TRUE)
# Recoding 'cntry' in the main dataset
round_7x$cntry[round_7x$cntry == 'AT'] <- "Austria"
round_7x$cntry[round_7x$cntry == 'BE'] <- "Belgium"
round_7x$cntry[round_7x$cntry == 'CZ'] <- "Czech Republic"
round_7x$cntry[round_7x$cntry == 'DK'] <- "Denmark"
round_7x$cntry[round_7x$cntry == 'EE'] <- "Estonia"
round_7x$cntry[round_7x$cntry == 'FI'] <- "Finland"
round_7x$cntry[round_7x$cntry == 'FR'] <- "France"
round_7x$cntry[round_7x$cntry == 'DE'] <- "Germany"
round_7x$cntry[round_7x$cntry == 'HU'] <- "Hungary"
round_7x$cntry[round_7x$cntry == 'IE'] <- "Ireland"
round_7x$cntry[round_7x$cntry == 'IL'] <- "Israel"
round_7x$cntry[round_7x$cntry == 'LT'] <- "Lithuania"
round_7x$cntry[round_7x$cntry == 'NL'] <- "Netherlands"
round_7x$cntry[round_7x$cntry == 'NO'] <- "Norway"
round_7x$cntry[round_7x$cntry == 'PL'] <- "Poland"
round_7x$cntry[round_7x$cntry == 'PT'] <- "Portugal"
round_7x$cntry[round_7x$cntry == 'SI'] <- "Slovenia"
round_7x$cntry[round_7x$cntry == 'ES'] <- "Spain"
round_7x$cntry[round_7x$cntry == 'SE'] <- "Sweden"
round_7x$cntry[round_7x$cntry == 'CH'] <- "Switzerland"
round_7x$cntry[round_7x$cntry == 'GB'] <- "United Kingdom"

# Keeping only necessary variables from interviewer dataset
ESSinterviewer2 = subset(ESSinterviewer, select = c(idno, cntry, intnum, resundq, resrelq, preintf, intgndr, intagea))

# Attaching interviewer dataset, merge by both respondent number 'idno' and 'cntry' (as numbers repeat across data)
round_7x_final <- merge(round_7x, ESSinterviewer2, by=c("idno","cntry"), sort=TRUE)
round_7x_final[] <- lapply(round_7x_final, unclass)
# 'blgetmg' - being ethnic minority --> 'minority' 
round_7x_final$minority[round_7x_final$blgetmg == 2] <- "No"
round_7x_final$minority[round_7x_final$blgetmg == 1] <- "Yes"

# 'maritalb' --> 'marital_status'
round_7x_final$marital_status[round_7x_final$maritalb == 1] <- "Married/Union"
round_7x_final$marital_status[round_7x_final$maritalb == 2] <- "Married/Union"
round_7x_final$marital_status[round_7x_final$maritalb == 3] <- "Separated/Divorced"
round_7x_final$marital_status[round_7x_final$maritalb == 4] <- "Separated/Divorced"
round_7x_final$marital_status[round_7x_final$maritalb == 5] <- "Widowed"
round_7x_final$marital_status[round_7x_final$maritalb == 6] <- "Never married"

# 'uemp5yr' - any periods not working in last 5 years --> 'unemployed5yr' 
round_7x_final$unemployed5yr[is.na(round_7x_final$uemp5yr)] <- 0
round_7x_final$unemployed5yr[round_7x_final$uemp5yr == 2] <- 0
round_7x_final$unemployed5yr[round_7x_final$uemp5yr == 1] <- 1

# 'eisced' - 7 levels of education --> 'education'
round_7x_final$education[round_7x_final$eisced == 1] <- "1-Lower secondary"
round_7x_final$education[round_7x_final$eisced == 2] <- "1-Lower secondary"
round_7x_final$education[round_7x_final$eisced == 3] <- "2-Upper secondary"
round_7x_final$education[round_7x_final$eisced == 4] <- "2-Upper secondary"
round_7x_final$education[round_7x_final$eisced == 5] <- "3-Vocational"
round_7x_final$education[round_7x_final$eisced == 6] <- "4-Tertiary"
round_7x_final$education[round_7x_final$eisced == 7] <- "4-Tertiary"

# 'hincfel' subjective income --> 'subj_income'
round_7x_final$subj_income[round_7x_final$hincfel == 1] <- "1-Living comfortably"
round_7x_final$subj_income[round_7x_final$hincfel == 2] <- "2-Coping"
round_7x_final$subj_income[round_7x_final$hincfel == 3] <- "3-Difficult"
round_7x_final$subj_income[round_7x_final$hincfel == 4] <- "3-Difficult"

# 'polintr' - interest in politics - look at the distribution --> stays the same

# Political efficacy scale - mean of psppsgv, actrolg, psppipl, cptppol, ptcpplt, etapapl --> 'polit_efficacy'
round_7x_final$polit_efficacy=rowMeans(round_7x_final[,c("psppsgv", "actrolg", "psppipl", "cptppol", "ptcpplt", "etapapl")], na.rm=TRUE)

# 'dfegcon' - frequency of contact, 7 levels recoded into 5 --> 'contact_freq'
round_7x_final$contact_freq[round_7x_final$dfegcon == 1] <- "1-Never"
round_7x_final$contact_freq[round_7x_final$dfegcon == 2] <- "2-Rarely"
round_7x_final$contact_freq[round_7x_final$dfegcon == 3] <- "2-Rarely"
round_7x_final$contact_freq[round_7x_final$dfegcon == 4] <- "3-Every month"
round_7x_final$contact_freq[round_7x_final$dfegcon == 5] <- "4-Every week"
round_7x_final$contact_freq[round_7x_final$dfegcon == 6] <- "4-Every week"
round_7x_final$contact_freq[round_7x_final$dfegcon == 7] <- "5-Everyday"

# 'smegbhw' - recode into  No / Yes / Nonresponse --> 'racism'
round_7x_final$racism[is.na(round_7x_final$smegbhw)] <- "3-nonresponse"
round_7x_final$racism[round_7x_final$smegbhw == 2] <- "2-Yes"
round_7x_final$racism[round_7x_final$smegbhw == 1] <- "1-No"

# 'hinctnta' - Income nonresponse - new variable: responded / not responded --> 'income_nonresponse'
round_7x_final$income_nonresponse[is.na(round_7x_final$hinctnta)] <- "Nonresponse"
round_7x_final$income_nonresponse[!is.na(round_7x_final$hinctnta)] <- "Answer"

# 'resundq' - On scale 1-5, Recoded into dummy 1-Very often (5), 0 - No (1-4) --> 'resp_vo_understood'
round_7x_final$resp_vo_understood[round_7x_final$resundq == 1] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 2] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 3] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 4] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 5] <- "Very often"

# 'resrelq' - On scale 1-5, Recoded into dummy 1-Yes (3/5), 0 - No (1-2) --> 'resp_often_reluctant'
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 1] <- "No"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 2] <- "No"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 3] <- "Yes"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 4] <- "Yes"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 5] <- "Yes"

# 'preintf' - Recoded into dummy 1-Yes (1), 0 - No (2 & NA) --> 'someone_present'
round_7x_final$someone_present[round_7x_final$preintf == 1] <- "Yes"
round_7x_final$someone_present[round_7x_final$preintf == 2] <- "No"
round_7x_final$someone_present[is.na(round_7x_final$preintf)] <- "No"

# Renaming other variables into more meaningful names
names(round_7x_final)[names(round_7x_final) == "agea"] <- "age"
names(round_7x_final)[names(round_7x_final) == "gndr"] <- "sex"
names(round_7x_final)[names(round_7x_final) == "polintr"] <- "polit_intr"
names(round_7x_final)[names(round_7x_final) == "intgndr"] <- "interviewer_gender"
names(round_7x_final)[names(round_7x_final) == "intagea"] <- "interviewer_age"

# Nonresponse to impact on economy
round_7x_final$nonresponse_econ[is.na(round_7x_final$imbgeco)] <- "Nonresponse"
round_7x_final$nonresponse_econ[!is.na(round_7x_final$imbgeco)] <- "Response"
# Nonresponse to impact on culture
round_7x_final$nonresponse_cult[is.na(round_7x_final$imbleco)] <- "Nonresponse"
round_7x_final$nonresponse_cult[!is.na(round_7x_final$imbleco)] <- "Response"
library(dplyr)
# Dropping Israel and 'other' education from the sample
# ('other' was not recoded above, so 'education' is NA for those cases and which() drops them)
round_7x_final2 <- round_7x_final[which(round_7x_final$education != 55),]
round_7x_final3 <- round_7x_final2[which(round_7x_final2$cntry != "Israel"),]

# Subsetting again
round_7x_finalx = subset(round_7x_final3, select = c(idno, cntry, minority, sex, age, marital_status, unemployed5yr, education, subj_income, polit_intr, polit_efficacy, contact_freq, racism, income_nonresponse, intnum, resp_vo_understood, resp_often_reluctant, someone_present, interviewer_age, interviewer_gender, nonresponse_econ, nonresponse_cult))

# round_7x_finalx[!is.na(round_7x_final4)]

# round_7x_finalx <- na.omit(round_7x_final4) 
# summary(round_7x_finalx)

# Deleting all observations with missing data for key variables
# row.has.na <- apply(round_7x_final4, 1, function(x){any(is.na(x))})
# sum(row.has.na)
# round_7x_finalx <- round_7x_final4[!row.has.na,]

# round_7x_finalx <- na.omit(round_7x_final4) 

library(summarytools)
freq(round_7x_finalx$nonresponse_econ, style = "rmarkdown", headings = FALSE, caption="Table 1. Frequency distribution of nonresponse to the impact on economy question")
library(summarytools)
freq(round_7x_finalx$nonresponse_cult, style = "rmarkdown", headings = FALSE, caption="Table 2. Frequency distribution of nonresponse to the impact on culture question")
library(dplyr)
# Compute response/nonresponse percentages within each country first
ESS_2014_perc1 <- round_7x_finalx %>% 
  group_by(cntry,nonresponse_econ) %>% 
  summarise(count=n()) %>% 
  mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc1, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_econ))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
  theme_minimal(base_size = 14) +
  ggtitle("Figure 1. Nonresponse to impact on economy item \n across ESS 2014 countries")
library(dplyr)
# Compute response/nonresponse percentages within each country first
ESS_2014_perc2 <- round_7x_finalx %>% 
  group_by(cntry,nonresponse_cult) %>% 
  summarise(count=n()) %>% 
  mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc2, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_cult))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
  theme_minimal(base_size = 14) +
  ggtitle("Figure 2. Nonresponse to impact on culture item \n across ESS 2014 countries")
# Nonresponse into numeric (1 = Nonresponse, 0 = Response)
round_7x_finalx$nonresponse_econ1[round_7x_finalx$nonresponse_econ == "Nonresponse"] <- 1
round_7x_finalx$nonresponse_econ1[round_7x_finalx$nonresponse_econ == "Response"] <- 0
round_7x_finalx$nonresponse_econ1<-as.numeric(round_7x_finalx$nonresponse_econ1)
round_7x_finalx$nonresponse_cult1[round_7x_finalx$nonresponse_cult == "Nonresponse"] <- 1
round_7x_finalx$nonresponse_cult1[round_7x_finalx$nonresponse_cult == "Response"] <- 0
round_7x_finalx$nonresponse_cult1<-as.numeric(round_7x_finalx$nonresponse_cult1)
library(lme4)
# Baseline models
M0a <- glmer(nonresponse_econ1 ~ (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
M0b <- glmer(nonresponse_cult1 ~ (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
tab_model(M0a, M0b,
            dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
            string.ci = "Conf. Int (95%)",
            p.style = "a", title = "Table 3. Multilevel logistic regression models of nonresponse - baseline / empty model"
)
# Full model (individual-level and interview variables): nonresponse to impact on economy
M1 <- glmer(nonresponse_econ1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
# Full model (individual-level and interview variables): nonresponse to impact on culture
M2 <- glmer(nonresponse_cult1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
tab_model(M1, M2,
  dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
    pred.labels = c("Intercept", "Ethnic minority", "Female", "Age", "Marital status Ref. Married: </br> Never married", "Separated/Divorced", "Widowed",
                    "Unemployed in last 5 years", "Education Ref. No/Primary: </br> Upper secondary", "Vocational", "Tertiary", 
                    "Subjective income Ref. Comfortable:</br> Coping", "Difficult", "Political interest", "Political efficacy",
                    "Contact frequency Ref. No:</br> Rarely", "Every month", "Every week", "Everyday", "Racism item: Yes", "Racism item: Nonresponse",
                    "Income item: nonresponse", "Interviewer: age", "Interviewer: female", "Respondent: understood", "Respondent: reluctant", "Interview: Someone present"),
    string.ci = "Conf. Int (95%)",
     p.style = "a", title = "Table 4. Multilevel logistic regression models of nonresponse to questions measuring opinions on immigration"
)
plot_model(M1, type = "pred", terms = "sex [1, 2]", title="Figure 3. Predicted probabilities of nonresponse to impact on economy item")
plot_model(M2, type = "pred", terms = "sex [1, 2]", title="Figure 4. Predicted probabilities of nonresponse to impact on culture item")
plot_model(M1, type = "pred", terms = "age [20, 70]", title="Figure 5. Predicted probabilities of nonresponse to impact on economy item")
plot_model(M2, type = "pred", terms = "age [20, 70]", title="Figure 6. Predicted probabilities of nonresponse to impact on culture item")
library(summarytools)
dfSummary(round_7x_finalx, plain.ascii = FALSE, style = 'grid', graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp", headings = FALSE, caption="Table 1. Summary of data frame")

There are still some dissimilarities in the final list of variables I use. I do not add any country-level variables, as I am not interested in re-testing such contextual effects.

SMI205 Replication paper

Aneta Piekut

18/05/2020