Nonresponse patterns to immigration items - do they work in the same manner?
Github repository: https://github.com/anetapiekut/SMI205_Replication_nonresponse
Replicated paper
Replication project based on paper: Piekut, A. (2019). Survey nonresponse in attitudes towards immigration in Europe. Journal of Ethnic and Migration Studies: 1-26, doi: 10.1080/1369183X.2019.1661773.
Workspace setup
Global r chunks setup:
library(knitr)
## Global options
opts_chunk$set(echo=TRUE,
cache=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)Used libraries:
library(essurvey)
library(dplyr)
library(ggplot2)
library(lme4)
library(foreign)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(summarytools)I’m using rmdformats::html_clean theme, with kate highlight. My YAML settings are:
output: rmdformats::html_clean: highlight: kate
1. Introduction
Item nonresponse in surveys is a common phenomena. Many studies demonstrated that it works differently for questions asking about knowledge than for quesitons measuring subjective-states (Berinsky 1999; Herda 2013). I reproduce some analysis conducted by Piekut (2019) to explore this matter further. Her research explored patterns in nonresponse rate to items measuring attitudes to immigration using European Social Survey (ESS) data from 2014. She analysed all nonresponses together applying count models, explaining that in studies based on ESS allitems are often combined together as summative indices:
Many previous studies have utilised the ESS impact items to construct an index that captures anti-immigrant attitudes (…). Nonresponse to items (…) will result in poorer reliability of such indices (Piekut 2019: 10).
In this analysis I test whether nonresponse pattern is different for the two most commonly used questions measuring opinions on immigration impacts on respondent’s country: economy and culture (‘imbgeco’ and ‘imbleco’, respectively, in the ESS data). While indeed, as Piekut (2019) acknowledges, many research contruct an average score measuring anti-immigration attitudes on the basis of a few impact items, there is also a number of studies which approach them as two dissimilar constructs - the first one representing realistic threat, and the second one - symbolic threat (see Meuleman et al. 2009; and more recently - Jedinger, Eisentraut 2020).
Furthermore, past research pointed to some other dissimilarities between the two questions. While in general age is positively correlated with negative attitudes to immigration, it might be a weaker predictor for the question asking about impact of immigration on economy than for the culture one. This is so because economic impacts brought by immigration are less relevant for people who are no longer active on the labour market (Schotte, Winkler 2018). Similarly, the role of gender for shaping attitudes to immigration is not clear. Previous research on the determinants of attitudes towards immigration indicated that women are more opposed to immigration, although results are not consistent across countries (Chandler et al. 2001), with some studies revealing that men display more cultural threat concerns, while women express more economic threat concerns (Markaki, Longhi 2013).
As such, by replicating the same study using the same data, but a different method, I test robustness of conducted analysis by Piekut (2019), which allows me “to see if the target finding is merely the result of analytic decisions” (Freese & Peterson 2017: 152).
2. Data and methods
2.1. Data
European Social Survey is a cross-national survey conducted every two years in a number of European countries, using random probability sampling (ESS 2014). I use wave 7, collected in 2014/2015, which was also analysed by Piekut (2019), and I select the same key variables for my analysis1.
The original paper uses also data from the interviewer questionnaire (ESSinterviewer.sav - SPSS format), which I have am attached to my main dataset by matching cases by both respondents ‘idno’ and ‘cntry’. As the country variable in the interviewer dataset is differently coded, I have to recode it in my ESS data, so cases in both datsets can be matched.
Following the original paper, I recode also all variables. Finally, I compute a new binary variable whether a respondent replied to each question measuring attitudes - my two dependent variables. I subset data again to remove old variables. Summary information on my final list of variables is in Appendix 1. R code is hidden for these data manupulation steps, but my etire R script is available in Appendix 2.
The original sample size was 40,185 respondents. After excluding Isreal subsample (as Piekut (2019) did), and removing cases with missing data for the key variables, it is 37,385.
2.2. Nonresponse to impacts on economy and culture items
As Tables 1 and 2 in tabs below present, almost 5.3% of respondents in the entire ESS sample did not reply to the question asking whether according to them immigration has a negative or positive impact on their country culture, while 3.3.% did not reply to the economy item. However, the proportions of nonresponse considerably vary across countries. For example, for the culture item, in Belgium only 0.8% of respondents replied ‘Don’t know’ or ‘Refuse to answer’, while in Slovenia - 17.2%. Figures 1 and 2 illustrate this cross-country variation.
Economy impacts - all
library(summarytools)
freq(round_7x_finalx$nonresponse_econ, style = "rmarkdown", headings = FALSE, caption="Table 1. Frequency distribution of nonresponse to immigration impact of economy question")| Freq | % Valid | % Valid Cum. | % Total | % Total Cum. | |
|---|---|---|---|---|---|
| Nonresponse | 1235 | 3.30 | 3.30 | 3.30 | 3.30 |
| Response | 36150 | 96.70 | 100.00 | 96.70 | 100.00 |
| <NA> | 0 | 0.00 | 100.00 | ||
| Total | 37385 | 100.00 | 100.00 | 100.00 | 100.00 |
Culture impacts - all
library(summarytools)
freq(round_7x_finalx$nonresponse_cult, style = "rmarkdown", headings = FALSE, caption="Table 2. Frequency distribution of nonresponse to immigration impact of economy question")| Freq | % Valid | % Valid Cum. | % Total | % Total Cum. | |
|---|---|---|---|---|---|
| Nonresponse | 1981 | 5.30 | 5.30 | 5.30 | 5.30 |
| Response | 35404 | 94.70 | 100.00 | 94.70 | 100.00 |
| <NA> | 0 | 0.00 | 100.00 | ||
| Total | 37385 | 100.00 | 100.00 | 100.00 | 100.00 |
Economy impacts - by country
library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc1 <- round_7x_finalx %>%
group_by(cntry,nonresponse_econ) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc1, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_econ))) +
geom_bar(stat="identity", width = 0.7) +
labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
theme_minimal(base_size = 14) +
ggtitle("Figure 1. Nonresponse to impact on economy item \n across ESS 2014 countries")Culture impacts - by country
library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc2 <- round_7x_finalx %>%
group_by(cntry,nonresponse_cult) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc2, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_cult))) +
geom_bar(stat="identity", width = 0.7) +
labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
theme_minimal(base_size = 14) +
ggtitle("Figure 1. Nonresponse to impact on culture item \n across ESS 2014 countries")2.3. Methods
Piekut (2019) research explored nonresponse rate for all questions measuring attitudes to immigration, hence she used a count model, as the dependent variable was a number of nonresponses. Due to hierarchical nature of the data, she conducted a multilevel analysis, with respondents ‘nested in’ interviewers, who were then ‘nested in’ countries.
In this replication I explore whether the results hold if we look at two (binary) nonresponse variables, so I use two multilevel logistic regressions. In other words, I model the probability of nonresponse for each of these two questions separately, which will be conditional on respondents’ characteristics. These probabilities are allowed to vary between interviewers and between countries (Sommet, Morselli 2017). Intraclass Correlation Coefficient (ICC) for null models for both variables is 0.30, meaning that the grouping structure of the hierarchical model explains almost one third of variance in the dependent variable nonresponse. As Table 3 illustrates, the size of my final sample is as follows: 37,385 respondents, 2,039 interviewers, 20 countries.
| Nonresponse: Economy | Nonresponse: Culture | |||
|---|---|---|---|---|
| Predictors | Odds Ratios | Conf. Int (95%) | Odds Ratios | Conf. Int (95%) |
| (Intercept) | 0.02 *** | 0.02 – 0.03 | 0.04 *** | 0.03 – 0.06 |
| Random Effects | ||||
| σ2 | 3.29 | 3.29 | ||
| τ00 | 0.82 intnum | 0.83 intnum | ||
| 0.61 cntry | 0.56 cntry | |||
| ICC | 0.30 | 0.30 | ||
| N | 20 cntry | 20 cntry | ||
| 2039 intnum | 2039 intnum | |||
| Observations | 37385 | 37385 | ||
| Marginal R2 / Conditional R2 | 0.000 / 0.303 | 0.000 / 0.297 | ||
|
||||
3. Results
3.1. Comparison with the original paper
Table 4 below displays the results of two multilevel logistic regressions: one for nonresponse to the question on immigration impacts on economy (first column), and the other one – on culture (second column). Overall, the direction of the relationship between various socio-demographic characteritics of respondents and their other reponses, and the probability not to respond, is the same as in the original study. Nonresponders are older, more likely to be female and identifying as ethnic minority, less likely to be coping easily on their current income and less interested in politics. They are of lower political efficacy score meaning that those who believe they have more influence on political affairs, are likely to repond to both items. Following Piekut (2019) results, the strongest predictors of nonresponse are nonresponses to other variables, in this intance items measuring racism and net household income.
The frequency of contact was one of key variables explored by Piekut (2019) and she found that people with ‘medium’ amount of contact - not without any or everyday interactions - had the lowest nonresponse rate. This pattern is not present when we model both nonresponses separetly.
# Individual-level variables
M1 <- glmer(nonresponse_econ1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
# Interview variables
M2 <- glmer(nonresponse_cult1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)tab_model(M1, M2,
dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
pred.labels = c("Intercept", "Ethnic minority", "Female", "Age", "Marital status Ref. Married: </br> Never married", "Separated/Divorced", "Windowed",
"Uneployed in last 5 years", "Education Ref. No/Primary: </br> Upper secondary", "Vocational", "Tertiary",
"Subjective income Ref. Comfortable:</br> Coping", "Difficult", "Political interest", "Political efficacy",
"Contact frequency Ref. No:</br> Rarely", "Every month", "Every week", "Everyday", "Racism item: Yes", "Racism item: Nonresponse",
"Income item: nonresponse", "Interviewer: age", "Interviewer: female", "Respondent: understood", "Respondent: reluctant", "Interview: Someone present"),
string.ci = "Conf. Int (95%)",
p.style = "a", title = "Table 4. Multilevel logistic regression models of nonreponse to questions measuring opinions on immigration"
)| Nonresponse: Economy | Nonresponse: Culture | |||
|---|---|---|---|---|
| Predictors | Odds Ratios | Conf. Int (95%) | Odds Ratios | Conf. Int (95%) |
| Intercept | 0.01 *** | 0.00 – 0.01 | 0.02 *** | 0.01 – 0.03 |
| Ethnic minority | 1.18 | 0.88 – 1.58 | 1.28 * | 1.03 – 1.61 |
| Female | 1.22 ** | 1.06 – 1.41 | 1.29 *** | 1.16 – 1.45 |
| Age | 1.01 *** | 1.01 – 1.02 | 1.01 *** | 1.01 – 1.02 |
| Marital status Ref. Married: Never married | 1.19 | 0.98 – 1.44 | 1.24 ** | 1.07 – 1.45 |
| Separated/Divorced | 0.95 | 0.75 – 1.19 | 0.93 | 0.77 – 1.12 |
| Windowed | 1.10 | 0.89 – 1.36 | 1.11 | 0.93 – 1.31 |
| Uneployed in last 5 years | 0.90 | 0.73 – 1.11 | 0.90 | 0.76 – 1.06 |
| Education Ref. No/Primary: Upper secondary | 0.69 *** | 0.58 – 0.82 | 0.74 *** | 0.65 – 0.85 |
| Vocational | 0.65 *** | 0.51 – 0.84 | 0.84 | 0.70 – 1.02 |
| Tertiary | 0.81 | 0.64 – 1.01 | 1.03 | 0.87 – 1.22 |
| Subjective income Ref. Comfortable: Coping | 1.27 * | 1.05 – 1.53 | 1.09 | 0.94 – 1.25 |
| Difficult | 1.20 | 0.96 – 1.50 | 1.18 | 0.99 – 1.40 |
| Political interest | 1.21 *** | 1.11 – 1.32 | 1.09 * | 1.01 – 1.17 |
| Political efficacy | 0.89 *** | 0.86 – 0.93 | 0.89 *** | 0.86 – 0.92 |
| Contact frequency Ref. No: Rarely | 0.81 * | 0.66 – 1.00 | 0.98 | 0.83 – 1.15 |
| Every month | 0.83 | 0.65 – 1.06 | 0.85 | 0.69 – 1.04 |
| Every week | 0.92 | 0.74 – 1.14 | 0.85 | 0.71 – 1.02 |
| Everyday | 0.88 | 0.69 – 1.12 | 0.89 | 0.73 – 1.08 |
| Racism item: Yes | 1.22 * | 1.04 – 1.42 | 1.17 * | 1.03 – 1.32 |
| Racism item: Nonresponse | 2.65 *** | 2.13 – 3.28 | 2.58 *** | 2.16 – 3.09 |
| Income item: nonresponse | 1.47 *** | 1.25 – 1.74 | 1.88 *** | 1.64 – 2.14 |
| Interviewer: age | 1.00 | 1.00 – 1.01 | 1.00 | 1.00 – 1.01 |
| Interviewer: female | 1.13 | 0.95 – 1.35 | 1.06 | 0.91 – 1.23 |
| Respondent: understood | 0.67 *** | 0.57 – 0.78 | 0.63 *** | 0.56 – 0.71 |
| Respondent: reluctant | 1.55 *** | 1.31 – 1.82 | 1.53 *** | 1.33 – 1.75 |
| Interview: Someone present | 1.05 | 0.85 – 1.29 | 1.04 | 0.87 – 1.23 |
| Random Effects | ||||
| σ2 | 3.29 | 3.29 | ||
| τ00 | 0.74 intnum | 0.75 intnum | ||
| 0.29 cntry | 0.32 cntry | |||
| ICC | 0.24 | 0.25 | ||
| N | 20 cntry | 20 cntry | ||
| 2025 intnum | 2025 intnum | |||
| Observations | 35421 | 35421 | ||
| Marginal R2 / Conditional R2 | 0.118 / 0.327 | 0.113 / 0.332 | ||
|
||||
3.2. Differences between both nonreponses
There are, however, some diffrences in the strenght of coefficients between both models. Odds ratio for women is higher in the model predicting probability of nonresponse to impact on culture item. While women are 29% more likely than man to refrain from answering the culture item, it is 22% for nonresponse to the economy item (see Figures 3 and 4 below for comparison). On average, keeping all other variable at their means, the probability of nonresponse for women in ESS 2014 iss 2.3% and 3.8%, for each item respectively.
Age effect is the same in both models, and one year results in 1% higher chances of nonrespose for each question. Since nonresponse to the question on culture impacts was higher than for the question on economy impacts, the probability of nonreponse for a 70-year-old person is almost 4% for the culture item, and 2% for the economy item, while it is about 2% and 1.5% for 20-year-olds, respectively (see Figures 5 and 6 below for comparison).
Gender: Economy item
Gender: Culture item
Age: Economy item
4. Conclusions
This replication paper aim was to test whether nonresponse pattern is the same for the item asking about impact of immigration on country’s economy and the item asking about impacts on culture. The obtained results are largely in line with the article by Piekut (2019). The realistic and symbolic threats – that might be mobilised among those with less favourable attitudes to immigration – are two separate phenomena, yet they are correlated one with another (Meuleman et al. 2009). Hence, the direction of coefficients was the same for two nonresponses.
Contrary to Piekut (2019), we found that a few independent variables – like the frequency of contact – turned out not to be statistically significant in the models. The proportion of nonresponse for both items was low (3-5%), which results in a very uneven division in the dependent variable and a small number of cases in one category. Detecting patterns in distributions with unbalanced data is more difficult and logistic regression might underestimate the probability of rare events (King, Zeng 2001). As such, the count model chosen by Piekut (2019) seems to be a sensible option trying to overcome this problem, while building on the fact that both measures relate to the same latent variable anti-immigration attitudes.
References
Berinsky, A. J. (1999). The two faces of public opinion. American Journal of Political Science, 43(4), 1209-1230.
Chandler, C. R., & Tsai, Y. M. (2001). Social factors influencing immigration attitudes: an analysis of data from the General Social Survey. The Social Science Journal, 38(2), 177-188.
ESS (2014). ESS Round 7: European Social Survey Round 7 Data. 2014a. Data file edition 2.1. NSD – Norwegian Centre for Research Data, Norway – Data Archive and distributor of ESS data for ESS ERIC.
Herda, D. (2013). Too many immigrants? Examining alternative forms of immigrant population innumeracy. Sociological Perspectives, 56(2), 213-240.
Markaki, Y., & Longhi, S. (2013). What determines attitudes to immigration in European countries? An analysis at the regional level. Migration Studies, 1(3), 311-337.
Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology, 43, 147-165.
Hellwig, T., & Sinno, A. (2017). Different groups, different threats: public attitudes towards immigrants. Journal of Ethnic and Migration Studies, 43(3), 339-358.
Jedinger, A., & Eisentraut, M. (2020). Exploring the Differential Effects of Perceived Threat on Attitudes Toward Ethnic Minority Groups in Germany. Frontiers in Psychology, 10, 2895, https://doi.org/10.3389/fpsyg.2019.02895.
King, G., & Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137-163.
Meuleman, B., Davidov, E., & Billiet, J. (2009). Changing attitudes toward immigration in Europe, 2002–2007: A dynamic group conflict theory approach. Social Science Research, 38(2), 352-365.
Piekut, A. (2019). Survey nonresponse in attitudes towards immigration in Europe. Journal of Ethnic and Migration Studies, 1-26, doi: 10.1080/1369183X.2019.1661773
Sommet, N., & Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30, 203-218.
Schotte, S., & Winkler, H. (2018). Why are the elderly more averse to immigration when they are more likely to benefit? Evidence across countries. International Migration Review, 52(4), 1250-1282.
Endnotes
Appendix
Appendix 1: Dataframe summary
library(summarytools)
dfSummary(round_7x_finalx, plain.ascii = FALSE, style = 'grid', graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp", headings = FALSE, caption="Table 1. Summary of data frame")| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing |
|---|---|---|---|---|---|
| 1 | idno [numeric] |
Mean (sd) : 6765561.4 (23466750.9) min < med < max: 1 < 2897 < 100005599 IQR (CV) : 107972 (3.5) |
18359 distinct values | 0 (0%) |
|
| 2 | cntry [character] |
1. Germany 2. Ireland 3. Lithuania 4. United Kingdom 5. Czech Republic 6. Finland 7. Estonia 8. Spain 9. France 10. Netherlands [ 10 others ] |
3031 ( 8.1%) 2349 ( 6.3%) 2239 ( 6.0%) 2197 ( 5.9%) 2132 ( 5.7%) 2084 ( 5.6%) 2049 ( 5.5%) 1921 ( 5.1%) 1912 ( 5.1%) 1905 ( 5.1%) 15566 (41.6%) |
0 (0%) |
|
| 3 | minority [character] |
1. No 2. Yes |
34872 (94.5%) 2044 ( 5.5%) |
469 (1.25%) |
|
| 4 | sex [numeric] |
Min : 1 Mean : 1.5 Max : 2 |
1 : 17604 (47.1%) 2 : 19759 (52.9%) |
22 (0.06%) |
|
| 5 | age [numeric] |
Mean (sd) : 49.4 (18.7) min < med < max: 14 < 50 < 114 IQR (CV) : 30 (0.4) |
90 distinct values | 65 (0.17%) |
|
| 6 | marital_status [character] |
1. Married/Union 2. Never married 3. Separated/Divorced 4. Windowed |
18568 (50.1%) 11139 (30.1%) 4050 (10.9%) 3294 ( 8.9%) |
334 (0.89%) |
|
| 7 | unemployed5yr [numeric] |
Min : 0 Mean : 0.1 Max : 1 |
0 : 32396 (86.7%) 1 : 4989 (13.3%) |
0 (0%) |
|
| 8 | education [character] |
1. 1-Lower secondary 2. 2-Upper secondary 3. 3-Vocational 4. 4-Tertiary |
10419 (27.9%) 13362 (35.7%) 5244 (14.0%) 8360 (22.4%) |
0 (0%) |
|
| 9 | subj_income [character] |
1. 1-Living comfortably 2. 2-Coping 3. 3-Difficult |
12058 (32.5%) 17215 (46.4%) 7801 (21.0%) |
311 (0.83%) |
|
| 10 | polit_intr [numeric] |
Mean (sd) : 2.6 (0.9) min < med < max: 1 < 3 < 4 IQR (CV) : 1 (0.4) |
1 : 4312 (11.6%) 2 : 13751 (36.9%) 3 : 12513 (33.6%) 4 : 6715 (18.0%) |
94 (0.25%) |
|
| 11 | polit_efficacy [numeric] |
Mean (sd) : 3.6 (2.1) min < med < max: 0 < 3.5 < 10 IQR (CV) : 3 (0.6) |
116 distinct values | 93 (0.25%) |
|
| 12 | contact_freq [character] |
1. 1-Never 2. 2-Rarely 3. 3-Every month 4. 4-Every week 5. 5-Everyday |
5300 (14.3%) 7534 (20.3%) 5065 (13.7%) 10323 (27.9%) 8817 (23.8%) |
346 (0.93%) |
|
| 13 | racism [character] |
1. 1-No 2. 2-Yes 3. 3-nonresponse |
14285 (38.2%) 21289 (57.0%) 1811 ( 4.8%) |
0 (0%) |
|
| 14 | income_nonresponse [character] |
1. Answer 2. Nonresponse |
29835 (79.8%) 7550 (20.2%) |
0 (0%) |
|
| 15 | intnum [integer] |
Mean (sd) : 862 (671.3) min < med < max: 1 < 804 < 2108 IQR (CV) : 1254 (0.8) |
2039 distinct values | 0 (0%) |
|
| 16 | resp_vo_understood [character] |
1. Not very often 2. Very often |
11600 (31.2%) 25605 (68.8%) |
180 (0.48%) |
|
| 17 | resp_often_reluctant [character] |
1. No 2. Yes |
31713 (85.2%) 5505 (14.8%) |
167 (0.45%) |
|
| 18 | someone_present [character] |
1. No 2. Yes |
34104 (91.2%) 3281 ( 8.8%) |
0 (0%) |
|
| 19 | interviewer_age [integer] |
Mean (sd) : 35.8 (12.7) min < med < max: 1 < 38 < 66 IQR (CV) : 17 (0.4) |
66 distinct values | 169 (0.45%) |
|
| 20 | interviewer_gender [integer] |
Min : 1 Mean : 1.6 Max : 2 |
1 : 13248 (35.5%) 2 : 24015 (64.5%) |
122 (0.33%) |
|
| 21 | nonresponse_econ [character] |
1. Nonresponse 2. Response |
1235 ( 3.3%) 36150 (96.7%) |
0 (0%) |
|
| 22 | nonresponse_cult [character] |
1. Nonresponse 2. Response |
1981 ( 5.3%) 35404 (94.7%) |
0 (0%) |
|
| 23 | nonresponse_econ1 [numeric] |
Min : 0 Mean : 0 Max : 1 |
0 : 36150 (96.7%) 1 : 1235 ( 3.3%) |
0 (0%) |
|
| 24 | nonresponse_cult1 [numeric] |
Min : 0 Mean : 0.1 Max : 1 |
0 : 35404 (94.7%) 1 : 1981 ( 5.3%) |
0 (0%) |
Appendix 2. Entire R code used in the project
library(knitr)
## Global options
opts_chunk$set(echo=TRUE,
cache=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
library(essurvey)
library(dplyr)
library(ggplot2)
library(lme4)
library(foreign)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(summarytools)
library("essurvey")
set_email("a.piekut@sheffield.ac.uk")
round_7 <- import_rounds(7)
round_7x = subset(round_7, select = c(idno, cntry, dweight, pspwght, pweight, blgetmg, imbgeco, imbleco, dfegcon, smegbhw, gndr, agea, maritalb, eisced, polintr, psppsgv, actrolg, psppipl, cptppol, ptcpplt, etapapl, hinctnta, hincfel, uemp5yr))
summary(round_7x)
#ESSinterviewer = read.spss("C:/Users/aneta/Google Drive (a.piekut@sheffield.ac.uk)/Teaching @ SMI/SMI205 - Advanced Research Project/SMI205 Replication - materials in R/AP nonresponse paper/ESS7INTe02_1.spss/ESS7INTe02_1.sav", to.data.frame=TRUE)
ESSinterviewer = read.spss("/Users/anetapiekut/Dropbox/ESS Non-attitudes/ESS7INTe02_1/ESS7INTe02_1.sav", to.data.frame=TRUE)
# Recoding 'cntry' in the main dataset
round_7x$cntry[round_7x$cntry == 'AT'] <- "Austria"
round_7x$cntry[round_7x$cntry == 'BE'] <- "Belgium"
round_7x$cntry[round_7x$cntry == 'CZ'] <- "Czech Republic"
round_7x$cntry[round_7x$cntry == 'DK'] <- "Denmark"
round_7x$cntry[round_7x$cntry == 'EE'] <- "Estonia"
round_7x$cntry[round_7x$cntry == 'FI'] <- "Finland"
round_7x$cntry[round_7x$cntry == 'FR'] <- "France"
round_7x$cntry[round_7x$cntry == 'DE'] <- "Germany"
round_7x$cntry[round_7x$cntry == 'HU'] <- "Hungary"
round_7x$cntry[round_7x$cntry == 'IE'] <- "Ireland"
round_7x$cntry[round_7x$cntry == 'IL'] <- "Israel"
round_7x$cntry[round_7x$cntry == 'LT'] <- "Lithuania"
round_7x$cntry[round_7x$cntry == 'NL'] <- "Netherlands"
round_7x$cntry[round_7x$cntry == 'NO'] <- "Norway"
round_7x$cntry[round_7x$cntry == 'PL'] <- "Poland"
round_7x$cntry[round_7x$cntry == 'PT'] <- "Portugal"
round_7x$cntry[round_7x$cntry == 'SI'] <- "Slovenia"
round_7x$cntry[round_7x$cntry == 'ES'] <- "Spain"
round_7x$cntry[round_7x$cntry == 'SE'] <- "Sweden"
round_7x$cntry[round_7x$cntry == 'CH'] <- "Switzerland"
round_7x$cntry[round_7x$cntry == 'GB'] <- "United Kingdom"
# Keeping only necessary variables from itnerviewer dataset
ESSinterviewer2 = subset(ESSinterviewer, select = c(idno, cntry, intnum, resundq, resrelq, preintf, intgndr, intagea))
# Attaching interviewer dataset, merge by both respondent number 'idno' and 'cntry' (as numbers repeat across data)
round_7x_final <- merge(round_7x, ESSinterviewer2, by=c("idno","cntry"), sort=TRUE)
round_7x_final[] <- lapply(round_7x_final, unclass)
# 'blgetmg' - beign ethnic minority --> 'minority'
round_7x_final$minority[round_7x_final$blgetmg == 2] <- "No"
round_7x_final$minority[round_7x_final$blgetmg == 1] <- "Yes"
# 'maritalb' --> 'marital_status'
round_7x_final$marital_status[round_7x_final$maritalb == 1] <- "Married/Union"
round_7x_final$marital_status[round_7x_final$maritalb == 2] <- "Married/Union"
round_7x_final$marital_status[round_7x_final$maritalb == 3] <- "Separated/Divorced"
round_7x_final$marital_status[round_7x_final$maritalb == 4] <- "Separated/Divorced"
round_7x_final$marital_status[round_7x_final$maritalb == 5] <- "Windowed"
round_7x_final$marital_status[round_7x_final$maritalb == 6] <- "Never married"
# 'uemp5yr' - any periods not working in last 5 years --> 'unemployed5yr'
round_7x_final$unemployed5yr[is.na(round_7x_final$uemp5yr)] <- 0
round_7x_final$unemployed5yr[round_7x_final$uemp5yr == 2] <- 0
round_7x_final$unemployed5yr[round_7x_final$uemp5yr == 1] <- 1
# 'eisced' - 7 levels of education --> 'education'
round_7x_final$education[round_7x_final$eisced == 1] <- "1-Lower secondary"
round_7x_final$education[round_7x_final$eisced == 2] <- "1-Lower secondary"
round_7x_final$education[round_7x_final$eisced == 3] <- "2-Upper secondary"
round_7x_final$education[round_7x_final$eisced == 4] <- "2-Upper secondary"
round_7x_final$education[round_7x_final$eisced == 5] <- "3-Vocational"
round_7x_final$education[round_7x_final$eisced == 6] <- "4-Tertiary"
round_7x_final$education[round_7x_final$eisced == 7] <- "4-Tertiary"
# 'hincfel' subjective income --> 'subj_income'
round_7x_final$subj_income[round_7x_final$hincfel == 1] <- "1-Living comfortably"
round_7x_final$subj_income[round_7x_final$hincfel == 2] <- "2-Coping"
round_7x_final$subj_income[round_7x_final$hincfel == 3] <- "3-Difficult"
round_7x_final$subj_income[round_7x_final$hincfel == 4] <- "3-Difficult"
# 'polintr' - interest in politics - look at the distribution --> stays the same
# Political efficacy scale - mean of psppsgv, actrolg, psppipl, cptppol, ptcpplt, etapapl --> 'polit_efficacy'
round_7x_final$polit_efficacy=rowMeans(round_7x_final[,c("psppsgv", "actrolg", "psppipl", "cptppol", "ptcpplt", "etapapl")], na.rm=TRUE)
# Recoding code
round_7x_final$contact_freq[round_7x_final$dfegcon == 1] <- "1-Never"
round_7x_final$contact_freq[round_7x_final$dfegcon == 2] <- "2-Rarely"
round_7x_final$contact_freq[round_7x_final$dfegcon == 3] <- "2-Rarely"
round_7x_final$contact_freq[round_7x_final$dfegcon == 4] <- "3-Every month"
round_7x_final$contact_freq[round_7x_final$dfegcon == 5] <- "4-Every week"
round_7x_final$contact_freq[round_7x_final$dfegcon == 6] <- "4-Every week"
round_7x_final$contact_freq[round_7x_final$dfegcon == 7] <- "5-Everyday"
# 'smegbhw' - recode into No / Yes / Nonresponse --> 'racism'
round_7x_final$racism[is.na(round_7x_final$smegbhw)] <- "3-nonresponse"
round_7x_final$racism[round_7x_final$smegbhw == 2] <- "2-Yes"
round_7x_final$racism[round_7x_final$smegbhw == 1] <- "1-No"
# 'hinctnta' - Income nonresponse - new variable 0/1 - responded/not respondened --> 'income_nonresponse'
round_7x_final$income_nonresponse[is.na(round_7x_final$hinctnta)] <- "Nonresponse"
round_7x_final$income_nonresponse[!is.na(round_7x_final$hinctnta)] <- "Answer"
# 'resundq' - On scale 1-5, Recoded into dummy 1-Very often (5), 0 - No (1-4) --> 'resp_vo_understood'
round_7x_final$resp_vo_understood[round_7x_final$resundq == 1] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 2] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 3] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 4] <- "Not very often"
round_7x_final$resp_vo_understood[round_7x_final$resundq == 5] <- "Very often"
# 'resrelq' - On scale 1-5, Recoded into dummy 1-Yes (3/5), 0 - No (1-2) --> 'resp_often_reluctant'
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 1] <- "No"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 2] <- "No"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 3] <- "Yes"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 4] <- "Yes"
round_7x_final$resp_often_reluctant[round_7x_final$resrelq == 5] <- "Yes"
# 'preintf' - Recoded into dummy 1-Yes (1), 0 - No (2 & NA) --> 'someone_present'
round_7x_final$someone_present[round_7x_final$preintf == 1] <- "Yes"
round_7x_final$someone_present[round_7x_final$preintf == 2] <- "No"
round_7x_final$someone_present[is.na(round_7x_final$preintf)] <- "No"
# Renaming other variables into more meaningful names
names(round_7x_final)[names(round_7x_final) == "agea"] <- "age"
names(round_7x_final)[names(round_7x_final) == "gndr"] <- "sex"
names(round_7x_final)[names(round_7x_final) == "polintr"] <- "polit_intr"
names(round_7x_final)[names(round_7x_final) == "intgndr"] <- "interviewer_gender"
names(round_7x_final)[names(round_7x_final) == "intagea"] <- "interviewer_age"
# Nonresponse to impact on economy
round_7x_final$nonresponse_econ[is.na(round_7x_final$imbgeco)] <- "Nonresponse"
round_7x_final$nonresponse_econ[!is.na(round_7x_final$imbgeco)] <- "Response"
# Nonresponse to impact on economy
round_7x_final$nonresponse_cult[is.na(round_7x_final$imbleco)] <- "Nonresponse"
round_7x_final$nonresponse_cult[!is.na(round_7x_final$imbleco)] <- "Response"
library(dplyr)
# Dropping Isreal and 'other' education from the sample
round_7x_final2 <- round_7x_final[which(round_7x_final$education != 55),]
round_7x_final3 <- round_7x_final2[which(round_7x_final2$cntry != "Israel"),]
# Subsetting again
round_7x_finalx = subset(round_7x_final3, select = c(idno, cntry, minority, sex, age, marital_status, unemployed5yr, education, subj_income, polit_intr, polit_efficacy, contact_freq, racism, income_nonresponse, intnum, resp_vo_understood, resp_often_reluctant, someone_present, interviewer_age, interviewer_gender, nonresponse_econ, nonresponse_cult))
# round_7x_finalx[!is.na(round_7x_final4)]
# round_7x_finalx <- na.omit(round_7x_final4)
# summary(round_7x_finalx)
# Deleting all observations with missing data for key variables
# row.has.na <- apply(round_7x_final4, 1, function(x){any(is.na(x))})
# sum(row.has.na)
# round_7x_finalx <- round_7x_final4[!row.has.na,]
# round_7x_finalx <- na.omit(round_7x_final4)
library(summarytools)
freq(round_7x_finalx$nonresponse_econ, style = "rmarkdown", headings = FALSE, caption="Table 1. Frequency distribution of nonresponse to immigration impact of economy question")
library(summarytools)
freq(round_7x_finalx$nonresponse_cult, style = "rmarkdown", headings = FALSE, caption="Table 2. Frequency distribution of nonresponse to immigration impact of economy question")
library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc1 <- round_7x_finalx %>%
group_by(cntry,nonresponse_econ) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc1, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_econ))) +
geom_bar(stat="identity", width = 0.7) +
labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
theme_minimal(base_size = 14) +
ggtitle("Figure 1. Nonresponse to impact on economy item \n across ESS 2014 countries")
library(dplyr)
# Data in a form of percentages per country first
ESS_2014_perc2 <- round_7x_finalx %>%
group_by(cntry,nonresponse_cult) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
library(ggplot2)
# Graph whether respondents answered question on impact of immigration by country
ggplot(ESS_2014_perc2, aes(x = factor(cntry), y = perc*100, fill = factor(nonresponse_cult))) +
geom_bar(stat="identity", width = 0.7) +
labs(x = "Country", y = "Percentage", fill = "nonresponse") + coord_flip() +
theme_minimal(base_size = 14) +
ggtitle("Figure 1. Nonresponse to impact on culture item \n across ESS 2014 countries")
# Nonrepose into numeric
round_7x_finalx$nonresponse_econ1[round_7x_finalx$nonresponse_econ == "Nonresponse"] <- 1
round_7x_finalx$nonresponse_econ1[round_7x_finalx$nonresponse_econ == "Response"] <- 0
round_7x_finalx$nonresponse_econ1<-as.numeric(round_7x_finalx$nonresponse_econ1)
round_7x_finalx$nonresponse_cult1[round_7x_finalx$nonresponse_cult == "Nonresponse"] <- 1
round_7x_finalx$nonresponse_cult1[round_7x_finalx$nonresponse_cult == "Response"] <- 0
round_7x_finalx$nonresponse_cult1<-as.numeric(round_7x_finalx$nonresponse_cult1)
library(lme4)
# Baseline models
M0a <- glmer(nonresponse_econ1 ~ (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
M0b <- glmer(nonresponse_cult1 ~ (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
tab_model(M0a, M0b,
dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
string.ci = "Conf. Int (95%)",
p.style = "a", title = "Table 3. Multilevel logistic regression models of nonreponse - baseline / empty model"
)
# Individual-level variables
M1 <- glmer(nonresponse_econ1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
# Interview variables
M2 <- glmer(nonresponse_cult1 ~ minority + factor(sex) + age + marital_status + unemployed5yr + education + subj_income + polit_intr + polit_efficacy + contact_freq + racism + income_nonresponse + interviewer_age + interviewer_gender + resp_vo_understood + resp_often_reluctant + someone_present + (1 | cntry) + (1 | intnum), data = round_7x_finalx, family = "binomial", nAGQ=0)
tab_model(M1, M2,
dv.labels = c("Nonresponse: Economy", "Nonresponse: Culture"),
pred.labels = c("Intercept", "Ethnic minority", "Female", "Age", "Marital status Ref. Married: </br> Never married", "Separated/Divorced", "Windowed",
"Uneployed in last 5 years", "Education Ref. No/Primary: </br> Upper secondary", "Vocational", "Tertiary",
"Subjective income Ref. Comfortable:</br> Coping", "Difficult", "Political interest", "Political efficacy",
"Contact frequency Ref. No:</br> Rarely", "Every month", "Every week", "Everyday", "Racism item: Yes", "Racism item: Nonresponse",
"Income item: nonresponse", "Interviewer: age", "Interviewer: female", "Respondent: understood", "Respondent: reluctant", "Interview: Someone present"),
string.ci = "Conf. Int (95%)",
p.style = "a", title = "Table 4. Multilevel logistic regression models of nonreponse to questions measuring opinions on immigration"
)
plot_model(M1, type = "pred", terms = "sex [1, 2]", title="Figure 3. Predicted probabilities of nonresponse to impact on economy item")
plot_model(M2, type = "pred", terms = "sex [1, 2]", title="Figure 4. Predicted probabilities of nonresponse to impact on culture item")
plot_model(M1, type = "pred", terms = "age [20, 70]", title="Figure 5. Predicted probabilities of nonresponse to impact on economy item")
plot_model(M2, type = "pred", terms = "age [20, 70]", title="Figure 6. Predicted probabilities of nonresponse to impact on culture item")
$(document).ready(function() {
$('.footnotes ol').appendTo('#endnotes');
$('.footnotes').remove();
});
library(summarytools)
dfSummary(round_7x_finalx, plain.ascii = FALSE, style = 'grid', graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp", headings = FALSE, caption="Table 1. Summary of data frame")There are still some dissimilarities in the final list of variables I use. I do not add any country-level variables, as I am not interested in re-testing such contextual effects↩