Group Members: Yilin Zhou 1009413207, Luwei Yan 1009583381
A brief introduction: Our research question is “Is there a relationship between Norway’s admission of immigrants from poor countries and the efficiency of healthcare?”
Our hypotheses are: H0: There is no relationship between Norway’s willingness to accept immigrants from poor countries and the efficiency of healthcare.
H1: The more willing Norway is to accept immigrants from poor countries, the higher the efficiency of healthcare.
H2: The strength of the relationship between Norway’s willingness to accept immigrants from poor countries and the efficiency of healthcare depends on satisfaction with the country’s economy: the higher the satisfaction with the country’s economy, the stronger the link between willingness to accept immigrants and healthcare efficiency.
In our research the outcome variable is the efficiency of healthcare (hlthcef). The predictor variables are willingness to accept immigrants from poor countries (impcntr) and satisfaction with the country’s economy (stfeco), which is also interacted with impcntr.
First we load the packages needed for the analysis.
# List of packages
packages <- c("tidyverse", "modelsummary", "forcats", "RColorBrewer",
"fst", "viridis", "knitr", "rmarkdown", "ggridges", "questionr", "flextable", "infer", "broom")
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# invisible() suppresses the printed list of attached packages
invisible(lapply(packages, library, character.only = TRUE))
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: viridisLite
##
##
## Attaching package: 'flextable'
##
##
## The following object is masked from 'package:purrr':
##
## compose
ess <- read_fst("All-ESS-Data.fst")
Context
To set the context, we first build a summary table for our three main variables to see their means, missing values, and so on, and to get a basic understanding of the variables we are going to study. Filtering Norway Data: The dataset is filtered to entries where the country code cntry is “NO”, which stands for Norway, the country chosen for this analysis.
Norway_data <- ess %>%
filter(cntry == "NO")
write_fst(Norway_data, "~/Desktop/YiLin_Zhou_Project_202/Norway_data.fst")
Saving and Clearing Memory: The Norway_data is saved to disk, and then the memory is cleared to optimize performance.
rm(list=ls()); gc()
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 1274107 68.1 2132372 113.9 NA 2132372 113.9
## Vcells 2175646 16.6 1256286816 9584.8 16384 1348133646 10285.5
df <- read_fst("~/Desktop/YiLin_Zhou_Project_202/Norway_data.fst")
df$year <- NA
# ESS rounds 1 to 10 were fielded every two years, from 2002 to 2020
replacements <- seq(2002, 2020, by = 2)
for(i in 1:10){
df$year[df$essround == i] <- replacements[i]
}
Reading Data and Assigning Years: The saved dataset is read back into memory, and a new variable year is created from the ESS round number.
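The round-to-year mapping can also be written without a loop. This is a minimal sketch, assuming the ten ESS rounds were fielded every two years from 2002 to 2020; `toy` and `ess_years` are illustrative names, not objects from the project.

```r
# Toy stand-in for the ESS data: one row per respondent with the round number.
toy <- data.frame(essround = c(1, 4, 4, 10))

# Survey year for rounds 1..10 (assumed biennial, 2002 to 2020).
ess_years <- seq(2002, 2020, by = 2)

# Index the lookup vector by round number instead of looping over rounds.
toy$year <- ess_years[toy$essround]
print(toy)
```

Indexing a lookup vector keeps the mapping in one place, and a round number without an entry simply yields NA.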
Next we recode the categories and clean the outcome and the predictors. Outcome: hlthcef. Predictors: impcntr and stfeco (for the interaction).
We use the coding methods from homework 6, 7, and 8 because they make it easier to understand and organize what we are doing.
Norway_data <- df
# Note: each block below starts again from Norway_data, so only the last
# block (the NA recoding) is kept in Norway_data_table_subset. The label
# recodes document what the codes mean; they are not carried forward.
Norway_data_table_subset <- Norway_data %>%
  mutate(
    impcntr = case_when(
      impcntr == 1 ~ "Allow many to come and live here",
      impcntr == 2 ~ "Allow some",
      impcntr == 3 ~ "Allow a few",
      impcntr == 4 ~ "Allow none",
      TRUE ~ as.character(impcntr)
    )
  )
Norway_data_table_subset <- Norway_data %>%
  mutate(
    hlthcef = case_when(
      hlthcef == 0 ~ "Extremely inefficient",
      hlthcef == 10 ~ "Extremely efficient",
      TRUE ~ as.character(hlthcef)
    )
  )
Norway_data_table_subset <- Norway_data %>%
  mutate(
    stfeco = case_when(
      stfeco == 0 ~ "Extremely dissatisfied",
      stfeco == 10 ~ "Extremely satisfied",
      TRUE ~ as.character(stfeco)
    )
  )
# Recode the survey's missing-value codes to NA; the variables stay numeric
Norway_data_table_subset <- Norway_data %>%
  mutate(
    impcntr = ifelse(impcntr %in% c(7, 8, 9), NA, impcntr),
    hlthcef = ifelse(hlthcef %in% c(77, 88, 99), NA, hlthcef),
    stfeco = ifelse(stfeco %in% c(77, 88, 99), NA, stfeco)
  )
Transforming Categories: The impcntr, hlthcef, and stfeco codes are mapped to descriptive labels to document what the numeric values mean. Because each block starts again from Norway_data, these labelled versions are overwritten and only the final block is kept. Handling Missing Values: The final block sets the survey’s refusal, don’t know, and no answer codes (7, 8, 9 for impcntr; 77, 88, 99 for hlthcef and stfeco) to NA, so the variables stay numeric for the summary table.
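One way to keep both the cleaned numeric codes and the readable labels is to recode the missing-value codes first and attach labels in a separate column. The sketch below uses base R on made-up data; `recode_missing`, `toy`, and `impcntr_label` are hypothetical names introduced for illustration, not part of the project code.

```r
# Hypothetical helper: turn survey missing-value codes into NA.
recode_missing <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

# Made-up responses, including some missing-value codes.
toy <- data.frame(
  impcntr = c(1, 2, 4, 8),
  hlthcef = c(5, 77, 10, 0),
  stfeco  = c(7, 88, 99, 3)
)

toy$impcntr <- recode_missing(toy$impcntr, c(7, 8, 9))
toy$hlthcef <- recode_missing(toy$hlthcef, c(77, 88, 99))
toy$stfeco  <- recode_missing(toy$stfeco,  c(77, 88, 99))

# Labels go into a new factor column after cleaning, so the numeric
# column stays available for summaries and models.
toy$impcntr_label <- factor(
  toy$impcntr,
  levels = 1:4,
  labels = c("Allow many to come and live here", "Allow some",
             "Allow a few", "Allow none")
)
print(toy)
```

Keeping the labels in their own column means a later step cannot accidentally overwrite the numeric version.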
table(Norway_data$impcntr)
##
## 1 2 3 4 7 8 9
## 2822 7880 4605 646 23 78 11
table(Norway_data$hlthcef)
##
## 0 1 2 3 4 5 6 7 8 9 10 77 88
## 8 31 83 165 177 294 258 296 176 46 9 1 5
table(Norway_data$stfeco)
##
## 0 1 2 3 4 5 6 7 8 9 10 77 88 99
## 121 92 268 653 779 1573 1653 3019 3971 2342 1477 6 98 13
Tables of Frequency: The table function creates frequency tables for impcntr, hlthcef, and stfeco, giving an initial look at the distribution of responses. We use them to check all observations of the three variables and to double-check the out-of-range codes (7, 8, 9 and 77, 88, 99).
summary_table <- datasummary_skim(Norway_data_table_subset %>% select(impcntr, hlthcef, stfeco), output = "flextable")
## Warning: The histogram argument is only supported for (a) output types "default",
## "html", "kableExtra", or "gt"; (b) writing to file paths with extensions
## ".html", ".jpg", or ".png"; and (c) Rmarkdown, knitr or Quarto documents
## compiled to PDF (via kableExtra) or HTML (via kableExtra or gt). Use
## `histogram=FALSE` to silence this warning.
Summary Table Creation: The datasummary_skim function creates a table of descriptive statistics for the selected variables.
summary_table
| | Unique (#) | Missing (%) | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| impcntr | 5 | 1 | 2.2 | 0.8 | 1.0 | 2.0 | 4.0 |
| hlthcef | 12 | 90 | 5.4 | 2.0 | 0.0 | 6.0 | 10.0 |
| stfeco | 12 | 1 | 7.0 | 2.1 | 0.0 | 7.0 | 10.0 |
Here is the summary table; a fuller explanation is on the poster.
Norway_data_v2 <- Norway_data_table_subset %>%
rename(
`Allow many/few immigrants from poorer countries outside Europe` = impcntr,
`Provision of health care, how efficient` = hlthcef,
`How satisfied with present state of economy in country` = stfeco
)
summary_table_v2 <- datasummary_skim(Norway_data_v2 %>% select(`Allow many/few immigrants from poorer countries outside Europe`,`Provision of health care, how efficient`, `How satisfied with present state of economy in country`), output = "flextable", title = "Table 1: Descriptive Statistics for outcome variables")
## Warning: The histogram argument is only supported for (a) output types "default",
## "html", "kableExtra", or "gt"; (b) writing to file paths with extensions
## ".html", ".jpg", or ".png"; and (c) Rmarkdown, knitr or Quarto documents
## compiled to PDF (via kableExtra) or HTML (via kableExtra or gt). Use
## `histogram=FALSE` to silence this warning.
Now we rename the title and the variable names to make the table easier to read.
summary_table_v2
| | Unique (#) | Missing (%) | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| Allow many/few immigrants from poorer countries outside Europe | 5 | 1 | 2.2 | 0.8 | 1.0 | 2.0 | 4.0 |
| Provision of health care, how efficient | 12 | 90 | 5.4 | 2.0 | 0.0 | 6.0 | 10.0 |
| How satisfied with present state of economy in country | 12 | 1 | 7.0 | 2.1 | 0.0 | 7.0 | 10.0 |
“Allow many/few immigrants” has five unique values (the four response categories plus NA) and little missing data (1%). Its mean of 2.2 and standard deviation of 0.8 on the 1 to 4 scale, where 1 means “allow many” and 4 means “allow none”, indicate that responses cluster around “allow some”, with limited variation.
“Health care efficiency” measures the perceived efficiency of health care on a scale from 0 to 10. It has twelve unique values (the eleven scale points plus NA), a mean of 5.4, and a very high share of missing data (90%). The frequency table above contains only 1,549 coded responses for this variable, six of which are 77 or 88, so nearly all of the missingness is already NA in the raw data. This pattern suggests that most respondents were never asked the question, rather than that they found it hard to answer, and it sharply reduces the sample available for the analysis.
“Economic satisfaction” also has twelve unique values and 1% missing data. Its mean of 7.0 and median of 7.0 on the 0 to 10 scale indicate that respondents are, on the whole, satisfied with the economy, while the standard deviation of 2.1 shows a fairly wide spread in satisfaction among respondents.
Overall, the data suggest moderate openness towards immigration: the mean of 2.2 lies slightly on the permissive side of the scale midpoint (2.5). Perceived health care efficiency is moderate, with a mean of 5.4 out of 10, but it rests on about a tenth of the sample. Economic satisfaction is relatively high, with a mean of 7 out of 10 and more variability in how respondents view the country’s economic state. These results give a first picture of public opinion in Norway on these issues.
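To make the columns of the descriptive table concrete, the sketch below recomputes them in base R for a made-up vector. `describe_var` and `toy_impcntr` are hypothetical names, and counting NA as one of the unique values is an assumption chosen to match the count of 5 reported for impcntr (four categories plus NA).

```r
# Hypothetical helper reproducing the table's columns for one numeric variable.
describe_var <- function(x) {
  c(
    unique      = length(unique(x)),          # NA counts as a distinct value here
    missing_pct = round(100 * mean(is.na(x))),
    mean        = mean(x, na.rm = TRUE),
    sd          = sd(x, na.rm = TRUE),
    min         = min(x, na.rm = TRUE),
    median      = median(x, na.rm = TRUE),
    max         = max(x, na.rm = TRUE)
  )
}

# Made-up responses on the 1 to 4 scale, with one missing value.
toy_impcntr <- c(1, 2, 2, 3, 4, NA, 2, 2, 1, 3)
res <- describe_var(toy_impcntr)
print(round(res, 1))
```

The mean and standard deviation are computed on the non-missing responses only, which is why a variable with 90% missing data can still show an ordinary-looking mean.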
flextable::save_as_docx(summary_table_v2, path = "summary_table_v2.docx",
width = 7.0, height = 7.0)
set_flextable_defaults(fonts_ignore=TRUE)
print(summary_table_v2, preview = "pdf")
## a flextable object.
## col_keys: ` `, `Unique (#)`, `Missing (%)`, `Mean`, `SD`, `Min`, `Median`, `Max`
## header has 1 row(s)
## body has 3 row(s)
## original dataset sample:
## Unique (#)
## 1 Allow many/few immigrants from poorer countries outside Europe 5
## 2 Provision of health care, how efficient 12
## 3 How satisfied with present state of economy in country 12
## Missing (%) Mean SD Min Median Max
## 1 1 2.2 0.8 1.0 2.0 4.0
## 2 90 5.4 2.0 0.0 6.0 10.0
## 3 1 7.0 2.1 0.0 7.0 10.0
Since we could not save the table as a PDF directly, we first saved it as a Word document and then converted it to PDF. The summary table covers the three key variables of the research: immigration attitudes, perceived health care efficiency, and satisfaction with the economy in Norway. The choice of Norway as the country of focus still needs to be justified in terms of the research objectives; the code alone does not show it. Table 1 describes the central tendency and variability of each variable and the share of missing responses.
Research Hypotheses
Next we code the plot of the null distribution.
Norway_data <- df %>%
filter(cntry == "NO")
Norway_data <- Norway_data %>%
mutate(
impcntr = case_when(
impcntr == 1 ~ "Allow many to come and live here",
impcntr == 2 ~ "Allow some",
impcntr == 3 ~ "Allow a few",
impcntr == 4 ~ "Allow none",
TRUE ~ as.character(impcntr)
)
)
Norway_data <- Norway_data %>%
mutate(
hlthcef = case_when(
hlthcef == 0 ~ "Extremely inefficient",
hlthcef == 10 ~ "Extremely efficient",
TRUE ~ as.character(hlthcef)
)
)
Data Filtering and Transformation: Filtering for Norwegian data and recoding categorical variables like impcntr and hlthcef into more descriptive labels.
test_stat <- Norway_data %>%
specify(explanatory = impcntr,
response = hlthcef) %>%
hypothesize(null = "independence") %>%
calculate(stat = "Chisq")
## Warning: Removed 14516 rows containing missing values.
Statistical Testing (Chi-Squared): We perform a chi-squared test of independence between attitudes towards immigration (impcntr) and perceptions of health care efficiency (hlthcef). test_stat holds the observed chi-squared statistic; the null distribution is generated in a later step. Because this block does not recode the missing-value codes (7, 8, 9 and 77, 88, 99) to NA, they enter the test as ordinary categories.
print(test_stat$stat)
## X-squared
## 57.00027
An X-squared value of 57.00027 is the observed test statistic. It measures how far the observed cell counts are from the counts expected if the two categorical variables were independent; larger values indicate a larger departure from independence. To judge significance, the statistic is compared either with a critical value from the chi-squared distribution with the appropriate degrees of freedom or with a simulated null distribution.
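The statistic can be reproduced by hand from a cross-tabulation. The sketch below uses a made-up 2 × 3 table (the counts are illustrative, not taken from the ESS data) and checks the hand calculation against base R’s `chisq.test`.

```r
# Made-up cross-tabulation of two categorical variables.
observed <- matrix(
  c(20, 30, 10,
    25, 25, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(attitude   = c("Allow some", "Allow a few"),
                  efficiency = c("low", "mid", "high"))
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chisq_by_hand <- sum((observed - expected)^2 / expected)

# Cross-check against base R, without continuity correction.
chisq_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)

print(c(by_hand = chisq_by_hand, builtin = chisq_builtin))
```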
null_distribution <- Norway_data %>%
specify(explanatory = impcntr,
response = hlthcef) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "chisq")
## Warning: Removed 14516 rows containing missing values.
p_val <- null_distribution %>%
get_pvalue(obs_stat = test_stat, direction = "two-sided")
p_val
## # A tibble: 1 × 1
## p_value
## <dbl>
## 1 0.726
P-Value Calculation: The p-value assesses the significance of the observed relationship. The p-value of 0.726 is much higher than the usual alpha level of 0.05, so we fail to reject the null hypothesis that the two variables are independent. In other words, there is no strong evidence of an association between [impcntr] Allow many/few immigrants from poorer countries outside Europe and [hlthcef] Provision of health care, how efficient. Note that the code uses direction = "two-sided", although chi-squared tests are normally right-tailed (direction = "greater"); the infer warnings printed with the plots further down make the same point.
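The logic behind the permutation p-value can be sketched in base R without infer. Everything here is illustrative: the data are simulated with no association, `chisq_stat` is a hypothetical helper, and the p-value is computed as a right-tailed proportion, which is the usual convention for chi-squared statistics rather than the two-sided option used above.

```r
set.seed(202)  # arbitrary seed so the sketch is reproducible

# Simulated data: two categorical variables with no built-in association.
n <- 300
x <- sample(c("many", "some", "few", "none"), n, replace = TRUE)
y <- sample(c("low", "mid", "high"), n, replace = TRUE)

# Hypothetical helper: Pearson chi-squared statistic for two categorical vectors.
chisq_stat <- function(a, b) {
  suppressWarnings(unname(chisq.test(table(a, b), correct = FALSE)$statistic))
}

observed_stat <- chisq_stat(x, y)

# Null distribution: shuffle y to break any link with x, then recompute.
null_stats <- replicate(1000, chisq_stat(x, sample(y)))

# Right-tailed p-value: share of permuted statistics at least as large
# as the observed one.
p_value <- mean(null_stats >= observed_stat)
print(p_value)
```

Shuffling one variable keeps both marginal distributions fixed, so the permuted statistics show what values arise from chance alone for a table of this shape.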
alpha <- 0.05
c_val <- qchisq(alpha, df = 1, lower.tail = FALSE)
cat("Critical Value:", round(c_val, 3), "\n")
## Critical Value: 3.841
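These steps compute a critical value to compare with the chi-squared statistic. The critical value at the 0.05 significance level is 3.841, but it is based on df = 1, which applies only to a 2 × 2 table. For a test of independence the degrees of freedom are (rows − 1) × (columns − 1), and with several categories in both impcntr and hlthcef the appropriate critical value is much larger. The permuted statistics printed further down, which range from about 33 to 63 in the first ten replicates, show the scale the statistic takes under independence for this table. Comparing the observed statistic of 57.00027 with 3.841 therefore does not justify rejecting the null hypothesis, and the conclusion from the permutation p-value stands: there is no strong evidence of an association between [impcntr] Allow many/few immigrants from poorer countries outside Europe and [hlthcef] Provision of health care, how efficient.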
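The dependence of the critical value on the degrees of freedom can be shown with a short calculation. The table dimensions below are hypothetical (4 immigration categories by 11 efficiency scores); the real dimensions depend on how many distinct categories remain in the data after cleaning.

```r
# Hypothetical table dimensions: 4 immigration categories x 11 efficiency scores.
n_rows <- 4
n_cols <- 11

# Degrees of freedom for a chi-squared test of independence.
dof <- (n_rows - 1) * (n_cols - 1)

alpha <- 0.05
critical_value <- qchisq(alpha, df = dof, lower.tail = FALSE)

# For comparison: the df = 1 value, which only suits a 2 x 2 table.
critical_value_df1 <- qchisq(alpha, df = 1, lower.tail = FALSE)

cat("df:", dof, " critical value:", round(critical_value, 3), "\n")
cat("df: 1  critical value:", round(critical_value_df1, 3), "\n")
```

Under these assumed dimensions the threshold is more than ten times larger than the df = 1 value, which is why the choice of degrees of freedom decides the conclusion.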
null_distribution %>%
visualize() +
shade_p_value(obs_stat = test_stat, direction = "two-sided")
## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.
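Interval: The plot shows the null distribution with the p-value region shaded. The interval added in the next step is the 95% percentile interval of the permuted statistics, which marks where most statistics fall under independence; it is not a confidence interval for the observed statistic.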
null_distribution
## Response: hlthcef (factor)
## Explanatory: impcntr (factor)
## Null Hypothesis: independence
## # A tibble: 1,000 × 2
## replicate stat
## <int> <dbl>
## 1 1 53.5
## 2 2 62.8
## 3 3 33.0
## 4 4 42.0
## 5 5 32.5
## 6 6 49.8
## 7 7 48.8
## 8 8 38.6
## 9 9 41.4
## 10 10 43.6
## # ℹ 990 more rows
conf_int <- null_distribution %>%
get_confidence_interval(level = 0.95, type = "percentile")
null_distribution %>%
visualize() +
shade_p_value(obs_stat = test_stat, direction = "two-sided") +
shade_confidence_interval(endpoints = conf_int)
## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.
The histogram shows the distribution of simulated chi-squared statistics
under the null hypothesis. It has four bars of very uneven height. The
second bar, with a count of around 880, contains most of the simulated
values. The first bar has a count of almost zero, the third a count of
about 62.5, and the fourth a very small count, probably around 10.
Given the p-value of 0.726, the observed statistic from our data is not
extreme enough to reject the null hypothesis of independence. This means
that there is no strong evidence of a significant association between
immigration attitudes (impcntr) and health care efficiency perceptions
(hlthcef). Overall, the results suggest that the distribution of views
on health care efficiency does not vary significantly across attitudes
towards immigration, at least not to a degree detectable by this test.
plot <- null_distribution %>%
visualize() +
shade_p_value(obs_stat = test_stat, direction = "two-sided") +
shade_confidence_interval(endpoints = conf_int)
ggsave("output_hypothesis_plot.pdf", plot, width = 8, height = 6)
## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.
Here we save the plot as a PDF file.
Finding
Regression
Now we run the regressions for the findings.
df <- df %>% filter(!is.na(hlthcef))
df <- df %>% filter(!is.na(impcntr))
df <- df %>% filter(!is.na(stfeco))
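Data Preparation: Rows with missing values in hlthcef, impcntr, or stfeco are filtered out. Note that df holds the raw codes: the recoding of 7, 8, 9 and 77, 88, 99 to NA was applied to Norway_data_table_subset, not to df, so these filters only drop values that are already NA. The 1,549 observations used in the models below equal the full frequency table for hlthcef, including the six responses coded 77 or 88, so the regression results should be read with that caveat.
This step took us longer than the others because the code kept returning errors.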
df$weight <- df$dweight * df$pweight
model1 <- lm(hlthcef ~ impcntr, data = df, weights = weight)
model2 <- lm(hlthcef ~ impcntr + stfeco, data = df, weights = weight)
model3 <- lm(hlthcef ~ impcntr + stfeco + impcntr * stfeco, data = df, weights = weight)
modelsummary(
list(model1, model2, model3),
fmt = 1,
estimate = c("{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}"),
statistic = NULL,
coef_omit = "ei" # Adjust based on your variable names
)
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 5.9 (0.4)*** | 4.9 (0.4)*** | 6.8 (0.6)*** |
| impcntr | −0.1 (0.2) | −0.1 (0.2) | −0.9 (0.2)*** |
| stfeco | | 0.2 (0.0)*** | −0.1 (0.1)* |
| impcntr × stfeco | | | 0.1 (0.0)*** |
| Num.Obs. | 1549 | 1549 | 1549 |
| R2 | 0.000 | 0.049 | 0.062 |
| R2 Adj. | −0.001 | 0.047 | 0.060 |
| AIC | 9620.5 | 9545.6 | 9525.6 |
| BIC | 9636.6 | 9566.9 | 9552.3 |
| Log.Lik. | −4807.273 | −4768.778 | −4757.793 |
| RMSE | 5.39 | 5.26 | 5.22 |
Weight Calculation: A new weight variable is created by multiplying dweight and pweight. Regression Modeling: Three linear models of increasing complexity are fitted: a single-predictor model, a two-predictor model, and an interaction model involving both predictors.
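The structure of the weighted models can be illustrated on simulated data. All values below are made up (including the coefficients used to simulate the outcome), and `toy`, `m_main`, and `m_int` are illustrative names; the point is only how the weight and the interaction term enter `lm`.

```r
set.seed(1)  # arbitrary seed for a reproducible toy example

n <- 200
toy <- data.frame(
  impcntr = sample(1:4, n, replace = TRUE),
  stfeco  = sample(0:10, n, replace = TRUE),
  dweight = runif(n, 0.5, 1.5),
  pweight = 1
)

# Simulated outcome with a small interaction effect (made-up coefficients).
toy$hlthcef <- 6 - 0.5 * toy$impcntr + 0.05 * toy$impcntr * toy$stfeco +
  rnorm(n)

# Combined analysis weight, as in the report.
toy$weight <- toy$dweight * toy$pweight

# In a formula, a * b expands to a + b + a:b, so the main effects
# do not need to be listed separately.
m_main <- lm(hlthcef ~ impcntr + stfeco, data = toy, weights = weight)
m_int  <- lm(hlthcef ~ impcntr * stfeco, data = toy, weights = weight)

print(names(coef(m_int)))
```

Writing `impcntr * stfeco` gives the same model as `impcntr + stfeco + impcntr * stfeco`, so the shorter form is enough.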
modelsummary(
list(model1, model2, model3),
fmt = 1,
estimate = c("{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}"),
statistic = NULL,
coef_omit = "ei",
coef_rename = c("hlthcef" = "Provision of health care, how efficient", "impcntr" = "Allow many/few immigrants from poorer countries outside Europe", "stfeco" = "How satisfied with present state of economy in country"),
title = 'Table 3. Regression models predicting health care efficiency' )
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 5.9 (0.4)*** | 4.9 (0.4)*** | 6.8 (0.6)*** |
| Allow many/few immigrants from poorer countries outside Europe | −0.1 (0.2) | −0.1 (0.2) | −0.9 (0.2)*** |
| How satisfied with present state of economy in country | | 0.2 (0.0)*** | −0.1 (0.1)* |
| Allow many/few immigrants from poorer countries outside Europe:How satisfied with present state of economy in country | | | 0.1 (0.0)*** |
| Num.Obs. | 1549 | 1549 | 1549 |
| R2 | 0.000 | 0.049 | 0.062 |
| R2 Adj. | −0.001 | 0.047 | 0.060 |
| AIC | 9620.5 | 9545.6 | 9525.6 |
| BIC | 9636.6 | 9566.9 | 9552.3 |
| Log.Lik. | −4807.273 | −4768.778 | −4757.793 |
| RMSE | 5.39 | 5.26 | 5.22 |
Model Summary Presentation: modelsummary generates a table with formatted estimates, standard errors, and significance stars. The coefficients are renamed for clarity, and the summary is saved to a file.
Note on direction: impcntr is coded from 1 (“allow many”) to 4 (“allow none”), so a negative impcntr coefficient means that less willingness to accept immigrants goes with lower perceived efficiency.
Model 1 (Column 1): The intercept is highly significant, with an estimate of 5.9. impcntr is not significant, with an estimate of −0.1. R² is virtually zero, so immigration attitudes alone explain essentially none of the variance.
Model 2 (Column 2): The intercept is highly significant, with a lower estimate of 4.9. The impcntr estimate is unchanged from Model 1. stfeco is statistically significant (0.2), suggesting that economic satisfaction predicts perceived health care efficiency. R² improves to 0.049, so 4.9% of the variance is explained.
Model 3 (Column 3): The intercept has the highest estimate, 6.8. impcntr is now highly significant, with a larger negative estimate (−0.9). stfeco is significant at a weaker level (one star), with a negative estimate (−0.1). The impcntr × stfeco interaction is significant (0.1), indicating that the association between immigration attitudes and perceived efficiency changes with economic satisfaction. Because the model contains the interaction, the impcntr and stfeco coefficients are the slopes when the other variable equals zero, so they should not be read as overall effects. R² improves further to 0.062, explaining 6.2% of the variance.
The number of observations (1,549) is the same across the three models, so their fit statistics can be compared directly.
AIC (Akaike Information Criterion): A measure of the relative quality of a statistical model for a given set of data. AIC rewards goodness of fit but includes a penalty that grows with the number of estimated parameters; a lower AIC suggests a better model. The AIC decreases from Model 1 to Model 3, suggesting that fit improves as variables are added, even after accounting for the added complexity.
BIC (Bayesian Information Criterion): Similar to AIC, but with a heavier penalty for additional parameters. A lower BIC indicates a better model. The BIC also decreases from Model 1 to Model 3.
Log.Lik. (Log Likelihood): The logarithm of the likelihood function, which measures how probable the observed data are given the model’s parameters. A higher (here, less negative) value indicates a better fit. The log likelihood becomes less negative from Model 1 through Model 3.
Overall, R², adjusted R², AIC, BIC, and the log likelihood all point to Model 3, which includes both predictors and their interaction, as the best fit among the three models, although it still explains only about 6% of the variance. Significance levels are marked with stars, with more stars indicating stronger evidence against a zero coefficient.
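To read the interaction in Model 3, it helps to compute the slope of impcntr at different levels of stfeco, which is the impcntr coefficient plus the interaction coefficient times stfeco. The sketch below uses the coefficients as printed in the table, which are rounded to one decimal, so the slopes are approximate; `slope_impcntr` is a hypothetical helper.

```r
# Rounded Model 3 coefficients as printed in the regression table.
b_impcntr     <- -0.9
b_interaction <-  0.1

# Slope of impcntr at a given level of economic satisfaction.
slope_impcntr <- function(stfeco) b_impcntr + b_interaction * stfeco

stfeco_levels <- c(0, 5, 10)
slopes <- data.frame(
  stfeco = stfeco_levels,
  slope  = round(slope_impcntr(stfeco_levels), 1)
)
print(slopes)
```

With these rounded values the negative slope of impcntr shrinks towards zero as economic satisfaction rises, which is the pattern the interaction term describes.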
df$weight <- df$dweight * df$pweight
model1 <- lm(hlthcef ~ impcntr, data = df, weights = weight)
model2 <- lm(hlthcef ~ impcntr + stfeco, data = df, weights = weight)
model3 <- lm(hlthcef ~ impcntr + stfeco + impcntr * stfeco, data = df, weights = weight)
# Passing a file name to `output` lets modelsummary write the table itself.
# cat() only dumps plain text, so a ".pdf" written that way is not a valid PDF.
modelsummary(
list(model1, model2, model3),
fmt = 1,
estimate = c("{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}"),
statistic = NULL,
coef_omit = "ei",
coef_rename = c(
"hlthcef" = "Provision of health care, how efficient",
"impcntr" = "Allow many/few immigrants from poorer countries outside Europe",
"stfeco" = "How satisfied with present state of economy in country"
),
title = 'Table 3. Regression models predicting health care efficiency (0 to 10)',
output = "model_summary_regression_rename.txt"
)
The PDF file written with cat() could not be opened on my computer, so the table is saved as a text file instead.
par(mfrow = c(2,2))
plot(model2, pch = 19, col = scales::alpha("grey", .8), cex = .5)
1. Regression Assumptions Diagnostics (Residuals vs Fitted plot):
Clustering of Points and Outliers: Almost all of the points are
clustered on the far left of the plot, at fitted values of 4 to 6, with
a few outliers far from the horizontal line on the far right, at fitted
values of about 16 and 18. The clustering could mean that the response
varies little over a range of predictor values, but it could also mean
that the model is not appropriate for the data or that variables
explaining the variance are missing. The few outliers with extreme
fitted values could be influential observations that
disproportionately affect the model’s coefficients, and they should be
investigated further. A likely explanation is that df still contains
unrecoded missing-value codes (for example stfeco values of 77 or 88),
which would produce exactly this kind of isolated, extreme fitted
value. Non-Linear Fit: The points do not scatter randomly around a
horizontal line and instead show a pattern, which indicates that the
linear model is not capturing the relationships in the data well; a
non-linear specification might describe the relationship between the
predictors and the response better. Heteroscedasticity: The irregular
spread of the residuals as the fitted values change suggests
non-constant variance. This violates one of the key assumptions of
linear regression and affects the validity of the model’s standard
errors and of any inference based on them. Examining the points
farthest from the horizontal line may help identify outliers in the
data or possible model improvements.
2.Q-Q Plot it is used to assess the normality of residuals. Deviations of the plotted points from the straight line (45-degree line) indicate violations of normality. In this plot, the degree of deviation from the line reflects how much the distribution of residuals diverges from a normal distribution. In this Q-Q Plot General Trend: In this plot, the deviation from the straight line reflects how far the residual distribution deviates from the normal distribution. The point distribution in this image is a very gentle straight line, not arranged along the diagonal, but a straight line approximate to standardized residuals=0, but with a slight upward trend. The fact that points form a gentle straight line but not along the diagonal (45-degree line) could suggest that the residuals have a systematic deviation from normality. The upward trend of the line might imply a distribution with heavier tails than the normal distribution. Outliers: Most of the points are distributed on this gentle straight line, and some outliers are distributed on the rightmost side of the image where the theoretical quantiles is 3. Compared with most standardized residuals floating around 0, the standardized residuals of these outliers are between 12-15. The presence of outliers indicates extreme values that do not conform to the expected distribution under normality. These points are significantly deviating from the expected line, suggesting the presence of heavy tails or skewness in the distribution of residuals. The Q-Q Plot suggests that the normality assumption may not hold for the residuals of the regression model, which can impact the reliability of certain statistical tests and confidence intervals that assume normality. These findings might necessitate the use of robust statistical methods or transformations to address the non-normality of residuals.
3. Scale-Location Plot: This plot checks the homoscedasticity assumption, meaning that the residuals have constant variance across all fitted values. The y-axis shows the square root of the absolute standardized residuals, which should be spread randomly and evenly for all fitted values on the x-axis. U-shaped Distribution: The concave U-shaped pattern for fitted values between 4 and 8 suggests that the variance of the residuals is not constant. This pattern indicates heteroscedasticity, where the error variance changes across levels of the fitted values. Linearly Increasing Variance: The steadily rising pattern for fitted values between 8 and 12 implies that the residual variance increases with the fitted values, which again violates the assumption of constant variance. Clustering of Points: As in the Residuals vs Fitted Plot, the points cluster at the far left, at fitted values between 4 and 6, which indicates that the variance is smaller in this range. By contrast, the outliers with large residuals at fitted values of 16 and 18 on the far right indicate that the variance may be much larger for these points. Taken together, these patterns suggest that the homoscedasticity assumption of linear regression does not hold. This can distort the standard errors of the regression coefficients and make hypothesis tests and confidence intervals unreliable. Possible remedies include transforming the response variable or using heteroscedasticity-consistent standard errors. Investigating the cause of the heteroscedasticity could also reveal more about the underlying data structure or potential data issues.
# Open the PDF device first: par() only affects the currently active device
pdf("diagnostic_plots.pdf")
par(mfrow = c(2, 2))
plot(model2, pch = 19, col = scales::alpha("grey", .8), cex = .5)
dev.off()
## quartz_off_screen
## 2
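The remedy mentioned above, heteroscedasticity-consistent standard errors, can be sketched in base R. This is only an illustration under stated assumptions: it uses simulated data rather than our survey data, the object names (fit, vcov_hc0, se_table) are hypothetical, and it uses the simplest HC0 (White) estimator. It also flags observations with large standardized residuals, in the spirit of the outlier check described for the Q-Q Plot.

```r
# Minimal sketch on simulated data (NOT the survey data used in this report).
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
# The error spread grows with x, so the constant-variance assumption fails
y <- 2 + 0.5 * x + rnorm(n, sd = 0.2 + 0.3 * x)
fit <- lm(y ~ x)

# Flag observations whose standardized residual is larger than 3 in absolute value
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 3)

# HC0 (White) covariance matrix: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)  # multiplies row i of X by e[i], giving X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread

# Compare classical and robust standard errors side by side
se_table <- cbind(
  classical = sqrt(diag(vcov(fit))),
  robust_hc0 = sqrt(diag(vcov_hc0))
)
print(se_table)
print(length(flagged))
```

Applied to a fitted model such as model2, the same steps would replace fit. Whether the robust and classical standard errors differ noticeably depends on how strong the heteroscedasticity is.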
Interaction
Here, we add some new packages for our interaction graph
packages <- c("effects", "survey", "MASS", "equatiomatic")
# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load the packages
lapply(packages, library, character.only = TRUE)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## Loading required package: grid
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## [[1]]
## [1] "effects" "carData" "fstcore" "broom" "infer"
## [6] "flextable" "questionr" "ggridges" "rmarkdown" "knitr"
## [11] "viridis" "viridisLite" "fst" "RColorBrewer" "modelsummary"
## [16] "lubridate" "forcats" "stringr" "dplyr" "purrr"
## [21] "readr" "tidyr" "tibble" "ggplot2" "tidyverse"
## [26] "stats" "graphics" "grDevices" "utils" "datasets"
## [31] "methods" "base"
##
## [[2]]
## [1] "survey" "survival" "Matrix" "grid" "effects"
## [6] "carData" "fstcore" "broom" "infer" "flextable"
## [11] "questionr" "ggridges" "rmarkdown" "knitr" "viridis"
## [16] "viridisLite" "fst" "RColorBrewer" "modelsummary" "lubridate"
## [21] "forcats" "stringr" "dplyr" "purrr" "readr"
## [26] "tidyr" "tibble" "ggplot2" "tidyverse" "stats"
## [31] "graphics" "grDevices" "utils" "datasets" "methods"
## [36] "base"
##
## [[3]]
## [1] "MASS" "survey" "survival" "Matrix" "grid"
## [6] "effects" "carData" "fstcore" "broom" "infer"
## [11] "flextable" "questionr" "ggridges" "rmarkdown" "knitr"
## [16] "viridis" "viridisLite" "fst" "RColorBrewer" "modelsummary"
## [21] "lubridate" "forcats" "stringr" "dplyr" "purrr"
## [26] "readr" "tidyr" "tibble" "ggplot2" "tidyverse"
## [31] "stats" "graphics" "grDevices" "utils" "datasets"
## [36] "methods" "base"
##
## [[4]]
## [1] "equatiomatic" "MASS" "survey" "survival" "Matrix"
## [6] "grid" "effects" "carData" "fstcore" "broom"
## [11] "infer" "flextable" "questionr" "ggridges" "rmarkdown"
## [16] "knitr" "viridis" "viridisLite" "fst" "RColorBrewer"
## [21] "modelsummary" "lubridate" "forcats" "stringr" "dplyr"
## [26] "purrr" "readr" "tidyr" "tibble" "ggplot2"
## [31] "tidyverse" "stats" "graphics" "grDevices" "utils"
## [36] "datasets" "methods" "base"
Interaction Analysis: We use effect() from the effects package to visualize the interaction effect in the third model (model3) and save the plot as a PDF. For the visualization, we follow the interaction graph shown in the tutorial.
interaction_plot <- effect("impcntr * stfeco", model3, na.rm=TRUE)
plot(interaction_plot,
main="Interaction effect",
xlab="Allow many/few immigrants from poorer countries outside Europe",
ylab="Provision of health care, how efficient")
The interaction plot shows how the relationship between the explanatory
variable (attitudes towards immigration, impcntr) and the outcome
variable (perceived health care efficiency, hlthcef) depends on a third
variable (economic satisfaction, stfeco). Dependence on a Third
Variable: The effect of immigration attitudes on health care efficiency
perceptions is not constant but changes with the level of stfeco.
Interaction Effect: The lines are not parallel across the panels, which
indicates an interaction between the two predictors (impcntr and
stfeco). Slope Changes: As stfeco decreases, the slope of the line
relating immigration attitudes to health care efficiency also decreases,
so higher economic satisfaction amplifies the effect of immigration
attitudes on perceived health care efficiency. Error Margin: The error
margin narrows as stfeco increases, which implies that predictions are
more precise at higher levels of economic satisfaction. Varying Error
Range: The error band widens as impcntr increases, indicating greater
uncertainty in the predicted health care efficiency at higher values of
the immigration attitude variable. Specific Observations: For
stfeco=90, the steep slope implies a strong positive relationship, with
predicted values rising from about 4.3 to about 66.3 across the range of
impcntr. For stfeco=70 and stfeco=40, the positive relationship persists
but weakens as stfeco decreases. At stfeco=20, the line is much flatter
although still rising (from about 5.6 to about 14.7), so immigration
attitudes have a comparatively small effect at this lower level of
economic satisfaction. At stfeco=0, the relationship inverts: predicted
values fall from about 6.0 to about 0, suggesting that when economic
satisfaction is very low, more positive attitudes towards immigration
might correlate with lower perceived health care efficiency. Radial
Pattern: Overlaying the five lines would show a fan-like spread, which
is characteristic of interaction effects where the impact of one
variable changes at different levels of another. Overall, the plot
suggests that economic satisfaction modifies the relationship between
attitudes towards immigration and perceptions of health care efficiency,
and that policymakers should consider economic satisfaction when
assessing public opinion on health care and immigration policies.
interaction_plot <- effect("impcntr * stfeco", model3, na.rm = TRUE)
pdf("interaction_plot.pdf")
plot(interaction_plot,
main = "Interaction effect",
xlab = "Allow many/few immigrants from poorer countries outside Europe",
ylab = "Provision of health care, how efficient")
dev.off()
## quartz_off_screen
## 2
interaction_plot
##
## impcntr*stfeco effect
## stfeco
## impcntr 0 20 40 70 90
## 1 5.99024195 5.616386 5.242531 4.681747 4.307892
## 3 4.27356385 8.216906 12.160248 18.075261 22.018603
## 4 3.41522480 9.517166 15.619107 24.772018 30.873959
## 6 1.69854670 12.117685 22.536824 38.165532 48.584671
## 8 -0.01813141 14.718205 29.454542 51.559046 66.295383
Here is the equation
equatiomatic::extract_eq(model3, use_coefs = TRUE)
\[ \operatorname{\widehat{hlthcef}} = 6.85 - 0.86(\operatorname{impcntr}) - 0.13(\operatorname{stfeco}) + 0.11(\operatorname{impcntr} \times \operatorname{stfeco}) \]
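To make the interaction concrete, the equation can be rearranged so that the slope of impcntr depends on stfeco: the conditional slope is -0.86 + 0.11 × stfeco. The sketch below computes this slope at the stfeco levels used in the effect table and rebuilds the predictions from the rounded coefficients. The function names (simple_slope, predict_hlthcef) are hypothetical helpers introduced for illustration, and because the coefficients are rounded to two decimals the predictions only approximate the effect() output above.

```r
# Rounded coefficients copied from the fitted equation for model3
b0    <- 6.85   # intercept
b_imp <- -0.86  # impcntr
b_stf <- -0.13  # stfeco
b_int <- 0.11   # impcntr x stfeco

# Conditional (simple) slope of impcntr at a given level of stfeco
simple_slope <- function(stfeco) b_imp + b_int * stfeco

stfeco_levels <- c(0, 20, 40, 70, 90)
slopes <- setNames(simple_slope(stfeco_levels), stfeco_levels)
print(slopes)

# Level of stfeco at which the slope of impcntr changes sign
crossover <- -b_imp / b_int
print(crossover)

# Predicted hlthcef from the rounded equation
predict_hlthcef <- function(impcntr, stfeco) {
  b0 + b_imp * impcntr + b_stf * stfeco + b_int * impcntr * stfeco
}

# Rebuild an approximate version of the effect table
impcntr_levels <- c(1, 3, 4, 6, 8)
grid <- outer(impcntr_levels, stfeco_levels, predict_hlthcef)
dimnames(grid) <- list(impcntr = impcntr_levels, stfeco = stfeco_levels)
print(round(grid, 2))
```

The slope is negative at stfeco=0 and positive at every other level shown, which matches the inversion and the fan-like spread described for the interaction plot.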
Conclusion: Our study explored the relationship between the acceptance of immigrants from poorer countries and healthcare efficiency in Norway, taking into account satisfaction with the country’s economy. Contrary to our initial hypothesis, the chi-squared test and the visual representations provide no substantial evidence that Norway’s acceptance of immigrants is significantly associated with the efficiency of its healthcare system.
The chi-squared test returned a high p-value, so we failed to reject the null hypothesis, indicating no discernible association between immigration attitudes and healthcare efficiency. We rely on the p-value for this decision, because a test statistic that genuinely exceeded the critical value would correspond to a low p-value rather than a high one. This finding challenges the prevailing discourse that links immigration directly to the burdening of healthcare systems and underscores the need for a careful understanding of healthcare dynamics amidst demographic changes. The regression models refined this picture, showing that economic satisfaction significantly interacts with immigration attitudes in shaping healthcare efficiency perceptions. However, the diagnostic plots raised concerns about potential non-linearity and heteroscedasticity, suggesting that a linear model may not fully capture the complexities of the data.
The literature emphasized that Sub-Saharan African (SSA) immigrants in Norway face barriers both prior to and within the healthcare system. Challenges such as language barriers, financial constraints, long waiting times, and perceived discrimination impede equal healthcare access. Our study echoes these findings, highlighting that healthcare access is influenced by more than just affordability: it is also shaped by awareness, cultural sensitivity, and systemic responsiveness. The preference among SSA immigrants for healthcare providers with immigrant backgrounds and for private healthcare services indicates mistrust in the public healthcare system and a lack of cultural competence among healthcare professionals. This reflects a broader issue of inequity and suggests a need for policies that ensure healthcare professionals are adequately trained in cultural sensitivity. The interaction model, however, showed that economic satisfaction plays a critical role in moderating the relationship between immigration attitudes and healthcare efficiency.
This finding aligns with the intersectionality approach highlighted in our introduction, which emphasizes the importance of multiple, intersecting social categories in shaping experiences within healthcare systems. Our study underscores the complexity of such intersections and contributes to the broader conversation by suggesting that economic satisfaction cannot be sidelined when considering immigration policies’ impact on healthcare. Conversely, the ethnic boundary-making theory, which posits that immigrants’ healthcare experiences are shaped by interactions within the healthcare system and societal attitudes, was not directly supported by our regression models. While our study does not disconfirm this theory, it points to the possibility that the mechanisms of ethnic boundary-making are more subtly embedded in healthcare interactions than our data could reveal. Situated against the backdrop of ethnic boundary-making and intersectionality theories, our findings offer a mixed perspective. The lack of a direct correlation does not fully align with the ethnic boundary-making theory, which would anticipate clearer divisions in healthcare experiences based on immigration status. However, the interaction effects resonate with intersectionality, affirming that economic satisfaction could be a pivotal factor in shaping healthcare outcomes amidst demographic shifts.
In a broader social context, our study’s results contribute to an understanding of how national healthcare systems intersect with immigration policies and societal well-being. Specifically, our findings suggest that in Norway, a country acclaimed for its universal healthcare and high living standards, the acceptance of immigrants from poorer countries does not have a detectable direct impact on the efficiency of healthcare services. This counters a common narrative in public discourse that immigration may strain or diminish the quality of healthcare.
Our results also highlight the significance of economic satisfaction in shaping perceptions of healthcare efficiency. In times of high economic satisfaction, there appears to be a stronger link between positive attitudes toward immigration and perceived healthcare efficiency. This may reflect a societal tendency to view immigration more favorably during periods of economic prosperity, which may, in turn, influence the perceived performance of public services.
Regarding Norway, our study adds to the conversation on the country’s ability to maintain its welfare standards amidst demographic changes. Norway serves as a pertinent case for examining the effects of immigration in a welfare state, given its high standards of public services and its ongoing adaptation to a more diverse population.
The study has several limitations that could guide future research. The potential non-linearity and heteroscedasticity in the data suggest that the relationships between variables might be more complex than what can be captured by linear models. Further research could employ nonlinear modeling or machine learning techniques to uncover these complex patterns. Additionally, qualitative research could provide more profound insights into individual experiences within the healthcare system, which may reveal more about the nuances that quantitative data alone cannot capture.
Moreover, exploring other contexts, such as comparing Norway’s experience with other countries that have different immigration and healthcare policies, could offer a comparative perspective on how different systems interact with demographic changes. Such comparative studies could help identify best practices that could be adopted by various countries to maintain healthcare efficiency in the face of changing populations.
In summary, our study contributes to the understanding of immigration’s implications for public healthcare systems. While it suggests no direct relationship between immigration acceptance and healthcare efficiency, it indicates that satisfaction with the economy may influence how public services are perceived in the context of immigration. This is particularly salient for Norway, a country grappling with integrating an increasingly diverse population while maintaining high standards of public service. The results underscore the importance of considering economic satisfaction and other societal factors when evaluating the impact of immigration on healthcare services, which goes beyond the simplistic associations often presented in public debates. Further research in this area could illuminate the complexities of policy-making in a globalized world and support the creation of inclusive, equitable healthcare systems.