Yilin_Zhou_Research

Group Member: Yilin Zhou 1009413207 Luwei Yan 1009583381

A brief introduction: our research question is “Is there a relationship between Norway’s admission of immigrants from poor countries and the efficiency of healthcare?”

The hypotheses we set are: H0: There is no relationship between Norway’s willingness to accept immigrants from poor countries and the efficiency of healthcare.

H1: The higher the probability that Norway is willing to accept immigrants from poor countries, the higher the efficiency of healthcare.

H2: The strength of the relationship between Norway’s willingness to accept immigrants from poor countries and the efficiency of healthcare is influenced by satisfaction with the country’s economy: the higher the satisfaction with the country’s economy, the stronger the link between willingness to accept and healthcare efficiency.

In our research the outcome variable is the efficiency of healthcare (hlthcef). The predictor variables are willingness to accept immigrants from poor countries (impcntr) and satisfaction with the country’s economy (stfeco, which is also used in the interaction with impcntr).

First we load the packages needed for the analysis.

# List of packages
packages <- c("tidyverse", "modelsummary", "forcats", "RColorBrewer", 
              "fst", "viridis", "knitr", "rmarkdown", "ggridges", "viridis", "questionr", "flextable", "infer","broom") 

new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: viridisLite
## 
## 
## Attaching package: 'flextable'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     compose

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "modelsummary" "lubridate"    "forcats"      "stringr"      "dplyr"       
##  [6] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [11] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [16] "datasets"     "methods"      "base"        
## 
## [[3]]
##  [1] "modelsummary" "lubridate"    "forcats"      "stringr"      "dplyr"       
##  [6] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [11] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [16] "datasets"     "methods"      "base"        
## 
## [[4]]
##  [1] "RColorBrewer" "modelsummary" "lubridate"    "forcats"      "stringr"     
##  [6] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [11] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [16] "utils"        "datasets"     "methods"      "base"        
## 
## [[5]]
##  [1] "fst"          "RColorBrewer" "modelsummary" "lubridate"    "forcats"     
##  [6] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [11] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [16] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[6]]
##  [1] "viridis"      "viridisLite"  "fst"          "RColorBrewer" "modelsummary"
##  [6] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [11] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [16] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [21] "methods"      "base"        
## 
## [[7]]
##  [1] "knitr"        "viridis"      "viridisLite"  "fst"          "RColorBrewer"
##  [6] "modelsummary" "lubridate"    "forcats"      "stringr"      "dplyr"       
## [11] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [16] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [21] "datasets"     "methods"      "base"        
## 
## [[8]]
##  [1] "rmarkdown"    "knitr"        "viridis"      "viridisLite"  "fst"         
##  [6] "RColorBrewer" "modelsummary" "lubridate"    "forcats"      "stringr"     
## [11] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [16] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [21] "utils"        "datasets"     "methods"      "base"        
## 
## [[9]]
##  [1] "ggridges"     "rmarkdown"    "knitr"        "viridis"      "viridisLite" 
##  [6] "fst"          "RColorBrewer" "modelsummary" "lubridate"    "forcats"     
## [11] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [16] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [21] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[10]]
##  [1] "ggridges"     "rmarkdown"    "knitr"        "viridis"      "viridisLite" 
##  [6] "fst"          "RColorBrewer" "modelsummary" "lubridate"    "forcats"     
## [11] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [16] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [21] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[11]]
##  [1] "questionr"    "ggridges"     "rmarkdown"    "knitr"        "viridis"     
##  [6] "viridisLite"  "fst"          "RColorBrewer" "modelsummary" "lubridate"   
## [11] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [16] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [21] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [26] "base"        
## 
## [[12]]
##  [1] "flextable"    "questionr"    "ggridges"     "rmarkdown"    "knitr"       
##  [6] "viridis"      "viridisLite"  "fst"          "RColorBrewer" "modelsummary"
## [11] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [16] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [21] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [26] "methods"      "base"        
## 
## [[13]]
##  [1] "infer"        "flextable"    "questionr"    "ggridges"     "rmarkdown"   
##  [6] "knitr"        "viridis"      "viridisLite"  "fst"          "RColorBrewer"
## [11] "modelsummary" "lubridate"    "forcats"      "stringr"      "dplyr"       
## [16] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [21] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [26] "datasets"     "methods"      "base"        
## 
## [[14]]
##  [1] "broom"        "infer"        "flextable"    "questionr"    "ggridges"    
##  [6] "rmarkdown"    "knitr"        "viridis"      "viridisLite"  "fst"         
## [11] "RColorBrewer" "modelsummary" "lubridate"    "forcats"      "stringr"     
## [16] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [21] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [26] "utils"        "datasets"     "methods"      "base"

ess <- read_fst("All-ESS-Data.fst")

Context

To set the context, we first build a summary table for our three main variables to see their means, missing values, and so on, and to get a first understanding of the variables we are going to study. We also select the country. Filtering Norway data: the dataset is filtered to rows where the country code cntry is “NO”, which stands for Norway, the country chosen for this analysis.

Norway_data <- ess %>% 
  filter(cntry == "NO") 

write_fst(Norway_data, "~/Desktop/YiLin_Zhou_Project_202/Norway_data.fst")

Saving and Clearing Memory: The Norway_data is saved to disk, and then the memory is cleared to optimize performance.

rm(list=ls()); gc()

##           used (Mb) gc trigger   (Mb) limit (Mb)   max used    (Mb)
## Ncells 1274107 68.1    2132372  113.9         NA    2132372   113.9
## Vcells 2175646 16.6 1256286816 9584.8      16384 1348133646 10285.5

df <- read_fst("~/Desktop/YiLin_Zhou_Project_202/Norway_data.fst")

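# Map each ESS round (1 to 10) to its survey year; the rounds run every two years.
df$year <- NA
replacements <- c(2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020)
for(i in 1:10){
  df$year[df$essround == i] <- replacements[i]
}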

Reading data and assigning years: the saved dataset is read back into memory, and a new variable year is created from the ESS round number.
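The round-to-year step can also be written as a vectorized lookup instead of a loop. This is only a sketch: the helper name round_to_year is hypothetical, and it assumes that ESS rounds 1 to 10 were fielded every two years from 2002 to 2020.

```r
# Assumption: ESS rounds 1 to 10 correspond to 2002, 2004, ..., 2020.
round_to_year <- function(essround) {
  years <- seq(2002, 2020, by = 2)   # one entry per round
  years[essround]                    # rounds above 10 give NA
}

# Toy data standing in for df (values are made up for illustration).
toy <- data.frame(essround = c(1, 4, 10, 4))
toy$year <- round_to_year(toy$essround)
print(toy)
```

Indexing a vector by the round number avoids the loop and makes the mapping easy to check at a glance.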

Here we convert the categories to numerical values and clean the outcome and the predictors. Outcome: hlthcef. Predictors: impcntr and stfeco (also used for the interaction).

We use the coding methods from homework 6, 7, and 8, because they make it easier to understand and organize what we are doing.

Norway_data <- df

Norway_data_table_subset <- Norway_data %>%
  mutate(
    impcntr = case_when(
      impcntr == 1 ~ "Allow many to come and live here",  
      impcntr == 2 ~ "Allow some",  
      impcntr == 3 ~ "Allow a few", 
      impcntr == 4 ~ "Allow none", 
      TRUE ~ as.character(impcntr)  
    )
  )

Norway_data_table_subset <- Norway_data %>%
  mutate(
    hlthcef = case_when(
    hlthcef == 0 ~ "Extremely inefficient",
    hlthcef == 10 ~ "Extremely efficient",
    TRUE ~ as.character(hlthcef)  
    )
  )

Norway_data_table_subset <- Norway_data %>%
  mutate(
    stfeco = case_when(
    stfeco == 0 ~ "Extremely dissatisfied",
    stfeco == 10 ~ "Extremely satisfied",
    TRUE ~ as.character(stfeco)  
    )
  )
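
# This final step starts again from Norway_data, so the text labels above are
# discarded and the variables stay numeric, with the missing codes set to NA.
Norway_data_table_subset <- Norway_data %>%
  mutate(
    impcntr = ifelse(impcntr %in% c(7, 8, 9), NA, impcntr),
    hlthcef = ifelse(hlthcef %in% c(77, 88, 99), NA, hlthcef), 
    stfeco = ifelse(stfeco %in% c(77, 88, 99), NA, stfeco)
  )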

Recoding and handling missing values: the first three blocks recode impcntr, hlthcef, and stfeco into descriptive labels, but each block starts again from Norway_data, so only the last block takes effect. In the final Norway_data_table_subset the three variables therefore stay numeric, and the codes 7, 8, 9 (impcntr) and 77, 88, 99 (hlthcef and stfeco), which mark refusals, “don’t know” answers, and non-responses, are set to NA.
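If labelled categories are wanted alongside the numeric codes, one option is to build a factor from the codes. The sketch below uses a made-up toy vector rather than the ESS data, and the object name impcntr_label is hypothetical.

```r
# Toy impcntr codes (made up); 8 stands for a "don't know" style code.
impcntr <- c(1, 2, 3, 4, 8, 2)

# factor() maps each code to its label; codes not listed in `levels` become NA.
impcntr_label <- factor(
  impcntr,
  levels = 1:4,
  labels = c("Allow many to come and live here", "Allow some",
             "Allow a few", "Allow none")
)

print(table(impcntr_label, useNA = "ifany"))
```

Keeping the numeric column and adding the labelled factor as a separate column means the summary statistics and the labelled tables can both be produced from the same data frame.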

table(Norway_data$impcntr)

## 
##    1    2    3    4    7    8    9 
## 2822 7880 4605  646   23   78   11

table(Norway_data$hlthcef)

## 
##   0   1   2   3   4   5   6   7   8   9  10  77  88 
##   8  31  83 165 177 294 258 296 176  46   9   1   5

table(Norway_data$stfeco)

## 
##    0    1    2    3    4    5    6    7    8    9   10   77   88   99 
##  121   92  268  653  779 1573 1653 3019 3971 2342 1477    6   98   13

Frequency tables: the table function creates frequency tables for impcntr, hlthcef, and stfeco in the raw data, providing an initial look at the distribution of responses. They let us check every observed value of the three variables and confirm which out-of-range codes (7, 8, 9 and 77, 88, 99) need to be treated as missing.

summary_table <- datasummary_skim(Norway_data_table_subset %>% select(impcntr, hlthcef, stfeco), output = "flextable")

## Warning: The histogram argument is only supported for (a) output types "default",
##   "html", "kableExtra", or "gt"; (b) writing to file paths with extensions
##   ".html", ".jpg", or ".png"; and (c) Rmarkdown, knitr or Quarto documents
##   compiled to PDF (via kableExtra)  or HTML (via kableExtra or gt). Use
##   `histogram=FALSE` to silence this warning.

Summary table creation: the datasummary_skim function creates a table of descriptive statistics for the selected variables.

summary_table

	Unique (#)	Missing (%)	Mean	SD	Min	Median	Max
impcntr	5	1	2.2	0.8	1.0	2.0	4.0
hlthcef	12	90	5.4	2.0	0.0	6.0	10.0
stfeco	12	1	7.0	2.1	0.0	7.0	10.0

Here is the summary table; more explanation is on the poster.

Norway_data_v2 <- Norway_data_table_subset %>%
  rename(
    `Allow many/few immigrants from poorer countries outside Europe` = impcntr,
    `Provision of health care, how efficient` = hlthcef,
    `How satisfied with present state of economy in country` = stfeco
  )

summary_table_v2 <- datasummary_skim(Norway_data_v2 %>% select(`Allow many/few immigrants from poorer countries outside Europe`,`Provision of health care, how efficient`, `How satisfied with present state of economy in country`), output = "flextable", title = "Table 1: Descriptive Statistics for outcome variables")

## Warning: The histogram argument is only supported for (a) output types "default",
##   "html", "kableExtra", or "gt"; (b) writing to file paths with extensions
##   ".html", ".jpg", or ".png"; and (c) Rmarkdown, knitr or Quarto documents
##   compiled to PDF (via kableExtra)  or HTML (via kableExtra or gt). Use
##   `histogram=FALSE` to silence this warning.

Here we rename the variables and add a title to make the table easier to read.

summary_table_v2

Table 1: Descriptive Statistics for outcome variables
	Unique (#)	Missing (%)	Mean	SD	Min	Median	Max
Allow many/few immigrants from poorer countries outside Europe	5	1	2.2	0.8	1.0	2.0	4.0
Provision of health care, how efficient	12	90	5.4	2.0	0.0	6.0	10.0
How satisfied with present state of economy in country	12	1	7.0	2.1	0.0	7.0	10.0

“Allow many/few immigrants” has five unique values (the four answer categories plus missing) and little missing data (1%). Its mean is 2.2 with a standard deviation of 0.8 on a scale from 1 (allow many) to 4 (allow none), and the median is 2 (“allow some”). Responses are therefore concentrated around allowing some immigrants, with limited variation.

“Health care efficiency” measures the perceived efficiency of health care on a scale from 0 to 10. It has twelve unique values, a mean of 5.4, a standard deviation of 2.0, and a very high share of missing data (90%). The frequency table above shows only 6 responses coded 77 or 88 among 1,549 recorded answers, so most of the missing values are respondents with no recorded answer at all. This would be consistent with the question not being asked in every survey round, and it limits the sample available for the analysis.

“Economic satisfaction” also has twelve unique values and 1% missing data. Its mean of 7.0 and median of 7 on a scale from 0 to 10 indicate overall positive satisfaction with the economy, while the standard deviation of 2.1 shows that views still vary considerably across respondents.

Overall, the data suggest a moderate level of openness towards immigration: the mean of 2.2 lies below the scale midpoint of 2.5, that is, on the more permissive side. Despite the small sample for health care efficiency, its average of 5.4 suggests a perception of moderate efficiency. Economic satisfaction is relatively high, with a mean of 7 out of 10. These insights could be used to understand public opinion in Norway regarding these issues.

flextable::save_as_docx(summary_table_v2, path = "summary_table_v2.docx",
                       width = 7.0, height = 7.0)

set_flextable_defaults(fonts_ignore=TRUE)
print(summary_table_v2, preview = "pdf")

## a flextable object.
## col_keys: ` `, `Unique (#)`, `Missing (%)`, `Mean`, `SD`, `Min`, `Median`, `Max` 
## header has 1 row(s) 
## body has 3 row(s) 
## original dataset sample: 
##                                                                  Unique (#)
## 1 Allow many/few immigrants from poorer countries outside Europe          5
## 2                        Provision of health care, how efficient         12
## 3         How satisfied with present state of economy in country         12
##   Missing (%) Mean  SD Min Median  Max
## 1           1  2.2 0.8 1.0    2.0  4.0
## 2          90  5.4 2.0 0.0    6.0 10.0
## 3           1  7.0 2.1 0.0    7.0 10.0

Because we could not save the table directly as a PDF, we first saved it as a Word document and then converted that file to PDF. The key variables in this research are immigration attitudes, perceptions of health care efficiency, and satisfaction with the economy in Norway. The choice of Norway as the country of focus still needs to be justified against the research objectives, since the code alone does not explain it. The descriptive statistics table (Table 1) describes the data and sample, showing the central tendency, variability, and share of missing values for each variable.

Research Hypotheses

Next we code the null-distribution graph.

Norway_data <- df %>%
  filter(cntry == "NO")

Norway_data <- Norway_data %>%
  mutate(
    impcntr = case_when(
      impcntr == 1 ~ "Allow many to come and live here",  
      impcntr == 2 ~ "Allow some",  
      impcntr == 3 ~ "Allow a few", 
      impcntr == 4 ~ "Allow none", 
      TRUE ~ as.character(impcntr)  
    )
  )

Norway_data <- Norway_data %>%
  mutate(
    hlthcef = case_when(
    hlthcef == 0 ~ "Extremely inefficient",
    hlthcef == 10 ~ "Extremely efficient",
    TRUE ~ as.character(hlthcef)  
    )
  )

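Data filtering and transformation: the data are filtered for Norway, and impcntr and hlthcef are recoded into descriptive labels. The missing codes (7, 8, 9 and 77, 88, 99) are not set to NA in this step, so any that occur are kept as their own categories in the test below.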

test_stat <- Norway_data %>%
  specify(explanatory = impcntr,
           response = hlthcef) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "Chisq")

## Warning: Removed 14516 rows containing missing values.

Statistical testing (chi-squared): we test the relationship between attitudes towards immigration (impcntr) and perceptions of health care efficiency (hlthcef). The test_stat step calculates the observed chi-squared statistic; the null_distribution step further below generates the null distribution by permutation.

print(test_stat$stat)

## X-squared 
##  57.00027

The X-squared value of 57.00027 is the observed test statistic. It is used to judge whether there is a significant association between two categorical variables: a larger value means a larger gap between the observed counts and the counts expected if the variables were independent. To decide on significance, the value is compared either with a critical value from the chi-squared distribution with the matching degrees of freedom or with a null distribution built by permutation.
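To make the statistic concrete, the sketch below computes it by hand for a small made-up table and checks the result against chisq.test from base R. The counts are invented for illustration and have nothing to do with the ESS data.

```r
# Made-up 2 x 3 table of counts, for illustration only.
observed <- matrix(c(20, 30, 50,
                     30, 30, 40),
                   nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected.
chisq_by_hand <- sum((observed - expected)^2 / expected)

# The built-in test should give the same statistic.
chisq_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)

print(c(by_hand = chisq_by_hand, builtin = chisq_builtin))
```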

null_distribution <- Norway_data %>%
  specify(explanatory = impcntr,
          response = hlthcef) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "chisq")

## Warning: Removed 14516 rows containing missing values.

p_val <- null_distribution %>% 
  get_pvalue(obs_stat = test_stat, direction = "two-sided") 

p_val

## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1   0.726

P-value calculation: the p-value assesses how unusual the observed statistic is under the null hypothesis. The p-value is 0.726, which is much higher than the usual alpha level of 0.05, so we fail to reject the null hypothesis that the two variables are independent. In other words, there is no strong evidence of an association between impcntr (Allow many/few immigrants from poorer countries outside Europe) and hlthcef (Provision of health care, how efficient). The p-value here is computed as two-sided, while a chi-squared test is usually right-tailed.
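The logic of a permutation p-value can be sketched in base R. This is a simplified stand-in for the infer pipeline, run on made-up data, and it uses the right tail only, which is the usual choice for a chi-squared statistic (in infer this corresponds to direction = "greater"). The helper chisq_stat is a hypothetical name.

```r
set.seed(1)  # arbitrary seed so the toy example is repeatable

# Made-up categorical data, for illustration only.
x <- factor(sample(c("a", "b", "c"), 200, replace = TRUE))
y <- factor(sample(c("low", "mid", "high"), 200, replace = TRUE))

# Chi-squared statistic for the cross-table of two factors.
chisq_stat <- function(x, y) {
  unname(suppressWarnings(chisq.test(table(x, y), correct = FALSE)$statistic))
}

obs_stat <- chisq_stat(x, y)

# Null distribution: shuffle y to break any link with x, then recompute.
null_stats <- replicate(1000, chisq_stat(x, sample(y)))

# Right-tailed p-value: share of permuted statistics at least as large as observed.
p_right <- mean(null_stats >= obs_stat)
print(p_right)
```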

alpha <- 0.05 
c_val <- qchisq(alpha, df = 1, lower.tail = FALSE)
cat("Critical Value:", round(c_val, 3), "\n")

## Critical Value: 3.841

These steps compute a critical value to set beside the p-value and the chi-squared statistic. With df = 1, the critical value at the 0.05 significance level is 3.841, and the observed statistic of 57.00027 is far above it. However, df = 1 only applies to a 2 × 2 table. For a table with r rows and c columns the degrees of freedom are (r − 1)(c − 1), which is much larger here because impcntr and hlthcef both have many categories, so the appropriate critical value is also much larger than 3.841. The comparison with 3.841 therefore does not justify rejecting the null hypothesis, and the conclusion should rest on the permutation p-value: there is no statistically significant association between impcntr (Allow many/few immigrants from poorer countries outside Europe) and hlthcef (Provision of health care, how efficient).
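The effect of the degrees of freedom on the critical value can be sketched as follows. The helper chisq_critical is a hypothetical name, and the 4 by 11 table is only an assumption about the substantive categories (four impcntr answers, hlthcef from 0 to 10); the number of levels actually present in the data may differ.

```r
# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
chisq_critical <- function(n_rows, n_cols, alpha = 0.05) {
  dof <- (n_rows - 1) * (n_cols - 1)
  c(df = dof, critical = qchisq(alpha, df = dof, lower.tail = FALSE))
}

# A 2 x 2 table has 1 degree of freedom, which gives the familiar 3.841.
print(round(chisq_critical(2, 2), 3))

# Hypothetical larger table: 4 explanatory levels by 11 response levels.
print(round(chisq_critical(4, 11), 3))
```

The critical value grows with the degrees of freedom, so a statistic that looks large against the df = 1 threshold can be unremarkable for a table with many categories.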

null_distribution %>%
  visualize() +
  shade_p_value(obs_stat = test_stat, direction = "two-sided")

## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.

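Visualization: the plots show the null distribution with the p-value region shaded. The interval from get_confidence_interval is taken from the null distribution itself, so it marks the middle 95% of the permuted statistics rather than a confidence interval for a population parameter.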

null_distribution

## Response: hlthcef (factor)
## Explanatory: impcntr (factor)
## Null Hypothesis: independence
## # A tibble: 1,000 × 2
##    replicate  stat
##        <int> <dbl>
##  1         1  53.5
##  2         2  62.8
##  3         3  33.0
##  4         4  42.0
##  5         5  32.5
##  6         6  49.8
##  7         7  48.8
##  8         8  38.6
##  9         9  41.4
## 10        10  43.6
## # ℹ 990 more rows

conf_int <- null_distribution %>%
  get_confidence_interval(level = 0.95, type = "percentile")


null_distribution %>%
  visualize() +
  shade_p_value(obs_stat = test_stat, direction = "two-sided") +
  shade_confidence_interval(endpoints = conf_int)

## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.

The histogram shows the distribution of simulated chi-squared statistics under the null hypothesis. It has four bins with very uneven counts: the first bin is almost empty, the second has a count of around 880, the third has a count of about 62.5, and the fourth is also very small, probably around 10. Given the p-value of 0.726, the observed statistic is not extreme relative to this null distribution, so we fail to reject the null hypothesis of independence. There is no strong evidence of an association between immigration attitudes (impcntr) and health care efficiency perceptions (hlthcef). Overall, the results suggest that views on health care efficiency do not vary significantly across different attitudes towards immigration, at least not to a degree detectable by this test.

plot <- null_distribution %>%
  visualize() +
  shade_p_value(obs_stat = test_stat, direction = "two-sided") +
  shade_confidence_interval(endpoints = conf_int)

ggsave("output_hypothesis_plot.pdf", plot, width = 8, height = 6)

## Warning: Chi-Square usually corresponds to right-tailed tests. Proceed with
## caution.

This saves the plot as a PDF file.

Finding

Regression

Next we run the regressions for the findings.

df <- df %>% filter(!is.na(hlthcef))
df <- df %>% filter(!is.na(impcntr))
df <- df %>% filter(!is.na(stfeco))

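Data preparation: rows with NA in hlthcef, impcntr, or stfeco are dropped. Note that df still holds the raw codes, so 7, 8, 9 and 77, 88, 99 are ordinary numbers rather than NA and remain in the regression sample. The 1,549 observations reported in the regression table equal the full hlthcef frequency count shown earlier, including the responses coded 77 and 88.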
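A minimal sketch of recoding the missing codes before filtering is shown below. It uses a made-up toy data frame, not the ESS data, and is meant only to illustrate why is.na() on the raw codes removes nothing.

```r
# Toy data with raw ESS-style codes (values are made up for illustration).
toy <- data.frame(
  hlthcef = c(3, 7, 77, 5, 88, 6),
  impcntr = c(2, 8, 1, 3, 2, 4),
  stfeco  = c(7, 6, 5, 88, 9, 8)
)

# Before recoding, every row counts as complete: the codes are ordinary numbers.
n_before <- sum(complete.cases(toy))

# Recode the missing codes to NA, then keep complete cases only.
toy$hlthcef[toy$hlthcef %in% c(77, 88, 99)] <- NA
toy$impcntr[toy$impcntr %in% c(7, 8, 9)] <- NA
toy$stfeco[toy$stfeco %in% c(77, 88, 99)] <- NA
toy_clean <- toy[complete.cases(toy), ]

print(c(before = n_before, after = nrow(toy_clean)))
```

After recoding, the remaining values all lie on the original answer scales, so a regression fitted to toy_clean cannot be pulled around by values such as 77 or 88.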

This step took us the longest compared with the others, because it kept producing errors.

df$weight <- df$dweight * df$pweight

model1 <- lm(hlthcef ~ impcntr, data = df, weights = weight)
model2 <- lm(hlthcef ~ impcntr + stfeco, data = df, weights = weight)
model3 <- lm(hlthcef ~ impcntr + stfeco + impcntr * stfeco, data = df, weights = weight)

modelsummary(
    list(model1, model2, model3),
    fmt = 1,
    estimate = c("{estimate} ({std.error}){stars}",
                 "{estimate} ({std.error}){stars}",
                 "{estimate} ({std.error}){stars}"),
    statistic = NULL,
    coef_omit = "ei"  # Adjust based on your variable names
)

	(1)	(2)	(3)
(Intercept)	5.9 (0.4)***	4.9 (0.4)***	6.8 (0.6)***
impcntr	−0.1 (0.2)	−0.1 (0.2)	−0.9 (0.2)***
stfeco		0.2 (0.0)***	−0.1 (0.1)*
impcntr × stfeco			0.1 (0.0)***
Num.Obs.	1549	1549	1549
R2	0.000	0.049	0.062
R2 Adj.	−0.001	0.047	0.060
AIC	9620.5	9545.6	9525.6
BIC	9636.6	9566.9	9552.3
Log.Lik.	−4807.273	−4768.778	−4757.793
RMSE	5.39	5.26	5.22

Weight calculation: a new weight variable is created by multiplying dweight and pweight. Regression modeling: three linear models of increasing complexity are fitted: a single-predictor model, a two-predictor model, and an interaction model involving both predictors.

  modelsummary(
    list(model1, model2, model3),
    fmt = 1,
    estimate = c("{estimate} ({std.error}){stars}",
                 "{estimate} ({std.error}){stars}",
                 "{estimate} ({std.error}){stars}"),
    statistic = NULL,
    coef_omit = "ei",
  coef_rename = c("hlthcef" = "Provision of health care, how efficient", "impcntr" = "Allow many/few immigrants from poorer countries outside Europe", "stfeco" = "How satisfied with present state of economy in country"),
  title = 'Table 3. Regression models predicting health care efficiency' )

Table 3. Regression models predicting health care efficiency
	(1)	(2)	(3)
(Intercept)	5.9 (0.4)***	4.9 (0.4)***	6.8 (0.6)***
Allow many/few immigrants from poorer countries outside Europe	−0.1 (0.2)	−0.1 (0.2)	−0.9 (0.2)***
How satisfied with present state of economy in country		0.2 (0.0)***	−0.1 (0.1)*
Allow many/few immigrants from poorer countries outside Europe:How satisfied with present state of economy in country			0.1 (0.0)***
Num.Obs.	1549	1549	1549
R2	0.000	0.049	0.062
R2 Adj.	−0.001	0.047	0.060
AIC	9620.5	9545.6	9525.6
BIC	9636.6	9566.9	9552.3
Log.Lik.	−4807.273	−4768.778	−4757.793
RMSE	5.39	5.26	5.22

Model Summary Presentation: Using modelsummary to generate a table with formatted estimates, standard errors, and significance stars. Also, renaming coefficients for clarity and saving the summary as a PDF and text file.

Model 1 (column 1): the intercept is 5.9 and highly significant. impcntr has an estimate of −0.1 and is not significant. R² is virtually zero, so the model explains essentially none of the variance.

Model 2 (column 2): the intercept is lower, at 4.9, and still highly significant. impcntr is unchanged from Model 1. stfeco is statistically significant (0.2), suggesting that economic satisfaction is a predictor of perceived health care efficiency. R² improves to 0.049, so 4.9% of the variance is explained.

Model 3 (column 3): the intercept is the highest, at 6.8. impcntr is now highly significant with a larger negative estimate (−0.9), and stfeco remains significant but with a negative estimate (−0.1). Because the model includes an interaction, these two coefficients are conditional effects: −0.9 is the impcntr slope when stfeco is 0, and −0.1 is the stfeco slope when impcntr is 0. The interaction term impcntr × stfeco (0.1) is significant, which means the impcntr slope changes with economic satisfaction. Using the rounded coefficients, it rises from about −0.9 at stfeco = 0 to about +0.1 at stfeco = 10. R² improves further to 0.062, so 6.2% of the variance is explained.

The number of observations (1,549) is the same in all three models, so they are fitted to the same sample and their fit statistics can be compared directly.

AIC (Akaike Information Criterion): a measure of the relative quality of a statistical model for a given set of data. A lower AIC suggests a better model. AIC rewards goodness of fit but also includes a penalty that increases with the number of estimated parameters. The AIC decreases from Model 1 to Model 3, suggesting that the models’ fit improves as more variables are added, even after accounting for the increase in complexity.

BIC (Bayesian Information Criterion): Similar to AIC, but with a higher penalty for models with more parameters. Like AIC, a lower BIC indicates a better model. The BIC also decreases from Model 1 to Model 3, indicating improved model fit with the additional variables and interaction term.

Log.Lik. (Log Likelihood): This is the logarithm of the likelihood function, which measures the probability of observing the data given the parameters of the model. The higher the log likelihood (less negative in this case), the better the model fits the data. The Log Likelihood value becomes less negative from Model 1 through Model 3, indicating that the models’ fit to the data improves with additional predictors and interaction terms. The increase in R² and adjusted R² from Model 1 to 3 suggests adding stfeco and the interaction term improves the model’s explanatory power. The AIC and BIC decrease across the models, also indicating an improvement. Significance levels are denoted by stars, with more stars indicating higher significance levels. Overall, all indicators suggest that Model 3, which includes both predictors and their interaction, provides the best fit to the data among the three models presented.
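The link between these three statistics can be checked by hand. AIC is 2k − 2 × log-likelihood and BIC is k × ln(n) − 2 × log-likelihood, where k is the number of estimated parameters and n the number of observations. For Model 1 in Table 3, taking k = 3 (intercept, slope, and error variance) gives 2 × 3 − 2 × (−4807.273) = 9620.5, which matches the reported AIC. The sketch below repeats the check on a built-in dataset; mtcars is used purely for illustration and is not the ESS data.

```r
# Built-in dataset used purely for illustration; it is not the ESS data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)
k  <- attr(ll, "df")   # estimated parameters: coefficients plus the error variance
n  <- nobs(fit)

aic_manual <- 2 * k - 2 * as.numeric(ll)
bic_manual <- log(n) * k - 2 * as.numeric(ll)

print(c(aic_manual = aic_manual, aic_builtin = AIC(fit)))
print(c(bic_manual = bic_manual, bic_builtin = BIC(fit)))
```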

df$weight <- df$dweight * df$pweight

model1 <- lm(hlthcef ~ impcntr, data = df, weights = df$weight)
model2 <- lm(hlthcef ~ impcntr + stfeco, data = df, weights = df$weight)
model3 <- lm(hlthcef ~ impcntr + stfeco + impcntr * stfeco, data = df, weights = df$weight)

summary1 <- modelsummary(
  list(model1, model2, model3),
  fmt = 1,
  estimate = c("{estimate} ({std.error}){stars}",
               "{estimate} ({std.error}){stars}",
               "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "ei",
  coef_rename = c(
    "hlthcef" = "Provision of health care, how efficient",
    "impcntr" = "Allow many/few immigrants from poorer countries outside Europe",
    "stfeco" = "How satisfied with present state of economy in country"
  ),
  title = 'Table 3. Regression models predicting health care efficiency (0 to 10)'
)

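# cat() writes plain text, so the file gets a .txt extension rather than .pdf.
cat(summary1, file = "model_summary_regression_rename.txt")

A file written this way with a .pdf extension is not a valid PDF, which is why I could not open it on my computer, so I saved the table as a txt version.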

par(mfrow = c(2,2)) 

plot(model2, pch = 19, col = scales::alpha("grey", .8), cex = .5)

1. Residuals vs Fitted plot (regression assumption diagnostics)

Clustering of points and outliers: almost all of the points are clustered on the far left of the plot, at fitted values of 4 to 6, while a few outliers lie far from the horizontal line on the far right, at fitted values of about 16 and 18. The clustering could mean that the response varies little over a range of predictor values, but it could also suggest that the model is not appropriate for the data or that variables explaining the variance in the response are missing. Since hlthcef and stfeco are both measured from 0 to 10, fitted values as large as 16 and 18 are most likely produced by missing codes such as 77 and 88 that were left in df as ordinary numbers. These points can be influential observations that disproportionately affect the model’s coefficients and should be investigated further.

Non-linear fit: the points do not scatter randomly and evenly around a horizontal line and instead show a pattern. This suggests that a linear model is not capturing the relationship in the data well, and that the relationship between the predictors and the response might be better captured by a non-linear model.

Heteroscedasticity: the irregular pattern of variance as the fitted values change suggests heteroscedasticity, meaning the residuals have non-constant variance. This violates one of the key assumptions of linear regression and affects the validity of the model’s standard errors and, consequently, any inferences made from the model. Points far from the horizontal line may be outliers, and the pattern of points may reflect model inaccuracies in certain regions. Examining these points may help identify outliers in the data or possible model improvements.

2.Q-Q Plot: This plot is used to assess the normality of the residuals. If the residuals are normally distributed, the points fall along the diagonal reference line, so the degree of deviation from that line reflects how far the distribution of residuals diverges from a normal distribution.

General Trend: Most of the points form a very flat line close to standardized residuals = 0, with only a slight upward trend, rather than following the diagonal. The flat appearance is likely due in part to the extreme outliers stretching the vertical axis. Points that do not follow the diagonal suggest a systematic deviation from normality, and the pattern might imply a distribution with heavier tails than the normal distribution.

Outliers: Some outliers appear on the far right of the plot, where the theoretical quantiles are close to 3. While most standardized residuals sit around 0, the standardized residuals of these outliers are between 12 and 15. Such extreme values do not conform to the distribution expected under normality and point to heavy tails or skewness in the residuals.

Overall, the Q-Q Plot suggests that the normality assumption may not hold for the residuals of the regression model, which can affect the reliability of statistical tests and confidence intervals that assume normality. These findings might call for robust statistical methods or transformations to address the non-normality of the residuals.

3.Scale-Location Plot: This plot checks the homoscedasticity assumption, meaning that the residuals have constant variance across all levels of the fitted values. The y-axis shows the square root of the absolute standardized residuals, which should show a roughly constant, random spread across the fitted values on the x-axis.

U-shaped Distribution: The concave U-shaped pattern within the interval of 4-8 on the fitted values suggests that the variance of the residuals is not constant. This pattern is indicative of heteroscedasticity, where the error variance changes across levels of the fitted values. Linearly Increasing Variance: The steadily rising pattern in the interval of 8-12 implies that the variance of the residuals increases with the fitted values, which again suggests heteroscedasticity. Clustering of Points: As in the Residuals vs Fitted Plot, the clustering of points at the far left, at fitted values between 4-6, indicates that the variance is smaller in this range, while the outliers with large residuals at fitted values of 16 and 18 on the far right indicate that the variance may be much larger for those points.

Given these indications, the homoscedasticity assumption of linear regression does not appear to hold. This might affect the standard errors of the regression coefficients and lead to unreliable hypothesis tests and confidence intervals. To address this, transformations of the response variable or heteroscedasticity-consistent standard errors might be considered. Further investigation into the cause of the heteroscedasticity could also provide insights into the underlying data structure or potential data issues.

4.Residuals vs. Leverage Plot: This plot helps identify data points that might have an outsized impact on the regression model because of their high leverage. In this figure, within the interval from Leverage=0 to Leverage=0.3, the trend line for the standardized residuals is essentially a horizontal line at 0. The line begins to bend downward at Leverage=0.3, forming a very gentle downward trend.

Normal Range of Leverage: For the majority of the data (Leverage=0 to 0.3), the residuals are distributed around the horizontal line at 0, which is a good indication that most data points do not have an undue influence on the model’s predictions. Beginning of Influence: The slight downward trend beginning at Leverage=0.3 may indicate that as the leverage of observations increases, their influence on the model also begins to increase, albeit gently. Potential High Leverage Points: Observations with both large residuals and high leverage are of particular concern because they can unduly influence the regression analysis. However, no specific high-leverage points deviate markedly from the trend, which suggests that no individual observations exert excessive influence on the model, or that if they do, their impact is minimal.
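
The four visual checks above can be complemented with numeric checks. The following is only a minimal sketch on simulated data: the variables x and y and the object fit are hypothetical stand-ins, not the survey data or model2 from this report, and the Breusch-Pagan statistic is computed by hand with base R rather than with a dedicated package.

```r
# Simulated stand-in data (hypothetical; not the data used in this report).
set.seed(123)
n <- 500
x <- runif(n, 0, 10)
# The error spread grows with x, so constant variance is violated on purpose.
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 + 0.3 * x)
fit <- lm(y ~ x)

# Normality of residuals (numeric companion to the Q-Q plot).
# shapiro.test() accepts between 3 and 5000 observations.
normality <- shapiro.test(rstandard(fit))

# Constant variance (companion to the Scale-Location plot):
# studentized Breusch-Pagan test, regressing squared residuals on x.
aux <- lm(residuals(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Influence (companion to the Residuals vs Leverage plot).
lev <- hatvalues(fit)
cook <- cooks.distance(fit)
flagged <- which(cook > 4 / n)  # common rule of thumb, not a strict cutoff

print(normality$p.value)
print(c(statistic = bp_stat, p.value = bp_p))
print(length(flagged))
```

A small p-value from the Breusch-Pagan step points to non-constant variance, and observations in flagged are candidates for closer inspection rather than automatic removal. To apply the same checks to the report's own model, fit would be replaced by model2 and x by its predictors.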

# Open the PDF device first so that the 2 x 2 layout applies to it
pdf("diagnostic_plots.pdf")

par(mfrow = c(2, 2))

plot(model2, pch = 19, col = scales::alpha("grey", .8), cex = .5)

dev.off()

## quartz_off_screen 
##                 2

Interaction

Here, we add some new packages for our interaction graph

packages <- c("effects", "survey", "MASS", "equatiomatic") 

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## Loading required package: carData

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

## Loading required package: grid

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## Loading required package: survival

## 
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
## 
##     dotchart

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## [[1]]
##  [1] "effects"      "carData"      "fstcore"      "broom"        "infer"       
##  [6] "flextable"    "questionr"    "ggridges"     "rmarkdown"    "knitr"       
## [11] "viridis"      "viridisLite"  "fst"          "RColorBrewer" "modelsummary"
## [16] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [21] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [26] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [31] "methods"      "base"        
## 
## [[2]]
##  [1] "survey"       "survival"     "Matrix"       "grid"         "effects"     
##  [6] "carData"      "fstcore"      "broom"        "infer"        "flextable"   
## [11] "questionr"    "ggridges"     "rmarkdown"    "knitr"        "viridis"     
## [16] "viridisLite"  "fst"          "RColorBrewer" "modelsummary" "lubridate"   
## [21] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [26] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [31] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [36] "base"        
## 
## [[3]]
##  [1] "MASS"         "survey"       "survival"     "Matrix"       "grid"        
##  [6] "effects"      "carData"      "fstcore"      "broom"        "infer"       
## [11] "flextable"    "questionr"    "ggridges"     "rmarkdown"    "knitr"       
## [16] "viridis"      "viridisLite"  "fst"          "RColorBrewer" "modelsummary"
## [21] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [26] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [31] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [36] "methods"      "base"        
## 
## [[4]]
##  [1] "equatiomatic" "MASS"         "survey"       "survival"     "Matrix"      
##  [6] "grid"         "effects"      "carData"      "fstcore"      "broom"       
## [11] "infer"        "flextable"    "questionr"    "ggridges"     "rmarkdown"   
## [16] "knitr"        "viridis"      "viridisLite"  "fst"          "RColorBrewer"
## [21] "modelsummary" "lubridate"    "forcats"      "stringr"      "dplyr"       
## [26] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [31] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [36] "datasets"     "methods"      "base"

Interaction Analysis: We use effect() from the effects package to visualize the interaction effect in the third model and save the plot as a PDF. For the visualization, we follow the interaction graph shown in the tutorial.

interaction_plot <- effect("impcntr * stfeco", model3, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction effect",
     xlab="Allow many/few immigrants from poorer countries outside Europe",
     ylab="Provision of health care, how efficient")

The interaction plot shows how the relationship between the explanatory variable (attitudes towards immigration, impcntr) and the outcome variable (perceived health care efficiency) depends on a third variable (economic satisfaction, stfeco).

Dependence on a Third Variable: The effect of immigration attitudes on perceived health care efficiency is moderated by the level of economic satisfaction. In other words, the effect of one predictor on the outcome is not constant but changes with the level of the other predictor. Interaction Effect: The lines in the five panels are not parallel, which indicates an interaction between the two independent variables (impcntr and stfeco). Slope Changes: As stfeco decreases, the slope of the line relating immigration attitudes to health care efficiency also decreases. This suggests that higher economic satisfaction amplifies the effect of immigration attitudes on perceived health care efficiency. Error Margin: The error margin decreases as stfeco increases, which implies that predictions are more precise at higher levels of economic satisfaction. Varying Error Range: The range of possible errors increases with the level of immigration attitudes, indicating greater uncertainty in health care efficiency perceptions towards the ends of the attitude scale.

Specific Observations: For stfeco=90, the steep slope implies a strong positive relationship between immigration attitudes and health care efficiency at this high level of economic satisfaction. For stfeco=70 and stfeco=40, the positive relationship persists but weakens as stfeco decreases. At stfeco=20, the line is much flatter on the plot’s scale, although the effect table printed further below still shows a positive slope there (fitted values rise from about 5.6 at impcntr=1 to about 14.7 at impcntr=8). At stfeco=0, the relationship inverts: fitted values fall from about 6.0 at impcntr=1 to about 0 at impcntr=8, suggesting that when economic satisfaction is very low, higher values on the immigration attitude scale might go with lower perceived health care efficiency. Radial Pattern: Overlaying the five lines would likely show a fan-like spread, which is characteristic of interaction effects where the impact of one variable changes at different levels of another.

This interaction plot provides valuable insights into how economic satisfaction modifies the relationship between attitudes towards immigration and perceptions of health care efficiency. It suggests that policymakers should consider economic satisfaction when assessing public opinion on health care and immigration policies.

interaction_plot <- effect("impcntr * stfeco", model3, na.rm = TRUE)

pdf("interaction_plot.pdf")

plot(interaction_plot,
     main = "Interaction effect",
     xlab = "Allow many/few immigrants from poorer countries outside Europe",
     ylab = "Provision of health care, how efficient")

dev.off()

## quartz_off_screen 
##                 2

interaction_plot

## 
##  impcntr*stfeco effect
##        stfeco
## impcntr           0        20        40        70        90
##       1  5.99024195  5.616386  5.242531  4.681747  4.307892
##       3  4.27356385  8.216906 12.160248 18.075261 22.018603
##       4  3.41522480  9.517166 15.619107 24.772018 30.873959
##       6  1.69854670 12.117685 22.536824 38.165532 48.584671
##       8 -0.01813141 14.718205 29.454542 51.559046 66.295383
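
To read the slopes directly off this table, the fitted values can be typed in and differenced. This is a rough sketch: the numbers are copied by hand from the output above, and the object names (fitted_vals, slopes, crossover) are made up for illustration.

```r
# Fitted values copied from the effect table printed above.
fitted_vals <- matrix(
  c( 5.99024195,  5.616386,  5.242531,  4.681747,  4.307892,
     4.27356385,  8.216906, 12.160248, 18.075261, 22.018603,
     3.41522480,  9.517166, 15.619107, 24.772018, 30.873959,
     1.69854670, 12.117685, 22.536824, 38.165532, 48.584671,
    -0.01813141, 14.718205, 29.454542, 51.559046, 66.295383),
  nrow = 5, byrow = TRUE,
  dimnames = list(impcntr = c("1", "3", "4", "6", "8"),
                  stfeco  = c("0", "20", "40", "70", "90"))
)

stfeco_levels <- as.numeric(colnames(fitted_vals))

# Slope in each stfeco column: change in fitted value per one-unit
# change in impcntr, using the first and last rows (impcntr = 1 and 8).
slopes <- (fitted_vals["8", ] - fitted_vals["1", ]) / (8 - 1)

# How much the slope changes per one-unit change in stfeco.
interaction_per_unit <- unname(diff(slopes)[1] / diff(stfeco_levels)[1])

# stfeco value at which the impcntr slope would be zero.
crossover <- unname(-slopes[1] / interaction_per_unit)

print(round(slopes, 3))
print(round(interaction_per_unit, 3))
print(round(crossover, 2))
```

The slopes are negative only in the stfeco=0 column and already positive at stfeco=20, and the sign change falls at a stfeco value of roughly 8.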

Here is the equation

equatiomatic::extract_eq(model3, use_coefs = TRUE)

\[ \operatorname{\widehat{hlthcef}} = 6.85 - 0.86(\operatorname{impcntr}) - 0.13(\operatorname{stfeco}) + 0.11(\operatorname{impcntr} \times \operatorname{stfeco}) \]
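
As a reading aid for the equation above, the slope of impcntr at a given level of stfeco can be written out. This is a sketch based on the rounded coefficients shown in the equation, so the numbers are approximate:

\[ \frac{\partial\, \operatorname{\widehat{hlthcef}}}{\partial\, \operatorname{impcntr}} = -0.86 + 0.11(\operatorname{stfeco}) \]

\[ -0.86 + 0.11(\operatorname{stfeco}) = 0 \quad \Longrightarrow \quad \operatorname{stfeco} = \frac{0.86}{0.11} \approx 7.8 \]

Under this approximation, the slope of impcntr is negative when stfeco is below about 7.8 and positive above it. For example, at stfeco=0 the slope is -0.86, and at stfeco=20 it is -0.86 + 2.2 = 1.34.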

Conclusion: Our study explored the complex relationship between the acceptance of immigrants from poorer countries and healthcare efficiency in Norway. Contrary to the initial hypothesis, our findings, supported by chi-squared test statistics and visual representations, suggest no substantial evidence to affirm that Norway’s acceptance of immigrants significantly influences the efficiency of its healthcare system. Through rigorous statistical analysis, we aimed to uncover whether the integration of immigrants correlates with healthcare performance, underpinned by a nuanced consideration of Norway’s socio-economic satisfaction.

Although the chi-squared statistic was large, the high p-value meant that we failed to reject the null hypothesis, indicating no discernible association between immigration attitudes and healthcare efficiency. This finding challenges the prevailing discourse that links immigration directly to the burdening of healthcare systems, and it underscores the need for a careful understanding of healthcare dynamics amidst demographic changes. The regression models added to this picture, revealing that economic satisfaction significantly interacts with immigration attitudes in shaping healthcare efficiency perceptions. However, the diagnostic plots raised concerns over potential non-linearity and heteroscedasticity, suggesting that a linear model may not fully capture the complexities of the data.

The literature emphasized that Sub-Saharan African (SSA) immigrants in Norway face barriers both prior to and within the healthcare system. Challenges such as language barriers, financial constraints, long waiting times, and perceived discrimination impede equal healthcare access. Our study echoes these findings, highlighting that healthcare access is influenced by more than just affordability: it is also shaped by awareness, cultural sensitivity, and systemic responsiveness. The preference for healthcare providers with immigrant backgrounds and private healthcare services among SSA immigrants indicates a mistrust in the public healthcare system and a lack of cultural competence among healthcare professionals. This reflects a broader issue of inequity and suggests a need for policies that ensure healthcare professionals are adequately trained in cultural sensitivity.

The interaction model, however, showed that economic satisfaction plays a critical role in moderating the relationship between immigration attitudes and healthcare efficiency. This finding aligns with the intersectionality approach highlighted in our introduction, which emphasizes the importance of multiple, intersecting social categories in shaping experiences within healthcare systems. Our study underscores the complexity of such intersections and contributes to the broader conversation by suggesting that economic satisfaction cannot be sidelined when considering immigration policies’ impact on healthcare.

Conversely, the ethnic boundary-making theory, which posits that immigrants’ healthcare experiences are shaped by interactions within the healthcare system and societal attitudes, was not directly supported by our regression models. While our study does not disconfirm this theory, it points to the possibility that the mechanisms of ethnic boundary-making are more subtly embedded in healthcare interactions than our data could reveal. Set against both theories, our findings therefore offer a mixed perspective. The lack of a direct correlation does not fully align with the ethnic boundary-making theory, which would anticipate clearer divisions in healthcare experiences based on immigration status. However, the interaction effects resonate with intersectionality, suggesting that economic satisfaction could be a pivotal factor in shaping healthcare outcomes amidst demographic shifts.

In a broader social context, our study’s results contribute to an understanding of how national healthcare systems intersect with immigration policies and societal well-being. Specifically, our findings suggest that in Norway, a country acclaimed for its universal healthcare and high living standards, the acceptance of immigrants from poorer countries does not have a detectable direct impact on the efficiency of healthcare services. This counters a common narrative in public discourse that immigration may strain or diminish the quality of healthcare.

Our results also highlight the significance of economic satisfaction in shaping perceptions of healthcare efficiency. In times of high economic satisfaction, there appears to be a stronger link between positive attitudes toward immigration and perceived healthcare efficiency. This reflects a societal tendency to view immigration more favorably during periods of economic prosperity, which may, in turn, influence the perceived performance of public services.

Regarding Norway, our study adds to the conversation on the country’s ability to maintain its welfare standards amidst demographic changes. Norway serves as a pertinent case for examining the effects of immigration in a welfare state, given its high standards of public services and its ongoing adaptation to a more diverse population.

The study has several limitations that could guide future research. The potential non-linearity and heteroscedasticity in the data suggest that the relationships between variables might be more complex than what can be captured by linear models. Further research could employ nonlinear modeling or machine learning techniques to uncover these complex patterns. Additionally, qualitative research could provide more profound insights into individual experiences within the healthcare system, which may reveal more about the nuances that quantitative data alone cannot capture.

Moreover, exploring other contexts, such as comparing Norway’s experience with other countries that have different immigration and healthcare policies, could offer a comparative perspective on how different systems interact with demographic changes. Such comparative studies could help identify best practices that could be adopted by various countries to maintain healthcare efficiency in the face of changing populations.

In summary, our study contributes to the understanding of immigration’s implications for public healthcare systems, indicating that societal attitudes towards immigrants relate to perceptions of healthcare efficiency in combination with economic satisfaction rather than on their own. It also adds a layer to our understanding of social dynamics in Norway, highlighting how societal satisfaction with the economy might influence perceptions of public services in the context of immigration. This is particularly salient for Norway, a country grappling with integrating an increasingly diverse population while maintaining high standards of public service. While our study suggests no direct relationship between immigration acceptance and healthcare efficiency, it underscores the importance of considering economic satisfaction and other societal factors when evaluating the impact of immigration on healthcare services, going beyond the simplistic associations often presented in public debates. Further research in this area could illuminate the complexities of policy-making in a globalized world and support the creation of inclusive, equitable healthcare systems.

Yilin_Zhou_Research_Poster

2023-12-10