Official homework submission (writing this so it doesn't get lost; I made about 15 different versions of this homework).

Copy-paste of tasks 1-5

Task 1

#Provide code and answer.

#Prompt and question: calculate the average for the variable ‘happy’ for the country of Norway. On average, based on the ESS data, who reports higher levels of happiness: Norway or Belgium?

#Note: we already did it for Belgium. You just need to compare to Norway’s average, making sure to provide the code for both.

Task 2

#Provide code and answer.

#Prompt and question: what is the most common category selected, for Irish respondents, for frequency of binge drinking? The variable of interest is: alcbnge.

#More info here: https://ess-search.nsd.no/en/variable/0c65116e-7481-4ca6-b1d9-f237db99a694.

#Hint: need to convert numeric value entries to categories as specified in the variable information link. We did similar steps for Estonia and the climate change attitude variable.

Task 3

#Provide code and answer.

#Prompt and question: when you use the summary() function for the variable plnftr (about planning for the future or taking each day as it comes, from 0-10) for both Portugal and Serbia, what do you notice? What stands out as different when you compare the two countries (note: look up the variable information on the ESS website to help with interpretation)? Explain while referring to the output generated.

Task 4

#Provide code and answer.

#Prompt and question: using the variables stfdem and gndr, answer the following: on average, who is more dissatisfied with democracy in Italy, men or women? Explain while referring to the output generated.

#Info on variable here: https://ess.sikt.no/en/variable/query/stfdem/page/1

Task 5

#Provide code and answer.

#Prompt: Interpret the boxplot graph of stfedu and stfhlth that we generated already: according to ESS data, would we say that the median French person is more satisfied with the education system or health services? Explain.

#Change the boxplot graph: provide the code to change some of the key labels: (1) Change the title to: Boxplot of satisfaction with the state of education vs. health services; (2) Remove the x-axis label; (3) Change the y-axis label to: Satisfaction (0-10).

#Hint: copy the boxplot code above and just replace or cut what is asked.

DOWNLOAD NECESSARY DATA FIRST

# Install (if needed) and load the tidyverse
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# List of packages
packages <- c("tidyverse", "fst", "modelsummary") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)
## `modelsummary` has built-in support to draw text-only (markdown) tables.
##   To generate tables in other formats, you must install one or more of
##   these libraries:
##   
## install.packages(c(
##     "kableExtra",
##     "gt",
##     "flextable",
##     "huxtable",
##     "DT"
## ))
## 
##   Alternatively, you can set markdown as the default table format to
##   silence this alert:
##   
## config_modelsummary(factory_default = "markdown")
## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "fst"       "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "modelsummary" "fst"          "lubridate"    "forcats"      "stringr"     
##  [6] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [11] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [16] "utils"        "datasets"     "methods"      "base"
ess <- read_fst("All-ESS-Data.fst")
#belgium_data <- read.fst("belgium_data.fst")
#estonia_data <- read.fst("estonia_data.fst")
#france_data <- read.fst("france_data.fst")
#norway_data <- read.fst("norway_data.fst")
#ireland_data <- read.fst("ireland_data.fst")
#portugal_data <- read.fst("portugal_data.fst")
#serbia_data <- read.fst("serbia_data.fst")
#italy_data <- read.fst("italy_data.fst")
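
The long `## [[1]] ... [[3]]` printout above appears because `library()` returns the list of attached packages and `lapply()` prints whatever it gets back. A minimal sketch of one way to keep the output quiet, assuming it is acceptable to skip packages that are not installed; it uses the base packages `stats` and `utils` as stand-ins so it runs anywhere:

```r
# Stand-in package names; swap in c("tidyverse", "fst", "modelsummary") in the homework.
packages <- c("stats", "utils")

# Check which packages are available without attaching them yet.
available <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Attach the available ones; invisible() hides the list that lapply() returns.
invisible(lapply(packages[available], library, character.only = TRUE))

# Report anything that could not be loaded instead of failing silently.
missing_packages <- packages[!available]
if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
}
```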

FOR TASK 1

# Code for Belgium

belgium_happy <- ess %>%
  filter(cntry == "BE") %>%
  select(happy)

belgium_happy$y <- belgium_happy$happy

# Recode values 77 through 99 to NA
belgium_happy$y[belgium_happy$y %in% 77:99] <- NA
# Calculate the average for the variable 'happy' for Belgium
mean_belgium_happy <- mean(belgium_happy$y, na.rm = TRUE)

# Calculate the average for the variable 'happy' for Norway
norway_happy <- ess %>%
  filter(cntry == "NO") %>%
  select(happy)

norway_happy$y <- norway_happy$happy

# Recode values 77 through 99 to NA
norway_happy$y[norway_happy$y %in% 77:99] <- NA

mean_norway_happy <- mean(norway_happy$y, na.rm = TRUE)

# Compare averages
cat("Average happiness in Belgium:", mean_belgium_happy, "\n")
## Average happiness in Belgium: 7.737334
cat("Average happiness in Norway:", mean_norway_happy, "\n")
## Average happiness in Norway: 7.975005
# Compare countries
if (mean_belgium_happy > mean_norway_happy) {
  cat("On average, Belgium reports higher levels of happiness than Norway.")
} else if (mean_norway_happy > mean_belgium_happy) {
  cat("On average, Norway reports higher levels of happiness than Belgium.")
} else {
  cat("Belgium and Norway have the same average happiness levels.")
}
## On average, Norway reports higher levels of happiness than Belgium.
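
The Belgium and Norway steps above are the same three operations (filter by country, recode the special codes to NA, take the mean). A minimal sketch of how they could be wrapped in one function; `country_mean` and `toy_ess` are hypothetical names, and the toy values are invented so the example runs without the ESS file:

```r
# Toy stand-in for the ESS data; the values are invented for illustration only.
toy_ess <- data.frame(
  cntry = c("BE", "BE", "BE", "NO", "NO", "NO"),
  happy = c(7, 8, 88, 9, 8, 77)
)

# Hypothetical helper: mean of one variable for one country,
# treating the listed codes as missing.
country_mean <- function(data, country, variable, missing_codes = 77:99) {
  values <- data[[variable]][data$cntry == country]
  values[values %in% missing_codes] <- NA
  mean(values, na.rm = TRUE)
}

be_mean <- country_mean(toy_ess, "BE", "happy")
no_mean <- country_mean(toy_ess, "NO", "happy")
cat("Belgium:", be_mean, "Norway:", no_mean, "\n")
```

With the real data, the same call would be `country_mean(ess, "BE", "happy")` and `country_mean(ess, "NO", "happy")`.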

FOR TASK 2

# Code for Ireland
ireland_alcbnge <- ess %>%
  filter(cntry == "IE") %>%
  select(alcbnge)

# Convert the numeric codes to category labels.
# NOTE: the table below has no "Never" row, so no respondent has the value 0.
# Double-check this mapping against the variable page linked in Task 2.
ireland_alcbnge$alcbnge_category <- case_when(
  ireland_alcbnge$alcbnge == 0 ~ "Never",
  ireland_alcbnge$alcbnge == 1 ~ "Less than monthly",
  ireland_alcbnge$alcbnge == 2 ~ "Monthly",
  ireland_alcbnge$alcbnge == 3 ~ "Weekly",
  ireland_alcbnge$alcbnge == 4 ~ "Daily or almost daily",
  TRUE ~ NA_character_
)

# Determine the mode of the alcbnge category.
# This has to run after the recode above; before it, alcbnge_category
# does not exist yet and the result comes back empty.
table_alcbnge <- table(ireland_alcbnge$alcbnge_category)
mode_alcbnge <- names(table_alcbnge)[which.max(table_alcbnge)]

cat("Most common category for frequency of binge drinking in Ireland:", mode_alcbnge, "\n")
## Most common category for frequency of binge drinking in Ireland: Monthly

# To confirm the conversion:
table(ireland_alcbnge$alcbnge_category)
## 
## Daily or almost daily     Less than monthly               Monthly 
##                   417                    65                   650 
##                Weekly 
##                   346
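
An alternative to `case_when()` for this kind of recode is `factor()` with explicit `levels` and `labels`. The sketch below uses invented responses, and the code-to-label mapping in it is an assumption (my reading of the ESS coding, which differs from the mapping used above), so it should be verified against the variable page linked in Task 2 before use:

```r
# Invented example responses; not ESS data.
responses <- c(1, 2, 2, 4, 5, 5, 5, 8)

# ASSUMED code-to-label mapping; verify against the ESS variable page.
codes  <- 1:5
labels <- c("Daily or almost daily", "Weekly", "Monthly",
            "Less than monthly", "Never")

# Values not listed in `levels` (here 8) become NA automatically.
category <- factor(responses, levels = codes, labels = labels)

# table() keeps empty categories (count 0) and drops NA by default.
counts <- table(category)
print(counts)

# The mode is the label with the highest count.
most_common <- names(counts)[which.max(counts)]
cat("Most common category:", most_common, "\n")
```

One advantage of the factor approach is that categories with zero respondents still show up in the table, which makes a mismatched mapping easier to spot.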

FOR TASK 3

# Code for Portugal
portugal_plnftr <- ess %>%
  filter(cntry == "PT") %>%
  select(plnftr)

# Code for Serbia
serbia_plnftr <- ess %>%
  filter(cntry == "RS") %>%
  select(plnftr)

# Summary for Portugal
summary(portugal_plnftr)
##      plnftr      
##  Min.   : 0.000  
##  1st Qu.: 3.000  
##  Median : 5.000  
##  Mean   : 6.426  
##  3rd Qu.: 8.000  
##  Max.   :88.000  
##  NA's   :14604
# Summary for Serbia
summary(serbia_plnftr)
##      plnftr      
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 4.000  
##  Mean   : 4.983  
##  3rd Qu.: 8.000  
##  Max.   :88.000  
##  NA's   :1505
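
Both summaries above report a maximum of 88 on a 0-10 scale, which suggests that special codes (in ESS, 77, 88 and 99 are normally refusal, don't know and no answer) are still in the data and are pulling the means up. A minimal sketch of the effect, using invented values rather than the ESS data:

```r
# Invented values on a 0-10 scale plus one 88 special code.
plnftr_raw <- c(0, 2, 5, 5, 8, 10, 88)

# Recode the special codes to NA, as was done for 'happy' in Task 1.
plnftr_clean <- plnftr_raw
plnftr_clean[plnftr_clean %in% c(77, 88, 99)] <- NA

# Compare the summaries before and after the recode.
print(summary(plnftr_raw))
print(summary(plnftr_clean))

raw_mean   <- mean(plnftr_raw)
clean_mean <- mean(plnftr_clean, na.rm = TRUE)
cat("Mean with special codes:", raw_mean, "\n")
cat("Mean without special codes:", clean_mean, "\n")
```

Applying the same recode to `portugal_plnftr$plnftr` and `serbia_plnftr$plnftr` before calling `summary()` would keep the maximum at 10 or below and make the two means comparable.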

FOR TASK 4

# Code for Italy
italy_stfdem <- ess %>%
  filter(cntry == "IT") %>%
  select(stfdem, gndr)

# Recode values 77, 88, 99 to NA for 'stfdem'
italy_stfdem$stfdem[italy_stfdem$stfdem %in% c(77, 88, 99)] <- NA

# Group by gender and calculate mean satisfaction with democracy (lower = more dissatisfied)
mean_dissatisfaction <- italy_stfdem %>%
  group_by(gndr) %>%
  summarize(mean_stfdem = mean(stfdem, na.rm = TRUE))

# Print the result
print(mean_dissatisfaction)
## # A tibble: 3 × 2
##    gndr mean_stfdem
##   <dbl>       <dbl>
## 1     1        4.78
## 2     2        4.66
## 3     9        3.25
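
The third row of the output (gndr = 9) looks like a special "no answer" code rather than a gender, and as far as I understand the ESS coding, 1 is male and 2 is female. Read that way, and because lower stfdem values mean less satisfaction, the means above would suggest women (4.66) are slightly more dissatisfied than men (4.78). A minimal sketch of labelling the codes and dropping the special one, using invented values and assumed labels:

```r
# Invented example rows; not ESS data.
toy_italy <- data.frame(
  gndr   = c(1, 1, 2, 2, 9),
  stfdem = c(5, 3, 6, 88, 2)
)

# Recode the special codes in stfdem to NA.
toy_italy$stfdem[toy_italy$stfdem %in% c(77, 88, 99)] <- NA

# ASSUMED labels for gndr; any other code (such as 9) becomes NA.
toy_italy$gender <- factor(toy_italy$gndr,
                           levels = c(1, 2),
                           labels = c("Male", "Female"))

# Keep only rows with a valid gender, then average within each group.
valid <- toy_italy[!is.na(toy_italy$gender), ]
means_by_gender <- tapply(valid$stfdem, valid$gender, mean, na.rm = TRUE)
print(means_by_gender)
```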

FOR TASK 5

france_data <- ess %>% 
  filter(cntry == "FR")


ggplot(france_data %>%
         mutate(stfedu = ifelse(stfedu %in% c(77, 88, 99), NA, stfedu),
                stfhlth = ifelse(stfhlth %in% c(77, 88, 99), NA, stfhlth)) %>%
         select(stfedu, stfhlth) %>%
         gather(variable, value, c(stfedu, stfhlth)),
       aes(x = variable, y = value)) +
  geom_boxplot() +
  labs(title = "Boxplot of satisfaction with the state of education vs. health services",
       x = "",  # Remove x-axis label
       y = "Satisfaction (0-10)")  # Change y-axis label
## Warning: Removed 364 rows containing non-finite values (`stat_boxplot()`).

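# Modified boxplot code
# The special codes 77, 88 and 99 are recoded to NA here as well;
# otherwise they would be plotted as values far above the 0-10 scale.
ggplot(france_data %>%
         mutate(stfedu = ifelse(stfedu %in% c(77, 88, 99), NA, stfedu),
                stfhlth = ifelse(stfhlth %in% c(77, 88, 99), NA, stfhlth)) %>%
         select(stfedu, stfhlth) %>%
         gather(variable, value, c(stfedu, stfhlth)),
       aes(x = variable, y = value)) +
  geom_boxplot() +
  labs(title = "Boxplot of satisfaction with the state of education vs. health services",
       x = "",  # Remove x-axis label
       y = "Satisfaction (0-10)")  # Change y-axis label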

# Add a code chunk to print the explanation
cat("\nBased on the boxplot, it appears that the median satisfaction level for health services (stfhlth) is higher than the median satisfaction level for education (stfedu) in France.")
## 
## Based on the boxplot, it appears that the median satisfaction level for health services (stfhlth) is higher than the median satisfaction level for education (stfedu) in France.
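
Reading medians off a boxplot can be backed up by computing them directly. A minimal sketch with invented values (not the French ESS data); `clean_codes` is a hypothetical helper name:

```r
# Invented example values; not ESS data.
toy_france <- data.frame(
  stfedu  = c(4, 5, 5, 6, 88),
  stfhlth = c(6, 7, 7, 8, 99)
)

# Hypothetical helper: turn the special codes into NA.
clean_codes <- function(x, missing_codes = c(77, 88, 99)) {
  x[x %in% missing_codes] <- NA
  x
}

# Median of each column after cleaning; the result is a named numeric vector.
medians <- sapply(toy_france, function(x) median(clean_codes(x), na.rm = TRUE))
print(medians)
```

With the real data, the same `sapply()` call on `france_data[, c("stfedu", "stfhlth")]` would give the two medians that the thick lines in the boxplot represent.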