Provide code and answer.
Prompt and question: calculate the average for the variable ‘happy’ for the country of Norway. On average, based on the ESS data, who reports higher levels of happiness: Norway or Belgium?
Note: we already did it for Belgium. You just need to compare to Norway’s average, making sure to provide the code for both.
# List of packages
packages <- c("tidyverse", "fst", "modelsummary") # add any you need here
# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load the packages
lapply(packages, library, character.only = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## [[1]]
## [1] "lubridate" "forcats" "stringr" "dplyr" "purrr" "readr"
## [7] "tidyr" "tibble" "ggplot2" "tidyverse" "stats" "graphics"
## [13] "grDevices" "utils" "datasets" "methods" "base"
##
## [[2]]
## [1] "fst" "lubridate" "forcats" "stringr" "dplyr" "purrr"
## [7] "readr" "tidyr" "tibble" "ggplot2" "tidyverse" "stats"
## [13] "graphics" "grDevices" "utils" "datasets" "methods" "base"
##
## [[3]]
## [1] "modelsummary" "fst" "lubridate" "forcats" "stringr"
## [6] "dplyr" "purrr" "readr" "tidyr" "tibble"
## [11] "ggplot2" "tidyverse" "stats" "graphics" "grDevices"
## [16] "utils" "datasets" "methods" "base"
belgium_data <- read.fst("belgium_data.fst")
norway_data <- read.fst("norway_data.fst")
belgium_happy <- belgium_data %>%
filter(cntry == "BE") %>%
select(happy)
belgium_happy$y <- belgium_happy$happy
table(belgium_happy$y)
##
## 0 1 2 3 4 5 6 7 8 9 10 77 88 99
## 50 27 104 194 234 830 999 3503 6521 3402 1565 3 16 3
# need to remove 77, 88, 99
# Recode values 77 through 99 to NA
belgium_happy$y[belgium_happy$y %in% 77:99] <- NA
# checking again
table(belgium_happy$y)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 50 27 104 194 234 830 999 3503 6521 3402 1565
norway_happy <- norway_data %>% # note: since I work from norway_data, I replaced "ess" with norway_data
filter(cntry == "NO") %>%
select(happy)
norway_happy$y <- norway_happy$happy
table(norway_happy$y)
##
## 0 1 2 3 4 5 6 7 8 9 10 77 88
## 15 29 59 163 238 730 817 2617 5235 3796 2344 12 10
# need to remove 77, 88
# Recode values 77 through 88 to NA
norway_happy$y[norway_happy$y %in% 77:88] <- NA
# checking again
table(norway_happy$y)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 15 29 59 163 238 730 817 2617 5235 3796 2344
mean_y <- mean(belgium_happy$y, na.rm = TRUE)
cat("Mean of 'y' is:", mean_y, "\n")
## Mean of 'y' is: 7.737334
mean_y <- mean(norway_happy$y, na.rm = TRUE)
cat("Mean of 'y' is:", mean_y, "\n")
## Mean of 'y' is: 7.975005
As the output shows, Norway has a higher average happiness score than Belgium (7.975 > 7.737).
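As a quick sanity check, both means can be recomputed from the cleaned frequency tables printed above, without re-reading the data. This is only a sketch: the names `scores`, `belgium_counts` and `norway_counts` are introduced here for illustration, and the counts are copied by hand from the `table()` output.

```r
# Counts copied from the cleaned table() output above (scores 0 to 10)
scores <- 0:10
belgium_counts <- c(50, 27, 104, 194, 234, 830, 999, 3503, 6521, 3402, 1565)
norway_counts  <- c(15, 29, 59, 163, 238, 730, 817, 2617, 5235, 3796, 2344)

# A mean computed from a frequency table is a count-weighted mean of the scores
belgium_mean <- weighted.mean(scores, belgium_counts)
norway_mean  <- weighted.mean(scores, norway_counts)

# Both means side by side, rounded to three decimals
round(c(Belgium = belgium_mean, Norway = norway_mean), 3)

# Difference in average happiness (Norway minus Belgium)
round(norway_mean - belgium_mean, 3)
```

The weighted means agree with the `mean()` results above, and the gap between the two countries is about 0.24 points on the 0 to 10 scale.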
Provide code and answer.
Prompt and question: what is the most common category selected, for Irish respondents, for frequency of binge drinking? The variable of interest is: alcbnge.
More info here: https://ess-search.nsd.no/en/variable/0c65116e-7481-4ca6-b1d9-f237db99a694.
Hint: need to convert numeric value entries to categories as specified in the variable information link. We did similar steps for Estonia and the climate change attitude variable.
ireland_data <- read.fst("ireland_data.fst")
ireland_alcbnge <- ireland_data %>%
filter(cntry == "IE") %>%
select(alcbnge)
ireland_alcbnge$y <- ireland_alcbnge$alcbnge
table(ireland_alcbnge$y)
##
## 1 2 3 4 5 6 7 8
## 65 650 346 417 239 641 26 6
# Converting to categories to get mode as a category instead of a number
df <- ireland_alcbnge %>%
mutate(
y_category = case_when(
y == 1 ~ "Daily or almost daily",
y == 2 ~ "Weekly",
y == 3 ~ "Monthly",
y == 4 ~ "Less than monthly",
y == 5 ~ "Never",
TRUE ~ NA_character_
),
y_category = fct_relevel(factor(y_category), ### here we put the categories in order we want them to appear
"Daily or almost daily",
"Weekly",
"Monthly",
"Less than monthly",
"Never")
)
# To confirm the conversion:
table(df$y_category)
##
## Daily or almost daily Weekly Monthly
## 65 650 346
## Less than monthly Never
## 417 239
The most common category selected by Irish respondents for frequency of binge drinking is “Weekly” (650 respondents). Codes 6 to 8 are set to NA by the recode above, so they are not counted as categories.
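The modal category can also be picked out programmatically rather than by reading the table. The sketch below assumes the counts shown in the `table(df$y_category)` output; the names `alcbnge_counts`, `modal_category` and `modal_share` are made up for this illustration.

```r
# Counts copied from the table(df$y_category) output above
alcbnge_counts <- c(
  "Daily or almost daily" = 65,
  "Weekly"                = 650,
  "Monthly"               = 346,
  "Less than monthly"     = 417,
  "Never"                 = 239
)

# which.max() returns the position of the largest count; names() gives its label
modal_category <- names(which.max(alcbnge_counts))
modal_category

# Share of valid answers that fall in the modal category
modal_share <- max(alcbnge_counts) / sum(alcbnge_counts)
round(modal_share, 3)
```

On the real data the same idea would be `names(which.max(table(df$y_category)))`, since `table()` ignores NA values by default.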
Provide code and answer.
Prompt and question: when you use the summary() function for the variable plnftr (about planning for the future or taking each day as it comes, on a scale from 0-10) for both the countries of Portugal and Serbia, what do you notice? What stands out as different when you compare the two countries (note: look up the variable information on the ESS website to help with interpretation)? Explain while referring to the output generated.
portugal_data <- read.fst("portugal_data.fst")
portugal_plnftr <- portugal_data %>%
filter(cntry == "PT") %>%
select(plnftr)
portugal_plnftr$y <- portugal_plnftr$plnftr
table(portugal_plnftr$y)
##
## 0 1 2 3 4 5 6 7 8 9 10 88
## 114 184 313 356 264 481 262 382 345 166 370 40
# need to remove 88 (77 does not occur here)
# Recode values 77 through 88 to NA
portugal_plnftr$y[portugal_plnftr$y %in% 77:88] <- NA
# checking again
table(portugal_plnftr$y)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 114 184 313 356 264 481 262 382 345 166 370
summary(portugal_plnftr)
## plnftr y
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 3.000 1st Qu.: 3.000
## Median : 5.000 Median : 5.000
## Mean : 6.426 Mean : 5.418
## 3rd Qu.: 8.000 3rd Qu.: 8.000
## Max. :88.000 Max. :10.000
## NA's :14604 NA's :14644
The output above shows the results for Portugal. Now let’s look at Serbia:
serbia_data <- read.fst("serbia_data.fst")
serbia_plnftr <- serbia_data %>%
filter(cntry == "RS") %>%
select(plnftr)
serbia_plnftr$y <- serbia_plnftr$plnftr
table(serbia_plnftr$y)
##
## 0 1 2 3 4 5 6 7 8 9 10 77 88
## 587 133 152 138 95 246 70 87 103 47 364 4 17
# need to remove 77, 88
# Recode values 77 through 88 to NA
serbia_plnftr$y[serbia_plnftr$y %in% 77:88] <- NA
# checking again
table(serbia_plnftr$y)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 587 133 152 138 95 246 70 87 103 47 364
summary(serbia_plnftr)
## plnftr y
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 4.000 Median : 4.000
## Mean : 4.983 Mean : 4.143
## 3rd Qu.: 8.000 3rd Qu.: 8.000
## Max. :88.000 Max. :10.000
## NA's :1505 NA's :1526
According to these results, Serbian respondents are, on average, more inclined to plan for the future, while Portuguese respondents lean more toward taking each day as it comes. On this scale, 0 represents “I plan for my future as much as possible” and 10 represents “I just take each day as it comes.” After recoding, the mean plnftr score is 5.418 for Portugal, slightly above the midpoint of the scale and so closer to the “take each day as it comes” end. The mean for Serbia is 4.143, below the midpoint and so closer to the “plan for the future” end.
The quartiles point the same way: Serbia’s first quartile is 0 and its median is 4, compared with a first quartile of 3 and a median of 5 for Portugal, so at least a quarter of Serbian respondents chose the strongest “plan for the future” answer. The unrecoded plnftr column also shows why the recoding matters: its maximum is 88 and its mean is inflated (6.426 for Portugal, 4.983 for Serbia) because the missing-value codes are still treated as scores.
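The means alone hide how differently the answers are spread in the two countries. As a rough sketch, the share of valid answers at each end of the scale can be computed from the cleaned `table()` output above; the object names (`portugal_counts`, `serbia_counts`, `shares`) are introduced here only for illustration.

```r
# Counts copied from the cleaned table() output above (scores 0 to 10)
scores <- 0:10
portugal_counts <- c(114, 184, 313, 356, 264, 481, 262, 382, 345, 166, 370)
serbia_counts   <- c(587, 133, 152, 138, 95, 246, 70, 87, 103, 47, 364)

# Proportion of valid answers at each score, one row per country
shares <- rbind(
  Portugal = portugal_counts / sum(portugal_counts),
  Serbia   = serbia_counts / sum(serbia_counts)
)
colnames(shares) <- scores

# Share choosing the two endpoints:
# 0 = plan as much as possible, 10 = take each day as it comes
round(shares[, c("0", "10")], 3)
```

Roughly 29% of valid Serbian answers are 0, against about 4% in Portugal, and Serbia also has the larger share at 10 (about 18% versus 11%). Serbian answers are therefore concentrated at the two extremes, while Portuguese answers cluster nearer the middle of the scale.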
Provide code and answer.
Prompt and question: using the variables stfdem and gndr, answer the following: on average, who is more dissatisfied with democracy in Italy, men or women? Explain while referring to the output generated.
Info on variable here: https://ess.sikt.no/en/variable/query/stfdem/page/1
We want to compare the average outcome relative to a second variable.
First, let’s deal with both our variables of interest after filtering to Italy:
italy_data <- read.fst("italy_data.fst")
italy_data <- italy_data %>%
filter(cntry == "IT")
# Convert gender and stfdem (representing satisfaction with democracy)
italy_data <- italy_data %>%
mutate(
gndr = case_when(
gndr == 1 ~ "Male",
gndr == 2 ~ "Female",
TRUE ~ as.character(gndr)
),
stfdem = ifelse(stfdem %in% c(77, 88), NA, stfdem) # Convert stfdem values
)
# Compute mean for male
mean_male_stfdem <- italy_data %>%
filter(gndr == "Male") %>%
summarize(mean_stfdem_men = mean(stfdem, na.rm = TRUE))
print(mean_male_stfdem)
## mean_stfdem_men
## 1 4.782646
# Compute average of stfdem by gender
means_by_gender <- italy_data %>%
group_by(gndr) %>% # here we are "grouping by" our second variable
summarize(stfdem = mean(stfdem, na.rm = TRUE)) # here we are summarizing our variable of interest
print(means_by_gender)
## # A tibble: 3 × 2
## gndr stfdem
## <chr> <dbl>
## 1 9 3.25
## 2 Female 4.69
## 3 Male 4.78
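On average, women are more dissatisfied with democracy in Italy. stfdem is measured on a scale of 0 to 10, where 0 means “Extremely dissatisfied”, and the mean for women (4.69) is closer to 0 than the mean for men (4.78). The gap is small, about 0.09 points.
The row labelled 9 in the output is not a third gender: it is a gndr code that the recode above left unchanged, so it should be ignored when comparing men and women.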
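One way to keep stray codes such as 9 out of the grouped output is to turn gndr into a factor with only the two substantive levels. The sketch below uses base R on a small made-up data frame (`toy` is hypothetical, not the ESS data) and assumes that 9 is a missing-value code for gndr and that 77, 88 and 99 are the missing-value codes for stfdem.

```r
# Hypothetical toy data: gndr uses 1 = Male, 2 = Female, 9 = assumed missing code;
# stfdem uses 0-10 plus the assumed missing-value codes 77, 88, 99
toy <- data.frame(
  gndr   = c(1, 2, 2, 1, 9, 2),
  stfdem = c(5, 4, 88, 6, 3, 6)
)

# Codes not listed in `levels` (here 9) become NA automatically
toy$gndr <- factor(toy$gndr, levels = c(1, 2), labels = c("Male", "Female"))

# Set the missing-value codes to NA before averaging
toy$stfdem[toy$stfdem %in% c(77, 88, 99)] <- NA

# tapply() drops rows whose group is NA, so only Male and Female remain
means_by_gender <- tapply(toy$stfdem, toy$gndr, mean, na.rm = TRUE)
means_by_gender
```

With this toy data the result has exactly two entries, one per gender, and the respondent coded 9 no longer forms a group of their own.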
Provide code and answer.
Prompt: Interpret the boxplot graph of stfedu and stfhlth that we generated already: according to ESS data, would we say that the median French person is more satisfied with the education system or health services? Explain.
Change the boxplot graph: provide the code to change some of the key labels: (1) Change the title to: Boxplot of satisfaction with the state of education vs. health services; (2) Remove the x-axis label; (3) Change the y-axis label to: Satisfaction (0-10).
Hint: copy the boxplot code above and just replace or cut what is asked.
france_data <- read.fst("france_data.fst")
france_data %>%
# Setting values to NA
mutate(stfedu = ifelse(stfedu %in% c(77, 88, 99), NA, stfedu),
stfhlth = ifelse(stfhlth %in% c(77, 88, 99), NA, stfhlth)) %>%
# Reshaping the data
select(stfedu, stfhlth) %>%
gather(variable, value, c(stfedu, stfhlth)) %>%
# Creating the boxplot
ggplot(aes(x = variable, y = value)) +
geom_boxplot() +
labs(y = "Satisfaction (0-10)", x = NULL, title = "Boxplot of satisfaction with the state of education vs. health services") + # x = NULL removes the x-axis label
theme_minimal()
## Warning: Removed 364 rows containing non-finite values (`stat_boxplot()`).
In this graph, the x-axis shows the two variables and the y-axis shows the values on the satisfaction scale.
We’d say that the median French person is more satisfied with health services than with the education system. The bold line inside each box marks the median, and it sits higher for stfhlth (around 7) than for stfedu (around 5). Since 10 on the 0 to 10 scale means “Extremely satisfied”, a higher median indicates greater satisfaction.
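Reading medians off a boxplot is approximate, so it can help to compute them directly. The following is a minimal sketch on made-up numbers: `toy` is a hypothetical stand-in for france_data, and its values were chosen only to illustrate the steps, not to reproduce the ESS results.

```r
# Hypothetical toy data standing in for france_data
toy <- data.frame(
  stfedu  = c(5, 4, 6, 88, 5, 3, 7),
  stfhlth = c(7, 8, 6, 7, 77, 5, 9)
)

# Set the missing-value codes to NA in both columns
toy[] <- lapply(toy, function(x) ifelse(x %in% c(77, 88, 99), NA, x))

# Median of each variable, ignoring NA; this is the bold line in each box
medians <- sapply(toy, median, na.rm = TRUE)
medians
```

Applied to the real data, the same two steps on stfedu and stfhlth would give the exact values behind the bold lines in the plot.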