Amaral_Alyssia_Homework

# List of packages
packages <- c("tidyverse", "fst", "modelsummary") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "fst"       "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "modelsummary" "fst"          "lubridate"    "forcats"      "stringr"     
##  [6] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [11] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [16] "utils"        "datasets"     "methods"      "base"
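The list output above is just the return value of lapply() being printed. A minimal sketch of one way to keep it out of the knitted document, assuming the printed list is not wanted; the base packages in `pkgs` are stand-ins chosen only so the sketch runs anywhere:

```r
# Stand-in package names (assumption); in this homework the vector would be `packages`
pkgs <- c("stats", "utils")

# invisible() discards the list that lapply() returns;
# suppressPackageStartupMessages() hides the attach banners
invisible(suppressPackageStartupMessages(
  lapply(pkgs, library, character.only = TRUE)
))
```

The packages are still attached as before; only the printed return value and startup messages are hidden.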

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Task 1

Provide code and answer.

Prompt and question: calculate the average for the variable ‘happy’ for the country of Norway. On average, based on the ESS data, who reports higher levels of happiness: Norway or Belgium?

Note: we already did it for Belgium. You just need to compare to Norway’s average, making sure to provide the code for both.

norway_data <- read.fst("norway_data.fst")

norway_average <- norway_data %>%
  filter(cntry == "NO") %>%
  summarise(avg_happy = mean(happy, na.rm = TRUE))

cat("Average happiness for Norway:", norway_average$avg_happy, "\n")

## Average happiness for Norway: 8.076377

belgium_data <- read.fst("belgium_data.fst")

belgium_average <- belgium_data %>%
  filter(cntry == "BE") %>%
  summarise(avg_happy = mean(happy, na.rm = TRUE))

cat("Average happiness for Belgium:", belgium_average$avg_happy, "\n")

## Average happiness for Belgium: 7.838519

Answer: Belgium's average happiness, as previously found, is about 7.84, whereas Norway's average is about 8.08. Thus, on average, respondents in Norway report higher levels of happiness than those in Belgium.
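ESS items on a 0-10 scale typically use codes such as 77, 88 and 99 for refusals and missing answers, so it may be worth setting those to NA before averaging. A minimal sketch, assuming those codes apply to happy; the data frame `toy` and its values are invented for illustration and are not ESS results:

```r
library(dplyr)

# Hypothetical toy data standing in for the ESS files (values are invented)
toy <- data.frame(
  cntry = c("NO", "NO", "NO", "BE", "BE", "BE"),
  happy = c(8, 9, 88, 7, 8, 77)
)

happy_by_country <- toy %>%
  # Assumed missing-value codes for 0-10 ESS items
  mutate(happy = ifelse(happy %in% c(77, 88, 99), NA, happy)) %>%
  group_by(cntry) %>%
  summarise(avg_happy = mean(happy, na.rm = TRUE))

print(happy_by_country)
```

Computing both countries in one grouped summary also keeps the two averages side by side for the comparison.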

Task 2

Provide code and answer.

Prompt and question: what is the most common category selected, for Irish respondents, for frequency of binge drinking? The variable of interest is: alcbnge.

More info here: https://ess-search.nsd.no/en/variable/0c65116e-7481-4ca6-b1d9-f237db99a694.

Hint: need to convert numeric value entries to categories as specified in the variable information link. We did similar steps for Estonia and the climate change attitude variable.

ireland_data <- read.fst("ireland_data.fst")

# Ireland's country code is "IE" ("IL" is Israel); the variable of interest is alcbnge
ireland_alcbnge <- ireland_data %>%
  filter(cntry == "IE") %>%
  select(alcbnge)

ireland_alcbnge$y <- ireland_alcbnge$alcbnge

table(ireland_alcbnge$y)

# Recode values 6 through 8 (non-substantive answers) to NA
ireland_alcbnge$y[ireland_alcbnge$y %in% 6:8] <- NA

# Compute mean and median of the numeric codes
mean_y <- mean(ireland_alcbnge$y, na.rm = TRUE)
median_y <- median(ireland_alcbnge$y, na.rm = TRUE)

cat("Mean of 'y':", mean_y, "\n")

cat("Median of 'y':", median_y, "\n")

# Converting to categories to get mode as a category instead of a number
# Labels assumed from the variable information link above (1 = daily or almost daily ... 5 = never);
# check them against the codebook before relying on the result
df <- ireland_alcbnge %>%
  mutate(
    y_category = case_when(
      y == 1 ~ "daily",
      y == 2 ~ "weekly",
      y == 3 ~ "monthly",
      y == 4 ~ "less than monthly",
      y == 5 ~ "never",
      TRUE ~ NA_character_
    ),
    # Put the categories in the order you want them to appear; otherwise they appear alphabetically
    y_category = factor(y_category,
                        levels = c("never",
                                   "less than monthly",
                                   "monthly",
                                   "weekly",
                                   "daily"))
  )

# To confirm the conversion:
table(df$y_category)

# Return the most frequent category (or categories, if there is a tie)
get_mode <- function(v) {
  tbl <- table(v)
  mode_vals <- as.character(names(tbl)[tbl == max(tbl)])
  return(mode_vals)
}

mode_values <- get_mode(df$y_category)

cat("Mode of y category:", paste(mode_values, collapse = ", "), "\n")

Answer: Among Irish respondents, the most commonly selected category for frequency of binge drinking is "monthly", with 1434 responses.
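The mode of a categorical variable can be read straight from its frequency table. A minimal sketch using invented responses (the vector `responses` is hypothetical and not taken from the Irish data), showing how the most frequent category and its count are picked out:

```r
# Hypothetical responses using the category labels (values are invented)
responses <- factor(
  c("monthly", "never", "monthly", "weekly", "never", "monthly"),
  levels = c("never", "less than monthly", "monthly", "weekly", "daily")
)

freq <- table(responses)  # counts per category; NA values are dropped by default

# Keep every category that reaches the maximum count (handles ties)
modal_category <- names(freq)[as.vector(freq) == max(freq)]
modal_count <- max(freq)

print(freq)
print(modal_category)
```

Reporting the count next to the category, as the answer above does, makes it easy to check the claim against the table.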

Task 3

Provide code and answer.

Prompt and question: when you use the summary() function for the variable plnftr (about planning for the future or taking each day as it comes, from 0-10) for both Portugal and Serbia, what do you notice? What stands out as different when you compare the two countries (note: look up the variable information on the ESS website to help with interpretation)? Explain while referring to the output generated.

portugal_data <- read.fst("portugal_data.fst")

serbia_data <- read.fst("serbia_data.fst")

portugal_data <- portugal_data %>%
  filter(cntry == "PT")

serbia_data <- serbia_data %>%
  filter(cntry == "RS")

summary_portugal <- summary(portugal_data$plnftr)
summary_serbia <- summary(serbia_data$plnftr)

cat("Summary statistics for Portugal (plnftr):\n")

## Summary statistics for Portugal (plnftr):

print(summary_portugal)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.000   5.000   6.426   8.000  88.000   14604

cat("\nSummary statistics for Serbia (plnftr):\n")

## 
## Summary statistics for Serbia (plnftr):

print(summary_serbia)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   4.000   4.983   8.000  88.000    1505

Answer: Comparing the two summaries, Portugal shows a lower level of planning for the future than Serbia: Portugal's median (5) and first quartile (3) are higher than Serbia's (4 and 0), and higher values on this scale lean towards taking each day as it comes. Portugal's distribution is also more right-skewed, which is seen in its mean (6.43) being well above its median (5). Serbia's values are less skewed, since its mean (4.98) and median (4) are closer together, so its distribution would look more symmetrical on a graph.
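Both summaries above report a maximum of 88 on a 0-10 item, which suggests that missing-value codes are still being treated as real answers and are pulling the means upward. A minimal sketch of the effect, assuming 77, 88 and 99 are the missing codes for plnftr; the vector below is invented for illustration:

```r
# Hypothetical plnftr values, including the assumed missing code 88
plnftr <- c(0, 2, 4, 5, 8, 10, 88, 88)

summary(plnftr)  # missing codes treated as real answers

# Assumed missing-value codes set to NA before summarising
plnftr_clean <- ifelse(plnftr %in% c(77, 88, 99), NA, plnftr)
summary(plnftr_clean)  # the recoded values now show up under NA's

mean_raw <- mean(plnftr)
mean_clean <- mean(plnftr_clean, na.rm = TRUE)
cat("Mean before recoding:", mean_raw, "| after recoding:", mean_clean, "\n")
```

After recoding, the maximum stays within the 0-10 range and the gap between mean and median reflects the substantive answers only.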

Task 4

Provide code and answer.

Prompt and question: using the variables stfdem and gndr, answer the following: on average, who is more dissatisfied with democracy in Italy, men or women? Explain while referring to the output generated.

Info on variable here: https://ess.sikt.no/en/variable/query/stfdem/page/1

italy_data <- read.fst("italy_data.fst")

italy_data <- italy_data %>%
  filter(cntry == "IT")

italy_data <- italy_data %>%
  mutate(
    gndr = case_when(
      gndr == 1 ~ "male",
      gndr == 2 ~ "female",
      TRUE ~ as.character(gndr)
    ),
    stfdem = ifelse(stfdem %in% c(77, 88),NA, stfdem)
  )

means_by_gender <- italy_data %>%
  group_by(gndr) %>% 
  summarize(stfdem = mean(stfdem, na.rm = TRUE))

print(means_by_gender)

## # A tibble: 3 × 2
##   gndr   stfdem
##   <chr>   <dbl>
## 1 9        3.25
## 2 female   4.69
## 3 male     4.78

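Answer: Looking at the output, women are on average slightly more dissatisfied with democracy in Italy than men: women's mean satisfaction is 4.69, compared with 4.78 for men, so men report somewhat higher satisfaction. The row labelled 9 covers respondents whose gender was not recoded to male or female and is not part of the comparison.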
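The extra row labelled 9 in the output comes from a gender code that is neither 1 nor 2, and the recode above does not include 99. A minimal sketch of one way to handle both, assuming 9 means gender was not recorded and 99 is a missing code for stfdem; `toy_italy` and its values are invented:

```r
library(dplyr)

# Hypothetical toy data (values are invented, not ESS results)
toy_italy <- data.frame(
  gndr   = c(1, 1, 2, 2, 9, 2),
  stfdem = c(5, 3, 4, 99, 6, 2)
)

means_by_gender_clean <- toy_italy %>%
  mutate(
    # Codes other than 1 and 2 fall through case_when() and become NA
    gndr = case_when(gndr == 1 ~ "male", gndr == 2 ~ "female"),
    stfdem = ifelse(stfdem %in% c(77, 88, 99), NA, stfdem)
  ) %>%
  filter(!is.na(gndr)) %>%  # drop respondents without a recorded gender
  group_by(gndr) %>%
  summarise(mean_stfdem = mean(stfdem, na.rm = TRUE),
            n = sum(!is.na(stfdem)))

print(means_by_gender_clean)
```

Adding the group size `n` next to each mean helps judge how much weight a small difference between the two groups can bear.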

Task 5

Provide code and answer.

Prompt: Interpret the boxplot graph of stfedu and stfhlth that we generated already: according to ESS data, would we say that the median French person is more satisfied with the education system or health services? Explain.

Answer: Based on the boxplot of stfedu and stfhlth, the median French person is more satisfied with health services than with the education system. The median for stfhlth sits higher, with the box leaning towards the higher values even though there is a strong low outlier. In contrast, the median for stfedu is lower and the distribution leans towards lower values: the lower whisker covers more of the data than the upper whisker. The interquartile ranges of stfedu and stfhlth are similar, which tells us that both have a comparable spread of values within the middle 50% of their distributions.
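The reading of the boxplot can be backed up numerically, since the box shows the median and the interquartile range. A minimal sketch in base R, assuming 77, 88 and 99 are the missing codes; `toy_france` and its values are invented for illustration:

```r
# Hypothetical toy data (values are invented, not ESS results)
toy_france <- data.frame(
  stfedu  = c(3, 4, 5, 5, 6, 88),
  stfhlth = c(5, 6, 6, 7, 8, 77)
)

# Set the assumed missing codes to NA in each column
clean <- lapply(toy_france, function(x) ifelse(x %in% c(77, 88, 99), NA, x))

# Median = line inside the box; IQR = height of the box
medians <- sapply(clean, median, na.rm = TRUE)
iqrs <- sapply(clean, IQR, na.rm = TRUE)

print(medians)
print(iqrs)
```

On the real data, the same two calls give the exact values that the answer above describes from the plot.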

Change the boxplot graph: provide the code to change some of the key labels: (1) Change the title to: Boxplot of satisfaction with the state of education vs. health services; (2) Remove the x-axis label; (3) Change the y-axis label to: Satisfaction (0-10).

france_data <- read.fst("france_data.fst")

france_data %>%
  # Setting values to NA
  mutate(stfedu = ifelse(stfedu %in% c(77, 88, 99), NA, stfedu),
         stfhlth = ifelse(stfhlth %in% c(77, 88, 99), NA, stfhlth)) %>%
  # Reshaping the data
  select(stfedu, stfhlth) %>%
  gather(variable, value, c(stfedu, stfhlth)) %>%
  # Creating the modified boxplot
  ggplot(aes(x = variable, y = value)) +
  geom_boxplot() +
  labs(y = "Satisfaction (0-10)", title = "Boxplot of satisfaction with the state of education vs. health services") +
  theme_minimal() +
  theme(axis.title.x = element_blank())  # Remove x-axis label

## Warning: Removed 364 rows containing non-finite values (`stat_boxplot()`).

Hint: copy the boxplot code above and just replace or cut what is asked.

Amaral_Alyssia_Homework_1

2024-01-19
