Salomon_Kayla_Homework

Task 1

Provide code and answer.

Prompt: in the tutorial, we calculated the average trust in others for France and visualized it. Using instead the variable ‘Trust in Parliament’ (trstplt) and the country of Spain (country file provided on course website), visualize the average trust by survey year. You can truncate the y-axis if you wish. Provide appropriate titles and labels given the changes. What are your main takeaways based on the visual (e.g., signs of increase, decrease, or stall)?

packages <- c("tidyverse", "fst", "modelsummary", "viridis")
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: viridisLite

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "fst"       "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "modelsummary" "fst"          "lubridate"    "forcats"      "stringr"     
##  [6] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [11] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [16] "utils"        "datasets"     "methods"      "base"        
## 
## [[4]]
##  [1] "viridis"      "viridisLite"  "modelsummary" "fst"          "lubridate"   
##  [6] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [11] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [16] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [21] "base"
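
The long list printed above is the return value of `lapply()`, not a message from the packages themselves. A minimal sketch of one way to keep it out of the knitted output is to wrap the call in `invisible()`. The vector `pkgs` is a stand-in that uses packages shipped with R, so the sketch runs without installing anything; it is not the package list used in this homework.

```r
# Stand-in package names that ship with R (assumption for illustration only)
pkgs <- c("stats", "utils")

# invisible() discards the list that lapply() would otherwise print
invisible(lapply(pkgs, library, character.only = TRUE))
```

The packages are attached exactly as before; only the printed list is suppressed.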

setwd("/Users/kaylapatricia/Desktop/soc222/homework 2")
spain_data <- read.fst("spain_data.fst")
spain_trstplt <- spain_data %>%
  filter(cntry == "ES") %>% 
  select(trstplt)
spain_trstplt$y <- spain_trstplt$trstplt
table(spain_trstplt$y)

## 
##    0    1    2    3    4    5    6    7    8    9   10   77   88   99 
## 5165 1830 2329 2441 2085 2890 1154  639  355   80   71   46  336   31

spain_trstplt$y[spain_trstplt$y %in% 77:99] <- NA
spain_data <- spain_data %>%
  mutate(
    trstplt = ifelse(trstplt %in% c(77, 88, 99), NA, trstplt),
  )
table(spain_data$trstplt)

## 
##    0    1    2    3    4    5    6    7    8    9   10 
## 5165 1830 2329 2441 2085 2890 1154  639  355   80   71
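
As a minimal sketch of why the recoding step above matters, the following uses a small made-up vector (not the ESS data) to show that the codes 77, 88 and 99 are treated as real scores unless they are turned into NA first. The name `toy_trust` and its values are assumptions for illustration only.

```r
# Hypothetical responses on a 0-10 scale with ESS-style missing codes mixed in
toy_trust <- c(0, 5, 10, 77, 88, 99, 3)

# Naive mean treats 77/88/99 as real answers
naive_mean <- mean(toy_trust)

# Recode the missing codes to NA, as done for trstplt above
clean_trust <- ifelse(toy_trust %in% c(77, 88, 99), NA, toy_trust)
clean_mean <- mean(clean_trust, na.rm = TRUE)

naive_mean  # far above 10, which is impossible on a 0-10 scale
clean_mean  # mean of the four valid answers only
```

Here the cleaned mean is 4.5, while the naive mean is pushed well outside the scale by the three missing codes.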

spain_data$year <- NA
replacements <- c(2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020)
for(i in 1:10){
  spain_data$year[spain_data$essround == i] <- replacements[i]
}
table(spain_data$year)

## 
## 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 
## 1729 1663 1876 2576 1885 1889 1925 1958 1668 2283
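
The loop above can also be written as a single vectorized lookup, because the round number (1 to 10) is the position of its year in `replacements`. The sketch below assumes a small made-up vector `toy_round` rather than the real `essround` column.

```r
# Hypothetical essround values (rounds 1 to 10), not the real data
toy_round <- c(1, 3, 10, 6, 1)

# Same lookup vector as in the loop above
replacements <- c(2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020)

# Use the round number as a position in the lookup vector
toy_year <- replacements[toy_round]
toy_year
```

This gives the same mapping as the loop for rounds 1 to 10; a missing round number would come back as NA.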

trstplt_by_year <- spain_data %>%
  group_by(year) %>%
  summarize(mean_trstplt = mean(trstplt, na.rm = TRUE))
trstplt_by_year

## # A tibble: 10 × 2
##     year mean_trstplt
##    <dbl>        <dbl>
##  1  2002         3.41
##  2  2004         3.66
##  3  2006         3.49
##  4  2008         3.32
##  5  2010         2.72
##  6  2012         1.91
##  7  2014         2.23
##  8  2016         2.40
##  9  2018         2.55
## 10  2020         1.94

ggplot(trstplt_by_year, aes(x = year, y = mean_trstplt)) +
  geom_line(color = "blue", linewidth = 1) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(color = "red", size = 3) + 
  labs(title = "Trust in Parliament in Spain (2002-2020)", 
       x = "Survey Year", 
       y = "Average Trust (0-10 scale)") +
  ylim(0, 10) +  # Setting the y-axis limits from 0 to 10
  theme_minimal()

My main takeaway from this visual is that trust in parliament in Spain has fluctuated but declined overall. Average trust peaked at 3.66 in 2004, fell steadily to a low of 1.91 in 2012, recovered partially to 2.55 by 2018, and dropped again to 1.94 in 2020. Every survey year from 2008 onward is below the 2002 level of 3.41, indicating that trust in parliament in Spain is declining overall.
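
The trend can be checked numerically from the yearly means. This is a minimal sketch in base R that retypes the values from the trstplt_by_year output above rather than reading the data file; the object names are chosen for illustration.

```r
# Yearly means copied from the trstplt_by_year output above
year <- seq(2002, 2020, by = 2)
mean_trstplt <- c(3.41, 3.66, 3.49, 3.32, 2.72, 1.91, 2.23, 2.40, 2.55, 1.94)

# Years with the highest and lowest average trust
peak_year <- year[which.max(mean_trstplt)]
low_year  <- year[which.min(mean_trstplt)]

# Change between consecutive survey rounds, and over the whole period
round_change <- round(diff(mean_trstplt), 2)
total_change <- round(mean_trstplt[10] - mean_trstplt[1], 2)

data.frame(from = year[-10], to = year[-1], change = round_change)
```

The peak is 2004 and the low is 2012, with a net change of -1.47 points between 2002 and 2020. The largest single drops are 2010 to 2012 (-0.81) and 2018 to 2020 (-0.61).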

Task 2

Provide answer only.

Prompt and question: Based on the figure we produced above called task2_plot, tell us: what are your main takeaways regarding France relative to Italy and Norway? Make sure to be concrete and highlight at least two important comparative trends visualized in the graph.

Two comparative trends stand out in the graph. First, France follows a more stable and consistent downward trend across cohorts in the proportion of respondents saying ‘yes’ to feeling close to a party, while Norway and Italy fluctuate more. Second, Norway has the greatest proportion of respondents who feel close to a party across cohorts, and its slope is the shallowest, compared with the steep downward slopes of France and Italy.

Task 3

Provide code and answer.

Question: What is the marginal percentage of Italian men who feel close to a particular political party?

italy_data <- read.fst("italy_data.fst")
italy_data <- italy_data %>%
  mutate(
    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      TRUE ~ NA_character_  
    ),
    clsprty  = case_when(
      clsprty  %in% 1 ~ "Yes",
      clsprty  %in% 2 ~ "No", 
      TRUE ~ NA_character_
    )    
  )

clsprty_percentages <- italy_data %>%
  filter(!is.na(clsprty), !is.na(gndr)) %>% 
  group_by(gndr, clsprty) %>% 
  summarise(count = n(), .groups = 'drop') %>%  
  mutate(percentage = count / sum(count) * 100) 

clsprty_percentages

## # A tibble: 4 × 4
##   gndr   clsprty count percentage
##   <chr>  <chr>   <int>      <dbl>
## 1 Female No       3228       34.2
## 2 Female Yes      1686       17.9
## 3 Male   No       2593       27.5
## 4 Male   Yes      1936       20.5

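The marginal percentage of Italian men who feel close to a particular political party is 20.5%. This is measured out of all respondents with a valid gender and party-closeness answer (1,936 of 9,443).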
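
The denominator matters for how the 20.5% is read. The sketch below rebuilds the four counts from the clsprty_percentages output above in base R and compares the percentage out of all respondents with the percentage within each gender. The object names are assumptions for illustration, and this is a check on the arithmetic rather than the tutorial's own method.

```r
# Counts copied from the clsprty_percentages output above
counts <- matrix(c(3228, 1686,
                   2593, 1936),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gndr = c("Female", "Male"),
                                 clsprty = c("No", "Yes")))

# Share of ALL valid respondents in each cell (what the code above computes)
joint_pct <- round(prop.table(counts) * 100, 1)

# Share WITHIN each gender (each row sums to 100)
within_gender_pct <- round(prop.table(counts, margin = 1) * 100, 1)

joint_pct["Male", "Yes"]          # 20.5, the figure reported above
within_gender_pct["Male", "Yes"]  # 42.7, among men only
```

So 20.5% of all respondents are men who feel close to a party, while 42.7% of men feel close to a party.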

Task 4

Provide code and output only.

Prompt: In the tutorial, we calculated then visualized the percentage distribution for left vs. right by gender for France. Your task is to replicate the second version of the visualization but for the country of Sweden instead.

sweden_data <- read.fst("sweden_data.fst")
sweden_data <- sweden_data %>%
  mutate(
    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      TRUE ~ NA_character_ 
    ),
    lrscale = case_when(
      lrscale %in% 0:3 ~ "Left",
      lrscale %in% 7:10 ~ "Right",
      TRUE ~ NA_character_  
    )    
  )

lrscale_percentages <- sweden_data %>% 
  filter(!is.na(lrscale), !is.na(gndr)) %>% 
  group_by(gndr, lrscale) %>% 
  summarise(count = n(), .groups = 'drop') %>%
  mutate(percentage = count / sum(count) * 100) 

lrscale_percentages

## # A tibble: 4 × 4
##   gndr   lrscale count percentage
##   <chr>  <chr>   <int>      <dbl>
## 1 Female Left     2296       23.0
## 2 Female Right    2530       25.3
## 3 Male   Left     2062       20.6
## 4 Male   Right    3107       31.1

lrscale_plot_v2 <- ggplot(lrscale_percentages, 
            aes(x = percentage, 
                y = reorder(gndr, -percentage), 
                fill = gndr)) + 
  geom_col() +  
  coord_flip() + 
  guides(fill = "none") + 
  facet_wrap(~ lrscale, nrow = 1) + 
  labs(x = "Percentage of Respondents", 
       y = NULL, 
       title = "Political Orientation by Gender", 
       subtitle = "Comparing the percentage distribution of left vs. right for Sweden (2002-2020)") +  
  theme(plot.title = element_text(size = 16, face = "bold"), 
        plot.subtitle = element_text(size = 12), 
        axis.title.y = element_blank(),  
        legend.position = "bottom")  
lrscale_plot_v2

Task 5

Provide code and answer: In Hungary, what is the conditional probability of NOT feeling close to any particular party given that the person lives in a rural area?

hungary_data <- read.fst("hungary_data.fst")
hungary_data <- hungary_data %>%
  mutate(
    geo = recode(as.character(domicil), 
                 '1' = "Urban", 
                 '2' = "Urban",
                 '3' = "Rural", 
                 '4' = "Rural", 
                 '5' = "Rural",
                 '7' = NA_character_,
                 '8' = NA_character_,
                 '9' = NA_character_)
  ) %>%
  filter(!is.na(clsprty), !is.na(geo))

cond <- hungary_data %>%
  count(clsprty, geo) %>%
  group_by(geo) %>%
  mutate(prob = n / sum(n))

cond

## # A tibble: 10 × 4
## # Groups:   geo [2]
##    clsprty geo       n     prob
##      <dbl> <chr> <int>    <dbl>
##  1       1 Rural  5055 0.429   
##  2       1 Urban  2283 0.472   
##  3       2 Rural  6275 0.532   
##  4       2 Urban  2395 0.495   
##  5       7 Rural   234 0.0199  
##  6       7 Urban    88 0.0182  
##  7       8 Rural   219 0.0186  
##  8       8 Urban    70 0.0145  
##  9       9 Rural     4 0.000339
## 10       9 Urban     4 0.000826

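Given that someone lives in a rural area, the conditional probability of them not feeling close to any particular party is about 0.53 (53.2%), which is the Rural row for clsprty = 2 in the table above. Note that this denominator still includes the missing-value codes 7, 8 and 9 for clsprty, which the filter does not remove because they are not NA. Restricting to valid answers gives 6275 / (5055 + 6275) ≈ 0.554, or 55.4%.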
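
The conditional probability can be checked directly from the Rural counts printed in the cond table above. This is a minimal base R sketch that retypes those counts instead of reading the data file. It assumes, as in the Task 3 recode, that clsprty 1 means yes and 2 means no, and that 7, 8 and 9 are missing-value codes.

```r
# Rural counts copied from the cond output above (clsprty codes as names)
rural_n <- c("1" = 5055, "2" = 6275, "7" = 234, "8" = 219, "9" = 4)

# As computed above: codes 7, 8 and 9 stay in the denominator
p_no_all <- rural_n["2"] / sum(rural_n)

# Restricting to valid answers (1 = yes, 2 = no)
valid <- rural_n[c("1", "2")]
p_no_valid <- valid["2"] / sum(valid)

round(unname(c(p_no_all, p_no_valid)), 3)
```

The first value reproduces the 0.532 in the table; the second, 0.554, is the probability among rural respondents who gave a valid answer.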

Salomon_Kayla_Homework_2

Kayla Salomon

2024-01-30
