Sallay_Nicholas_Homework

# List of packages
packages <- c("tidyverse", "fst", "modelsummary", "viridis") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages (invisible() suppresses the printed list that lapply() returns)
invisible(lapply(packages, library, character.only = TRUE))

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: viridisLite


## Task 1 

Provide code and answer.

Prompt: in the tutorial, we calculated the average trust in others for France and visualized it. Using instead the variable ‘Trust in Parliament’ (trstplt) and the country of Spain (country file provided on course website), visualize the average trust by survey year. You can truncate the y-axis if you wish. Provide appropriate titles and labels given the changes. What are your main takeaways based on the visual (e.g., signs of increase, decrease, or stall)?
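
Before averaging, it helps to check how missing answers are stored. ESS trust items run from 0 to 10, and the raw files usually store refusals, "don't know", and "no answer" as 77, 88, and 99; if those codes are left in, they inflate the mean. The sketch below uses a small made-up data frame (`toy`, not the Spain file) and assumes those three codes are the only non-substantive values.

```r
library(dplyr)

# Made-up responses for illustration only (not taken from the Spain file)
toy <- data.frame(
  year    = c(2002, 2002, 2002, 2004, 2004, 2004),
  trstplt = c(3, 5, 88, 4, 77, 6)
)

# Assumption: 77, 88, 99 are missing-value codes; set them to NA before averaging
toy_clean <- toy %>%
  mutate(trstplt = ifelse(trstplt %in% c(77, 88, 99), NA, trstplt))

# Average by year, ignoring the values that are now NA
trust_by_year_toy <- toy_clean %>%
  group_by(year) %>%
  summarize(mean_trust = mean(trstplt, na.rm = TRUE))

trust_by_year_toy
```

If a yearly mean on a 0-10 item looks implausibly high, a quick `table()` of the raw variable shows whether such codes are still present.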


```r
# The object keeps the tutorial's name `france_data`, but it holds the Spain file
france_data <- read.fst("spain_data.fst")

# Map ESS round (1-10) to survey year by indexing into the lookup vector
replacements <- c(2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020)
france_data$year <- replacements[france_data$essround]

table(france_data$year)

## 
## 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 
## 1729 1663 1876 2576 1885 1889 1925 1958 1668 2283

table(france_data$essround)

## 
##    1    2    3    4    5    6    7    8    9   10 
## 1729 1663 1876 2576 1885 1889 1925 1958 1668 2283
```

Now, we will calculate the average by year to then visualize:

trust_by_year <- france_data %>%
  group_by(year) %>%
  summarize(mean_trust = mean(trstplt, na.rm = TRUE))
trust_by_year

## # A tibble: 10 × 2
##     year mean_trust
##    <dbl>      <dbl>
##  1  2002       7.81
##  2  2004       5.96
##  3  2006       5.41
##  4  2008       5.71
##  5  2010       3.61
##  6  2012       2.84
##  7  2014       3.30
##  8  2016       3.88
##  9  2018       4.15
## 10  2020       3.13

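We can see from this table that the average shifts considerably: it falls from 7.81 in 2002 to a low of 2.84 in 2012, recovers partially to 4.15 by 2018, and drops again to 3.13 in 2020.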

But now it’s time to:

Visualize

ggplot(trust_by_year, aes(x = year, y = mean_trust)) +
  geom_line(color = "blue", linewidth = 1) +  # Line to show the trend (`linewidth` replaces `size` for lines since ggplot2 3.4.0)
  geom_point(color = "red", size = 3) +  # Points to highlight each year's value
  labs(title = "Trust in Parliament in Spain (2002-2020)", 
       x = "Survey Year", 
       y = "Average Trust (0-10 scale)") +
  ylim(0, 10) +  # Setting the y-axis limits from 0 to 10
  theme_minimal()

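Task 1 answer: Average trust in parliament in Spain has declined overall, from 7.81 in 2002 to 3.13 in 2020. The decline is not steady: the sharpest drop comes between 2008 and 2012 (5.71 to 2.84), followed by a partial recovery through 2018 (4.15) and another fall in 2020. The average has stayed below 5.0 since 2010.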

## Task 2

Provide answer only.

Prompt and question: Based on the figure we produced above called task2_plot, tell us: what are your main takeaways regarding France relative to Italy and Norway? Make sure to be concrete and highlight at least two important comparative trends visualized in the graph.  

Answer: In France, there has been a decline since 1940 in people feeling close to a political party. In comparison, Italians have never felt very close to a political party: the figure has remained under 0.50 since 1920 and has dropped below 0.25 since 2000. Finally, Norwegians have consistently felt close to a political party, remaining well above 0.50 since 1920, although since 1980 these feelings have begun to drop much closer to 0.50.  


```r
# Load the Italy and Norway country files
italy_data <- read.fst("italy_data(1).fst")
norway_data <- read.fst("norway_data(1).fst")

# View() opens the interactive data viewer; run it from the console rather than while knitting
# View(italy_data)
# View(norway_data)
```

## Task 3

Provide code and answer.

Question: What is the marginal percentage of Italian men who feel close to a particular political party?

lrscale_percentages <- france_data %>%  # 'france_data' holds the Spain file from Task 1; the Italy file is 'italy_data'
  filter(!is.na(lrscale), !is.na(gndr)) %>%  # Drop rows where 'lrscale' or 'gndr' is NA (missing data)
  group_by(gndr, lrscale) %>%  # Group by gender ('gndr') and left-right placement ('lrscale')
  summarise(count = n(), .groups = 'drop') %>%  # Count rows in each group, then drop the grouping
  mutate(percentage = count / sum(count) * 100)  # Each cell as a percentage of all remaining respondents

lrscale_percentages  # The resulting dataframe

## # A tibble: 30 × 4
##     gndr lrscale count percentage
##    <dbl>   <dbl> <int>      <dbl>
##  1     1       0   506      2.60 
##  2     1       1   313      1.61 
##  3     1       2   764      3.93 
##  4     1       3  1243      6.39 
##  5     1       4   997      5.13 
##  6     1       5  2567     13.2  
##  7     1       6   706      3.63 
##  8     1       7   589      3.03 
##  9     1       8   420      2.16 
## 10     1       9   140      0.720
## # ℹ 20 more rows

Task 3 answer: Among Italian men, the marginal percentage who feel close to a left-leaning political party is 24.4%, and the marginal percentage who feel close to a right-leaning political party is 23.2%.
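
For comparison, here is a minimal sketch of how a percentage for "men who feel close to a party" could be computed directly from the closeness item. It uses made-up rows (`toy`, not the Italy file) and assumes the usual ESS coding: `clsprty` 1 = yes, 2 = no, and `gndr` 1 = male, 2 = female, with other codes treated as missing. Both the share of all respondents and the share within each gender are shown, since either may be what is meant.

```r
library(dplyr)

# Made-up respondents for illustration only (not the Italy file)
toy <- data.frame(
  gndr    = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 1),
  clsprty = c(1, 2, 2, 1, 1, 2, 2, 2, 8, 2)
)

clsprty_percentages <- toy %>%
  # Assumption: only 1 (yes) and 2 (no) are substantive answers
  filter(gndr %in% c(1, 2), clsprty %in% c(1, 2)) %>%
  mutate(
    gender = ifelse(gndr == 1, "Male", "Female"),
    close  = ifelse(clsprty == 1, "Yes", "No")
  ) %>%
  count(gender, close) %>%
  # Share of all respondents in each gender-by-closeness cell
  mutate(pct_of_all = n / sum(n) * 100) %>%
  # Share within each gender
  group_by(gender) %>%
  mutate(pct_within_gender = n / sum(n) * 100) %>%
  ungroup()

clsprty_percentages
```

The row with `gender == "Male"` and `close == "Yes"` carries the figure of interest.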

## Task 4

Provide code and output only.

Prompt: In the tutorial, we calculated then visualized the percentage distribution for left vs. right by gender for France. Your task is to replicate the second version of the visualization but for the country of Sweden instead.

sweden_data <- read.fst("sweden_data.fst")

# Create a ggplot object for the bar chart with the specified style
# Note: lrscale_percentages is the Task 3 table; it must be rebuilt from sweden_data for the plot to show Sweden
lrscale_plot_v2 <- ggplot(lrscale_percentages, 
            aes(x = percentage,  # Use percentage directly
                y = reorder(gndr, -percentage),  # Order genders by percentage
                fill = factor(gndr))) +  # Discrete fill color based on gender

  # Create bar chart
  geom_col() +  # Draws the bars using the provided data
  coord_flip() +  # Swap the axes (percentage ends up on the vertical axis)

  # Remove fill color legend
  guides(fill = "none") +  # Removes legend for the fill aesthetic

  # Split the plot based on Political Orientation
  facet_wrap(~ lrscale, nrow = 1) +  # Separate plots for Left/Right

  # Labels and titles for the plot
  labs(x = "Percentage of Respondents",  # X-axis label
       y = NULL,  # Remove Y-axis label
       title = "Political Orientation by Gender",  # Main title
       subtitle = "Comparing the percentage distribution of left vs. right for Sweden (2002-2020)") +  # Subtitle

  # Adjust visual properties of the plot
  theme(plot.title = element_text(size = 16, face = "bold"),  # Format title
        plot.subtitle = element_text(size = 12),  # Format subtitle
        axis.title.y = element_blank(),  # Remove Y-axis title
        legend.position = "bottom")  # Position the legend at the bottom

# Display the ggplot object
lrscale_plot_v2
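
A minimal sketch of the percentage table such a plot needs, built on made-up rows (`toy`, not the Sweden file). The cut-points used here (0-3 = Left, 7-10 = Right, everything else dropped) are an assumption for illustration and may differ from the tutorial's recode.

```r
library(dplyr)

# Made-up respondents for illustration only (not the Sweden file)
toy <- data.frame(
  gndr    = c(1, 1, 1, 1, 2, 2, 2, 2),
  lrscale = c(2, 8, 9, 5, 1, 3, 7, 88)
)

lr_by_gender <- toy %>%
  mutate(
    gender = case_when(gndr == 1 ~ "Male", gndr == 2 ~ "Female"),
    # Hypothetical cut-points: 0-3 = Left, 7-10 = Right; unmatched values become NA
    side = case_when(
      lrscale %in% 0:3  ~ "Left",
      lrscale %in% 7:10 ~ "Right"
    )
  ) %>%
  filter(!is.na(gender), !is.na(side)) %>%
  count(gender, side) %>%
  group_by(gender) %>%                      # percentages sum to 100 within each gender
  mutate(percentage = n / sum(n) * 100) %>%
  ungroup()

lr_by_gender
```

A table shaped like this, built from `sweden_data`, could stand in for `lrscale_percentages` in the plot, with `gender` on the axis and `side` in `facet_wrap()`.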

## Task 5

Provide code and answer: In Hungary, what is the conditional probability of NOT feeling close to any particular party given that the person lives in a rural area?

hungary_data <- read.fst("hungary_data.fst")

# Recode domicil into an Urban/Rural variable (geo); codes 7, 8, 9 become NA
hungary_data <- hungary_data %>%
  mutate(
    geo = recode(as.character(domicil), 
                 '1' = "Urban", 
                 '2' = "Urban",
                 '3' = "Rural", 
                 '4' = "Rural", 
                 '5' = "Rural",
                 '7' = NA_character_,
                 '8' = NA_character_,
                 '9' = NA_character_)
  ) %>%
  filter(!is.na(lrscale), !is.na(geo))  # Removing rows with NA in lrscale or geo

# Distribution of lrscale within each geo group, i.e. P(lrscale | geo).
# The task asks about clsprty, so the same steps would need clsprty in place of lrscale.
cond <- hungary_data %>%
  count(lrscale, geo) %>%
  group_by(geo) %>%
  mutate(prob = n / sum(n))

cond

## # A tibble: 28 × 4
## # Groups:   geo [2]
##    lrscale geo       n   prob
##      <dbl> <chr> <int>  <dbl>
##  1       0 Rural   314 0.0266
##  2       0 Urban   234 0.0483
##  3       1 Rural   233 0.0198
##  4       1 Urban   107 0.0221
##  5       2 Rural   469 0.0398
##  6       2 Urban   215 0.0444
##  7       3 Rural   637 0.0540
##  8       3 Urban   337 0.0696
##  9       4 Rural   632 0.0536
## 10       4 Urban   332 0.0686
## # ℹ 18 more rows
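
The same `count()` and `group_by()` pattern gives P(not close | rural) when it is applied to the closeness item instead of `lrscale`. The sketch below runs on made-up rows (`toy`, not the Hungary file) and assumes `clsprty` is coded 1 = feels close, 2 = does not, with any other code treated as missing.

```r
library(dplyr)

# Made-up respondents for illustration only (not the Hungary file)
toy <- data.frame(
  geo     = c("Rural", "Rural", "Rural", "Rural", "Urban", "Urban", "Urban", "Rural"),
  clsprty = c(2, 2, 1, 2, 1, 2, 1, 9)
)

cond_toy <- toy %>%
  # Assumption: 1 = feels close, 2 = does not; other codes are dropped
  filter(clsprty %in% c(1, 2)) %>%
  mutate(close = ifelse(clsprty == 1, "Close", "Not close")) %>%
  count(close, geo) %>%
  group_by(geo) %>%              # condition on place of residence
  mutate(prob = n / sum(n)) %>%  # P(close status | geo)
  ungroup()

# P(not close | rural) on the toy data
rural_not_close <- filter(cond_toy, geo == "Rural", close == "Not close")
rural_not_close
```

Grouping by `geo` before dividing is what makes this a conditional probability: each `prob` is a share of the rural (or urban) respondents only, not of the whole sample.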

Task 5 answer: The conditional probability of not feeling close to any particular political party given that the person lives in a rural area is


Sallay_Nicholas_Homework_2

Nicholas Sallay

2024-01-27
