First Skill Exercise

Marco Tulio Eguez Hurtado

Soc-3320: Methodology and Research II

Instructor: Sébastien Parker

Steps are below each code chunk*

packages <- c("tidyverse", "srvyr", "broom","gt", "modelsummary",
"gapminder", "fst", "ggridges") 
load("anes_2020.rda")

new_packages <- packages[!(packages %in% installed.packages()[,"Package"])] 
if(length(new_packages)) install.packages(new_packages) 
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'srvyr'
## 
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## 
## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
##   backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
## 
## Revert to `kableExtra` for one session:
## 
##   options(modelsummary_factory_default = 'kableExtra')
##   options(modelsummary_factory_latex = 'kableExtra')
##   options(modelsummary_factory_html = 'kableExtra')
## 
## Silence this message forever:
## 
##   config_modelsummary(startup_message = FALSE)

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "srvyr"     "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "broom"     "srvyr"     "lubridate" "forcats"   "stringr"   "dplyr"    
##  [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
## [13] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [19] "base"     
## 
## [[4]]
##  [1] "gt"        "broom"     "srvyr"     "lubridate" "forcats"   "stringr"  
##  [7] "dplyr"     "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"  
## [13] "tidyverse" "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [19] "methods"   "base"     
## 
## [[5]]
##  [1] "modelsummary" "gt"           "broom"        "srvyr"        "lubridate"   
##  [6] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [11] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [16] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [21] "base"        
## 
## [[6]]
##  [1] "gapminder"    "modelsummary" "gt"           "broom"        "srvyr"       
##  [6] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [11] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [16] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [21] "methods"      "base"        
## 
## [[7]]
##  [1] "fst"          "gapminder"    "modelsummary" "gt"           "broom"       
##  [6] "srvyr"        "lubridate"    "forcats"      "stringr"      "dplyr"       
## [11] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [16] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [21] "datasets"     "methods"      "base"        
## 
## [[8]]
##  [1] "ggridges"     "fst"          "gapminder"    "modelsummary" "gt"          
##  [6] "broom"        "srvyr"        "lubridate"    "forcats"      "stringr"     
## [11] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [16] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [21] "utils"        "datasets"     "methods"      "base"

Loading the necessary packages

summary(gapminder)

##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1  
##

Check the data

life_exp_cont <- gapminder %>%
  filter(year %in% c(1987, 2007)) %>%
  group_by(continent) %>%
  summarise(
    lifeExp_1987 = first(lifeExp),
    lifeExp_2007 = last(lifeExp),
    change = lifeExp_2007 - lifeExp_1987,
    avg_life = mean(lifeExp),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_life))

print(life_exp_cont)

## # A tibble: 5 × 5
##   continent lifeExp_1987 lifeExp_2007 change avg_life
##   <fct>            <dbl>        <dbl>  <dbl>    <dbl>
## 1 Oceania           76.3         80.2   3.88     78.0
## 2 Europe            72           79.4   7.42     75.6
## 3 Americas          70.8         73.7   2.97     70.8
## 4 Asia              40.8         62.7  21.9      67.8
## 5 Africa            65.8         43.5 -22.3      54.1

I first filtered for the years I wanted to include.Then, I grouped the data by continent and then utilized the summarise function to calculate the new columns, with the change column being calculated as the life expectancy of 1987 minus the one in 2007, showing the change in life expectancy over time.Then I arranged my result in a descendant way.

life_exp_country <- gapminder %>%
  filter(year >= 1987 & year <= 2007) %>%
  group_by(country, year) %>%
  summarise(
    avg_life = mean(lifeExp),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_life))



focal_countries <- life_exp_country %>%
  filter(country %in% c("Niger", "Bangladesh", "El Salvador", "Iraq", "Zimbabwe")
         ) 
focal_countries

## # A tibble: 25 × 3
##    country      year avg_life
##    <fct>       <int>    <dbl>
##  1 El Salvador  2007     71.9
##  2 El Salvador  2002     70.7
##  3 El Salvador  1997     69.5
##  4 El Salvador  1992     66.8
##  5 Iraq         1987     65.0
##  6 Bangladesh   2007     64.1
##  7 El Salvador  1987     63.2
##  8 Zimbabwe     1987     62.4
##  9 Bangladesh   2002     62.0
## 10 Zimbabwe     1992     60.4
## # ℹ 15 more rows

I followed a similar structure from the previous step. Separating the 5 countries with the filter(), grouping them with %in% and then selecting the countries with c(). Here I tried to use the select() function to separate the countries but when doing the graph below the countries did not appeared, so I had to move to the filter function.

enhanced_table <- life_exp_cont %>% 
  select(-avg_life) %>%
  gt() %>% 
  cols_label(
    continent = "Continent",
    lifeExp_1987 = "1987",
    lifeExp_2007 = "2007",
    change = "Change (2007-1987)"
  ) %>%
  fmt_number(
    columns = c(`lifeExp_1987`, `lifeExp_2007`, change),
    decimals = 1,
    use_seps = TRUE
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_header(
    title = md("**Life Expectancy Changes by Continent**"),
    subtitle = md("Average life expectancy in years")
  ) %>% 
   tab_source_note(
    source_note = "Data: Gapminder"
  )

enhanced_table

Continent	1987	2007	Change (2007-1987)
Life Expectancy Changes by Continent
Average life expectancy in years
Oceania	76.3	80.2	3.9
Europe	72.0	79.4	7.4
Americas	70.8	73.7	3.0
Asia	40.8	62.7	21.9
Africa	65.8	43.5	−22.3
Data: Gapminder

I started by removing the avg_life column with the select(-) as I did not need it for the column. Then I used the gt tool to start making the graph and entered the values for the table. After that, I changed the style, making the text in the columns bold. Then, I changed the headers using the markdown format. I concluded with the source note.

ggplot(
  focal_countries,
  aes(x = year, y = avg_life,
      color = country, fill = country)
  ) + 
  geom_line(linewidth = 1.5) +
  
  scale_color_brewer(palette = "Set1") +
  
  theme_minimal() +
  
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 12),
    legend.position = "bottom",
  ) +
  labs(
    title = "Life Expectancy Trajectories (1987-2007)",
    subtitle = "in Selected Countries",
    x = "Year",
    y = "Life Expectancy (years)",
  )

I created the visualization following the steps on Tutorial 3. I used the ggplot tool and include the values for my x and y axis and the color which will represent the country. I chose to do it not with just the two values for 1987 and 2007 but with the range of years from 1987 to 2007 to showcase the trajectory of the countries selected. After that, I changed the width of the line and copy the code for the color. Then, I removed the background with theme_minimal() and change the elements of the visual with theme. Lastly, I used the labs command to add the necessary titles, subtitles, and footnotes.

Interpretation

Continental trends: The analysis of continental life expectancy trends between 1987 and 2007 reveals significant regional disparities. Oceania maintained the highest baseline life expectancy and achieved the smallest increase of years, reflecting its already advanced healthcare systems and relatively stable socio-economic conditions. Europe exhibited the most substantial improvement, rising from 72.0 to 79.4 years. This might be caused by the post-Cold War investments in public health infrastructure and advancements in health research. Asia demonstrated the largest relative gain, surging from a low baseline of 40.8 to 62.7 years, a trajectory consistent with rapid economic development and expanded access to primary care. Conversely, Africa experienced a major decline, falling from 65.8 to 43.5 years, a pattern likely exacerbated by the HIV/AIDS pandemic, political instability, and underfunded health systems. The Americas showed moderate progress, though structural inequalities, political instability and variable healthcare access may explain its intermediate position between high- and low-performing regions, where some people might enjoy from an expanded lifetime but others do not.
Five-country analysis: The trajectories of the five focal nations highlight divergent developmental pathways. El Salvador exhibited the most consistent upward trend, rising from 63.2 to 71.9 years, likely driven by poverty reduction initiatives. Bangladesh achieved moderate gains, reflecting incremental improvements in public health despite persistent resource constraints. Zimbabwe and Niger exemplify stagnation and volatility. Zimbabwe’s life expectancy collapsed from 62.4 years to 44.7 years, most likely due to corruption, political instability, lack of access to healthcare and HIV prevalence. Niger’s marginal improvements underscore challenges in combating infectious diseases and malnutrition. Iraq displayed a unique trajectory: initial stability followed by decline during economic sanctions and conflict, with partial recovery pafter the year 2003.

table(anes_2020$TrustPeople)

## 
##              Always    Most of the time About half the time    Some of the time 
##                  48                3511                2020                1597 
##               Never 
##                 264

table(anes_2020$AgeGroup)

## 
##       18-29       30-39       40-49       50-59       60-69 70 or older 
##         871        1241        1081        1200        1436        1330

I started by looking at my variables.

anes_2020_clean <- anes_2020 %>%
  filter(!is.na(TrustPeople), !is.na(AgeGroup))

Then, filtering the missing values of both columns.

trust_by_age <- anes_2020_clean %>%
  group_by(AgeGroup) %>%
  count(TrustPeople) %>%
  mutate(
    prop = n/sum(n),
    percent = round(100 * prop, 1)
  )

trust_by_age

## # A tibble: 30 × 5
## # Groups:   AgeGroup [6]
##    AgeGroup TrustPeople             n    prop percent
##    <fct>    <fct>               <int>   <dbl>   <dbl>
##  1 18-29    Always                  7 0.00804     0.8
##  2 18-29    Most of the time      268 0.308      30.8
##  3 18-29    About half the time   278 0.319      31.9
##  4 18-29    Some of the time      246 0.282      28.2
##  5 18-29    Never                  72 0.0827      8.3
##  6 30-39    Always                 10 0.00807     0.8
##  7 30-39    Most of the time      502 0.405      40.5
##  8 30-39    About half the time   378 0.305      30.5
##  9 30-39    Some of the time      281 0.227      22.7
## 10 30-39    Never                  68 0.0549      5.5
## # ℹ 20 more rows

Here, I calculated the proportions, the number, and the percentage of trust in people by age group, so each age group will display the proportions within the same category.

total_sample_size <- nrow(anes_2020_clean)

total_sample_size

## [1] 7153

I counted the total sample with nrow() with the previously cleaned data to avoid missing values.

trust_wide <- trust_by_age %>%
  select(-prop, -n) %>%
  pivot_wider(
    names_from = TrustPeople,
    values_from = percent
  )  %>%
  ungroup() 
  
  
print(trust_wide)

## # A tibble: 6 × 6
##   AgeGroup    Always `Most of the time` `About half the time` `Some of the time`
##   <fct>        <dbl>              <dbl>                 <dbl>              <dbl>
## 1 18-29          0.8               30.8                  31.9               28.2
## 2 30-39          0.8               40.5                  30.5               22.7
## 3 40-49          0.7               44.1                  29.1               22.9
## 4 50-59          0.2               48.9                  27.1               20.8
## 5 60-69          0.7               52.4                  25.2               19.8
## 6 70 or older    0.6               59.2                  21.6               17.3
## # ℹ 1 more variable: Never <dbl>

I got stuck here for a while because I was not able to figure out a way to arrange my data in a way that the trust categories where converted to column headers. Fortunately, I found this youtube video that help me to use and understand the function pivot_wider().https://youtu.be/YpAdZ4079qs?si=YKOM7pdcjRIs3dx4. It was important to ungroup the values at the end, because at first I was having troubles with my table being grouped by age and the age group column disappearing. After ungrouping, I was able to get the column age group back.

trust_table <- trust_wide %>%
  gt() %>%
  cols_label(
    AgeGroup = "Age Group",
    Always = "Always",          
    `Most of the time` = "Most of the time",
    `About half the time` = "About half the time",
    `Some of the time` = "Some of the time",
    Never = "Never"
  ) %>%
  fmt_number(
    columns = c(Always, `Most of the time`, `About half the time`, `Some of the time`, Never),
    decimals = 1
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels(columns = "AgeGroup")
  ) %>%
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_body(columns = c(Always, `Most of the time`, `About half the time`, `Some of the time`, Never))
  ) %>%
  tab_header(
    title = md("**Interpersonal Trust by Age Group**"),
    subtitle = "Distribution of responses (percentages)"
  ) %>% 
  tab_source_note(
    source_note = paste("Data: ANES 2020 (sample size =", total_sample_size, ")") 
  )

trust_table

Age Group	Always	Most of the time	About half the time	Some of the time	Never
Interpersonal Trust by Age Group
Distribution of responses (percentages)
18-29	0.8	30.8	31.9	28.2	8.3
30-39	0.8	40.5	30.5	22.7	5.5
40-49	0.7	44.1	29.1	22.9	3.2
50-59	0.2	48.9	27.1	20.8	3.1
60-69	0.7	52.4	25.2	19.8	1.9
70 or older	0.6	59.2	21.6	17.3	1.3
Data: ANES 2020 (sample size = 7153 )

Following the steps of the first table, I was able to use a similar structure while adding some important things like aligning the values to the center in the trust categories to make it more legible and clear. Also, using paste = to paste the total sample size directly.

ggplot( 
data = anes_2020_clean %>%  
 filter(!is.na(TrustPeople), !is.na(AgeGroup)), 
mapping = aes(x = AgeGroup, fill = TrustPeople) 
) + 
geom_bar( 
position = "fill", 
color = "white", 
alpha = 0.9 
) + 
   coord_flip() +
scale_fill_viridis_d( 
option = "mako", 
direction = -1
) + 
scale_y_continuous( 
labels = scales::percent, 
breaks = seq(0, 1, 0.2)
) + 
labs( 
title = "Interpersonal Trust Distribution by Age Group", 
x = "Age Group", 
y = "Percentage of Trust by Age Group", 
fill = "Level of Trust",
caption = paste("Data: ANES 2020 | Total sample size =", total_sample_size)
) + 
theme_minimal() + 
theme( 
legend.position = "right", 
legend.title = element_text(face = "bold"), 
plot.title = element_text(face = "bold", size = 14),
axis.title.x = element_text(face = "bold", size = 11),
axis.title.y = element_text(face = "bold", size = 11),
plot.caption = element_text(face = "italic", hjust = 0),
axis.text = element_text(size = 10) 
)

Following the steps in tutorial 3, and filtering missing results at the begging to avoid confusion, I was able to copy most of the structure of this graph and I added some details like personalizing the size and face of the labels and captions to make it clearer. I also used the labels to display the percentage signs on the y axis and change the sequel to 0 to 100 every 20% to reduce the volume of visuals at the bottom and make it clearer.

#Interpretation The data reveals distinct age patterns in interpersonal trust levels. Younger age groups, particularly those aged 18-29, exhibit lower levels of trust. As age increases, trust levels tend to rise, with the oldest group (70 or older) showing the highest proportion of individuals who trust others “Most of the time”. When it comes to older people, they tend to trust other people more, especially when compared to young adults, who have a harder time trusting most people. This suggests that trust in others may increase with age, possibly due to life experiences or social stability. Other possible explanation to why younger age groups tend to show lower interpersonal trust might be due to the increasing political and cultural polarization that the world is going through. There is a clear trend that has been going on, from higher levels of trust, to lower levels of trust over the years and it might be worthwhile to unpack the reasons behind such trend.

library(fst)
italy_fairness <- read_fst("italy_data.fst")

table(italy_fairness$sofrdst)

## 
##    1    2    3    4    5    7    8 
##  692 1346  448  174   31    9   45

table(italy_fairness$eisced)

## 
##    0    1    2    3    4    5    6    7   55   77   88   99 
## 1207 1090 2685  626 2874  295  403  891   25   68    9    5

Check the data to spot N/A values

library(fst)
denmark_fairness <- read_fst("denmark_data.fst")

table(denmark_fairness$sofrdst)

## 
##   1   2   3   4   5   7   8   9 
##  79 268 325 674 202   4  16   4

table(denmark_fairness$eisced)

## 
##    1    2    3    4    5    6    7   55   77   88   99 
##  610 2353 3468  967 1282 2410 1239   14    2    4   59

Check the data to spot N/A values

italy_fairness <- italy_fairness %>% 
mutate( 
  cntry = "Italy",
social_fairness = case_when(
   sofrdst == 1 ~ "Agree\nstrongly", 
sofrdst == 2 ~ "Agree", 
sofrdst == 3 ~ "Neither agree\nnor disagree", 
sofrdst == 4 ~ "Disagree",
sofrdst == 5 ~ "Disagree\nstrongly",
TRUE ~ NA_character_ 
), 
social_fairness = factor( 
social_fairness, 
levels = c("Agree\nstrongly", "Agree",  
"Neither agree\nnor disagree", "Disagree", "Disagree\nstrongly")
) 
) 

table(italy_fairness$social_fairness)

## 
##             Agree\nstrongly                       Agree 
##                         692                        1346 
## Neither agree\nnor disagree                    Disagree 
##                         448                         174 
##          Disagree\nstrongly 
##                          31

Here I changed the numbers with the names I found on the ESS website, and then I converted them into meaningful categories with the factor() function. I changed the names to make them easy to read in the graphs due to some of the names being to long and were overlapping when I did my graphs. By changing the name now I can make the font of the categories bigger in the graphs.

italy_fairness_by_edu <- italy_fairness %>%
  mutate(
    education = case_when( 
eisced %in% c(1:4) ~ "Less than BA",    

eisced %in% c(5:7) ~ "BA or Higher",       
 
TRUE ~ NA_character_ 
), 
education = factor(education) 
  )

table(italy_fairness_by_edu$education)

## 
## BA or Higher Less than BA 
##         1589         7275

Here I grouped the education categories to make them meaningful when comparing the data, I choose these two levels to see if there is substantial difference between people going to post-secondary education, and those who did not.

denmark_fairness <- denmark_fairness %>% 
mutate( 
  cntry = "Denmark",
social_fairness = case_when(
 sofrdst == 1 ~ "Agree\nstrongly", 
sofrdst == 2 ~ "Agree", 
sofrdst == 3 ~ "Neither agree\nnor disagree", 
sofrdst == 4 ~ "Disagree",
sofrdst == 5 ~ "Disagree\nstrongly",
TRUE ~ NA_character_ 
), 
social_fairness = factor( 
social_fairness, 
levels = c("Agree\nstrongly", "Agree",  
"Neither agree\nnor disagree", "Disagree", "Disagree\nstrongly")
)
)  

table(denmark_fairness$social_fairness)

## 
##             Agree\nstrongly                       Agree 
##                          79                         268 
## Neither agree\nnor disagree                    Disagree 
##                         325                         674 
##          Disagree\nstrongly 
##                         202

denmark_fairness_by_edu <- denmark_fairness %>%
  mutate(
    education = case_when( 
eisced %in% c(1:4) ~ "Less than BA",    

eisced %in% c(5:7) ~ "BA or Higher",       
 
TRUE ~ NA_character_ 
), 
education = factor(education) 
  )

table(denmark_fairness_by_edu$education)

## 
## BA or Higher Less than BA 
##         4931         7398

combined_fairness <- bind_rows(italy_fairness, denmark_fairness)
table(combined_fairness$social_fairness)

## 
##             Agree\nstrongly                       Agree 
##                         771                        1614 
## Neither agree\nnor disagree                    Disagree 
##                         773                         848 
##          Disagree\nstrongly 
##                         233

I combined the fairness data to use it in my table.

combined_data_by_edu <- bind_rows(italy_fairness_by_edu, denmark_fairness_by_edu)
table(combined_data_by_edu$social_fairness)

## 
##             Agree\nstrongly                       Agree 
##                         771                        1614 
## Neither agree\nnor disagree                    Disagree 
##                         773                         848 
##          Disagree\nstrongly 
##                         233

table(combined_data_by_edu$education)

## 
## BA or Higher Less than BA 
##         6520        14673

I combined the fairness by education data to include it in my graph.

response_dist_trust <- combined_fairness %>% 
  filter(!is.na(social_fairness))%>% 
group_by(cntry) %>%
  count(social_fairness) %>%
  mutate(
    prop = n/sum(n),
    percent = round(100 * prop, 1)
    ) %>%
  ungroup()

response_dist_trust

## # A tibble: 10 × 5
##    cntry   social_fairness                   n   prop percent
##    <chr>   <fct>                         <int>  <dbl>   <dbl>
##  1 Denmark "Agree\nstrongly"                79 0.0510     5.1
##  2 Denmark "Agree"                         268 0.173     17.3
##  3 Denmark "Neither agree\nnor disagree"   325 0.210     21  
##  4 Denmark "Disagree"                      674 0.435     43.5
##  5 Denmark "Disagree\nstrongly"            202 0.130     13  
##  6 Italy   "Agree\nstrongly"               692 0.257     25.7
##  7 Italy   "Agree"                        1346 0.500     50  
##  8 Italy   "Neither agree\nnor disagree"   448 0.166     16.6
##  9 Italy   "Disagree"                      174 0.0647     6.5
## 10 Italy   "Disagree\nstrongly"             31 0.0115     1.2

I proceed to calculate the proportions by using the skills I learned in previous steps, like filtering before starting and grouping by the variable I am interested and then dropping the group to make the table.

sample_sizes <- combined_fairness %>%
  filter(!is.na(social_fairness))%>%
  group_by(cntry) %>%
  summarise(
    total_sample = n()
  )

education_samples <- combined_data_by_edu %>%
  filter(!is.na(education))%>%
  count(cntry, education) %>%
  group_by(cntry) %>%
  mutate(total = sum(n)) %>%
  ungroup()

sample_sizes

## # A tibble: 2 × 2
##   cntry   total_sample
##   <chr>          <int>
## 1 Denmark         1548
## 2 Italy           2691

education_samples

## # A tibble: 4 × 4
##   cntry   education        n total
##   <chr>   <fct>        <int> <int>
## 1 Denmark BA or Higher  4931 12329
## 2 Denmark Less than BA  7398 12329
## 3 Italy   BA or Higher  1589  8864
## 4 Italy   Less than BA  7275  8864

Here I calculated the sample sizes for both my values always filtering the missing values.

combined_fairness_wide <- response_dist_trust %>%
  select(-prop, -n) %>%
  pivot_wider(
    names_from = social_fairness,
    values_from = percent
  )  %>%
  ungroup() 
  
  
print(combined_fairness_wide)

## # A tibble: 2 × 6
##   cntry   `Agree\nstrongly` Agree `Neither agree\nnor disagree` Disagree
##   <chr>               <dbl> <dbl>                         <dbl>    <dbl>
## 1 Denmark               5.1  17.3                          21       43.5
## 2 Italy                25.7  50                            16.6      6.5
## # ℹ 1 more variable: `Disagree\nstrongly` <dbl>

With my proportions and with the insight from previous tables and the youtube video I watched, I converted my data to match the table structure I wanted.

fairness_table <- combined_fairness_wide %>%
 gt() %>%
  cols_label(
    cntry = "Country",
    'Agree\nstrongly' = "Agree\nstrongly",          
    `Agree` = "Agree",
    `Neither agree\nnor disagree` = "Neither agree\nnor disagree",
    `Disagree` = "Disagree",
    'Disagree\nstrongly' = "Disagree\nstrongly"
  ) %>%
  fmt_number(
    columns = c('Agree\nstrongly', `Agree`, `Neither agree\nnor disagree`, `Disagree`, 'Disagree\nstrongly'),
    decimals = 1
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels(columns = "cntry")
  ) %>%
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_body(columns = c('Agree\nstrongly', `Agree`, `Neither agree\nnor disagree`, `Disagree`, 'Disagree\nstrongly'))
  ) %>%
  tab_header(
    title = md("**Views on Fair Income Distribution**"),
    subtitle = "Response distribution by country (%)"
  ) %>%
  tab_source_note(
    source_note = md(
      paste("Sample size:",
            paste(sample_sizes$cntry,"(", sample_sizes$total_sample, ")")
           )
    )
  )


fairness_table

Country	Agree strongly	Agree	Neither agree nor disagree	Disagree	Disagree strongly
Views on Fair Income Distribution
Response distribution by country (%)
Denmark	5.1	17.3	21.0	43.5	13.0
Italy	25.7	50.0	16.6	6.5	1.2
Sample size: Denmark ( 1548 )
Sample size: Italy ( 2691 )

I made the table with a similar structure as the previous tables but with my new, cleaned data.

fairness_plot <- combined_fairness %>%
  filter(!is.na(social_fairness)) %>%
ggplot( 
mapping = aes( 
x = as.numeric(social_fairness), 
y = cntry, 
fill = cntry 
) 
) + 
geom_density_ridges( 
alpha = 0.7,
scale = 0.9,
bandwidth = 0.1
) + 
scale_fill_brewer(palette = "Set1") + 
scale_x_continuous( 
breaks = 1:5, 
labels = levels(combined_fairness$social_fairness) 
) + 
labs( 
title = "Distribution of Views on Income Equality", 
subtitle = "Comparison between Italy and Denmark", 
x = NULL, 
y = NULL   
) + 
theme_minimal() + 
theme(
  panel.grid.minor = element_blank(),
    legend.position = "none",
axis.text = element_text(face = "bold",size = 10)
)

fairness_plot

First, I cleaned my combined data using the filter function. Then, I used most of the structure of the graph provided on tutorial 3 and I added some elements to make my graph clear like changing the scale of the plot and removing unnecessary captions. I also made the category labels bigger and bold for clarity.

edu_plot <- combined_data_by_edu %>%
  filter(!is.na(social_fairness), !is.na(education)) %>%
ggplot( 
mapping = aes( 
x = as.numeric(social_fairness), 
y = education, 
fill = education 
) 
) + 
geom_density_ridges( 
alpha = 0.7,
scale = 0.9,
bandwidth = 0.1
) + 
scale_fill_brewer(palette = "Set1") + 
scale_x_continuous( 
breaks = 1:5, 
labels = levels(combined_fairness$social_fairness) 
) + 
  facet_wrap(~cntry) +
labs( 
title = "Views on Income Distribution by Education Level", 
subtitle = "Comparing Italy and Denmark", 
x = NULL, 
y = NULL   
) + 
theme_minimal() + 
theme(
  panel.grid.minor = element_blank(),
    legend.position = "none",
axis.text = element_text(face = "bold", size = 7),
strip.text = element_text(face = "bold", size = 10)
)

edu_plot

Following a similar structure from the previous graph, I cleaned my data to make this graph. The change in the income equality were useful here as some of the categories were previously to long but now they are legible and I did not had to put an inclination angle to them.

#Interpretation 1. Country differences in views: There are notable differences between Denmark and Italy in their views on fair income distribution. In Denmark, a significant proportion of respondents “Disagree” that income is distributed fairly, while a small fraction of them “Agree strongly.” In contrast, Italy shows a much higher proportion of respondents who “Agree” or “Agree strongly” that income is distributed fairly. This suggests that Italians are generally more satisfied with income distribution compared to Danes, who are more critical. This is surprising due to the social democrat model of Denmark, which should be more effective when distributing wealth and developing public infrastructure.

Educational patterns within countries: Within both countries, educational success influences perceptions of income fairness. In Denmark, individuals with a “BA or Higher” are slightly more likely to “Disagree” or “Disagree strongly” with fair income distribution compared to those with “Less than BA.” Similarly, in Italy, those with higher education are more critical, with a higher proportion of “Disagree” responses. This indicates that higher education may lead to a more critical view of income distribution, possibly due to greater awareness of economic inequalities.
1. Overall takeaways: The data suggests that cultural and societal factors significantly influence perceptions of income fairness. Denmark, with its strong welfare state, shows higher skepticism toward income distribution, while Italy, with a different economic structure, shows greater satisfaction. Education also plays a role, as higher-educated individuals in both countries tend to be more critical of income distribution. These findings highlight the importance of considering both national context and individual education levels when analyzing attitudes toward economic fairness.

Skill Exercise 1

2025-01-30

First Skill Exercise

Steps are below each code chunk*

Interpretation