Marco Tulio Eguez Hurtado
Soc-3320: Methodology and Research II
Instructor: Sébastien Parker
packages <- c("tidyverse", "srvyr", "broom","gt", "modelsummary",
"gapminder", "fst", "ggridges")
load("anes_2020.rda")
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
lapply(packages, library, character.only = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'srvyr'
##
##
## The following object is masked from 'package:stats':
##
## filter
##
##
## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
## backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
##
## Revert to `kableExtra` for one session:
##
## options(modelsummary_factory_default = 'kableExtra')
## options(modelsummary_factory_latex = 'kableExtra')
## options(modelsummary_factory_html = 'kableExtra')
##
## Silence this message forever:
##
## config_modelsummary(startup_message = FALSE)
## [[1]]
## [1] "lubridate" "forcats" "stringr" "dplyr" "purrr" "readr"
## [7] "tidyr" "tibble" "ggplot2" "tidyverse" "stats" "graphics"
## [13] "grDevices" "utils" "datasets" "methods" "base"
##
## [[2]]
## [1] "srvyr" "lubridate" "forcats" "stringr" "dplyr" "purrr"
## [7] "readr" "tidyr" "tibble" "ggplot2" "tidyverse" "stats"
## [13] "graphics" "grDevices" "utils" "datasets" "methods" "base"
##
## [[3]]
## [1] "broom" "srvyr" "lubridate" "forcats" "stringr" "dplyr"
## [7] "purrr" "readr" "tidyr" "tibble" "ggplot2" "tidyverse"
## [13] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
## [19] "base"
##
## [[4]]
## [1] "gt" "broom" "srvyr" "lubridate" "forcats" "stringr"
## [7] "dplyr" "purrr" "readr" "tidyr" "tibble" "ggplot2"
## [13] "tidyverse" "stats" "graphics" "grDevices" "utils" "datasets"
## [19] "methods" "base"
##
## [[5]]
## [1] "modelsummary" "gt" "broom" "srvyr" "lubridate"
## [6] "forcats" "stringr" "dplyr" "purrr" "readr"
## [11] "tidyr" "tibble" "ggplot2" "tidyverse" "stats"
## [16] "graphics" "grDevices" "utils" "datasets" "methods"
## [21] "base"
##
## [[6]]
## [1] "gapminder" "modelsummary" "gt" "broom" "srvyr"
## [6] "lubridate" "forcats" "stringr" "dplyr" "purrr"
## [11] "readr" "tidyr" "tibble" "ggplot2" "tidyverse"
## [16] "stats" "graphics" "grDevices" "utils" "datasets"
## [21] "methods" "base"
##
## [[7]]
## [1] "fst" "gapminder" "modelsummary" "gt" "broom"
## [6] "srvyr" "lubridate" "forcats" "stringr" "dplyr"
## [11] "purrr" "readr" "tidyr" "tibble" "ggplot2"
## [16] "tidyverse" "stats" "graphics" "grDevices" "utils"
## [21] "datasets" "methods" "base"
##
## [[8]]
## [1] "ggridges" "fst" "gapminder" "modelsummary" "gt"
## [6] "broom" "srvyr" "lubridate" "forcats" "stringr"
## [11] "dplyr" "purrr" "readr" "tidyr" "tibble"
## [16] "ggplot2" "tidyverse" "stats" "graphics" "grDevices"
## [21] "utils" "datasets" "methods" "base"
Loading the necessary packages
summary(gapminder)
## country continent year lifeExp
## Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
## Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
## Algeria : 12 Asia :396 Median :1980 Median :60.71
## Angola : 12 Europe :360 Mean :1980 Mean :59.47
## Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
## Australia : 12 Max. :2007 Max. :82.60
## (Other) :1632
## pop gdpPercap
## Min. :6.001e+04 Min. : 241.2
## 1st Qu.:2.794e+06 1st Qu.: 1202.1
## Median :7.024e+06 Median : 3531.8
## Mean :2.960e+07 Mean : 7215.3
## 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
## Max. :1.319e+09 Max. :113523.1
##
Check the data
life_exp_cont <- gapminder %>%
filter(year %in% c(1987, 2007)) %>%
group_by(continent) %>%
summarise(
lifeExp_1987 = first(lifeExp),
lifeExp_2007 = last(lifeExp),
change = lifeExp_2007 - lifeExp_1987,
avg_life = mean(lifeExp),
.groups = "drop"
) %>%
arrange(desc(avg_life))
print(life_exp_cont)
## # A tibble: 5 × 5
## continent lifeExp_1987 lifeExp_2007 change avg_life
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Oceania 76.3 80.2 3.88 78.0
## 2 Europe 72 79.4 7.42 75.6
## 3 Americas 70.8 73.7 2.97 70.8
## 4 Asia 40.8 62.7 21.9 67.8
## 5 Africa 65.8 43.5 -22.3 54.1
I first filtered for the years I wanted to include.Then, I grouped the data by continent and then utilized the summarise function to calculate the new columns, with the change column being calculated as the life expectancy of 1987 minus the one in 2007, showing the change in life expectancy over time.Then I arranged my result in a descendant way.
life_exp_country <- gapminder %>%
filter(year >= 1987 & year <= 2007) %>%
group_by(country, year) %>%
summarise(
avg_life = mean(lifeExp),
.groups = "drop"
) %>%
arrange(desc(avg_life))
focal_countries <- life_exp_country %>%
filter(country %in% c("Niger", "Bangladesh", "El Salvador", "Iraq", "Zimbabwe")
)
focal_countries
## # A tibble: 25 × 3
## country year avg_life
## <fct> <int> <dbl>
## 1 El Salvador 2007 71.9
## 2 El Salvador 2002 70.7
## 3 El Salvador 1997 69.5
## 4 El Salvador 1992 66.8
## 5 Iraq 1987 65.0
## 6 Bangladesh 2007 64.1
## 7 El Salvador 1987 63.2
## 8 Zimbabwe 1987 62.4
## 9 Bangladesh 2002 62.0
## 10 Zimbabwe 1992 60.4
## # ℹ 15 more rows
I followed a similar structure from the previous step. Separating the 5 countries with the filter(), grouping them with %in% and then selecting the countries with c(). Here I tried to use the select() function to separate the countries but when doing the graph below the countries did not appeared, so I had to move to the filter function.
enhanced_table <- life_exp_cont %>%
select(-avg_life) %>%
gt() %>%
cols_label(
continent = "Continent",
lifeExp_1987 = "1987",
lifeExp_2007 = "2007",
change = "Change (2007-1987)"
) %>%
fmt_number(
columns = c(`lifeExp_1987`, `lifeExp_2007`, change),
decimals = 1,
use_seps = TRUE
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
) %>%
tab_header(
title = md("**Life Expectancy Changes by Continent**"),
subtitle = md("Average life expectancy in years")
) %>%
tab_source_note(
source_note = "Data: Gapminder"
)
enhanced_table
| Life Expectancy Changes by Continent | |||
| Average life expectancy in years | |||
| Continent | 1987 | 2007 | Change (2007-1987) |
|---|---|---|---|
| Oceania | 76.3 | 80.2 | 3.9 |
| Europe | 72.0 | 79.4 | 7.4 |
| Americas | 70.8 | 73.7 | 3.0 |
| Asia | 40.8 | 62.7 | 21.9 |
| Africa | 65.8 | 43.5 | −22.3 |
| Data: Gapminder | |||
I started by removing the avg_life column with the select(-) as I did not need it for the column. Then I used the gt tool to start making the graph and entered the values for the table. After that, I changed the style, making the text in the columns bold. Then, I changed the headers using the markdown format. I concluded with the source note.
ggplot(
focal_countries,
aes(x = year, y = avg_life,
color = country, fill = country)
) +
geom_line(linewidth = 1.5) +
scale_color_brewer(palette = "Set1") +
theme_minimal() +
theme(
panel.grid = element_blank(),
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 12),
legend.position = "bottom",
) +
labs(
title = "Life Expectancy Trajectories (1987-2007)",
subtitle = "in Selected Countries",
x = "Year",
y = "Life Expectancy (years)",
)
I created the visualization following the steps on Tutorial 3. I used the ggplot tool and include the values for my x and y axis and the color which will represent the country. I chose to do it not with just the two values for 1987 and 2007 but with the range of years from 1987 to 2007 to showcase the trajectory of the countries selected. After that, I changed the width of the line and copy the code for the color. Then, I removed the background with theme_minimal() and change the elements of the visual with theme. Lastly, I used the labs command to add the necessary titles, subtitles, and footnotes.
Continental trends: The analysis of continental life expectancy trends between 1987 and 2007 reveals significant regional disparities. Oceania maintained the highest baseline life expectancy and achieved the smallest increase of years, reflecting its already advanced healthcare systems and relatively stable socio-economic conditions. Europe exhibited the most substantial improvement, rising from 72.0 to 79.4 years. This might be caused by the post-Cold War investments in public health infrastructure and advancements in health research. Asia demonstrated the largest relative gain, surging from a low baseline of 40.8 to 62.7 years, a trajectory consistent with rapid economic development and expanded access to primary care. Conversely, Africa experienced a major decline, falling from 65.8 to 43.5 years, a pattern likely exacerbated by the HIV/AIDS pandemic, political instability, and underfunded health systems. The Americas showed moderate progress, though structural inequalities, political instability and variable healthcare access may explain its intermediate position between high- and low-performing regions, where some people might enjoy from an expanded lifetime but others do not.
Five-country analysis: The trajectories of the five focal nations highlight divergent developmental pathways. El Salvador exhibited the most consistent upward trend, rising from 63.2 to 71.9 years, likely driven by poverty reduction initiatives. Bangladesh achieved moderate gains, reflecting incremental improvements in public health despite persistent resource constraints. Zimbabwe and Niger exemplify stagnation and volatility. Zimbabwe’s life expectancy collapsed from 62.4 years to 44.7 years, most likely due to corruption, political instability, lack of access to healthcare and HIV prevalence. Niger’s marginal improvements underscore challenges in combating infectious diseases and malnutrition. Iraq displayed a unique trajectory: initial stability followed by decline during economic sanctions and conflict, with partial recovery pafter the year 2003.
table(anes_2020$TrustPeople)
##
## Always Most of the time About half the time Some of the time
## 48 3511 2020 1597
## Never
## 264
table(anes_2020$AgeGroup)
##
## 18-29 30-39 40-49 50-59 60-69 70 or older
## 871 1241 1081 1200 1436 1330
I started by looking at my variables.
anes_2020_clean <- anes_2020 %>%
filter(!is.na(TrustPeople), !is.na(AgeGroup))
Then, filtering the missing values of both columns.
trust_by_age <- anes_2020_clean %>%
group_by(AgeGroup) %>%
count(TrustPeople) %>%
mutate(
prop = n/sum(n),
percent = round(100 * prop, 1)
)
trust_by_age
## # A tibble: 30 × 5
## # Groups: AgeGroup [6]
## AgeGroup TrustPeople n prop percent
## <fct> <fct> <int> <dbl> <dbl>
## 1 18-29 Always 7 0.00804 0.8
## 2 18-29 Most of the time 268 0.308 30.8
## 3 18-29 About half the time 278 0.319 31.9
## 4 18-29 Some of the time 246 0.282 28.2
## 5 18-29 Never 72 0.0827 8.3
## 6 30-39 Always 10 0.00807 0.8
## 7 30-39 Most of the time 502 0.405 40.5
## 8 30-39 About half the time 378 0.305 30.5
## 9 30-39 Some of the time 281 0.227 22.7
## 10 30-39 Never 68 0.0549 5.5
## # ℹ 20 more rows
Here, I calculated the proportions, the number, and the percentage of trust in people by age group, so each age group will display the proportions within the same category.
total_sample_size <- nrow(anes_2020_clean)
total_sample_size
## [1] 7153
I counted the total sample with nrow() with the previously cleaned data to avoid missing values.
trust_wide <- trust_by_age %>%
select(-prop, -n) %>%
pivot_wider(
names_from = TrustPeople,
values_from = percent
) %>%
ungroup()
print(trust_wide)
## # A tibble: 6 × 6
## AgeGroup Always `Most of the time` `About half the time` `Some of the time`
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 18-29 0.8 30.8 31.9 28.2
## 2 30-39 0.8 40.5 30.5 22.7
## 3 40-49 0.7 44.1 29.1 22.9
## 4 50-59 0.2 48.9 27.1 20.8
## 5 60-69 0.7 52.4 25.2 19.8
## 6 70 or older 0.6 59.2 21.6 17.3
## # ℹ 1 more variable: Never <dbl>
I got stuck here for a while because I was not able to figure out a way to arrange my data in a way that the trust categories where converted to column headers. Fortunately, I found this youtube video that help me to use and understand the function pivot_wider().https://youtu.be/YpAdZ4079qs?si=YKOM7pdcjRIs3dx4. It was important to ungroup the values at the end, because at first I was having troubles with my table being grouped by age and the age group column disappearing. After ungrouping, I was able to get the column age group back.
trust_table <- trust_wide %>%
gt() %>%
cols_label(
AgeGroup = "Age Group",
Always = "Always",
`Most of the time` = "Most of the time",
`About half the time` = "About half the time",
`Some of the time` = "Some of the time",
Never = "Never"
) %>%
fmt_number(
columns = c(Always, `Most of the time`, `About half the time`, `Some of the time`, Never),
decimals = 1
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(columns = "AgeGroup")
) %>%
tab_style(
style = cell_text(align = "center"),
locations = cells_body(columns = c(Always, `Most of the time`, `About half the time`, `Some of the time`, Never))
) %>%
tab_header(
title = md("**Interpersonal Trust by Age Group**"),
subtitle = "Distribution of responses (percentages)"
) %>%
tab_source_note(
source_note = paste("Data: ANES 2020 (sample size =", total_sample_size, ")")
)
trust_table
| Interpersonal Trust by Age Group | |||||
| Distribution of responses (percentages) | |||||
| Age Group | Always | Most of the time | About half the time | Some of the time | Never |
|---|---|---|---|---|---|
| 18-29 | 0.8 | 30.8 | 31.9 | 28.2 | 8.3 |
| 30-39 | 0.8 | 40.5 | 30.5 | 22.7 | 5.5 |
| 40-49 | 0.7 | 44.1 | 29.1 | 22.9 | 3.2 |
| 50-59 | 0.2 | 48.9 | 27.1 | 20.8 | 3.1 |
| 60-69 | 0.7 | 52.4 | 25.2 | 19.8 | 1.9 |
| 70 or older | 0.6 | 59.2 | 21.6 | 17.3 | 1.3 |
| Data: ANES 2020 (sample size = 7153 ) | |||||
Following the steps of the first table, I was able to use a similar structure while adding some important things like aligning the values to the center in the trust categories to make it more legible and clear. Also, using paste = to paste the total sample size directly.
ggplot(
data = anes_2020_clean %>%
filter(!is.na(TrustPeople), !is.na(AgeGroup)),
mapping = aes(x = AgeGroup, fill = TrustPeople)
) +
geom_bar(
position = "fill",
color = "white",
alpha = 0.9
) +
coord_flip() +
scale_fill_viridis_d(
option = "mako",
direction = -1
) +
scale_y_continuous(
labels = scales::percent,
breaks = seq(0, 1, 0.2)
) +
labs(
title = "Interpersonal Trust Distribution by Age Group",
x = "Age Group",
y = "Percentage of Trust by Age Group",
fill = "Level of Trust",
caption = paste("Data: ANES 2020 | Total sample size =", total_sample_size)
) +
theme_minimal() +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 14),
axis.title.x = element_text(face = "bold", size = 11),
axis.title.y = element_text(face = "bold", size = 11),
plot.caption = element_text(face = "italic", hjust = 0),
axis.text = element_text(size = 10)
)
Following the steps in tutorial 3, and filtering missing results at the begging to avoid confusion, I was able to copy most of the structure of this graph and I added some details like personalizing the size and face of the labels and captions to make it clearer. I also used the labels to display the percentage signs on the y axis and change the sequel to 0 to 100 every 20% to reduce the volume of visuals at the bottom and make it clearer.
#Interpretation The data reveals distinct age patterns in interpersonal trust levels. Younger age groups, particularly those aged 18-29, exhibit lower levels of trust. As age increases, trust levels tend to rise, with the oldest group (70 or older) showing the highest proportion of individuals who trust others “Most of the time”. When it comes to older people, they tend to trust other people more, especially when compared to young adults, who have a harder time trusting most people. This suggests that trust in others may increase with age, possibly due to life experiences or social stability. Other possible explanation to why younger age groups tend to show lower interpersonal trust might be due to the increasing political and cultural polarization that the world is going through. There is a clear trend that has been going on, from higher levels of trust, to lower levels of trust over the years and it might be worthwhile to unpack the reasons behind such trend.
library(fst)
italy_fairness <- read_fst("italy_data.fst")
table(italy_fairness$sofrdst)
##
## 1 2 3 4 5 7 8
## 692 1346 448 174 31 9 45
table(italy_fairness$eisced)
##
## 0 1 2 3 4 5 6 7 55 77 88 99
## 1207 1090 2685 626 2874 295 403 891 25 68 9 5
Check the data to spot N/A values
library(fst)
denmark_fairness <- read_fst("denmark_data.fst")
table(denmark_fairness$sofrdst)
##
## 1 2 3 4 5 7 8 9
## 79 268 325 674 202 4 16 4
table(denmark_fairness$eisced)
##
## 1 2 3 4 5 6 7 55 77 88 99
## 610 2353 3468 967 1282 2410 1239 14 2 4 59
Check the data to spot N/A values
italy_fairness <- italy_fairness %>%
mutate(
cntry = "Italy",
social_fairness = case_when(
sofrdst == 1 ~ "Agree\nstrongly",
sofrdst == 2 ~ "Agree",
sofrdst == 3 ~ "Neither agree\nnor disagree",
sofrdst == 4 ~ "Disagree",
sofrdst == 5 ~ "Disagree\nstrongly",
TRUE ~ NA_character_
),
social_fairness = factor(
social_fairness,
levels = c("Agree\nstrongly", "Agree",
"Neither agree\nnor disagree", "Disagree", "Disagree\nstrongly")
)
)
table(italy_fairness$social_fairness)
##
## Agree\nstrongly Agree
## 692 1346
## Neither agree\nnor disagree Disagree
## 448 174
## Disagree\nstrongly
## 31
Here I changed the numbers with the names I found on the ESS website, and then I converted them into meaningful categories with the factor() function. I changed the names to make them easy to read in the graphs due to some of the names being to long and were overlapping when I did my graphs. By changing the name now I can make the font of the categories bigger in the graphs.
italy_fairness_by_edu <- italy_fairness %>%
mutate(
education = case_when(
eisced %in% c(1:4) ~ "Less than BA",
eisced %in% c(5:7) ~ "BA or Higher",
TRUE ~ NA_character_
),
education = factor(education)
)
table(italy_fairness_by_edu$education)
##
## BA or Higher Less than BA
## 1589 7275
Here I grouped the education categories to make them meaningful when comparing the data, I choose these two levels to see if there is substantial difference between people going to post-secondary education, and those who did not.
denmark_fairness <- denmark_fairness %>%
mutate(
cntry = "Denmark",
social_fairness = case_when(
sofrdst == 1 ~ "Agree\nstrongly",
sofrdst == 2 ~ "Agree",
sofrdst == 3 ~ "Neither agree\nnor disagree",
sofrdst == 4 ~ "Disagree",
sofrdst == 5 ~ "Disagree\nstrongly",
TRUE ~ NA_character_
),
social_fairness = factor(
social_fairness,
levels = c("Agree\nstrongly", "Agree",
"Neither agree\nnor disagree", "Disagree", "Disagree\nstrongly")
)
)
table(denmark_fairness$social_fairness)
##
## Agree\nstrongly Agree
## 79 268
## Neither agree\nnor disagree Disagree
## 325 674
## Disagree\nstrongly
## 202
denmark_fairness_by_edu <- denmark_fairness %>%
mutate(
education = case_when(
eisced %in% c(1:4) ~ "Less than BA",
eisced %in% c(5:7) ~ "BA or Higher",
TRUE ~ NA_character_
),
education = factor(education)
)
table(denmark_fairness_by_edu$education)
##
## BA or Higher Less than BA
## 4931 7398
combined_fairness <- bind_rows(italy_fairness, denmark_fairness)
table(combined_fairness$social_fairness)
##
## Agree\nstrongly Agree
## 771 1614
## Neither agree\nnor disagree Disagree
## 773 848
## Disagree\nstrongly
## 233
I combined the fairness data to use it in my table.
combined_data_by_edu <- bind_rows(italy_fairness_by_edu, denmark_fairness_by_edu)
table(combined_data_by_edu$social_fairness)
##
## Agree\nstrongly Agree
## 771 1614
## Neither agree\nnor disagree Disagree
## 773 848
## Disagree\nstrongly
## 233
table(combined_data_by_edu$education)
##
## BA or Higher Less than BA
## 6520 14673
I combined the fairness by education data to include it in my graph.
response_dist_trust <- combined_fairness %>%
filter(!is.na(social_fairness))%>%
group_by(cntry) %>%
count(social_fairness) %>%
mutate(
prop = n/sum(n),
percent = round(100 * prop, 1)
) %>%
ungroup()
response_dist_trust
## # A tibble: 10 × 5
## cntry social_fairness n prop percent
## <chr> <fct> <int> <dbl> <dbl>
## 1 Denmark "Agree\nstrongly" 79 0.0510 5.1
## 2 Denmark "Agree" 268 0.173 17.3
## 3 Denmark "Neither agree\nnor disagree" 325 0.210 21
## 4 Denmark "Disagree" 674 0.435 43.5
## 5 Denmark "Disagree\nstrongly" 202 0.130 13
## 6 Italy "Agree\nstrongly" 692 0.257 25.7
## 7 Italy "Agree" 1346 0.500 50
## 8 Italy "Neither agree\nnor disagree" 448 0.166 16.6
## 9 Italy "Disagree" 174 0.0647 6.5
## 10 Italy "Disagree\nstrongly" 31 0.0115 1.2
I proceed to calculate the proportions by using the skills I learned in previous steps, like filtering before starting and grouping by the variable I am interested and then dropping the group to make the table.
sample_sizes <- combined_fairness %>%
filter(!is.na(social_fairness))%>%
group_by(cntry) %>%
summarise(
total_sample = n()
)
education_samples <- combined_data_by_edu %>%
filter(!is.na(education))%>%
count(cntry, education) %>%
group_by(cntry) %>%
mutate(total = sum(n)) %>%
ungroup()
sample_sizes
## # A tibble: 2 × 2
## cntry total_sample
## <chr> <int>
## 1 Denmark 1548
## 2 Italy 2691
education_samples
## # A tibble: 4 × 4
## cntry education n total
## <chr> <fct> <int> <int>
## 1 Denmark BA or Higher 4931 12329
## 2 Denmark Less than BA 7398 12329
## 3 Italy BA or Higher 1589 8864
## 4 Italy Less than BA 7275 8864
Here I calculated the sample sizes for both my values always filtering the missing values.
combined_fairness_wide <- response_dist_trust %>%
select(-prop, -n) %>%
pivot_wider(
names_from = social_fairness,
values_from = percent
) %>%
ungroup()
print(combined_fairness_wide)
## # A tibble: 2 × 6
## cntry `Agree\nstrongly` Agree `Neither agree\nnor disagree` Disagree
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Denmark 5.1 17.3 21 43.5
## 2 Italy 25.7 50 16.6 6.5
## # ℹ 1 more variable: `Disagree\nstrongly` <dbl>
With my proportions and with the insight from previous tables and the youtube video I watched, I converted my data to match the table structure I wanted.
fairness_table <- combined_fairness_wide %>%
gt() %>%
cols_label(
cntry = "Country",
'Agree\nstrongly' = "Agree\nstrongly",
`Agree` = "Agree",
`Neither agree\nnor disagree` = "Neither agree\nnor disagree",
`Disagree` = "Disagree",
'Disagree\nstrongly' = "Disagree\nstrongly"
) %>%
fmt_number(
columns = c('Agree\nstrongly', `Agree`, `Neither agree\nnor disagree`, `Disagree`, 'Disagree\nstrongly'),
decimals = 1
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(columns = "cntry")
) %>%
tab_style(
style = cell_text(align = "center"),
locations = cells_body(columns = c('Agree\nstrongly', `Agree`, `Neither agree\nnor disagree`, `Disagree`, 'Disagree\nstrongly'))
) %>%
tab_header(
title = md("**Views on Fair Income Distribution**"),
subtitle = "Response distribution by country (%)"
) %>%
tab_source_note(
source_note = md(
paste("Sample size:",
paste(sample_sizes$cntry,"(", sample_sizes$total_sample, ")")
)
)
)
fairness_table
| Views on Fair Income Distribution | |||||
| Response distribution by country (%) | |||||
| Country | Agree strongly | Agree | Neither agree nor disagree | Disagree | Disagree strongly |
|---|---|---|---|---|---|
| Denmark | 5.1 | 17.3 | 21.0 | 43.5 | 13.0 |
| Italy | 25.7 | 50.0 | 16.6 | 6.5 | 1.2 |
| Sample size: Denmark ( 1548 ) | |||||
| Sample size: Italy ( 2691 ) | |||||
I made the table with a similar structure as the previous tables but with my new, cleaned data.
fairness_plot <- combined_fairness %>%
filter(!is.na(social_fairness)) %>%
ggplot(
mapping = aes(
x = as.numeric(social_fairness),
y = cntry,
fill = cntry
)
) +
geom_density_ridges(
alpha = 0.7,
scale = 0.9,
bandwidth = 0.1
) +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(
breaks = 1:5,
labels = levels(combined_fairness$social_fairness)
) +
labs(
title = "Distribution of Views on Income Equality",
subtitle = "Comparison between Italy and Denmark",
x = NULL,
y = NULL
) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
axis.text = element_text(face = "bold",size = 10)
)
fairness_plot
First, I cleaned my combined data using the filter function. Then, I used most of the structure of the graph provided on tutorial 3 and I added some elements to make my graph clear like changing the scale of the plot and removing unnecessary captions. I also made the category labels bigger and bold for clarity.
edu_plot <- combined_data_by_edu %>%
filter(!is.na(social_fairness), !is.na(education)) %>%
ggplot(
mapping = aes(
x = as.numeric(social_fairness),
y = education,
fill = education
)
) +
geom_density_ridges(
alpha = 0.7,
scale = 0.9,
bandwidth = 0.1
) +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(
breaks = 1:5,
labels = levels(combined_fairness$social_fairness)
) +
facet_wrap(~cntry) +
labs(
title = "Views on Income Distribution by Education Level",
subtitle = "Comparing Italy and Denmark",
x = NULL,
y = NULL
) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
axis.text = element_text(face = "bold", size = 7),
strip.text = element_text(face = "bold", size = 10)
)
edu_plot
Following a similar structure from the previous graph, I cleaned my data to make this graph. The change in the income equality were useful here as some of the categories were previously to long but now they are legible and I did not had to put an inclination angle to them.
#Interpretation 1. Country differences in views: There are notable differences between Denmark and Italy in their views on fair income distribution. In Denmark, a significant proportion of respondents “Disagree” that income is distributed fairly, while a small fraction of them “Agree strongly.” In contrast, Italy shows a much higher proportion of respondents who “Agree” or “Agree strongly” that income is distributed fairly. This suggests that Italians are generally more satisfied with income distribution compared to Danes, who are more critical. This is surprising due to the social democrat model of Denmark, which should be more effective when distributing wealth and developing public infrastructure.
Educational patterns within countries: Within both countries, educational success influences perceptions of income fairness. In Denmark, individuals with a “BA or Higher” are slightly more likely to “Disagree” or “Disagree strongly” with fair income distribution compared to those with “Less than BA.” Similarly, in Italy, those with higher education are more critical, with a higher proportion of “Disagree” responses. This indicates that higher education may lead to a more critical view of income distribution, possibly due to greater awareness of economic inequalities.