Lab 5

Author

Robin Chavez

PART 1: Practice using pipes (dplyr) to summarize data: Two Categorical Variables

Code
library(socviz)
library(dplyr)
library(ggplot2)

colnames(gss_sm)
 [1] "year"        "id"          "ballot"      "age"         "childs"     
 [6] "sibs"        "degree"      "race"        "sex"         "region"     
[11] "income16"    "relig"       "marital"     "padeg"       "madeg"      
[16] "partyid"     "polviews"    "happy"       "partners"    "grass"      
[21] "zodiac"      "pres12"      "wtssall"     "income_rc"   "agegrp"     
[26] "ageq"        "siblings"    "kids"        "religion"    "bigregion"  
[31] "partners_rc" "obama"      
Code
cat("Categorical Variables: Marital and Degree")
Categorical Variables: Marital and Degree
Code
pip1 <- gss_sm %>%
  filter(!is.na(marital) & !is.na(degree)) %>%
  group_by(marital, degree) %>%
  summarize(N = n()) %>%
  mutate(freq = N/sum(N),
         pct = round((freq*100),0))
pip1
# A tibble: 25 × 5
# Groups:   marital [5]
   marital degree             N   freq   pct
   <fct>   <fct>          <int>  <dbl> <dbl>
 1 Married Lt High School   110 0.0909     9
 2 Married High School      555 0.459     46
 3 Married Junior College    91 0.0752     8
 4 Married Bachelor         279 0.231     23
 5 Married Graduate         175 0.145     14
 6 Widowed Lt High School    53 0.213     21
 7 Widowed High School      128 0.514     51
 8 Widowed Junior College    13 0.0522     5
 9 Widowed Bachelor          28 0.112     11
10 Widowed Graduate          27 0.108     11
# ℹ 15 more rows
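
Because `summarize()` drops only the last grouping level, `pip1` is still grouped by `marital` when `mutate()` runs, so `freq` is each degree's share within a marital status rather than within the whole sample. The sketch below checks this using only the ten rows printed above; the `counts` and `check` names are illustrative and not part of the lab code.

```r
library(dplyr)

# Counts copied from the first ten rows of the pip1 output above
counts <- data.frame(
  marital = rep(c("Married", "Widowed"), each = 5),
  degree  = rep(c("Lt High School", "High School", "Junior College",
                  "Bachelor", "Graduate"), times = 2),
  N       = c(110, 555, 91, 279, 175, 53, 128, 13, 28, 27)
)

# group_by(marital) mirrors the grouping that remains after summarize()
check <- counts %>%
  group_by(marital) %>%
  mutate(freq = N / sum(N),
         pct  = round(freq * 100, 0)) %>%
  summarize(total_freq = sum(freq),
            total_pct  = sum(pct))
check
```

Within each group `total_freq` is 1, but the rounded `pct` values need not add to exactly 100 (the Widowed rows add to 99), which is worth remembering when reading the bar labels.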
Code
# Create plot 

p <- ggplot(data = pip1,
            aes(x = marital, y = pct, fill = degree))
p + geom_bar(stat = "identity") +
  geom_text(aes(label = pct),
            position = position_stack(vjust = 0.5)) +
  labs(title = "Distribution of Degree by Marital Status",
       subtitle = "Married people are more likely than other groups to hold a Graduate degree",
       caption = "gss_sm Dataset",
       x = "Marital Status",
       y = "Percentage") +
  theme_minimal()

Interpretation

The chart supports the following observations:
Married people tend to have higher education levels: 23% hold a Bachelor’s degree and 14% hold a Graduate degree.

Separated individuals, by contrast, stand out for lower education levels: only 10% hold a Bachelor’s degree and 5% a Graduate degree, while 24% have less than a high school education.

Among never-married and divorced individuals, a majority (56%) have a high school degree. In addition, 9% of divorced individuals and 12% of never-married individuals have less than a high school education.

Widowed individuals show a different pattern. Junior college is their least common level (5%), while Bachelor’s and Graduate degrees are equally common at 11% each.

In summary, the chart shows how education levels vary across marital status groups. Married individuals are the most likely to hold a Bachelor’s or Graduate degree, while separated individuals tend to have lower education levels. Never-married and divorced individuals are concentrated at the high school level, and widowed individuals are split evenly between Bachelor’s and Graduate degrees.
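
The comparison of higher degrees can be made more concrete by combining the Bachelor and Graduate counts for each group. This is a minimal sketch limited to the Married and Widowed rows, since those are the only counts printed in the `pip1` output; `shown` and `higher_ed` are illustrative names.

```r
library(dplyr)

# Counts copied from the printed pip1 rows (only Married and Widowed are shown)
shown <- data.frame(
  marital = rep(c("Married", "Widowed"), each = 5),
  degree  = rep(c("Lt High School", "High School", "Junior College",
                  "Bachelor", "Graduate"), times = 2),
  N       = c(110, 555, 91, 279, 175, 53, 128, 13, 28, 27)
)

# Share of each group holding a Bachelor's or Graduate degree
higher_ed <- shown %>%
  group_by(marital) %>%
  summarize(
    pct_higher = round(100 * sum(N[degree %in% c("Bachelor", "Graduate")]) / sum(N), 0)
  )
higher_ed
```

Computed from the raw counts, roughly 38% of married respondents and 22% of widowed respondents hold a Bachelor's degree or higher. Running the same pipeline on the full `pip1` table would extend the comparison to the other three groups.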

PART 2: Create stacked and dodged bar charts: Two Categorical Variables

Code
# Stacked Bar Chart
p + geom_col(position = "stack") + 
  geom_text(aes(label = sprintf("%.1f%%", pct), group = degree),
            position = position_stack(vjust = 0.5), 
            color = "white",  
            size = 3) +  
  labs(x = "Marital Status", y = "Percent", fill = "Degree", 
       title = "Stacked Bar: Marital Status and Degree", 
       caption = "gss_sm Dataset", 
       subtitle = "Married people are more likely than other groups to hold a Graduate degree") +
  theme_minimal()

Code
# Dodged Bar Chart with labels
p + geom_col(position = "dodge2") +
  geom_text(aes(label = sprintf("%.1f%%", pct), group = degree),
            position = position_dodge(width = 0.9),   
            vjust = -0.5,  
            color = "black", 
            size = 3) +  
  labs(x = "Marital Status", y = "Percent", fill = "Degree",
       title = "Dodged Bar: Marital Status and Degree", 
       subtitle = "Married people are more likely than other groups to hold a Graduate degree", 
       caption = "gss_sm Dataset") +
  theme_minimal() +
  theme(legend.position = "top")
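
When a chart combines `theme()` with a complete theme such as `theme_minimal()`, the order matters: a complete theme replaces any theme settings added before it, so a `legend.position` set earlier is lost. The sketch below compares the two orderings on bare theme objects; `t_lost` and `t_kept` are illustrative names.

```r
library(ggplot2)

# theme() first, complete theme second: the complete theme replaces the earlier setting
t_lost <- theme(legend.position = "top") + theme_minimal()

# complete theme first, theme() second: the later setting is kept
t_kept <- theme_minimal() + theme(legend.position = "top")

t_lost$legend.position
t_kept$legend.position
```

Only `t_kept` places the legend at the top, so adjustments such as `legend.position` are safest when added after `theme_minimal()`.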

Code
# Dodged bar chart, flipped to horizontal and faceted by degree, with no legend
p + geom_col(position = "dodge2") +
  geom_text(aes(label = sprintf("%.1f%%", pct), group = degree),
            position = position_dodge(width = 0.9), hjust = -0.2,
            color = "black", size = 3) +
  labs(x = NULL, y = "Percent", fill = "Degree", 
       title = "Dodged Bar with Changes: Marital Status and Degree", 
       subtitle = "Married people are more likely than other groups to hold a Graduate degree", 
       caption = "gss_sm Dataset") +
  guides(fill = "none") + 
  coord_flip() + 
  facet_grid(~ degree) +
  theme_minimal()

PART 3: Practice using pipes (dplyr) to summarize data: Two Continuous Variables and One Categorical

Code
cat("Categorical Variable: Marital \nContinuous Variables: Age and Childs")
Categorical Variable: Marital 
Continuous Variables: Age and Childs
Code
pip1new <- gss_sm %>%
  filter(!is.na(marital) & !is.na(age) & !is.na(childs)) %>%
  group_by(marital) %>%
  summarize(
    mean_age = mean(age),
    mean_childs = mean(childs),
    N = n()) %>%
  mutate(freq = N/sum(N),
         pct = round((freq*100),0))
pip1new
# A tibble: 5 × 6
  marital       mean_age mean_childs     N   freq   pct
  <fct>            <dbl>       <dbl> <int>  <dbl> <dbl>
1 Married           51.3       2.17   1203 0.422     42
2 Widowed           72.3       2.85    250 0.0878     9
3 Divorced          54.5       2.23    491 0.172     17
4 Separated         47.8       2.46    101 0.0355     4
5 Never Married     35.6       0.748   803 0.282     28

PART 4: Create a scatterplot: Two Continuous Variables and One Categorical

Code
library(ggplot2)

scatter_plot <- ggplot(pip1new, aes(x = mean_age, y = mean_childs, color = marital)) +
  geom_point(size = 5) +
  labs(title = "Scatterplot of Mean Age vs. Mean Children",
       subtitle = "Widowed people tend to be older and have more children,\nwhile never-married people tend to be younger and have fewer children",
       x = "Mean Age",
       y = "Mean Children",
       color = "Marital Status") +
  theme_minimal()

scatter_plot + 
  annotate(geom = "text", x = 60, y = 2.9, label = "\n Widowed people tend \n to be older \n and have more children") +
  annotate(
    geom = "rect", 
    xmin = 70, xmax = 75,
    ymin = 2.5, ymax = 3, 
    fill = "red", 
    alpha = 0.2
  )
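
The rectangle bounds above are typed in by hand. As an alternative, they can be derived from the summary table so the highlight follows the data. This is a minimal sketch that rebuilds the group means from the printed `pip1new` output; `pip1new_demo`, `target`, `pad_x`, and `pad_y` are illustrative names, and the padding values are arbitrary choices.

```r
library(ggplot2)

# Group means copied from the pip1new output above
pip1new_demo <- data.frame(
  marital     = c("Married", "Widowed", "Divorced", "Separated", "Never Married"),
  mean_age    = c(51.3, 72.3, 54.5, 47.8, 35.6),
  mean_childs = c(2.17, 2.85, 2.23, 2.46, 0.748)
)

# Pick the group to highlight and pad around its point
target <- pip1new_demo[pip1new_demo$marital == "Widowed", ]
pad_x <- 2.5   # years of age on either side
pad_y <- 0.25  # children above and below

highlight <- ggplot(pip1new_demo,
                    aes(x = mean_age, y = mean_childs, color = marital)) +
  geom_point(size = 5) +
  annotate(geom = "rect",
           xmin = target$mean_age - pad_x, xmax = target$mean_age + pad_x,
           ymin = target$mean_childs - pad_y, ymax = target$mean_childs + pad_y,
           fill = "red", alpha = 0.2) +
  theme_minimal()
highlight
```

Changing the value compared against `marital` moves the rectangle to a different group without editing any coordinates.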

PART 5: Legends and guides

Code
library(ggplot2)

scatter_plot <- ggplot(pip1new, aes(x = mean_age, y = mean_childs, color = marital)) +
  geom_point(size = 5) +  
  labs(title = "Scatterplot of Mean Age vs. Mean Children",
       subtitle = "Widowed people tend to be older and have more children,\nwhile never-married people tend to be younger and have fewer children", 
       x = "Mean Age",
       y = "Mean Children",
       color = "Marital Status") +
  theme_minimal() +
  theme(legend.title = element_text(face = "bold", size = 12, color = "black"), 
        legend.position = "bottom",  
        legend.box.background = element_rect(color = "lightgray"), 
        legend.background = element_rect(fill = "lightgray"), 
        legend.key.size = unit(1.5, "lines"))  

scatter_plot +  
  annotate(geom = "text", x = 60, y = 2.9, label = "\n Widowed people tend \n to be older \n and have more children") +
  annotate(
    geom = "rect", 
    xmin = 70, xmax = 75,
    ymin = 2.5, ymax = 3, 
    fill = "red", 
    alpha = 0.2
  )

PART 6: Data Labels

Code
library(ggplot2)
library(ggrepel)

scatter_plot <- ggplot(pip1new, aes(x = mean_age, y = mean_childs, color = marital)) +
  geom_point(size = 5) +
  labs(title = "Scatterplot of Mean Age vs. Mean Children",
       subtitle = "Widowed people tend to be older and have more children,\nwhile never-married people tend to be younger and have fewer children", 
       x = "Mean Age",
       y = "Mean Children",
       color = "Marital Status") +
  theme_minimal() +
  theme(legend.position = "none")  

scatter_plot + 
  geom_text_repel(aes(label = marital), 
                  box.padding = 0.5, 
                  point.padding = 0.5)  + 
  annotate(geom = "text", x = 60, y = 2.9, label = "\n Widowed people tend \n to be older \n and have more children") +
  annotate(
    geom = "rect", 
    xmin = 70, xmax = 75,
    ymin = 2.5, ymax = 3, 
    fill = "red", 
    alpha = 0.2
  )

PART 7: Interpretation

-Widowed People: This group sits in the top right corner of the chart, with the highest mean age (72.3) and the highest mean number of children (2.85). The point is a group average, so it describes the typical widowed respondent rather than every individual. Possible explanations are that widowed people started families young, had children throughout their marriage, or had children late in life with a younger partner.

-Never Married People: By contrast, people who have never married sit in the bottom left corner. They tend to be younger (mean age 35.6) and have the fewest children, fewer than one on average (0.75). This may reflect personal choices, economic concerns, or the lack of a long-term partner.

-Married People: Married people fall between the widowed and never-married groups in both mean age and mean number of children. The chart plots only one point per group, so it cannot show how widely age and number of children vary among married people.

-Separated People: This group is younger on average than the married and divorced groups and has more children on average than both.

-Divorced People: This group is older on average than the married and separated groups, but has fewer children on average than the separated group.
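
The group comparisons above can be checked by ranking the five groups on each measure. This is a minimal sketch using the means printed in the `pip1new` output; `group_means`, `by_age`, and `by_childs` are illustrative names.

```r
library(dplyr)

# Group means copied from the pip1new output in Part 3
group_means <- data.frame(
  marital     = c("Married", "Widowed", "Divorced", "Separated", "Never Married"),
  mean_age    = c(51.3, 72.3, 54.5, 47.8, 35.6),
  mean_childs = c(2.17, 2.85, 2.23, 2.46, 0.748)
)

# Rank groups from oldest to youngest
by_age <- group_means %>%
  arrange(desc(mean_age)) %>%
  pull(marital)

# Rank groups from most to fewest children
by_childs <- group_means %>%
  arrange(desc(mean_childs)) %>%
  pull(marital)

by_age
by_childs
```

The two rankings agree at the extremes (Widowed first, Never Married last) but differ in the middle: the separated group ranks fourth by age yet second by number of children.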