Project #2 - Data 101

Author

Kalina P

Introduction

Is there an association between gender and whether an individual is married or single? The dataset used for this project contains a 500-observation sample from the 2000 U.S. Census. It has 8 variables, 2 of which are used in this project: gender (recorded in the variable sex) and marital status (marital_status). Both variables are categorical. Marital status has 6 levels: Divorced, Married/spouse absent, Married/spouse present, Never married/single, Separated, and Widowed. The data was accessed through OpenIntro (https://www.openintro.org/data/index.php?data=census).

Loading the Libraries and Data

library(tidyverse)  # also attaches ggplot2, so a separate library(ggplot2) call is not needed
library(highcharter)
setwd("C:/Users/kpeter81/OneDrive - montgomerycollege.edu/Datasets")
census <- read_csv("census - census.csv")

Exploratory Data Analysis

unique(census$marital_status)
[1] "Married/spouse present" "Never married/single"   "Widowed"               
[4] "Divorced"               "Separated"              "Married/spouse absent" 

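These are the 6 levels of marital status. However, I am only interested in whether the individual is married or single. I am going to filter the dataset to the two married levels and the never-married level, then create a new binary variable that categorizes marital status as married or single. Individuals who are divorced, separated, or widowed are dropped from the analysis.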

married <- census |>
  select(marital_status, sex) |>
  filter(marital_status %in% c("Married/spouse present", "Married/spouse absent", "Never married/single")) |>
  mutate(marital_status = ifelse(marital_status %in% c("Married/spouse present", "Married/spouse absent"), "married", "single"))

head(married)
# A tibble: 6 × 2
  marital_status sex   
  <chr>          <chr> 
1 married        Male  
2 single         Female
3 single         Male  
4 single         Female
5 married        Male  
6 married        Female

Now that we have our dataset, we can visualize the proportions for each group.

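married |>
  ggplot(aes(y = marital_status, fill = sex)) +
  # position = "fill" rescales each bar to 1, so the x-axis shows proportions, not counts
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("lightpink", "steelblue")) +
  labs(x = "Proportion", y = "Marital Status", fill = "Gender", title = "Marital Status by Gender") +
  theme_minimal(base_family = "serif")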

This filled bar graph shows the proportion of males and females within each marital status, and the two groups look fairly similar. Males make up a somewhat larger share of the single group (about 58%, compared with about 53% of the married group), but I doubt the difference will be significant after further testing.
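The shares displayed in the plot can also be checked numerically. The following is a minimal sketch, assuming the counts from the contingency table reported in the Chi-Squared Test section of this report; they are typed in by hand so the block runs without the census file, and the names observed and row_props are illustrative only.

```r
# Counts copied by hand from the report's contingency table
# (rows: marital status, columns: sex). `observed` is an illustrative name.
observed <- matrix(
  c(97, 93, 109, 129),  # filled column by column: Female first, then Male
  nrow = 2,
  dimnames = list(
    marital_status = c("married", "single"),
    sex = c("Female", "Male")
  )
)

# Share of each sex within a marital status (each row sums to 1),
# which is what the filled bars display.
row_props <- prop.table(observed, margin = 1)
round(row_props, 3)
```

With these counts, males account for 129 of the 222 single individuals and 109 of the 206 married individuals.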

Chi-Squared Test for Independence

To determine whether there is an association between marital status and gender, I will perform a chi-squared test of independence. I chose this test because both variables are categorical and the question is whether they are associated. I will use a significance level (alpha) of 0.05. I expect the two variables to be independent. A p-value above 0.05 would mean that the sample does not provide evidence that males and females differ in their probability of being married or single.

\(H_0\) : Gender is not associated with whether an individual is married or single.

\(H_a\) : Gender is associated with whether an individual is married or single.

# creating a table for the chi squared test
observed_dataset <- table(married$marital_status, married$sex)
observed_dataset
         
          Female Male
  married     97  109
  single      93  129
# performing the test for independence on the dataset
chisq.test(observed_dataset)

    Pearson's Chi-squared test with Yates' continuity correction

data:  observed_dataset
X-squared = 0.9674, df = 1, p-value = 0.3253

I fail to reject the null hypothesis. The p-value was 0.3253, well above the alpha value of 0.05. Thus, this sample does not provide sufficient evidence of an association between gender and whether an individual is married or single.
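To see where the test statistic comes from, the calculation can be reproduced by hand. This is a minimal sketch, assuming the four counts from the contingency table above are entered manually; the names observed, expected, statistic, and p_value are illustrative. It computes the expected counts under independence (row total times column total, divided by the grand total) and then the Yates-corrected statistic, the sum over cells of (|O - E| - 0.5)^2 / E. R's chisq.test() uses the smaller of 0.5 and |O - E| as the correction; in this table every |O - E| exceeds 0.5, so a fixed 0.5 gives the same result.

```r
# Observed counts, entered by hand from the contingency table above.
observed <- matrix(
  c(97, 93, 109, 129),  # filled column by column: Female first, then Male
  nrow = 2,
  dimnames = list(c("married", "single"), c("Female", "Male"))
)

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# The chi-squared approximation is usually considered reasonable
# when every expected count is at least 5.
all(expected >= 5)

# Yates-corrected chi-squared statistic for a 2 x 2 table.
statistic <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

round(expected, 2)
round(c(statistic = statistic, p_value = p_value), 4)
```

The rounded statistic and p-value should agree with the chisq.test() output reported above (X-squared = 0.9674, p-value = 0.3253).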

Conclusion

In my analysis, I found no statistically significant association between gender and whether an individual is married or single: the p-value (0.3253) was well above the alpha level of 0.05. This does not prove that the two variables are independent, but this sample from the 2000 U.S. Census gives no evidence of an association. Though there was a slight difference in proportions in the data, the test showed that this difference was not statistically significant.

Further research can be conducted to either support or refute these findings, or determine what other measurable variables do have an association with marital status.

References

Data accessed through OpenIntro: https://www.openintro.org/data/index.php?data=census