Project #2 - Data 101
Introduction
Is there an association between gender and whether an individual is married or single? The dataset used for this project is a 500-observation sample from the 2000 U.S. Census. It has 8 variables, 2 of which are used in this project: gender (recorded in the data as sex) and marital status. Both variables are categorical. Marital status has 6 levels: Divorced, Married/spouse absent, Married/spouse present, Never married/single, Separated, and Widowed. The data was accessed through OpenIntro (https://www.openintro.org/data/index.php?data=census).
Loading the Libraries and Data
library(tidyverse)
library(ggplot2)
library(highcharter)
setwd("C:/Users/kpeter81/OneDrive - montgomerycollege.edu/Datasets")
census <- read_csv("census - census.csv")
Exploratory Data Analysis
unique(census$marital_status)
[1] "Married/spouse present" "Never married/single"   "Widowed"
[4] "Divorced"               "Separated"              "Married/spouse absent"
These are the 6 levels of marital status. However, I am only interested in whether the individual is married or single. I will filter the dataset to keep only the married and never-married individuals (dropping Divorced, Separated, and Widowed) and recode marital status as a binary variable with the values married and single.
married <- census |>
  select(marital_status, sex) |>
  filter(marital_status %in% c("Married/spouse present", "Married/spouse absent", "Never married/single")) |>
  mutate(marital_status = ifelse(marital_status %in% c("Married/spouse present", "Married/spouse absent"), "married", "single"))
head(married)
# A tibble: 6 × 2
  marital_status sex
  <chr>          <chr>
1 married        Male
2 single         Female
3 single         Male
4 single         Female
5 married        Male
6 married        Female
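One way to double-check a recode like this is to keep the original column and cross-tabulate it against the new one. The sketch below is only an illustration: the `toy` data frame is made-up data standing in for the census sample, and `marital_binary`, `married_levels`, and `recode_check` are hypothetical names that do not appear in the analysis above.

```r
library(dplyr)

# Hypothetical toy rows standing in for the census sample (not real data)
toy <- data.frame(
  marital_status = c("Married/spouse present", "Married/spouse absent",
                     "Never married/single", "Divorced", "Widowed", "Separated"),
  sex = c("Male", "Female", "Female", "Male", "Female", "Male")
)

married_levels <- c("Married/spouse present", "Married/spouse absent")

toy_binary <- toy |>
  filter(marital_status %in% c(married_levels, "Never married/single")) |>
  # keep the original column and write the recode to a new one (marital_binary)
  mutate(marital_binary = if_else(marital_status %in% married_levels,
                                  "married", "single"))

# Each original level should map to exactly one recoded label
recode_check <- count(toy_binary, marital_status, marital_binary)
print(recode_check)
```

Writing the recode to a separate column, rather than overwriting marital_status, makes it easy to see at a glance which original levels ended up in each group.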
Now that we have our dataset, we can visualize the proportions for each group.
married |>
  ggplot(aes(y = marital_status, fill = sex)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("lightpink", "steelblue")) +
  labs(x = "Proportion", y = "Marital Status", title = "Marital Status by Gender") +
  theme_minimal(base_family = "serif")
In this filled bar graph, the proportions of males and females look fairly similar across the two marital statuses. Males make up a somewhat larger share of the single group (about 58%) than of the married group (about 53%), but I doubt the difference will be significant after further testing.
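The shares shown in the filled bars can also be computed exactly instead of being read off the chart. This is a minimal sketch that rebuilds the 2 × 2 table from the counts reported in the contingency table of this analysis, so it does not need the census file. The object names `counts` and `row_props` are illustrative assumptions.

```r
# Counts copied from the contingency table reported in this analysis
counts <- matrix(
  c(97, 109,
    93, 129),
  nrow = 2, byrow = TRUE,
  dimnames = list(marital_status = c("married", "single"),
                  sex = c("Female", "Male"))
)

# Row proportions: the share of each sex within a marital status,
# which is what each filled bar displays
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 3))
```

Using margin = 1 matches the plot, where each bar is one marital status. Using margin = 2 would instead give the share of each sex that is married or single.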
Chi-Squared Test for Independence
To determine whether there is an association between marital status and gender, I will perform a chi-squared test of independence. I chose this test because both variables are categorical and I want to test for any association between them rather than for a difference in means. I expect the two variables to be independent, which would show up as a p-value above the significance level (alpha) of 0.05, meaning the data do not provide evidence that males and females differ in their probability of being married or single.
\(H_0\) : Gender is not associated with whether an individual is married or single.
\(H_a\) : Gender is associated with whether an individual is married or single.
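For reference, the statistic behind this test can be written out as follows. The symbols are notation introduced here for illustration, not taken from the original analysis: \(O_{ij}\) is the observed count in row \(i\) and column \(j\) of the 2 × 2 table, \(E_{ij}\) is the count expected under \(H_0\), \(R_i\) and \(C_j\) are the row and column totals, and \(n\) is the total number of individuals. The 0.5 term is Yates' continuity correction, which R's chisq.test() applies to 2 × 2 tables by default (correct = TRUE).

\[
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left( \left| O_{ij} - E_{ij} \right| - 0.5 \right)^2}{E_{ij}}, \qquad
df = (2 - 1)(2 - 1) = 1
\]

Under \(H_0\), the statistic is compared against a chi-squared distribution with 1 degree of freedom, and large values count as evidence against independence.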
# creating a table for the chi-squared test
observed_dataset <- table(married$marital_status, married$sex)
observed_dataset
          Female Male
  married     97  109
  single      93  129
# performing the test for independence on the dataset
chisq.test(observed_dataset)
	Pearson's Chi-squared test with Yates' continuity correction
data:  observed_dataset
X-squared = 0.9674, df = 1, p-value = 0.3253
We fail to reject the null hypothesis. The p-value of 0.3253 is well above our significance level of 0.05, so there is not sufficient evidence of an association between gender and whether an individual is married or single.
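As a cross-check, the reported statistic can be reproduced by hand from the four counts in the table. This is a sketch rather than part of the original analysis: the names `observed`, `expected`, `x_squared`, `p_value`, and `builtin` are illustrative, and the matrix is typed in from the printed table instead of being built from the census file.

```r
# Counts copied from the contingency table in this report
observed <- matrix(
  c(97, 109,
    93, 129),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("married", "single"), c("Female", "Male"))
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Upper-tail p-value from a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

# Compare with R's built-in test on the same table
builtin <- chisq.test(observed)

print(round(expected, 1))
print(round(c(by_hand = x_squared, builtin = unname(builtin$statistic)), 4))
print(round(p_value, 4))
```

Printing the expected counts also lets us check the usual rule of thumb that every expected cell count should be at least 5, which supports using the chi-squared approximation here.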
Conclusion
In my analysis, I found no statistically significant association between gender and marital status: the p-value (0.3253) was well above the significance level of 0.05, so the data are consistent with the two variables being independent. This suggests that, in this sample from the 2000 U.S. Census, gender is not associated with whether an individual is married or single. Although the proportions in the data differed slightly, the test showed that this difference is not statistically significant.
Further research can be conducted to either support or refute these findings, or determine what other measurable variables do have an association with marital status.
References
Data accessed through OpenIntro: https://www.openintro.org/data/index.php?data=census