Reading CSV
The above code block removes the first few columns to narrow the data
to females specifically. It then breaks the data down to females in the
labor force and pulls out the top values in each category to give us a
better understanding of the data.
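The "top values in each category" step could be done in several ways. The sketch below is one illustration, not the original code block. It assumes the data has already been narrowed to one percentage column per marital status (the values are copied from the `females` table used below) and uses dplyr's `slice_max()` to keep the highest-percentage age group for each status. The name `top_by_status` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Percentages copied from the females table used in this analysis
females <- data.frame(
  age_group = c("20-34 years", "35-44 years", "45-54 years", "55-64 years"),
  married = c(32.0, 61.4, 63.0, 60.2),
  widowed = c(0.2, 0.8, 2.4, 6.8),
  divorced = c(3.1, 11.2, 17.2, 19.8),
  separated = c(1.2, 3.0, 3.1, 2.6),
  never_married = c(63.5, 23.6, 14.3, 10.6)
)

# Reshape to long form, then keep the single highest-percentage
# age group within each marital status
top_by_status <- females %>%
  pivot_longer(cols = -age_group,
               names_to = "marital_status",
               values_to = "percentage") %>%
  group_by(marital_status) %>%
  slice_max(percentage, n = 1) %>%
  ungroup()

print(top_by_status)
```

For this data, the result points to 55-64 for divorced and widowed, 45-54 for married and separated, and 20-34 for never married.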
Using Pivot Longer To Clean Our Data
Using the pivot_longer() command was essential for cleaning and
organizing the large amount of data in this analysis. It allowed me to
condense the wide tables into more manageable blocks and to focus
primarily on the data relevant to my analysis. Reshaping the data helped
me ensure accuracy and prepare for later visualizations. Below is the
code block demonstrating how I reshaped the data to facilitate further
analysis and visualization.
library(dplyr) # provides %>% and filter()
library(tidyr) # provides pivot_longer()
# Population totals and marital-status percentages for women by age group
females <- data.frame(
age_group = c("20-34 years", "35-44 years", "45-54 years", "55-64 years"),
total = c(33160377, 21785279, 20175854, 21471526),
married = c(32.0, 61.4, 63.0, 60.2),
widowed = c(0.2, 0.8, 2.4, 6.8),
divorced = c(3.1, 11.2, 17.2, 19.8),
separated = c(1.2, 3.0, 3.1, 2.6),
never_married = c(63.5, 23.6, 14.3, 10.6)
)
# Reshape from wide (one column per status) to long (one row per age group and status).
# Note: `total` is a population count, so its rows in `percentage` hold counts, not percentages.
females_cleaned <- females %>%
pivot_longer(cols = -age_group,
names_to = "marital_status",
values_to = "percentage")
# Give marital_status a fixed order and readable labels for plotting
females_cleaned$marital_status <- factor(females_cleaned$marital_status,
levels = c("total", "married", "widowed", "divorced", "separated", "never_married"),
labels = c("Total", "Married", "Widowed", "Divorced", "Separated", "Never Married"))
print(females_cleaned)
## # A tibble: 24 × 3
## age_group marital_status percentage
## <chr> <fct> <dbl>
## 1 20-34 years Total 33160377
## 2 20-34 years Married 32
## 3 20-34 years Widowed 0.2
## 4 20-34 years Divorced 3.1
## 5 20-34 years Separated 1.2
## 6 20-34 years Never Married 63.5
## 7 35-44 years Total 21785279
## 8 35-44 years Married 61.4
## 9 35-44 years Widowed 0.8
## 10 35-44 years Divorced 11.2
## # ℹ 14 more rows
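Because `total` is a count, pivoting it into the same column as the percentages mixes two kinds of values. A minimal sketch of one alternative (not the approach used above) leaves `total` out of the pivot, so it stays a separate count column and `percentage` holds only percentages. The `estimated_count` column is a hypothetical addition that assumes each percentage is a share of that age group's `total`.

```r
library(dplyr)
library(tidyr)

females <- data.frame(
  age_group = c("20-34 years", "35-44 years", "45-54 years", "55-64 years"),
  total = c(33160377, 21785279, 20175854, 21471526),
  married = c(32.0, 61.4, 63.0, 60.2),
  widowed = c(0.2, 0.8, 2.4, 6.8),
  divorced = c(3.1, 11.2, 17.2, 19.8),
  separated = c(1.2, 3.0, 3.1, 2.6),
  never_married = c(63.5, 23.6, 14.3, 10.6)
)

# Exclude `total` from the pivot so it is repeated as its own column
females_long <- females %>%
  pivot_longer(cols = -c(age_group, total),
               names_to = "marital_status",
               values_to = "percentage") %>%
  # Assumption: each percentage is a share of that age group's total
  mutate(estimated_count = round(percentage / 100 * total))

# Sanity check: the five statuses in each age group should add up to 100%
check <- females_long %>%
  group_by(age_group) %>%
  summarise(sum_pct = sum(percentage))

print(females_long)
print(check)
```

With the values above, each age group's five percentages add up to exactly 100, which supports treating them as shares of the group total.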
Visualizing the Data
To visualize the cleaned data, I generated a separate bar graph for
each marital status: married, divorced, separated, widowed, and never
married. Each graph uses its own color, and every bar is labeled with
its percentage so the highest and lowest values across age groups are
easy to compare.
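As an alternative to five separate charts, a minimal sketch using ggplot2's `facet_wrap()` draws every marital status as its own panel in a single figure. It rebuilds the percentage data inline so it runs on its own. The names `females_long` and `p` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

females <- data.frame(
  age_group = c("20-34 years", "35-44 years", "45-54 years", "55-64 years"),
  married = c(32.0, 61.4, 63.0, 60.2),
  widowed = c(0.2, 0.8, 2.4, 6.8),
  divorced = c(3.1, 11.2, 17.2, 19.8),
  separated = c(1.2, 3.0, 3.1, 2.6),
  never_married = c(63.5, 23.6, 14.3, 10.6)
)

# Long form with readable status labels in a fixed panel order
females_long <- females %>%
  pivot_longer(cols = -age_group,
               names_to = "marital_status",
               values_to = "percentage") %>%
  mutate(marital_status = factor(marital_status,
    levels = c("married", "widowed", "divorced", "separated", "never_married"),
    labels = c("Married", "Widowed", "Divorced", "Separated", "Never Married")))

# One panel per status; free y scales keep small categories readable
p <- ggplot(females_long, aes(x = age_group, y = percentage, fill = marital_status)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5, size = 3) +
  facet_wrap(~ marital_status, scales = "free_y") +
  labs(title = "Marital Status of Women by Age Group",
       x = "Age Group",
       y = "Percentage") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Setting `scales = "free_y"` is a design choice. It keeps separated and widowed (all under 7%) from being flattened next to married and never married. The trade-off is that bar heights are no longer comparable across panels, so the percentage labels carry the comparison.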
Percentage of Women Married by Selected Age Groups
married_data <- females_cleaned %>%
filter(marital_status == "Married")
ggplot(married_data, aes(x = age_group, y = percentage)) +
geom_bar(stat = "identity", fill = "lightpink") +
geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5, color = "black", size = 3) +
labs(title = "Married Percentage by Age Group",
x = "Age Group",
y = "Percentage") +
theme_minimal()

According to the graph above, women aged 45-54 have the highest
married percentage (63.0%), although the 35-44 (61.4%) and 55-64 (60.2%)
groups are close behind. Only the 20-34 group (32.0%) is markedly lower.
One possible explanation is that women aged 45-54 grew up in a
generation when marriage held significant societal importance, and many
may have remained in long-lasting marriages.
Percentage of Women Divorced by Selected Age Groups
divorced_data <- subset(females_cleaned, marital_status == "Divorced")
ggplot(divorced_data, aes(x = age_group, y = percentage)) +
geom_bar(stat = "identity", fill = "darkblue") +
# Build labels from the data so they always match the bars
geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5, size = 4) +
labs(title = "Divorced Percentage by Age Group",
x = "Age Group",
y = "Percentage") +
theme_minimal()

The graph above shows that the divorced share rises steadily with age,
from 3.1% among women aged 20-34 to 19.8% among women aged 55-64, the
highest of the four groups. Part of this pattern is likely cumulative:
older women have had more years of marriage in which a divorce could
occur. Other possible factors include infidelity or irreconcilable
differences. Given that these women are in a later stage of life, it is
plausible that many divorced after starting a family, although the
specific reasons remain unclear because the available data does not
record them.
Percentage of Women Separated by Selected Age Groups
separated_data <- females_cleaned[females_cleaned$marital_status == "Separated", ]
ggplot(separated_data, aes(x = age_group, y = percentage)) +
geom_bar(stat = "identity", fill = "purple") +
geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5) +
labs(title = "Separated Percentages by Age Group",
x = "Age Group",
y = "Percentage") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Similar to the graph of married women, this visualization shows that
women in the 45-54 age bracket have the highest percentage in the
separated category (3.1%), although they are only marginally ahead of
the 35-44 group (3.0%). Despite the low percentages overall (no group
exceeds 3.1%), this age group again stands out in the marriage-related
categories of our analysis.
Percentage of Widowed Women by Selected Age Groups
# Filter the long data instead of overwriting females_cleaned
widowed_data <- females_cleaned %>%
filter(marital_status == "Widowed")
ggplot(widowed_data, aes(x = age_group, y = percentage)) +
geom_bar(stat = "identity", fill = "lightgray") +
geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5, color = "black") +
labs(title = "Widowed Percentage by Age Group",
x = "Age Group",
y = "Widowed Percentage") +
theme_minimal()

In the widowed category, the 55-64 age group has the highest
percentage of widowed women (6.8%), and the share rises in every older
age group. While our data is limited, it is reasonable to infer that
this mostly reflects age itself: the older a woman is, the more likely
she is to have lost a spouse, for example to illness. Another
possibility is losses linked to military service, since military
enlistment and marrying young to start a family were common societal
norms when these women were growing up. However, that is only an
observation and an assumption, as we do not have data confirming that
theory.
Percentage of Women Never Married/Single by Selected Age Groups
never_married_data <- data.frame(
age_group = c("20-34 years", "35-44 years", "45-54 years", "55-64 years"),
never_married_percentage = c(63.5, 23.6, 14.3, 10.6)
)
ggplot(never_married_data, aes(x = age_group, y = never_married_percentage)) +
geom_bar(stat = "identity", fill = "darkorange") +
geom_text(aes(label = paste0(never_married_percentage, "%")), vjust = 1.5, color = "black") +
labs(title = "Never Married Percentages by Age Group",
x = "Age Group",
y = "Never Married Percentage") +
theme_minimal()

In the never-married category, the youngest demographic we examined,
ages 20-34, has by far the highest percentage (63.5%), and the share
falls in each older group. This trend likely has several causes, but it
is reasonable to assume that many women in this age group are still
looking for a suitable partner, establishing themselves professionally,
and balancing their future goals with their current responsibilities.
Percentages of Women in the Labor Force Based on Marital Status
femalelabor_cleaned <- data.frame(
row = c("Females 16y and Older", "In Labor Force"),
total = c(136948302, 80491996),
married = c(47.0, 47.3),
divorced = c(12.1, 12.3),
separated = c(1.9, 2.2),
widowed = c(8.5, 2.7),
never_married = c(30.5, 35.6)
)
femalelabor_cleaned <- femalelabor_cleaned %>%
pivot_longer(cols = -row,
names_to = "marital_status",
values_to = "percentage")
print(femalelabor_cleaned)
## # A tibble: 12 × 3
## row marital_status percentage
## <chr> <chr> <dbl>
## 1 Females 16y and Older total 136948302
## 2 Females 16y and Older married 47
## 3 Females 16y and Older divorced 12.1
## 4 Females 16y and Older separated 1.9
## 5 Females 16y and Older widowed 8.5
## 6 Females 16y and Older never_married 30.5
## 7 In Labor Force total 80491996
## 8 In Labor Force married 47.3
## 9 In Labor Force divorced 12.3
## 10 In Labor Force separated 2.2
## 11 In Labor Force widowed 2.7
## 12 In Labor Force never_married 35.6
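To compare the two rows directly, a minimal sketch, assuming the same values as `femalelabor_cleaned` above, widens the data back out with `pivot_wider()`. Each marital status then gets one row, and the percentage-point difference between the labor force and all women 16 and older can be computed. The names `comparison`, `difference_pts`, and `participation_rate` are illustrative.

```r
library(dplyr)
library(tidyr)

femalelabor <- data.frame(
  row = c("Females 16y and Older", "In Labor Force"),
  total = c(136948302, 80491996),
  married = c(47.0, 47.3),
  divorced = c(12.1, 12.3),
  separated = c(1.9, 2.2),
  widowed = c(8.5, 2.7),
  never_married = c(30.5, 35.6)
)

# Drop the count column, go long, then widen so the two rows become columns
comparison <- femalelabor %>%
  select(-total) %>%
  pivot_longer(cols = -row,
               names_to = "marital_status",
               values_to = "percentage") %>%
  pivot_wider(names_from = row, values_from = percentage) %>%
  # Positive values: the status is more common in the labor force
  mutate(difference_pts = `In Labor Force` - `Females 16y and Older`)

print(comparison)

# Share of women aged 16 and older who are in the labor force
participation_rate <- femalelabor$total[2] / femalelabor$total[1] * 100
print(round(participation_rate, 1))
```

With these figures, the largest gaps are for widowed women (5.8 points lower in the labor force) and never-married women (5.1 points higher). Married women are almost the same in both rows (0.3 points apart). About 58.8% of women aged 16 and older are in the labor force.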
Total Number of Women in the Workforce Broken Down by Marital Status
workforce_data <- data.frame(
marital_status = c("Married", "Divorced", "Separated", "Widowed", "Never Married"),
percentage = c(47.3, 12.3, 2.2, 2.7, 35.6)
)
total_labor_force <- 80491996
# Convert each percentage of the labor force into an estimated number of women
workforce_data$number <- workforce_data$percentage / 100 * total_labor_force
# Keep the categories in the order listed above instead of alphabetical order
workforce_data$marital_status <- factor(workforce_data$marital_status,
levels = workforce_data$marital_status)
ggplot(workforce_data, aes(x = marital_status, y = number, group = 1)) +
geom_line(color = "black") +
geom_point(color = "blue", size = 3) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Number of Women in the Labor Force Aged 16 and Older by Marital Status",
x = "Marital Status",
y = "Number of Women") +
theme_minimal()
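Because the published shares are rounded to one decimal place, the counts derived above are estimates. A quick sketch of that check follows; the names `pct_sum` and `overshoot` are illustrative. The five labor-force percentages sum to 100.1%, so the estimated counts add up to slightly more than the labor-force total.

```r
workforce_data <- data.frame(
  marital_status = c("Married", "Divorced", "Separated", "Widowed", "Never Married"),
  percentage = c(47.3, 12.3, 2.2, 2.7, 35.6)
)
total_labor_force <- 80491996

# Estimated counts, rounded to whole women
workforce_data$number <- round(workforce_data$percentage / 100 * total_labor_force)

# Rounded shares do not add to exactly 100, so the counts overshoot the total slightly
pct_sum <- sum(workforce_data$percentage)
overshoot <- sum(workforce_data$number) - total_labor_force

print(pct_sum)
print(overshoot)

# Largest group first
print(workforce_data[order(-workforce_data$number), ])
```

An overshoot of roughly 80,000 women out of about 80.5 million (0.1%) is small. It does not change the ranking, with married women (about 38.07 million) first and never-married women second.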

Analysis Summary
After identifying the age groups with the highest percentage of women
in each marital status category, and then narrowing the data to women in
the workforce, I conclude that married women make up the largest share
of the female workforce (47.3%), followed by never-married women
(35.6%). This outcome was somewhat surprising, as I initially expected
the never-married or divorced/separated categories to have the highest
representation. That assumption stemmed from two observations: the
never-married category showed the largest single percentage in the
age-group data (63.5% of women aged 20-34), and in today's society many
women are prioritizing their careers over traditional gender roles. On
reflection, however, it is not entirely unexpected that many women in
the workforce are still married, given the ongoing effort to balance
family responsibilities with career aspirations.