Is there an association between income level and weight status classification (obesity vs overweight)?
This study examines the relationship between income level and obesity across U.S. states using data from the Nutrition, Physical Activity, and Obesity – Behavioral Risk Factor Surveillance System (BRFSS). The dataset contains 110,880 observations and 33 variables, with each observation representing reported health and behavioral data collected by state. The BRFSS dataset is publicly available through the U.S. Centers for Disease Control and Prevention (CDC). This topic was chosen because both income and obesity are major public health concerns in the U.S., and socioeconomic factors can play an important role in health. Understanding whether income level is associated with obesity can help inform public health strategies aimed at reducing obesity rates.
Dataset Source: https://catalog.data.gov/dataset/nutrition-physical-activity-and-obesity-behavioral-risk-factor-surveillance-system
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Downloads")
data <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Behavioral_Risk_Factor_Surveillance_System (2).csv")
## Rows: 110880 Columns: 33
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl (7): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Before conducting the chi-square test of independence, the dataset was cleaned and prepared for analysis. First, the dataset was reduced to the variables relevant to the research question: location, topic, survey question, income level (Stratification1), and data value. Observations with missing data values were removed, and the remaining rows were restricted to the "Obesity / Weight Status" topic and to income strata, which are identified by a dollar sign in the stratification label.
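# Keep only the columns needed for the analysis, then restrict to
# non-missing obesity / weight status rows that are stratified by income.
df <- data %>%
  select(LocationDesc, Topic, Question, Stratification1, Data_Value) %>%
  filter(
    !is.na(Data_Value),
    Topic == "Obesity / Weight Status",
    !is.na(Stratification1),
    grepl("\\$", Stratification1)  # income labels contain a dollar sign
  )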
A chi-square test of independence was used to assess whether there is an association between income level and obesity in the U.S. This method is appropriate because both income level and the obesity-related survey question are categorical variables. The test evaluates whether the distribution of obesity measures differs across income categories by comparing observed frequencies with the frequencies expected under independence.
\(H_0\): Income level is not associated with obesity
\(H_a\): Income level is associated with obesity
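For reference, the test statistic compares each observed cell count with the count expected under \(H_0\). The symbols \(O_{ij}\), \(E_{ij}\), \(R_i\), \(C_j\), \(N\), \(r\), and \(c\) are notation introduced here for illustration and do not appear elsewhere in this report:

\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}
\]

Here \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, and \(N\) is the grand total. Under \(H_0\), the statistic approximately follows a chi-square distribution with \((r - 1)(c - 1)\) degrees of freedom.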
observed_dataset <- table(df$Stratification1, df$Question)
observed_dataset
##
## Percent of adults aged 18 years and older who have an overweight classification
## $15,000 - $24,999 751
## $25,000 - $34,999 751
## $35,000 - $49,999 751
## $50,000 - $74,999 750
## $75,000 or greater 751
## Less than $15,000 751
##
## Percent of adults aged 18 years and older who have obesity
## $15,000 - $24,999 751
## $25,000 - $34,999 751
## $35,000 - $49,999 751
## $50,000 - $74,999 750
## $75,000 or greater 751
## Less than $15,000 751
library(ggplot2)
ggplot(df, aes(x = Stratification1, fill = Question)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Obesity Classification by Income Level",
    x = "Income Level",
    y = "Count"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
chisq.test(observed_dataset)
##
## Pearson's Chi-squared test
##
## data: observed_dataset
## X-squared = 0, df = 5, p-value = 1
chi <- chisq.test(observed_dataset)
chi$expected
##
## Percent of adults aged 18 years and older who have an overweight classification
## $15,000 - $24,999 751
## $25,000 - $34,999 751
## $35,000 - $49,999 751
## $50,000 - $74,999 750
## $75,000 or greater 751
## Less than $15,000 751
##
## Percent of adults aged 18 years and older who have obesity
## $15,000 - $24,999 751
## $25,000 - $34,999 751
## $35,000 - $49,999 751
## $50,000 - $74,999 750
## $75,000 or greater 751
## Less than $15,000 751
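The expected counts above can be reproduced by hand, which also makes clear why the statistic is exactly 0. The following is a minimal sketch that assumes the counts printed in the observed table; the object names (`observed`, `expected`, `stat`, `dof`, `p_value`) and the shortened column labels are illustrative and not part of the original analysis.

```r
# Rebuild the 6 x 2 observed table from the counts printed above.
# Column labels are shortened here for readability (an assumption).
income <- c("$15,000 - $24,999", "$25,000 - $34,999", "$35,000 - $49,999",
            "$50,000 - $74,999", "$75,000 or greater", "Less than $15,000")
counts <- c(751, 751, 751, 750, 751, 751)
observed <- cbind(Overweight = counts, Obesity = counts)
rownames(observed) <- income

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

c(statistic = stat, df = dof, p_value = p_value)
```

Because the two columns hold identical counts, every expected value equals its observed value, so the statistic is 0 with 5 degrees of freedom and the p-value is 1, in agreement with the `chisq.test()` output.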
The bar graph shows the counts of obesity-related survey responses across income levels, separated by overweight and obesity classifications. The bars are nearly equal in height (750 or 751 in every group), showing that the number of observations is almost identical across income groups for both classifications. This indicates that the distribution of obesity and overweight survey measures does not differ by income level. The pattern in the bar graph is consistent with the statistical results from the chi-square test.
The chi-square test of independence produced a chi-square statistic of 0 with 5 degrees of freedom, (6 − 1) × (2 − 1), and a p-value of 1. The statistic is exactly 0 because the observed counts equal the expected counts in every cell. Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis. This indicates that there is no statistical evidence of an association between income level and obesity classification in the BRFSS dataset.
This study examined whether there is an association between income level and obesity classification using the BRFSS dataset. The chi-square test of independence found no significant association between income level and the obesity-related survey measures. This result suggests that income level alone may not explain differences in obesity classification within this dataset and reflects the complexity of obesity as a public health issue. Other factors, such as physical activity, access to healthy food, education, and healthcare, may play a more important role in influencing obesity outcomes. Future research could explore these additional variables or analyze obesity rates rather than survey counts to better understand how socioeconomic factors relate to obesity.
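As one possible starting point for the rate-based analysis suggested above, the sketch below summarizes the reported percentage (`Data_Value`) by income group and applies a Kruskal-Wallis test. The data frame `toy` and all of its values are made-up placeholders, not BRFSS results, and the choice of test is an assumption rather than a method used in this study.

```r
# Hypothetical placeholder data: three income groups, three made-up
# percentages each. These are NOT values from the BRFSS dataset.
toy <- data.frame(
  Stratification1 = rep(c("Less than $15,000",
                          "$35,000 - $49,999",
                          "$75,000 or greater"), each = 3),
  Data_Value = c(30, 32, 34, 28, 30, 32, 24, 26, 28)
)

# Mean reported percentage within each income group
rate_means <- aggregate(Data_Value ~ Stratification1, data = toy, FUN = mean)
rate_means

# Nonparametric test of whether the percentages differ across groups
kw <- kruskal.test(Data_Value ~ Stratification1, data = toy)
kw$p.value
```

With the real data, `toy` would be replaced by the rows of `df` for the obesity question only, so that each value is a reported percentage rather than a record count.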