A. Introduction

Is there an association between income level and weight status classification (obesity vs overweight)?

This study examines the relationship between income level and obesity across U.S. states using data from the Nutrition, Physical Activity, and Obesity – Behavioral Risk Factor Surveillance System (BRFSS). The dataset contains 110,880 observations and 33 variables, with each observation representing reported health and behavioral data collected by state. The BRFSS dataset is publicly available through the U.S. Centers for Disease Control and Prevention (CDC). This topic was chosen because both income and obesity are major public health concerns in the U.S., and socioeconomic factors can play an important role in health outcomes. Understanding whether income level is associated with obesity can help inform public health strategies aimed at reducing obesity rates.

Dataset Source: https://catalog.data.gov/dataset/nutrition-physical-activity-and-obesity-behavioral-risk-factor-surveillance-system

Load Dataset:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

setwd("~/Downloads")
data <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Behavioral_Risk_Factor_Surveillance_System (2).csv")

## Rows: 110880 Columns: 33
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl  (7): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

B. Data Analysis

Before conducting the chi-square test of independence, the dataset was cleaned and prepared for analysis. First, the dataset was reduced to the variables relevant to the research question: location, topic, survey question, stratification category (which includes income level), and data value. Observations with a missing data value or a missing stratification category were removed, and only rows under the "Obesity / Weight Status" topic were kept. Finally, the data were restricted to the income strata.

Cleaning the dataset & filtering

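df <- data %>%
  # Keep only the columns needed for the analysis
  select(LocationDesc, Topic, Question, Stratification1, Data_Value) %>%
  # Drop rows with no reported value, keep the obesity topic,
  # and drop rows with no stratification category
  filter(!is.na(Data_Value)) %>%
  filter(Topic == "Obesity / Weight Status") %>%
  filter(!is.na(Stratification1))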

Filtering to only show income

# Income categories are identified by a literal dollar sign in the label
df <- df %>%
  filter(grepl("\\$", Stratification1))
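
To illustrate how the dollar-sign pattern separates income strata from other strata, here is a minimal sketch on a hand-made vector of labels. The income labels are taken from the contingency table in this report; the other labels ("Male", "Female", "Data not reported") are assumed examples of values the Stratification1 column may contain.

```r
# Hypothetical sample of Stratification1 labels. Only the income labels
# appear in this report; the other three are assumed for illustration.
labels <- c("Less than $15,000", "$15,000 - $24,999", "$75,000 or greater",
            "Male", "Female", "Data not reported")

# "\\$" matches a literal dollar sign (an unescaped "$" means end of string).
is_income <- grepl("\\$", labels)

income_labels <- labels[is_income]
income_labels
```

Any label without a dollar sign is dropped, so printing unique(df$Stratification1) after the filter is a quick way to confirm which groups remain.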

C. Statistical Analysis

A chi-square test of independence was used to assess whether there is an association between income level and obesity in the U.S. This method is appropriate because both income level and the obesity-related survey question are categorical variables. The test compares observed frequencies to the frequencies expected under independence to determine whether the distribution of obesity measures differs across income categories.

\(H_0\): Income level is not associated with obesity
\(H_a\): Income level is associated with obesity

observed_dataset <- table(df$Stratification1, df$Question)
observed_dataset

##                     
##                      Percent of adults aged 18 years and older who have an overweight classification
##   $15,000 - $24,999                                                                              751
##   $25,000 - $34,999                                                                              751
##   $35,000 - $49,999                                                                              751
##   $50,000 - $74,999                                                                              750
##   $75,000 or greater                                                                             751
##   Less than $15,000                                                                              751
##                     
##                      Percent of adults aged 18 years and older who have obesity
##   $15,000 - $24,999                                                         751
##   $25,000 - $34,999                                                         751
##   $35,000 - $49,999                                                         751
##   $50,000 - $74,999                                                         750
##   $75,000 or greater                                                        751
##   Less than $15,000                                                         751

Bar Graph

library(ggplot2)

ggplot(df, aes(x = Stratification1, fill = Question)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Obesity Classification by Income Level",
    x = "Income Level",
    y = "Count"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Chi-square Test

chisq.test(observed_dataset)

## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 0, df = 5, p-value = 1

Expected Counts

chi <- chisq.test(observed_dataset)
chi$expected

##                     
##                      Percent of adults aged 18 years and older who have an overweight classification
##   $15,000 - $24,999                                                                              751
##   $25,000 - $34,999                                                                              751
##   $35,000 - $49,999                                                                              751
##   $50,000 - $74,999                                                                              750
##   $75,000 or greater                                                                             751
##   Less than $15,000                                                                              751
##                     
##                      Percent of adults aged 18 years and older who have obesity
##   $15,000 - $24,999                                                         751
##   $25,000 - $34,999                                                         751
##   $35,000 - $49,999                                                         751
##   $50,000 - $74,999                                                         750
##   $75,000 or greater                                                        751
##   Less than $15,000                                                         751
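
The expected counts above can also be reproduced by hand. The sketch below rebuilds the contingency table from the counts printed in this report (the short row and column names are labels chosen here for readability, not names from the dataset) and computes the expected counts and the Pearson statistic directly. It assumes no continuity correction, which matches a table larger than 2 x 2.

```r
# Observed counts copied from the contingency table in this report.
# matrix() fills column by column: first the overweight column, then obesity.
observed <- matrix(
  c(751, 751, 751, 750, 751, 751,   # overweight classification
    751, 751, 751, 750, 751, 751),  # obesity
  ncol = 2,
  dimnames = list(
    income = c("15-25k", "25-35k", "35-50k", "50-75k", "75k+", "<15k"),
    measure = c("overweight", "obesity")
  )
)

# Expected count for each cell = (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected)^2 / expected)

expected
x_squared
```

Because every income group has the same count in both columns, the expected counts equal the observed counts and the statistic is exactly 0.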

Bar Graph Interpretation

The bar graph shows the number of observations (dataset rows) for each obesity-related survey question across income levels, separated by overweight and obesity classifications. The bars are nearly equal in height, showing that the number of observations is similar across income groups for both classifications. This indicates that the distribution of overweight and obesity survey measures does not differ by income level. The pattern in the bar graph is consistent with the results of the chi-square test.

Chi-square Test Interpretation

The chi-square test of independence produced a chi-square statistic of 0 with 5 degrees of freedom and a p-value of 1. Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis. This indicates that there is no statistical evidence of an association between income level and obesity classification in the BRFSS dataset.
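
The degrees of freedom and p-value reported above follow from the table dimensions and the chi-square distribution. A minimal sketch, using the 6 x 2 table shape and the statistic of 0 from the test output (the variable names are chosen here for illustration):

```r
n_rows <- 6  # income categories
n_cols <- 2  # survey questions (overweight, obesity)

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
deg_free <- (n_rows - 1) * (n_cols - 1)

# The p-value is the upper-tail probability of the chi-square distribution
# at the observed statistic, which is 0 in this analysis.
p_value <- pchisq(0, df = deg_free, lower.tail = FALSE)

deg_free
p_value
```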

D. Conclusion and Future Directions

This study examined whether there is an association between income level and obesity classification using data from the BRFSS dataset. The chi-square test of independence found no significant association between income level and the obesity-related survey measures. This result suggests that income level alone may not explain differences in obesity classification within this dataset and reflects the complexity of obesity as a public health issue. Factors such as physical activity, access to healthy food, education, and healthcare may play a more important role in influencing obesity outcomes. Future research could explore these additional variables or analyze obesity rates rather than survey counts to better understand how socioeconomic factors relate to obesity.
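
As one possible starting point for the rate-based analysis suggested above, the sketch below averages Data_Value (the reported percentage) by income group for the obesity question. The data frame is a tiny hypothetical stand-in with placeholder numbers, not real BRFSS estimates; with the real data, the filtered df from Section B would take its place. Base R is used so the example needs no extra packages.

```r
# Hypothetical stand-in for the filtered data. The Data_Value numbers are
# arbitrary placeholders, NOT real BRFSS estimates.
toy <- data.frame(
  Stratification1 = c("Less than $15,000", "Less than $15,000",
                      "$75,000 or greater", "$75,000 or greater"),
  Question = "Percent of adults aged 18 years and older who have obesity",
  Data_Value = c(10, 20, 30, 40)
)

# Mean reported percentage per income group.
mean_rate <- aggregate(Data_Value ~ Stratification1, data = toy, FUN = mean)
mean_rate
```

A simple mean of this kind treats every row equally and ignores differences in sample size, so it is only a first look rather than a formal comparison.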

Final Project

Jathiya Hamidi