We often hear that staying active helps prevent obesity, but I wanted to look at real data and see whether that holds up. My question is whether places where people are more physically active tend to have lower obesity rates, or whether the relationship is more complicated than it sounds.
The dataset I am using is called “Nutrition, Physical Activity, and Obesity.” It reports physical activity and obesity rates across different states and years. The main variables I will use are LocationDesc (the state), YearStart (the year), Question (the survey question the row answers), and Data_Value (the reported percentage).
library(dplyr)
health_data <- read.csv("C:/Users/kingp/Downloads/Nutrition__Physical_Activity__and_Obesity_-_Behavioral_Risk_Factor_Surveillance_System(1).csv")
# keep only the columns needed for this analysis
project_data <- health_data |>
  select(LocationDesc, YearStart, Question, Data_Value)
# remove rows with a missing Data_Value
project_data <- project_data |>
  filter(!is.na(Data_Value))
head(project_data)
## LocationDesc YearStart
## 1 Alabama 2011
## 2 Alabama 2011
## 3 Alabama 2011
## 4 Alabama 2011
## 5 Alabama 2011
## 6 Alabama 2011
## Question Data_Value
## 1 Percent of adults aged 18 years and older who have obesity 34.8
## 2 Percent of adults aged 18 years and older who have obesity 35.8
## 3 Percent of adults aged 18 years and older who have obesity 32.3
## 4 Percent of adults aged 18 years and older who have obesity 34.1
## 5 Percent of adults aged 18 years and older who have obesity 28.8
## 6 Percent of adults aged 18 years and older who have obesity 16.3
# get only obesity data
obesity_data <- project_data |>
  filter(grepl("obesity", Question, ignore.case = TRUE))
# get only physical activity data
activity_data <- project_data |>
  filter(grepl("physical activity", Question, ignore.case = TRUE))
head(obesity_data)
## LocationDesc YearStart
## 1 Alabama 2011
## 2 Alabama 2011
## 3 Alabama 2011
## 4 Alabama 2011
## 5 Alabama 2011
## 6 Alabama 2011
## Question Data_Value
## 1 Percent of adults aged 18 years and older who have obesity 34.8
## 2 Percent of adults aged 18 years and older who have obesity 35.8
## 3 Percent of adults aged 18 years and older who have obesity 32.3
## 4 Percent of adults aged 18 years and older who have obesity 34.1
## 5 Percent of adults aged 18 years and older who have obesity 28.8
## 6 Percent of adults aged 18 years and older who have obesity 16.3
head(activity_data)
## LocationDesc YearStart
## 1 Alabama 2011
## 2 Alabama 2011
## 3 Alabama 2011
## 4 Alabama 2011
## 5 Alabama 2011
## 6 Alabama 2011
## Question
## 1 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## 2 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## 3 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## 4 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## 5 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## 6 Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)
## Data_Value
## 1 36.7
## 2 39.3
## 3 48.7
## 4 41.3
## 5 53.5
## 6 49.4
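The pattern "physical activity" can match more than one survey question, so it may help to check which questions actually ended up in the filtered data before averaging them. The sketch below uses a small toy data frame (`toy_data`, with invented values and question wording that is only assumed to resemble the dataset's) rather than the real file, and uses base R so it runs on its own.

```r
# Toy stand-in for project_data. The rows and values are invented for
# illustration, and the question wording is an assumption.
toy_data <- data.frame(
  LocationDesc = c("Alabama", "Alabama", "Alabama", "Alaska"),
  YearStart = c(2011, 2011, 2011, 2011),
  Question = c(
    "Percent of adults aged 18 years and older who have obesity",
    "Percent of adults who engage in no leisure-time physical activity",
    "Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity (or an equivalent combination)",
    "Percent of adults who engage in no leisure-time physical activity"
  ),
  Data_Value = c(32.0, 30.0, 45.0, 22.0),
  stringsAsFactors = FALSE
)

# Same pattern match as in the report's filter step
toy_activity <- toy_data[grepl("physical activity", toy_data$Question,
                               ignore.case = TRUE), ]

# Count how many rows each distinct matched question contributes
question_counts <- table(toy_activity$Question)
print(question_counts)

# Keep a single question by exact match so the values measure one thing
target_question <- "Percent of adults who engage in no leisure-time physical activity"
toy_one_question <- toy_activity[toy_activity$Question == target_question, ]
print(toy_one_question)
```

On the real data, `unique(activity_data$Question)` would give the same kind of check. If several questions appear, some of them may measure inactivity rather than activity, and averaging them together would blur the result.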
# calculate average obesity and physical activity values
mean(obesity_data$Data_Value)
## [1] 31.26541
mean(activity_data$Data_Value)
## [1] 31.04928
# make a simple plot to compare the two groups
boxplot(obesity_data$Data_Value, activity_data$Data_Value,
        names = c("Obesity", "Physical Activity"),
        main = "Obesity vs Physical Activity Rates",
        ylab = "Percentage")
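After looking at the data, I found that the relationship between physical activity and obesity is not simple to read off from these summaries. The two overall averages are almost identical (about 31.3% for obesity and 31.0% for the physical activity questions), so comparing them does not show a clear difference. These averages also mix different states, years, and survey questions, so they cannot tell us by themselves whether more active places have lower obesity. Being more active may still matter, but this comparison does not show that it explains everything by itself.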
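One way to look at the relationship more directly would be to pair each state's obesity rate with its physical activity rate for the same year and then compute a correlation. The sketch below is only an illustration under stated assumptions: the data frames `toy_obesity` and `toy_activity` and all of their values are invented, the state names are placeholders, and the column names `obesity_pct` and `activity_pct` are introduced here for readability. It is not a result from the dataset.

```r
# Toy values, invented for illustration only (not taken from the dataset).
# Some states have more than one row, as can happen in the real data.
toy_obesity <- data.frame(
  LocationDesc = c("State A", "State A", "State B", "State C", "State D"),
  YearStart = 2011,
  Data_Value = c(34, 36, 30, 27, 25)
)
toy_activity <- data.frame(
  LocationDesc = c("State A", "State B", "State B", "State C", "State D"),
  YearStart = 2011,
  Data_Value = c(40, 45, 47, 50, 55)
)

# Collapse to one average value per state and year
obesity_avg <- aggregate(Data_Value ~ LocationDesc + YearStart,
                         data = toy_obesity, FUN = mean)
activity_avg <- aggregate(Data_Value ~ LocationDesc + YearStart,
                          data = toy_activity, FUN = mean)
names(obesity_avg)[3] <- "obesity_pct"
names(activity_avg)[3] <- "activity_pct"

# Pair each state's obesity rate with its activity rate for the same year
paired <- merge(obesity_avg, activity_avg, by = c("LocationDesc", "YearStart"))

# Pearson correlation: a negative value means higher activity goes with
# lower obesity across the paired state-year rows
r <- cor(paired$activity_pct, paired$obesity_pct)
print(paired)
print(r)
```

With the real data, the same steps would start from `obesity_data` and `activity_data`, ideally after narrowing each to a single survey question. A scatterplot of the paired values with `plot()` would show the same pattern visually. A correlation across states would still describe places, not individual people, so it would not show that activity causes lower obesity.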
Overall, this project showed me that obesity is more complicated than just whether people are active or not. Things like lifestyle, environment, and access to healthy food could also affect the results. In the future, it would be useful to look at more variables to understand the patterns even better.
U.S. Department of Health and Human Services. Nutrition, Physical Activity, and Obesity Dataset.
https://catalog.data.gov/dataset/nutrition-physical-activity-and-obesity-behavioral-risk-factor-surveillance-system