# Introduction

Obesity and diabetes are two major public health problems in the United States. Previous studies have established obesity as a risk factor for Type 2 diabetes. Examining the relationship between obesity and diabetes rates can help identify areas that may need prevention programs.

In this study, I used the CDC PLACES county-level data (2023 estimates) to analyze differences in diabetes prevalence among Maryland counties with different levels of obesity prevalence. The data set contains population health estimates for counties across the United States. I kept only Maryland counties and the obesity and diabetes indicators.

https://data.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-County-Data-20/swc5-untb/data_preview

Research Question: Does diabetes prevalence differ among Maryland counties with different levels of obesity prevalence?

# Data Analysis

The analysis uses county-level data from the CDC PLACES data set. I first clean the data by keeping only the variables relevant to the research question and only the counties in Maryland. I then classify each county as high-obesity or low-obesity, using the median obesity prevalence as the cutoff. Finally, I draw a boxplot of diabetes prevalence for the two groups.

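```{r}
# Load packages for data wrangling and plotting
library(dplyr)
library(ggplot2)
library(tidyr)

# Read the CDC PLACES county-level file and preview the first rows
data <- read.csv("PLACES.csv")
head(data)
```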

```{r}
# Keep only Maryland counties
md <- data %>%
  filter(StateAbbr == "MD")
md
```

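```{r}
# Keep the county name, measure name, and estimate
md_clean <- md %>%
  select(LocationName, Measure, Data_Value)
md_clean

# Reshape to one row per county and one column per measure;
# values_fn = mean averages any duplicate county/measure rows
md_wide <- md_clean %>%
  filter(Measure %in% c("Obesity among adults", "Diagnosed diabetes among adults")) %>%
  pivot_wider(names_from = Measure, values_from = Data_Value, values_fn = mean)
md_wide
```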
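
To make the reshaping step concrete, here is a minimal sketch on made-up data. The county names and values in `toy_long` are hypothetical; only the column names follow the analysis. It illustrates that `values_fn = mean` averages the rows whenever a county has more than one row for the same measure. Whether the actual PLACES file contains such duplicates is an assumption worth checking.

```{r}
library(tidyr)

# Hypothetical long-format rows (not PLACES data): two obesity rows
# and one diabetes row per county
toy_long <- data.frame(
  LocationName = c("County A", "County A", "County A",
                   "County B", "County B", "County B"),
  Measure      = c("Obesity among adults", "Obesity among adults",
                   "Diagnosed diabetes among adults",
                   "Obesity among adults", "Obesity among adults",
                   "Diagnosed diabetes among adults"),
  Data_Value   = c(30, 32, 10, 36, 38, 13)
)

# Duplicate county/measure rows are collapsed to their mean
toy_wide <- pivot_wider(toy_long, names_from = Measure,
                        values_from = Data_Value, values_fn = mean)
toy_wide
```

If the duplicates turn out to be different estimate types, filtering to a single type before reshaping may be preferable to averaging them.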


```{r}
# Summary statistics for the two indicators
summary(md_wide)
mean(md_wide$`Obesity among adults`, na.rm = TRUE)
mean(md_wide$`Diagnosed diabetes among adults`, na.rm = TRUE)
max(md_wide$`Obesity among adults`, na.rm = TRUE)
max(md_wide$`Diagnosed diabetes among adults`, na.rm = TRUE)
```

```{r}
# Split counties at the median obesity prevalence
md_wide <- md_wide %>%
  mutate(obesity_group = ifelse(
    `Obesity among adults` >= median(`Obesity among adults`, na.rm = TRUE),
    "High Obesity", "Low Obesity"
  ))

# Compare diabetes prevalence between the two groups
boxplot(`Diagnosed diabetes among adults` ~ obesity_group, data = md_wide,
        main = "Diabetes Prevalence by Obesity Group",
        xlab = "Obesity Group", ylab = "Diabetes Prevalence (%)")
```
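
Since ggplot2 is already loaded, the same comparison could also be drawn with it. The following is a minimal sketch on hypothetical values: the `toy` data frame, its column names, and its numbers are made up for illustration and are not the Maryland estimates.

```{r}
library(ggplot2)

# Hypothetical county-level percentages (not the Maryland estimates)
toy <- data.frame(
  obesity  = c(28, 30, 33, 35, 37, 40),
  diabetes = c(9.5, 10.1, 10.8, 11.9, 12.4, 13.0)
)

# Median split: counties at or above the median go to the high group
toy$obesity_group <- ifelse(toy$obesity >= median(toy$obesity),
                            "High Obesity", "Low Obesity")

# Boxplot of diabetes prevalence by obesity group
toy_plot <- ggplot(toy, aes(x = obesity_group, y = diabetes)) +
  geom_boxplot() +
  labs(title = "Diabetes Prevalence by Obesity Group",
       x = "Obesity Group", y = "Diabetes Prevalence (%)")
toy_plot
```

Because the split uses `>=`, a county sitting exactly on the median is placed in the high group.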


# Statistical Analysis

I used a two-sample t-test to determine whether mean diabetes prevalence differs between Maryland counties with high and low obesity rates. By default, R's `t.test()` runs Welch's version of the test, which does not assume equal variances in the two groups.

Null Hypothesis: Mean diabetes prevalence is the same in the two groups.

Alternative Hypothesis: Mean diabetes prevalence differs between the two groups.

H₀: μHigh = μLow

H₁: μHigh ≠ μLow
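
For reference, assuming `t.test()` is called with its default settings (Welch's test, no equal-variance assumption), the test statistic for these hypotheses takes the form below. The symbols $\bar{x}$, $s^2$, and $n$ (group mean, sample variance, and number of counties) are notation introduced here, not taken from the analysis.

$$
t = \frac{\bar{x}_{High} - \bar{x}_{Low}}{\sqrt{\dfrac{s_{High}^2}{n_{High}} + \dfrac{s_{Low}^2}{n_{Low}}}}
$$

A large positive value of $t$ indicates that the high-obesity group has the higher mean diabetes prevalence.
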
```{r}
# Assign each county to a group by the median obesity prevalence
md_wide <- md_wide %>%
  mutate(obesity_group = ifelse(`Obesity among adults` >= median(`Obesity among adults`, na.rm = TRUE), "High Obesity", "Low Obesity"))
md_wide
```

```{r}
# Number of counties in each group
table(md_wide$obesity_group)
```

```{r}
# Compare mean diabetes prevalence between the two groups
t.test(`Diagnosed diabetes among adults` ~ obesity_group, data = md_wide)
```

The test showed a statistically significant difference (t = 3.6958, p = 0.001541). Because the p-value is below 0.05, the null hypothesis is rejected. Mean diabetes prevalence is higher in counties with high obesity rates (12.55%) than in counties with low obesity rates (10.68%), a difference of about 1.87 percentage points.
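
The individual pieces of a t-test result can also be pulled out of the object that `t.test()` returns, which is handy for reporting. This is a minimal sketch on hypothetical values: `toy_test` and its numbers are invented for illustration and are not the study data.

```{r}
# Hypothetical diabetes percentages for two groups (not the study data)
toy_test <- data.frame(
  diabetes      = c(12.1, 12.9, 13.2, 12.4, 10.2, 10.9, 11.1, 10.5),
  obesity_group = rep(c("High Obesity", "Low Obesity"), each = 4)
)

# Welch two-sample t-test (t.test does not assume equal variances by default)
toy_res <- t.test(diabetes ~ obesity_group, data = toy_test)

toy_res$estimate   # mean of each group
toy_res$conf.int   # 95% confidence interval for the difference in means
toy_res$p.value    # two-sided p-value
```

Reporting the confidence interval alongside the p-value shows how large the difference plausibly is, not only whether it is distinguishable from zero.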

# Conclusion

The results show a statistically significant difference (p < 0.05) in mean diabetes prevalence between Maryland counties with high and low levels of obesity, so the null hypothesis is rejected. Counties with higher obesity prevalence tend to have higher diabetes prevalence.

These results answer the research question: higher obesity prevalence is associated with higher diabetes prevalence across Maryland counties. However, the results are limited. The test compares only two group averages and does not account for other variables that may have affected the outcome, such as income, age, and access to healthcare.
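
One way to use the county-level values directly, rather than a two-group split, would be a correlation test on the two continuous measures. The sketch below runs on hypothetical values: `toy_cor` and its numbers are made up and are not the Maryland estimates, and this test was not part of the analysis above. It also does not adjust for income, age, or access to healthcare.

```{r}
# Hypothetical county-level percentages (not the Maryland estimates)
toy_cor <- data.frame(
  obesity  = c(28, 30, 33, 35, 37, 40),
  diabetes = c(9.5, 10.1, 10.8, 11.9, 12.4, 13.0)
)

# Pearson correlation between the two continuous measures
toy_cor_res <- cor.test(toy_cor$obesity, toy_cor$diabetes)

toy_cor_res$estimate   # correlation coefficient r
toy_cor_res$p.value    # two-sided p-value
```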