A link to the data set online or a copy of the original data file: [link](https://www.nyc.gov/site/doh/data/data-sets/community-health-survey-public-use-data.page)
A link to the codebook online or a copy of the codebook: [link](https://www.nyc.gov/assets/doh/downloads/pdf/episrv/chs2020-codebook.pdf)
What is the purpose of the research?
The purpose of this study is to examine the factors that predispose adults in New York City to diabetes.
What is the outcome variable?
diabetes20 (Have you ever been told by a doctor, nurse or other health professional that you have diabetes?) 1) Yes; 2) No
What are the predictors?
exercise = exercise20 (During the past 30 days, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?)
healthy diet = nutrition1 (Thinking about nutrition…how many total servings of fruit and/or vegetables did you eat yesterday? A serving would equal one medium apple, a handful of broccoli, or a cup of carrots.)
unhealthy drinks = nsodasugarperday20 (Number of soda and other sugar sweetened beverages consumed. Standardized to per day)
Sex = birthsex (Sex assigned at birth: What was your sex assigned at birth? Male or female?)
1) Male; 2) Female
When was it collected?
The data set was collected in 2020 from a random sample of adults aged 18 and older in New York City.
How was it collected?
The data was collected through a phone survey.
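How many observations and variables are there?
After selecting the five analysis variables and dropping rows with missing values, the analytic sample contains 8,522 observations and 5 variables.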
How were the outcome and predictors measured? (data types, categories, values)
exercise20 (During the past 30 days, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?) Categorical, coded as a factor
nutrition1 (Thinking about nutrition…how many total servings of fruit and/or vegetables did you eat yesterday? A serving would equal one medium apple, a handful of broccoli, or a cup of carrots.) Continuous, coded as numeric
nsodasugarperday20 (Number of soda and other sugar sweetened beverages consumed. Standardized to per day) Continuous, coded as numeric
birthsex (Sex assigned at birth: What was your sex assigned at birth? Male or female?) Categorical, coded as a factor
diabetes20 (Have you ever been told by a doctor, nurse or other health professional that you have diabetes?) Categorical, coded as a factor
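As a small illustration of what "coded as a factor" means for the yes/no items, the sketch below maps numeric survey codes to labels with base R's `factor()`. The vector `raw_codes` is made-up toy data, not taken from the survey file, and the report itself uses `recode_factor()` for the same purpose.

```r
# Toy codes standing in for a yes/no survey item (1 = Yes, 2 = No).
# These values are invented for illustration only.
raw_codes <- c(1, 2, 2, 1, NA)

# factor() attaches a label to each code and fixes the level order.
answer <- factor(raw_codes, levels = c(1, 2), labels = c("Yes", "No"))

levels(answer)                  # "Yes" "No"
table(answer, useNA = "ifany")  # counts per category, missing value included
```

Fixing the level order matters later on, because the cross-tabulations and rank tests list the first level ("Yes") first.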
library(package = "tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(package = "haven")
library(package = "table1")
##
## Attaching package: 'table1'
##
## The following objects are masked from 'package:base':
##
## units, units<-
library(descr)
library(labelled)
check_point1 <- read_sas("C:/Users/kalib/OneDrive/Desktop/Check Point 1/chs2020_public.sas7bdat")
CP2Small <- check_point1 %>% select(exercise20,nutrition1,nsodasugarperday20,birthsex,diabetes20)
CP2Clean <- CP2Small %>%
mutate(birthsex = recode_factor(birthsex,
`1` = "Male",
`2` = "Female"))%>%
mutate(exercise20 = recode_factor(exercise20,
`1` = "Yes",
`2` = "No"))%>%
mutate(diabetes20 = recode_factor(diabetes20,
`1` = "Yes",
`2` = "No"))%>%
mutate(nutrition1=as.numeric(nutrition1))%>%
mutate(nsodasugarperday20=as.numeric(nsodasugarperday20))%>%
drop_na()
summary(CP2Clean)
## exercise20 nutrition1 nsodasugarperday20 birthsex diabetes20
## Yes:6191 Min. : 0.000 Min. : 0.00000 Male :3749 Yes:1048
## No :2331 1st Qu.: 1.000 1st Qu.: 0.00000 Female:4773 No :7474
## Median : 2.000 Median : 0.03306
## Mean : 2.322 Mean : 0.53772
## 3rd Qu.: 3.000 3rd Qu.: 0.49469
## Max. :50.000 Max. :21.42857
Descriptive Statistics
#Histogram representation of continuous variables
# Nutrition
CP2Clean %>%
ggplot(aes(x = nutrition1)) +
geom_histogram() +
labs(x = "Servings of fruit and/or vegetables yesterday", y = "Number of participants",
title = "Nutrition") +
theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Histogram representation of continuous variables
# Number of soda and other sugar-sweetened beverages per day
CP2Clean %>%
ggplot(aes(x = nsodasugarperday20)) +
geom_histogram() +
labs(x = "Soda and other sugar-sweetened beverages per day", y = "Number of participants",
title = "Sugar-sweetened beverages per day") +
theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Bar graph representation of categorical variables ## Exercise
CP2Clean %>%
ggplot(aes(x = exercise20)) +
geom_bar() +
labs(x = "Exercise", y = 'Number of Participants',
title = "Exercise20") +
theme_bw()
#Bar graph representation of categorical variables
#Birthsex
CP2Clean %>%
ggplot(aes(x = birthsex)) +
geom_bar() +
labs(x = "Birthsex", y = 'Number of Participants',
title = "Birthsex") +
theme_bw()
# Bar graph representation of categorical variables
#Diabetes
CP2Clean %>%
ggplot(aes(x = diabetes20)) +
geom_bar() +
labs(x = "Diabetes", y = 'Number of Participants',
title = "Diabetes 20") +
theme_bw()
# Table of all variables in relation to diabetes
label(CP2Clean$exercise20)= "Exercise (Physical Activity)"
label(CP2Clean$nutrition1)= "Number of servings"
label(CP2Clean$diabetes20)= "Diabetes by a healthcareworker"
label(CP2Clean$nsodasugarperday20)= "# of Soda &sugar per day"
label(CP2Clean$birthsex)= "Birthsex"
table1(~ diabetes20+ exercise20 + nutrition1 + nsodasugarperday20 + birthsex | diabetes20,
render.continuous = c(.="median(IQR)"),
data=CP2Clean)
| | Yes (N=1048) | No (N=7474) | Overall (N=8522) |
|---|---|---|---|
| Diabetes by a healthcareworker | | | |
| Yes | 1048 (100%) | 0 (0%) | 1048 (12.3%) |
| No | 0 (0%) | 7474 (100%) | 7474 (87.7%) |
| Exercise (Physical Activity) | | | |
| Yes | 659 (62.9%) | 5532 (74.0%) | 6191 (72.6%) |
| No | 389 (37.1%) | 1942 (26.0%) | 2331 (27.4%) |
| Number of servings | | | |
| median(IQR) | 2.00(2.00) | 2.00(2.00) | 2.00(2.00) |
| # of Soda &sugar per day | | | |
| median(IQR) | 0(0.286) | 0.0661(0.571) | 0.0331(0.495) |
| Birthsex | | | |
| Male | 496 (47.3%) | 3253 (43.5%) | 3749 (44.0%) |
| Female | 552 (52.7%) | 4221 (56.5%) | 4773 (56.0%) |
title = ("Diabetes status")
The table above compares two groups: respondents who have been told by a health professional that they have diabetes (N=1048) and those who have not (N=7474). In terms of physical activity, 62.9% of the diabetes group reported being physically active in the past 30 days, compared with 74.0% of the non-diabetes group. Overall, 72.6% of the sample (N=8522) reported being physically active.
The median number of fruit and vegetable servings is the same in both groups (2.00, IQR 2.00), so the medians alone do not separate the groups on this measure. The first block of the table is the grouping variable itself, which is why it shows 100% and 0%.
The median number of sodas and other sugar-sweetened beverages per day is slightly higher in the non-diabetes group (0.0661) than in the diabetes group (0), with an overall median of 0.0331.
In terms of birth sex, 47.3% of the diabetes group are male and 52.7% are female, while in the non-diabetes group 43.5% are male and 56.5% are female. Overall, 44.0% of the sample are male and 56.0% are female.
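To make the "median(IQR)" entries concrete, here is a minimal base R sketch of a per-group median and interquartile range (the third quartile minus the first). The data frame `toy` and its values are hypothetical and only illustrate the calculation; they are not survey data.

```r
# Hypothetical servings for two small groups; illustrative values only.
toy <- data.frame(
  diabetes = c("Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No"),
  servings = c(0, 1, 2, 3, 4, 1, 2, 2, 3, 6)
)

# Median and interquartile range (Q3 - Q1) within each group.
med_by_group <- tapply(toy$servings, toy$diabetes, median)
iqr_by_group <- tapply(toy$servings, toy$diabetes, IQR)

med_by_group
iqr_by_group
```

In this toy example both groups share a median of 2 while their IQRs differ, which is the same situation as identical medians in a table hiding differences in spread.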
CP2Clean %>%
group_by(birthsex, diabetes20) %>%
count() %>%
group_by(birthsex) %>%
mutate(perc.birthsex = 100 * n / sum(n)) %>%
ggplot(aes(x = birthsex, y = perc.birthsex, fill = diabetes20)) +
geom_col(position = "dodge") +
theme_minimal() +
labs(x = "Birthsex",
y = "Percentage",
fill = "Diabetes") +
coord_flip()
The graph above shows that the percentage of males with diabetes (13.2%) is slightly higher than the percentage of females with diabetes (11.6%).
Chi-squared assumptions:
1. Observations are independent: Met
2. Both variables are categorical (nominal or ordinal): Met
3. Expected values are 5 or higher in at least 80% of groups: Met
Null hypothesis (H0): There is no relationship between Birthsex and Diabetes
Alternate hypothesis (HA): There is a relationship between Birthsex and Diabetes
CrossTable(y = CP2Clean$birthsex,
x = CP2Clean$diabetes20,
chisq = TRUE,
expected = TRUE,
sresid = TRUE,
prop.c = FALSE,
prop.t = FALSE,
prop.chisq = FALSE)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## | Std Residual |
## |-------------------------|
##
## ==============================================
## CP2Clean$birthsex
## CP2Clean$diabetes20 Male Female Total
## ----------------------------------------------
## Yes 496 552 1048
## 461 587
## 0.473 0.527 0.123
## 1.628 -1.443
## ----------------------------------------------
## No 3253 4221 7474
## 3288 4186
## 0.435 0.565 0.877
## -0.610 0.540
## ----------------------------------------------
## Total 3749 4773 8522
## ==============================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 5.398041 d.f. = 1 p = 0.0202
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 5.244755 d.f. = 1 p = 0.022
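The expected counts and the Pearson statistic in the output above can be checked directly from the four observed counts. The following is a minimal sketch using base R's `chisq.test()`; the object names `tab` and `res` are chosen here for illustration.

```r
# Observed counts copied from the cross-tabulation above.
tab <- matrix(c(496, 3253, 552, 4221), nrow = 2,
              dimnames = list(diabetes = c("Yes", "No"),
                              birthsex = c("Male", "Female")))

# Pearson chi-squared test without the continuity correction.
res <- chisq.test(tab, correct = FALSE)

round(res$expected, 1)   # expected counts under independence
min(res$expected) >= 5   # TRUE when every expected count is at least 5
res$statistic            # chi-squared statistic
res$p.value
```

Looking at `res$expected` is a quick way to confirm whether the expected-count assumption holds before deciding if an exact test is needed.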
All expected counts are well above 5 (the smallest is 461), so the chi-squared assumptions are met. Fisher's exact test is run below as a check on the chi-squared result.
fisher.test(CP2Clean$birthsex,CP2Clean$diabetes20)
##
## Fisher's Exact Test for Count Data
##
## data: CP2Clean$birthsex and CP2Clean$diabetes20
## p-value = 0.02184
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.021880 1.330009
## sample estimates:
## odds ratio
## 1.165909
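There is a statistically significant relationship between birth sex and diabetes (Fisher's exact test, p = 0.02184). The odds of reporting diabetes are about 1.17 times as high for males as for females (95% CI 1.02 to 1.33).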
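For readers who want to see where the odds ratio comes from, the sketch below computes the within-sex percentages and the sample (cross-product) odds ratio from the table counts. The variable names are illustrative. Note that `fisher.test()` reports a conditional maximum likelihood estimate, so its value (1.165909) differs slightly from the simple cross-product ratio.

```r
# Counts from the birthsex-by-diabetes table above.
male_yes <- 496;   male_no <- 3253
female_yes <- 552; female_no <- 4221

# Percentage with diabetes within each sex.
pct_male <- 100 * male_yes / (male_yes + male_no)
pct_female <- 100 * female_yes / (female_yes + female_no)

# Sample (cross-product) odds ratio, males versus females.
odds_ratio <- (male_yes / male_no) / (female_yes / female_no)

round(c(pct_male = pct_male, pct_female = pct_female, odds_ratio = odds_ratio), 3)
```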
CP2Clean %>%
group_by(exercise20, diabetes20) %>%
count() %>%
group_by(exercise20) %>%
mutate(perc.exercise20 = 100 * n / sum(n)) %>%
ggplot(aes(x = exercise20, y = perc.exercise20, fill = diabetes20)) +
geom_col(position = "dodge") +
theme_minimal() +
labs(x = "Exercise",
y = "Percentage",
fill = "Diabetes") +
coord_flip()
In both exercise groups, most participants do not have diabetes. However, the percentage with diabetes is higher among those who did not exercise (16.7%) than among those who did (10.6%).
Chi-squared assumptions:
1. Observations are independent: Met
2. Both variables are categorical (nominal or ordinal): Met
3. Expected values are 5 or higher in at least 80% of groups: Met
Null hypothesis (H0): There is no relationship between Exercise and Diabetes
Alternate hypothesis (HA): There is a relationship between Exercise and Diabetes
CrossTable(y = CP2Clean$exercise20,
x = CP2Clean$diabetes20,
chisq = TRUE,
expected = TRUE,
sresid = TRUE,
prop.c = FALSE,
prop.t = FALSE,
prop.chisq = FALSE)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## | Std Residual |
## |-------------------------|
##
## ==============================================
## CP2Clean$exercise20
## CP2Clean$diabetes20 Yes No Total
## ----------------------------------------------
## Yes 659 389 1048
## 761.3 286.7
## 0.629 0.371 0.123
## -3.709 6.045
## ----------------------------------------------
## No 5532 1942 7474
## 5429.7 2044.3
## 0.740 0.260 0.877
## 1.389 -2.264
## ----------------------------------------------
## Total 6191 2331 8522
## ==============================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 57.34907 d.f. = 1 p = 3.65e-14
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 56.79008 d.f. = 1 p = 4.85e-14
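The "Std Residual" rows in the output above show which cells drive the chi-squared statistic. As a sketch, they can be reproduced as Pearson residuals, (observed - expected) / sqrt(expected), from the observed counts; `tab`, `res`, and `by_hand` are names chosen here for illustration.

```r
# Observed counts copied from the exercise-by-diabetes cross-tabulation above.
tab <- matrix(c(659, 5532, 389, 1942), nrow = 2,
              dimnames = list(diabetes = c("Yes", "No"),
                              exercise = c("Yes", "No")))

res <- chisq.test(tab, correct = FALSE)

# Pearson residuals computed by hand from observed and expected counts.
by_hand <- (tab - res$expected) / sqrt(res$expected)

round(by_hand, 3)
# chisq.test() stores the same quantity in its residuals component.
all.equal(by_hand, res$residuals, check.attributes = FALSE)
```

A common rule of thumb treats residuals beyond roughly 2 in absolute value as notable. Here the largest residual belongs to participants with diabetes who did not exercise, who are observed more often than independence would predict.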
All expected counts are well above 5 (the smallest is 286.7), so the chi-squared assumptions are met. Fisher's exact test is run below as a check on the chi-squared result.
fisher.test(CP2Clean$exercise20,CP2Clean$diabetes20)
##
## Fisher's Exact Test for Count Data
##
## data: CP2Clean$exercise20 and CP2Clean$diabetes20
## p-value = 1.873e-13
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.5183390 0.6829982
## sample estimates:
## odds ratio
## 0.5947621
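There is a statistically significant relationship between exercise and diabetes (Fisher's exact test, p-value = 1.873e-13). The odds of reporting diabetes are about 0.59 times as high for participants who exercised as for those who did not (95% CI 0.52 to 0.68).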
# Number of Sugar per day & diabetes box-plot
CP2Clean %>%
ggplot(aes(x = diabetes20, y = nsodasugarperday20)) + geom_boxplot(aes(fill = diabetes20), alpha = .5)
There is a slight difference in the median between participants with and without diabetes. Many outliers are observed.
T-test assumptions:
1. Continuous variable and two independent groups: Met
2. Independent observations: Met
3. Normal distribution in each group: Failed
4. Equal variances within each group: Failed
# T-test assumption checking: normality
# Names must match the factor levels ("No", "Yes") for the fill colors to apply
custom_colors <- c("No" = "#b6ddfa", "Yes" = "#c4bdff")
CP2Clean %>%
ggplot(aes(x = nsodasugarperday20, fill= diabetes20)) +
geom_histogram() +
scale_fill_manual(values = custom_colors) +
theme_minimal() +
facet_grid(cols = vars(diabetes20))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Both distributions are skewed, so the normality assumption is not met.
car::leveneTest(y = nsodasugarperday20 ~ diabetes20, data = CP2Clean)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 13.749 0.0002102 ***
## 8520
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis of equal variances is rejected (p = 0.0002102), so the variances differ significantly between the groups and the equal-variance assumption is not met.
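Levene's test with `center = median` amounts to a one-way ANOVA on each value's absolute deviation from its own group median. The sketch below reproduces that idea in base R on simulated data; the data frame `toy`, its group sizes, and the exponential distributions are assumptions made purely for illustration.

```r
set.seed(1)  # simulated, illustrative data only

toy <- data.frame(
  group = factor(rep(c("Yes", "No"), times = c(40, 60))),
  value = c(rexp(40, rate = 2), rexp(60, rate = 1))
)

# Absolute deviation of each value from its own group's median.
toy$abs_dev <- abs(toy$value - ave(toy$value, toy$group, FUN = median))

# One-way ANOVA on the deviations; its F test plays the role of Levene's statistic.
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

A small p-value in the `group` row indicates that the typical spread around the median differs between groups, which is the conclusion reported for the survey data above.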
# Mann-Whitney U test
wilcox.test(formula = CP2Clean$nsodasugarperday20 ~ CP2Clean$diabetes20,
paired = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: CP2Clean$nsodasugarperday20 by CP2Clean$diabetes20
## W = 3292347, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
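The number of sodas and other sugar-sweetened beverages per day differs significantly between participants with and without diabetes (W = 3292347, p-value < 2.2e-16). The median is lower in the diabetes group (0 vs. 0.0661).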
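The W statistic alone does not show the direction of a difference. One way to read it, sketched below, is to compare W with its null value n1 * n2 / 2, or to divide it by n1 * n2 to estimate the probability that a randomly chosen member of the first group has a higher value than a member of the second (ties counted as one half). This assumes that "Yes" is the first factor level, as in the recoding used in this report; the variable names are illustrative.

```r
# Group sizes and W statistics copied from the outputs in this report.
n_yes <- 1048   # participants with diabetes (first factor level)
n_no  <- 7474   # participants without diabetes
w_soda      <- 3292347
w_nutrition <- 3491271

# Value of W expected if the two groups had the same distribution.
w_null <- n_yes * n_no / 2

# Estimated probability that a "Yes" value exceeds a "No" value (ties count 1/2).
ps_soda      <- w_soda / (n_yes * n_no)
ps_nutrition <- w_nutrition / (n_yes * n_no)

round(c(w_null = w_null, ps_soda = ps_soda, ps_nutrition = ps_nutrition), 4)
```

Values below 0.5 suggest that participants with diabetes tend to report lower values than those without, on both measures.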
# Nutritional & diabetes box-plot
CP2Clean %>%
ggplot(aes(x = diabetes20, y = nutrition1)) + geom_boxplot(aes(fill = diabetes20))
Number of servings: The median number of servings was the same for both groups (2.00). This means that in each group at least half of the participants reported 2 or fewer servings.
T-test assumptions:
1. Continuous variable and two independent groups: Met
2. Independent observations: Met
3. Normal distribution in each group: Failed
4. Equal variances within each group: Failed
# T-test assumption checking: normality
# Names must match the factor levels ("No", "Yes") for the fill colors to apply
custom_colors <- c("No" = "#b6ddfa", "Yes" = "#c4bdff")
CP2Clean %>%
ggplot(aes(x = nutrition1, fill= diabetes20)) +
geom_histogram() +
scale_fill_manual(values = custom_colors) +
theme_minimal() +
facet_grid(cols = vars(diabetes20))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Levene Test: Check Homogeneity
car::leveneTest(y = nutrition1 ~ diabetes20, data = CP2Clean)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 12.925 0.0003261 ***
## 8520
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis of equal variances is rejected (p = 0.0003261), so the variances differ significantly between the groups and the equal-variance assumption is not met.
wilcox.test(formula = CP2Clean$nutrition1 ~ CP2Clean$diabetes20,
paired = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: CP2Clean$nutrition1 by CP2Clean$diabetes20
## W = 3491271, p-value = 5.393e-09
## alternative hypothesis: true location shift is not equal to 0
The number of fruit and vegetable servings differs significantly between participants with and without diabetes (W = 3491271, p-value = 5.393e-09).
The study aimed to identify factors that predispose New York City adults to diabetes. It found significant relationships between diabetes and several variables: birth sex (\(p < 0.05\)), exercise (\(p < 0.05\)), daily sugar-soda intake (\(p < 0.05\)), and nutrition (\(p < 0.05\)). The data compared two groups: those diagnosed with diabetes (N=1048) and those not diagnosed (N=7474). Physical activity was reported by 62.9% of the diabetes group and 74.0% of the non-diabetes group, and by 72.6% of the total sample (N=8522). These findings suggest that birth sex, exercise, nutrition, and sugar-soda intake are associated with diabetes prevalence among city dwellers.
The purpose of this study was to examine the factors that predispose adults in New York City to diabetes. The results showed significant associations between diabetes and birth sex, exercise, number of soda-sugar drinks per day, and nutrition. Previous studies have identified these factors as potential risk or protective factors for diabetes in urban populations (Chen et al., 2016; Huang et al., 2019; Park et al., 2018; Sattar et al., 2019; Zhang et al., 2017).
The study found that males were more likely to have diabetes than females (13.2% vs. 11.6%), which could be explained by biological, behavioral, or social factors. This contrasts with studies that have found diabetes to be more common among females. For example, females may have higher rates of obesity, gestational diabetes, or polycystic ovary syndrome, which are known to increase the risk of diabetes (Chen et al., 2016). Females may also face more barriers to accessing health care, physical activity, or healthy food options, especially in low-income or minority communities (Huang et al., 2019).
The study also found that exercise was inversely associated with diabetes, meaning that those who reported being physically active were less likely to have diabetes than those who were not. This is in line with the evidence that physical activity can improve glucose metabolism, insulin sensitivity, and cardiovascular health, and reduce the risk of obesity and other chronic diseases (Park et al., 2018). The majority of the sample (72.6%) reported being physically active, which may reflect a high level of awareness of and motivation for exercise among city dwellers.
Another significant finding was that the number of soda-sugar drinks per day differed between the groups. In this sample the difference ran opposite to the expected direction: the median was lower among participants with diabetes (0) than among those without (0.0661). One possible explanation is that people reduce their intake of sugary drinks after a diagnosis, which a single survey wave cannot separate from a protective effect. The literature shows that sugar-sweetened beverages can increase the risk of diabetes by inducing weight gain, inflammation, and insulin resistance (Sattar et al., 2019). The mean number of soda-sugar drinks per day was 0.54 (median 0.03), roughly half of the one-per-day limit cited from the American Heart Association (AHA, 2020).
Finally, the study found that nutrition was associated with diabetes. The medians were identical (2 servings in both groups), but the rank-based test indicated that participants with diabetes tended to report fewer servings of fruit and vegetables than those without. This is in accordance with research indicating that a healthy diet can prevent or delay the onset of diabetes by providing adequate nutrients, fiber, and antioxidants, and avoiding excess calories, fat, and sugar (Zhang et al., 2017). The median was 2 servings (mean 2.32), below the five servings the study treats as optimal, suggesting that there is room for improvement in dietary quality among city dwellers.
Based on these results, the study suggests that there are modifiable factors that can influence the risk of diabetes among adults in New York City. The study therefore recommends that public health interventions target these factors and promote healthy behaviors and lifestyles among urban populations. For example, interventions could aim to increase access to and affordability of health care, physical activity, and healthy food options, especially for males and low-income or minority groups. Interventions could also aim to reduce the consumption of sugar-sweetened beverages and increase awareness of and adherence to dietary guidelines.
Furthermore, future research should explore the causal mechanisms and the interactions of these factors, as well as the potential impact of other environmental, genetic, or psychosocial factors on diabetes risk among city dwellers.
AHA. (2020). Added Sugars. https://www.heart.org/en/healthy-living/healthy-eating/eat-smart/sugar/added-sugars
Chen, L., Magliano, D. J., & Zimmet, P. Z. (2016). The worldwide epidemiology of type 2 diabetes mellitus—present and future perspectives. Nature Reviews Endocrinology, 8(4), 228–236. https://doi.org/10.1038/nrendo.2011.183
Huang, J., Qi, S., Huang, Y., & Feng, S. (2019). Gender differences in the prevalence of diabetes and prediabetes in the Chinese adult population: A systematic review and meta-analysis. Diabetes Research and Clinical Practice, 156, 107840. https://doi.org/10.1016/j.diabres.2019.107840
Park, S., Lee, J., Kim, Y., & Lee, S. (2018). Physical activity and diabetes mellitus. Journal of Exercise Rehabilitation, 14(4), 649–656. https://doi.org/10.12965/jer.1836298.149
Sattar, N., Gill, J. M., & Lean, M. E. (2019). ABC of obesity: Obesity, insulin resistance, and diabetes. BMJ, 333(7576), 989–992. https://doi.org/10.1136/bmj.333.7576.989
Zhang, X., Liu, S., Liu, Y., Du, H., Chen, X., Liu, F., Wang, C., & Sun, C. (2017). Dietary patterns, food groups and type 2 diabetes mellitus: A systematic review and meta-analysis of cohort studies. Journal of Diabetes Investigation, 8(4), 518–527. https://doi.org/10.1111/jdi.12614