This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
# Set your working directory to where the file is saved (optional)
setwd("C:\\Users\\sudipta.gupta\\Pictures")
# Import the dataset
physical_activity <- read.csv("physical_activity.csv", header = TRUE)
# View the first few rows
head(physical_activity)
## participant_id age_group gender marital_status education_level occupation
## 1 1 18-29 Male Single Secondary Farmer
## 2 2 18-29 Male Married Secondary Business
## 3 3 45-59 Female Married Primary Farmer
## 4 4 45-59 Female Divorced University Student
## 5 5 45-59 Male Widowed Illiterate Student
## 6 6 45-59 Male Married Secondary Farmer
## monthly_income physical_activity chronic_disease self_rated_health
## 1 32696 Low No Good
## 2 40891 High Yes Good
## 3 28615 Moderate No Excellent
## 4 57190 Low Yes Fair
## 5 56976 Moderate Yes Excellent
## 6 34214 Moderate No Fair
# Check basic info
str(physical_activity)
## 'data.frame': 250 obs. of 10 variables:
## $ participant_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ age_group : chr "18-29" "18-29" "45-59" "45-59" ...
## $ gender : chr "Male" "Male" "Female" "Female" ...
## $ marital_status : chr "Single" "Married" "Married" "Divorced" ...
## $ education_level : chr "Secondary" "Secondary" "Primary" "University" ...
## $ occupation : chr "Farmer" "Business" "Farmer" "Student" ...
## $ monthly_income : int 32696 40891 28615 57190 56976 34214 41112 8866 14249 25496 ...
## $ physical_activity: chr "Low" "High" "Moderate" "Low" ...
## $ chronic_disease : chr "No" "Yes" "No" "Yes" ...
## $ self_rated_health: chr "Good" "Good" "Excellent" "Fair" ...
summary(physical_activity)
## participant_id age_group gender marital_status
## Min. : 1.00 Length:250 Length:250 Length:250
## 1st Qu.: 63.25 Class :character Class :character Class :character
## Median :125.50 Mode :character Mode :character Mode :character
## Mean :125.50
## 3rd Qu.:187.75
## Max. :250.00
## education_level occupation monthly_income physical_activity
## Length:250 Length:250 Min. : 5052 Length:250
## Class :character Class :character 1st Qu.:19940 Class :character
## Mode :character Mode :character Median :33588 Mode :character
## Mean :33184
## 3rd Qu.:46991
## Max. :59937
## chronic_disease self_rated_health
## Length:250 Length:250
## Class :character Class :character
## Mode :character Mode :character
##
##
##
# Load packages
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.5.1
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.1
## Warning: package 'ggplot2' was built under R version 4.5.1
## Warning: package 'tibble' was built under R version 4.5.1
## Warning: package 'tidyr' was built under R version 4.5.1
## Warning: package 'readr' was built under R version 4.5.1
## Warning: package 'purrr' was built under R version 4.5.1
## Warning: package 'dplyr' was built under R version 4.5.1
## Warning: package 'stringr' was built under R version 4.5.1
## Warning: package 'forcats' was built under R version 4.5.1
## Warning: package 'lubridate' was built under R version 4.5.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Generate a descriptive summary for all variables
summary_table <- physical_activity %>%
  tbl_summary(
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",  # Show mean ± SD for numeric vars
      all_categorical() ~ "{n} ({p}%)"     # Show n (%) for categorical vars
    ),
    missing = "no"  # or "ifany" to show missing counts
  )
# Print the summary table
summary_table
| Characteristic | N = 250¹ |
|---|---|
| participant_id | 126 ± 72 |
| age_group | |
| 18-29 | 69 (28%) |
| 30-44 | 89 (36%) |
| 45-59 | 54 (22%) |
| 60+ | 38 (15%) |
| gender | |
| Female | 132 (53%) |
| Male | 118 (47%) |
| marital_status | |
| Divorced | 25 (10%) |
| Married | 132 (53%) |
| Single | 67 (27%) |
| Widowed | 26 (10%) |
| education_level | |
| Illiterate | 31 (12%) |
| Primary | 82 (33%) |
| Secondary | 90 (36%) |
| University | 47 (19%) |
| occupation | |
| Business | 42 (17%) |
| Farmer | 72 (29%) |
| Service | 66 (26%) |
| Student | 41 (16%) |
| Unemployed | 29 (12%) |
| monthly_income | 33,184 ± 16,041 |
| physical_activity | |
| High | 54 (22%) |
| Low | 95 (38%) |
| Moderate | 101 (40%) |
| chronic_disease | 87 (35%) |
| self_rated_health | |
| Excellent | 44 (18%) |
| Fair | 91 (36%) |
| Good | 85 (34%) |
| Poor | 30 (12%) |
| ¹ Mean ± SD; n (%) | |
### Interpretation

The study included 250 participants. The participant_id row (126 ± 72) only summarises the ID numbers 1 to 250 and has no substantive meaning; it is not age, which was recorded in groups only. The largest age group was 30–44 years (36%), followed by 18–29 years (28%), 45–59 years (22%), and 60+ years (15%). The gender distribution was fairly balanced (53% female, 47% male). Just over half of participants were married (53%), and about a quarter were single (27%). Educational attainment was modest: secondary education was the most common level (36%), followed by primary (33%), while 19% had a university education and 12% were illiterate.

Occupations were spread across farmers (29%), service holders (26%), businesspersons (17%), students (16%), and the unemployed (12%). Mean monthly income was 33,184 ± 16,041 BDT. The SD is roughly half the mean, which indicates considerable variation in income (range 5,052 to 59,937 BDT).

Moderate physical activity was the most common level (40%), followed by low (38%) and high (22%). About 35% of participants had a chronic disease. Self-rated health was most often fair (36%) or good (34%); 18% rated their health as excellent and 12% as poor.
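The participant_id row in the table above appears because the identifier is numeric and is therefore summarised like any other continuous variable, and the activity levels are listed alphabetically (High, Low, Moderate) rather than in their natural order. The following is a minimal base R sketch of one way to handle both points before summarising. The `toy` and `toy_clean` data frames are hypothetical stand-ins built from the six rows printed by `head()`; they are not the full dataset.

```r
# Hypothetical toy data: the six rows printed by head(), three columns only
toy <- data.frame(
  participant_id    = 1:6,
  physical_activity = c("Low", "High", "Moderate", "Low", "Moderate", "Moderate"),
  monthly_income    = c(32696, 40891, 28615, 57190, 56976, 34214)
)

# Drop the identifier so it is not summarised as a continuous variable
toy_clean <- toy[, setdiff(names(toy), "participant_id")]

# Give the activity levels their natural order instead of alphabetical order
toy_clean$physical_activity <- factor(
  toy_clean$physical_activity,
  levels = c("Low", "Moderate", "High")
)

# Frequencies now appear as Low, Moderate, High
table(toy_clean$physical_activity)
```

The same idea should carry over to the full data frame before it is passed to `tbl_summary()`, although that has not been run here.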
# If you haven't already imported and converted factors, run:
df <- readr::read_csv("physical_activity.csv") %>%
  mutate(
    chronic_disease   = factor(chronic_disease, levels = c("No", "Yes")),
    gender            = factor(gender),
    age_group         = factor(age_group),
    marital_status    = factor(marital_status),
    education_level   = factor(education_level),
    occupation        = factor(occupation),
    physical_activity = factor(physical_activity),
    self_rated_health = factor(self_rated_health)
  )
## Rows: 250 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): age_group, gender, marital_status, education_level, occupation, phy...
## dbl (2): participant_id, monthly_income
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# list of variables you want in bivariate tables
vars_cat <- c("age_group","marital_status","education_level",
"occupation","physical_activity","self_rated_health")
vars_num <- c("monthly_income") # add other numeric vars if present
# By gender
tbl_by_gender <- df %>%
  select(all_of(c(vars_cat, vars_num)), gender) %>%
  tbl_summary(
    by = gender,
    statistic = all_continuous() ~ "{mean} ({sd})",
    digits = all_continuous() ~ 1,
    missing = "ifany"
  ) %>%
  add_p(
    test = list(all_categorical() ~ "chisq.test",
                all_continuous() ~ "t.test")
  ) %>%
  add_q() %>%  # optional: add Benjamini-Hochberg FDR q-values
  modify_header(label = "**Variable**") %>%
  bold_labels()
tbl_by_gender
| Variable | Female N = 132¹ | Male N = 118¹ | p-value² | q-value³ |
|---|---|---|---|---|
| age_group | | | 0.6 | >0.9 |
| 18-29 | 33 (25%) | 36 (31%) | | |
| 30-44 | 46 (35%) | 43 (36%) | | |
| 45-59 | 30 (23%) | 24 (20%) | | |
| 60+ | 23 (17%) | 15 (13%) | | |
| marital_status | | | 0.8 | >0.9 |
| Divorced | 13 (9.8%) | 12 (10%) | | |
| Married | 72 (55%) | 60 (51%) | | |
| Single | 32 (24%) | 35 (30%) | | |
| Widowed | 15 (11%) | 11 (9.3%) | | |
| education_level | | | 0.8 | >0.9 |
| Illiterate | 19 (14%) | 12 (10%) | | |
| Primary | 43 (33%) | 39 (33%) | | |
| Secondary | 46 (35%) | 44 (37%) | | |
| University | 24 (18%) | 23 (19%) | | |
| occupation | | | 0.7 | >0.9 |
| Business | 23 (17%) | 19 (16%) | | |
| Farmer | 33 (25%) | 39 (33%) | | |
| Service | 38 (29%) | 28 (24%) | | |
| Student | 23 (17%) | 18 (15%) | | |
| Unemployed | 15 (11%) | 14 (12%) | | |
| physical_activity | | | 0.4 | >0.9 |
| High | 32 (24%) | 22 (19%) | | |
| Low | 46 (35%) | 49 (42%) | | |
| Moderate | 54 (41%) | 47 (40%) | | |
| self_rated_health | | | >0.9 | >0.9 |
| Excellent | 24 (18%) | 20 (17%) | | |
| Fair | 50 (38%) | 41 (35%) | | |
| Good | 43 (33%) | 42 (36%) | | |
| Poor | 15 (11%) | 15 (13%) | | |
| monthly_income | 32,486.1 (16,010.1) | 33,965.5 (16,107.5) | 0.5 | >0.9 |
| ¹ n (%); Mean (SD) | | | | |
| ² Pearson’s Chi-squared test; Welch Two Sample t-test | | | | |
| ³ False discovery rate correction for multiple testing | | | | |
### Interpretation

Comparing males and females:

- Age group, marital status, education, and occupation did not differ significantly (all p ≥ 0.6).
- Physical activity (p = 0.4) and self-rated health (p > 0.9) were similarly distributed in both genders.
- Mean monthly income was slightly higher among males (33,965 BDT) than females (32,486 BDT), but the difference was not statistically significant (p = 0.5).

Overall, no variable differed significantly by gender (all q > 0.9), suggesting that demographic and health patterns are similar in males and females.
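To make the categorical p-values above less of a black box, here is a minimal sketch of the Pearson chi-squared test for one row block, physical activity by gender, run in base R on the counts printed in the table. The object names `activity_by_gender` and `res` are illustrative; the test itself is the one named in the table footnote, but this sketch works from the published counts rather than the raw data.

```r
# Counts copied from the physical_activity rows of the by-gender table
activity_by_gender <- matrix(
  c(32, 46, 54,   # Female: High, Low, Moderate
    22, 49, 47),  # Male:   High, Low, Moderate
  nrow = 3,
  dimnames = list(
    physical_activity = c("High", "Low", "Moderate"),
    gender            = c("Female", "Male")
  )
)

# Pearson's chi-squared test of independence (3 x 2 table, so df = 2)
res <- chisq.test(activity_by_gender)

res$expected   # expected counts under independence
res$statistic  # chi-squared statistic
res$p.value    # consistent with the 0.4 shown in the table
```

All expected counts are well above 5 here, so the chi-squared approximation is reasonable for this block.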
# By chronic disease status
tbl_by_disease <- df %>%
  select(all_of(c(vars_cat, vars_num)), chronic_disease) %>%
  tbl_summary(
    by = chronic_disease,
    statistic = all_continuous() ~ "{mean} ({sd})",
    digits = all_continuous() ~ 1,
    missing = "ifany"
  ) %>%
  add_p(
    test = list(all_categorical() ~ "chisq.test",
                all_continuous() ~ "t.test")
  ) %>%
  add_q() %>%  # Benjamini-Hochberg FDR q-values
  modify_header(label = "**Variable**") %>%
  bold_labels()
tbl_by_disease
| Variable | No N = 163¹ | Yes N = 87¹ | p-value² | q-value³ |
|---|---|---|---|---|
| age_group | | | 0.7 | 0.9 |
| 18-29 | 46 (28%) | 23 (26%) | | |
| 30-44 | 61 (37%) | 28 (32%) | | |
| 45-59 | 33 (20%) | 21 (24%) | | |
| 60+ | 23 (14%) | 15 (17%) | | |
| marital_status | | | 0.3 | 0.8 |
| Divorced | 18 (11%) | 7 (8.0%) | | |
| Married | 91 (56%) | 41 (47%) | | |
| Single | 39 (24%) | 28 (32%) | | |
| Widowed | 15 (9.2%) | 11 (13%) | | |
| education_level | | | 0.3 | 0.8 |
| Illiterate | 16 (9.8%) | 15 (17%) | | |
| Primary | 57 (35%) | 25 (29%) | | |
| Secondary | 57 (35%) | 33 (38%) | | |
| University | 33 (20%) | 14 (16%) | | |
| occupation | | | 0.6 | 0.9 |
| Business | 26 (16%) | 16 (18%) | | |
| Farmer | 49 (30%) | 23 (26%) | | |
| Service | 40 (25%) | 26 (30%) | | |
| Student | 26 (16%) | 15 (17%) | | |
| Unemployed | 22 (13%) | 7 (8.0%) | | |
| physical_activity | | | >0.9 | >0.9 |
| High | 34 (21%) | 20 (23%) | | |
| Low | 62 (38%) | 33 (38%) | | |
| Moderate | 67 (41%) | 34 (39%) | | |
| self_rated_health | | | 0.6 | 0.9 |
| Excellent | 31 (19%) | 13 (15%) | | |
| Fair | 62 (38%) | 29 (33%) | | |
| Good | 51 (31%) | 34 (39%) | | |
| Poor | 19 (12%) | 11 (13%) | | |
| monthly_income | 31,445.6 (15,951.9) | 36,442.1 (15,786.1) | 0.019 | 0.13 |
| ¹ n (%); Mean (SD) | | | | |
| ² Pearson’s Chi-squared test; Welch Two Sample t-test | | | | |
| ³ False discovery rate correction for multiple testing | | | | |
### Interpretation

Comparing participants with and without chronic disease: age group, marital status, education, and occupation did not differ significantly (all p ≥ 0.3; all q ≥ 0.8). Mean income was higher among those with a chronic disease (36,442 BDT vs. 31,446 BDT; p = 0.019), but the difference was no longer statistically significant after correction for multiple testing (q = 0.13). Physical activity (p > 0.9) and self-rated health (p = 0.6) were similarly distributed in the two groups. Chronic disease was therefore not clearly associated with the socioeconomic or lifestyle variables examined, apart from a possible income-related trend.
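The income result above can be traced by hand. The sketch below first recomputes the Welch t-test from the means, SDs, and group sizes printed in the table, then applies the Benjamini-Hochberg adjustment with base R's `p.adjust()`. Two assumptions are made: the other six p-values are the rounded figures shown in the table, and 0.95 is an arbitrary stand-in for the entry printed as ">0.9". Only the income q-value is therefore expected to match the table closely.

```r
# Summary statistics copied from the monthly_income row (mean, SD, n)
m_no  <- 31445.6; s_no  <- 15951.9; n_no  <- 163
m_yes <- 36442.1; s_yes <- 15786.1; n_yes <- 87

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
v_no   <- s_no^2 / n_no
v_yes  <- s_yes^2 / n_yes
t_stat <- (m_yes - m_no) / sqrt(v_no + v_yes)
df_w   <- (v_no + v_yes)^2 / (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))
p_income <- 2 * pt(-abs(t_stat), df = df_w)

# Benjamini-Hochberg adjustment across the seven tests in the table.
# Rounded p-values as printed; 0.95 is an assumed stand-in for ">0.9".
p_values <- c(
  age_group = 0.7, marital_status = 0.3, education_level = 0.3,
  occupation = 0.6, physical_activity = 0.95, self_rated_health = 0.6,
  monthly_income = 0.019
)
q_values <- p.adjust(p_values, method = "BH")

round(p_income, 3)                      # Welch p-value for income
round(q_values[["monthly_income"]], 2)  # BH-adjusted value for income
```

For the smallest of m = 7 p-values, the adjusted value is p × m / rank = 0.019 × 7 / 1 ≈ 0.13, which is why a nominal p of 0.019 no longer clears the 0.05 threshold.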
# outcome: chronic_disease (factor with levels No/Yes)
tbl_unadjusted <- df %>%
  select(chronic_disease, gender, age_group, education_level,
         physical_activity, monthly_income) %>%
  tbl_uvregression(
    method = glm,
    y = chronic_disease,
    method.args = list(family = binomial),
    exponentiate = TRUE
  ) %>%
  bold_labels()
tbl_unadjusted
| Characteristic | N | OR | 95% CI | p-value |
|---|---|---|---|---|
| gender | 250 | | | |
| Female | | — | — | |
| Male | | 1.00 | 0.59, 1.68 | >0.9 |
| age_group | 250 | | | |
| 18-29 | | — | — | |
| 30-44 | | 0.92 | 0.47, 1.80 | 0.8 |
| 45-59 | | 1.27 | 0.60, 2.68 | 0.5 |
| 60+ | | 1.30 | 0.57, 2.96 | 0.5 |
| education_level | 250 | | | |
| Illiterate | | — | — | |
| Primary | | 0.47 | 0.20, 1.09 | 0.079 |
| Secondary | | 0.62 | 0.27, 1.42 | 0.3 |
| University | | 0.45 | 0.17, 1.15 | 0.10 |
| physical_activity | 250 | | | |
| High | | — | — | |
| Low | | 0.90 | 0.45, 1.83 | 0.8 |
| Moderate | | 0.86 | 0.43, 1.73 | 0.7 |
| monthly_income | 250 | 1.00 | 1.00, 1.00 | 0.020 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | | | | |
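### Interpretation

In the unadjusted (univariable) logistic regression models, gender, age group, education, and physical activity were not significantly associated with chronic disease (all p > 0.05). Monthly income showed a statistically significant positive association (p = 0.020). The odds ratio is expressed per 1 BDT increase in income, which is why it rounds to 1.00 (95% CI 1.00, 1.00): the effect per unit is very small, and expressing income in larger units would make it easier to interpret. Because these estimates are unadjusted, they do not account for confounding by the other variables.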
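An odds ratio of 1.00 (1.00, 1.00) with p = 0.020 is a scaling effect, not a contradiction. The sketch below illustrates this on simulated data: the same logistic model is fitted with income in single BDT and in units of 10,000 BDT. Everything here is hypothetical (the `sim` data frame, the seed, and the assumed effect of 0.2 log-odds per 10,000 BDT); it does not use or reproduce the study data.

```r
# Simulated, hypothetical data: 250 participants, income in BDT
set.seed(1)
n <- 250
income <- round(runif(n, min = 5000, max = 60000))

# Assumed true effect: log-odds rise by 0.2 per 10,000 BDT (illustrative only)
disease <- rbinom(n, size = 1, prob = plogis(-1.3 + 0.2 * income / 10000))
sim <- data.frame(disease, income, income_10k = income / 10000)

# Same model, two scalings of the predictor
fit_per_1   <- glm(disease ~ income,     family = binomial, data = sim)
fit_per_10k <- glm(disease ~ income_10k, family = binomial, data = sim)

or_per_1   <- exp(coef(fit_per_1)[["income"]])
or_per_10k <- exp(coef(fit_per_10k)[["income_10k"]])

round(or_per_1, 2)    # rounds to 1 when income is in single BDT
round(or_per_10k, 2)  # same association on a more readable scale
```

The two odds ratios describe the same fit: raising the per-BDT odds ratio to the power 10,000 gives the per-10,000-BDT odds ratio, and the p-value is unchanged by the rescaling.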