R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

# Set your working directory to where the file is saved (optional)
setwd("C:\\Users\\sudipta.gupta\\Pictures")
# Import the dataset
physical_activity <- read.csv("physical_activity.csv", header = TRUE)
# View the first few rows
head(physical_activity)
##   participant_id age_group gender marital_status education_level occupation
## 1              1     18-29   Male         Single       Secondary     Farmer
## 2              2     18-29   Male        Married       Secondary   Business
## 3              3     45-59 Female        Married         Primary     Farmer
## 4              4     45-59 Female       Divorced      University    Student
## 5              5     45-59   Male        Widowed      Illiterate    Student
## 6              6     45-59   Male        Married       Secondary     Farmer
##   monthly_income physical_activity chronic_disease self_rated_health
## 1          32696               Low              No              Good
## 2          40891              High             Yes              Good
## 3          28615          Moderate              No         Excellent
## 4          57190               Low             Yes              Fair
## 5          56976          Moderate             Yes         Excellent
## 6          34214          Moderate              No              Fair
# Check basic info
str(physical_activity)
## 'data.frame':    250 obs. of  10 variables:
##  $ participant_id   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ age_group        : chr  "18-29" "18-29" "45-59" "45-59" ...
##  $ gender           : chr  "Male" "Male" "Female" "Female" ...
##  $ marital_status   : chr  "Single" "Married" "Married" "Divorced" ...
##  $ education_level  : chr  "Secondary" "Secondary" "Primary" "University" ...
##  $ occupation       : chr  "Farmer" "Business" "Farmer" "Student" ...
##  $ monthly_income   : int  32696 40891 28615 57190 56976 34214 41112 8866 14249 25496 ...
##  $ physical_activity: chr  "Low" "High" "Moderate" "Low" ...
##  $ chronic_disease  : chr  "No" "Yes" "No" "Yes" ...
##  $ self_rated_health: chr  "Good" "Good" "Excellent" "Fair" ...
summary(physical_activity)
##  participant_id    age_group            gender          marital_status    
##  Min.   :  1.00   Length:250         Length:250         Length:250        
##  1st Qu.: 63.25   Class :character   Class :character   Class :character  
##  Median :125.50   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :125.50                                                           
##  3rd Qu.:187.75                                                           
##  Max.   :250.00                                                           
##  education_level     occupation        monthly_income  physical_activity 
##  Length:250         Length:250         Min.   : 5052   Length:250        
##  Class :character   Class :character   1st Qu.:19940   Class :character  
##  Mode  :character   Mode  :character   Median :33588   Mode  :character  
##                                        Mean   :33184                     
##                                        3rd Qu.:46991                     
##                                        Max.   :59937                     
##  chronic_disease    self_rated_health 
##  Length:250         Length:250        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
# Load packages
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.5.1
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.1
## Warning: package 'ggplot2' was built under R version 4.5.1
## Warning: package 'tibble' was built under R version 4.5.1
## Warning: package 'tidyr' was built under R version 4.5.1
## Warning: package 'readr' was built under R version 4.5.1
## Warning: package 'purrr' was built under R version 4.5.1
## Warning: package 'dplyr' was built under R version 4.5.1
## Warning: package 'stringr' was built under R version 4.5.1
## Warning: package 'forcats' was built under R version 4.5.1
## Warning: package 'lubridate' was built under R version 4.5.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Generate a descriptive summary for all variables
summary_table <- physical_activity %>%
  tbl_summary(
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",   # Show mean ± SD for numeric vars
      all_categorical() ~ "{n} ({p}%)"      # Show n (%) for categorical vars
    ),
    missing = "no"  # or "ifany" to show missing counts
  )
# Print the summary table
summary_table

| Characteristic | N = 250¹ |
|:---|:---|
| participant_id | 126 ± 72 |
| age_group | |
| 18-29 | 69 (28%) |
| 30-44 | 89 (36%) |
| 45-59 | 54 (22%) |
| 60+ | 38 (15%) |
| gender | |
| Female | 132 (53%) |
| Male | 118 (47%) |
| marital_status | |
| Divorced | 25 (10%) |
| Married | 132 (53%) |
| Single | 67 (27%) |
| Widowed | 26 (10%) |
| education_level | |
| Illiterate | 31 (12%) |
| Primary | 82 (33%) |
| Secondary | 90 (36%) |
| University | 47 (19%) |
| occupation | |
| Business | 42 (17%) |
| Farmer | 72 (29%) |
| Service | 66 (26%) |
| Student | 41 (16%) |
| Unemployed | 29 (12%) |
| monthly_income | 33,184 ± 16,041 |
| physical_activity | |
| High | 54 (22%) |
| Low | 95 (38%) |
| Moderate | 101 (40%) |
| chronic_disease | 87 (35%) |
| self_rated_health | |
| Excellent | 44 (18%) |
| Fair | 91 (36%) |
| Good | 85 (34%) |
| Poor | 30 (12%) |

¹ Mean ± SD; n (%)

### Interpretation

The study included 250 participants. The `participant_id` row (126 ± 72) summarizes the ID numbers 1 to 250; it is not an age or any other meaningful measurement and should be dropped from the table. Age was recorded only as a grouped variable: most participants were aged 30-44 years (36%), followed by 18-29 years (28%), 45-59 years (22%), and 60+ years (15%).

The gender distribution was relatively balanced (53% female, 47% male). Over half of the participants were married (53%), and about one-fourth were single (27%). Educational attainment was modest: only 19% had a university education, the largest share had secondary education (36%), and about 12% were illiterate.

Regarding occupation, participants were spread across farmers (29%), service holders (26%), and businesspersons (17%), with smaller proportions of students (16%) and unemployed (12%). The mean monthly income was 33,184 ± 16,041 BDT, with individual incomes ranging from 5,052 to 59,937 BDT.

In terms of physical activity, most participants reported moderate activity (40%), followed by low (38%) and high (22%) levels. About 35% had a chronic disease. Self-rated health was mostly fair (36%) or good (34%); 18% rated their health as excellent and 12% as poor.
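
The 126 ± 72 reported for `participant_id` can be checked without the data file, because it is what the mean and standard deviation of the integers 1 to 250 give. The following is a minimal base R sketch, assuming the IDs run consecutively from 1 to 250 (which the `summary()` output suggests, but which has not been verified against the file):

```r
# Assumption: participant IDs are the consecutive integers 1..250
ids <- 1:250

id_mean <- mean(ids)  # arithmetic mean of the ID numbers
id_sd   <- sd(ids)    # sample standard deviation of the ID numbers

# Format in the same style as the summary table ("{mean} ± {sd}")
cat(sprintf("%.1f ± %.1f\n", id_mean, id_sd))
```

This prints 125.5 ± 72.3, which matches the table row after rounding. One way to keep the identifier out of the table would be to drop the column before calling `tbl_summary()`, for instance with `select(-participant_id)`.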

# If you haven't already imported the data and converted the categorical columns to factors, run:
df <- readr::read_csv("physical_activity.csv") %>%
  mutate(
    chronic_disease   = factor(chronic_disease, levels = c("No","Yes")),
    gender            = factor(gender),
    age_group         = factor(age_group),
    marital_status    = factor(marital_status),
    education_level   = factor(education_level),
    occupation        = factor(occupation),
    physical_activity = factor(physical_activity),
    self_rated_health = factor(self_rated_health)
  )
## Rows: 250 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): age_group, gender, marital_status, education_level, occupation, phy...
## dbl (2): participant_id, monthly_income
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# list of variables you want in bivariate tables
vars_cat <- c("age_group","marital_status","education_level",
              "occupation","physical_activity","self_rated_health")
vars_num <- c("monthly_income")  # add other numeric vars if present

# By gender
tbl_by_gender <- df %>%
  select(all_of(c(vars_cat, vars_num)), gender) %>%
  tbl_summary(
    by = gender,
    statistic = all_continuous() ~ "{mean} ({sd})",
    digits = all_continuous() ~ 1,
    missing = "ifany"
  ) %>%
  add_p(
    test = list(all_categorical() ~ "chisq.test",
                all_continuous()  ~ "t.test")
  ) %>%
  add_q() %>%                    # optional: add Benjamini-Hochberg FDR q-values
  modify_header(label = "**Variable**") %>%
  bold_labels()

tbl_by_gender

| Variable | Female, N = 132¹ | Male, N = 118¹ | p-value² | q-value³ |
|:---|:---|:---|:---|:---|
| **age_group** | | | 0.6 | >0.9 |
| 18-29 | 33 (25%) | 36 (31%) | | |
| 30-44 | 46 (35%) | 43 (36%) | | |
| 45-59 | 30 (23%) | 24 (20%) | | |
| 60+ | 23 (17%) | 15 (13%) | | |
| **marital_status** | | | 0.8 | >0.9 |
| Divorced | 13 (9.8%) | 12 (10%) | | |
| Married | 72 (55%) | 60 (51%) | | |
| Single | 32 (24%) | 35 (30%) | | |
| Widowed | 15 (11%) | 11 (9.3%) | | |
| **education_level** | | | 0.8 | >0.9 |
| Illiterate | 19 (14%) | 12 (10%) | | |
| Primary | 43 (33%) | 39 (33%) | | |
| Secondary | 46 (35%) | 44 (37%) | | |
| University | 24 (18%) | 23 (19%) | | |
| **occupation** | | | 0.7 | >0.9 |
| Business | 23 (17%) | 19 (16%) | | |
| Farmer | 33 (25%) | 39 (33%) | | |
| Service | 38 (29%) | 28 (24%) | | |
| Student | 23 (17%) | 18 (15%) | | |
| Unemployed | 15 (11%) | 14 (12%) | | |
| **physical_activity** | | | 0.4 | >0.9 |
| High | 32 (24%) | 22 (19%) | | |
| Low | 46 (35%) | 49 (42%) | | |
| Moderate | 54 (41%) | 47 (40%) | | |
| **self_rated_health** | | | >0.9 | >0.9 |
| Excellent | 24 (18%) | 20 (17%) | | |
| Fair | 50 (38%) | 41 (35%) | | |
| Good | 43 (33%) | 42 (36%) | | |
| Poor | 15 (11%) | 15 (13%) | | |
| **monthly_income** | 32,486.1 (16,010.1) | 33,965.5 (16,107.5) | 0.5 | >0.9 |

¹ n (%); Mean (SD)
² Pearson's Chi-squared test; Welch Two Sample t-test
³ False discovery rate correction for multiple testing

### Interpretation

When comparing females and males:

- Age group, marital status, education, and occupation showed no significant differences (all p ≥ 0.6).
- Physical activity (p = 0.4) and self-rated health (p > 0.9) were similarly distributed in both genders.
- Mean monthly income was slightly higher among males (33,966 BDT) than females (32,486 BDT), but the difference was not statistically significant (p = 0.5).

Overall, no variable differed significantly by gender (all q > 0.9), suggesting that demographic and health patterns are similar for males and females in this sample.
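
Any single categorical p-value in the table can be re-derived from the printed counts alone. The sketch below does this for `physical_activity` by gender with base R's `chisq.test()`; the matrix is typed in by hand from the table, and the object names are illustrative rather than part of the original analysis:

```r
# Counts copied from the table above: physical_activity by gender
activity_by_gender <- matrix(
  c(32, 22,   # High:     Female, Male
    46, 49,   # Low:      Female, Male
    54, 47),  # Moderate: Female, Male
  ncol = 2, byrow = TRUE,
  dimnames = list(
    physical_activity = c("High", "Low", "Moderate"),
    gender            = c("Female", "Male")
  )
)

# Pearson's chi-squared test of independence
# (the continuity correction only applies to 2 x 2 tables, so none is used here)
activity_test <- chisq.test(activity_by_gender)

print(activity_test$expected)  # expected counts under independence
print(activity_test$p.value)   # rounds to the 0.4 shown in the table
```

All expected counts are well above 5, so the chi-squared approximation is reasonable for this comparison.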

tbl_by_disease <- df %>%
  select(all_of(c(vars_cat, vars_num)), chronic_disease) %>%
  tbl_summary(
    by = chronic_disease,
    statistic = all_continuous() ~ "{mean} ({sd})",
    digits = all_continuous() ~ 1,
    missing = "ifany"
  ) %>%
  add_p(
    test = list(all_categorical() ~ "chisq.test",
                all_continuous()  ~ "t.test")
  ) %>%
  add_q() %>%
  modify_header(label = "**Variable**") %>%
  bold_labels()

tbl_by_disease

| Variable | No, N = 163¹ | Yes, N = 87¹ | p-value² | q-value³ |
|:---|:---|:---|:---|:---|
| **age_group** | | | 0.7 | 0.9 |
| 18-29 | 46 (28%) | 23 (26%) | | |
| 30-44 | 61 (37%) | 28 (32%) | | |
| 45-59 | 33 (20%) | 21 (24%) | | |
| 60+ | 23 (14%) | 15 (17%) | | |
| **marital_status** | | | 0.3 | 0.8 |
| Divorced | 18 (11%) | 7 (8.0%) | | |
| Married | 91 (56%) | 41 (47%) | | |
| Single | 39 (24%) | 28 (32%) | | |
| Widowed | 15 (9.2%) | 11 (13%) | | |
| **education_level** | | | 0.3 | 0.8 |
| Illiterate | 16 (9.8%) | 15 (17%) | | |
| Primary | 57 (35%) | 25 (29%) | | |
| Secondary | 57 (35%) | 33 (38%) | | |
| University | 33 (20%) | 14 (16%) | | |
| **occupation** | | | 0.6 | 0.9 |
| Business | 26 (16%) | 16 (18%) | | |
| Farmer | 49 (30%) | 23 (26%) | | |
| Service | 40 (25%) | 26 (30%) | | |
| Student | 26 (16%) | 15 (17%) | | |
| Unemployed | 22 (13%) | 7 (8.0%) | | |
| **physical_activity** | | | >0.9 | >0.9 |
| High | 34 (21%) | 20 (23%) | | |
| Low | 62 (38%) | 33 (38%) | | |
| Moderate | 67 (41%) | 34 (39%) | | |
| **self_rated_health** | | | 0.6 | 0.9 |
| Excellent | 31 (19%) | 13 (15%) | | |
| Fair | 62 (38%) | 29 (33%) | | |
| Good | 51 (31%) | 34 (39%) | | |
| Poor | 19 (12%) | 11 (13%) | | |
| **monthly_income** | 31,445.6 (15,951.9) | 36,442.1 (15,786.1) | 0.019 | 0.13 |

¹ n (%); Mean (SD)
² Pearson's Chi-squared test; Welch Two Sample t-test
³ False discovery rate correction for multiple testing

### Interpretation

Comparing participants with and without chronic disease, there were no significant differences in age group, marital status, education, or occupation (all q ≥ 0.8). Mean income was somewhat higher among those with chronic disease (36,442 BDT vs. 31,446 BDT; p = 0.019), but the difference was no longer significant after adjusting for multiple testing (q = 0.13). Physical activity (p > 0.9) and self-rated health (p = 0.6) were similarly distributed in both groups.

This suggests that chronic disease was not strongly linked to socioeconomic or lifestyle differences in this sample, apart from a possible income-related trend.
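
The income p-value and its q-value can be approximated from the summary statistics printed in the table. The sketch below computes a Welch t-test by hand and then applies the Benjamini-Hochberg adjustment with `p.adjust()`. It rests on two assumptions: the six other p-values are only available as rounded table entries, and the `>0.9` entry is replaced by a placeholder of 0.95. Only the income q-value should therefore be expected to match the table closely.

```r
# Summary statistics copied from the table above (monthly_income by chronic_disease)
n_no  <- 163; mean_no  <- 31445.6; sd_no  <- 15951.9
n_yes <- 87;  mean_yes <- 36442.1; sd_yes <- 15786.1

# Welch two-sample t-test computed from summary statistics
se2_no  <- sd_no^2  / n_no    # squared standard error, "No" group
se2_yes <- sd_yes^2 / n_yes   # squared standard error, "Yes" group
t_stat  <- (mean_yes - mean_no) / sqrt(se2_no + se2_yes)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_no + se2_yes)^2 /
  (se2_no^2 / (n_no - 1) + se2_yes^2 / (n_yes - 1))

p_income <- 2 * pt(-abs(t_stat), df = df_welch)  # two-sided p-value

# Benjamini-Hochberg adjustment across the 7 tests in the table.
# Assumption: rounded p-values from the table; 0.95 is a placeholder for ">0.9".
p_values <- c(
  age_group         = 0.7,
  marital_status    = 0.3,
  education_level   = 0.3,
  occupation        = 0.6,
  physical_activity = 0.95,
  self_rated_health = 0.6,
  monthly_income    = p_income
)
q_values <- p.adjust(p_values, method = "BH")

print(round(p_income, 3))                      # close to the reported 0.019
print(round(q_values[["monthly_income"]], 2))  # close to the reported 0.13
```

Because income has the smallest of the seven p-values, its BH-adjusted value is simply seven times the raw p-value, which is why a nominally significant result moves above 0.05.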

# outcome: chronic_disease (factor with levels No/Yes)
tbl_unadjusted <- df %>%
  select(chronic_disease, gender, age_group, education_level, physical_activity, monthly_income) %>%
  tbl_uvregression(
    method = glm,
    y = chronic_disease,
    method.args = list(family = binomial),
    exponentiate = TRUE
  ) %>%
  bold_labels()

tbl_unadjusted
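
| Characteristic | N | OR | 95% CI | p-value |
|:---|:---|:---|:---|:---|
| **gender** | 250 | | | |
| Female | | | | |
| Male | | 1.00 | 0.59, 1.68 | >0.9 |
| **age_group** | 250 | | | |
| 18-29 | | | | |
| 30-44 | | 0.92 | 0.47, 1.80 | 0.8 |
| 45-59 | | 1.27 | 0.60, 2.68 | 0.5 |
| 60+ | | 1.30 | 0.57, 2.96 | 0.5 |
| **education_level** | 250 | | | |
| Illiterate | | | | |
| Primary | | 0.47 | 0.20, 1.09 | 0.079 |
| Secondary | | 0.62 | 0.27, 1.42 | 0.3 |
| University | | 0.45 | 0.17, 1.15 | 0.10 |
| **physical_activity** | 250 | | | |
| High | | | | |
| Low | | 0.90 | 0.45, 1.83 | 0.8 |
| Moderate | | 0.86 | 0.43, 1.73 | 0.7 |
| **monthly_income** | 250 | 1.00 | 1.00, 1.00 | 0.020 |

Abbreviations: CI = Confidence Interval, OR = Odds Ratio
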
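### Interpretation

In the unadjusted (univariable) logistic regression models, gender, age group, education, and physical activity were not significantly associated with chronic disease (all p > 0.05). Monthly income showed a statistically significant positive association (p = 0.020). Its odds ratio is expressed per 1 BDT, which is why both the estimate and its confidence interval round to 1.00: the change in odds per single BDT is tiny, and rescaling income (for example, per 10,000 BDT) would give a more interpretable odds ratio. These p-values are adjusted neither for other covariates nor for multiple testing.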
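
To illustrate why an odds ratio per 1 BDT rounds to 1.00 and how rescaling helps, here is a small sketch on simulated data. The data, the effect size, and all object names are hypothetical; nothing here is fitted to the study file. It fits the same logistic model twice, once with income in BDT and once in units of 10,000 BDT:

```r
set.seed(42)  # simulated data only; not the study data

n <- 250
income  <- runif(n, min = 5000, max = 60000)        # hypothetical incomes in BDT
disease <- rbinom(n, size = 1,
                  prob = plogis(-1.2 + 0.00002 * income))  # hypothetical true effect

income_10k <- income / 10000  # same variable, in units of 10,000 BDT

# Identical model, two scalings of the predictor
fit_per_bdt <- glm(disease ~ income,     family = binomial)
fit_per_10k <- glm(disease ~ income_10k, family = binomial)

or_per_bdt <- exp(coef(fit_per_bdt)[["income"]])
or_per_10k <- exp(coef(fit_per_10k)[["income_10k"]])

print(round(or_per_bdt, 2))  # rounds to 1 because the per-BDT effect is tiny
print(or_per_10k)            # odds ratio per 10,000 BDT
print(or_per_bdt^10000)      # agrees with or_per_10k up to numerical tolerance
```

The two fits describe the same association; only the unit of the odds ratio changes. With the study data, the equivalent step would be to add a rescaled column (for example `mutate(income_10k = monthly_income / 10000)`) before the regression call.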
