Physical_activity.csv data

Loading Packages

library(gtsummary)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(broom.helpers)
## 
## Attaching package: 'broom.helpers'
## The following objects are masked from 'package:gtsummary':
## 
##     all_categorical, all_continuous, all_contrasts, all_dichotomous,
##     all_interaction, all_intercepts

Dataset Load

physical_activity <- read.csv("C:/Users/JAHID HASAN/Desktop/physical_activity.csv")
View(physical_activity)  # opens the data viewer (useful in interactive sessions)
head(physical_activity)  # prints the first six rows

INTERPRETATION: The dataset was loaded from a CSV file with the read.csv() function. It contains the variables participant_id, physical_activity, age_group, gender, marital_status, education_level, occupation, monthly_income, chronic_disease and self_rated_health. The first few rows returned by head() suggest that the data were imported correctly, with plausible values in each column. The data contain a mix of numeric and categorical variables, which makes them suitable for both summary and regression analyses. Together, the variables describe the participants' demographic, socioeconomic and health-related characteristics.
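Since R 4.0.0, read.csv() keeps text columns as character vectors (stringsAsFactors = FALSE), and factor() then sorts the levels alphabetically. For physical_activity that gives High, Low, Moderate rather than the natural order, which later decides the reference level in models. The sketch below uses a small hypothetical vector instead of the real file to show the default ordering and one way to set it explicitly.

```r
# Hypothetical values standing in for the physical_activity column
activity <- c("Low", "Moderate", "High", "Moderate", "Low")

# Default: factor() sorts the levels alphabetically
default_levels <- levels(factor(activity))
print(default_levels)

# Explicit ordering from least to most active
ordered_levels <- levels(factor(activity, levels = c("Low", "Moderate", "High")))
print(ordered_levels)
```

On the real data the same idea would be applied to the column, e.g. factor(physical_activity$physical_activity, levels = c("Low", "Moderate", "High")).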

UNIVARIATE ANALYSIS

physical_activity %>%
tbl_summary()
Characteristic N = 250¹
participant_id 126 (63, 188)
age_group
    18-29 69 (28%)
    30-44 89 (36%)
    45-59 54 (22%)
    60+ 38 (15%)
gender
    Female 132 (53%)
    Male 118 (47%)
marital_status
    Divorced 25 (10%)
    Married 132 (53%)
    Single 67 (27%)
    Widowed 26 (10%)
education_level
    Illiterate 31 (12%)
    Primary 82 (33%)
    Secondary 90 (36%)
    University 47 (19%)
occupation
    Business 42 (17%)
    Farmer 72 (29%)
    Service 66 (26%)
    Student 41 (16%)
    Unemployed 29 (12%)
monthly_income 33,588 (19,877, 47,053)
physical_activity
    High 54 (22%)
    Low 95 (38%)
    Moderate 101 (40%)
chronic_disease 87 (35%)
self_rated_health
    Excellent 44 (18%)
    Fair 91 (36%)
    Good 85 (34%)
    Poor 30 (12%)
¹ Median (Q1, Q3); n (%)

INTERPRETATION:

Overall characteristics:

Total Participants: The study includes 250 participants.

Categorical variables are summarized with the frequency and the corresponding percentage (%).

age_group: The largest group of participants (89, 36%) is aged 30-44, followed by 18-29 (69, 28%) and 45-59 (54, 22%). The 60+ age group is the smallest (38, 15%).

physical_activity: The majority of participants report having a moderate (101, 40%) or low (95, 38%) level of physical activity. A smaller proportion (54, 22%) report high physical activity.

gender: The sample is relatively balanced, with slightly more females (132, 53%) than males (118, 47%).

marital_status: The largest group of participants is married (132, 53%), followed by single individuals (67, 27%). Divorced (25, 10%) and widowed (26, 10%) participants make up the smaller segments.

education_level: Most participants have a secondary (90, 36%) or primary (82, 33%) education level. A smaller proportion (47, 19%) have a university education, and the smallest group is illiterate (31, 12%).

occupation: Farmers represent the largest occupational group (72, 29%), followed closely by those in service (66, 26%). Business owners (42, 17%), students (41, 16%), and unemployed individuals (29, 12%) form the smaller groups.

self_rated_health: The highest number of participants rate their health as fair (91, 36%), while almost as many rate it as good (85, 34%). A smaller proportion rate their health as excellent (44, 18%) or poor (30, 12%).

Continuous variables are summarized using the median and interquartile range (IQR), shown as Median (Q1, Q3).

participant_id: This is an identification number, so its median and quartiles have no statistical meaning. It could be left out of the summary, for example with tbl_summary(include = -participant_id).

monthly_income: The median monthly income is 33,588, so half of the participants earn less than this and half earn more. The first quartile (Q1) is 19,877: 25% of participants have a monthly income of 19,877 or less. The third quartile (Q3) is 47,053: 75% of participants have a monthly income of 47,053 or less. The interquartile range is 47,053 - 19,877 = 27,176, which indicates a wide spread of incomes among the middle 50% of participants. The median lies almost exactly midway between the quartiles (13,711 above Q1 and 13,465 below Q3), so the quartiles alone give no evidence of skew; a histogram or a comparison of the mean and median would be needed to check for a long right tail.
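The quartile arithmetic can be checked directly from the three values reported in the table. The sketch below also computes Bowley's quartile skewness coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1), which is a swapped-in summary not used in the original analysis: it is 0 when the median sits midway between Q1 and Q3 and positive when the upper half of the IQR is wider.

```r
# Quartiles of monthly_income as reported by tbl_summary()
q1  <- 19877
med <- 33588
q3  <- 47053

iqr       <- q3 - q1    # width of the middle 50% of incomes
lower_gap <- med - q1   # distance from Q1 up to the median
upper_gap <- q3 - med   # distance from the median up to Q3

# Bowley (quartile) skewness: near 0 means no asymmetry within the IQR
bowley <- (q3 + q1 - 2 * med) / (q3 - q1)

cat("IQR:", iqr, "\n")
cat("Median - Q1:", lower_gap, " Q3 - Median:", upper_gap, "\n")
cat("Bowley skewness:", round(bowley, 3), "\n")
```

This only describes the middle half of the distribution; skew in the tails would not show up here.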

chronic_disease: This is a dichotomous (Yes/No) variable. tbl_summary() shows dichotomous variables on a single row as the n (%) of the "Yes" level, so 87 (35%) means that 87 participants (35% of the sample) reported having a chronic disease.
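To see where 87 (35%) comes from, the sketch below rebuilds a hypothetical Yes/No vector with the reported count of 87 "Yes" answers; the 163 "No" answers are inferred from N = 250 rather than taken from the raw data.

```r
# Hypothetical reconstruction: 87 "Yes" and 163 "No" answers (N = 250)
chronic_disease <- rep(c("Yes", "No"), times = c(87, 163))

n_yes   <- sum(chronic_disease == "Yes")
pct_yes <- 100 * n_yes / length(chronic_disease)  # 34.8, displayed as 35%

# Single-row display of a dichotomous variable: n (%) of "Yes"
cat(sprintf("chronic_disease %d (%.0f%%)\n", n_yes, pct_yes))
```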

BIVARIATE ANALYSIS

physical_activity %>%
tbl_summary(by=physical_activity)
Characteristic High (N = 54)¹ Low (N = 95)¹ Moderate (N = 101)¹
participant_id 104 (55, 186) 131 (70, 178) 133 (67, 194)
age_group
    18-29 13 (24%) 31 (33%) 25 (25%)
    30-44 16 (30%) 32 (34%) 41 (41%)
    45-59 14 (26%) 18 (19%) 22 (22%)
    60+ 11 (20%) 14 (15%) 13 (13%)
gender
    Female 32 (59%) 46 (48%) 54 (53%)
    Male 22 (41%) 49 (52%) 47 (47%)
marital_status
    Divorced 8 (15%) 12 (13%) 5 (5.0%)
    Married 30 (56%) 49 (52%) 53 (52%)
    Single 12 (22%) 23 (24%) 32 (32%)
    Widowed 4 (7.4%) 11 (12%) 11 (11%)
education_level
    Illiterate 8 (15%) 11 (12%) 12 (12%)
    Primary 13 (24%) 28 (29%) 41 (41%)
    Secondary 19 (35%) 38 (40%) 33 (33%)
    University 14 (26%) 18 (19%) 15 (15%)
occupation
    Business 12 (22%) 18 (19%) 12 (12%)
    Farmer 14 (26%) 22 (23%) 36 (36%)
    Service 14 (26%) 28 (29%) 24 (24%)
    Student 8 (15%) 17 (18%) 16 (16%)
    Unemployed 6 (11%) 10 (11%) 13 (13%)
monthly_income 31,963 (15,279, 40,891) 36,476 (23,144, 50,941) 30,441 (17,736, 41,901)
chronic_disease 20 (37%) 33 (35%) 34 (34%)
self_rated_health
    Excellent 6 (11%) 18 (19%) 20 (20%)
    Fair 20 (37%) 32 (34%) 39 (39%)
    Good 19 (35%) 35 (37%) 31 (31%)
    Poor 9 (17%) 10 (11%) 11 (11%)
¹ Median (Q1, Q3); n (%)

INTERPRETATION:

The table reports n (%) for categorical variables, where the percentage is calculated within each physical activity group (column percentages), so the percentages for a variable add up to about 100% down each column. Continuous variables are shown as Median (Q1, Q3): the median is the 50th percentile, and Q1 (first quartile) and Q3 (third quartile) are the 25th and 75th percentiles. The IQR (Q3 - Q1) covers the middle 50% of the data.

Demographics

Age group: The age distribution differs somewhat across activity levels. 41% of the Moderate group are aged 30-44, and the Low group has the largest share of participants aged 18-29 (33%). The High group is spread most evenly across the four age bands and has the largest share of participants aged 60+ (20%, versus 15% in Low and 13% in Moderate).

Gender: Females make up 59% of the High activity group and males 41%. In the Low group the split is slightly reversed (52% male, 48% female), and the Moderate group is close to the overall split (53% female, 47% male).
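Because the percentages are calculated within each activity group, it is easy to misread them as row percentages. The sketch below recomputes both kinds from the gender counts copied from the table, using base R's prop.table(): margin = 2 divides by column totals (what the table shows), margin = 1 divides by row totals.

```r
# Gender by activity group counts, copied from the table above
counts <- matrix(c(32, 46, 54,
                   22, 49, 47),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Female", "Male"),
                                 activity = c("High", "Low", "Moderate")))

# Column percentages: share of each activity group that is female or male
col_pct <- round(100 * prop.table(counts, margin = 2))

# Row percentages: share of each gender that falls in each activity group
row_pct <- round(100 * prop.table(counts, margin = 1))

print(col_pct)
print(row_pct)
```

Read by column, 59% of the High group is female; read by row, about 24% of women and 19% of men are in the High group.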

Marital status: Divorced participants make up a larger share of the High group (15%) than of the Low (13%) or Moderate (5.0%) groups, while single participants make up a larger share of the Moderate group (32%) than of the High (22%) or Low (24%) groups.

Education level: 41% of the Moderate group have a primary education, compared with 24% of the High group and 29% of the Low group. University graduates make up the largest share of the High group (26%), versus 19% of the Low group and 15% of the Moderate group.

Occupation: Farmers make up the largest share of the Moderate group (36%), and business owners make up a larger share of the High group (22%) than of the Low (19%) or Moderate (12%) groups.

Monthly income: The median monthly income is highest for the Low physical activity group (36,476), followed by the High group (31,963), and is lowest for the Moderate group (30,441). The spreads are similar: the IQRs are 25,612 for High (40,891 - 15,279), 27,797 for Low (50,941 - 23,144) and 24,165 for Moderate (41,901 - 17,736), but both quartiles of the Low group lie above those of the other two groups.

Chronic disease: The proportion of participants with a chronic disease is similar across the groups, ranging from 34% to 37%.

Self-rated health: The distribution of self-rated health varies. A larger share of the High group rates their health as Poor (17%) than of the Low (11%) or Moderate (11%) groups, and a smaller share rates it as Excellent (11%, versus 19% in Low and 20% in Moderate, which has the highest share).

SUMMARY WITH P-VALUES

physical_activity %>%
tbl_summary(by = physical_activity) %>%
add_p()
Characteristic High (N = 54)¹ Low (N = 95)¹ Moderate (N = 101)¹ p-value²
participant_id 104 (55, 186) 131 (70, 178) 133 (67, 194) 0.6
age_group 0.6
    18-29 13 (24%) 31 (33%) 25 (25%)
    30-44 16 (30%) 32 (34%) 41 (41%)
    45-59 14 (26%) 18 (19%) 22 (22%)
    60+ 11 (20%) 14 (15%) 13 (13%)
gender 0.4
    Female 32 (59%) 46 (48%) 54 (53%)
    Male 22 (41%) 49 (52%) 47 (47%)
marital_status 0.3
    Divorced 8 (15%) 12 (13%) 5 (5.0%)
    Married 30 (56%) 49 (52%) 53 (52%)
    Single 12 (22%) 23 (24%) 32 (32%)
    Widowed 4 (7.4%) 11 (12%) 11 (11%)
education_level 0.3
    Illiterate 8 (15%) 11 (12%) 12 (12%)
    Primary 13 (24%) 28 (29%) 41 (41%)
    Secondary 19 (35%) 38 (40%) 33 (33%)
    University 14 (26%) 18 (19%) 15 (15%)
occupation 0.6
    Business 12 (22%) 18 (19%) 12 (12%)
    Farmer 14 (26%) 22 (23%) 36 (36%)
    Service 14 (26%) 28 (29%) 24 (24%)
    Student 8 (15%) 17 (18%) 16 (16%)
    Unemployed 6 (11%) 10 (11%) 13 (13%)
monthly_income 31,963 (15,279, 40,891) 36,476 (23,144, 50,941) 30,441 (17,736, 41,901) 0.027
chronic_disease 20 (37%) 33 (35%) 34 (34%) >0.9
self_rated_health 0.7
    Excellent 6 (11%) 18 (19%) 20 (20%)
    Fair 20 (37%) 32 (34%) 39 (39%)
    Good 19 (35%) 35 (37%) 31 (31%)
    Poor 9 (17%) 10 (11%) 11 (11%)
¹ Median (Q1, Q3); n (%)
² Kruskal-Wallis rank sum test; Pearson's Chi-squared test

INTERPRETATION:

The study included 250 participants in total (High: N = 54, Low: N = 95, Moderate: N = 101).

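For continuous variables (participant_id and monthly_income), the median (Q1, Q3) is reported. The test on participant_id has no meaning, because it is an identifier rather than a measured characteristic.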

For categorical variables (age_group, gender, etc.), the counts and percentages are shown.

The statistical tests used were the Kruskal-Wallis rank sum test for continuous variables and Pearson’s Chi-squared test for categorical variables.
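As a check on the categorical tests, base R's chisq.test() can be applied to the gender counts copied from the table. This is a sketch on the summarized counts, not a re-run of the full analysis; it gives p of about 0.44, which is displayed as 0.4 in the table.

```r
# Gender by activity group counts, copied from the table above
counts <- matrix(c(32, 46, 54,
                   22, 49, 47),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Female", "Male"),
                                 activity = c("High", "Low", "Moderate")))

# Pearson's chi-squared test of independence; the continuity correction
# only applies to 2 x 2 tables, so none is used for this 2 x 3 table
res <- chisq.test(counts)
print(res)
round(res$p.value, 2)
```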

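Statistically significant differences (p < 0.05)

Monthly income (p = 0.027): Because 0.027 is below 0.05, there is a statistically significant difference in the distribution of monthly income across the three physical activity groups. The Kruskal-Wallis test only indicates that at least one group differs, so a post-hoc test is needed to determine which groups differ from each other. The table shows that the median income of the Low physical activity group (36,476) is higher than that of the High (31,963) and Moderate (30,441) groups. With nine characteristics tested, this single result should be interpreted with caution: it would not pass a Bonferroni correction (0.05 / 9 ≈ 0.0056).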
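A common post-hoc choice after a Kruskal-Wallis test is Dunn's test, which is available in add-on packages. The sketch below swaps in a base R alternative, pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values via pairwise.wilcox.test(). The incomes are simulated (hypothetical, NOT the study data); only the group sizes match the table.

```r
# Hypothetical incomes: simulated, NOT the study data
set.seed(42)
group  <- factor(rep(c("High", "Low", "Moderate"), times = c(54, 95, 101)))
income <- rlnorm(length(group), meanlog = log(33000), sdlog = 0.5)

# Omnibus test across the three groups
kw <- kruskal.test(income ~ group)
print(kw)

# Post-hoc pairwise Wilcoxon rank-sum tests, Holm-adjusted for 3 comparisons
pw <- pairwise.wilcox.test(income, group, p.adjust.method = "holm")
print(pw$p.value)
```

On the real data the calls would be kruskal.test(monthly_income ~ physical_activity, data = physical_activity) and pairwise.wilcox.test(physical_activity$monthly_income, physical_activity$physical_activity, p.adjust.method = "holm").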

Characteristics with no significant differences (p > 0.05)

The following characteristics have p-values above 0.05, so the data provide no evidence that their distributions differ across the High, Low and Moderate activity groups. A non-significant result does not by itself show that the groups are identical.

Age group (p = 0.6): The age distribution is not significantly different between the groups. For instance, the share of participants aged 18-29 is 24% in High, 33% in Low and 25% in Moderate.

Gender (p = 0.4): There is no significant association between gender and physical activity level; the higher share of females in the High group (59%) than in the Low group (48%) is not statistically significant.

Marital status (p = 0.3): The distribution of marital status does not differ significantly between the physical activity groups.

Education level (p = 0.3): Education level is not significantly associated with physical activity level.

Occupation (p = 0.6): Occupations are distributed similarly across the High, Low and Moderate physical activity groups.

Chronic disease (p > 0.9): The prevalence of chronic disease is not significantly different across the groups; 34-37% of participants in each group report having a chronic disease.

Self-rated health (p = 0.7): The distribution of self-rated health (Excellent, Fair, Good, Poor) is not significantly different among the physical activity groups.

LOGISTIC REGRESSION

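# glm(binomial) treats the first factor level ("High") as failure and the other
# levels (Low, Moderate) as success: this models log-odds of Low/Moderate vs High.
model <- glm(factor(physical_activity) ~ age_group + gender + marital_status +
               education_level + occupation + monthly_income + chronic_disease +
               self_rated_health, data = physical_activity, family = "binomial")

tbl_regression(model)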
Characteristic log(OR) 95% CI p-value
age_group
    18-29
    30-44 0.02 -0.88, 0.91 >0.9
    45-59 -0.48 -1.4, 0.46 0.3
    60+ -0.52 -1.5, 0.49 0.3
gender
    Female
    Male 0.29 -0.35, 0.94 0.4
marital_status
    Divorced
    Married 0.82 -0.25, 1.9 0.12
    Single 1.1 -0.11, 2.2 0.071
    Widowed 1.2 -0.22, 2.7 0.11
education_level
    Illiterate
    Primary 0.69 -0.40, 1.7 0.2
    Secondary 0.39 -0.67, 1.4 0.5
    University 0.03 -1.1, 1.2 >0.9
occupation
    Business
    Farmer 0.40 -0.57, 1.4 0.4
    Service 0.50 -0.48, 1.5 0.3
    Student 0.46 -0.63, 1.6 0.4
    Unemployed 0.39 -0.80, 1.7 0.5
monthly_income 0.00 0.00, 0.00 0.11
chronic_disease
    No
    Yes -0.16 -0.83, 0.53 0.6
self_rated_health
    Excellent
    Fair -0.31 -1.4, 0.68 0.6
    Good -0.40 -1.5, 0.62 0.5
    Poor -1.0 -2.3, 0.20 0.11
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
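Because physical_activity has three levels, it helps to confirm what family = "binomial" does with a factor response. The glm() documentation states that for a factor response the first level denotes failure and all other levels success. The sketch below uses simulated data and a hypothetical 0/1 predictor age60 (neither comes from the study) to show that fitting the factor directly gives the same coefficients as an explicit "not High" indicator.

```r
# Hypothetical data: simulated, NOT the study data
set.seed(1)
n <- 250
activity <- factor(sample(c("High", "Low", "Moderate"), n, replace = TRUE))
age60 <- rbinom(n, 1, 0.15)  # illustrative 0/1 predictor

print(levels(activity))  # "High" is the first (alphabetical) level

# 1) Factor response, as in the original model
fit_factor <- glm(activity ~ age60, family = "binomial")

# 2) Same model with the outcome coded explicitly as "not High" (Low or Moderate)
not_high <- as.integer(activity != "High")
fit_explicit <- glm(not_high ~ age60, family = "binomial")

# Identical coefficients: both estimate log-odds of Low/Moderate vs High
all.equal(coef(fit_factor), coef(fit_explicit))
```

To model the odds of high activity instead, the outcome could be coded as physical_activity == "High"; keeping all three ordered levels would require an ordinal model such as MASS::polr().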

INTERPRETATION:

Interpretation of individual characteristics:

Age group. log(OR): Compared with the 18-29 reference group, the log(OR) is essentially zero for 30-44 (0.02) and negative for 45-59 (-0.48) and 60+ (-0.52). Because the model estimates the odds of Low/Moderate rather than High activity, the negative values mean that the two oldest groups have somewhat lower odds of being in the Low/Moderate groups, i.e. somewhat higher odds of high physical activity. 95% CI and p-values: All confidence intervals include 0, and the p-values (>0.9, 0.3, 0.3) are well above the conventional 0.05 level. Conclusion: There is no statistically significant evidence that physical activity differs across these age groups in this model.

Gender. log(OR): The log(OR) for Male compared with the Female reference group is 0.29, suggesting that males have somewhat higher odds of being in the Low/Moderate groups, i.e. lower odds of high physical activity. 95% CI and p-value: The 95% CI (-0.35, 0.94) includes 0, and the p-value (0.4) is not statistically significant. Conclusion: There is no statistically significant evidence of a difference in physical activity between males and females in this model.

Marital status. log(OR): The log(OR)s for Married (0.82), Single (1.1) and Widowed (1.2) compared with the Divorced reference group are all positive, meaning these groups have higher odds of Low/Moderate activity than divorced participants. In other words, divorced participants appear the most likely to be in the High group, which matches the descriptive table. 95% CI and p-values: The 95% CIs for all categories include 0, and the p-values (0.12, 0.071, 0.11) approach but do not reach the 0.05 threshold. Conclusion: Marital status is not a statistically significant predictor of physical activity in this model.

Education level. log(OR): Compared with the Illiterate reference group, the log(OR)s are positive for Primary (0.69) and Secondary (0.39) and near zero for University (0.03), hinting at higher odds of Low/Moderate activity among participants with primary or secondary education. 95% CI and p-values: The 95% CIs include 0 for all education levels, and none of the p-values (0.2, 0.5, >0.9) is statistically significant. Conclusion: Education level is not a statistically significant predictor of physical activity.

Occupation. log(OR): The log(OR)s for all other occupations compared with the Business reference group are positive (0.39 to 0.50), suggesting slightly higher odds of Low/Moderate activity than among business owners, i.e. business owners may be somewhat more likely to be highly active. 95% CI and p-values: All 95% CIs include 0, and all p-values (0.3 to 0.5) are not statistically significant. Conclusion: Occupation is not a statistically significant predictor of physical activity.

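Monthly income. log(OR) and 95% CI: The estimate and its 95% CI display as 0.00 (0.00, 0.00) because the coefficient is the change in log-odds per single unit of income, which is tiny and rounds to zero at two decimal places. This is a scaling issue rather than a coding error; expressing income per 1,000 or per 10,000 would give a readable coefficient without changing the p-value. p-value: The p-value of 0.11 is not statistically significant. Conclusion: There is no significant association between monthly income and physical activity in this model.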
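The effect of rescaling can be shown on simulated data (hypothetical incomes and outcome, NOT the study data): dividing income by 1,000 multiplies the slope by 1,000 and leaves the Wald p-value unchanged.

```r
# Hypothetical data: simulated, NOT the study data
set.seed(7)
n <- 250
income <- rlnorm(n, meanlog = log(33000), sdlog = 0.5)
y <- rbinom(n, 1, 0.75)   # outcome generated independently of income

fit_raw  <- glm(y ~ income, family = "binomial")
income_k <- income / 1000   # income in thousands
fit_k    <- glm(y ~ income_k, family = "binomial")

slope_raw <- coef(fit_raw)[["income"]]
slope_k   <- coef(fit_k)[["income_k"]]
p_raw <- summary(fit_raw)$coefficients["income", "Pr(>|z|)"]
p_k   <- summary(fit_k)$coefficients["income_k", "Pr(>|z|)"]

round(slope_raw, 2)   # per-unit slope is too small to show at two decimals
slope_k / slope_raw   # about 1000
c(p_raw, p_k)         # the same p-value in both fits
```

In the original model this could be done inside the formula, e.g. I(monthly_income / 1000) in place of monthly_income.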

Chronic disease. log(OR): The log(OR) for participants with a chronic disease (Yes) compared with those without (No) is -0.16, suggesting slightly lower odds of Low/Moderate activity, i.e. slightly higher odds of high physical activity. 95% CI and p-value: The 95% CI includes 0, and the p-value (0.6) is not statistically significant. Conclusion: Having a chronic disease is not a statistically significant predictor of physical activity.

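Self-rated health. log(OR): Compared with the Excellent reference group, the log(OR)s are negative for Fair (-0.31), Good (-0.40) and Poor (-1.0). Because the outcome is Low/Moderate versus High, this means participants with poorer self-rated health have lower odds of being in the Low/Moderate groups, i.e. they are relatively more likely to be in the High group, which matches the descriptive table (17% of the High group rated their health as Poor). 95% CI and p-values: The 95% CIs for all categories include 0, and none of the p-values is statistically significant. The estimate for Poor health (-1.0) is the largest, but with a p-value of 0.11 it is not statistically significant. Conclusion: Self-rated health is not a statistically significant predictor of physical activity in this model.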
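The table reports log-odds ratios; exponentiating them gives odds ratios, and a log(OR) interval that includes 0 becomes an OR interval that includes 1. The sketch below converts a few rounded values copied from the table, so the results are approximate. In gtsummary, tbl_regression(model, exponentiate = TRUE) reports odds ratios directly from the fitted model.

```r
# log(OR) estimates and 95% CI bounds copied (rounded) from the regression table
log_or <- c(Male = 0.29, Single = 1.1, Poor = -1.0)
lower  <- c(Male = -0.35, Single = -0.11, Poor = -2.3)
upper  <- c(Male = 0.94, Single = 2.2, Poor = 0.20)

# Exponentiate to move from the log-odds scale to odds ratios
or_table <- round(cbind(OR = exp(log_or), lower = exp(lower), upper = exp(upper)), 2)
print(or_table)
```

For example, an OR of about 1.34 for Male means males have roughly 1.34 times the odds of being in the Low/Moderate rather than the High group, with a 95% CI of about 0.70 to 2.56.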

OVERALL CONCLUSION:

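Based on this logistic regression output, none of the predictor variables is statistically significant at the conventional 0.05 level. The model therefore provides no reliable evidence that any of the measured characteristics is associated with the odds of being in the Low/Moderate rather than the High physical activity group. Because Low and Moderate are combined into a single outcome category, this model cannot distinguish between those two levels.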