Obesity and overweight conditions are major public health concerns. This study aims to analyze the effectiveness of a 12-week weight loss program using a dataset collected from 35 participants.I will examine the relationships or impact between diet adherence, exercise intensity, sleep quality, and weight loss using statistical methods in R.
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(corrplot)
library(readxl)
library(naniar)
library(car)
library(gtsummary)
data <- read_excel("Data_WeightLoss.xlsx")
head(data)
## # A tibble: 6 × 9
## Participant Gender Age BaselineWeight DietAdherence ExerciseIntensity
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 Male 39 92.6 9.6 6.6
## 2 2 Male 48 88.9 9.7 7
## 3 3 Male 31 91.3 8.4 5.1
## 4 4 Female 35 70 7.6 5.9
## 5 5 Male 32 84.3 6.6 9.2
## 6 6 Female 29 76.6 9.7 6.2
## # ℹ 3 more variables: SleepQuality <dbl>, WeightLoss <dbl>, AfterWeight <dbl>
isnull <- colSums(is.na(data))
print(paste("Missing value count:", isnull))
## [1] "Missing value count: 0" "Missing value count: 0" "Missing value count: 0"
## [4] "Missing value count: 0" "Missing value count: 0" "Missing value count: 1"
## [7] "Missing value count: 0" "Missing value count: 0" "Missing value count: 0"
gg_miss_var(data,show_pct = TRUE)+
labs(y = "Displaying the proportion of missings")
numeric_cols <- data[sapply(data, is.numeric)]
stats <- data.frame(
Mean = sapply(numeric_cols, mean, na.rm = TRUE),
Std = sapply(numeric_cols, sd, na.rm = TRUE),
Min = sapply(numeric_cols, min, na.rm = TRUE),
Q1 = sapply(numeric_cols, function(x) quantile(x, 0.25, na.rm = TRUE)),
Median = sapply(numeric_cols, median, na.rm = TRUE),
Q3 = sapply(numeric_cols, function(x) quantile(x, 0.75, na.rm = TRUE)),
Max = sapply(numeric_cols, max, na.rm = TRUE),
IQR = sapply(numeric_cols, IQR, na.rm = TRUE)
)
stats$Lower_Bound = stats$Q1 - 1.5*stats$IQR
stats$Upper_Bound = stats$Q3 + 1.5*stats$IQR
print(stats)
## Mean Std Min Q1 Median Q3 Max IQR
## Participant 18.000000 10.246951 1.0 9.50 18.0 26.50 35.0 17.00
## Age 38.085714 7.875395 26.0 31.50 37.0 46.50 52.0 15.00
## BaselineWeight 84.434286 8.069694 70.0 78.45 84.3 89.85 99.5 11.40
## DietAdherence 8.242857 1.171797 6.2 7.25 8.4 9.20 9.9 1.95
## ExerciseIntensity 7.129412 1.164508 5.1 6.20 6.9 8.05 9.3 1.85
## SleepQuality 7.688571 1.265788 5.8 6.60 7.6 8.40 9.9 1.80
## WeightLoss 9.314286 1.334198 6.6 8.15 9.4 10.20 12.0 2.05
## AfterWeight 75.120000 8.436224 60.9 68.60 74.0 81.65 91.6 13.05
## Lower_Bound Upper_Bound
## Participant -16.000 52.000
## Age 9.000 69.000
## BaselineWeight 61.350 106.950
## DietAdherence 4.325 12.125
## ExerciseIntensity 3.425 10.825
## SleepQuality 3.900 11.100
## WeightLoss 5.075 13.275
## AfterWeight 49.025 101.225
Upon inspecting the dataset, the following observations were made:
Missing values: The dataset has 1
missing value in the ExerciseIntensity
column.
Gender column: The Gender column is
in character format. It may be beneficial to convert
this column into a numeric format for further analysis.
Participant column: The Participant
column appears to be a serial number rather than a
categorical or numeric variable of interest.
Outlier: As per the Lower_Bound and
Upper_Bound of the each column shows that there is no
outlier value
# Drop Participant col
data <- data %>% select(-Participant)
# Convert Gender to factor
data$Gender <- as.factor(data$Gender)
head(data)
## # A tibble: 6 × 8
## Gender Age BaselineWeight DietAdherence ExerciseIntensity SleepQuality
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Male 39 92.6 9.6 6.6 8.1
## 2 Male 48 88.9 9.7 7 6.9
## 3 Male 31 91.3 8.4 5.1 7.6
## 4 Female 35 70 7.6 5.9 9.4
## 5 Male 32 84.3 6.6 9.2 7.9
## 6 Female 29 76.6 9.7 6.2 9.2
## # ℹ 2 more variables: WeightLoss <dbl>, AfterWeight <dbl>
# checking up the correlation with columns and convert the gender factor to numeric and store in "Gender_numeric" column
data$Gender_numeric <- as.numeric(factor(data$Gender))
cor_matrix <- cor(data[, c("WeightLoss", "BaselineWeight","Age", "AfterWeight", "DietAdherence", "ExerciseIntensity", "SleepQuality", "Gender_numeric")], use="complete.obs")
corrplot(cor_matrix, method="number", type="full",order = 'AOE',col= COL2('PuOr', 40),)
print(cor_matrix)
## WeightLoss BaselineWeight Age AfterWeight
## WeightLoss 1.0000000 -0.17557639 -0.32758865 -0.328599400
## BaselineWeight -0.1755764 1.00000000 0.06870527 0.987492122
## Age -0.3275886 0.06870527 1.00000000 0.118379318
## AfterWeight -0.3285994 0.98749212 0.11837932 1.000000000
## DietAdherence 0.5823951 -0.10995272 -0.01491977 -0.198759837
## ExerciseIntensity 0.5855719 0.08816111 -0.08191880 -0.009203663
## SleepQuality 0.2370278 -0.06149738 -0.18388617 -0.096960379
## Gender_numeric 0.2818622 0.09047096 0.01148646 0.041653359
## DietAdherence ExerciseIntensity SleepQuality Gender_numeric
## WeightLoss 0.58239514 0.585571938 0.23702783 0.28186215
## BaselineWeight -0.10995272 0.088161111 -0.06149738 0.09047096
## Age -0.01491977 -0.081918801 -0.18388617 0.01148646
## AfterWeight -0.19875984 -0.009203663 -0.09696038 0.04165336
## DietAdherence 1.00000000 0.010902604 -0.06819197 0.10297619
## ExerciseIntensity 0.01090260 1.000000000 -0.23645092 0.12689491
## SleepQuality -0.06819197 -0.236450924 1.00000000 0.01795168
## Gender_numeric 0.10297619 0.126894912 0.01795168 1.00000000
From the correlation matrix, we observe that the
WeightLoss attribute has significant correlations with
several other numeric variables. Specifically:
DietAdherence and ExerciseIntensity show
strong positive correlations with WeightLoss.SleepQuality and Gender_numeric also
exhibit positive correlations with WeightLoss.BaselineWeight, AfterWeight and
Age have negative correlations with
WeightLoss, indicating that as these variables increase,
weight loss tends to decrease.Overall, we can conclude that DietAdherence and
ExerciseIntensity are key factors positively influencing
weight loss, while BaselineWeight and Age are
negatively related.
for (i in names(data)){
if (i %in% c("WeightLoss","BaselineWeight","Age", "DietAdherence", "ExerciseIntensity", "SleepQuality")){
print(
ggplot(data, aes_string(x = i)) +
geom_histogram(color = "black", bins = 5, alpha = 0.6) +
labs(title = paste("Histogram of ", i), x = i, y = "Frequency") +
theme_minimal()
)
}
}
In here if i select threshold = 0.3 than columns
will:
model <- lm(WeightLoss ~ Age + AfterWeight + DietAdherence + ExerciseIntensity, data = data)
model %>%
tbl_regression() %>%
bold_labels() %>%
bold_p(t=.1)
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| Age | -0.04 | -0.07, -0.01 | 0.007 |
| AfterWeight | -0.03 | -0.06, 0.00 | 0.044 |
| DietAdherence | 0.61 | 0.40, 0.81 | <0.001 |
| ExerciseIntensity | 0.64 | 0.44, 0.85 | <0.001 |
| Abbreviation: CI = Confidence Interval | |||
summary(model)
##
## Call:
## lm(formula = WeightLoss ~ Age + AfterWeight + DietAdherence +
## ExerciseIntensity, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62075 -0.31320 -0.05088 0.38025 1.37831
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.56128 1.71251 2.080 0.04651 *
## Age -0.04238 0.01474 -2.875 0.00749 **
## AfterWeight -0.02999 0.01426 -2.103 0.04427 *
## DietAdherence 0.60710 0.10029 6.053 1.38e-06 ***
## ExerciseIntensity 0.64168 0.10011 6.410 5.21e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6674 on 29 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7822, Adjusted R-squared: 0.7521
## F-statistic: 26.03 on 4 and 29 DF, p-value: 3.117e-09
qqnorm(resid(model))
qqline(resid(model))
#plot(model$fitted.values, resid(model))
#abline(h=0, col="red")
print(vif(model))
## Age AfterWeight DietAdherence ExerciseIntensity
## 1.021073 1.055782 1.041316 1.006857
barplot(vif(model), main = "VIF Plot", col = "lightblue", ylim = c(0, max(vif(model)) + 10))
abline(h = 10, col = "red", lty = 2)
My Model demonstrates exceptional statistical significance with an
F-statistic of 26.03 and a
p-value of 3.117e-09.With an
R-squared value of 0.78 indicates that our model explains
78% of the variance in weight loss outcomes. [4][5]
In here the residuals range from -1.62 to
1.38, with a median close to zero, which is generally a
good sign.
All Variance Inflation Factor (VIF) values in our model
are below 1.06, this ensure that model coefficient
estimates are stable and that model can confidently interpret the
individual effects of each predictor on weight loss outcomes. [1][2][3]
The Q-Q plot of residuals demonstrates a normal distribution , further validating the model’s assumptions and strengthening the credibility of our findings. This normal distribution of residuals indicates that our model is well-specified and that the relationships between variables are appropriately captured.[6]
| Characteristic | Interpretation |
|---|---|
| Age | As age increases, weight loss decreases slightly. |
| AfterWeight | Higher after-weight slightly reduces weight loss. |
| DietAdherence | Higher diet adherence significantly increases weight loss. |
| ExerciseIntensity | Higher exercise intensity significantly increases weight loss. |
In here if i select threshold = 0.2 than columns
will:
model2 <- lm(WeightLoss ~ Age + AfterWeight + DietAdherence + ExerciseIntensity +SleepQuality+ Gender_numeric, data = data)
model2 %>%
tbl_regression() %>%
bold_labels() %>%
bold_p(t=.1)
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| Age | -0.03 | -0.05, -0.01 | 0.001 |
| AfterWeight | -0.03 | -0.04, -0.01 | 0.004 |
| DietAdherence | 0.62 | 0.51, 0.74 | <0.001 |
| ExerciseIntensity | 0.73 | 0.61, 0.85 | <0.001 |
| SleepQuality | 0.40 | 0.29, 0.52 | <0.001 |
| Gender_numeric | 0.39 | 0.11, 0.66 | 0.007 |
| Abbreviation: CI = Confidence Interval | |||
summary(model2)
##
## Call:
## lm(formula = WeightLoss ~ Age + AfterWeight + DietAdherence +
## ExerciseIntensity + SleepQuality + Gender_numeric, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81299 -0.18907 -0.04931 0.20005 0.81801
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.638318 1.210923 -1.353 0.18729
## Age -0.030569 0.008611 -3.550 0.00144 **
## AfterWeight -0.025912 0.008216 -3.154 0.00393 **
## DietAdherence 0.624733 0.058015 10.768 2.83e-11 ***
## ExerciseIntensity 0.728287 0.059870 12.164 1.82e-12 ***
## SleepQuality 0.404399 0.057209 7.069 1.34e-07 ***
## Gender_numeric 0.388966 0.133572 2.912 0.00712 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3818 on 27 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9336, Adjusted R-squared: 0.9189
## F-statistic: 63.32 on 6 and 27 DF, p-value: 1.231e-14
qqnorm(resid(model2))
qqline(resid(model2))
print(vif(model2))
## Age AfterWeight DietAdherence ExerciseIntensity
## 1.064681 1.070766 1.064953 1.100533
## SleepQuality Gender_numeric
## 1.129739 1.036881
barplot(vif(model2), main = "VIF Plot", col = "lightblue", ylim = c(0, max(vif(model2)) + 10),las = 2, cex.names = 0.7)
abline(h = 10, col = "red", lty = 2)
My Model demonstrates exceptional statistical significance with an
F-statistic of 63.32 and ap-value:
1.231e-14.With anR-squared value of
0.93indicates that our model explains93%` of the variance
in weight loss outcomes [4].The
highly significant F-statistic (p < 0.001) provides robust evidence
for the model’s overall validity.[5]
In here the residuals range from -0.81299 to
0.81801, with a median close to zero, which is generally a
good sign.
All Variance Inflation Factor (VIF) values in our model
are below 1.12 and below the threshold, this ensure that
model coefficient estimates are stable and that model can confidently
interpret the individual effects of each predictor on weight loss
outcomes. [1][2][3]
The Q-Q plot of residuals demonstrates a normal distribution , further validating the model’s assumptions and strengthening the credibility of our findings. This normal distribution of residuals indicates that our model is well-specified and that the relationships between variables are appropriately captured.[6]
| Characteristic | Interpretation |
|---|---|
| Age | As age increases, weight loss decreases slightly. |
| AfterWeight | Higher after-weight slightly reduces weight loss. |
| Gender_numeric | Indicates potential differences in weight loss outcomes between genders. |
| SleepQuality | Moderate positive effect |
| DietAdherence | Second strongest positive effect |
| ExerciseIntensity | Strongest positive predictor of weight loss. |
| Predictor | Coefficient (Threshold 0.3) | Coefficient (Threshold 0.2) | p-value (Threshold 0.2) | p-value (Threshold 0.3) | Interpretation |
|---|---|---|---|---|---|
| Age | -0.04238 | -0.030569 | 0.00144 | 0.007 | Older age decreases weight loss. |
| AfterWeight | -0.02999 | -0.025912 | 0.00393 | 0.044 | Higher starting weight decreases weight loss. |
| DietAdherence | 0.60710 | 0.624733 | <0.001 | <0.001 | Higher diet adherence increases weight loss. |
| ExerciseIntensity | 0.64168 | 0.728287 | <0.001 | <0.001 | Higher exercise intensity increases weight loss. |
| SleepQuality | N/A | 0.404399 | <0.001 | N/A | Better sleep quality increases weight loss. |
| Gender_numeric | N/A | 0.388966 | 0.00712 | N/A | Gender positively impacts weight loss. |
All predictors have low VIF values (< 1.2), indicating no significant multicollinearity.
Both models emphasize the importance of DietAdherence and ExerciseIntensity for weight loss. The extended model with SleepQuality and Gender provides a more comprehensive understanding. Tailored programs based on these factors will be more effective in helping individuals achieve weight loss goals.