Biostatistics D Final Project

Abstract

This study investigates the association between blood pressure (BP) monitoring and blood glucose (BG) levels among patients with type 2 diabetes using a retrospective dataset. The primary objective was to evaluate the association between BP monitoring and BG fluctuations, considering clinical and sociodemographic variables. The analysis includes data from six months before and after the initiation of BP monitoring, with a control group that did not use BP monitoring. Key findings include a small but statistically significant positive association between systolic BP and BG, and modest downward trends in both BG and systolic BP over time. BP did not significantly mediate the relationship between stress level and BG, and the moderating effects of factors such as gender and depression could not be estimated. The predictive model for BG peaks showed only modest discriminatory ability (AUC 0.61), suggesting that further refinement is needed for clinical application. This study highlights the complex interplay between BP and BG management in diabetes care and the potential role of digital health tools for self-management.

Introduction and Background

Type 2 diabetes mellitus (T2DM) is a chronic condition that necessitates rigorous self-management to prevent complications and improve quality of life. Blood pressure (BP) and blood glucose (BG) control are critical components of diabetes management. Elevated BP is common among patients with T2DM and is often associated with worsening glycemic control. The bidirectional relationship between hyperglycemia and hypertension suggests that managing one could benefit the other.

Digital health technologies offer promising solutions for managing chronic conditions like T2DM. These platforms provide real-time monitoring, feedback, and personalized interventions, potentially improving patient outcomes. This study leverages a retrospective dataset from a home-use diabetes glucometer and BP monitoring system integrated into a mobile platform. The data includes measurements from before and after the introduction of BP monitoring, offering a unique opportunity to evaluate the impact of BP management on BG levels.

The primary goal of this project was to explore the relationship between BP and BG in patients with poorly controlled BP. We aimed to identify whether starting BP monitoring influences BG levels and to evaluate the effectiveness of a predictive model for BG peaks. Additionally, the study examined the potential mediating role of BP in the relationship between stress levels and BG, as well as the moderating effects of various clinical and sociodemographic factors.

Description of the Sample

Data Exploration and Identification of Issues in Blood Glucose (BG) and Blood Pressure (BP) Data

The dataset contains various variables related to blood glucose (BG) and blood pressure (BP) measurements. As part of the initial data exploration, several issues related to data quality were identified, including missing values, unqualified values (e.g., values that do not represent realistic measurements), and incorrect data types. These issues were addressed systematically to ensure the data’s reliability for further analysis.

Identified Issues - Missing Values

Overview: Multiple variables contained missing values, represented by either empty cells or strings such as “NA” and “N/A”. These were prevalent across several key BG and BP variables.

Details: The table below provides a summary of the missing values in the relevant variables. The percentage of missing data varied significantly across variables, with some variables having over 90% missing data.

Unqualified Values

Overview: Unqualified values were identified in several BG and BP-related variables. These include values that are unlikely to occur in reality, such as zeros in measurements where a zero is not a meaningful or possible value.

Details: Variables such as bg_high_risk, bg_low_70, and others had a substantial proportion of unqualified values that were converted to NA to more accurately reflect missing or invalid data.

Data Type Issues

Overview: Some variables were stored in incorrect data types, such as dates stored as numeric values or character strings. These were corrected to appropriate data types to facilitate accurate analysis.

Details: The first_day_bg_measurment variable, initially stored as Unix timestamps, was converted to POSIXct format, and invalid dates (e.g., “1970-01-01”) were set to NA. Additionally, Excel serial dates in the first_day_bp_measurment_virtual variable were converted to proper date format.

Changes Applied

Conversion of String Representations of Missing Values:

All instances of “NA” and “N/A” in character fields were converted to proper NA values using mutate and across functions. This ensured consistency in representing missing data across the dataset.

Handling of Unqualified Values:

In fields where a value of 0 was not meaningful (e.g., insulin_treatment, bg_number_events), these were converted to NA. Variables that had values of 0, which do not correspond to realistic physiological measurements, were similarly adjusted.

Data Type Corrections:

Date-related variables such as diagnosed_since were converted to date format. Unix timestamps in first_day_bg_measurment were converted to POSIXct format and then to Date format after correcting invalid entries. The variable first_day_bp_measurment_virtual, which initially contained Excel serial dates, was correctly formatted as dates.
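The cleaning steps above can be sketched as follows. The original analysis appears to have been done in R (mutate, across, POSIXct); this is only an illustrative Python sketch using the standard library, and the helper names are hypothetical. The Excel conversion assumes the default 1900 date system, in which serial day 0 corresponds to 1899-12-30.

```python
from datetime import date, datetime, timedelta, timezone

# String spellings of "missing" described above (empty cells, "NA", "N/A").
MISSING_STRINGS = {"NA", "N/A", ""}


def clean_missing_string(value):
    """Map "NA", "N/A", or an empty cell to None; leave other values unchanged."""
    if isinstance(value, str) and value.strip() in MISSING_STRINGS:
        return None
    return value


def zero_to_missing(value):
    """Treat 0 (numeric or the string "0") as missing where zero is not meaningful."""
    return None if value in (0, "0") else value


def unix_to_date(seconds):
    """Convert a Unix timestamp in seconds to a UTC date.

    A result of 1970-01-01 is treated as invalid and returned as None.
    """
    if seconds is None:
        return None
    day = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
    return None if day == date(1970, 1, 1) else day


def excel_serial_to_date(serial):
    """Convert an Excel serial day number (assumed 1900 date system) to a date."""
    if serial is None:
        return None
    return date(1899, 12, 30) + timedelta(days=int(serial))


print(clean_missing_string("N/A"), zero_to_missing(0))
print(unix_to_date(1609459200), excel_serial_to_date(44197))
```

Keeping each rule in its own small function makes it easy to apply a rule only to the columns where it is appropriate, for example converting zeros to missing only in fields where a zero is not a possible value.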

Variable	Missing Values (Absolute)	Missing Values (%)	Changes Applied
started_bp	956	36.6%	No Changes
first_day_bp_measurment	956	36.6%	No Changes
first_day_bp_measurment_virtual	1657	63.4%	Converted to Date format + 1657 Values converted from NA string to NA value
yes_no_bp	956	36.6%	No Changes
Before_after_adding_BP	956	36.6%	No Changes
num_events	956	36.6%	No Changes
avg_bp_sys	1622	62.1%	1622 Values converted from NA string to NA value
avg_bp_dis	1622	62.1%	1622 Values converted from NA string to NA value
avg_bp_pulse	1622	62.1%	1622 Values converted from NA string to NA value
first_day_bg_measurment	956	36.6%	Converted to date format + 956 invalid date-times converted to NA
yes_no_bg	1657	63.4%	1657 Values converted from NA string to NA value
bg_number_events	41	1.6%	41 values of 0 converted to NA
bg_avg	41	1.6%	41 Values converted from NA string to NA value
bg_std	41	1.6%	41 Values converted from NA string to NA value
bg_high_risk	1126	43.1%	1126 values of 0 converted to NA
bg_low_70	2372	90.8%	2372 values of 0 converted to NA
bg_low_80	2372	90.8%	2372 values of 0 converted to NA
bg_high_200	1443	55.2%	1443 values of 0 converted to NA
bg_high_180	1126	43.1%	1126 values of 0 converted to NA
bg_low_60	2505	95.9%	2505 values of 0 converted to NA
bg_60_80	2155	82.5%	2155 values of 0 converted to NA
bg_80_120	440	16.8%	440 values of 0 converted to NA
bg_120_150	1074	41.1%	1074 values of 0 converted to NA
bg_180_400	1678	64.2%	1678 values of 0 converted to NA
bg_400_plus	2583	98.9%	2583 values of 0 converted to NA
bg_70_180	1015	38.8%	59 values of 0 converted to NA
bg_250_plus	2236	85.6%	Values of 0 converted to NA + name change from bg_250+ to bg_250_plus
bg_post_meal_below_180	1959	75.0%	1003 values of 0 converted to NA
bg_fasting_below_126	1909	73.1%	953 values of 0 converted to NA
bg_fasting_below_140	1785	68.3%	829 values of 0 converted to NA
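The percentages in the table can be reproduced from the absolute counts. The total number of records is not stated in this report; the value of 2,613 used below is an assumption inferred from the table itself (for example, 956 + 1,657 = 2,613, and it matches the reported percentages). A minimal Python sketch, with hypothetical names:

```python
# Assumption: total sample size inferred from the table, not stated in the report.
N_RECORDS = 2613

# Missing counts copied from a few rows of the table above.
missing_counts = {
    "started_bp": 956,
    "avg_bp_sys": 1622,
    "first_day_bp_measurment_virtual": 1657,
    "bg_avg": 41,
    "bg_400_plus": 2583,
}


def missing_percent(count, total=N_RECORDS):
    """Percentage of records with a missing value, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in missing_counts.items():
    print(f"{name}: {count} missing ({missing_percent(count)}%)")
```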

Missing Data Patterns

Analysis of Missing Data Patterns in BP and BG Variables

1. Pattern of Missing Data Across BP Variables:

Clustered Missing Rates: Missingness in BP-related variables falls into a few distinct levels rather than varying freely: 36.6% (956 records) for started_bp, first_day_bp_measurment, yes_no_bp, Before_after_adding_BP, and num_events; 62.1% (1,622 records) for avg_bp_sys, avg_bp_dis, and avg_bp_pulse; and 63.4% (1,657 records) for first_day_bp_measurment_virtual. Identical counts within each level suggest that if data is missing in one BP-related variable, it is likely to be missing in the others at that level as well. This could indicate systematic issues in data collection, such as incomplete patient records or specific study subgroups where BP data was not consistently collected.

High Missingness in Specific Variables: The variable first_day_bp_measurment_virtual stands out with the highest missing rate of 63.4%. This could suggest a specific issue related to the collection or recording of virtual measurements, possibly due to technological challenges or differences in how these measurements were handled compared to traditional methods.

2. Pattern of Missing Data Across BG Variables:

Wide Range of Missing Data: In contrast to BP variables, BG-related variables exhibit a wide range of missing data, from as low as 1.6% to nearly 99%. This variation suggests a more complex pattern of missingness, potentially reflecting differences in patient management, measurement protocols, or the importance placed on different BG measurements during the study. For many of these variables the missing values were created by the zero-to-NA conversion described above, so a high missing rate may also reflect patients who had no recorded readings in that range.

Extremely High Missing Rates: Variables such as bg_400_plus, bg_low_60, and bg_low_70 have missing rates above 90%. This extreme level of missing data may indicate that these measurements were either not prioritized in the study or were only collected under specific circumstances that were not applicable to the majority of participants.

Low Missing Rates: On the other end of the spectrum, variables like bg_number_events, bg_avg, and bg_std have very low missing rates (around 1.6%). These variables are likely more robust and consistently measured, indicating their reliability and potential importance in the study.

3. Learning from Missing Data Patterns:

Systematic Issues: The consistent missingness across many BP variables could point to systematic issues in data collection, such as specific periods during the study when BP data was not collected or certain subgroups of patients for whom BP data was not available. Identifying and addressing these systematic issues can help improve data collection in future studies.

Study Design and Protocol Implications: The varied missing rates in BG variables suggest that some measurements were only taken under specific conditions, reflecting differences in study design or patient treatment protocols. Understanding why certain measurements have high or low missing rates can provide insights into the study’s design and help refine protocols for future research.

Impact on Analysis: The patterns of missing data highlight the need for careful consideration when analyzing these variables. Variables with high missing rates may introduce bias or reduce the power of statistical analyses if not properly handled. Conversely, variables with low missing rates can be relied upon more heavily in the analysis, as they are less likely to be affected by missing data biases.

4. Implications for Future Research:

Targeted Data Collection: To reduce missing data in future studies, it may be beneficial to target specific variables that showed high missing rates in this study, ensuring that they are prioritized in data collection protocols.

Adaptive Imputation Strategies: Given the diversity in missing data patterns, an adaptive approach to data imputation may be necessary, with different strategies applied to variables based on their rates and patterns of missingness.

Missing Values in BP Variables - Bar Chart

Missing Values in BG Variables - Bar Chart

Clinical Variables Summary

Combining Levels “pen” and “pump” in Insulin Treatment and Handling Missing Values

In the data, the “insulin_treatment” variable was modified by combining the “pen” and “pump” levels into a single category called “pump_plus_pen.” Additionally, any instances where the “insulin_treatment” value was recorded as “0” were converted to missing values (NA). This adjustment ensures that the data accurately reflects the insulin treatment modalities used by patients and removes non-informative entries.
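The recoding described above amounts to a small mapping. The following Python sketch is illustrative only (the function name is hypothetical, and the original work appears to have been done in R); any level other than "pen", "pump", and "0" is passed through unchanged, which is an assumption about how other levels were treated.

```python
def recode_insulin_treatment(value):
    """Merge "pen" and "pump" into "pump_plus_pen"; treat "0" as missing (None)."""
    if value in (0, "0", None):
        return None
    if value in ("pen", "pump"):
        return "pump_plus_pen"
    # Assumption: all other levels are kept as recorded.
    return value


print([recode_insulin_treatment(v) for v in ["pen", "pump", "0", "other_level"]])
```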

Defining Baseline Measurement of Blood Glucose (BG)

The baseline BG measurement for each subject was determined by identifying the first available monthly BG measurement. This baseline value serves as a reference point for subsequent BG measurements, allowing for the assessment of changes over time.
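A minimal sketch of this baseline rule, assuming (hypothetically) that each subject's monthly BG values are held in time order with None marking a missing month. The function name and data layout are illustrative, not taken from the original analysis.

```python
def baseline_bg(monthly_bg):
    """Return (month_index, value) for the first non-missing monthly BG value.

    Returns None when the subject has no BG measurement at all.
    """
    for month, value in enumerate(monthly_bg):
        if value is not None:
            return month, value
    return None


# Hypothetical subject whose first two months have no BG data.
print(baseline_bg([None, None, 152.0, 140.5]))
```

Returning the month index together with the value keeps track of when the baseline occurred, which is needed to express later measurements relative to it.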

Description of the Sample Based on Clinical and Socio-Demographic Information

Clinical Variables:

The sample’s clinical variables, including age, BMI, and rates of hypertension, high blood lipids, kidney disease, cardiovascular conditions, sleep disorders, cancer, and depression, are summarized in the “Clinical Variables Summary” table. The mean age of the sample is approximately 63.63 years, with a standard deviation of 11.08 years. The mean BMI is around 31.65, with a standard deviation of 6.11. Notable clinical conditions in the sample include hypertension, affecting 44.7% of the sample, and high blood lipids, with a prevalence of 19.33%.

Socio-Demographic Variables:

Socio-demographic characteristics such as median household income and gender distribution are presented in the “Socio-Demographic Variables Summary” table. The mean median household income is 29,259.22 with a standard deviation of 3,242.47. The gender distribution shows that the sample is composed of 55.65% males and 44.35% females.

Mean Age	SD Age	Mean BMI	SD BMI	Hypertension Rate (%)	High Blood Lipids Rate (%)	Kidney Disease Rate (%)	Cardiovascular Rate (%)	Sleep Disorder Rate (%)	Cancer Rate (%)	Depression Rate (%)
63.63	11.08	31.65	6.11	44.7	19.33	5.63	9.61	23.31	5.05	14.47

Socio-Demographic Variables Summary

Mean Median Household Income	SD Median Household Income
29259.22	3242.47

Categorical Variables Distribution

Category	Distribution
Smoking Status
Alcohol Consumption
Ethnicity Distribution
Gender Distribution
Activity Level
Stress Level

Group Comparisons Based on Baseline

The balance between groups based on the baseline measurement was assessed by comparing clinical and socio-demographic variables across different baseline categories.

Clinical Variables Comparison

The “Clinical Variables Comparison” table highlights variations in clinical characteristics such as age, BMI, hypertension rate, and high blood lipids rate across the baseline groups. For example, individuals with a baseline of 0 have a mean age of 59.10 years, while those in higher baseline groups (e.g., baseline 13) have a mean age of 67.88 years. The hypertension rate varies considerably, with the lowest rate of 28.66% in baseline 0 and the highest rate of 55.48% in baseline 6. These differences are descriptive and were not formally tested.

Baseline	Mean Age	Mean BMI	Hypertension Rate (%)	High Blood Lipids Rate (%)
0	59.10	31.85	28.66	14.44
1	65.17	31.82	55.29	21.76
2	65.17	31.82	55.29	21.76
3	65.17	31.82	55.29	21.76
4	65.17	31.82	55.29	21.76
5	65.52	31.66	54.88	21.95
6	66.11	31.43	55.48	23.29
7	66.96	31.25	55.22	23.88
8	67.25	31.39	53.39	23.73
9	67.50	31.35	51.40	22.43
10	67.32	31.33	50.50	21.78
11	68.03	30.96	48.84	19.77
12	68.56	30.93	50.00	20.83
13	67.88	31.42	53.06	22.45

Socio-Demographic Variables Comparison

The “Socio-Demographic Variables Comparison” table shows the mean median household income and gender distribution across baseline groups. The mean median household income is relatively consistent across groups, ranging from approximately 28,622 (baseline 13) to 29,975 (baseline 7). The gender distribution varies: baseline 0 is 55.65% male and 44.35% female, while the other groups are more heavily male, for example 68.24% male and 31.76% female at baseline 1 and 65.28% male and 34.72% female at baseline 12.

Baseline	Mean Median Household Income	Male (%)	Female (%)
0	29021.43	55.65	44.35
1	29353.31	68.24	31.76
2	29353.31	68.24	31.76
3	29353.31	68.24	31.76
4	29353.31	68.24	31.76
5	29353.31	68.29	31.71
6	29600.49	68.49	31.51
7	29974.85	67.16	32.84
8	29523.79	66.95	33.05
9	29325.16	66.36	33.64
10	29325.16	66.34	33.66
11	29667.41	67.44	32.56
12	29040.26	65.28	34.72
13	28621.82	65.31	34.69

Summary and Implications

The sample exhibits variability in clinical characteristics across baseline groups. Notably, mean age ranges from 59.10 to 68.56 years and hypertension prevalence from 28.66% to 55.48%, with baseline 0 differing most from the other groups; these differences were not formally tested. The socio-demographic variables show less variability: median household income is similar across groups, and the gender distribution is fairly consistent across baselines 1 to 13 (roughly 65% to 68% male), although baseline 0 has a lower proportion of males (55.65%). This analysis suggests that the sample is diverse in terms of health status, which should be considered when interpreting the results of further analyses or when planning interventions.

Distribution of BG Over Time

The graph presented illustrates the distribution of mean average blood glucose (BG) over time, using a newly centered time variable as the x-axis. Each point on the line represents the mean BG at a specific time point, with the trend shown across the entire observation period.

The key event of the beginning of blood pressure (BP) monitoring is marked on the graph with a vertical dashed red line at the baseline time of 1. This point is crucial as it allows for a visual distinction between the periods before and after the introduction of BP monitoring.

From the graph, it is observable that BG levels initially rise sharply, reaching a peak just before the start of BP monitoring. Following the onset of BP monitoring, BG levels show a general declining trend, with some fluctuations throughout the remaining observation period. This pattern suggests that BP monitoring might be associated with changes in BG levels over time, although the specific impact varies at different time points.

This graph helps in understanding the temporal relationship between BG levels and the start of BP monitoring, providing insights into how monitoring may influence or coincide with changes in blood glucose. Further analysis could explore whether this observed pattern holds across different groups or subpopulations within the dataset.

BG Averages Over Time Model

Model Interpretation

Intercept: The estimated average BG level at the baseline time (when baseline = 0) after starting BP monitoring is approximately 145.61.

before_after_bp Before BP Monitoring

There is a slight, non-significant decrease in the average BG level before BP monitoring compared to after BP monitoring (p = 0.0517), with an estimate of -3.85. This suggests that, on average, the BG levels were slightly lower before BP monitoring started, but this difference is not statistically significant at the 0.05 level.

baseline

The coefficient for baseline is -0.60, indicating that for each unit increase in baseline (time), the average BG decreases by approximately 0.60 units (p = 0.0132). This is a statistically significant finding, suggesting a trend of decreasing BG levels over time.

Interaction (before_after_bp:baseline)

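The interaction term is not defined in the model summary (its coefficient is reported as NA) due to singularity. This means that the interaction column is a linear combination of other terms in the model, so its coefficient cannot be estimated separately. This can result from collinearity or a lack of variation, for example if one of the two periods covers only a single value of baseline.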

Summary

Graphical Findings

The plot shows the trend of BG averages over time, with a distinction between before and after BP monitoring. The red dashed line marks the beginning of BP monitoring. The linear trend lines show how BG averages change over time within each group.

Statistical Findings

The analysis suggests that there is a slight decrease in BG levels before BP monitoring compared to after, but this difference is not statistically significant. However, BG levels tend to decrease over time, which is a statistically significant trend.

BP Changes Over Time

Model Fitting and Statistical Findings

To evaluate whether systolic blood pressure (BP) changes over time as a result of BP monitoring, a linear regression model was fitted using the baseline time variable as the predictor for average systolic BP. The key statistics from the model are presented in the table below:

Statistic	Value
Intercept	135.60
Baseline Coefficient	-0.32
Residual Standard Error	12.50
R-squared	0.01
P-value (Baseline)	0.01

Interpretation of the Results

Intercept: The intercept of 135.60 mmHg represents the estimated average systolic BP at the baseline time (when baseline = 0). This is the starting point before considering any changes over time.

Baseline Coefficient: The baseline coefficient of -0.32 suggests that, on average, systolic BP decreases by 0.32 mmHg for each unit increase in time since BP monitoring began. This negative coefficient, coupled with a p-value of 0.01, indicates that the decrease in systolic BP over time is statistically significant.

Residual Standard Error: The standard deviation of the residuals is 12.50 mmHg, which shows the average deviation of observed BP values from the predicted values. This value reflects the variability in systolic BP not explained by the model.

R-squared: The R-squared value of 0.01 indicates that only 1% of the variance in systolic BP is explained by time since baseline. This low R-squared value suggests that, while the trend is statistically significant, the practical significance or the proportion of variation in BP explained by time is minimal.
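To put the reported coefficients in perspective, the fitted line can be evaluated directly. The short Python sketch below uses only the intercept, slope, and residual standard error from the table above; the choice of 13 time units as the end point is an assumption, taken from the largest baseline value shown in the group comparison tables.

```python
INTERCEPT_SBP = 135.60   # mmHg, reported intercept
SLOPE_SBP = -0.32        # mmHg per unit of the baseline time variable, reported
RESIDUAL_SE_SBP = 12.50  # mmHg, reported residual standard error


def predicted_sbp(time_units):
    """Fitted average systolic BP at a given value of the baseline time variable."""
    return INTERCEPT_SBP + SLOPE_SBP * time_units


# Assumption: 13 is used as the last time point for illustration.
change = predicted_sbp(13) - predicted_sbp(0)
print(f"Predicted change over 13 time units: {change:.2f} mmHg")
print(f"Change relative to residual SE: {abs(change) / RESIDUAL_SE_SBP:.2f}")
```

Under these assumptions the predicted decline over the whole period is about a third of the residual standard error, which is in line with the low R-squared value.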

Graphical Presentation

The graph below visualizes the trend of average systolic BP over time. The blue line represents the average systolic BP at each time point, with black points indicating specific averages. The overall trend suggests a slight downward trajectory in systolic BP as time progresses.

Limitations

Low R-squared Value: The very low R-squared value (0.01) indicates that the model explains only a small fraction of the variation in systolic BP. Although the downward trend is statistically significant, the model’s ability to predict changes in BP over time is limited, and other factors likely play a more substantial role in BP variation.

Residual Variability: The residual standard error of 12.50 mmHg suggests substantial variability in BP measurements not accounted for by time. This highlights the possibility that other unmeasured variables may be influencing BP.

Potential Confounders: The model does not account for other potential factors that might influence BP changes over time, such as medication adjustments, dietary changes, exercise, or adherence to treatment protocols. Without considering these factors, the conclusions drawn from this analysis are limited.

Temporal Resolution: The use of the baseline variable as a linear time indicator assumes a simple, linear relationship between time and BP. However, BP changes could be non-linear or influenced by more complex patterns, which this model does not capture.

Conclusion

The analysis indicates a statistically significant, though modest, decrease in systolic BP over time following the initiation of BP monitoring. However, due to the low R-squared value and unaccounted variability, the practical significance of this finding is limited. Further research incorporating additional variables and more sophisticated modeling techniques could provide a more comprehensive understanding of BP changes over time.

Association Between BG and BP

Model Summary

A linear regression model was fitted to assess the association between average blood glucose (BG) levels and systolic blood pressure (BP). The model aimed to determine whether changes in systolic BP are associated with variations in BG levels.

Statistic	Value
Intercept	81.59
BP Coefficient	0.44
Residual Standard Error	29.80
R-squared	0.03
P-value (BP)	0.00

Key Findings

Intercept: The model estimates an intercept of 81.59, representing the expected BG level when systolic BP is zero. While necessary for the model, this value has limited clinical significance, as a systolic BP of zero is not realistic.

BP Coefficient: The coefficient for systolic BP (0.44) indicates that for each unit increase in systolic BP, the average BG level is expected to increase by approximately 0.44 units. This positive association suggests that higher systolic BP may be linked to slightly higher BG levels.

Residual Standard Error: The residual standard error is 29.80, suggesting that on average, the model’s predictions deviate from the actual BG levels by approximately 29.80 units. This value reflects the variation in BG levels not accounted for by systolic BP.

R-squared: The R-squared value is 0.03, indicating that systolic BP explains only 3% of the variation in BG levels. This low R-squared value suggests that while there is a statistically significant association, systolic BP alone is not a strong predictor of BG levels.

P-value (BP): The p-value for systolic BP is reported as 0.00, which means it is below 0.005 after rounding rather than exactly zero. The association between systolic BP and BG levels is therefore statistically significant at the conventional 0.05 level. This finding implies that the observed association is unlikely to be due to random chance.
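As a worked illustration of the size of this association, the fitted line can be used to compare two systolic BP values. The sketch below uses only the reported intercept, slope, and residual standard error; the comparison values of 120 and 140 mmHg are arbitrary choices for illustration, not values taken from the data.

```python
INTERCEPT_BG = 81.59       # reported intercept
SLOPE_BG_PER_MMHG = 0.44   # reported change in average BG per 1 mmHg of systolic BP
RESIDUAL_SE_BG = 29.80     # reported residual standard error


def predicted_bg(systolic_bp):
    """Fitted average BG for a given systolic BP (mmHg)."""
    return INTERCEPT_BG + SLOPE_BG_PER_MMHG * systolic_bp


# Illustrative comparison: 140 mmHg versus 120 mmHg.
difference = predicted_bg(140) - predicted_bg(120)
print(f"Predicted BG at 120 mmHg: {predicted_bg(120):.2f}")
print(f"Predicted BG at 140 mmHg: {predicted_bg(140):.2f}")
print(f"Difference: {difference:.2f} (residual SE: {RESIDUAL_SE_BG})")
```

A 20 mmHg difference in systolic BP corresponds to a predicted BG difference of about 8.8 units, which is small compared with the residual standard error of 29.80.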

Graphical Findings

The scatter plot with the regression line visually represents the association between BG and systolic BP. While there is a slight upward trend, the plot shows considerable scatter around the regression line, reflecting the low R-squared value and indicating that other factors may influence BG levels.

Limitations

Low R-squared: The low R-squared value suggests that systolic BP accounts for only a small fraction of the variance in BG levels. Other factors not included in the model may play a more substantial role in determining BG levels.

Observational Data: The analysis is based on observational data, meaning that causality cannot be established. The association between BP and BG might be influenced by confounding factors, such as medication use, dietary habits, or physical activity.

Potential Biases: The model does not account for potential measurement errors or biases in data collection, which could affect the accuracy of the findings.

Clinical Relevance: Although the association is statistically significant, the small effect size (0.44 increase in BG per unit increase in BP) may not be clinically meaningful. Further research is needed to explore the clinical implications of this association.

Mediation and Moderation Analysis

Overview

This analysis aimed to explore the mediation effect of blood pressure (BP) on the relationship between stress level and blood glucose (BG) levels. Additionally, we examined whether the association between stress level and BP is moderated by several factors including gender, centered time, hypertension, age, depression, comorbidities, and smoking status.

Mediation Analysis

The mediation analysis tested whether the effect of stress level on BG is mediated by BP. The following key metrics were obtained from the mediation model:

Statistic	Value
ACME (Average Causal Mediation Effect)	0.16
ADE (Average Direct Effect)	-2.35
Total Effect	-2.19
Proportion Mediated	-0.07
P-value (ACME)	0.95
P-value (ADE)	0.83
P-value (Total Effect)	0.83
P-value (Proportion Mediated)	0.95

Findings

The mediation analysis shows that the effect of stress level on blood glucose is not significantly mediated by blood pressure. The ACME (Average Causal Mediation Effect) is small (0.16) and not statistically significant (p = 0.95). The ADE (Average Direct Effect) of -2.35 suggests a small negative direct effect of stress level on blood glucose, but this is also not statistically significant (p = 0.83). Overall, the total effect of stress level on blood glucose is not significant (-2.19, p = 0.83), and the proportion of the effect mediated by blood pressure is minimal and non-significant.
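The quantities in the mediation table are linked by simple arithmetic: the total effect is the sum of the mediated and direct effects, and the proportion mediated is the mediated effect divided by the total. A short Python check using the reported point estimates (variable names are illustrative):

```python
acme = 0.16   # average causal mediation effect (reported)
ade = -2.35   # average direct effect (reported)

total_effect = acme + ade
proportion_mediated = acme / total_effect

print(f"Total effect: {total_effect:.2f}")
print(f"Proportion mediated: {proportion_mediated:.2f}")
```

The proportion is negative because the mediated and total effects have opposite signs. In that situation it cannot be read as a share of the total effect, which is a further reason not to interpret it here.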

Moderation Analysis

The moderation analysis examined whether the relationship between stress level and blood pressure (BP) is moderated by gender, centered time, hypertension, age, depression, comorbidities, and smoking status. The following statistics summarize the findings:

Statistic	Value
Intercept	161.95
Residual Standard Error	9.89
R-squared	0.39
P-value (Model)	0.00

Significant Findings

The overall model is statistically significant (reported p = 0.00, i.e., p < 0.01), with an R-squared value of 0.39, indicating that 39% of the variance in BP is explained by the predictors in the model. However, none of the interaction terms (Stress Level * Gender, Stress Level * Centered Time, Stress Level * Hypertension, Stress Level * Age, Stress Level * Depression, Stress Level * None Comorb, and Stress Level * Smoke) could be evaluated, because all of these coefficients were returned as NA. An NA coefficient means that the term could not be estimated, typically because of perfect collinearity (singularity) among the predictors; it does not mean that the term was tested and found non-significant. The model therefore provides no evidence of moderation, but it also cannot rule moderation out.

Statistical Report

The mediation model tested whether blood pressure mediates the relationship between stress level and blood glucose. However, neither the mediation effect (ACME) nor the direct effect (ADE) was statistically significant. The total effect of stress level on blood glucose was also non-significant. The moderation model aimed to test if certain factors moderate the stress level’s impact on BP. While the overall model was significant, the specific interactions between stress level and the moderators (gender, centered time, hypertension, age, depression, none comorb, and smoking) could not be estimated, since their coefficients were NA, so no moderation effect was demonstrated.

Conclusion

There is no significant evidence to suggest that blood pressure mediates the relationship between stress level and blood glucose levels. Additionally, the analysis provides no evidence that the relationship between stress level and blood pressure is moderated by gender, centered time, hypertension, age, depression, comorbidities, or smoking status, because the interaction terms could not be estimated. The relationships among stress level, blood pressure, and blood glucose therefore require further investigation, possibly with different or more refined variables.

BG Clinical Peaks Model

Model Performance Summary

Confusion Matrix Results

Accuracy: The model’s accuracy is 91.3%, meaning it correctly predicts 91.3% of the cases. This figure equals the No Information Rate (91.3%), the accuracy obtained by always predicting the majority class, and the test of accuracy against that rate is not significant (p = 0.629). The high accuracy therefore has to be read in the context of strong class imbalance.

Sensitivity: The model has a very high sensitivity of 97.62%, indicating that it correctly identifies 97.62% of true positives (cases where BG is below the cutoff). This is crucial in clinical settings where missing a positive case could have serious consequences.

Specificity: The specificity is 25%, meaning the model correctly identifies only 25% of true negatives (cases where BG is above the cutoff). This low specificity suggests that the model often predicts BG to be below the cutoff when it is actually above, which could lead to missed alerts for high BG levels.

Positive Predictive Value (PPV): The PPV is 93.18%, meaning that when the model predicts a BG below the cutoff, it is correct 93.18% of the time. This high PPV indicates confidence in the positive predictions.

Negative Predictive Value (NPV): The NPV is 50%, indicating that when the model predicts a BG above the cutoff, it is correct only 50% of the time. This suggests uncertainty in the model’s ability to correctly predict high BG levels.

Balanced Accuracy: The balanced accuracy is 61.31%, which provides a more balanced view of the model’s performance by averaging sensitivity and specificity. This metric is particularly useful when dealing with imbalanced datasets.

ROC Curve and AUC

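AUC-ROC: The Area Under the Curve (AUC) is 0.6131. This value is only modestly above 0.5, the level expected from random guessing. It is also identical to the balanced accuracy, which is what results when a ROC curve is built from predicted class labels at a single threshold rather than from predicted probabilities; if that is how it was computed, an AUC based on predicted probabilities could differ. The ROC curve shows that while the model has high sensitivity, its ability to separate the two classes is limited.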

Statistic	Value
Accuracy	0.913
95% CI	(0.7921, 0.9758)
No Information Rate	0.913
P-Value [Acc > NIR]	0.6290
Kappa	0.292
Mcnemar’s Test P-Value	0.6171
Sensitivity	0.9762
Specificity	0.25
Positive Predictive Value	0.9318
Negative Predictive Value	0.5
Prevalence	0.913
Detection Rate	0.8913
Detection Prevalence	0.9565
Balanced Accuracy	0.6131
AUC-ROC	0.6131
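The metrics in this table can be reproduced from a single 2x2 confusion matrix. The counts used below (41 true positives, 1 false negative, 3 false positives, 1 true negative, 46 cases in total) are not given in the report; they are an assumption inferred from the reported metrics. The Python sketch is illustrative and the function name is hypothetical.

```python
# Assumption: counts inferred from the reported metrics, not stated in the report.
TP, FN, FP, TN = 41, 1, 3, 1


def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used for Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "accuracy": accuracy,
        "no_information_rate": max(tp + fn, tn + fp) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - expected) / (1 - expected),
    }


metrics = classification_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With only 4 cases in the minority class under these assumed counts, specificity and NPV each rest on a handful of observations, so they are estimated with very little precision.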

Interpretation

The model shows excellent sensitivity but low specificity. This means that while the model is very good at identifying when BG levels are below the cutoff, it struggles to correctly identify cases where BG levels are above the cutoff. This trade-off may reflect the importance of catching low BG events (high sensitivity) at the expense of more false positives (low specificity). The AUC-ROC score of 0.6131 suggests that the model’s ability to discriminate between the two classes (BG peaks above or below the cutoff) is modest. This indicates that the model’s overall discriminatory power could be improved, particularly in distinguishing between true negatives and false positives. The low specificity and balanced accuracy highlight a potential area for improvement. The model may benefit from further refinement, such as better feature selection, model tuning, or exploring alternative modeling techniques.

Conclusion

The current model provides a good starting point for predicting BG peaks, particularly with its high sensitivity. However, there is significant room for improvement in specificity and overall discriminatory ability, as indicated by the AUC-ROC score. Further work, including model tuning and exploration of more advanced modeling techniques, is necessary before this model can be confidently used in a clinical setting.

Final Conclusion

The findings from this study provide insights into the management of T2DM using digital health tools. BG levels showed a modest, statistically significant decline over time, but the difference between the periods before and after BP monitoring was not statistically significant (p = 0.0517). Systolic BP was positively associated with BG, though it explained only about 3% of the variance. BP did not significantly mediate the association between stress level and BG, and the moderating roles of factors like gender and depression could not be estimated from the fitted model. Personalized interventions built on these factors would therefore need support from further analysis.

The predictive model for BG peaks showed that while the model could identify BG peaks with reasonable accuracy, its specificity was limited, indicating the need for further refinement before it can be reliably used in clinical practice. Future work could focus on improving the model by incorporating additional features, adjusting thresholds, and exploring alternative modeling techniques.

In conclusion, while the integration of BP monitoring into diabetes care shows promise, there is a need for more sophisticated models and tailored interventions to fully realize the benefits of digital health tools. The complexity of managing T2DM requires a multifaceted approach, and this study contributes to the growing body of evidence supporting the use of digital health in chronic disease management.

Appendix

Data Cleaning and Preprocessing

Missing data was addressed by converting string representations of missing values (“NA”, “N/A”) to actual NA values. Unqualified values, such as zeros in non-meaningful fields, were converted to NA. Date fields were corrected and converted to appropriate formats.

Statistical Methods:

Descriptive statistics were used to summarize clinical and sociodemographic variables. Linear regression models were employed to assess the relationship between BP and BG; these models were not adjusted for potential confounders such as medication use, diet, or physical activity. Mediation analysis was conducted to explore the role of BP in the stress-BG relationship, with moderation analysis to test for interactions between stress level and the candidate moderators.

Modeling Details

The predictive model for BG peaks was developed using logistic regression. The model’s performance was evaluated using accuracy, sensitivity, specificity, and AUC-ROC metrics. The model for time-related BG fluctuations was based on linear regression, considering before and after BP monitoring as key time points.

Graphs and Figures

Figures illustrating BG and BP trends over time, as well as ROC curves for model performance, are included to visually support the statistical findings. This appendix provides the necessary details for replicating the study’s analysis and understanding the methodological decisions made throughout the project.