Chafiaa Nadour
2023-12-05
High blood pressure during pregnancy can put the mother and her baby at risk; it can cause permanent organ damage, strokes, and low birth weight. I want to examine whether there is a relationship between blood pressure and three factors: age, BMI, and heart rate. I am focusing on these three factors because many women who prioritize their career and education become pregnant in their 30s, and because weight and heart rate are known to increase during pregnancy.
To answer my question, I used the pregnancy risk factor data that I found on Kaggle (https://www.kaggle.com/datasets/mmhossain/pregnancy-risk-factor-data). The data set as loaded has 6103 observations and 11 attributes. The cases are pregnant women of different ages who were monitored for their BMI, blood glucose, blood pressure (systolic and diastolic), heart rate, body temperature, etc. As a start, I created a new data set with all 6103 observations and only 4 attributes because it was easier to plot.
## Patient.ID Name Age Body.Temperature.F. Heart.rate.bpm.
## 1 1994601 Moulya 20 97.5 91
## 2 2001562 Soni 45 97.7 99
## 3 2002530 Baishali 29 98.6 84
## 4 2002114 Abhilasha 26 99.5 135
## 5 2002058 Aanaya 38 102.5 51
## 6 1993812 Navni 21 98.6 85
## Systolic.Blood.Pressure.mm.Hg. Diastolic.Blood.Pressure.mm.Hg. BMI.kg.m.2.
## 1 161 100 24.9
## 2 99 94 22.1
## 3 129 87 19.0
## 4 161 101 23.7
## 5 106 91 18.8
## 6 142 89 22.0
## Blood.Glucose.HbA1c. Blood.Glucose.Fasting.hour.mg.dl. Outcome
## 1 41 5.8 high risk
## 2 36 5.7 high risk
## 3 42 6.4 mid risk
## 4 46 4.5 high risk
## 5 38 4.3 high risk
## 6 30 5.6 mid risk
## [1] 0
## [1] 6103 11
## [1] "Patient.ID" "Name"
## [3] "Age" "Body.Temperature.F."
## [5] "Heart.rate.bpm." "Systolic.Blood.Pressure.mm.Hg."
## [7] "Diastolic.Blood.Pressure.mm.Hg." "BMI.kg.m.2."
## [9] "Blood.Glucose.HbA1c." "Blood.Glucose.Fasting.hour.mg.dl."
## [11] "Outcome"
library(dplyr)

# Keep only the response (diastolic BP) and the three predictors
data1 <- data %>%
  select(Diastolic.Blood.Pressure.mm.Hg., Age, Heart.rate.bpm., BMI.kg.m.2.)
head(data1)
## Diastolic.Blood.Pressure.mm.Hg. Age Heart.rate.bpm. BMI.kg.m.2.
## 1 100 20 91 24.9
## 2 94 45 99 22.1
## 3 87 29 84 19.0
## 4 101 26 135 23.7
## 5 91 38 51 18.8
## 6 89 21 85 22.0
To conduct my analysis, I chose one dependent variable, y = diastolic BP, and three independent variables: x1 = Age, x2 = heart rate (HR), and x3 = BMI.
## [1] 100 94 87 101 91 89
## [1] TRUE
## [1] 20 45 29 26 38 21
## [1] TRUE
## [1] 91 99 84 135 51 85
## [1] TRUE
## [1] 24.9 22.1 19.0 23.7 18.8 22.0
## [1] TRUE
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 82.00 87.00 87.26 92.00 142.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.00 22.00 25.00 26.43 30.00 250.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 45.0 72.0 80.0 86.1 91.0 150.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.90 19.60 21.30 21.44 23.10 27.90
To get a visual idea about the relationship between the variables, I used scatter plots and pair plots.
# Scatter-plot matrix of the response (y) and the three predictors
pairs(data1,
      col = "red",
      pch = 18,
      labels = c("y", "x1", "x2", "x3"),
      main = "Pair plot of all variables")
##
From the plot, it appears that a pregnant woman's diastolic blood pressure has a weak positive linear relationship with her age, HR, and BMI.
To better understand the relationship between the variables, I chose simple linear regression. I first fitted each model to get the regression equations, then ran the summary.
##
## Call:
## lm(formula = y ~ x1)
##
## Coefficients:
## (Intercept) x1
## 82.2440 0.1897
y = 82.2440 + 0.1897 x1. Example: for Age = 41, the predicted diastolic BP is about 90, which is high.
##
## Call:
## lm(formula = y ~ x2)
##
## Coefficients:
## (Intercept) x2
## 78.89607 0.09711
y = 78.89607 + 0.09711 x2. Example: for HR = 110, the predicted diastolic BP is about 90.
##
## Call:
## lm(formula = y ~ x3)
##
## Coefficients:
## (Intercept) x3
## 73.1042 0.6603
y = 73.1042 + 0.6603 x3. Example: for BMI = 25, the predicted diastolic BP is about 90.
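The three worked examples can be checked with a few lines of R. This is only a sketch: the helper name `predict_bp` is made up for illustration, and the coefficients are copied by hand from the `lm()` output above rather than taken from fitted model objects.

```r
# Hypothetical helper: evaluate a fitted line y = intercept + slope * x
predict_bp <- function(intercept, slope, x) {
  intercept + slope * x
}

# Coefficients copied from the printed lm() output
bp_age <- predict_bp(82.2440,  0.1897,  41)   # Age = 41
bp_hr  <- predict_bp(78.89607, 0.09711, 110)  # HR  = 110
bp_bmi <- predict_bp(73.1042,  0.6603,  25)   # BMI = 25

predicted <- round(c(Age = bp_age, HR = bp_hr, BMI = bp_bmi), 1)
print(predicted)  # 90.0, 89.6, 89.6
```

All three predictions land at roughly 90 mm Hg, which matches the rounded values quoted in the examples.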
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.418 -4.987 -0.608 4.495 50.649
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82.24404 0.41930 196.1 <2e-16 ***
## x1 0.18973 0.01542 12.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.699 on 6101 degrees of freedom
## Multiple R-squared: 0.0242, Adjusted R-squared: 0.02404
## F-statistic: 151.3 on 1 and 6101 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.180 -4.888 -0.568 4.529 53.198
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.89607 0.37661 209.49 <2e-16 ***
## x2 0.09711 0.00423 22.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.477 on 6101 degrees of freedom
## Multiple R-squared: 0.07951, Adjusted R-squared: 0.07936
## F-statistic: 527 on 1 and 6101 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = y ~ x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81.535 -4.970 -0.828 4.888 52.059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.10423 0.97970 74.62 <2e-16 ***
## x3 0.66027 0.04547 14.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.662 on 6101 degrees of freedom
## Multiple R-squared: 0.0334, Adjusted R-squared: 0.03324
## F-statistic: 210.8 on 1 and 6101 DF, p-value: < 2.2e-16
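The code that produced these fits is not shown in the report, so the following is a minimal sketch of how they could be written. It assumes the variables are defined as described earlier (y, x1, x2, x3), and it uses only the six rows printed by head(data1) as a stand-in for the full data, so its estimates will not match the reported ones. The object names `fit1`, `fit2`, `fit3`, and `r_squared` are hypothetical.

```r
# Stand-in data: the six rows shown by head(data1), NOT the full
# 6103-row data set. Results here are for illustrating syntax only.
data1 <- data.frame(
  Diastolic.Blood.Pressure.mm.Hg. = c(100, 94, 87, 101, 91, 89),
  Age             = c(20, 45, 29, 26, 38, 21),
  Heart.rate.bpm. = c(91, 99, 84, 135, 51, 85),
  BMI.kg.m.2.     = c(24.9, 22.1, 19.0, 23.7, 18.8, 22.0)
)

# Response and predictors, named as in the report
y  <- data1$Diastolic.Blood.Pressure.mm.Hg.
x1 <- data1$Age
x2 <- data1$Heart.rate.bpm.
x3 <- data1$BMI.kg.m.2.

# One simple linear regression per predictor
fit1 <- lm(y ~ x1)
fit2 <- lm(y ~ x2)
fit3 <- lm(y ~ x3)

# summary() gives the coefficient table, R-squared, and F-test
print(summary(fit1))

# Collect the three R-squared values side by side for comparison
r_squared <- c(Age = summary(fit1)$r.squared,
               HR  = summary(fit2)$r.squared,
               BMI = summary(fit3)$r.squared)
print(r_squared)
```

Running the same calls on the full data1 should reproduce the tables above, assuming the variables were defined this way.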
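From the summary results I see that the p-values for Age, HR, and BMI are all below 0.05 (each is < 2e-16), which makes their relationships with BP statistically significant. Comparing the R-squared values, HR has the highest R^2, meaning HR explains more of the variation in BP than Age or BMI, although even HR explains only about 8%.
R1sq = 0.0242 (Age)
R2sq = 0.07951 (HR)
R3sq = 0.0334 (BMI)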
From the hypothesis-testing side:
H0: There is no relationship between BP and Age, HR, or BMI (null hypothesis).
H1: There is a relationship between BP and Age, HR, or BMI (alternative hypothesis).
Because every p-value is below 0.05, we reject H0 for each of the three variables. However, the R-squared values are close to 0, which means these models explain little of the variation in BP and are not well suited to prediction.
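The slope tests can be recomputed from the numbers printed in the summary tables. This is a sketch under the usual assumption that each slope is tested with a two-sided t-test on the residual degrees of freedom; the estimates, standard errors, and degrees of freedom are copied by hand from the output above, and the variable names are illustrative.

```r
# Slope estimates and standard errors copied from the summary() tables
estimate  <- c(Age = 0.18973, HR = 0.09711, BMI = 0.66027)
std_error <- c(Age = 0.01542, HR = 0.00423, BMI = 0.04547)
df_resid  <- 6101  # residual degrees of freedom reported for each model

# t statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(t_value, 2))  # 12.30, 22.96, 14.52
print(p_value < 0.05)     # TRUE for all three slopes
```

The recomputed t values agree with the t value column in the tables, and each p-value falls far below the 0.05 threshold.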
I also checked the correlation between BP and each of the three variables.
## [1] 0.1555725
## [1] 0.2819719
## [1] 0.1827578
We can tell that there is a weak positive correlation between BP and Age (0.16), HR (0.28), and BMI (0.18).
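As a consistency check, in a simple linear regression with one predictor the R-squared equals the squared correlation between y and x. The sketch below compares the square roots of the reported R-squared values with the three reported correlations; all numbers are copied from the output above and the variable names are illustrative.

```r
# R-squared values and correlations as reported in the output above
r_squared    <- c(Age = 0.0242, HR = 0.07951, BMI = 0.0334)
reported_cor <- c(Age = 0.1555725, HR = 0.2819719, BMI = 0.1827578)

# All three slopes are positive, so the correlation is the positive root
implied_cor <- sqrt(r_squared)

print(round(implied_cor, 3))  # 0.156, 0.282, 0.183

# The two sets of numbers agree up to rounding of the printed R-squared
agree <- all(abs(implied_cor - reported_cor) < 1e-3)
print(agree)  # TRUE
```

The agreement confirms that the three correlations are those of BP with Age, HR, and BMI, in that order.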
The data I worked with is not ideal because I see some implausible values, such as a maximum age of 250, but in general the results are consistent with the established finding that blood pressure tends to increase with age, weight, and heart rate.
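One way to handle such values would be to filter them out before fitting. The sketch below is not part of the analysis above: the data frame `toy` is a made-up illustration that includes the reported extremes (age 250, diastolic BP 9), and the plausibility limits are assumptions chosen for the example, not clinical guidance.

```r
# Illustrative rows only; 250 and 9 are the extremes reported by summary()
toy <- data.frame(
  Age       = c(20, 45, 29, 250),
  Diastolic = c(100, 94, 9, 87)
)

# Assumed plausibility limits (hypothetical, adjust as appropriate)
age_ok <- toy$Age >= 10 & toy$Age <= 60
bp_ok  <- toy$Diastolic >= 30 & toy$Diastolic <= 150

# Keep only rows where both values fall inside the assumed limits
cleaned <- toy[age_ok & bp_ok, ]

n_dropped <- nrow(toy) - nrow(cleaned)
print(n_dropped)  # 2
```

Refitting the three regressions on the cleaned data would show how much the extreme values influence the slopes and R-squared values.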
Data source: https://www.kaggle.com/datasets/mmhossain/pregnancy-risk-factor-data