2024-09-20

Simple Linear Regression Application in the Medical Field

  • My presentation will focus on the concept of simple linear regression and how it can be applied in a real working environment.

  • Specifically, I want to explore how we can use this math concept to predict or explain how one variable may affect another. I will be using the medical topic of body mass index versus blood pressure for this presentation.

  • The basic formula looks something like this: \[ y = \beta_0 + \beta_1 x + \epsilon \]

Where: \(y\) = Blood Pressure, \(x\) = Body Mass Index, \(\beta_0\) = Intercept, \(\beta_1\) = Slope, \(\epsilon\) = Random Error Term
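
For reference, the standard least-squares estimates of the two coefficients can be written as follows. Here \(\bar{x}\) and \(\bar{y}\) denote the sample means of BMI and blood pressure, and the hats mark estimated values; this notation is added for illustration and is not used elsewhere in the slides.

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]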

How it can be Utilized in the Medical Field

Simple Linear Regression is a strong math tool that is a major component of statistical analysis.

  • It can allow us to explore correlations between two variables.

  • This can then allow us to predict certain outcomes related to the variables, which gives us a better chance of making more informed clinical decisions.

  • This presentation will focus on a medical application of such concepts, specifically with the relationship between blood pressure and body mass index.

BMI (predictor), Blood Pressure (outcome)

BMI and Blood Pressure Example Data Analysis

Example Code for this Presentation

We will use a randomly generated dataset of patients that contains each patient's BMI and corresponding blood pressure level.

  • Although random, these numbers are set in a range that makes reasonable sense.
#example dataset
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)
  • This code creates a simulated set of patient data and serves as the foundation for the data analysis we will perform later on.

Example Code Explained Line-By-Line

  • set.seed(123) ensures that the same sequence of random numbers is generated by rnorm every time.

  • BMI <- rnorm(100, mean = 25, sd = 5) generates a vector of \(100\) random BMI values from a normal distribution, with a mean of \(25\) and a standard deviation of \(5\).

  • bloodPressure <- 80 + 2.5 * BMI generates a vector of assumed blood pressure levels based on the BMI values. \(80\) is treated as the base blood pressure, while \(2.5 * BMI\) is the linear relationship between the two variables that we assume for this simulation.

  • The added rnorm(100, sd = 10) adds random noise with a standard deviation of \(10\), so that the simulated data varies around the line the way real measurements would.
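
The effect of set.seed() can be checked directly. The short sketch below is an illustration only; the variable names firstDraw, secondDraw, and thirdDraw are made up for this example.

```r
# Draw five values, reset the seed, and draw again
set.seed(123)
firstDraw <- rnorm(5, mean = 25, sd = 5)
set.seed(123)
secondDraw <- rnorm(5, mean = 25, sd = 5)

# Same seed, same numbers
print(identical(firstDraw, secondDraw))

# Without resetting the seed, the next draw continues the sequence
thirdDraw <- rnorm(5, mean = 25, sd = 5)
print(identical(firstDraw, thirdDraw))
```

The first comparison is TRUE and the second is FALSE, which is why the dataset in this presentation comes out the same on every run.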

Randomly Generated Patient Data Set Output

set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)
print(head(medicalData))
##        BMI bloodPressure
## 1 22.19762      128.3900
## 2 23.84911      142.1916
## 3 32.79354      159.5169
## 4 25.35254      139.9059
## 5 25.64644      134.5999
## 6 33.57532      163.4880

Structuring the Data Relationship as Plots/Models

  • Using this data set, the relationship between the two variables can be visualized in a more effective manner.

  • The next few slides have various models that better depict the nuances of the relationship between BMI and blood pressure.

  • These plots and models show how strongly the two variables are related, which is the kind of evidence that can support more informed clinical decisions.

BMI vs. Blood Pressure Displayed as Scatter Plot
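
A scatter plot like the one on this slide can be sketched in base R as shown below. This is one possible way to draw it, not necessarily the code behind the original figure; the axis labels, point style, and color are assumptions.

```r
# Recreate the example dataset from earlier in the presentation
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)

# One point per simulated patient
plot(medicalData$BMI, medicalData$bloodPressure,
     xlab = "BMI", ylab = "Blood pressure (mmHg)",
     main = "BMI vs. Blood Pressure",
     pch = 19, col = "steelblue")
```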

BMI vs. Blood Pressure Linear Regression Model

## 
## Call:
## lm(formula = bloodPressure ~ BMI, data = medicalData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.073  -6.835  -0.875   5.806  32.904 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  81.5956     5.5265   14.76   <2e-16 ***
## BMI           2.3951     0.2138   11.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.707 on 98 degrees of freedom
## Multiple R-squared:  0.5616, Adjusted R-squared:  0.5571 
## F-statistic: 125.5 on 1 and 98 DF,  p-value: < 2.2e-16
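
A summary of this kind comes from fitting the model with lm() and passing the result to summary(). The sketch below rebuilds the dataset so it can run on its own; the object name model is an assumption, since the slides do not show what the fitted model was called.

```r
# Recreate the example dataset from earlier in the presentation
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)

# Fit blood pressure as a linear function of BMI
model <- lm(bloodPressure ~ BMI, data = medicalData)

# Coefficients, residual summary, R-squared, and F-statistic
print(summary(model))
```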

Linear Regression Model Explanation

  • The previous slide shows the linear regression model, which allows us to predict the blood pressure level based on the body mass index as the independent variable.

  • I used lm() to fit the model in order to gain a better idea of what the exact relationship between the two variables is.

The values that matter most for this presentation are:

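  • Intercept (\(\beta_0\)): Predicted blood pressure when BMI is \(0\), which is around \(81.5956\) mmHg. A BMI of \(0\) is not physically possible, so this value anchors the line rather than describing a real patient.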

  • Slope (\(\beta_1\)): Rate at which blood pressure increases for each additional unit of BMI, which is \(2.3951\) mmHg per 1 unit of BMI.

  • \(R^2\): Tells us what proportion of the variance in blood pressure can be explained by its relationship with BMI. \(R^2\) here is \(0.5616\), which suggests that around \(56.16\%\) of the variance in blood pressure can be explained by BMI.
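
To make the meaning of \(R^2\) concrete, the sketch below computes it by hand as one minus the ratio of residual variation to total variation and compares it with the value reported by summary(). The names model, residualSS, totalSS, and rSquaredManual are assumptions chosen for this example.

```r
# Recreate the dataset and the fitted model
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)
model <- lm(bloodPressure ~ BMI, data = medicalData)

# Variation left unexplained by the model
residualSS <- sum(residuals(model)^2)

# Total variation of blood pressure around its mean
totalSS <- sum((medicalData$bloodPressure - mean(medicalData$bloodPressure))^2)

# R-squared is the share of total variation that the model explains
rSquaredManual <- 1 - residualSS / totalSS

print(coef(model))               # intercept and slope
print(rSquaredManual)            # computed by hand
print(summary(model)$r.squared)  # reported by R
```

The last two printed values should agree, since both measure the same quantity.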

Final Equation for the Linear Regression Line

Therefore, the final equation for predicting the blood pressure based on the units of BMI is:

\[ \text{Blood Pressure} = 81.5956 + 2.3951 \times \text{BMI} \]

Where:

  • \(81.5956\) is the intercept that represents the baseline blood pressure when BMI is at zero.

  • \(2.3951\) is the slope that represents the change in blood pressure for each added unit of BMI.
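
As a worked example, consider a hypothetical patient with a BMI of \(30\) (a value chosen only for illustration). Substituting into the equation gives:

\[ \text{Blood Pressure} = 81.5956 + 2.3951 \times 30 = 81.5956 + 71.8530 = 153.4486 \text{ mmHg} \]

Because the data is simulated, this shows how the equation is used rather than giving a clinical estimate.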

Residual Plot of the Model
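
A residual plot of this kind can be sketched in base R as shown below. The original figure may have been drawn differently; plotting the residuals against BMI, the axis labels, and the object name model are assumptions.

```r
# Recreate the dataset and the fitted model
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)
bloodPressure <- 80 + 2.5 * BMI + rnorm(100, sd = 10)
medicalData <- data.frame(BMI, bloodPressure)
model <- lm(bloodPressure ~ BMI, data = medicalData)

# Residual = observed blood pressure minus predicted blood pressure
plot(medicalData$BMI, residuals(model),
     xlab = "BMI", ylab = "Residual (mmHg)",
     main = "Residuals vs. BMI", pch = 19)

# Dashed reference line at a residual of zero
abline(h = 0, lty = 2)
```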

Explanation for the Residual Plot

  • The purpose of the residual plot is to show how well the regression model fits the relationship between the two variables.

  • The residual plot can help researchers assess whether the relationship is linear, whether the variance of the residuals stays roughly constant across all BMI levels, and whether there are any outliers.

  • The dashed line in the center marks a residual of zero, where the predicted value equals the observed value. Ideally the points should be scattered randomly around this line across the whole plot.

  • The residual plot from the previous slide shows points scattered randomly around the line with no clear pattern, which suggests that a linear model is appropriate for this data and that the results from the linear regression model are credible.

For Fun: 3D Plot with Added Age Variable

We can also create 3D plots if we have an added third variable.

  • This allows us to visualize the complex relationship between the three, which can then help researchers determine how another variable may impact the correlation between the two main variables.

  • The code used to create the 3D plotly plot generates a random data set for age similar to how the data set for BMI was created earlier.

  • The x-axis will represent BMI, the y-axis will represent the patients’ ages, and the z-axis will represent blood pressure measurements.

  • The goal is to visualize the combined effect of BMI and age on the blood pressure of a patient, which will help researchers better predict what might contribute to higher blood pressure.

  • I also used the rocket colorscale for my coloring scheme to better portray how blood pressure increases with higher age and BMI, while the opposite can be seen when the patient is both younger and has a lower BMI.

3D Plot with BMI, Age and its Effect on Blood Pressure
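
The code behind this plot is not shown in the slides, so the following is only a sketch of how such a figure might be built with plotly. The age distribution (mean 50, sd 12), the 0.5 mmHg-per-year age effect, and the names medicalData3D and fig are all assumptions made for illustration. The rocket palette is taken from the viridisLite package, which is assumed to be a recent enough version to include it.

```r
set.seed(123)
BMI <- rnorm(100, mean = 25, sd = 5)

# Assumption: ages are drawn the same way as BMI, with illustrative mean and sd
age <- rnorm(100, mean = 50, sd = 12)

# Assumption: 0.5 mmHg per year of age is a made-up coefficient
bloodPressure <- 80 + 2.5 * BMI + 0.5 * age + rnorm(100, sd = 10)
medicalData3D <- data.frame(BMI, age, bloodPressure)

# Build the plot only if the required packages are installed
fig <- if (requireNamespace("plotly", quietly = TRUE) &&
           requireNamespace("viridisLite", quietly = TRUE)) {
  plotly::plot_ly(
    medicalData3D,
    x = ~BMI, y = ~age, z = ~bloodPressure,
    color = ~bloodPressure,             # color points by blood pressure
    colors = viridisLite::rocket(100),  # rocket palette
    type = "scatter3d", mode = "markers"
  )
} else {
  NULL
}
fig
```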

That’s it! Thank you for reading