Data Preparation

# URL of the dataset
url <- "https://raw.githubusercontent.com/lburenkov/maternalrisk/refs/heads/main/Maternal%20Health%20Risk%20Data%20Set.csv"

# Load the dataset into a data frame
df <- read.csv(url)

# Display the first few rows of the dataset
head(df)
library(tidyverse)
library(openintro)

Research question

What are the primary risk factors for maternal health complications in rural Bangladesh?

By analyzing the features in the dataset, researchers could identify the factors most strongly associated with adverse maternal outcomes.

Cases

What are the cases, and how many are there?

Each case is the health record of one individual during pregnancy. There are 1,014 cases (rows) and 7 variables (columns) in this dataset.

str(df)
## 'data.frame':    1014 obs. of  7 variables:
##  $ Age        : int  25 35 29 30 35 23 23 35 32 42 ...
##  $ SystolicBP : int  130 140 90 140 120 140 130 85 120 130 ...
##  $ DiastolicBP: int  80 90 70 85 60 80 70 60 90 80 ...
##  $ BS         : num  15 13 8 7 6.1 7.01 7.01 11 6.9 18 ...
##  $ BodyTemp   : num  98 98 100 98 98 98 98 102 98 98 ...
##  $ HeartRate  : int  86 70 80 70 76 70 78 86 70 70 ...
##  $ RiskLevel  : chr  "high risk" "high risk" "high risk" "high risk" ...
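
The row and column counts can also be checked programmatically, along with duplicate rows and missing values. The following is a minimal sketch: `count_cases` is a hypothetical helper name, and `toy` is a small invented data frame standing in for `df`, not values from the dataset.

```r
# Hypothetical helper: summarise how many cases and variables a data frame holds.
count_cases <- function(data) {
  list(
    cases      = nrow(data),             # one row per case
    variables  = ncol(data),             # one column per variable
    duplicates = sum(duplicated(data)),  # rows identical to an earlier row
    missing    = sum(is.na(data))        # total number of NA cells
  )
}

# Toy data for illustration only (values are invented).
toy <- data.frame(
  Age       = c(25L, 35L, 25L, NA),
  BS        = c(15, 13, 15, 7),
  RiskLevel = c("high risk", "high risk", "high risk", "low risk")
)

res <- count_cases(toy)
print(res)
```

Applied to the real data, `count_cases(df)$cases` should agree with the number of observations reported by `str(df)`.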

Data collection

The Maternal Health Risk Dataset was collected by Ahmed, M. (2020) as part of a study aimed at identifying and predicting maternal health risks in rural Bangladesh. The dataset was compiled using an Internet of Things (IoT)-based monitoring system, which collected real-time health data from various sources, including hospitals, community clinics, and maternal healthcare facilities in rural regions.

This dataset focuses on identifying maternal health risks by monitoring key health indicators such as age, blood pressure, and other clinical measures. The dataset is described as having 1,013 instances, each representing an individual’s health record during pregnancy; the CSV file loaded above contains 1,014 rows, a one-row difference that should be checked before modeling. The dataset consists of 6 features (both real and integer types) plus the risk-level label, and is intended for classification tasks, where the goal is to predict the risk level from the provided features.

The data was sourced from various rural healthcare settings across Bangladesh, making it a crucial resource for research aimed at improving maternal healthcare systems in low-resource environments. The collection process was designed to ensure the accurate monitoring of key health indicators to enable early intervention and improve health outcomes.

For further details, please refer to the original source of the dataset: Ahmed, M. (2020). Maternal Health Risk [Dataset]. UCI Machine Learning Repository. DOI: 10.24432/C5DP5D.

Type of study

What type of study is this (observational/experiment)?

This is an observational study. The design is non-interventional: the researchers recorded health metrics such as blood pressure, age, and other clinical indicators with the goal of identifying maternal health risks, and they did not manipulate or control any factor that could affect maternal health outcomes.

Data Source

Ahmed, M. (2020). Maternal Health Risk [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DP5D.

Describe your variables?

Are they quantitative or qualitative?

In the Maternal Health Risk Dataset, the six features are all quantitative. Age, SystolicBP, DiastolicBP, and HeartRate are stored as integers, while BS (blood sugar) and BodyTemp are stored as real numbers. The remaining variable, RiskLevel, is qualitative (categorical).
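
One way to check the quantitative/qualitative split is to ask R for each column's type. The following is a minimal sketch: `variable_kind` is a hypothetical helper name, and `toy` is an invented three-column data frame standing in for `df`.

```r
# Hypothetical helper: label each column as quantitative or qualitative
# based on whether R stores it as a numeric (integer or double) vector.
variable_kind <- function(data) {
  ifelse(sapply(data, is.numeric), "quantitative", "qualitative")
}

# Toy data for illustration only (values are invented).
toy <- data.frame(
  Age       = c(25L, 35L),
  BS        = c(15, 6.1),
  RiskLevel = c("high risk", "low risk")
)

kinds <- variable_kind(toy)
print(kinds)
```

Note that this only reflects storage type. A numerically coded category would be reported as quantitative, so the result should still be read against the variable descriptions.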

If you are running a regression or similar model, which one is your dependent variable?

The intent is to build a classification model:

Dependent Variable (Target Variable):

RiskLevel (target): a categorical variable describing the maternal health risk

Binary classification: “high risk” vs. “low risk”. Multiclass classification: “low risk”, “medium risk”, “high risk”

Independent Variables (Features):

The independent variables (features) are the six health indicators in the dataset: Age, SystolicBP, DiastolicBP, BS, BodyTemp, and HeartRate. These could be used to predict the maternal health risk level.
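
The two framings of the target could be encoded roughly as follows. This is a sketch under assumptions: the label strings are taken from the text above and should be confirmed against `unique(df$RiskLevel)`, `toy_risk` is an invented vector, and dropping the middle category for the binary case is only one possible choice that the proposal does not specify.

```r
# Assumed label set, copied from the text; verify against the actual data.
risk_levels <- c("low risk", "medium risk", "high risk")

# Toy target values for illustration only.
toy_risk <- c("high risk", "low risk", "medium risk", "high risk")

# Multiclass target: an ordered factor, low < medium < high.
risk_multi <- factor(toy_risk, levels = risk_levels, ordered = TRUE)

# Binary target (assumption): keep only the low and high rows.
keep        <- toy_risk %in% c("low risk", "high risk")
risk_binary <- factor(toy_risk[keep], levels = c("low risk", "high risk"))

print(table(risk_multi))
print(table(risk_binary))
```

Any label that is not in `risk_levels` becomes `NA` in `risk_multi`, so `anyNA(risk_multi)` is a quick check that the assumed labels match the data.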

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(df)
##       Age          SystolicBP     DiastolicBP           BS        
##  Min.   :10.00   Min.   : 70.0   Min.   : 49.00   Min.   : 6.000  
##  1st Qu.:19.00   1st Qu.:100.0   1st Qu.: 65.00   1st Qu.: 6.900  
##  Median :26.00   Median :120.0   Median : 80.00   Median : 7.500  
##  Mean   :29.87   Mean   :113.2   Mean   : 76.46   Mean   : 8.726  
##  3rd Qu.:39.00   3rd Qu.:120.0   3rd Qu.: 90.00   3rd Qu.: 8.000  
##  Max.   :70.00   Max.   :160.0   Max.   :100.00   Max.   :19.000  
##     BodyTemp        HeartRate     RiskLevel        
##  Min.   : 98.00   Min.   : 7.0   Length:1014       
##  1st Qu.: 98.00   1st Qu.:70.0   Class :character  
##  Median : 98.00   Median :76.0   Mode  :character  
##  Mean   : 98.67   Mean   :74.3                     
##  3rd Qu.: 98.00   3rd Qu.:80.0                     
##  Max.   :103.00   Max.   :90.0
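
Because `summary(df)` pools all risk levels, a per-group summary may speak more directly to the research question. The following is a minimal sketch using base R's `aggregate()`; `toy` is an invented data frame standing in for `df`, and the choice of the mean as the statistic is an assumption.

```r
# Toy data for illustration only (values are invented).
toy <- data.frame(
  Age        = c(25, 35, 19, 22),
  SystolicBP = c(130, 140, 90, 100),
  RiskLevel  = c("high risk", "high risk", "low risk", "low risk")
)

# Mean of each selected feature within each risk level.
group_means <- aggregate(cbind(Age, SystolicBP) ~ RiskLevel, data = toy, FUN = mean)
print(group_means)
```

With the real data, the left-hand side of the formula could list all six features, and `FUN = median` would be less sensitive to extreme values.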
# Histogram
hist(df$Age, main = "Age Distribution", xlab = "Age", col = "lightblue", border = "black")

# Boxplot
boxplot(Age ~ RiskLevel, data = df, main = "Age by Risk Level", xlab = "Risk Level", ylab = "Age")

# Density plot
plot(density(df$Age), main = "Density Plot of Age", xlab = "Age")

# Calculating skewness and kurtosis of Age
library(e1071)
skewness(df$Age)
## [1] 0.7807483
kurtosis(df$Age)
## [1] -0.400533
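
The positive skewness indicates a right-skewed Age distribution, and the negative kurtosis (reported as excess kurtosis, so a normal distribution scores 0) indicates lighter tails than a normal distribution. To make these numbers concrete, the sketch below recomputes both in base R using the formulas documented for the default `type = 3` in `e1071`: the third or fourth central moment divided by the corresponding power of the sample standard deviation. `manual_skewness` and `manual_kurtosis` are hypothetical helper names, and the input vectors are invented.

```r
# Skewness as m3 / s^3, where m3 is the third central moment (divided by n)
# and s is the sample standard deviation (divided by n - 1).
manual_skewness <- function(x) {
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

# Excess kurtosis as m4 / s^4 - 3, with m4 the fourth central moment.
manual_kurtosis <- function(x) {
  m4 <- mean((x - mean(x))^4)
  m4 / sd(x)^4 - 3
}

# Toy inputs for illustration only.
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 1, 10)

print(manual_skewness(symmetric))     # zero up to floating-point error
print(manual_skewness(right_skewed))  # positive: long right tail
print(manual_kurtosis(symmetric))
```

If `e1071` is installed, `manual_skewness(df$Age)` can be compared against `skewness(df$Age)` as a sanity check.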
# Pairwise scatter plots
pairs(df[, sapply(df, is.numeric)])

# Alternatively, ggpairs for more options
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:openintro':
## 
##     tips
ggpairs(df[, sapply(df, is.numeric)])
