Data Preparation
#URL of the dataset
url <- "https://raw.githubusercontent.com/lburenkov/maternalrisk/refs/heads/main/Maternal%20Health%20Risk%20Data%20Set.csv"
#Loading the dataset into a data frame
df <- read.csv(url)
#Displaying the first few rows of the dataset
head(df)
library(tidyverse)
library(openintro)
Research question
What are the primary risk factors for maternal health complications
in rural Bangladesh?
By analyzing the six health indicators in the dataset, researchers could identify
the factors most strongly associated with a higher recorded risk level.
Cases
What are the cases, and how many are there?
Each case is the health record of one individual during pregnancy. There are 1,014 cases (rows) and 7 variables in this dataset.
str(df)
## 'data.frame': 1014 obs. of 7 variables:
## $ Age : int 25 35 29 30 35 23 23 35 32 42 ...
## $ SystolicBP : int 130 140 90 140 120 140 130 85 120 130 ...
## $ DiastolicBP: int 80 90 70 85 60 80 70 60 90 80 ...
## $ BS : num 15 13 8 7 6.1 7.01 7.01 11 6.9 18 ...
## $ BodyTemp : num 98 98 100 98 98 98 98 102 98 98 ...
## $ HeartRate : int 86 70 80 70 76 70 78 86 70 70 ...
## $ RiskLevel : chr "high risk" "high risk" "high risk" "high risk" ...
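The case count can also be checked programmatically. The sketch below is one possible way to do it, assuming a hypothetical helper named describe_cases() and a small made-up data frame (toy) standing in for df. The risk labels in toy are assumed from the text of this report; the real file may spell them differently, so it is worth checking unique(df$RiskLevel) first.

```r
# Toy stand-in for df: the rows below are illustrative, not the real data.
toy <- data.frame(
  Age        = c(25, 35, 29, 30, 35),
  SystolicBP = c(130, 140, 90, 140, 120),
  RiskLevel  = c("high risk", "high risk", "medium risk",
                 "low risk", "low risk"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: counts cases, variables, missing cells, and
# the number of cases per level of the target column.
describe_cases <- function(data, target = "RiskLevel") {
  list(
    n_cases      = nrow(data),
    n_variables  = ncol(data),
    n_missing    = sum(is.na(data)),
    class_counts = table(data[[target]])
  )
}

info <- describe_cases(toy)
print(info)
```

Calling describe_cases(df) on the real data frame should report the same row and column counts as the str() output above.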
Data collection
The Maternal Health Risk Dataset was collected by Ahmed, M. (2020) as
part of a study aimed at identifying and predicting maternal health
risks in rural Bangladesh. The dataset was compiled using an Internet of
Things (IoT)-based monitoring system, which collected real-time health
data from various sources, including hospitals, community clinics, and
maternal healthcare facilities in rural regions.
This dataset supports the identification of maternal health risks
from key health indicators: age, systolic and diastolic blood pressure,
blood sugar (BS), body temperature, and heart rate. The file loaded above
contains 1,014 records, each representing an individual’s health record
during their pregnancy. The dataset consists of 6 features (both real and
integer types) plus the risk-level label, and is intended for use in
classification tasks, where the goal is to predict the risk level from
the provided features.
The data was sourced from various rural healthcare settings across
Bangladesh, making it a crucial resource for research aimed at improving
maternal healthcare systems in low-resource environments. The collection
process was designed to ensure the accurate monitoring of key health
indicators to enable early intervention and improve health outcomes.
For further details, please refer to the original source of the
dataset: Ahmed, M. (2020). Maternal Health Risk [Dataset]. UCI Machine
Learning Repository. DOI: 10.24432/C5DP5D.
Type of study
What type of study is this (observational/experiment)?
This is an observational study. The design is non-interventional:
researchers recorded and analyzed health data without influencing the
conditions being studied. Health metrics such as blood pressure, age,
and other clinical indicators were monitored with the goal of identifying
maternal health risks. The researchers’ role was limited to collecting
and analyzing the data rather than manipulating or controlling the
factors that could affect maternal health outcomes.
Describe your variables?
Are they quantitative or qualitative?
In the Maternal Health Risk Dataset, all six features are
quantitative. Age, SystolicBP, DiastolicBP, and HeartRate are stored as
integers, while BS and BodyTemp are stored as real (numeric) values.
The remaining variable, RiskLevel, is qualitative (categorical).
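The split between quantitative and qualitative variables can be read directly from the column types. The following is a minimal sketch on a made-up three-column data frame (toy), not the real data; the same vapply() call should work on df.

```r
# Toy stand-in for df with one integer, one real, and one character column.
toy <- data.frame(
  Age       = c(25L, 35L, 29L),
  BS        = c(15, 13, 8),
  RiskLevel = c("high risk", "high risk", "low risk"),
  stringsAsFactors = FALSE
)

# Label each column by whether R stores it as a numeric type.
var_type <- vapply(
  toy,
  function(col) if (is.numeric(col)) "quantitative" else "qualitative",
  character(1)
)
print(var_type)
```

Note that is.numeric() only reflects storage type. A numerically coded category would still need to be reclassified by hand.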
If you are running a regression or similar model, which one is
your dependent variable?
The intent is to build a classification model.
Dependent Variable (Target Variable):
RiskLevel (maternal health risk), a categorical variable.
Binary classification: “high risk” vs. “low risk”.
Multiclass classification: “low risk”, “medium risk”, “high risk”.
Independent Variables (Features):
The independent variables are the six health indicators (Age,
SystolicBP, DiastolicBP, BS, BodyTemp, and HeartRate) that could be
used to predict the maternal health risk.
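Before fitting a classifier, the character target usually needs to be encoded as a factor. The sketch below shows one way to prepare both the multiclass and the binary version. Two things are assumptions here: the label spellings (taken from the text above; verify with unique(df$RiskLevel)), and the choice to fold the middle category into a "not high risk" group for the binary case, which this report does not specify.

```r
# Illustrative labels only; the real column is df$RiskLevel.
risk <- c("high risk", "low risk", "medium risk", "low risk")

# Multiclass target: an ordered factor so that low < medium < high.
risk_levels <- c("low risk", "medium risk", "high risk")
risk_multi  <- factor(risk, levels = risk_levels, ordered = TRUE)

# Binary target (assumption): everything that is not "high risk"
# is grouped together as "not high risk".
risk_binary <- factor(
  ifelse(risk == "high risk", "high risk", "not high risk"),
  levels = c("not high risk", "high risk")
)

print(table(risk_multi))
print(table(risk_binary))
```

Any label not listed in risk_levels becomes NA in risk_multi, so sum(is.na(risk_multi)) is a quick check that the spellings match the data.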
Relevant summary statistics
Provide summary statistics for each of the variables. Also include
appropriate visualizations related to your research question
(e.g. scatter plots, boxplots, etc.). This step requires the use of R,
hence a code chunk is provided below. Insert more code chunks as
needed.
summary(df)
## Age SystolicBP DiastolicBP BS
## Min. :10.00 Min. : 70.0 Min. : 49.00 Min. : 6.000
## 1st Qu.:19.00 1st Qu.:100.0 1st Qu.: 65.00 1st Qu.: 6.900
## Median :26.00 Median :120.0 Median : 80.00 Median : 7.500
## Mean :29.87 Mean :113.2 Mean : 76.46 Mean : 8.726
## 3rd Qu.:39.00 3rd Qu.:120.0 3rd Qu.: 90.00 3rd Qu.: 8.000
## Max. :70.00 Max. :160.0 Max. :100.00 Max. :19.000
## BodyTemp HeartRate RiskLevel
## Min. : 98.00 Min. : 7.0 Length:1014
## 1st Qu.: 98.00 1st Qu.:70.0 Class :character
## Median : 98.00 Median :76.0 Mode :character
## Mean : 98.67 Mean :74.3
## 3rd Qu.: 98.00 3rd Qu.:80.0
## Max. :103.00 Max. :90.0
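The summary above reports a minimum HeartRate of 7, which is far below any plausible pulse and is probably a data-entry error. Since the research question concerns how indicators differ by risk level, it may help to flag such values and then summarize each indicator within each risk group. The sketch below does both on a made-up data frame (toy). The 30 to 220 bpm bounds are a hypothetical choice for illustration, not a clinical standard, and the risk labels are assumed from the text of this report.

```r
# Illustrative stand-in for df; these rows are made up.
toy <- data.frame(
  Age        = c(25, 35, 29, 30, 22, 19),
  SystolicBP = c(130, 140, 120, 120, 100, 90),
  HeartRate  = c(86, 70, 80, 7, 76, 70),
  RiskLevel  = c("high risk", "high risk", "medium risk",
                 "medium risk", "low risk", "low risk")
)

# Step 1: flag heart rates outside an assumed plausible range.
hr_bounds   <- c(30, 220)  # hypothetical bounds, in beats per minute
implausible <- toy$HeartRate < hr_bounds[1] | toy$HeartRate > hr_bounds[2]
print(toy[implausible, ])

# Step 2: mean of each indicator within each risk level,
# computed on the rows that were not flagged.
clean   <- toy[!implausible, ]
by_risk <- aggregate(cbind(Age, SystolicBP, HeartRate) ~ RiskLevel,
                     data = clean, FUN = mean)
print(by_risk)
```

Whether flagged rows should be dropped, corrected, or kept is a judgment call that depends on the analysis; the sketch only shows how to locate them.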
# Histogram
hist(df$Age, main = "Age Distribution", xlab = "Age", col = "lightblue", border = "black")

# Boxplot
boxplot(df$Age ~ df$RiskLevel, main = "Age by Risk Level", xlab = "Risk Level", ylab = "Age")

# Density plot
plot(density(df$Age), main = "Density Plot of Age", xlab = "Age")

# Calculating skewness and kurtosis
library(e1071)
skewness(df$Age)
## [1] 0.7807483
kurtosis(df$Age)
## [1] -0.400533
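A positive skewness (about 0.78) indicates a longer right tail in Age, consistent with the mean (29.87) exceeding the median (26). A negative excess kurtosis (about -0.40) indicates lighter tails than a normal distribution. To make the definitions concrete, the sketch below computes both statistics in base R using the sample standard deviation, which, to my understanding, matches the default type = 3 definition in e1071. The function name moment_stats() is hypothetical.

```r
# Hypothetical helper: skewness and excess kurtosis from central moments,
# scaled by the sample standard deviation (n - 1 denominator).
moment_stats <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  s  <- sd(x)
  c(skewness = m3 / s^3, kurtosis = m4 / s^4 - 3)
}

# A symmetric sample has zero skewness.
sym <- moment_stats(c(1, 2, 3, 4, 5))
print(sym)

# A sample with one large value is right-skewed (positive skewness).
right <- moment_stats(c(1, 1, 2, 10))
print(right)
```

Running moment_stats(df$Age) should give values close to the e1071 output above if the assumption about type = 3 holds.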
# Pairwise scatter plots
pairs(df[, sapply(df, is.numeric)])

# Alternatively, ggpairs for more options
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:openintro':
##
## tips
ggpairs(df[, sapply(df, is.numeric)])
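The pairwise plots can be complemented with a numeric correlation matrix, which makes the strength of each linear association easier to compare. The sketch below uses a small data frame (toy) built from the first few values shown in the str() output; five rows are far too few for meaningful estimates and serve only to show the calls.

```r
# Small stand-in for df, using the first five values from the str() output.
toy <- data.frame(
  Age         = c(25, 35, 29, 30, 35),
  SystolicBP  = c(130, 140, 90, 140, 120),
  DiastolicBP = c(80, 90, 70, 85, 60),
  RiskLevel   = c("high risk", "high risk", "high risk",
                  "high risk", "high risk"),
  stringsAsFactors = FALSE
)

# Keep only numeric columns, then compute Pearson correlations.
num_cols <- vapply(toy, is.numeric, logical(1))
cor_mat  <- round(cor(toy[, num_cols]), 2)
print(cor_mat)
```

Passing method = "spearman" to cor() gives rank-based correlations, which may be preferable for skewed variables such as Age.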
