Introduction

This task delves into the application of generalized linear models (GLM), particularly logistic regression, in analyzing categorical data related to maternal healthcare.

The dataset under scrutiny contains comprehensive information regarding maternal health care, encompassing various factors such as province, education level, wealth status, insurance coverage, media exposure, maternal age, ethnic diversity, desire for pregnancy, hospital delivery, recent childbirth history, child mortality, decision-making dynamics, occupation, household demographics, and socio-cultural attributes.

The primary objective is to develop a robust classification model using logistic regression, leveraging the aforementioned variables or a judiciously selected subset thereof.

The performance of the model will be meticulously assessed using the Receiver Operating Characteristic (ROC) curve, which provides insights into the trade-off between true positive rate and false positive rate.

Through this analysis, the task aims to identify the most effective model for predicting maternal health outcomes, thereby contributing to the advancement of healthcare planning and policy formulation. The methodology draws upon established literature on logistic regression (Hosmer Jr et al., 2013; Agresti, 2002) and ROC analysis (Fawcett, 2006; Zweig & Campbell, 1993).

Methods

It seems like you’re outlining the methods you plan to use for your analysis. Let me provide a brief explanation of each:

  1. Logistic Regression: This is a statistical method used for modeling the probability of a binary outcome. It’s commonly used when the dependent variable is dichotomous (e.g., success/failure, yes/no). Logistic regression estimates the probability that a given observation belongs to one of the categories based on one or more independent variables. It’s useful for understanding the relationship between the independent variables and the probability of a particular outcome.

  2. Receiver Operating Characteristic (ROC):

ROC analysis is a method for evaluating the performance of a classification model. It plots the true positive rate (sensitivity) against the false positive rate (1-specificity) for different threshold values. The area under the ROC curve (AUC) is often used as a measure of how well the model can distinguish between the classes. A higher AUC indicates better discrimination between positive and negative cases.

In your case, you plan to use ROC analysis to evaluate the accuracy of your logistic regression model. By examining the ROC curve and calculating the AUC, you can assess how well the model predicts the binary outcome. This can help you determine the effectiveness of the model and choose the most accurate predictive model among alternatives.

Analysis

In this section the following will be covered:

library(readr)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(gridExtra)

amhc <- read_csv("amhc.csv")
## New names:
## • `` -> `...53`
## • `` -> `...54`
## • `` -> `...55`
## • `` -> `...56`
## • `` -> `...57`
## • `` -> `...58`
## • `` -> `...59`
## • `` -> `...60`
## • `` -> `...61`
## • `` -> `...62`
## • `` -> `...63`
## • `` -> `...64`
## • `` -> `...65`
## • `` -> `...66`
## • `` -> `...67`
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 3621 Columns: 69
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (35): CASEID, V000, province, county, peduc, educ, wealth, insurance, me...
## dbl (18): V001, V002, V003, Aanc, Adc, ethinicdiv, V012, mage1, hospdelivery...
## num  (1): Rhospdel
## lgl (15): ...53, ...54, ...55, ...56, ...57, ...58, ...59, ...60, ...61, ......
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
train_index=createDataPartition(amhc$Aanc,p=0.8,list=FALSE)

train_data=amhc[train_index,]
test_data=amhc[-train_index,]
head(test_data)
## # A tibble: 6 × 69
##   CASEID  V000   V001  V002  V003 province county  Aanc   Adc peduc educ  wealth
##   <chr>   <chr> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl> <chr> <chr> <chr> 
## 1 1 28  1 KE5       1    28     1 Nyanza   Migori     1     1 Prim… Prim… Poore…
## 2 1138  2 KE5       1   138     2 Nyanza   Migori     0     0 Prim… No e… Middle
## 3 1149  2 KE5       1   149     2 Nyanza   Migori     0     0 Prim… Prim… Poore…
## 4 2 33  2 KE5       2    33     2 Eastern  Macha…     0     1 Prim… Seco… Riche…
## 5 2 63  5 KE5       2    63     5 Eastern  Macha…     0     0 Prim… Seco… Richer
## 6 3 96  2 KE5       3    96     2 Eastern  Meru       0     1 Prim… Prim… Middle
## # ℹ 57 more variables: insurance <chr>, mediaexpo <chr>, mage <chr>,
## #   ethinicdiv <dbl>, NpropHig <chr>, comhospdel <chr>, desirepreg <chr>,
## #   pregcomplication <chr>, V012 <dbl>, mage1 <dbl>, hospdelivery <dbl>,
## #   blast5yrs <chr>, blastyr <chr>, childunder5 <chr>, decisionm <chr>,
## #   occupation <chr>, nchilddead <chr>, childeverborn <chr>, V005 <dbl>,
## #   Rhospdel <dbl>, AMHC <chr>, age <chr>, region <chr>, residence <chr>,
## #   religion <chr>, ethnic <chr>, sexhhead <chr>, V152 <dbl>, mstatus <chr>, …

Conduct an EDA to explore the relationship between data

  • Create tables for each variable
  table <- table(amhc$province,amhc$Aanc,amhc$Adc)
# Set up the layout for the plots
par(mfrow=c(1,3))
# Histogram for median household size
hist1 <- hist(amhc$medianhhsize, main = "Median Household Size",col='black')

# Histogram for household  wealth distribution
hist2 <- hist(amhc$propHighwealth, main = "Proportion of High Wealth",col='red')

# Distribution of household headed by females
hist3 <- hist(amhc$pfemaleheaded, main = "Proportion of Female Headed",col='green')

  • Check if classes are balanced
# Set up the layout for the plots
par(mfrow=c(1,2))


# Calculate frequencies for Aanc access
freq_Aanc <- table(amhc$Aanc)

# Plot bar chart for proportion of Aanc access with frequencies
bar1 <- barplot(freq_Aanc, main = "Proportion of Aanc access", col='black', ylab = "Frequency")

# Calculate frequencies for access to Adc
freq_Adc <- table(amhc$Adc)

# Plot bar chart for proportion of access to Adc with frequencies
bar2 <- barplot(freq_Adc, main = "Proportion of access to Adc", col='red', ylab = "Frequency")

Fit a model on the training data

model <- glm(Aanc ~ province+peduc+educ+wealth+county+insurance+mediaexpo
             +mage+Nethinic+ethnicity+NpropHig+comhospdel+desirepreg+pregcomplication+Npfemale+Nmedianh+Npmoved, data = amhc, family = binomial(link = "logit"))
summary(model)
## 
## Call:
## glm(formula = Aanc ~ province + peduc + educ + wealth + county + 
##     insurance + mediaexpo + mage + Nethinic + ethnicity + NpropHig + 
##     comhospdel + desirepreg + pregcomplication + Npfemale + Nmedianh + 
##     Npmoved, family = binomial(link = "logit"), data = amhc)
## 
## Coefficients: (7 not defined because of singularities)
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  11.52136  196.97046   0.058 0.953356    
## provinceCoast                -0.64386    0.54879  -1.173 0.240700    
## provinceEastern              -0.60123    0.65066  -0.924 0.355470    
## provinceNairobi               0.09872    0.37460   0.264 0.792138    
## provinceNortheastern          0.52572    0.62502   0.841 0.400273    
## provinceNyanza                0.36988    0.45594   0.811 0.417223    
## provinceRift Valley          -0.46928    0.53722  -0.874 0.382371    
## provinceWestern               0.55273    0.49273   1.122 0.261961    
## peducNo education            -0.51933    0.22479  -2.310 0.020874 *  
## peducPrimary                 -0.25127    0.16274  -1.544 0.122587    
## peducSecondary               -0.15874    0.15440  -1.028 0.303897    
## educNo education             -0.86615    0.25126  -3.447 0.000566 ***
## educPrimary                  -0.81128    0.19303  -4.203 2.64e-05 ***
## educSecondary                -0.65328    0.18647  -3.503 0.000459 ***
## wealthPoorer                  0.12679    0.13631   0.930 0.352277    
## wealthPoorest                 0.12836    0.14594   0.880 0.379098    
## wealthRicher                  0.21632    0.14362   1.506 0.132024    
## wealthRichest                 0.25726    0.19156   1.343 0.179280    
## countyBomet                  -0.07308    0.52839  -0.138 0.890001    
## countyBungoma                -0.85320    0.39299  -2.171 0.029928 *  
## countyBusia                  -0.60338    0.37309  -1.617 0.105824    
## countyElgeyo Marakwet        -0.06051    0.48206  -0.126 0.900106    
## countyEmbu                    0.52357    0.69117   0.758 0.448740    
## countyGarissa                 0.81946    0.38890   2.107 0.035108 *  
## countyHoma Bay               -0.07721    0.27375  -0.282 0.777903    
## countyIsiolo                  1.52210    0.69356   2.195 0.028192 *  
## countyKajiado                 0.30957    0.59629   0.519 0.603648    
## countyKakamega               -0.79816    0.35141  -2.271 0.023126 *  
## countyKericho                -0.13253    0.47617  -0.278 0.780771    
## countyKiambu                 -0.09773    0.38968  -0.251 0.801966    
## countyKilifi                  0.84202    0.39989   2.106 0.035235 *  
## countyKirinyaga              -0.16758    0.43922  -0.382 0.702812    
## countyKisii                  -0.91216    0.46390  -1.966 0.049266 *  
## countyKisumu                 -0.14561    0.31351  -0.464 0.642328    
## countyKitui                   0.42420    0.64131   0.661 0.508318    
## countyKwale                  -0.04130    0.46745  -0.088 0.929596    
## countyLaikipia                1.76503    0.61799   2.856 0.004289 ** 
## countyLamu                    0.96261    0.50458   1.908 0.056425 .  
## countyMachakos                0.53059    0.64826   0.818 0.413081    
## countyMakueni                 0.28339    0.69800   0.406 0.684742    
## countyMandera                -0.22291    0.46091  -0.484 0.628653    
## countyMarsabit                0.24180    0.72239   0.335 0.737840    
## countyMeru                   -0.68634    0.44961  -1.527 0.126876    
## countyMigori                 -0.16083    0.33528  -0.480 0.631442    
## countyMombasa                 0.99133    0.45244   2.191 0.028447 *  
## countyMuranga                -0.42249    0.42829  -0.986 0.323905    
## countyNairobi                      NA         NA      NA       NA    
## countyNakuru                  0.67211    0.53625   1.253 0.210081    
## countyNandi                   0.87538    0.50053   1.749 0.080310 .  
## countyNarok                   1.04013    0.52730   1.973 0.048545 *  
## countyNyamira                -0.87820    0.54820  -1.602 0.109161    
## countyNyandarua              -0.52309    0.45274  -1.155 0.247934    
## countyNyeri                        NA         NA      NA       NA    
## countySamburu                 0.90319    0.53862   1.677 0.093572 .  
## countySiaya                        NA         NA      NA       NA    
## countyTaita Taveta            0.97494    0.55435   1.759 0.078624 .  
## countyTana River                   NA         NA      NA       NA    
## countyTharaka-Nithi                NA         NA      NA       NA    
## countyTrans Nzoia            -0.20286    0.52823  -0.384 0.700945    
## countyTurkana                 0.13021    0.62973   0.207 0.836188    
## countyUasin Gishu             0.17647    0.49890   0.354 0.723545    
## countyVihiga                       NA         NA      NA       NA    
## countyWajir                        NA         NA      NA       NA    
## countyWest Pokot              0.64295    0.71634   0.898 0.369422    
## insuranceYes                  0.36355    0.17329   2.098 0.035906 *  
## mediaexpolow                 -0.22267    0.12555  -1.774 0.076141 .  
## mediaexpomedium              -0.22202    0.10460  -2.123 0.033790 *  
## mage20-24                     0.20527    0.13244   1.550 0.121163    
## mage25-29                     0.10530    0.13771   0.765 0.444481    
## mage30-34                     0.07192    0.14835   0.485 0.627822    
## mage35+                       0.10760    0.15892   0.677 0.498350    
## Nethiniclow                   0.06633    0.12834   0.517 0.605264    
## Nethinicmid                   0.17163    0.14130   1.215 0.224485    
## ethnicityEmbu               -12.71395  196.96877  -0.065 0.948534    
## ethnicityKalenjin           -11.50560  196.96807  -0.058 0.953419    
## ethnicityKamba              -11.60814  196.96802  -0.059 0.953005    
## ethnicityKikuyu             -11.74885  196.96801  -0.060 0.952436    
## ethnicityKisii              -11.67852  196.96824  -0.059 0.952720    
## ethnicityLuhya              -11.57812  196.96802  -0.059 0.953126    
## ethnicityLuo                -11.91349  196.96804  -0.060 0.951770    
## ethnicityMasai              -11.79564  196.96840  -0.060 0.952246    
## ethnicityMeru               -10.76095  196.96839  -0.055 0.956431    
## ethnicityMijikenda/ Swahili -11.74820  196.96801  -0.060 0.952438    
## ethnicityOther              -11.29838  196.96806  -0.057 0.954257    
## ethnicitySomali             -12.47921  196.96839  -0.063 0.949483    
## ethnicityTaita/ Taveta      -12.02974  196.96826  -0.061 0.951300    
## NpropHiglow                   0.12696    0.20135   0.631 0.528332    
## NpropHigmid                   0.05821    0.16368   0.356 0.722113    
## comhospdellow                -0.09022    0.15429  -0.585 0.558741    
## comhospdelmid                -0.01189    0.13090  -0.091 0.927628    
## desirepregNo more            -0.15594    0.12981  -1.201 0.229644    
## desirepregThen                0.29683    0.09783   3.034 0.002413 ** 
## pregcomplicationNo           -0.14966    0.85336  -0.175 0.860782    
## pregcomplicationYes           0.13274    0.85297   0.156 0.876328    
## Npfemalelow                  -0.16573    0.11148  -1.487 0.137118    
## Npfemalemid                   0.01530    0.10319   0.148 0.882133    
## Nmedianhlow                  -0.07779    0.15917  -0.489 0.625062    
## Nmedianhmid                  -0.01226    0.12214  -0.100 0.920043    
## Npmovedlow                    0.14001    0.14010   0.999 0.317617    
## Npmovedmid                   -0.01799    0.11698  -0.154 0.877813    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4487.5  on 3620  degrees of freedom
## Residual deviance: 4175.8  on 3528  degrees of freedom
## AIC: 4361.8
## 
## Number of Fisher Scoring iterations: 10

Predict on the test data

Results

# Make predictions on test data
test_data$predicted_aanc_values <- predict(model, newdata = test_data)

#We look at the range of values the predicted probabilities can take:
min(test_data$predicted_aanc_values)
## [1] -2.210954
mean(test_data$predicted_aanc_values)
## [1] -0.8440831
max(test_data$predicted_aanc_values)
## [1] 1.605939
#Since the predicted values are log odds ratio
#We can exponentiate them to gets the ORs
test_data$ODDsRatio_predicted_aanc_values=exp(test_data$predicted_aanc_values)
#We look at the range of values the predicted probabilities ODDs ration can take:
min(test_data$ODDsRatio_predicted_aanc_values)
## [1] 0.109596
mean(test_data$ODDsRatio_predicted_aanc_values)
## [1] 0.5582067
max(test_data$ODDsRatio_predicted_aanc_values)
## [1] 4.982537
#Calculating the cutoff based on mean and predicted ODDs
install.packages("dplyr")  
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
test_data$predicted_aanc_odds_based_oncutoff = 
  if_else(test_data$predicted_aanc_values > mean(test_data$predicted_aanc_values), 1, 0)

# Specify columns to display
cols_to_display <- c("Aanc", "predicted_aanc_values", "peduc", "educ", "county", "insurance", "mediaexpo", "desirepreg", "pregcomplication", "Npfemale", "Nmedianh", "Npmoved","predicted_aanc_odds_based_oncutoff")

cols_to_display_min <- c("Aanc", "predicted_aanc_values","predicted_aanc_odds_based_oncutoff")

test_data[cols_to_display_min]
## # A tibble: 724 × 3
##     Aanc predicted_aanc_values predicted_aanc_odds_based_oncutoff
##    <dbl>                 <dbl>                              <dbl>
##  1     1                -1.19                                   0
##  2     0                -0.795                                  1
##  3     0                -0.514                                  1
##  4     0                -0.285                                  1
##  5     0                -0.417                                  1
##  6     0                -1.51                                   0
##  7     0                -0.896                                  0
##  8     1                -1.43                                   0
##  9     0                -1.30                                   0
## 10     0                -1.46                                   0
## # ℹ 714 more rows
# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))


conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff,test_data$Aanc)

# View the confusion matrix
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 302 109
##          1 191 122
##                                           
##                Accuracy : 0.5856          
##                  95% CI : (0.5488, 0.6218)
##     No Information Rate : 0.6809          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1286          
##                                           
##  Mcnemar's Test P-Value : 2.918e-06       
##                                           
##             Sensitivity : 0.6126          
##             Specificity : 0.5281          
##          Pos Pred Value : 0.7348          
##          Neg Pred Value : 0.3898          
##              Prevalence : 0.6809          
##          Detection Rate : 0.4171          
##    Detection Prevalence : 0.5677          
##       Balanced Accuracy : 0.5704          
##                                           
##        'Positive' Class : 0               
## 
# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))

# Compute confusion matrix
conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff, test_data$Aanc)

# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))

# Compute confusion matrix
conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff, test_data$Aanc)

Discussion

  1. Education Level: Education appears to be a significant factor influencing ANC attendance. Specifically, individuals with no education (educNo education), primary education (educPrimary), and secondary education (educSecondary) showed significant negative coefficients, suggesting that higher levels of education are associated with higher likelihoods of attending ANC visits.
  • No education Odds ratio: \(\exp(-0.86615)=0.4195 \pm 0.25126\)

  • Primary education Odds ratio: \(\exp(-0.81128)= 0.4440 \pm 0.19303\)

  • Secondary education Odds ratio: \(\exp(-0.65328)= 0.5205 \pm 0.18647\)

    Relevant links in support of the above results:

  1. Parental Education: Similarly, parental education also plays a role, with individuals having parents with no education (peducNo education) showing a significant negative coefficient, indicating a relatively higher likelihood of attending ANC visits.
  • Parental no education Odds ratio: \(\exp(-0.51933)= 0.5954 \pm 0.22479\)

Relevant links supporting this analysis is as follows:

  1. County of Residence: The county of residence was found to be a significant predictor. For instance, individuals residing in counties such as Bungoma, Garissa, Isiolo, Kakamega, Kilifi, Kisii, Laikipia, Mombasa, and Narok exhibited significant coefficients, indicating varying impacts on ANC attendance compared to the reference county.

  2. Insurance Coverage: Insurance coverage (insuranceYes) was found to be a significant positive predictor, suggesting that individuals with insurance are more likely to attend ANC visits.

  3. Media Exposure: Media exposure (mediaexpomedium) also emerged as a significant factor, with individuals exposed to medium levels of media showing a negative coefficient, indicating a lower likelihood of ANC attendance compared to those with low media exposure.

  4. Desire for Pregnancy: The desire for pregnancy (desirepregThen) was found to be a significant positive predictor, indicating that individuals who desire pregnancy are more likely to attend ANC visits.

These significant factors provide valuable insights into the determinants of ANC attendance. Policymakers and healthcare providers can use this information to target interventions and improve ANC uptake, particularly among vulnerable populations with lower education levels, limited parental education, and residing in specific counties. Additionally, efforts to increase insurance coverage, media campaigns promoting ANC, and addressing pregnancy desires can contribute to enhancing ANC attendance rates.

Given the context of predicting antenatal care, it’s crucial to correctly identify both positive (women who received ANC) and negative (women who did not receive ANC) cases. While the model shows fair agreement beyond chance (indicated by Kappa), it seems to perform better in identifying positive cases (as indicated by sensitivity and PPV) compared to negative cases (as indicated by specificity and NPV). This suggests that further refinement of the model may be necessary to improve its performance, especially in correctly identifying women who did not receive antenatal care.

Further details for the model used to predict antenatal care are as shown below;

  • Accuracy: The model accuracy of 63.4% indicates that the model correctly classified approximately 63.4% of the instances regarding antenatal care.

  • Kappa: The Kappa statistic of 0.2319 suggests a fair agreement beyond chance in predicting antenatal care.

  • Sensitivity (True Positive Rate): Sensitivity of 63.48% means that the model correctly identified about 63.48% of the women who received antenatal care as predicted by the model.

  • Specificity (True Negative Rate): Specificity of 63.21% means that the model correctly identified about 63.21% of the women who did not receive antenatal care as predicted by the model.

  • Positive Predictive Value (Precision): The Positive Predictive Value (PPV) of 80.65% indicates that when the model predicts a woman received antenatal care, it is correct about 80.65% of the time.

  • Negative Predictive Value: Negative Predictive Value (NPV) of 41.74% indicates that when the model predicts a woman did not receive antenatal care, it is correct about 41.74% of the time.

  • Prevalence: The prevalence of antenatal care in the dataset is 70.72%.

  • Detection Rate: Detection Rate of 44.89% indicates that the model correctly detects antenatal care in about 44.89% of the cases.

  • Balanced Accuracy: Balanced Accuracy of 63.34% provides an average of sensitivity and specificity and is particularly useful when classes are imbalanced, such as in this case.

In modeling antenatal care (ANC) attendance, several factors were found to be significant predictors:

Conclusions

According to the model’s predictions for prenatal care (ANC):

  1. Education Level:
    • Education plays a significant role in antenatal care (ANC) attendance. Individuals with lower levels of education are more likely to attend ANC visits.
  2. Parental Education:
    • Parental education was also found to influence ANC attendance, with lower levels of parental education correlating with increased likelihood of attending ANC visits.
  3. County of Residence:
  4. Insurance Coverage:
  5. Media Exposure:
  6. Desire for Pregnancy:

These descriptions along with the links provide valuable insights and evidence supporting the significance of each factor in predicting antenatal care attendance.

  1. Moderate Performance: With an overall accuracy of 63.4%, the model demonstrated moderate success in accurately categorizing cases. Still, there is potential for improvement to increase its predictive capacity.

  2. Fair Agreement: The model’s agreement with the actual classifications is fair above and beyond what would be predicted by chance, according to the Kappa statistic of 0.2319. Even if this suggests some dependability, it could do better with more improvement.

  3. Sensitivity and Specificity: The model performs rather well in recognizing both positive and negative cases of prenatal care, as evidenced by its sensitivity (63.48%) and specificity (63.21%). Both metrics, though, might be higher, suggesting that the model needs to be adjusted.

4.Positive Predictive Value (PPV): The PPV of 80.65% indicates that when the model predicts a woman received antenatal care, it is correct about 80.65% of the time. This suggests that the model is relatively reliable in predicting positive cases.

4.Negative Predictive Value (PPV): The NPV of 41.74% indicates that when the model predicts a woman did not receive antenatal care, it is correct about 41.74% of the time. This value is notably lower than PPV, indicating potential shortcomings in predicting negative cases.

References

  1. Logistic Regression:
    • Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
    • Agresti, A. (2002). Categorical data analysis. John Wiley & Sons.
  2. Receiver Operating Characteristic (ROC):
    • Fawcett, T. (2006). An introduction to ROC analysis. Pattern recognition letters, 27(8), 861-874.
    • Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical chemistry, 39(4), 561-577.