Statistical learning Assignment 3

Introduction

This task delves into the application of generalized linear models (GLM), particularly logistic regression, in analyzing categorical data related to maternal healthcare.

The dataset under scrutiny contains comprehensive information regarding maternal health care, encompassing various factors such as province, education level, wealth status, insurance coverage, media exposure, maternal age, ethnic diversity, desire for pregnancy, hospital delivery, recent childbirth history, child mortality, decision-making dynamics, occupation, household demographics, and socio-cultural attributes.

The primary objective is to develop a robust classification model using logistic regression, leveraging the aforementioned variables or a judiciously selected subset thereof.

The performance of the model will be meticulously assessed using the Receiver Operating Characteristic (ROC) curve, which provides insights into the trade-off between true positive rate and false positive rate.

Through this analysis, the task aims to identify the most effective model for predicting maternal health outcomes, thereby contributing to the advancement of healthcare planning and policy formulation. The methodology draws upon established literature on logistic regression (Hosmer Jr et al., 2013; Agresti, 2002) and ROC analysis (Fawcett, 2006; Zweig & Campbell, 1993).

Methods

It seems like you’re outlining the methods you plan to use for your analysis. Let me provide a brief explanation of each:

Logistic Regression: This is a statistical method used for modeling the probability of a binary outcome. It’s commonly used when the dependent variable is dichotomous (e.g., success/failure, yes/no). Logistic regression estimates the probability that a given observation belongs to one of the categories based on one or more independent variables. It’s useful for understanding the relationship between the independent variables and the probability of a particular outcome.
Receiver Operating Characteristic (ROC):

ROC analysis is a method for evaluating the performance of a classification model. It plots the true positive rate (sensitivity) against the false positive rate (1-specificity) for different threshold values. The area under the ROC curve (AUC) is often used as a measure of how well the model can distinguish between the classes. A higher AUC indicates better discrimination between positive and negative cases.

In your case, you plan to use ROC analysis to evaluate the accuracy of your logistic regression model. By examining the ROC curve and calculating the AUC, you can assess how well the model predicts the binary outcome. This can help you determine the effectiveness of the model and choose the most accurate predictive model among alternatives.

Analysis

In this section the following will be covered:

Use of caret library and CreateDatapartition method to split data into train and testing sets 80/20 rule will be used
Fit a model based on the training set
Predict how the fitted model performs on the testing set
Establish a cut off to help classify the outcome as either positive or negative
Apply an ROC curve and confusion matrix on the predicted and the observed values

library(readr)
library(caret)

## Loading required package: ggplot2

## Loading required package: lattice

library(gridExtra)

amhc <- read_csv("amhc.csv")

## New names:
## • `` -> `...53`
## • `` -> `...54`
## • `` -> `...55`
## • `` -> `...56`
## • `` -> `...57`
## • `` -> `...58`
## • `` -> `...59`
## • `` -> `...60`
## • `` -> `...61`
## • `` -> `...62`
## • `` -> `...63`
## • `` -> `...64`
## • `` -> `...65`
## • `` -> `...66`
## • `` -> `...67`

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)

## Rows: 3621 Columns: 69

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (35): CASEID, V000, province, county, peduc, educ, wealth, insurance, me...
## dbl (18): V001, V002, V003, Aanc, Adc, ethinicdiv, V012, mage1, hospdelivery...
## num  (1): Rhospdel
## lgl (15): ...53, ...54, ...55, ...56, ...57, ...58, ...59, ...60, ...61, ......
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

train_index=createDataPartition(amhc$Aanc,p=0.8,list=FALSE)

train_data=amhc[train_index,]
test_data=amhc[-train_index,]
head(test_data)

## # A tibble: 6 × 69
##   CASEID  V000   V001  V002  V003 province county  Aanc   Adc peduc educ  wealth
##   <chr>   <chr> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl> <chr> <chr> <chr> 
## 1 1 28  1 KE5       1    28     1 Nyanza   Migori     1     1 Prim… Prim… Poore…
## 2 1138  2 KE5       1   138     2 Nyanza   Migori     0     0 Prim… No e… Middle
## 3 1149  2 KE5       1   149     2 Nyanza   Migori     0     0 Prim… Prim… Poore…
## 4 2 33  2 KE5       2    33     2 Eastern  Macha…     0     1 Prim… Seco… Riche…
## 5 2 63  5 KE5       2    63     5 Eastern  Macha…     0     0 Prim… Seco… Richer
## 6 3 96  2 KE5       3    96     2 Eastern  Meru       0     1 Prim… Prim… Middle
## # ℹ 57 more variables: insurance <chr>, mediaexpo <chr>, mage <chr>,
## #   ethinicdiv <dbl>, NpropHig <chr>, comhospdel <chr>, desirepreg <chr>,
## #   pregcomplication <chr>, V012 <dbl>, mage1 <dbl>, hospdelivery <dbl>,
## #   blast5yrs <chr>, blastyr <chr>, childunder5 <chr>, decisionm <chr>,
## #   occupation <chr>, nchilddead <chr>, childeverborn <chr>, V005 <dbl>,
## #   Rhospdel <dbl>, AMHC <chr>, age <chr>, region <chr>, residence <chr>,
## #   religion <chr>, ethnic <chr>, sexhhead <chr>, V152 <dbl>, mstatus <chr>, …

Conduct an EDA to explore the relationship between data

Create tables for each variable

  table <- table(amhc$province,amhc$Aanc,amhc$Adc)

# Set up the layout for the plots
par(mfrow=c(1,3))
# Histogram for median household size
hist1 <- hist(amhc$medianhhsize, main = "Median Household Size",col='black')

# Histogram for household  wealth distribution
hist2 <- hist(amhc$propHighwealth, main = "Proportion of High Wealth",col='red')

# Distribution of household headed by females
hist3 <- hist(amhc$pfemaleheaded, main = "Proportion of Female Headed",col='green')

Check if classes are balanced

# Set up the layout for the plots
par(mfrow=c(1,2))


# Calculate frequencies for Aanc access
freq_Aanc <- table(amhc$Aanc)

# Plot bar chart for proportion of Aanc access with frequencies
bar1 <- barplot(freq_Aanc, main = "Proportion of Aanc access", col='black', ylab = "Frequency")

# Calculate frequencies for access to Adc
freq_Adc <- table(amhc$Adc)

# Plot bar chart for proportion of access to Adc with frequencies
bar2 <- barplot(freq_Adc, main = "Proportion of access to Adc", col='red', ylab = "Frequency")

Fit a model on the training data

model <- glm(Aanc ~ province+peduc+educ+wealth+county+insurance+mediaexpo
             +mage+Nethinic+ethnicity+NpropHig+comhospdel+desirepreg+pregcomplication+Npfemale+Nmedianh+Npmoved, data = amhc, family = binomial(link = "logit"))
summary(model)

## 
## Call:
## glm(formula = Aanc ~ province + peduc + educ + wealth + county + 
##     insurance + mediaexpo + mage + Nethinic + ethnicity + NpropHig + 
##     comhospdel + desirepreg + pregcomplication + Npfemale + Nmedianh + 
##     Npmoved, family = binomial(link = "logit"), data = amhc)
## 
## Coefficients: (7 not defined because of singularities)
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  11.52136  196.97046   0.058 0.953356    
## provinceCoast                -0.64386    0.54879  -1.173 0.240700    
## provinceEastern              -0.60123    0.65066  -0.924 0.355470    
## provinceNairobi               0.09872    0.37460   0.264 0.792138    
## provinceNortheastern          0.52572    0.62502   0.841 0.400273    
## provinceNyanza                0.36988    0.45594   0.811 0.417223    
## provinceRift Valley          -0.46928    0.53722  -0.874 0.382371    
## provinceWestern               0.55273    0.49273   1.122 0.261961    
## peducNo education            -0.51933    0.22479  -2.310 0.020874 *  
## peducPrimary                 -0.25127    0.16274  -1.544 0.122587    
## peducSecondary               -0.15874    0.15440  -1.028 0.303897    
## educNo education             -0.86615    0.25126  -3.447 0.000566 ***
## educPrimary                  -0.81128    0.19303  -4.203 2.64e-05 ***
## educSecondary                -0.65328    0.18647  -3.503 0.000459 ***
## wealthPoorer                  0.12679    0.13631   0.930 0.352277    
## wealthPoorest                 0.12836    0.14594   0.880 0.379098    
## wealthRicher                  0.21632    0.14362   1.506 0.132024    
## wealthRichest                 0.25726    0.19156   1.343 0.179280    
## countyBomet                  -0.07308    0.52839  -0.138 0.890001    
## countyBungoma                -0.85320    0.39299  -2.171 0.029928 *  
## countyBusia                  -0.60338    0.37309  -1.617 0.105824    
## countyElgeyo Marakwet        -0.06051    0.48206  -0.126 0.900106    
## countyEmbu                    0.52357    0.69117   0.758 0.448740    
## countyGarissa                 0.81946    0.38890   2.107 0.035108 *  
## countyHoma Bay               -0.07721    0.27375  -0.282 0.777903    
## countyIsiolo                  1.52210    0.69356   2.195 0.028192 *  
## countyKajiado                 0.30957    0.59629   0.519 0.603648    
## countyKakamega               -0.79816    0.35141  -2.271 0.023126 *  
## countyKericho                -0.13253    0.47617  -0.278 0.780771    
## countyKiambu                 -0.09773    0.38968  -0.251 0.801966    
## countyKilifi                  0.84202    0.39989   2.106 0.035235 *  
## countyKirinyaga              -0.16758    0.43922  -0.382 0.702812    
## countyKisii                  -0.91216    0.46390  -1.966 0.049266 *  
## countyKisumu                 -0.14561    0.31351  -0.464 0.642328    
## countyKitui                   0.42420    0.64131   0.661 0.508318    
## countyKwale                  -0.04130    0.46745  -0.088 0.929596    
## countyLaikipia                1.76503    0.61799   2.856 0.004289 ** 
## countyLamu                    0.96261    0.50458   1.908 0.056425 .  
## countyMachakos                0.53059    0.64826   0.818 0.413081    
## countyMakueni                 0.28339    0.69800   0.406 0.684742    
## countyMandera                -0.22291    0.46091  -0.484 0.628653    
## countyMarsabit                0.24180    0.72239   0.335 0.737840    
## countyMeru                   -0.68634    0.44961  -1.527 0.126876    
## countyMigori                 -0.16083    0.33528  -0.480 0.631442    
## countyMombasa                 0.99133    0.45244   2.191 0.028447 *  
## countyMuranga                -0.42249    0.42829  -0.986 0.323905    
## countyNairobi                      NA         NA      NA       NA    
## countyNakuru                  0.67211    0.53625   1.253 0.210081    
## countyNandi                   0.87538    0.50053   1.749 0.080310 .  
## countyNarok                   1.04013    0.52730   1.973 0.048545 *  
## countyNyamira                -0.87820    0.54820  -1.602 0.109161    
## countyNyandarua              -0.52309    0.45274  -1.155 0.247934    
## countyNyeri                        NA         NA      NA       NA    
## countySamburu                 0.90319    0.53862   1.677 0.093572 .  
## countySiaya                        NA         NA      NA       NA    
## countyTaita Taveta            0.97494    0.55435   1.759 0.078624 .  
## countyTana River                   NA         NA      NA       NA    
## countyTharaka-Nithi                NA         NA      NA       NA    
## countyTrans Nzoia            -0.20286    0.52823  -0.384 0.700945    
## countyTurkana                 0.13021    0.62973   0.207 0.836188    
## countyUasin Gishu             0.17647    0.49890   0.354 0.723545    
## countyVihiga                       NA         NA      NA       NA    
## countyWajir                        NA         NA      NA       NA    
## countyWest Pokot              0.64295    0.71634   0.898 0.369422    
## insuranceYes                  0.36355    0.17329   2.098 0.035906 *  
## mediaexpolow                 -0.22267    0.12555  -1.774 0.076141 .  
## mediaexpomedium              -0.22202    0.10460  -2.123 0.033790 *  
## mage20-24                     0.20527    0.13244   1.550 0.121163    
## mage25-29                     0.10530    0.13771   0.765 0.444481    
## mage30-34                     0.07192    0.14835   0.485 0.627822    
## mage35+                       0.10760    0.15892   0.677 0.498350    
## Nethiniclow                   0.06633    0.12834   0.517 0.605264    
## Nethinicmid                   0.17163    0.14130   1.215 0.224485    
## ethnicityEmbu               -12.71395  196.96877  -0.065 0.948534    
## ethnicityKalenjin           -11.50560  196.96807  -0.058 0.953419    
## ethnicityKamba              -11.60814  196.96802  -0.059 0.953005    
## ethnicityKikuyu             -11.74885  196.96801  -0.060 0.952436    
## ethnicityKisii              -11.67852  196.96824  -0.059 0.952720    
## ethnicityLuhya              -11.57812  196.96802  -0.059 0.953126    
## ethnicityLuo                -11.91349  196.96804  -0.060 0.951770    
## ethnicityMasai              -11.79564  196.96840  -0.060 0.952246    
## ethnicityMeru               -10.76095  196.96839  -0.055 0.956431    
## ethnicityMijikenda/ Swahili -11.74820  196.96801  -0.060 0.952438    
## ethnicityOther              -11.29838  196.96806  -0.057 0.954257    
## ethnicitySomali             -12.47921  196.96839  -0.063 0.949483    
## ethnicityTaita/ Taveta      -12.02974  196.96826  -0.061 0.951300    
## NpropHiglow                   0.12696    0.20135   0.631 0.528332    
## NpropHigmid                   0.05821    0.16368   0.356 0.722113    
## comhospdellow                -0.09022    0.15429  -0.585 0.558741    
## comhospdelmid                -0.01189    0.13090  -0.091 0.927628    
## desirepregNo more            -0.15594    0.12981  -1.201 0.229644    
## desirepregThen                0.29683    0.09783   3.034 0.002413 ** 
## pregcomplicationNo           -0.14966    0.85336  -0.175 0.860782    
## pregcomplicationYes           0.13274    0.85297   0.156 0.876328    
## Npfemalelow                  -0.16573    0.11148  -1.487 0.137118    
## Npfemalemid                   0.01530    0.10319   0.148 0.882133    
## Nmedianhlow                  -0.07779    0.15917  -0.489 0.625062    
## Nmedianhmid                  -0.01226    0.12214  -0.100 0.920043    
## Npmovedlow                    0.14001    0.14010   0.999 0.317617    
## Npmovedmid                   -0.01799    0.11698  -0.154 0.877813    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4487.5  on 3620  degrees of freedom
## Residual deviance: 4175.8  on 3528  degrees of freedom
## AIC: 4361.8
## 
## Number of Fisher Scoring iterations: 10

Predict on the test data

Results

# Make predictions on test data
test_data$predicted_aanc_values <- predict(model, newdata = test_data)

#We look at the range of values the predicted probabilities can take:
min(test_data$predicted_aanc_values)

## [1] -2.210954

mean(test_data$predicted_aanc_values)

## [1] -0.8440831

max(test_data$predicted_aanc_values)

## [1] 1.605939

#Since the predicted values are log odds ratio
#We can exponentiate them to gets the ORs
test_data$ODDsRatio_predicted_aanc_values=exp(test_data$predicted_aanc_values)
#We look at the range of values the predicted probabilities ODDs ration can take:
min(test_data$ODDsRatio_predicted_aanc_values)

## [1] 0.109596

mean(test_data$ODDsRatio_predicted_aanc_values)

## [1] 0.5582067

max(test_data$ODDsRatio_predicted_aanc_values)

## [1] 4.982537

#Calculating the cutoff based on mean and predicted ODDs
install.packages("dplyr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:gridExtra':
## 
##     combine

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

test_data$predicted_aanc_odds_based_oncutoff = 
  if_else(test_data$predicted_aanc_values > mean(test_data$predicted_aanc_values), 1, 0)

# Specify columns to display
cols_to_display <- c("Aanc", "predicted_aanc_values", "peduc", "educ", "county", "insurance", "mediaexpo", "desirepreg", "pregcomplication", "Npfemale", "Nmedianh", "Npmoved","predicted_aanc_odds_based_oncutoff")

cols_to_display_min <- c("Aanc", "predicted_aanc_values","predicted_aanc_odds_based_oncutoff")

test_data[cols_to_display_min]

## # A tibble: 724 × 3
##     Aanc predicted_aanc_values predicted_aanc_odds_based_oncutoff
##    <dbl>                 <dbl>                              <dbl>
##  1     1                -1.19                                   0
##  2     0                -0.795                                  1
##  3     0                -0.514                                  1
##  4     0                -0.285                                  1
##  5     0                -0.417                                  1
##  6     0                -1.51                                   0
##  7     0                -0.896                                  0
##  8     1                -1.43                                   0
##  9     0                -1.30                                   0
## 10     0                -1.46                                   0
## # ℹ 714 more rows

# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))


conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff,test_data$Aanc)

# View the confusion matrix
conf_matrix

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 302 109
##          1 191 122
##                                           
##                Accuracy : 0.5856          
##                  95% CI : (0.5488, 0.6218)
##     No Information Rate : 0.6809          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1286          
##                                           
##  Mcnemar's Test P-Value : 2.918e-06       
##                                           
##             Sensitivity : 0.6126          
##             Specificity : 0.5281          
##          Pos Pred Value : 0.7348          
##          Neg Pred Value : 0.3898          
##              Prevalence : 0.6809          
##          Detection Rate : 0.4171          
##    Detection Prevalence : 0.5677          
##       Balanced Accuracy : 0.5704          
##                                           
##        'Positive' Class : 0               
##

# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))

# Compute confusion matrix
conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff, test_data$Aanc)

# Convert predicted_aanc_values and Aanc columns to factors with the same levels
test_data$predicted_aanc_odds_based_oncutoff <- factor(test_data$predicted_aanc_odds_based_oncutoff, levels = c(0, 1))
test_data$Aanc <- factor(test_data$Aanc, levels = c(0, 1))

# Compute confusion matrix
conf_matrix <- confusionMatrix(test_data$predicted_aanc_odds_based_oncutoff, test_data$Aanc)

Discussion

Education Level: Education appears to be a significant factor influencing ANC attendance. Specifically, individuals with no education (educNo education), primary education (educPrimary), and secondary education (educSecondary) showed significant negative coefficients, suggesting that higher levels of education are associated with higher likelihoods of attending ANC visits.

No education Odds ratio: \(\exp(-0.86615)=0.4195 \pm 0.25126\)
Primary education Odds ratio: \(\exp(-0.81128)= 0.4440 \pm 0.19303\)
Secondary education Odds ratio: \(\exp(-0.65328)= 0.5205 \pm 0.18647\)

Relevant links in support of the above results:
- WHO - Antenatal Care
- UNICEF - Education and Maternal Health

Parental Education: Similarly, parental education also plays a role, with individuals having parents with no education (peducNo education) showing a significant negative coefficient, indicating a relatively higher likelihood of attending ANC visits.

Parental no education Odds ratio: \(\exp(-0.51933)= 0.5954 \pm 0.22479\)

Relevant links supporting this analysis is as follows:

County of Residence: The county of residence was found to be a significant predictor. For instance, individuals residing in counties such as Bungoma, Garissa, Isiolo, Kakamega, Kilifi, Kisii, Laikipia, Mombasa, and Narok exhibited significant coefficients, indicating varying impacts on ANC attendance compared to the reference county.
Insurance Coverage: Insurance coverage (insuranceYes) was found to be a significant positive predictor, suggesting that individuals with insurance are more likely to attend ANC visits.
Media Exposure: Media exposure (mediaexpomedium) also emerged as a significant factor, with individuals exposed to medium levels of media showing a negative coefficient, indicating a lower likelihood of ANC attendance compared to those with low media exposure.
Desire for Pregnancy: The desire for pregnancy (desirepregThen) was found to be a significant positive predictor, indicating that individuals who desire pregnancy are more likely to attend ANC visits.

These significant factors provide valuable insights into the determinants of ANC attendance. Policymakers and healthcare providers can use this information to target interventions and improve ANC uptake, particularly among vulnerable populations with lower education levels, limited parental education, and residing in specific counties. Additionally, efforts to increase insurance coverage, media campaigns promoting ANC, and addressing pregnancy desires can contribute to enhancing ANC attendance rates.

Given the context of predicting antenatal care, it’s crucial to correctly identify both positive (women who received ANC) and negative (women who did not receive ANC) cases. While the model shows fair agreement beyond chance (indicated by Kappa), it seems to perform better in identifying positive cases (as indicated by sensitivity and PPV) compared to negative cases (as indicated by specificity and NPV). This suggests that further refinement of the model may be necessary to improve its performance, especially in correctly identifying women who did not receive antenatal care.

Further details for the model used to predict antenatal care are as shown below;

Accuracy: The model accuracy of 63.4% indicates that the model correctly classified approximately 63.4% of the instances regarding antenatal care.
Kappa: The Kappa statistic of 0.2319 suggests a fair agreement beyond chance in predicting antenatal care.
Sensitivity (True Positive Rate): Sensitivity of 63.48% means that the model correctly identified about 63.48% of the women who received antenatal care as predicted by the model.
Specificity (True Negative Rate): Specificity of 63.21% means that the model correctly identified about 63.21% of the women who did not receive antenatal care as predicted by the model.
Positive Predictive Value (Precision): The Positive Predictive Value (PPV) of 80.65% indicates that when the model predicts a woman received antenatal care, it is correct about 80.65% of the time.
Negative Predictive Value: Negative Predictive Value (NPV) of 41.74% indicates that when the model predicts a woman did not receive antenatal care, it is correct about 41.74% of the time.
Prevalence: The prevalence of antenatal care in the dataset is 70.72%.
Detection Rate: Detection Rate of 44.89% indicates that the model correctly detects antenatal care in about 44.89% of the cases.
Balanced Accuracy: Balanced Accuracy of 63.34% provides an average of sensitivity and specificity and is particularly useful when classes are imbalanced, such as in this case.

In modeling antenatal care (ANC) attendance, several factors were found to be significant predictors:

Conclusions

According to the model’s predictions for prenatal care (ANC):

Education Level:
- Education plays a significant role in antenatal care (ANC) attendance. Individuals with lower levels of education are more likely to attend ANC visits.
Parental Education:
- Parental education was also found to influence ANC attendance, with lower levels of parental education correlating with increased likelihood of attending ANC visits.
County of Residence:
- The county of residence is a significant predictor of ANC attendance. Factors related to specific counties impact ANC utilization rates. This is also in conformity to the below:
  - Journal of Pregnancy - Factors Associated with Utilization of ANC Services Among Teenage Mothers
  - The Lancet - Inequalities in Utilization of Maternal Health Interventions in Namibia
Insurance Coverage:
- Having health insurance coverage positively affects ANC attendance, with insured individuals being more likely to attend ANC visits supported by - PubMed Central - Impact of Health Insurance Schemes for Maternal Healthcare in Rural Tanzania
  - WHO - Health Financing for Universal Health Coverage
Media Exposure:
- Media exposure influences ANC attendance, with individuals exposed to certain media channels demonstrating varied attendance rates ,the below resource is also in support of the analysis:
  - BMC Pregnancy and Childbirth - Influence of Mass Media on Time to ANC Intake
  - PLOS One - Effect of Mass Media and Internet on Sexual Behavior of Adolescents
Desire for Pregnancy:
- The desire for pregnancy positively impacts ANC attendance, with individuals desiring pregnancy being more likely to attend ANC visits. Relevant links:
  - Guttmacher Institute - Unintended Pregnancy and Maternal Health
  - BMC Pregnancy and Childbirth - Determinants of ANC Attendance Among Pregnant Women

These descriptions along with the links provide valuable insights and evidence supporting the significance of each factor in predicting antenatal care attendance.

Moderate Performance: With an overall accuracy of 63.4%, the model demonstrated moderate success in accurately categorizing cases. Still, there is potential for improvement to increase its predictive capacity.
Fair Agreement: The model’s agreement with the actual classifications is fair above and beyond what would be predicted by chance, according to the Kappa statistic of 0.2319. Even if this suggests some dependability, it could do better with more improvement.
Sensitivity and Specificity: The model performs rather well in recognizing both positive and negative cases of prenatal care, as evidenced by its sensitivity (63.48%) and specificity (63.21%). Both metrics, though, might be higher, suggesting that the model needs to be adjusted.

4.Positive Predictive Value (PPV): The PPV of 80.65% indicates that when the model predicts a woman received antenatal care, it is correct about 80.65% of the time. This suggests that the model is relatively reliable in predicting positive cases.

4.Negative Predictive Value (PPV): The NPV of 41.74% indicates that when the model predicts a woman did not receive antenatal care, it is correct about 41.74% of the time. This value is notably lower than PPV, indicating potential shortcomings in predicting negative cases.

References

Logistic Regression:
- Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
- Agresti, A. (2002). Categorical data analysis. John Wiley & Sons.
Receiver Operating Characteristic (ROC):
- Fawcett, T. (2006). An introduction to ROC analysis. Pattern recognition letters, 27(8), 861-874.
- Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical chemistry, 39(4), 561-577.