1. Introduction

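This project analyzes the impact of COVID-19 on children's education using a one-time survey of 4,436 respondents. It combines descriptive statistics, cross-tabulations and logistic regression to describe patterns of school non-return (dropout) and to identify the factors most strongly associated with them.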


2. Load Data

data <- read.csv("C:/Users/micha/OneDrive/Desktop/open_one_time_covid_education_impact.csv")
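The path above is specific to one machine, so the script will not run elsewhere without editing it. Below is a minimal sketch of a more portable loader. It assumes the CSV sits in the working directory or that its path is passed in explicitly. `load_survey()` is a hypothetical helper name, not part of any package, and the small file in the usage lines holds toy values, not survey data.

```r
# Hypothetical helper: read the survey CSV with an explicit existence check.
load_survey <- function(path = "open_one_time_covid_education_impact.csv") {
  if (!file.exists(path)) {
    stop("Survey file not found: ", path,
         " (set the working directory or pass the full path)")
  }
  # Keep text columns as character; factors are created explicitly later.
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage with a throwaway file (toy values, not survey data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(gender = c("Female", "Male"),
                     do_children_have_internet_connection = c(1L, 0L)),
          tmp, row.names = FALSE)
toy <- load_survey(tmp)
str(toy)
```

Failing early with a clear message is easier to diagnose than the generic "cannot open file" error from read.csv().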

3. Basic Data Exploration

str(data)
## 'data.frame':    4436 obs. of  27 variables:
##  $ submission_id                                                         : num  4.57e+15 6.44e+15 5.00e+15 5.52e+15 5.03e+15 ...
##  $ submission_date                                                       : chr  "2021-03-17" "2021-03-29" "2021-03-18" "2021-03-24" ...
##  $ gender                                                                : chr  "Female" "Male" "Female" "Male" ...
##  $ age                                                                   : chr  "Over 45 years old" "26 to 35 years old" "26 to 35 years old" "36 to 45 years old" ...
##  $ geography                                                             : chr  "Suburban/Peri-urban" "Suburban/Peri-urban" "City center or metropolitan area" "Suburban/Peri-urban" ...
##  $ financial_situation                                                   : chr  "I can afford food and regular expenses, but nothing else" "I cannot afford enough food for my family" "I can comfortably afford food, clothes, and furniture, and I have savings" "I can afford food, but nothing else" ...
##  $ education                                                             : chr  "University or college degree completed" "University or college degree completed" "University or college degree completed" "University or college degree completed" ...
##  $ employment_status                                                     : chr  "I am unemployed" "I am unemployed" "I work full-time, either as an employee or self-employed" "I work full-time, either as an employee or self-employed" ...
##  $ submission_state                                                      : chr  "Miranda" "Miranda" "Miranda" "Miranda" ...
##  $ are_there_children_0_to_2_yrs_out_of_educational_system               : int  0 0 1 0 0 0 0 0 0 1 ...
##  $ were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school       : int  1 1 1 0 1 0 1 0 0 1 ...
##  $ are_there_children_who_stopped_enrolling_in_primary_education         : int  1 0 1 0 0 1 0 0 0 0 ...
##  $ are_there_children_who_stopped_enrolling_in_secondary_education       : int  0 0 1 0 0 1 0 0 0 0 ...
##  $ are_children_attending_face_to_face_classes                           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ can_children_observe_deterioration_of_basic_services_of_school        : int  1 1 1 1 1 0 1 1 1 1 ...
##  $ do_children_3_and_17_yrs_receive_regular_school_meals                 : chr  "Every day" "No" "No" "No" ...
##  $ are_there_teachers_at_scheduled_class_hours                           : chr  "Irregularly" "Irregularly" "There are not enough" "There are enough" ...
##  $ are_children_3_to_17_yrs_dealing_with_irregular_school_activity       : int  0 1 1 1 1 0 1 1 0 0 ...
##  $ are_children_being_teached_by_unqualified_people                      : int  0 0 1 1 0 1 0 0 1 0 ...
##  $ did_teachers_leave_the_educational_system                             : int  0 1 1 1 1 1 0 1 1 0 ...
##  $ do_school_and_the_teachers_have_internet_connection                   : int  1 0 0 0 0 1 1 0 1 1 ...
##  $ do_children_have_internet_connection                                  : int  1 1 1 1 1 0 1 0 0 1 ...
##  $ do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity : int  0 1 0 0 1 0 0 1 1 0 ...
##  $ does_home_shows_severe_deficit_of_electricity                         : int  0 0 1 0 0 0 0 0 0 1 ...
##  $ does_home_shows_severe_deficit_of_internet                            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ do_children_3_to_17_yrs_miss_class_or_in_lower_grade                  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ are_children_promoted_with_a_modality_different_from_formal_evaluation: int  0 0 1 0 1 1 0 0 1 0 ...
summary(data)
##  submission_id       submission_date       gender              age           
##  Min.   :4.504e+15   Length:4436        Length:4436        Length:4436       
##  1st Qu.:5.077e+15   Class :character   Class :character   Class :character  
##  Median :5.642e+15   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :5.633e+15                                                           
##  3rd Qu.:6.188e+15                                                           
##  Max.   :6.755e+15                                                           
##   geography         financial_situation  education         employment_status 
##  Length:4436        Length:4436         Length:4436        Length:4436       
##  Class :character   Class :character    Class :character   Class :character  
##  Mode  :character   Mode  :character    Mode  :character   Mode  :character  
##                                                                              
##                                                                              
##                                                                              
##  submission_state   are_there_children_0_to_2_yrs_out_of_educational_system
##  Length:4436        Min.   :0.0000                                         
##  Class :character   1st Qu.:0.0000                                         
##  Mode  :character   Median :0.0000                                         
##                     Mean   :0.2949                                         
##                     3rd Qu.:1.0000                                         
##                     Max.   :1.0000                                         
##  were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school
##  Min.   :0.0000                                                 
##  1st Qu.:0.0000                                                 
##  Median :1.0000                                                 
##  Mean   :0.6132                                                 
##  3rd Qu.:1.0000                                                 
##  Max.   :1.0000                                                 
##  are_there_children_who_stopped_enrolling_in_primary_education
##  Min.   :0.0000                                               
##  1st Qu.:0.0000                                               
##  Median :0.0000                                               
##  Mean   :0.2065                                               
##  3rd Qu.:0.0000                                               
##  Max.   :1.0000                                               
##  are_there_children_who_stopped_enrolling_in_secondary_education
##  Min.   :0.0000                                                 
##  1st Qu.:0.0000                                                 
##  Median :0.0000                                                 
##  Mean   :0.1943                                                 
##  3rd Qu.:0.0000                                                 
##  Max.   :1.0000                                                 
##  are_children_attending_face_to_face_classes
##  Min.   :0.0000                             
##  1st Qu.:0.0000                             
##  Median :0.0000                             
##  Mean   :0.1637                             
##  3rd Qu.:0.0000                             
##  Max.   :1.0000                             
##  can_children_observe_deterioration_of_basic_services_of_school
##  Min.   :0.0000                                                
##  1st Qu.:1.0000                                                
##  Median :1.0000                                                
##  Mean   :0.8005                                                
##  3rd Qu.:1.0000                                                
##  Max.   :1.0000                                                
##  do_children_3_and_17_yrs_receive_regular_school_meals
##  Length:4436                                          
##  Class :character                                     
##  Mode  :character                                     
##                                                       
##                                                       
##                                                       
##  are_there_teachers_at_scheduled_class_hours
##  Length:4436                                
##  Class :character                           
##  Mode  :character                           
##                                             
##                                             
##                                             
##  are_children_3_to_17_yrs_dealing_with_irregular_school_activity
##  Min.   :0.0000                                                 
##  1st Qu.:0.0000                                                 
##  Median :1.0000                                                 
##  Mean   :0.6431                                                 
##  3rd Qu.:1.0000                                                 
##  Max.   :1.0000                                                 
##  are_children_being_teached_by_unqualified_people
##  Min.   :0.0000                                  
##  1st Qu.:0.0000                                  
##  Median :0.0000                                  
##  Mean   :0.3165                                  
##  3rd Qu.:1.0000                                  
##  Max.   :1.0000                                  
##  did_teachers_leave_the_educational_system
##  Min.   :0.0000                           
##  1st Qu.:0.0000                           
##  Median :1.0000                           
##  Mean   :0.6643                           
##  3rd Qu.:1.0000                           
##  Max.   :1.0000                           
##  do_school_and_the_teachers_have_internet_connection
##  Min.   :0.0000                                     
##  1st Qu.:0.0000                                     
##  Median :1.0000                                     
##  Mean   :0.5604                                     
##  3rd Qu.:1.0000                                     
##  Max.   :1.0000                                     
##  do_children_have_internet_connection
##  Min.   :0.0000                      
##  1st Qu.:0.0000                      
##  Median :1.0000                      
##  Mean   :0.6285                      
##  3rd Qu.:1.0000                      
##  Max.   :1.0000                      
##  do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity
##  Min.   :0.0000                                                       
##  1st Qu.:0.0000                                                       
##  Median :1.0000                                                       
##  Mean   :0.6655                                                       
##  3rd Qu.:1.0000                                                       
##  Max.   :1.0000                                                       
##  does_home_shows_severe_deficit_of_electricity
##  Min.   :0.0000                               
##  1st Qu.:0.0000                               
##  Median :0.0000                               
##  Mean   :0.2845                               
##  3rd Qu.:1.0000                               
##  Max.   :1.0000                               
##  does_home_shows_severe_deficit_of_internet
##  Min.   :0.0000                            
##  1st Qu.:0.0000                            
##  Median :1.0000                            
##  Mean   :0.5791                            
##  3rd Qu.:1.0000                            
##  Max.   :1.0000                            
##  do_children_3_to_17_yrs_miss_class_or_in_lower_grade
##  Min.   :0.0000                                      
##  1st Qu.:0.0000                                      
##  Median :0.0000                                      
##  Mean   :0.2464                                      
##  3rd Qu.:0.0000                                      
##  Max.   :1.0000                                      
##  are_children_promoted_with_a_modality_different_from_formal_evaluation
##  Min.   :0.0000                                                        
##  1st Qu.:0.0000                                                        
##  Median :0.0000                                                        
##  Mean   :0.4272                                                        
##  3rd Qu.:1.0000                                                        
##  Max.   :1.0000
colnames(data)
##  [1] "submission_id"                                                         
##  [2] "submission_date"                                                       
##  [3] "gender"                                                                
##  [4] "age"                                                                   
##  [5] "geography"                                                             
##  [6] "financial_situation"                                                   
##  [7] "education"                                                             
##  [8] "employment_status"                                                     
##  [9] "submission_state"                                                      
## [10] "are_there_children_0_to_2_yrs_out_of_educational_system"               
## [11] "were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school"       
## [12] "are_there_children_who_stopped_enrolling_in_primary_education"         
## [13] "are_there_children_who_stopped_enrolling_in_secondary_education"       
## [14] "are_children_attending_face_to_face_classes"                           
## [15] "can_children_observe_deterioration_of_basic_services_of_school"        
## [16] "do_children_3_and_17_yrs_receive_regular_school_meals"                 
## [17] "are_there_teachers_at_scheduled_class_hours"                           
## [18] "are_children_3_to_17_yrs_dealing_with_irregular_school_activity"       
## [19] "are_children_being_teached_by_unqualified_people"                      
## [20] "did_teachers_leave_the_educational_system"                             
## [21] "do_school_and_the_teachers_have_internet_connection"                   
## [22] "do_children_have_internet_connection"                                  
## [23] "do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity" 
## [24] "does_home_shows_severe_deficit_of_electricity"                         
## [25] "does_home_shows_severe_deficit_of_internet"                            
## [26] "do_children_3_to_17_yrs_miss_class_or_in_lower_grade"                  
## [27] "are_children_promoted_with_a_modality_different_from_formal_evaluation"
dim(data)
## [1] 4436   27
sum(is.na(data))
## [1] 0
colSums(is.na(data))
##                                                          submission_id 
##                                                                      0 
##                                                        submission_date 
##                                                                      0 
##                                                                 gender 
##                                                                      0 
##                                                                    age 
##                                                                      0 
##                                                              geography 
##                                                                      0 
##                                                    financial_situation 
##                                                                      0 
##                                                              education 
##                                                                      0 
##                                                      employment_status 
##                                                                      0 
##                                                       submission_state 
##                                                                      0 
##                are_there_children_0_to_2_yrs_out_of_educational_system 
##                                                                      0 
##        were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school 
##                                                                      0 
##          are_there_children_who_stopped_enrolling_in_primary_education 
##                                                                      0 
##        are_there_children_who_stopped_enrolling_in_secondary_education 
##                                                                      0 
##                            are_children_attending_face_to_face_classes 
##                                                                      0 
##         can_children_observe_deterioration_of_basic_services_of_school 
##                                                                      0 
##                  do_children_3_and_17_yrs_receive_regular_school_meals 
##                                                                      0 
##                            are_there_teachers_at_scheduled_class_hours 
##                                                                      0 
##        are_children_3_to_17_yrs_dealing_with_irregular_school_activity 
##                                                                      0 
##                       are_children_being_teached_by_unqualified_people 
##                                                                      0 
##                              did_teachers_leave_the_educational_system 
##                                                                      0 
##                    do_school_and_the_teachers_have_internet_connection 
##                                                                      0 
##                                   do_children_have_internet_connection 
##                                                                      0 
##  do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity 
##                                                                      0 
##                          does_home_shows_severe_deficit_of_electricity 
##                                                                      0 
##                             does_home_shows_severe_deficit_of_internet 
##                                                                      0 
##                   do_children_3_to_17_yrs_miss_class_or_in_lower_grade 
##                                                                      0 
## are_children_promoted_with_a_modality_different_from_formal_evaluation 
##                                                                      0

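Insight: The dataset has 4,436 responses and 27 variables. These are a numeric submission ID, 10 character columns and 16 binary (0/1) indicators. The character columns cover respondent demographics, submission date and state, and two categorical items on school meals and teacher availability. is.na() finds no missing values. However, several categorical columns (age, geography, financial_situation) record non-responses as the strings "Not Available" or "Prefer not to answer"; financial_situation alone has 239 "Prefer not to answer" answers. These placeholders still need handling during preprocessing.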
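Because is.na() only detects true NA values, placeholder answers have to be found separately. Below is a minimal sketch, assuming "Not Available" and "Prefer not to answer" are the only placeholder strings. Whether "Prefer not to answer" should count as missing is an analysis choice. `count_pseudo_missing()` and `recode_pseudo_missing()` are hypothetical helper names, and the toy rows are illustrative, not survey data.

```r
# Placeholder answers assumed to mean "missing"
pseudo_missing <- c("Not Available", "Prefer not to answer")

# Count placeholder answers in every character column
count_pseudo_missing <- function(df, tokens = pseudo_missing) {
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  vapply(df[chr_cols], function(x) sum(x %in% tokens), integer(1))
}

# Replace placeholder answers with NA in every character column
recode_pseudo_missing <- function(df, tokens = pseudo_missing) {
  chr_cols <- vapply(df, is.character, logical(1))
  df[chr_cols] <- lapply(df[chr_cols],
                         function(x) replace(x, x %in% tokens, NA))
  df
}

# Toy rows (illustrative only, not taken from the survey)
toy <- data.frame(
  geography = c("Rural", "Not Available", "Suburban/Peri-urban"),
  financial_situation = c("Prefer not to answer", "Not Available",
                          "I can afford food, but nothing else"),
  do_children_have_internet_connection = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)
count_pseudo_missing(toy)   # geography 1, financial_situation 2
clean <- recode_pseudo_missing(toy)
colSums(is.na(clean))       # the same answers now count as NA
```

On the survey, you could run recode_pseudo_missing(data) before the factor conversions in section 4. The placeholders would then become NA, and glm() (with its default na.action) would drop those rows instead of estimating a separate coefficient for them.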


4. Data Preparation

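# Convert demographic character columns to factors (categorical variables).
# financial_situation is converted later, as financial_status, in section 7.
data$gender <- as.factor(data$gender)
data$age <- as.factor(data$age)
data$geography <- as.factor(data$geography)
data$education <- as.factor(data$education)
data$employment_status <- as.factor(data$employment_status)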

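Insight: The demographic variables are converted to factors so that their categories are handled explicitly. When a factor enters a glm() formula, R represents it as dummy variables contrasted against a reference level. By default, that is the first level in sorted order.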
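The default reference level is not always the most interpretable baseline. For financial_situation, the first sorted level is "I can afford food and regular expenses, but nothing else". Below is a sketch of choosing a different reference with base R's relevel(). Using "I cannot afford enough food for my family" as the baseline is an assumption made for illustration, and the toy vector is not survey data.

```r
# Toy responses (illustrative only)
fin <- factor(c("I can afford food, but nothing else",
                "I cannot afford enough food for my family",
                "I can afford food and regular expenses, but nothing else",
                "I cannot afford enough food for my family"))
levels(fin)[1]   # default reference: first level in sorted order

# Make the most deprived group the reference category
fin2 <- relevel(fin, ref = "I cannot afford enough food for my family")
levels(fin2)[1]

# The design matrix has an intercept plus one dummy per non-reference level
mm <- model.matrix(~ fin2)
colnames(mm)
```

Changing the reference level does not change the model's fit. It only changes which group each coefficient is compared against.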


5. Descriptive Analysis

mean(data$do_children_have_internet_connection) * 100
## [1] 62.84941
table(data$do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity)
## 
##    0    1 
## 1484 2952
table(data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
## 
##    0    1 
## 1716 2720
mean(data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school) * 100
## [1] 61.3165
table(data$are_children_attending_face_to_face_classes)
## 
##    0    1 
## 3710  726
table(data$age, data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##                     
##                        0   1
##   16 to 25 years old 549 753
##   26 to 35 years old 464 840
##   36 to 45 years old 413 674
##   Not Available        1   2
##   Over 45 years old  289 450
##   Under 16             0   1
table(data$financial_situation)
## 
##                        I can afford food and regular expenses, but nothing else 
##                                                                            1060 
##                                             I can afford food, but nothing else 
##                                                                            1445 
##              I can afford food, regular expenses, and clothes, but nothing else 
##                                                                             244 
##       I can comfortably afford food, clothes, and furniture, and I have savings 
##                                                                             157 
## I can comfortably afford food, clothes, and furniture, but I don’t have savings 
##                                                                             127 
##                                       I cannot afford enough food for my family 
##                                                                            1163 
##                                                                   Not Available 
##                                                                               1 
##                                                            Prefer not to answer 
##                                                                             239

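Insight: Among the 4,436 respondents, 62.8% report that their children have an internet connection. Even so, 66.5% (2,952) report that children aged 3 to 17 miss virtual classes because of lack of electricity. In addition, 61.3% (2,720) report children who were enrolled but did not return to school, and only 16.4% (726) report children attending face-to-face classes. Non-return is reported somewhat more often by respondents aged 26 to 35 (64.4%) than by those aged 16 to 25 (57.8%). Financial hardship is widespread: 1,163 respondents cannot afford enough food, and a further 1,445 can afford food but nothing else (58.8% combined).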
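Raw counts are hard to compare across groups of different sizes, so row percentages are more informative. Below is a sketch using base R's prop.table() on the age-by-non-return counts copied from the output above. The "Not Available" (3 responses) and "Under 16" (1 response) rows are left out as too small to interpret. On the full data, prop.table(table(data$age, data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school), margin = 1) computes the same shares directly.

```r
# Counts copied from the age x non-return cross-tab above
# (rows: respondent age group; columns: 0 = returned, 1 = did not return)
age_tab <- matrix(c(549, 753,
                    464, 840,
                    413, 674,
                    289, 450),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(age = c("16 to 25", "26 to 35",
                                          "36 to 45", "Over 45"),
                                  did_not_return = c("0", "1")))

# Row percentages: share of each age group reporting non-return
pct <- round(100 * prop.table(age_tab, margin = 1), 1)
pct
```
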


6. Diagnostic Analysis

table(data$do_children_have_internet_connection,
      data$do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity)
##    
##        0    1
##   0  485 1163
##   1  999 1789
table(data$does_home_shows_severe_deficit_of_electricity,
      data$do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity)
##    
##        0    1
##   0 1270 1904
##   1  214 1048
table(data$financial_situation,
      data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##                                                                                  
##                                                                                     0
##   I can afford food and regular expenses, but nothing else                        442
##   I can afford food, but nothing else                                             530
##   I can afford food, regular expenses, and clothes, but nothing else               89
##   I can comfortably afford food, clothes, and furniture, and I have savings        56
##   I can comfortably afford food, clothes, and furniture, but I don’t have savings  54
##   I cannot afford enough food for my family                                       434
##   Not Available                                                                     1
##   Prefer not to answer                                                            110
##                                                                                  
##                                                                                     1
##   I can afford food and regular expenses, but nothing else                        618
##   I can afford food, but nothing else                                             915
##   I can afford food, regular expenses, and clothes, but nothing else              155
##   I can comfortably afford food, clothes, and furniture, and I have savings       101
##   I can comfortably afford food, clothes, and furniture, but I don’t have savings  73
##   I cannot afford enough food for my family                                       729
##   Not Available                                                                     0
##   Prefer not to answer                                                            129
table(data$does_home_shows_severe_deficit_of_internet,
      data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##    
##        0    1
##   0  828 1039
##   1  888 1681
table(data$geography,
      data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##                                   
##                                       0    1
##   City center or metropolitan area  748 1172
##   Not Available                       1    0
##   Rural                             406  735
##   Suburban/Peri-urban               561  813
table(data$are_children_3_to_17_yrs_dealing_with_irregular_school_activity,
      data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##    
##        0    1
##   0  748  835
##   1  968 1885
table(data$did_teachers_leave_the_educational_system,
      data$are_children_3_to_17_yrs_dealing_with_irregular_school_activity)
##    
##        0    1
##   0  934  555
##   1  649 2298

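Insight: These cross-tabulations show associations, not causal effects. The clearest pattern concerns electricity: children in homes with a severe electricity deficit miss virtual classes for lack of electricity far more often (83.0% vs 60.0%). Non-return to school is more common where homes report a severe internet deficit (65.4% vs 55.7%) and where children face irregular school activity (66.1% vs 52.7%). Irregular school activity, in turn, is much more frequent where teachers have left the education system (78.0% vs 37.3%). By contrast, children's own internet connection makes only a modest difference to missing class for lack of electricity (64.2% with internet vs 70.6% without). Non-return also varies little by geography (59.2% to 64.4%). Financial situation shows no consistent gradient: excluding the single "Not Available" response, non-return ranges from 54.0% to 64.3% across categories. The most comfortable group (with savings) is among the highest.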
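A difference in percentages does not by itself show whether it could plausibly arise by chance. Below is a sketch using base R's chisq.test() on the electricity-deficit table from this section, with counts copied from the output above. On the full data, chisq.test(table(x, y)) on the two columns gives the same result. A significant test still indicates association only, not causation.

```r
# Counts copied from the electricity-deficit x missed-virtual-class table above
elec_tab <- matrix(c(1270, 1904,
                      214, 1048),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(electricity_deficit = c("0", "1"),
                                   missed_virtual_class = c("0", "1")))

# Percentage missing virtual class within each electricity group
round(100 * prop.table(elec_tab, margin = 1), 1)

# Pearson chi-squared test of independence
# (for a 2x2 table, chisq.test applies Yates' continuity correction by default)
res <- chisq.test(elec_tab)
res
```

The same two lines can be applied to each of the other tables in this section to check which differences are distinguishable from noise.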


7. Predictive Analysis

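# Recode the 0/1 indicators as factors for use in glm().
data$internet_access <- as.factor(data$do_children_have_internet_connection)
# Caution: despite its name, return_to_school = 1 means children who were enrolled
# did NOT return to school, so model2 predicts non-return (dropout).
data$return_to_school <- as.factor(data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
data$irregular_activity <- as.factor(data$are_children_3_to_17_yrs_dealing_with_irregular_school_activity)
data$electricity_issue <- as.factor(data$does_home_shows_severe_deficit_of_electricity)
data$financial_status <- as.factor(data$financial_situation)

# Model 1: missing virtual classes for lack of electricity ~ internet access
model1 <- glm(do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity ~ internet_access,
              data = data, family = "binomial")

# Model 2: non-return to school ~ infrastructure, finances and geography.
# The "Not Available" levels of financial_status and geography each contain a single
# response; they yield an unstable estimate and an aliased (NA) coefficient, so
# consider dropping those rows before fitting.
model2 <- glm(return_to_school ~ internet_access + electricity_issue + financial_status + geography,
              data = data, family = "binomial")

# Model 3: irregular school activity ~ infrastructure and finances
model3 <- glm(irregular_activity ~ internet_access + electricity_issue + financial_status,
              data = data, family = "binomial")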

summary(model1)
## 
## Call:
## glm(formula = do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity ~ 
##     internet_access, family = "binomial", data = data)
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       0.87461    0.05405  16.181  < 2e-16 ***
## internet_access1 -0.29195    0.06695  -4.361 1.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5654.5  on 4435  degrees of freedom
## Residual deviance: 5635.3  on 4434  degrees of freedom
## AIC: 5639.3
## 
## Number of Fisher Scoring iterations: 4
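Logistic regression coefficients are on the log-odds scale, which is easier to read as odds ratios. Model 1 has a single binary predictor, so its coefficients can be reproduced by hand from the internet-by-missed-class table in section 6. Below is a sketch with the counts copied from that table.

```r
# Counts copied from the internet x missed-virtual-class table in section 6
# rows: children have internet (0/1); columns: missed class for lack of electricity (0/1)
net_tab <- matrix(c(485, 1163,
                    999, 1789),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(internet = c("0", "1"), missed = c("0", "1")))

odds <- net_tab[, "1"] / net_tab[, "0"]   # odds of missing class in each row
log(odds[["0"]])                          # equals the model1 intercept (log-odds without internet)
log_or <- log(odds[["1"]] / odds[["0"]])  # equals the internet_access1 coefficient
exp(log_or)                               # odds ratio
```

exp(log_or) is about 0.75. Respondents whose children have internet therefore report roughly 25% lower odds of missing virtual classes for lack of electricity (fitted rates 64.2% vs 70.6%). On the fitted object, exp(coef(model1)) returns the same odds ratio, and exp(confint.default(model1)) gives Wald 95% intervals.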
summary(model2)
## 
## Call:
## glm(formula = return_to_school ~ internet_access + electricity_issue + 
##     financial_status + geography, family = "binomial", data = data)
## 
## Coefficients: (1 not defined because of singularities)
##                                                                                                  Estimate
## (Intercept)                                                                                       0.17902
## internet_access1                                                                                  0.09999
## electricity_issue1                                                                                0.42152
## financial_statusI can afford food, but nothing else                                               0.18449
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                0.22088
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings         0.24278
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings  -0.06844
## financial_statusI cannot afford enough food for my family                                         0.13011
## financial_statusNot Available                                                                   -11.84506
## financial_statusPrefer not to answer                                                             -0.18498
## geographyNot Available                                                                                 NA
## geographyRural                                                                                    0.10113
## geographySuburban/Peri-urban                                                                     -0.07416
##                                                                                                 Std. Error
## (Intercept)                                                                                        0.08630
## internet_access1                                                                                   0.06518
## electricity_issue1                                                                                 0.07147
## financial_statusI can afford food, but nothing else                                                0.08388
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                 0.14751
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings          0.17919
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings    0.19128
## financial_statusI cannot afford enough food for my family                                          0.08908
## financial_statusNot Available                                                                    196.96769
## financial_statusPrefer not to answer                                                               0.14494
## geographyNot Available                                                                                  NA
## geographyRural                                                                                     0.08023
## geographySuburban/Peri-urban                                                                       0.07303
##                                                                                                 z value
## (Intercept)                                                                                       2.074
## internet_access1                                                                                  1.534
## electricity_issue1                                                                                5.898
## financial_statusI can afford food, but nothing else                                               2.199
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                1.497
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings         1.355
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings  -0.358
## financial_statusI cannot afford enough food for my family                                         1.461
## financial_statusNot Available                                                                    -0.060
## financial_statusPrefer not to answer                                                             -1.276
## geographyNot Available                                                                               NA
## geographyRural                                                                                    1.260
## geographySuburban/Peri-urban                                                                     -1.015
##                                                                                                 Pr(>|z|)
## (Intercept)                                                                                       0.0380
## internet_access1                                                                                  0.1250
## electricity_issue1                                                                              3.69e-09
## financial_statusI can afford food, but nothing else                                               0.0279
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                0.1343
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings         0.1755
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings   0.7205
## financial_statusI cannot afford enough food for my family                                         0.1441
## financial_statusNot Available                                                                     0.9520
## financial_statusPrefer not to answer                                                              0.2019
## geographyNot Available                                                                                NA
## geographyRural                                                                                    0.2075
## geographySuburban/Peri-urban                                                                      0.3099
##                                                                                                    
## (Intercept)                                                                                     *  
## internet_access1                                                                                   
## electricity_issue1                                                                              ***
## financial_statusI can afford food, but nothing else                                             *  
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                 
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings          
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings    
## financial_statusI cannot afford enough food for my family                                          
## financial_statusNot Available                                                                      
## financial_statusPrefer not to answer                                                               
## geographyNot Available                                                                             
## geographyRural                                                                                     
## geographySuburban/Peri-urban                                                                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5920.4  on 4435  degrees of freedom
## Residual deviance: 5861.5  on 4424  degrees of freedom
## AIC: 5885.5
## 
## Number of Fisher Scoring iterations: 10
summary(model3)
## 
## Call:
## glm(formula = irregular_activity ~ internet_access + electricity_issue + 
##     financial_status, family = "binomial", data = data)
## 
## Coefficients:
##                                                                                                   Estimate
## (Intercept)                                                                                       0.645181
## internet_access1                                                                                 -0.366553
## electricity_issue1                                                                                0.900427
## financial_statusI can afford food, but nothing else                                              -0.056385
## financial_statusI can afford food, regular expenses, and clothes, but nothing else               -0.084548
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings        -0.002933
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings  -0.323386
## financial_statusI cannot afford enough food for my family                                         0.023017
## financial_statusNot Available                                                                    11.287426
## financial_statusPrefer not to answer                                                             -0.361595
##                                                                                                 Std. Error
## (Intercept)                                                                                       0.082887
## internet_access1                                                                                  0.068379
## electricity_issue1                                                                                0.078209
## financial_statusI can afford food, but nothing else                                               0.086171
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                0.149326
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings         0.181161
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings   0.195118
## financial_statusI cannot afford enough food for my family                                         0.092080
## financial_statusNot Available                                                                   196.967691
## financial_statusPrefer not to answer                                                              0.147988
##                                                                                                 z value
## (Intercept)                                                                                       7.784
## internet_access1                                                                                 -5.361
## electricity_issue1                                                                               11.513
## financial_statusI can afford food, but nothing else                                              -0.654
## financial_statusI can afford food, regular expenses, and clothes, but nothing else               -0.566
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings        -0.016
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings  -1.657
## financial_statusI cannot afford enough food for my family                                         0.250
## financial_statusNot Available                                                                     0.057
## financial_statusPrefer not to answer                                                             -2.443
##                                                                                                 Pr(>|z|)
## (Intercept)                                                                                     7.04e-15
## internet_access1                                                                                8.29e-08
## electricity_issue1                                                                               < 2e-16
## financial_statusI can afford food, but nothing else                                               0.5129
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                0.5713
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings         0.9871
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings   0.0974
## financial_statusI cannot afford enough food for my family                                         0.8026
## financial_statusNot Available                                                                     0.9543
## financial_statusPrefer not to answer                                                              0.0145
##                                                                                                    
## (Intercept)                                                                                     ***
## internet_access1                                                                                ***
## electricity_issue1                                                                              ***
## financial_statusI can afford food, but nothing else                                                
## financial_statusI can afford food, regular expenses, and clothes, but nothing else                 
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings          
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings .  
## financial_statusI cannot afford enough food for my family                                          
## financial_statusNot Available                                                                      
## financial_statusPrefer not to answer                                                            *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5780.9  on 4435  degrees of freedom
## Residual deviance: 5575.1  on 4426  degrees of freedom
## AIC: 5595.1
## 
## Number of Fisher Scoring iterations: 10
# type = "response" returns fitted probabilities (not log-odds) for each student
data$dropout_risk <- predict(model2, type = "response")

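Insight: The fitted probabilities from model2 are stored as dropout_risk. Electricity problems are the strongest predictor in both models (z = 5.90 in model2 and 11.51 in model3). Two terms should not be interpreted: the financial_statusNot Available estimate in model3 (11.29 with a standard error of about 197) suggests (quasi-)separation in that group, and the NA coefficient for geographyNot Available in model2 means that term was aliased with other predictors and could not be estimated.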
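For readers new to glm(), the sketch below shows what predict(..., type = "response") returns: the linear predictor (log-odds) passed through the inverse logit, plogis(). It uses simulated data with hypothetical variables elec, net and y, unrelated to the survey, so the coefficients are illustrative only.

```r
# Minimal sketch on simulated data (hypothetical variables, not the survey).
set.seed(42)
n    <- 500
elec <- rbinom(n, 1, 0.4)                  # 1 = reported electricity problems
net  <- rbinom(n, 1, 0.6)                  # 1 = has internet access
eta  <- -0.2 + 0.9 * elec - 0.4 * net      # true log-odds used to simulate y
y    <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ elec + net, family = binomial)

# type = "link" gives log-odds; type = "response" gives probabilities.
log_odds <- predict(fit, type = "link")
prob     <- predict(fit, type = "response")

# The response scale is the inverse logit of the link scale (prints TRUE).
all.equal(unname(prob), unname(plogis(log_odds)))

# Exponentiated coefficients are odds ratios.
odds_ratios <- exp(coef(fit))
round(odds_ratios, 2)
```

The same relationship holds for model2, so dropout_risk can be read as plogis() of model2's linear predictor.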


8. Correlation Analysis

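# as.numeric() on a factor returns its level codes (1, 2, ...), not the labels,
# so financial_num follows the factor's level order rather than a financial ranking
data$internet_num <- as.numeric(data$internet_access)
data$return_num <- as.numeric(data$return_to_school)
data$electricity_num <- as.numeric(data$electricity_issue)
data$financial_num <- as.numeric(data$financial_status)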

cor_matrix <- cor(data[, c("internet_num","return_num","electricity_num","financial_num","dropout_risk")])
cor_matrix
##                 internet_num   return_num electricity_num financial_num
## internet_num     1.000000000  0.009096232     -0.11804453  -0.049153182
## return_num       0.009096232  1.000000000      0.09149783  -0.002601488
## electricity_num -0.118044528  0.091497829      1.00000000   0.050075172
## financial_num   -0.049153182 -0.002601488      0.05007517   1.000000000
## dropout_risk     0.079628692  0.114286379      0.80097447  -0.022773434
##                 dropout_risk
## internet_num      0.07962869
## return_num        0.11428638
## electricity_num   0.80097447
## financial_num    -0.02277343
## dropout_risk      1.00000000

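Insight: The matrix shows the strength and direction of linear relationships among the coded variables. The 0.80 correlation between dropout_risk and electricity_num is largely by construction, because dropout_risk is a fitted value from a model in which electricity_issue is the strongest predictor. All other correlations are weak (|r| ≤ 0.12), and correlations with financial_num should be read with caution because those codes follow factor-level order rather than a financial ranking.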
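The sketch below, on made-up vectors, illustrates why the coding matters. For a two-level factor with levels "0" and "1" the codes 1/2 are just the 0/1 values shifted by one, so a correlation is unaffected; for an unordered multi-level factor the codes depend on the arbitrary level order, and reordering the levels can flip the sign of the correlation. The variables flag, status and risk are hypothetical.

```r
# Hypothetical data: a 0/1 factor and an unordered categorical factor.
flag   <- factor(c("0", "1", "1", "0", "1"))
status <- factor(c("B", "A", "C", "A", "B"))
risk   <- c(0.55, 0.62, 0.70, 0.58, 0.65)

# as.numeric() on a factor returns level codes, not the labels.
codes_flag <- as.numeric(flag)                 # 1 2 2 1 2
vals_flag  <- as.numeric(as.character(flag))   # 0 1 1 0 1

# For an unordered factor the codes follow the (arbitrary) level order,
# so the "correlation" changes when the levels are reordered.
r_default   <- cor(as.numeric(status), risk)
status_rev  <- factor(status, levels = c("C", "B", "A"))
r_reordered <- cor(as.numeric(status_rev), risk)
c(default = r_default, reordered = r_reordered)  # same size, opposite sign
```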


9. Prescriptive Analysis

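# Row proportions (margin = 1): share of each return_to_school value within each internet_access group
prop.table(table(data$internet_access, data$return_to_school), 1)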
##    
##             0         1
##   0 0.3925971 0.6074029
##   1 0.3834290 0.6165710
high_risk_students <- data[data$dropout_risk > 0.7, ]
nrow(high_risk_students)
## [1] 381
# Mean predicted dropout risk for each financial_status x geography combination
aggregate(dropout_risk ~ financial_status + geography, data = data, mean)
##                                                                   financial_status
## 1                         I can afford food and regular expenses, but nothing else
## 2                                              I can afford food, but nothing else
## 3               I can afford food, regular expenses, and clothes, but nothing else
## 4        I can comfortably afford food, clothes, and furniture, and I have savings
## 5  I can comfortably afford food, clothes, and furniture, but I don’t have savings
## 6                                        I cannot afford enough food for my family
## 7                                                             Prefer not to answer
## 8                                                                    Not Available
## 9                         I can afford food and regular expenses, but nothing else
## 10                                             I can afford food, but nothing else
## 11              I can afford food, regular expenses, and clothes, but nothing else
## 12       I can comfortably afford food, clothes, and furniture, and I have savings
## 13 I can comfortably afford food, clothes, and furniture, but I don’t have savings
## 14                                       I cannot afford enough food for my family
## 15                                                            Prefer not to answer
## 16                        I can afford food and regular expenses, but nothing else
## 17                                             I can afford food, but nothing else
## 18              I can afford food, regular expenses, and clothes, but nothing else
## 19       I can comfortably afford food, clothes, and furniture, and I have savings
## 20 I can comfortably afford food, clothes, and furniture, but I don’t have savings
## 21                                       I cannot afford enough food for my family
## 22                                                            Prefer not to answer
##                           geography dropout_risk
## 1  City center or metropolitan area 5.856496e-01
## 2  City center or metropolitan area 6.288990e-01
## 3  City center or metropolitan area 6.378863e-01
## 4  City center or metropolitan area 6.459355e-01
## 5  City center or metropolitan area 5.726864e-01
## 6  City center or metropolitan area 6.237103e-01
## 7  City center or metropolitan area 5.373316e-01
## 8                     Not Available 9.482496e-06
## 9                             Rural 6.161654e-01
## 10                            Rural 6.597386e-01
## 11                            Rural 6.578040e-01
## 12                            Rural 6.699145e-01
## 13                            Rural 6.086975e-01
## 14                            Rural 6.499037e-01
## 15                            Rural 5.712774e-01
## 16              Suburban/Peri-urban 5.652749e-01
## 17              Suburban/Peri-urban 6.128347e-01
## 18              Suburban/Peri-urban 6.207245e-01
## 19              Suburban/Peri-urban 6.225738e-01
## 20              Suburban/Peri-urban 5.633299e-01
## 21              Suburban/Peri-urban 6.026952e-01
## 22              Suburban/Peri-urban 5.130942e-01

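Insight: 381 students have a predicted risk above 0.7 and are candidates for targeted interventions. Within every financial category, average predicted risk is highest for rural and lowest for suburban students, although the geography terms in model2 are not statistically significant (p = 0.21 and 0.31). The share with return_to_school = 1 is almost the same with and without internet access (0.617 vs 0.607). The near-zero average for the "Not Available" group comes from poorly determined Not Available terms in model2 (p = 0.95 for financial_statusNot Available), not from a genuinely low-risk group.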
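Group means can hide how many students actually cross a threshold. A minimal sketch of reporting both the mean risk and the share above a cutoff per group, assuming a data frame with hypothetical columns group and risk and the same 0.7 cutoff used above:

```r
# Hypothetical example data: predicted risk for students in three areas.
students <- data.frame(
  group = c("Rural", "Rural", "Rural", "City", "City", "Suburban", "Suburban"),
  risk  = c(0.72, 0.68, 0.75, 0.71, 0.60, 0.55, 0.66)
)

cutoff <- 0.7
students$high_risk <- as.integer(students$risk > cutoff)   # 1 if above the cutoff

# Mean risk and share of high-risk students per group (groups sorted by name).
summary_tbl <- aggregate(cbind(risk, high_risk) ~ group, data = students, FUN = mean)
names(summary_tbl)[3] <- "share_high_risk"
summary_tbl
```

With the project data, replacing group by geography and risk by dropout_risk would give a count-based view to set beside the averages above.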


10. Statistical Calculations

median(data$dropout_risk)
## [1] 0.6039146
# Mode: the most frequent return_to_school value
names(sort(table(data$return_to_school), decreasing = TRUE))[1]
## [1] "1"
var(data$dropout_risk)
## [1] 0.003095888
sd(data$dropout_risk)
## [1] 0.0556407
quantile(data$dropout_risk)
##           0%          25%          50%          75%         100% 
## 9.482496e-06 5.718382e-01 6.039146e-01 6.494608e-01 7.397042e-01
IQR(data$dropout_risk)
## [1] 0.0776226
quantile(data$dropout_risk, probs = c(0.25,0.5,0.75,0.9))
##       25%       50%       75%       90% 
## 0.5718382 0.6039146 0.6494608 0.6967302

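Insight: The median predicted risk is 0.604 and the middle 50% of students lie between 0.572 and 0.649 (IQR = 0.078), so predicted risk is tightly concentrated. The minimum of about 1e-05 belongs to the "Not Available" group, and the most common return_to_school value is 1.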
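Two of these summaries can be checked by hand: IQR() is the 75th percentile minus the 25th percentile (with R's default quantile type 7), which is why 0.6494608 − 0.5718382 = 0.0776226 matches the output, and the mode of a categorical variable is its most frequent value. A small sketch on made-up values:

```r
x <- c(0.52, 0.57, 0.60, 0.61, 0.64, 0.66, 0.74)   # made-up risk values

q <- quantile(x, probs = c(0.25, 0.75))   # default type = 7, same as IQR()
iqr_manual <- unname(q[2] - q[1])
all.equal(iqr_manual, IQR(x))             # TRUE

# Mode of a categorical variable: the value with the highest count in table().
answers  <- c("1", "0", "1", "1", "0")    # made-up return_to_school values
mode_val <- names(which.max(table(answers)))
mode_val
```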


11. Data Preprocessing Additions

data$financial_situation[is.na(data$financial_situation)] <- "Unknown"
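# internet_access is coded 0/1 (see the table in Section 9), so compare with "1", not "Yes"
data$internet_binary <- ifelse(as.character(data$internet_access) == "1", 1, 0)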
reduced_data <- data[, c("internet_binary","financial_num","dropout_risk")]

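Insight: Missing financial_situation values become an explicit "Unknown" category, internet access becomes a 0/1 indicator, and reduced_data keeps only three variables, a simple column-selection form of dimensionality reduction.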
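When a column may be coded either "0"/"1" or "No"/"Yes", a recode that accepts both avoids silently setting every row to 0. A minimal sketch with a hypothetical helper to_binary() and made-up values; note that it treats NA as 0, which is a design choice to revisit if missingness should stay visible:

```r
# Hypothetical helper: map common "yes" codings to 1 and everything else
# (including NA) to 0.
to_binary <- function(x, yes_values = c("1", "Yes", "yes")) {
  as.integer(as.character(x) %in% yes_values)
}

access <- factor(c("0", "1", "1", "0"))
to_binary(access)               # 0 1 1 0
to_binary(c("Yes", "No", NA))   # 1 0 0

# Replace missing labels with an explicit category before tabulating.
fin <- c("Prefer not to answer", NA, "I can afford food, but nothing else")
fin[is.na(fin)] <- "Unknown"
table(fin)
```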


12. Visualizations (12 Total)

1–2 Bar Charts

ggplot(data, aes(x = internet_access, fill = internet_access)) +
  geom_bar() + theme_minimal()

ggplot(data, aes(x = return_to_school, fill = return_to_school)) +
  geom_bar() + theme_minimal()

3 Boxplot

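# Long financial_status labels read better on the vertical axis; the fill legend
# repeats the axis labels, so it is hidden
ggplot(data, aes(x = financial_status, y = dropout_risk, fill = financial_status)) +
  geom_boxplot() + coord_flip() + theme_minimal() +
  theme(legend.position = "none")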

4 Scatter + Regression

# financial_num holds factor level codes, so the fitted line is descriptive only
ggplot(data, aes(x = financial_num, y = dropout_risk)) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

5 Pie Chart

pie(table(data$internet_access),
    main = "Internet Access Distribution")

6 CDF

plot(ecdf(data$dropout_risk),
     main = "CDF of Dropout Risk",
     sub = "Share of students at or below each risk level")

7 Pair Plot

ggpairs(data[, c("internet_num","financial_num","dropout_risk")])

8 Histogram

ggplot(data, aes(x = dropout_risk)) +
  geom_histogram(bins = 20, fill = "purple")

9 Density Plot

ggplot(data, aes(x = dropout_risk, fill = return_to_school)) +
  geom_density(alpha = 0.4)

10 Heatmap

heat <- as.data.frame(table(data$internet_access, data$return_to_school))
ggplot(heat, aes(Var1, Var2, fill = Freq)) +
  geom_tile() + geom_text(aes(label = Freq))

11 Facet Plot

ggplot(data, aes(x = internet_access, fill = return_to_school)) +
  geom_bar(position = "dodge") +
  facet_wrap(~ geography)

12 Model Plot

ggplot(data, aes(x = internet_access, y = dropout_risk)) +
  geom_jitter(alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", color = "red")


13. Moments

skewness(data$dropout_risk)
## [1] 0.003830186
kurtosis(data$dropout_risk)
## [1] 5.629831

Insight: Skewness measures the asymmetry of the dropout-risk distribution, and kurtosis measures how heavy its tails are, i.e. how common extreme values are.

14. Statistical Concepts Explanation (For Viva)

1. P-Value Interpretation

In the regression output:

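  • The p-value (Pr(>|z|)) measures how compatible the data are with the null hypothesis that a coefficient is zero.
  • If p < 0.05, the coefficient is statistically significant at the 5% level; this indicates an association with the outcome, not necessarily a causal effect.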

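Examples from the models:

  • electricity_issue1 has p < 0.001 in both model2 and model3
    → strong evidence that electricity problems are associated with the outcome (irregular activity in model3)

  • internet_access1 has p ≈ 0.0000129 in Model 1 and p ≈ 8.3e-08 in model3
    → Internet access is statistically significant in these models, though not in model2 (p = 0.125)

Conclusion: A lower p-value means stronger evidence against a zero coefficient; it does not by itself measure how large the effect on education outcomes is.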
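The z value and p-value in the summary are Wald statistics computed from the estimate and its standard error. The sketch recomputes them for electricity_issue1 in model3, using the Estimate (0.900427) and Std. Error (0.078209) printed above, and converts the estimate to an odds ratio with a Wald 95% interval. It is a hand check of the printed output; profile-likelihood intervals from confint() may differ slightly.

```r
# Values copied from the summary(model3) output for electricity_issue1.
est <- 0.900427
se  <- 0.078209

z <- est / se                  # Wald z statistic
p <- 2 * pnorm(-abs(z))        # two-sided p-value (printed as < 2e-16)

odds_ratio <- exp(est)         # multiplicative change in the odds
ci_95 <- exp(est + c(-1, 1) * qnorm(0.975) * se)   # Wald 95% interval

round(c(z = z, odds_ratio = odds_ratio, lower = ci_95[1], upper = ci_95[2]), 3)
```

This reproduces z ≈ 11.51 and gives an odds ratio of about 2.46: holding the other predictors fixed, reporting electricity problems goes with roughly 2.5 times the odds of irregular activity in model3.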


2. F-Statistic (Concept Explanation)

Although logistic regression uses z-values, the F-statistic is used in linear regression to test:

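  • Whether the model as a whole explains significantly more variation than an intercept-only model (that is, whether at least one slope coefficient is non-zero)

Formula: F = (mean square explained by the model) / (mean square error)

Interpretation:
  • Large F-value (small p-value) → strong evidence that the predictors, taken together, are related to the outcome
  • Small F-value → little evidence that the model improves on an intercept-only model
  • A significant F-test does not by itself mean the model fits well or predicts accurately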
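For linear regression the F-statistic can be written exactly as F = (SSR / p) / (SSE / (n − p − 1)), where SSR is the sum of squares explained by the model, SSE the residual sum of squares, p the number of predictors and n the number of observations. The sketch checks this against summary(lm(...))$fstatistic on simulated data; x1, x2 and y are hypothetical variables unrelated to the survey.

```r
set.seed(1)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.8 * x1 + rnorm(n)           # x2 has no real effect

fit <- lm(y ~ x1 + x2)
p   <- 2                                # number of predictors (excluding intercept)

ssr <- sum((fitted(fit) - mean(y))^2)   # variation explained by the model
sse <- sum(residuals(fit)^2)            # unexplained (residual) variation
f_manual <- (ssr / p) / (sse / (n - p - 1))

f_lm <- unname(summary(fit)$fstatistic["value"])
all.equal(f_manual, f_lm)               # TRUE
```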

In this project: we rely on z-values and p-values instead of an F-statistic, because the models are logistic regressions fitted with glm().
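The logistic-regression counterpart of the overall F-test is the likelihood-ratio (deviance) test: the drop from the null deviance to the residual deviance is approximately chi-squared, with degrees of freedom equal to the difference in degrees of freedom. The sketch plugs in the deviances printed for model2 and model3. McFadden's pseudo-R² (1 − residual deviance / null deviance) is added as a rough effect-size summary; it is a choice made here, not a statistic the original analysis reports.

```r
# Deviances and degrees of freedom copied from the summary() outputs above.
models <- data.frame(
  model     = c("model2", "model3"),
  null_dev  = c(5920.4, 5780.9),
  resid_dev = c(5861.5, 5575.1),
  null_df   = c(4435, 4435),
  resid_df  = c(4424, 4426)
)

models$lr_stat <- models$null_dev - models$resid_dev   # chi-squared statistic
models$lr_df   <- models$null_df - models$resid_df     # number of slope terms
models$p_value <- pchisq(models$lr_stat, df = models$lr_df, lower.tail = FALSE)
models$mcfadden_r2 <- 1 - models$resid_dev / models$null_dev

models[, c("model", "lr_stat", "lr_df", "p_value", "mcfadden_r2")]
```

Both tests are highly significant (58.9 on 11 df and 205.8 on 9 df), yet the pseudo-R² values are only about 0.010 and 0.036, so the predictors explain a small share of the deviance. With the fitted objects available, anova() comparing an intercept-only glm with the full model and test = "Chisq" should run the same test.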


3. Standard Deviation

From output:
  • SD of dropout_risk = 0.0556

Interpretation:
  • The SD measures how far predicted dropout probabilities typically lie from their mean
  • Small SD → values are closely grouped
  • Large SD → values are widely spread

In this dataset: predicted dropout risk varies little; the middle 50% of students fall between about 0.57 and 0.65 (the interquartile range).


4. Cumulative Distribution Function (CDF)

The CDF graph shows:

  • The probability that a value is less than or equal to a given value

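Example: the median predicted risk is 0.604, so the CDF at 0.604 is about 0.5
→ about 50% of students have a dropout risk ≤ 0.604

Interpretation in this project:
  • Shows the percentage of students at or below any given risk level
  • Helps set thresholds for identifying high-risk students: 1 − CDF(0.7) is the share above the 0.7 cutoff, 381 of 4,436 students (about 8.6%)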
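In R, ecdf() returns a function, so the empirical CDF can be evaluated at any threshold; its value equals the proportion of observations at or below that point. A sketch on made-up risk values (in the project, data$dropout_risk would take their place):

```r
risk <- c(0.45, 0.55, 0.58, 0.60, 0.62, 0.66, 0.71, 0.74)   # made-up values

F_risk <- ecdf(risk)     # F_risk is itself a function

F_risk(0.60)             # share of values <= 0.60, here 0.5
1 - F_risk(0.70)         # share of values above 0.70, here 0.25
mean(risk <= 0.62)       # same as F_risk(0.62)
```

Because 1 − F(t) counts values strictly above t, it matches a filter such as dropout_risk > 0.7 exactly.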


5. Variance

Variance = 0.003096

Interpretation:
  • Measures the spread of the data, in squared units of the variable
  • Low variance → data points are close to the mean
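var() and sd() use the sample (n − 1) denominator, and the SD is the square root of the variance, which is why the reported values agree (0.0556407² ≈ 0.0030959). Because the variance is in squared units, the SD is usually the easier of the two to interpret. A short check on made-up values:

```r
x <- c(0.55, 0.58, 0.60, 0.61, 0.66)   # made-up risk values
n <- length(x)

var_manual <- sum((x - mean(x))^2) / (n - 1)   # sample variance
sd_manual  <- sqrt(var_manual)                 # same units as x

all.equal(var_manual, var(x))   # TRUE
all.equal(sd_manual, sd(x))     # TRUE
```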


6. Skewness and Kurtosis

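  • Skewness ≈ 0.004 → close to 0, so the distribution shows little overall asymmetry
  • Kurtosis ≈ 5.63 → above the value for a normal distribution (3, or 0 if the function reports excess kurtosis), so the tails are heavier and extreme values are more common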
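Packages differ in how they define these moments (some report excess kurtosis, which subtracts 3 so that a normal distribution scores 0), and which convention the project's skewness() and kurtosis() follow depends on the package loaded earlier. A base-R sketch of the plain moment-ratio versions, with hypothetical helper names moment_skew() and moment_kurt():

```r
# Moment-based skewness: m3 / m2^(3/2), using population (1/n) moments.
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based kurtosis: m4 / m2^2 (about 3 for a normal distribution).
moment_kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

sym <- c(1, 2, 3, 4, 5)
moment_skew(sym)                 # 0: perfectly symmetric
moment_kurt(sym)                 # 1.7: lighter tails than a normal distribution

heavy <- c(rep(0, 20), -5, 5)    # mostly central values plus two extremes
moment_kurt(heavy)               # 11: far above 3, heavy tails
```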

Final Interpretation for Viva

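  • Electricity issues are the strongest predictor in both models
  • Internet access is associated with lower odds of irregular activity (model3), although return-to-school shares barely differ by internet access (0.617 vs 0.607)
  • Financial condition has limited impact: only one financial category reaches p < 0.05 in each model
  • Predicted dropout risk is tightly concentrated around 0.6 (SD ≈ 0.056), with a few extreme low values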

15. Conclusion

Final Recommendation: Improving digital access (together with reliable electricity, the strongest predictor in the models) and providing financial support may help reduce dropout rates and improve educational outcomes.