This project analyzes the impact of COVID-19 on student education, focusing on internet access, dropout rates, and financial conditions. The goal is to identify key factors contributing to educational disruption and predict dropout risk.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.3
library(moments)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.5.3
data <- read.csv("C:/Users/PANASHE/OneDrive/open_one_time_covid_education_impact.csv")
str(data)
## 'data.frame': 4436 obs. of 27 variables:
## $ submission_id : num 4.57e+15 6.44e+15 5.00e+15 5.52e+15 5.03e+15 ...
## $ submission_date : chr "2021-03-17" "2021-03-29" "2021-03-18" "2021-03-24" ...
## $ gender : chr "Female" "Male" "Female" "Male" ...
## $ age : chr "Over 45 years old" "26 to 35 years old" "26 to 35 years old" "36 to 45 years old" ...
## $ geography : chr "Suburban/Peri-urban" "Suburban/Peri-urban" "City center or metropolitan area" "Suburban/Peri-urban" ...
## $ financial_situation : chr "I can afford food and regular expenses, but nothing else" "I cannot afford enough food for my family" "I can comfortably afford food, clothes, and furniture, and I have savings" "I can afford food, but nothing else" ...
## $ education : chr "University or college degree completed" "University or college degree completed" "University or college degree completed" "University or college degree completed" ...
## $ employment_status : chr "I am unemployed" "I am unemployed" "I work full-time, either as an employee or self-employed" "I work full-time, either as an employee or self-employed" ...
## $ submission_state : chr "Miranda" "Miranda" "Miranda" "Miranda" ...
## $ are_there_children_0_to_2_yrs_out_of_educational_system : int 0 0 1 0 0 0 0 0 0 1 ...
## $ were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school : int 1 1 1 0 1 0 1 0 0 1 ...
## $ are_there_children_who_stopped_enrolling_in_primary_education : int 1 0 1 0 0 1 0 0 0 0 ...
## $ are_there_children_who_stopped_enrolling_in_secondary_education : int 0 0 1 0 0 1 0 0 0 0 ...
## $ are_children_attending_face_to_face_classes : int 0 0 0 0 0 0 0 0 0 0 ...
## $ can_children_observe_deterioration_of_basic_services_of_school : int 1 1 1 1 1 0 1 1 1 1 ...
## $ do_children_3_and_17_yrs_receive_regular_school_meals : chr "Every day" "No" "No" "No" ...
## $ are_there_teachers_at_scheduled_class_hours : chr "Irregularly" "Irregularly" "There are not enough" "There are enough" ...
## $ are_children_3_to_17_yrs_dealing_with_irregular_school_activity : int 0 1 1 1 1 0 1 1 0 0 ...
## $ are_children_being_teached_by_unqualified_people : int 0 0 1 1 0 1 0 0 1 0 ...
## $ did_teachers_leave_the_educational_system : int 0 1 1 1 1 1 0 1 1 0 ...
## $ do_school_and_the_teachers_have_internet_connection : int 1 0 0 0 0 1 1 0 1 1 ...
## $ do_children_have_internet_connection : int 1 1 1 1 1 0 1 0 0 1 ...
## $ do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity : int 0 1 0 0 1 0 0 1 1 0 ...
## $ does_home_shows_severe_deficit_of_electricity : int 0 0 1 0 0 0 0 0 0 1 ...
## $ does_home_shows_severe_deficit_of_internet : int 0 0 0 0 0 0 0 0 0 0 ...
## $ do_children_3_to_17_yrs_miss_class_or_in_lower_grade : int 0 0 0 0 0 0 0 0 0 0 ...
## $ are_children_promoted_with_a_modality_different_from_formal_evaluation: int 0 0 1 0 1 1 0 0 1 0 ...
summary(data)
## submission_id submission_date gender age
## Min. :4.504e+15 Length:4436 Length:4436 Length:4436
## 1st Qu.:5.077e+15 Class :character Class :character Class :character
## Median :5.642e+15 Mode :character Mode :character Mode :character
## Mean :5.633e+15
## 3rd Qu.:6.188e+15
## Max. :6.755e+15
## geography financial_situation education employment_status
## Length:4436 Length:4436 Length:4436 Length:4436
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## submission_state are_there_children_0_to_2_yrs_out_of_educational_system
## Length:4436 Min. :0.0000
## Class :character 1st Qu.:0.0000
## Mode :character Median :0.0000
## Mean :0.2949
## 3rd Qu.:1.0000
## Max. :1.0000
## were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6132
## 3rd Qu.:1.0000
## Max. :1.0000
## are_there_children_who_stopped_enrolling_in_primary_education
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2065
## 3rd Qu.:0.0000
## Max. :1.0000
## are_there_children_who_stopped_enrolling_in_secondary_education
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1943
## 3rd Qu.:0.0000
## Max. :1.0000
## are_children_attending_face_to_face_classes
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1637
## 3rd Qu.:0.0000
## Max. :1.0000
## can_children_observe_deterioration_of_basic_services_of_school
## Min. :0.0000
## 1st Qu.:1.0000
## Median :1.0000
## Mean :0.8005
## 3rd Qu.:1.0000
## Max. :1.0000
## do_children_3_and_17_yrs_receive_regular_school_meals
## Length:4436
## Class :character
## Mode :character
##
##
##
## are_there_teachers_at_scheduled_class_hours
## Length:4436
## Class :character
## Mode :character
##
##
##
## are_children_3_to_17_yrs_dealing_with_irregular_school_activity
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6431
## 3rd Qu.:1.0000
## Max. :1.0000
## are_children_being_teached_by_unqualified_people
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.3165
## 3rd Qu.:1.0000
## Max. :1.0000
## did_teachers_leave_the_educational_system
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6643
## 3rd Qu.:1.0000
## Max. :1.0000
## do_school_and_the_teachers_have_internet_connection
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5604
## 3rd Qu.:1.0000
## Max. :1.0000
## do_children_have_internet_connection
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6285
## 3rd Qu.:1.0000
## Max. :1.0000
## do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6655
## 3rd Qu.:1.0000
## Max. :1.0000
## does_home_shows_severe_deficit_of_electricity
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2845
## 3rd Qu.:1.0000
## Max. :1.0000
## does_home_shows_severe_deficit_of_internet
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5791
## 3rd Qu.:1.0000
## Max. :1.0000
## do_children_3_to_17_yrs_miss_class_or_in_lower_grade
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2464
## 3rd Qu.:0.0000
## Max. :1.0000
## are_children_promoted_with_a_modality_different_from_formal_evaluation
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4272
## 3rd Qu.:1.0000
## Max. :1.0000
colnames(data)
## [1] "submission_id"
## [2] "submission_date"
## [3] "gender"
## [4] "age"
## [5] "geography"
## [6] "financial_situation"
## [7] "education"
## [8] "employment_status"
## [9] "submission_state"
## [10] "are_there_children_0_to_2_yrs_out_of_educational_system"
## [11] "were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school"
## [12] "are_there_children_who_stopped_enrolling_in_primary_education"
## [13] "are_there_children_who_stopped_enrolling_in_secondary_education"
## [14] "are_children_attending_face_to_face_classes"
## [15] "can_children_observe_deterioration_of_basic_services_of_school"
## [16] "do_children_3_and_17_yrs_receive_regular_school_meals"
## [17] "are_there_teachers_at_scheduled_class_hours"
## [18] "are_children_3_to_17_yrs_dealing_with_irregular_school_activity"
## [19] "are_children_being_teached_by_unqualified_people"
## [20] "did_teachers_leave_the_educational_system"
## [21] "do_school_and_the_teachers_have_internet_connection"
## [22] "do_children_have_internet_connection"
## [23] "do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity"
## [24] "does_home_shows_severe_deficit_of_electricity"
## [25] "does_home_shows_severe_deficit_of_internet"
## [26] "do_children_3_to_17_yrs_miss_class_or_in_lower_grade"
## [27] "are_children_promoted_with_a_modality_different_from_formal_evaluation"
dim(data)
## [1] 4436 27
sum(is.na(data))
## [1] 0
colSums(is.na(data))
## submission_id
## 0
## submission_date
## 0
## gender
## 0
## age
## 0
## geography
## 0
## financial_situation
## 0
## education
## 0
## employment_status
## 0
## submission_state
## 0
## are_there_children_0_to_2_yrs_out_of_educational_system
## 0
## were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school
## 0
## are_there_children_who_stopped_enrolling_in_primary_education
## 0
## are_there_children_who_stopped_enrolling_in_secondary_education
## 0
## are_children_attending_face_to_face_classes
## 0
## can_children_observe_deterioration_of_basic_services_of_school
## 0
## do_children_3_and_17_yrs_receive_regular_school_meals
## 0
## are_there_teachers_at_scheduled_class_hours
## 0
## are_children_3_to_17_yrs_dealing_with_irregular_school_activity
## 0
## are_children_being_teached_by_unqualified_people
## 0
## did_teachers_leave_the_educational_system
## 0
## do_school_and_the_teachers_have_internet_connection
## 0
## do_children_have_internet_connection
## 0
## do_children_3_to_17_yrs_miss_virtual_class_due_to_lack_of_electricity
## 0
## does_home_shows_severe_deficit_of_electricity
## 0
## does_home_shows_severe_deficit_of_internet
## 0
## do_children_3_to_17_yrs_miss_class_or_in_lower_grade
## 0
## are_children_promoted_with_a_modality_different_from_formal_evaluation
## 0
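The dataset has 4,436 responses and 27 variables: ten character columns (respondent demographics, location, and two school-service questions), a numeric submission ID, and 16 binary (0/1) indicators. No column contains NA values. This check does not catch placeholder strings, however: "Not Available" appears later as a level of both financial_situation and geography.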
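Because is.na() only detects true NA values, placeholder strings need a separate check. The following is a minimal sketch on a toy data frame with hypothetical values; the helper count_placeholders() and the list of strings treated as missing are assumptions for illustration, not part of the original analysis.

```r
# Toy data frame (hypothetical values): no NA, but two placeholder strings
toy <- data.frame(
  geography = c("Rural", "Not Available", "Rural"),
  financial_situation = c("Prefer not to answer", "Not Available",
                          "I can afford food, but nothing else"),
  has_internet = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Strings treated as missing; this list is an assumption
placeholders <- c("Not Available", "")

# Count placeholder strings in each column (numeric columns give 0)
count_placeholders <- function(df, values) {
  vapply(df, function(col) sum(as.character(col) %in% values), integer(1))
}

na_count <- sum(is.na(toy))
placeholder_count <- count_placeholders(toy, placeholders)

print(na_count)
print(placeholder_count)
```

Applied to the survey data, the same helper would show which columns carry "Not Available" entries that may need to be recoded to NA or dropped before modeling.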
data$gender <- as.factor(data$gender)
data$age <- as.factor(data$age)
data$geography <- as.factor(data$geography)
data$education <- as.factor(data$education)
data$employment_status <- as.factor(data$employment_status)
Categorical variables are converted into factors so that R treats them as categories with a fixed set of levels, which is what tabulation, plotting, and model fitting expect.
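The same conversion can be written once for a vector of column names instead of one line per column. This is a small sketch on a toy data frame; the values are examples taken from the str() output, and the name cat_cols is an assumption.

```r
# Toy data frame with three character columns
toy <- data.frame(
  gender = c("Female", "Male", "Female"),
  geography = c("Rural", "Suburban/Peri-urban", "Rural"),
  age = c("26 to 35 years old", "Over 45 years old", "36 to 45 years old"),
  stringsAsFactors = FALSE
)

# Columns to convert (hypothetical name)
cat_cols <- c("gender", "geography", "age")

# Apply as.factor() to each listed column and assign the results back
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

str(toy)
print(levels(toy$gender))
```

Keeping the column list in one place makes it easier to add further columns, such as financial_situation, later on.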
mean(data$do_children_have_internet_connection) * 100
## [1] 62.84941
table(data$financial_situation)
##
## I can afford food and regular expenses, but nothing else
## 1060
## I can afford food, but nothing else
## 1445
## I can afford food, regular expenses, and clothes, but nothing else
## 244
## I can comfortably afford food, clothes, and furniture, and I have savings
## 157
## I can comfortably afford food, clothes, and furniture, but I don’t have savings
## 127
## I cannot afford enough food for my family
## 1163
## Not Available
## 1
## Prefer not to answer
## 239
table(data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##
## 0 1
## 1716 2720
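About 62.8% of respondents report that their children have an internet connection, so roughly 37% do not, which points to a digital divide. Financial conditions vary widely: 1,163 respondents (26%) cannot afford enough food for their family, while only 284 (6%) can comfortably afford food, clothes, and furniture. Overall, 2,720 of 4,436 respondents (61.3%) report enrolled children who did not return to school.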
table(data$financial_situation,
data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
##
## 0
## I can afford food and regular expenses, but nothing else 442
## I can afford food, but nothing else 530
## I can afford food, regular expenses, and clothes, but nothing else 89
## I can comfortably afford food, clothes, and furniture, and I have savings 56
## I can comfortably afford food, clothes, and furniture, but I don’t have savings 54
## I cannot afford enough food for my family 434
## Not Available 1
## Prefer not to answer 110
##
## 1
## I can afford food and regular expenses, but nothing else 618
## I can afford food, but nothing else 915
## I can afford food, regular expenses, and clothes, but nothing else 155
## I can comfortably afford food, clothes, and furniture, and I have savings 101
## I can comfortably afford food, clothes, and furniture, but I don’t have savings 73
## I cannot afford enough food for my family 729
## Not Available 0
## Prefer not to answer 129
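The cross-tabulation does not show a clear relationship between financial status and dropout. The share of respondents whose children did not return lies between about 57% and 64% in every substantive category: 62.7% among those who cannot afford enough food, 63.3% among those who can afford food but nothing else, and 64.3% among those who are comfortable and have savings. Lower-income groups therefore do not show systematically higher dropout rates in these counts.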
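Row percentages make the comparison across financial groups explicit. The sketch below rebuilds the table from the counts printed above and uses prop.table() with margin = 1; the shortened row labels are abbreviations introduced here, and the single "Not Available" response is left out.

```r
# Counts copied from the cross-tabulation above
# Column "1" = enrolled children did not return to school
counts <- matrix(
  c(442, 618,
    530, 915,
     89, 155,
     56, 101,
     54,  73,
    434, 729,
    110, 129),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    financial_situation = c(
      "Food and regular expenses only",
      "Food only",
      "Food, regular expenses, and clothes",
      "Comfortable, with savings",
      "Comfortable, no savings",
      "Cannot afford enough food",
      "Prefer not to answer"),
    did_not_return = c("0", "1"))
)

# Percentage within each financial group (rows sum to 100)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
print(row_pct)

# Chi-squared test of independence between the two variables
print(chisq.test(counts))
```

With the full data, the same percentages come from prop.table(table(data$financial_situation, data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school), margin = 1).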
data$internet_access <- as.factor(data$do_children_have_internet_connection)
# Despite its name, return_to_school = 1 means the children did NOT return to school
data$return_to_school <- as.factor(data$were_children_3_to_17_yrs_enrolled_and_did_not_return_to_school)
data$electricity_issue <- as.factor(data$does_home_shows_severe_deficit_of_electricity)
data$financial_status <- as.factor(data$financial_situation)
# Logistic regression for the probability of not returning (outcome level "1")
model <- glm(return_to_school ~ internet_access + electricity_issue + financial_status + geography,
             data = data, family = "binomial")
summary(model)
##
## Call:
## glm(formula = return_to_school ~ internet_access + electricity_issue +
## financial_status + geography, family = "binomial", data = data)
##
## Coefficients: (1 not defined because of singularities)
## Estimate
## (Intercept) 0.17902
## internet_access1 0.09999
## electricity_issue1 0.42152
## financial_statusI can afford food, but nothing else 0.18449
## financial_statusI can afford food, regular expenses, and clothes, but nothing else 0.22088
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings 0.24278
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings -0.06844
## financial_statusI cannot afford enough food for my family 0.13011
## financial_statusNot Available -11.84506
## financial_statusPrefer not to answer -0.18498
## geographyNot Available NA
## geographyRural 0.10113
## geographySuburban/Peri-urban -0.07416
## Std. Error
## (Intercept) 0.08630
## internet_access1 0.06518
## electricity_issue1 0.07147
## financial_statusI can afford food, but nothing else 0.08388
## financial_statusI can afford food, regular expenses, and clothes, but nothing else 0.14751
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings 0.17919
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings 0.19128
## financial_statusI cannot afford enough food for my family 0.08908
## financial_statusNot Available 196.96769
## financial_statusPrefer not to answer 0.14494
## geographyNot Available NA
## geographyRural 0.08023
## geographySuburban/Peri-urban 0.07303
## z value
## (Intercept) 2.074
## internet_access1 1.534
## electricity_issue1 5.898
## financial_statusI can afford food, but nothing else 2.199
## financial_statusI can afford food, regular expenses, and clothes, but nothing else 1.497
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings 1.355
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings -0.358
## financial_statusI cannot afford enough food for my family 1.461
## financial_statusNot Available -0.060
## financial_statusPrefer not to answer -1.276
## geographyNot Available NA
## geographyRural 1.260
## geographySuburban/Peri-urban -1.015
## Pr(>|z|)
## (Intercept) 0.0380
## internet_access1 0.1250
## electricity_issue1 3.69e-09
## financial_statusI can afford food, but nothing else 0.0279
## financial_statusI can afford food, regular expenses, and clothes, but nothing else 0.1343
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings 0.1755
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings 0.7205
## financial_statusI cannot afford enough food for my family 0.1441
## financial_statusNot Available 0.9520
## financial_statusPrefer not to answer 0.2019
## geographyNot Available NA
## geographyRural 0.2075
## geographySuburban/Peri-urban 0.3099
##
## (Intercept) *
## internet_access1
## electricity_issue1 ***
## financial_statusI can afford food, but nothing else *
## financial_statusI can afford food, regular expenses, and clothes, but nothing else
## financial_statusI can comfortably afford food, clothes, and furniture, and I have savings
## financial_statusI can comfortably afford food, clothes, and furniture, but I don’t have savings
## financial_statusI cannot afford enough food for my family
## financial_statusNot Available
## financial_statusPrefer not to answer
## geographyNot Available
## geographyRural
## geographySuburban/Peri-urban
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5920.4 on 4435 degrees of freedom
## Residual deviance: 5861.5 on 4424 degrees of freedom
## AIC: 5885.5
##
## Number of Fisher Scoring iterations: 10
data$dropout_risk <- predict(model, type = "response")
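The logistic regression gives only partial support for these factors. A severe electricity deficit at home is the one strongly significant predictor (estimate 0.42, p < 0.001), corresponding to about 1.5 times the odds of children not returning. Internet access is not significant (p = 0.125), and among the financial categories only "I can afford food, but nothing else" differs from the reference group at the 5% level (p = 0.028). Neither geography coefficient is significant. The "Not Available" financial level contains a single respondent, which explains its very large standard error, and the "Not Available" geography coefficient could not be estimated because of a singularity, most likely because it identifies the same respondent. Residual deviance falls from 5920.4 to 5861.5, a reduction of about 1%, so the model explains little of the variation in the outcome.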
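Coefficients on the log-odds scale are easier to read as odds ratios with confidence intervals. The sketch below works only from the estimates, standard errors, and deviances printed in the summary above, using Wald intervals; with the fitted object available, exp(coef(model)) and exp(confint.default(model)) give the same quantities for every term.

```r
# Estimates and standard errors copied from summary(model) above
coefs <- data.frame(
  term = c("internet_access1", "electricity_issue1"),
  estimate = c(0.09999, 0.42152),
  std_error = c(0.06518, 0.07147)
)

# 95% Wald interval on the log-odds scale, then exponentiated
z <- qnorm(0.975)
coefs$odds_ratio <- exp(coefs$estimate)
coefs$ci_low <- exp(coefs$estimate - z * coefs$std_error)
coefs$ci_high <- exp(coefs$estimate + z * coefs$std_error)
print(coefs)

# Likelihood-ratio test of the full model against the intercept-only model,
# using the null and residual deviances from the summary
lr_stat <- 5920.4 - 5861.5
lr_df <- 4435 - 4424
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p_value = lr_p))
```

An interval that contains 1 means the data are compatible with no effect for that term. The likelihood-ratio test addresses a different question: whether the predictors jointly improve on the intercept-only model, which can hold even when the improvement is small.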
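# as.numeric() on a factor returns its level codes (1 and 2 for the binary factors)
data$internet_num <- as.numeric(data$internet_access)
data$return_num <- as.numeric(data$return_to_school)
# These codes follow alphabetical level order, not a ranking of financial hardship
data$financial_num <- as.numeric(data$financial_status)
cor_matrix <- cor(data[, c("internet_num", "return_num", "financial_num", "dropout_risk")])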
cor_matrix
## internet_num return_num financial_num dropout_risk
## internet_num 1.000000000 0.009096232 -0.049153182 0.07962869
## return_num 0.009096232 1.000000000 -0.002601488 0.11428638
## financial_num -0.049153182 -0.002601488 1.000000000 -0.02277343
## dropout_risk 0.079628692 0.114286379 -0.022773434 1.00000000
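All off-diagonal correlations are weak, with absolute values below 0.12. The largest is between the observed outcome and the predicted risk (0.11), and internet access is essentially uncorrelated with the outcome (0.009). Because financial_num is an alphabetical level code rather than an ordered hardship scale, its correlations should not be interpreted.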
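The difference between alphabetical level codes and an explicit ordinal score can be seen on three responses. In this sketch, the vector hardship_levels is an assumed ordering from most to least hardship, introduced for illustration only; the original analysis does not define one.

```r
# Three example responses from the financial_situation column
x <- c("I cannot afford enough food for my family",
       "I can afford food, but nothing else",
       "I can comfortably afford food, clothes, and furniture, and I have savings")

# Default factor levels are sorted alphabetically, so the codes carry no ranking
alpha_codes <- as.numeric(factor(x))

# Assumed ordering from most to least hardship (illustrative, not from the source)
hardship_levels <- c(
  "I cannot afford enough food for my family",
  "I can afford food, but nothing else",
  "I can afford food and regular expenses, but nothing else",
  "I can afford food, regular expenses, and clothes, but nothing else",
  "I can comfortably afford food, clothes, and furniture, but I don’t have savings",
  "I can comfortably afford food, clothes, and furniture, and I have savings"
)

# With explicit levels, the codes reflect the chosen order
ordered_codes <- as.numeric(factor(x, levels = hardship_levels, ordered = TRUE))

print(alpha_codes)
print(ordered_codes)
```

Responses such as "Prefer not to answer" and "Not Available" have no place on this scale; with explicit levels they become NA and would need to be excluded before computing a correlation.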
cor_melt <- melt(cor_matrix)
ggplot(cor_melt, aes(Var1, Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2))) +
labs(title = "Correlation Heatmap") +
theme_minimal()
The heatmap displays these correlations. Apart from the diagonal, every cell is close to zero.
ggplot(data, aes(x = internet_access, fill = internet_access)) +
geom_bar() +
labs(title = "Internet Access Distribution") +
theme_minimal()
This chart shows how many respondents report that their children have an internet connection (1) and how many report that they do not (0).
ggplot(data, aes(x = return_to_school, fill = return_to_school)) +
geom_bar() +
labs(title = "Children Not Returning to School") +
theme_minimal()
This plot compares the number of respondents whose enrolled children did not return to school (1; 2,720 responses) with the remaining respondents (0; 1,716 responses).
ggplot(data, aes(x = financial_status, fill = return_to_school)) +
geom_bar(position = "fill") +
labs(title = "Financial Status vs Dropout (Proportion)") +
theme_minimal()
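The proportion of children not returning is similar across financial groups, between about 57% and 64% for the substantive categories, with no consistent increase among respondents in weaker financial situations.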
ggplot(data, aes(x = dropout_risk)) +
geom_histogram(bins = 20, fill = "purple", alpha = 0.7) +
geom_vline(xintercept = 0.7, color = "red", linetype = "dashed") +
labs(title = "Dropout Risk Distribution") +
theme_minimal()
This histogram shows the distribution of predicted dropout risk. The dashed red line marks a predicted risk of 0.7, used here as the cutoff for high-risk cases.
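To see where predicted risks fall relative to the 0.7 line, it helps to turn the fitted coefficients back into probabilities. The sketch below uses plogis(), the inverse logit, on coefficients copied from the model summary. The reference profile is a respondent with no internet, no electricity deficit, and the first level of each factor; the variable names are introduced here for illustration.

```r
# Coefficients copied from summary(model) above (log-odds scale)
intercept   <- 0.17902
electricity <- 0.42152   # electricity_issue1
internet    <- 0.09999   # internet_access1
savings     <- 0.24278   # financial level "comfortable, with savings"
rural       <- 0.10113   # geographyRural

# plogis(x) computes 1 / (1 + exp(-x))
p_reference   <- plogis(intercept)
p_electricity <- plogis(intercept + electricity)
# Profile combining the largest positive coefficient of each predictor
p_highest     <- plogis(intercept + electricity + internet + savings + rural)

risk <- c(reference = p_reference,
          electricity_deficit = p_electricity,
          highest_profile = p_highest)

print(round(risk, 3))
print(risk > 0.7)
```

Under these coefficients, even the reference profile has a predicted risk above 0.5, and only profiles that combine several of the larger coefficients pass the 0.7 cutoff.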
ggplot(data, aes(x = return_to_school, y = dropout_risk, fill = return_to_school)) +
stat_summary(fun = mean, geom = "bar") +
labs(title = "Average Dropout Risk by Return Status") +
theme_minimal()
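Respondents whose children did not return to school have a slightly higher average predicted risk. This is expected for a model evaluated on the data it was fitted to, and the weak correlation between the outcome and the predicted risk (0.11) indicates that the gap is small.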
ggplot(data, aes(x = financial_num, y = dropout_risk)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", color = "blue") +
labs(title = "Financial Status vs Dropout Risk") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
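The fitted line is nearly flat (correlation -0.02). Because financial_num is an alphabetical level code rather than an ordered measure of hardship, this plot does not establish a relationship between financial hardship and dropout risk.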
skewness(data$dropout_risk)
## [1] 0.003830186
kurtosis(data$dropout_risk)
## [1] 5.629831
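Skewness is close to zero, so the predicted risks show no overall lean toward high or low values. Kurtosis is 5.63, compared with 3 for a normal distribution, which indicates heavy tails, that is, a small number of extreme predictions. The single "Not Available" respondent, whose coefficient is about -11.8, is one likely source.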
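Kurtosis is very sensitive to single extreme values, which matters when one prediction sits far from the rest. The sketch below implements the moment-based definitions in base R (the same Pearson convention in which a normal distribution has kurtosis 3) and applies them to hypothetical risk values; the helper names and the data are assumptions for illustration.

```r
# k-th central moment of a numeric vector
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness and Pearson kurtosis
skew <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# Hypothetical predicted risks: two equally sized groups at 0.55 and 0.65
risk <- c(rep(0.55, 50), rep(0.65, 50))

# The same values plus one extreme prediction near zero
risk_with_outlier <- c(risk, 0.0001)

print(c(skewness = skew(risk), kurtosis = kurt(risk)))
print(c(skewness = skew(risk_with_outlier), kurtosis = kurt(risk_with_outlier)))
```

Recomputing skewness(data$dropout_risk) and kurtosis(data$dropout_risk) after excluding the "Not Available" respondent would show how much of the reported kurtosis comes from that one case.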
The analysis identifies a severe household electricity deficit as the clearest predictor of children not returning to school. Internet access and financial status show much weaker and mostly non-significant associations in this model. Predictive modeling can still help identify vulnerable groups and support targeted interventions to reduce educational inequality, but the current model's predictive power is limited.