Data_607_Final_Project

Nick Climaco

2023-05-11

Introduction

Are countries generating enough electricity to meet their demand, regardless of whether a country is considered “developed” or “developing”?

Libraries

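# libraries
library(tidyverse)    # data wrangling and plotting (dplyr, ggplot2, ...)
library(readr)        # CSV import (already attached by tidyverse)
library(DT)           # interactive tables
library(rvest)        # web scraping
library(countrycode)  # country name/code conversion
library(plotly)       # interactive plots
library(caret)        # createDataPartition(), confusionMatrix()
library(e1071)        # naiveBayes()
library(pROC)         # ROC curves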

Summary Statistics

Analysis and Visualization

Statistical Analysis

# Welch two-sample t-test: demand vs. generation, developed countries
t.test(developed_df$electricity_demand, developed_df$electricity_generation)
## 
##  Welch Two Sample t-test
## 
## data:  developed_df$electricity_demand and developed_df$electricity_generation
## t = -0.84602, df = 2340.6, p-value = 0.3976
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -70.47259  27.99218
## sample estimates:
## mean of x mean of y 
##  237.5535  258.7937
# Welch two-sample t-test: demand vs. generation, developing countries
t.test(developing_df$electricity_demand, developing_df$electricity_generation)
## 
##  Welch Two Sample t-test
## 
## data:  developing_df$electricity_demand and developing_df$electricity_generation
## t = -0.36415, df = 5880.8, p-value = 0.7158
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -26.70547  18.33840
## sample estimates:
## mean of x mean of y 
##  77.50350  81.68704
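
The Welch tests above treat demand and generation as two independent samples. If each row holds both values for the same country and year (an assumption; the structure of `developed_df` and `developing_df` is not shown here), a paired test on the row-wise differences may be a more natural choice. The following is a minimal sketch on simulated data; `demand` and `generation` are made-up vectors, not the project data.

```r
# Simulated stand-in data (NOT the project data): 200 paired observations
set.seed(607)
n <- 200
demand <- rlnorm(n, meanlog = 4, sdlog = 1.5)
# Generation tracks demand closely, with a small surplus plus noise
generation <- demand * 1.05 + rnorm(n, mean = 0, sd = 5)

# Welch test: treats the two columns as independent samples
welch <- t.test(demand, generation)

# Paired test: tests whether the mean row-wise difference is zero
paired <- t.test(demand, generation, paired = TRUE)

# Compare the two p-values side by side
c(welch = welch$p.value, paired = paired$p.value)
```

Pairing removes the large between-country variation from the comparison, so a paired test can pick up a small systematic gap that an unpaired test would miss.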

Regression Model

# Linear model: average electricity demand as a function of year and development status
reg_model <- lm(avg_electricity_demand ~ year + status, data = ml_data)
summary(reg_model)
## 
## Call:
## lm(formula = avg_electricity_demand ~ year + status, data = ml_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.108 -12.762  -3.931  11.225  61.249 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -6737.4198   732.4478  -9.198 5.34e-13 ***
## year                 3.4704     0.3651   9.505 1.66e-13 ***
## statusdeveloping  -154.7933     6.5315 -23.700  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.71 on 59 degrees of freedom
## Multiple R-squared:  0.917,  Adjusted R-squared:  0.9142 
## F-statistic:   326 on 2 and 59 DF,  p-value: < 2.2e-16
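
The fitted coefficients can be read as a simple prediction rule: start from the intercept, add 3.4704 per calendar year, and subtract 154.7933 when the status is developing. The sketch below applies that rule by hand. `predict_demand` is a hypothetical helper written for illustration, and 2020 is an arbitrary example year, not a value taken from the analysis.

```r
# Coefficients copied from the summary(reg_model) output above
b0 <- -6737.4198           # intercept
b_year <- 3.4704           # change in average demand per year
b_developing <- -154.7933  # offset when status == "developing"

# Hypothetical helper, not part of the original analysis
predict_demand <- function(year, status) {
  b0 + b_year * year + b_developing * (status == "developing")
}

predict_demand(2020, "developed")   # about 272.79
predict_demand(2020, "developing")  # about 117.99
```

With the fitted model in memory, `predict(reg_model, newdata = data.frame(year = 2020, status = "developed"))` should give the same value up to rounding of the printed coefficients.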

Classification Model

Data Partition

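# Stratified 70/30 split on status; call set.seed() first for a reproducible partition
trainIndex <- createDataPartition(class_df$status, p = 0.7, list = FALSE)

train_data <- class_df[trainIndex, ]   # 70% training rows
test_data <- class_df[-trainIndex, ]   # remaining 30% held out for testing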

Model Training

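# Naive Bayes (e1071): predict status from GDP per capita, HDI (2021), and continent
ml_model <- naiveBayes(factor(status) ~ gdp_per_capita + hdi_2021 + continent, data = train_data)

# Predicted class labels for the held-out test set
predictions <- predict(ml_model, test_data)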

Confusion Matrix

## Confusion Matrix and Statistics
## 
##             Actual
## Predictions  developed developing
##   developed        317         61
##   developing         6        859
##                                         
##                Accuracy : 0.9461        
##                  95% CI : (0.932, 0.958)
##     No Information Rate : 0.7401        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.8672        
##                                         
##  Mcnemar's Test P-Value : 4.191e-11     
##                                         
##             Sensitivity : 0.9814        
##             Specificity : 0.9337        
##          Pos Pred Value : 0.8386        
##          Neg Pred Value : 0.9931        
##              Prevalence : 0.2599        
##          Detection Rate : 0.2550        
##    Detection Prevalence : 0.3041        
##       Balanced Accuracy : 0.9576        
##                                         
##        'Positive' Class : developed     
## 
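
The headline statistics in this output follow directly from the four cell counts, with "developed" as the positive class. The sketch below recomputes them in base R as a check; the counts are copied from the matrix above, and the variable names are illustrative.

```r
# Counts copied from the confusion matrix above (rows = predicted, columns = actual)
cm <- matrix(c(317, 6, 61, 859), nrow = 2,
             dimnames = list(Predictions = c("developed", "developing"),
                             Actual = c("developed", "developing")))

tp <- cm["developed", "developed"]    # predicted developed, actually developed
fn <- cm["developing", "developed"]   # predicted developing, actually developed
fp <- cm["developed", "developing"]   # predicted developed, actually developing
tn <- cm["developing", "developing"]  # predicted developing, actually developing

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn)
)
round(metrics, 4)
# accuracy 0.9461, sensitivity 0.9814, specificity 0.9337, ppv 0.8386, npv 0.9931
```

The positive predictive value of 0.8386 is the weakest figure: 61 of the 378 countries predicted as developed (about 16%) were actually developing.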

ROC Curve

Conclusion

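In this project, we compared electricity demand with electricity generation for developed and developing countries, modeled average demand by year and development status, and trained a naive Bayes classifier to predict development status.

Overall, we found no statistically significant difference between mean electricity demand and mean electricity generation in either group (Welch t-tests, p = 0.40 for developed and p = 0.72 for developing countries). A non-significant result does not prove that the two are equal, but it is consistent with generation only just keeping pace with demand worldwide. That would leave little room for error in the electricity supply system: an unexpected change in demand or supply could produce a shortage or a surplus of electricity, and a shortage could lead to power outages, blackouts, or other disruptions to the electricity supply.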