Data_Final_Project

Nick Climaco

2023-05-08

Abstract

Introduction

Do developed countries have a significantly different electricity demand than developing countries?

https://www.kaggle.com/datasets/pralabhpoudel/world-energy-consumption

https://www.theglobaleconomy.com/rankings/human_development/

Summary Statistics

Analysis and Visualization

Statistical Analysis

t.test(electricity_demand ~ status, data = prime_df)
## 
##  Welch Two Sample t-test
## 
## data:  electricity_demand by status
## t = 8.247, df = 1666.6, p-value = 3.257e-16
## alternative hypothesis: true difference in means between group developed and group developing is not equal to 0
## 95 percent confidence interval:
##  121.9851 198.1150
## sample estimates:
##  mean in group developed mean in group developing 
##                 237.5535                  77.5035

ANOVA

aov(electricity_demand ~ status, data = prime_df) |>
    summary()
##               Df    Sum Sq  Mean Sq F value Pr(>F)    
## status         1  20574690 20574690   84.92 <2e-16 ***
## Residuals   3963 960110844   242269                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1148 observations deleted due to missingness
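
The F statistic above (84.92) is not the square of the Welch t statistic reported earlier (8.247² ≈ 68.0). Welch's test does not pool the group variances, whereas one-way ANOVA does. With only two groups, ANOVA is equivalent to the pooled-variance t-test (`var.equal = TRUE`), for which F = t². The following is a minimal sketch on simulated data; the data frame `sim_df`, its column names, and all of its values are invented for illustration and are not taken from `prime_df`.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible

# Hypothetical two-group data with unequal sizes and spreads
sim_df <- data.frame(
  status = factor(rep(c("developed", "developing"), times = c(40, 120))),
  demand = c(rnorm(40, mean = 240, sd = 80), rnorm(120, mean = 80, sd = 30))
)

# Pooled-variance t-test (assumes equal variances, like one-way ANOVA)
pooled_t <- t.test(demand ~ status, data = sim_df, var.equal = TRUE)

# Welch t-test (R's default; does not assume equal variances)
welch_t <- t.test(demand ~ status, data = sim_df)

# One-way ANOVA on the same data; extract the F value for the status term
anova_f <- summary(aov(demand ~ status, data = sim_df))[[1]][["F value"]][1]

# Squared t statistics for comparison with F
t_squared_pooled <- unname(pooled_t$statistic)^2
t_squared_welch  <- unname(welch_t$statistic)^2

print(c(F = anova_f, pooled_t_sq = t_squared_pooled, welch_t_sq = t_squared_welch))
```

In this sketch the F value matches the squared pooled t statistic, while the squared Welch statistic differs because the groups have unequal sizes and variances.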

Regression Model

reg_model <- lm(avg_electricity_demand ~ year + status, data = ml_data)
summary(reg_model)
## 
## Call:
## lm(formula = avg_electricity_demand ~ year + status, data = ml_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.108 -12.762  -3.931  11.225  61.249 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -6737.4198   732.4478  -9.198 5.34e-13 ***
## year                 3.4704     0.3651   9.505 1.66e-13 ***
## statusdeveloping  -154.7933     6.5315 -23.700  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.71 on 59 degrees of freedom
## Multiple R-squared:  0.917,  Adjusted R-squared:  0.9142 
## F-statistic:   326 on 2 and 59 DF,  p-value: < 2.2e-16
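
To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the printed estimates into the model for one illustrative year (2020, chosen arbitrarily and possibly outside the range of `ml_data`). `predict_demand` is a hypothetical helper written for this example, not part of the original analysis.

```r
# Coefficients copied from the summary(reg_model) output above
b_intercept  <- -6737.4198
b_year       <- 3.4704
b_developing <- -154.7933

# Hypothetical helper: fitted average demand for a given year and status.
# "developed" is the reference level, so the status term is 0 for it.
predict_demand <- function(year, status) {
  b_intercept + b_year * year + b_developing * (status == "developing")
}

pred_developed  <- predict_demand(2020, "developed")
pred_developing <- predict_demand(2020, "developing")

print(round(c(developed = pred_developed, developing = pred_developing), 2))
```

Because the model has no year-by-status interaction, the gap between the two predictions equals the `statusdeveloping` coefficient (about 154.8) in every year.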

Classification Model

Data Partition

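library(caret)  # provides createDataPartition()

# Stratified 70/30 split on status; call set.seed() beforehand for a reproducible partition
trainIndex <- createDataPartition(class_df$status, p = 0.7, list = FALSE)

train_data <- class_df[trainIndex, ]   # 70% of rows for training
test_data <- class_df[-trainIndex, ]   # remaining 30% held out for testing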

Model Training

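library(e1071)  # provides naiveBayes()

# Naive Bayes classifier for status, fitted on the training set
ml_model <- naiveBayes(factor(status) ~ gdp_per_capita + hdi_2021 + continent, data = train_data)

# Predicted class labels for the held-out test set
predictions <- predict(ml_model, test_data)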

Confusion Matrix

## Confusion Matrix and Statistics
## 
##             Actual
## Predictions  developed developing
##   developed        317         61
##   developing         6        859
##                                         
##                Accuracy : 0.9461        
##                  95% CI : (0.932, 0.958)
##     No Information Rate : 0.7401        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.8672        
##                                         
##  Mcnemar's Test P-Value : 4.191e-11     
##                                         
##             Sensitivity : 0.9814        
##             Specificity : 0.9337        
##          Pos Pred Value : 0.8386        
##          Neg Pred Value : 0.9931        
##              Prevalence : 0.2599        
##          Detection Rate : 0.2550        
##    Detection Prevalence : 0.3041        
##       Balanced Accuracy : 0.9576        
##                                         
##        'Positive' Class : developed     
## 
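
The headline statistics can be checked directly from the four cell counts. The sketch below rebuilds the table by hand, treating "developed" as the positive class as in the output above. The variable names (`cm`, `tp`, `fp`, `fn`, `tn`) are chosen for this example only.

```r
# Counts copied from the confusion matrix above (rows = predicted, columns = actual)
cm <- matrix(c(317, 6, 61, 859), nrow = 2,
             dimnames = list(Predictions = c("developed", "developing"),
                             Actual      = c("developed", "developing")))

tp <- cm["developed", "developed"]     # predicted developed, actually developed
fp <- cm["developed", "developing"]    # predicted developed, actually developing
fn <- cm["developing", "developed"]    # predicted developing, actually developed
tn <- cm["developing", "developing"]   # predicted developing, actually developing

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)          # share of developed cases correctly identified
specificity <- tn / (tn + fp)          # share of developing cases correctly identified
ppv         <- tp / (tp + fp)          # positive predictive value
npv         <- tn / (tn + fn)          # negative predictive value
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv, npv = npv,
              balanced_accuracy = balanced_accuracy), 4))
```

The rounded values agree with the reported statistics. The 61 developing observations predicted as developed account for the comparatively low positive predictive value.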

ROC Curve

Conclusion

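In this project, we asked whether developed countries have a significantly different electricity demand than developing countries.

Overall, we found that average electricity demand increased over time (by about 3.47 units per year in the regression model) and that developed countries had significantly higher mean demand than developing countries (237.55 vs. 77.50; Welch t-test, p < 0.001). These findings may offer useful insights for policymakers and energy companies seeking to understand global trends in electricity consumption and plan for the future.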