1 Introduction

Pollution is one of the most critical global challenges affecting environmental sustainability and human health. This project uses R to analyze global CO2 emission trends: it examines historical patterns, identifies the largest contributors, and considers potential future impacts.

2 Dataset

The dataset used in this project is sourced from Our World in Data. It contains country-level CO2 emissions, population, and related environmental indicators, with one row per country (or regional aggregate such as World or Asia) per year.

# Load data
data <- read.csv("data.csv")

# View data
head(data)
##       country year iso_code population gdp cement_co2 cement_co2_per_capita co2
## 1 Afghanistan 1750      AFG    2802560  NA          0                     0  NA
## 2 Afghanistan 1751      AFG         NA  NA          0                    NA  NA
## 3 Afghanistan 1752      AFG         NA  NA          0                    NA  NA
## 4 Afghanistan 1753      AFG         NA  NA          0                    NA  NA
## 5 Afghanistan 1754      AFG         NA  NA          0                    NA  NA
## 6 Afghanistan 1755      AFG         NA  NA          0                    NA  NA
##   co2_growth_abs co2_growth_prct co2_including_luc co2_including_luc_growth_abs
## 1             NA              NA                NA                           NA
## 2             NA              NA                NA                           NA
## 3             NA              NA                NA                           NA
## 4             NA              NA                NA                           NA
## 5             NA              NA                NA                           NA
## 6             NA              NA                NA                           NA
##   co2_including_luc_growth_prct co2_including_luc_per_capita
## 1                            NA                           NA
## 2                            NA                           NA
## 3                            NA                           NA
## 4                            NA                           NA
## 5                            NA                           NA
## 6                            NA                           NA
##   co2_including_luc_per_gdp co2_including_luc_per_unit_energy co2_per_capita
## 1                        NA                                NA             NA
## 2                        NA                                NA             NA
## 3                        NA                                NA             NA
## 4                        NA                                NA             NA
## 5                        NA                                NA             NA
## 6                        NA                                NA             NA
##   co2_per_gdp co2_per_unit_energy coal_co2 coal_co2_per_capita consumption_co2
## 1          NA                  NA       NA                  NA              NA
## 2          NA                  NA       NA                  NA              NA
## 3          NA                  NA       NA                  NA              NA
## 4          NA                  NA       NA                  NA              NA
## 5          NA                  NA       NA                  NA              NA
## 6          NA                  NA       NA                  NA              NA
##   consumption_co2_per_capita consumption_co2_per_gdp cumulative_cement_co2
## 1                         NA                      NA                     0
## 2                         NA                      NA                     0
## 3                         NA                      NA                     0
## 4                         NA                      NA                     0
## 5                         NA                      NA                     0
## 6                         NA                      NA                     0
##   cumulative_co2 cumulative_co2_including_luc cumulative_coal_co2
## 1             NA                           NA                  NA
## 2             NA                           NA                  NA
## 3             NA                           NA                  NA
## 4             NA                           NA                  NA
## 5             NA                           NA                  NA
## 6             NA                           NA                  NA
##   cumulative_flaring_co2 cumulative_gas_co2 cumulative_luc_co2
## 1                     NA                 NA                 NA
## 2                     NA                 NA                 NA
## 3                     NA                 NA                 NA
## 4                     NA                 NA                 NA
## 5                     NA                 NA                 NA
## 6                     NA                 NA                 NA
##   cumulative_oil_co2 cumulative_other_co2 energy_per_capita energy_per_gdp
## 1                 NA                   NA                NA             NA
## 2                 NA                   NA                NA             NA
## 3                 NA                   NA                NA             NA
## 4                 NA                   NA                NA             NA
## 5                 NA                   NA                NA             NA
## 6                 NA                   NA                NA             NA
##   flaring_co2 flaring_co2_per_capita gas_co2 gas_co2_per_capita
## 1          NA                     NA      NA                 NA
## 2          NA                     NA      NA                 NA
## 3          NA                     NA      NA                 NA
## 4          NA                     NA      NA                 NA
## 5          NA                     NA      NA                 NA
## 6          NA                     NA      NA                 NA
##   ghg_excluding_lucf_per_capita ghg_per_capita land_use_change_co2
## 1                            NA             NA                  NA
## 2                            NA             NA                  NA
## 3                            NA             NA                  NA
## 4                            NA             NA                  NA
## 5                            NA             NA                  NA
## 6                            NA             NA                  NA
##   land_use_change_co2_per_capita methane methane_per_capita nitrous_oxide
## 1                             NA      NA                 NA            NA
## 2                             NA      NA                 NA            NA
## 3                             NA      NA                 NA            NA
## 4                             NA      NA                 NA            NA
## 5                             NA      NA                 NA            NA
## 6                             NA      NA                 NA            NA
##   nitrous_oxide_per_capita oil_co2 oil_co2_per_capita other_co2_per_capita
## 1                       NA      NA                 NA                   NA
## 2                       NA      NA                 NA                   NA
## 3                       NA      NA                 NA                   NA
## 4                       NA      NA                 NA                   NA
## 5                       NA      NA                 NA                   NA
## 6                       NA      NA                 NA                   NA
##   other_industry_co2 primary_energy_consumption share_global_cement_co2
## 1                 NA                         NA                      NA
## 2                 NA                         NA                      NA
## 3                 NA                         NA                      NA
## 4                 NA                         NA                      NA
## 5                 NA                         NA                      NA
## 6                 NA                         NA                      NA
##   share_global_co2 share_global_co2_including_luc share_global_coal_co2
## 1               NA                             NA                    NA
## 2               NA                             NA                    NA
## 3               NA                             NA                    NA
## 4               NA                             NA                    NA
## 5               NA                             NA                    NA
## 6               NA                             NA                    NA
##   share_global_cumulative_cement_co2 share_global_cumulative_co2
## 1                                 NA                          NA
## 2                                 NA                          NA
## 3                                 NA                          NA
## 4                                 NA                          NA
## 5                                 NA                          NA
## 6                                 NA                          NA
##   share_global_cumulative_co2_including_luc share_global_cumulative_coal_co2
## 1                                        NA                               NA
## 2                                        NA                               NA
## 3                                        NA                               NA
## 4                                        NA                               NA
## 5                                        NA                               NA
## 6                                        NA                               NA
##   share_global_cumulative_flaring_co2 share_global_cumulative_gas_co2
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                  NA                              NA
## 6                                  NA                              NA
##   share_global_cumulative_luc_co2 share_global_cumulative_oil_co2
## 1                              NA                              NA
## 2                              NA                              NA
## 3                              NA                              NA
## 4                              NA                              NA
## 5                              NA                              NA
## 6                              NA                              NA
##   share_global_cumulative_other_co2 share_global_flaring_co2
## 1                                NA                       NA
## 2                                NA                       NA
## 3                                NA                       NA
## 4                                NA                       NA
## 5                                NA                       NA
## 6                                NA                       NA
##   share_global_gas_co2 share_global_luc_co2 share_global_oil_co2
## 1                   NA                   NA                   NA
## 2                   NA                   NA                   NA
## 3                   NA                   NA                   NA
## 4                   NA                   NA                   NA
## 5                   NA                   NA                   NA
## 6                   NA                   NA                   NA
##   share_global_other_co2 share_of_temperature_change_from_ghg
## 1                     NA                                   NA
## 2                     NA                                   NA
## 3                     NA                                   NA
## 4                     NA                                   NA
## 5                     NA                                   NA
## 6                     NA                                   NA
##   temperature_change_from_ch4 temperature_change_from_co2
## 1                          NA                          NA
## 2                          NA                          NA
## 3                          NA                          NA
## 4                          NA                          NA
## 5                          NA                          NA
## 6                          NA                          NA
##   temperature_change_from_ghg temperature_change_from_n2o total_ghg
## 1                          NA                          NA        NA
## 2                          NA                          NA        NA
## 3                          NA                          NA        NA
## 4                          NA                          NA        NA
## 5                          NA                          NA        NA
## 6                          NA                          NA        NA
##   total_ghg_excluding_lucf trade_co2 trade_co2_share
## 1                       NA        NA              NA
## 2                       NA        NA              NA
## 3                       NA        NA              NA
## 4                       NA        NA              NA
## 5                       NA        NA              NA
## 6                       NA        NA              NA
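With 79 columns, the full preview is hard to scan. A minimal sketch of a narrower preview follows, using base R only. The `toy` data frame is a hypothetical stand-in for `data`: its two Afghanistan rows reuse values that appear later in the `str()` output, and the `World` row is made up. The choice of `key_cols` is an assumption about which columns matter for the later sections.

```r
# Hypothetical stand-in for the real data; the "World" row is made up.
toy <- data.frame(
  country    = c("Afghanistan", "Afghanistan", "World"),
  year       = c(1949L, 1950L, 1950L),
  iso_code   = c("AFG", "AFG", ""),
  population = c(7356890, 7776180, NA),
  co2        = c(0.015, 0.084, NA),
  gdp        = c(NA, 9.42e9, NA)
)

# Columns used by the later analysis (an assumption, adjust as needed).
key_cols <- c("country", "year", "iso_code", "population", "co2")

# Preview only the key columns so the output fits on one screen.
preview <- head(toy[, key_cols], 6)
print(preview)

# Report the full dimensions separately.
print(dim(toy))
```

On the real data, the same two lines with `data` in place of `toy` would give a compact preview.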

3 Libraries

The following libraries were used for data analysis and visualization:

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.3
## Warning: package 'ggplot2' was built under R version 4.5.3
## Warning: package 'tidyr' was built under R version 4.5.3
## Warning: package 'lubridate' was built under R version 4.5.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# dplyr, ggplot2, and readr are already attached by tidyverse; loading them again is harmless
library(dplyr)
library(ggplot2)
library(readr)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.5.3
## corrplot 0.95 loaded
library(caret)
## Warning: package 'caret' was built under R version 4.5.3
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(plotly)
## Warning: package 'plotly' was built under R version 4.5.3
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

4 Data Cleaning

Before analysis, the dataset was cleaned to ensure accuracy and consistency.

# Remove rows where co2 or population is missing
data <- data %>%
  filter(!is.na(co2), !is.na(population))

# Remove non-country rows (aggregates like World, Asia), which lack a 3-letter ISO code
data <- data %>%
  filter(nchar(iso_code) == 3)

# Keep only rows with positive co2 (drops zero or negative values)
data <- data %>%
  filter(co2 > 0)

# Check structure
str(data)
## 'data.frame':    22386 obs. of  79 variables:
##  $ country                                  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ year                                     : int  1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 ...
##  $ iso_code                                 : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ population                               : num  7356890 7776180 7879343 7987784 8096703 ...
##  $ gdp                                      : num  NA 9.42e+09 9.69e+09 1.00e+10 1.06e+10 ...
##  $ cement_co2                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cement_co2_per_capita                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ co2                                      : num  0.015 0.084 0.092 0.092 0.106 0.106 0.154 0.183 0.293 0.33 ...
##  $ co2_growth_abs                           : num  NA 0.07 0.007 0 0.015 0 0.048 0.029 0.11 0.037 ...
##  $ co2_growth_prct                          : num  NA 475 8.7 0 16 ...
##  $ co2_including_luc                        : num  6.12 7.17 8.09 9.01 10.07 ...
##  $ co2_including_luc_growth_abs             : num  NA 1.052 0.923 0.913 1.065 ...
##  $ co2_including_luc_growth_prct            : num  NA 17.2 12.9 11.3 11.8 ...
##  $ co2_including_luc_per_capita             : num  0.831 0.922 1.027 1.127 1.244 ...
##  $ co2_including_luc_per_gdp                : num  NA 0.761 0.835 0.899 0.947 ...
##  $ co2_including_luc_per_unit_energy        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ co2_per_capita                           : num  0.002 0.011 0.012 0.011 0.013 0.013 0.018 0.022 0.034 0.038 ...
##  $ co2_per_gdp                              : num  NA 0.009 0.009 0.009 0.01 0.01 0.014 0.016 0.025 0.027 ...
##  $ co2_per_unit_energy                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ coal_co2                                 : num  0.015 0.021 0.026 0.032 0.038 0.043 0.062 0.062 0.077 0.092 ...
##  $ coal_co2_per_capita                      : num  0.002 0.003 0.003 0.004 0.005 0.005 0.007 0.007 0.009 0.011 ...
##  $ consumption_co2                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ consumption_co2_per_capita               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ consumption_co2_per_gdp                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cumulative_cement_co2                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cumulative_co2                           : num  0.015 0.099 0.191 0.282 0.388 ...
##  $ cumulative_co2_including_luc             : num  6.12 13.29 21.38 30.38 40.45 ...
##  $ cumulative_coal_co2                      : num  0.015 0.036 0.061 0.093 0.131 0.174 0.236 0.298 0.375 0.467 ...
##  $ cumulative_flaring_co2                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cumulative_gas_co2                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cumulative_luc_co2                       : num  500 507 515 524 533 ...
##  $ cumulative_oil_co2                       : num  0 0.063 0.129 0.189 0.257 0.321 0.413 0.534 0.75 0.988 ...
##  $ cumulative_other_co2                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ energy_per_capita                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ energy_per_gdp                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ flaring_co2                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ flaring_co2_per_capita                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gas_co2                                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ gas_co2_per_capita                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ghg_excluding_lucf_per_capita            : num  0.154 0.159 0.154 0.149 0.146 0.141 0.144 0.144 0.159 0.159 ...
##  $ ghg_per_capita                           : num  2.54 2.56 2.67 2.77 2.87 ...
##  $ land_use_change_co2                      : num  6.1 7.08 8 8.91 9.96 ...
##  $ land_use_change_co2_per_capita           : num  0.829 0.911 1.015 1.116 1.231 ...
##  $ methane                                  : num  7.73 7.88 7.97 8.07 8.19 ...
##  $ methane_per_capita                       : num  1.05 1.01 1.01 1.01 1.01 ...
##  $ nitrous_oxide                            : num  2.16 2.23 2.29 2.37 2.45 ...
##  $ nitrous_oxide_per_capita                 : num  0.294 0.287 0.291 0.296 0.302 0.309 0.314 0.32 0.323 0.326 ...
##  $ oil_co2                                  : num  0 0.063 0.066 0.06 0.068 0.064 0.092 0.121 0.216 0.238 ...
##  $ oil_co2_per_capita                       : num  0 0.008 0.008 0.007 0.008 0.008 0.011 0.014 0.025 0.027 ...
##  $ other_co2_per_capita                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ other_industry_co2                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ primary_energy_consumption               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_global_cement_co2                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ share_global_co2                         : num  0 0.001 0.001 0.001 0.002 0.002 0.002 0.002 0.004 0.004 ...
##  $ share_global_co2_including_luc           : num  0.056 0.057 0.06 0.066 0.072 0.075 0.078 0.081 0.085 0.091 ...
##  $ share_global_coal_co2                    : num  0 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.002 0.002 ...
##  $ share_global_cumulative_cement_co2       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ share_global_cumulative_co2              : num  0 0 0 0 0 0 0 0 0 0.001 ...
##  $ share_global_cumulative_co2_including_luc: num  0.001 0.002 0.003 0.004 0.006 0.007 0.009 0.01 0.012 0.013 ...
##  $ share_global_cumulative_coal_co2         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ share_global_cumulative_flaring_co2      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_global_cumulative_gas_co2          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ share_global_cumulative_luc_co2          : num  0.117 0.116 0.116 0.116 0.117 0.117 0.118 0.118 0.119 0.12 ...
##  $ share_global_cumulative_oil_co2          : num  0 0 0 0.001 0.001 0.001 0.001 0.001 0.002 0.002 ...
##  $ share_global_cumulative_other_co2        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_global_flaring_co2                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_global_gas_co2                     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ share_global_luc_co2                     : num  0.107 0.106 0.111 0.123 0.135 0.139 0.149 0.158 0.166 0.179 ...
##  $ share_global_oil_co2                     : num  0 0.004 0.004 0.003 0.004 0.003 0.004 0.005 0.008 0.009 ...
##  $ share_global_other_co2                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ share_of_temperature_change_from_ghg     : num  0.131 0.131 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13 ...
##  $ temperature_change_from_ch4              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ temperature_change_from_co2              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ temperature_change_from_ghg              : num  0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
##  $ temperature_change_from_n2o              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_ghg                                : num  18.7 19.9 21.1 22.1 23.3 ...
##  $ total_ghg_excluding_lucf                 : num  1.13 1.24 1.22 1.19 1.18 ...
##  $ trade_co2                                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ trade_co2_share                          : num  NA NA NA NA NA NA NA NA NA NA ...
# Check missing values (TRUE = missing)
summary(is.na(data))
##   country           year          iso_code       population     
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:22386     FALSE:22386     FALSE:22386     FALSE:22386    
##                                                                 
##     gdp          cement_co2      cement_co2_per_capita    co2         
##  Mode :logical   Mode :logical   Mode :logical         Mode :logical  
##  FALSE:14548     FALSE:17391     FALSE:17391           FALSE:22386    
##  TRUE :7838      TRUE :4995      TRUE :4995                           
##  co2_growth_abs  co2_growth_prct co2_including_luc co2_including_luc_growth_abs
##  Mode :logical   Mode :logical   Mode :logical     Mode :logical               
##  FALSE:22109     FALSE:22091     FALSE:20588       FALSE:20334                 
##  TRUE :277       TRUE :295       TRUE :1798        TRUE :2052                  
##  co2_including_luc_growth_prct co2_including_luc_per_capita
##  Mode :logical                 Mode :logical               
##  FALSE:20334                   FALSE:20588                 
##  TRUE :2052                    TRUE :1798                  
##  co2_including_luc_per_gdp co2_including_luc_per_unit_energy co2_per_capita 
##  Mode :logical             Mode :logical                     Mode :logical  
##  FALSE:14135               FALSE:9016                        FALSE:22386    
##  TRUE :8251                TRUE :13370                                      
##  co2_per_gdp     co2_per_unit_energy  coal_co2       coal_co2_per_capita
##  Mode :logical   Mode :logical       Mode :logical   Mode :logical      
##  FALSE:14548     FALSE:9642          FALSE:17595     FALSE:17595        
##  TRUE :7838      TRUE :12744         TRUE :4791      TRUE :4791         
##  consumption_co2 consumption_co2_per_capita consumption_co2_per_gdp
##  Mode :logical   Mode :logical              Mode :logical          
##  FALSE:4063      FALSE:4063                 FALSE:3908             
##  TRUE :18323     TRUE :18323                TRUE :18478            
##  cumulative_cement_co2 cumulative_co2  cumulative_co2_including_luc
##  Mode :logical         Mode :logical   Mode :logical               
##  FALSE:17391           FALSE:22386     FALSE:20588                 
##  TRUE :4995                            TRUE :1798                  
##  cumulative_coal_co2 cumulative_flaring_co2 cumulative_gas_co2
##  Mode :logical       Mode :logical          Mode :logical     
##  FALSE:17595         FALSE:11869            FALSE:14044       
##  TRUE :4791          TRUE :10517            TRUE :8342        
##  cumulative_luc_co2 cumulative_oil_co2 cumulative_other_co2 energy_per_capita
##  Mode :logical      Mode :logical      Mode :logical        Mode :logical    
##  FALSE:20588        FALSE:21278        FALSE:1734           FALSE:9695       
##  TRUE :1798         TRUE :1108         TRUE :20652          TRUE :12691      
##  energy_per_gdp  flaring_co2     flaring_co2_per_capita  gas_co2       
##  Mode :logical   Mode :logical   Mode :logical          Mode :logical  
##  FALSE:7763      FALSE:11869     FALSE:11869            FALSE:14044    
##  TRUE :14623     TRUE :10517     TRUE :10517            TRUE :8342     
##  gas_co2_per_capita ghg_excluding_lucf_per_capita ghg_per_capita 
##  Mode :logical      Mode :logical                 Mode :logical  
##  FALSE:14044        FALSE:20889                   FALSE:21019    
##  TRUE :8342         TRUE :1497                    TRUE :1367     
##  land_use_change_co2 land_use_change_co2_per_capita  methane       
##  Mode :logical       Mode :logical                  Mode :logical  
##  FALSE:20588         FALSE:20588                    FALSE:21019    
##  TRUE :1798          TRUE :1798                     TRUE :1367     
##  methane_per_capita nitrous_oxide   nitrous_oxide_per_capita  oil_co2       
##  Mode :logical      Mode :logical   Mode :logical            Mode :logical  
##  FALSE:21019        FALSE:21087     FALSE:21087              FALSE:21278    
##  TRUE :1367         TRUE :1299      TRUE :1299               TRUE :1108     
##  oil_co2_per_capita other_co2_per_capita other_industry_co2
##  Mode :logical      Mode :logical        Mode :logical     
##  FALSE:21278        FALSE:1734           FALSE:1734        
##  TRUE :1108         TRUE :20652          TRUE :20652       
##  primary_energy_consumption share_global_cement_co2 share_global_co2
##  Mode :logical              Mode :logical           Mode :logical   
##  FALSE:9695                 FALSE:17207             FALSE:22386     
##  TRUE :12691                TRUE :5179                              
##  share_global_co2_including_luc share_global_coal_co2
##  Mode :logical                  Mode :logical        
##  FALSE:20588                    FALSE:17595          
##  TRUE :1798                     TRUE :4791           
##  share_global_cumulative_cement_co2 share_global_cumulative_co2
##  Mode :logical                      Mode :logical              
##  FALSE:17207                        FALSE:22386                
##  TRUE :5179                                                    
##  share_global_cumulative_co2_including_luc share_global_cumulative_coal_co2
##  Mode :logical                             Mode :logical                   
##  FALSE:20588                               FALSE:17595                     
##  TRUE :1798                                TRUE :4791                      
##  share_global_cumulative_flaring_co2 share_global_cumulative_gas_co2
##  Mode :logical                       Mode :logical                  
##  FALSE:9583                          FALSE:12481                    
##  TRUE :12803                         TRUE :9905                     
##  share_global_cumulative_luc_co2 share_global_cumulative_oil_co2
##  Mode :logical                   Mode :logical                  
##  FALSE:20588                     FALSE:20702                    
##  TRUE :1798                      TRUE :1684                     
##  share_global_cumulative_other_co2 share_global_flaring_co2
##  Mode :logical                     Mode :logical           
##  FALSE:1610                        FALSE:9583              
##  TRUE :20776                       TRUE :12803             
##  share_global_gas_co2 share_global_luc_co2 share_global_oil_co2
##  Mode :logical        Mode :logical        Mode :logical       
##  FALSE:12481          FALSE:20588          FALSE:20702         
##  TRUE :9905           TRUE :1798           TRUE :1684          
##  share_global_other_co2 share_of_temperature_change_from_ghg
##  Mode :logical          Mode :logical                       
##  FALSE:1610             FALSE:21863                         
##  TRUE :20776            TRUE :523                           
##  temperature_change_from_ch4 temperature_change_from_co2
##  Mode :logical               Mode :logical              
##  FALSE:21060                 FALSE:21863                
##  TRUE :1326                  TRUE :523                  
##  temperature_change_from_ghg temperature_change_from_n2o total_ghg      
##  Mode :logical               Mode :logical               Mode :logical  
##  FALSE:21863                 FALSE:21060                 FALSE:21019    
##  TRUE :523                   TRUE :1326                  TRUE :1367     
##  total_ghg_excluding_lucf trade_co2       trade_co2_share
##  Mode :logical            Mode :logical   Mode :logical  
##  FALSE:20889              FALSE:4063      FALSE:4063     
##  TRUE :1497               TRUE :18323     TRUE :18323
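The table above shows that several columns are mostly empty after cleaning; for example, `consumption_co2` is missing in 18323 of 22386 rows (about 82%). A more compact way to read the same information is one percentage per column. The sketch below uses a made-up `toy` data frame; `na_count`, `na_pct`, and `missing_summary` are illustrative names.

```r
# Made-up data for illustration only.
toy <- data.frame(
  country   = c("A", "A", "B", "B"),
  co2       = c(1.2, 1.5, 0.3, 0.4),
  gdp       = c(NA, 10, NA, NA),
  trade_co2 = c(NA, NA, NA, NA_real_)
)

# Count missing values per column, then express them as a percentage of rows.
na_count <- colSums(is.na(toy))
na_pct   <- round(100 * na_count / nrow(toy), 1)

# Sort so the sparsest columns appear first.
missing_summary <- sort(na_pct, decreasing = TRUE)
print(missing_summary)
```

Applied to `data`, this would give a single named vector that is easier to scan than the logical summary.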

5 Analysis

5.1 Which countries emit the most CO2 globally?

top_countries <- data %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
  arrange(desc(total_co2)) %>%
  head(10)

top_countries
## # A tibble: 10 × 2
##    country        total_co2
##    <chr>              <dbl>
##  1 United States    434867.
##  2 China            285087.
##  3 Russia           122808.
##  4 Germany           95132.
##  5 United Kingdom    79394.
##  6 Japan             69612.
##  7 India             66073.
##  8 France            40048.
##  9 Canada            35644.
## 10 Ukraine           31236.

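Inference: Summed over all years retained after cleaning, the United States has the largest cumulative CO2 emissions (about 434,867 million tonnes), followed by China (285,087), Russia (122,808), Germany (95,132), and the United Kingdom (79,394). These are historical totals, not current annual emissions.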

5.2 How has global CO2 emission changed over time?

theme_set(theme_minimal())
          
global_trend <- data %>%
  group_by(year) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE))

trend <- ggplot(global_trend, aes(x=year, y=total_co2)) +
  geom_line(color="blue") +
  labs(title="Global CO2 Emissions Over Time",
       x="Year",
       y="Total CO2 Emissions")
plotly::ggplotly(trend)

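Inference: Global CO2 emissions rise strongly over the long run, indicating rising pollution levels. Each yearly total sums only the country rows that survive cleaning, so early years with sparse coverage may understate the true global total.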
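Whether the increase holds in every single year can be checked directly rather than read off the plot. The sketch below computes year-over-year changes and lists any years in which the total fell. The `toy_trend` values are made up; on the real data, `global_trend` would take its place.

```r
library(dplyr)

# Made-up yearly totals, for illustration only.
toy_trend <- data.frame(
  year      = 2005:2010,
  total_co2 = c(100, 104, 108, 107, 103, 110)
)

# Absolute and percentage change relative to the previous year.
yearly_change <- toy_trend %>%
  arrange(year) %>%
  mutate(
    change_abs = total_co2 - dplyr::lag(total_co2),
    change_pct = 100 * change_abs / dplyr::lag(total_co2)
  )

# Years in which the total decreased (the first year has no predecessor and is dropped).
declines <- yearly_change %>%
  filter(change_abs < 0)

print(declines)
```

If `declines` comes back empty for the real data, the trend is monotonic; otherwise it lists the exceptions.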

5.3 Which year recorded the highest global CO2 emissions?

peak_year <- global_trend[which.max(global_trend$total_co2), ]

peak_year
## # A tibble: 1 × 2
##    year total_co2
##   <int>     <dbl>
## 1  2024    37398.

Inference: 2024 recorded the highest yearly total, about 37,398 million tonnes of CO2 summed across the retained countries.

5.4 What is the percentage contribution of top countries to global CO2 emissions?

top5 <- c("United States", "China", "India", "Russia", "Japan")

share_data <- data %>%
  filter(country %in% top5) %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE))

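# Note: this is each country's share of the five countries' combined total,
# so the percentages sum to 100. It is not a share of the global total.
share_data$percentage <- (share_data$total_co2 / sum(share_data$total_co2)) * 100

ggplot(share_data, aes(x="", y=percentage, fill=country)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y") +
  labs(title="Relative Shares of CO2 Emissions among Five Major Emitters")

Inference: The chart compares the five selected countries with one another: each slice is a share of their combined cumulative emissions, not of the global total (global shares are computed in 5.16). The five countries are the largest emitters in the most recent year (5.14) rather than the five largest cumulative emitters listed in 5.1.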
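The difference between a share within a selected group and a share of the global total can be illustrated with a small sketch. The countries and totals below are hypothetical toy values, not figures from the dataset, and the column names `share_within` and `share_global` are introduced only for this example.

```r
# Toy cumulative totals (hypothetical values, not taken from the dataset)
totals <- data.frame(
  country   = c("A", "B", "C", "D"),
  total_co2 = c(50, 30, 15, 5)
)

# Select a subset of countries, as the pie chart does
subset_names <- c("A", "B")
sel <- totals[totals$country %in% subset_names, ]

# Share within the selected subset: always sums to 100
sel$share_within <- sel$total_co2 / sum(sel$total_co2) * 100

# Share of the global total: sums to the subset's combined global share
sel$share_global <- sel$total_co2 / sum(totals$total_co2) * 100

print(sel)
```

With the same totals, the within-group shares add up to 100 while the global shares add up to the group's combined contribution, which is why the two measures should not be read interchangeably.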

5.5 How do CO2 emissions per person vary across countries?

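# co2 is in million tonnes, so this ratio is in million tonnes per person
# (multiply by 1e6 for tonnes per person). Note that this assignment
# overwrites the co2_per_capita column already present in the dataset.
data$co2_per_capita <- data$co2 / data$population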

top_per_capita <- data %>%
  filter(!is.na(co2_per_capita)) %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(avg_per_capita = mean(co2_per_capita, na.rm=TRUE)) %>%
  arrange(desc(avg_per_capita)) %>%
  head(10)

top_per_capita
## # A tibble: 10 × 2
##    country                   avg_per_capita
##    <chr>                              <dbl>
##  1 Sint Maarten (Dutch part)      0.000143 
##  2 Curacao                        0.0000491
##  3 Qatar                          0.0000461
##  4 Kuwait                         0.0000289
##  5 United Arab Emirates           0.0000287
##  6 Luxembourg                     0.0000252
##  7 Brunei                         0.0000243
##  8 Bahrain                        0.0000200
##  9 Saudi Arabia                   0.0000141
## 10 Trinidad and Tobago            0.0000134

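Inference: Small territories and oil- and gas-producing states dominate the ranking, led by Sint Maarten, Curacao, Qatar, Kuwait and the United Arab Emirates. Because co2 is recorded in million tonnes, the values are in million tonnes per person: Qatar's 0.0000461 corresponds to about 46 tonnes per person, averaged over its recorded years.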
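The very small per capita values follow from the units of the inputs. As a minimal sketch, assuming `co2` is in million tonnes (as in the Our World in Data codebook) and `population` is a head count, the ratio can be rescaled to tonnes per person. The rows and the column name `co2_tonnes_per_person` below are hypothetical.

```r
# Hypothetical rows: co2 in million tonnes, population in persons
toy <- data.frame(
  country    = c("X", "Y"),
  co2        = c(100, 2000),   # million tonnes
  population = c(2e6, 1e9)     # persons
)

# Convert million tonnes to tonnes before dividing by population
toy$co2_tonnes_per_person <- toy$co2 * 1e6 / toy$population

print(toy)
```

Expressing the result in tonnes per person makes the figures easier to compare with published per capita values.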

5.6 What is the distribution of CO2 emissions across countries?

distribution <- ggplot(data, aes(x=co2)) +
  geom_histogram(bins=30, fill="steelblue", color="black") +
  labs(
    title="Distribution of CO2 Emissions",
    x="CO2 Emissions",
    y="Frequency"
  )
plotly::ggplotly(distribution)

Inference: Each observation is one country in one year. Most country-year observations have relatively low emissions, while a small number are disproportionately high.

5.7 Are there any extreme outliers in CO2 emissions?

box <- ggplot(data, aes(y = co2)) +
  geom_boxplot(fill = "orange", color = "black") +
  scale_y_log10() +
  labs(
    title = "Boxplot of CO2 Emissions (Log Scale)",
    y = "CO2 Emissions (log scale)"
  )
plotly::ggplotly(box)

Inference: The boxplot clearly highlights extreme outliers, representing countries with exceptionally high emission levels compared to others.

5.8 Is global CO2 emission increasing over time?

trend_model <- lm(total_co2 ~ year, data = global_trend)

summary(trend_model)
## 
## Call:
## lm(formula = total_co2 ~ year, data = global_trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8389.7 -4867.4  -576.2  3764.2 14137.4 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.480e+05  1.082e+04  -22.92   <2e-16 ***
## year         1.340e+02  5.665e+00   23.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5802 on 228 degrees of freedom
## Multiple R-squared:  0.7106, Adjusted R-squared:  0.7093 
## F-statistic: 559.8 on 1 and 228 DF,  p-value: < 2.2e-16

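Inference: The slope is positive and statistically significant (about 134 per year, p < 2e-16), so global CO2 emissions are increasing over time. With an R-squared of 0.71, a straight line captures the direction of the trend but leaves sizeable residuals.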

5.9 What could be the future trend of CO2 emissions if current patterns continue?

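# Hypothetical scenario: observed totals scaled up by 10%.
# This is not a model forecast; it covers the same years as the observed data.
global_trend$future_co2 <- global_trend$total_co2 * 1.1

pred1 <- ggplot(global_trend, aes(x=year)) +
  geom_line(aes(y=total_co2), color="blue") +
  geom_line(aes(y=future_co2), color="red") +
  labs(
    title="Observed vs 10% Higher Scenario CO2 Emissions",
    x="Year",
    y="CO2 Emissions"
  )
plotly::ggplotly(pred1)

Inference: The red line is a what-if scenario 10% above the observed totals for the same years, not a forecast. It shows how much higher emissions would be under a uniform 10% increase; projecting beyond the last observed year requires a fitted model such as the regression in 5.8.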
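A projection beyond the observed years needs a model that is evaluated on future values of `year`. The following is a minimal sketch using a linear model and `predict()` on synthetic data with a known trend; the values, the horizon 2021 to 2025 and the names `toy_trend` and `future` are assumptions made for illustration, and a linear extrapolation is only one of several possible choices.

```r
# Synthetic yearly totals with a known linear trend (illustrative only)
years <- 2000:2020
toy_trend <- data.frame(year = years, total_co2 = 100 + 5 * (years - 2000))

# Fit a straight line to the observed years
fit <- lm(total_co2 ~ year, data = toy_trend)

# Extrapolate to years beyond the observed range
future <- data.frame(year = 2021:2025)
future$predicted_co2 <- as.numeric(predict(fit, newdata = future))

print(future)
```

The same pattern could be applied to `global_trend` by fitting on the observed years and passing a data frame of later years as `newdata`.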

5.10 Which countries should take immediate action based on high emissions?

high_risk <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
  arrange(desc(total_co2)) %>%
  head(5)

high_risk
## # A tibble: 5 × 2
##   country        total_co2
##   <chr>              <dbl>
## 1 United States    434867.
## 2 China            285087.
## 3 Russia           122808.
## 4 Germany           95132.
## 5 United Kingdom    79394.

Inference: Countries with the highest emissions should take immediate action to control pollution and reduce environmental impact.

5.11 How might increasing CO2 emissions impact future environmental conditions?

model <- lm(co2 ~ year, data = data)

summary(model)
## 
## Call:
## lm(formula = co2 ~ year, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -139.8   -98.7   -67.7   -15.8 12149.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.759e+03  1.231e+02  -14.30   <2e-16 ***
## year         9.383e-01  6.274e-02   14.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 444.9 on 22384 degrees of freedom
## Multiple R-squared:  0.009892,   Adjusted R-squared:  0.009848 
## F-statistic: 223.6 on 1 and 22384 DF,  p-value: < 2.2e-16

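Inference: The slope is positive and statistically significant (about 0.94 per year for an average country), but R-squared is only about 0.01, so year alone explains roughly 1% of the variation in country-level emissions. The model confirms the upward direction but is too weak to quantify future environmental conditions.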

5.13 Which countries show the fastest growth in CO2 emissions?

growth_data <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(growth = max(co2, na.rm=TRUE) - min(co2, na.rm=TRUE)) %>%
  arrange(desc(growth)) %>%
  head(10)

growth_data
## # A tibble: 10 × 2
##    country       growth
##    <chr>          <dbl>
##  1 China         12272.
##  2 United States  6127.
##  3 India          3193.
##  4 Russia         2536.
##  5 Japan          1312.
##  6 Germany        1117.
##  7 Indonesia       812.
##  8 Iran            793.
##  9 Ukraine         744.
## 10 Saudi Arabia    708.

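Inference: The measure used here is the range (maximum minus minimum) of each country's annual emissions, so it captures the size of the historical rise rather than the speed of growth. China shows the largest range (about 12,272), followed by the United States and India. A large range does not imply that emissions are still rising: Germany's range is 1,117, yet its latest value in 5.14 is about 572.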
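The range of a series and its net change can differ when emissions peak and then decline. The sketch below contrasts the two on hypothetical data; the countries "P" and "Q", their values and the column names `range` and `net_change` are invented for illustration.

```r
# Hypothetical series: "P" peaks and declines, "Q" rises steadily
toy <- data.frame(
  country = rep(c("P", "Q"), each = 4),
  year    = rep(c(1990, 2000, 2010, 2020), times = 2),
  co2     = c(10, 80, 60, 30,   5, 20, 40, 70)
)
toy <- toy[order(toy$country, toy$year), ]

# Compute both measures per country
growth_cmp <- do.call(rbind, lapply(split(toy, toy$country), function(d) {
  data.frame(
    country    = d$country[1],
    range      = max(d$co2) - min(d$co2),      # what max - min measures
    net_change = d$co2[nrow(d)] - d$co2[1]     # last year minus first year
  )
}))

print(growth_cmp)
```

Country "P" has the larger range but the smaller net change, so the two measures can rank countries differently.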

5.14 Which countries are the top polluters in recent years?

latest_year <- max(data$year, na.rm=TRUE)

recent_data <- data %>%
  filter(year == latest_year) %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(desc(co2)) %>%
  head(10)

recent_data[, c("country", "co2")]
##          country       co2
## 1          China 12289.037
## 2  United States  4904.120
## 3          India  3193.478
## 4         Russia  1780.524
## 5          Japan   961.867
## 6      Indonesia   812.220
## 7           Iran   792.631
## 8   Saudi Arabia   692.133
## 9    South Korea   583.679
## 10       Germany   572.319

Inference: In the most recent year, China (about 12,289), the United States (about 4,904) and India (about 3,193) are the three largest emitters, and China alone emits more than the next three countries combined.

5.15 How do CO2 emissions of top countries change over time?

top5 <- c("United States", "China", "India", "Russia", "Japan")

trend_data <- data %>%
  filter(country %in% top5)

top <- ggplot(trend_data, aes(x=year, y=co2, color=country)) +
  geom_line(linewidth=1) +
  labs(
    title="CO2 Emission Trends of Top 5 Countries",
    x="Year",
    y="CO2 Emissions"
  )
plotly::ggplotly(top)

Inference: The graph shows how emissions have evolved over time for major countries, highlighting differences in growth patterns and industrial development.

5.16 What is the percentage contribution of top countries to total emissions?

total_global <- sum(data$co2, na.rm=TRUE)

top5_data <- data %>%
  filter(country %in% c("United States", "China", "India", "Russia", "Japan")) %>%
  group_by(country) %>%
  summarise(total = sum(co2, na.rm=TRUE))

top5_data$percentage <- (top5_data$total / total_global) * 100

top5_data
## # A tibble: 5 × 3
##   country         total percentage
##   <chr>           <dbl>      <dbl>
## 1 China         285087.      15.8 
## 2 India          66073.       3.67
## 3 Japan          69612.       3.86
## 4 Russia        122808.       6.81
## 5 United States 434867.      24.1

Inference: Together these five countries account for about 54% of cumulative emissions in the dataset, with the United States alone contributing 24.1% and China 15.8%.

5.17 Are CO2 emissions increasing at an accelerating rate?

global_trend$change <- c(NA, diff(global_trend$total_co2))

acc <- ggplot(global_trend, aes(x=year, y=change)) +
  geom_line(color="purple") +
  labs(
    title="Yearly Change in CO2 Emissions",
    x="Year",
    y="Change in Emissions"
  )
plotly::ggplotly(acc)

Inference: Larger fluctuations in the yearly change do not by themselves show acceleration. Emissions are accelerating only if the yearly change itself trends upward over time.
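Acceleration can be checked more directly by differencing the series twice. This is a minimal sketch on synthetic totals that grow quadratically; the data and the names `accel` and `mean_accel` are assumptions for illustration, not results from the dataset.

```r
# Synthetic totals growing quadratically, so yearly increments grow linearly
years  <- 2000:2010
totals <- 100 + (years - 2000)^2

# First difference: yearly change
change <- diff(totals)

# Second difference: change in the yearly change
accel <- diff(totals, differences = 2)

# A positive average second difference points to acceleration
mean_accel <- mean(accel)

print(change)
print(mean_accel)
```

Applied to `global_trend$total_co2`, a clearly positive average second difference would support the claim of acceleration, while a value near zero would indicate steady growth.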

5.18 Which countries have reduced their CO2 emissions over time?

reduction <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(change = last(co2) - first(co2)) %>%
  arrange(change)

head(reduction, 10)
## # A tibble: 10 × 2
##    country                    change
##    <chr>                       <dbl>
##  1 Curacao                   -3.84  
##  2 Niue                       0.004 
##  3 Saint Helena               0.007 
##  4 Tuvalu                     0.007 
##  5 Wallis and Futuna          0.013 
##  6 Micronesia (country)       0.0150
##  7 Andorra                    0.0220
##  8 Montserrat                 0.023 
##  9 Nauru                      0.032 
## 10 Sint Maarten (Dutch part)  0.0350

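Inference: By this measure only Curacao shows a decrease (-3.84); every other country in the list has a small positive change. Comparing the first and last record is a coarse test, because it misses countries whose emissions peaked and later fell. The United States, for example, has a range of 6,127 in 5.13 but a latest value of about 4,904 in 5.14, so it is now below its peak.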
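One way to detect reductions that a first-versus-last comparison misses is to measure the change from each country's peak. The sketch below uses hypothetical data; the countries, values and the column names `peak_year` and `pct_from_peak` are invented for illustration.

```r
# Hypothetical series: "P" peaks in 2000 and declines, "Q" is still rising
toy <- data.frame(
  country = rep(c("P", "Q"), each = 4),
  year    = rep(c(1990, 2000, 2010, 2020), times = 2),
  co2     = c(10, 80, 60, 30,   5, 20, 40, 70)
)
toy <- toy[order(toy$country, toy$year), ]

# For each country: the peak year and the percentage change from peak to latest
peak_cmp <- do.call(rbind, lapply(split(toy, toy$country), function(d) {
  peak   <- max(d$co2)
  latest <- d$co2[nrow(d)]
  data.frame(
    country       = d$country[1],
    peak_year     = d$year[which.max(d$co2)],
    pct_from_peak = (latest - peak) / peak * 100
  )
}))

print(peak_cmp)
```

A negative `pct_from_peak` marks a country that has reduced emissions since its peak, even if its latest value is still above its first recorded value.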

5.20 How does CO2 emission vary across different time periods?

# Note: "1980-2000" covers the years 1980 to 1999, and "After 2000" includes 2000
data$period <- ifelse(data$year < 1980, "Before 1980",
                ifelse(data$year < 2000, "1980-2000", "After 2000"))

period_data <- data %>%
  group_by(period) %>%
  summarise(avg_co2 = mean(co2, na.rm=TRUE))

period_data
## # A tibble: 3 × 2
##   period      avg_co2
##   <chr>         <dbl>
## 1 1980-2000     103. 
## 2 After 2000    151. 
## 3 Before 1980    44.3

Inference: Average emissions per country-year rise from 44.3 before 1980 to about 103 in the middle period and about 151 in the latest period, a more than threefold increase that points to rapid industrial growth and environmental impact.
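One alternative to nested `ifelse()` calls is `cut()`, which makes the year boundaries explicit and keeps the periods in chronological rather than alphabetical order. The following is a sketch on a few sample years; the break points mirror the ones used above, and the labels are illustrative choices.

```r
# Sample years on both sides of each boundary
years <- c(1975, 1979, 1980, 1999, 2000, 2024)

# Intervals are closed on the right: (-Inf, 1979], (1979, 1999], (1999, Inf)
period <- cut(
  years,
  breaks = c(-Inf, 1979, 1999, Inf),
  labels = c("Before 1980", "1980-1999", "2000 onward")
)

print(data.frame(year = years, period = period))
print(levels(period))
```

Because the result is a factor with ordered levels, grouped summaries and plots list the periods from earliest to latest.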

5.21 How does population size influence total CO2 emissions?

effect <- ggplot(data, aes(x=population, y=co2)) +
  geom_point(alpha=0.5, color="blue") +
  labs(
    title="Population vs CO2 Emissions",
    x="Population",
    y="CO2 Emissions"
  )
plotly::ggplotly(effect)

Inference: Countries with larger populations tend to have higher total emissions.

5.22 How do total emissions differ from per capita emissions?

compare <- ggplot(data, aes(x=co2_per_capita, y=co2)) +
  geom_point(alpha=0.5, color="red") +
  labs(
    title="Per Capita vs Total Emissions",
    x="CO2 per Capita",
    y="Total CO2"
  )
plotly::ggplotly(compare)

Inference: Some countries have high total emissions but lower per capita values.

5.23 Which countries have the highest per capita emissions in recent years?

latest_year <- max(data$year, na.rm=TRUE)

recent_pc <- data %>%
  filter(year == latest_year) %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(desc(co2_per_capita)) %>%
  head(10)

recent_pc[, c("country","co2_per_capita")]
##                      country co2_per_capita
## 1                      Qatar   4.127109e-05
## 2                     Kuwait   2.624760e-05
## 3                     Brunei   2.604520e-05
## 4                    Bahrain   2.426980e-05
## 5        Trinidad and Tobago   2.293176e-05
## 6               Saudi Arabia   2.037918e-05
## 7       United Arab Emirates   2.013107e-05
## 8              New Caledonia   1.806564e-05
## 9  Sint Maarten (Dutch part)   1.655446e-05
## 10                      Oman   1.565111e-05

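Inference: Small, energy-exporting countries lead the per capita ranking in the latest year. Because co2 is recorded in million tonnes, Qatar's value of 4.127e-05 corresponds to about 41 tonnes per person.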

5.24 What is the smoothed trend of global CO2 emissions?

smooth <- ggplot(global_trend, aes(x=year, y=total_co2)) +
  geom_line(color="gray") +
  geom_smooth(method="loess", color="red") +
  labs(title="Smoothed CO2 Emission Trend")
plotly::ggplotly(smooth)
## `geom_smooth()` using formula = 'y ~ x'

Inference: The smoothed curve highlights the long-term upward trend in emissions.

5.25 How are CO2 emissions distributed across countries?

den <- ggplot(data, aes(x=co2)) +
  geom_density(fill="green", alpha=0.5)
plotly::ggplotly(den)

Inference: Most country-year observations cluster at low emission levels, with a long right tail of high emitters.

5.26 How have emissions changed in the last decade?

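recent_data <- data %>%
  filter(year >= max(year) - 10) %>%   # latest year and the ten years before it
  group_by(year) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE))

# Plot the yearly global total; plotting raw rows would join all countries in one line
dec <- ggplot(recent_data, aes(x=year, y=total_co2)) +
  geom_line(color="blue")
plotly::ggplotly(dec)

Inference: Summed across countries, emissions in the most recent years remain at their highest levels, with the maximum in 2024 (see 5.3).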

5.27 Which countries have stable emission patterns over time?

stability <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(sd_co2 = sd(co2, na.rm=TRUE)) %>%
  arrange(sd_co2)

head(stability, 10)
## # A tibble: 10 × 2
##    country                    sd_co2
##    <chr>                       <dbl>
##  1 Niue                      0.00152
##  2 Tuvalu                    0.00274
##  3 Saint Helena              0.00296
##  4 Wallis and Futuna         0.00315
##  5 Montserrat                0.0128 
##  6 Saint Pierre and Miquelon 0.0175 
##  7 Kiribati                  0.0181 
##  8 Cook Islands              0.0236 
##  9 Micronesia (country)      0.0241 
## 10 Marshall Islands          0.0243

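Inference: The lowest standard deviations belong to the smallest emitters, such as Niue and Tuvalu. Standard deviation scales with the level of emissions, so this ranking mainly reflects size; a relative measure such as the coefficient of variation is needed to compare stability across countries of different sizes.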
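Standard deviation depends on the scale of the series, so a relative measure can give a different picture of stability. As a sketch, the coefficient of variation (standard deviation divided by mean) is compared with the plain standard deviation on hypothetical data; the countries "Small" and "Large" and their values are invented.

```r
# Hypothetical data: "Small" varies a lot relative to its level, "Large" very little
toy <- data.frame(
  country = rep(c("Small", "Large"), each = 4),
  co2     = c(0.01, 0.02, 0.03, 0.04,   1000, 1010, 990, 1000)
)

# Per-country standard deviation, mean and coefficient of variation
sd_by   <- tapply(toy$co2, toy$country, sd)
mean_by <- tapply(toy$co2, toy$country, mean)
cv_by   <- sd_by / mean_by

stats <- data.frame(
  country = names(sd_by),
  sd      = as.numeric(sd_by),
  cv      = as.numeric(cv_by)
)

print(stats)
```

"Small" has the lower standard deviation but the higher coefficient of variation, so ranking by `sd` alone would label the more volatile series as the more stable one.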

5.28 Is there a correlation between population and CO2 emissions?

cor(data$population, data$co2, use="complete.obs")
## [1] 0.6311937

Inference: The correlation of 0.63 indicates a moderately strong positive linear association between population and emissions.

5.29 How does log transformation help in understanding emissions?

log1 <- ggplot(data, aes(x=log(co2))) +
  geom_histogram(bins=30, fill="purple")
plotly::ggplotly(log1)

Inference: Log transformation reduces skewness and improves visualization.

5.30 How have top contributors changed over time?

top_countries_names <- top_countries$country

trend_top <- data %>%
  filter(country %in% top_countries_names)

trend2 <- ggplot(trend_top, aes(x=year, y=co2, color=country)) +
  geom_line()
plotly::ggplotly(trend2)

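Inference: The plot traces annual emissions for the ten largest cumulative emitters from 5.1. Their relative positions have shifted over time: the United States leads in cumulative emissions (5.1), while China is the largest emitter in the most recent year (5.14).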

5.31 What share do top countries contribute globally?

top_total <- sum(top_countries$total_co2)
global_total <- sum(data$co2, na.rm=TRUE)

top_total / global_total * 100
## [1] 69.91129

Inference: The ten largest cumulative emitters account for about 69.9% of all emissions recorded in the dataset.

5.32 Which countries have the lowest emissions?

low_emitters <- data %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(co2) %>%
  head(10)

low_emitters[, c("country","co2")]
##      country   co2
## 1    Armenia 0.001
## 2    Armenia 0.001
## 3    Armenia 0.001
## 4    Armenia 0.001
## 5  Australia 0.001
## 6  Australia 0.001
## 7  Australia 0.001
## 8  Australia 0.001
## 9  Australia 0.001
## 10 Australia 0.001

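Inference: The rows shown are the smallest individual country-year records (0.001), which belong to Armenia and Australia, not the lowest-emitting countries overall. Identifying the countries that contribute least requires aggregating emissions per country before sorting.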
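Sorting individual rows and sorting per-country totals can point to different countries. The sketch below shows the contrast on hypothetical data; the countries, years and values are invented, and base R `aggregate()` is used here in place of the dplyr pipeline only to keep the example self-contained.

```r
# Hypothetical data: "A" has one tiny early record but the largest total
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  year    = c(1900, 2020, 1900, 2020, 1900, 2020),
  co2     = c(0.001, 500, 0.5, 0.8, 2, 3)
)

# Sorting rows picks A's 1900 record, although A is the largest emitter overall
row_level <- as.character(toy[order(toy$co2), "country"][1])

# Summing per country first ranks the countries themselves
totals <- aggregate(co2 ~ country, data = toy, FUN = sum)
totals <- totals[order(totals$co2), ]
country_level <- as.character(totals$country[1])

print(row_level)
print(country_level)
```

In the dplyr style used in this report, the equivalent step would be a `group_by(country)` and `summarise()` before `arrange()`.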

5.33 How variable are emissions globally?

var(data$co2, na.rm=TRUE)
## [1] 199870.7

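Inference: The variance of about 199,871 is the square of the standard deviation (447.07). It is very large relative to typical emission values, which indicates a highly unequal distribution.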

5.34 How does emission growth indicate future risk?

risk <- ggplot(growth_data, aes(x=reorder(country, growth), y=growth)) +
  geom_bar(stat="identity", fill="red") +
  coord_flip()
plotly::ggplotly(risk)

Inference: Countries with highest growth pose future environmental risks.

5.36 What are the descriptive statistics of CO2 emissions?

mean(data$co2, na.rm=TRUE)
## [1] 80.50304
median(data$co2, na.rm=TRUE)
## [1] 3.2115
sd(data$co2, na.rm=TRUE)
## [1] 447.069
quantile(data$co2, probs = c(0.25, 0.5, 0.75), na.rm=TRUE)
##      25%      50%      75% 
##  0.37700  3.21150 25.31125

Inference: The mean (80.5) is about 25 times the median (3.21), and the standard deviation (447.1) is far larger than both, indicating a strongly right-skewed distribution.

5.37 What is the standard deviation of CO2 emissions?

sd(data$co2, na.rm=TRUE)
## [1] 447.069

Inference: The high standard deviation indicates that emissions are spread very unequally across country-year observations.

5.38 What is the distribution shape of CO2 emissions?

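distribution1 <- ggplot(data, aes(x=co2)) +
  # Put the histogram on the density scale so the density curve is comparable
  geom_histogram(aes(y=after_stat(density)), bins=30, fill="blue") +
  geom_density(color="red")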
plotly::ggplotly(distribution1)

Inference: The distribution is right-skewed, meaning most countries have low emissions while few have very high emissions.

5.39 Are there outliers in CO2 emissions using IQR method?

Q1 <- quantile(data$co2, 0.25, na.rm=TRUE)
Q3 <- quantile(data$co2, 0.75, na.rm=TRUE)
IQR_val <- Q3 - Q1

lower <- Q1 - 1.5 * IQR_val
upper <- Q3 + 1.5 * IQR_val

outliers <- data %>%
  filter(co2 < lower | co2 > upper)

# Show only the key columns; printing every column produces a very wide table
head(outliers[, c("country", "year", "co2")])
##   country year    co2
## 1 Algeria 1980 66.124
## 2 Algeria 1984 70.417
## 3 Algeria 1985 71.988
## 4 Algeria 1986 75.382
## 5 Algeria 1987 83.020
## 6 Algeria 1988 82.839

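Inference: With Q1 = 0.377 and Q3 = 25.311, the upper fence is about 62.7 and the lower fence is negative, so only high values can be flagged. Because the distribution is strongly right-skewed, the rule flags moderately large country-years such as Algeria in 1980 (66.1) as well as the very largest emitters.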
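The IQR fences can be reproduced by hand from the quartiles reported in 5.36. The snippet below is a worked calculation using those two values; the variable names are chosen for this example.

```r
# Quartiles reported in section 5.36
q1 <- 0.37700
q3 <- 25.31125

# Interquartile range and the 1.5 * IQR fences
iqr_val <- q3 - q1
lower <- q1 - 1.5 * iqr_val
upper <- q3 + 1.5 * iqr_val

print(c(lower = lower, upper = upper))
```

The lower fence is below zero, so no observation can fall under it, and any country-year above the upper fence is flagged as an outlier.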

5.40 What is the correlation between population and CO2 emissions?

cor(data$population, data$co2, use="complete.obs")
## [1] 0.6311937

Inference: A positive correlation suggests that higher population is associated with higher emissions.

5.41 How can correlation between variables be visualized?

corr1 <- ggplot(data, aes(x=population, y=co2)) +
  geom_point(alpha=0.5) +
  geom_smooth(method="lm", color="red")
plotly::ggplotly(corr1)
## `geom_smooth()` using formula = 'y ~ x'

Inference: The plot shows a positive relationship between population and emissions.

5.42 How strong is the relationship between variables?

correlation <- cor(data$population, data$co2, use="complete.obs")

correlation
## [1] 0.6311937

Inference: A correlation of 0.63 indicates a moderately strong positive linear association. It measures association only and does not by itself show that population causes emissions.
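For a simple linear regression with one predictor, the squared correlation equals the R-squared of the model. A short worked calculation using the correlation reported above shows how much variation population accounts for.

```r
# Correlation between population and co2 reported above
r <- 0.6311937

# Squared correlation: proportion of variance explained in a one-predictor model
r_squared <- r^2

print(round(r_squared, 4))
```

The result can be compared with the Multiple R-squared reported by `summary()` for the model `co2 ~ population` in 5.43.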

5.43 Can we model CO2 emissions using population?

model <- lm(co2 ~ population, data = data)

summary(model)
## 
## Call:
## lm(formula = co2 ~ population, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2433.1   -25.3   -12.7    -8.7  7841.7 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.669e+00  2.390e+00   4.046 5.22e-05 ***
## population  3.127e-06  2.568e-08 121.753  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 346.8 on 22384 degrees of freedom
## Multiple R-squared:  0.3984, Adjusted R-squared:  0.3984 
## F-statistic: 1.482e+04 on 1 and 22384 DF,  p-value: < 2.2e-16

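Inference: Population is a statistically significant predictor: each additional million people is associated with about 3.13 more units of CO2, and population alone explains about 40% of the variation in emissions (R-squared = 0.398).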

5.44 What do regression coefficients indicate?

coef(model)
##  (Intercept)   population 
## 9.668677e+00 3.126621e-06

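Inference: The slope of 3.127e-06 is the expected change in CO2 emissions per additional person, and the intercept of 9.67 is the fitted value at zero population, which has no practical interpretation.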
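The coefficients can be turned into fitted values by hand. The sketch below applies the intercept and slope printed above to two example populations; the population values of one million and one hundred million are chosen only for illustration.

```r
# Coefficients printed by coef(model) above
intercept <- 9.668677e+00
slope     <- 3.126621e-06

# Example populations (illustrative values)
population <- c(1e6, 1e8)

# Fitted emissions from the regression line: intercept + slope * population
predicted_co2 <- intercept + slope * population

print(data.frame(population = population, predicted_co2 = predicted_co2))
```

This is the same calculation that `predict()` performs for a one-predictor linear model.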

5.45 Can we predict CO2 emissions on unseen data using regression?

# Split data
set.seed(123)
train_index <- createDataPartition(data$co2, p = 0.7, list = FALSE)

train_data <- data[train_index, ]
test_data  <- data[-train_index, ]

# Train model
model <- lm(co2 ~ year, data = train_data)

# Predict on unseen data
prediction <- predict(model, newdata = test_data)

# View predictions
head(data.frame(
  Actual = test_data$co2,
  Predicted = prediction
))
##    Actual Predicted
## 3   0.092  70.26912
## 4   0.092  71.23942
## 6   0.106  73.18001
## 8   0.183  75.12061
## 16  0.839  82.88300
## 29  2.384  95.49689

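Inference: The model was trained on 70% of the data and used to predict CO2 emissions for the held-out 30%. Because year is the only predictor, every country receives the same prediction for a given year. For the rows shown, the predicted values (about 70 to 95) lie far above the actual values (0.092 to 2.384), so the year-only model is a poor fit for low-emission observations.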
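The `head()` preview covers only six rows, so a numeric summary over the whole test set gives a fuller picture. Below is a minimal sketch of computing RMSE, MAE, and out-of-sample R-squared. It uses a synthetic data frame (`toy`, with invented values) and a base R `sample()` split in place of `createDataPartition()` so that it runs without caret.

```r
# Synthetic yearly series standing in for the cleaned data (illustrative values only)
set.seed(123)
toy <- data.frame(year = rep(1950:2019, times = 5))
toy$co2 <- pmax(0, 0.9 * (toy$year - 1950) + rnorm(nrow(toy), sd = 15))

# 70/30 split with base R sample(); the report itself uses caret::createDataPartition()
train_index <- sample(seq_len(nrow(toy)), size = round(0.7 * nrow(toy)))
train_toy <- toy[train_index, ]
test_toy  <- toy[-train_index, ]

# Fit on the training rows and predict the held-out rows
fit  <- lm(co2 ~ year, data = train_toy)
pred <- predict(fit, newdata = test_toy)

# Error metrics on the held-out rows
errors <- test_toy$co2 - pred
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
r2   <- 1 - sum(errors^2) / sum((test_toy$co2 - mean(test_toy$co2))^2)
c(RMSE = rmse, MAE = mae, R2 = r2)
```

Applied to the real `test_data` and `prediction` objects, the same three lines of metric code would quantify the gap that is visible in the preview.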

5.46 How can we visually evaluate the model performance?

library(ggplot2)

# Create result dataframe
results <- data.frame(
  Actual = test_data$co2,
  Predicted = prediction
)

# Plot
visual1 <- ggplot(results, aes(x = Actual, y = Predicted)) +
  geom_point(color = "blue", alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  labs(
    title = "Actual vs Predicted CO2 Emissions",
    x = "Actual CO2",
    y = "Predicted CO2"
  ) +
  coord_flip()
plotly::ggplotly(visual1)

Inference: Points close to the red line, where the predicted value equals the actual value, indicate accurate predictions. Deviations from the line represent prediction errors, showing how well the model performs on unseen data.
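A residual plot is a common companion to the actual-versus-predicted view, because systematic over- or under-prediction shows up as a trend around zero. The sketch below is one way to draw it with ggplot2. The `toy_results` data frame is synthetic (invented values, not the report's results) and mirrors the `Actual` and `Predicted` columns used above.

```r
library(ggplot2)

# Synthetic actual/predicted pairs (illustrative values, not the report's results)
set.seed(7)
toy_results <- data.frame(Actual = rexp(300, rate = 1 / 50))
toy_results$Predicted <- 40 + 0.2 * toy_results$Actual + rnorm(300, sd = 10)
toy_results$Residual  <- toy_results$Actual - toy_results$Predicted

# Residuals against predictions; an unbiased model scatters evenly around zero
resid_plot <- ggplot(toy_results, aes(x = Predicted, y = Residual)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, color = "red") +
  labs(
    title = "Residuals vs Predicted (synthetic data)",
    x = "Predicted CO2",
    y = "Residual (Actual - Predicted)"
  )
resid_plot
```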

6 Key Insights

  • After data cleaning and preprocessing, the dataset supported reliable country-level analysis of CO2 emissions.
  • Global CO2 emissions show a strong increasing trend over time, supported by both visualization and regression modeling.
  • A small number of countries contribute a disproportionately large share of total emissions, highlighting global imbalance.
  • Per capita analysis reveals that some countries with smaller populations still have high individual emission levels.
  • The distribution of CO2 emissions is highly skewed, with several extreme outliers identified using boxplots and IQR methods.
  • Correlation analysis indicates a positive relationship between population and CO2 emissions (r of about 0.63), with population explaining about 40% of the variance in the linear model.
  • Regression modeling suggests that if current trends continue, CO2 emissions are likely to increase further, indicating future environmental risk.

7 Conclusion

This project analyzed global CO2 emission trends using R, incorporating data cleaning, exploratory data analysis, visualization, correlation, and regression techniques. The findings reveal a consistent rise in emissions over time, with significant disparities among countries.

Techniques such as boxplots, correlation, and regression provided deeper insight into emission patterns and relationships between variables. The results show that while population is associated with higher emissions, it explains only about 40% of the variance (R-squared of 0.3984), so other factors also drive differences across countries.

Overall, the study emphasizes the urgent need for effective environmental policies and sustainable practices. If current trends continue, future environmental conditions may worsen, making it essential to take immediate global action to reduce pollution.