Pollution is one of the most critical global challenges affecting environmental sustainability and human health. This project analyzes global CO2 emission trends using R programming to understand historical patterns, identify key contributors, and evaluate potential future impacts.
The dataset used in this project is sourced from Our World in Data. It contains country-wise information on CO2 emissions, population, and related environmental indicators across multiple years.
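The loading step itself is not shown in this report. A minimal sketch, assuming the data has been saved locally as a CSV (the file name `owid-co2-data.csv` and the helper `load_co2()` are assumptions for illustration, not taken from the report), might look like this. The demo writes a two-row stand-in file so the snippet runs on its own.

```r
# Hypothetical helper: read the emissions CSV into a plain data frame.
load_co2 <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Stand-in file with a few of the dataset's columns (values for Afghanistan).
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,iso_code,population,co2",
  "Afghanistan,1949,AFG,7356890,0.015",
  "Afghanistan,1950,AFG,7776180,0.084"
), demo_path)

demo <- load_co2(demo_path)
nrow(demo)

# For the real analysis, point the helper at the downloaded file (assumed name):
# data <- load_co2("owid-co2-data.csv")
```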
## country year iso_code population gdp cement_co2 cement_co2_per_capita co2
## 1 Afghanistan 1750 AFG 2802560 NA 0 0 NA
## 2 Afghanistan 1751 AFG NA NA 0 NA NA
## 3 Afghanistan 1752 AFG NA NA 0 NA NA
## 4 Afghanistan 1753 AFG NA NA 0 NA NA
## 5 Afghanistan 1754 AFG NA NA 0 NA NA
## 6 Afghanistan 1755 AFG NA NA 0 NA NA
## co2_growth_abs co2_growth_prct co2_including_luc co2_including_luc_growth_abs
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## co2_including_luc_growth_prct co2_including_luc_per_capita
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## co2_including_luc_per_gdp co2_including_luc_per_unit_energy co2_per_capita
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## co2_per_gdp co2_per_unit_energy coal_co2 coal_co2_per_capita consumption_co2
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## consumption_co2_per_capita consumption_co2_per_gdp cumulative_cement_co2
## 1 NA NA 0
## 2 NA NA 0
## 3 NA NA 0
## 4 NA NA 0
## 5 NA NA 0
## 6 NA NA 0
## cumulative_co2 cumulative_co2_including_luc cumulative_coal_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## cumulative_flaring_co2 cumulative_gas_co2 cumulative_luc_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## cumulative_oil_co2 cumulative_other_co2 energy_per_capita energy_per_gdp
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## flaring_co2 flaring_co2_per_capita gas_co2 gas_co2_per_capita
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## ghg_excluding_lucf_per_capita ghg_per_capita land_use_change_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## land_use_change_co2_per_capita methane methane_per_capita nitrous_oxide
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## nitrous_oxide_per_capita oil_co2 oil_co2_per_capita other_co2_per_capita
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## other_industry_co2 primary_energy_consumption share_global_cement_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## share_global_co2 share_global_co2_including_luc share_global_coal_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## share_global_cumulative_cement_co2 share_global_cumulative_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## share_global_cumulative_co2_including_luc share_global_cumulative_coal_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## share_global_cumulative_flaring_co2 share_global_cumulative_gas_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## share_global_cumulative_luc_co2 share_global_cumulative_oil_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## share_global_cumulative_other_co2 share_global_flaring_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## share_global_gas_co2 share_global_luc_co2 share_global_oil_co2
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## share_global_other_co2 share_of_temperature_change_from_ghg
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## temperature_change_from_ch4 temperature_change_from_co2
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## temperature_change_from_ghg temperature_change_from_n2o total_ghg
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## total_ghg_excluding_lucf trade_co2 trade_co2_share
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
The following libraries were used for data analysis and visualization:
The analysis loads tidyverse 2.0.0 (dplyr 1.2.0, forcats 1.0.1, ggplot2 4.0.2, lubridate 1.9.5, purrr 1.2.1, readr 2.2.0, stringr 1.6.0, tibble 3.3.1, tidyr 1.3.2), corrplot 0.95, caret (which also loads lattice), and plotly.
Name conflicts reported at load time: dplyr::filter() and dplyr::lag() mask stats::filter() and stats::lag(); caret masks lift() from purrr; plotly masks last_plot() from ggplot2, filter() from stats, and layout() from graphics.
Before analysis, the dataset was cleaned to ensure accuracy and consistency.
# Remove missing values
data <- data %>%
  filter(!is.na(co2), !is.na(population))
# Remove non-country data (like World, Asia)
data <- data %>%
  filter(nchar(iso_code) == 3)
# Remove zero or negative values
data <- data %>%
  filter(co2 > 0)
# Check structure
str(data)
## 'data.frame': 22386 obs. of 79 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ year : int 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 ...
## $ iso_code : chr "AFG" "AFG" "AFG" "AFG" ...
## $ population : num 7356890 7776180 7879343 7987784 8096703 ...
## $ gdp : num NA 9.42e+09 9.69e+09 1.00e+10 1.06e+10 ...
## $ cement_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cement_co2_per_capita : num 0 0 0 0 0 0 0 0 0 0 ...
## $ co2 : num 0.015 0.084 0.092 0.092 0.106 0.106 0.154 0.183 0.293 0.33 ...
## $ co2_growth_abs : num NA 0.07 0.007 0 0.015 0 0.048 0.029 0.11 0.037 ...
## $ co2_growth_prct : num NA 475 8.7 0 16 ...
## $ co2_including_luc : num 6.12 7.17 8.09 9.01 10.07 ...
## $ co2_including_luc_growth_abs : num NA 1.052 0.923 0.913 1.065 ...
## $ co2_including_luc_growth_prct : num NA 17.2 12.9 11.3 11.8 ...
## $ co2_including_luc_per_capita : num 0.831 0.922 1.027 1.127 1.244 ...
## $ co2_including_luc_per_gdp : num NA 0.761 0.835 0.899 0.947 ...
## $ co2_including_luc_per_unit_energy : num NA NA NA NA NA NA NA NA NA NA ...
## $ co2_per_capita : num 0.002 0.011 0.012 0.011 0.013 0.013 0.018 0.022 0.034 0.038 ...
## $ co2_per_gdp : num NA 0.009 0.009 0.009 0.01 0.01 0.014 0.016 0.025 0.027 ...
## $ co2_per_unit_energy : num NA NA NA NA NA NA NA NA NA NA ...
## $ coal_co2 : num 0.015 0.021 0.026 0.032 0.038 0.043 0.062 0.062 0.077 0.092 ...
## $ coal_co2_per_capita : num 0.002 0.003 0.003 0.004 0.005 0.005 0.007 0.007 0.009 0.011 ...
## $ consumption_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ consumption_co2_per_capita : num NA NA NA NA NA NA NA NA NA NA ...
## $ consumption_co2_per_gdp : num NA NA NA NA NA NA NA NA NA NA ...
## $ cumulative_cement_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cumulative_co2 : num 0.015 0.099 0.191 0.282 0.388 ...
## $ cumulative_co2_including_luc : num 6.12 13.29 21.38 30.38 40.45 ...
## $ cumulative_coal_co2 : num 0.015 0.036 0.061 0.093 0.131 0.174 0.236 0.298 0.375 0.467 ...
## $ cumulative_flaring_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ cumulative_gas_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cumulative_luc_co2 : num 500 507 515 524 533 ...
## $ cumulative_oil_co2 : num 0 0.063 0.129 0.189 0.257 0.321 0.413 0.534 0.75 0.988 ...
## $ cumulative_other_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ energy_per_capita : num NA NA NA NA NA NA NA NA NA NA ...
## $ energy_per_gdp : num NA NA NA NA NA NA NA NA NA NA ...
## $ flaring_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ flaring_co2_per_capita : num NA NA NA NA NA NA NA NA NA NA ...
## $ gas_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ gas_co2_per_capita : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ghg_excluding_lucf_per_capita : num 0.154 0.159 0.154 0.149 0.146 0.141 0.144 0.144 0.159 0.159 ...
## $ ghg_per_capita : num 2.54 2.56 2.67 2.77 2.87 ...
## $ land_use_change_co2 : num 6.1 7.08 8 8.91 9.96 ...
## $ land_use_change_co2_per_capita : num 0.829 0.911 1.015 1.116 1.231 ...
## $ methane : num 7.73 7.88 7.97 8.07 8.19 ...
## $ methane_per_capita : num 1.05 1.01 1.01 1.01 1.01 ...
## $ nitrous_oxide : num 2.16 2.23 2.29 2.37 2.45 ...
## $ nitrous_oxide_per_capita : num 0.294 0.287 0.291 0.296 0.302 0.309 0.314 0.32 0.323 0.326 ...
## $ oil_co2 : num 0 0.063 0.066 0.06 0.068 0.064 0.092 0.121 0.216 0.238 ...
## $ oil_co2_per_capita : num 0 0.008 0.008 0.007 0.008 0.008 0.011 0.014 0.025 0.027 ...
## $ other_co2_per_capita : num NA NA NA NA NA NA NA NA NA NA ...
## $ other_industry_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ primary_energy_consumption : num NA NA NA NA NA NA NA NA NA NA ...
## $ share_global_cement_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ share_global_co2 : num 0 0.001 0.001 0.001 0.002 0.002 0.002 0.002 0.004 0.004 ...
## $ share_global_co2_including_luc : num 0.056 0.057 0.06 0.066 0.072 0.075 0.078 0.081 0.085 0.091 ...
## $ share_global_coal_co2 : num 0 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.002 0.002 ...
## $ share_global_cumulative_cement_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ share_global_cumulative_co2 : num 0 0 0 0 0 0 0 0 0 0.001 ...
## $ share_global_cumulative_co2_including_luc: num 0.001 0.002 0.003 0.004 0.006 0.007 0.009 0.01 0.012 0.013 ...
## $ share_global_cumulative_coal_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ share_global_cumulative_flaring_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ share_global_cumulative_gas_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ share_global_cumulative_luc_co2 : num 0.117 0.116 0.116 0.116 0.117 0.117 0.118 0.118 0.119 0.12 ...
## $ share_global_cumulative_oil_co2 : num 0 0 0 0.001 0.001 0.001 0.001 0.001 0.002 0.002 ...
## $ share_global_cumulative_other_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ share_global_flaring_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ share_global_gas_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ share_global_luc_co2 : num 0.107 0.106 0.111 0.123 0.135 0.139 0.149 0.158 0.166 0.179 ...
## $ share_global_oil_co2 : num 0 0.004 0.004 0.003 0.004 0.003 0.004 0.005 0.008 0.009 ...
## $ share_global_other_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ share_of_temperature_change_from_ghg : num 0.131 0.131 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13 ...
## $ temperature_change_from_ch4 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ temperature_change_from_co2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ temperature_change_from_ghg : num 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
## $ temperature_change_from_n2o : num 0 0 0 0 0 0 0 0 0 0 ...
## $ total_ghg : num 18.7 19.9 21.1 22.1 23.3 ...
## $ total_ghg_excluding_lucf : num 1.13 1.24 1.22 1.19 1.18 ...
## $ trade_co2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ trade_co2_share : num NA NA NA NA NA NA NA NA NA NA ...
## country year iso_code population
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:22386 FALSE:22386 FALSE:22386 FALSE:22386
##
## gdp cement_co2 cement_co2_per_capita co2
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:14548 FALSE:17391 FALSE:17391 FALSE:22386
## TRUE :7838 TRUE :4995 TRUE :4995
## co2_growth_abs co2_growth_prct co2_including_luc co2_including_luc_growth_abs
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:22109 FALSE:22091 FALSE:20588 FALSE:20334
## TRUE :277 TRUE :295 TRUE :1798 TRUE :2052
## co2_including_luc_growth_prct co2_including_luc_per_capita
## Mode :logical Mode :logical
## FALSE:20334 FALSE:20588
## TRUE :2052 TRUE :1798
## co2_including_luc_per_gdp co2_including_luc_per_unit_energy co2_per_capita
## Mode :logical Mode :logical Mode :logical
## FALSE:14135 FALSE:9016 FALSE:22386
## TRUE :8251 TRUE :13370
## co2_per_gdp co2_per_unit_energy coal_co2 coal_co2_per_capita
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:14548 FALSE:9642 FALSE:17595 FALSE:17595
## TRUE :7838 TRUE :12744 TRUE :4791 TRUE :4791
## consumption_co2 consumption_co2_per_capita consumption_co2_per_gdp
## Mode :logical Mode :logical Mode :logical
## FALSE:4063 FALSE:4063 FALSE:3908
## TRUE :18323 TRUE :18323 TRUE :18478
## cumulative_cement_co2 cumulative_co2 cumulative_co2_including_luc
## Mode :logical Mode :logical Mode :logical
## FALSE:17391 FALSE:22386 FALSE:20588
## TRUE :4995 TRUE :1798
## cumulative_coal_co2 cumulative_flaring_co2 cumulative_gas_co2
## Mode :logical Mode :logical Mode :logical
## FALSE:17595 FALSE:11869 FALSE:14044
## TRUE :4791 TRUE :10517 TRUE :8342
## cumulative_luc_co2 cumulative_oil_co2 cumulative_other_co2 energy_per_capita
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:20588 FALSE:21278 FALSE:1734 FALSE:9695
## TRUE :1798 TRUE :1108 TRUE :20652 TRUE :12691
## energy_per_gdp flaring_co2 flaring_co2_per_capita gas_co2
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:7763 FALSE:11869 FALSE:11869 FALSE:14044
## TRUE :14623 TRUE :10517 TRUE :10517 TRUE :8342
## gas_co2_per_capita ghg_excluding_lucf_per_capita ghg_per_capita
## Mode :logical Mode :logical Mode :logical
## FALSE:14044 FALSE:20889 FALSE:21019
## TRUE :8342 TRUE :1497 TRUE :1367
## land_use_change_co2 land_use_change_co2_per_capita methane
## Mode :logical Mode :logical Mode :logical
## FALSE:20588 FALSE:20588 FALSE:21019
## TRUE :1798 TRUE :1798 TRUE :1367
## methane_per_capita nitrous_oxide nitrous_oxide_per_capita oil_co2
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:21019 FALSE:21087 FALSE:21087 FALSE:21278
## TRUE :1367 TRUE :1299 TRUE :1299 TRUE :1108
## oil_co2_per_capita other_co2_per_capita other_industry_co2
## Mode :logical Mode :logical Mode :logical
## FALSE:21278 FALSE:1734 FALSE:1734
## TRUE :1108 TRUE :20652 TRUE :20652
## primary_energy_consumption share_global_cement_co2 share_global_co2
## Mode :logical Mode :logical Mode :logical
## FALSE:9695 FALSE:17207 FALSE:22386
## TRUE :12691 TRUE :5179
## share_global_co2_including_luc share_global_coal_co2
## Mode :logical Mode :logical
## FALSE:20588 FALSE:17595
## TRUE :1798 TRUE :4791
## share_global_cumulative_cement_co2 share_global_cumulative_co2
## Mode :logical Mode :logical
## FALSE:17207 FALSE:22386
## TRUE :5179
## share_global_cumulative_co2_including_luc share_global_cumulative_coal_co2
## Mode :logical Mode :logical
## FALSE:20588 FALSE:17595
## TRUE :1798 TRUE :4791
## share_global_cumulative_flaring_co2 share_global_cumulative_gas_co2
## Mode :logical Mode :logical
## FALSE:9583 FALSE:12481
## TRUE :12803 TRUE :9905
## share_global_cumulative_luc_co2 share_global_cumulative_oil_co2
## Mode :logical Mode :logical
## FALSE:20588 FALSE:20702
## TRUE :1798 TRUE :1684
## share_global_cumulative_other_co2 share_global_flaring_co2
## Mode :logical Mode :logical
## FALSE:1610 FALSE:9583
## TRUE :20776 TRUE :12803
## share_global_gas_co2 share_global_luc_co2 share_global_oil_co2
## Mode :logical Mode :logical Mode :logical
## FALSE:12481 FALSE:20588 FALSE:20702
## TRUE :9905 TRUE :1798 TRUE :1684
## share_global_other_co2 share_of_temperature_change_from_ghg
## Mode :logical Mode :logical
## FALSE:1610 FALSE:21863
## TRUE :20776 TRUE :523
## temperature_change_from_ch4 temperature_change_from_co2
## Mode :logical Mode :logical
## FALSE:21060 FALSE:21863
## TRUE :1326 TRUE :523
## temperature_change_from_ghg temperature_change_from_n2o total_ghg
## Mode :logical Mode :logical Mode :logical
## FALSE:21863 FALSE:21060 FALSE:21019
## TRUE :523 TRUE :1326 TRUE :1367
## total_ghg_excluding_lucf trade_co2 trade_co2_share
## Mode :logical Mode :logical Mode :logical
## FALSE:20889 FALSE:4063 FALSE:4063
## TRUE :1497 TRUE :18323 TRUE :18323
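The TRUE/FALSE tables above count missing values column by column, but the call that produced them is not shown. A compact way to get one count per column is `colSums(is.na(...))`. The sketch below uses a small made-up data frame (`toy` is an illustrative name) and base R only.

```r
toy <- data.frame(
  co2        = c(0.015, 0.084, NA, 0.092),
  gdp        = c(NA, 9.42e9, 9.69e9, NA),
  population = c(7356890, 7776180, 7879343, 7987784)
)

# Number of missing values in each column
missing_counts <- colSums(is.na(toy))
missing_counts

# Share of rows missing in each column, as a percentage
round(100 * missing_counts / nrow(toy), 1)
```

Sorting `missing_counts` in decreasing order is a quick way to see which columns are too sparse to analyze.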
top_countries <- data %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
  arrange(desc(total_co2)) %>%
  head(10)
top_countries
## # A tibble: 10 × 2
## country total_co2
## <chr> <dbl>
## 1 United States 434867.
## 2 China 285087.
## 3 Russia 122808.
## 4 Germany 95132.
## 5 United Kingdom 79394.
## 6 Japan 69612.
## 7 India 66073.
## 8 France 40048.
## 9 Canada 35644.
## 10 Ukraine 31236.
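Inference: Summed over all recorded years, the United States (about 434,867) and China (about 285,087) are the largest cumulative emitters, followed by Russia, Germany, and the United Kingdom. The totals are in the units of the `co2` column, which Our World in Data reports in million tonnes.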
theme_set(theme_minimal())
global_trend <- data %>%
  group_by(year) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE))
trend <- ggplot(global_trend, aes(x=year, y=total_co2)) +
  geom_line(color="blue") +
  labs(title="Global CO2 Emissions Over Time",
       x="Year",
       y="Total CO2 Emissions")
plotly::ggplotly(trend)
Inference: Global CO2 emissions show a strong long-term upward trend, indicating rising pollution levels.
## # A tibble: 1 × 2
## year total_co2
## <int> <dbl>
## 1 2024 37398.
Inference: 2024 recorded the highest global CO2 emissions in the dataset (a total of about 37,398), making it the peak year on record.
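The code behind this one-row table is not shown. One way to pull the peak year out of a yearly summary is `which.max()`. The sketch below uses a small made-up table with the same column names as `global_trend`; the numbers are placeholders, not values from the dataset.

```r
toy_trend <- data.frame(
  year      = c(2021, 2022, 2023, 2024),
  total_co2 = c(100, 120, 110, 130)   # made-up values
)

# Row with the largest yearly total
peak <- toy_trend[which.max(toy_trend$total_co2), ]
peak
```

`which.max()` returns only the first maximum, so ties would need a filter such as `total_co2 == max(total_co2)` instead.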
top5 <- c("United States", "China", "India", "Russia", "Japan")
share_data <- data %>%
  filter(country %in% top5) %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE))
share_data$percentage <- (share_data$total_co2 / sum(share_data$total_co2)) * 100
ggplot(share_data, aes(x="", y=percentage, fill=country)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y") +
  labs(title="Contribution of Top Countries to CO2 Emissions")
Inference: The chart shows how cumulative emissions are split among these five countries. The percentages are shares of the five-country total, not of global emissions. The five are the largest emitters in the latest year; by cumulative total, Germany and the United Kingdom rank above Japan and India.
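# Note: `co2` is in million tonnes, so this ratio is in million tonnes per person.
# It also overwrites the dataset's own co2_per_capita column (tonnes per person).
data$co2_per_capita <- data$co2 / data$population
top_per_capita <- data %>%
  filter(!is.na(co2_per_capita)) %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(avg_per_capita = mean(co2_per_capita, na.rm=TRUE)) %>%
  arrange(desc(avg_per_capita)) %>%
  head(10)
top_per_capita
## # A tibble: 10 × 2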
## country avg_per_capita
## <chr> <dbl>
## 1 Sint Maarten (Dutch part) 0.000143
## 2 Curacao 0.0000491
## 3 Qatar 0.0000461
## 4 Kuwait 0.0000289
## 5 United Arab Emirates 0.0000287
## 6 Luxembourg 0.0000252
## 7 Brunei 0.0000243
## 8 Bahrain 0.0000200
## 9 Saudi Arabia 0.0000141
## 10 Trinidad and Tobago 0.0000134
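Inference: Small territories such as Sint Maarten and Curacao, and fossil-fuel-intensive economies such as Qatar, Kuwait, and the United Arab Emirates, have the highest average emissions per person. The values are in million tonnes per person because `co2` is recorded in million tonnes; multiplied by 1,000,000, Qatar's average is about 46 tonnes per person.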
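The very small numbers in this table come from the units. The `str()` output earlier is consistent with `co2` being in million tonnes and the dataset's original `co2_per_capita` being in tonnes per person. A short worked check, using the Afghanistan 1949 row from that output (variable names are illustrative):

```r
# Afghanistan, 1949 (values taken from the str() output)
co2_mt     <- 0.015      # emissions, million tonnes
population <- 7356890

per_capita_mt <- co2_mt / population         # million tonnes per person
per_capita_t  <- co2_mt * 1e6 / population   # tonnes per person

per_capita_mt
round(per_capita_t, 3)
```

Rounded to three decimals, the tonnes-per-person figure agrees with the first `co2_per_capita` value listed for Afghanistan in the `str()` output (0.002), which supports reading the recomputed column as the same quantity scaled by one million.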
distribution <- ggplot(data, aes(x=co2)) +
  geom_histogram(bins=30, fill="steelblue", color="black") +
  labs(
    title="Distribution of CO2 Emissions",
    x="CO2 Emissions",
    y="Frequency"
  )
plotly::ggplotly(distribution)
Inference: The distribution indicates that most country-year observations have relatively low emissions, while a few contribute disproportionately high emissions.
box <- ggplot(data, aes(y = co2)) +
  geom_boxplot(fill = "orange", color = "black") +
  scale_y_log10() +
  labs(
    title = "Boxplot of CO2 Emissions (Log Scale)",
    y = "CO2 Emissions (log scale)"
  )
plotly::ggplotly(box)
Inference: The boxplot clearly highlights extreme outliers, representing countries with exceptionally high emission levels compared to others.
##
## Call:
## lm(formula = total_co2 ~ year, data = global_trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8389.7 -4867.4 -576.2 3764.2 14137.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.480e+05 1.082e+04 -22.92 <2e-16 ***
## year 1.340e+02 5.665e+00 23.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5802 on 228 degrees of freedom
## Multiple R-squared: 0.7106, Adjusted R-squared: 0.7093
## F-statistic: 559.8 on 1 and 228 DF, p-value: < 2.2e-16
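Inference: The slope is positive and highly significant: global CO2 emissions rise by about 134 per year on average, and year alone explains about 71% of the variance (R-squared 0.71). The large residuals (up to about 14,137) show that a straight line describes the trend only roughly.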
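The fitted line can be evaluated directly from the rounded coefficients in the summary above. This sketch plugs in a year (2030 is an arbitrary choice for illustration) and compares the result with the 2024 total from the peak-year table. Because the coefficients are rounded, the result is approximate.

```r
intercept <- -2.480e5   # rounded, from the regression summary
slope     <-  1.340e2   # rounded, from the regression summary

# Hypothetical helper: value of the fitted line at a given year
predict_total <- function(year) intercept + slope * year

pred_2030 <- predict_total(2030)
pred_2030

# Observed 2024 total from the peak-year table
observed_2024 <- 37398
pred_2030 < observed_2024
```

The straight line puts 2030 below the total already observed in 2024, which suggests that a single linear trend over the whole record understates recent emissions.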
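# Hypothetical scenario: 10% above the observed values (not a model forecast)
global_trend$future_co2 <- global_trend$total_co2 * 1.1
pred1 <- ggplot(global_trend, aes(x=year)) +
  geom_line(aes(y=total_co2), color="blue") +
  geom_line(aes(y=future_co2), color="red") +
  labs(
    title="Current vs Predicted CO2 Emissions",
    x="Year",
    y="CO2 Emissions"
  )
plotly::ggplotly(pred1)
Inference: The red line is the observed series scaled up by 10%, so it illustrates a scenario with 10% higher emissions rather than a forecast. If current trends continue, CO2 emissions are expected to rise further, posing serious environmental risks.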
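The red line in the plot above is the observed series multiplied by 1.1 for the same years. A model-based projection would instead extend a fitted trend to years beyond the data. A minimal sketch on made-up, exactly linear data (`toy`, `fit`, and `future` are illustrative names):

```r
toy <- data.frame(
  year      = 2000:2004,
  total_co2 = c(100, 110, 120, 130, 140)   # made-up values
)

fit <- lm(total_co2 ~ year, data = toy)

# Evaluate the fitted line at years outside the observed range
future <- data.frame(year = c(2005, 2010))
future$predicted_co2 <- predict(fit, newdata = future)
future
```

Passing `interval = "prediction"` to `predict()` would additionally return lower and upper bounds, which is useful when the fit is as loose as the one reported for the global series.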
high_risk <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
  arrange(desc(total_co2)) %>%
  head(5)
high_risk
## # A tibble: 5 × 2
## country total_co2
## <chr> <dbl>
## 1 United States 434867.
## 2 China 285087.
## 3 Russia 122808.
## 4 Germany 95132.
## 5 United Kingdom 79394.
Inference: Countries with the highest emissions should take immediate action to control pollution and reduce environmental impact.
##
## Call:
## lm(formula = co2 ~ year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -139.8 -98.7 -67.7 -15.8 12149.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.759e+03 1.231e+02 -14.30 <2e-16 ***
## year 9.383e-01 6.274e-02 14.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 444.9 on 22384 degrees of freedom
## Multiple R-squared: 0.009892, Adjusted R-squared: 0.009848
## F-statistic: 223.6 on 1 and 22384 DF, p-value: < 2.2e-16
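Inference: Across individual country-year observations the slope is positive (about 0.94 per year) and statistically significant, but the R-squared of about 0.01 means that year explains only around 1% of the variation. Differences between countries dominate, so this model says little about any single country's future emissions.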
visual <- ggplot(global_trend, aes(x=year)) +
  geom_line(aes(y=total_co2), color="blue") +
  geom_smooth(aes(y=total_co2), method="lm", color="red") +
  labs(
    title="Trend Projection of CO2 Emissions",
    x="Year",
    y="CO2 Emissions"
  )
plotly::ggplotly(visual)
## `geom_smooth()` using formula = 'y ~ x'
Inference: The trend line indicates that CO2 emissions are expected to continue rising if current patterns persist.
growth_data <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(growth = max(co2, na.rm=TRUE) - min(co2, na.rm=TRUE)) %>%
  arrange(desc(growth)) %>%
  head(10)
growth_data
## # A tibble: 10 × 2
## country growth
## <chr> <dbl>
## 1 China 12272.
## 2 United States 6127.
## 3 India 3193.
## 4 Russia 2536.
## 5 Japan 1312.
## 6 Germany 1117.
## 7 Indonesia 812.
## 8 Iran 793.
## 9 Ukraine 744.
## 10 Saudi Arabia 708.
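Inference: These countries have the widest gap between their lowest and highest recorded annual emissions. Because `growth` is computed as maximum minus minimum, it measures the range rather than the net change, so a country whose emissions peaked and later declined (for example the United States, with a range of 6,127 against 4,904 in the latest year) still ranks high.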
latest_year <- max(data$year, na.rm=TRUE)
recent_data <- data %>%
  filter(year == latest_year) %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(desc(co2)) %>%
  head(10)
recent_data[, c("country", "co2")]
## country co2
## 1 China 12289.037
## 2 United States 4904.120
## 3 India 3193.478
## 4 Russia 1780.524
## 5 Japan 961.867
## 6 Indonesia 812.220
## 7 Iran 792.631
## 8 Saudi Arabia 692.133
## 9 South Korea 583.679
## 10 Germany 572.319
Inference: The most recent data highlights current global pollution leaders, which are key contributors to environmental issues today.
top5 <- c("United States", "China", "India", "Russia", "Japan")
trend_data <- data %>%
  filter(country %in% top5)
top <- ggplot(trend_data, aes(x=year, y=co2, color=country)) +
  geom_line(linewidth=1) +
  labs(
    title="CO2 Emission Trends of Top 5 Countries",
    x="Year",
    y="CO2 Emissions"
  )
plotly::ggplotly(top)
Inference: The graph shows how emissions have evolved over time for major countries, highlighting differences in growth patterns and industrial development.
total_global <- sum(data$co2, na.rm=TRUE)
top5_data <- data %>%
  filter(country %in% c("United States", "China", "India", "Russia", "Japan")) %>%
  group_by(country) %>%
  summarise(total = sum(co2, na.rm=TRUE))
top5_data$percentage <- (top5_data$total / total_global) * 100
top5_data
## # A tibble: 5 × 3
## country total percentage
## <chr> <dbl> <dbl>
## 1 China 285087. 15.8
## 2 India 66073. 3.67
## 3 Japan 69612. 3.86
## 4 Russia 122808. 6.81
## 5 United States 434867. 24.1
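Inference: These five countries together account for about 54% of cumulative emissions in the cleaned dataset, with the United States (24.1%) and China (15.8%) alone contributing about 40%.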
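The combined share of the five countries follows directly from the percentages in the table above. A small check of the arithmetic (the vector name `share` is illustrative):

```r
# Percentages copied from the table above
share <- c(China = 15.8, India = 3.67, Japan = 3.86,
           Russia = 6.81, `United States` = 24.1)

combined <- sum(share)   # share held by the five countries together
combined
100 - combined           # share left for all other countries
```

Since the table values are rounded, the combined figure is accurate to roughly one decimal place.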
global_trend$change <- c(NA, diff(global_trend$total_co2))
acc <- ggplot(global_trend, aes(x=year, y=change)) +
  geom_line(color="purple") +
  labs(
    title="Yearly Change in CO2 Emissions",
    x="Year",
    y="Change in Emissions"
  )
plotly::ggplotly(acc)
Inference: The increasing fluctuations suggest that year-to-year changes have grown in size as emissions rose. This is consistent with accelerating emissions, but larger swings in both directions would produce the same picture, so the plot alone does not establish acceleration.
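One way to make "accelerating" concrete is to look at the change in the yearly change, that is, the second difference of the series. This is a sketch on made-up totals, under the assumption that acceleration is read as a positive second difference:

```r
total <- c(10, 12, 15, 19, 24)   # made-up yearly totals

change       <- diff(total)                    # year-over-year change
acceleration <- diff(total, differences = 2)   # change in the yearly change

change
acceleration
```

Consistently positive values in `acceleration` mean each yearly increase is larger than the one before it; values scattered around zero would point to steady growth with noise.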
reduction <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(change = last(co2) - first(co2)) %>%
  arrange(change)
head(reduction, 10)
## # A tibble: 10 × 2
## country change
## <chr> <dbl>
## 1 Curacao -3.84
## 2 Niue 0.004
## 3 Saint Helena 0.007
## 4 Tuvalu 0.007
## 5 Wallis and Futuna 0.013
## 6 Micronesia (country) 0.0150
## 7 Andorra 0.0220
## 8 Montserrat 0.023
## 9 Nauru 0.032
## 10 Sint Maarten (Dutch part) 0.0350
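Inference: In this table only Curacao shows a net decrease (-3.84) between its first and last recorded years; the other nine entries are small increases in territories with very low emissions. The table on its own therefore gives little evidence about the effectiveness of environmental policies.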
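The `first()` and `last()` calls above depend on row order, so the result is a net change over time only if each country's rows are sorted by year. A base R sketch on made-up data that sorts explicitly before taking the difference (`toy` and `change` are illustrative names):

```r
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(2002, 2000, 2001, 2000, 2001, 2002),   # rows for A are unsorted
  co2     = c(4, 5, 3, 1, 2, 6)                       # made-up values
)

# Sort so that "first" and "last" mean earliest and latest year
toy <- toy[order(toy$country, toy$year), ]

# Latest value minus earliest value, per country
change <- sapply(split(toy$co2, toy$country),
                 function(x) x[length(x)] - x[1])
change
```

Country A falls from 5 to 4 even though its lowest value (3) sits in the middle year, which is why net change and the maximum-minus-minimum range can rank countries differently.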
## # A tibble: 5 × 2
## country growth
## <chr> <dbl>
## 1 China 12272.
## 2 United States 6127.
## 3 India 3193.
## 4 Russia 2536.
## 5 Japan 1312.
Inference: Countries with rapid emission growth are at higher environmental risk in the future.
data$period <- ifelse(data$year < 1980, "Before 1980",
                      ifelse(data$year < 2000, "1980-2000", "After 2000"))
period_data <- data %>%
  group_by(period) %>%
  summarise(avg_co2 = mean(co2, na.rm=TRUE))
period_data
## # A tibble: 3 × 2
## period avg_co2
## <chr> <dbl>
## 1 1980-2000 103.
## 2 After 2000 151.
## 3 Before 1980 44.3
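Inference: Average emissions per country-year rose from 44.3 before 1980 to 103 in the middle period and 151 in the most recent one, indicating rapid industrial growth and environmental impact in recent decades. Note that the "1980-2000" label covers 1980 through 1999, and "After 2000" includes the year 2000.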
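The nested `ifelse()` above assigns the year 2000 to "After 2000", and the character labels sort alphabetically rather than chronologically in the summary table. A sketch of an alternative using `cut()`, which makes the boundaries explicit and keeps the periods in time order (the label wording is a suggestion, not the report's):

```r
year <- c(1979, 1980, 1999, 2000, 2024)

period <- cut(year,
              breaks = c(-Inf, 1980, 2000, Inf),
              labels = c("Before 1980", "1980-1999", "2000 onward"),
              right  = FALSE)   # left-closed intervals, e.g. [1980, 2000)

data.frame(year, period)
levels(period)
```

Because `cut()` returns a factor whose levels follow the order of `breaks`, grouped summaries and plots list the periods chronologically.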
effect <- ggplot(data, aes(x=population, y=co2)) +
  geom_point(alpha=0.5, color="blue") +
  labs(
    title="Population vs CO2 Emissions",
    x="Population",
    y="CO2 Emissions"
  )
plotly::ggplotly(effect)
Inference: Countries with larger populations tend to have higher total emissions.
compare <- ggplot(data, aes(x=co2_per_capita, y=co2)) +
  geom_point(alpha=0.5, color="red") +
  labs(
    title="Per Capita vs Total Emissions",
    x="CO2 per Capita",
    y="Total CO2"
  )
plotly::ggplotly(compare)
Inference: Some countries have high total emissions but lower per capita values.
latest_year <- max(data$year, na.rm=TRUE)
recent_pc <- data %>%
  filter(year == latest_year) %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(desc(co2_per_capita)) %>%
  head(10)
recent_pc[, c("country","co2_per_capita")]
## country co2_per_capita
## 1 Qatar 4.127109e-05
## 2 Kuwait 2.624760e-05
## 3 Brunei 2.604520e-05
## 4 Bahrain 2.426980e-05
## 5 Trinidad and Tobago 2.293176e-05
## 6 Saudi Arabia 2.037918e-05
## 7 United Arab Emirates 2.013107e-05
## 8 New Caledonia 1.806564e-05
## 9 Sint Maarten (Dutch part) 1.655446e-05
## 10 Oman 1.565111e-05
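Inference: Some smaller countries have extremely high emissions per person. In the latest year Qatar leads with about 4.13e-05 million tonnes (roughly 41 tonnes) per person, followed by Kuwait and Brunei at roughly 26 tonnes each.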
smooth <- ggplot(global_trend, aes(x=year, y=total_co2)) +
  geom_line(color="gray") +
  geom_smooth(method="loess", color="red") +
  labs(title="Smoothed CO2 Emission Trend")
plotly::ggplotly(smooth)
## `geom_smooth()` using formula = 'y ~ x'
Inference: The smoothed curve highlights the long-term upward trend in emissions.
Inference: Most countries cluster at lower emission levels with a long tail of high emitters.
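recent_data <- data %>%
  filter(year >= max(year) - 10)
# group = country draws one line per country instead of a single zigzag through all rows
dec <- ggplot(recent_data, aes(x=year, y=co2, group=country)) +
  geom_line(color="blue")
plotly::ggplotly(dec)
Inference: Recent years show a continued increase in emissions.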
stability <- data %>%
  filter(nchar(iso_code) == 3) %>%
  group_by(country) %>%
  summarise(sd_co2 = sd(co2, na.rm=TRUE)) %>%
  arrange(sd_co2)
head(stability, 10)
## # A tibble: 10 × 2
## country sd_co2
## <chr> <dbl>
## 1 Niue 0.00152
## 2 Tuvalu 0.00274
## 3 Saint Helena 0.00296
## 4 Wallis and Futuna 0.00315
## 5 Montserrat 0.0128
## 6 Saint Pierre and Miquelon 0.0175
## 7 Kiribati 0.0181
## 8 Cook Islands 0.0236
## 9 Micronesia (country) 0.0241
## 10 Marshall Islands 0.0243
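Inference: The lowest standard deviations belong to small island territories whose absolute emissions are tiny. Because the standard deviation scales with the level of emissions, a low value here reflects small emissions as much as a stable pattern.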
## [1] 0.6311937
Inference: There is a moderately strong positive correlation (about 0.63) between population and emissions.
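The call that produced this coefficient is not shown. A minimal sketch of how such a value could be computed with `cor()`, on made-up numbers. The Spearman variant is an addition for illustration: it works on ranks, which can be a useful cross-check when both variables are strongly right-skewed.

```r
population <- c(1e6, 5e6, 2e7, 1e8, 1.4e9)   # made-up values
co2        <- c(2, 30, 50, 400, 12000)       # made-up values

pearson  <- cor(population, co2)                        # linear association
spearman <- cor(population, co2, method = "spearman")   # rank-based association

pearson
spearman
```

With real data that still contains missing values, `use = "complete.obs"` tells `cor()` to drop incomplete pairs instead of returning `NA`.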
log1 <- ggplot(data, aes(x=log(co2))) +
  geom_histogram(bins=30, fill="purple")
plotly::ggplotly(log1)
Inference: Log transformation reduces skewness and improves visualization.
top_countries_names <- top_countries$country
trend_top <- data %>%
  filter(country %in% top_countries_names)
trend2 <- ggplot(trend_top, aes(x=year, y=co2, color=country)) +
  geom_line()
plotly::ggplotly(trend2)
Inference: Top contributors remain consistent over time.
low_emitters <- data %>%
  filter(nchar(iso_code) == 3) %>%
  arrange(co2) %>%
  head(10)
low_emitters[, c("country","co2")]
## country co2
## 1 Armenia 0.001
## 2 Armenia 0.001
## 3 Armenia 0.001
## 4 Armenia 0.001
## 5 Australia 0.001
## 6 Australia 0.001
## 7 Australia 0.001
## 8 Australia 0.001
## 9 Australia 0.001
## 10 Australia 0.001
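Inference: The smallest values in the data are 0.001, recorded in individual years for Armenia and Australia. The table lists country-year rows rather than countries, so it identifies years with very low emissions, not countries that are low emitters overall.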
## [1] 199870.7
Inference: High variance indicates unequal emission distribution.
risk <- ggplot(growth_data, aes(x=reorder(country, growth), y=growth)) +
  geom_bar(stat="identity", fill="red") +
  coord_flip()
plotly::ggplotly(risk)
Inference: Countries with highest growth pose future environmental risks.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.306 210.004 2973.988 7835.396 11777.002 37398.067
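Inference: Annual global totals range from about 9.3 to about 37,398, and the mean (7,835) lies far above the median (2,974), which reflects the steep rise in the most recent part of the record. This summary describes yearly global totals, so it does not by itself show disparities across countries.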
## [1] 80.50304
## [1] 3.2115
## [1] 447.069
## 25% 50% 75%
## 0.37700 3.21150 25.31125
Inference: CO2 emissions are highly variable and strongly right-skewed: the mean (80.5) is about 25 times the median (3.21), and the standard deviation (447.1) is several times the mean.
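One common use of these quartiles is Tukey's 1.5 × IQR rule for flagging outliers. The sketch below computes the two fences from the quantiles printed above (variable names are illustrative):

```r
q1 <- 0.377      # 25% quantile from the output above
q3 <- 25.31125   # 75% quantile from the output above

iqr   <- q3 - q1
lower <- q1 - 1.5 * iqr   # lower fence
upper <- q3 + 1.5 * iqr   # upper fence

c(lower = lower, upper = upper)
```

The lower fence is negative, and the cleaned data keeps only `co2 > 0`, so no observation can fall below it. Under this rule only values above roughly 62.7 are flagged.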
## [1] 447.069
Inference: High standard deviation indicates unequal emission distribution across countries.
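The standard deviation is easier to interpret relative to the mean. A short sketch computing the coefficient of variation from the values printed above, assuming the first value in that output (80.50304) is the mean of `co2`, since the call that produced it is not shown:

```r
mean_co2 <- 80.50304   # assumed to be the mean (first value printed above)
sd_co2   <- 447.069    # standard deviation printed above

cv      <- sd_co2 / mean_co2   # coefficient of variation (unit-free)
var_co2 <- sd_co2^2            # variance, for comparison with the value reported earlier

round(cv, 2)
round(var_co2, 1)
```

A coefficient of variation well above 1 means the spread is several times the typical level, which is what a heavily right-skewed distribution produces.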
distribution1 <- ggplot(data, aes(x=co2)) +
  # Put the histogram on the density scale so the density curve is comparable
  geom_histogram(aes(y=after_stat(density)), bins=30, fill="blue") +
  geom_density(color="red")
plotly::ggplotly(distribution1)
Inference: The distribution is right-skewed, meaning most observations have low emissions while a few have very high emissions.
Q1 <- quantile(data$co2, 0.25, na.rm=TRUE)
Q3 <- quantile(data$co2, 0.75, na.rm=TRUE)
IQR_val <- Q3 - Q1
lower <- Q1 - 1.5 * IQR_val
upper <- Q3 + 1.5 * IQR_val
outliers <- data %>%
  filter(co2 < lower | co2 > upper)
head(outliers)
## country year iso_code population gdp cement_co2
## 1 Algeria 1980 DZA 18607175 94481648187 1.729
## 2 Algeria 1984 DZA 21271969 114406671796 2.131
## 3 Algeria 1985 DZA 22008542 120364208777 2.299
## 4 Algeria 1986 DZA 22745501 119145030269 2.384
## 5 Algeria 1987 DZA 23443627 118321211437 2.727
## 6 Algeria 1988 DZA 24109538 115821713281 2.546
## cement_co2_per_capita co2 co2_growth_abs co2_growth_prct co2_including_luc
## 1 0.093 66.124 20.820 45.957 68.773
## 2 0.100 70.417 18.343 35.224 77.104
## 3 0.104 71.988 1.571 2.231 77.087
## 4 0.105 75.382 3.394 4.715 83.829
## 5 0.116 83.020 7.638 10.132 89.001
## 6 0.106 82.839 -0.181 -0.218 88.672
## co2_including_luc_growth_abs co2_including_luc_growth_prct
## 1 19.776 40.363
## 2 11.209 17.011
## 3 -0.017 -0.022
## 4 6.742 8.746
## 5 5.172 6.169
## 6 -0.329 -0.370
## co2_including_luc_per_capita co2_including_luc_per_gdp
## 1 3.696 0.728
## 2 3.625 0.674
## 3 3.503 0.640
## 4 3.686 0.704
## 5 3.796 0.752
## 6 3.678 0.766
## co2_including_luc_per_unit_energy co2_per_capita co2_per_gdp
## 1 0.388 3.553683e-06 0.700
## 2 0.296 3.310319e-06 0.615
## 3 0.294 3.270912e-06 0.598
## 4 0.302 3.314150e-06 0.633
## 5 0.312 3.541261e-06 0.702
## 6 0.290 3.435943e-06 0.715
## co2_per_unit_energy coal_co2 coal_co2_per_capita consumption_co2
## 1 0.373 1.608 0.086 NA
## 2 0.271 4.027 0.189 NA
## 3 0.274 3.078 0.140 NA
## 4 0.272 2.726 0.120 NA
## 5 0.291 2.968 0.127 NA
## 6 0.271 2.975 0.123 NA
## consumption_co2_per_capita consumption_co2_per_gdp cumulative_cement_co2
## 1 NA NA 16.711
## 2 NA NA 24.314
## 3 NA NA 26.613
## 4 NA NA 28.997
## 5 NA NA 31.723
## 6 NA NA 34.269
## cumulative_co2 cumulative_co2_including_luc cumulative_coal_co2
## 1 544.567 1356.417 35.981
## 2 751.865 1583.301 47.040
## 3 823.853 1660.388 50.118
## 4 899.235 1744.217 52.844
## 5 982.255 1833.218 55.811
## 6 1065.093 1921.889 58.786
## cumulative_flaring_co2 cumulative_gas_co2 cumulative_luc_co2
## 1 195.702 95.228 1051.398
## 2 237.293 160.683 1070.983
## 3 253.701 188.383 1076.083
## 4 268.126 221.469 1084.530
## 5 280.243 261.586 1090.511
## 6 291.304 300.559 1096.344
## cumulative_oil_co2 cumulative_other_co2 energy_per_capita energy_per_gdp
## 1 200.946 NA 9520.142 1.875
## 2 282.535 NA 12231.881 2.274
## 3 305.039 NA 11917.944 2.179
## 4 327.800 NA 12199.799 2.329
## 5 352.891 NA 12163.992 2.410
## 6 380.176 NA 12668.091 2.637
## flaring_co2 flaring_co2_per_capita gas_co2 gas_co2_per_capita
## 1 18.686 1.004 25.479 1.369
## 2 10.981 0.516 30.510 1.434
## 3 16.407 0.746 27.700 1.259
## 4 14.425 0.634 33.086 1.455
## 5 12.117 0.517 40.117 1.711
## 6 11.061 0.459 38.972 1.616
## ghg_excluding_lucf_per_capita ghg_per_capita land_use_change_co2
## 1 6.103 6.495 2.649
## 2 5.501 6.037 6.687
## 3 5.457 5.953 5.099
## 4 5.508 6.029 8.447
## 5 5.698 6.157 5.981
## 6 5.519 5.956 5.833
## land_use_change_co2_per_capita methane methane_per_capita nitrous_oxide
## 1 0.142 50.788 2.729 3.845
## 2 0.314 50.028 2.352 4.266
## 3 0.232 51.467 2.338 4.502
## 4 0.371 53.124 2.336 4.541
## 5 0.255 53.929 2.300 4.416
## 6 0.242 53.571 2.222 4.187
## nitrous_oxide_per_capita oil_co2 oil_co2_per_capita other_co2_per_capita
## 1 0.207 18.620 1.001 NA
## 2 0.201 22.768 1.070 NA
## 3 0.205 22.504 1.023 NA
## 4 0.200 22.761 1.001 NA
## 5 0.188 25.091 1.070 NA
## 6 0.174 27.285 1.132 NA
## other_industry_co2 primary_energy_consumption share_global_cement_co2
## 1 NA 177.143 0.426
## 2 NA 260.196 0.506
## 3 NA 262.297 0.540
## 4 NA 277.491 0.540
## 5 NA 285.168 0.595
## 6 NA 305.422 0.528
## share_global_co2 share_global_co2_including_luc share_global_coal_co2
## 1 0.341 0.271 0.023
## 2 0.362 0.298 0.053
## 3 0.357 0.286 0.038
## 4 0.369 0.306 0.033
## 5 0.393 0.322 0.035
## 6 0.378 0.316 0.034
## share_global_cumulative_cement_co2 share_global_cumulative_co2
## 1 0.209 0.090
## 2 0.252 0.111
## 3 0.264 0.118
## 4 0.275 0.125
## 5 0.289 0.133
## 6 0.299 0.140
## share_global_cumulative_co2_including_luc share_global_cumulative_coal_co2
## 1 0.109 0.010
## 2 0.118 0.012
## 3 0.121 0.013
## 4 0.125 0.013
## 5 0.129 0.014
## 6 0.133 0.014
## share_global_cumulative_flaring_co2 share_global_cumulative_gas_co2
## 1 3.159 0.201
## 2 3.358 0.275
## 3 3.500 0.306
## 4 3.615 0.343
## 5 3.697 0.385
## 6 3.751 0.422
## share_global_cumulative_luc_co2 share_global_cumulative_oil_co2
## 1 0.164 0.110
## 2 0.161 0.131
## 3 0.160 0.136
## 4 0.159 0.141
## 5 0.159 0.146
## 6 0.158 0.152
## share_global_cumulative_other_co2 share_global_flaring_co2
## 1 NA 5.905
## 2 NA 5.880
## 3 NA 9.042
## 4 NA 8.502
## 5 NA 7.433
## 6 NA 5.959
## share_global_gas_co2 share_global_luc_co2 share_global_oil_co2
## 1 0.929 0.044 0.209
## 2 1.027 0.104 0.276
## 3 0.897 0.075 0.273
## 4 1.098 0.122 0.268
## 5 1.222 0.092 0.292
## 6 1.134 0.095 0.307
## share_global_other_co2 share_of_temperature_change_from_ghg
## 1 NA 0.248
## 2 NA 0.261
## 3 NA 0.266
## 4 NA 0.270
## 5 NA 0.275
## 6 NA 0.279
## temperature_change_from_ch4 temperature_change_from_co2
## 1 0.002 0.000
## 2 0.002 0.000
## 3 0.002 0.001
## 4 0.002 0.001
## 5 0.002 0.001
## 6 0.002 0.001
## temperature_change_from_ghg temperature_change_from_n2o total_ghg
## 1 0.002 0 120.856
## 2 0.002 0 128.428
## 3 0.002 0 131.022
## 4 0.002 0 137.133
## 5 0.003 0 144.349
## 6 0.003 0 143.595
## total_ghg_excluding_lucf trade_co2 trade_co2_share period
## 1 113.565 NA NA 1980-2000
## 2 117.018 NA NA 1980-2000
## 3 120.111 NA NA 1980-2000
## 4 125.273 NA NA 1980-2000
## 5 133.570 NA NA 1980-2000
## 6 133.069 NA NA 1980-2000
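Inference: With the quartiles reported above, the upper fence is 25.31 + 1.5 × 24.93 ≈ 62.7, so every record above that value is flagged. The outliers are therefore country-year records with high emissions relative to the bulk of the data, not only the very largest emitters (Algeria in 1980, at 66.1, already qualifies).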
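The fence arithmetic behind the outlier filter can be checked by hand. This is a small sketch that assumes the quartiles printed earlier (0.377 and 25.31125) were computed on the same `co2` column used in the filter; the Algeria values are copied from the first rows of the outlier table.

```r
# Quartiles as reported in the quantile output earlier in the report
Q1 <- 0.37700
Q3 <- 25.31125

IQR_val <- Q3 - Q1             # interquartile range
lower   <- Q1 - 1.5 * IQR_val  # negative; emissions cannot be negative,
                               # so only the upper fence matters here
upper   <- Q3 + 1.5 * IQR_val  # records above this value are flagged

print(c(IQR = IQR_val, lower = lower, upper = upper))

# co2 values from the first six rows of the outlier table (Algeria)
algeria_co2 <- c(66.124, 70.417, 71.988, 75.382, 83.020, 82.839)
all(algeria_co2 > upper)
```

Because the lower fence is below zero, the rule effectively flags only high values in this dataset.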
## [1] 0.6311937
Inference: A positive correlation suggests that higher population is associated with higher emissions.
corr1 <- ggplot(data, aes(x = population, y = co2)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", color = "red")
plotly::ggplotly(corr1)
## `geom_smooth()` using formula = 'y ~ x'
Inference: The plot shows a positive relationship between population and emissions.
## [1] 0.6311937
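Inference: A correlation of about 0.63 indicates a moderately strong positive linear association between population and emissions. It measures association only and does not by itself show that population drives emissions.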
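The correlation value is shown without its code. A plausible way to obtain it, sketched here on hypothetical toy vectors rather than the report's data, is `cor()` with `use = "complete.obs"`. The sketch also illustrates a standard identity: for a linear model with a single predictor, R-squared equals the squared Pearson correlation.

```r
# Hypothetical toy data; not taken from the dataset
population <- c(1e6, 5e6, 2e7, 8e7, 3e8, NA)
co2        <- c(2.0, 30.0, 45.0, 400.0, 900.0, 10.0)

# use = "complete.obs" drops rows where either value is missing
r <- cor(population, co2, use = "complete.obs")

# lm() also drops incomplete rows by default, so both use the same five rows
fit <- lm(co2 ~ population)
r_squared <- summary(fit)$r.squared

# For a one-predictor linear model, R-squared equals r squared
isTRUE(all.equal(r^2, r_squared))

# The same identity applied to the correlation reported above
0.6311937^2
```

Squaring the reported correlation gives roughly 0.398, so about 40% of the variance in emissions is linearly associated with population.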
##
## Call:
## lm(formula = co2 ~ population, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2433.1 -25.3 -12.7 -8.7 7841.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.669e+00 2.390e+00 4.046 5.22e-05 ***
## population 3.127e-06 2.568e-08 121.753 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 346.8 on 22384 degrees of freedom
## Multiple R-squared: 0.3984, Adjusted R-squared: 0.3984
## F-statistic: 1.482e+04 on 1 and 22384 DF, p-value: < 2.2e-16
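Inference: The fitted slope is positive and highly significant (p < 2e-16), but with an R-squared of about 0.40, population alone explains only about 40% of the variation in CO2 emissions.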
## (Intercept) population
## 9.668677e+00 3.126621e-06
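Inference: The slope (3.13e-06) is the estimated change in CO2 emissions per additional person, or about 3.13 units per million people. The intercept (9.67) is the fitted value at zero population and has no practical interpretation.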
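To make the coefficients concrete, the fitted line can be evaluated directly. This is an illustrative sketch: `predict_co2` is a hypothetical helper name, and the population of 100 million is an arbitrary example value, not a figure from the report.

```r
# Coefficients as printed in the output above
intercept <- 9.668677e+00
slope     <- 3.126621e-06

# Hypothetical helper: fitted emissions for a given population
predict_co2 <- function(population) {
  intercept + slope * population
}

# Change in fitted emissions per additional one million people
per_million <- slope * 1e6

# Fitted value for an illustrative population of 100 million
fitted_100m <- predict_co2(1e8)

print(c(per_million = per_million, fitted_100m = fitted_100m))
```

With the model object available, `predict(model, newdata = data.frame(population = 1e8))` would give the same fitted value without retyping the coefficients.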
library(caret)  # provides createDataPartition()
# Split data into 70% training and 30% test sets
set.seed(123)
train_index <- createDataPartition(data$co2, p = 0.7, list = FALSE)
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
# Train model with year as the only predictor
model <- lm(co2 ~ year, data = train_data)
# Predict on unseen data
prediction <- predict(model, newdata = test_data)
# View predictions
head(data.frame(
  Actual = test_data$co2,
  Predicted = prediction
))
## Actual Predicted
## 3 0.092 70.26912
## 4 0.092 71.23942
## 6 0.106 73.18001
## 8 0.183 75.12061
## 16 0.839 82.88300
## 29 2.384 95.49689
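Inference: The model was trained on 70% of the data and used to predict CO2 emissions for the held-out 30%. Because year is the only predictor, every record from a given year receives the same prediction. In the rows shown, predictions of roughly 70 to 95 sit far above actual values below 2.4, so the model substantially overpredicts for low-emitting countries.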
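The report compares actual and predicted values by eye. A common complement, not part of the original analysis, is to summarize the errors numerically. The sketch below uses only the six test rows printed above, so the resulting numbers describe those rows and not the full test set.

```r
# First six test rows as printed above
actual    <- c(0.092, 0.092, 0.106, 0.183, 0.839, 2.384)
predicted <- c(70.26912, 71.23942, 73.18001, 75.12061, 82.88300, 95.49689)

errors <- predicted - actual

mae  <- mean(abs(errors))     # mean absolute error
rmse <- sqrt(mean(errors^2))  # root mean squared error
bias <- mean(errors)          # positive means the model overpredicts

print(c(MAE = mae, RMSE = rmse, bias = bias))
```

For the full evaluation, `actual` and `predicted` would be replaced by `test_data$co2` and `prediction` from the code above.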
library(ggplot2)
# Create result dataframe
results <- data.frame(
  Actual = test_data$co2,
  Predicted = prediction
)
# Plot
visual1 <- ggplot(results, aes(x = Actual, y = Predicted)) +
  geom_point(color = "blue", alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  labs(
    title = "Actual vs Predicted CO2 Emissions",
    x = "Actual CO2",
    y = "Predicted CO2"
  ) +
  coord_flip()
plotly::ggplotly(visual1)
Inference: Points close to the red line indicate accurate predictions. Deviations from the line represent prediction errors, showing how well the model performs on unseen data.
This project analyzed global CO2 emission trends using R, incorporating data cleaning, exploratory data analysis, visualization, correlation, and regression techniques. The findings reveal a consistent rise in emissions over time, with significant disparities among countries.
Techniques such as boxplots, correlation, and regression provided deeper insights into emission patterns and relationships between variables. The results show that population contributes to emissions but explains only about 40% of their variance (R-squared of roughly 0.40), so other factors also drive the variability across countries.
Overall, the study emphasizes the urgent need for effective environmental policies and sustainable practices. If current trends continue, future environmental conditions may worsen, making it essential to take immediate global action to reduce pollution.