Pollution is one of the most critical global challenges affecting environmental sustainability and human health. This project analyzes global CO2 emission trends using R programming to understand historical patterns, identify key contributors, and evaluate potential future impacts.
The dataset used in this project is sourced from Our World in Data. It contains country-wise information on CO2 emissions, population, and related environmental indicators across multiple years.
library(dplyr)
data <- read.csv("data.csv")
top_countries <- data %>%
filter(!is.na(co2)) %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
arrange(desc(total_co2)) %>%
head(10)
print(top_countries)
## # A tibble: 10 × 2
## country total_co2
## <chr> <dbl>
## 1 United States 434867.
## 2 China 285087.
## 3 Russia 122808.
## 4 Germany 95136.
## 5 United Kingdom 80079.
## 6 Japan 69612.
## 7 India 66073.
## 8 France 40048.
## 9 Canada 35644.
## 10 Ukraine 31236.
Inference: Ranked by cumulative emissions summed over all years in the dataset, the United States leads (about 434,867), followed by China (285,087) and Russia (122,808).
library(ggplot2)
theme_set(theme_minimal())
global_trend <- data %>%
group_by(year) %>%
summarise(total_co2 = sum(co2, na.rm=TRUE))
ggplot(global_trend, aes(x=year, y=total_co2)) +
geom_line(color="blue") +
labs(title="Global CO2 Emissions Over Time",
x="Year",
y="Total CO2 Emissions")
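Inference: Total recorded CO2 emissions rise strongly over the period covered. Note that global_trend sums every row for a year without the nchar(iso_code) == 3 filter used for the country rankings, so any aggregate rows in the file (regions, income groups, a world total) are counted alongside individual countries and the absolute level is overstated.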
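The yearly totals above sum every row in the file. If the file contains aggregate rows with an empty iso_code (an assumption about data.csv, suggested by the filter used in the country rankings), one option is to apply the same filter before summing. A minimal sketch in base R on a small made-up data frame (toy and all its values are invented for illustration):

```r
# Toy stand-in for the real dataset; every value here is invented.
toy <- data.frame(
  country  = c("A", "B", "World", "A", "B", "World"),
  iso_code = c("AAA", "BBB", "", "AAA", "BBB", ""),
  year     = c(2000, 2000, 2000, 2001, 2001, 2001),
  co2      = c(10, 20, 30, 12, NA, 33),
  stringsAsFactors = FALSE
)

# Keep only rows with a three-letter ISO code (individual countries).
countries_only <- toy[nchar(toy$iso_code) == 3, ]

# Sum per year; rows with missing co2 are dropped by aggregate()'s default na.action.
global_by_country <- aggregate(co2 ~ year, data = countries_only, FUN = sum)

# Unfiltered sum for comparison: the "World" row is counted on top of the countries.
global_unfiltered <- aggregate(co2 ~ year, data = toy, FUN = sum)

print(global_by_country)
print(global_unfiltered)
```

In the toy data the unfiltered total for 2000 is twice the country-only total, because the aggregate row repeats what the country rows already contain.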
peak_year <- global_trend[which.max(global_trend$total_co2), ]
peak_year
## # A tibble: 1 × 2
## year total_co2
## <int> <dbl>
## 1 2024 245672.
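Inference: The highest yearly total occurs in 2024 (about 245,672), at the recent end of the series, so emissions in this dataset have not yet passed their peak. The value comes from the unfiltered yearly sum and is best read as a relative peak, not an exact global figure.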
top5_countries <- c("United States", "China", "India", "Russia", "Japan")
filtered_data <- data %>%
filter(country %in% top5_countries)
ggplot(filtered_data, aes(x=year, y=co2, color=country)) +
geom_line(linewidth=1) +
labs(title="CO2 Emission Trends of Major Countries",
x="Year",
y="CO2 Emissions")
## Warning: Removed 83 rows containing missing values or values outside the scale range
## (`geom_line()`).
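Inference: The graph compares yearly emissions of the United States, China, India, Russia, and Japan. The five are chosen by hand; they match the top five emitters in the latest year, not the top five by cumulative total, which includes Germany and the United Kingdom.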
data$co2_per_capita <- data$co2 / data$population
top_per_capita <- data %>%
filter(!is.na(co2_per_capita)) %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(avg_per_capita = mean(co2_per_capita, na.rm=TRUE)) %>%
arrange(desc(avg_per_capita)) %>%
head(10)
top_per_capita
## # A tibble: 10 × 2
## country avg_per_capita
## <chr> <dbl>
## 1 Sint Maarten (Dutch part) 0.000143
## 2 Curacao 0.0000491
## 3 Qatar 0.0000461
## 4 United Arab Emirates 0.0000287
## 5 Kuwait 0.0000281
## 6 Luxembourg 0.0000252
## 7 Brunei 0.0000243
## 8 Bahrain 0.0000200
## 9 Saudi Arabia 0.0000138
## 10 Trinidad and Tobago 0.0000134
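Inference: Average emissions per person are highest in small territories and states such as Sint Maarten, Curacao, Qatar, the United Arab Emirates, and Kuwait. The values are co2 divided by population; if co2 is recorded in million tonnes, as in the Our World in Data dataset, Qatar's 0.0000461 corresponds to about 46 tonnes per person per year.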
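The per capita values above are very small numbers because total emissions and population are in different units. A minimal sketch of converting to tonnes per person, assuming co2 is in million tonnes (as documented for the Our World in Data CO2 dataset) and population is a head count; the data frame toy and the column name co2_t_per_capita are invented for illustration:

```r
# Invented example values; co2 is assumed to be in million tonnes.
toy <- data.frame(
  country    = c("A", "B"),
  co2        = c(100, 5),     # million tonnes per year
  population = c(2e6, 1e6),   # persons
  stringsAsFactors = FALSE
)

# Convert million tonnes to tonnes, then divide by population.
toy$co2_t_per_capita <- toy$co2 * 1e6 / toy$population

print(toy)
```

Reported this way, the ranking is unchanged but the values read directly as tonnes per person.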
ggplot(data, aes(x=co2)) +
geom_histogram(bins=30, fill="steelblue", color="black") +
labs(
title="Distribution of CO2 Emissions",
x="CO2 Emissions",
y="Frequency"
)
## Warning: Removed 21027 rows containing non-finite outside the scale range
## (`stat_bin()`).
Inference: The distribution is strongly right-skewed: most rows (one per country or region and year) have low emissions, while a small number have very high values. The 21,027 rows with missing co2 are dropped from the plot.
ggplot(data, aes(y=co2)) +
geom_boxplot(fill="orange", color="black") +
labs(
title="Boxplot of CO2 Emissions"
)
## Warning: Removed 21027 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Inference: The boxplot shows many extreme values far above the interquartile range. Because the data are not filtered or aggregated, these points are individual country-years and may include aggregate rows, not only distinct countries.
trend_model <- lm(total_co2 ~ year, data = global_trend)
summary(trend_model)
##
## Call:
## lm(formula = total_co2 ~ year, data = global_trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57202 -37371 -5574 29702 104020
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.288e+06 5.786e+04 -22.25 <2e-16 ***
## year 7.062e+02 3.064e+01 23.05 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 40330 on 273 degrees of freedom
## Multiple R-squared: 0.6606, Adjusted R-squared: 0.6594
## F-statistic: 531.4 on 1 and 273 DF, p-value: < 2.2e-16
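Inference: The fitted slope is about 706 per year and is statistically significant (p < 2e-16), confirming the upward trend. A straight line explains about 66% of the variance (R-squared 0.66), and the residuals range from -57,202 to 104,020, so the growth is only roughly linear.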
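# Scenario line: recorded totals scaled up by 10% (not a model forecast)
global_trend$future_co2 <- global_trend$total_co2 * 1.1
ggplot(global_trend, aes(x=year)) +
geom_line(aes(y=total_co2), color="blue") +
geom_line(aes(y=future_co2), color="red") +
labs(
title="Recorded CO2 Emissions vs 10% Higher Scenario",
x="Year",
y="CO2 Emissions"
)
Inference: The red line is the recorded series multiplied by 1.1, that is, a scenario 10% above recorded emissions in every year. It is not a forecast and does not extend beyond the last observed year, so it illustrates the scale of a 10% increase and does not predict one.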
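A projection beyond the last observed year can be obtained by fitting a model and calling predict() on future years. The sketch below is one simple option, assuming a linear trend is acceptable; toy_trend and its values are invented stand-ins for global_trend, and the 2025 to 2030 horizon is an arbitrary choice for illustration:

```r
# Invented yearly totals standing in for global_trend.
toy_trend <- data.frame(
  year      = 2015:2024,
  total_co2 = c(100, 103, 105, 108, 110, 104, 109, 112, 114, 116)
)

# Fit a linear trend on the observed years.
fit <- lm(total_co2 ~ year, data = toy_trend)

# Predict for years beyond the observed range, with a 95% prediction interval.
future <- data.frame(year = 2025:2030)
projection <- cbind(
  future,
  predict(fit, newdata = future, interval = "prediction")
)

# Columns: year, fit (point estimate), lwr and upr (interval bounds).
print(projection)
```

The interval widens as the projection moves further from the observed years, which is a useful reminder of how uncertain a straight-line extrapolation is.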
high_risk <- data %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(total_co2 = sum(co2, na.rm=TRUE)) %>%
arrange(desc(total_co2)) %>%
head(5)
high_risk
## # A tibble: 5 × 2
## country total_co2
## <chr> <dbl>
## 1 United States 434867.
## 2 China 285087.
## 3 Russia 122808.
## 4 Germany 95136.
## 5 United Kingdom 80079.
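Inference: The five countries with the largest cumulative emissions are the United States, China, Russia, Germany, and the United Kingdom. They carry the largest historical share of emissions in this dataset and are natural priorities for reduction efforts.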
model <- lm(co2 ~ year, data = data)
summary(model)
##
## Call:
## lm(formula = co2 ~ year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -763 -568 -394 -106 37835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8416.5716 375.6285 -22.41 <2e-16 ***
## year 4.5355 0.1927 23.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1954 on 29382 degrees of freedom
## (21027 observations deleted due to missingness)
## Multiple R-squared: 0.0185, Adjusted R-squared: 0.01847
## F-statistic: 554 on 1 and 29382 DF, p-value: < 2.2e-16
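Inference: The slope is positive (about 4.5 per year) and statistically significant, but R-squared is only 0.0185: year alone explains under 2% of the variation across country-year rows. The model confirms an average upward drift over time; it is not a usable predictor of any single country's emissions.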
ggplot(global_trend, aes(x=year)) +
geom_line(aes(y=total_co2), color="blue") +
geom_smooth(aes(y=total_co2), method="lm", color="red") +
labs(
title="Trend Projection of CO2 Emissions",
x="Year",
y="CO2 Emissions"
)
## `geom_smooth()` using formula = 'y ~ x'
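Inference: The red line is a linear fit to the historical totals, drawn with its confidence band. It summarises the past upward trend and does not extend beyond the last observed year, so it supports the expectation of further increases only if current patterns persist.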
growth_data <- data %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(growth = max(co2, na.rm=TRUE) - min(co2, na.rm=TRUE)) %>%
arrange(desc(growth)) %>%
head(10)
## Warning: There were 6 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `growth = max(co2, na.rm = TRUE) - min(co2, na.rm = TRUE)`.
## ℹ In group 128: `country = "Monaco"`.
## Caused by warning in `max()`:
## ! no non-missing arguments to max; returning -Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.
growth_data
## # A tibble: 10 × 2
## country growth
## <chr> <dbl>
## 1 China 12272.
## 2 United States 6127.
## 3 India 3193.
## 4 Russia 2536.
## 5 Japan 1312.
## 6 Germany 1117.
## 7 Indonesia 812.
## 8 Iran 793.
## 9 Ukraine 744.
## 10 Saudi Arabia 708.
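Inference: Here growth is the gap between each country's maximum and minimum recorded emissions, not the change from the first to the last year. For China and India the gap is close to the latest-year value, so it reflects sustained growth. For the United States (range 6,127, latest 4,904) and Germany (range 1,117, latest 572), emissions have since fallen below their peak. The six warnings come from countries such as Monaco that have no non-missing values, for which max() returns -Inf.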
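The max-minus-min measure above captures the range of a country's emissions but not the direction of change. One alternative is the last non-missing value minus the first. A minimal sketch on invented data (toy, net_change, and all values are hypothetical), which also returns NA instead of -Inf for a country with no data:

```r
# Invented data: A grows, B rises then falls below its start, C has no data.
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1990, 2005, 2020), times = 3),
  co2     = c(NA, 50, 80,   100, 140, 90,   NA, NA, NA),
  stringsAsFactors = FALSE
)

# Net change: last non-missing value minus first non-missing value.
net_change <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)  # no data at all for this country
  x[length(x)] - x[1]
}

# Range: maximum minus minimum, with NA for all-missing groups.
value_range <- function(x) {
  if (all(is.na(x))) return(NA_real_)
  diff(range(x, na.rm = TRUE))
}

# Sort by country and year so "first" and "last" follow time order.
toy <- toy[order(toy$country, toy$year), ]
groups <- split(toy$co2, toy$country)

growth <- sapply(groups, net_change)
spread <- sapply(groups, value_range)

print(data.frame(growth = growth, spread = spread))
```

Country B shows why the distinction matters: its range is 50, yet its net change is -10.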
latest_year <- max(data$year, na.rm=TRUE)
recent_data <- data %>%
filter(year == latest_year) %>%
filter(nchar(iso_code) == 3) %>%
arrange(desc(co2)) %>%
head(10)
recent_data[, c("country", "co2")]
## country co2
## 1 China 12289.037
## 2 United States 4904.120
## 3 India 3193.478
## 4 Russia 1780.524
## 5 Japan 961.867
## 6 Indonesia 812.220
## 7 Iran 792.631
## 8 Saudi Arabia 692.133
## 9 South Korea 583.679
## 10 Germany 572.319
Inference: In the latest year China emits about 2.5 times as much as the United States (12,289 vs 4,904), followed by India, Russia, and Japan. These are the current leading emitters.
top5 <- c("United States", "China", "India", "Russia", "Japan")
comparison_data <- data %>%
filter(country %in% top5)
ggplot(comparison_data, aes(x=year, y=co2, color=country)) +
geom_line(linewidth=1) +
labs(
title="Comparison of CO2 Emissions (Top 5 Countries)",
x="Year",
y="CO2 Emissions"
)
## Warning: Removed 83 rows containing missing values or values outside the scale range
## (`geom_line()`).
Inference: The graph compares the five countries with the highest latest-year emissions and shows how their emission paths differ over time.
total_global <- sum(data$co2, na.rm=TRUE)
top5_data <- data %>%
filter(country %in% c("United States", "China", "India", "Russia", "Japan")) %>%
group_by(country) %>%
summarise(total = sum(co2, na.rm=TRUE))
top5_data$percentage <- (top5_data$total / total_global) * 100
top5_data
## # A tibble: 5 × 3
## country total percentage
## <chr> <dbl> <dbl>
## 1 China 285087. 2.31
## 2 India 66073. 0.535
## 3 Japan 69612. 0.564
## 4 Russia 122808. 0.995
## 5 United States 434867. 3.52
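Inference: As computed, the five countries together account for only about 7.9% of the total (3.52 + 2.31 + 0.995 + 0.564 + 0.535), which is implausibly low for a global share. total_global sums every row in the file, so any aggregate rows inflate the denominator and the percentages are understated. The table does not yet support the claim that a few countries contribute a large share; the denominator needs to be restricted to country rows first.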
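The percentages depend on what the denominator sums over. A minimal sketch of computing shares against a country-only total, assuming aggregate rows can be identified by an empty iso_code; toy and its values are invented for illustration:

```r
# Invented data: three countries plus a "World" aggregate row.
toy <- data.frame(
  country  = c("A", "B", "C", "World"),
  iso_code = c("AAA", "BBB", "CCC", ""),
  co2      = c(60, 30, 10, 100),
  stringsAsFactors = FALSE
)

# Restrict both numerator and denominator to country rows.
countries <- toy[nchar(toy$iso_code) == 3, ]
total_countries <- sum(countries$co2, na.rm = TRUE)
countries$share_pct <- 100 * countries$co2 / total_countries

# For comparison: a denominator over all rows counts "World" on top of the countries.
countries$share_unfiltered <- 100 * countries$co2 / sum(toy$co2, na.rm = TRUE)

print(countries[, c("country", "share_pct", "share_unfiltered")])
```

With a single aggregate row the unfiltered shares are halved; a file with several overlapping aggregates shrinks them further.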
global_trend$change <- c(NA, diff(global_trend$total_co2))
ggplot(global_trend, aes(x=year, y=change)) +
geom_line(color="purple") +
labs(
title="Yearly Change in CO2 Emissions",
x="Year",
y="Change in Emissions"
)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
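Inference: The size of the year-to-year changes grows over time. Changes that are mostly positive and getting larger would indicate acceleration, while larger swings in both directions indicate only greater volatility, so the plot should be read with both possibilities in mind.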
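Whether a series is accelerating can be checked more directly with second differences or with percentage changes. A minimal sketch on an invented vector of yearly totals (total and its values are hypothetical):

```r
# Invented yearly totals.
total <- c(10, 12, 15, 19, 24)

# First difference: change from one year to the next.
first_diff <- diff(total)                     # 2 3 4 5

# Second difference: change in the change; positive values mean acceleration.
second_diff <- diff(total, differences = 2)   # 1 1 1

# Year-on-year percentage change, relative to the previous year's level.
pct_change <- 100 * first_diff / head(total, -1)

print(first_diff)
print(second_diff)
print(round(pct_change, 1))
```

Percentage changes are often easier to compare across a long series, because absolute changes grow with the level even when the growth rate is constant.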
reduction <- data %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(change = last(co2) - first(co2)) %>%
arrange(change)
head(reduction, 10)
## # A tibble: 10 × 2
## country change
## <chr> <dbl>
## 1 Moldova 5.33
## 2 Latvia 6.46
## 3 Armenia 7.43
## 4 Estonia 8.31
## 5 Tajikistan 10.7
## 6 Kyrgyzstan 11.8
## 7 Georgia 11.8
## 8 Lithuania 12.5
## 9 Denmark 28.2
## 10 New Zealand 32.5
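Inference: The ten smallest changes are all positive (Moldova, at 5.33, is the lowest), so this calculation does not identify any country whose emissions fell. first(co2) and last(co2) return NA when a country's first or last row is missing, and arrange() places NA values at the end, so countries with incomplete series are not compared and real reductions may be hidden.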
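One way to look for reductions is to compare each country's latest value with a fixed baseline year and keep only countries that have data in both. A minimal sketch on invented data; toy, its values, and the 1990 baseline are hypothetical choices for illustration:

```r
# Invented data: A falls, B rises, C has no baseline value.
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1990, 2005, 2020), times = 3),
  co2     = c(50, 45, 30,   20, 35, 60,   NA, 8, 6),
  stringsAsFactors = FALSE
)

baseline_year <- 1990            # hypothetical baseline
latest_year   <- max(toy$year)

base <- toy[toy$year == baseline_year, c("country", "co2")]
last <- toy[toy$year == latest_year,   c("country", "co2")]

# One row per country with both values side by side.
change <- merge(base, last, by = "country", suffixes = c("_base", "_latest"))
change$change <- change$co2_latest - change$co2_base

# Drop countries missing either value, then keep those whose emissions fell.
change  <- change[!is.na(change$change), ]
reduced <- change[change$change < 0, ]

print(reduced)
```

Only country A is reported as a reduction here; C is excluded because it has no baseline value, which avoids guessing.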
risk_data <- growth_data %>%
head(5)
risk_data
## # A tibble: 5 × 2
## country growth
## <chr> <dbl>
## 1 China 12272.
## 2 United States 6127.
## 3 India 3193.
## 4 Russia 2536.
## 5 Japan 1312.
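Inference: These five countries have the widest gap between their minimum and maximum emissions, and they are the same five that lead in the latest year, which makes them the main drivers of future emission levels.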
data$period <- ifelse(data$year < 1980, "Before 1980",
ifelse(data$year < 2000, "1980-2000", "After 2000"))
period_data <- data %>%
group_by(period) %>%
summarise(avg_co2 = mean(co2, na.rm=TRUE))
period_data
## # A tibble: 3 × 2
## period avg_co2
## <chr> <dbl>
## 1 1980-2000 622.
## 2 After 2000 867.
## 3 Before 1980 219.
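Inference: Average emissions per row rise from 219 before 1980 to 622 in the middle period and 867 in the most recent one, roughly a fourfold increase. Note that the "1980-2000" label actually covers 1980 to 1999 and "After 2000" includes the year 2000, and that the table is sorted alphabetically, not chronologically.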
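Period labels can be made to match their boundaries, and to sort chronologically, by building them with cut(). A minimal sketch on an invented vector of years; the label text is a suggestion, not the labels used above:

```r
# Invented years that sit on and around the period boundaries.
years <- c(1975, 1979, 1980, 1999, 2000, 2024)

# Intervals are closed on the right: (-Inf, 1979], (1979, 1999], (1999, Inf).
period <- cut(
  years,
  breaks = c(-Inf, 1979, 1999, Inf),
  labels = c("Before 1980", "1980-1999", "2000 onward")
)

# The result is a factor whose levels follow the order of the breaks,
# so tables and plots list the periods chronologically.
print(data.frame(year = years, period = period))
print(table(period))
```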
ggplot(data, aes(x=population, y=co2)) +
geom_point(alpha=0.5, color="blue") +
labs(
title="Population vs CO2 Emissions",
x="Population",
y="CO2 Emissions"
)
## Warning: Removed 25048 rows containing missing values or values outside the scale range
## (`geom_point()`).
Inference: Total emissions tend to rise with population, although the spread around that relationship is wide.
ggplot(data, aes(x=co2_per_capita, y=co2)) +
geom_point(alpha=0.5, color="red") +
labs(
title="Per Capita vs Total Emissions",
x="CO2 per Capita",
y="Total CO2"
)
## Warning: Removed 25048 rows containing missing values or values outside the scale range
## (`geom_point()`).
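Inference: Total and per capita emissions are only loosely related. The largest total emitters, such as China and the United States, do not appear among the highest per capita values, which belong to small states and territories with modest totals.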
latest_year <- max(data$year, na.rm=TRUE)
recent_pc <- data %>%
filter(year == latest_year) %>%
filter(nchar(iso_code) == 3) %>%
arrange(desc(co2_per_capita)) %>%
head(10)
recent_pc[, c("country","co2_per_capita")]
## country co2_per_capita
## 1 Qatar 4.127109e-05
## 2 Kuwait 2.624760e-05
## 3 Brunei 2.604520e-05
## 4 Bahrain 2.426980e-05
## 5 Trinidad and Tobago 2.293176e-05
## 6 Saudi Arabia 2.037918e-05
## 7 United Arab Emirates 2.013107e-05
## 8 New Caledonia 1.806564e-05
## 9 Sint Maarten (Dutch part) 1.655446e-05
## 10 Oman 1.565111e-05
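Inference: In the latest year Qatar has the highest emissions per person (about 41 tonnes, if co2 is in million tonnes), and the top ten consists of small states and territories, several of them oil and gas producers.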
ggplot(global_trend, aes(x=year, y=total_co2)) +
geom_line(color="gray") +
geom_smooth(method="loess", color="red") +
labs(title="Smoothed CO2 Emission Trend")
## `geom_smooth()` using formula = 'y ~ x'
Inference: The smoothed curve highlights the long-term upward trend in emissions.
ggplot(data, aes(x=co2)) +
geom_density(fill="green", alpha=0.5)
## Warning: Removed 21027 rows containing non-finite outside the scale range
## (`stat_density()`).
Inference: The density is concentrated near zero with a long right tail: most country-year rows have low emissions and a few have very high ones.
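recent_data <- data %>%
filter(nchar(iso_code) == 3) %>%
filter(year >= max(year, na.rm=TRUE) - 10) %>%
group_by(year) %>%
summarise(total_co2 = sum(co2, na.rm=TRUE))
# One total per year, so geom_line() draws a single trend line
ggplot(recent_data, aes(x=year, y=total_co2)) +
geom_line(color="blue")
Inference: With one total per year over country rows, the line shows the trend across the most recent ten years. The 2024 peak found earlier is consistent with emissions still rising at the end of the series.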
stability <- data %>%
filter(nchar(iso_code) == 3) %>%
group_by(country) %>%
summarise(sd_co2 = sd(co2, na.rm=TRUE)) %>%
arrange(sd_co2)
head(stability, 10)
## # A tibble: 10 × 2
## country sd_co2
## <chr> <dbl>
## 1 Niue 0.00152
## 2 Tuvalu 0.00274
## 3 Saint Helena 0.00296
## 4 Wallis and Futuna 0.00315
## 5 Antarctica 0.00493
## 6 Montserrat 0.0128
## 7 Saint Pierre and Miquelon 0.0175
## 8 Kiribati 0.0181
## 9 Cook Islands 0.0236
## 10 Micronesia (country) 0.0241
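Inference: The lowest standard deviations belong to very small territories such as Niue, Tuvalu, and Saint Helena, whose emissions are close to zero throughout. Low variation here reflects small scale, not stable policy; a relative measure such as the coefficient of variation would be needed to compare stability across countries of different size.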
cor(data$population, data$co2, use="complete.obs")
## [1] 0.8481262
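Inference: Population and emissions are strongly positively correlated (r = 0.85). The calculation uses all rows, so very large rows, including any regional or world aggregates, carry heavy weight in the coefficient.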
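A Pearson correlation on raw values is sensitive to a few very large rows. A minimal sketch of two checks, restricting to country rows and using a rank-based (Spearman) correlation, on invented data; toy and its values are hypothetical, and the empty iso_code for the aggregate row is an assumption:

```r
# Invented data: four countries plus one aggregate row.
toy <- data.frame(
  iso_code   = c("AAA", "BBB", "CCC", "DDD", ""),
  population = c(1e6, 5e6, 2e7, 1e8, 1.26e8),
  co2        = c(2, 30, 40, 900, 972),
  stringsAsFactors = FALSE
)

# Keep individual countries only.
countries <- toy[nchar(toy$iso_code) == 3, ]

# Pearson measures linear association on the raw values.
pearson <- cor(countries$population, countries$co2, use = "complete.obs")

# Spearman works on ranks, so extreme values have less influence.
spearman <- cor(countries$population, countries$co2,
                use = "complete.obs", method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Reporting both coefficients makes it clear how much of the relationship depends on the largest rows.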
ggplot(data, aes(x=log(co2))) +
geom_histogram(bins=30, fill="purple")
## Warning: Removed 22381 rows containing non-finite outside the scale range
## (`stat_bin()`).
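Inference: On a log scale the distribution is far less skewed and easier to read. Rows with zero emissions become -Inf under log() and are dropped, which is why 22,381 rows are removed here compared with 21,027 in the untransformed histogram.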
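The extra dropped rows come from how log() treats zeros. A minimal sketch on an invented vector showing the behaviour and one way to handle it explicitly, by counting the zeros and transforming only positive values:

```r
# Invented emissions values, including a zero and a missing value.
co2 <- c(0, 0.5, 10, 1000, NA)

# log(0) is -Inf and log(NA) is NA; neither is finite.
log_all   <- log(co2)
n_dropped <- sum(!is.finite(log_all))

# Count zeros separately so the loss is reported, not silent.
n_zero <- sum(co2 == 0, na.rm = TRUE)

# Transform only strictly positive, non-missing values.
positive     <- co2[!is.na(co2) & co2 > 0]
log_positive <- log10(positive)

print(log_all)
print(c(n_dropped = n_dropped, n_zero = n_zero))
print(log_positive)
```

Using log10 instead of the natural log does not change the shape of the histogram, but it makes the axis readable in powers of ten.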
top_countries_names <- top_countries$country
trend_top <- data %>%
filter(country %in% top_countries_names)
ggplot(trend_top, aes(x=year, y=co2, color=country)) +
geom_line()
## Warning: Removed 83 rows containing missing values or values outside the scale range
## (`geom_line()`).
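Inference: The plot traces yearly emissions for the ten countries with the largest cumulative totals. Their ranking is not fixed over time: the United States leads cumulatively, but China's latest-year emissions (12,289) are well above those of the United States (4,904).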
low_emitters <- data %>%
filter(nchar(iso_code) == 3) %>%
arrange(co2) %>%
head(10)
low_emitters[, c("country","co2")]
## country co2
## 1 Antarctica 0
## 2 Antarctica 0
## 3 Antarctica 0
## 4 Antarctica 0
## 5 Antarctica 0
## 6 Antarctica 0
## 7 Antarctica 0
## 8 Antarctica 0
## 9 Antarctica 0
## 10 Antarctica 0
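Inference: Sorting individual rows returns ten Antarctica entries with zero emissions, so the table shows one territory in ten different years, not ten low-emitting countries. It confirms that zero-emission rows exist but does not rank countries.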
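To rank countries instead of rows, the data can be reduced to one row per country before sorting, for example by keeping only the latest year. A minimal sketch on invented data (toy and its values are hypothetical):

```r
# Invented data: three countries, two years each.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  year    = c(2023, 2024, 2023, 2024, 2023, 2024),
  co2     = c(0, 0, 5, 4, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Keep one row per country: the latest year.
latest <- toy[toy$year == max(toy$year), ]

# Sort ascending by emissions and take the lowest two.
lowest <- head(latest[order(latest$co2), c("country", "co2")], 2)

print(lowest)
```

Each country now appears at most once, so the table lists distinct low emitters.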
var(data$co2, na.rm=TRUE)
## [1] 3889147
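Inference: The variance of 3,889,147 corresponds to a standard deviation of about 1,972, several times larger than the period averages of 219 to 867 reported above, which reflects how unequally emissions are distributed across rows.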
ggplot(growth_data, aes(x=reorder(country, growth), y=growth)) +
geom_bar(stat="identity", fill="red") +
coord_flip()
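Inference: The bar chart ranks the ten countries by the range of their emissions. China's range (12,272) is about twice that of the United States (6,127), and both far exceed the rest.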
summary(global_trend$total_co2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 55.84 303.42 8010.71 44901.64 52687.84 245672.26
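Inference: Yearly totals range from about 56 to 245,672, and the mean (44,902) is far above the median (8,011), so most of the increase is concentrated in a minority of high-emission years. These statistics describe variation across years, not across countries.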
This project analyzed global CO2 emission trends using data from Our World in Data. The results show a clear increase in emissions over time, with a small group of countries, led by China and the United States, accounting for the largest totals. The linear trends fitted here point upward, suggesting that without intervention emissions will keep rising and environmental conditions may deteriorate further. Immediate global action is therefore required to reduce emissions and promote sustainability.