We begin by importing the libraries required for data manipulation, visualization, and analysis. These include lubridate for date and time manipulation, ggplot2 for static data visualization, and tidyr for data wrangling.
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Next, we load the dataset and perform an initial inspection. This step involves understanding the structure of the data, identifying the types of variables, and checking for any immediate issues such as missing values or incorrect data types.
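The loading code itself is not shown in this section. A minimal sketch of this step, assuming the data is stored as a CSV file: the three-row file written below is a made-up stand-in (its values are illustrative, not taken from the real dataset), and `csv_path` is a hypothetical name to be replaced with the real file path.

```r
# Hypothetical stand-in file so the example runs on its own.
# Replace `csv_path` with the path to the real dataset.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Region,Year,Crop.Residues,total_emission",
  "Afghanistan,Middle East/Central Asia,1990,205.6,2198.9",
  "Afghanistan,Middle East/Central Asia,1991,,2323.9",
  "Albania,Northern/Eastern Europe,1990,59.2,3475.3"
), csv_path)

# Read the file; blank fields in numeric columns are read as NA
CO_Emission <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(CO_Emission)      # number of rows and columns
str(CO_Emission)      # column names, types, and first values
summary(CO_Emission)  # per-column summary, including NA counts
```

Applied to the full dataset, str() and summary() produce the output that follows.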
## 'data.frame': 4570 obs. of 32 variables:
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Region : chr "Middle East/Central Asia" "Middle East/Central Asia" "Middle East/Central Asia" "Middle East/Central Asia" ...
## $ Year : int 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 ...
## $ Savanna.fires : num 14.7 14.7 14.7 14.7 14.7 ...
## $ Forest.fires : num 0.0557 0.0557 0.0557 0.0557 0.0557 ...
## $ Crop.Residues : num 206 209 197 231 242 ...
## $ Rice.Cultivation : num 686 678 686 686 706 ...
## $ Drained.organic.soils..CO2. : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Pesticides.Manufacturing : num 11.8 11.7 11.7 11.7 11.7 ...
## $ Food.Transport : num 63.1 61.2 53.3 54.4 54 ...
## $ Forestland : num -2389 -2389 -2389 -2389 -2389 ...
## $ Net.Forest.conversion : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Food.Household.Consumption : num 79.1 80.5 80.8 85.1 88.8 ...
## $ Food.Retail : num 109.6 116.7 126.2 81.5 90.4 ...
## $ On.farm.Electricity.Use : num 14.27 11.42 9.28 9.06 8.4 ...
## $ Food.Packaging : num 67.6 67.6 67.6 67.6 67.6 ...
## $ Agrifood.Systems.Waste.Disposal: num 692 711 744 792 832 ...
## $ Food.Processing : num 252 252 252 252 252 ...
## $ Fertilizers.Manufacturing : num 12 12.9 13.5 14.1 15.1 ...
## $ IPPU : num 210 217 222 201 182 ...
## $ Manure.applied.to.Soils : num 260 269 265 262 268 ...
## $ Manure.left.on.Pasture : num 1591 1657 1654 1643 1689 ...
## $ Manure.Management : num 319 342 349 352 368 ...
## $ Fires.in.organic.soils : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fires.in.humid.tropical.forests: num 0 0 0 0 0 0 0 0 0 0 ...
## $ On.farm.energy.use : num NA NA NA NA NA NA NA NA NA NA ...
## $ Rural.population : int 9655167 10230490 10995568 11858090 12690115 13401971 13952791 14373573 14733655 15137497 ...
## $ Urban.population : int 2593947 2763167 2985663 3237009 3482604 3697570 3870093 4008032 4130344 4266179 ...
## $ Total.Population...Male : int 5348387 5372959 6028494 7003641 7733458 8219467 8569175 8916862 9275541 9667811 ...
## $ Total.Population...Female : int 5346409 5372208 6028939 7000119 7722096 8199445 8537421 8871958 9217591 9595036 ...
## $ total_emission : num 2199 2324 2356 2368 2501 ...
## $ Average.Temperature.C : num 0.5362 0.0207 -0.2596 0.1019 0.3723 ...
## Area Region Year Savanna.fires
## Length:4570 Length:4570 Min. :1990 Min. : 0.00
## Class :character Class :character 1st Qu.:1998 1st Qu.: 0.05
## Mode :character Mode :character Median :2005 Median : 6.19
## Mean :2005 Mean : 1484.72
## 3rd Qu.:2013 3rd Qu.: 298.60
## Max. :2020 Max. :114616.40
##
## Forest.fires Crop.Residues Rice.Cultivation
## Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 21.56 1st Qu.: 126.2
## Median : 1.78 Median : 148.36 Median : 320.2
## Mean : 1178.51 Mean : 1189.19 Mean : 5592.5
## 3rd Qu.: 121.52 3rd Qu.: 486.82 3rd Qu.: 1191.2
## Max. :52227.63 Max. :33490.07 Max. :164915.3
## NA's :362
## Drained.organic.soils..CO2. Pesticides.Manufacturing Food.Transport
## Min. : 0.00 Min. : 0.0 Min. : 0.37
## 1st Qu.: 0.00 1st Qu.: 4.0 1st Qu.: 64.00
## Median : 8.25 Median : 20.0 Median : 322.05
## Mean : 5080.59 Mean : 408.8 Mean : 2457.40
## 3rd Qu.: 2461.47 3rd Qu.: 129.4 3rd Qu.: 1472.75
## Max. :241025.07 Max. :16459.0 Max. :67945.76
##
## Forestland Net.Forest.conversion Food.Household.Consumption
## Min. :-797183.1 Min. : 0.0 Min. : 0.0
## 1st Qu.: -4619.1 1st Qu.: 0.0 1st Qu.: 24.6
## Median : -170.9 Median : 156.3 Median : 283.9
## Mean : -21499.5 Mean : 12560.2 Mean : 6607.2
## 3rd Qu.: 0.0 3rd Qu.: 5667.6 3rd Qu.: 2118.4
## Max. : 171121.1 Max. :724602.2 Max. :466288.2
## NA's :93 NA's :93 NA's :177
## Food.Retail On.farm.Electricity.Use Food.Packaging
## Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 58.32 1st Qu.: 8.70 1st Qu.: 50.86
## Median : 283.22 Median : 72.26 Median : 67.63
## Mean : 2821.19 Mean : 2361.44 Mean : 2342.89
## 3rd Qu.: 1329.63 3rd Qu.: 609.75 3rd Qu.: 372.89
## Max. :133784.07 Max. :165676.30 Max. :175741.31
##
## Agrifood.Systems.Waste.Disposal Food.Processing Fertilizers.Manufacturing
## Min. : 0.34 Min. : 0.0 Min. : 0.0
## 1st Qu.: 317.28 1st Qu.: 209.6 1st Qu.: 356.5
## Median : 1327.40 Median : 331.1 Median : 658.5
## Mean : 7681.93 Mean : 5456.2 Mean : 3606.2
## 3rd Qu.: 4835.84 3rd Qu.: 1612.6 3rd Qu.: 2327.9
## Max. :213289.70 Max. :274253.5 Max. :170826.4
##
## IPPU Manure.applied.to.Soils Manure.left.on.Pasture
## Min. : 0.3 Min. : 0.106 Min. : 0.0
## 1st Qu.: 142.9 1st Qu.: 28.468 1st Qu.: 213.6
## Median : 1605.3 Median : 147.864 Median : 862.6
## Mean : 26440.4 Mean : 1103.153 Mean : 3518.3
## 3rd Qu.: 9533.7 3rd Qu.: 550.386 3rd Qu.: 2628.3
## Max. :1861640.7 Max. :29865.389 Max. :60880.4
## NA's :186 NA's :176
## Manure.Management Fires.in.organic.soils Fires.in.humid.tropical.forests
## Min. : 0.57 Min. : 0 Min. : 0.00
## 1st Qu.: 83.72 1st Qu.: 0 1st Qu.: 0.00
## Median : 379.87 Median : 0 Median : 0.00
## Mean : 2866.95 Mean : 1845 Mean : 829.25
## 3rd Qu.: 1540.21 3rd Qu.: 0 3rd Qu.: 14.03
## Max. :70592.65 Max. :991718 Max. :51771.26
## NA's :176
## On.farm.energy.use Rural.population Urban.population
## Min. : 0.03 Min. : 0 Min. : 0
## 1st Qu.: 26.08 1st Qu.: 728122 1st Qu.: 1195061
## Median : 276.86 Median : 3394547 Median : 3780934
## Mean : 4008.63 Mean : 26026424 Mean : 22506955
## 3rd Qu.: 1676.45 3rd Qu.: 11812388 3rd Qu.: 12032732
## Max. :139388.92 Max. :900099113 Max. :902077760
## NA's :628
## Total.Population...Male Total.Population...Female total_emission
## Min. : 2939 Min. : 2966 Min. :-391884
## 1st Qu.: 1227770 1st Qu.: 1158133 1st Qu.: 6223
## Median : 4016625 Median : 4119207 Median : 15656
## Mean : 24540823 Mean : 24045410 Mean : 77992
## 3rd Qu.: 12874136 3rd Qu.: 12802408 3rd Qu.: 51095
## Max. :743586579 Max. :713341908 Max. :3115114
##
## Average.Temperature.C
## Min. :-1.3024
## 1st Qu.: 0.5210
## Median : 0.8968
## Mean : 0.9222
## 3rd Qu.: 1.2904
## Max. : 3.5581
##
Missing values can significantly affect the quality and accuracy of our analysis. Therefore, handling them appropriately is crucial. We can either remove rows with missing values, fill them with suitable values, or use advanced imputation methods.
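The three options can be sketched in base R as follows. The `toy` data frame and its values are illustrative assumptions, and median imputation is only one simple choice among many imputation methods.

```r
# Illustrative toy data (not the real dataset)
toy <- data.frame(
  Area = c("A", "B", "C", "D"),
  IPPU = c(210, NA, 222, 201),
  Forestland = c(-2389, -2389, NA, -2389)
)

# Option 1: drop every row that contains at least one NA
toy_complete <- na.omit(toy)

# Option 2: fill NA entries with a fixed value (here 0)
toy_zero <- toy
toy_zero[is.na(toy_zero)] <- 0

# Option 3: simple imputation with the column median (numeric columns only)
toy_median <- toy
num_cols <- sapply(toy_median, is.numeric)
toy_median[num_cols] <- lapply(toy_median[num_cols], function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})
```

Dropping rows loses observations, while filling with a constant or a median keeps every row but introduces values that were never measured, so the choice depends on how the column is used later.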
To determine whether the data contains any missing values, we can use the anyNA() function:
## [1] TRUE
The anyNA(CO_Emission) function in R checks whether there are any NA (missing) values in the CO_Emission data frame. The result [1] TRUE indicates that missing values are present.
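As a small illustration, anyNA() can be applied to a whole data frame or to a single column. The `toy` data frame below is an assumption made for the example, not part of the real data.

```r
# Illustrative toy data with a few missing values
toy <- data.frame(
  Year = c(1990L, 1991L, 1992L),
  Crop.Residues = c(206, NA, 197),
  IPPU = c(NA, NA, 222)
)

any_missing   <- anyNA(toy)       # TRUE: at least one NA somewhere in the data frame
year_missing  <- anyNA(toy$Year)  # FALSE: the Year column is complete
total_missing <- sum(is.na(toy))  # total number of NA cells

any_missing
year_missing
total_missing
```

anyNA() only answers yes or no, so it is a quick first check before counting missing values column by column.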
Step-by-Step Handling of Missing Values
-. Identify Missing Values by using the sapply() function, or the is.na() and colSums() functions:
# Identify missing values
missing_values <- sapply(CO_Emission, function(x) sum(is.na(x)))
missing_values
## Area Region
## 0 0
## Year Savanna.fires
## 0 0
## Forest.fires Crop.Residues
## 0 362
## Rice.Cultivation Drained.organic.soils..CO2.
## 0 0
## Pesticides.Manufacturing Food.Transport
## 0 0
## Forestland Net.Forest.conversion
## 93 93
## Food.Household.Consumption Food.Retail
## 177 0
## On.farm.Electricity.Use Food.Packaging
## 0 0
## Agrifood.Systems.Waste.Disposal Food.Processing
## 0 0
## Fertilizers.Manufacturing IPPU
## 0 186
## Manure.applied.to.Soils Manure.left.on.Pasture
## 176 0
## Manure.Management Fires.in.organic.soils
## 176 0
## Fires.in.humid.tropical.forests On.farm.energy.use
## 0 628
## Rural.population Urban.population
## 0 0
## Total.Population...Male Total.Population...Female
## 0 0
## total_emission Average.Temperature.C
## 0 0
To check for missing values in each column, we can use the is.na() and colSums() functions.
## Area Region
## 0 0
## Year Savanna.fires
## 0 0
## Forest.fires Crop.Residues
## 0 362
## Rice.Cultivation Drained.organic.soils..CO2.
## 0 0
## Pesticides.Manufacturing Food.Transport
## 0 0
## Forestland Net.Forest.conversion
## 93 93
## Food.Household.Consumption Food.Retail
## 177 0
## On.farm.Electricity.Use Food.Packaging
## 0 0
## Agrifood.Systems.Waste.Disposal Food.Processing
## 0 0
## Fertilizers.Manufacturing IPPU
## 0 186
## Manure.applied.to.Soils Manure.left.on.Pasture
## 176 0
## Manure.Management Fires.in.organic.soils
## 176 0
## Fires.in.humid.tropical.forests On.farm.energy.use
## 0 628
## Rural.population Urban.population
## 0 0
## Total.Population...Male Total.Population...Female
## 0 0
## total_emission Average.Temperature.C
## 0 0
From the results above, we can see that our data consists of 4570 rows and 32 columns (4570 obs. of 32 variables). Additionally, we noticed that some columns contain empty character strings (""), which are not recognized as missing values (NA) unless they are converted first.
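A minimal sketch of why empty strings matter, using an illustrative `toy` data frame (an assumption, not the real data): is.na() does not count "" as missing until it has been converted to NA.

```r
# Illustrative toy data: two empty strings and one real NA
toy <- data.frame(
  Area   = c("Afghanistan", "", "Albania"),
  Region = c("Middle East/Central Asia", "Northern/Eastern Europe", ""),
  IPPU   = c(210, NA, 222),
  stringsAsFactors = FALSE
)

# Empty strings are not counted as missing
na_before <- colSums(is.na(toy))

# Convert empty strings to NA in the character columns only
char_cols <- sapply(toy, is.character)
toy[char_cols] <- lapply(toy[char_cols], function(x) {
  x[x == ""] <- NA
  x
})

# Now the empty entries show up in the per-column NA counts
na_after <- colSums(is.na(toy))

na_before
na_after
```

When reading from a file, a similar effect can be achieved up front with the `na.strings` argument of read.csv(), for example `na.strings = c("", "NA")`.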
-. Remove Columns with Many Missing Values:
From the previous step (Identify Missing Values), we have identified several columns that contain missing values, listed below. From this list we drop only the 2 columns that have more than 300 missing values: Crop.Residues and On.farm.energy.use.
List of columns with missing values:
Crop.Residues: 362 missing values
Forestland: 93 missing values
Net.Forest.conversion: 93 missing values
Food.Household.Consumption: 177 missing values
IPPU: 186 missing values
Manure.applied.to.Soils: 176 missing values
Manure.Management: 176 missing values
On.farm.energy.use: 628 missing values
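Instead of picking the columns by eye, the selection can be derived from the NA counts. The sketch below copies the non-zero counts from the output above into a named vector; `threshold`, `cols_to_drop`, and the `toy` data frame are names and values introduced only for this illustration.

```r
# Non-zero NA counts, copied from the per-column output above
missing_values <- c(
  Crop.Residues = 362, Forestland = 93, Net.Forest.conversion = 93,
  Food.Household.Consumption = 177, IPPU = 186,
  Manure.applied.to.Soils = 176, Manure.Management = 176,
  On.farm.energy.use = 628
)

# Columns whose NA count exceeds the chosen threshold
threshold <- 300
cols_to_drop <- names(missing_values)[missing_values > threshold]
cols_to_drop

# Dropping those columns from an illustrative toy data frame
toy <- data.frame(
  Year = 1990:1991,
  Crop.Residues = c(NA, 1),
  IPPU = c(2, 3),
  On.farm.energy.use = c(NA, NA)
)
toy_clean <- toy[, !(names(toy) %in% cols_to_drop), drop = FALSE]
names(toy_clean)
```

The threshold of 300 follows the text. A different cutoff, or one expressed as a share of the 4570 rows, would work the same way.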
CO_Emission_Clean <- subset(CO_Emission, select = -c(On.farm.energy.use, Crop.Residues))
CO_Emission_Clean
For the other columns that still have missing values, we replace the NA entries with 0. Keep in mind that these substituted 0 values are then treated as real measurements, which can affect the analysis.
-. Replace Missing Values with 0:
# Replace remaining NA values with 0
CO_Emission_Clean[is.na(CO_Emission_Clean)] <- 0
str(CO_Emission_Clean)
## 'data.frame': 4570 obs. of 30 variables:
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Region : chr "Middle East/Central Asia" "Middle East/Central Asia" "Middle East/Central Asia" "Middle East/Central Asia" ...
## $ Year : int 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 ...
## $ Savanna.fires : num 14.7 14.7 14.7 14.7 14.7 ...
## $ Forest.fires : num 0.0557 0.0557 0.0557 0.0557 0.0557 ...
## $ Rice.Cultivation : num 686 678 686 686 706 ...
## $ Drained.organic.soils..CO2. : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Pesticides.Manufacturing : num 11.8 11.7 11.7 11.7 11.7 ...
## $ Food.Transport : num 63.1 61.2 53.3 54.4 54 ...
## $ Forestland : num -2389 -2389 -2389 -2389 -2389 ...
## $ Net.Forest.conversion : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Food.Household.Consumption : num 79.1 80.5 80.8 85.1 88.8 ...
## $ Food.Retail : num 109.6 116.7 126.2 81.5 90.4 ...
## $ On.farm.Electricity.Use : num 14.27 11.42 9.28 9.06 8.4 ...
## $ Food.Packaging : num 67.6 67.6 67.6 67.6 67.6 ...
## $ Agrifood.Systems.Waste.Disposal: num 692 711 744 792 832 ...
## $ Food.Processing : num 252 252 252 252 252 ...
## $ Fertilizers.Manufacturing : num 12 12.9 13.5 14.1 15.1 ...
## $ IPPU : num 210 217 222 201 182 ...
## $ Manure.applied.to.Soils : num 260 269 265 262 268 ...
## $ Manure.left.on.Pasture : num 1591 1657 1654 1643 1689 ...
## $ Manure.Management : num 319 342 349 352 368 ...
## $ Fires.in.organic.soils : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fires.in.humid.tropical.forests: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Rural.population : int 9655167 10230490 10995568 11858090 12690115 13401971 13952791 14373573 14733655 15137497 ...
## $ Urban.population : int 2593947 2763167 2985663 3237009 3482604 3697570 3870093 4008032 4130344 4266179 ...
## $ Total.Population...Male : int 5348387 5372959 6028494 7003641 7733458 8219467 8569175 8916862 9275541 9667811 ...
## $ Total.Population...Female : int 5346409 5372208 6028939 7000119 7722096 8199445 8537421 8871958 9217591 9595036 ...
## $ total_emission : num 2199 2324 2356 2368 2501 ...
## $ Average.Temperature.C : num 0.5362 0.0207 -0.2596 0.1019 0.3723 ...
To check again for missing values in each column, we can use the is.na() and colSums() functions.
## Area Region
## 0 0
## Year Savanna.fires
## 0 0
## Forest.fires Rice.Cultivation
## 0 0
## Drained.organic.soils..CO2. Pesticides.Manufacturing
## 0 0
## Food.Transport Forestland
## 0 0
## Net.Forest.conversion Food.Household.Consumption
## 0 0
## Food.Retail On.farm.Electricity.Use
## 0 0
## Food.Packaging Agrifood.Systems.Waste.Disposal
## 0 0
## Food.Processing Fertilizers.Manufacturing
## 0 0
## IPPU Manure.applied.to.Soils
## 0 0
## Manure.left.on.Pasture Manure.Management
## 0 0
## Fires.in.organic.soils Fires.in.humid.tropical.forests
## 0 0
## Rural.population Urban.population
## 0 0
## Total.Population...Male Total.Population...Female
## 0 0
## total_emission Average.Temperature.C
## 0 0
The results of colSums(is.na(CO_Emission_Clean)) show that there are no missing values (NA) in any of the columns of the CO_Emission_Clean data frame.
To rename columns, we can use the make.names() function. The make.names() function in R ensures that the names of objects, such as column names in a data frame, are syntactically valid variable names. This is particularly useful when the names contain spaces, special characters, or other syntax that is invalid for variable names in R.
# Rename columns if needed
colnames(CO_Emission_Clean) <- make.names(colnames(CO_Emission_Clean), unique = TRUE)
## [1] "Area" "Region"
## [3] "Year" "Savanna.fires"
## [5] "Forest.fires" "Rice.Cultivation"
## [7] "Drained.organic.soils..CO2." "Pesticides.Manufacturing"
## [9] "Food.Transport" "Forestland"
## [11] "Net.Forest.conversion" "Food.Household.Consumption"
## [13] "Food.Retail" "On.farm.Electricity.Use"
## [15] "Food.Packaging" "Agrifood.Systems.Waste.Disposal"
## [17] "Food.Processing" "Fertilizers.Manufacturing"
## [19] "IPPU" "Manure.applied.to.Soils"
## [21] "Manure.left.on.Pasture" "Manure.Management"
## [23] "Fires.in.organic.soils" "Fires.in.humid.tropical.forests"
## [25] "Rural.population" "Urban.population"
## [27] "Total.Population...Male" "Total.Population...Female"
## [29] "total_emission" "Average.Temperature.C"
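The column names above are already valid, so make.names() leaves them unchanged. To see what the function does with invalid names, here is a small sketch. The `raw_names` vector is an assumption about how such headers might look before cleaning, not something taken from the source file.

```r
# Hypothetical raw headers containing spaces, punctuation, and a duplicate
raw_names <- c("Total Population - Male", "Drained organic soils (CO2)", "Area", "Area")

# Invalid characters become ".", and unique = TRUE disambiguates duplicates
clean_names <- make.names(raw_names, unique = TRUE)
clean_names
```

Each invalid character is replaced by a dot, which is why names such as Total.Population...Male contain runs of dots.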
To convert data types, we can use the functions below:
* as.character()
* as.Date()
* as.integer()
* as.numeric()
* as.factor()
From the str(CO_Emission_Clean) output, the data types of the dataset seem appropriate for the data they represent. However, there are some considerations for further analysis:
Area and Region: These columns are currently character (chr) types, which is suitable for categorical data. However, converting them to factor (factor) types could be beneficial for analysis and modeling.
Year: The int type is appropriate for the Year column.
All other numeric columns: These are of type num (or int for the population columns), which is appropriate for numeric data.
So, we only change Area and Region from chr to factor:
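The conversion code is not shown in this section. A minimal sketch of the step using as.factor(), on an illustrative `toy` data frame (an assumption standing in for CO_Emission_Clean):

```r
# Illustrative toy data with character columns
toy <- data.frame(
  Area   = c("Afghanistan", "Afghanistan", "Albania"),
  Region = c("Middle East/Central Asia", "Middle East/Central Asia",
             "Northern/Eastern Europe"),
  Year   = c(1990L, 1991L, 1990L),
  stringsAsFactors = FALSE
)

# Convert the two categorical columns from character to factor
toy$Area   <- as.factor(toy$Area)
toy$Region <- as.factor(toy$Region)

str(toy)            # Area and Region are now reported as Factor
levels(toy$Region)  # the distinct categories
```

On the real data the same two assignments would be applied to CO_Emission_Clean$Area and CO_Emission_Clean$Region.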
To verify the data after changes, we can use the str() function
## 'data.frame': 4570 obs. of 30 variables:
## $ Area : Factor w/ 153 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Region : Factor w/ 6 levels "Africa","Asia-Pacific",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Year : int 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 ...
## $ Savanna.fires : num 14.7 14.7 14.7 14.7 14.7 ...
## $ Forest.fires : num 0.0557 0.0557 0.0557 0.0557 0.0557 ...
## $ Rice.Cultivation : num 686 678 686 686 706 ...
## $ Drained.organic.soils..CO2. : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Pesticides.Manufacturing : num 11.8 11.7 11.7 11.7 11.7 ...
## $ Food.Transport : num 63.1 61.2 53.3 54.4 54 ...
## $ Forestland : num -2389 -2389 -2389 -2389 -2389 ...
## $ Net.Forest.conversion : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Food.Household.Consumption : num 79.1 80.5 80.8 85.1 88.8 ...
## $ Food.Retail : num 109.6 116.7 126.2 81.5 90.4 ...
## $ On.farm.Electricity.Use : num 14.27 11.42 9.28 9.06 8.4 ...
## $ Food.Packaging : num 67.6 67.6 67.6 67.6 67.6 ...
## $ Agrifood.Systems.Waste.Disposal: num 692 711 744 792 832 ...
## $ Food.Processing : num 252 252 252 252 252 ...
## $ Fertilizers.Manufacturing : num 12 12.9 13.5 14.1 15.1 ...
## $ IPPU : num 210 217 222 201 182 ...
## $ Manure.applied.to.Soils : num 260 269 265 262 268 ...
## $ Manure.left.on.Pasture : num 1591 1657 1654 1643 1689 ...
## $ Manure.Management : num 319 342 349 352 368 ...
## $ Fires.in.organic.soils : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fires.in.humid.tropical.forests: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Rural.population : int 9655167 10230490 10995568 11858090 12690115 13401971 13952791 14373573 14733655 15137497 ...
## $ Urban.population : int 2593947 2763167 2985663 3237009 3482604 3697570 3870093 4008032 4130344 4266179 ...
## $ Total.Population...Male : int 5348387 5372959 6028494 7003641 7733458 8219467 8569175 8916862 9275541 9667811 ...
## $ Total.Population...Female : int 5346409 5372208 6028939 7000119 7722096 8199445 8537421 8871958 9217591 9595036 ...
## $ total_emission : num 2199 2324 2356 2368 2501 ...
## $ Average.Temperature.C : num 0.5362 0.0207 -0.2596 0.1019 0.3723 ...
The dataset CO_Emission_Clean has been cleaned and structured for analysis. It consists of 4570 observations across 30 variables, including Area and Region, which have been converted to factors. The numeric variables, such as emissions from various sources, population data, and temperature measurements, contain no missing values after the replacement of NA entries with zeros. This preparation makes the dataset ready for exploration and modeling to understand the relationships between emissions, population dynamics, and environmental factors across different regions and years.
Let's look at our data with the head() function:
Let's take a look at the CO_Emission_Clean data by year. We can use the summary() function:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1990 1998 2005 2005 2013 2020
From the output, the earliest year in the CO_Emission_Clean data set is 1990 and the latest year is 2020. The median year is 2005, indicating that the data is roughly centered around this period, with a fairly even spread across the years from 1990 to 2020.
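The calls behind this step are not shown, so here is a hedged sketch. The `toy` data frame is an assumption: one record per year from 1990 to 2020 with a placeholder value column.

```r
# Illustrative toy data: one record per year
toy <- data.frame(Year = 1990:2020, value = seq_len(31))

head(toy)                    # first six rows
year_summary <- summary(toy$Year)
year_summary                 # min, quartiles, median, mean, max
year_range <- range(toy$Year)
year_range                   # earliest and latest year
```

On the real data, the equivalent calls would be head(CO_Emission_Clean) and summary(CO_Emission_Clean$Year).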
Let's see how many regions there are in the CO_Emission_Clean data set, using the unique() function:
# Get unique regions
unique_regions <- unique(CO_Emission_Clean$Region)
# Display the list of unique regions
print(unique_regions)
## [1] Middle East/Central Asia Northern/Eastern Europe Africa
## [4] European Union Asia-Pacific North America
## 6 Levels: Africa Asia-Pacific European Union ... Northern/Eastern Europe
The unique(CO_Emission_Clean$Region) function retrieves all unique regions listed in the Region column of the CO_Emission_Clean dataset. The dataset contains observations from six unique regions:
1. Middle East/Central Asia
2. Northern/Eastern Europe
3. Africa
4. European Union
5. Asia-Pacific
6. North America
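Note that the unique() output lists regions in order of first appearance, while its "Levels" line is alphabetical. A small sketch of the difference, plus a per-region row count with table(), on an illustrative factor (the `region` vector is an assumption, not the real column):

```r
# Illustrative factor with repeated regions
region <- factor(c("Middle East/Central Asia", "Africa", "Africa", "Asia-Pacific"))

first_seen <- as.character(unique(region))  # order of first appearance
lvls       <- levels(region)                # sorted factor levels
n_regions  <- nlevels(region)               # number of distinct regions
counts     <- table(region)                 # rows per region

first_seen
lvls
n_regions
counts
```

table() is a quick way to check whether the regions are represented by a similar number of rows.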
The dataset CO_Emission_Clean spans the years 1990 to 2020 and contains emission observations from 6 unique regions. Let's explore the data visually.
First, we want to know the total CO2 emissions by region and year. Here are the steps to get the visualization.
library(dplyr) # for data manipulation (group_by, summarise, %>%)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # for plotting (optional)
library(scales) # for comma formatting
# Aggregate CO2 emissions by Region and Year
CO2_agg <- CO_Emission_Clean %>%
  group_by(Region, Year) %>%
  summarise(Total_CO2_Emission = sum(total_emission, na.rm = TRUE), .groups = 'drop') %>%
  arrange(desc(Total_CO2_Emission)) %>%
  mutate(text = paste0("Total_CO2_Emission:", comma(Total_CO2_Emission), " kt"))
For visualization, we can use ggplot2, a popular package for creating visualizations in R, as follows:
# Line widths use `linewidth`, which replaces `size` for lines as of ggplot2 3.4.0
ggplot(CO2_agg, aes(x = Year, y = Total_CO2_Emission, color = Region)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  labs(title = "Total CO2 Emissions by Region per Year",
       x = "Year",
       y = "Total CO2 Emissions (kt)",
       color = "Region") +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c(
    "Middle East/Central Asia" = "#1f77b4",
    "Northern/Eastern Europe" = "#ff7f0e",
    "Africa" = "#2ca02c",
    "European Union" = "#d62728",
    "Asia-Pacific" = "#9467bd",
    "North America" = "#8c564b"
  )) +
  theme_minimal() +
  theme(
    panel.background = element_rect(fill = "#f0f0f0"), # Set background color
    plot.background = element_rect(fill = "#C4D5C5"),
    panel.grid.major = element_line(linewidth = 0.5, linetype = "solid", color = "gray"), # Major grid lines
    panel.grid.minor = element_line(linewidth = 0.2, linetype = "dotted", color = "gray"), # Minor grid lines
    legend.position = "bottom", # Position of the legend
    legend.title = element_text(face = "bold"), # Title of the legend
    legend.text = element_text(size = 10), # Text of the legend
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5, color = "#333333"),
    axis.text.y = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12, face = "bold", color = "#333333"), # Adjust axis title
    axis.text = element_text(size = 10, color = "black") # Adjust axis text
  )
The visualization of "Total CO2 Emissions by Region per Year" provides insight into how CO2 emissions have varied across different regions from 1990 to 2020.
This exploratory visualization helps us understand the broad patterns and variations in CO2 emissions across regions over time, laying the foundation for more detailed analyses or policy discussions regarding climate change mitigation strategies.
We want to point out that the Asia-Pacific region exhibits higher emissions than the other regions. Next, we look at the correlation between its emissions and the average temperature increase. The steps are as follows:
ggplot2 is used for creating plots. glue is used for string interpolation, allowing us to create informative text labels. scales provides functions for formatting numbers. ggrepel helps in avoiding overlap of text labels in the plot.
library(ggplot2) # Static data visualization
library(glue) # String interpolation
library(scales) # for comma formatting
library(ggrepel) # Non-overlapping text labels
## Warning: package 'ggrepel' was built under R version 4.4.1
The filter function selects the records for the "Asia-Pacific" region. The group_by function groups the data by Year. The summarise function calculates the total CO2 emissions, the average temperature increase, and the sums of the rural and urban populations for each year. The arrange function sorts the data by Year. The mutate function creates two text labels (text and smooth_text) for annotating the plot.
# Data Preparation
CO_Emission_Region_Temperature <-
  CO_Emission_Clean %>%
  filter(Region == "Asia-Pacific") %>%
  group_by(Year) %>%
  summarise(total_emission = sum(total_emission),
            Average.Temperature = mean(Average.Temperature.C),
            Rural_Population = sum(Rural.population),
            Urban_Population = sum(Urban.population)) %>%
  arrange(Year) %>%
  mutate(text = glue("Total_Emission: {comma(total_emission)} kt
                     Average Temperature Increase: {format(Average.Temperature, digits= 1, nsmall =2)} C"),
         smooth_text = glue("({format(Average.Temperature, digits = 2, nsmall = 2)} C, {comma(total_emission, accuracy = 0.01)} kt)"))
geom_smooth adds a smoothed line with a specified color, fill, line width, and transparency. geom_point adds points, mapping the size to the combined rural and urban population and the color to the year. geom_text_repel adds text labels that avoid overlapping. scale_y_continuous formats the y-axis labels with commas. scale_size_continuous adjusts the size range of the points. scale_color_viridis_c provides a color gradient for the year variable. labs sets the plot title and axis labels. theme_minimal applies a minimal theme, with further customizations for the title, background, grid lines, axis, and legend.
# Plotting with ggplot2
ggplot(data = CO_Emission_Region_Temperature, aes(x = Average.Temperature, y = total_emission)) +
  geom_smooth(col = "maroon", fill = "lightpink", linewidth = 1, alpha = 1) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(aes(size = Rural_Population + Urban_Population, color = Year)) + # Map color to Year for legend
  geom_text_repel(aes(label = smooth_text), vjust = -0.5, hjust = 0.5, size = 1.5, color = "blue") +
  scale_y_continuous(labels = comma) +
  scale_size_continuous(range = c(1, 6), guide = "none") +
  scale_color_viridis_c(option = "A", direction = -1, labels = scales::number_format()) +
  labs(title = "Correlation CO2 Emissions and Average Temperature Increase in Asia-Pacific",
       x = "Average Temperature Increase (\u00B0C)", # Unicode for degree symbol
       y = "Total CO2 Emission (kt)",
       color = "Year") + # Add color legend label
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 12, hjust = 0.5, color = "#333333"),
    panel.background = element_rect(fill = "#f9f9f9"),
    plot.background = element_rect(fill = "#C4D5C5"),
    panel.grid.major = element_line(colour = "grey"),
    axis.line = element_line(color = "grey"),
    axis.text = element_text(size = 10, colour = "black"),
    legend.position = "right" # Add the legend back to the plot
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The code filters and aggregates CO2 emissions and average temperature data for the Asia-Pacific region, then creates a scatter plot showing the relationship between these variables over the years. The plot includes a smoothed trend line, points sized by population, and text annotations for each point. The color of the points represents the year, and a legend is included to indicate this. By incorporating various elements such as size, color, and text annotations, the plot provides a clear and informative depiction of the data. The use of ggrepel ensures that the text labels do not overlap, making the plot easy to interpret. This visualization effectively illustrates the correlation between CO2 emissions and average temperature increase in the Asia-Pacific region over time.
The visualization and analysis indicate that in the Asia-Pacific region, there is a correlation between the increase in CO2 emissions and the average temperature rise over the years. As CO2 emissions increase, the average temperature also tends to rise, suggesting a potential link between higher emissions and temperature increases.
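The plot shows the relationship visually. To put a number on it, one option is Pearson's correlation coefficient via cor() and cor.test(). The sketch below uses made-up yearly values in a hypothetical `yearly` data frame, so the printed figures say nothing about the real data. On the real data the inputs would be the total_emission and Average.Temperature columns of CO_Emission_Region_Temperature.

```r
# Made-up yearly aggregates, for illustration only
yearly <- data.frame(
  Year = 2000:2007,
  total_emission = c(100, 110, 118, 131, 140, 155, 160, 172),
  Average.Temperature = c(0.50, 0.58, 0.55, 0.70, 0.74, 0.86, 0.83, 0.95)
)

# Pearson correlation between emissions and temperature increase
r <- cor(yearly$total_emission, yearly$Average.Temperature)

# Significance test for the same correlation
test <- cor.test(yearly$total_emission, yearly$Average.Temperature)

r
test$estimate
test$p.value
```

Because both series rise over the same years, a high coefficient here describes a shared trend and does not by itself establish causation.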
The findings from this analysis and visualization can serve as a foundation for further research and analysis, particularly in the context of “go green” initiatives. Policymakers and environmentalists can use these insights to design and implement strategies aimed at reducing CO2 emissions and mitigating climate change. Furthermore, the data can inform social and political policies, encouraging governments and organizations to adopt sustainable practices and prioritize environmental conservation. This analysis underscores the importance of understanding the impact of emissions on temperature increases, highlighting the need for concerted efforts to address climate change at both national and regional levels.