This report uses the World Bank’s country-level global waste management data set to explore the relationship between a country’s GDP and its waste generation. The data set is available at https://datacatalog.worldbank.org/search/dataset/0039597/What-a-Waste-Global-Database
#set working directory
setwd("~/Desktop/mydata")
#load necessary libraries
library(readr)
library(tidyverse)
library(ggplot2)
library(ggthemes)
#import data
data = read_csv("global_waste_data.csv")
head(data)
## # A tibble: 6 × 51
## iso3c region_id country_name income_id gdp composition_food_organ…¹
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 ABW LCN Aruba HIC 35563. NA
## 2 AFG SAS Afghanistan LIC 2057. NA
## 3 AGO SSF Angola LMC 8037. 51.8
## 4 ALB ECS Albania UMC 13724. 51.4
## 5 AND ECS Andorra HIC 43712. 31.2
## 6 ARE MEA United Arab Emirates HIC 67119. 39
## # ℹ abbreviated name: ¹​composition_food_organic_waste_percent
## # ℹ 45 more variables: composition_glass_percent <dbl>,
## # composition_metal_percent <dbl>, composition_other_percent <dbl>,
## # composition_paper_cardboard_percent <dbl>,
## # composition_plastic_percent <dbl>,
## # composition_rubber_leather_percent <dbl>, composition_wood_percent <dbl>,
## # composition_yard_garden_green_waste_percent <dbl>, …
Many columns in the data set are half-filled or less. Because they would not be useful in the analysis, a new data set is created without them.
slice = data %>% select(3,4,5,20,6,10,11,27)
names(slice)
## [1] "country_name"
## [2] "income_id"
## [3] "gdp"
## [4] "population_population_number_of_people"
## [5] "composition_food_organic_waste_percent"
## [6] "composition_paper_cardboard_percent"
## [7] "composition_plastic_percent"
## [8] "total_msw_total_msw_generated_tons_year"
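The "half-filled or less" criterion can be checked directly rather than by eye. The following is a minimal sketch on a toy data frame: the values and the column `sparse_col` are made up for illustration, and the 50% threshold is an assumption taken from the wording above. It computes the share of missing values per column and keeps the columns that are more than half filled.

```r
# Toy stand-in for the imported data; values are invented, not World Bank figures
toy <- data.frame(
  country_name = c("A", "B", "C", "D"),
  gdp          = c(1000, 2500, NA, 40000),
  sparse_col   = c(NA, NA, NA, 12.5)   # hypothetical mostly-empty column
)

# Share of missing values in each column (0 = complete, 1 = all NA)
na_share <- colMeans(is.na(toy))

# Keep columns that are more than half filled (assumed threshold)
kept <- toy[, na_share < 0.5, drop = FALSE]
names(kept)
```

On the imported data, `colMeans(is.na(data))` would give the same per-column shares, which could then guide the choice of columns to select.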
The column names of the sliced data set are too long to reference conveniently, so they are renamed to shorter ones.
slice = rename(slice, "population" = "population_population_number_of_people",
"food&organic_waste(%)" = "composition_food_organic_waste_percent",
"paper_waste(%)" = "composition_paper_cardboard_percent",
"plastic_waste(%)" = "composition_plastic_percent",
"annual_total_waste(tons)" = "total_msw_total_msw_generated_tons_year")
names(slice)
## [1] "country_name" "income_id"
## [3] "gdp" "population"
## [5] "food&organic_waste(%)" "paper_waste(%)"
## [7] "plastic_waste(%)" "annual_total_waste(tons)"
In this section, rows with NA values are removed. This does not have much impact on the data set because the affected rows are either very small countries or territories. In addition, income_id is converted to a factor so that it can be summarized in later sections.
#remove rows with NA values
slice = slice %>% filter(complete.cases(slice))
#convert income_id to a factor, with levels ordered from low to high income
slice$income_id = factor(slice$income_id, levels = c("LIC", "LMC", "UMC", "HIC"))
head(slice)
## # A tibble: 6 × 8
## country_name income_id gdp population `food&organic_waste(%)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Angola LMC 8037. 25096150 51.8
## 2 Albania UMC 13724. 2854191 51.4
## 3 Andorra HIC 43712. 82431 31.2
## 4 United Arab Emirates HIC 67119. 9770529 39
## 5 Argentina HIC 23550. 42981516 38.7
## 6 Armenia UMC 11020. 2906220 57
## # ℹ 3 more variables: `paper_waste(%)` <dbl>, `plastic_waste(%)` <dbl>,
## # `annual_total_waste(tons)` <dbl>
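Before dropping rows, it can help to see which ones are lost, since the claim that only very small countries or territories are affected is easy to check. The following is a minimal sketch on toy data (all values invented, and `plastic` is a stand-in column name), assuming the same `complete.cases()` approach as above.

```r
# Toy stand-in for the sliced data; values are invented for illustration
toy <- data.frame(
  country_name = c("A", "B", "C", "D"),
  population   = c(5e6, 8e4, 2e7, 3e4),
  plastic      = c(10.2, NA, 12.5, NA),
  stringsAsFactors = FALSE
)

keep    <- complete.cases(toy)
dropped <- toy[!keep, ]   # rows that would be removed
clean   <- toy[keep, ]    # rows that survive

# Inspect the dropped rows, largest population first
dropped[order(-dropped$population), c("country_name", "population")]
nrow(clean)
```

If a large country shows up near the top of the dropped rows, removing incomplete rows may affect the group averages more than expected.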
Next, the average waste in each category is calculated per income group to show the relationship between GDP and waste generation.
attach(slice)
slice %>%
  group_by(income_id) %>%
  summarize(avg_total = mean(`annual_total_waste(tons)`),
            food_organic_waste_avg = mean(`food&organic_waste(%)`),
            paper_waste_avg = mean(`paper_waste(%)`),
            plastic_waste_avg = mean(`plastic_waste(%)`)) %>%
  arrange(income_id)
## # A tibble: 4 × 5
## income_id avg_total food_organic_waste_avg paper_waste_avg plastic_waste_avg
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 HIC 10037674. 33.0 21.4 12.4
## 2 LIC 2528783. 51.3 7.29 7.97
## 3 LMC 6635725. 50.4 10.7 11.3
## 4 UMC 16920344. 46.6 12.2 12.6
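The same per-group means can be written more compactly with `across()`. Storing `income_id` as a factor with explicit levels also makes the groups sort from low to high income rather than alphabetically. The following is a minimal sketch on toy data, assuming dplyr 1.0.0 or later; the values are invented and the short column names `food`, `paper` and `plastic` are stand-ins for the renamed columns.

```r
library(dplyr)

# Toy stand-in for the sliced data; values are invented for illustration
toy <- data.frame(
  income_id = c("HIC", "LIC", "HIC", "LIC", "UMC", "LMC"),
  food      = c(30, 50, 36, 54, 45, 52),
  paper     = c(20, 7, 24, 9, 12, 10),
  plastic   = c(12, 8, 13, 8, 12, 11)
)

summary_tbl <- toy %>%
  # explicit levels fix the group order: low income first, high income last
  mutate(income_id = factor(income_id, levels = c("LIC", "LMC", "UMC", "HIC"))) %>%
  group_by(income_id) %>%
  # one mean per listed column, named <column>_avg
  summarize(across(c(food, paper, plastic), mean, .names = "{.col}_avg"))

summary_tbl
```

With the groups in income order, a trend across income levels can be read straight down each column.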
The conclusions from the above data are as below:
HIC: High Income Country | UMC: Upper Middle Income Country | LMC: Lower Middle Income Country | LIC: Low Income Country
The summary by income group is repeated below for countries with a population above 10 million.
big_economies = slice %>% filter(population > 10000000)
head(big_economies)
## # A tibble: 6 × 8
## country_name income_id gdp population `food&organic_waste(%)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Angola LMC 8037. 25096150 51.8
## 2 Argentina HIC 23550. 42981516 38.7
## 3 Australia HIC 47784. 23789338 48.4
## 4 Belgium HIC 51915. 11484055 14.2
## 5 Burkina Faso LIC 1925. 18110624 21
## 6 Bangladesh LMC 3196. 155727056 80.6
## # ℹ 3 more variables: `paper_waste(%)` <dbl>, `plastic_waste(%)` <dbl>,
## # `annual_total_waste(tons)` <dbl>
detach(slice)
attach(big_economies)
## The following object is masked _by_ .GlobalEnv:
##
## income_id
## The following object is masked from package:tidyr:
##
## population
The summary statistics below verify the following points:
However, the big_economies data set yields a new finding:
HIC: High Income Country | UMC: Upper Middle Income Country | LMC: Lower Middle Income Country | LIC: Low Income Country
big_economies %>%
  group_by(income_id) %>%
  summarize(avg_total = mean(`annual_total_waste(tons)`),
            food_organic_waste_avg = mean(`food&organic_waste(%)`),
            paper_waste_avg = mean(`paper_waste(%)`),
            plastic_waste_avg = mean(`plastic_waste(%)`)) %>%
  arrange(income_id)
## # A tibble: 4 × 5
## income_id avg_total food_organic_waste_avg paper_waste_avg plastic_waste_avg
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 HIC 31260457. 35.1 22.9 11.5
## 2 LIC 3342614. 50.4 8.33 9.43
## 3 LMC 11673521. 57.4 9.66 9.44
## 4 UMC 40511531. 52.3 11.9 12.1
Plotting GDP against the different waste categories further corroborates the previous findings, as shown below:
#two plots side by side; columns are taken from the attached big_economies
par(mfrow = c(1, 2))
plot(x=`food&organic_waste(%)`, y=gdp, type="p",
     xlab="% in Total Waste Composition", ylab="GDP in US$",
     main="GDP VS. Food&Organic Waste", frame.plot=FALSE)
plot(x=`paper_waste(%)`, y=gdp, type="p",
     xlab="% in Total Waste Composition", ylab="GDP in US$",
     main="GDP VS. Paper Waste", frame.plot=FALSE)
The scatter plot below shows no clear relationship between GDP and plastic waste. This suggests that countries at every income level should contribute to plastic waste reduction.
ggplot(slice, aes(x=`plastic_waste(%)`, y=gdp)) +
  geom_point(color="#4cbea3") +
  labs(title = "Plastic Waste VS. Country's GDP",
       x = "% in Total Waste Composition",
       y = "GDP in US$") +
  theme_economist_white()
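A scatter plot shows a pattern only visually, whereas a rank correlation puts a number on it. The sketch below uses toy data (invented values, not the World Bank figures; `food` and `plastic` are stand-in column names) and Spearman's rank correlation. That method is chosen here for illustration and is not something the report itself computes.

```r
# Toy data: food share falls steadily with GDP, plastic share does not
toy <- data.frame(
  gdp     = c(1500, 3000, 8000, 14000, 24000, 45000, 52000, 67000),
  food    = c(58, 55, 52, 50, 40, 33, 30, 28),
  plastic = c(9, 13, 8, 12, 11, 13, 9, 12)
)

# Spearman's rho: near -1 or 1 for a monotonic trend, near 0 for none
rho_food    <- cor(toy$gdp, toy$food, method = "spearman")
rho_plastic <- cor(toy$gdp, toy$plastic, method = "spearman")

# cor.test() adds a p-value for the null hypothesis of no association;
# exact = FALSE avoids the exact-p-value warning when values are tied
test_plastic <- cor.test(toy$gdp, toy$plastic, method = "spearman", exact = FALSE)

rho_food
rho_plastic
test_plastic$p.value
```

On the real data, the same calls with `slice$gdp` and the renamed waste columns would indicate whether the visual impression from the plots holds up numerically.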