I used open data from the World Bank to produce this report. The data qualifies as open because the World Bank website states, under its terms of use, that “Unless indicated otherwise in the data or indicator metadata, you are free to copy, distribute, adapt, display or include the data in other products for commercial or noncommercial purposes at no cost …”. For this project I used two datasets. The first contains the GDP of each country for the years 1960 to 2019 (link). The second logs the square kilometers of forest in each country for every year from 1990 to 2018 (link). I chose these datasets because I was curious to see whether there was a noticeable correlation between a country’s growing GDP and its decreasing forest area as a result of clearing forest for development.
Since I wanted to compare these two datasets, I had to get them to contain identical years and countries. After merging in the separate metadata containing the Region and Income Group information and removing all “Year” columns that weren’t between 1990 and 2005 (a self-designated workable selection), I moved on to removing unusable rows. This went quickly, as I simply removed any row that contained a cell with a value of “na”. I was left with a GDP dataset that contained fewer entries than the forest area dataset, so I filtered each of them down to the countries contained in the other. I then created four data frames. From the GDP dataset I created one data frame that grouped the countries by income group and summed their GDP, and a second that did the same but grouped by region. The same was done for the forest area dataset. These four later became two data frames, as I merged the two region frames together and the two income group frames together.
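The cleaning steps described above can be sketched in base R roughly as follows. This is a minimal illustration on invented toy data, not the code used for the report: the column names (`Country`, `Region`, `IncomeGroup`, `X1990`, ...), the helper names `clean` and `sum_by`, and all values are assumptions, and the toy tables already carry the region and income group metadata that the report merged in separately.

```r
# Toy stand-ins for the two World Bank tables. All values are invented.
gdp <- data.frame(
  Country     = c("A", "B", "C", "D", "E"),
  Region      = c("North", "North", "South", "South", "South"),
  IncomeGroup = c("High income", "Low income", "High income", "Low income", "Low income"),
  X1990 = c(100, 10, 200, NA, 20),
  X1991 = c(110, 12, 210, 25, 22),
  X2019 = c(300, 40, 500, 60, 70),
  stringsAsFactors = FALSE
)
forest <- data.frame(
  Country     = c("A", "B", "C", "E", "F"),
  Region      = c("North", "North", "South", "South", "North"),
  IncomeGroup = c("High income", "Low income", "High income", "Low income", "High income"),
  X1990 = c(50, 80, 70, 30, 20),
  X1991 = c(49, 79, 68, 29, 19),
  stringsAsFactors = FALSE
)

id_cols   <- c("Country", "Region", "IncomeGroup")
year_cols <- paste0("X", 1990:1991)  # toy window; the report used 1990 to 2005

# Keep the id columns plus the chosen years, then drop rows with a missing value.
# If a raw CSV stores missing values as the string "na",
# read.csv(..., na.strings = "na") turns them into NA first.
clean <- function(df) {
  df <- df[, c(id_cols, year_cols)]
  df[complete.cases(df), ]
}
gdp_clean    <- clean(gdp)
forest_clean <- clean(forest)

# Restrict each table to the countries that survive in the other one.
common       <- intersect(gdp_clean$Country, forest_clean$Country)
gdp_clean    <- gdp_clean[gdp_clean$Country %in% common, ]
forest_clean <- forest_clean[forest_clean$Country %in% common, ]

# Sum every year column within each group (Region or IncomeGroup).
sum_by <- function(df, group) {
  aggregate(df[year_cols], by = setNames(list(df[[group]]), group), FUN = sum)
}

# Four grouped frames become two after merging GDP with forest area.
by_region <- merge(sum_by(gdp_clean, "Region"), sum_by(forest_clean, "Region"),
                   by = "Region", suffixes = c("_gdp", "_forest"))
by_income <- merge(sum_by(gdp_clean, "IncomeGroup"), sum_by(forest_clean, "IncomeGroup"),
                   by = "IncomeGroup", suffixes = c("_gdp", "_forest"))
print(by_region)
print(by_income)
```

In this toy run, country D drops out because of its missing 1990 GDP and country F drops out because it has no GDP row, so both tables end up with the same four countries before the sums are taken.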
This was the part I struggled with the most. The way I had set up the data frames/tibbles/tables did not lend itself to creating the graphs I had visualized. Fortunately, I came across the “melt()” function, which rearranged the data into a format that I could easily work into a ggplot chart. The chart I had envisioned when I set out on this project had two y-axes: one for GDP and one for forest area. I realized pretty quickly that ggplot doesn’t have an easy way of adding a second y-axis whose values are completely unrelated to those on the main one. I eventually found a way to accomplish this online; it required a bit of modification to work with my own dataset, but it seems to have done the job pretty well.
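For illustration, here is one common way to get from wide tables to a two-axis ggplot chart. It may or may not match the solution found online for the report. The sketch assumes `melt()` comes from the reshape2 package (the report does not name the package), uses invented values, and the names `gdp_wide`, `forest_wide`, `both`, and `k` are hypothetical. The idea is to rescale forest area onto the GDP range and let `sec_axis()` relabel the right-hand axis by reversing that scaling.

```r
library(reshape2)  # assumption: melt() from reshape2; the report does not name the package
library(ggplot2)

# Toy wide tables, one row per region and one column per year (invented values).
gdp_wide    <- data.frame(Region = c("North", "South"),
                          X1990 = c(110, 220), X1991 = c(122, 232), X1992 = c(130, 250))
forest_wide <- data.frame(Region = c("North", "South"),
                          X1990 = c(130, 100), X1991 = c(128, 97), X1992 = c(127, 95))

# melt() stacks the year columns into one row per region and year.
gdp_long    <- melt(gdp_wide, id.vars = "Region",
                    variable.name = "Year", value.name = "GDP")
forest_long <- melt(forest_wide, id.vars = "Region",
                    variable.name = "Year", value.name = "Forest")
both <- merge(gdp_long, forest_long, by = c("Region", "Year"))
both$Year <- as.integer(sub("^X", "", as.character(both$Year)))

# ggplot2 only draws a secondary axis that is a transformation of the primary one,
# so forest area is rescaled onto the GDP range and the axis labels undo the scaling.
k <- max(both$GDP) / max(both$Forest)

p <- ggplot(both, aes(x = Year, colour = Region)) +
  geom_line(aes(y = GDP, linetype = "GDP")) +
  geom_line(aes(y = Forest * k, linetype = "Forest area")) +
  scale_x_continuous(breaks = unique(both$Year)) +
  scale_y_continuous(
    name = "GDP (USD)",
    sec.axis = sec_axis(~ . / k, name = "Forest area (sq. km)")
  ) +
  labs(linetype = "Series")
print(p)
```

Because the two series share one drawing scale, the choice of `k` affects how steep the forest lines look next to the GDP lines, which is worth keeping in mind when reading any trend off a chart built this way.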
This first chart shows the yearly GDP of the given world regions, with the yearly total square kilometers of forest in those regions on the secondary y-axis. A couple of interesting things to note:
This second chart shows the yearly GDP of the given world income groups, with the yearly total square kilometers of forest attributed to countries falling into those income groups on the secondary y-axis. A couple of interesting things to note:
| Income Group | 1990 GDP (USD) | 1995 GDP (USD) | 2000 GDP (USD) | 2005 GDP (USD) |
|---|---|---|---|---|
| High income | 18,631,932,651,798 | 25,354,207,284,560 | 27,014,701,512,745 | 36,898,505,941,734 |
| Upper middle income | 2,592,460,889,257 | 3,773,445,747,420 | 4,460,496,297,357 | 7,248,839,567,794 |
| Lower middle income | 857,549,762,009 | 946,188,224,073 | 1,201,245,959,543 | 1,994,669,454,736 |
| Low income | 76,985,772,892 | 63,156,790,705 | 96,262,502,565 | 135,384,720,274 |
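To put the table above in relative terms, the following sketch computes the percentage change in GDP for each income group. The figures are copied from the table; the column names and the `pct_change` helper are hypothetical and only serve this illustration.

```r
# GDP totals (USD) copied from the table above.
gdp_by_group <- data.frame(
  IncomeGroup = c("High income", "Upper middle income",
                  "Lower middle income", "Low income"),
  Y1990 = c(18631932651798, 2592460889257, 857549762009, 76985772892),
  Y1995 = c(25354207284560, 3773445747420, 946188224073, 63156790705),
  Y2005 = c(36898505941734, 7248839567794, 1994669454736, 135384720274)
)

# Percentage change between two columns, rounded to one decimal place.
pct_change <- function(from, to) round(100 * (to / from - 1), 1)

gdp_by_group$Pct1990to1995 <- pct_change(gdp_by_group$Y1990, gdp_by_group$Y1995)
gdp_by_group$Pct1990to2005 <- pct_change(gdp_by_group$Y1990, gdp_by_group$Y2005)

print(gdp_by_group[, c("IncomeGroup", "Pct1990to1995", "Pct1990to2005")])
```

On these figures, every group's GDP is higher in 2005 than in 1990, and the low income group is the only one whose 1995 total falls below its 1990 total.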
I don’t feel that I have unearthed anything here that shows a correlation between growing GDP and deforestation. There are a few instances where such an argument could seemingly be made, but I believe a more detailed analysis would be needed before anything could be stated concretely.