From Gapminder's data page (https://www.gapminder.org/data/) I downloaded a CSV file named “CO2 emissions (tonnes per person)”.
The CSV file lists the countries of the world and how many tonnes of CO2 each emitted per person. The records run from 1960 to 2014, and many values are missing for the earlier years.
setwd("C:/workspace")
co2=read.csv("co2emission.csv",header=TRUE,stringsAsFactors=FALSE)
head(co2)
## 癤풠ountry.Name Country.Code Indicator.Name
## 1 Aruba ABW CO2 emissions (metric tons per capita)
## 2 Afghanistan AFG CO2 emissions (metric tons per capita)
## 3 Angola AGO CO2 emissions (metric tons per capita)
## 4 Albania ALB CO2 emissions (metric tons per capita)
## 5 Andorra AND CO2 emissions (metric tons per capita)
## 6 Arab World ARB CO2 emissions (metric tons per capita)
## X1960 X1961 X1962 X1963 X1964 X1965
## 1 NA NA NA NA NA NA
## 2 0.0460599 0.05360430 0.07376479 0.07423269 0.08629245 0.1014674
## 3 0.0974716 0.07903808 0.20128908 0.19253474 0.20100336 0.1915284
## 4 1.2581949 1.37418605 1.43995596 1.18168114 1.11174196 1.1660990
## 5 NA NA NA NA NA NA
## 6 0.6436890 0.68515088 0.76085451 0.87494119 0.99909766 1.1657054
## X1966 X1967 X1968 X1969 X1970 X1971 X1972
## 1 NA NA NA NA NA NA NA
## 2 0.1076370 0.1237343 0.1154977 0.08682346 0.1502906 0.1660420 0.1307638
## 3 0.2464128 0.1549116 0.2563160 0.41955056 0.5286980 0.4923022 0.6352147
## 4 1.3330555 1.3637463 1.5195513 1.55896757 1.7532399 1.9894979 2.5159144
## 5 NA NA NA NA NA NA NA
## 6 1.2726507 1.3314044 1.5449425 1.78991339 1.8012455 1.9934862 2.1098716
## X1973 X1974 X1975 X1976 X1977 X1978 X1979
## 1 NA NA NA NA NA NA NA
## 2 0.1362798 0.1556494 0.1689286 0.1547872 0.1829636 0.1631596 0.1683767
## 3 0.6706243 0.6520234 0.5746931 0.4158503 0.4347550 0.6461792 0.6369442
## 4 2.3038974 1.8490067 1.9106336 2.0135846 2.2758764 2.5306250 2.8982085
## 5 NA NA NA NA NA NA NA
## 6 2.3967997 2.2734407 2.1842635 2.5687100 2.6316284 2.7432413 2.8426551
## X1980 X1981 X1982 X1983 X1984 X1985 X1986
## 1 NA NA NA NA NA NA 2.8683194
## 2 0.1328586 0.1519729 0.1648039 0.2036356 0.2349877 0.2978277 0.2708911
## 3 0.5987173 0.5712019 0.4852515 0.5150715 0.4873957 0.4431214 0.4267687
## 4 1.9350583 2.6930239 2.6248568 2.6832399 2.6942914 2.6580154 2.6653562
## 5 NA NA NA NA NA NA NA
## 6 3.0692088 2.9070579 2.7011166 2.7933569 2.9563182 3.0355580 3.2555366
## X1987 X1988 X1989 X1990 X1991 X1992
## 1 7.2351980 10.0261792 10.6347326 26.3745032 26.0461298 21.44255880
## 2 0.2716117 0.2484726 0.2356946 0.2134498 0.1876727 0.09966647
## 3 0.5184278 0.4455573 0.4235243 0.4202843 0.4054501 0.40067865
## 4 2.4140608 2.3315985 2.7832431 1.6781067 1.3122126 0.77472491
## 5 NA NA NA 7.4673357 7.1824566 6.91205339
## 6 3.1688219 3.2644890 3.2261271 2.9890081 3.2072246 3.38524700
## X1993 X1994 X1995 X1996 X1997 X1998
## 1 22.00078616 21.03624511 20.77193616 20.3183534 20.42681771 20.58766915
## 2 0.08915404 0.08003917 0.07269862 0.0660447 0.05964838 0.05520717
## 3 0.43088926 0.28109258 0.76917343 0.7123063 0.48920938 0.47137391
## 4 0.72379029 0.60020371 0.65453713 0.6366253 0.49036506 0.56027144
## 5 6.73605485 6.49420042 6.66205168 7.0650715 7.23971272 7.66078389
## 6 3.63837855 3.64485889 3.39819977 3.3047937 3.12484849 3.32954828
## X1999 X2000 X2001 X2002 X2003 X2004
## 1 20.3115668 26.19487524 25.93402441 25.67116178 26.4204521 26.51729342
## 2 0.0423326 0.03850634 0.03900233 0.04871555 0.0518296 0.03937783
## 3 0.5740836 0.58035266 0.57304749 0.72076885 0.4979751 0.99616548
## 4 0.9601644 0.97817468 1.05330418 1.22954071 1.4126972 1.37621273
## 5 7.9754544 8.01928429 7.78695000 7.59061514 7.3157607 7.35862494
## 6 3.3095534 3.68444127 3.59030296 3.58803558 3.7798890 4.05146517
## X2005 X2006 X2007 X2008 X2009 X2010
## 1 27.20070778 26.94826047 27.89557400 26.2308466 25.9158329 24.670529
## 2 0.05294821 0.06372847 0.08541751 0.1541014 0.2417227 0.293837
## 3 0.97974003 1.09888390 1.19784398 1.1815268 1.2324945 1.243406
## 4 1.41249821 1.30257637 1.32233486 1.4843111 1.4956002 1.578574
## 5 7.29987194 6.74621872 6.51946591 6.4278866 6.1216523 6.122595
## 6 4.16848626 4.26823987 4.10022627 4.3904014 4.5421515 4.615758
## X2011 X2012 X2013 X2014
## 1 24.5058352 13.1555417 8.3512943 8.408363
## 2 0.4120169 0.3503706 0.3156018 0.299445
## 3 1.2527893 1.3308430 1.2546172 1.291328
## 4 1.8037147 1.6929083 1.7492111 1.978763
## 5 5.8671299 5.9165969 5.9007526 5.832170
## 6 4.5377552 4.8136307 4.6504742 4.860234
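names(co2) #encoding error: the first column name is garbled, most likely by a UTF-8 byte-order mark (BOM) at the start of the file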
## [1] "癤풠ountry.Name" "Country.Code" "Indicator.Name"
## [4] "X1960" "X1961" "X1962"
## [7] "X1963" "X1964" "X1965"
## [10] "X1966" "X1967" "X1968"
## [13] "X1969" "X1970" "X1971"
## [16] "X1972" "X1973" "X1974"
## [19] "X1975" "X1976" "X1977"
## [22] "X1978" "X1979" "X1980"
## [25] "X1981" "X1982" "X1983"
## [28] "X1984" "X1985" "X1986"
## [31] "X1987" "X1988" "X1989"
## [34] "X1990" "X1991" "X1992"
## [37] "X1993" "X1994" "X1995"
## [40] "X1996" "X1997" "X1998"
## [43] "X1999" "X2000" "X2001"
## [46] "X2002" "X2003" "X2004"
## [49] "X2005" "X2006" "X2007"
## [52] "X2008" "X2009" "X2010"
## [55] "X2011" "X2012" "X2013"
## [58] "X2014"
names(co2)[1]="country name"
which(colnames(co2)=="X2014")
## [1] 58
names(co2)[which(colnames(co2)=="X2014")]="yr2014" #change a variable name
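The garbled first column name looks like a UTF-8 byte-order mark (BOM) being decoded under a different locale encoding. Instead of renaming the column by hand, the file can be read with fileEncoding="UTF-8-BOM", which R accepts for reading and which drops a leading BOM. Here is a minimal sketch on a throwaway file with made-up values; that the real CSV is UTF-8 with a BOM is an assumption, not something checked here.

```r
# Build a tiny CSV that starts with a UTF-8 BOM (bytes EF BB BF).
# The temporary file and its values are made up for illustration only.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Country Name,2014\nA,1.5\nB,2.5\n")), tmp)

# "UTF-8-BOM" decodes the file as UTF-8 and removes a leading BOM if present
demo <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE,
                 fileEncoding = "UTF-8-BOM")
names(demo)  # the first name is read cleanly, with no stray characters
```

If this applies to co2emission.csv, the same fileEncoding argument could be added to the read.csv call for the real file.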
library(maps)
library(ggplot2)
library(RColorBrewer)
map.world=map_data("world") #world map. not a google map.
Note that map.world is a data frame with the columns ‘long’ (longitude), ‘lat’ (latitude), ‘group’, ‘order’, ‘region’ (country name), and ‘subregion’; a country code column appears only once it is merged with co2. It holds a great many (lat, long) points (over 80000), which will later be connected in sequence to draw the world map.
To show the emissions on a world map, we have to merge the co2 data frame with the map.world data frame. A YouTube video from MIT OpenCourseWare was a great help here: https://www.youtube.com/watch?v=2rnsbodsJVc
map.world=merge(map_data("world"),co2,by.x="region",by.y="country name")
ggplot(map.world,aes(x=long,y=lat,group=group))+geom_polygon(fill="white",color="black")
Two strange things show up here. First, ‘merge’ returns the rows in a different order, so the polygon vertices get connected in the wrong sequence and the map turns into a mess. Look at the United States. We have to restore the right order.
map.world=map.world[order(map.world$group,map.world$order),]
ggplot(map.world,aes(x=long,y=lat,group=group))+geom_polygon(aes(fill=yr2014),color="azure") #fill the color by co2 emission /person in 2014
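To see why the re-ordering step matters, here is a small sketch with a made-up two-region outline (the data frames outline and values are hypothetical and stand in for the map data and co2). merge sorts its result on the key column by default, so the vertices no longer follow their drawing order until we sort by group and order again.

```r
# Toy outline: two regions whose vertices are numbered in drawing order.
# All names and coordinates are made up for illustration.
outline <- data.frame(
  region = c("B", "B", "B", "A", "A", "A"),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6,
  long   = c(0, 1, 1, 5, 6, 6),
  lat    = c(0, 0, 1, 5, 5, 6)
)
values <- data.frame(name = c("A", "B"), value = c(10, 20))

# merge() sorts on the key ("region"), so region A now comes before region B
merged <- merge(outline, values, by.x = "region", by.y = "name")
merged$order

# Restore the drawing order: first by polygon (group), then by vertex (order)
restored <- merged[order(merged$group, merged$order), ]
restored$order
```

Passing sort = FALSE to merge is not a substitute, because its documentation leaves the row order unspecified in that case; sorting explicitly is the safer choice.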
The second problem is that quite a lot of the map is missing or blank. 1) Some countries are blank because the value is missing in the data frame. Nothing can be done about that, and those values should not be replaced with 0 or the mean. 2) Others are missing because the country names in co2 and map.world differ. For example, the CSV file has “United States” where map.world has “USA”. There are a whole lot of cases like this. The lesson: keep naming standardized!
co2$`country name`[co2$`country name`=="United States"]="USA"
co2$`country name`[co2$`country name`=="United Kingdom"]="UK"
co2$`country name`[co2$`country name`=="Korea, Rep."]="South Korea"
co2$`country name`[co2$`country name`=="Korea, Dem. People’s Rep."]="North Korea" #keep the name on one line: a line break inside the quotes never matches
co2$`country name`[co2$`country name`=="Russian Federation"]="Russia"
co2$`country name`[co2$`country name`=="Qatar"]="Qatar" #no-op: the name is already the same on both sides
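Rather than spotting the mismatches by eye, setdiff can list the names that have no partner on the map side, and a named lookup vector can apply all the renames in one step. This is only a sketch on short made-up vectors (co2_names, map_names, and lookup are hypothetical stand-ins, not the real columns):

```r
# Hypothetical stand-ins for the two name columns
co2_names <- c("United States", "United Kingdom", "Russian Federation", "Qatar")
map_names <- c("USA", "UK", "Russia", "Qatar")

# Names present in the CO2 data but absent from the map data
unmatched_before <- setdiff(co2_names, map_names)
unmatched_before

# Named lookup vector: old name -> name used by the map data
lookup <- c("United States"      = "USA",
            "United Kingdom"     = "UK",
            "Russian Federation" = "Russia")

# Replace only the entries that have a lookup rule
hit <- co2_names %in% names(lookup)
co2_names[hit] <- unname(lookup[co2_names[hit]])

unmatched_after <- setdiff(co2_names, map_names)
unmatched_after  # empty once every name has a partner
```

On the real data the same check would be setdiff(co2$`country name`, unique(map_data("world")$region)); regional aggregates in the CSV will show up there too and can simply be ignored.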
map.world=merge(map_data("world"),co2,by.x="region",by.y="country name")
ggplot(map.world,aes(x=long,y=lat,group=group))+geom_polygon(aes(fill=yr2014),color="azure")
map.world=map.world[order(map.world$group,map.world$order),]
ggplot(map.world,aes(x=long,y=lat,group=group))+geom_polygon(aes(fill=yr2014),color="azure")+ scale_fill_gradientn(colours = brewer.pal(8, "RdYlBu")[4:1])
Without scale_fill_gradientn the map would use the default gradient of blues, which is a little hard to read. I changed the colours to a scale that runs from yellow for low values to red for high values.
head(co2[order(co2$yr2014,decreasing=TRUE),][1],n=15) #top 15 entries (the file also contains regional aggregates)
## country name
## 199 Qatar
## 50 Curacao
## 241 Trinidad and Tobago
## 126 Kuwait
## 21 Bahrain
## 7 United Arab Emirates
## 30 Brunei Darussalam
## 204 Saudi Arabia
## 224 Sint Maarten (Dutch part)
## 143 Luxembourg
## 250 USA
## 169 North America
## 171 New Caledonia
## 83 Gibraltar
## 181 Oman
The biggest CO2 emitters per capita are Qatar, Curacao, Trinidad and Tobago, Kuwait, Bahrain, the UAE, Brunei, Saudi Arabia, Sint Maarten, Luxembourg, the US, New Caledonia, Gibraltar, Oman, Australia, Canada… (“North America” in the output is a regional aggregate, not a country.) Many Middle Eastern countries rank at the top, while China and India are absent from the list. That may come as a surprise, because those two countries are known for emitting tremendous amounts of CO2. Their total emissions are indeed large, but their emissions per person are not.
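The difference between the two rankings is plain arithmetic: total emissions are emissions per person times population. A tiny sketch with two hypothetical countries (the figures are invented purely to show the effect and are not taken from the Gapminder file):

```r
# Hypothetical figures, chosen only to illustrate the arithmetic
toy <- data.frame(
  country    = c("Small", "Large"),
  per_capita = c(40, 5),      # tonnes of CO2 per person
  population = c(2e6, 1e9),   # number of people
  stringsAsFactors = FALSE
)

# Total emissions = emissions per person * population
toy$total <- toy$per_capita * toy$population

rank_per_capita <- toy$country[order(toy$per_capita, decreasing = TRUE)]
rank_total      <- toy$country[order(toy$total, decreasing = TRUE)]

rank_per_capita  # "Small" leads per person
rank_total       # "Large" leads in total
```

A populous country can therefore top the total ranking while sitting far down the per-capita ranking, which is the pattern described above.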