As countries develop, pollution also increases. I want to find out how GDP and CO2 emissions have grown since 1960 for various countries, and I am also curious about the relationship between GDP and CO2 emissions. I mined data available from the World Bank and discovered a few interesting facts.
# 1. Reading the data sets
setwd("D:/Raviteja/Raviteja Professional/Data Science/EDA_Course_Materials")
c2 <- read.csv("co2 emission.csv", sep= ',')
library('data.table')
library('dplyr')
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setnames(c2, "CO2.per.capita", "country")
g2 <- read.csv("GDPpercapitaconstant2000US.csv", sep= ',')
setnames(g2, "Income.per.person..fixed.2000.US..", "country")
library('tidyr')
library('gridExtra')
library(ggplot2)
g.c2 <- gather(c2, "year", "co2e", 2:length(colnames(c2)))
g.g2 <- gather(g2, "year", "gdp_pc", 2:length(colnames(g2)))
#combine dataframes into one for analysis
co2.gdp <- inner_join(g.g2, g.c2, by=c('country', 'year'))
## Warning in inner_join_impl(x, y, by$x, by$y): joining factors with
## different levels, coercing to character vector
#Remove the annoying X's on year
co2.gdp$year <- gsub("X", '', co2.gdp$year)
names(co2.gdp)
## [1] "country" "year" "gdp_pc" "co2e"
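The reshape-and-join steps above can be sketched on a small made-up example. The two toy tables and their values are assumptions for illustration only, not World Bank data; the calls (`gather`, `inner_join`) are the same ones used above.

```r
library(tidyr)
library(dplyr)

# Made-up wide tables shaped like the CSV exports: one row per country,
# one column per year (read.csv prefixes numeric column names with "X").
co2_wide <- data.frame(country = c("A", "B"),
                       X1960 = c(1.0, 2.0),
                       X1961 = c(1.5, NA),
                       stringsAsFactors = FALSE)
gdp_wide <- data.frame(country = c("A", "B"),
                       X1960 = c(100, 200),
                       X1961 = c(110, 210),
                       stringsAsFactors = FALSE)

# Wide -> long: every year column becomes one (year, value) row.
co2_long <- gather(co2_wide, "year", "co2e", -country)
gdp_long <- gather(gdp_wide, "year", "gdp_pc", -country)

# Keep only the (country, year) pairs present in both tables.
toy <- inner_join(gdp_long, co2_long, by = c("country", "year"))

# Strip the leading "X"; anchoring with ^ leaves other characters alone.
toy$year <- sub("^X", "", toy$year)

print(toy)
```

Note that the join keeps rows whose values are `NA` (country B in 1961 here), so missing measurements still have to be handled later.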
# 2. Checking the relation between CO2 emission and GDP per capita
ggplot(aes(x=gdp_pc,y=co2e),data=co2.gdp)+
  geom_point()+
  scale_x_continuous(limits=c(0,7000),breaks=seq(0,7000,1000))+
  scale_y_continuous(limits=c(0,25),breaks=seq(0,25,5))+
  labs(title="Co2 emission VS GDP per capita", x = "GDP per capita", y = "Co2 emission")+
  geom_smooth()
## Warning: Removed 7082 rows containing non-finite values (stat_smooth).
## Warning: Removed 7082 rows containing missing values (geom_point).
#Keep only rows where both measurements are present, then test the correlation
with(data=subset(co2.gdp, !is.na(gdp_pc) & !is.na(co2e)),cor.test(gdp_pc,co2e, method='pearson'))
##
## Pearson's product-moment correlation
##
## data: gdp_pc and co2e
## t = 73.373, df = 7015, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6455023 0.6719857
## sample estimates:
## cor
## 0.6589482
# A correlation coefficient of about 0.66 indicates a considerable positive correlation between per capita GDP and CO2 emission.
# 3. Observing how CO2 emission and GDP per capita changed over the years in a few countries
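A minimal sketch, on made-up numbers, of running the Pearson test on complete rows only. The toy data frame is an assumption for illustration. It also checks that `cor.test` gives the same estimate on the unfiltered columns, since it drops incomplete pairs itself.

```r
# Made-up data with one missing value in each column.
toy <- data.frame(
  gdp_pc = c(100, 200, 300, NA, 500, 600),
  co2e   = c(0.5, 1.1, 1.4, 2.0, NA, 3.1)
)

# Keep rows where both measurements are present.
complete <- subset(toy, !is.na(gdp_pc) & !is.na(co2e))

res <- cor.test(complete$gdp_pc, complete$co2e, method = "pearson")
print(res$estimate)   # sample correlation
print(res$parameter)  # degrees of freedom = complete rows - 2

# cor.test() discards incomplete pairs on its own, so the estimate matches.
res_raw <- cor.test(toy$gdp_pc, toy$co2e, method = "pearson")
print(all.equal(res$estimate, res_raw$estimate))
```

The degrees of freedom are a quick way to confirm how many rows actually entered the test.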
library(ggplot2)
selected.countries <- na.omit(co2.gdp[,1:4])
#Candidate country names (one entry per row, so names repeat)
levels=c(selected.countries$country)
#A random draw of 16 names; it is overwritten by the hand-picked list below
sample.ids <- sample(levels,16)
#Countries actually plotted
sample.ids <- c( "United States","France", "India", "Japan","China")
d1<-ggplot(aes(x = year, y = co2e), data = subset(selected.countries, country %in% sample.ids))+
  facet_wrap(~ country,scales="free_x",ncol=1)+
  geom_line(aes(group=country))+
  scale_x_discrete(breaks=seq(1960, 2011,5))+
  labs(title = "co2 emission between 1960-2011", x = "Year", y = "Co2 emission")
d2<-ggplot(aes(x = year, y = gdp_pc), data = subset(selected.countries, country %in% sample.ids))+
  facet_wrap(~ country,scales="free_x", ncol=1)+
  geom_line(aes(group=country))+
  scale_x_discrete(breaks=seq(1960, 2011,5))+
  labs(title = "per capita GDP between 1960-2011", x = "Year", y = "GDP per capita")
library("gridExtra")
grid.arrange(d2,d1,ncol=2)
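Because `year` is a character column after `gsub`, the plots above use a discrete x axis. One alternative, sketched here on made-up values, is to convert the years to integers and use a continuous axis. This is an assumption about what might be convenient, not what the plots above do.

```r
library(ggplot2)

# Made-up long-format data with years stored as text.
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(c("1960", "1965", "1970", "1975"), times = 2),
  co2e    = c(1.0, 1.2, 1.5, 1.4, 0.2, 0.3, 0.5, 0.8),
  stringsAsFactors = FALSE
)

# Convert the text years to integers so the axis is numeric.
toy$year <- as.integer(toy$year)

# One panel per country; breaks are ordinary numbers on a continuous scale.
p <- ggplot(toy, aes(x = year, y = co2e)) +
  geom_line() +
  facet_wrap(~ country, ncol = 1) +
  scale_x_continuous(breaks = seq(1960, 1975, 5)) +
  labs(x = "Year", y = "CO2 emission")
print(p)
```

With a numeric axis, `geom_line` no longer needs an explicit `group` aesthetic within each facet.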
# 4. Observing the relation between CO2 emission and GDP per capita in a few interesting countries
l1<-ggplot(aes(x = year, y = co2e), data = subset(selected.countries, country %in% sample.ids))+
  geom_point(aes(color= country))+
  scale_x_discrete(breaks=seq(1960, 2011,5))+
  labs(title = "co2 emission between 1960-2011", x = "Year", y = "Co2 emission")
l2<-ggplot(aes(x = year, y = gdp_pc), data = subset(selected.countries, country %in% sample.ids))+
  geom_point(aes(color= country))+
  scale_x_discrete(breaks=seq(1960, 2011,10))+
  labs(title = "per capita GDP between 1960-2011", x = "Year", y = "GDP per capita")
grid.arrange(l1,l2,ncol=2)
# Interesting facts from the above study
# France is doing remarkably well: its CO2 emission has fallen over the years even though per capita income keeps rising. The US is the largest contributor of CO2 emission among the five countries. From 2000 to 2010, China's CO2 emission rose faster than that of the others. India's CO2 emission is low, but the plot suggests an increasing trend, so it should be kept in check.