Data source: Carbon Dioxide Information Analysis Center (CDIAC) - 2010 Fossil-Fuel CO2 Emissions per Capita
This is a simple exercise in analyzing the characteristics of the world's fossil-fuel carbon dioxide emissions per capita in 2010, and in using R Markdown. The exercise yields data better suited to correlation analysis in other research. For example, if we would like to regress CO2 on GDP, it might be better to use CO2^(1/2) (the square root of CO2) as the variable.
Download data (2010 Fossil-Fuel CO2 Emissions per Capita)
source_url="http://cdiac.ornl.gov/trends/emis/top2010.cap"
data_file="co2_per_cap_cdiac.txt"
## download the data file if it is not already present
if (!file.exists(data_file)) {
  ## fetch the CDIAC per-capita emissions table
  download.file(source_url, data_file, quiet=TRUE)
}
Read the data and get a sense of the CO2 per capita
data <- read.fwf(data_file, widths=c(7,52,7), skip=17, col.names=c("RANK","NATION","CO2_CAP"))
CO2_CAP <- data$CO2_CAP
summary(CO2_CAP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.195 0.845 1.410 1.910 10.900
plot(CO2_CAP, ylab=expression(""*CO[2]*" per Capita (tons of carbon)"))
par(mfrow=c(1,2))
hist(CO2_CAP, xlab=expression(""*CO[2]*" per Capita (tons of carbon)"), main="")
rug(CO2_CAP, ticksize = 0.03, side = 1, lwd = 0.5, col = par("fg"))
boxplot(CO2_CAP)
Remove extreme outliers: the three largest values
filter_CO2_CAP <- CO2_CAP[-c(1:3)]
sqrt_CO2_CAP <- sqrt(filter_CO2_CAP)
s.test <- shapiro.test(sqrt_CO2_CAP)
Transform the data to be more friendly (normalized).
After checking several basic data transformations (log, square, …) with the Shapiro-Wilk normality test, the square root (W = 0.955, p = 2.7447 × 10^-6) seems to be the best option, although a p-value this small means the test still rejects exact normality.
par(mfrow=c(1,2))
boxplot(sqrt_CO2_CAP)
qqnorm(sqrt_CO2_CAP)
qqline(sqrt_CO2_CAP,col="red")
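The comparison of transformations mentioned above is not shown in the original code, so the following is only a minimal sketch of how it might be done. The helper `compare_transforms` is hypothetical, the set of candidate transformations is an assumption, and zeros are dropped so that `log()` stays finite. The demo uses a synthetic right-skewed sample rather than the CDIAC data, so its numbers say nothing about the real emissions.

```r
# Compare candidate transformations by their Shapiro-Wilk W statistic.
# `compare_transforms` is a hypothetical helper, not part of the original analysis.
compare_transforms <- function(x) {
  # Assumption: drop non-finite and non-positive values so log() is defined
  x <- x[is.finite(x) & x > 0]
  candidates <- list(
    identity = x,
    log      = log(x),
    sqrt     = sqrt(x),
    square   = x^2
  )
  # Run the test on each candidate and keep W and the p-value
  results <- lapply(candidates, function(v) {
    test <- shapiro.test(v)
    c(W = unname(test$statistic), p.value = test$p.value)
  })
  out <- as.data.frame(do.call(rbind, results))
  # Sort so the transformation with W closest to 1 comes first
  out[order(-out$W), ]
}

# Synthetic right-skewed stand-in for filter_CO2_CAP, for illustration only
set.seed(1)
demo <- rexp(200, rate = 1)
compare_transforms(demo)
```

With the real data, the call would presumably be `compare_transforms(filter_CO2_CAP)`. A W closer to 1 indicates a shape closer to normal, so the first row is the preferred transformation under this criterion.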