Fossil-Fuel CO2 Per Capita Data Analysis

Data source: Carbon Dioxide Information Analysis Center (CDIAC) - 2010 Fossil-Fuel CO2 Emissions per Capita


This is a simple exercise in analyzing the distribution of the world's fossil-fuel carbon dioxide emissions per capita in 2010, and in using R Markdown. The exercise yields a better-behaved variable for correlation analysis in other research. For example, to regress CO2 on GDP, it may be better to use CO2^(1/2) (the square root of CO2 per capita) as the variable.

  1. Download data (2010 Fossil-Fuel CO2 Emissions per Capita)

    source_url="http://cdiac.ornl.gov/trends/emis/top2010.cap"
    data_file="co2_per_cap_cdiac.txt"
    ## download the data file only if it is not already present
    if (!file.exists(data_file)) {
        ## fetch the CDIAC per-capita table and save it locally
        download.file(source_url, data_file, quiet=TRUE)
    }
  2. Read the data and get an overview of CO2 per capita

    ## fixed-width columns: rank, nation, CO2 per capita; skip the first 17 lines of the file
    data <- read.fwf(data_file, widths=c(7,52,7), skip=17, col.names=c("RANK","NATION","CO2_CAP"))
    CO2_CAP <- data$CO2_CAP
    summary(CO2_CAP)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   0.000   0.195   0.845   1.410   1.910  10.900
    plot(CO2_CAP,ylab=expression(""*CO[2]*" per Capita (tons of carbon)"))


    par(mfrow=c(1,2)) 
    hist(CO2_CAP, xlab=expression(""*CO[2]*" per Capita (tons of carbon)"), main="")
    rug(CO2_CAP, ticksize = 0.03, side = 1, lwd = 0.5, col = par("fg"))
    boxplot(CO2_CAP)
  3. The boxplot and summary (mean=1.41, median=0.845) show that the data are right-skewed: the mean is well above the median. To use the data for further analysis, we need to:
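    • Remove the extreme outliers (the three largest values).

      ## the first three rows hold the largest values (this assumes the file is sorted by RANK)
      filter_CO2_CAP <- CO2_CAP[-c(1:3)]
    • Transform the data so that it is closer to a normal distribution.

      sqrt_CO2_CAP <- sqrt(filter_CO2_CAP)
      s.test <- shapiro.test(sqrt_CO2_CAP)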
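
The skew and the outlier removal in step 3 can also be checked numerically. The following is a minimal sketch and not part of the original analysis: `skewness_g1` and `drop_largest` are hypothetical helper names, the skewness is the simple moment estimator, and `drop_largest` removes the n largest values without relying on the order of the rows.

```r
# Moment-based sample skewness (g1); positive values indicate a right skew.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Drop the n largest values of x, whatever order x is in.
drop_largest <- function(x, n = 3) {
  if (n <= 0) return(x)
  x[-order(x, decreasing = TRUE)[seq_len(min(n, length(x)))]]
}

# Usage with the vector from step 2 (uncomment once CO2_CAP is loaded):
# skewness_g1(CO2_CAP)
# filter_CO2_CAP <- drop_largest(CO2_CAP, 3)
# skewness_g1(sqrt(filter_CO2_CAP))
```

A skewness near zero after the transformation would support the visual impression from the boxplot.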

  4. After checking several basic data transformations (log, square…) with the Shapiro-Wilk normality test, the square root (W=0.955 and p=2.7447 × 10^-6) seems to be the best option. The p-value is still far below 0.05, so the transformed data are not strictly normal.

    par(mfrow=c(1,2)) 
    boxplot(sqrt_CO2_CAP)
    qqnorm(sqrt_CO2_CAP)
    qqline(sqrt_CO2_CAP,col="red")
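
The comparison of transformations mentioned in step 4 is not shown in the original. A minimal sketch of how such a comparison might be run is given below. The function name `compare_transforms` and the list of candidates are assumptions for illustration, not the author's code. Non-finite results (for example log(0), since the minimum in the summary is 0) are dropped before testing, so the log row may be based on fewer observations than the others.

```r
# Apply each candidate transformation and collect the Shapiro-Wilk results.
compare_transforms <- function(x, transforms = list(
                                 identity = identity,
                                 log      = log,
                                 sqrt     = sqrt)) {
  rows <- lapply(names(transforms), function(name) {
    y <- transforms[[name]](x)
    y <- y[is.finite(y)]            # drop -Inf from log(0), NA and NaN
    test <- shapiro.test(y)
    data.frame(transform = name,
               W = unname(test$statistic),
               p.value = test$p.value)
  })
  do.call(rbind, rows)
}

# Usage with the filtered vector from step 3 (uncomment once it exists):
# compare_transforms(filter_CO2_CAP)
```

A W statistic closer to 1 indicates a distribution closer to normal, so the row with the largest W is the most promising candidate.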
