SSI Carbon Emissions Study

Tim Waring, April 2014

This file is intended to carefully and publicly document the stages of data analysis and visualization that contributed to this research project. The following R code, output, and annotations accompany a study of the carbon emissions of the University of Maine's Sustainability Solutions Initiative between 2009 and 2011. This project was the result of work by Mark Anderson, Mario Teisl, and Eva Manandhar in the School of Economics at the University of Maine. The published research is available at: {URL}.

Abstract

This paper presents new data on the carbon emissions generated by travel undertaken for a major sustainability science research effort. Previous research has estimated CO2 emissions generated by individual scientists, by entire academic institutions, or by international climate conferences. Here, we sought to investigate the size, distribution and factors affecting the carbon emissions of travel for sustainability research in particular. Reported airline and automobile travel of participants in Maine’s Sustainability Solutions Initiative were used to calculate the carbon dioxide emissions attributable to research-related travel over a three-year period. Our methodology is simple and with planning can be applied at any scale. Carbon emissions varied substantially by researcher and by purpose of travel. Travel for the purpose of attending academic conferences created the largest carbon footprint of all types of travel. This result suggests that alternative networking and dissemination models are needed to replace the high carbon costs of annual society meetings. This research adds to literature that questions whether the cultural demands of contemporary academic careers are compatible with climate stabilization. We argue that precise record keeping and routine analysis of travel data are necessary to track and reduce the climate impacts of sustainability research. We summarize the barriers to behavioral change at individual and organizational levels and conclude with suggestions for reducing climate impacts of travel undertaken for sustainability research.

The data and code are provided as is and may contain errors or imperfections. The data for this project are in two data files. There is no good reason for this; I just haven't had the time to merge them appropriately. Both come from the same source data. This analysis document was compiled in RStudio, using R Markdown to intersperse code, results, and text. The R code uses knitr, plyr, and ggplot2.

library("knitr")
library("plyr")
library("ggplot2")

LOAD first dataset on SSI CO2 emissions

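# stringsAsFactors = TRUE keeps text columns as factors (no longer the default
# since R 4.0), which the levels() assignments below rely on
d <- read.csv("/Users/twaring/Documents/Research/SSI Carbon/CO2.csv", stringsAsFactors = TRUE)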

Prepare the first dataframe for analysis

levels(d$Discip) <- c("na", "Natural", "Social")
d$Type <- as.factor(d$Type)
levels(d$Type) <- c("Visitor", "Full", "Assoc.", "Assist.", "Post Doc.", "PhD Student", 
    "MS Student", "Admin.", "other")
d$Type2 <- d$Type
levels(d$Type2) <- c("Visitor/other", "Professor", "Professor", "Professor", 
    "Student/Post Doc", "Student/Post Doc", "Student/Post Doc", "Administrator", 
    "Visitor/other")
levels(d$Cat) <- c("Student", "Faculty/Staff")
d$Type3 <- d$Type2
levels(d$Type3) <- c("Admin/Visitor/other", "Professor/Student", "Professor/Student", 
    "Admin/Visitor/other")
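The recoding above relies on a property of `levels<-`: when the same label is assigned to several existing levels, R merges them into one. The labels are matched by position, so the result depends on the order of the existing levels. A minimal sketch with made-up codes (the `type` vector and its values are hypothetical, not taken from the study data):

```r
# Hypothetical rank codes 1..8; only the mechanism matters here
type <- factor(c(1, 2, 5, 8, 2), levels = 1:8)
levels(type) <- c("Full", "Assoc.", "Assist.", "Post Doc.",
                  "PhD Student", "MS Student", "Admin.", "other")

# Assigning repeated labels collapses several levels into one
type2 <- type
levels(type2) <- c("Professor", "Professor", "Professor",
                   "Student/Post Doc", "Student/Post Doc", "Student/Post Doc",
                   "Administrator", "Visitor/other")

table(type, type2)  # cross-tabulate to confirm each old level landed where intended
```

Cross-tabulating the old and new factors, as in the last line, is a cheap way to check that the positional matching did what was intended.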

CHECK - should be 98 observations of 10 variables

str(d)

## 'data.frame':    98 obs. of  10 variables:
##  $ Rank_ann : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Rank_tot : int  11 23 25 26 1 2 7 3 12 14 ...
##  $ Name     : Factor w/ 98 levels "G01","G02","G03",..: 80 15 87 26 53 67 22 41 59 27 ...
##  $ CO2annual: int  2561 1720 1698 1611 1596 1589 1560 1506 1261 1248 ...
##  $ CO2total : int  2561 1720 1698 1611 4787 4768 3119 4518 2523 2496 ...
##  $ Cat      : Factor w/ 2 levels "Student","Faculty/Staff": 2 1 2 1 2 2 1 2 2 1 ...
##  $ Type     : Factor w/ 9 levels "Visitor","Full",..: 6 6 1 6 2 5 6 4 4 6 ...
##  $ Discip   : Factor w/ 3 levels "na","Natural",..: 2 3 1 3 3 3 2 3 2 3 ...
##  $ Type2    : Factor w/ 4 levels "Visitor/other",..: 3 3 1 3 2 3 3 2 2 3 ...
##  $ Type3    : Factor w/ 2 levels "Admin/Visitor/other",..: 2 2 1 2 2 2 2 2 2 2 ...

LOAD the second dataset on SSI CO2 emissions

# stringsAsFactors = TRUE keeps triptype and the other text columns as factors
carbonTW <- read.csv("/Users/twaring/Documents/Research/SSI Carbon/carbonTW.csv", stringsAsFactors = TRUE)

PREPARE the second dataset

carbonTW <- carbonTW[!is.na(carbonTW$month), ]
carbonTW$monthfact <- as.factor(carbonTW$month)
levels(carbonTW$monthfact) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", 
    "Aug", "Sep", "Oct", "Nov", "Dec")
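Assigning the twelve month names through `levels<-` works only if every month number from 1 to 12 occurs in the data, because the labels are matched to the existing levels by position. A more defensive variant, sketched here with hypothetical month numbers, declares the levels explicitly so the labels stay correct even when some months are absent:

```r
# Hypothetical month numbers; note that most months are missing
month <- c(9, 9, 10, 11)

# Explicit levels 1..12 paired with R's built-in abbreviations (month.abb)
monthfact <- factor(month, levels = 1:12, labels = month.abb)

table(monthfact)  # all twelve months appear, unused ones with a count of zero
```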

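# Derive season and fiscal-year labels and a continuous time index.
# Columns are referenced explicitly instead of through attach()/detach().
carbonTW$season <- "NA"
carbonTW$season[carbonTW$month < 6] <- "Spring"
carbonTW$season[carbonTW$month >= 6 & carbonTW$month < 9] <- "Summer"
carbonTW$season[carbonTW$month >= 9] <- "Fall"
carbonTW$fiscal <- "NA"
carbonTW$fiscal[carbonTW$year == 2009] <- "f2010"
carbonTW$fiscal[carbonTW$year == 2010 & carbonTW$month < 7] <- "f2010"
carbonTW$fiscal[carbonTW$year == 2010 & carbonTW$month >= 7] <- "f2011"
# Note: both halves of 2011 are labelled f2011
carbonTW$fiscal[carbonTW$year == 2011 & carbonTW$month < 7] <- "f2011"
carbonTW$fiscal[carbonTW$year == 2011 & carbonTW$month >= 7] <- "f2011"
carbonTW$fiscal[carbonTW$year == 2012] <- "f2012"
carbonTW$fiscal <- as.factor(carbonTW$fiscal)
# Time since the start of 2009 in years (e.g. September 2009 gives 0.75)
carbonTW$allmonths <- carbonTW$year - 2009 + carbonTW$month / 12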
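For comparison, the fiscal-year labels can also be derived with a single rule. The sketch below assumes a fiscal year that runs from July to June and is named for the calendar year in which it ends; that convention is an assumption inferred from the 2009 and 2010 assignments above, and `fiscal_label` is a hypothetical helper, not part of the original analysis. Under this rule July to December 2011 would be labelled f2012, whereas the code above labels those months f2011, so the two approaches are not interchangeable without checking which labelling is intended.

```r
# Hypothetical helper: July-June fiscal year, named for the year in which it ends
fiscal_label <- function(year, month) {
  paste0("f", ifelse(month >= 7, year + 1, year))
}

fiscal_label(c(2009, 2010, 2010, 2011), c(9, 6, 7, 7))
```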

carbonTW$dissem <- carbonTW$triptype
# Reorder the four trip types, then relabel them; the repeated "Research" label merges two levels
carbonTW$dissem <- factor(carbonTW$dissem, levels(carbonTW$dissem)[c(1, 2, 4, 3)])
levels(carbonTW$dissem) <- c("Research", "Dissemination", "Research", "other")

CHECK - should be 403 obs. of 17 variables

str(carbonTW)

## 'data.frame':    403 obs. of  17 variables:
##  $ num            : int  1 2 3 4 5 6 7 8 9 11 ...
##  $ TripID         : int  40 48 41 49 42 50 99 1 23 25 ...
##  $ year           : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
##  $ month          : int  9 9 10 10 10 10 11 11 11 11 ...
##  $ fiscal         : Factor w/ 3 levels "f2010","f2011",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ total_car_miles: num  526 526 228 228 278 ...
##  $ total_car_CO2  : num  196.9 196.9 85.4 85.4 104.1 ...
##  $ total_air_CO2  : num  NA NA NA NA NA ...
##  $ Total_CO2_kg   : num  196.9 196.9 85.4 85.4 104.1 ...
##  $ Name           : Factor w/ 82 levels "G01","G02","G03",..: 19 19 19 19 19 19 56 49 35 2 ...
##  $ perstype       : Factor w/ 2 levels "staff","student": 2 2 2 2 2 2 1 1 2 2 ...
##  $ persrank       : int  6 6 6 6 6 6 0 0 6 6 ...
##  $ triptype       : Factor w/ 4 levels "admin","confpres",..: 3 3 3 3 3 3 1 1 3 3 ...
##  $ monthfact      : Factor w/ 12 levels "Jan","Feb","Mar",..: 9 9 10 10 10 10 11 11 11 11 ...
##  $ season         : chr  "Fall" "Fall" "Fall" "Fall" ...
##  $ allmonths      : num  0.75 0.75 0.833 0.833 0.833 ...
##  $ dissem         : Factor w/ 3 levels "Research","Dissemination",..: 3 3 3 3 3 3 1 1 3 3 ...
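The first rows of the `str()` output above allow a rough consistency check on the car emissions: dividing `total_car_CO2` by `total_car_miles` gives the per-mile emission factor implied by the data. The sketch below uses only the rounded values displayed above, so the result is approximate and should not be read as the exact factor used in the study.

```r
# Values copied from the str() output above (rounded as displayed there)
car_miles <- c(526, 526, 228, 228, 278)
car_CO2   <- c(196.9, 196.9, 85.4, 85.4, 104.1)

# Implied emission factor in kg CO2 per mile for each displayed trip
implied_factor <- car_CO2 / car_miles
range(implied_factor)
```

A near-constant ratio across trips suggests that car emissions were computed as miles multiplied by a single factor.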

Compute some simple summary statistics

summary(d$CO2annual)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      14     155     466     559     800    2560

length(d$CO2annual[d$CO2annual > 1000])

## [1] 15

summary(carbonTW$Total_CO2_kg)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     7.5    64.0   105.0   249.0   326.0  1620.0


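# A summary table by fiscal year
# Note: NROW() returns the number of rows in each group, so car_trips and
# air_trips both equal the total trip count; the sum(!is.na(...)) columns
# count non-missing entries rather than summing miles or kg of CO2.
ddply(carbonTW, .(fiscal), summarize, car_trips = NROW(!is.na(total_car_miles)), 
    car_miles = sum(!is.na(total_car_miles)), car_CO2 = sum(!is.na(total_car_CO2)), 
    air_trips = NROW(total_air_CO2), air_CO2 = sum(!is.na(total_air_CO2)), total_CO2 = sum(Total_CO2_kg), 
    .inform = TRUE, .drop = TRUE)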

##   fiscal car_trips car_miles car_CO2 air_trips air_CO2 total_CO2
## 1  f2010       115        96      96       115      23     22499
## 2  f2011       201       140     139       201      67     58484
## 3  f2012        87        67      67        87      20     19534
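In the table above, `car_trips` and `air_trips` are identical to each other, and `car_miles`, `car_CO2`, and `air_CO2` are small integers, because `NROW()` counts rows and `sum(!is.na(x))` counts non-missing entries. The sketch below uses toy data (the values are hypothetical; only the column names follow the dataset) to show the difference between counting non-missing entries and summing with `na.rm = TRUE`. It uses base R rather than plyr so that it runs without extra packages.

```r
# Toy trips: a car trip and an air trip in f2010, one car trip in f2011
trips <- data.frame(
  fiscal          = c("f2010", "f2010", "f2011"),
  total_car_miles = c(100, NA, 50),
  total_air_CO2   = c(NA, 300, NA)
)

# One summary row per fiscal year
by_year <- do.call(rbind, lapply(split(trips, trips$fiscal), function(g) {
  data.frame(
    fiscal    = g$fiscal[1],
    trips     = nrow(g),                              # all trips in the year
    car_trips = sum(!is.na(g$total_car_miles)),       # trips with car mileage
    car_miles = sum(g$total_car_miles, na.rm = TRUE), # total miles driven
    air_trips = sum(!is.na(g$total_air_CO2)),         # trips with air emissions
    air_CO2   = sum(g$total_air_CO2, na.rm = TRUE)    # total air emissions
  )
}))
by_year
```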

Manuscript Figures

Figure 1 in the paper was created with mapbox.com.

Figure 2 - Annual averages by individual

ggplot(d, aes(x = Rank_ann, y = CO2annual)) + geom_bar(colour = "black", fill = "darkgrey", 
    stat = "identity") + labs(y = expression("Average Emissions (kgCO"[2] * 
    "e/yr)"), x = "Individual") + ggtitle("Individual Emissions") + theme(legend.position = "none")

Figure 3 - A Histogram of Trip Emissions Intensity

ggplot(carbonTW, aes(Total_CO2_kg)) + geom_histogram(colour = "black", fill = "darkgrey", 
    binwidth = 20) + labs(x = expression("Trip Emissions (kgCO"[2] * "e)"), 
    y = "Trip Count") + ggtitle("Trip Emissions Frequency")

Figure 4 - Emissions by Purpose of Travel

ggplot(carbonTW, aes(x = dissem, y = Total_CO2_kg)) + geom_jitter(aes(colour = dissem), 
    position = position_jitter(width = 0.3), alpha = 0.85, size = 3) + geom_boxplot(alpha = 0, 
    outlier.colour = "red", outlier.shape = NA) + labs(x = " ", y = expression("Total Emissions (kgCO"[2] * 
    "e)")) + ggtitle("Emissions by Purpose of Travel") + theme(legend.position = "none")

## Warning: Removed 33 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).

APPENDIX FIGURES

Figure 1 - Average Monthly Emissions

ggplot(carbonTW, aes(monthfact, Total_CO2_kg)) + stat_summary(fun.y = "mean", 
    geom = "bar") + labs(x = " ", y = "kg CO2") + ggtitle("Average Monthly Emissions")

Figure 2 - Total Emissions by individual

ggplot(d, aes(x = Rank_tot, y = CO2total)) + geom_bar(stat = "identity") + 
    labs(x = "Individual", y = "kg CO2/yr") + ggtitle("Total Emissions")

Figure 3 - Academic Season

ggplot(carbonTW, aes(x = season, y = Total_CO2_kg)) + geom_jitter(position = position_jitter(width = 0.3), 
    alpha = 0.25, size = 3) + geom_boxplot(alpha = 0, outlier.colour = "red", 
    outlier.shape = NA) + labs(x = " ", y = expression("Trip Emissions (kgCO"[2] * 
    "e)")) + ggtitle("Emissions by Academic Season") + theme(legend.position = "none")

## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).

Figure 4 - Fiscal Year

ggplot(carbonTW, aes(x = fiscal, y = Total_CO2_kg)) + geom_point(position = position_jitter(width = 0.2), 
    na.rm = TRUE) + labs(x = " ", y = "kg CO2") + ggtitle("Emissions by Fiscal Year")

Figure 5 - Year

ggplot(carbonTW, aes(x = factor(year), y = Total_CO2_kg)) + geom_jitter(position = position_jitter(width = 0.3), 
    alpha = 0.25, size = 3) + geom_boxplot(alpha = 0, outlier.colour = "red", 
    outlier.shape = NA) + labs(x = " ", y = expression("Trip Emissions (kgCO"[2] * 
    "e)")) + ggtitle("Emissions by Year") + theme(legend.position = "none")

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_point).

Figure 6 - Emissions Intensity by Travel Purpose

ggplot(carbonTW, aes(x = Total_CO2_kg, fill = dissem)) + geom_histogram(colour = "black", 
    binwidth = 50) + facet_grid(dissem ~ ., scales = "free_y") + theme(strip.text.y = element_text(size = 12, 
    face = "bold")) + theme(legend.position = "none") + labs(y = "Trips", x = expression("Total Emissions (kgCO"[2] * 
    "e)"))

Figure 7 - Annual averages by Type

ggplot(d, aes(Type2, CO2annual)) + geom_point(position = position_jitter(width = 0.15)) + 
    labs(x = " ", y = "kg CO2/yr") + ggtitle("Annual Emissions by Type of Traveler")

Figure 8 - Annual averages by Disciplinary category

ggplot(d, aes(x = Discip, y = CO2annual)) + geom_jitter(position = position_jitter(width = 0.3), 
    alpha = 0.25, size = 3) + geom_boxplot(alpha = 0, outlier.colour = "red", 
    outlier.shape = NA) + labs(x = " ", y = expression("Average Emissions (kgCO"[2] * 
    "e/yr)")) + ggtitle("Annual Emissions by Disciplinary Category") + theme(legend.position = "none")

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
