Tim Waring, April 2014
This file is intended to carefully and publicly document the stages of data analysis and visualization that contributed to this research project. The following R code, output, and annotations accompany a study of the carbon emissions of the University of Maine's Sustainability Solutions Initiative between 2009 and 2011. This project was the result of work by Mark Anderson, Mario Teisl, and Eva Manandhar in the School of Economics at the University of Maine. The published research is available at: {URL}.
This paper presents new data on the carbon emissions generated by travel undertaken for a major sustainability science research effort. Previous research has estimated CO2 emissions generated by individual scientists, by entire academic institutions, or by international climate conferences. Here, we sought to investigate the size, distribution and factors affecting the carbon emissions of travel for sustainability research in particular. Reported airline and automobile travel of participants in Maine’s Sustainability Solutions Initiative were used to calculate the carbon dioxide emissions attributable to research-related travel over a three-year period. Our methodology is simple and with planning can be applied at any scale. Carbon emissions varied substantially by researcher and by purpose of travel. Travel for the purpose of attending academic conferences created the largest carbon footprint of all types of travel. This result suggests that alternative networking and dissemination models are needed to replace the high carbon costs of annual society meetings. This research adds to literature that questions whether the cultural demands of contemporary academic careers are compatible with climate stabilization. We argue that precise record keeping and routine analysis of travel data are necessary to track and reduce the climate impacts of sustainability research. We summarize the barriers to behavioral change at individual and organizational levels and conclude with suggestions for reducing climate impacts of travel undertaken for sustainability research.
The data and code are provided as is and may contain errors or imperfections. The data for this project are split across two data files. There is no good reason for this; I just haven't had the time to merge them appropriately. Both files come from the same source data. This analysis document was compiled in RStudio, using R Markdown to intersperse code, results, and text. The R code uses knitr, plyr, and ggplot2.
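Since both files come from the same source data, they could in principle be joined on the anonymized `Name` code. The following is a minimal sketch only: the two toy data frames are invented stand-ins, and it assumes (this is not confirmed by the source) that a given `Name` code refers to the same person in both files.

```r
# Toy stand-ins for the per-person and per-trip files. Column names follow
# the str() output in this document; all values are invented.
people <- data.frame(
  Name   = c("G01", "G02", "G03"),
  Discip = c("Natural", "Social", "Social"),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  Name         = c("G01", "G01", "G02", "G04"),
  Total_CO2_kg = c(196.9, 85.4, 104.1, 50.0),
  stringsAsFactors = FALSE
)

# Left join: keep every trip and attach person-level attributes where the
# Name code matches. Trips with no matching person get NA for Discip.
merged <- merge(trips, people, by = "Name", all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` keeps trips whose traveler is missing from the per-person table, so a mismatch between the files shows up as `NA` rather than as silently dropped rows.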
library("knitr")
library("plyr")
library("ggplot2")
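# Per-person emissions table. stringsAsFactors = TRUE reproduces the pre-4.0
# read.csv() default that the levels() assignments below rely on.
d <- read.csv("/Users/twaring/Documents/Research/SSI Carbon/CO2.csv",
    stringsAsFactors = TRUE)
# levels()<- relabels by position, in the sorted order of the original codes.
levels(d$Discip) <- c("na", "Natural", "Social")
d$Type <- as.factor(d$Type)
levels(d$Type) <- c("Visitor", "Full", "Assoc.", "Assist.", "Post Doc.", "PhD Student",
    "MS Student", "Admin.", "other")
# Type2 collapses the nine ranks into four groups (repeated labels merge levels).
d$Type2 <- d$Type
levels(d$Type2) <- c("Visitor/other", "Professor", "Professor", "Professor",
    "Student/Post Doc", "Student/Post Doc", "Student/Post Doc", "Administrator",
    "Visitor/other")
levels(d$Cat) <- c("Student", "Faculty/Staff")
# Type3 collapses the four Type2 groups into two.
d$Type3 <- d$Type2
levels(d$Type3) <- c("Admin/Visitor/other", "Professor/Student", "Professor/Student",
    "Admin/Visitor/other")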
str(d)
## 'data.frame': 98 obs. of 10 variables:
## $ Rank_ann : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Rank_tot : int 11 23 25 26 1 2 7 3 12 14 ...
## $ Name : Factor w/ 98 levels "G01","G02","G03",..: 80 15 87 26 53 67 22 41 59 27 ...
## $ CO2annual: int 2561 1720 1698 1611 1596 1589 1560 1506 1261 1248 ...
## $ CO2total : int 2561 1720 1698 1611 4787 4768 3119 4518 2523 2496 ...
## $ Cat : Factor w/ 2 levels "Student","Faculty/Staff": 2 1 2 1 2 2 1 2 2 1 ...
## $ Type : Factor w/ 9 levels "Visitor","Full",..: 6 6 1 6 2 5 6 4 4 6 ...
## $ Discip : Factor w/ 3 levels "na","Natural",..: 2 3 1 3 3 3 2 3 2 3 ...
## $ Type2 : Factor w/ 4 levels "Visitor/other",..: 3 3 1 3 2 3 3 2 2 3 ...
## $ Type3 : Factor w/ 2 levels "Admin/Visitor/other",..: 2 2 1 2 2 2 2 2 2 2 ...
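The recoding above relies on two behaviours of `levels<-` in base R: labels are assigned by position in the sorted order of the existing levels, and assigning the same label to several positions merges those levels. A small sketch on an invented rank vector (not the study data) illustrates both:

```r
# Invented rank codes; note that codes 4, 6 and 7 never occur.
rank <- factor(c(1, 2, 3, 5, 8, 2))
print(levels(rank))  # existing levels, in sorted order

# Positional relabelling: one label per existing level, in that order.
levels(rank) <- c("Full", "Assoc.", "Assist.", "Post Doc.", "Admin.")

# Repeating a label merges the corresponding levels into one.
grouped <- rank
levels(grouped) <- c("Professor", "Professor", "Professor",
                     "Student/Post Doc", "Administrator")
print(table(grouped))
```

Because the assignment is positional, a code that is absent from the data shifts every later label by one. Checking `levels()` before relabelling, as the sketch does, guards against that.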
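# Per-trip emissions table; rows with no recorded month are dropped.
# stringsAsFactors = TRUE reproduces the pre-4.0 read.csv() default.
carbonTW <- read.csv("/Users/twaring/Documents/Research/SSI Carbon/carbonTW.csv",
    stringsAsFactors = TRUE)
carbonTW <- carbonTW[!is.na(carbonTW$month), ]
# Month labels are assigned by position, which assumes all twelve months occur.
carbonTW$monthfact <- as.factor(carbonTW$month)
levels(carbonTW$monthfact) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
    "Aug", "Sep", "Oct", "Nov", "Dec")
# Local copies of the two columns used below, instead of attach()/detach().
month <- carbonTW$month
year <- carbonTW$year
# Academic season from calendar month.
carbonTW$season <- "NA"
carbonTW$season[month < 6] <- "Spring"
carbonTW$season[month >= 6 & month < 9] <- "Summer"
carbonTW$season[month >= 9] <- "Fall"
# Fiscal year from calendar year and month.
carbonTW$fiscal <- "NA"
carbonTW$fiscal[year == 2009] <- "f2010"
carbonTW$fiscal[year == 2010 & month < 7] <- "f2010"
carbonTW$fiscal[year == 2010 & month >= 7] <- "f2011"
carbonTW$fiscal[year == 2011 & month < 7] <- "f2011"
# NOTE: by the July-to-June pattern used for 2010, these months would fall in
# "f2012". The assignment is left as written because the output shown below
# was produced with it.
carbonTW$fiscal[year == 2011 & month >= 7] <- "f2011"
carbonTW$fiscal[year == 2012] <- "f2012"
carbonTW$fiscal <- as.factor(carbonTW$fiscal)
# Time in years since the start of 2009.
carbonTW$allmonths <- year - 2009 + month/12
rm(month, year)
# Reorder the four trip types, then collapse them into three purpose groups.
carbonTW$dissem <- carbonTW$triptype
carbonTW$dissem <- factor(carbonTW$dissem, levels(carbonTW$dissem)[c(1, 2, 4, 3)])
levels(carbonTW$dissem) <- c("Research", "Dissemination", "Research", "other")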
str(carbonTW)
## 'data.frame': 403 obs. of 17 variables:
## $ num : int 1 2 3 4 5 6 7 8 9 11 ...
## $ TripID : int 40 48 41 49 42 50 99 1 23 25 ...
## $ year : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
## $ month : int 9 9 10 10 10 10 11 11 11 11 ...
## $ fiscal : Factor w/ 3 levels "f2010","f2011",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ total_car_miles: num 526 526 228 228 278 ...
## $ total_car_CO2 : num 196.9 196.9 85.4 85.4 104.1 ...
## $ total_air_CO2 : num NA NA NA NA NA ...
## $ Total_CO2_kg : num 196.9 196.9 85.4 85.4 104.1 ...
## $ Name : Factor w/ 82 levels "G01","G02","G03",..: 19 19 19 19 19 19 56 49 35 2 ...
## $ perstype : Factor w/ 2 levels "staff","student": 2 2 2 2 2 2 1 1 2 2 ...
## $ persrank : int 6 6 6 6 6 6 0 0 6 6 ...
## $ triptype : Factor w/ 4 levels "admin","confpres",..: 3 3 3 3 3 3 1 1 3 3 ...
## $ monthfact : Factor w/ 12 levels "Jan","Feb","Mar",..: 9 9 10 10 10 10 11 11 11 11 ...
## $ season : chr "Fall" "Fall" "Fall" "Fall" ...
## $ allmonths : num 0.75 0.75 0.833 0.833 0.833 ...
## $ dissem : Factor w/ 3 levels "Research","Dissemination",..: 3 3 3 3 3 3 1 1 3 3 ...
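The season and fiscal-year columns above are built with a series of indexed assignments. As a hedged alternative, the same kind of derivation can be written with `cut()` and a small helper. The helper `fiscal_year()` is hypothetical, and it assumes that fiscal year N runs from July of year N-1 through June of year N. That rule differs from the code above in two places: it puts July to December 2011 in `f2012` rather than `f2011`, and it would put January to June 2009 in `f2009` rather than `f2010`.

```r
# Invented example rows, not the study data.
trips <- data.frame(year  = c(2009, 2010, 2010, 2011, 2011, 2012),
                    month = c(9, 3, 7, 6, 8, 1))

# Season cut-offs copied from the code above: months 1-5, 6-8 and 9-12.
trips$season <- cut(trips$month, breaks = c(0, 5, 8, 12),
                    labels = c("Spring", "Summer", "Fall"))

# ASSUMPTION: fiscal year N runs July (N-1) through June N.
# fiscal_year() is a hypothetical helper, not part of the original analysis.
fiscal_year <- function(year, month) {
  paste0("f", year + (month >= 7))
}
trips$fiscal <- factor(fiscal_year(trips$year, trips$month))

print(trips)
```

Writing the rule as a function keeps every year on the same July cut-off, which avoids per-year branches drifting out of step with one another.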
summary(d$CO2annual)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14 155 466 559 800 2560
length(d$CO2annual[d$CO2annual > 1000])
## [1] 15
summary(carbonTW$Total_CO2_kg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.5 64.0 105.0 249.0 326.0 1620.0
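# A summary table by fiscal year.
# As written, car_trips and air_trips both give the number of trips of any kind in
# each year (NROW() counts rows whether or not the value is NA), and car_miles,
# car_CO2 and air_CO2 are counts of trips with a recorded value, not summed miles
# or kg. Only total_CO2 is a sum of emissions.
ddply(carbonTW, .(fiscal), summarize,
    car_trips = NROW(!is.na(total_car_miles)),
    car_miles = sum(!is.na(total_car_miles)),
    car_CO2 = sum(!is.na(total_car_CO2)),
    air_trips = NROW(total_air_CO2),
    air_CO2 = sum(!is.na(total_air_CO2)),
    total_CO2 = sum(Total_CO2_kg),
    .inform = TRUE, .drop = TRUE)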
## fiscal car_trips car_miles car_CO2 air_trips air_CO2 total_CO2
## 1 f2010 115 96 96 115 23 22499
## 2 f2011 201 140 139 201 67 58484
## 3 f2012 87 67 67 87 20 19534
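In the table above, `sum(!is.na(x))` counts the trips with a recorded value, while `NROW(x)` counts all rows in the group, so `car_miles`, `car_CO2` and `air_CO2` are trip counts rather than totals. The sketch below shows, on invented data, how counts and totals could be reported side by side. It uses base R so that it runs without plyr; `summarise_year()` is a hypothetical helper, and the numbers are not results from the study.

```r
# Invented trips; NA marks a leg that did not occur (no car, or no flight).
trips <- data.frame(
  fiscal          = c("f2010", "f2010", "f2011", "f2011"),
  total_car_miles = c(526, NA, 228, 278),
  total_car_CO2   = c(196.9, NA, 85.4, 104.1),
  total_air_CO2   = c(NA, 400, NA, NA),
  Total_CO2_kg    = c(196.9, 400, 85.4, 104.1)
)

# Hypothetical helper: one summary row per fiscal year.
summarise_year <- function(g) {
  data.frame(
    fiscal    = g$fiscal[1],
    trips     = nrow(g),                             # all trips
    car_trips = sum(!is.na(g$total_car_miles)),      # trips with a car leg
    car_miles = sum(g$total_car_miles, na.rm = TRUE),
    car_CO2   = sum(g$total_car_CO2, na.rm = TRUE),
    air_trips = sum(!is.na(g$total_air_CO2)),        # trips with a flight
    air_CO2   = sum(g$total_air_CO2, na.rm = TRUE),
    total_CO2 = sum(g$Total_CO2_kg, na.rm = TRUE)
  )
}

by_year <- do.call(rbind, lapply(split(trips, trips$fiscal), summarise_year))
rownames(by_year) <- NULL
print(by_year)
```

The same expressions should drop into the `ddply(..., summarize, ...)` call used in this document if plyr is preferred.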
Figure 1 in the paper was created with mapbox.com.
ggplot(d, aes(x = Rank_ann, y = CO2annual)) +
    geom_bar(colour = "black", fill = "darkgrey", stat = "identity") +
    labs(y = expression("Average Emissions (kgCO"[2] * "e/yr)"), x = "Individual") +
    ggtitle("Individual Emissions") +
    theme(legend.position = "none")
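# geom_histogram() is used because current ggplot2 no longer bins a continuous
# variable in geom_bar(); the binwidth is unchanged.
ggplot(carbonTW, aes(Total_CO2_kg)) +
    geom_histogram(colour = "black", fill = "darkgrey", binwidth = 20) +
    labs(x = expression("Trip Emissions (kgCO"[2] * "e)"), y = "Trip Count") +
    ggtitle("Trip Emissions Frequency")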
ggplot(carbonTW, aes(x = dissem, y = Total_CO2_kg)) +
    geom_jitter(aes(colour = dissem), position = position_jitter(width = 0.3),
        alpha = 0.85, size = 3) +
    geom_boxplot(alpha = 0, outlier.colour = "red", outlier.shape = NA) +
    labs(x = " ", y = expression("Total Emissions (kgCO"[2] * "e)")) +
    ggtitle("Emissions by Purpose of Travel") +
    theme(legend.position = "none")
## Warning: Removed 33 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
# `fun` replaces the deprecated `fun.y` argument (ggplot2 3.3.0 and later).
ggplot(carbonTW, aes(monthfact, Total_CO2_kg)) +
    stat_summary(fun = "mean", geom = "bar") +
    labs(x = " ", y = "kg CO2") +
    ggtitle("Average Monthly Emissions")
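The bar heights in the monthly chart are per-month means of trip emissions. A minimal sketch with invented numbers shows the same calculation with `tapply()`. It also labels months from the built-in `month.abb` vector instead of by level position, so the labels stay correct even if some month has no trips.

```r
# Invented trips; months 3 to 11 are deliberately absent.
trips <- data.frame(month        = c(1, 1, 2, 12),
                    Total_CO2_kg = c(100, 300, 50, 80))

# Label by value rather than by position: month.abb[1] is "Jan", and so on.
trips$monthfact <- factor(month.abb[trips$month], levels = month.abb)

# Mean emissions per month; months with no trips come back as NA.
monthly_mean <- tapply(trips$Total_CO2_kg, trips$monthfact, mean)
print(monthly_mean)
```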
# qplot() is deprecated in current ggplot2; this is the equivalent ggplot() call.
ggplot(d, aes(x = Rank_tot, y = CO2total)) +
    geom_bar(stat = "identity") +
    labs(x = "Individual", y = "kg CO2/yr") +
    ggtitle("Total Emissions")
ggplot(carbonTW, aes(x = season, y = Total_CO2_kg)) +
    geom_jitter(position = position_jitter(width = 0.3), alpha = 0.25, size = 3) +
    geom_boxplot(alpha = 0, outlier.colour = "red", outlier.shape = NA) +
    labs(x = " ", y = expression("Trip Emissions (kgCO"[2] * "e)")) +
    ggtitle("Emissions by Academic Season") +
    theme(legend.position = "none")
## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
# Columns are referenced by name inside aes(); na.rm belongs to the geom, not to aes().
ggplot(carbonTW, aes(fiscal, Total_CO2_kg)) +
    geom_point(position = position_jitter(width = 0.2), na.rm = TRUE) +
    labs(x = " ", y = "kg CO2") +
    ggtitle("Emissions by Fiscal Year")
ggplot(carbonTW, aes(x = factor(year), y = Total_CO2_kg)) +
    geom_jitter(position = position_jitter(width = 0.3), alpha = 0.25, size = 3) +
    geom_boxplot(alpha = 0, outlier.colour = "red", outlier.shape = NA) +
    labs(x = " ", y = expression("Trip Emissions (kgCO"[2] * "e)")) +
    ggtitle("Emissions by Year") +
    theme(legend.position = "none")
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_point).
ggplot(carbonTW, aes(x = Total_CO2_kg, fill = dissem)) +
    geom_histogram(colour = "black", binwidth = 50) +
    facet_grid(dissem ~ ., scales = "free_y") +
    theme(strip.text.y = element_text(size = 12, face = "bold")) +
    theme(legend.position = "none") +
    labs(y = "Trips", x = expression("Total Emissions (kgCO"[2] * "e)"))
ggplot(d, aes(Type2, CO2annual)) +
    geom_point(position = position_jitter(width = 0.15)) +
    labs(x = " ", y = "kg CO2/yr") +
    ggtitle("Annual Emissions by Type of Traveler")
# The y axis shows each individual's annual emissions (CO2annual), so it is
# labelled "Annual Emissions" to match the title.
ggplot(d, aes(x = Discip, y = CO2annual)) +
    geom_jitter(position = position_jitter(width = 0.3), alpha = 0.25, size = 3) +
    geom_boxplot(alpha = 0, outlier.colour = "red", outlier.shape = NA) +
    labs(x = " ", y = expression("Annual Emissions (kgCO"[2] * "e/yr)")) +
    ggtitle("Annual Emissions by Disciplinary Category") +
    theme(legend.position = "none")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).