The ability of plants to absorb and fix carbon dioxide from the atmosphere has become increasingly interesting and important in our age of anthropogenic global warming. The study behind this data set examined how well a certain species of grass absorbs carbon dioxide as a function of temperature.
I will use the CO2 data set included in the datasets package that ships with R. It contains the fields needed to build scatter plots across several factors and, for now, provides enough detail to demonstrate the features of ggplot.
My goal in this document is to use the data set to illustrate and study the statistical methods presented in Andy Field’s Discovering Statistics Using R, a textbook that I would have loved to use in my introductory statistics classes but deemed too advanced for that student body.
This description is taken from the help file associated with the datasets package in R.
The CO2 data frame has 84 rows and 5 columns of data from an experiment on the cold tolerance of the grass species Echinochloa crus-galli.
CO2
Format
An object of class c(“nfnGroupedData”, “nfGroupedData”, “groupedData”, “data.frame”) containing the following columns:
Plant
Type
Treatment
conc
uptake
The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.
This data set was originally part of package nlme, and that has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
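As a minimal sketch, the data set can be loaded and inspected with base R alone. The original code chunk is not shown in this document, so the calls below are an assumption about how the summary that follows was produced.

```r
# CO2 ships with the datasets package, which is attached by default in R
data(CO2)

# Inspect the structure: 84 rows, 5 columns (Plant, Type, Treatment, conc, uptake)
str(CO2)

# Per-column summary: counts for the factors, quartiles for the numeric columns
summary(CO2)
```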
##      Plant             Type         Treatment       conc          uptake
##  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70
##  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90
##  Qn3    : 7                                    Median : 350   Median :28.30
##  Qc1    : 7                                    Mean   : 435   Mean   :27.21
##  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12
##  Qc2    : 7                                    Max.   :1000   Max.   :45.50
##  (Other):42
Plotting a point for each measurement provides a quick way to spot patterns in the data without imposing preconceived conditions or structure on it. As long as there are two numeric variables, we can make an x-y plot of the data.
In this case we will use the CO2 concentration in the atmosphere as the x or independent variable, as this was the variable experimenters presumably changed to observe the resulting uptake of CO2 by the plant subjects.
There are also two other factors in the data set that we must consider: the Type variable, which identifies the environment from which the plants were taken, and the Treatment variable, which records the treatment the experimenters imposed on plants from each environment.
Using ggplot, I will first create a scatter plot of uptake vs conc for the two levels of the Treatment factor, chilled and nonchilled.
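The plot described here might be built roughly as follows. This is a sketch, not the original chunk: the object names `co2` and `p_treatment`, the axis labels, and the choice of a per-group linear fit via `geom_smooth(method = "lm")` are assumptions.

```r
library(ggplot2)

# Convert to a plain data frame so the grouped-data classes do not interfere
co2 <- as.data.frame(CO2)

# Scatter plot of uptake against concentration, coloured by treatment,
# with a straight-line fit (and its confidence band) for each treatment
p_treatment <- ggplot(co2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    title = "CO2 uptake vs ambient CO2 concentration, by treatment",
    x = "Ambient CO2 concentration (mL/L)",
    y = "CO2 uptake rate (umol/m^2 sec)"
  )

print(p_treatment)
```

Mapping Treatment to colour inside `aes()` makes `geom_smooth()` fit one line per treatment rather than a single line through all points.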
While this graph shows a definite difference between treatments, it mixes plants from two markedly different environments. We should examine the effect of these environments on the plants’ ability to absorb carbon dioxide.
Next, I will create a scatter plot of uptake vs conc for the grasses from Quebec and from Mississippi.
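A comparable sketch for the comparison by origin only changes the colour mapping from Treatment to Type. As before, the names `co2` and `p_type` and the linear smoother are illustrative assumptions rather than the original code.

```r
library(ggplot2)

co2 <- as.data.frame(CO2)

# Same scatter plot, now coloured by the plants' place of origin
p_type <- ggplot(co2, aes(x = conc, y = uptake, colour = Type)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    title = "CO2 uptake vs ambient CO2 concentration, by origin",
    x = "Ambient CO2 concentration (mL/L)",
    y = "CO2 uptake rate (umol/m^2 sec)"
  )

print(p_type)
```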
We see a rather surprising result in this graph. The colder environment of Quebec somehow is associated with greater levels of carbon dioxide uptake. Perhaps the plant in this environment must make and store more sugars to survive the cold climate, and thus has a higher level of photosynthesis. Alternatively, in the summer months, there is more daylight in the more northern latitude of Quebec and thus a longer period of photosynthesis per day.
Without knowing how the plant scientists conducted their experiment, it is impossible to say why the Quebec plants were so much more effective in CO2 uptake. It could be that the scientists tried to remove confounding variables such as length of daylight by somehow regulating that attribute. There may have been other factors beyond their control, such as air quality and the transmissivity of the atmosphere to sunlight.
Clearly, we must examine the temperature treatment on each environment separately. We will do this by creating a subset of the data set for each environment and plotting each subset separately, taking into account the treatment variable.
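One way to carry out this step is sketched below. The helper `plot_by_treatment()` is a hypothetical name introduced here for illustration, and the linear smoother is again an assumption.

```r
library(ggplot2)

co2 <- as.data.frame(CO2)

# One subset per environment
quebec      <- subset(co2, Type == "Quebec")
mississippi <- subset(co2, Type == "Mississippi")

# Hypothetical helper: uptake vs conc for one subset, coloured by treatment
plot_by_treatment <- function(df, title) {
  ggplot(df, aes(x = conc, y = uptake, colour = Treatment)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(title = title,
         x = "Ambient CO2 concentration (mL/L)",
         y = "CO2 uptake rate (umol/m^2 sec)")
}

p_quebec      <- plot_by_treatment(quebec, "Quebec plants")
p_mississippi <- plot_by_treatment(mississippi, "Mississippi plants")

print(p_quebec)
print(p_mississippi)
```

Adding `facet_wrap(~ Type)` to a single plot of the full data set would give a similar side-by-side view without explicit subsets; separate subsets are used here because that is the approach the text describes.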
We see that although treatment has an effect on CO2 uptake in the Quebec plants, the effect is small and falls almost within the uncertainty bands drawn around the linear fit for each treatment.
Chilling clearly has a markedly greater effect on the Mississippi plants than on the Quebec plants. Chilled plants are less able to absorb carbon dioxide, as we would expect given that chemical reactions generally slow down at lower temperatures. However, this effect is minimal for the Quebec plants, perhaps because they are accustomed to greater temperature swings and have adapted to those extremes.
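The visual impression can be cross-checked with a simple table of group means. This is a rough sketch using base R only; it averages over all concentration levels and is not a substitute for a formal model. The names `means` and `chilling_drop` are illustrative.

```r
co2 <- as.data.frame(CO2)

# Mean uptake for each combination of origin (rows) and treatment (columns)
means <- with(co2, tapply(uptake, list(Type, Treatment), mean))
print(round(means, 2))

# Drop in mean uptake attributable to chilling, per origin
chilling_drop <- means[, "nonchilled"] - means[, "chilled"]
print(round(chilling_drop, 2))
```

Because every plant was measured at the same set of concentrations, averaging over concentration compares like with like across the four groups.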
Scatter plots provide a powerful visual tool for analysis, and with careful consideration of possible confounding factors, associations between variables may emerge through linear regression. We must understand the influence of different factors on the data to extract meaningful relationships, though. Merely plotting the pooled data set may mask the effect of certain factors, as we saw in this example with the location of the plants under study.
This is also an example of a dynamic, data-centered approach to data analysis, in which preliminary analysis informs further strategies for analysis. This method is superior to prescriptive strategies that ignore the information inherent in the data itself and attempt to apply the same methodology to all data sets of a similar type.
The greater ability of plants to absorb carbon dioxide at warmer temperatures offers some benefit for mitigating the effects of global warming. In temperate regions, a warmer climate might increase CO2 uptake. However, the increased variability of temperature and rainfall associated with climate change might impose additional stress on plants and ecosystems, and may reduce or overpower the beneficial effects of increased CO2 uptake.
This analysis exposes the need for informed judgment by educated and experienced scientists. While statistical analysis may uncover relationships between data elements, it still takes trained and thoughtful researchers to interpret the results and draw meaningful conclusions. While good graphs may suggest physical explanations, we as data analysts must leave those explanations to the experts. Our job is to reveal these relationships in the data in as clear a manner as possible.