Analyzing the Raw Data

The petri dishes show the growth of bacteria on plates with and without antibiotics (in this case, the antibiotic ampicillin). Each plate is overlaid by a grid that limits measurement error when counting colonies. Each colony intersected by a grid line should be counted.

Work in pairs to quantify the number of colonies per grid, enter the data into Excel and save it as a comma-separated values (.csv) file, and visualize the data in a way that clearly illustrates the results relative to the hypothesis being tested.

DATA AND ANALYSIS

  1. In Excel, calculate the frequency of antibiotic resistance at each site, as described in the lab hand-out:

Number of colonies on experimental plate / number of colonies on control plate

Store this data in the column “Fabr”, in the rows corresponding to Treatment=Ab_plus. We’ll use this for our first hypothesis, about the frequency of antibiotic resistance.

  2. In Excel, calculate the diversity of bacteria at each site, for each treatment. We’ve given you Excel formulas for the first row. Check whether the formulas match the equation in the hand-out, then drag them down to the other rows.

  3. Save your data file as a .csv file.
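
As a cross-check on the spreadsheet formulas, the diversity value for one row can be recomputed in R. This is a minimal sketch, not the hand-out's method: it assumes the hand-out's equation is the Shannon index, H = -sum(p_i * ln(p_i)), which is what the pi_ and pi.ln.pi. column names in the data file suggest. The counts are the site 1 control row from the data file, and `shannon` is a hypothetical helper name.

```r
# Hypothetical helper: Shannon diversity from a vector of colony counts.
# Assumes the hand-out's equation is H = -sum(p_i * ln(p_i)).
shannon <- function(counts) {
  p <- counts / sum(counts)   # proportion of each colony type
  p <- p[p > 0]               # drop empty types so log(0) never appears
  -sum(p * log(p))            # log() is the natural log in R
}

# Site 1, control plate: Type_1..Type_4 counts from the data file
site1_control <- c(1, 2, 10, 2)
H <- shannon(site1_control)
print(round(H, 3))  # compare with the Diversity column for that row
```

If the printed value disagrees with the spreadsheet, the formula dragged down in Excel is the first thing to check.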

Load the data and attach it so it’s searchable.

ab_data <- read.csv("lab5.csv")
attach(ab_data, warn.conflicts = FALSE)
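
The lab uses attach() so that columns can be called by name. For comparison, here is a small sketch of two ways to reach the same columns without attaching, which avoids name clashes if another object has the same name as a column. The data frame below is a stand-in built from the first two sites of the data file, not the full lab5.csv.

```r
# Small stand-in for ab_data (first two sites only, values from the data file)
ab_data <- data.frame(
  Site = c(1, 1, 2, 2),
  Distance_kms = c(0, 0, 11, 11),
  Treatment = c("Control", "Ab_plus", "Control", "Ab_plus"),
  totalBact = c(15, 7, 47, 30)
)

# Option 1: refer to columns through the data frame with $
dist_dollar <- ab_data$Distance_kms[ab_data$Treatment == "Ab_plus"]

# Option 2: evaluate an expression inside the data frame with with()
dist_with <- with(ab_data, Distance_kms[Treatment == "Ab_plus"])

print(dist_dollar)
print(identical(dist_dollar, dist_with))
```

Either form gives the same vector as the attached version; the rest of this report keeps the attached names.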

Look at your data. Did it import as you expected? What class is it?

str(ab_data)
## 'data.frame':    12 obs. of  20 variables:
##  $ Site           : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ Distance_kms   : int  0 0 11 11 23 23 35 35 49 49 ...
##  $ Treatment      : Factor w/ 2 levels "Ab_plus","Control": 2 1 2 1 2 1 2 1 2 1 ...
##  $ Type_1         : int  1 1 5 1 32 1 1 1 11 1 ...
##  $ Type_2         : int  2 2 29 23 13 17 19 12 8 2 ...
##  $ Type_3         : int  10 1 4 1 2 1 1 1 6 1 ...
##  $ Type_4         : int  2 3 9 5 9 1 5 7 25 11 ...
##  $ Fabr           : num  NA 0.467 NA 0.638 NA ...
##  $ totalBact      : int  15 7 47 30 56 20 26 21 50 15 ...
##  $ pi_Type1       : num  0.0667 0.1429 0.1064 0.0333 0.5714 ...
##  $ pi_Type2       : num  0.133 0.286 0.617 0.767 0.232 ...
##  $ pi_Type3       : num  0.6667 0.1429 0.0851 0.0333 0.0357 ...
##  $ pi_Type4       : num  0.133 0.429 0.191 0.167 0.161 ...
##  $ pi.ln.pi._Type1: num  -0.181 -0.278 -0.238 -0.113 -0.32 ...
##  $ pi.ln.pi._Type2: num  -0.269 -0.358 -0.298 -0.204 -0.339 ...
##  $ pi.ln.pi._Type3: num  -0.27 -0.278 -0.21 -0.113 -0.119 ...
##  $ pi.ln.pi._Type4: num  -0.269 -0.363 -0.317 -0.299 -0.294 ...
##  $ Diversity      : num  0.988 1.277 1.063 0.729 1.072 ...
##  $ Type           : int  1 2 3 4 NA NA NA NA NA NA ...
##  $ Description    : Factor w/ 5 levels "","orange_diffuse",..: 2 4 5 3 1 1 1 1 1 1 ...
#class is data.frame

Make a vector with 6 distances. Why do we have to do this, rather than using the full Distance_kms variable as-is, for our plots below?

distance <- Distance_kms[Treatment=="Ab_plus"]

We need one distance per site so that the distance vector lines up with the six frequency values, which exist only for the antibiotic (Ab_plus) plates. Distance_kms as-is has 12 entries, one for each control and antibiotic plate, so every site's distance appears twice and it would look as if we had 12 sites rather than 6.
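
The length mismatch can be seen directly. The sketch below uses only the first five sites, because the sixth distance is truncated in the str() output above, so the counts are 10 plates and 5 sites rather than 12 and 6.

```r
# First five sites from the data file (the sixth is not reproduced here);
# rows alternate Control, Ab_plus
Distance_kms <- c(0, 0, 11, 11, 23, 23, 35, 35, 49, 49)
Treatment <- rep(c("Control", "Ab_plus"), times = 5)

distance <- Distance_kms[Treatment == "Ab_plus"]

n_plates <- length(Distance_kms)  # two plates per site
n_sites <- length(distance)       # one value per site
print(c(plates = n_plates, sites = n_sites))

# unique() gives the same result here, but only because no two sites
# share a distance; subsetting by Treatment does not rely on that.
print(identical(distance, unique(Distance_kms)))
```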

Similarly, make a vector of the frequency of antibiotic resistance.

ab_res <- Fabr[Treatment=="Ab_plus"]
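
As a check on the spreadsheet step, the same frequencies can be recomputed in R from the totalBact column. This is a sketch using the first five sites shown in the str() output; `fabr_check` is a hypothetical name, and the row order (Control, then Ab_plus, for each site) is taken from that output.

```r
# First five sites from the data file: plates alternate Control, Ab_plus
Treatment <- rep(c("Control", "Ab_plus"), times = 5)
totalBact <- c(15, 7, 47, 30, 56, 20, 26, 21, 50, 15)

# Frequency of resistance = colonies on antibiotic plate / colonies on control
fabr_check <- totalBact[Treatment == "Ab_plus"] / totalBact[Treatment == "Control"]
print(round(fabr_check, 3))
```

The first two values should agree with the Fabr entries printed by str() for sites 1 and 2.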

Create a caption for the plot below. Include the output from your statistical test run below.

Construct a graph that clearly communicates the relationship between the frequency of antibiotic resistance and spatial location along the watershed.

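# Scatter plot of resistance frequency against distance downstream
plot(distance, ab_res, col = "blue", pch = 18, cex = 2,
     main = "Antibiotic Resistance Along the Watershed",
     xlab = "Distance from high elevation site (km)",
     ylab = "Frequency of Antibiotic Resistance")
# glm() with its default Gaussian family fits an ordinary least-squares line
fit <- glm(ab_res ~ distance)
co <- coef(fit)  # intercept and slope of the fitted line
abline(fit, col = "black", lwd = 2)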

Figure 1: Frequency of antibiotic-resistant bacteria along the watershed at different distances from the source. The line of best fit shows a decline in the frequency of antibiotic resistance as distance from the high-elevation source increases. Spearman's rank correlation gives S = 56, p-value = 0.2417, and rho = -0.6. Since the p-value is above 0.05, we fail to reject the null hypothesis that there is no relationship between distance from high elevation and the frequency of antibiotic resistance. The rho value of -0.6 suggests a moderate negative correlation between the two variables, but with only six sites it is not statistically significant.

Use Spearman’s rank correlation (a non-parametric test of association) to test the relationship plotted above.

#The basic structure is cor.test(x,y,method="spearman"). 
cor.test(distance,ab_res, method="spearman") #Fill in the appropriate x and y variable names
## 
##  Spearman's rank correlation rho
## 
## data:  distance and ab_res
## S = 56, p-value = 0.2417
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##  rho 
## -0.6
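
When there are no tied ranks, the S statistic in this output is the sum of squared rank differences, and rho follows from it as rho = 1 - 6S / (n(n^2 - 1)). The sketch below applies that relationship to the reported S with n = 6 sites; `rho_from_S` is a hypothetical helper name, not part of R.

```r
# Hypothetical helper: Spearman's rho from the S statistic (no ties assumed)
rho_from_S <- function(S, n) {
  1 - 6 * S / (n * (n^2 - 1))
}

rho <- rho_from_S(S = 56, n = 6)
print(rho)  # compare with the rho reported by cor.test()
```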

Make an evidence-based claim about the frequency of antibiotic resistant bacteria along the Boulder Creek watershed.

Along the Boulder Creek watershed, the frequency of antibiotic resistance tends to decrease as distance from high elevation increases, although this trend is not statistically significant (rho = -0.6, p = 0.2417). The frequency spikes at two sites, which may suggest an influx of antibiotic-resistant bacteria from sources near those sites. The overall decrease may be due to changes in the water environment along the watershed, which could become less favorable for antibiotic-resistant bacteria at lower elevations. This could be a result of greater competition from non-resistant bacteria at lower elevations.

Now we’ll look at hypothesis 2, about the diversity of bacteria found at each site. Relativize your measure of diversity - divide what you found on the antibiotic plates by what you found on the control plates.

#Relative diversity = treatment / control

Diversity[2] / Diversity[1] #Gives you the relativized diversity at site 1
## [1] 1.292343
#We want to repeat this type of calculation, so we can use a for-loop
rel_diversity <- rep(NA, 6) #make an empty bin to store one value for each site

for(i in 1:6){
  rel_diversity[i] <- Diversity[(2*i)] / Diversity[(2*i-1)]
}

rel_diversity #Look at the output
## [1] 1.2923431 0.6861854 0.5482384 1.2246943 0.6984076 0.6427534
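
The loop depends on the rows alternating Control, Ab_plus for every site. A vectorized alternative selects rows by treatment label instead of by position. This is a sketch on the first two sites only, using the Diversity values as rounded by str(), so the results differ slightly from the full-precision output above.

```r
# Diversity for the first two sites, rounded as printed by str();
# rows alternate Control, Ab_plus
Diversity <- c(0.988, 1.277, 1.063, 0.729)
Treatment <- rep(c("Control", "Ab_plus"), times = 2)

# Loop version, as above but sized to this shortened example
n_sites <- length(Diversity) / 2
rel_loop <- numeric(n_sites)
for (i in seq_len(n_sites)) {
  rel_loop[i] <- Diversity[2 * i] / Diversity[2 * i - 1]
}

# Vectorized version: select by treatment label instead of row position
rel_vec <- Diversity[Treatment == "Ab_plus"] / Diversity[Treatment == "Control"]

print(rel_vec)
print(all.equal(rel_loop, rel_vec))
```

The vectorized form still assumes that sites appear in the same order within each treatment, so it is worth confirming that with the Site column before relying on it.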

Why would this step (creating a relative measure of diversity) be important?

Evaluating relative diversity rather than raw diversity standardizes the values across sites. Because the total number and mix of bacteria varied from site to site, raw diversity on the antibiotic plates would vary for reasons unrelated to resistance. Dividing by the control plate's diversity lets us compare sites more accurately.

Create a caption for the plot below. Include the output from your statistical test run below.

Construct a graph that clearly communicates the relationship between the (relative) diversity of antibiotic resistant bacteria and spatial location along the watershed.

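# Scatter plot of relative diversity against distance downstream
plot(distance, rel_diversity, col = "purple", pch = 17, cex = 2,
     main = "Relative Diversity of Bacteria Along the Watershed",
     xlab = "Distance from high elevation site (km)",
     ylab = "Relative Diversity of Bacteria")
# glm() with its default Gaussian family fits an ordinary least-squares line
fit <- glm(rel_diversity ~ distance)
co <- coef(fit)  # intercept and slope of the fitted line
abline(fit, col = "black", lwd = 2)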

Figure 2: Relative diversity of antibiotic-resistant bacteria along the watershed at different distances from the source. The line of best fit shows a decline in relative diversity as distance from the high-elevation source increases. Spearman's rank correlation gives S = 48, p-value = 0.4972, and rho = -0.3714286. Since the p-value is above 0.05, we fail to reject the null hypothesis that there is no relationship between distance from high elevation and the relative diversity of antibiotic-resistant bacteria. The rho value of -0.3714286 suggests a weak negative correlation between the two variables.

Use Spearman’s rank correlation (a non-parametric test of association) to test the relationship plotted above.

#The basic structure is cor.test(x,y,method="spearman"). 
cor.test(distance,rel_diversity, method="spearman") #Fill in the appropriate x and y variable names
## 
##  Spearman's rank correlation rho
## 
## data:  distance and rel_diversity
## S = 48, p-value = 0.4972
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.3714286
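
To see where S comes from, the rank differences can be worked out by hand and compared with cor.test(). This sketch uses the six rel_diversity values printed earlier. Because the sixth distance is not shown in this report, it assumes the sites are listed in order of increasing distance and uses the site index 1 to 6 in place of the distance vector; Spearman's test depends only on ranks, so this gives the same result if that assumption holds.

```r
# Relative diversity for the six sites, copied from the output above
rel_diversity <- c(1.2923431, 0.6861854, 0.5482384,
                   1.2246943, 0.6984076, 0.6427534)

# Assumption: sites are listed in order of increasing distance, so the
# site index 1..6 has the same ranks as the distance vector.
site_order <- 1:6

d <- rank(site_order) - rank(rel_diversity)  # rank difference per site
S_by_hand <- sum(d^2)

test <- cor.test(site_order, rel_diversity, method = "spearman")
print(S_by_hand)
print(unname(test$statistic))
print(unname(test$estimate))
```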

Make an evidence-based claim about the diversity of antibiotic resistant bacteria along the Boulder Creek watershed.

Along the Boulder Creek watershed, the relative diversity of antibiotic-resistant bacteria tends to decrease as distance from high elevation increases, although the trend is weak and not statistically significant (rho = -0.3714286, p = 0.4972). If the trend is real, it suggests that the high-elevation source contains the greatest relative diversity of antibiotic-resistant bacteria and that the different types are gradually lost as they travel down the watershed. This could be a result of increased competition among the different antibiotic-resistant bacteria as elevation decreases.

Propose a hypothesis that explains why there were antibiotic resistant bacteria at relatively pristine sites high above agricultural and municipal environments (assuming there are no antibiotics in that water).

Antibiotic resistance can arise through natural mutation even in the absence of antibiotics; however, without antibiotics, resistant bacteria are often selected against because resistance can reduce their rate of reproduction. It is possible that competition is lower at higher elevations because there is less pressure from agricultural and municipal environments, so both resistant and non-resistant bacteria can flourish. There could also be more resources or niche environments that different bacterial strains can exploit, allowing antibiotic-resistant bacteria to succeed even in antibiotic-free water.