First, let's load the data.
Data <- read.csv("Data.csv")
Let's see what data we are working with.
print(Data)
## S1_T1_Time S1_T2_Time S1_T3_Time S2_T1_Time S2_T2_Time S2_T3_Time
## 1 66.34 547.77 773.25 13.83 55.37 71.77
## 2 85.76 278.69 139.76 43.83 144.04 1.75
## 3 46.97 89.33 217.60 18.30 29.72 26.45
## 4 53.78 57.04 356.54 30.30 48.97 1.11
## 5 66.44 374.60 20.95 11.98 140.71 1.98
## 6 168.54 401.20 NA 0.36 50.89 81.18
## 7 148.98 51.50 NA 12.10 333.29 26.97
## 8 55.67 NA NA 21.48 141.79 5.66
## 9 90.67 NA NA 222.81 22.41 394.45
## 10 83.98 NA NA 84.11 60.49 145.78
## 11 236.14 NA NA 4.44 180.22 14.63
## 12 59.89 NA NA 168.02 64.21 379.42
## 13 28.48 NA NA 407.38 19.04 200.75
## 14 40.08 NA NA 198.02 96.04 2.21
## 15 372.30 NA NA 46.92 72.24 7.63
## 16 4.08 NA NA 109.15 82.36 259.89
## 17 61.85 NA NA 27.96 141.76 192.78
## 18 78.30 NA NA 23.26 103.37 NA
## 19 NA NA NA 376.17 NA NA
## 20 NA NA NA NA NA NA
## 21 NA NA NA NA NA NA
Next, let's get a summary of the data to see the raw mean of each trial.
summary(Data)
## S1_T1_Time S1_T2_Time S1_T3_Time S2_T1_Time
## Min. : 4.08 Min. : 51.50 Min. : 20.95 Min. : 0.36
## 1st Qu.: 54.25 1st Qu.: 73.19 1st Qu.:139.76 1st Qu.: 16.07
## Median : 66.39 Median :278.69 Median :217.60 Median : 30.30
## Mean : 97.12 Mean :257.16 Mean :301.62 Mean : 95.81
## 3rd Qu.: 89.44 3rd Qu.:387.90 3rd Qu.:356.54 3rd Qu.:138.59
## Max. :372.30 Max. :547.77 Max. :773.25 Max. :407.38
## NA's :3 NA's :14 NA's :16 NA's :2
## S2_T2_Time S2_T3_Time
## Min. : 19.04 Min. : 1.11
## 1st Qu.: 52.01 1st Qu.: 5.66
## Median : 77.30 Median : 26.97
## Mean : 99.27 Mean :106.73
## 3rd Qu.:141.50 3rd Qu.:192.78
## Max. :333.29 Max. :394.45
## NA's :3 NA's :4
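The NA's row shows that the trials differ a lot in how many observations they contain. As a minimal sketch, the non-missing counts can be read off directly with `colSums(!is.na(...))`. The small data frame `trials` below is a stand-in built from two of the printed columns so the example runs without Data.csv; on the real data the same call would be applied to `Data`.

```r
# Stand-in for two columns of Data, copied from the printed output above
# (21 rows each, with the trailing rows missing).
trials <- data.frame(
  S1_T2_Time = c(547.77, 278.69, 89.33, 57.04, 374.60, 401.20, 51.50, rep(NA, 14)),
  S1_T3_Time = c(773.25, 139.76, 217.60, 356.54, 20.95, rep(NA, 16))
)

# Number of non-missing observations in each trial
n_obs <- colSums(!is.na(trials))
print(n_obs)
```

For these two columns this gives 7 and 5 observations, which matches 21 rows minus the NA counts in the summary.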
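Let's average the three trial means at each site to get a rough overall mean per site. Note that this weights each trial equally, even though the trials contain different numbers of observations.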
#Site 1 mean
mean(c(97.12, 257.16, 301.62))
## [1] 218.6333
#Site 2 mean
mean(c(95.81, 99.27, 106.73))
## [1] 100.6033
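Averaging trial means treats a trial with 5 observations the same as one with 18. A hedged alternative is to weight each trial mean by its number of observations, which approximates the mean of all pooled observations at a site. The counts below are taken as 21 rows minus the NA's reported by `summary()`, and the means are the rounded values from that output, so the results are approximate.

```r
# Trial means as printed by summary(Data)
site1_means <- c(97.12, 257.16, 301.62)
site2_means <- c(95.81, 99.27, 106.73)

# Observations per trial: 21 rows minus the NA counts from summary(Data)
site1_n <- 21 - c(3, 14, 16)
site2_n <- 21 - c(2, 3, 4)

# Means weighted by trial size (approximately the pooled site means)
site1_pooled <- weighted.mean(site1_means, site1_n)
site2_pooled <- weighted.mean(site2_means, site2_n)
print(c(Site1 = site1_pooled, Site2 = site2_pooled))
# roughly 168.5 for site 1 and 100.4 for site 2
```

Site 1 still comes out higher than site 2, but the gap is smaller than the unweighted figures suggest, because the largest site 1 trial has the lowest mean.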
Going by these raw means alone, site 1 appears to have a much longer average time between pollinator sightings than site 2, but we need to analyse the data further.
Let's pool the observations from all three trials at each site, dropping missing values, so that the data are divided into site 1 and site 2.
Site1 <- na.omit(c(Data$S1_T1_Time, Data$S1_T2_Time, Data$S1_T3_Time))
Site2 <- na.omit(c(Data$S2_T1_Time, Data$S2_T2_Time, Data$S2_T3_Time))
# Put the data into one data frame grouped by site
Site <- c(rep("Site1", length(Site1)), rep("Site2", length(Site2)))
df <- data.frame(Site = Site,
                 Value = c(Site1, Site2))
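As an alternative sketch, the same long layout can be built with `stack()`, which also keeps track of which trial each value came from. The data frame `wide` is a small stand-in using rows 4 to 7 of two printed columns, and `long` and the derived `Site` labels ("S1", "S2") are names chosen here for illustration, not part of the original analysis.

```r
# Stand-in for part of Data: rows 4 to 7 of two printed columns
wide <- data.frame(
  S1_T3_Time = c(356.54, 20.95, NA, NA),
  S2_T3_Time = c(1.11, 1.98, 81.18, 26.97)
)

# stack() returns one row per cell, with columns `values` and `ind`
long <- stack(wide)

# Drop the missing observations
long <- long[!is.na(long$values), ]

# Derive the site label from the column name, e.g. "S1_T3_Time" -> "S1"
long$Site <- sub("_.*$", "", as.character(long$ind))

# Group sizes per site
print(table(long$Site))
```

Checking the group sizes this way is a quick sanity test that no observations were lost or duplicated when pooling.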
Now, to test our hypothesis, we run a one-way ANOVA with site as the only factor. The model has no covariate, so this is an ANOVA rather than an ANCOVA.
anova_result <- aov(Value ~ Site, data = df)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## Site 1 89562 89562 4.609 0.0348 *
## Residuals 82 1593510 19433
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This yields a p-value of 0.0348, which is below the conventional 0.05 threshold, so we reject the null hypothesis that the mean times at site 1 and site 2 are equal. In other words, the data support the claim that, in this experiment, site had an effect on the time taken to find pollinators.
Let's graph this data.
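With only two groups, the F test from `aov()` is equivalent to a two-sample t-test that assumes equal variances, so the two p-values should agree. The sketch below illustrates this and shows one way to pull the p-value out of the summary table. It uses a small subset of the printed data (the first five rows of the third trial at each site), so its p-value is not the 0.0348 reported above; `toy`, `fit`, `p_anova` and `p_ttest` are names introduced here for illustration.

```r
# Illustrative subset of the printed data, not the full data set
site1 <- c(773.25, 139.76, 217.60, 356.54, 20.95)
site2 <- c(71.77, 1.75, 26.45, 1.11, 1.98)

toy <- data.frame(
  Site  = rep(c("Site1", "Site2"), times = c(length(site1), length(site2))),
  Value = c(site1, site2)
)

# Same model form as above: one factor, no covariate
fit <- aov(Value ~ Site, data = toy)

# The first element of the summary is the ANOVA table; take the p-value for Site
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# Pooled-variance two-sample t-test on the same data
p_ttest <- t.test(Value ~ Site, data = toy, var.equal = TRUE)$p.value

# The two p-values match up to numerical tolerance
print(all.equal(p_anova, p_ttest))
```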
hist(Site1, col = "purple", xlab = "Time in Seconds", main = "Histogram of Non-planted Site")
hist(Site2, col = "yellow", xlab = "Time in Seconds", main = "Histogram of Planted Site")
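Two separate `hist()` calls each choose their own bins and axis limits, which can make the sites hard to compare by eye. One possible refinement, sketched here on a small subset of the printed data (the first five rows of the third trial at each site), is to share the breaks and the y-axis range and draw the panels side by side. The vectors `site1` and `site2` are stand-ins; with the real data, `Site1` and `Site2` would take their place.

```r
# Illustrative subset of the printed data, not the full data set
site1 <- c(773.25, 139.76, 217.60, 356.54, 20.95)
site2 <- c(71.77, 1.75, 26.45, 1.11, 1.98)

# Common bin edges covering both samples
breaks <- pretty(range(c(site1, site2)), n = 10)

# Compute the histograms without drawing, to find a shared y-axis limit
h1 <- hist(site1, breaks = breaks, plot = FALSE)
h2 <- hist(site2, breaks = breaks, plot = FALSE)
ymax <- max(h1$counts, h2$counts)

# Draw both panels side by side on identical axes
old_par <- par(mfrow = c(1, 2))
plot(h1, col = "purple", ylim = c(0, ymax),
     xlab = "Time in Seconds", main = "Non-planted Site")
plot(h2, col = "yellow", ylim = c(0, ymax),
     xlab = "Time in Seconds", main = "Planted Site")
par(old_par)
```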