This is an analysis of light trap data from the Copenhagen Zoological Museum. I started by loading the data into a dataframe.
df <- read.csv("C:/Users/egdos/Desktop/ADAFall24/insects.csv", header=TRUE)
Next, I created a histogram of the number of records in each year, to see whether the number of records increased or decreased over time. Note that this counts records (rows in the data), not individual insects.
hist(df$year, xlab="Year",ylab="# of Records",main="Record Counts by Year")
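By default, hist() chooses its own bin breaks, so a single bar may cover more than one year. A minimal sketch of an alternative, assuming a numeric year column like df$year, is to tabulate the years directly. Using factor levels for every year in the range means that years with no records still appear as zero-height bars. The `years` vector below is hypothetical toy data, not values from the dataset.

```r
# Hypothetical toy years standing in for df$year
years <- c(1992, 1992, 1993, 1995)

# One level per calendar year, so a year with no records (1994) shows up as 0
per_year <- table(factor(years, levels = min(years):max(years)))
per_year

# One bar per year; with the real data, use df$year in place of years
barplot(per_year, xlab = "Year", ylab = "# of Records",
        main = "Record Counts by Year")
```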
Then, I created a bar plot showing the number of records for each insect order in the dataset.
ord_frq <- table(df$order)
barplot(ord_frq,ylab="Records",main="Appearances of Insect Orders")
The plot shows that most records are of Lepidoptera (butterflies and moths).
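To put a number on how dominant Lepidoptera are, the bar heights can be expressed as shares of all records. This is a minimal sketch on hypothetical toy data; with the real data, `prop.table(ord_frq)` would take the place of `prop.table(table(orders))`.

```r
# Hypothetical toy orders standing in for df$order
orders <- c("LEPIDOPTERA", "LEPIDOPTERA", "LEPIDOPTERA", "COLEOPTERA")

# Share of records per order, largest first
order_share <- sort(prop.table(table(orders)), decreasing = TRUE)

# Percentage of records per order
round(100 * order_share, 1)
```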
Turning to the counts themselves, I first found the mean number of insects captured per record, along with the standard deviation. The standard deviation (about 59.7) was more than ten times the mean (about 5.7), which suggested a few extreme values, so I looked for the maximum. It was a bizarre-looking record of 8,396 insects that at first glance seemed to be a single day's catch. On closer examination, its date1 and date2 fields (7/11/09 to 7/20/09) show that it covers 9 days of collection. I then searched for other records with counts of 1,000 or more, and found that most of them (9 of 11) were moths of the genus Yponomeuta (family Yponomeutidae).
mean(df$individuals)
## [1] 5.714072
sd(df$individuals)
## [1] 59.73492
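For non-negative counts, a standard deviation around ten times the mean usually signals a strongly right-skewed distribution in which a few extreme records dominate. The sketch below first computes that ratio from the two values reported above. It then uses a hypothetical toy vector to show why the median is a more robust summary in this situation.

```r
# Values reported above for df$individuals
reported_mean <- 5.714072
reported_sd   <- 59.73492
reported_sd / reported_mean   # about 10.5

# Hypothetical toy counts with one extreme record
toy <- c(1, 1, 1, 2, 2, 3, 5, 8396)
mean(toy)     # 1051.375, pulled up by the single extreme value
median(toy)   # 2, unaffected by it

# With the real data, the analogous checks would be:
# median(df$individuals)
# quantile(df$individuals, c(0.5, 0.9, 0.99))
```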
max(df$individuals)
## [1] 8396
df[df$individuals==8396,]
## order family name year date1 date2
## 36978 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta padella L. 2009 7/11/09 7/20/09
## individuals
## 36978 8396
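Because date1 and date2 span several days for some records, raw counts are not directly comparable between records. This is a minimal sketch, assuming the dates are stored as month/day/two-digit-year strings as in the output above. It converts them with as.Date() and divides the count by the length of the collection period. The per-day rate is an illustrative measure only, not one used elsewhere in this analysis. R's `%y` maps 00 to 68 to the 2000s and 69 to 99 to the 1900s, which fits dates such as 7/9/99 and 7/20/09.

```r
# The record shown above (row 36978), re-entered by hand
rec <- data.frame(date1 = "7/11/09", date2 = "7/20/09", individuals = 8396)

# Parse month/day/two-digit-year strings into Date objects
start <- as.Date(rec$date1, format = "%m/%d/%y")
end   <- as.Date(rec$date2, format = "%m/%d/%y")

# Days between the two dates (records with date1 == date2 would give 0 here)
days <- as.numeric(end - start)   # 9
rec$individuals / days            # about 933 insects per day
```

With the full data, the same conversion could be applied to whole columns, for example `as.Date(df$date1, format = "%m/%d/%y")`.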
df[df$individuals>=1000,]
## order family name year date1
## 23645 LEPIDOPTERA PYRALIDAE Acentria ephemerella D. & S. 2005 7/12/05
## 36532 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2000 7/21/00
## 36568 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2003 7/21/03
## 36578 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2004 7/30/04
## 36579 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2004 8/2/04
## 36593 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2005 7/12/05
## 36612 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2006 7/14/06
## 36613 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2006 7/21/06
## 36614 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta evonymella L. 2006 7/28/06
## 36978 LEPIDOPTERA YPONOMEUTIDAE Yponomeuta padella L. 2009 7/11/09
## 43046 COLEOPTERA STAPHYLINIDAE Anotylus rugosus (Fabricius) 1999 7/9/99
## date2 individuals
## 23645 7/14/05 1656
## 36532 7/24/00 3074
## 36568 7/24/03 1131
## 36578 8/1/04 1000
## 36579 8/3/04 2125
## 36593 7/14/05 1284
## 36612 7/20/06 3174
## 36613 7/27/06 4872
## 36614 8/3/06 3543
## 36978 7/20/09 8396
## 43046 7/11/99 1281
I suspect these moths have extremely prolific breeding seasons, especially Y. evonymella, which appears 8 times in the list. All of these large catches began in July or early August.
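The 9-of-11 and 8-times counts can be checked programmatically rather than by eye. The species names below are copied from the output above. With the real data, `table(df$name[df$individuals >= 1000])` would produce the same species tally directly.

```r
# Names of the 11 records with 1,000+ individuals, copied from the output above
big_names <- c("Acentria ephemerella D. & S.",
               rep("Yponomeuta evonymella L.", 8),
               "Yponomeuta padella L.",
               "Anotylus rugosus (Fabricius)")

# Number of large records per species
table(big_names)

# Number of large records belonging to the genus Yponomeuta
sum(grepl("^Yponomeuta ", big_names))   # 9
```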
For my statistical tests, I started by checking the distribution of a set I wanted to run a t-test on: for each order, how many times each particular number of individuals had been recorded (a frequency table built with table()). Although this was likely not normal, I wanted to check.
lep_count<-table(df$individuals[df$order=="LEPIDOPTERA"])
col_count<-table(df$individuals[df$order=="COLEOPTERA"])
shapiro.test(lep_count)
##
## Shapiro-Wilk normality test
##
## data: lep_count
## W = 0.095871, p-value < 2.2e-16
shapiro.test(col_count)
##
## Shapiro-Wilk normality test
##
## data: col_count
## W = 0.1287, p-value < 2.2e-16
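It may help to spell out what lep_count and col_count contain. For each distinct value of individuals, table() returns how many records have that value. The tests therefore operate on these frequencies, not on the numbers of insects per record. Here is a toy illustration with a hypothetical vector:

```r
# Hypothetical individuals-per-record values for one order
individuals <- c(1, 1, 1, 2, 2, 5)

freq <- table(individuals)
freq              # count 1 was recorded 3 times, 2 twice, 5 once
as.vector(freq)   # c(3, 2, 1): the numbers shapiro.test() and wilcox.test() see
names(freq)       # "1" "2" "5": the distinct counts themselves

# The mean of the table is records / distinct values, here 6 / 3 = 2
mean(freq)
```

By the same reasoning, mean(col_count) is the number of Coleoptera records divided by the number of distinct count values observed for Coleoptera.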
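As suspected, neither distribution was normal. Because these sets record how often each count occurs rather than the counts themselves, the non-normality is not caused by the records of 1,000 or more individuals seen earlier; each of those values occurs only once. Since a t-test was therefore not appropriate, I computed the mean number of collection instances for each order, followed by a Wilcoxon rank-sum test to compare the two groups.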
mean(col_count)
## [1] 56.50427
mean(lep_count)
## [1] 164.3728
wilcox.test(col_count,lep_count)
##
## Wilcoxon rank sum test with continuity correction
##
## data: col_count and lep_count
## W = 12598, p-value = 0.3761
## alternative hypothesis: true location shift is not equal to 0
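The difference was not significant (W = 12598, p = 0.3761), despite the large gap between the two means (56.5 vs. 164.4). Note that the Wilcoxon rank-sum test compares the ranks of the values (a location shift) rather than their means, so a few very large frequencies can inflate one mean without much effect on the test.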
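To see why the means and the test can disagree, recall that the Wilcoxon rank-sum test uses only the ranks of the pooled values. Making the largest value far larger changes the mean but leaves W and the p-value unchanged. Here is a small demonstration with hypothetical vectors:

```r
# Hypothetical samples with no ties, so R computes an exact p-value
a <- c(1, 3, 5, 7)
b <- c(2, 4, 6, 100)
b_extreme <- c(2, 4, 6, 100000)   # same ranks as b, much larger mean

w1 <- wilcox.test(a, b)
w2 <- wilcox.test(a, b_extreme)

c(mean(b), mean(b_extreme))       # 28 vs 25003
c(w1$statistic, w2$statistic)     # W = 6 in both cases
c(w1$p.value, w2$p.value)         # identical p-values
```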
Finally, I made a plot to begin examining a key question for this data: whether insect numbers have decreased over time, possibly due to climate change. This simple scatter plot shows no clear correlation. The only points that rise well above the dense band near zero are from more recent years, including the Yponomeuta outbreaks described above.
plot(df$year,df$individuals,xlab="Year",ylab="Individuals per Record",main="Individuals per Record by Year")
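A scatter plot of individual records is dominated by the few extreme catches. One hedged way to look for a trend is to total the individuals caught per year and compute a rank (Spearman) correlation between year and yearly total. The sketch below uses hypothetical toy data; with the real data, `df` would replace `toy`. Yearly totals also depend on sampling effort (how many records or trapping days each year has), and this sketch does not adjust for that.

```r
# Hypothetical toy records standing in for df (one row per record)
toy <- data.frame(
  year        = c(2000, 2000, 2001, 2002, 2002, 2003, 2004),
  individuals = c(5, 3, 4, 10, 2, 6, 1)
)

# Total individuals caught per year (one row per year, sorted by year)
yearly <- aggregate(individuals ~ year, data = toy, FUN = sum)
yearly

# Spearman's rank correlation: negative values suggest a decline over time
rho <- cor(yearly$year, yearly$individuals, method = "spearman")
rho   # -0.5 for this toy data

plot(yearly$year, yearly$individuals, type = "b",
     xlab = "Year", ylab = "Total individuals")
```

A rank correlation is used here because, as seen above, a handful of outbreak years could otherwise dominate a Pearson correlation.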