Research methods - Assignment 1

2025-03-18

Dan Palmer

Study 1 - Habitat use by small rodents

H0

The rodent uses habitats in proportion to their availability within the environment.

H1

The rodent shows preference in habitat selection, rather than using habitats in proportion to their availability within the environment.

Suitable statistical test

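A Chi-square goodness-of-fit test is suitable because it compares the observed number of rodent locations in each habitat with the number expected if use were proportional to habitat area.

Equation used for the Chi-square test: \[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \] where O_i is the observed and E_i the expected number of rodent locations in habitat i.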

Chi-Square test: Step by step

#load necessary packages

library(readxl)
library(knitr)

#load excel sheet containing rodent data

Book1 <- read_excel("Book1.xlsx")

#format and load table

Table 1: Overview of the area (km2) of each habitat type and number of rodent locations recorded within them

kable(Book1)
Habitat Type Area (km2) Number of rodent locations
Primary Forest 4 17
Secondary growth forest 2 2
Natural meadow 3 15
Recent clear cut 1 4
Recently burned 1 8
Alpine tundra 2 5
Agricultural land 2 4

#read the data for analysis (skip = 1 skips the first row of the sheet, not a column; the column headers are reset below)

data <- read_excel("Book1.xlsx", sheet = "Sheet1", skip = 1)

#set new column headers

colnames(data) <- c("Habitat_Type", "Area_km2", "Rodent_Locations")

#calculate expected frequencies

data$Expected <- (data$Area_km2 / sum(data$Area_km2)) * sum(data$Rodent_Locations)

#set observed and expected variables

observed <- data$Rodent_Locations
expected <- data$Expected

#perform chisq test and print output

chisq_test <- chisq.test(x = observed, p = expected / sum(expected))
print(chisq_test)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 13.114, df = 6, p-value = 0.04127
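
As a check on the output above, the statistic can be reproduced directly from the Chi-square equation. The sketch below assumes the values of Table 1 are typed in by hand rather than read from Book1.xlsx; the vector names are chosen here for illustration only.

```r
# Values re-typed from Table 1 (assumption: entered by hand, not read from Book1.xlsx)
area     <- c(4, 2, 3, 1, 1, 2, 2)      # habitat area (km2)
observed <- c(17, 2, 15, 4, 8, 5, 4)    # rodent locations per habitat

# Expected counts if locations were proportional to habitat area
expected <- area / sum(area) * sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all habitats
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of habitat categories minus one
df <- length(observed) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 3)   # 13.114
df                 # 6
round(p_value, 5)  # 0.04127
```

The three values agree with the chisq.test() output, which suggests the expected frequencies were passed to the test as intended.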

Chi square test output evaluation

The X-squared statistic of the Chi-square test is 13.114 with 6 degrees of freedom.

This gave a p-value of 0.04127.

At the p <0.05 threshold, this indicates a statistically significant effect.
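
Because the p-value is close to 0.05, it may be worth checking how well the Chi-square approximation holds. A common rule of thumb is that expected counts should be at least 5 in each category. The sketch below assumes the Table 1 values are entered by hand, counts the categories that fall below that level, and then uses the simulate.p.value argument of chisq.test() to estimate the p-value by Monte Carlo simulation. The seed and the number of replicates (B) are arbitrary choices, and the simulated p-value will vary slightly with them.

```r
# Values re-typed from Table 1 (assumption: entered by hand)
area     <- c(4, 2, 3, 1, 1, 2, 2)      # habitat area (km2)
observed <- c(17, 2, 15, 4, 8, 5, 4)    # rodent locations per habitat

# Null-hypothesis probabilities and the matching expected counts
probs    <- area / sum(area)
expected <- probs * sum(observed)

# Which habitats have an expected count below 5?
small <- expected < 5
sum(small)  # 2 (the two 1 km2 habitats, expected count about 3.67 each)

# Monte Carlo p-value as a cross-check on the chi-square approximation
set.seed(42)  # arbitrary seed, only for reproducibility
sim_test <- chisq.test(x = observed, p = probs,
                       simulate.p.value = TRUE, B = 10000)
sim_test$p.value
```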

Because the p-value is below 0.05, the null hypothesis is rejected at the 5% significance level, although the result is close to the threshold.

It is likely that the rodent shows preference in habitat selection, rather than using them in proportion to their availability within the environment.

Discussion

Figure 1: Recently burned forest, with its small area (1 km2) and high number of rodent locations (8), was the most used habitat in proportion to its size.
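
The proportional use described in Figure 1 can be checked numerically. The sketch below is one possible way to do this, assuming the Table 1 values are entered by hand. It computes rodent locations per km2 and the ratio of observed to expected counts for each habitat; the data frame and column names are illustrative and not part of the original analysis.

```r
# Values re-typed from Table 1 (assumption: entered by hand)
habitat  <- c("Primary Forest", "Secondary growth forest", "Natural meadow",
              "Recent clear cut", "Recently burned", "Alpine tundra",
              "Agricultural land")
area     <- c(4, 2, 3, 1, 1, 2, 2)      # habitat area (km2)
observed <- c(17, 2, 15, 4, 8, 5, 4)    # rodent locations per habitat

# Expected counts under use proportional to area
expected <- area / sum(area) * sum(observed)

use <- data.frame(
  Habitat                = habitat,
  Locations_per_km2      = observed / area,
  Observed_over_expected = observed / expected  # > 1 means used more than expected
)

# Habitats ordered from most to least used per unit area
use[order(-use$Locations_per_km2), ]

# Habitat with the highest number of locations per km2
use$Habitat[which.max(use$Locations_per_km2)]  # "Recently burned"
```

A ratio above 1 indicates a habitat used more than its area alone would predict, and a ratio below 1 indicates one used less.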

Recently burned forest may encourage seed dispersal. This is likely to attract rodents, perhaps explaining its use by the rodent in the study (Puig-Gironès, 2023). Furthermore, invertebrates that are normally hidden by leaf litter and dense vegetation may become more accessible after a fire, making them available for foraging by small rodents.

References

Dytham, C. (2011) Choosing and Using Statistics: A Biologist’s Guide. 3rd edn, pp. 199–210.

Morales-Diaz, S.P., Alvarez-Anorve, M.Y., Zamora-Espinoza, M.E., Dirzo, R., Oyama, K. and Avila-Cabadilla, L.D. (2019) ‘Rodent community responses to vegetation and landscape changes in early successional stages of tropical dry forest’, Forest Ecology and Management, 433, pp. 633–644.

Puig-Gironès, R. (2023) ‘Can predators influence small rodent foraging activity rates immediately after wildfires?’, International Journal of Wildland Fire, 32, pp. 1391–1403.

_______________________________________________________________________________-

Study 2 - How does habitat quality affect the population size of a species?

H0

Habitat quality has no effect on the population size of a species.

H1

Habitat quality does have an effect on the population size of a species.

Check for normality

To determine which statistical analysis is appropriate for the provided data, a Shapiro-Wilk normality test was performed on each variable.

#load and view dataset from Excel

quality <- read_excel("quality.xlsx")

Table 2: Overview of habitat quality indices (Quality_index) and population size of species (Species_size)

kable(quality)
Quality_index Species_size
0.60 450
0.55 350
0.80 750
0.85 850
0.95 1000
0.25 150
0.70 600
0.80 750
0.40 200
0.90 950

#inspect structure and summary of data set

str(quality)
## tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
##  $ Quality_index: num [1:10] 0.6 0.55 0.8 0.85 0.95 0.25 0.7 0.8 0.4 0.9
##  $ Species_size : num [1:10] 450 350 750 850 1000 150 600 750 200 950
summary(quality)
##  Quality_index     Species_size 
##  Min.   :0.2500   Min.   : 150  
##  1st Qu.:0.5625   1st Qu.: 375  
##  Median :0.7500   Median : 675  
##  Mean   :0.6800   Mean   : 605  
##  3rd Qu.:0.8375   3rd Qu.: 825  
##  Max.   :0.9500   Max.   :1000

#perform Shapiro-Wilk test for normality

shapiro.test(quality$`Quality_index`)
## 
##  Shapiro-Wilk normality test
## 
## data:  quality$Quality_index
## W = 0.93181, p-value = 0.4659
shapiro.test(quality$`Species_size`)
## 
##  Shapiro-Wilk normality test
## 
## data:  quality$Species_size
## W = 0.93423, p-value = 0.4907
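
The two tests above can also be run in one step and compared against the 0.05 threshold. The sketch below assumes the values of Table 2 are typed in by hand instead of read from quality.xlsx; the helper names p_values and looks_normal are illustrative.

```r
# Values re-typed from Table 2 (assumption: entered by hand, not read from quality.xlsx)
quality <- data.frame(
  Quality_index = c(0.60, 0.55, 0.80, 0.85, 0.95, 0.25, 0.70, 0.80, 0.40, 0.90),
  Species_size  = c(450, 350, 750, 850, 1000, 150, 600, 750, 200, 950)
)

# Run the Shapiro-Wilk test on every column and keep only the p-values
p_values <- sapply(quality, function(x) shapiro.test(x)$p.value)
round(p_values, 4)  # Quality_index 0.4659, Species_size 0.4907 (as in the output above)

# p > 0.05 means no significant departure from normality was detected
looks_normal <- p_values > 0.05
looks_normal
```

With only 10 observations the test has limited power, so the Q-Q plots remain a useful visual cross-check.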

#check for normality using Q-Q plots

qqnorm(quality$Quality_index)
qqline(quality$Quality_index, col = "red")

Figure 2: Normal Q-Q plot of habitat quality index showing the quantiles of the data plotted against the quantiles of a normal distribution. Points close to the diagonal line suggest a normal distribution

qqnorm(quality$Species_size)
qqline(quality$Species_size, col = "red")

Figure 3: Normal Q-Q plot of population sizes showing the quantiles of the data plotted against the quantiles of a normal distribution. Points close to the diagonal line suggest a normal distribution

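Choosing a statistical test

Both Shapiro-Wilk tests returned p > 0.05 (0.4659 for Quality_index and 0.4907 for Species_size), so neither variable deviates significantly from a normal distribution. A parametric test, Pearson's correlation, was therefore chosen.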

The Pearson correlation coefficient: Output

output <- cor.test(x = quality$Quality_index, 
                   y = quality$Species_size, 
                   alternative = "two.sided")
print(output)
## 
##  Pearson's product-moment correlation
## 
## data:  quality$Quality_index and quality$Species_size
## t = 14.784, df = 8, p-value = 4.312e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9239259 0.9959234
## sample estimates:
##       cor 
## 0.9821867
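
To show where the t value and p-value in this output come from, the sketch below recomputes them from the correlation coefficient using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. It assumes the Table 2 values are typed in by hand; the variable names are illustrative.

```r
# Values re-typed from Table 2 (assumption: entered by hand)
quality <- data.frame(
  Quality_index = c(0.60, 0.55, 0.80, 0.85, 0.95, 0.25, 0.70, 0.80, 0.40, 0.90),
  Species_size  = c(450, 350, 750, 850, 1000, 150, 600, 750, 200, 950)
)

n <- nrow(quality)                                      # number of paired observations
r <- cor(quality$Quality_index, quality$Species_size)   # Pearson's r

# Test statistic for H0: true correlation equals 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(r, 4)         # 0.9822
round(t_stat, 3)    # 14.784
signif(p_value, 4)  # 4.312e-07
```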

Pearson’s correlation coefficient output evaluation

Linear regression analysis

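Figure 4: Regression analysis of the relationship between habitat quality and the population size of the species.

The following model is used: \[ A = \beta_0 + \beta_1 B + \epsilon \] where A is the population size (Species_size), B is the habitat quality index (Quality_index), β0 is the intercept, β1 is the slope and ϵ is the error term.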

output<-lm(Species_size ~ Quality_index, data=quality)
mysummary<-summary(output)
mysummary
## 
## Call:
## lm(formula = Species_size ~ Quality_index, data = quality)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -83.85 -35.11 -12.98  34.95 111.11 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -290.24      63.53  -4.568  0.00183 ** 
## Quality_index  1316.52      89.05  14.784 4.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60.79 on 8 degrees of freedom
## Multiple R-squared:  0.9647, Adjusted R-squared:  0.9603 
## F-statistic: 218.6 on 1 and 8 DF,  p-value: 4.312e-07
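
One way to read the fitted coefficients is to use them for prediction. The sketch below refits the same model on the Table 2 values (typed in by hand as an assumption) and predicts the population size for a hypothetical habitat with a quality index of 0.75, a value chosen here only for illustration.

```r
# Values re-typed from Table 2 (assumption: entered by hand)
quality <- data.frame(
  Quality_index = c(0.60, 0.55, 0.80, 0.85, 0.95, 0.25, 0.70, 0.80, 0.40, 0.90),
  Species_size  = c(450, 350, 750, 850, 1000, 150, 600, 750, 200, 950)
)

# Same model as above: population size as a linear function of habitat quality
fit <- lm(Species_size ~ Quality_index, data = quality)
round(coef(fit), 2)  # intercept -290.24, slope 1316.52

# Hypothetical new habitat (0.75 is an illustrative value inside the observed range)
new_habitat <- data.frame(Quality_index = 0.75)

# Predicted population size with a 95% confidence interval for the mean response
pred <- predict(fit, newdata = new_habitat, interval = "confidence")
pred  # the "fit" column is about 697
```

The negative intercept means the line predicts a negative population size as quality approaches 0, so predictions are only sensible within the observed range of quality indices (0.25 to 0.95).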

Discussion

_______________________________________________________________________________-

Study 3 - Does the temperature in July differ among three weather stations (Helsinki, Hyytiälä, and Kittilä) in different parts of Finland?

H0

The temperature in July does not differ between the 3 weather stations.

H1

The temperature in July differs between at least two of the 3 weather stations.

Testing for normality

To determine whether the July temperatures at each station are normally distributed, I performed a Shapiro-Wilk test for each station, followed by a histogram to visualize the distributions.

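#load dplyr for filter() and the %>% pipe (TemperatureComparison_1_ is assumed to have been imported already)

library(dplyr)

#filter for all data recorded in July

TemperatureComparison_1_ <- TemperatureComparison_1_ %>% filter(Month == 7)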
shapiro.test(TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Helsinki Kumpula"])
## 
##  Shapiro-Wilk normality test
## 
## data:  TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Helsinki Kumpula"]
## W = 0.99528, p-value = 0.02216
shapiro.test(TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Juupajoki Hyytiälä"])
## 
##  Shapiro-Wilk normality test
## 
## data:  TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Juupajoki Hyytiälä"]
## W = 0.99075, p-value = 0.0001351
shapiro.test(TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Kittilä Pokka"])
## 
##  Shapiro-Wilk normality test
## 
## data:  TemperatureComparison_1_$Temperature[TemperatureComparison_1_$Station == "Kittilä Pokka"]
## W = 0.9719, p-value = 9.187e-11
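
The three per-station tests above can also be written as a single grouped call. Because the temperature dataset itself is not reproduced in this report, the sketch below uses simulated toy data: the data frame toy, its station labels and its values are hypothetical stand-ins, and only the column names Station and Temperature follow the real dataset.

```r
# Hypothetical toy data standing in for the July records (NOT the real measurements)
set.seed(1)  # arbitrary seed, only for reproducibility
toy <- data.frame(
  Station     = rep(c("Station A", "Station B", "Station C"), each = 50),
  Temperature = c(rnorm(50, mean = 18, sd = 3),
                  rnorm(50, mean = 16, sd = 3),
                  rnorm(50, mean = 13, sd = 3))
)

# Run the Shapiro-Wilk test separately for each station and keep the p-values
p_values <- tapply(toy$Temperature, toy$Station,
                   function(x) shapiro.test(x)$p.value)
p_values

# Stations where normality is rejected at the 0.05 level
names(p_values)[p_values < 0.05]
```

In the real output above, W is close to 1 for every station even though the p-values are small. With large samples the Shapiro-Wilk test can flag quite modest departures from normality, which is one reason the histogram is worth inspecting as well.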
#plot histogram of July temperatures by station

library(ggplot2)
library(wesanderson)
ggplot(TemperatureComparison_1_, aes(x = Temperature, fill = Station)) +
    geom_histogram(bins = 30, alpha = 0.7) +
    scale_fill_manual(values = wes_palette("Darjeeling1", n = 3, type = "discrete")) +
    labs(title = "Temperature variability", x = "Temperature", y = "Observation") +
    theme_classic()

Figure 5: Histogram showing variability in temperature across three weather stations: Helsinki Kumpula (red), Juupajoki Hyytiälä (green), and Kittilä Pokka (yellow)

Choosing a statistical test

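Because the Shapiro-Wilk tests returned p < 0.05 for all three stations, the temperature data deviate from a normal distribution. A non-parametric Kruskal-Wallis test is therefore suitable to investigate whether temperatures differ across the three weather stations.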

Performing the Kruskal-Wallis test

kruskal_result <- kruskal.test(Temperature ~ Station, data = TemperatureComparison_1_)
print(kruskal_result)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Temperature by Station
## Kruskal-Wallis chi-squared = 278.26, df = 2, p-value < 2.2e-16

Since a significant effect was found, I performed Dunn’s post-hoc test to assess which pairs of weather stations differ.

#perform Dunn’s post-hoc test with Bonferroni correction

library(dunn.test)
dunn_result <- dunn.test(TemperatureComparison_1_$Temperature, TemperatureComparison_1_$Station, method = "bonferroni")
##   Kruskal-Wallis rank sum test
## 
## data: x and group
## Kruskal-Wallis chi-squared = 278.26, df = 2, p-value = 0
## 
## 
##                            Comparison of x by group                            
##                                  (Bonferroni)                                  
## Col Mean-|
## Row Mean |   Helsinki   Juupajok
## ---------+----------------------
## Juupajok |   11.06154
##          |    0.0000*
##          |
## Kittilä  |   16.34143   5.257851
##          |    0.0000*    0.0000*
## 
## alpha = 0.05
## Reject Ho if p <= alpha/2
print(dunn_result)
## $chi2
## [1] 278.26
## 
## $Z
## [1] 11.061544 16.341434  5.257851
## 
## $P
## [1] 9.637511e-29 2.503082e-60 7.287413e-08
## 
## $P.adjusted
## [1] 2.891253e-28 7.509245e-60 2.186224e-07
## 
## $comparisons
## [1] "Helsinki Kumpula - Juupajoki Hyytiälä"
## [2] "Helsinki Kumpula - Kittilä Pokka"     
## [3] "Juupajoki Hyytiälä - Kittilä Pokka"
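
The relationship between $P and $P.adjusted in this output can be reproduced with base R. With the Bonferroni method each raw p-value is multiplied by the number of comparisons (3 here) and capped at 1. The sketch below copies the raw p-values from the output above; the vector names are illustrative.

```r
# Raw p-values copied from the $P element of the Dunn's test output above
p_raw <- c(9.637511e-29, 2.503082e-60, 7.287413e-08)
names(p_raw) <- c("Helsinki Kumpula - Juupajoki Hyytiälä",
                  "Helsinki Kumpula - Kittilä Pokka",
                  "Juupajoki Hyytiälä - Kittilä Pokka")

# Bonferroni adjustment: multiply by the number of comparisons, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf

# The same adjustment written out by hand
p_manual <- pmin(1, p_raw * length(p_raw))

# All three pairwise comparisons remain far below 0.05 after adjustment
all(p_bonf < 0.05)  # TRUE
```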

#visualising post-hoc test

ggplot(TemperatureComparison_1_, aes(x = Station, y = Temperature, fill = Station)) +
    geom_boxplot(alpha = 0.7) +  # Adjust alpha for transparency
    scale_fill_manual(values = wes_palette("Darjeeling1", n = 3, type = "discrete"))

Figure 6: Boxplot showing the distribution (median and interquartile range) of July temperature at the three weather stations: Helsinki Kumpula (red), Juupajoki Hyytiälä (green), and Kittilä Pokka (yellow)

Dunn’s test evaluation

Discussion

Table 3: Latitude of the 3 weather stations, which may explain the differences identified by Dunn’s post-hoc test

Weather station Latitude (°N)
Helsinki Kumpula 60.20456
Juupajoki Hyytiälä 61.84534
Kittilä Pokka 68.15895

References

Dytham, C. (2011) Choosing and Using Statistics: A Biologist’s Guide. 3rd edn, pp. 199–210.

Kim, T.K. and Park, J.H. (2019) ‘More about the basic assumptions of t-test: normality and sample size’, Korean Journal of Anesthesiology, 72(4), pp. 331–335.