Shared Workbook
Group Work (Lynx)
Introducing the data
Question: What trends and comparisons can be seen in lynx populations across the 19th and 20th centuries?
Background: Lynx (a genus containing four distinct species) are medium-sized mountain cats found in forested terrain across Europe, North America and Asia. This study sampled population sizes at 70 sites across the 19th and 20th centuries to monitor growth and/or decline in these regions.
Understanding the data: This data set has 3 variables: “id”, giving the study site id; “lynx”, the total number of lynx captured in that area; and “century”, the century in which the lynx were captured. ID and Century are categorical data types, as their values fall into a defined set of categories, whereas lynx is a discrete numerical data type, taking whole-number counts. Each variable has a total of 70 values.
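As a quick check of these data types, the sketch below builds a small made-up data frame with the same three column names and inspects it. The name `lynx_example` and all of its values are invented for illustration; the real data live in `Lynx_Dataset_Formative_`, whose columns may be stored differently.

```r
# Hypothetical stand-in for the real dataset (values are invented)
lynx_example <- data.frame(
  id      = c("A1", "A2", "A3", "A4"),
  lynx    = c(12L, 7L, 30L, 18L),
  century = c(19, 19, 20, 20)
)

# Treat id and century as categorical variables
lynx_example$id      <- factor(lynx_example$id)
lynx_example$century <- factor(lynx_example$century)

# Confirm the type of each column
str(lynx_example)

# Counts are discrete: every value is a whole number
all(lynx_example$lynx == round(lynx_example$lynx))
```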
```{r}
1 + 1
```
```{r}
library(tidyverse)
# install.packages("psych") # run once if psych is not already installed
library(psych)
describe(Lynx_Dataset_Formative_) # it doesn't compare the 19th and 20th centuries at all
Lynx_Dataset_Formative_
glimpse(Lynx_Dataset_Formative_)
```
The table below shows a descriptive summary of all the values within the lynx variable. From this we can gather useful basic information about the lynx populations:
```{r}
describe(Lynx_Dataset_Formative_$lynx)
```
To view this data better, it seems sensible to separate the 19th and 20th centuries into two columns. This will allow us to better summarise and describe the data, compare values and spot trends.
```{r}
Lynx_Dataset_Formative_ %>%
  mutate(
    cent19 = ifelse(century == 19, lynx, NA),
    cent20 = ifelse(century == 20, lynx, NA)
  )
```
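An alternative to splitting the counts into two columns is to summarise `lynx` within each century directly. The following is a minimal sketch in base R on a made-up data frame (`lynx_example` and its values are assumptions, not the study data); the same call should work on `Lynx_Dataset_Formative_` if its columns are named as described above.

```r
# Hypothetical example data (values are invented for illustration)
lynx_example <- data.frame(
  id      = 1:8,
  lynx    = c(10, 14, 9, 21, 12, 18, 7, 25),
  century = rep(c(19, 20), each = 4)
)

# Summary statistics of the lynx counts within each century
century_summary <- aggregate(
  lynx ~ century,
  data = lynx_example,
  FUN  = function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)
print(century_summary)
```

This gives one row per century, which makes the two groups directly comparable without creating columns that are half `NA`.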
Visualizing the data
### Boxplot
Box plot: the first attempt left out the fill colour, which the second plot adds.
```{r}
# First attempt: no fill colour
ggplot(Lynx_Dataset_Formative_, aes(x = factor(century), y = lynx)) +
  geom_boxplot() +
  labs(title = "Lynx Captures Across Centuries",
       x = "Century",
       y = "Number of Lynx Captures") +
  theme_minimal()

# Second attempt: fill each box by century
ggplot(Lynx_Dataset_Formative_, aes(x = factor(century), y = lynx, fill = factor(century))) +
  geom_boxplot() +
  scale_fill_manual(values = c("19" = "lightblue", "20" = "lightpink")) +
  labs(title = "Lynx Captures Across Centuries",
       x = "Century",
       y = "Number of Lynx Captures") +
  theme_minimal()
```
### Density graph
```{r}
ggplot(Lynx_Dataset_Formative_, aes(x = lynx, fill = factor(century))) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("19" = "cyan", "20" = "red")) +
  labs(title = "Lynx Captures by Century",
       x = "Number of Lynx Captures",
       y = "Density",
       fill = "Century") +
  theme_minimal()
```
```{r}
head(Lynx_Dataset_Formative_)
```
### Line graph
```{r}
ggplot(Lynx_Dataset_Formative_,
       aes(x = id, y = lynx, group = century, color = as.factor(century))) +
  geom_line() +
  geom_point() +
  labs(title = "Line graph", x = "ID", y = "Lynx Count", color = "Century") +
  theme_minimal()
```
```{r}
# install.packages("ggplot2") # Uncomment if not already installed
library(ggplot2)

# Convert century to factor if it's not already
Lynx_Dataset_Formative_$century <- as.factor(Lynx_Dataset_Formative_$century)

# Create the plot
ggplot(Lynx_Dataset_Formative_, aes(x = id, y = lynx, fill = century)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("19" = "lightblue", "20" = "lightgreen")) +
  labs(x = "ID", y = "Lynx Count", fill = "Century") +
  theme_minimal()
```
```{r}
# Display the first few rows and a summary of the dataset
head(Lynx_Dataset_Formative_)
summary(Lynx_Dataset_Formative_)

# Check column names
names(Lynx_Dataset_Formative_)

# Load ggplot2 library for plotting
library(ggplot2)

# Create a scatter plot of lynx population by century
ggplot(Lynx_Dataset_Formative_, aes(x = century, y = lynx)) +
  geom_point(size = 3, color = "blue") +
  labs(
    title = "Lynx Population by Century",
    x = "Century",
    y = "Lynx Population"
  ) +
  theme_minimal()
```
Stats testing
t-test: This is a two-sample t-test comparing the lynx population samples from the 19th and 20th centuries. The t-test was chosen because it is designed to compare two sets of data, using their means to evaluate whether there is a significant difference. The data we’re comparing (captures in the 19th and 20th centuries) are treated as independent of one another, which is an assumption of the two-sample t-test.
```{r}
t.test(lynx ~ century, data = Lynx_Dataset_Formative_)
```
The p-value is above the 0.05 significance level, so we fail to reject the null hypothesis that the two centuries have the same mean. In other words, the difference seen between the centuries is not statistically significant and is consistent with random sampling variation rather than a real change.
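To make the test concrete: `t.test()` runs Welch's two-sample t-test by default, which compares the group means without assuming equal variances. The sketch below computes the Welch statistic by hand on made-up counts and checks it against `t.test()`. The data frame `lynx_example` and its values are invented for illustration and are not the study data.

```r
# Hypothetical example data (values are invented for illustration)
lynx_example <- data.frame(
  lynx    = c(10, 14, 9, 21, 16, 12, 18, 7, 25, 20),
  century = factor(rep(c(19, 20), each = 5))
)

x19 <- lynx_example$lynx[lynx_example$century == "19"]
x20 <- lynx_example$lynx[lynx_example$century == "20"]

# Standard error of the difference in means (variances not pooled)
se_diff <- sqrt(var(x19) / length(x19) + var(x20) / length(x20))

# Welch t statistic: difference in means divided by its standard error
t_manual <- (mean(x19) - mean(x20)) / se_diff

# Built-in test for comparison (Welch is the default, var.equal = FALSE)
welch <- t.test(lynx ~ century, data = lynx_example)

# The two statistics should agree
c(manual = t_manual, built_in = unname(welch$statistic))
welch$p.value
```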
```{r}
# Perform Wilcoxon Rank-Sum Test to compare lynx populations between centuries
wilcox.test(lynx ~ century, data = Lynx_Dataset_Formative_, exact = FALSE)
```
```{r}
## Chi-squared test: see if there is an association between century and lynx counts
library(dplyr)

# Create a contingency table of century against binned lynx counts
contingency_table <- table(
  Lynx_Dataset_Formative_$century,
  cut(Lynx_Dataset_Formative_$lynx, breaks = 5) # Adjust breaks as needed
)

# Print the contingency table
print(contingency_table)

# Perform Chi-squared test
chi_squared_result <- chisq.test(contingency_table)

# Print the result
print(chi_squared_result)
```
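One caveat with this approach is that the chi-squared approximation becomes unreliable when expected cell counts are small, which can easily happen when 70 observations are spread over two centuries and five bins. The sketch below runs on simulated counts (`lynx_example`, the Poisson mean of 15, the even 35/35 split and the seed are all arbitrary assumptions) and shows how to inspect the expected counts and, as one possible alternative, request a Monte Carlo p-value through the `simulate.p.value` argument of `chisq.test()`.

```r
# Simulated stand-in for the real data (not the study values)
set.seed(1)
lynx_example <- data.frame(
  lynx    = rpois(70, lambda = 15),
  century = factor(rep(c(19, 20), each = 35))
)

# Same binning as above: century against five equal-width count bins
tab <- table(lynx_example$century, cut(lynx_example$lynx, breaks = 5))

# Expected counts under independence; values below about 5 are a warning sign
expected <- suppressWarnings(chisq.test(tab))$expected
print(round(expected, 1))

# Monte Carlo p-value, which does not rely on the large-sample approximation
mc_result <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(mc_result)
```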
Hypothetical next steps:
We can conclude that there is no strong evidence that lynx populations changed significantly between the 19th and 20th centuries.
To improve the study, increase the number of samples taken, either by sampling more areas or by sampling the same areas repeatedly over time. A larger sample would represent the populations better and give the tests more power to detect a real difference.
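One way to put a number on "more samples" is a power calculation. The sketch below uses `power.t.test()` from R's built-in stats package; the difference to detect (`delta = 5` lynx) and the standard deviation (`sd = 10`) are placeholder assumptions, not estimates from this dataset, and should be replaced with values informed by the data.

```r
# Sites needed per century to detect a mean difference of `delta`
# with 80% power at the 5% significance level (two-sided, two-sample t-test)
plan <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)

# Round up, since a fraction of a site cannot be sampled
n_per_century <- ceiling(plan$n)
n_per_century
```

A smaller `delta` or a larger `sd` pushes the required sample size up, which is why a realistic estimate of both matters before planning further sampling.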