Shared Workbook

Author

Group

Group Work (Lynx)

Question: What trends and comparisons can be seen in lynx populations across the 19th and 20th centuries?

Background: Lynx (a genus containing 4 distinct species) are medium-sized mountain cats found in forested terrain across Europe, North America and Asia. This study looked at population changes by sampling population sizes at 70 sites across the 19th and 20th centuries, to monitor growth and/or decline in these regions.

Understanding the data: This data set has 3 variables: “id”, the study site ID; “lynx”, the total number of lynx captured at that site; and “century”, the century in which the lynx were captured. “id” and “century” are categorical, as their values fall into a fixed set of categories (although “century” is stored as an integer in the raw data), whereas “lynx” is a discrete numerical variable that takes whole-number counts. Each variable has 70 values.

Load data

Change the path to match your file structure

Lynx_Dataset_Formative_ <- read.delim("~/Downloads/linx")

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ purrr::%||%()   masks base::%||%()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#install.packages("psych")  
library(psych)


Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha

describe(Lynx_Dataset_Formative_) # summarises all 70 rows together, so it does not compare the 19th and 20th centuries

        vars  n    mean      sd median trimmed     mad min  max range skew
id*        1 70   35.50   20.35   35.5   35.50   25.95   1   70    69 0.00
lynx       2 70 1668.07 1689.30  904.0 1423.89 1087.49  39 6991  6952 1.24
century    3 70   19.50    0.50   19.5   19.50    0.74  19   20     1 0.00
        kurtosis     se
id*        -1.25   2.43
lynx        1.03 201.91
century    -2.03   0.06

glimpse(Lynx_Dataset_Formative_)

Rows: 70
Columns: 3
$ id      <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "…
$ lynx    <int> 3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1…
$ century <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19…

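The table below shows a descriptive summary of all the values in the lynx variable. From this we can gather basic information about the lynx populations, for example that the mean (1668.07) is well above the median (904), so the counts are right-skewed (skew = 1.24):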

describe(Lynx_Dataset_Formative_$lynx)

   vars  n    mean     sd median trimmed     mad min  max range skew kurtosis
X1    1 70 1668.07 1689.3    904 1423.89 1087.49  39 6991  6952 1.24     1.03
       se
X1 201.91

To view this data more clearly, it seems sensible to separate the 19th and 20th centuries into two columns. This will allow us to summarise and describe the data better, compare values and spot trends.

Lynx_Dataset_Formative_ %>%
  mutate(
    cent19 = ifelse(century == 19, lynx, NA),
    cent20 = ifelse(century == 20, lynx, NA)
  )

    id lynx century cent19 cent20
1   A1 3311      19   3311     NA
2   A2 6721      19   6721     NA
3   A3 4254      19   4254     NA
4   A4  687      19    687     NA
5   A5  255      19    255     NA
6   A6  473      19    473     NA
7   A7  358      19    358     NA
8   A8  784      19    784     NA
9   A9 1594      19   1594     NA
10 A10 1676      19   1676     NA
11 A11 2251      19   2251     NA
12 A12 1426      19   1426     NA
13 A13  756      19    756     NA
14 A14  299      19    299     NA
15 A15  201      19    201     NA
16 A16  229      19    229     NA
17 A17  469      19    469     NA
18 A18  736      19    736     NA
19 A19 2042      19   2042     NA
20 A20 2811      19   2811     NA
21 A21 4431      19   4431     NA
22 A22 2511      19   2511     NA
23 A23  389      19    389     NA
24 A24   73      19     73     NA
25 A25   39      19     39     NA
26 A26   49      19     49     NA
27 A27   59      19     59     NA
28 A28  188      19    188     NA
29 A29  377      19    377     NA
30 A30 1292      19   1292     NA
31 A31 4031      19   4031     NA
32 A32 3495      19   3495     NA
33 A33  587      19    587     NA
34 A34  105      19    105     NA
35 A35  153      19    153     NA
36  B1  387      20     NA    387
37  B2  758      20     NA    758
38  B3 1307      20     NA   1307
39  B4 3465      20     NA   3465
40  B5 6991      20     NA   6991
41  B6 6313      20     NA   6313
42  B7 3794      20     NA   3794
43  B8 1836      20     NA   1836
44  B9  345      20     NA    345
45 B10  382      20     NA    382
46 B11  808      20     NA    808
47 B12 1388      20     NA   1388
48 B13 2713      20     NA   2713
49 B14 3800      20     NA   3800
50 B15 3091      20     NA   3091
51 B16 2985      20     NA   2985
52 B17 3790      20     NA   3790
53 B18  674      20     NA    674
54 B19   81      20     NA     81
55 B20   80      20     NA     80
56 B21  108      20     NA    108
57 B22  229      20     NA    229
58 B23  399      20     NA    399
59 B24 1132      20     NA   1132
60 B25 2432      20     NA   2432
61 B26 3574      20     NA   3574
62 B27 2935      20     NA   2935
63 B28 1537      20     NA   1537
64 B29  529      20     NA    529
65 B30  485      20     NA    485
66 B31  662      20     NA    662
67 B32 1000      20     NA   1000
68 B33 1590      20     NA   1590
69 B34 2657      20     NA   2657
70 B35 3396      20     NA   3396
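Once the centuries are separated, a per-century summary is the natural next step. The sketch below is one possible way to do it in base R, not necessarily the approach the group intended. It assumes the 70 counts are re-entered by hand from the table above into a data frame called `lynx_df` (an illustrative name), and it uses `split()` rather than the NA-padded columns.

```r
# Counts re-entered from the table printed above; lynx_df is an illustrative name.
lynx_df <- data.frame(
  id = c(paste0("A", 1:35), paste0("B", 1:35)),
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

# Split the counts into one vector per century (a list with elements "19" and "20").
by_century <- split(lynx_df$lynx, lynx_df$century)

# Compute the same summary statistics for each century; the result is a matrix
# with one column per century and one row per statistic.
century_summary <- sapply(by_century, function(x) {
  c(n = length(x), mean = mean(x), median = median(x),
    sd = sd(x), min = min(x), max = max(x))
})

print(round(century_summary, 1))
```

The two columns of `century_summary` can then be compared side by side, which the whole-data `describe()` call cannot do.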

Visualizing the data

Boxplot

Box plot, first attempt (colour not yet added):

ggplot(Lynx_Dataset_Formative_, 
       aes(x = factor(century), y = lynx)) + 
  geom_boxplot() + 
  labs(title = "Lynx Captures Across Centuries", x = "Century", y = "Number of Lynx Captures") +
  theme_minimal()
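The first attempt above has no colour. A minimal sketch of a coloured version follows, assuming the same cyan/red palette used in the density graph and a hand-entered copy of the data called `lynx_df` (an illustrative name); the plot object name `box_coloured` is also just for illustration.

```r
library(ggplot2)

# Counts re-entered from the workbook's data table; lynx_df is an illustrative name.
lynx_df <- data.frame(
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

# Mapping fill to the century factor gives each box its own colour.
box_coloured <- ggplot(lynx_df,
                       aes(x = factor(century), y = lynx, fill = factor(century))) +
  geom_boxplot() +
  scale_fill_manual(values = c("19" = "cyan", "20" = "red")) +
  labs(title = "Lynx Captures Across Centuries",
       x = "Century", y = "Number of Lynx Captures", fill = "Century") +
  theme_minimal()

box_coloured
```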

Density graph

ggplot(Lynx_Dataset_Formative_, aes(x = lynx, fill = factor(century))) +
   geom_density(alpha = 0.5) +
   scale_fill_manual(values = c("19" = "cyan", "20" = "red")) +
   labs(title = "Lynx Captures by Century",
        x = "Number of Lynx Captures",
        y = "Density",
        fill = "Century") +
   theme_minimal()

Line graph

ggplot(Lynx_Dataset_Formative_, aes(x = id, y = lynx, group = century, color = as.factor(century))) +
  geom_line() +   
  geom_point() +   
  labs(title = "Line graph", x = "ID", y = "Lynx Count", color = "Century") +   
  theme_minimal()

Bar graph

# Convert century to factor if it's not already
Lynx_Dataset_Formative_$century <- as.factor(Lynx_Dataset_Formative_$century)

ggplot(Lynx_Dataset_Formative_, aes(x = id, y = lynx, fill = century)) + 
  geom_bar(stat = "identity") + 
  scale_fill_manual(values = c("19" = "lightblue", "20" = "lightgreen")) + 
  labs(x = "ID", y = "Lynx Count", fill = "Century") +
  theme_minimal()

# Display the first few rows and a summary of the dataset
head(Lynx_Dataset_Formative_)

  id lynx century
1 A1 3311      19
2 A2 6721      19
3 A3 4254      19
4 A4  687      19
5 A5  255      19
6 A6  473      19

summary(Lynx_Dataset_Formative_)

      id                 lynx        century
 Length:70          Min.   :  39.0   19:35  
 Class :character   1st Qu.: 378.2   20:35  
 Mode  :character   Median : 904.0          
                    Mean   :1668.1          
                    3rd Qu.:2786.5          
                    Max.   :6991.0

# Check column names
names(Lynx_Dataset_Formative_)

[1] "id"      "lynx"    "century"

# Load ggplot2 library for plotting
library(ggplot2)

# Create a scatter plot of lynx population by century
ggplot(Lynx_Dataset_Formative_, aes(x = century, y = lynx)) +
  geom_point(size = 3, color = "blue") +
  labs(
    title = "Lynx Population by Century",
    x = "Century",
    y = "Lynx Population"
  ) +
  theme_minimal()

Stats testing

t-test: This is a two-sample t-test comparing the lynx capture counts from the 19th and 20th centuries. The t-test was chosen because it is designed to compare two groups, using their means to evaluate whether there is a significant difference. The two groups (captures in the 19th and 20th centuries) are independent of one another, which is an important assumption of this test. By default, t.test() in R runs the Welch version, which does not assume equal variances in the two groups.

t.test(lynx ~ century, data = Lynx_Dataset_Formative_)


    Welch Two Sample t-test

data:  lynx by century
t = -1.3188, df = 67.704, p-value = 0.1917
alternative hypothesis: true difference in means between group 19 and group 20 is not equal to 0
95 percent confidence interval:
 -1331.3348   271.8491
sample estimates:
mean in group 19 mean in group 20 
        1403.200         1932.943

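The p-value (0.1917) is above the significance level of 0.05, so we cannot reject the null hypothesis of no difference in means; the 95% confidence interval for the difference (-1331.33 to 271.85) also includes 0. The difference between the group means (1403.2 in the 19th century and 1932.9 in the 20th) is therefore not statistically significant and could plausibly be due to random chance.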
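To see where the reported t, df and p-value come from, the Welch statistic can be rebuilt by hand from the group means and variances. This is only a sketch for checking the arithmetic: it assumes the counts are re-entered from the workbook's data table into `lynx_df` (an illustrative name), and the intermediate variable names are hypothetical.

```r
# Counts re-entered from the workbook's data table; lynx_df is an illustrative name.
lynx_df <- data.frame(
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

x <- lynx_df$lynx[lynx_df$century == 19]
y <- lynx_df$lynx[lynx_df$century == 20]

# Squared standard error of each group mean.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# Compare against R's built-in Welch test.
res <- t.test(lynx ~ century, data = lynx_df)
print(c(t = t_manual, df = df_manual, p = p_manual))
print(res)
```

The hand-computed values should agree with the t.test() output shown above.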

# Perform Wilcoxon Rank-Sum Test to compare lynx populations between centuries  
wilcox.test(lynx ~ century, data = Lynx_Dataset_Formative_, exact = FALSE)


    Wilcoxon rank sum test with continuity correction

data:  lynx by century
W = 465.5, p-value = 0.08528
alternative hypothesis: true location shift is not equal to 0
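The Wilcoxon result can be read in the same way as the t-test. The sketch below pulls the p-value out of the test object and compares it with the 0.05 level used earlier. It assumes a hand-entered copy of the data named `lynx_df` and a threshold variable `alpha_level`, both illustrative names.

```r
# Counts re-entered from the workbook's data table; lynx_df is an illustrative name.
lynx_df <- data.frame(
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

alpha_level <- 0.05  # same significance level as used for the t-test

# Rank-based test; exact = FALSE because the data contain tied values.
wilcox_result <- wilcox.test(lynx ~ century, data = lynx_df, exact = FALSE)

# TRUE would mean the difference in location is significant at alpha_level.
is_significant <- wilcox_result$p.value < alpha_level

cat("W =", unname(wilcox_result$statistic),
    " p =", signif(wilcox_result$p.value, 4),
    " significant:", is_significant, "\n")
```

With p = 0.08528 the rank-based test also falls short of significance at the 0.05 level, in line with the t-test.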

# Chi-squared test: see if there is an association between century and lynx counts
contingency_table <- table(Lynx_Dataset_Formative_$century, cut(Lynx_Dataset_Formative_$lynx, breaks = 5)) # Adjust breaks as needed
print(contingency_table)

    
     (32,1.43e+03] (1.43e+03,2.82e+03] (2.82e+03,4.21e+03] (4.21e+03,5.6e+03]
  19            23                   6                   3                  2
  20            18                   6                   9                  0
    
     (5.6e+03,7e+03]
  19               1
  20               2

chi_squared_result <- chisq.test(contingency_table)

Warning in chisq.test(contingency_table): Chi-squared approximation may be incorrect

print(chi_squared_result)


    Pearson's Chi-squared test

data:  contingency_table
X-squared = 5.9431, df = 4, p-value = 0.2034
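The warning from chisq.test() usually means some expected cell counts are small. The sketch below inspects the expected counts and then, as one possible alternative rather than the group's chosen method, runs Fisher's exact test on the same table. It assumes a hand-entered copy of the data named `lynx_df` (an illustrative name).

```r
# Counts re-entered from the workbook's data table; lynx_df is an illustrative name.
lynx_df <- data.frame(
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

# Same binning as in the workbook: five equal-width bins of the counts.
contingency_table <- table(lynx_df$century, cut(lynx_df$lynx, breaks = 5))

# Expected counts under independence; the warning is silenced here because
# inspecting these counts is exactly how we check what triggered it.
expected_counts <- suppressWarnings(chisq.test(contingency_table))$expected
print(round(expected_counts, 1))

# Cells with an expected count below 5 make the chi-squared approximation unreliable.
small_cells <- sum(expected_counts < 5)
cat("Cells with expected count below 5:", small_cells, "\n")

# Fisher's exact test does not rely on the large-sample approximation.
fisher_result <- fisher.test(contingency_table)
print(fisher_result)
```

Another option would be to use fewer bins in `cut()` so that each cell holds more observations.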

Hypothetical next steps:

We can conclude that there is no strong evidence that lynx populations changed significantly between the 19th and 20th centuries: none of the three tests gave a p-value below 0.05.
To improve the study, increase the number of samples taken, either by sampling more areas or by sampling the same areas repeatedly over time. A larger sample would represent the populations better and make it easier to detect a real difference.
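To get a rough idea of how many more samples might be needed, a power calculation can be sketched with `power.t.test()` from base R. This is a guide only, and it rests on loud assumptions: that the true difference equals the observed difference in means, that a pooled standard deviation describes both centuries, and that the counts are close enough to normal, which the skew in the data makes doubtful. The data frame `lynx_df` and the other variable names are illustrative.

```r
# Counts re-entered from the workbook's data table; lynx_df is an illustrative name.
lynx_df <- data.frame(
  lynx = c(
    3311, 6721, 4254, 687, 255, 473, 358, 784, 1594, 1676, 2251, 1426,
    756, 299, 201, 229, 469, 736, 2042, 2811, 4431, 2511, 389, 73,
    39, 49, 59, 188, 377, 1292, 4031, 3495, 587, 105, 153,
    387, 758, 1307, 3465, 6991, 6313, 3794, 1836, 345, 382, 808, 1388,
    2713, 3800, 3091, 2985, 3790, 674, 81, 80, 108, 229, 399, 1132,
    2432, 3574, 2935, 1537, 529, 485, 662, 1000, 1590, 2657, 3396
  ),
  century = rep(c(19, 20), each = 35)
)

x <- lynx_df$lynx[lynx_df$century == 19]
y <- lynx_df$lynx[lynx_df$century == 20]

# ASSUMPTION: the true effect equals the observed difference in means.
delta_obs <- abs(mean(x) - mean(y))

# ASSUMPTION: a common SD; with equal group sizes the pooled variance
# is the simple average of the two group variances.
sd_pooled <- sqrt((var(x) + var(y)) / 2)

# Solve for the per-group sample size that gives 80% power at the 5% level.
power_result <- power.t.test(delta = delta_obs, sd = sd_pooled,
                             sig.level = 0.05, power = 0.8)

n_per_group <- ceiling(power_result$n)
cat("Approximate sites needed per century:", n_per_group, "\n")
```

Under these assumptions the required number of sites per century comes out well above the 35 sampled here, which supports the suggestion to collect more data.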