Finding problem matrices

First, load the required R packages.

library(Rcompadre)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Then download the database.

compadre <- cdb_fetch("Compadre")

## This is COMPADRE version 6.23.5.0 (release date May_06_2023)
## See user agreement at https://compadre-db.org/Help/UserAgreement
## See how to cite with `citation(Rcompadre)`

Wherever possible, the COMPADRE database splits the complete A matrix into the submatrices U, F and C. These submatrices represent growth/survival, sexual reproduction and clonal reproduction, respectively.

In this example I want to find matrices that have a problem in the U matrix. Specifically, I want to find cases where stage-specific survival (the column sum of U) is recorded as 0 or as 1. Such values are unrealistic and are likely caused by sampling error. Some analyses may not work with these matrices, so it is a good idea to examine them carefully.
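To make the link between the U matrix and survival concrete, here is a minimal sketch using a made-up three-stage matrix. The object toy_U and its values are hypothetical and do not come from COMPADRE; the point is only that each column of U holds the fates of one stage, so its column sum is that stage's survival.

```r
# Hypothetical 3-stage U matrix (not from COMPADRE).
# matrix() fills column by column: each group of three values below
# is one column, i.e. the fates of individuals in one stage.
toy_U <- matrix(
  c(0.2, 0.3, 0.0,   # stage 1: survival 0.5
    0.0, 0.0, 0.0,   # stage 2: survival 0 (suspicious)
    0.0, 0.0, 1.0),  # stage 3: survival 1 (suspicious)
  nrow = 3,
  dimnames = list(paste0("U", 1:3), paste0("U", 1:3))
)

# Stage-specific survival is the column sum of U
stage_survival <- colSums(toy_U)
stage_survival
```

Stages 2 and 3 in this toy matrix are the kind of case the rest of this example sets out to find.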

Before proceeding, check out the documentation for the function cdb_flag, which examines the data for common problems and flags them in columns added to the database.

First I extract the U matrices into a list using matU.

U_matrices <- matU(compadre)

I can look at an individual matrix using double square-bracket indexing. For example, here I look at the 10th matrix.

U_matrices[[10]]

##      U1   U2   U3   U4  U5
## U1 0.38 0.00 0.00 0.00 0.0
## U2 0.11 0.18 0.10 0.07 0.0
## U3 0.01 0.10 0.23 0.26 0.1
## U4 0.00 0.00 0.10 0.15 0.6
## U5 0.00 0.00 0.02 0.04 0.1
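Because the U matrices are stored in an ordinary R list, the usual list-indexing rules apply. The short sketch below uses a hypothetical two-element list (toy_list, a stand-in for U_matrices) to show the difference between double and single square brackets.

```r
# Hypothetical list of two small matrices standing in for U_matrices
toy_list <- list(diag(0.5, 2), diag(0.9, 3))

one_matrix <- toy_list[[2]]  # double brackets: the matrix itself
sub_list   <- toy_list[2]    # single brackets: a list of length 1

is.matrix(one_matrix)
is.list(sub_list)
```

Functions such as colSums expect the matrix itself, so double brackets are what is needed when inspecting a single entry.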

Next I need to write a small function that examines a single matrix and tests whether it has a problem. The function could be changed to identify other issues, but in this case it checks whether any column sum of the U matrix is equal to 0 or to 1.

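problemFinderFunction <- function(m) {
  # Column sums of U are the stage-specific survival probabilities
  column_sums <- colSums(m)
  # Flag the matrix if any stage has survival of exactly 0 or exactly 1
  problem_detected <- any(column_sums == 0) || any(column_sums == 1)
  return(problem_detected)
}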
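One caveat with exact comparison: column sums are floating-point numbers, so a survival that should be 1 may be stored as a value very slightly different from 1, and a matrix with missing entries can return NA rather than TRUE or FALSE. A hedged variant that compares within a tolerance might look like the sketch below. The name problemFinderTolerant and the default tol = 1e-8 are illustrative assumptions; they are not part of Rcompadre, and this variant is not the one used to produce the results shown later.

```r
# Sketch: like problemFinderFunction, but tolerant of rounding error.
# `tol` is an assumed tolerance; adjust it to the precision of your data.
problemFinderTolerant <- function(m, tol = 1e-8) {
  column_sums <- colSums(m)
  # A matrix with missing entries cannot be assessed; report NA explicitly
  if (anyNA(column_sums)) {
    return(NA)
  }
  near_zero <- abs(column_sums) < tol
  near_one  <- abs(column_sums - 1) < tol
  any(near_zero | near_one)
}

# Hypothetical matrix whose first column sums to just under 1
almost_one <- matrix(c(0.5, 0.5 - 1e-12, 0.1, 0.2), nrow = 2)
problemFinderTolerant(almost_one)
```

Whether a tolerance is appropriate depends on how the published values were rounded, so it is worth comparing the output of both versions before settling on one.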

It is a good idea to check the function on matrices whose properties are known. First, a matrix with no problem columns:

U_matrices[[10]]

##      U1   U2   U3   U4  U5
## U1 0.38 0.00 0.00 0.00 0.0
## U2 0.11 0.18 0.10 0.07 0.0
## U3 0.01 0.10 0.23 0.26 0.1
## U4 0.00 0.00 0.10 0.15 0.6
## U5 0.00 0.00 0.02 0.04 0.1

colSums(U_matrices[[10]])

##   U1   U2   U3   U4   U5 
## 0.50 0.28 0.45 0.52 0.80

problemFinderFunction(U_matrices[[10]])

## [1] FALSE

And second, a matrix in which several column sums are zero:

U_matrices[[1]]

##       U1 U2 U3 U4 U5 U6 U7 U8 U9
## U1 0.077  0  0  0  0  0  0  0  0
## U2 0.037  0  0  0  0  0  0  0  0
## U3 0.003  0  0  0  0  0  0  0  0
## U4 0.000  0  0  0  0  0  0  0  0
## U5 0.000  0  0  0  0  0  0  0  0
## U6 0.000  0  0  0  0  0  0  0  0
## U7 0.000  0  0  0  0  0  0  0  0
## U8 0.000  0  0  0  0  0  0  0  0
## U9 0.000  0  0  0  0  0  0  0  0

colSums(U_matrices[[1]])

##    U1    U2    U3    U4    U5    U6    U7    U8    U9 
## 0.117 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

problemFinderFunction(U_matrices[[1]])

## [1] TRUE
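When a matrix is flagged, it can be useful to know which stages triggered the flag. The sketch below is one possible helper; problemStages and toy_m are hypothetical names introduced here for illustration, and the helper uses the same exact comparison as above.

```r
# Sketch: return the positions (and names, if present) of the columns
# whose sum is exactly 0 or exactly 1. which() ignores NA values.
problemStages <- function(m) {
  column_sums <- colSums(m)
  which(column_sums == 0 | column_sums == 1)
}

# Hypothetical 2-stage matrix: the second column sums to 0
toy_m <- matrix(
  c(0.2, 0.3, 0.0, 0.0),
  nrow = 2,
  dimnames = list(c("U1", "U2"), c("U1", "U2"))
)
problemStages(toy_m)
```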

When you are sure that it works correctly, you can apply the function to the whole list of matrices using sapply. The result is a logical vector (TRUE/FALSE), with one element per matrix, indicating whether each matrix has a problem.

problemMatrix <- sapply(U_matrices, problemFinderFunction)
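As an alternative to sapply, base R's vapply states the expected return type up front and stops with an error if the function ever returns something else. The sketch below uses hypothetical stand-ins (toy_matrices and hasProblem) so that it runs without downloading the database.

```r
# Hypothetical stand-in for U_matrices
toy_matrices <- list(
  matrix(c(0.2, 0.3, 0.1, 0.4), nrow = 2),  # column sums 0.5 and 0.5
  matrix(c(0.2, 0.3, 0.0, 0.0), nrow = 2)   # second column sums to 0
)

# Same check as problemFinderFunction, repeated here so the
# block is self-contained
hasProblem <- function(m) {
  column_sums <- colSums(m)
  any(column_sums == 0) || any(column_sums == 1)
}

# vapply() requires every result to be a single logical value
toy_flags <- vapply(toy_matrices, hasProblem, logical(1))
toy_flags

# Number and proportion of flagged matrices
sum(toy_flags)
mean(toy_flags)
```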

You could add this indicator vector as an additional column to the original COMPADRE database metadata like this.

compadre_metadata <- cdb_metadata(compadre)
compadre_metadata <- cbind(compadre_metadata, problemMatrix)

Then you can filter compadre_metadata in the usual way with dplyr.

problemData <- compadre_metadata %>%
  filter(problemMatrix) %>%
  select(SpeciesAuthor, Authors, Journal, YearPublication, DOI_ISBN) %>%
  as_tibble()

problemData

## # A tibble: 4,685 × 5
##    SpeciesAuthor                       Authors  Journal YearPublication DOI_ISBN
##    <chr>                               <chr>    <chr>   <chr>           <chr>   
##  1 Abies_balsamea                      "Silver… Am Nat  1999            10.1086…
##  2 Abies_balsamea                      "Silver… Am Nat  1999            10.1086…
##  3 Agave_angustifolia                  "Arias-… Bot Sci 2016            10.1712…
##  4 Ascophyllum_nodosum_3               "Aberg"  Mar Ec… 1990            10.3354…
##  5 Astragalus_australis_var._olympicus "Kaye"   <NA>    1990            <NA>    
##  6 Astragalus_australis_var._olympicus "Kaye"   <NA>    1990            <NA>    
##  7 Betula_pendula                      "Maille… J Appl… 1982            10.2307…
##  8 Carex_membranacea                   "Tolvan… J Veg … 2001            10.2307…
##  9 Carex_membranacea                   "Tolvan… J Veg … 2001            10.2307…
## 10 Cassia_nemophila                    "Siland… Oecolo… 1983            10.1007…
## # ℹ 4,675 more rows
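Note that dplyr's filter drops rows where the condition is NA, so any matrix that could not be assessed does not appear in the table above. The base R sketch below, using a hypothetical three-row metadata table (toy_metadata and toy_flags are made-up names and values), shows one way to check that the flag vector lines up with the metadata and to list the unassessed rows separately.

```r
# Hypothetical metadata and flags (not from COMPADRE)
toy_metadata <- data.frame(
  SpeciesAuthor   = c("Species_a", "Species_b", "Species_c"),
  YearPublication = c("1990", "2001", "2016")
)
toy_flags <- c(TRUE, FALSE, NA)

# The flag vector must have one element per metadata row
stopifnot(length(toy_flags) == nrow(toy_metadata))
toy_metadata$problemMatrix <- toy_flags

# Rows flagged as problems; which() skips NA, as filter() does
flagged <- toy_metadata[which(toy_metadata$problemMatrix), ]

# Rows that could not be assessed (flag is NA)
unassessed <- toy_metadata[is.na(toy_metadata$problemMatrix), ]

flagged
unassessed
```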

Owen Jones

2023-07-26