Function of the day

substr()

Useful when you work with strings. For intensive string handling I would recommend the stringi package, but if you just need to extract part of a string, substr() is the right choice.

The syntax is simple. You need to specify:
x - the string from which you want to extract something,
start - the position in the string where your extraction starts, and
stop - the position in the string where your extraction ends. Easy-peasy!
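A minimal sketch of the call on two made-up strings (the variable names `bird` and `birds` are only for illustration):

```r
# A single string to cut pieces from
bird <- "Greenfinch"

# Characters 1 to 5
substr(bird, start = 1, stop = 5)
# [1] "Green"

# Characters 6 to 10
substr(bird, start = 6, stop = 10)
# [1] "finch"

# substr() is vectorised: the same positions are applied to every element
birds <- c("Skylark", "Merlin")
substr(birds, start = 1, stop = 3)
# [1] "Sky" "Mer"
```

Positions are counted from 1, and both start and stop are included in the result.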

Off we go!

Let's say you have a long list of files (n = 100), for example audio recordings from your ornithological survey. You have been naming these files in a very smart way: you included date_time-of-the-day_site_species. So you can easily find recordings from a given day or species based solely on the file name. But now you just need a brief summary of the number of skylark recordings for each day.

This may be your data frame with all file names (myfilesnames):

I will not explain in detail how this data frame is created, as it is not the focus of the current topic (it is also not the most elegant way to do it). So do not think about it too much, and if you do not understand it, just copy, paste and execute it in R. You basically need the final product of the code below in order to apply the cool function, substr().

dat <- rep(c("2017-04-07", "2017-04-08", "2017-04-09", "2017-04-10", "2017-04-11"), each = 20)
tim <- rep(c("AM", "PM"), each = 50)
sit <- rep(c("Site1", "Site2"), each = 50)

set.seed(127)
spe <- sample(c("Skylark", "Merlin", "Greenfinch", "GreatTit", "Chiffchaff"), 100, replace = TRUE)

myfilesnames <- data.frame(fileID = paste(dat, tim, sit, spe, sep = "_"))

With the list of files in your myfilesnames data frame, you can now do what you wanted: quickly figure out how many skylarks you recorded each day.
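Before extracting anything, it may help to see where each field sits in a file name. The sketch below uses a single hypothetical name (`one_file`, built the same way as fileID above) and locates the underscores that separate the fields:

```r
# A hypothetical file name built the same way as fileID above
one_file <- paste("2017-04-07", "AM", "Site1", "Skylark", sep = "_")

# Positions of the underscore separators in the name
separators <- as.integer(gregexpr("_", one_file, fixed = TRUE)[[1]])
separators
# [1] 11 14 20

# Total number of characters, i.e. the last position in the string
nchar(one_file)
# [1] 27
```

Under this naming scheme the date occupies positions 1 to 10 and the species name starts at position 21, right after the third underscore. This only holds as long as the date, time and site fields keep a fixed width.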

So, what you actually have to do now is to:
1) extract day and species from the data frame,
2) calculate the number of records per species per day, and
3) filter for the skylark

library(dplyr) # It will be way easier with the help of dplyr

myfilesnames %>%
  # Simple here: apply the function of the day to extract the day info
  mutate(Day = substr(fileID, start = 1, stop = 10),
         # Almost the same, but the ending position varies because species
         # names differ in length, so we use a little trick: nchar() in place
         # of the ending position in substr(). nchar() counts the characters
         # in the string, which is also the last position in the string.
         Species = substr(fileID,
                          start = 21,
                          stop = nchar(as.character(fileID)))) %>%
  # Group by day and species
  group_by(Day, Species) %>%
  # Count with n() all the recordings per group (day/species)
  summarise(Nrecordings = n()) %>%
  # Get the result for the species of interest
  filter(Species == "Skylark")

## # A tibble: 5 x 3
## # Groups:   Day [5]
##   Day        Species Nrecordings
##   <chr>      <chr>         <int>
## 1 2017-04-07 Skylark           5
## 2 2017-04-08 Skylark           1
## 3 2017-04-09 Skylark           6
## 4 2017-04-10 Skylark           5
## 5 2017-04-11 Skylark           3
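If you prefer to stay in base R, the same idea can be sketched without dplyr. The example below is only an illustration on three made-up file names (the vector `files` is hypothetical, not the data frame above). It also shows that substring(), a close relative of substr(), reads to the end of the string when its last argument is left out, which is an alternative to the nchar() trick:

```r
# Three hypothetical file names with species names of different lengths
files <- c("2017-04-07_AM_Site1_Skylark",
           "2017-04-07_AM_Site1_Merlin",
           "2017-04-08_PM_Site2_Skylark")

# The day always sits in positions 1 to 10
day <- substr(files, start = 1, stop = 10)

# nchar() gives the last position of each string, so stop adapts to each name
species <- substr(files, start = 21, stop = nchar(files))

# substring() defaults its last argument to a very large number,
# so leaving it out also reads to the end of each string
identical(species, substring(files, 21))
# [1] TRUE

# Count the recordings per day for one species with table()
skylark_per_day <- table(day[species == "Skylark"])
skylark_per_day
```

Here `skylark_per_day` holds one count per day on which a skylark was recorded, playing the same role as the Nrecordings column above.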

There are only a few things in the world that don’t change… and one of those is that R is great!


Katarzyna Wojczulanis-Jakubas

11 April 2018
