This uses R to do what Dan did in Excel, plus a couple of figures. It is a nice little exercise in using the dplyr package from the tidyverse collection of packages.

If you want to try this for yourself:


library(tidyverse)
library(here) # so you don't have to worry about working directories and file paths.
library(lubridate) # for converting the date column to type 'date'.

Read in the data

path<-here("data","Bask_dep.csv") # this is where the here package comes in handy.
basking_sharks<-read_csv(path)
#glimpse(basking_sharks)

Get the data into the form that NEODAAS requires

basking_sharks<-basking_sharks %>%
  rename(Date=`Date ymd`) %>%
  mutate(Date=ymd(Date)) %>%  # ymd() is from the lubridate package: it makes R recognise the dates AS dates.
  rename(Latitude=Lat) %>%
  rename(Longitude=Long) %>%
  select(Latitude,Longitude,Date) # just these three columns, in this order, as NEODAAS requires.
glimpse(basking_sharks)
## Rows: 23,216
## Columns: 3
## $ Latitude  <dbl> 51.69266, 51.70367, 51.70683, 51.70222, 51.72769, 50.60809,…
## $ Longitude <dbl> -5.159243, -5.151325, -5.167469, -5.269924, -5.292040, -1.1…
## $ Date      <date> 1960-06-06, 1960-06-06, 1960-06-06, 1960-06-06, 1960-06-06…
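To see what the reshaping pipeline does without the real CSV, here is a minimal sketch on a tiny made-up tibble. It assumes the raw file has columns named `Date ymd`, `Lat` and `Long`, as the `rename()` calls imply; the values and the extra `Observer` column are invented purely to show that `select()` drops anything else. The sketch uses a single `rename()` call for all three columns, which is equivalent to chaining them.

```r
library(dplyr)
library(lubridate)

# Hypothetical raw data: column names follow the renames above,
# the values and the extra Observer column are invented.
raw <- tibble(
  `Date ymd` = c("1960-06-06", "2016-02-17"),
  Lat = c(51.69266, 50.40007),
  Long = c(-5.159243, -3.480830),
  Observer = c("A", "B")
)

tidy <- raw %>%
  rename(Date = `Date ymd`, Latitude = Lat, Longitude = Long) %>% # one rename() handles all three
  mutate(Date = ymd(Date)) %>%                                    # character -> Date
  select(Latitude, Longitude, Date)                               # keep these three, in this order

tidy
```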

Number of observations by year

n_by_year <- basking_sharks %>%
  group_by(year=year(Date)) %>%  # year() function from lubridate extracts the year from a date
  summarise(sightings=n())
n_by_year
ggplot(n_by_year,aes(x=year,y=sightings))+
  geom_col()+
  xlab("Year")+
  ylab("Sightings")+
  theme_bw()
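dplyr's `count()` is shorthand for `group_by()` followed by `summarise(n())`. A minimal sketch on a few invented dates, checking that the two routes give the same counts:

```r
library(dplyr)
library(lubridate)

# Invented sighting dates, for illustration only
sightings <- tibble(Date = as.Date(c("1960-06-06", "1960-06-07", "2016-02-17")))

# The approach used above
by_group <- sightings %>%
  group_by(year = year(Date)) %>%
  summarise(sightings = n())

# The same table with count(); name = sets the name of the count column
by_count <- sightings %>%
  count(year = year(Date), name = "sightings")

by_count
```

Either form can be passed straight to `ggplot()` as above.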

Number of observations by month

n_by_month <- basking_sharks %>%
  group_by(month=month(Date)) %>% # month() function from lubridate extracts the month from a date
  summarise(sightings=n())
n_by_month
ggplot(n_by_month,aes(x=as.factor(month),y=sightings))+
  geom_col()+
  xlab("Month")+
  ylab("Sightings")+
  theme_bw()
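One option for the month axis, sketched here on invented dates, is lubridate's `month(Date, label = TRUE)`, which returns an ordered factor of month abbreviations (in the current locale) rather than the numbers 1 to 12. Because that factor always carries all 12 levels, `count(..., .drop = FALSE)` also keeps months with no sightings as zero counts instead of leaving them out.

```r
library(dplyr)
library(lubridate)

# Invented dates covering only a few months
sightings <- tibble(Date = as.Date(c("2016-02-17", "2016-03-24", "2016-03-30", "2016-06-06")))

n_by_month_labelled <- sightings %>%
  count(month = month(Date, label = TRUE), # ordered factor: Jan < Feb < ... < Dec
        name = "sightings",
        .drop = FALSE)                     # keep empty months as 0

n_by_month_labelled
```

With this version the plot can map `x = month` directly, without `as.factor()`.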

Pick out an individual year

year<-2016

data_subset<-basking_sharks %>%
  filter(year(Date)==year)
glimpse(data_subset)
## Rows: 291
## Columns: 3
## $ Latitude  <dbl> 50.40007, 52.13924, 50.39822, 50.05155, 50.41750, 51.59781,…
## $ Longitude <dbl> -3.480830, -4.624300, -3.473860, -5.534670, -5.128550, -3.9…
## $ Date      <date> 2016-02-17, 2016-03-24, 2016-03-30, 2016-04-06, 2016-04-19…
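Naming the variable `year` works here because R skips non-function objects when it looks up `year(...)`, and the data has no column called `year`. If the data ever did have such a column, data masking in `filter()` would use the column rather than the global variable. A minimal sketch with a made-up `year` column, showing how the `.env` pronoun makes the intent explicit:

```r
library(dplyr)
library(lubridate)

year <- 2016

# Invented data that already has a year column
d <- tibble(
  Date = as.Date(c("2015-07-01", "2016-07-01")),
  year = c(2015, 2016)
)

# Data masking: `year` is the column here, so every row matches its own
# year and nothing is filtered out
masked <- d %>% filter(year(Date) == year)

# .env$year refers to the variable in the calling environment (2016)
explicit <- d %>% filter(year(Date) == .env$year)

nrow(masked)
nrow(explicit)
```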

Write that year’s data out to a CSV file named bs_<year>.csv

filename<-str_c("bs_",as.character(year),".csv")
path<-here("data",filename) # using here again to simplify paths to files.
write_csv(data_subset,path)

The file will appear in the project’s data folder.
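To check the write step without touching the project folder, here is a minimal sketch that builds the same `bs_<year>.csv` name with `str_c()` (which converts the number to text itself, so `as.character()` is optional), writes a small invented stand-in for `data_subset` to a temporary directory instead of `here("data")`, and reads it back. `read_csv()` recognises ISO `yyyy-mm-dd` columns, so `Date` should come back as a date.

```r
library(readr)
library(stringr)

year <- 2016

# Invented stand-in for data_subset
data_subset <- data.frame(
  Latitude = c(50.40007, 52.13924),
  Longitude = c(-3.480830, -4.624300),
  Date = as.Date(c("2016-02-17", "2016-03-24"))
)

filename <- str_c("bs_", year, ".csv")   # "bs_2016.csv"
path <- file.path(tempdir(), filename)   # temporary location, not the project data folder
write_csv(data_subset, path)

# Read the file back to confirm the round trip
check <- read_csv(path, show_col_types = FALSE)
check
```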