I. INTRODUCTION

1. Statement

So-called “big data” are increasingly used in the life sciences because they provide large amounts of information at broad spatial scales and very fine resolution. However, these datasets can be tricky to work with: most of the time the data are presence-only records.

Volunteers or social media users take a picture of a particular species, or record its presence, and report the time and location of the sighting. What we have, therefore, is thousands of points with temporal and spatial information attached to them.

2. Objective

Create species occurrence maps of puffins and puffin watchers.

3. Data

Two datasets, one from the Global Biodiversity Information Facility (GBIF) and one from Flickr

4. Main tasks

First, we will download all the occurrences of Atlantic puffin in the UK that are in the GBIF database. Then we will do some spatial manipulation of the data attached to pictures of Atlantic puffins taken in the UK and uploaded to Flickr. Finally, we will produce density maps of both datasets to look for hotspots of puffins and/or puffin watchers.

Clear Workspace

rm(list = ls())

II. DATA ANALYSIS

1. Download puffin occurrences from GBIF

library("rgbif")

1.1 UK Code

UK_code <- isocodes[grep("United Kingdom", isocodes$name), "code"]
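
The lookup above works by pattern matching on the country name column and returning the matching code. As a minimal sketch of the same idea, the toy table below stands in for isocodes (its rows are illustrative, not the real rgbif table), so the subsetting can be checked without loading the package:

```r
# Toy stand-in for rgbif's isocodes table (hypothetical rows for illustration)
toy_codes <- data.frame(
  code = c("FR", "GB", "US"),
  name = c("France", "United Kingdom", "United States"),
  stringsAsFactors = FALSE
)

# grep() returns the row indices whose name matches the pattern;
# fixed = TRUE treats the pattern as a literal string rather than a regex
rows <- grep("United Kingdom", toy_codes$name, fixed = TRUE)
toy_code <- toy_codes[rows, "code"]
toy_code
```

If the pattern matched more than one row, the result would be a vector of several codes, so it is worth checking its length before passing it on.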

1.2 Download the occurrence records for the Atlantic puffin in the UK

Occurrence record: a data entry recording that a puffin was observed.

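# No return argument is passed: the plotting step expects the full result list,
# with the records themselves in occur$data
occur <- occ_search(scientificName = "Fratercula arctica", country = UK_code,
                    hasCoordinate = TRUE, limit = 3000, year = '2006,2016')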

This returns the occurrences of Atlantic puffin recorded in the UK between 2006 and 2016 that have geographic coordinates, up to the limit of 3000 records.
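
Because limit caps the number of rows returned, it is worth checking whether the download was truncated. The helper below is only a sketch: it assumes the result is a list with a meta$count element (total matching records) and a data element (the rows returned), which is how occ_search() output is accessed elsewhere in this analysis. The function name is made up for illustration and the mock list stands in for a real download.

```r
# Hypothetical helper: TRUE when more records match than were returned.
# Assumes res$meta$count is the total number of matches and res$data the returned rows.
is_truncated <- function(res) {
  res$meta$count > nrow(res$data)
}

# Mock result standing in for a real occ_search() call (values are invented)
mock <- list(
  meta = list(count = 5000),
  data = data.frame(decimalLongitude = c(-6.3, -4.1),
                    decimalLatitude  = c(56.4, 53.3))
)
is_truncated(mock)
```

With a real download the call would be is_truncated(occur); if it returns TRUE, raising limit or narrowing the year range would be the next step.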

1.3 Plot occurrences on a map

library(ggplot2)
library(maps)
library(ggthemes)

# The outer parentheses print the map as soon as it is created
(map <- ggplot(occur$data, aes(x = decimalLongitude, y = decimalLatitude)) +
    # Draw only the UK from the world map and set the colour and width of the country borders
    borders(database = "world", regions = "UK", colour = "gray40", size = 0.3) +
    theme_map() +
    # Plot the occurrence points; alpha sets their transparency
    geom_point(alpha = 0.4, colour = "red"))

2. Load puffin occurrences from Flickr

2.1 Load file

flickr <- read.table("/Users/admin/Desktop/Linh Data Studio/Spatial Data/Spatial Modelling/flickr_puffins.txt", header = TRUE, sep = "\t")

str(flickr)
## 'data.frame':    4939 obs. of  9 variables:
##  $ id       : num  9.11e+07 8.66e+07 4.32e+07 1.92e+08 1.17e+08 ...
##  $ owner    : chr  "69803582@N00" "31561511@N00" "53288778@N00" "36216683@N00" ...
##  $ datetaken: chr  "2006-01-25 11:28:50" "2006-01-14 21:57:42" "2006-01-31 06:02:08" "2006-02-22 16:12:29" ...
##  $ latitude : num  56.4 53.3 59.9 56.1 56.3 ...
##  $ longitude: num  -6.34 -4.06 -1.29 -2.77 -6.37 ...
##  $ page     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month    : chr  "Jan" "Jan" "Jan" "Feb" ...
##  $ year     : int  2006 2006 2006 2006 2006 2006 2006 2006 2006 2006 ...
##  $ dateonly : chr  "2006-01-25" "2006-01-14" "2006-01-31" "2006-02-22" ...
flickr1 <- flickr  # keep a copy of the original data frame
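
The str() output shows that datetaken is stored as text. If the temporal information is needed later, it can be parsed into a date-time class. A minimal sketch on a toy data frame (the rows copy the first values printed above; the UTC time zone is an assumption, since the file does not state one):

```r
# Toy rows copied from the str() output above
toy <- data.frame(
  datetaken = c("2006-01-25 11:28:50", "2006-01-14 21:57:42"),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct; tz = "UTC" is assumed, not taken from the data
toy$datetaken <- as.POSIXct(toy$datetaken, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Derive the year from the parsed timestamps
toy$year <- as.integer(format(toy$datetaken, "%Y"))
toy
```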

Note that longitude and latitude are stored as plain numeric columns. To map them, we need to convert the data frame into a spatial object whose coordinates are explicitly defined.

2.3 Plotting for a first look (data check)

The coordinates() function either sets spatial coordinates to create a Spatial object or retrieves the spatial coordinates from an existing Spatial object.

library(sp)                                           # load the package
geopics <- flickr[, c("longitude", "latitude")]       # subset the dataset to keep coordinates only
coordinates(geopics) <- c("longitude", "latitude")    # make it spatial
plot(geopics)                                         # plot it

# Find points outside the UK (south of latitude 49.9) so they can be removed
which(flickr$latitude < 49.9)
##  [1]   51  157  268  269  410  575  600  934  940  996 1158 1329 1461 1492 1568
## [16] 1684 1879 2279 2498 2872 2998 3035 3340 3388 3460 3965 4016 4173 4336 4428
## [31] 4474 4548 4552 4682
flickr <- flickr[flickr$latitude >= 49.9, ]  # keep UK rows; logical indexing also works when nothing matches
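
The latitude cut-off only removes points that lie too far south. A stricter check would test all four sides of a bounding box. The sketch below uses a toy data frame and rough limits for the UK; the box values and the helper name are assumptions for illustration and should be adjusted to the study area.

```r
# Hypothetical helper: keep rows whose coordinates fall inside a bounding box
in_bbox <- function(df, lon_min, lon_max, lat_min, lat_max) {
  keep <- df$longitude >= lon_min & df$longitude <= lon_max &
          df$latitude  >= lat_min & df$latitude  <= lat_max
  df[which(keep), , drop = FALSE]   # which() also drops rows with missing coordinates
}

# Toy points: two in the UK, one too far south, one too far west
pts <- data.frame(
  longitude = c(-6.34, -4.06, -1.50, -30.00),
  latitude  = c(56.4, 53.3, 48.0, 55.0)
)

# Rough UK limits (assumed values, not taken from the original analysis)
uk_pts <- in_bbox(pts, lon_min = -11, lon_max = 2, lat_min = 49.9, lat_max = 61)
uk_pts
```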

2.4 Plot

For the Flickr data:

class(flickr) # data frame object
## [1] "data.frame"
# Transform the data frame into a Spatial object
library(sp)                                                   # SpatialPointsDataFrame() and CRS() come from sp
xy <- flickr[, c("longitude", "latitude")]                    # coordinates must be in x (longitude), y (latitude) order
crs.geo <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")     # geographical, datum WGS84, GCS
flickr_sp <- SpatialPointsDataFrame(coords = xy, data = flickr,
                                    proj4string = crs.geo)    # assign the coordinate system on creation
plot(flickr_sp, pch = 20, col = "steelblue")                  # plot the data

Reference

Species occurrence maps based on GBIF and Flickr data