Introduction

Every day, millions of geographical data in various formats are generated from many and diverse sources. Because not all softwares use the same file formats, this massive volume of data may become trapped inside specific softwares and unable to be transferred. This can be aggravating because not all data analysts utilize the same data management and analysis software, which can lead to inadequacy in the generation of valuable information.


Content Overview (Summarization of a tutorial)

Fortunately, most applications, including R, can import and export data in a variety of formats. We’ll go over some of the R packages and functions that may be used to import popular file formats (csv) into R as a Shapefile in the sections below. There is no need for external file converters with these packages and functions; R can do the work!!

Neon teaching data subset (site layout shapefiles) will be used as a base dataset for this tutorial.


Why You Should Care

This topic is important since we can save more time dealing with alternative data file formats by using these packages and functions rather than externally converting the files to be compatible with R.


Learning Objectives

Specifically, you’ll learn how to install, load and use these useful packages and functions to convert a spatial csv (Comma Separated Value) file to a shapefile in R.



Importing Data into R

Here, we’ll show more details and some examples about the packages and functions on how to import spatial points stored in csv format into R as an sf spatial object to be used for further spatial analysis and mapping.

When spatial data is recorded in a text file format (for example, csv) with a corresponding x and y location column. It’s possible to turn it into a sf spatial object. The sf object stores both the x,y values that reflect each point’s coordinate location and the related attribute data - or columns that describe each feature in the spatial object.


Installation

# Packages:

# to install:
# raster: install.packages("raster")
# rgdal: install.packages("rgdal")
# sp: install.packages("sp")
# ggplot2: install.packages("ggplot2")

library( raster ) 
library( rgdal )
library( sp )
library( ggplot2 )



1. Dataset and Working directory

# Download dataset:

# https://ndownloader.figshare.com/files/3708751

# set working directory to data folder
# setwd("pathToDirHere")

getwd()
## [1] "C:/Users/HP/Dimetry/PEDA - ASU-AGFE/Semester (4) - Spring 2022/CPP 529 - Community Analytics (Data Analytics Practicum)/Explainer Assignment (Code-Through)"
setwd( "C:/Users/HP/Dimetry/PEDA - ASU-AGFE/Semester (4) - Spring 2022/CPP 529 - Community Analytics (Data Analytics Practicum)/Explainer Assignment (Code-Through)" )


2. Import csv file

Import a csv file that contains plot coordinate x, y locations at the NEON Harvard Forest Field Site (HARV_PlotLocations.csv) and look at the structure of that new object.

# Read the csv file
plot_locations_HARV <-
  read.csv( "NEON-DS-Site-Layout-Files/HARV/HARV_PlotLocations.csv" )

# look at the data structure
str( plot_locations_HARV )
## 'data.frame':    21 obs. of  16 variables:
##  $ easting   : num  731405 731934 731754 731724 732125 ...
##  $ northing  : num  4713456 4713415 4713115 4713595 4713846 ...
##  $ geodeticDa: chr  "WGS84" "WGS84" "WGS84" "WGS84" ...
##  $ utmZone   : chr  "18N" "18N" "18N" "18N" ...
##  $ plotID    : chr  "HARV_015" "HARV_033" "HARV_034" "HARV_035" ...
##  $ stateProvi: chr  "MA" "MA" "MA" "MA" ...
##  $ county    : chr  "Worcester" "Worcester" "Worcester" "Worcester" ...
##  $ domainName: chr  "Northeast" "Northeast" "Northeast" "Northeast" ...
##  $ domainID  : chr  "D01" "D01" "D01" "D01" ...
##  $ siteID    : chr  "HARV" "HARV" "HARV" "HARV" ...
##  $ plotType  : chr  "distributed" "tower" "tower" "tower" ...
##  $ subtype   : chr  "basePlot" "basePlot" "basePlot" "basePlot" ...
##  $ plotSize  : int  1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ elevation : num  332 342 348 334 353 ...
##  $ soilTypeOr: chr  "Inceptisols" "Inceptisols" "Inceptisols" "Histosols" ...
##  $ plotdim_m : int  40 40 40 40 40 40 40 40 40 40 ...


Regularly csv file with spatial data will contain columns labeled:

  • “X” and “Y” OR

  • Latitude and Longitude OR

  • easting and northing (UTM coordinates)

# view column names
names( plot_locations_HARV )
##  [1] "easting"    "northing"   "geodeticDa" "utmZone"    "plotID"    
##  [6] "stateProvi" "county"     "domainName" "domainID"   "siteID"    
## [11] "plotType"   "subtype"    "plotSize"   "elevation"  "soilTypeOr"
## [16] "plotdim_m"



3. Identify X,Y Location Columns

After viewing the column names, we can see that our data.frame that contains several fields that might contain spatial information. The easting and northing columns contain these coordinate values.

# view first 6 rows of the X and Y columns
head( plot_locations_HARV$easting )
## [1] 731405.3 731934.3 731754.3 731724.3 732125.3 731634.3
head( plot_locations_HARV$northing )
## [1] 4713456 4713415 4713115 4713595 4713846 4713295


We have coordinate values in our data frame. In order to convert our data frame to an sf object, we also need to know the CRS associated with those coordinate values.

There are several ways to figure out the CRS of spatial data in text format.

We can check the file metadata in hopes that the CRS was recorded in the data. We can explore the file itself to see if CRS information is embedded in the file header or somewhere in the data columns. Following the easting and northing columns, there is a geodeticDa and a utmZone column. These appear to contain CRS information (datum and projection).

# view first 6 rows of the X and Y columns
head( plot_locations_HARV$geodeticDa )
## [1] "WGS84" "WGS84" "WGS84" "WGS84" "WGS84" "WGS84"
head( plot_locations_HARV$utmZone )
## [1] "18N" "18N" "18N" "18N" "18N" "18N"

It is not typical to store CRS information in a column. But this particular file contains CRS information this way. The geodeticDa and utmZone columns contain the information that helps us determine the CRS:

geodeticDa: WGS84 – this is geodetic datum WGS84 utmZone: 18

Then, let’s import the roads layer from Harvard forest and check out its CRS.

# Import the line shapefile
point_HARV <- readOGR( "NEON-DS-Site-Layout-Files/HARV/HARV_roads.shp", "HARV_roads")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\HP\Dimetry\PEDA - ASU-AGFE\Semester (4) - Spring 2022\CPP 529 - Community Analytics (Data Analytics Practicum)\Explainer Assignment (Code-Through)\NEON-DS-Site-Layout-Files\HARV\HARV_roads.shp", layer: "HARV_roads"
## with 13 features
## It has 15 fields


# view extent
extent( point_HARV )
## class      : Extent 
## xmin       : 730741.2 
## xmax       : 733295.5 
## ymin       : 4711942 
## ymax       : 4714260


st_crs( point_HARV )
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 18N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 18N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 18N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-75,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32618]]


The output above shows that the points shapefile is in UTM zone 18N. We can thus use the CRS from that spatial object to convert our non-spatial dataframe into an sf object.

Next, let’s create a crs object that we can use to define the CRS of our sf object when we create it.

utm18nCRS <- st_crs( point_HARV )

utm18nCRS
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 18N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 18N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 18N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-75,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32618]]


class( utm18nCRS )
## [1] "crs"


4. csv to sf object

Next, let’s convert our data.frame into an sf object. To do this, we need to specify:

  1. The columns containing X (easting) and Y (northing) coordinate values
  2. The CRS that the column coordinate represent (units are included in the CRS) - stored in our utmCRS object.
# We will use the st_as_sf() function to perform the conversion

plot_locations_sp_HARV <- st_as_sf(plot_locations_HARV, coords = c( "easting", "northing" ), crs = utm18nCRS )


# We should double check the CRS to make sure it is correct

st_crs( plot_locations_sp_HARV )
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 18N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 18N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 18N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-75,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32618]]

5. Plot Spatial Object

We now have a spatial R object, we can plot our newly created spatial object.

ggplot() +
  geom_sf( data = plot_locations_sp_HARV ) +
  ggtitle( "Map of Plot Locations" )