• Polygons to Pixels
  • Building Raster data from Polygons
    • A little bit of background on spatial data types
      • Understanding the link between geometry and attributes
    • Generating a grid
      • Discovering the bounding box
      • Deciding how big we want our pixels to be
    • Building a SpatialPointsDataFrame
    • Visualizing Raster vs. Polygon Data

Polygons to Pixels

Building Raster data from Polygons

A little bit of background on spatial data types

There are a variety of ways to use GIS to abstract the world.

Vector Data

Vector data is when points, lines, and polygons are used to condense the natural and built world into geometric shapes. However, it is important to note that any given real-world object can be represented by a GIS (geographic information system) in multiple ways- the location of a cabin might be represented by a point, showing its location- or by a square showing the footprint of the structure itself. Rivers and lakes can be abstracted to lines and polygons, but given that the width of a river might also be of importance, rivers might also be represented as polygons. The level of detail describing the line or polygon also influences the precision with which you can make inferences about shape, size, and relationships between data (is your lake represented using a clean oval, or exploring every nook and cranny of the wetlands?). Finding areas of similarity, or connecting the dots between raster data objects often is one way of condensing and simplifying raster data into patches or lines from a surface of points.

Raster data

Raster data is when a network or mesh of evenly spaced points are used to create a “surface” that describes some attribute of that space. Remotely sensed data using satellites to take primary data like land cover and climate data are often represented this way. They are mapped such that the center of the points form the center of a graph-paper like surface, where the value in the center of the point applies to the entire graph square. This creates the appearance of a surface of squares. The size of these squares (the amount of space each data point is assumed to represent) is refered to as the resolution- if the squares are 5km by 5km, the raster is said to be at a 5km resolution. However, even this is an abstraction-the points are usually sampled such that they are evenly spaced in latitude and longitude, but these “squares” can look more irregular in some areas (such as at the poles) depending on what projection you are using. Alternately, these squares can be created such that they are evenly spaced everywhere within that projection. These spatial data often carry file extensions like .TIFF, .JPEG, although different software packages and governemnt agencies sometimes have their own proprietary formats.

Generating a grid

To generate a raster data set from a polygon data set, we need to generate a grid of points that will serve as the centroids of our pixels.

Discovering the bounding box

To do this, we first need to figure out the bounding box of the data– the spatial extent of the polygon layer in x and y coordinate minimums and maximums. If you need a raster even larger than the extent of your polygon layer, feel free to manually define your minimum and maximum x and y coordinates. This may be helpful if you have other data that has the top-left point in a certain location, and you want to be sure that your new raster data lines up perfectly with the old.

box<-data.table(dim=c("x","y"),bbox(mcnty_map))
box
##    dim      min       max
## 1:   x -2500000 2516373.8
## 2:   y -2500000  732103.3

Now that we have the spatial extent of the raster layer we need to create (it will be a rectangle such that all parts of the polygon will be included), we can generate a mesh or grid of points covering this space in equal intervals. There are two ways conceptually to do this:

Deciding how big we want our pixels to be

1.) We want a certain number of points in either the x or y direction, so we sample such that the distance between the points is always equal.

# Now, we find the maximum range, and say that we want 1,000 points
# between that range. "Steps" will be the interval we use to interpolate
# between the points.
  box[,range:=max-min]
##    dim      min       max   range
## 1:   x -2500000 2516373.8 5016374
## 2:   y -2500000  732103.3 3232103
  steps<-max(box$range)/1000

2.) We want a grid with meaningful, specific grid sizes. For this strategy, we look in the proj4string of our polygon object (it’s projection) to find out what the “units” are. In this case, we have units=m, which means that all of our geographic points are relating to eachother using meters as the unit of measurement. So, if we wanted a 5 km grid, we would set steps=5000, to generate pixels 5km by 5km.

proj4string(mcnty_map)
## [1] "+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
# Now, we are setting the pixel size to be 5k x 5k
  steps<-5000

Now, we sample a grid of those points, by interpolating between the minimum and maximum of the x and y boundaries, using the steps we’ve just discovered. Next, we can use expand.grid, a function in the base package that generates all the combinations of two columns, which will create

interpolated_x_coordinates<-seq(from = box[dim=="x"]$min, to = box[dim=="x"]$max, by = steps)
interpolated_y_coordinates<-seq(from =box[dim=="y"]$min, box[dim=="y"]$max, by = steps)

# This now will have a list of x and y coordinates for the full 5x5 k sampled grid
  sampled_points<-expand.grid(x = interpolated_x_coordinates, 
                              y = interpolated_y_coordinates)
  
  head(sampled_points)
##          x        y
## 1 -2500000 -2500000
## 2 -2495000 -2500000
## 3 -2490000 -2500000
## 4 -2485000 -2500000
## 5 -2480000 -2500000
## 6 -2475000 -2500000
  n_points<-nrow(sampled_points)

Note that the increments for x and y coordinates are going up by 5000 meters.

Now, we will build a SpatialPointsDataFrame using these coordinates.

Building a SpatialPointsDataFrame

At this point, we have a list of coordinates, but x and y values alone don’t work with spatial methods. To manipulate and use this data, we need to generate a spatialPointsDataFrame.

Right now, the object “sampled_points” is a data.frame– we can check this by querying for its class:

class(sampled_points)
## [1] "data.frame"

To do this, we first need to explicitly define the fact that two fo the columns in this data.frame are actually coordinate values. To do this, we designate the x and y columns as coordinates, and thus change the class of the sampled_points object:

  coordinates(sampled_points) <- ~x + y
  class(sampled_points)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"

At this point, R knows that sampled_points is a SpatialPoints object. To make this a SpatialPointsdataFrame object, we need to add some attribute data– we do this by passing in a data.table of a sequential ID variable that we can use to keep track of our points down the line. We also tell it what projection to use– since we created our mesh of points based on the mcnty_map layer, we need to apply the same projection here as well.

sampled_points <- sp::SpatialPointsDataFrame(coords=sampled_points, 
                                             data=data.table(point=seq(1,n_points)))

# setting the proj4string (the projection information) of this layer to be the
# same as the mcnty_map.
proj4string(sampled_points)<-proj4string(mcnty_map)

class(sampled_points)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Excellent- we now have a SpatialPointsDataFrame. At this point, we can “sample” the polygons using our regular, 5km points. This operation is essentially a spatial join– the points that you overlay on top of the polygons inherit the data attributes of the polygons below. Once we have this information, we can apply the polygon data to the points by assigning the information from the overlay into the @data slot of the sampled_points layer.

  overlay<-sp::over(sampled_points,mcnty_map) # overlaying the sampled points onto the mcnty map
  sampled_points@data<-overlay # making a full-on data.table for the spatial object

At this point, we also would like the coordinates (x and y) for our mesh to be saved somewhere more convenient than in the @coords slot of this data object. As such, we can add it back into the data by saving a new data.table with the coordinate information. Given that these coordinates are in the albers equal area projection, I am designating column names that reflect that projection, in case this data becomes separate from the spatial projection information later on. We add the coordinate data back into the data.table using a cbind() operation. We also can save the data as a simple data.table, separate from the Spatial geometry.

  albers_coords<-data.table(sampled_points@coords)[,list(x_albers=x,y_albers=y)]
  sampled_points@data<-cbind(sampled_points@data,albers_coords)
  points_dt<-copy(sampled_points@data)

At this point, we have a raster data format that we can use to map data. Let’s compare this to the polygon version.

Visualizing Raster vs. Polygon Data

Let’s make a chloropleth map using the Woodson library, using the polygon geometry as the vehicle for the data:

wmap(chloropleth_map = mcnty_map,
     data=simulated_us_data[year==2000],
     variable="var1",
     geog_id="mcnty")

Now, let’s try plotting some data by merging in the county data with the points– this will be a simple, rough sketch of the same variable using the 5x5 km point data, being represented as points.

points_dt<-merge(simulated_us_data[year==2000],points_dt,by="mcnty")

ggplot(data=points_dt)+geom_point(aes(x=x_albers,y=y_albers,color=var1))

We can also use ggplot()’s “geom_raster” plotting function, which is even faster than geom_tile– it’s designed to take points of equal spacing.

ggplot(data=points_dt)+geom_raster(aes(x=x_albers,y=y_albers,fill=var1))

Or, we can use the Woodson mapping function, which can now map raster data!

wmaprast(geometry=points_dt,
         xcol="x_albers",
         ycol="y_albers",
         geog_id="mcnty",
         variable="var1",
         map_title="Raster version of Variable 1")

Note that this doesn’t look very different- in part because the sample is so small. Let’s try a similar process, but this time, with much greater grid size.

steps<-50000
  sampled_points<-expand.grid(x = seq(from = box[dim=="x"]$min, to = box[dim=="x"]$max, by = steps), 
                              y = seq(from =box[dim=="y"]$min, box[dim=="y"]$max, by = steps))
  n_points<-nrow(sampled_points)

# We build a SpatialPointsDataFrame using 
  coordinates(sampled_points) <- ~x + y
  sampled_points <- sp::SpatialPointsDataFrame(sampled_points, data.table(point=seq(1,n_points))) # make spatial data frame 
  proj4string(sampled_points)<-proj4string(mcnty_map) # defining the same projection
  overlay<-sp::over(sampled_points,mcnty_map) # overlaying the sampled points onto the mcnty map
  sampled_points@data<-overlay # making afull-on data.table for the spatial object
  albers_coords<-data.table(sampled_points@coords)[,list(x_albers=x,y_albers=y)]
  sampled_points@data<-cbind(sampled_points@data,albers_coords)
  
  wmaprast(geometry=sampled_points@data,
         xcol="x_albers",
         ycol="y_albers",
         geog_id="mcnty",
         data=simulated_us_data[year==2000],
         variable="var1",
         map_title="Coarse Raster version of Variable 1")

Ah. Impressionist!

At this point you could save your new spatialpoints data frame as a Shapefile, or other vector format:

rgdal::writeOGR(obj=sampled_points, dsn=paste0(getwd(),"/outputs/sampled_points.shp"), layer="sampled_points", driver="ESRI Shapefile") 

Alternately, you give R the hint that this is actually a gridded, raster data set, and can switch the spatial format by asserting that gridded()<-TRUE.

  gridded(sampled_points)<-TRUE # making R know that this is a grid of points.
  class(sampled_points)
## [1] "SpatialPixelsDataFrame"
## attr(,"package")
## [1] "sp"