knitr::opts_chunk$set(echo = TRUE)

Assignment 2

Assignment 1/2. Combine:

  • Population (point) data
  • Ports, airports, etc.

Load libraries and data

library(sf) 
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library(spData) 
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(tidyverse) 
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)

setwd("C:/Users/enman/Documents/BSE/BSE/Term 2/Geospatial Data Science/Assignments/Assignment2")

#Load the data requiered for these following tasks
world_data <- world
airports <- st_read("ne_10m_airports","ne_10m_airports")
## Reading layer `ne_10m_airports' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\ne_10m_airports' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 893 features and 40 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -175.1356 ymin: -53.78147 xmax: 179.1954 ymax: 78.24672
## Geodetic CRS:  WGS 84
ports <- st_read("ne_10m_ports","ne_10m_ports")
## Reading layer `ne_10m_ports' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\ne_10m_ports' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1081 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -171.758 ymin: -54.80944 xmax: 179.3094 ymax: 78.22611
## Geodetic CRS:  WGS 84
locations <- st_read("ne_10m_populated_places","ne_10m_populated_places")
## Reading layer `ne_10m_populated_places' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\ne_10m_populated_places' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 7342 features and 137 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -179.59 ymin: -90 xmax: 179.3833 ymax: 82.48332
## Geodetic CRS:  WGS 84
railroads <- st_read("ne_10m_railroads","ne_10m_railroads")
## Reading layer `ne_10m_railroads' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\ne_10m_railroads' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 25413 features and 12 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -150.1122 ymin: -51.89528 xmax: 179.3578 ymax: 69.60437
## Geodetic CRS:  WGS 84
roads <- st_read("ne_10m_roads","ne_10m_roads")
## Reading layer `ne_10m_roads' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\ne_10m_roads' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 56600 features and 31 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: -166.5325 ymin: -55.11212 xmax: 178.4191 ymax: 71.17768
## Geodetic CRS:  WGS 84

The homework suggested downloading information on ports, airports, and other relevant data, including roads and railroads. To start, let’s create a visual representation to understand what we’re working with. We’ll begin by plotting ports and airports to keep things clear and avoid a mess on the graph. After that, we’ll look at the additional data.

ggplot() +
  geom_sf(data = world_data) + #World shape file
  geom_sf(data = locations, size = 0.5) + #data with points of locations that are populated
  geom_sf(data = airports, aes(color = "Airports"), size = 0.7) +  #data points with all the airports worldwide
  geom_sf(data = ports, aes(color = "Ports"), size= 0.7) +  #data points with all the ports worldwide
  scale_color_manual(values = c("Airports" = "red", "Ports" = "lightblue"),name = "Type of Infrastructure")  # legend

ggplot() +
  geom_sf(data = world_data) + #World shape file
  geom_sf(data = roads, aes(color = "Railroads")) + #data points with all the railroads 
  geom_sf(data = railroads, aes(color = "Roads")) + #data points with all the roads 
   scale_color_manual(values = c( "Railroads" = "blue",
                                "Roads" = "orange" ),name = "Type of Infrastructure") 

After analyzing and taking a look at all of our downloaded data, we are asked to combine the data of world, population (points), ports, airports and others. One interpretation of combining would be putting all the layers togheter which is what we did before. However, under our current assignment it also makes sense to combining the data using st_join, which will be helpful for other tasks.

world_with_airports <- st_join(world_data, airports, join = st_intersects)
world_with_ports <- st_join(world_data, ports, join = st_intersects)
world_with_populated <- st_join(world_data, locations, join = st_intersects)
world_with_railroads <- st_join(world_data, railroads, join = st_intersects)
world_with_roads <- st_join(world_data, roads, join = st_intersects)

Next, we’re tasked with creating a map showing each country’s total population. This task is quite direct because the world dataset already includes population figures. We’ll make two maps: one will simply display the total population for each country, and the other will add points for airports and ports, incorporating the data we’ve already gathered.

ggplot() +  
  geom_sf(data = world, aes(fill = pop)) + 
  labs(fill = "Population") 

ggplot() +
  geom_sf(data = world_data, aes(fill = pop)) + #World shape file
  labs(fill = "Population") +
  geom_sf(data = airports, aes(color = "Airports")) +  #data points with all the airports worldwide
  geom_sf(data = ports, aes(color = "Ports")) +  #data points with all the ports worldwide
  scale_color_manual(values = c("Airports" = "red", "Ports" = "lightblue"),name = "Type of Infrastructure")  # legend

For our next task, we’re going to create a histogram that shows the distribution of country populations by continent. We’ll use the world data we have, which already includes columns for country and continent. We just need to group the data by continent to get the total population for each one.

world_data_group <- world_data %>%
  group_by(continent)
ggplot(world_data_group, aes(x = continent, y = pop, fill = continent)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Country Population Distribution by Continent",
       x = "Continent",
       y = "Total Population") +
  scale_fill_brewer(palette = "Paired")
## Warning: Removed 10 rows containing missing values (`position_stack()`).

After running the code, we get the table we were looking for, but we notice that Antarctica and “Seven Seas” show a population of 0, which makes the graph look unbalanced. To fix this, we’ll adjust the grouped data by removing these two continents.

world_data_group <- world_data %>%
 filter(!(continent %in% c("Antarctica","Seven seas (open ocean)")))

ggplot(world_data_group, aes(x = continent, y = pop, fill = continent)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Country Population Distribution by Continent",
       x = "Continent",
       y = "Total Population") +
  scale_fill_brewer(palette = "Paired") 
## Warning: Removed 8 rows containing missing values (`position_stack()`).

Removing those last two continents makes the graph clearer and better communicates the data. For more detail, let’s also create individual histograms for each continent.

ggplot(world_data_group, aes(x = pop)) +
  geom_histogram(fill = "orange", 
                 binwidth = 14000000,
                 color = "black") + 
  facet_wrap(~continent, scales = "free_x") + # Creates a separate histogram for each continent
  labs(x = "Population", y = "Frequency") +
  theme_minimal()
## Warning: Removed 8 rows containing non-finite values (`stat_bin()`).

For the final task in this first part, we’re asked to create a graph that shows the average distance from locations to ports or airports, broken down by continent. We’ll use st_join again, along with st_distance, to calculate the distances between locations and these points of interest.

locations_with_continent <- st_join(locations, world, join = st_within)

After joining the data and doing the calculations, let’s get the minimum of those distances and put it into a data frame.

# Compute distances to airports and ports
distances_to_airports <- st_distance(locations, airports)
distances_to_ports <- st_distance(locations, ports)

# Find the minimum distances
min_distances_to_airports <- apply(distances_to_airports, 1, min)
min_distances_to_ports <- apply(distances_to_ports, 1, min)

# Add these distances to your locations_with_continent data frame
locations_with_continent$min_distance_to_airport <- min_distances_to_airports
locations_with_continent$min_distance_to_port <- min_distances_to_ports

locations_with_continent <- locations_with_continent %>% 
  filter((continent %in% c("Africa","Asia","Europe","North America","Oceania",
                           "South America"))) #again let's take out the unused continents

average_distances <- locations_with_continent %>% #Group by to make a summary of all data by continent
  group_by(continent) %>%
  summarise(
    avg_distance_to_airport = mean(min_distance_to_airport, na.rm = TRUE),
    avg_distance_to_port = mean(min_distance_to_port, na.rm = TRUE)
  )

Let’s now graph our histograms by continent, we’ll do one for airports and then one for the ports.

# Histogram for distances to airports
ggplot(locations_with_continent, aes(x = min_distance_to_airport, fill = continent)) +
  geom_histogram(binwidth = 10000, position = "dodge") +
  labs(x = "Distance to Nearest Airport", y = "Number of Locations") +
  facet_wrap(~continent, scales = "free")

# Histogram for distances to ports
ggplot(locations_with_continent, aes(x = min_distance_to_port, fill = continent)) +
  geom_histogram(binwidth = 10000, position = "dodge") +
  labs(x = "Distance to Nearest Port", y = "Number of Locations") +
  facet_wrap(~continent, scales = "free")

After the individual histograms, let’s make a bar plot to have a more general idea of how these continents behave.

ggplot(locations_with_continent, aes(x = continent, y = min_distance_to_port, fill = continent)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Mean Distance by Continent by Continent",
       x = "Continent",
       y = "Distance to Port") +
  scale_fill_brewer(palette = "Paired")

ggplot(locations_with_continent, aes(x = continent, y = min_distance_to_airport, fill = continent)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Mean Distance by Continent",
       x = "Continent",
       y = "Distance to Airport") +
  scale_fill_brewer(palette = "Paired") 

Part 2

Moving on to the second exercise of this assignment, we need to create a map with representation of marketplaces across Africa and plot them. To do this, we’ve downloaded the Excel files provided in the directory linked to the paper designated for this part of the assignment.

library(readxl)

markets <- st_read("Catchments","MktCatch6Plus5_Dissolve")
## Reading layer `MktCatch6Plus5_Dissolve' from data source 
##   `C:\Users\enman\Documents\BSE\BSE\Term 2\Geospatial Data Science\Assignments\Assignment2\Catchments' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 235 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -1690159 ymin: -4179878 xmax: 4948778 ymax: 3356116
## Projected CRS: World_Behrmann
market_loc <- read_xlsx("MktCoords.xlsx")

africa <- world %>% 
  filter(continent == "Africa") # Because we are only working with africa, it makes sense to isolate it here

ggplot() +
  geom_sf(data = africa, fill = "#FFFFBE") +
  geom_point(aes(x = longitude, y = latitude), 
           data = market_loc, color = "black", size = 1.5) +
  theme(panel.background = element_rect(fill = "white"))

We got a pretty similar graph to figure 3 of the paper, but they removed some countries for the sake of their analysis.

Next we are asked to Calculate minimum distance of each market to the (i) coast, (ii) nearest road, and (iii) nearest airport. For the coast we are going to use ports, because we figured we are already using it and is still a proxy for coast.

markets <- st_as_sf(market_loc, coords = c("longitude", "latitude"), crs = 4326)

dist_to_port <- st_distance(markets, ports)
dist_to_road <- st_distance(markets, roads)
dist_to_airport <- st_distance(markets, airports)

min_dist_to_coast <- apply(dist_to_port, 1, min)
min_dist_to_road <- apply(dist_to_road, 1, min)
min_dist_to_airport <- apply(dist_to_airport, 1, min)
# Add to markets as new columns
markets$dist_to_coast <- min_dist_to_coast
markets$dist_to_road <- min_dist_to_road
markets$dist_to_airport <- min_dist_to_airport

dist_to_coast <- st_distance(markets, ports)
dist_to_road <- st_distance(markets, roads)
dist_to_airport <- st_distance(markets, airports)

# Extract minimum distances
min_distances <- pmin(dist_to_coast, dist_to_airport)
## Warning in pmin(dist_to_coast, dist_to_airport): an argument will be
## fractionally recycled

We are tasked with calculating the mentioned distances and then plotting these distances in a scatter plot against the product prices. We’ll load the Excel file containing the prices. According to the provided readme note, each column after the fifth represents a different month. To simplify our analysis and avoid selecting a specific month, we’ll calculate the average price across all months for each row. This approach gives us a more comprehensive view, minimizes the impact of seasonal variations, and better represents the relationship between distance and prices.

prices <- read_excel("PriceMaster4GAMS.xlsx")
prices$average_price <- rowMeans(prices[, 5:ncol(prices)], na.rm = TRUE)

ggplot(prices, aes(x = crop, y = average_price, fill = crop)) +
  geom_bar(stat = "identity") +
  labs(x = "Product", y = "Average Price", title = "Average Prices by Product") +
  theme_minimal()

market_prices_join <- markets %>%
  left_join(prices, by = "mktcode")

ggplot(market_prices_join, aes(y = average_price, x = dist_to_coast, color = crop)) +
  geom_point() +
  labs(title = "Price vs distance to coast",
       y = "Avg Market Price",
       x = "Distance to coast (meters)") +
  theme_minimal()

ggplot(market_prices_join, aes(y = average_price, x = dist_to_road, color = crop)) +
  geom_point() +
  labs(title = "Price vs distance to road",
       y = "Avg Market Price",
       x = "Distance to coast (meters)") +
  theme_minimal()

ggplot(market_prices_join, aes(y = average_price, x = dist_to_airport, color = crop)) +
  geom_point() +
  labs(title = "Price vs distance to airport",
       y = "Avg Market Price",
       x = "Distance to coast (meters)") +
  theme_minimal()

In these scatter plots, we observe that the relationship between prices and distances significantly differs depending on the type of infrastructure. We also highlighted the type of product in each scatter plot to extract as much information as possible from the data.