Welcome to the first script of Module 1!
At the example of some agricultural statistics of Armenia, you will here learn how to:

import and edit data
plot data in diagrams
plot data in maps

Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to change the code or write your own code, and execute it as well!

1. Import and edit data

We start off with defining our working directory. This should be the folder to which you downloaded the material of module 1. Please change the following path accordingly, and ensure that the folder names are separated by forward slashes, not backward slashes:

dir <- "C:/Users/Max/Desktop/IAMO_neu/eLearning/Module_1_Data_visualization/"

Let’s import the file “all_stats.csv”. We can specify its location by pasting the name to the previously defined object “dir” using the function paste0(). We then import the data with the function read.csv():

dir2 <- paste0(dir, "M01_Part01_Agricultural-Statistics/data/all_stats.csv")
dir2

## [1] "C:/Users/Max/Desktop/IAMO_neu/eLearning/Module_1_Data_visualization/M01_Part01_Agricultural-Statistics/data/all_stats.csv"

data_raw <- read.csv(dir2)

Let’s have a look at the data. The function head() shows us the first 6 lines of “data_raw”:

head(data_raw)

region	crop	indicator	year	value
00_ARMENIA	Winter Wheat	Sown area	2020	57964
Yerevan c.	Winter Wheat	Sown area	2020	NA
Aragatsotn	Winter Wheat	Sown area	2020	4954
Ararat	Winter Wheat	Sown area	2020	1752
Armavir	Winter Wheat	Sown area	2020	3193
Gegharkunik	Winter Wheat	Sown area	2020	10470

We can obtain a list of all unique elements of a specific column in the following way:

unique(data_raw$region)

##  [1] "00_ARMENIA"  "Yerevan c."  "Aragatsotn"  "Ararat"      "Armavir"    
##  [6] "Gegharkunik" "Lori"        "Kotayk"      "Shirak"      "Syunik"     
## [11] "Vayots dzor" "Tavush"

unique(data_raw$crop)

##  [1] "Winter Wheat"     "Winter Barley"    "Spring Wheat"     "Spring Barley"   
##  [5] "Potato"           "Cucumber"         "Tomato"           "Maize for silage"
##  [9] "Pomaceous Fruits" "Stone Fruits"     "Berries"

unique(data_raw$indicator)

## [1] "Sown area"      "Harvested area" "Yield"          "Production"

unique(data_raw$year)

##  [1] 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006
## [16] 2005

To create subsets of “data_raw”, we can use the function filter() from the package dplyr. To use this function, we first need to load the package with the function library(). If you have never installed dplyr before, you have to execute the installation line first. Once you have installed the package, you will not need the installation line again, but you always have to load the package when you had closed R Studio before.

# install.packages("dplyr")
library(dplyr)

Now let’s create a sub-sample of “data_raw” that only contains data on spring wheat:

data_sample1 <- filter(data_raw, crop == "Spring Wheat")
head(data_sample1)

region	crop	indicator	year	value
00_ARMENIA	Spring Wheat	Sown area	2020	1429
Yerevan c.	Spring Wheat	Sown area	2020	NA
Aragatsotn	Spring Wheat	Sown area	2020	134
Ararat	Spring Wheat	Sown area	2020	26
Armavir	Spring Wheat	Sown area	2020	68
Gegharkunik	Spring Wheat	Sown area	2020	460

Other ways to sub-sample the data are:

data_sample2 <- filter(data_raw, crop %in% c("Spring Wheat", "Winter Wheat"))
data_sample3 <- filter(data_raw, crop %in% c("Spring Wheat", "Winter Wheat") == FALSE)
data_sample4 <- filter(data_raw, year > 2019)
data_sample5 <- filter(data_raw, crop != "Spring Wheat", year == 2020, region == "Ararat", indicator == "Sown area")

For the next steps, let us work with a sub-sample that excludes the country-level statistics (“00_ARMENIA”) and the statistics for Yerevan, because we cannot or do not want to display these statistics in the map later:

data <- filter(data_raw, region %in% c("00_ARMENIA", "Yerevan c.") == FALSE)

We also have to re-name “Vayots dzor” to “Vayotz Dzor” for a later step where we join the agricultural statistics with a shapefile:

data$region[data$region=="Vayots dzor"] <- "Vayotz Dzor"

Let us also recalculate the units of the measurements and add a column with explanations:

## convert yields to tonnes per hectare
data$value[data$indicator=="Yield"] <- data$value[data$indicator=="Yield"]/10

## convert production to kilotonnes
data$value[data$indicator=="Production"] <- data$value[data$indicator=="Production"]/1000

## convert area to *1000 ha
data$value[data$indicator=="Sown area"] <- data$value[data$indicator=="Sown area"]/1000
data$value[data$indicator=="Harvested area"] <- data$value[data$indicator=="Harvested area"]/1000

## add units to indicators
data$indicator_long[data$indicator=="Sown area"] <- "Sown area (*1000 ha)"
data$indicator_long[data$indicator=="Harvested area"] <- "Harvested area *1000 (ha)"
data$indicator_long[data$indicator=="Production"] <- "Production (*1000 tonnes)"
data$indicator_long[data$indicator=="Yield"] <- "Yield (tonnes/ha)"

Have a look at the final data subsample:

head(data)

region	crop	indicator	year	value	indicator_long
Aragatsotn	Winter Wheat	Sown area	2020	4.954	Sown area (*1000 ha)
Ararat	Winter Wheat	Sown area	2020	1.752	Sown area (*1000 ha)
Armavir	Winter Wheat	Sown area	2020	3.193	Sown area (*1000 ha)
Gegharkunik	Winter Wheat	Sown area	2020	10.470	Sown area (*1000 ha)
Lori	Winter Wheat	Sown area	2020	7.836	Sown area (*1000 ha)
Kotayk	Winter Wheat	Sown area	2020	3.910	Sown area (*1000 ha)

The data in “all_stats.csv” is already in the right format for creating plots based on it. However, our data is sometimes in a different format. Have a look at the file “pomaceous_fruits.csv” in the folder “yield_by_crop”. As you can see, there is one column for each year:

data_pom_raw <- read.csv(paste0(dir, "M01_Part01_Agricultural-Statistics/data/yield_by_crop/pomaceous_fruits.csv"), sep=";")
head(data_pom_raw)

region	region_type	crop	indicator	year_2020	year_2019	year_2018	year_2017	year_2016	year_2015	year_2014	year_2013	year_2012	year_2011	year_2010	year_2009	year_2008	year_2007	year_2006	year_2005
Aragatsotn	marzes	pomaceous_fruits	crop_capacity	113.0	118.7	181.8	140.6	86.4	228.6	261.4	193.7	213.1	116.9	65.8	263.4	313.2	197.6	191.4	206.3
Ararat	marzes	pomaceous_fruits	crop_capacity	186.6	157.6	152.5	185.0	107.1	143.9	200.2	159.9	168.9	197.1	187.7	214.9	218.4	172.3	194.3	179.9
Armavir	marzes	pomaceous_fruits	crop_capacity	120.6	108.4	107.8	120.9	77.5	105.4	184.5	135.2	138.1	124.9	45.2	189.2	191.7	171.8	224.3	127.9
Gegharkunik	marzes	pomaceous_fruits	crop_capacity	96.3	100.3	90.1	101.9	84.6	128.4	205.0	200.6	195.3	183.1	169.2	169.2	179.1	176.0	133.1	121.2
Lori	marzes	pomaceous_fruits	crop_capacity	37.2	33.5	36.0	45.1	6.6	11.7	8.9	27.5	29.8	16.7	2.8	34.3	35.1	69.3	33.8	24.5
Kotayk	marzes	pomaceous_fruits	crop_capacity	53.5	40.3	68.4	110.0	29.2	58.8	48.6	53.7	49.1	44.0	24.6	76.1	81.4	70.5	43.3	49.2

We can easily re-arrange this table with the function melt() from the package reshape2. The argument “id.vars” specifies all columns that should remain in the new dataframe “data_pom”; the rest is joined into two new columns: the column “variable” contains the column names of the previous object “data_pom_raw”, the column “value” contains the yield (= crop capacity) measurements.

# install.packages("reshape2")
library(reshape2)

data_pom <- melt(data_pom_raw, id.vars = c("region", "region_type", "crop", "indicator"))
head(data_pom)

region	region_type	crop	indicator	variable	value
Aragatsotn	marzes	pomaceous_fruits	crop_capacity	year_2020	113.0
Ararat	marzes	pomaceous_fruits	crop_capacity	year_2020	186.6
Armavir	marzes	pomaceous_fruits	crop_capacity	year_2020	120.6
Gegharkunik	marzes	pomaceous_fruits	crop_capacity	year_2020	96.3
Lori	marzes	pomaceous_fruits	crop_capacity	year_2020	37.2
Kotayk	marzes	pomaceous_fruits	crop_capacity	year_2020	53.5

For clarification, we re-name the last two columns:

names(data_pom)[5:6] <- c("year", "yield")
head(data_pom)

region	region_type	crop	indicator	year	yield
Aragatsotn	marzes	pomaceous_fruits	crop_capacity	year_2020	113.0
Ararat	marzes	pomaceous_fruits	crop_capacity	year_2020	186.6
Armavir	marzes	pomaceous_fruits	crop_capacity	year_2020	120.6
Gegharkunik	marzes	pomaceous_fruits	crop_capacity	year_2020	96.3
Lori	marzes	pomaceous_fruits	crop_capacity	year_2020	37.2
Kotayk	marzes	pomaceous_fruits	crop_capacity	year_2020	53.5

Let’s delete some columns:

data_pom <- data_pom[,-c(2:4)]
head(data_pom)

region	year	yield
Aragatsotn	year_2020	113.0
Ararat	year_2020	186.6
Armavir	year_2020	120.6
Gegharkunik	year_2020	96.3
Lori	year_2020	37.2
Kotayk	year_2020	53.5

Finally, we have to convert the entries in the column “year” to numbers. We do that by subsetting the text entries, and then converting them to numbers:

data_pom$year <- as.numeric(substr(data_pom$year, 6, 9))
head(data_pom)

region	year	yield
Aragatsotn	2020	113.0
Ararat	2020	186.6
Armavir	2020	120.6
Gegharkunik	2020	96.3
Lori	2020	37.2
Kotayk	2020	53.5

2. Plot data in diagrams

There are of course many different ways in which we could visualize the agricultural statistics contained in the object “data”. Here, we focus on creating bar plots to compare the production amounts between marzes and over time.

To create diagrams, we will almost always use the function ggplot() from the package ggplot2. Please install this package and then load it:

# install.packages("ggplot2")
library(ggplot2)

To visualize the production amounts of winter wheat in 2020 among all marzes, we create a new sub-sample, accordingly:

prod_wwheat <- filter(data, crop == "Winter Wheat", year == 2020, indicator == "Production")
prod_wwheat

region	crop	indicator	year	value	indicator_long
Aragatsotn	Winter Wheat	Production	2020	118.201	Production (*1000 tonnes)
Ararat	Winter Wheat	Production	2020	68.485	Production (*1000 tonnes)
Armavir	Winter Wheat	Production	2020	135.460	Production (*1000 tonnes)
Gegharkunik	Winter Wheat	Production	2020	187.817	Production (*1000 tonnes)
Lori	Winter Wheat	Production	2020	221.201	Production (*1000 tonnes)
Kotayk	Winter Wheat	Production	2020	73.759	Production (*1000 tonnes)
Shirak	Winter Wheat	Production	2020	316.660	Production (*1000 tonnes)
Syunik	Winter Wheat	Production	2020	140.509	Production (*1000 tonnes)
Vayotz Dzor	Winter Wheat	Production	2020	3.499	Production (*1000 tonnes)
Tavush	Winter Wheat	Production	2020	25.869	Production (*1000 tonnes)

We create a first plot by specifying the data object, and by defining in the function aes() what should be on the x and y axes. We then add with the sign + a new line of code that contains the function geom_col() to draw the bars:

ggplot(data=prod_wwheat, aes(x=region, y=value)) +
  geom_col()

In the function aes(), we can specify additional arguments. For example, we can give each bar a different color:

ggplot(data=prod_wwheat, aes(x=region, y=value, fill=region)) +
  geom_col()

By default, R uses a rainbow color scale to color the bars. We can change that by supplying an own vector of colors by using words or HEX codes. By the way, the order in which you specify the layers (almost) does not matter. Just make sure that ggplot() is always the first line.

ggplot(data=prod_wwheat, aes(x=region, y=value, fill=region)) +
  geom_col() +
  scale_fill_manual(values=c("red", "#E69F00", "#56B4E9", "green", "orange", "yellow", "blue", "steelblue", "grey", "black"))

Let’s include a title, change the y axis name, and delete the x axis name:

ggplot(data=prod_wwheat, aes(x=region, y=value, fill=region)) +
  geom_col() +
  ggtitle("Production of winter wheat in 2020, in kilotonnes") +
  ylab("production (t*1000)") +
  xlab("")

You can also save plots as objects. This saves you lines of code if you want to add more layers to the plot later.

plot1 <- ggplot(data=prod_wwheat, aes(x=region, y=value, fill=region)) +
  geom_col() +
  ggtitle("Production of winter wheat in 2020, in kilotonnes") +
  xlab("") +
  ylab("production (t*1000)")

Add an additional layer to “plot1” that results in rotated x axis labels:

plot1 + theme(axis.text.x = element_text(angle = 45, hjust=1))

There are many more function for changing the appearance of a plot. You will get to know them step by step!

You can easily export plots with the function png(), followed by the function dev.off(). To export the plot, let’s increase the text size in the function theme():

png(paste0(dir, "plot1.png"), height = 600, width = 800)
 plot1 + theme(axis.text.x = element_text(angle = 45, hjust=1),
               text = element_text(size = 20))
dev.off()

## png 
##   2

Assume we do not just want to plot the 2020 data of one crop, but of all crops, at once. To do so, we first need to draw the respective sub-sample from our dataset:

prod_allcrops <- filter(data, year == 2020, indicator == "Production")

We can now use facetting to easily create a multiplot that contains a separate plot for each crop, by specifying “crop” in the function facet_wrap(). The argument “scales=”free"" allows different y axes for all plots. You can ignore the warning message “Removed 6 rows containing missing values (position_stack).”:

ggplot(data=prod_allcrops, aes(x=region, y=value, fill=region)) +
  geom_col() +
  ggtitle("Production in 2020, in kilotonnes") +
  xlab("") +
  ylab("production (t*1000)") +
  theme(axis.text.x = element_text(angle = 45, hjust=1)) +
  facet_wrap(~crop, scales="free")

## Warning: Removed 6 rows containing missing values (position_stack).

We can also create grouped bar plots, in which we visualize the evolution of the production amount over time. For this, We choose the data from 2016 to 2020 from Aragatsotn, Ararat and Armavir, and create four facets - one for berries, potato, cucumber and tomato, respectively:

prod_4crops <- filter(data, year >= 2016, crop %in% c("Berries", "Potato", "Cucumber", "Tomato"), region %in% c("Aragatsotn", "Ararat", "Armavir"), indicator == "Production")

ggplot(data=prod_4crops, aes(x=year, y=value, fill=region)) +
  geom_bar(position="dodge", stat="identity") +
  ggtitle("Production by crop and marz, in kilotonnes") +
  xlab("") +
  ylab("production (t*1000)") +
  theme(axis.text.x = element_text(angle = 45, hjust=1)) +
  facet_wrap(~crop, scales="free")

3. Plot data in maps

For each crop, we want to create maps of the long-year mean yield, production and sown area. To do so, we first need to calculate these means for each marz and each crop. To calculate means across groups that are defined by parameter combinations, we can use the functions group_by() and summarize() from the package dplyr. You can ignore the error message “summarise() has grouped output by ‘region’, ‘crop’. You can override using the .groups argument.” The resulting object “data_means_raw” is a tibble object, that we convert to a dataframe:

data_means_raw <- data %>% group_by(region, crop, indicator_long) %>% summarize(mean=mean(na.omit(value)))

## `summarise()` has grouped output by 'region', 'crop'. You can override using the `.groups` argument.

data_means <- as.data.frame(data_means_raw)
head(data_means)

region	crop	indicator_long	mean
Aragatsotn	Berries	Harvested area *1000 (ha)	0.8195625
Aragatsotn	Berries	Production (*1000 tonnes)	33.9123125
Aragatsotn	Berries	Sown area (*1000 ha)	0.8205333
Aragatsotn	Berries	Yield (tonnes/ha)	5.1143750
Aragatsotn	Cucumber	Harvested area *1000 (ha)	0.1407500
Aragatsotn	Cucumber	Production (*1000 tonnes)	35.2226875

Let’s create a sub-sample of “data_means” that only contains tomato yields:

yield_tomato <- filter(data_means, crop == "Tomato", indicator_long == "Yield (tonnes/ha)")
yield_tomato

region	crop	indicator_long	mean
Aragatsotn	Tomato	Yield (tonnes/ha)	28.85250
Ararat	Tomato	Yield (tonnes/ha)	46.07250
Armavir	Tomato	Yield (tonnes/ha)	44.72937
Gegharkunik	Tomato	Yield (tonnes/ha)	12.02250
Kotayk	Tomato	Yield (tonnes/ha)	11.73875
Lori	Tomato	Yield (tonnes/ha)	5.79875
Shirak	Tomato	Yield (tonnes/ha)	19.76500
Syunik	Tomato	Yield (tonnes/ha)	14.40563
Tavush	Tomato	Yield (tonnes/ha)	8.92125
Vayotz Dzor	Tomato	Yield (tonnes/ha)	18.01312

We can import a shapefile with the function readOGR() from the package rgdal:

# install.packages("rgdal")
library(rgdal)

Again, we have to specify the file location in the function readOGR():

marzes <- readOGR(paste0(dir, "/shapefiles/ARM_marzes_short_simplified_500.shp"), verbose=F)

We can easily plot the polygons of the shapefile “marzes”:

plot(marzes)

The object “marzes” is of the type “SpatialPolygonsDataFrame”, and therefore has a particular data structure. We can access certain specific information such as the attribute table or the coordinate system by calling the slots “data” and “proj4string”, indicated by the sign “@”:

class(marzes)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

summary(marzes)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x  369676.6  641124.7
## y 4299853.9 4572195.5
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=38 +datum=WGS84 +units=m +no_defs]
## Data attributes:
##   div_short        
##  Length:11         
##  Class :character  
##  Mode  :character

marzes@data

	div_short
0	Syunik
1	Vayotz Dzor
2	Ararat
3	Armavir
4	Gegharkunik
5	Yerevan
6	Aragatsotn
7	Kotayk
8	Shirak
9	Tavush
10	Lori

marzes@proj4string

## CRS arguments:
##  +proj=utm +zone=38 +datum=WGS84 +units=m +no_defs

If you want to know the coordinates of all vertices of one polygon, you can call them by executing:

head(marzes@polygons[[1]]@Polygons[[1]]@coords)

##          [,1]    [,2]
## [1,] 562430.1 4383917
## [2,] 559782.3 4383971
## [3,] 559473.9 4385158
## [4,] 561113.8 4388292
## [5,] 561405.4 4392555
## [6,] 562535.2 4395234

As you have seen, the attribute table of “marzes” only contains the names of the marzes. To visualize yields in a map, we first have to populate the attribute table with this information. So, let’s join “marzes” and “yield_tomato”. We can do this using the function geo_join() from the package tigris, specifying the column names in “marzes” and “yield_tomato” on which the join should be based on. The resulting object “marzes_yield_tomato” is again a SpatialPolygonsDataFrame.

# install.packages("tigris")
library(tigris)

marzes_yield_tomato <- geo_join(marzes, yield_tomato, "div_short", "region", how = 'left')
marzes_yield_tomato@data

	div_short	region	crop	indicator_long	mean
0	Syunik	Syunik	Tomato	Yield (tonnes/ha)	14.40563
1	Vayotz Dzor	Vayotz Dzor	Tomato	Yield (tonnes/ha)	18.01312
2	Ararat	Ararat	Tomato	Yield (tonnes/ha)	46.07250
3	Armavir	Armavir	Tomato	Yield (tonnes/ha)	44.72937
4	Gegharkunik	Gegharkunik	Tomato	Yield (tonnes/ha)	12.02250
5	Yerevan	NA	NA	NA	NA
6	Aragatsotn	Aragatsotn	Tomato	Yield (tonnes/ha)	28.85250
7	Kotayk	Kotayk	Tomato	Yield (tonnes/ha)	11.73875
8	Shirak	Shirak	Tomato	Yield (tonnes/ha)	19.76500
9	Tavush	Tavush	Tomato	Yield (tonnes/ha)	8.92125
10	Lori	Lori	Tomato	Yield (tonnes/ha)	5.79875

We can now create a map of Armenia with each marz colored according to the long-year tomato yield, using the function spplot() from the package sp. We specify the name of the column on which the coloring should be based on:

# install.packages("sp")
library(sp)
spplot(marzes_yield_tomato, "mean")

Let’s customize the map. We use the color scale “rocket” from the package viridis, supply a title, and change the label size in the legend:

# install.packages("viridis")
library(viridis)

## Loading required package: viridisLite

spplot(marzes_yield_tomato, "mean", 
       col.regions = rev(rocket(21)[6:21]), 
       main=list(cex=1, label="Yield of tomatoes, mean 2005-2020"), 
       colorkey=list(labels=list(cex=0.8)))

You can find more information on nice color scales here:

So far, we have only created one map for the yield of tomatoes. We can use a for-loop to create several yield maps at a time, and then combine them to one figure. However first, we have to re-arrange our data to be able to join the yields of all crops with our shapefile at once. We first select all yield data from “data_means”, and then use the function dcast() from reshape2, which does the opposite of melt(), that is create several columns out of one. We then join the resulting table “data_means_yield_bycrop” with “marzes”. The resulting object “marzes_yields” contains one column for each crop.

data_means_yield <- filter(data_means, indicator_long == "Yield (tonnes/ha)")
data_means_yield_bycrop <- dcast(data_means_yield, region~crop)

## Using mean as value column: use value.var to override.

data_means_yield_bycrop

region	Berries	Cucumber	Maize for silage	Pomaceous Fruits	Potato	Spring Barley	Spring Wheat	Stone Fruits	Tomato	Winter Barley	Winter Wheat
Aragatsotn	5.114375	22.610000	5.41125	18.074375	22.25250	2.238750	2.285625	6.330000	28.85250	2.381875	2.426250
Ararat	13.430000	39.010625	24.91692	17.664375	30.18688	2.898125	3.256000	11.126250	46.07250	3.433750	3.877500
Armavir	9.918125	31.996875	25.04000	13.583750	32.74813	2.838125	3.417778	11.237500	44.72937	2.968125	3.535000
Gegharkunik	5.507500	10.106875	11.67000	14.583750	19.63000	2.240000	2.300625	3.648125	12.02250	2.800000	2.478750
Kotayk	3.662500	10.239375	21.58583	5.629375	19.96688	1.793750	1.732500	3.329375	11.73875	1.496154	1.957500
Lori	1.967500	5.768125	18.38062	2.830000	14.06250	2.123750	2.188750	1.975625	5.79875	1.682500	2.560625
Shirak	5.461875	19.366875	19.49250	11.175625	23.30937	2.203750	2.420625	8.424375	19.76500	2.033333	2.536250
Syunik	14.062500	13.213750	23.30000	4.919375	16.85625	1.908125	1.763125	2.808750	14.40563	2.058125	2.126250
Tavush	6.765000	9.823125	17.56545	2.883125	10.30125	1.815000	1.695000	5.715625	8.92125	1.981250	2.053125
Vayotz Dzor	1.081875	16.463750	4.49500	3.809375	16.78000	1.720625	1.653125	2.279375	18.01312	1.502500	2.050625

marzes_yields <- geo_join(marzes, data_means_yield_bycrop, "div_short", "region", how = 'left')

## Warning: We recommend using the dplyr::*_join() family of functions instead.

marzes_yields@data

	div_short	region	Berries	Cucumber	Maize.for.silage	Pomaceous.Fruits	Potato	Spring.Barley	Spring.Wheat	Stone.Fruits	Tomato	Winter.Barley	Winter.Wheat
0	Syunik	Syunik	14.062500	13.213750	23.30000	4.919375	16.85625	1.908125	1.763125	2.808750	14.40563	2.058125	2.126250
1	Vayotz Dzor	Vayotz Dzor	1.081875	16.463750	4.49500	3.809375	16.78000	1.720625	1.653125	2.279375	18.01312	1.502500	2.050625
2	Ararat	Ararat	13.430000	39.010625	24.91692	17.664375	30.18688	2.898125	3.256000	11.126250	46.07250	3.433750	3.877500
3	Armavir	Armavir	9.918125	31.996875	25.04000	13.583750	32.74813	2.838125	3.417778	11.237500	44.72937	2.968125	3.535000
4	Gegharkunik	Gegharkunik	5.507500	10.106875	11.67000	14.583750	19.63000	2.240000	2.300625	3.648125	12.02250	2.800000	2.478750
5	Yerevan	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
6	Aragatsotn	Aragatsotn	5.114375	22.610000	5.41125	18.074375	22.25250	2.238750	2.285625	6.330000	28.85250	2.381875	2.426250
7	Kotayk	Kotayk	3.662500	10.239375	21.58583	5.629375	19.96688	1.793750	1.732500	3.329375	11.73875	1.496154	1.957500
8	Shirak	Shirak	5.461875	19.366875	19.49250	11.175625	23.30937	2.203750	2.420625	8.424375	19.76500	2.033333	2.536250
9	Tavush	Tavush	6.765000	9.823125	17.56545	2.883125	10.30125	1.815000	1.695000	5.715625	8.92125	1.981250	2.053125
10	Lori	Lori	1.967500	5.768125	18.38062	2.830000	14.06250	2.123750	2.188750	1.975625	5.79875	1.682500	2.560625

We now iterate over each column of “marzes” that contains yield data, and each time create a plot that we save in a plot list. In each step, we take the column name i+2 and save it as object “crop”. We use “crop” to tell the function spplot() on which column to base the coloring on, and also use it as title of the respective map (argument “label” in “main=list()”). We then use the function grid.arrange() from the package gridExtra to bind the first six plots contained in “plot_list” into one figure. We give that figure an overall title in the function textGrob() that comes with the package grid.

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(grid)

plot_list <- list()

for (i in 1:10)
{ crop <- names(marzes_yields@data)[i+2]
  plot_list[[i]] <- spplot(marzes_yields, crop, col.regions = rev(rocket(21)[6:21]),
                           main=list(cex=1, label=crop, colorkey=list(labels=list(cex=0.8))))
}

plot_list[[1]]

grid.arrange(plot_list[[1]], plot_list[[2]], plot_list[[3]], plot_list[[4]], plot_list[[5]], plot_list[[6]], ncol=2,
               top = textGrob("Mean yield from 2005 to 2020", vjust = 1, gp = gpar(fontface = "bold", cex = 1.5)))

Let’s export the plot with maps of the first 6 crops as a PNG:

png(paste0(dir, "plot2.png"), height=1000, width=600)
  grid.arrange(plot_list[[1]], plot_list[[2]], 
               plot_list[[3]], plot_list[[4]], 
               plot_list[[5]], plot_list[[6]], ncol=2,
               top = textGrob("Mean yield from 2005 to 2020", 
                              vjust = 1, 
                              gp = gpar(fontface = "bold", cex = 1.5)))
dev.off()

## png 
##   2

Congratulations, you made it to the end of the first script of module 1! You can now solve the exercises 1-5 (in “M01_Exercises_Assignment.Rmd”), or go to the next script about visualizing phenological observation data (“M01_Script02_Phenological-Observations.Rmd”).

Module 1: Data visualization
Script 01: Agricultural Statistics

Max Hofmann

1. Import and edit data

2. Plot data in diagrams

3. Plot data in maps

Module 1: Data visualization Script 01: Agricultural Statistics

Max Hofmann

1. Import and edit data

2. Plot data in diagrams

3. Plot data in maps

Module 1: Data visualization
Script 01: Agricultural Statistics