Introduction

The world population today is about 7.5 billion and is growing by an annual 83 million approximately. The global population is expected to reach approximately 10 billion in 2050 and 11 billion by 2100. (Sustainablescale 2016) This increase in the world population has been accompanied by trends like increased rural to urban migration, longer life expectancy but decreased fertility rates with better economic development.(Sustainablescale 2016)

A variety of geographic, demographic and socio-economic factors have contributed to this massive rate of increase in the human population. Among them, a few primary factors are improved healthcare annd vaccination, control of large scale epidemics, better economic and environmental living conditions and lesser war conflicts. The effect of these factors are demonstrated in the changes of national birth and death rates, life expectancy and infant mortality rates of each nation. Correlating this information with various economic factors like economic growth and healthcare spending provides insights into how these factors affect the population growth and have been previously studied in quite detail.(WorldBank 2016, Sustainablescale (2016))

However, the food production which is likely to be a strong factor in population growth but has recieved less attention. With more mouths to feed as the population increases, there must be a considerable increase in food production with a more effective geographical distribution & consumption of resources. Though a causal relation between poulation growth and crop production is very difficult to establish, the presence of a strong correlation can be established quite easily.

Historically, the factors affecting increased food production were the advancement in soil sciences, advanced farming techniques like ploughing, better weather predictions and increase in yield with better seed quality and use of fertilizers and pesticides.(Hartemink 2007) An example of a study conducted to understand the link between population growth and food production was done in Indonesia where three regions were studied, viz. Java, Sumatra and Borneo. Java had a mean population of 316 people per Km. Sq. whereas Sumatra and Borneo had only 11 people per Sq.Km. The difference was in the quality of the soil, Java was primary the volcanic soil that was more fertile and had a better impact on the irrigation water that supported the growth of rice. However, this was changed in last few decades as the transportation system has changed the direct correlation of crop production and human settlements. Rapid urbanization also has certain cultural impact of this. In this article, we modeled the relationship between population increase and its consequent demand on food production.

Methods

Our primary objective was to build up a model that correlates the world population with the world food production. Based on our model, we wanted to predict the world food demand based on the projected population. A description of the required datasets and the corresponding measures and procedures involved is provided below.

Datasets

We have used datasets from multiple sources to obtain information about the world population and total crop production as shown below.

World population

We obtain population trends and the factors affecting it from the “IDB” (IDB 2016) datasets. The IDB datasets provide demographic information about each country with its total mid-year population, land area, crude birth and death rates, infant mortality rates and life expectancy, age-specific fertility rates as well as net migration and growth rates.

Food Production

The “FAOstats” (FAO 2016) dataset provides information on food production including total annual production of various crops with groupings of cereals, vegetables, fruits, etc. In addition, the total arable land and yields per hectare for each country is also provided for the last 50 years.

Measures and Procedures

We used the IDB datasets to analyze the demographic and socio-economic factors affecting the world population trends, its geographical distributions and future population estimates for the 21st century. For the world food production, we used the FAOstats datasets to do a longitudinal study of the major crop production of every nation, their average annual yield and usage of arable land. Then, we correlated the human population growth of each country and their annual food production and tried to build up a model. Finally, we predict the world food demand based on our model and the projections of the world population. We used R (3.4.0, R Core Team 2017) and the R-packages bindrcpp (0.2, Muller 2017), devtools (1.13.3, Wickham and Chang 2017), dplyr (0.7.2, Wickham et al. 2017), ggplot2 (2.2.1, Wickham 2009), lattice (0.20.35, Sarkar 2008), mapdata (2.2.6, Richard A. Becker and Ray Brownrigg. 2016), maps (3.2.0, Richard A. Becker, Ray Brownrigg. Enhancements by Thomas P Minka, and Deckmyn. 2017), papaja (0.1.0.9492, Aust and Barth 2017), purrr (0.2.3, Henry and Wickham 2017), readr (1.1.1, Wickham, Hester, and Francois 2017), stringr (1.2.0, Wickham 2017a), tibble (1.3.3, Muller and Wickham 2017), tidyr (0.7.0, Wickham 2017b), and tidyverse (1.1.1, Wickham 2017c) for all our data analysis and data visualization.

Results

We began by loading the data for population and crop production from the IDB (IDB 2016) and FAO (FAO 2016) datasets respectively. The normalized data for all production crops is loaded in the Production_Crops data frame and is used as a measure for direct and indirect consumption. Hence, the need to analyze the processed crops and livestock is eliminated. The population data from the IDB datasets is loaded in the IDBext001 and IDBextCTYS data frames. The IDBext001 data frame shows the mid-year population of each country with its country code name and IDBextCTYS data frame provides a lookup table for the country code names and their total land area. In the following subsections, we analyze these datasets in greater detail.

# FAOstats - Production of all crops
Production_Crops <- read_csv("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/Production_Crops_E_All_Data_(Normalized).csv",
                             na = "NA")
## Parsed with column specification:
## cols(
##   `Area Code` = col_integer(),
##   Area = col_character(),
##   `Item Code` = col_integer(),
##   Item = col_character(),
##   `Element Code` = col_integer(),
##   Element = col_character(),
##   `Year Code` = col_integer(),
##   Year = col_integer(),
##   Unit = col_character(),
##   Value = col_double(),
##   Flag = col_character()
## )
# IDB - Population Data by Country Code & Year
IDBext001 <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBext001.txt", 
                           "|", quote = "\\\"", escape_backslash = TRUE,
                        col_names = c("CTY", "YR", "BSEX"),
                        na = "-9", trim_ws = TRUE)
## Parsed with column specification:
## cols(
##   CTY = col_character(),
##   YR = col_integer(),
##   BSEX = col_integer()
## )
# IDB - Data of Country Code, Country Name and Land Suface Area  
IDBextCTYS <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBextCTYS.txt", 
                         "|", quote = "\\\"", escape_backslash = TRUE,
                         col_names = c("CTY", "CNAME", "AREA_KM2"),
                         na = "-9", trim_ws = TRUE) 
## Parsed with column specification:
## cols(
##   CTY = col_character(),
##   CNAME = col_character(),
##   AREA_KM2 = col_double()
## )

Analysis of the Human Population Data

We used the IDBext001 data frame to analyze the world population. The world population total is obtained by totalling up the human population of all the countries grouped by each year from 1950 to the projected population till 2050. Figure 1 shows the world population trend from 1950 to 2050 (projected). We see that in 1950 the world population was about 2.5 billion and is projected to 9.5 billion by 2050. We have color coded the population data from 1961 to 2014 as this is the period that we have the crop production information from FAOstats datasets. From 1961 to 2014, the corresponding population increase was 3 billion to 7 billion.

# Analyze World's Total Population from 1950 to 2050
Pop.world.year.stats <- IDBext001 %>%
  group_by(YR) %>%
  filter(!is.na(BSEX)) %>%
  summarize(Country.count = n(),
            Pop.world.mean = mean(BSEX),
            Pop.world.median = median(BSEX),
            Pop.world.total = mean(BSEX)*n())
## Visualize World Population by Year
ggplot(Pop.world.year.stats) +
  aes(x = YR, y = Pop.world.total/1e+09, col = YR > 1960 & YR < 2015) + 
  geom_point(size = 2,
             alpha = 0.5) +
  theme_bw() +
  theme(legend.position = "none") +
  xlab( "Year") +
  ylab("World Population (bn)") +
  ggtitle("World Population (bn) from 1950 to 2050 (projected)")

Analysis of the Crop Production Data

We analyzed the annual crop production of the world from the Production_Crops data frame by subsetting for Area = World and Element = Production. Then, we calculate the total crops produced annually. Figure 3 shows the world’s billion tonnes of crops produced annually from 1961 to 2014. The total crop production has increased from 6 to 21 billion tonnes in this period.

# Analyze the Worls's total annual crop production (tonnes) 
ProdCrops.World.year <- Production_Crops %>%
  filter(Element == "Production") %>%
  filter(Area == "World") %>%
  group_by(Year) %>%
  filter(!is.na(Value)) %>%
  summarize(Year.count = n(),
            Year.world.mean = mean(Value),
            Year.world.median = median(Value),
            Year.world.total = mean(Value)*n())
# Visualize World Crop Production (tonnes) by Year
ggplot(ProdCrops.World.year) +
  aes(x = Year, y = Year.world.total/1e+09, col = "red") + 
  geom_point(size = 2,
             alpha = 0.5) +
  geom_smooth(method = "loess",
              span = 0.2) +
  theme_bw() +
  theme(legend.position = "none") +
  labs(y = "Total Crop Production (bn tonnes)",
       x = "Year",
       title = "World's Total Crop Production (bn tonnes) from 1961 to 2014")

Current geographical distribution of world’s crop production

In order to visualize the the current geographical distribution of the total crop production, we calculated the total crops produced by each country in 2014 and then projected it on the world map as seen in Figure 4. To do this, we first obtained the world’s map data from the mapdata library and stored it in the world data frame. Then, we calculated the total crops produced in 2014 for each country. There was some mismatch in the country names between the Production_Crops and world data frames. We used the anti_join command in R to identify the mismatches. Then, we cleaned up the data by creating a lookup table and matched the country names. Finally, using left_join we joined the world map data with the crop production data for 2014. Figure 4 shows the rainbow color gradient runs from blue to red on a log scale of crop production by billion tonnes with red being the highest and blue the lowest. As evident from the deep red color, China ranks number one in the world for the total crops produced by tonnage. India is the second highest, followed by USA, Brazil and Russia.

# Get world map data
world <- map_data("world")

# Analyze by Area/Region - Crop Production & Year
ProdCropsArea2014 <- Production_Crops %>%
  group_by(Area) %>%
  filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
  filter(!is.na(Value)) %>%
  summarize(Prod2014.Area.count = n(),
            Prod2014.Area.mean = mean(Value),
            Prod2014.Area.median = median(Value),
            Production.tonnes = mean(Value)*n()) 
names(ProdCropsArea2014)[1] <- "region"

## Datasets_Cleanup
# Anti_join crop data to world data
region_names_mismatch_crop <-  ProdCropsArea2014 %>%
  anti_join(world, by = "region")

# Anti_join world data to crop data
region_names_mismatch_world <-  world %>%
  anti_join(ProdCropsArea2014, by = "region")

# Name mismatch for region lookup 
mismatch.region.lookup <- data.frame(
  region = c("Bolivia (Plurinational State of)",
             "China, mainland",
             "China, Taiwan Province of",
            "Democratic People's Republic of Korea",
            "Iran (Islamic Republic of)",
            "Republic of Korea",
            "Russian Federation",
            "Syrian Arab Republic",
            "United Republic of Tanzania",
            "United States of America", 
            "Venezuela (Bolivarian Republic of)",
            "Viet Nam"),
  New.region = c("Bolivia",
                 "China",
                 "Taiwan",
                 "North Korea",
                 "Iran",
                 "South Korea",
                 "Russia",
                 "Syria",
                 "Tanzania",
                 "USA", 
                 "Venezuela",
                 "Vietnam"),
  stringsAsFactors = FALSE)
#View(mismatch.region.lookup)

# Left join the modified region names
ProdCropsArea2014_mod1 <- ProdCropsArea2014 %>%
  left_join(mismatch.region.lookup, by = "region")

ProdCropsArea2014_mod1[22,1]= "Bolivia"
ProdCropsArea2014_mod1[42,1]="China"
ProdCropsArea2014_mod1[43,1]="Taiwan"
ProdCropsArea2014_mod1[53,1]="North Korea"
ProdCropsArea2014_mod1[91,1]= "Iran"
ProdCropsArea2014_mod1[155,1]="South Korea"
ProdCropsArea2014_mod1[158,1]="Russia"
ProdCropsArea2014_mod1[185,1]="Syria"
ProdCropsArea2014_mod1[202,1]="Tanzania"
ProdCropsArea2014_mod1[203,1]="USA"
ProdCropsArea2014_mod1[207,1]="Venezuela"
ProdCropsArea2014_mod1[208,1]="Vietnam"

# Left join world data to crop data
region_crops <-  world %>%
  left_join(ProdCropsArea2014_mod1, by = "region")
# Base world map
world_base <- ggplot(data = world, 
                     mapping = aes(x = long, y = lat, group = group)) +
  coord_fixed(1.2) +
  geom_polygon(color = "black", fill = NA)

# Rainbow colored gradient fill for total crops produced (tonnes) in 2014
World_map_Crops_Prod <- world_base + 
  geom_polygon(data = region_crops, 
               aes(fill = Production.tonnes), 
               color = "white") +
  geom_polygon(color = "black", fill = NA) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_gradientn(colours = rev(rainbow(7)),
                       breaks = 10^seq(3,9),
                       trans = "log10") +
  ggtitle("Geographical Distribution of Crops Produced in 2014")
World_map_Crops_Prod

Linear Regression Model

Based on our analysis above, it seems that the crop production of the countries with the highest population and the world itself has kept pace with the increases in the population. So, we took a subset of the population data from the IDBext001 data frame from 1961 to 2014 and studied its correlation to the total crop production of the world. Figure 6 shows the scatter plot of the world’s total crops produced vs the world population for the period 1961 to 2014. There seems to be a strong correlation between the crop production and population increase for the entire world.
Further, we applied a linear regression model of the total crops produced with the world population for the period from 1961 to 2014 based on the available data. The straight line in Figure 6 shows the fitted regression trend ine. The data fits the linear regression model quite well. From the model, it seems that for every additional billion people, the total crops produced needs to increase by 3.1 billion tonnes.

# Subset World Population from 1961 to 2014
World.Pop <- IDBext001 %>%
  group_by(YR) %>%
  filter(!is.na(BSEX)) %>%
  filter(YR %in% 1961:2014) %>%
  summarize(Population.World = mean(BSEX)*n())
names(World.Pop)[1] = "Year"

# Analyze the World's total annual crop production (tonnes) by YR 
Total.Crops.Prod.YR <- Production_Crops %>%
  filter(Element == "Production" & Area == "World") %>%
  group_by(Year) %>%
  filter(!is.na(Value)) %>%
  summarize(CropsProduction.World = mean(Value)*n()) 

# Add World's Total_Crops_Produced by Year 
World.Pop <- World.Pop %>%
  left_join(Total.Crops.Prod.YR, by = "Year") 
# Linear Model of Crop Prodiuction vs Population
xyplot(CropsProduction.World/1e+09 ~ Population.World/1e+09, 
       data = World.Pop, 
       type = c("p", "r"),
       xlab = "World Population (bn)", 
       ylab = "Total Crops Produced (bn tonnes)",
       main = "Crops Produced vs World Population",
       col = "blue",
       pch = 19)

# Get coefficients for the linear model
fit <- lm(CropsProduction.World ~ Population.World, 
          data = World.Pop)
# coef(fit)
#(Intercept) Population.World 
#-4.188068e+09     3.146699e+00 

# Model Equation
# CropsProduction.World =  3.146699*Population.World - 4188068000

Food demand based on projected population

Based on the projections of the populations and our fitted linear regression model, we estimated the world’s future demand for total crop production until 2050. Figure 7 shows the demand for food with the projected population change from 2015 to 2050 based on the above model. By 2050, the food production needs to increase by 33% of our current crop production of 18 billion tonnes. The food demand at 2050 will be 25 billion tonnes of crop production. We applied this same model to the countries with the highest populations viz. China, India and USA for the 1961 to 2014 period. The ratio for the tonnes of crop produced vs the population increase is close to the 3.1 ratio obtained for the entire world.

# Get the projected population from 2015 to 2050
Pop.world.projected <- Pop.world.year.stats[(66:101), c(1,5)] 

# Projected production for crops using the Linear Model equation
CropsProduction.projected <- 3.146699*Pop.world.projected$Pop.world.total - 4188068000

# New data frame with projected World's population and production 
Projected.Pop.Crops <- cbind.data.frame(Pop.world.projected,
                                        CropsProduction.projected)

# gather() in tidyr
Projected_long <- Projected.Pop.Crops %>%
  gather(Projections, value,
         Pop.world.total:CropsProduction.projected)
# Visualize Projected World Crop Production (tonnes) & Population by Year
ggplot(Projected_long) +
  aes(x = YR, y = value/1e+09, col = Projections) + 
  geom_point(size = 2,
             alpha = 0.5) +
theme_bw() +
theme(legend.position = "bottom") +
  labs(y = "Crop Production & Population (bn)",
       x = "Year",
       title = "World's Total Crop Production & Population (projected)")

Summary and Discussion

In the last half-century, the manifold increase in the wold population was accompanied by a corresponding increase in the total crop prodution of the world. The countries that had the highest population viz. China, India and Brazil have produced the major grains of the world i.e. maize, rice and wheat. Based on our modeling and analysis of the crop production and the world population data, the total crop production increased by 3.1 billion tonnes for every additional 1 bilion people. With the population set to increase by atleast another 2 billion by 2050, the total crop production needs to increase by about 6.5 billion tonnes, a 33% increase. Various challenges like climate change accompanied by prolonged floods and droughts, plant dieases, drug-resistant pests, etc. exist that poses major headwinds to achieve this target. By concerted global effort and co-operation that ensures reducing global poverty, hunger and wastage of crop products, we can overcome these major challenges and ensure a sustainable future to look forward to.

Appendix

#install.packages("tidyverse")
#install.packages("stringr")
#install.packages("lattice")
#install.packages("mapdata")
#install.packages("maps")
library("papaja")
library(readr)
library(tidyverse)
library(stringr)
library(lattice)
library(devtools)
library(maps)
library(mapdata)

# Hide Warnings
oldw <- getOption("warn")
options(warn = -1)
# FAOstats - Production of all crops
Production_Crops <- read_csv("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/Production_Crops_E_All_Data_(Normalized).csv",
                             na = "NA")

# IDB - Population Data by Country Code & Year
IDBext001 <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBext001.txt", 
                           "|", quote = "\\\"", escape_backslash = TRUE,
                        col_names = c("CTY", "YR", "BSEX"),
                        na = "-9", trim_ws = TRUE)

# IDB - Data of Country Code, Country Name and Land Suface Area  
IDBextCTYS <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBextCTYS.txt", 
                         "|", quote = "\\\"", escape_backslash = TRUE,
                         col_names = c("CTY", "CNAME", "AREA_KM2"),
                         na = "-9", trim_ws = TRUE) 
# Analyze World's Total Population from 1950 to 2050
Pop.world.year.stats <- IDBext001 %>%
  group_by(YR) %>%
  filter(!is.na(BSEX)) %>%
  summarize(Country.count = n(),
            Pop.world.mean = mean(BSEX),
            Pop.world.median = median(BSEX),
            Pop.world.total = mean(BSEX)*n())
## Visualize World Population by Year
ggplot(Pop.world.year.stats) +
  aes(x = YR, y = Pop.world.total/1e+09, col = YR > 1960 & YR < 2015) + 
  geom_point(size = 2,
             alpha = 0.5) +
  theme_bw() +
  theme(legend.position = "none") +
  xlab( "Year") +
  ylab("World Population (bn)") +
  ggtitle("World Population (bn) from 1950 to 2050 (projected)")
# Analyze top 5 Population of countries from 1950 to 2050
Pop.top5.year <- IDBext001 %>%
  group_by(YR) %>%
  arrange(YR, desc(BSEX))  %>%
  top_n(n=5, wt = BSEX)

# Left join with IDBextCTYS to obtain the full name for countries.
Pop.top5.year <- Pop.top5.year %>%
  left_join(IDBextCTYS, by = "CTY")
# Stacked barchart countries with top 5 poulation from 1950 to 2050
ggplot(Pop.top5.year, aes(x = YR, y = BSEX/1e+09, fill = CNAME)) + 
  geom_bar(stat="identity") +
  xlab("Year") +
  ylab("Population (bn)") +
  ggtitle("World Population - Top 5 Countries from 1950 to 2050 (projected)") + 
  theme_bw()
# Analyze the Worls's total annual crop production (tonnes) 
ProdCrops.World.year <- Production_Crops %>%
  filter(Element == "Production") %>%
  filter(Area == "World") %>%
  group_by(Year) %>%
  filter(!is.na(Value)) %>%
  summarize(Year.count = n(),
            Year.world.mean = mean(Value),
            Year.world.median = median(Value),
            Year.world.total = mean(Value)*n())
# Visualize World Crop Production (tonnes) by Year
ggplot(ProdCrops.World.year) +
  aes(x = Year, y = Year.world.total/1e+09, col = "red") + 
  geom_point(size = 2,
             alpha = 0.5) +
  geom_smooth(method = "loess",
              span = 0.2) +
  theme_bw() +
  theme(legend.position = "none") +
  labs(y = "Total Crop Production (bn tonnes)",
       x = "Year",
       title = "World's Total Crop Production (bn tonnes) from 1961 to 2014")
# Get world map data
world <- map_data("world")

# Analyze by Area/Region - Crop Production & Year
ProdCropsArea2014 <- Production_Crops %>%
  group_by(Area) %>%
  filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
  filter(!is.na(Value)) %>%
  summarize(Prod2014.Area.count = n(),
            Prod2014.Area.mean = mean(Value),
            Prod2014.Area.median = median(Value),
            Production.tonnes = mean(Value)*n()) 
names(ProdCropsArea2014)[1] <- "region"

## Datasets_Cleanup
# Anti_join crop data to world data
region_names_mismatch_crop <-  ProdCropsArea2014 %>%
  anti_join(world, by = "region")

# Anti_join world data to crop data
region_names_mismatch_world <-  world %>%
  anti_join(ProdCropsArea2014, by = "region")

# Name mismatch for region lookup 
mismatch.region.lookup <- data.frame(
  region = c("Bolivia (Plurinational State of)",
             "China, mainland",
             "China, Taiwan Province of",
            "Democratic People's Republic of Korea",
            "Iran (Islamic Republic of)",
            "Republic of Korea",
            "Russian Federation",
            "Syrian Arab Republic",
            "United Republic of Tanzania",
            "United States of America", 
            "Venezuela (Bolivarian Republic of)",
            "Viet Nam"),
  New.region = c("Bolivia",
                 "China",
                 "Taiwan",
                 "North Korea",
                 "Iran",
                 "South Korea",
                 "Russia",
                 "Syria",
                 "Tanzania",
                 "USA", 
                 "Venezuela",
                 "Vietnam"),
  stringsAsFactors = FALSE)
#View(mismatch.region.lookup)

# Left join the modified region names
ProdCropsArea2014_mod1 <- ProdCropsArea2014 %>%
  left_join(mismatch.region.lookup, by = "region")

ProdCropsArea2014_mod1[22,1]= "Bolivia"
ProdCropsArea2014_mod1[42,1]="China"
ProdCropsArea2014_mod1[43,1]="Taiwan"
ProdCropsArea2014_mod1[53,1]="North Korea"
ProdCropsArea2014_mod1[91,1]= "Iran"
ProdCropsArea2014_mod1[155,1]="South Korea"
ProdCropsArea2014_mod1[158,1]="Russia"
ProdCropsArea2014_mod1[185,1]="Syria"
ProdCropsArea2014_mod1[202,1]="Tanzania"
ProdCropsArea2014_mod1[203,1]="USA"
ProdCropsArea2014_mod1[207,1]="Venezuela"
ProdCropsArea2014_mod1[208,1]="Vietnam"

# Left join world data to crop data
region_crops <-  world %>%
  left_join(ProdCropsArea2014_mod1, by = "region")
# Base world map
world_base <- ggplot(data = world, 
                     mapping = aes(x = long, y = lat, group = group)) +
  coord_fixed(1.2) +
  geom_polygon(color = "black", fill = NA)

# Rainbow colored gradient fill for total crops produced (tonnes) in 2014
World_map_Crops_Prod <- world_base + 
  geom_polygon(data = region_crops, 
               aes(fill = Production.tonnes), 
               color = "white") +
  geom_polygon(color = "black", fill = NA) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_gradientn(colours = rev(rainbow(7)),
                       breaks = 10^seq(3,9),
                       trans = "log10") +
  ggtitle("Geographical Distribution of Crops Produced in 2014")
World_map_Crops_Prod
# Analyze top 5 countries producing top 5 crops in 2014 
Crop.Country.top5 <- Production_Crops %>%
  filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
  filter(`Item Code` %in% c(156,56,15,27,116)) %>%
  filter(!is.na(Value)) %>%
  group_by(Item)  %>%
  arrange(Item, desc(Value)) %>%
  top_n(n=5, wt = Value)
# Stacked Column chart for top 5 crops 
# with top 5 countries producing them by year
Crop.Country.top5$Item <- factor(Crop.Country.top5$Item,
                                 levels=unique(Crop.Country.top5$Item))
ggplot(Crop.Country.top5, aes(x= reorder(Item, -Value),
                              y=Value/1e+09, fill=Area)) + 
  geom_bar(stat="identity") +
  xlab("Top 5 Crops Produced") +
  ylab("Crop Production (billion tonnes)") +
  ggtitle("Top 5 Crops Produced with top 5 Countries") + 
  theme_bw()
# Subset World Population from 1961 to 2014
World.Pop <- IDBext001 %>%
  group_by(YR) %>%
  filter(!is.na(BSEX)) %>%
  filter(YR %in% 1961:2014) %>%
  summarize(Population.World = mean(BSEX)*n())
names(World.Pop)[1] = "Year"

# Analyze the World's total annual crop production (tonnes) by YR 
Total.Crops.Prod.YR <- Production_Crops %>%
  filter(Element == "Production" & Area == "World") %>%
  group_by(Year) %>%
  filter(!is.na(Value)) %>%
  summarize(CropsProduction.World = mean(Value)*n()) 

# Add World's Total_Crops_Produced by Year 
World.Pop <- World.Pop %>%
  left_join(Total.Crops.Prod.YR, by = "Year") 
# Linear Model of Crop Prodiuction vs Population
xyplot(CropsProduction.World/1e+09 ~ Population.World/1e+09, 
       data = World.Pop, 
       type = c("p", "r"),
       xlab = "World Population (bn)", 
       ylab = "Total Crops Produced (bn tonnes)",
       main = "Crops Produced vs World Population",
       col = "blue",
       pch = 19)

# Get coefficients for the linear model
fit <- lm(CropsProduction.World ~ Population.World, 
          data = World.Pop)
# coef(fit)
#(Intercept) Population.World 
#-4.188068e+09     3.146699e+00 

# Model Equation
# CropsProduction.World =  3.146699*Population.World - 4188068000
# Get the projected population from 2015 to 2050
Pop.world.projected <- Pop.world.year.stats[(66:101), c(1,5)] 

# Projected production for crops using the Linear Model equation
CropsProduction.projected <- 3.146699*Pop.world.projected$Pop.world.total - 4188068000

# New data frame with projected World's population and production 
Projected.Pop.Crops <- cbind.data.frame(Pop.world.projected,
                                        CropsProduction.projected)

# gather() in tidyr
Projected_long <- Projected.Pop.Crops %>%
  gather(Projections, value,
         Pop.world.total:CropsProduction.projected)
# Visualize Projected World Crop Production (tonnes) & Population by Year
ggplot(Projected_long) +
  aes(x = YR, y = value/1e+09, col = Projections) + 
  geom_point(size = 2,
             alpha = 0.5) +
theme_bw() +
theme(legend.position = "bottom") +
  labs(y = "Crop Production & Population (bn)",
       x = "Year",
       title = "World's Total Crop Production & Population (projected)")
r_refs(file = "references.bib")

References

r_refs(file = "references.bib")

Aust, Frederik, and Marius Barth. 2017. papaja: Create APA Manuscripts with R Markdown. https://github.com/crsh/papaja.

FAO. 2016. FAOstats. http://www.fao.org/faostat/en/#data.

Hartemink, Alfred E. 2007. “Soil Science, Population Growth and Food Production: Some Historical Developments.” In Advances in Integrated Soil Fertility Management in Sub-Saharan Africa: Challenges and Opportunities, edited by Andre Bationo, Boaz Waswa, Job Kihara, and Joseph Kimetu, 85–97. Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-5760-1_6.

Henry, Lionel, and Hadley Wickham. 2017. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.

Muller, Kirill. 2017. Bindrcpp: An ’Rcpp’ Interface to Active Bindings. https://CRAN.R-project.org/package=bindrcpp.

Muller, Kirill, and Hadley Wickham. 2017. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Richard A. Becker, Original S code by, and Allan R. Wilks. R version by Ray Brownrigg. 2016. Mapdata: Extra Map Databases. https://CRAN.R-project.org/package=mapdata.

Richard A. Becker, Original S code by, Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka, and Alex Deckmyn. 2017. Maps: Draw Geographical Maps. https://CRAN.R-project.org/package=maps.

Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer. http://lmdvr.r-forge.r-project.org.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

———. 2017a. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.

———. 2017b. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.

———. 2017c. Tidyverse: Easily Install and Load ’Tidyverse’ Packages. https://CRAN.R-project.org/package=tidyverse.

Wickham, Hadley, and Winston Chang. 2017. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.

Wickham, Hadley, Romain Francois, Lionel Henry, and Kirill Muller. 2017. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2017. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

WorldBank. 2016. World Bank Data. http://data.worldbank.org/data-catalog.