Abstract
The world human population has seen an exponential rise going from 1 billion to 7.5 billion currently in less than 200 years and is expected to reach about 10 billion by 2050. This huge increase in the world population was accompanied by a corresponding massive increase in the crop production. In this article, we examined the correlation between the population increase with the crop production. We demonstrated that the geographical distribution of the global population and crop production was similar for the last 50 years. Finally, we built a linear regression model that correlates the population growth of the world with the world’s annual crop production. Based on our model, we predict that the world crop production needs to increase by atleastt 33% of the current rate in order to meet the food demand for the projected population increase of another 2.5 billion until 2050.
The world population today is about 7.5 billion and is growing by an annual 83 million approximately. The global population is expected to reach approximately 10 billion in 2050 and 11 billion by 2100. (Sustainablescale 2016) This increase in the world population has been accompanied by trends like increased rural to urban migration, longer life expectancy but decreased fertility rates with better economic development.(Sustainablescale 2016)
A variety of geographic, demographic and socio-economic factors have contributed to this massive rate of increase in the human population. Among them, a few primary factors are improved healthcare annd vaccination, control of large scale epidemics, better economic and environmental living conditions and lesser war conflicts. The effect of these factors are demonstrated in the changes of national birth and death rates, life expectancy and infant mortality rates of each nation. Correlating this information with various economic factors like economic growth and healthcare spending provides insights into how these factors affect the population growth and have been previously studied in quite detail.(WorldBank 2016, Sustainablescale (2016))
However, the food production which is likely to be a strong factor in population growth but has recieved less attention. With more mouths to feed as the population increases, there must be a considerable increase in food production with a more effective geographical distribution & consumption of resources. Though a causal relation between poulation growth and crop production is very difficult to establish, the presence of a strong correlation can be established quite easily.
Historically, the factors affecting increased food production were the advancement in soil sciences, advanced farming techniques like ploughing, better weather predictions and increase in yield with better seed quality and use of fertilizers and pesticides.(Hartemink 2007) An example of a study conducted to understand the link between population growth and food production was done in Indonesia where three regions were studied, viz. Java, Sumatra and Borneo. Java had a mean population of 316 people per Km. Sq. whereas Sumatra and Borneo had only 11 people per Sq.Km. The difference was in the quality of the soil, Java was primary the volcanic soil that was more fertile and had a better impact on the irrigation water that supported the growth of rice. However, this was changed in last few decades as the transportation system has changed the direct correlation of crop production and human settlements. Rapid urbanization also has certain cultural impact of this. In this article, we modeled the relationship between population increase and its consequent demand on food production.
Our primary objective was to build up a model that correlates the world population with the world food production. Based on our model, we wanted to predict the world food demand based on the projected population. A description of the required datasets and the corresponding measures and procedures involved is provided below.
We have used datasets from multiple sources to obtain information about the world population and total crop production as shown below.
We obtain population trends and the factors affecting it from the “IDB” (IDB 2016) datasets. The IDB datasets provide demographic information about each country with its total mid-year population, land area, crude birth and death rates, infant mortality rates and life expectancy, age-specific fertility rates as well as net migration and growth rates.
The “FAOstats” (FAO 2016) dataset provides information on food production including total annual production of various crops with groupings of cereals, vegetables, fruits, etc. In addition, the total arable land and yields per hectare for each country is also provided for the last 50 years.
We used the IDB datasets to analyze the demographic and socio-economic factors affecting the world population trends, its geographical distributions and future population estimates for the 21st century. For the world food production, we used the FAOstats datasets to do a longitudinal study of the major crop production of every nation, their average annual yield and usage of arable land. Then, we correlated the human population growth of each country and their annual food production and tried to build up a model. Finally, we predict the world food demand based on our model and the projections of the world population. We used R (3.4.0, R Core Team 2017) and the R-packages bindrcpp (0.2, Muller 2017), devtools (1.13.3, Wickham and Chang 2017), dplyr (0.7.2, Wickham et al. 2017), ggplot2 (2.2.1, Wickham 2009), lattice (0.20.35, Sarkar 2008), mapdata (2.2.6, Richard A. Becker and Ray Brownrigg. 2016), maps (3.2.0, Richard A. Becker, Ray Brownrigg. Enhancements by Thomas P Minka, and Deckmyn. 2017), papaja (0.1.0.9492, Aust and Barth 2017), purrr (0.2.3, Henry and Wickham 2017), readr (1.1.1, Wickham, Hester, and Francois 2017), stringr (1.2.0, Wickham 2017a), tibble (1.3.3, Muller and Wickham 2017), tidyr (0.7.0, Wickham 2017b), and tidyverse (1.1.1, Wickham 2017c) for all our data analysis and data visualization.
We began by loading the data for population and crop production from the IDB (IDB 2016) and FAO (FAO 2016) datasets respectively. The normalized data for all production crops is loaded in the Production_Crops data frame and is used as a measure for direct and indirect consumption. Hence, the need to analyze the processed crops and livestock is eliminated. The population data from the IDB datasets is loaded in the IDBext001 and IDBextCTYS data frames. The IDBext001 data frame shows the mid-year population of each country with its country code name and IDBextCTYS data frame provides a lookup table for the country code names and their total land area. In the following subsections, we analyze these datasets in greater detail.
# FAOstats - Production of all crops
Production_Crops <- read_csv("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/Production_Crops_E_All_Data_(Normalized).csv",
na = "NA")
## Parsed with column specification:
## cols(
## `Area Code` = col_integer(),
## Area = col_character(),
## `Item Code` = col_integer(),
## Item = col_character(),
## `Element Code` = col_integer(),
## Element = col_character(),
## `Year Code` = col_integer(),
## Year = col_integer(),
## Unit = col_character(),
## Value = col_double(),
## Flag = col_character()
## )
# IDB - Population Data by Country Code & Year
IDBext001 <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBext001.txt",
"|", quote = "\\\"", escape_backslash = TRUE,
col_names = c("CTY", "YR", "BSEX"),
na = "-9", trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CTY = col_character(),
## YR = col_integer(),
## BSEX = col_integer()
## )
# IDB - Data of Country Code, Country Name and Land Suface Area
IDBextCTYS <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBextCTYS.txt",
"|", quote = "\\\"", escape_backslash = TRUE,
col_names = c("CTY", "CNAME", "AREA_KM2"),
na = "-9", trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CTY = col_character(),
## CNAME = col_character(),
## AREA_KM2 = col_double()
## )
We used the IDBext001 data frame to analyze the world population. The world population total is obtained by totalling up the human population of all the countries grouped by each year from 1950 to the projected population till 2050. Figure 1 shows the world population trend from 1950 to 2050 (projected). We see that in 1950 the world population was about 2.5 billion and is projected to 9.5 billion by 2050. We have color coded the population data from 1961 to 2014 as this is the period that we have the crop production information from FAOstats datasets. From 1961 to 2014, the corresponding population increase was 3 billion to 7 billion.
# Analyze World's Total Population from 1950 to 2050
Pop.world.year.stats <- IDBext001 %>%
group_by(YR) %>%
filter(!is.na(BSEX)) %>%
summarize(Country.count = n(),
Pop.world.mean = mean(BSEX),
Pop.world.median = median(BSEX),
Pop.world.total = mean(BSEX)*n())
## Visualize World Population by Year
ggplot(Pop.world.year.stats) +
aes(x = YR, y = Pop.world.total/1e+09, col = YR > 1960 & YR < 2015) +
geom_point(size = 2,
alpha = 0.5) +
theme_bw() +
theme(legend.position = "none") +
xlab( "Year") +
ylab("World Population (bn)") +
ggtitle("World Population (bn) from 1950 to 2050 (projected)")
The colored stacked bar chart in Figure 2 shows the country-wise population trends. The top 5 countries with the largest population for each year is shown from 1950 to 2050 (projected). Throughout this period, China, India and USA have been the top 3 countries with the largest population consistently. Current populations are China 1.3 billion, India 1.2 billion and USA 0.33 billion in 2017. Currently, about 40% of the world’s population live in these 3 countries. Population growth rate in China has been marginal since its 1 child policy in the 70’s, India has grown considerably and USA has no significant change. Population growth rates in Asia is decreasing though it’s increasing considerably in Africa and that is where the significant future population growth will occur. Projections for 2050 are: India will be number 1 country followed by China. However, the African country of Nigeria will have similar populations to that of USA by 2050.
# Analyze top 5 Population of countries from 1950 to 2050
Pop.top5.year <- IDBext001 %>%
group_by(YR) %>%
arrange(YR, desc(BSEX)) %>%
top_n(n=5, wt = BSEX)
# Left join with IDBextCTYS to obtain the full name for countries.
Pop.top5.year <- Pop.top5.year %>%
left_join(IDBextCTYS, by = "CTY")
# Stacked barchart countries with top 5 poulation from 1950 to 2050
ggplot(Pop.top5.year, aes(x = YR, y = BSEX/1e+09, fill = CNAME)) +
geom_bar(stat="identity") +
xlab("Year") +
ylab("Population (bn)") +
ggtitle("World Population - Top 5 Countries from 1950 to 2050 (projected)") +
theme_bw()
We analyzed the annual crop production of the world from the Production_Crops data frame by subsetting for Area = World and Element = Production. Then, we calculate the total crops produced annually. Figure 3 shows the world’s billion tonnes of crops produced annually from 1961 to 2014. The total crop production has increased from 6 to 21 billion tonnes in this period.
# Analyze the Worls's total annual crop production (tonnes)
ProdCrops.World.year <- Production_Crops %>%
filter(Element == "Production") %>%
filter(Area == "World") %>%
group_by(Year) %>%
filter(!is.na(Value)) %>%
summarize(Year.count = n(),
Year.world.mean = mean(Value),
Year.world.median = median(Value),
Year.world.total = mean(Value)*n())
# Visualize World Crop Production (tonnes) by Year
ggplot(ProdCrops.World.year) +
aes(x = Year, y = Year.world.total/1e+09, col = "red") +
geom_point(size = 2,
alpha = 0.5) +
geom_smooth(method = "loess",
span = 0.2) +
theme_bw() +
theme(legend.position = "none") +
labs(y = "Total Crop Production (bn tonnes)",
x = "Year",
title = "World's Total Crop Production (bn tonnes) from 1961 to 2014")
In order to visualize the the current geographical distribution of the total crop production, we calculated the total crops produced by each country in 2014 and then projected it on the world map as seen in Figure 4. To do this, we first obtained the world’s map data from the mapdata library and stored it in the world data frame. Then, we calculated the total crops produced in 2014 for each country. There was some mismatch in the country names between the Production_Crops and world data frames. We used the anti_join command in R to identify the mismatches. Then, we cleaned up the data by creating a lookup table and matched the country names. Finally, using left_join we joined the world map data with the crop production data for 2014. Figure 4 shows the rainbow color gradient runs from blue to red on a log scale of crop production by billion tonnes with red being the highest and blue the lowest. As evident from the deep red color, China ranks number one in the world for the total crops produced by tonnage. India is the second highest, followed by USA, Brazil and Russia.
# Get world map data
world <- map_data("world")
# Analyze by Area/Region - Crop Production & Year
ProdCropsArea2014 <- Production_Crops %>%
group_by(Area) %>%
filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
filter(!is.na(Value)) %>%
summarize(Prod2014.Area.count = n(),
Prod2014.Area.mean = mean(Value),
Prod2014.Area.median = median(Value),
Production.tonnes = mean(Value)*n())
names(ProdCropsArea2014)[1] <- "region"
## Datasets_Cleanup
# Anti_join crop data to world data
region_names_mismatch_crop <- ProdCropsArea2014 %>%
anti_join(world, by = "region")
# Anti_join world data to crop data
region_names_mismatch_world <- world %>%
anti_join(ProdCropsArea2014, by = "region")
# Name mismatch for region lookup
mismatch.region.lookup <- data.frame(
region = c("Bolivia (Plurinational State of)",
"China, mainland",
"China, Taiwan Province of",
"Democratic People's Republic of Korea",
"Iran (Islamic Republic of)",
"Republic of Korea",
"Russian Federation",
"Syrian Arab Republic",
"United Republic of Tanzania",
"United States of America",
"Venezuela (Bolivarian Republic of)",
"Viet Nam"),
New.region = c("Bolivia",
"China",
"Taiwan",
"North Korea",
"Iran",
"South Korea",
"Russia",
"Syria",
"Tanzania",
"USA",
"Venezuela",
"Vietnam"),
stringsAsFactors = FALSE)
#View(mismatch.region.lookup)
# Left join the modified region names
ProdCropsArea2014_mod1 <- ProdCropsArea2014 %>%
left_join(mismatch.region.lookup, by = "region")
ProdCropsArea2014_mod1[22,1]= "Bolivia"
ProdCropsArea2014_mod1[42,1]="China"
ProdCropsArea2014_mod1[43,1]="Taiwan"
ProdCropsArea2014_mod1[53,1]="North Korea"
ProdCropsArea2014_mod1[91,1]= "Iran"
ProdCropsArea2014_mod1[155,1]="South Korea"
ProdCropsArea2014_mod1[158,1]="Russia"
ProdCropsArea2014_mod1[185,1]="Syria"
ProdCropsArea2014_mod1[202,1]="Tanzania"
ProdCropsArea2014_mod1[203,1]="USA"
ProdCropsArea2014_mod1[207,1]="Venezuela"
ProdCropsArea2014_mod1[208,1]="Vietnam"
# Left join world data to crop data
region_crops <- world %>%
left_join(ProdCropsArea2014_mod1, by = "region")
# Base world map
world_base <- ggplot(data = world,
mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.2) +
geom_polygon(color = "black", fill = NA)
# Rainbow colored gradient fill for total crops produced (tonnes) in 2014
World_map_Crops_Prod <- world_base +
geom_polygon(data = region_crops,
aes(fill = Production.tonnes),
color = "white") +
geom_polygon(color = "black", fill = NA) +
theme_bw() +
theme(legend.position = "none") +
scale_fill_gradientn(colours = rev(rainbow(7)),
breaks = 10^seq(3,9),
trans = "log10") +
ggtitle("Geographical Distribution of Crops Produced in 2014")
World_map_Crops_Prod
In this section, we wanted to anlyze the historical trends in the majors crops produced and their geographical distribution. The colored stacked barchart in Figure 5 shows the top 5 crops produced with the top 5 countries for each crop from 1961 to 2014. Sugarcane is the largest crop produced by billion tonnes of crops with Brazil and India being the highest producers. However, the next 3 are Maize, Rice and Wheat which constitute the grains that sustain the population of the top 3 countries with the highest population. Maize is largely produced by USA and China. Rice is primarily grown in the Asian countries - China, India, Indonesia, Bangladesh and Vietnam being the top 5. Wheat is primarily grown in China and India.
# Analyze top 5 countries producing top 5 crops in 2014
Crop.Country.top5 <- Production_Crops %>%
filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
filter(`Item Code` %in% c(156,56,15,27,116)) %>%
filter(!is.na(Value)) %>%
group_by(Item) %>%
arrange(Item, desc(Value)) %>%
top_n(n=5, wt = Value)
# Stacked Column chart for top 5 crops
# with top 5 countries producing them by year
Crop.Country.top5$Item <- factor(Crop.Country.top5$Item,
levels=unique(Crop.Country.top5$Item))
ggplot(Crop.Country.top5, aes(x= reorder(Item, -Value),
y=Value/1e+09, fill=Area)) +
geom_bar(stat="identity") +
xlab("Top 5 Crops Produced") +
ylab("Crop Production (billion tonnes)") +
ggtitle("Top 5 Crops Produced with top 5 Countries") +
theme_bw()
Based on our analysis above, it seems that the crop production of the countries with the highest population and the world itself has kept pace with the increases in the population. So, we took a subset of the population data from the IDBext001 data frame from 1961 to 2014 and studied its correlation to the total crop production of the world. Figure 6 shows the scatter plot of the world’s total crops produced vs the world population for the period 1961 to 2014. There seems to be a strong correlation between the crop production and population increase for the entire world.
Further, we applied a linear regression model of the total crops produced with the world population for the period from 1961 to 2014 based on the available data. The straight line in Figure 6 shows the fitted regression trend ine. The data fits the linear regression model quite well. From the model, it seems that for every additional billion people, the total crops produced needs to increase by 3.1 billion tonnes.
# Subset World Population from 1961 to 2014
World.Pop <- IDBext001 %>%
group_by(YR) %>%
filter(!is.na(BSEX)) %>%
filter(YR %in% 1961:2014) %>%
summarize(Population.World = mean(BSEX)*n())
names(World.Pop)[1] = "Year"
# Analyze the World's total annual crop production (tonnes) by YR
Total.Crops.Prod.YR <- Production_Crops %>%
filter(Element == "Production" & Area == "World") %>%
group_by(Year) %>%
filter(!is.na(Value)) %>%
summarize(CropsProduction.World = mean(Value)*n())
# Add World's Total_Crops_Produced by Year
World.Pop <- World.Pop %>%
left_join(Total.Crops.Prod.YR, by = "Year")
# Linear Model of Crop Prodiuction vs Population
xyplot(CropsProduction.World/1e+09 ~ Population.World/1e+09,
data = World.Pop,
type = c("p", "r"),
xlab = "World Population (bn)",
ylab = "Total Crops Produced (bn tonnes)",
main = "Crops Produced vs World Population",
col = "blue",
pch = 19)
# Get coefficients for the linear model
fit <- lm(CropsProduction.World ~ Population.World,
data = World.Pop)
# coef(fit)
#(Intercept) Population.World
#-4.188068e+09 3.146699e+00
# Model Equation
# CropsProduction.World = 3.146699*Population.World - 4188068000
Based on the projections of the populations and our fitted linear regression model, we estimated the world’s future demand for total crop production until 2050. Figure 7 shows the demand for food with the projected population change from 2015 to 2050 based on the above model. By 2050, the food production needs to increase by 33% of our current crop production of 18 billion tonnes. The food demand at 2050 will be 25 billion tonnes of crop production. We applied this same model to the countries with the highest populations viz. China, India and USA for the 1961 to 2014 period. The ratio for the tonnes of crop produced vs the population increase is close to the 3.1 ratio obtained for the entire world.
# Get the projected population from 2015 to 2050
Pop.world.projected <- Pop.world.year.stats[(66:101), c(1,5)]
# Projected production for crops using the Linear Model equation
CropsProduction.projected <- 3.146699*Pop.world.projected$Pop.world.total - 4188068000
# New data frame with projected World's population and production
Projected.Pop.Crops <- cbind.data.frame(Pop.world.projected,
CropsProduction.projected)
# gather() in tidyr
Projected_long <- Projected.Pop.Crops %>%
gather(Projections, value,
Pop.world.total:CropsProduction.projected)
# Visualize Projected World Crop Production (tonnes) & Population by Year
ggplot(Projected_long) +
aes(x = YR, y = value/1e+09, col = Projections) +
geom_point(size = 2,
alpha = 0.5) +
theme_bw() +
theme(legend.position = "bottom") +
labs(y = "Crop Production & Population (bn)",
x = "Year",
title = "World's Total Crop Production & Population (projected)")
In the last half-century, the manifold increase in the wold population was accompanied by a corresponding increase in the total crop prodution of the world. The countries that had the highest population viz. China, India and Brazil have produced the major grains of the world i.e. maize, rice and wheat. Based on our modeling and analysis of the crop production and the world population data, the total crop production increased by 3.1 billion tonnes for every additional 1 bilion people. With the population set to increase by atleast another 2 billion by 2050, the total crop production needs to increase by about 6.5 billion tonnes, a 33% increase. Various challenges like climate change accompanied by prolonged floods and droughts, plant dieases, drug-resistant pests, etc. exist that poses major headwinds to achieve this target. By concerted global effort and co-operation that ensures reducing global poverty, hunger and wastage of crop products, we can overcome these major challenges and ensure a sustainable future to look forward to.
#install.packages("tidyverse")
#install.packages("stringr")
#install.packages("lattice")
#install.packages("mapdata")
#install.packages("maps")
library("papaja")
library(readr)
library(tidyverse)
library(stringr)
library(lattice)
library(devtools)
library(maps)
library(mapdata)
# Hide Warnings
oldw <- getOption("warn")
options(warn = -1)
# FAOstats - Production of all crops
Production_Crops <- read_csv("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/Production_Crops_E_All_Data_(Normalized).csv",
na = "NA")
# IDB - Population Data by Country Code & Year
IDBext001 <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBext001.txt",
"|", quote = "\\\"", escape_backslash = TRUE,
col_names = c("CTY", "YR", "BSEX"),
na = "-9", trim_ws = TRUE)
# IDB - Data of Country Code, Country Name and Land Suface Area
IDBextCTYS <- read_delim("C:/Harrisburg University Classes/Sem 1 - Summer 2017/ANLY-500/Term Project/FAOSTAT/FAOStats Code/IDBextCTYS.txt",
"|", quote = "\\\"", escape_backslash = TRUE,
col_names = c("CTY", "CNAME", "AREA_KM2"),
na = "-9", trim_ws = TRUE)
# Analyze World's Total Population from 1950 to 2050
Pop.world.year.stats <- IDBext001 %>%
group_by(YR) %>%
filter(!is.na(BSEX)) %>%
summarize(Country.count = n(),
Pop.world.mean = mean(BSEX),
Pop.world.median = median(BSEX),
Pop.world.total = mean(BSEX)*n())
## Visualize World Population by Year
ggplot(Pop.world.year.stats) +
aes(x = YR, y = Pop.world.total/1e+09, col = YR > 1960 & YR < 2015) +
geom_point(size = 2,
alpha = 0.5) +
theme_bw() +
theme(legend.position = "none") +
xlab( "Year") +
ylab("World Population (bn)") +
ggtitle("World Population (bn) from 1950 to 2050 (projected)")
# Analyze top 5 Population of countries from 1950 to 2050
Pop.top5.year <- IDBext001 %>%
group_by(YR) %>%
arrange(YR, desc(BSEX)) %>%
top_n(n=5, wt = BSEX)
# Left join with IDBextCTYS to obtain the full name for countries.
Pop.top5.year <- Pop.top5.year %>%
left_join(IDBextCTYS, by = "CTY")
# Stacked barchart countries with top 5 poulation from 1950 to 2050
ggplot(Pop.top5.year, aes(x = YR, y = BSEX/1e+09, fill = CNAME)) +
geom_bar(stat="identity") +
xlab("Year") +
ylab("Population (bn)") +
ggtitle("World Population - Top 5 Countries from 1950 to 2050 (projected)") +
theme_bw()
# Analyze the Worls's total annual crop production (tonnes)
ProdCrops.World.year <- Production_Crops %>%
filter(Element == "Production") %>%
filter(Area == "World") %>%
group_by(Year) %>%
filter(!is.na(Value)) %>%
summarize(Year.count = n(),
Year.world.mean = mean(Value),
Year.world.median = median(Value),
Year.world.total = mean(Value)*n())
# Visualize World Crop Production (tonnes) by Year
ggplot(ProdCrops.World.year) +
aes(x = Year, y = Year.world.total/1e+09, col = "red") +
geom_point(size = 2,
alpha = 0.5) +
geom_smooth(method = "loess",
span = 0.2) +
theme_bw() +
theme(legend.position = "none") +
labs(y = "Total Crop Production (bn tonnes)",
x = "Year",
title = "World's Total Crop Production (bn tonnes) from 1961 to 2014")
# Get world map data
world <- map_data("world")
# Analyze by Area/Region - Crop Production & Year
ProdCropsArea2014 <- Production_Crops %>%
group_by(Area) %>%
filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
filter(!is.na(Value)) %>%
summarize(Prod2014.Area.count = n(),
Prod2014.Area.mean = mean(Value),
Prod2014.Area.median = median(Value),
Production.tonnes = mean(Value)*n())
names(ProdCropsArea2014)[1] <- "region"
## Datasets_Cleanup
# Anti_join crop data to world data
region_names_mismatch_crop <- ProdCropsArea2014 %>%
anti_join(world, by = "region")
# Anti_join world data to crop data
region_names_mismatch_world <- world %>%
anti_join(ProdCropsArea2014, by = "region")
# Name mismatch for region lookup
mismatch.region.lookup <- data.frame(
region = c("Bolivia (Plurinational State of)",
"China, mainland",
"China, Taiwan Province of",
"Democratic People's Republic of Korea",
"Iran (Islamic Republic of)",
"Republic of Korea",
"Russian Federation",
"Syrian Arab Republic",
"United Republic of Tanzania",
"United States of America",
"Venezuela (Bolivarian Republic of)",
"Viet Nam"),
New.region = c("Bolivia",
"China",
"Taiwan",
"North Korea",
"Iran",
"South Korea",
"Russia",
"Syria",
"Tanzania",
"USA",
"Venezuela",
"Vietnam"),
stringsAsFactors = FALSE)
#View(mismatch.region.lookup)
# Left join the modified region names
ProdCropsArea2014_mod1 <- ProdCropsArea2014 %>%
left_join(mismatch.region.lookup, by = "region")
ProdCropsArea2014_mod1[22,1]= "Bolivia"
ProdCropsArea2014_mod1[42,1]="China"
ProdCropsArea2014_mod1[43,1]="Taiwan"
ProdCropsArea2014_mod1[53,1]="North Korea"
ProdCropsArea2014_mod1[91,1]= "Iran"
ProdCropsArea2014_mod1[155,1]="South Korea"
ProdCropsArea2014_mod1[158,1]="Russia"
ProdCropsArea2014_mod1[185,1]="Syria"
ProdCropsArea2014_mod1[202,1]="Tanzania"
ProdCropsArea2014_mod1[203,1]="USA"
ProdCropsArea2014_mod1[207,1]="Venezuela"
ProdCropsArea2014_mod1[208,1]="Vietnam"
# Left join world data to crop data
region_crops <- world %>%
left_join(ProdCropsArea2014_mod1, by = "region")
# Base world map
world_base <- ggplot(data = world,
mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.2) +
geom_polygon(color = "black", fill = NA)
# Rainbow colored gradient fill for total crops produced (tonnes) in 2014
World_map_Crops_Prod <- world_base +
geom_polygon(data = region_crops,
aes(fill = Production.tonnes),
color = "white") +
geom_polygon(color = "black", fill = NA) +
theme_bw() +
theme(legend.position = "none") +
scale_fill_gradientn(colours = rev(rainbow(7)),
breaks = 10^seq(3,9),
trans = "log10") +
ggtitle("Geographical Distribution of Crops Produced in 2014")
World_map_Crops_Prod
# Analyze top 5 countries producing top 5 crops in 2014
Crop.Country.top5 <- Production_Crops %>%
filter(Element == "Production" & Year == "2014" & Flag != "A") %>%
filter(`Item Code` %in% c(156,56,15,27,116)) %>%
filter(!is.na(Value)) %>%
group_by(Item) %>%
arrange(Item, desc(Value)) %>%
top_n(n=5, wt = Value)
# Stacked Column chart for top 5 crops
# with top 5 countries producing them by year
Crop.Country.top5$Item <- factor(Crop.Country.top5$Item,
levels=unique(Crop.Country.top5$Item))
ggplot(Crop.Country.top5, aes(x= reorder(Item, -Value),
y=Value/1e+09, fill=Area)) +
geom_bar(stat="identity") +
xlab("Top 5 Crops Produced") +
ylab("Crop Production (billion tonnes)") +
ggtitle("Top 5 Crops Produced with top 5 Countries") +
theme_bw()
# Subset World Population from 1961 to 2014
World.Pop <- IDBext001 %>%
group_by(YR) %>%
filter(!is.na(BSEX)) %>%
filter(YR %in% 1961:2014) %>%
summarize(Population.World = mean(BSEX)*n())
names(World.Pop)[1] = "Year"
# Analyze the World's total annual crop production (tonnes) by YR
Total.Crops.Prod.YR <- Production_Crops %>%
filter(Element == "Production" & Area == "World") %>%
group_by(Year) %>%
filter(!is.na(Value)) %>%
summarize(CropsProduction.World = mean(Value)*n())
# Add World's Total_Crops_Produced by Year
World.Pop <- World.Pop %>%
left_join(Total.Crops.Prod.YR, by = "Year")
# Linear Model of Crop Prodiuction vs Population
xyplot(CropsProduction.World/1e+09 ~ Population.World/1e+09,
data = World.Pop,
type = c("p", "r"),
xlab = "World Population (bn)",
ylab = "Total Crops Produced (bn tonnes)",
main = "Crops Produced vs World Population",
col = "blue",
pch = 19)
# Get coefficients for the linear model
fit <- lm(CropsProduction.World ~ Population.World,
data = World.Pop)
# coef(fit)
#(Intercept) Population.World
#-4.188068e+09 3.146699e+00
# Model Equation
# CropsProduction.World = 3.146699*Population.World - 4188068000
# Get the projected population from 2015 to 2050
Pop.world.projected <- Pop.world.year.stats[(66:101), c(1,5)]
# Projected production for crops using the Linear Model equation
CropsProduction.projected <- 3.146699*Pop.world.projected$Pop.world.total - 4188068000
# New data frame with projected World's population and production
Projected.Pop.Crops <- cbind.data.frame(Pop.world.projected,
CropsProduction.projected)
# gather() in tidyr
Projected_long <- Projected.Pop.Crops %>%
gather(Projections, value,
Pop.world.total:CropsProduction.projected)
# Visualize Projected World Crop Production (tonnes) & Population by Year
ggplot(Projected_long) +
aes(x = YR, y = value/1e+09, col = Projections) +
geom_point(size = 2,
alpha = 0.5) +
theme_bw() +
theme(legend.position = "bottom") +
labs(y = "Crop Production & Population (bn)",
x = "Year",
title = "World's Total Crop Production & Population (projected)")
r_refs(file = "references.bib")
r_refs(file = "references.bib")
Aust, Frederik, and Marius Barth. 2017. papaja: Create APA Manuscripts with R Markdown. https://github.com/crsh/papaja.
FAO. 2016. FAOstats. http://www.fao.org/faostat/en/#data.
Hartemink, Alfred E. 2007. “Soil Science, Population Growth and Food Production: Some Historical Developments.” In Advances in Integrated Soil Fertility Management in Sub-Saharan Africa: Challenges and Opportunities, edited by Andre Bationo, Boaz Waswa, Job Kihara, and Joseph Kimetu, 85–97. Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-5760-1_6.
Henry, Lionel, and Hadley Wickham. 2017. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
IDB. 2016. International Database. https://www.census.gov/population/international/data/index.html.
Muller, Kirill. 2017. Bindrcpp: An ’Rcpp’ Interface to Active Bindings. https://CRAN.R-project.org/package=bindrcpp.
Muller, Kirill, and Hadley Wickham. 2017. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Richard A. Becker, Original S code by, and Allan R. Wilks. R version by Ray Brownrigg. 2016. Mapdata: Extra Map Databases. https://CRAN.R-project.org/package=mapdata.
Richard A. Becker, Original S code by, Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka, and Alex Deckmyn. 2017. Maps: Draw Geographical Maps. https://CRAN.R-project.org/package=maps.
Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer. http://lmdvr.r-forge.r-project.org.
Sustainablescale. 2016. Sustainablescale.org. http://www.sustainablescale.org/areasofconcern/population/populationandscale/quickfacts.aspx.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
———. 2017a. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2017b. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.
———. 2017c. Tidyverse: Easily Install and Load ’Tidyverse’ Packages. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, and Winston Chang. 2017. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wickham, Hadley, Romain Francois, Lionel Henry, and Kirill Muller. 2017. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Romain Francois. 2017. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
WorldBank. 2016. World Bank Data. http://data.worldbank.org/data-catalog.