This publication fulfills the Learn by Building (LBB) module for Programming for Data Science. In it, we explore rice production around the world and then look at Indonesia in detail.
Rice is the seed of the grass species Oryza sativa (Asian rice) or less commonly Oryza glaberrima (African rice). As a cereal grain, it is the most widely consumed staple food for a large part of the world’s human population, especially in Asia and Africa. It is the agricultural commodity with the third-highest worldwide production.
Rice, a monocot, is normally grown as an annual plant, although in tropical areas it can survive as a perennial and can produce a ratoon crop for up to 30 years. Rice cultivation is well-suited to countries and regions with low labor costs and high rainfall, as it is labor-intensive to cultivate and requires ample water. However, rice can be grown practically anywhere, even on a steep hill or mountain area with the use of water-controlling terrace systems. Although its parent species are native to Asia and certain parts of Africa, centuries of trade and exportation have made it commonplace in many cultures worldwide.
The traditional method for cultivating rice is flooding the fields while, or after, setting the young seedlings. This simple method requires sound irrigation planning but reduces the growth of less robust weed and pest plants that have no submerged growth state, and deters vermin. While flooding is not mandatory for the cultivation of rice, all other methods of irrigation require higher effort in weed and pest control during growth periods and a different approach for fertilizing the soil.
This publication uses two data sources:
1. World Rice Production Profile
2. Indonesia Rice Production
The libraries used in this publication are:
library(tidyverse)
library(ggplot2)
library(rmarkdown)
library(epuRate)
library(plotly)
library(glue)
library(scales)
library(sf)
library(leaflet)
library(leaflet.extras)
Next, we import both data sets into R, assigning World Rice Production Profile to world and Indonesia Rice Production to indo.
world <- read.csv("Production_Crops_E_All_Data_(Normalized).csv")
indo <- read.csv("Profile Produksi Rice Indonesia 2018-2020.csv")
Next, we inspect the data before doing any analysis.
Let’s check the structure of the world data:
## 'data.frame': 2513868 obs. of 11 variables:
## $ Area.Code : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Item.Code : int 221 221 221 221 221 221 221 221 221 221 ...
## $ Item : chr "Almonds, with shell" "Almonds, with shell" "Almonds, with shell" "Almonds, with shell" ...
## $ Element.Code: int 5312 5312 5312 5312 5312 5312 5312 5312 5312 5312 ...
## $ Element : chr "Area harvested" "Area harvested" "Area harvested" "Area harvested" ...
## $ Year.Code : int 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
## $ Year : int 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
## $ Unit : chr "ha" "ha" "ha" "ha" ...
## $ Value : num 0 5900 6000 6000 6000 5800 5800 5800 5700 5700 ...
## $ Flag : chr "F" "F" "F" "F" ...
The data set has 11 columns. The Item column lists the crops in the data set, and the Value column holds the number for the measure named in the Element column. For the rows previewed above, the records start in 1975.
Let’s check whether the data set has missing values:
## Area.Code Area Item.Code Item Element.Code Element
## 0 0 0 0 0 0
## Year.Code Year Unit Value Flag
## 0 0 0 154312 0
The Value column has 154,312 missing values. We will leave them as they are.
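The output above looks like a per-column count of missing values, but the code that produced it is not shown. A minimal sketch of one common way to get such a count, run on a small made-up data frame (`toy` and its contents are hypothetical, not the FAOSTAT file):

```r
# Hypothetical toy data standing in for the FAOSTAT table
toy <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Indonesia"),
  Year  = c(1975L, 1976L, 1975L),
  Value = c(0, NA, 5900),
  stringsAsFactors = FALSE
)

# is.na() flags each missing cell; colSums() adds the flags up per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Applied to the real table, the same call would be `colSums(is.na(world))`, assuming `world` was read as shown earlier.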
Because we focus only on rice, we will clean up the data and filter by crop; the data set contains about 175 crop types.
Let’s check the Element column:
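The code for this check is not shown in the text. As a sketch, the distinct values of a column and the number of distinct crops could be listed like this; `toy` is a hypothetical stand-in for the real table:

```r
# Hypothetical toy data; the real table has many more rows and crops
toy <- data.frame(
  Item    = c("Rice, paddy", "Rice, paddy", "Rice, paddy", "Wheat"),
  Element = c("Area harvested", "Yield", "Production", "Production"),
  stringsAsFactors = FALSE
)

# Distinct values of a column, in order of first appearance
elements <- unique(toy$Element)
print(elements)

# Number of distinct crops in the Item column
n_items <- length(unique(toy$Item))
print(n_items)
```

On the real data the equivalent calls would be `unique(world$Element)` and `length(unique(world$Item))`.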
Element has three values: Area harvested, Yield and Production. For this publication we focus on Area harvested and Production, and select only Rice, paddy into a new data set, world_rice.
world_rice <- world %>%
select(Year, Area, Item, Element, Value, Unit) %>%
filter(Item =="Rice, paddy", Element != "Yield")
head(world_rice,8)
Now we have the world_rice data set, which holds the Rice, paddy data for each country from 1961 to 2020, limited to Area harvested and Production. Let’s move on to the Indonesia rice data.
Let’s check the structure of the Indonesia data:
## 'data.frame': 315 obs. of 6 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ PROVINCE: chr "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
## $ CATEGORY: chr "Luas Panen (ha)" "Luas Panen (ha)" "Luas Panen (ha)" "Luas Panen (ha)" ...
## $ YEAR : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
## $ VALUE : num 320753 400301 309365 71632 86233 ...
## $ UNIT : chr "ha" "ha" "ha" "ha" ...
We have 6 columns covering the last three years (2018-2020) for each province in Indonesia. Let’s check whether the data set has missing values:
## X PROVINCE CATEGORY YEAR VALUE UNIT
## 0 0 0 0 0 0
Let’s check the CATEGORY column:
The CATEGORY column has three categories, and we keep all three for the analysis that follows. To simplify the data set, we will remove the index column X and assign the result to indo_rice.
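The step that drops X and creates indo_rice is described but its code is not shown. A minimal sketch of how it could be done with dplyr, using a two-row stand-in (`indo_toy` and `indo_rice_toy` are hypothetical names; the rows mirror the preview above):

```r
library(dplyr)

# Hypothetical stand-in for the `indo` data frame read from the CSV
indo_toy <- data.frame(
  X        = 0:1,
  PROVINCE = c("ACEH", "SUMATERA UTARA"),
  CATEGORY = c("Luas Panen (ha)", "Luas Panen (ha)"),
  YEAR     = c(2020L, 2020L),
  VALUE    = c(320753, 400301),
  UNIT     = c("ha", "ha"),
  stringsAsFactors = FALSE
)

# select(-X) keeps every column except the index column X
indo_rice_toy <- indo_toy %>% select(-X)
print(names(indo_rice_toy))
```

On the real data this would presumably read `indo_rice <- indo %>% select(-X)`.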
Let’s look at the historical area harvested and production of rice from 1961 to the most recent data.
The Area column includes a World entry, which we use to filter the data. Let’s look at world rice production since 1961.
world_rice %>%
filter(Area == "World", Element == "Production") %>%
ggplot(mapping= aes(x=Year, y = Value/10^6))+
geom_line(mapping = aes(col= Element),size = 1.7, show.legend = FALSE)+
geom_point(mapping = aes(col= Element),shape=16 ,size = 2.5, show.legend = FALSE)+
theme_minimal() +
theme(legend.position = "right",
text = element_text(size=10),
#plot.title = element_text(size = 20, face ='bold'),
plot.subtitle = element_text(size = 8, face ='italic'),
plot.caption = element_text(size = 8, face ='italic'),
legend.title = element_text(size = 14, face ='bold'),
axis.text.x = element_text(color = "grey20", size = 8, angle = 0, hjust = .5, vjust = .5,face='bold'),
axis.text.y = element_text(color = "grey20", size = 8, angle = 0, hjust = 1, vjust = 0, face='bold'),
axis.title.x = element_text(color = "grey20", size = 10, angle = 0, hjust = .5, vjust = -2, face='bold'),
axis.title.y = element_text(color = "grey20", size = 10, angle = 90, hjust = 0.5, vjust = 2, face='bold')
)+
labs(title = 'WORLD RICE PADDY PRODUCTION PROFILE',
subtitle = 'data from 1961-2019',
caption ='source data : http://www.fao.org/faostat/en/#data')+
xlab('Year')+
ylab('Production (MMTON)')+
scale_y_continuous(labels = scales::number_format(accuracy = 1, decimal.mark = '.'))+
scale_x_continuous(breaks = c(1960,1970,1980,1990,2000,2010,2020))+
scale_color_manual(values = c('#00897b'))
As the chart above shows, rice production is still trending upward, presumably in line with world population growth. Looking at the last five years in detail, in 2019 the world produced about 755 million tonnes of rice from about 162 million hectares harvested.
world_rice %>%
filter(Area == "World") %>%
mutate(Area_Harvested = ifelse(Element == 'Area harvested',Value,0),
Production = ifelse(Element == 'Production', Value,0)) %>%
select(Year, Area, Item, Area_Harvested, Production) %>%
group_by(Year, Area, Item) %>%
summarise(Area_Harvested_ha = sum(Area_Harvested), Production_tonnes = sum(Production), .groups='drop') %>%
arrange(-Year) %>%
head(5)
Looking at the 2019 production data by country, China, India and Indonesia are the top three rice producers. However, judging by the production values below, China and India are the only countries producing more than about 150 million tonnes per year.
world_rice %>%
filter(Element == "Production", Year == 2019) %>%
filter(!Area %in% c("World","Asia","Southern Asia","Eastern Asia","China, mainland","South-eastern Asia","Net Food Importing Developing Countries",
"Least Developed Countries","Low Income Food Deficit Countries","Africa","South America","Western Africa","Americas",
"Land Locked Developing Countries")) %>% # drop aggregate rows that are not individual countries
select(Area, Element, Value, Unit) %>%
arrange(-Value) %>%
head(10) %>%
ggplot(mapping= aes(x = reorder(Area, -Value), y= Value/10^6))+
geom_col(mapping = aes(fill=Area), alpha = 1) +
geom_text(mapping = aes(label = round(Value/10^6,0)), position = position_dodge(0.9), vjust = -0.5, size = 3) +
theme_minimal() +
theme(legend.position = "none",
text = element_text(size=10),
#plot.title = element_text(size = 20, face ='bold'),
plot.subtitle = element_text(size = 8, face ='italic'),
plot.caption = element_text(size = 8, face ='italic'),
legend.title = element_text(size = 14, face ='bold'),
axis.text.x = element_text(color = "grey20", size = 8, angle = 0, hjust = .5, vjust = .5,face='italic'),
axis.text.y = element_text(color = "grey20", size = 8, angle = 0, hjust = 1, vjust = 0, face='italic'),
axis.title.x = element_text(color = "grey20", size = 10, angle = 0, hjust = .5, vjust = -2, face='bold'),
axis.title.y = element_text(color = "grey20", size = 10, angle = 90, hjust = 0.5, vjust = 2, face='bold')
)+
labs(title = 'TOP 10 COUNTRY OF RICE PRODUCTION',
subtitle = 'data 2019',
caption ='source data : http://www.fao.org/faostat/en/#data')+
xlab('Country')+
ylab('Production (MMTON)')+
scale_y_continuous(labels = scales::number_format(accuracy = 1, decimal.mark = '.'))+
scale_fill_manual(values = c("#004d40","#00695c","#00796b","#00897b","#009688","#26a69a","#4db6ac","#80cbc4","#b2dfdb","#e0f2f1"))
The bar chart above shows Indonesia in third place for world rice production. In 2019, Indonesia produced about 55 million tonnes.
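The chart code above excludes regional and economic groupings with a hand-written list inside `filter()`. As a sketch only, the same idea can be written with the list held in a named vector, which makes it easier to extend when another grouping shows up in the ranking. The data frame `toy`, the vector `aggregates` and all values are made up for illustration:

```r
library(dplyr)

# Hypothetical toy rows: two aggregates and two countries (values are made up)
toy <- data.frame(
  Area  = c("World", "Asia", "India", "Indonesia"),
  Value = c(100, 90, 30, 10),
  stringsAsFactors = FALSE
)

# Rows that are groupings rather than single countries
aggregates <- c("World", "Asia")

# Keep only country rows, largest value first
countries <- toy %>%
  filter(!Area %in% aggregates) %>%
  arrange(desc(Value))
print(countries)
```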
Let’s check historical rice production in Indonesia since 1961.
world_rice %>%
filter(Area == "Indonesia", Element == "Production") %>%
ggplot(mapping= aes(x=Year, y = Value/10^6))+
geom_line(mapping = aes(col= Element),size = 1.7, show.legend = FALSE)+
geom_point(mapping = aes(col= Element),shape=16 ,size = 2.5, show.legend = FALSE)+
theme_minimal() +
theme(legend.position = "right",
text = element_text(size=10),
#plot.title = element_text(size = 20, face ='bold'),
plot.subtitle = element_text(size = 8, face ='italic'),
plot.caption = element_text(size = 8, face ='italic'),
legend.title = element_text(size = 14, face ='bold'),
axis.text.x = element_text(color = "grey20", size = 8, angle = 0, hjust = .5, vjust = .5,face='bold'),
axis.text.y = element_text(color = "grey20", size = 8, angle = 0, hjust = 1, vjust = 0, face='bold'),
axis.title.x = element_text(color = "grey20", size = 10, angle = 0, hjust = .5, vjust = -2, face='bold'),
axis.title.y = element_text(color = "grey20", size = 10, angle = 90, hjust = 0.5, vjust = 2, face='bold')
)+
labs(title = 'INDONESIA RICE PADDY PRODUCTION PROFILE',
subtitle = 'data from 1961-2019',
caption ='source data : http://www.fao.org/faostat/en/#data')+
xlab('Year')+
ylab('Production (MMTON)')+
scale_y_continuous(labels = scales::number_format(accuracy = 1, decimal.mark = '.'))+
scale_x_continuous(breaks = c(1960,1970,1980,1990,2000,2010,2020))+
scale_color_manual(values = c('#00897b'))
indo_rice %>%
filter(PROVINCE == "INDONESIA") %>%
mutate(Area_Harvested = ifelse(CATEGORY == 'Luas Panen (ha)',VALUE,0),
Production = ifelse(CATEGORY == 'Produksi (ton)', VALUE,0)) %>%
select(YEAR, PROVINCE, Area_Harvested, Production) %>%
group_by(YEAR, PROVINCE) %>%
summarise(Area_Harvested_ha = sum(Area_Harvested), Production_Tonnes = sum(Production), .groups='drop') %>%
arrange(-YEAR) %>%
head(5)
Over the last three years (2018-2020), Indonesia's rice production ranged from 54 to 59 million tonnes, from 10 to 11 million hectares harvested. The table below details rice production for each province in 2020. The provinces on Jawa still dominate rice production in Indonesia, followed by Sulawesi Selatan and Sumatera Selatan.
indo_rice %>%
filter(YEAR == 2020, PROVINCE != "INDONESIA") %>%
mutate(Area_Harvested = ifelse(CATEGORY == 'Luas Panen (ha)',VALUE,0),
Production = ifelse(CATEGORY == 'Produksi (ton)', VALUE,0)) %>%
select(YEAR, PROVINCE, Area_Harvested, Production) %>%
group_by(YEAR, PROVINCE) %>%
summarise(Area_Harvested_ha = sum(Area_Harvested), Production_Tonnes = sum(Production), .groups='drop') %>%
arrange(-Production_Tonnes)
A work by Yana Wicaksana