FARMER’S GROSS MARGIN PER HECTARE

Date created: August 5, 2017
Latest date updated: September 13, 2017

SYNOPSIS

The purpose of this report is to address AVANSE INDICATOR 1.1 EG.3-6,7,8 (FORMERLY F, FTF 4.5-16, 17, 18) FARMER’S GROSS MARGIN PER HECTARE, PER ANIMAL OR PER CAGE OBTAINED WITH USG ASSISTANCE (RAA) for the Winter campaign. The gross margin is the difference between the total value of smallholder production of an agricultural commodity (crop, fish, milk, eggs, live animals) and the cost of producing that commodity, divided by the total number of units in production (hectares of crops, pond area in hectares for pond aquaculture, cage count for open water aquaculture, number of animals in the herd for live animal sales, number of producing cows or hens for dairy or eggs). Gross margin per hectare, per animal and per cage is a measure of net income from that farm, fisheries, or livestock activity.

Gross margin is calculated automatically by FTFMS from the following data points, reported as totals across all direct beneficiaries, and disaggregated by commodity and by sex:

Targeted commodity (i.e., type of crop) as per FTFMS;
Sex of beneficiary farmer: Male, Female, Joint, Association-applied.

Gross margin per ha, per animal, per cage = [(TP x VS/QS) - IC] / UP. For crops, it is expressed more explicitly by the formula below: \[\frac{\frac{Total~production \times Total~value~of~sales}{Volume~of~sales} - Input~costs}{Total~hectares~planted}\]

Unit of measure: US dollars per hectare (Haitian gourdes are converted to USD when reporting results). Gross margin is calculated from five (5) data points:

Total Production (kg, mt, number, or other unit of measure) by all direct beneficiaries during the reporting period (TP);
Total Value of Sales (U.S. dollars) by all direct beneficiaries during the reporting period (VS);
Total Quantity of Sales (kg, mt, number or other unit of measure) by all direct beneficiaries during the reporting period (QS);
Total Recurrent Cash Input Costs (U.S. dollars) of all direct beneficiaries during the reporting period (IC);
Total Units of Production: Area planted in ha (for crops); Area in ha (for aquaculture ponds); Number of animals in herd for live animal or meat sales; Number of animals in production for dairy or eggs; Number of cages for open water aquaculture for direct beneficiaries during the production period (UP).
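As a minimal sketch of the formula above, the five data points can be combined in R as follows. The function name `gross_margin` and all figures are hypothetical, chosen only to illustrate the arithmetic; they are not AVANSE results.

```r
# Gross margin per unit of production: [(TP x VS / QS) - IC] / UP
# All argument values below are illustrative, not survey results.
gross_margin <- function(tp, vs, qs, ic, up) {
  stopifnot(qs > 0, up > 0)      # guard against division by zero
  unit_price <- vs / qs          # average sale price per unit sold
  (tp * unit_price - ic) / up    # value of production minus costs, per unit
}

# Hypothetical farmer: 3000 kg produced, 2000 kg sold for 1000 USD,
# 600 USD of recurrent cash input costs, 1.5 ha planted
gm <- gross_margin(tp = 3000, vs = 1000, qs = 2000, ic = 600, up = 1.5)
print(gm)
```

Note that the whole production (TP) is valued at the average sale price (VS/QS), so unsold production still contributes to the margin.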

We used a sample dataset of 367 farmers from the rice mobile data collection launched by the GIS Specialist. We explored the relationship between a set of variables and gross margin per hectare, and chose, on the basis of specific criteria, a model that gives us a better sample mean. With this model, the mean gross margin is 1262 US dollars for female farmers and 1235 US dollars for male farmers. A statistical test indicates that there is no difference in mean gross margin between male and female farmers.

SAMPLE SIZE

The formula for calculating the initial sample size for the estimation of indicators of totals is given by: \[initial~sample~size~=~n_{initial}~=~\frac{N^2*z^2*s^2}{MOE^2}\] Where:

N = total number of beneficiary farmers
z = critical value from Normal Probability Distribution
s = standard deviation of the distribution of beneficiary data
MOE = margin of error

Components of the Formula

The following section provides a description of each of the components of the sample size formula given above, along with recommendations on how to estimate each of them.

Total Number of Beneficiaries (N). The first component of the formula is N, which is the total number of beneficiary farmers participating in the relevant project interventions tracked by the above indicator at the time of the design of the survey. N includes only smallholder beneficiary producers engaged in land-based agriculture.

Critical Value from the Normal Probability Distribution (z). The next component is z, the critical value that is a fixed value from the Normal Probability Distribution, which is one of the most commonly used probability distributions in statistics and which follows the well-recognized “bell” shape. The point on the Normal Probability Distribution curve corresponding to a 95% “confidence level” is typically chosen; this corresponds to a critical value of 1.96 on the Normal Probability Distribution. Therefore, Feed the Future IPs should use a fixed value of z = 1.96 for the purposes of calculating sample sizes in the current context.

Standard deviation (s) of the distribution of beneficiary data. The third component of the sample size formula is s, the standard deviation of the distribution of beneficiary data. This standard deviation is a measure of dispersion in the beneficiary-level data around the central value in the sample distribution and provides an indication of how much variation there is in the individual data points. The standard deviation is expressed in the same units as the indicator itself, and can be calculated directly from survey data.

Margin of error (MOE). The final component of the sample size formula is MOE, the margin of error, which is the half-width of a confidence interval around the estimate of the indicator representing a total, and is expressed in the same units as the indicator used as a basis for the sample size calculation. A smaller MOE results in a larger sample size, whereas a larger MOE results in a smaller sample size.
There is no generalized rule of thumb for specifying the value of the MOE to use in the sample size calculation relating to indicators of totals. However, an estimate can be obtained for the MOE using the following formula: \[MOE = p*target~value~of~indicator\] This formula has two terms. The first term, p, denotes an acceptable percentage error, and is typically subjectively specified to range between 5% and 10% (expressed as p = 0.05 and p = 0.10, respectively). Specifying p = 0.05 will result in a sample size that is greater (and often much greater) than specifying p = 0.10. For purposes of Feed the Future annual performance monitoring for both FFP and non-FFP projects, p = 0.10 should be used, unless this results in an overall survey sample size of less than 525 beneficiaries, in which case a sample size of 525 should be adopted. More detail on this guidance is provided in Section 9.2.5. The second term, target value of indicator, is set by the IPs in their indicator performance tracking tables (IPTTs) as the target value for the indicator to be achieved in the year in which the survey is being conducted.
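The components described above can be put together in a short R sketch. The function name `initial_sample_size` and the input values (number of beneficiaries, standard deviation, target total) are hypothetical and serve only to illustrate the calculation, with MOE taken as p times the target value.

```r
# Initial sample size for an indicator of totals:
#   n_initial = N^2 * z^2 * s^2 / MOE^2,  with MOE = p * target value
initial_sample_size <- function(N, s, target, p = 0.10, z = 1.96) {
  moe <- p * target
  ceiling(N^2 * z^2 * s^2 / moe^2)   # round up to a whole beneficiary
}

# Hypothetical inputs: 5000 beneficiaries, s = 0.8 ha per farmer,
# target total of 4000 ha for the year
n_init <- initial_sample_size(N = 5000, s = 0.8, target = 4000)
n_used <- max(n_init, 525)   # apply the 525-beneficiary minimum from the guidance
print(c(n_initial = n_init, n_used = n_used))
```

With these illustrative inputs the formula gives fewer than 525 beneficiaries, so the minimum of 525 would apply.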

1. Show the sample mean and compare it to the theoretical mean of the distribution.

The sample mean of the 367 normals estimates the expected value, which is the center of the normal distribution. For a discrete random variable x with probability mass function p(x), the expected value is simply the sum of the possible values that x can take, each multiplied by the probability of taking it. The sample mean is the center of mass of the data if we treat each data point as equally likely; in other words, each data point $x_i$ has probability $p(x_i) = 1/n$. It is calculated as follows: \[\bar{X}=\sum_{i=1}^n x_ip(x_i)\] We repeat this a thousand times, each time drawing 367 normals and taking their mean. The way it is expressed in R is as follows:

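sim <- 1000  # number of simulations
n <- 367     # sample size in each simulation
mu <- 0      # mean of the normal distribution
sd <- 1      # standard deviation of the normal distribution
Xbar<-NULL
for (i in 1:sim){      
      Xbar = c(Xbar,mean(rnorm(n,mu,sd)))  
      }
mean(Xbar)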

## [1] 0.001050601

The theoretical mean is the mean of the normal distribution. It is expressed by: $\mu=0$

hist(Xbar, prob=TRUE, main="367 normals sample mean versus theoretical mean",xlab="")
abline(v=mean(Xbar),col='green',lwd=1) # mean of the simulated sample means
abline(v=0, col='red',lwd=1) # theoretical mean
legend(0.07,6,c('Sample mean','Population mean'),cex=0.8,col=c('green','red'),lty=1)
lines(density(Xbar),col='green') # distribution of the simulated sample means

The green density shown here is the result of a thousand simulations of 367 iid Gaussians. Because there are so many simulations, this is a good approximation of the truth. It tells us that if we collect a lot of data from a population, in this case the normal distribution, we can approximate well the distribution that it comes from.

The distribution of averages of 367 iid variables is centered at approximately the same place as the original population, the normal distribution itself. The simulation gives an idea of the distribution of averages of 367 iid variables: it is concentrated about the mean.

The conclusion is that the expected value of the sample mean is exactly the population mean that it is trying to estimate. In other words, the distribution of the sample mean is centered at the same place as the original population from which the data are drawn. The sample mean is therefore an unbiased estimator.

2. Show that the distribution is approximately normal.

The central limit theorem (CLT) states that the distribution of averages of iid variables, properly normalized, becomes that of a standard normal as the sample size increases. The basic result is that if we take the sample average, subtract off the population mean $\mu$, and divide by its standard error $\frac{\sigma}{\sqrt{n}}$, the distribution of the result tends to that of a standard normal. The following equation summarizes this statement:
\[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}=\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}=\frac{Estimate - Mean~of~estimate}{Std.~Err.~of~Estimate}\]
Remember that the mean of the normal distribution is $\mu=E[X_{norm}]=0$ and that the standard deviation is $\sigma=1$.
Let’s take the mean of the sample, subtract off 0, divide by $\frac{1}{\sqrt{367}}$, and repeat this a thousand times. If the central limit theorem is right, the result should look like a standard bell curve. The way it is expressed in R is as follows:

# Simulation of standardized sample means drawn with rnorm(n, mu, sd)
set.seed(1000)
CLT<-NULL
for (i in 1:sim){      
      CLT = c(CLT, (mean(rnorm(n,mu,sd))-0)/(1/sqrt(367))) 
      }
hist(CLT,breaks=20, prob=TRUE,
     main="Simulation of the normal distribution through the investigation\nof averages of 367 normals",xlab="")
lines(density(CLT),col='blue') # distribution of the standardized simulated means
lines(seq(-3, 3, length = 100), dnorm(seq(-3, 3, length = 100)), col = 'red') # bell curve
legend(1.5,0.3,c('Bell curve(Normal)','Simulation'),cex=0.8,col=c('red','blue'),lty=1)

The histogram displays the distribution of the normalized averages that we obtained. The distribution is centered around zero because we have subtracted off the mean 0. In addition, the CLT tells us about the shape, which is that of a bell curve. Finally, the approximation is very good for the averages of 367 normals: the shape of the distribution of the sample means is approximately the shape of the bell curve.

We have illustrated that $\bar{X}$, the sample mean of 367 iid Gaussians, is approximately normally distributed with mean equal to $\mu$ and standard deviation equal to the standard error of the mean, $\frac{\sigma}{\sqrt{n}}$.
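The claim about the standard error can be checked numerically. The sketch below is self-contained: the seed and the variable names (`n_obs`, `n_sim`, `means`) are arbitrary choices for this illustration and are not part of the report's own code.

```r
set.seed(1)                      # arbitrary seed, for reproducibility of this sketch
n_obs <- 367                     # observations per sample
n_sim <- 1000                    # number of simulated samples
means <- replicate(n_sim, mean(rnorm(n_obs, mean = 0, sd = 1)))

empirical_se   <- sd(means)          # spread of the simulated sample means
theoretical_se <- 1 / sqrt(n_obs)    # sigma / sqrt(n), about 0.0522
print(c(empirical = empirical_se, theoretical = theoretical_se))
```

The two values should agree closely, which is what the standard error formula predicts.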

In this distribution, $\mu$ plus 2 standard errors ($\mu + 2\frac{\sigma}{\sqrt{n}}$) is pretty far out in the right tail, with only about a 2.5% chance of the sample mean being more than two standard errors above $\mu$. The way it is expressed in R is as follows:

# Simulation of standardized sample means drawn with rnorm(n, mu, sd)
set.seed(1000)
CLT<-NULL
for (i in 1:sim){      
      CLT = c(CLT, (mean(rnorm(n,mu,sd))-0)/(1/sqrt(367))) 
      }
hist(CLT,breaks=20, prob=TRUE,
     main="Illustration of mu plus two standard errors",xlab="")
lines(density(CLT),col='blue') # distribution of the standardized simulated means
lines(seq(-3, 3, length = 100), dnorm(seq(-3, 3, length = 100)), col = 'red') # bell curve
legend(1.5,0.3,c('Bell curve(Normal)','Simulation'),cex=0.8,col=c('red','blue'),lty=1)
x <- seq(2, 3, length = 100)
polygon(c(x, rev(x)),c(dnorm(x), rep(0, length(x))), col = "salmon") # shade the right tail
text(mean(x), mean(dnorm(x)) + .03, "2.5%", cex = 2)

Similarly, $\mu$ minus 2 standard errors ($\mu - 2\frac{\sigma}{\sqrt{n}}$) is pretty far out in the left tail, with only about a 2.5% chance of the sample mean being more than two standard errors below $\mu$. The way it is expressed in R is as follows:

# Simulation of standardized sample means drawn with rnorm(n, mu, sd)
set.seed(1000)
CLT<-NULL
for (i in 1:sim){      
      CLT = c(CLT, (mean(rnorm(n,mu,sd))-0)/(1/sqrt(367))) 
      }
hist(CLT,breaks=20, prob=TRUE,
     main="Illustration of mu minus two standard errors",xlab="")
lines(density(CLT),col='blue') # distribution of the standardized simulated means
lines(seq(-3, 3, length = 100), dnorm(seq(-3, 3, length = 100)), col = 'red') # bell curve
legend(1.5,0.3,c('Bell curve(Normal)','Simulation'),cex=0.8,col=c('red','blue'),lty=1)
x <- seq(-3, -2, length = 100)
polygon(c(x, rev(x)),c(dnorm(x), rep(0, length(x))), col = "salmon") # shade the left tail
text(mean(x), mean(dnorm(x)) + .03, "2.5%", cex = 2)
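The 2.5% figure quoted for each tail is a rounding: it corresponds exactly to 1.96 standard errors, while 2 standard errors leave slightly less in the tail. A short check with the standard normal functions in base R (variable names are chosen for this illustration only):

```r
upper_2    <- pnorm(2, lower.tail = FALSE)      # P(Z > 2), a little under 2.5%
upper_196  <- pnorm(1.96, lower.tail = FALSE)   # P(Z > 1.96), about 2.5%
central_95 <- qnorm(0.975)                      # 97.5th percentile, about 1.96
print(round(c(upper_2 = upper_2, upper_196 = upper_196, central_95 = central_95), 4))
```

By symmetry of the normal distribution, the same probabilities apply to the left tail.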

So the probability that $\bar{X}$ is bigger than $\mu$ plus 2 standard errors or smaller than $\mu$ minus 2 standard errors is about 5%. Or equivalently, the probability that $\bar{X}$ lies between these limits is about 95%. We can reverse the roles of $\bar{X}$ and $\mu$ without changing the probability statement, so the interval $\bar{X} \pm \frac{2\sigma}{\sqrt{n}}$ contains $\mu$ with probability 95%. The way it is expressed in R is as follows:

# Simulation of standardized sample means drawn with rnorm(n, mu, sd)
set.seed(1000)
CLT<-NULL
for (i in 1:sim){      
      CLT = c(CLT, (mean(rnorm(n,mu,sd))-0)/(1/sqrt(367))) 
      }
hist(CLT,breaks=20, prob=TRUE,
     main="Confidence interval of the simulation of averages of 367 normals",xlab="")
lines(density(CLT),col='blue') # distribution of the standardized simulated means
lines(seq(-3, 3, length = 100), dnorm(seq(-3, 3, length = 100)), col = 'red') # bell curve
legend(1.5,0.3,c('Bell curve(Normal)','Simulation'),cex=0.8,col=c('red','blue'),lty=1)
x <- seq(-2,2, length = 100)
polygon(c(x, rev(x)),c(dnorm(x), rep(0, length(x))), col = "royalblue") # shade the central 95%
text(mean(x), mean(dnorm(Xbar)) + .08, "95%", cex = 2)

Remember that we are treating the interval, the average of 367 normals plus or minus two standard errors ($\bar{X} \pm \frac{2\sigma}{\sqrt{n}}$), as random, while $\mu$ is fixed. So we talk about the probability that this interval contains $\mu$. The actual interpretation is that if we were to run a thousand simulations of the averages of 367 normals and construct a confidence interval in each case, then about 95% of the intervals obtained would contain $\mu$, the theoretical mean of the normal distribution. The way it is expressed in R is as follows:

mean(Xbar) + c(-1,1)*qnorm(0.975)*sd(Xbar)/sqrt(length(Xbar))

## [1] -0.002243755  0.004344958

Note that the 2 comes from rounding up the 97.5th percentile of the standard normal distribution, which is approximately 1.96. The resulting confidence interval is -0.00224 to 0.004345. So if we are willing to assume that the simulated averages of 367 normals follow the normal distribution, the confidence interval for the mean of this sample is -0.00224 to 0.004345.
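The repeated-sampling interpretation of the confidence interval can also be illustrated directly. The sketch below builds one interval per simulated sample and counts how often it contains $\mu$; the seed and the variable names are assumptions of this illustration, and the known $\sigma = 1$ is used for the standard error.

```r
set.seed(2)                        # arbitrary seed for this sketch
n_obs <- 367                       # observations per sample
n_sim <- 1000                      # number of simulated samples
mu_true <- 0
sigma <- 1
half_width <- qnorm(0.975) * sigma / sqrt(n_obs)

covered <- replicate(n_sim, {
  xbar <- mean(rnorm(n_obs, mu_true, sigma))
  (xbar - half_width <= mu_true) && (mu_true <= xbar + half_width)
})
coverage <- mean(covered)          # proportion of intervals containing mu
print(coverage)
```

The proportion should be close to 0.95, varying slightly from one run to another.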

INPUT DATA

The AVANSE beneficiary database launched by our STTA John Deriggi is mainly composed of beneficiary, distribution, training and new technology datasets. In addition, a local spatial database contains parcel data that are related to the beneficiary data. Based on the beneficiary and parcel data, the AVANSE database has been extended through mobile data collection to address the rice gross margin indicator.

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
avansedb <- dbConnect(drv, host= '107.23.36.208', port='5434', dbname='avanse', user='postgres')
tbl_bio_data <- dbGetQuery(avansedb, "SELECT bio_id, telephone, cin, is_female FROM bio_data")

setwd("C:/Users/Dpierre/amel")
tbl_bio_data <- readRDS("avansedb/bio_data.rds")

Mobile data collection uses Google Sheets as its database. Though it is a semi-structured dataset, it allows data to be organized by type of agricultural operation, as in an Excel spreadsheet. Data are downloaded and read into an integrated development environment (RStudio) via the googlesheets library.

library(googlesheets)
sheet2 <- gs_title("AVANSEdbms")
tbl_parcel <- gs_read_csv(sheet2, ws = "tbl_parcel"); Sys.sleep(10)
tbl_herbcides_invoice <- gs_read_csv(sheet2, ws = "tbl_herbcides_invoice"); Sys.sleep(10)
tbl_weeding <- gs_read_csv(sheet2, ws = "tbl_weeding"); Sys.sleep(10)
tbl_yield <- gs_read_csv(sheet2, ws = "tbl_yield"); Sys.sleep(10)
tbl_encadreur <- gs_read_csv(sheet2, ws = "tbl_encadreur"); Sys.sleep(10)
tbl_nursery_design <- gs_read_csv(sheet2, ws = "tbl_nursery_design"); Sys.sleep(10)
tbl_channel <- gs_read_csv(sheet2, ws = "tbl_channel"); Sys.sleep(10)
tbl_seeds <- gs_read_csv(sheet2, ws = "tbl_seeds"); Sys.sleep(10)
tbl_transplanting <- gs_read_csv(sheet2, ws = "tbl_transplanting"); Sys.sleep(10)
tbl_rice_field_plowing <- gs_read_csv(sheet2, ws = "tbl_rice_field_plowing"); Sys.sleep(10)
tbl_harvest <- gs_read_csv(sheet2, ws = "tbl_harvest");Sys.sleep(10)
tbl_application <- gs_read_csv(sheet2, ws = "tbl_application"); Sys.sleep(10)
tbl_fertilizer <- gs_read_csv(sheet2, ws = "tbl_fertilizer"); Sys.sleep(10)
tbl_pest_name <- gs_read_csv(sheet2, ws = "tbl_pest_name"); Sys.sleep(10)
tbl_pest <- gs_read_csv(sheet2, ws = "tbl_pest"); Sys.sleep(10)
tbl_post_harvest <- gs_read_csv(sheet2, ws = "tbl_post_harvest"); Sys.sleep(10)
tbl_pesticides_name <- gs_read_csv(sheet2, ws = "tbl_pesticides_name"); Sys.sleep(10)
tbl_pesticides <- gs_read_csv(sheet2, ws = "tbl_pesticides"); Sys.sleep(10)
tbl_treatment <- gs_read_csv(sheet2, ws = "tbl_treatment"); Sys.sleep(10)
tbl_fees <- gs_read_csv(sheet2, ws = "tbl_fees"); Sys.sleep(10)
tbl_value_chain <- gs_read_csv(sheet2, ws = "tbl_value_chain"); Sys.sleep(10)
tbl_production <- gs_read_csv(sheet2, ws = "tbl_production"); Sys.sleep(10)
tbl_irrigation <- gs_read_csv(sheet2, ws = "tbl_irrigation"); Sys.sleep(10)
tbl_sales <- gs_read_csv(sheet2, ws = "tbl_sales");

DATA PROCESSING

At this step, we summarize total production, input costs, volume of sales and value of sales by farmer. We manage inconsistent, missing and outlier values through an extensive data cleaning process: we find patterns and make replacements. For example, 999 and missing values are replaced by zero; bidon, a local unit of total production, is converted to kg. We also read landscape spatial data such as rivers and roads, as they will be used in spatial overlay analysis and thematic maps.
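The replacement rules described above can be sketched on a toy data frame. The column names mirror those used in this report and the conversion factors are those applied to the production data, but the records themselves are invented for illustration.

```r
# Toy records: invented values, not AVANSE data
toy <- data.frame(
  commune = c("OUANAMINTHE", "FERRIER", "FERRIER", "OUANAMINTHE"),
  Unit    = c("bidon", "bidon", "kg", "kg"),
  Total_production = c(10, 4, 120, 1999),
  Price   = c(999, NA, 15, 20),
  stringsAsFactors = FALSE
)

# Recode the sentinel 999 and missing values to zero, matching the whole
# value so that a genuine figure such as 1999 is left untouched
toy$Price[is.na(toy$Price) | toy$Price == 999] <- 0

# Convert "bidon" to kg: 22.5 kg in OUANAMINTHE, 25 kg in FERRIER
kg_per_bidon <- c(OUANAMINTHE = 22.5, FERRIER = 25)
is_bidon <- toy$Unit == "bidon"
toy$Total_production[is_bidon] <-
  toy$Total_production[is_bidon] * kg_per_bidon[toy$commune[is_bidon]]
toy$Unit[is_bidon] <- "kg"
print(toy)
```

A named lookup vector keeps the commune-specific factors in one place, which makes it easier to add another commune later.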

library(sqldf); library(googlesheets); library(rgdal)
options(sqldf.driver = "SQLite") # as per FAQ #7 force SQLite    
dt_PB <- sqldf("SELECT tbl_value_chain.value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_bio_data.is_female sex, tbl_production.Quantitity Total_production, tbl_production.Unit Unit, tbl_production.Total PB_value, tbl_production.Utilisation Utilisation, tbl_parcel.commune commune, tbl_parcel.superficie superficie, tbl_parcel.GPS GPS FROM (tbl_bio_data INNER JOIN tbl_parcel ON tbl_bio_data.bio_id = tbl_parcel.bio_id) INNER JOIN (tbl_value_chain INNER JOIN tbl_production ON tbl_value_chain.Value_chain = tbl_production.Value_chain) ON tbl_parcel.parcel_id = tbl_value_chain.Producer GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_bio_data.is_female, tbl_production.Quantitity, tbl_production.Unit, tbl_production.Total, tbl_production.Utilisation, tbl_parcel.commune, tbl_parcel.superficie, tbl_parcel.GPS;")

# Convert production reported in "bidon" (a local unit) to kg; the factor depends on the commune
bidon <- dt_PB$Unit %in% "bidon"
ouanaminthe <- bidon & dt_PB$commune %in% "OUANAMINTHE"
fort_liberte_ferrier <- bidon & dt_PB$commune %in% c("FORT LIBERTE", "FERRIER")
dt_PB$Total_production[ouanaminthe] <- dt_PB$Total_production[ouanaminthe]*22.5
dt_PB$Total_production[fort_liberte_ferrier] <- dt_PB$Total_production[fort_liberte_ferrier]*25

dt_PB$Utilisation <- gsub("Cosommation", "Consommation", dt_PB$Utilisation)
dt_PB$Utilisation <- gsub("Cadeau ou en échange", "Cadeau", dt_PB$Utilisation)
dt_PB$commune <- gsub("ACUL.DU.NORD", "ACUL DU NORD", dt_PB$commune)
dt_PB$commune <- gsub("FORT.LIBERTE", "FORT LIBERTE", dt_PB$commune)
dt_PB$commune <- gsub("PLAINE.DU.NORD", "PLAINE DU NORD", dt_PB$commune)
PB <- aggregate(Total_production ~ ., dt_PB[ ,-c(5:7)], sum)

# PB_Sum1 <- aggregate(round(PB_value) ~ ., dt_PB[ ,c(6,7)], sum)
# PB_Sum2 <- aggregate(PB_value ~ ., dt_PB[ ,c(3,6,7)], sum)
PB_Mean1 <- aggregate(PB_value ~ ., dt_PB[ ,c(6,7)], mean)
PB_Mean1$PB_value <- round(PB_Mean1$PB_value)
PB_Mean2 <- aggregate(PB_value ~ ., dt_PB[ ,c(3,6,7)], mean)
PB_Mean2$PB_value <- round(PB_Mean2$PB_value)

sales <- subset(dt_PB[ ,c(1,4,6,7)], Utilisation == "Vendue")
names(sales) <- gsub("PB_value", "Value_sales", names(sales))
names(sales) <- gsub("Total_production", "Volume_sales", names(sales))
sales <- aggregate(cbind(Volume_sales,Value_sales) ~ ., sales, sum)

fees_irrigation_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_fees.Cost fees_cost, tbl_irrigation.Quantity Quantity, tbl_irrigation.Price Price FROM tbl_value_chain INNER JOIN (tbl_fees LEFT JOIN tbl_irrigation ON tbl_fees.Fees_Id = tbl_irrigation.Fees_id) ON tbl_value_chain.Value_chain = tbl_fees.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_fees.Cost, tbl_irrigation.Quantity, tbl_irrigation.Price;")
# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
fees_irrigation_expenses$fees_cost <- as.integer(gsub("^999$", "0", fees_irrigation_expenses$fees_cost))
fees_irrigation_expenses$Price <- as.integer(gsub("^999$", "0", fees_irrigation_expenses$Price))
fees_irrigation_expenses$Quantity <- as.integer(gsub("^999$", "0", fees_irrigation_expenses$Quantity))
index <- which(is.na(fees_irrigation_expenses$Quantity))
if(length(index) > 0) {
  fees_irrigation_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(fees_irrigation_expenses$Price))
if(length(rind) > 0) {
  fees_irrigation_expenses[rind, ]$Price <- 0
}
fees_irrigation_expenses$irrigation_cost <- fees_irrigation_expenses$Quantity*fees_irrigation_expenses$Price
fees_irrigation_expenses <- aggregate(irrigation_cost ~ value_chain + name_producer + fees_cost, fees_irrigation_expenses, sum)

pesticides_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_pesticides.Quantity Quantity, tbl_pesticides.Price Price, tbl_treatment.Cost_Workforce workforce_pesticides FROM tbl_value_chain INNER JOIN (tbl_pest INNER JOIN (tbl_pesticides INNER JOIN tbl_treatment ON tbl_pesticides.Pesticides_Id = tbl_treatment.Pesticides) ON tbl_pest.Pest_Id = tbl_treatment.Pest) ON (tbl_value_chain.Value_chain = tbl_pesticides.Value_chain) AND (tbl_value_chain.Value_chain = tbl_pest.Value_chain) GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_pesticides.Quantity, tbl_pesticides.Price, tbl_treatment.Cost_Workforce;")
# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
pesticides_expenses$workforce_pesticides <- as.integer(gsub("^999$", "0", pesticides_expenses$workforce_pesticides))
pesticides_expenses$Price <- as.integer(gsub("^999$", "0", pesticides_expenses$Price))
pesticides_expenses$Quantity <- as.integer(gsub("^999$", "0", pesticides_expenses$Quantity))
index <- which(is.na(pesticides_expenses$Quantity))
if(length(index) > 0) {
  pesticides_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(pesticides_expenses$Price))
if(length(rind) > 0) {
  pesticides_expenses[rind, ]$Price <- 0
}
pesticides_expenses$cost_pesticides <- pesticides_expenses$Quantity*pesticides_expenses$Price
pesticides_expenses <- aggregate(cbind(cost_pesticides, workforce_pesticides) ~ value_chain + name_producer, pesticides_expenses, sum)

harvests_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_harvest.Price price, tbl_harvest.Quantity quantity, tbl_post_harvest.Cost post_harvest_cost, tbl_post_harvest.Other_cost other_cost FROM tbl_value_chain INNER JOIN (tbl_harvest LEFT JOIN tbl_post_harvest ON tbl_harvest.Harvest_id = tbl_post_harvest.Harvest_id) ON tbl_value_chain.Value_chain = tbl_harvest.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_harvest.Price, tbl_harvest.Quantity, tbl_post_harvest.Cost, tbl_post_harvest.Other_cost;")
# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
harvests_expenses$price <- as.integer(gsub("^999$", "0", harvests_expenses$price))
harvests_expenses$quantity <- as.integer(gsub("^999$", "0", harvests_expenses$quantity))
harvests_expenses$harvest_cost <- harvests_expenses$quantity*harvests_expenses$price
harvests_expenses$post_harvest_cost <- as.integer(gsub("^999$", "0", harvests_expenses$post_harvest_cost))
harvests_expenses$other_cost <- as.integer(gsub("^999$", "0", harvests_expenses$other_cost))
index <- which(is.na(harvests_expenses$post_harvest_cost))
if(length(index) > 0) {
  harvests_expenses[index, ]$post_harvest_cost <- 0
}
rind <- which(is.na(harvests_expenses$other_cost))

if(length(rind) > 0){
 harvests_expenses[rind, ]$other_cost <- 0
}
harvests_expenses <- aggregate(cbind(harvest_cost, post_harvest_cost, other_cost) ~ value_chain + name_producer, harvests_expenses, sum)


fertilizing_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_fertilizer.Price_calc cost_fertilizer, tbl_application.Quantity quantity, tbl_application.Price2 price FROM tbl_value_chain INNER JOIN (tbl_fertilizer LEFT JOIN tbl_application ON tbl_fertilizer.Fertilizer_id = tbl_application.Fertilizer) ON tbl_value_chain.Value_chain = tbl_fertilizer.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_fertilizer.Price_calc, tbl_application.Quantity, tbl_application.Price2;")
# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
fertilizing_expenses$cost_fertilizer <- as.integer(gsub("^999$", "0", fertilizing_expenses$cost_fertilizer))
fertilizing_expenses$price <- as.integer(gsub("^999$", "0", fertilizing_expenses$price))
fertilizing_expenses$quantity <- as.integer(gsub("^999$", "0", fertilizing_expenses$quantity))
index <- which(is.na(fertilizing_expenses$quantity))
if(length(index) > 0) {
  fertilizing_expenses[index, ]$quantity <- 0
}
rind <- which(is.na(fertilizing_expenses$price))
if(length(rind) > 0) {
  fertilizing_expenses[rind, ]$price <- 0
}
fertilizing_expenses$cost_application <- fertilizing_expenses$quantity*fertilizing_expenses$price
fertilizing_expenses <- aggregate(cbind(cost_fertilizer, cost_application) ~ value_chain + name_producer, fertilizing_expenses, sum)


plowing_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_rice_field_plowing.Quantity Quantity, tbl_rice_field_plowing.Price Price FROM tbl_value_chain INNER JOIN tbl_rice_field_plowing ON tbl_value_chain.Value_chain = tbl_rice_field_plowing.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_rice_field_plowing.Quantity, tbl_rice_field_plowing.Price;")
# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
plowing_expenses$Price <- as.integer(gsub("^999$", "0", plowing_expenses$Price))
index <- which(is.na(plowing_expenses$Quantity))
if(length(index) > 0) {
  plowing_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(plowing_expenses$Price))
if(length(rind) > 0){
  plowing_expenses[rind, ]$Price <- 0
}
plowing_expenses$cost_plowing <- plowing_expenses$Quantity*plowing_expenses$Price
plowing_expenses <- aggregate(cost_plowing ~ value_chain + name_producer, plowing_expenses, sum)

transplanting_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_transplanting.Quantity Quantity, tbl_transplanting.Price Price FROM tbl_value_chain INNER JOIN tbl_transplanting ON tbl_value_chain.Value_chain = tbl_transplanting.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_transplanting.Quantity, tbl_transplanting.Price, tbl_transplanting.Cost;")
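# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
transplanting_expenses$Price <- as.integer(gsub("^999$", "0", transplanting_expenses$Price))
transplanting_expenses$Quantity <- as.integer(gsub("^999$", "0", transplanting_expenses$Quantity))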
index <- which(is.na(transplanting_expenses$Quantity))
if(length(index) > 0) {
  transplanting_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(transplanting_expenses$Price))
if(length(rind) > 0){
  transplanting_expenses[rind, ]$Price <- 0
}
transplanting_expenses$cost_transplanting <- transplanting_expenses$Quantity*transplanting_expenses$Price
transplanting_expenses <- aggregate(cost_transplanting ~ value_chain + name_producer, transplanting_expenses, sum)

seeds_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_seeds.Quanitity Quantity, tbl_seeds.Price Price FROM tbl_value_chain INNER JOIN tbl_seeds ON tbl_value_chain.Value_chain = tbl_seeds.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_seeds.Quanitity, tbl_seeds.Price, tbl_seeds.Cost;")
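# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
seeds_expenses$Price <- as.integer(gsub("^999$", "0", seeds_expenses$Price))
seeds_expenses$Quantity <- as.integer(gsub("^999$", "0", seeds_expenses$Quantity))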
index <- which(is.na(seeds_expenses$Quantity))
if(length(index) > 0) {
  seeds_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(seeds_expenses$Price))
if(length(rind) > 0){
  seeds_expenses[rind, ]$Price <- 0
}
seeds_expenses$cost_seeds <- seeds_expenses$Quantity*seeds_expenses$Price
seeds_expenses <- aggregate(cost_seeds ~ value_chain + name_producer, seeds_expenses, sum)

channel_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_channel.Quanity Quantity, tbl_channel.Price Price FROM tbl_value_chain INNER JOIN tbl_channel ON tbl_value_chain.Value_chain = tbl_channel.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_channel.Quanity, tbl_channel.Price;")
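# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
channel_expenses$Price <- as.integer(gsub("^999$", "0", channel_expenses$Price))
channel_expenses$Quantity <- as.integer(gsub("^999$", "0", channel_expenses$Quantity))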
index <- which(is.na(channel_expenses$Quantity))
if(length(index) > 0) {
  channel_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(channel_expenses$Price))
if(length(rind) > 0){
  channel_expenses[rind, ]$Price <- 0
}
channel_expenses$cost_channel <- channel_expenses$Quantity*channel_expenses$Price
channel_expenses <- aggregate(cost_channel ~ value_chain + name_producer, channel_expenses, sum)

nursery_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_nursery_design.Quantity Quantity, tbl_nursery_design.Price Price FROM tbl_value_chain INNER JOIN tbl_nursery_design ON tbl_value_chain.Value_chain = tbl_nursery_design.Value_chain GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_nursery_design.Quantity, tbl_nursery_design.Price;")
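# Recode the sentinel value 999 to 0 (anchored so that values such as 1999 are not altered)
nursery_expenses$Price <- as.integer(gsub("^999$", "0", nursery_expenses$Price))
nursery_expenses$Quantity <- as.integer(gsub("^999$", "0", nursery_expenses$Quantity))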
index <- which(is.na(nursery_expenses$Quantity))
if(length(index) > 0) {
  nursery_expenses[index, ]$Quantity <- 0
}
rind <- which(is.na(nursery_expenses$Price))
if(length(rind) > 0){
  nursery_expenses[rind, ]$Price <- 0
}
nursery_expenses$cost_nursery <- nursery_expenses$Quantity*nursery_expenses$Price
nursery_expenses <- aggregate(cost_nursery ~ value_chain + name_producer, nursery_expenses, sum)

YIELD <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_bio_data.is_female sex, tbl_yield.PB_kg PB_kg, tbl_yield.Yield_T_ha Yield_T_ha, tbl_yield.Area_ha Area_ha, tbl_parcel.superficie superficie FROM tbl_bio_data INNER JOIN (tbl_parcel INNER JOIN (tbl_value_chain INNER JOIN tbl_yield ON tbl_value_chain.Value_chain = tbl_yield.Value_chain) ON tbl_parcel.parcel_id = tbl_value_chain.Producer) ON tbl_bio_data.bio_id = tbl_parcel.bio_id GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_bio_data.is_female, tbl_yield.PB_kg, tbl_yield.Yield_T_ha, tbl_yield.Area_ha, tbl_parcel.superficie;")
YIELD <- aggregate(cbind(PB_kg, Yield_T_ha) ~ ., YIELD, mean)

weeding_expenses <- sqldf("SELECT tbl_value_chain.Value_chain value_chain, tbl_value_chain.Full_name name_producer, tbl_weeding.Quantity Quantity1, tbl_weeding.Price Price1, tbl_herbcides_invoice.Quantity Quantity2, tbl_herbcides_invoice.Price Price2 FROM (tbl_value_chain INNER JOIN tbl_weeding ON tbl_value_chain.Value_chain = tbl_weeding.Value_chain) LEFT JOIN tbl_herbcides_invoice ON tbl_weeding.Weeding_Id = tbl_herbcides_invoice.Weeding_Id GROUP BY tbl_value_chain.Value_chain, tbl_value_chain.Full_name, tbl_weeding.Quantity, tbl_weeding.Price, tbl_weeding.Total, tbl_herbcides_invoice.Quantity, tbl_herbcides_invoice.Price, tbl_herbcides_invoice.Total;")
# 999 appears to be used as a missing-value code: recode it, along with NA, to 0
# for the weeding (1) and herbicide (2) items.
# The anchored pattern leaves values that merely contain 999 (e.g. 1999) untouched.
for (col in c("Price1", "Quantity1", "Price2", "Quantity2")) {
  weeding_expenses[[col]] <- as.integer(gsub("^999$", "0", weeding_expenses[[col]]))
  weeding_expenses[[col]][is.na(weeding_expenses[[col]])] <- 0
}
weeding_expenses$cost_weeding <- weeding_expenses$Quantity1*weeding_expenses$Price1
weeding_expenses$cost_herbicides <- weeding_expenses$Quantity2*weeding_expenses$Price2
weeding_expenses <- aggregate(cbind(cost_weeding, cost_herbicides) ~ value_chain + name_producer, weeding_expenses, sum)

all_data <- sqldf("SELECT PB.value_chain, PB.name_producer, PB.commune, PB.sex, PB.superficie, PB.GPS, PB.Total_production, sales.Volume_sales, sales.Value_sales, YIELD.PB_kg, YIELD.Area_ha, YIELD.Yield_T_ha, channel_expenses.cost_channel, harvests_expenses.harvest_cost, harvests_expenses.post_harvest_cost, harvests_expenses.other_cost, fees_irrigation_expenses.fees_cost, fees_irrigation_expenses.irrigation_cost, fertilizing_expenses.cost_fertilizer, fertilizing_expenses.cost_application, weeding_expenses.cost_weeding, weeding_expenses.cost_herbicides, pesticides_expenses.cost_pesticides, pesticides_expenses.workforce_pesticides, transplanting_expenses.cost_transplanting, nursery_expenses.cost_nursery, seeds_expenses.cost_seeds, plowing_expenses.cost_plowing
FROM (((((((((((PB LEFT JOIN channel_expenses ON PB.value_chain = channel_expenses.value_chain) LEFT JOIN fees_irrigation_expenses ON PB.value_chain = fees_irrigation_expenses.value_chain) LEFT JOIN fertilizing_expenses ON PB.value_chain = fertilizing_expenses.value_chain) LEFT JOIN harvests_expenses ON PB.value_chain = harvests_expenses.value_chain) LEFT JOIN nursery_expenses ON PB.value_chain = nursery_expenses.value_chain) LEFT JOIN pesticides_expenses ON PB.value_chain = pesticides_expenses.value_chain) LEFT JOIN plowing_expenses ON PB.value_chain = plowing_expenses.value_chain) LEFT JOIN seeds_expenses ON PB.value_chain = seeds_expenses.value_chain) LEFT JOIN transplanting_expenses ON PB.value_chain = transplanting_expenses.value_chain) LEFT JOIN weeding_expenses ON PB.value_chain = weeding_expenses.value_chain) LEFT JOIN YIELD ON PB.value_chain = YIELD.value_chain) LEFT JOIN sales ON PB.value_chain = sales.value_chain;")

plowing_site <- sqldf("SELECT tbl_rice_field_plowing.Activity, tbl_rice_field_plowing.Material_service, PB.commune, tbl_rice_field_plowing.Cost FROM tbl_rice_field_plowing INNER JOIN PB ON tbl_rice_field_plowing.Value_chain = PB.Value_chain;")

harvest_site <- sqldf("SELECT tbl_harvest.Method, PB.commune, tbl_harvest.Cost FROM tbl_harvest INNER JOIN PB ON tbl_harvest.Value_chain = PB.Value_chain;")

# Replace missing values with 0 so that absent cost items do not propagate NA.
for (i in seq_along(all_data)) {
  all_data[is.na(all_data[, i]), i] <- 0
}

# Columns 13:28 hold the sixteen cost items (cost_channel to cost_plowing).
all_data$Input_costs <- rowSums(all_data[, 13:28])
# Gross margin per hectare = (TP x VS / QS - IC) / UP
all_data$Gross_margin <- (all_data$Total_production * all_data$Value_sales / all_data$Volume_sales -
                            all_data$Input_costs) / all_data$superficie

# index <- which(all_data$Gross_margin < 0)
# all_data <- all_data[-index, ]
# Recode sex as numeric: female = 0, male = 1.
all_data$sex <- gsub("F", "0", all_data$sex)
all_data$sex <- gsub("M", "1", all_data$sex)
all_data$sex <- as.numeric(all_data$sex)

EXPLORATORY ANALYSIS

Here we plot the mean value of total production in gourdes to see how it is split between sales, seed and other uses. The value of sales is higher than the value of rice given away (cadeau), consumed by the household (consommation), paid as sharecropping rent (paiement metayage) or kept as seed (semences). In addition, male producers tend to sell more rice than female farmers.

We group the value of total production (expressed in gourdes) by type of use, disaggregated by sex or commune, and explore how it differs between sales, household consumption, gifts, seed and land rent payments.

Rice encadreurs (field supervisors) recorded the volume of production twice. Rather than measuring average production directly at parcel level with the square method, they first collected production volumes through multi-use responses and then converted those figures into kg. This produced significant inconsistencies in the calculations and outlier results. The encadreurs explained that they chose this approach because farmers had already carried out yield activities before data collection.
We examine how strongly the two data sources are correlated. The correlation is 0.89, so they are strongly correlated. Nonetheless, data from direct questions to farmers are more reliable, and they will be used in the gross margin calculation.
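
As a minimal sketch of that comparison, assume the two measurements sit in hypothetical columns `direct_kg` (farmers' direct answers) and `converted_kg` (multi-use responses converted to kg). The column names and values below are invented for illustration and are not the survey data.

```r
# Hypothetical production volumes (kg) for six parcels; not the survey data.
production <- data.frame(
  direct_kg    = c(1200, 850, 2300, 400, 1500, 980),
  converted_kg = c(1100, 900, 2600, 350, 1400, 1200)
)

# Pearson correlation between the two sources, using only complete pairs.
r <- cor(production$direct_kg, production$converted_kg, use = "complete.obs")
print(round(r, 2))
```

A value close to 1 indicates that the two sources rank parcels similarly, even if individual figures disagree.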

Here we plot the frequency of gross margin. The distribution is right-skewed, with a long tail toward high gross margin values, and the highest frequency is 189. Frequencies are very low at high gross margin values, as the table below shows. Note that seven (7) farmers have a negative gross margin, in the range of 0 to -13370 HTG. We will examine how those values are distributed between male and female farmers.

Table 1. Gross margins frequency obtained by rice farmers in Northern Corridor

Gross margin interval (HTG/ha)	Freq	relFreq	Cumul	relCumul
(-1.34e+04,8.66e+04]	237	0.6528926	237	0.6528926
(8.66e+04,1.87e+05]	113	0.3112948	350	0.9641873
(1.87e+05,2.87e+05]	4	0.0110193	354	0.9752066
(2.87e+05,3.87e+05]	8	0.0220386	362	0.9972452
(3.87e+05,4.87e+05]	0	0.0000000	362	0.9972452
(4.87e+05,5.87e+05]	1	0.0027548	363	1.0000000
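
The relative and cumulative columns of Table 1 follow directly from the bin counts. The sketch below recomputes them from the `Freq` column; the object names are illustrative and not taken from the original script.

```r
# Bin counts copied from the Freq column of Table 1.
freq <- c(237, 113, 4, 8, 0, 1)

freq_table <- data.frame(
  Freq     = freq,
  relFreq  = freq / sum(freq),          # share of farmers in each bin
  Cumul    = cumsum(freq),              # running count
  relCumul = cumsum(freq) / sum(freq)   # running share
)
print(freq_table, digits = 7)
```

The counts sum to 363 farmers, and about 96% of them fall in the two lowest bins.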

Here we fit a smoothed curve of gross margin per hectare (GM/ha) against input costs and compare the relationship between male and female farmers. Rows containing non-finite or missing values are removed. There is a positive relationship, and it seems to be stronger for male farmers. Notice that five (5) male farmers and only two (2) female farmers have a negative gross margin per hectare.

We explore the gross margin, expenses and area data. The red line is the linear regression; blue and green dots are male and female farmers, respectively. The graph above shows that expenses and area_ha are poorly correlated with gross margin. Nonetheless, they seem to be well correlated with each other. In addition, these variables might have very little variability, in which case they would not be useful covariates. We check whether they have near-zero variance.

##              freqRatio percentUnique zeroVar   nzv
## superficie        1.25      82.56131   FALSE FALSE
## cost_plowing      1.20      35.42234   FALSE FALSE
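
The columns in this output resemble those reported by `caret::nearZeroVar(saveMetrics = TRUE)`; that is an assumption, since the call itself is not shown. The base R sketch below computes the same kind of metrics for a single vector, with cut-offs of 95/5 and 10 that mirror that function's defaults. The function name `nzv_metrics` and the example vectors are hypothetical.

```r
# Near-zero-variance metrics for one vector (base R sketch).
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  # Ratio of the most common value's count to the second most common one.
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else 0
  # Number of distinct values as a percentage of the sample size.
  percent_unique <- 100 * length(counts) / length(x)
  zero_var <- length(counts) == 1
  data.frame(
    freqRatio     = freq_ratio,
    percentUnique = percent_unique,
    zeroVar       = zero_var,
    nzv           = zero_var || (freq_ratio > freq_cut && percent_unique <= unique_cut)
  )
}

# Hypothetical columns: one varied, one almost constant.
varied   <- c(0.5, 0.8, 0.5, 1.2, 0.3, 0.9, 1.1, 0.8, 0.5, 0.7)
constant <- c(rep(100, 99), 250)

print(nzv_metrics(varied))
print(nzv_metrics(constant))
```

A variable is flagged only when one value dominates and there are few distinct values, which is not the case for `superficie` or `cost_plowing` in the output above.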

Input costs and area_ha have sufficient variability and can be used in the model selection algorithm. We focus on gross margin as the outcome and on expenses, area_ha and sex as the candidate variables in our model selection process.

REGRESSION MODEL

Is there a difference in average Gross Margin per hectare between male and female farmers?

In this exercise, we build a model. Following our exploratory analysis, we include expenses and sex as explanatory variables and gross margin per hectare as the outcome. The basic objective is to estimate mean gross margin from a sample of 367 farmers.

Model1

##                  Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept)  8.192205e+04 6141.6390765 13.3387927 2.924888e-33
## Input_costs -6.627864e-02    0.1841203 -0.3599746 7.190772e-01
## sex         -1.270776e+03 6429.0766937 -0.1976608 8.434220e-01

We look only at the Input_costs coefficient. Its estimated value is -0.066, so we estimate an expected 0.066 HTG decrease in gross margin per hectare for every 1 HTG increase in input costs, holding the sex variable constant.
Notice that the t-test for $H_0: \beta_{Input\_costs} = 0$ versus $H_a: \beta_{Input\_costs} \neq 0$ has a P-value of 0.719, which is greater than 0.05, so the coefficient is not statistically significant.

Model2

##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)  80765.704   5228.268 15.4478887 1.102362e-41
## factor(sex)1 -1694.753   6312.647 -0.2684695 7.884913e-01

If we include only sex, the estimate is now -1694.753, but the P-value is 0.788, greater than 0.05, so the test statistic is not significant. Notice that only sex1 (male farmers) appears in the table. This is because R has chosen sex0 (female farmers) as the reference category. The value -1694.753 HTG is the estimated difference in gross margin per hectare for male farmers relative to female farmers.

Confidence interval

## 
##  Welch Two Sample t-test
## 
## data:  Gross_margin by factor(sex)
## t = 0.31734, df = 329.46, p-value = 0.7512
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8810.92 12200.43
## sample estimates:
## mean in group 0 mean in group 1 
##        80765.70        79070.95

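If we choose model 2 as our best model, the 95% confidence interval above applies to the difference in means. The Welch test expresses it as female minus male (80765.70 - 79070.95 = 1694.75 HTG), with an interval from -8810.92 to 12200.43 HTG. Equivalently, the male-minus-female difference of -1694.753 HTG has an interval from -12200.43 to 8810.92 HTG. Either way, the interval contains zero.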
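
The sign convention matters when reading this interval: `t.test` with a formula reports the mean of the first factor level minus the mean of the second. The sketch below illustrates this on simulated data (the values are invented, not the survey data), using the report's coding of 0 for female and 1 for male.

```r
set.seed(1)
# Simulated gross margins (HTG/ha); 0 = female, 1 = male. Not the survey data.
farmers <- data.frame(
  sex          = factor(rep(c(0, 1), times = c(40, 60))),
  Gross_margin = c(rnorm(40, mean = 80000, sd = 50000),
                   rnorm(60, mean = 79000, sd = 60000))
)

# Welch test (the default): the interval refers to mean(group 0) - mean(group 1).
female_minus_male <- t.test(Gross_margin ~ sex, data = farmers)
ci <- as.numeric(female_minus_male$conf.int)

# The interval for male minus female has the bounds swapped and negated.
ci_male_minus_female <- -rev(ci)

print(ci)
print(ci_male_minus_female)
```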

Residuals and Diagnostics

The first graph is the unadjusted one. It shows a poor relationship. As seen in the exploratory analysis, the confounder Cost is poorly correlated with sex. We can remove the effect of Cost by regressing it out of sex. Once Cost is factored out, the relationship changes only slightly: compared with the unadjusted model2, the adjusted fit has a slightly modified slope.
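
One common way to "take out" a confounder is to regress it out of both the outcome and the variable of interest and then relate the two sets of residuals; the resulting slope equals the adjusted coefficient from the multiple regression. The sketch below shows this on simulated data. It is an illustration of the general technique under assumed variable names, not necessarily the exact procedure used for the graphs.

```r
set.seed(42)
n <- 200
# Simulated data; variable names follow the report, values are invented.
sex          <- rbinom(n, 1, 0.6)
Input_costs  <- 30000 + 5000 * sex + rnorm(n, sd = 8000)
Gross_margin <- 80000 - 0.05 * Input_costs - 1500 * sex + rnorm(n, sd = 40000)

# Adjusted model: sex effect holding input costs constant.
adjusted <- lm(Gross_margin ~ Input_costs + sex)

# Regress input costs out of both the outcome and sex, then relate the residuals.
resid_margin <- resid(lm(Gross_margin ~ Input_costs))
resid_sex    <- resid(lm(sex ~ Input_costs))
partial      <- lm(resid_margin ~ resid_sex - 1)

print(coef(adjusted)[["sex"]])
print(coef(partial)[["resid_sex"]])
```

Both printed values are the same, which is why the residual plot can be read as the sex effect after adjusting for cost.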

Model3

##              Estimate Std. Error  t value     Pr(>|t|)
## factor(sex)0 80765.70   5228.268 15.44789 1.102362e-41
## factor(sex)1 79070.95   3537.615 22.35148 4.360560e-70

If we omit the intercept, the model includes a coefficient for both male and female farmers. Neither group is now expressed relative to the other, so the two coefficients are the two group means in the dataset: the expected value of the outcome is the mean for male or for female farmers. As the table shows, gross margin per hectare is 79070.95 HTG for male farmers and 80765.70 HTG for female farmers, which is also illustrated in the boxplot and the pie chart below.

If we subtract 80765.70 from 79070.95, we get exactly -1694.75. In other words, model2 and model3 are perfectly consistent. However the means of male and female farmers are parameterized, the models will always be consistent.
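
The consistency between the two parameterizations can be checked on any dataset. The sketch below uses simulated gross margins (invented values, with 0 for female and 1 for male as in the report) and compares the sex coefficient of the model with an intercept to the difference between the two coefficients of the model without one.

```r
set.seed(7)
# Simulated gross margins (HTG/ha); 0 = female, 1 = male. Not the survey data.
sex <- factor(rep(c(0, 1), times = c(50, 70)))
Gross_margin <- rnorm(120, mean = 80000, sd = 50000)

with_intercept    <- lm(Gross_margin ~ sex)      # intercept = female mean
without_intercept <- lm(Gross_margin ~ sex - 1)  # one coefficient per group mean

group_means <- tapply(Gross_margin, sex, mean)
diff_from_means <- coef(without_intercept)[["sex1"]] - coef(without_intercept)[["sex0"]]

print(coef(with_intercept)[["sex1"]])
print(diff_from_means)
```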

CONCLUSION 1

In the second and third models, we have illustrated the average gross margin per hectare for male and female farmers and a gross margin difference of -1694.753 HTG. However, the P-value is 0.788, greater than 0.05. We fail to reject the null hypothesis and conclude that there is no evidence of a difference in mean gross margin between male and female farmers.

RICE VALUE CHAIN ANALYSIS

Rice has become an important cash and food crop in the Northern Corridor since the intervention of the AVANSE project. Between the 2014 base year and 2016, rice gross margin rose from $217.96 to $1,020 per hectare (an increase of roughly 368%), mainly due to improved techniques (SRI), the development of improved seeds and high prices in the domestic market, which constitute a strong incentive for farmers.

Rice Value Chain Actors

Producers

The most common variety grown by farmers is “Jaragwa”. They grow this variety because of its early maturity (the crop cycle does not exceed three (3) months), its processing rate (roughly 70%) and its flavor. Once rice is harvested, it is threshed and winnowed at farm level. The unshelled rice is then transported to rice mill sites or sold to “madan sara” at the farm. In theory, farmers prefer to mill before selling, as milling adds substantial value to the commodity. In practice this is rarely possible, because they do not have milling machines that can process large quantities and must send paddy rice to millers. In most cases, individual farmers do not have the cash to pay for transportation from farm to miller. The need for immediate cash to cover daily household expenses also forces farmers to sell their rice as paddy at farm level directly to “madan sara”.

Madan sara

They go around harvested farms and collect rice directly, which relieves farmers of their transportation constraint. The collected paddy rice is then heat-treated (“rice hot”) and/or dried, milled by an individual miller, cleaned and sold to wholesalers. They may also store it and sell it to consumers. “Rice hot”, a type of processing used to obtain shelled rice, is very popular with consumers, and farmers tend to sell rice to madan sara for this specific treatment.

Millers

They buy paddy rice on site from farmers. The unshelled rice is then dried, milled, cleaned, packaged, stored and sold to wholesalers or directly to consumers.

Wholesalers

They buy milled rice from millers and sell it to rural retailers.

Rural retailers

They obtain rice sold by either “madan sara” or millers and sell it to consumers.

Rice Value Chain Mapping

Gross Margin Analysis

Generally, the average cost of producing rice was greater in Ouanaminthe, Fort-Liberte, Acul-du-Nord, Perches and Ferrier. Harvesting is a costly operation in Ouanaminthe, Fort-Liberte and Ferrier, since the combine harvester (moissonneuse-batteuse) is increasingly used in these communes. However, the average gross margin per hectare in Ouanaminthe is 44303 HTG/ha, the lowest value among farming zones, as highlighted in the cloud representation below.
Acul-du-Nord and Milot appear at large sizes, which simply means that mean gross margin is higher in Acul-du-Nord and Milot.

Although ploughing and harvesting were the most costly operations for rice farmers in that area, they cannot explain the low gross margin in Ouanaminthe, because other communes such as Ferrier also have costly ploughing and harvesting but much higher gross margins than Ouanaminthe. The thematic maps below illustrate the spatial variation of harvesting and plowing input costs in Perches and Acul-du-Nord.

The map above illustrates harvest expenses per hectare in Ferrier using kernel density estimation, which calculates the density of parcels in a neighborhood around each parcel. In non-technical terms, the value of each parcel’s harvest expenses is spread over its vicinity. The interpretation is therefore qualitative or relative rather than literal: darker purple areas have more parcels around them, but their harvest expenses are lower than those of lighter areas.
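
To make the idea of "spreading" each parcel over its vicinity concrete, the sketch below computes an unweighted Gaussian kernel density on a regular grid in base R. The coordinates, the bandwidth and the function name `kernel_density` are all hypothetical; the maps in this report were produced with their own settings, which are not shown here.

```r
# Hypothetical parcel coordinates (arbitrary map units), not the survey data.
parcels <- data.frame(
  x = c(1.0, 1.2, 1.1, 3.0, 3.1, 5.0),
  y = c(1.0, 1.1, 0.9, 3.0, 2.9, 5.0)
)

# Gaussian kernel density on a regular grid; `bandwidth` sets how far each
# parcel's influence spreads.
kernel_density <- function(px, py, grid_x, grid_y, bandwidth = 0.5) {
  dens <- matrix(0, nrow = length(grid_x), ncol = length(grid_y))
  for (k in seq_along(px)) {
    dens <- dens + outer(dnorm(grid_x, mean = px[k], sd = bandwidth),
                         dnorm(grid_y, mean = py[k], sd = bandwidth))
  }
  dens / length(px)
}

grid_x <- seq(0, 6, by = 0.25)
grid_y <- seq(0, 6, by = 0.25)
density_surface <- kernel_density(parcels$x, parcels$y, grid_x, grid_y)

# Locate the densest grid cell; it lies in the cluster of three nearby parcels.
peak <- which(density_surface == max(density_surface), arr.ind = TRUE)
print(c(x = grid_x[peak[1, "row"]], y = grid_y[peak[1, "col"]]))
```

A larger bandwidth gives a smoother surface in which nearby clusters merge, which is why such maps are best read in relative terms.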

Table 2. Gross margins and expenses mean per hectare obtained by rice farmers in Northern Corridor

Cost_and_Gross_margin	ACUL_DU_NORD	FERRIER	FORT_LIBERTE	MILOT	OUANAMINTHE	PERCHES	PLAINE_DU_NORD
Gross_margin	106774	88039	75987	59916	94552	44303	126533
Input_costs	37858	41442	38916	41786	18432	35714	48508
cost_plowing	10247	11792	7294	8844	10861	8787	9321
cost_seeds	0	8	516	0	0	109	306
cost_nursery	259	700	1350	1990	373	316	668
cost_transplanting	4244	4226	5692	7075	2743	5112	13808
workforce_pesticides	0	338	388	774	0	304	233
cost_pesticides	0	2339	537	0	0	1288	361
cost_herbicides	1029	1135	1904	774	154	168	1899
cost_weeding	1767	1158	3501	884	1744	5776	5323
cost_application	0	574	285	221	0	280	276
cost_fertilizer	10823	7460	5269	3316	329	5171	8395
irrigation_cost	0	195	1366	3980	13	247	0
fees_cost	0	128	45	0	205	365	0
other_cost	2280	36	0	0	0	36	0
post_harvest_cost	93	48	70	0	0	85	0
harvest_cost	3792	9655	8218	10612	1072	6606	6926
cost_channel	3325	1648	2482	3316	936	1064	993
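
As a quick arithmetic check on Table 2, the Input_costs row should equal the sum of the sixteen cost items below it. The sketch uses the first column (ACUL_DU_NORD) as printed; the one-gourde gap from the reported 37858 comes from rounding the individual means.

```r
# Mean cost items (HTG/ha) copied from the ACUL_DU_NORD column of Table 2.
costs <- c(
  cost_plowing = 10247, cost_seeds = 0, cost_nursery = 259,
  cost_transplanting = 4244, workforce_pesticides = 0, cost_pesticides = 0,
  cost_herbicides = 1029, cost_weeding = 1767, cost_application = 0,
  cost_fertilizer = 10823, irrigation_cost = 0, fees_cost = 0,
  other_cost = 2280, post_harvest_cost = 93, harvest_cost = 3792,
  cost_channel = 3325
)

total_costs <- sum(costs)

# Share of each item in total input costs, largest first.
cost_shares <- sort(costs / total_costs, decreasing = TRUE)

print(total_costs)
print(round(100 * head(cost_shares, 3), 1))
```

In this column, fertilizer and plowing together account for more than half of input costs.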

Graph 2. Gross margins and expenses mean per hectare obtained by rice farmers in Northern Corridor

Here we illustrate the frequency of harvesting methods among communes. Ouanaminthe, Ferrier and Fort-Liberte tend to use the combine harvester (moissonneuse-batteuse) more than other communes.

Table 3. Harvesting method versus rice farmers sites

Commune	Main d’oeuvre (manual labour)	Moisonneuse batteuse (combine harvester)
ACUL DU NORD	50	0
FERRIER	9	67
FORT LIBERTE	112	19
LIMONADE	1	0
MILOT	35	1
OUANAMINTHE	54	7
PERCHES	12	0
PLAINE DU NORD	10	0
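
Because the communes differ greatly in sample size, row shares are easier to compare than raw counts. The sketch below rebuilds Table 3 as a matrix and computes the share of parcels harvested by combine harvester in each commune. With the raw data, a call such as `table(harvest_site$commune, harvest_site$Method)` would presumably give the same counts; that is an assumption about how the table was produced.

```r
# Counts copied from Table 3 (rows: communes; columns: harvesting method).
harvest_counts <- matrix(
  c(50, 0,
    9, 67,
    112, 19,
    1, 0,
    35, 1,
    54, 7,
    12, 0,
    10, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("ACUL DU NORD", "FERRIER", "FORT LIBERTE", "LIMONADE",
      "MILOT", "OUANAMINTHE", "PERCHES", "PLAINE DU NORD"),
    c("Main d'oeuvre", "Moisonneuse batteuse")
  )
)

# Share of each harvesting method within each commune (rows sum to 1).
row_shares <- prop.table(harvest_counts, margin = 1)
combine_share <- sort(row_shares[, "Moisonneuse batteuse"], decreasing = TRUE)

print(round(combine_share, 3))
```

Ferrier stands out, with roughly 88% of its parcels harvested mechanically, followed at a distance by Fort-Liberte and Ouanaminthe.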

Darker purple areas have more parcels around them, but their harvest expenses are lower than those of lighter areas.

## Warning: Removed 4 rows containing missing values (geom_point).


CONCLUSION 2

We have illustrated the differences in mean gross margin per hectare among communes and attempted to explain the low gross margin in Ouanaminthe. Further analysis of other areas of activity is required to explain this difference in average gross margin.