To predict whether an epidemic of powdery mildew in mango will erupt in a given year in the state of Uttar Pradesh, India, Misra et al. (2004) used annual outbreak records from 1987 to 2000. The epidemic typically occurs in the third and fourth weeks of March, so outbreak status is known by the end of March of a given year. The authors used a logistic regression model with two weather predictors (maximum temperature and relative humidity) to forecast an outbreak. The data are shown in the table below and are available in PowderyMildewEpidemic.xls.

Reference: A. K. Misra, O. M. Prakash, and V. Ramasubramanian. Forewarning powdery mildew caused by Oidium mangiferae in mango (Mangifera indica) using logistic regression models. Indian Journal of Agricultural Science, 74(2):84-87, 2004.
library(pander)
library(forecast)
library(caret)
library(dplyr)
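# Set the working directory to the folder that holds the data file (machine-specific path)
setwd("/Users/Chris Iyer/Documents/")
# The data file is read as CSV here (the text refers to it as PowderyMildewEpidemic.xls)
mildew <- read.csv("PowderyMildewEpidemic.csv")
# Derive a 0/1 outcome column from the Yes/No Outbreak column
mildew <- mildew %>% mutate(Outbreak1 = ifelse(Outbreak == "Yes", 1, 0))
pander(mildew)
# Number of records (length() on a data frame would count columns instead)
nrow(mildew)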
Does the series follow a trend over time? The plot below shows relative humidity by year.
mildew.ts <- ts(mildew$RelHumidity, start = 1987, frequency = 1)
plot(mildew.ts)
**Question 1: In order for the model to serve as a forewarning system for farmers, what requirements must be satisfied regarding data availability?**
Answer: To generate a forecast with logistic regression, the measurement of interest can be either numerical or binary. Because the forecast itself is binary, a numerical measurement must first be converted into a derived binary variable. For predicting whether powdery mildew will occur in Uttar Pradesh, India, the dataset provides two predictor variables and a binary yes/no outbreak outcome. The outbreak variable is recoded as 1 for yes and 0 for no; this 0/1 column is the derived variable. For the model to serve as a forewarning system, the predictor values it uses must also be available before the outbreak status becomes known at the end of March, consistent with the lagged (t-1) terms in the model below.
\(\log(\text{odds}(\text{Mildew}_t)) = \beta_0 + \beta_1\,\text{Mildew}_{t-1} + \beta_2\,\text{Humidity}_{t-1} + \beta_3\,\text{MaxTemp}_{t-1}\)
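The lagged terms in the model above can be built by shifting each column down one row. The following is a minimal sketch under stated assumptions, not the original analysis: the data frame `toy`, its values, and the helper `lag1` are made up purely for illustration, and the rows are assumed to be consecutive years.

```r
# Hypothetical toy data (NOT the mildew dataset): all values are invented
toy <- data.frame(
  Year        = 2001:2005,
  Outbreak1   = c(1, 0, 0, 1, 0),
  MaxTemp     = c(30.1, 29.4, 31.0, 30.5, 28.9),
  RelHumidity = c(60, 72, 65, 80, 70)
)

# Hypothetical helper: shift a vector down by one position,
# so that row t holds the value observed in row t-1
lag1 <- function(x) c(NA, head(x, -1))

toy$Outbreak1_lag   <- lag1(toy$Outbreak1)
toy$MaxTemp_lag     <- lag1(toy$MaxTemp)
toy$RelHumidity_lag <- lag1(toy$RelHumidity)

# The first row has no previous year, so it is dropped before any model fitting
toy_lagged <- toy[-1, ]
print(toy_lagged)
```

If the records skip years, a row-wise shift like this would pair a year with the wrong predecessor, so the year column should be checked first.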
plot(mildew$MaxTemp ~ mildew$RelHumidity, xlab = "Relative Humidity", ylab = "Max Temp (degrees Celsius)", col = mildew$Outbreak1 + 1, bty = "l", pch = 15)
legend(60,29, c("No Outbreak", "Outbreak"), col = 1:2, pch = 15, bty = "l")
plot(mildew$RelHumidity ~ mildew$MaxTemp, ylab = "Relative Humidity", xlab = "Max Temp (degrees Celsius)", col = mildew$Outbreak1 + 1, bty = "l", pch = 15)
legend(27,70, c("No Outbreak", "Outbreak"), col = 1:2, pch = 15, bty = "l")
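# Naive forecast: each of the last four records is predicted to repeat the previous record's outbreak status
naiveForecasts <- mildew$Outbreak1[(length(mildew$Outbreak1)-1-3) : (length(mildew$Outbreak1)-1)]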
class(naiveForecasts)
[1] "numeric"
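# Recent caret versions require factors with matching levels, so both vectors are converted
confusionMatrix(factor(naiveForecasts, levels = c(0, 1)), factor(mildew$Outbreak1[(length(mildew$Outbreak1)-3):length(mildew$Outbreak1)], levels = c(0, 1)), positive = "1")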
Confusion Matrix and Statistics
          Reference
Prediction 0 1
         0 1 1
         1 2 0
Accuracy : 0.25
95% CI : (0.0063, 0.8059)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.9961
Kappa : -0.5
Mcnemar's Test P-Value : 1.0000
Sensitivity : 0.0000
Specificity : 0.3333
Pos Pred Value : 0.0000
Neg Pred Value : 0.5000
Prevalence : 0.2500
Detection Rate : 0.0000
Detection Prevalence : 0.5000
Balanced Accuracy : 0.1667
'Positive' Class : 1
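The headline statistics in this output follow directly from the four cell counts. As a rough check, the sketch below recomputes them in base R from the counts printed above; the variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the naive-forecast confusion matrix above
# rows = prediction (0, 1), columns = reference (0, 1)
cm <- matrix(c(1, 1,
               2, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["1", "1"]  # outbreak predicted, outbreak occurred
tn <- cm["0", "0"]  # no outbreak predicted, none occurred
fp <- cm["1", "0"]  # outbreak predicted, none occurred
fn <- cm["0", "1"]  # no outbreak predicted, outbreak occurred

accuracy          <- (tp + tn) / sum(cm)
sensitivity       <- tp / (tp + fn)
specificity       <- tn / (tn + fp)
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced_accuracy), 4))
```

With only four validation records, each statistic moves in large steps, which is also why the reported confidence interval for accuracy is so wide.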
# Training set: the first 8 records; records 9 to 12 are held out for validation below
trainOutMildew <- mildew[1:8,]
trainOutMildew
outLogRegMildew <- glm(Outbreak1 ~ MaxTemp + RelHumidity, data = trainOutMildew, family = "binomial")
summary(outLogRegMildew)
Call:
glm(formula = Outbreak1 ~ MaxTemp + RelHumidity, family = "binomial",
data = trainOutMildew)
Deviance Residuals:
      1        2        3        4        5        6        7        8
 0.7466  -1.7276  -0.3132   1.0552  -1.1419   1.2419  -0.3908   0.6060
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -56.1543    44.4573  -1.263    0.207
MaxTemp       1.3849     1.1406   1.214    0.225
RelHumidity   0.1877     0.1578   1.189    0.234
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11.0904 on 7 degrees of freedom
Residual deviance: 8.1198 on 5 degrees of freedom
AIC: 14.12
Number of Fisher Scoring iterations: 5
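To see how the fitted coefficients turn into a forecast, the sketch below applies the inverse logit by hand. The coefficient values are copied from the summary above (rounded, so results are approximate); the weather inputs `max_temp` and `rel_humidity` are hypothetical values chosen only for illustration.

```r
# Coefficient estimates copied from the summary output above
b0     <- -56.1543
b_temp <- 1.3849
b_hum  <- 0.1877

# Hypothetical weather values, chosen only for illustration
max_temp     <- 30
rel_humidity <- 70

# Linear predictor on the log-odds scale
log_odds <- b0 + b_temp * max_temp + b_hum * rel_humidity

# Inverse logit: plogis(x) is 1 / (1 + exp(-x))
prob <- plogis(log_odds)

# Exponentiated slopes: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(c(MaxTemp = b_temp, RelHumidity = b_hum))

print(c(log_odds = log_odds, probability = prob))
print(odds_ratios)
```

In practice `predict(outLogRegMildew, newdata, type = "response")` performs the same calculation with the unrounded coefficients. Note that the standard errors in the summary are large relative to the estimates, so these odds ratios are very uncertain.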
predictionsMildew <- predict(outLogRegMildew, mildew[9:12,],type = "response")
predictionsMildew
9 10 11 12
0.1119407 0.7021411 0.5705413 0.3894790
Cutoff value = 0.5
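# Recent caret versions require factors with matching levels, so both vectors are converted
confusionMatrix(factor(ifelse(predictionsMildew > 0.5, 1, 0), levels = c(0, 1)), factor(mildew[9:12,]$Outbreak1, levels = c(0, 1)), positive = "1")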
Confusion Matrix and Statistics
          Reference
Prediction 0 1
         0 2 0
         1 1 1
Accuracy : 0.75
95% CI : (0.1941, 0.9937)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.7383
Kappa : 0.5
Mcnemar's Test P-Value : 1.0000
Sensitivity : 1.0000
Specificity : 0.6667
Pos Pred Value : 0.5000
Neg Pred Value : 1.0000
Prevalence : 0.2500
Detection Rate : 0.2500
Detection Prevalence : 0.5000
Balanced Accuracy : 0.8333
'Positive' Class : 1
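The classification above depends on the 0.5 cutoff. The sketch below shows how the four predicted probabilities reported earlier would be classified under different cutoffs. The cutoffs 0.3 and 0.6 and the helper `classify` are hypothetical additions for illustration, not part of the original analysis.

```r
# Predicted probabilities copied from the predict() output above (records 9 to 12)
probs <- c(`9` = 0.1119407, `10` = 0.7021411, `11` = 0.5705413, `12` = 0.3894790)

# Hypothetical helper: label a record 1 (outbreak) when its probability exceeds the cutoff
classify <- function(p, cutoff) ifelse(p > cutoff, 1, 0)

# 0.5 is the cutoff used above; 0.3 and 0.6 are hypothetical alternatives
cutoffs <- c(0.3, 0.5, 0.6)
result <- sapply(cutoffs, function(k) classify(probs, k))
colnames(result) <- paste0("cutoff_", cutoffs)

print(result)
```

A lower cutoff flags more years as outbreaks, which would tend to raise sensitivity at the cost of more false alarms; which trade-off suits farmers depends on the relative cost of a missed outbreak versus an unnecessary warning.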