Introduction

Regression analysis is one of the most widely used statistical techniques in agricultural research. It is used to study the relationship between crop yield and various influencing factors such as fertilizer application, rainfall, irrigation, temperature, and soil nutrients.

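Multiple regression analysis quantifies these relationships by fitting an equation of the form

Y = b0 + b1X1 + b2X2 + ... + bkXk + e

where Y is crop yield, X1, ..., Xk are the influencing factors, b0 is the intercept, b1, ..., bk are the regression coefficients, and e is the random error. The fitted equation can then be used to predict crop yield under varying conditions.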

RStudio provides an efficient and user-friendly environment for regression analysis, graphical visualization, and statistical interpretation.

Objectives

The objectives of this practical tutorial are:

  • To understand the concept of multiple regression analysis
  • To create and manage agricultural datasets in RStudio
  • To fit a multiple linear regression model
  • To interpret regression coefficients and statistical output
  • To generate regression plots and diagnostic plots

Software Requirements

Software      Purpose
R             Statistical computing
RStudio       Integrated Development Environment (IDE)

Introduction to R and RStudio

R is an open-source programming language widely used for statistical analysis, data visualization, and predictive modelling.

RStudio is an Integrated Development Environment (IDE) for R.

Main Components of RStudio

  1. Source Editor
  2. Console
  3. Environment/History
  4. Files/Plots/Packages/Help

Agricultural Dataset Description

In this tutorial, a hypothetical agricultural dataset of 40 observations is used to study the effect of the following factors on crop yield:

  • Fertilizer application
  • Rainfall
  • Irrigation

Variables Used

Variable      Description                      Unit
fertilizer    Amount of fertilizer applied     kg/ha
rainfall      Seasonal rainfall received       mm
irrigation    Irrigation hours supplied        hours
yield         Crop yield                       quintal/ha

Creating Agricultural Dataset

# Hypothetical dataset: 40 observations of three inputs and crop yield
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,
                 46,48,37,36,49,51,52,53,35,34,
                 55,56,57,58,59,60,61,62,63,64,
                 65,66,67,68,69,70,71,72,73,74),

  rainfall = c(820,790,760,880,850,840,810,770,800,830,
               860,870,750,740,890,900,910,920,730,720,
               930,940,950,960,970,980,990,1000,1010,1020,
               1030,1040,1050,1060,1070,1080,1090,1100,1110,1120),

  irrigation = c(12,13,11,15,14,14,13,12,13,14,
                 15,15,11,10,16,16,17,17,10,9,
                 18,18,19,19,20,20,21,21,22,22,
                 23,23,24,24,25,25,26,26,27,27),

  yield = c(28,30,26,36,33,34,31,27,29,32,
            35,36,25,24,38,39,40,41,23,22,
            42,43,44,45,46,47,48,49,50,51,
            52,53,54,55,56,57,58,59,60,61)
)

cropdata
##    fertilizer rainfall irrigation yield
## 1          40      820         12    28
## 2          42      790         13    30
## 3          38      760         11    26
## 4          50      880         15    36
## 5          45      850         14    33
## 6          47      840         14    34
## 7          43      810         13    31
## 8          39      770         12    27
## 9          41      800         13    29
## 10         44      830         14    32
## 11         46      860         15    35
## 12         48      870         15    36
## 13         37      750         11    25
## 14         36      740         10    24
## 15         49      890         16    38
## 16         51      900         16    39
## 17         52      910         17    40
## 18         53      920         17    41
## 19         35      730         10    23
## 20         34      720          9    22
## 21         55      930         18    42
## 22         56      940         18    43
## 23         57      950         19    44
## 24         58      960         19    45
## 25         59      970         20    46
## 26         60      980         20    47
## 27         61      990         21    48
## 28         62     1000         21    49
## 29         63     1010         22    50
## 30         64     1020         22    51
## 31         65     1030         23    52
## 32         66     1040         23    53
## 33         67     1050         24    54
## 34         68     1060         24    55
## 35         69     1070         25    56
## 36         70     1080         25    57
## 37         71     1090         26    58
## 38         72     1100         26    59
## 39         73     1110         27    60
## 40         74     1120         27    61

Structure of Dataset

str(cropdata)
## 'data.frame':    40 obs. of  4 variables:
##  $ fertilizer: num  40 42 38 50 45 47 43 39 41 44 ...
##  $ rainfall  : num  820 790 760 880 850 840 810 770 800 830 ...
##  $ irrigation: num  12 13 11 15 14 14 13 12 13 14 ...
##  $ yield     : num  28 30 26 36 33 34 31 27 29 32 ...

Summary Statistics

summary(cropdata)
##    fertilizer       rainfall        irrigation        yield      
##  Min.   :34.00   Min.   : 720.0   Min.   : 9.00   Min.   :22.00  
##  1st Qu.:43.75   1st Qu.: 827.5   1st Qu.:13.75   1st Qu.:31.75  
##  Median :54.00   Median : 925.0   Median :17.50   Median :41.50  
##  Mean   :54.00   Mean   : 923.5   Mean   :17.93   Mean   :41.48  
##  3rd Qu.:64.25   3rd Qu.:1022.5   3rd Qu.:22.25   3rd Qu.:51.25  
##  Max.   :74.00   Max.   :1120.0   Max.   :27.00   Max.   :61.00

Scatter Plot

# Scatter plot of crop yield against fertilizer
plot(cropdata$fertilizer,
     cropdata$yield,
     main = "Effect of Fertilizer on Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19)

# Add the simple regression line of yield on fertilizer alone
abline(lm(yield ~ fertilizer, data = cropdata),
       col = "blue",
       lwd = 2)

Multiple Regression Analysis

Fitting Regression Model

# Regress yield on all three predictors
model <- lm(yield ~ fertilizer + rainfall + irrigation,
            data = cropdata)

model
## 
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = cropdata)
## 
## Coefficients:
## (Intercept)   fertilizer     rainfall   irrigation  
##   -17.47294      0.53455      0.02395      0.44444

Regression Summary

summary(model)
## 
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = cropdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.99510 -0.15986  0.03119  0.08189  0.85553 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.472943   2.864455  -6.100 5.11e-07 ***
## fertilizer    0.534548   0.077066   6.936 3.98e-08 ***
## rainfall      0.023948   0.006697   3.576 0.001018 ** 
## irrigation    0.444435   0.122672   3.623 0.000891 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3532 on 36 degrees of freedom
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9991 
## F-statistic: 1.425e+04 on 3 and 36 DF,  p-value: < 2.2e-16

ANOVA Table

anova(model)
## Analysis of Variance Table
## 
## Response: yield
##            Df Sum Sq Mean Sq   F value    Pr(>F)    
## fertilizer  1 5331.5  5331.5 42728.315 < 2.2e-16 ***
## rainfall    1    2.3     2.3    18.496 0.0001242 ***
## irrigation  1    1.6     1.6    13.126 0.0008912 ***
## Residuals  36    4.5     0.1                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
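anova() reports sequential (Type I) sums of squares. Each row gives the extra variation explained by a term after the terms listed above it, so the table depends on the order of the predictors in the formula. This explains why the irrigation row, which is the last term entered, reproduces the t test from summary(model) (3.623² ≈ 13.126), while the fertilizer row does not. The sketch below does two things. It refits the model with the predictors in reverse order, and it calls drop1(), which tests each term after all the others. The first lines rebuild cropdata in compact form, with the same values as above, so the block runs on its own. The name model_rev is illustrative.

```r
# Compact rebuild of cropdata (same 40 observations as the full listing above)
cropdata <- data.frame(
  fertilizer = c(40, 42, 38, 50, 45, 47, 43, 39, 41, 44,
                 46, 48, 37, 36, 49, 51, 52, 53, 35, 34, 55:74),
  rainfall   = c(820, 790, 760, 880, 850, 840, 810, 770, 800, 830,
                 860, 870, 750, 740, 890, 900, 910, 920, 730, 720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12, 13, 11, 15, 14, 14, 13, 12, 13, 14,
                 15, 15, 11, 10, 16, 16, 17, 17, 10, 9,
                 rep(18:27, each = 2)),
  yield      = c(28, 30, 26, 36, 33, 34, 31, 27, 29, 32,
                 35, 36, 25, 24, 38, 39, 40, 41, 23, 22, 42:61)
)

model <- lm(yield ~ fertilizer + rainfall + irrigation, data = cropdata)

# Same predictors in reverse order: the sequential sums of squares change,
# but the residual sum of squares (and the fitted model) stay the same
model_rev <- lm(yield ~ irrigation + rainfall + fertilizer, data = cropdata)
anova(model_rev)

# drop1() tests each term after all the others (partial F tests);
# for one-df terms these F values equal the squared t values in summary(model)
drop1(model, test = "F")
```

Whichever order is used, the residual row is unchanged. Only the split of the explained variation among the predictors differs.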

Understanding Regression Output

Regression Coefficients

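Each coefficient estimates the change in crop yield (quintal/ha) associated with a one-unit increase in that variable, holding the other variables constant. From the output above, the fitted equation is:

yield = -17.473 + 0.5345 × fertilizer + 0.02395 × rainfall + 0.4444 × irrigation

  • A positive coefficient indicates that yield increases as the variable increases
  • A negative coefficient indicates that yield decreases as the variable increases

For example, the fertilizer coefficient of 0.5345 means that each additional kg/ha of fertilizer is associated with about 0.53 quintal/ha more yield when rainfall and irrigation are held fixed. All three coefficients in this model are positive. However, the three predictors rise together in this dataset, so such "holding the others constant" interpretations should be made with caution.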
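The fitted equation can be used to predict yield under new conditions. Below is a minimal sketch using predict(). The conditions in new_plot (50 kg/ha fertilizer, 900 mm rainfall, 15 irrigation hours) are hypothetical values chosen within the range of the data. The manual calculation confirms that the point prediction is simply the fitted equation evaluated at those values. The first lines rebuild cropdata compactly (identical values) so the block is self-contained.

```r
# Compact rebuild of cropdata (same 40 observations as the full listing above)
cropdata <- data.frame(
  fertilizer = c(40, 42, 38, 50, 45, 47, 43, 39, 41, 44,
                 46, 48, 37, 36, 49, 51, 52, 53, 35, 34, 55:74),
  rainfall   = c(820, 790, 760, 880, 850, 840, 810, 770, 800, 830,
                 860, 870, 750, 740, 890, 900, 910, 920, 730, 720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12, 13, 11, 15, 14, 14, 13, 12, 13, 14,
                 15, 15, 11, 10, 16, 16, 17, 17, 10, 9,
                 rep(18:27, each = 2)),
  yield      = c(28, 30, 26, 36, 33, 34, 31, 27, 29, 32,
                 35, 36, 25, 24, 38, 39, 40, 41, 23, 22, 42:61)
)

model <- lm(yield ~ fertilizer + rainfall + irrigation, data = cropdata)

# Hypothetical growing conditions for a new plot (within the observed ranges)
new_plot <- data.frame(fertilizer = 50, rainfall = 900, irrigation = 15)

# Point prediction with a 95% prediction interval for a single new plot
predict(model, newdata = new_plot, interval = "prediction", level = 0.95)

# The same point prediction by hand: intercept + sum of (coefficient x value)
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["fertilizer"]] * 50 +
  b[["rainfall"]] * 900 + b[["irrigation"]] * 15
manual
```

A prediction interval is wider than a confidence interval (interval = "confidence") because it also allows for the variation of a single new observation around the mean.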

p-value

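Each p-value tests the null hypothesis that the corresponding coefficient is zero. At the conventional 5% significance level:

p-value       Interpretation
p < 0.05      Significant
p ≥ 0.05      Not significant

In this model, all three predictors are significant (p < 0.01).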
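Each p-value in summary(model) comes from a two-sided t test with the residual degrees of freedom (40 observations minus 4 parameters = 36). As a check, the p-values can be recomputed from the printed t values with pt(). The t values are rounded to three decimals, so the results match the printed p-values only approximately.

```r
# t values as printed by summary(model); residual df = 40 - 4 = 36
t_values <- c(fertilizer = 6.936, rainfall = 3.576, irrigation = 3.623)
df_resid <- 36

# Two-sided p-value: twice the upper-tail probability of |t|
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)
signif(p_values, 3)

# Apply the 5% decision rule
p_values < 0.05
```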

R-squared

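R² is the proportion of the total variation in crop yield explained by the model. It ranges from 0 to 1.

R² value      Interpretation
Close to 1    The predictors explain most of the variation (good fit)
Close to 0    The predictors explain little of the variation (poor fit)

Here R² = 0.9992, so the three predictors explain about 99.9% of the variation in yield. Adjusted R² (0.9991) corrects for the number of predictors. It is more suitable for comparing models with different numbers of variables.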
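Both statistics can be recovered from the ANOVA table. R² = 1 - SS_residual / SS_total. Adjusted R² = 1 - [SS_residual / (n - k - 1)] / [SS_total / (n - 1)], where n = 40 observations and k = 3 predictors. This sketch uses the rounded sums of squares printed by anova(model), so it reproduces summary(model) only to about four decimal places.

```r
# Sums of squares from anova(model), rounded as printed
ss_terms <- c(fertilizer = 5331.5, rainfall = 2.3, irrigation = 1.6)
ss_resid <- 4.5
n <- 40   # observations
k <- 3    # predictors

ss_total <- sum(ss_terms) + ss_resid   # total variation in yield
r2 <- 1 - ss_resid / ss_total          # proportion of variation explained
adj_r2 <- 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))

round(c(R2 = r2, adj_R2 = adj_r2), 4)  # 0.9992 and 0.9991
```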

Regression Diagnostics

par(mfrow = c(2, 2))   # 2 x 2 grid for the four plots
plot(model)
par(mfrow = c(1, 1))   # restore the default layout

The plot(model) command produces four diagnostic plots:

  1. Residuals vs Fitted Plot
  2. Normal Q-Q Plot
  3. Scale-Location Plot
  4. Residuals vs Leverage Plot

Interpretation of Diagnostic Plots

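Residuals vs Fitted Plot

This plot checks:

  • Linearity
  • Constant variance

A random scatter of points around the horizontal zero line suggests that both assumptions hold. A curved pattern suggests non-linearity, and a funnel shape suggests non-constant variance.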

Normal Q-Q Plot

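This plot checks whether the residuals are approximately normally distributed.

If they are, the points lie close to the straight reference line. Strong departures at the ends indicate heavy tails or skewness.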

Scale-Location Plot

This plot checks homoscedasticity (constant variance of the residuals).

A roughly horizontal band of points with even spread across the fitted values indicates constant variance.

Residuals vs Leverage Plot

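This plot identifies outliers and observations with high leverage or a large influence on the fitted coefficients. Points outside the dashed Cook's distance contour lines deserve closer inspection.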
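The same information is available numerically through hatvalues() and cooks.distance() from base R's stats package. The sketch below flags observations using two thresholds: 2p/n for leverage and 4/n for Cook's distance. These are common rules of thumb, not formal tests, so flagged points only deserve a closer look. The first lines rebuild cropdata compactly (identical values) so the block runs on its own.

```r
# Compact rebuild of cropdata (same 40 observations as the full listing above)
cropdata <- data.frame(
  fertilizer = c(40, 42, 38, 50, 45, 47, 43, 39, 41, 44,
                 46, 48, 37, 36, 49, 51, 52, 53, 35, 34, 55:74),
  rainfall   = c(820, 790, 760, 880, 850, 840, 810, 770, 800, 830,
                 860, 870, 750, 740, 890, 900, 910, 920, 730, 720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12, 13, 11, 15, 14, 14, 13, 12, 13, 14,
                 15, 15, 11, 10, 16, 16, 17, 17, 10, 9,
                 rep(18:27, each = 2)),
  yield      = c(28, 30, 26, 36, 33, 34, 31, 27, 29, 32,
                 35, 36, 25, 24, 38, 39, 40, 41, 23, 22, 42:61)
)

model <- lm(yield ~ fertilizer + rainfall + irrigation, data = cropdata)

n <- nrow(cropdata)          # 40 observations
p <- length(coef(model))     # 4 parameters, including the intercept

lev  <- hatvalues(model)       # leverage of each observation
cook <- cooks.distance(model)  # influence of each observation on the fit

# Rule-of-thumb flags (conventions, not formal tests)
which(lev > 2 * p / n)
which(cook > 4 / n)
```

The leverages always sum to the number of model parameters (here 4), so their average is p/n. The 2p/n threshold therefore flags points with more than twice the average leverage.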

Applications in Agriculture

Regression analysis has wide applications in agricultural sciences.

Major Applications

  • Crop yield prediction
  • Fertilizer recommendation studies
  • Rainfall impact assessment
  • Soil nutrient analysis
  • Irrigation management
  • Agricultural economic forecasting
  • Pest and disease modelling
  • Climate change impact studies

Saving Graphs

Graphs can be exported from the Plots pane:

  1. Click Export
  2. Select Save as Image or Save as PDF
  3. Choose the file name, size and, for images, the format:
    • PNG
    • JPEG
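Graphs can also be saved from code, which is easier to repeat. In base R you open a file device, draw the plot, and call dev.off() to close the file. The sketch below saves the four diagnostic plots to a PDF. The file name is a hypothetical example, and tempdir() is used only so the sketch does not write into your working directory. png() and jpeg() follow the same open, plot, close pattern. The first lines rebuild cropdata compactly (identical values) so the block runs on its own.

```r
# Compact rebuild of cropdata (same 40 observations as the full listing above)
cropdata <- data.frame(
  fertilizer = c(40, 42, 38, 50, 45, 47, 43, 39, 41, 44,
                 46, 48, 37, 36, 49, 51, 52, 53, 35, 34, 55:74),
  rainfall   = c(820, 790, 760, 880, 850, 840, 810, 770, 800, 830,
                 860, 870, 750, 740, 890, 900, 910, 920, 730, 720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12, 13, 11, 15, 14, 14, 13, 12, 13, 14,
                 15, 15, 11, 10, 16, 16, 17, 17, 10, 9,
                 rep(18:27, each = 2)),
  yield      = c(28, 30, 26, 36, 33, 34, 31, 27, 29, 32,
                 35, 36, 25, 24, 38, 39, 40, 41, 23, 22, 42:61)
)

model <- lm(yield ~ fertilizer + rainfall + irrigation, data = cropdata)

# Hypothetical output path; replace with a location of your choice
out_file <- file.path(tempdir(), "regression_diagnostics.pdf")

pdf(out_file, width = 8, height = 8)   # open a PDF device (size in inches)
par(mfrow = c(2, 2))                   # 2 x 2 grid on this device only
plot(model)                            # draw the four diagnostic plots
dev.off()                              # close the device and write the file

file.exists(out_file)
```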

Conclusion

Multiple regression analysis is an important statistical tool for agricultural research and decision-making.

RStudio provides a powerful environment for:

  • Data analysis
  • Model fitting
  • Graphical visualization
  • Statistical interpretation
  • Prediction modelling

The methods explained in this tutorial can be extended to advanced predictive agricultural analytics.
