Load Required Packages

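library(ggplot2)   # plotting system (also required by GGally)
library(dplyr)     # data manipulation
library(corrplot)  # corrplot(): correlation matrix plots
library(GGally)    # ggpairs(): pairwise scatter plot matrix
library(psych)     # descriptive statistics
library(knitr)     # report generation and table formatting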
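
A library() call stops with an error when a package has not been installed. The following is a minimal sketch, in base R, of checking for the six packages first; the names required_pkgs, is_installed, and missing_pkgs are illustrative and not part of the tutorial.

```r
# Packages loaded at the top of this tutorial
required_pkgs <- c("ggplot2", "dplyr", "corrplot", "GGally", "psych", "knitr")

# requireNamespace() returns FALSE instead of stopping when a package is absent
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  message("Run install.packages() on these names, then load the libraries again.")
} else {
  message("All required packages are installed.")
}
```

Installation itself is left as a manual step here because install.packages() needs an internet connection and a CRAN mirror.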

Introduction

Correlation analysis is a widely used statistical technique in agricultural research. It measures the strength and direction of the relationship between variables such as fertilizer application, rainfall, irrigation, temperature, and crop yield.

Knowing how agricultural variables are associated with one another supports evidence-based decisions in crop management and agricultural planning.

RStudio provides a user-friendly environment for running correlation analysis in R, visualizing the results, and interpreting agricultural datasets.

Objectives

The objectives of this tutorial are:

  • To understand correlation analysis
  • To create agricultural datasets in RStudio
  • To compute correlation coefficients
  • To visualize correlation matrices
  • To perform significance testing of correlation
  • To generate graphical representations of relationships among variables

Software Requirements

Software      Purpose
R             Statistical computing
RStudio       Integrated development environment

Introduction to R and RStudio

R is an open-source programming language widely used for statistical analysis, data visualization, and predictive modelling.

RStudio is an Integrated Development Environment (IDE) for R.

Main Components of RStudio

  1. Source Editor
  2. Console
  3. Environment/History
  4. Files/Plots/Packages/Help

Agricultural Dataset Description

In this tutorial, a hypothetical agricultural dataset is used to study the relationships among:

  • Fertilizer application
  • Rainfall
  • Irrigation
  • Crop yield

Variables Used

Variable     Description                    Unit
fertilizer   Amount of fertilizer applied   kg/ha
rainfall     Seasonal rainfall received     mm
irrigation   Irrigation hours supplied      hours
yield        Crop yield                     quintal/ha

Creating Agricultural Dataset

cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,
                 46,48,37,36,49,51,52,53,35,34,
                 55,56,57,58,59,60,61,62,63,64,
                 65,66,67,68,69,70,71,72,73,74),

  rainfall = c(820,790,760,880,850,840,810,770,800,830,
               860,870,750,740,890,900,910,920,730,720,
               930,940,950,960,970,980,990,1000,1010,1020,
               1030,1040,1050,1060,1070,1080,1090,1100,1110,1120),

  irrigation = c(12,13,11,15,14,14,13,12,13,14,
                 15,15,11,10,16,16,17,17,10,9,
                 18,18,19,19,20,20,21,21,22,22,
                 23,23,24,24,25,25,26,26,27,27),

  yield = c(28,30,26,36,33,34,31,27,29,32,
            35,36,25,24,38,39,40,41,23,22,
            42,43,44,45,46,47,48,49,50,51,
            52,53,54,55,56,57,58,59,60,61)
)

cropdata
##    fertilizer rainfall irrigation yield
## 1          40      820         12    28
## 2          42      790         13    30
## 3          38      760         11    26
## 4          50      880         15    36
## 5          45      850         14    33
## 6          47      840         14    34
## 7          43      810         13    31
## 8          39      770         12    27
## 9          41      800         13    29
## 10         44      830         14    32
## 11         46      860         15    35
## 12         48      870         15    36
## 13         37      750         11    25
## 14         36      740         10    24
## 15         49      890         16    38
## 16         51      900         16    39
## 17         52      910         17    40
## 18         53      920         17    41
## 19         35      730         10    23
## 20         34      720          9    22
## 21         55      930         18    42
## 22         56      940         18    43
## 23         57      950         19    44
## 24         58      960         19    45
## 25         59      970         20    46
## 26         60      980         20    47
## 27         61      990         21    48
## 28         62     1000         21    49
## 29         63     1010         22    50
## 30         64     1020         22    51
## 31         65     1030         23    52
## 32         66     1040         23    53
## 33         67     1050         24    54
## 34         68     1060         24    55
## 35         69     1070         25    56
## 36         70     1080         25    57
## 37         71     1090         26    58
## 38         72     1100         26    59
## 39         73     1110         27    60
## 40         74     1120         27    61

Structure of Dataset

str(cropdata)
## 'data.frame':    40 obs. of  4 variables:
##  $ fertilizer: num  40 42 38 50 45 47 43 39 41 44 ...
##  $ rainfall  : num  820 790 760 880 850 840 810 770 800 830 ...
##  $ irrigation: num  12 13 11 15 14 14 13 12 13 14 ...
##  $ yield     : num  28 30 26 36 33 34 31 27 29 32 ...

Summary Statistics

summary(cropdata)
##    fertilizer       rainfall        irrigation        yield      
##  Min.   :34.00   Min.   : 720.0   Min.   : 9.00   Min.   :22.00  
##  1st Qu.:43.75   1st Qu.: 827.5   1st Qu.:13.75   1st Qu.:31.75  
##  Median :54.00   Median : 925.0   Median :17.50   Median :41.50  
##  Mean   :54.00   Mean   : 923.5   Mean   :17.93   Mean   :41.48  
##  3rd Qu.:64.25   3rd Qu.:1022.5   3rd Qu.:22.25   3rd Qu.:51.25  
##  Max.   :74.00   Max.   :1120.0   Max.   :27.00   Max.   :61.00
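
summary() reports location and range but not spread. As a hedged sketch, the standard deviation and coefficient of variation of each column could be computed as follows. The data frame is rebuilt compactly with the same 40 rows so the block runs on its own; the name spread and its column names are illustrative.

```r
# Rebuild cropdata compactly (same values as the dataset above)
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,910,920,730,720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

# Standard deviation and coefficient of variation (%) for every column
spread <- data.frame(
  std_dev    = sapply(cropdata, sd),
  cv_percent = 100 * sapply(cropdata, sd) / sapply(cropdata, mean)
)

round(spread, 2)
```

The coefficient of variation is unit-free, so it allows the variability of rainfall (mm) to be compared with that of yield (quintal/ha).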

Correlation Analysis

The Pearson correlation coefficient (r), which cor() computes by default, measures the strength and direction of the linear relationship between two variables.

The correlation coefficient ranges from -1 to +1:

Correlation Value   Interpretation
+1                  Perfect positive correlation
 0                  No linear correlation
-1                  Perfect negative correlation
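
To make the coefficient concrete, the sketch below computes Pearson's r for fertilizer and yield from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations) and compares it with cor(). The two vectors repeat the dataset's columns; r_manual and r_builtin are illustrative names.

```r
# Same values as the fertilizer and yield columns of cropdata
fertilizer <- c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74)
yield      <- c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)

# Deviations of each observation from its variable's mean
dx <- fertilizer - mean(fertilizer)
dy <- yield - mean(yield)

# Pearson's r from the definition
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in computation (Pearson is the default method)
r_builtin <- cor(fertilizer, yield)

# Check that both agree within floating-point tolerance
isTRUE(all.equal(r_manual, r_builtin))
```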

Computing Correlation Matrix

correlation_matrix <- cor(cropdata)

correlation_matrix
##            fertilizer  rainfall irrigation     yield
## fertilizer  1.0000000 0.9974009  0.9961273 0.9992096
## rainfall    0.9974009 1.0000000  0.9946824 0.9981105
## irrigation  0.9961273 0.9946824  1.0000000 0.9971846
## yield       0.9992096 0.9981105  0.9971846 1.0000000

Interpretation of Correlation Matrix

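  • Positive values indicate positive association
  • Negative values indicate negative association
  • Values close to +1 indicate strong positive correlation
  • Values close to -1 indicate strong negative correlation

In this dataset all six pairwise coefficients exceed 0.99, indicating very strong positive correlation among all four variables.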
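
A square matrix repeats every coefficient twice and includes the trivial diagonal. One possible way to read it more easily, sketched here in base R, is to list each pair of variables once and sort by the absolute value of r. The data frame is rebuilt compactly with the same values; idx and pair_table are illustrative names.

```r
# Rebuild cropdata compactly (same values as the dataset above)
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,910,920,730,720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

correlation_matrix <- cor(cropdata)

# Row/column positions of the upper triangle (each pair appears once)
idx <- which(upper.tri(correlation_matrix), arr.ind = TRUE)

pair_table <- data.frame(
  var1 = rownames(correlation_matrix)[idx[, "row"]],
  var2 = colnames(correlation_matrix)[idx[, "col"]],
  r    = correlation_matrix[idx]
)

# Strongest associations first
pair_table <- pair_table[order(-abs(pair_table$r)), ]
pair_table
```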

Visualizing Correlation Matrix

corrplot(correlation_matrix,
         method = "circle",
         type = "upper",
         tl.col = "black",
         tl.srt = 45)

Pairwise Scatter Plot Matrix

ggpairs(cropdata)

Correlation Test Between Fertilizer and Yield

cor.test(cropdata$fertilizer,
         cropdata$yield)
## 
##  Pearson's product-moment correlation
## 
## data:  cropdata$fertilizer and cropdata$yield
## t = 154.95, df = 38, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.998495 0.999585
## sample estimates:
##       cor 
## 0.9992096
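
The t statistic and degrees of freedom in this output follow from r and the sample size: t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below reproduces them by hand; the vectors repeat the dataset's columns, and t_manual and p_manual are illustrative names.

```r
# Same values as the fertilizer and yield columns of cropdata
fertilizer <- c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74)
yield      <- c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)

n <- length(fertilizer)      # number of paired observations
r <- cor(fertilizer, yield)  # Pearson correlation coefficient

# Test statistic for H0: true correlation equals 0
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

t_manual
n - 2
p_manual
```

R prints any p-value smaller than about 2.2e-16 as "< 2.2e-16", which is why the output above shows no exact figure.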

Interpretation of Correlation Test

The correlation test provides:

  • Correlation coefficient (r)
  • p-value
  • Confidence interval
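
cor.test() returns a list, so these three quantities can be pulled out individually rather than read off the printed output. A minimal sketch, with the vectors repeating the dataset's columns and res as an illustrative name:

```r
# Same values as the fertilizer and yield columns of cropdata
fertilizer <- c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74)
yield      <- c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)

res <- cor.test(fertilizer, yield)

r_value  <- unname(res$estimate)   # correlation coefficient (r)
p_value  <- res$p.value            # p-value of the two-sided test
ci_lower <- res$conf.int[1]        # lower limit of the 95% confidence interval
ci_upper <- res$conf.int[2]        # upper limit of the 95% confidence interval

round(c(r = r_value, lower = ci_lower, upper = ci_upper), 4)
```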

Decision Rule

p-value     Interpretation
p < 0.05    Significant correlation
p ≥ 0.05    Non-significant correlation

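A significant positive correlation indicates that an increase in fertilizer application is associated with an increase in crop yield. Here r = 0.999 with p < 2.2e-16, so the association between fertilizer and yield is positive and statistically significant.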
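
The same decision rule can be applied to every pair of variables, not only fertilizer and yield. The following is one possible base R sketch that loops over all pairs with cor.test(); the data frame is rebuilt compactly with the same values, and var_pairs, results, and alpha are illustrative names.

```r
# Rebuild cropdata compactly (same values as the dataset above)
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,910,920,730,720,
                 seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

# All unordered pairs of column names (2 x 6 character matrix)
var_pairs <- combn(names(cropdata), 2)

results <- data.frame(
  var1    = var_pairs[1, ],
  var2    = var_pairs[2, ],
  r       = NA_real_,
  p_value = NA_real_
)

# Run one correlation test per pair and store r and the p-value
for (i in seq_len(ncol(var_pairs))) {
  test <- cor.test(cropdata[[var_pairs[1, i]]], cropdata[[var_pairs[2, i]]])
  results$r[i]       <- unname(test$estimate)
  results$p_value[i] <- test$p.value
}

# Decision rule: significant when p < 0.05
alpha <- 0.05
results$significant <- results$p_value < alpha
results
```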

Scatter Plot of Fertilizer and Yield

plot(cropdata$fertilizer,
     cropdata$yield,
     main = "Relationship Between Fertilizer and Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19,
     col = "blue")
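
A fitted straight line can make the linear trend in the scatter plot easier to see. The sketch below adds a least-squares line with lm() and abline(); this is a visual aid added for illustration, not part of the original analysis, and fit is an illustrative name.

```r
# Same values as the fertilizer and yield columns of cropdata
fertilizer <- c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74)
yield      <- c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)

# Least-squares line of yield on fertilizer
fit <- lm(yield ~ fertilizer)

plot(fertilizer, yield,
     main = "Relationship Between Fertilizer and Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19,
     col = "blue")

# Overlay the fitted line on the existing plot
abline(fit, col = "red", lwd = 2)
```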

Applications in Agriculture

Correlation analysis has wide applications in agricultural sciences.

Major Applications

  • Crop yield relationship studies
  • Fertilizer response analysis
  • Rainfall impact assessment
  • Soil nutrient relationship studies
  • Irrigation management
  • Climate impact studies
  • Pest and disease association studies
  • Agricultural forecasting

Saving Graphs

Graphs can be exported from the Plots pane as follows:

  1. Click Export
  2. Choose Save as Image or Save as PDF
  3. For an image, choose the format, for example:
    • PNG
    • JPEG
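
Graphs can also be saved from code, which makes the export reproducible. A minimal sketch using the base png() device follows; the file name fertilizer_yield.png, the temporary directory, and the image dimensions are assumptions chosen for illustration.

```r
# Same values as the fertilizer and yield columns of cropdata
fertilizer <- c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74)
yield      <- c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)

# Hypothetical output path; replace tempdir() with a project folder as needed
out_file <- file.path(tempdir(), "fertilizer_yield.png")

# Open a PNG device, draw the plot into it, then close the device
png(out_file, width = 800, height = 600, res = 120)
plot(fertilizer, yield,
     main = "Relationship Between Fertilizer and Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19,
     col = "blue")
dev.off()

# Confirm that the file was written
file.exists(out_file)
```

The jpeg() and pdf() devices follow the same open, plot, dev.off() pattern; note that pdf() takes its width and height in inches rather than pixels.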

Conclusion

Correlation analysis is an important statistical tool for understanding relationships among agricultural variables.

RStudio provides a powerful environment for:

  • Statistical analysis
  • Data visualization
  • Correlation studies
  • Graphical interpretation
  • Agricultural data analysis

The methods explained in this tutorial can be extended to advanced statistical and predictive agricultural studies.

References

  1. R Core Team. R: A Language and Environment for Statistical Computing.
  2. https://cran.r-project.org/
  3. https://posit.co/download/rstudio-desktop/