Main Components of RStudio
- Source Editor
- Console
- Environment/History
- Files/Plots/Packages/Help
Regression analysis is one of the most widely used statistical techniques in agricultural research. It is used to study the relationship between crop yield and various influencing factors such as fertilizer application, rainfall, irrigation, temperature, and soil nutrients.
Multiple regression analysis helps quantify these relationships and enables prediction of crop yield under varying conditions.
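As a sketch in standard notation (the symbols $\beta$ and $\varepsilon$ are introduced here for illustration and do not appear in the original text), a multiple regression of yield on the three factors analysed later in this tutorial can be written as:

$$
\text{yield}_i = \beta_0 + \beta_1\,\text{fertilizer}_i + \beta_2\,\text{rainfall}_i + \beta_3\,\text{irrigation}_i + \varepsilon_i
$$

Here $\beta_0$ is the intercept, $\beta_1, \beta_2, \beta_3$ are the partial regression coefficients, and $\varepsilon_i$ is the random error for observation $i$.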
RStudio provides an efficient and user-friendly environment for performing regression analysis, graphical visualization, and statistical interpretation.
The objectives of this practical tutorial are:
| Software | Purpose |
|---|---|
| R Software | Statistical Computing |
| RStudio | Integrated Development Environment |
R is an open-source programming language widely used for statistical analysis, data visualization, and predictive modelling.
RStudio is an Integrated Development Environment (IDE) for R.
In this tutorial, a hypothetical agricultural dataset is used to study the effect of fertilizer application, rainfall, and irrigation on crop yield.
| Variable | Description | Unit |
|---|---|---|
| fertilizer | Amount of fertilizer applied | kg/ha |
| rainfall | Seasonal rainfall received | mm |
| irrigation | Irrigation hours supplied | hours |
| yield | Crop yield | quintal/ha |
library(readxl)
library(ggplot2)
library(dplyr)
library(knitr)
library(car)
library(lmtest)
Datanew <- read_excel("cropdataregress.xlsx")
head(Datanew)
## # A tibble: 6 × 4
## fertilizer rainfall irrigation yield
## <dbl> <dbl> <dbl> <dbl>
## 1 40 820 12 28
## 2 42 790 13 30
## 3 38 760 11 26
## 4 50 880 15 36
## 5 55 910 16 40
## 6 48 850 14 34
str(Datanew)
## tibble [40 × 4] (S3: tbl_df/tbl/data.frame)
## $ fertilizer: num [1:40] 40 42 38 50 55 48 60 62 45 52 ...
## $ rainfall : num [1:40] 820 790 760 880 910 850 940 960 810 900 ...
## $ irrigation: num [1:40] 12 13 11 15 16 14 17 18 13 15 ...
## $ yield : num [1:40] 28 30 26 36 40 34 44 46 31 38 ...
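For readers who do not have `cropdataregress.xlsx`, the first ten observations visible in the `str()` output above can be typed in directly. This is only a sketch: the name `crop10` is hypothetical, and the remaining 30 rows of the dataset are not shown in the tutorial, so results obtained from this subset will differ from the outputs printed in this document.

```r
# First ten observations, copied from the str() output above.
# The other 30 rows of cropdataregress.xlsx are not shown in the tutorial.
crop10 <- data.frame(
  fertilizer = c(40, 42, 38, 50, 55, 48, 60, 62, 45, 52),           # kg/ha
  rainfall   = c(820, 790, 760, 880, 910, 850, 940, 960, 810, 900), # mm
  irrigation = c(12, 13, 11, 15, 16, 14, 17, 18, 13, 15),           # hours
  yield      = c(28, 30, 26, 36, 40, 34, 44, 46, 31, 38)            # quintal/ha
)

str(crop10)   # structure: 10 observations of 4 numeric variables
head(crop10)  # first six rows, matching head(Datanew)
```

The same inspection commands used in the tutorial (`head()`, `str()`, `summary()`) work unchanged on this data frame.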
summary(Datanew)
## fertilizer rainfall irrigation yield
## Min. :38.00 Min. : 760.0 Min. :11.00 Min. :26.00
## 1st Qu.:45.75 1st Qu.: 827.5 1st Qu.:13.00 1st Qu.:31.75
## Median :53.50 Median : 897.5 Median :15.00 Median :38.50
## Mean :53.35 Mean : 889.6 Mean :15.40 Mean :38.40
## 3rd Qu.:60.25 3rd Qu.: 942.5 3rd Qu.:17.25 3rd Qu.:44.25
## Max. :69.00 Max. :1010.0 Max. :20.00 Max. :52.00
plot(Datanew$fertilizer,
     Datanew$yield,
     main = "Effect of Fertilizer on Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19)
abline(lm(yield ~ fertilizer, data = Datanew),
       col = "blue",
       lwd = 2)
model <- lm(yield ~ fertilizer + rainfall + irrigation,
            data = Datanew)
model
##
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = Datanew)
##
## Coefficients:
## (Intercept) fertilizer rainfall irrigation
## -17.18765 0.50468 0.02209 0.58514
summary(model)
##
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = Datanew)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.99759 -0.24834 -0.02828 0.21340 0.93317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.187649 3.386389 -5.076 1.19e-05 ***
## fertilizer 0.504680 0.069382 7.274 1.44e-08 ***
## rainfall 0.022090 0.007223 3.058 0.00418 **
## irrigation 0.585138 0.176587 3.314 0.00211 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3945 on 36 degrees of freedom
## Multiple R-squared: 0.9976, Adjusted R-squared: 0.9974
## F-statistic: 4913 on 3 and 36 DF, p-value: < 2.2e-16
anova(model)
## Analysis of Variance Table
##
## Response: yield
## Df Sum Sq Mean Sq F value Pr(>F)
## fertilizer 1 2290.04 2290.04 14714.42 < 2.2e-16 ***
## rainfall 1 2.25 2.25 14.43 0.0005402 ***
## irrigation 1 1.71 1.71 10.98 0.0021068 **
## Residuals 36 5.60 0.16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
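Each coefficient estimates the change in crop yield (quintal/ha) associated with a one-unit increase in that independent variable, holding the other variables constant. For example, the fertilizer coefficient of 0.505 means that each additional kg/ha of fertilizer is associated with an increase of about 0.5 quintal/ha in yield at fixed rainfall and irrigation.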
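The fitted coefficients can also be used for prediction. The sketch below plugs the coefficients printed in the model output into the regression equation. The helper `predict_yield()` and the vector `b` are hypothetical names introduced for illustration, and the inputs are those of the fourth row of `head(Datanew)`, whose observed yield is 36.

```r
# Coefficients copied from the printed model output (rounded as shown there).
b <- c(intercept = -17.18765, fertilizer = 0.50468,
       rainfall = 0.02209, irrigation = 0.58514)

# Hypothetical helper: predicted yield (quintal/ha) for given input levels.
predict_yield <- function(fertilizer, rainfall, irrigation) {
  unname(b["intercept"] +
           b["fertilizer"] * fertilizer +
           b["rainfall"]   * rainfall +
           b["irrigation"] * irrigation)
}

# Inputs of the 4th row shown by head(Datanew); the observed yield was 36.
pred <- predict_yield(fertilizer = 50, rainfall = 880, irrigation = 15)
round(pred, 2)
```

The predicted value is about 36.26 quintal/ha, close to the observed 36. When the fitted `model` object is available, `predict(model, newdata = data.frame(fertilizer = 50, rainfall = 880, irrigation = 15))` performs the same calculation with the unrounded coefficients.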
Decision rule (5% level of significance):
| p-value | Interpretation |
|---|---|
| p < 0.05 | Significant |
| p ≥ 0.05 | Not Significant |
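The decision rule can be applied programmatically. The following sketch recomputes the t values and two-sided p-values from the estimates and standard errors printed by `summary(model)`, then labels each predictor. The object names (`est`, `se`, `decision`, and so on) are chosen for illustration, and small differences from the printed output are expected because the inputs are rounded.

```r
# Estimates and standard errors copied from the summary(model) output above.
est <- c(fertilizer = 0.504680, rainfall = 0.022090, irrigation = 0.585138)
se  <- c(fertilizer = 0.069382, rainfall = 0.007223, irrigation = 0.176587)
df_resid <- 36    # residual degrees of freedom reported by summary(model)
alpha    <- 0.05  # significance level used in the decision rule

# t statistic = estimate / standard error
t_value <- est / se

# Two-sided p-value from the t distribution with 36 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# Apply the decision rule to every predictor
decision <- ifelse(p_value < alpha, "Significant", "Not Significant")
data.frame(t_value = round(t_value, 3),
           p_value = signif(p_value, 3),
           decision)
```

All three predictors have p-values below 0.05 and are therefore classified as significant, in agreement with the significance stars in the printed output.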
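R² measures the goodness of fit of the regression model: it is the proportion of the total variation in crop yield that is explained by the independent variables. In the model above, R² = 0.9976, so the three predictors together account for about 99.8% of the variation in yield.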
| R² Value | Interpretation |
|---|---|
| Close to 1 | Good Fit |
| Close to 0 | Poor Fit |
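To show where these values come from, the sketch below rebuilds R² and adjusted R² from the sums of squares printed in the `anova(model)` table. The variable names are illustrative, and the inputs are the rounded figures from that table.

```r
# Sums of squares copied from the anova(model) table above.
ss_terms <- c(fertilizer = 2290.04, rainfall = 2.25, irrigation = 1.71)
ss_resid <- 5.60  # residual sum of squares
n <- 40           # number of observations
k <- 3            # number of predictors

ss_total <- sum(ss_terms) + ss_resid  # total sum of squares

# R-squared: share of total variation explained by the model
r2 <- 1 - ss_resid / ss_total

# Adjusted R-squared: penalises the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(R2 = r2, adj_R2 = adj_r2), 4)
```

Rounded to four decimals, this gives 0.9976 and 0.9974, matching the `Multiple R-squared` and `Adjusted R-squared` lines of `summary(model)`.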
par(mfrow = c(2, 2))
plot(model)
The above command generates four diagnostic plots:
- Residuals vs Fitted: checks linearity of the relationship. Random scatter of points around the zero line indicates good model fit.
- Normal Q-Q: checks normality of residuals. Residuals should approximately follow a straight line.
- Scale-Location: checks homoscedasticity. Equal spread of residuals indicates constant variance.
- Residuals vs Leverage: identifies influential observations and outliers.
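The visual checks above can be complemented with numerical summaries. The sketch below is not part of the original analysis: it refits the model on the first ten observations visible in the `str()` output (under the hypothetical names `crop10` and `fit10`) and uses the base R functions `shapiro.test()`, `hatvalues()`, and `cooks.distance()`. The Shapiro-Wilk test and the leverage cut-off of twice the mean are common conventions added here as assumptions, and results on this subset will differ from those of the full 40-row dataset.

```r
# First ten observations copied from the str() output (subset, not the full data).
crop10 <- data.frame(
  fertilizer = c(40, 42, 38, 50, 55, 48, 60, 62, 45, 52),
  rainfall   = c(820, 790, 760, 880, 910, 850, 940, 960, 810, 900),
  irrigation = c(12, 13, 11, 15, 16, 14, 17, 18, 13, 15),
  yield      = c(28, 30, 26, 36, 40, 34, 44, 46, 31, 38)
)
fit10 <- lm(yield ~ fertilizer + rainfall + irrigation, data = crop10)

res  <- residuals(fit10)       # raw residuals (Residuals vs Fitted)
lev  <- hatvalues(fit10)       # leverage of each observation
cook <- cooks.distance(fit10)  # influence of each observation

shapiro.test(res)              # normality test, complements the Q-Q plot
which(lev > 2 * mean(lev))     # rule-of-thumb flag for high leverage
round(cook, 3)                 # large values point to influential rows
```

With the full dataset, the same calls can be applied to `model` instead of `fit10`.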
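Variance Inflation Factor (VIF) is used to detect multicollinearity among independent variables in a regression model. For each predictor, VIF = 1 / (1 − R²), where R² here comes from regressing that predictor on all the other predictors; values above 10 are commonly treated as a sign of serious multicollinearity.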
car::vif(model)
## fertilizer rainfall irrigation
## 97.83653 68.15842 54.81909
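All three values reported above are far beyond the commonly used threshold of 10, which points to severe multicollinearity among fertilizer, rainfall, and irrigation, so the individual coefficients should be interpreted with caution. To make the calculation concrete, the sketch below computes VIF by hand in base R as 1 / (1 − R²) from auxiliary regressions. It uses only the first ten observations visible in the `str()` output (hypothetical name `crop10`), so the numbers will not match the `car::vif(model)` output for the full dataset.

```r
# First ten observations copied from the str() output (subset, not the full data).
crop10 <- data.frame(
  fertilizer = c(40, 42, 38, 50, 55, 48, 60, 62, 45, 52),
  rainfall   = c(820, 790, 760, 880, 910, 850, 940, 960, 810, 900),
  irrigation = c(12, 13, 11, 15, 16, 14, 17, 18, 13, 15),
  yield      = c(28, 30, 26, 36, 40, 34, 44, 46, 31, 38)
)

predictors <- c("fertilizer", "rainfall", "irrigation")

# For each predictor: regress it on the other predictors, then VIF = 1 / (1 - R^2)
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = crop10)
  1 / (1 - summary(aux)$r.squared)
})

round(vif_manual, 2)
```

Replacing `crop10` with `Datanew` should reproduce the values returned by `car::vif(model)`, since for numeric predictors that function is based on the same definition.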
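The Durbin-Watson test is used to detect first-order autocorrelation among residuals in regression analysis. The statistic ranges from 0 to 4; values near 2 indicate no autocorrelation, while values well below 2 suggest positive autocorrelation.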
lmtest::dwtest(model)
##
## Durbin-Watson test
##
## data: model
## DW = 1.617, p-value = 0.1078
## alternative hypothesis: true autocorrelation is greater than 0
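With DW = 1.617 and a p-value of 0.1078, the null hypothesis of no positive autocorrelation is not rejected at the 5% level. The statistic itself is simple to compute: it is the sum of squared differences between successive residuals divided by the residual sum of squares. The sketch below illustrates this in base R on the first ten observations visible in the `str()` output (hypothetical names `crop10` and `fit10`), so its value will differ from the one reported for the full dataset.

```r
# First ten observations copied from the str() output (subset, not the full data).
crop10 <- data.frame(
  fertilizer = c(40, 42, 38, 50, 55, 48, 60, 62, 45, 52),
  rainfall   = c(820, 790, 760, 880, 910, 850, 940, 960, 810, 900),
  irrigation = c(12, 13, 11, 15, 16, 14, 17, 18, 13, 15),
  yield      = c(28, 30, 26, 36, 40, 34, 44, 46, 31, 38)
)
fit10 <- lm(yield ~ fertilizer + rainfall + irrigation, data = crop10)

e <- residuals(fit10)  # residuals in the order the observations were recorded

# Durbin-Watson statistic: sum of squared successive differences / sum of squares
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

The statistic is only meaningful when the rows follow a natural order, such as time or position in the field; `lmtest::dwtest()` additionally supplies the p-value.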
Regression analysis has wide applications in agricultural sciences.
Multiple regression analysis is an important statistical tool for agricultural research and decision-making.
RStudio provides a powerful environment for regression analysis, graphical visualization, and statistical interpretation.
The methods explained in this tutorial can be extended to advanced predictive agricultural analytics.