Main Components of RStudio
- Source Editor
- Console
- Environment/History
- Files/Plots/Packages/Help
Regression analysis is one of the most widely used statistical techniques in agricultural research. It is used to study the relationship between crop yield and various influencing factors such as fertilizer application, rainfall, irrigation, temperature, and soil nutrients.
Multiple regression analysis helps quantify these relationships and enables prediction of crop yield under varying conditions.
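In symbols, a multiple regression with three influencing factors can be sketched as follows. The notation is a standard convention assumed here rather than taken from this tutorial: $y$ is crop yield, $x_1, x_2, x_3$ are the influencing factors, $\beta_0$ is the intercept, $\beta_1, \beta_2, \beta_3$ are the regression coefficients, and $\varepsilon$ is the random error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon
$$

Each $\beta_j$ is the expected change in $y$ for a one-unit increase in $x_j$ when the other factors are held fixed.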
RStudio provides an efficient and user-friendly environment for performing regression analysis, graphical visualization, and statistical interpretation.
The objectives of this practical tutorial are:
| Software | Purpose |
|---|---|
| R Software | Statistical Computing |
| RStudio | Integrated Development Environment |
R is an open-source programming language widely used for statistical analysis, data visualization, and predictive modelling.
RStudio is an Integrated Development Environment (IDE) for R.
In this tutorial, a hypothetical agricultural dataset is used to study the effect of fertilizer application, rainfall, and irrigation on crop yield.
| Variable | Description | Unit |
|---|---|---|
| fertilizer | Amount of fertilizer applied | kg/ha |
| rainfall | Seasonal rainfall received | mm |
| irrigation | Irrigation hours supplied | hours |
| yield | Crop yield | quintal/ha |
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,
                 46,48,37,36,49,51,52,53,35,34,
                 55,56,57,58,59,60,61,62,63,64,
                 65,66,67,68,69,70,71,72,73,74),
  rainfall = c(820,790,760,880,850,840,810,770,800,830,
               860,870,750,740,890,900,910,920,730,720,
               930,940,950,960,970,980,990,1000,1010,1020,
               1030,1040,1050,1060,1070,1080,1090,1100,1110,1120),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,
                 15,15,11,10,16,16,17,17,10,9,
                 18,18,19,19,20,20,21,21,22,22,
                 23,23,24,24,25,25,26,26,27,27),
  yield = c(28,30,26,36,33,34,31,27,29,32,
            35,36,25,24,38,39,40,41,23,22,
            42,43,44,45,46,47,48,49,50,51,
            52,53,54,55,56,57,58,59,60,61)
)
cropdata
## fertilizer rainfall irrigation yield
## 1 40 820 12 28
## 2 42 790 13 30
## 3 38 760 11 26
## 4 50 880 15 36
## 5 45 850 14 33
## 6 47 840 14 34
## 7 43 810 13 31
## 8 39 770 12 27
## 9 41 800 13 29
## 10 44 830 14 32
## 11 46 860 15 35
## 12 48 870 15 36
## 13 37 750 11 25
## 14 36 740 10 24
## 15 49 890 16 38
## 16 51 900 16 39
## 17 52 910 17 40
## 18 53 920 17 41
## 19 35 730 10 23
## 20 34 720 9 22
## 21 55 930 18 42
## 22 56 940 18 43
## 23 57 950 19 44
## 24 58 960 19 45
## 25 59 970 20 46
## 26 60 980 20 47
## 27 61 990 21 48
## 28 62 1000 21 49
## 29 63 1010 22 50
## 30 64 1020 22 51
## 31 65 1030 23 52
## 32 66 1040 23 53
## 33 67 1050 24 54
## 34 68 1060 24 55
## 35 69 1070 25 56
## 36 70 1080 25 57
## 37 71 1090 26 58
## 38 72 1100 26 59
## 39 73 1110 27 60
## 40 74 1120 27 61
str(cropdata)
## 'data.frame': 40 obs. of 4 variables:
## $ fertilizer: num 40 42 38 50 45 47 43 39 41 44 ...
## $ rainfall : num 820 790 760 880 850 840 810 770 800 830 ...
## $ irrigation: num 12 13 11 15 14 14 13 12 13 14 ...
## $ yield : num 28 30 26 36 33 34 31 27 29 32 ...
summary(cropdata)
## fertilizer rainfall irrigation yield
## Min. :34.00 Min. : 720.0 Min. : 9.00 Min. :22.00
## 1st Qu.:43.75 1st Qu.: 827.5 1st Qu.:13.75 1st Qu.:31.75
## Median :54.00 Median : 925.0 Median :17.50 Median :41.50
## Mean :54.00 Mean : 923.5 Mean :17.93 Mean :41.48
## 3rd Qu.:64.25 3rd Qu.:1022.5 3rd Qu.:22.25 3rd Qu.:51.25
## Max. :74.00 Max. :1120.0 Max. :27.00 Max. :61.00
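Before fitting a model, it can help to see how strongly the variables move together. The sketch below is an optional extra rather than a step from the tutorial: it rebuilds the same dataset in compact form so the block runs on its own, then computes a Pearson correlation matrix and draws a scatter-plot matrix. The name `cor_matrix` is introduced here for illustration.

```r
# Recreate the tutorial dataset so this block runs on its own
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,
                 910,920,730,720, seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

# Pearson correlation matrix of all four variables, rounded for readability
cor_matrix <- round(cor(cropdata), 3)
cor_matrix

# Scatter-plot matrix: one panel for every pair of variables
pairs(cropdata, pch = 19)
```

Very high correlations among the predictors themselves are worth noting, because they make it harder to separate the individual effect of each predictor.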
plot(cropdata$fertilizer,
     cropdata$yield,
     main = "Effect of Fertilizer on Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19)
abline(lm(yield ~ fertilizer, data = cropdata),
       col = "blue",
       lwd = 2)
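The scatter plot above covers fertilizer only. As a sketch, the same plot can be repeated for all three predictors in a loop. The axis labels for rainfall and irrigation are taken from the units in the variable table; the names `x_labels` and `slopes` are introduced here for illustration.

```r
# Recreate the tutorial dataset so this block runs on its own
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,
                 910,920,730,720, seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

x_labels <- c(fertilizer = "Fertilizer (kg/ha)",
              rainfall   = "Rainfall (mm)",
              irrigation = "Irrigation (hours)")
slopes <- numeric(0)

old_par <- par(mfrow = c(1, 3))          # three panels side by side
for (v in names(x_labels)) {
  # Simple regression of yield on one predictor at a time
  fit <- lm(reformulate(v, response = "yield"), data = cropdata)
  plot(cropdata[[v]], cropdata$yield,
       main = paste("Yield vs", v),
       xlab = x_labels[[v]],
       ylab = "Crop Yield (quintal/ha)",
       pch = 19)
  abline(fit, col = "blue", lwd = 2)
  slopes[v] <- coef(fit)[[2]]            # keep the fitted slope
}
par(old_par)                             # restore the previous plot layout

slopes
```

These one-predictor slopes ignore the other two variables, so they will generally differ from the coefficients of the multiple regression model.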
model <- lm(yield ~ fertilizer + rainfall + irrigation,
            data = cropdata)
model
##
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = cropdata)
##
## Coefficients:
## (Intercept) fertilizer rainfall irrigation
## -17.47294 0.53455 0.02395 0.44444
summary(model)
##
## Call:
## lm(formula = yield ~ fertilizer + rainfall + irrigation, data = cropdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.99510 -0.15986 0.03119 0.08189 0.85553
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.472943 2.864455 -6.100 5.11e-07 ***
## fertilizer 0.534548 0.077066 6.936 3.98e-08 ***
## rainfall 0.023948 0.006697 3.576 0.001018 **
## irrigation 0.444435 0.122672 3.623 0.000891 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3532 on 36 degrees of freedom
## Multiple R-squared: 0.9992, Adjusted R-squared: 0.9991
## F-statistic: 1.425e+04 on 3 and 36 DF, p-value: < 2.2e-16
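The t values in the coefficient table are each estimate divided by its standard error, and the same two columns give 95% confidence intervals. The sketch below recomputes both from the numbers printed above, so it needs no data; the variable names are introduced here for illustration. With the fitted object available, `confint(model)` should give the same intervals up to rounding.

```r
# Values copied from the summary(model) output above
estimate  <- c(fertilizer = 0.534548, rainfall = 0.023948, irrigation = 0.444435)
std_error <- c(fertilizer = 0.077066, rainfall = 0.006697, irrigation = 0.122672)
df_resid  <- 36                          # residual degrees of freedom

# t statistic = estimate / standard error
t_value <- estimate / std_error
round(t_value, 3)

# 95% confidence interval = estimate +/- critical t * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = estimate - t_crit * std_error,
            upper = estimate + t_crit * std_error)
round(ci, 4)
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level.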
anova(model)
## Analysis of Variance Table
##
## Response: yield
## Df Sum Sq Mean Sq F value Pr(>F)
## fertilizer 1 5331.5 5331.5 42728.315 < 2.2e-16 ***
## rainfall 1 2.3 2.3 18.496 0.0001242 ***
## irrigation 1 1.6 1.6 13.126 0.0008912 ***
## Residuals 36 4.5 0.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
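Each coefficient estimates the change in crop yield (quintal/ha) associated with a one-unit increase in that independent variable while the other variables are held constant. For example, the fertilizer coefficient of 0.53 means that each additional kg/ha of fertilizer is associated with an increase of about 0.53 quintal/ha in yield, with rainfall and irrigation unchanged.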
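The fitted coefficients can also be used to predict yield for a new set of conditions. The sketch below applies the printed estimates by hand. The input values (50 kg/ha fertilizer, 900 mm rainfall, 15 irrigation hours) are hypothetical and chosen to lie inside the range of the data, and `predict_yield` is a helper defined here, not a built-in R function.

```r
# Coefficient estimates copied from the summary(model) output
coefs <- c(intercept  = -17.472943,
           fertilizer =   0.534548,
           rainfall   =   0.023948,
           irrigation =   0.444435)

# Hypothetical helper: evaluate the fitted regression equation
predict_yield <- function(fertilizer, rainfall, irrigation) {
  coefs[["intercept"]] +
    coefs[["fertilizer"]] * fertilizer +
    coefs[["rainfall"]]   * rainfall +
    coefs[["irrigation"]] * irrigation
}

# Hypothetical conditions: 50 kg/ha fertilizer, 900 mm rainfall, 15 irrigation hours
predicted <- predict_yield(fertilizer = 50, rainfall = 900, irrigation = 15)
predicted   # about 37.47 quintal/ha
```

With the fitted object available, `predict(model, newdata = data.frame(fertilizer = 50, rainfall = 900, irrigation = 15))` performs the same calculation without copying coefficients. Predictions for conditions outside the observed ranges should be treated with caution.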
Decision rule (at the 5% level of significance):
| p-value | Interpretation |
|---|---|
| p < 0.05 | Significant |
| p ≥ 0.05 | Not Significant |
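The decision rule can be applied to all coefficients at once. This sketch uses the p-values printed in the coefficient table; the names `alpha` and `decision` are introduced here for illustration. With the fitted object available, the same p-values can be read from `coef(summary(model))[, "Pr(>|t|)"]`.

```r
# p-values copied from the summary(model) output
p_values <- c("(Intercept)" = 5.11e-07,
              fertilizer    = 3.98e-08,
              rainfall      = 0.001018,
              irrigation    = 0.000891)

alpha <- 0.05   # 5% level of significance

# Apply the decision rule to every coefficient
decision <- ifelse(p_values < alpha, "Significant", "Not Significant")
data.frame(p_value = p_values, decision = decision)
```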
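R² measures the goodness of fit of the regression model: it is the proportion of the total variation in crop yield that is explained by the independent variables. For this model, R² = 0.9992.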
| R² Value | Interpretation |
|---|---|
| Close to 1 | Good Fit |
| Close to 0 | Poor Fit |
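R² and adjusted R² can be recovered from the sums of squares in the ANOVA table. The sketch below uses the rounded values printed by `anova(model)`, so the results agree with the `summary(model)` output only to about four decimal places; the variable names are introduced here for illustration.

```r
# Sums of squares copied from the anova(model) output (rounded values)
ss_terms <- c(fertilizer = 5331.5, rainfall = 2.3, irrigation = 1.6)
ss_resid <- 4.5
n <- 40   # number of observations
p <- 3    # number of predictors

ss_total <- sum(ss_terms) + ss_resid

# R-squared: share of total variation explained by the model
r2 <- 1 - ss_resid / ss_total

# Adjusted R-squared: penalises the number of predictors
adj_r2 <- 1 - (ss_resid / (n - p - 1)) / (ss_total / (n - 1))

round(c(r_squared = r2, adj_r_squared = adj_r2), 4)
```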
par(mfrow=c(2,2))
plot(model)
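The above commands generate four diagnostic plots in a 2 × 2 grid:
- Residuals vs Fitted
- Normal Q-Q
- Scale-Location
- Residuals vs Leverage
The Residuals vs Fitted plot checks whether the relationship is linear and whether the residuals show any systematic pattern. A random scatter of points around the zero line indicates a good model fit.
The Normal Q-Q plot checks normality of residuals. The points should approximately follow the straight reference line.
The Scale-Location plot checks homoscedasticity. An equal spread of residuals across the fitted values indicates constant variance.
The Residuals vs Leverage plot identifies influential observations and outliers.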
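The diagnostic plots can also be drawn one at a time, and influential observations can be listed numerically. The sketch below rebuilds the dataset and model so that it runs on its own. The `which` argument of `plot()` for `lm` objects selects a single plot (2 is the Normal Q-Q plot). The cutoff of 4/n for Cook's distance is a common rule of thumb assumed here, not a rule from this tutorial.

```r
# Recreate the tutorial dataset and model so this block runs on its own
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,
                 910,920,730,720, seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)
model <- lm(yield ~ fertilizer + rainfall + irrigation, data = cropdata)

# Return to a single-panel layout and draw only the Normal Q-Q plot
old_par <- par(mfrow = c(1, 1))
plot(model, which = 2)
par(old_par)

# Cook's distance for every observation
cooks <- cooks.distance(model)

# Assumed rule of thumb: flag observations with distance above 4 / n
threshold   <- 4 / nrow(cropdata)
influential <- which(cooks > threshold)
influential
```

Flagged observations are candidates for a closer look, not automatic deletions.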
Regression analysis has wide applications in agricultural sciences.
Graphs can be exported from the Plots window using the Export menu, which offers Save as Image, Save as PDF, and Copy to Clipboard.
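Graphs can also be saved from code, which makes the export repeatable. The sketch below writes the fertilizer scatter plot to a PDF file using the base R `pdf()` device. The file name and the 7 × 5 inch size are assumptions, and the file is written to R's temporary directory so the example does not touch the working directory. For a PNG image, `png()` can be used in the same open, plot, close pattern.

```r
# Recreate the tutorial dataset so this block runs on its own
cropdata <- data.frame(
  fertilizer = c(40,42,38,50,45,47,43,39,41,44,46,48,37,36,49,51,52,53,35,34, 55:74),
  rainfall   = c(820,790,760,880,850,840,810,770,800,830,860,870,750,740,890,900,
                 910,920,730,720, seq(930, 1120, by = 10)),
  irrigation = c(12,13,11,15,14,14,13,12,13,14,15,15,11,10,16,16,17,17,10,9,
                 rep(18:27, each = 2)),
  yield      = c(28,30,26,36,33,34,31,27,29,32,35,36,25,24,38,39,40,41,23,22, 42:61)
)

# Hypothetical output file in the temporary directory
out_file <- file.path(tempdir(), "fertilizer_yield.pdf")

pdf(out_file, width = 7, height = 5)     # open the PDF device (size in inches)
plot(cropdata$fertilizer, cropdata$yield,
     main = "Effect of Fertilizer on Crop Yield",
     xlab = "Fertilizer (kg/ha)",
     ylab = "Crop Yield (quintal/ha)",
     pch = 19)
abline(lm(yield ~ fertilizer, data = cropdata), col = "blue", lwd = 2)
invisible(dev.off())                     # close the device to finish writing the file

file.exists(out_file)
```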
Multiple regression analysis is an important statistical tool for agricultural research and decision-making.
RStudio provides a powerful environment for regression analysis, graphical visualization, and statistical interpretation.
The methods explained in this tutorial can be extended to advanced predictive agricultural analytics.