Introduction

This data set describes crops and the conditions they are grown in. It can be used to find suitable growing environments for a crop and to identify crops that favor similar conditions and could be grown together.

Dataset

#Explore Data
dim(Crop_recommendation)
## [1] 2200    8
str(Crop_recommendation)
## 'data.frame':    2200 obs. of  8 variables:
##  $ N          : int  90 85 60 74 78 69 69 94 89 68 ...
##  $ P          : int  42 58 55 35 42 37 55 53 54 58 ...
##  $ K          : int  43 41 44 40 42 42 38 40 38 38 ...
##  $ temperature: num  20.9 21.8 23 26.5 20.1 ...
##  $ humidity   : num  82 80.3 82.3 80.2 81.6 ...
##  $ ph         : num  6.5 7.04 7.84 6.98 7.63 ...
##  $ rainfall   : num  203 227 264 243 263 ...
##  $ label      : chr  "rice" "rice" "rice" "rice" ...
summary(Crop_recommendation)
##        N                P                K           temperature    
##  Min.   :  0.00   Min.   :  5.00   Min.   :  5.00   Min.   : 8.826  
##  1st Qu.: 21.00   1st Qu.: 28.00   1st Qu.: 20.00   1st Qu.:22.769  
##  Median : 37.00   Median : 51.00   Median : 32.00   Median :25.599  
##  Mean   : 50.55   Mean   : 53.36   Mean   : 48.15   Mean   :25.616  
##  3rd Qu.: 84.25   3rd Qu.: 68.00   3rd Qu.: 49.00   3rd Qu.:28.562  
##  Max.   :140.00   Max.   :145.00   Max.   :205.00   Max.   :43.675  
##     humidity           ph           rainfall         label          
##  Min.   :14.26   Min.   :3.505   Min.   : 20.21   Length:2200       
##  1st Qu.:60.26   1st Qu.:5.972   1st Qu.: 64.55   Class :character  
##  Median :80.47   Median :6.425   Median : 94.87   Mode  :character  
##  Mean   :71.48   Mean   :6.469   Mean   :103.46                     
##  3rd Qu.:89.95   3rd Qu.:6.924   3rd Qu.:124.27                     
##  Max.   :99.98   Max.   :9.935   Max.   :298.56

The data looks at seven factors that influence crop growth:

Nitrogen: Essential for synthesizing amino acids.
Phosphorus: Used for tissue building and new cell growth.
Potassium: Essential mineral macronutrient.
Temperature (°C): Plays a role in germination, which begins to decline at certain temperatures.
Humidity: Plants can wilt if humidity is too high.
pH: A range of 5.5–6.5 is optimal for plant growth because nutrient availability is highest there.
Rainfall: Rain balanced with irrigation can speed up growth time.

The summary shows that some factors have narrow ranges, such as pH (about 3.5 to 9.9), whereas others vary widely, such as rainfall (about 20 to 299).

Findings

A few crops sit well outside the norm for temperature and pH. Phosphorus and Potassium are strongly correlated, while most other measurements are not, and apple and grapes separate from the other crops.

Comparing Means of each Measurement by Crop

To understand the distribution of the data and to see what crops require, this graph explores the mean of each attribute for each crop in the data.

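# Assumption: `grouped` is not defined in the code shown; its Group.1 column
# suggests per-crop means built with aggregate(), as sketched here.
grouped <- aggregate(Crop_recommendation[-8], by = list(Crop_recommendation$label), FUN = mean)

# one stacked panel per attribute
par(mfrow=c(7,1),mar=c(2,2,2,2),oma=c(2,2,2,0),mgp=c(2,1,0))
for (index in 2:ncol(grouped)) {
  # one bar per crop; ylab left empty because mtext() adds the label below
  barplot(height=grouped[,index],names.arg=grouped$Group.1,
          xlab="", ylab="", main="Comparison of Mean Attributes of various classes")
  # add y-axis label
  mtext(side = 2, text = colnames(grouped)[index], line = 2, cex=0.75)
}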

Judging by the per-crop means, cotton requires the most Nitrogen, apple the most Phosphorus, and grapes the most Potassium. Papaya is grown in the hottest climate and coconut in the most humid. Chickpea is grown in soil with the highest pH, and rice receives by far the most rainfall.
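Rather than reading the top crop off each bar chart, it can be looked up directly. The following is a minimal sketch, assuming a per-crop means table shaped like `grouped` (one `Group.1` column plus one column per attribute). The `toy` data frame and its values are made up for illustration and are not taken from the data set.

```r
# Toy stand-in for the crop data; values are illustrative only.
toy <- data.frame(
  label    = c("rice", "rice", "cotton", "cotton", "apple", "apple"),
  N        = c(90, 85, 120, 118, 20, 22),
  P        = c(42, 58, 46, 44, 134, 130),
  rainfall = c(203, 227, 80, 82, 112, 110)
)

# Mean of every attribute per crop; aggregate() names the grouping column Group.1
toy_means <- aggregate(toy[-1], by = list(toy$label), FUN = mean)

# For each attribute, pick the crop whose mean is largest
top_crop <- sapply(toy_means[-1], function(col) {
  as.character(toy_means$Group.1[which.max(col)])
})
print(top_crop)
```

The same `sapply()` call applied to `grouped[-1]` should return one crop name per measurement.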

Avg Temp by Crop vs Standard Bioactivity Range

This graph compares the mean temperature each crop in the data set was grown at with the standard bioactivity range of 10 °C to 23.88 °C (the red lines). The range is a general indicator for success, so most crops would be expected to fall within it.

library(ggplot2)
# red lines mark the lower and upper bounds of the standard bioactivity range
ggplot(grouped, aes(x=Group.1, y=temperature)) + 
  geom_point() +
  labs(x="Crop", y="Temperature (°C)", title="Avg Temp by Crop vs Standard Bioactivity Range")+
  scale_y_continuous(limits=c(0,85))+
  geom_hline(yintercept = 10, color = "red", linewidth = 1) +
  geom_hline(yintercept = 23.88, color = "red", linewidth = 1)

Surprisingly, more crops fall outside the range than inside it. This is still plausible, because some crops need a particular climate; papaya, for example, is grown in hot conditions.
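The count behind "more outside than inside" can be checked numerically instead of by eye. This is a minimal sketch, assuming a table with a `Group.1` crop column and a `temperature` column of per-crop means. The crop names and temperatures in `toy_temp` are hypothetical placeholders, not values from the data set.

```r
# Hypothetical per-crop mean temperatures (illustrative values only)
toy_temp <- data.frame(
  Group.1     = c("crop_a", "crop_b", "crop_c", "crop_d"),
  temperature = c(18.5, 24.1, 33.7, 22.0)
)

# Bounds of the standard bioactivity range used in the plot
lower <- 10
upper <- 23.88

# TRUE for crops whose mean temperature lies outside the range
outside <- toy_temp$temperature < lower | toy_temp$temperature > upper

n_outside <- sum(outside)
crops_outside <- toy_temp$Group.1[outside]
print(n_outside)
print(crops_outside)
```

Comparing `n_outside` with `nrow(grouped) - n_outside` on the real table would confirm or refute the claim directly.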

Distribution of pH

To see if the majority of crops are grown at optimal pH levels (5.5-6.5), a histogram displaying the distribution of pH in the data is shown.

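# xaxt = "n" suppresses the default x-axis so the custom one below is not drawn on top of it
hist(Crop_recommendation$ph, main = "Histogram of pH", xlab = "pH", ylab = "Frequency",col = "#8b0000", xaxt = "n")
# custom x-axis with a tick every 0.5 pH units
axis(side=1, at=seq(0,10, 0.5))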

The highest-frequency bins fall within the optimal range, which suggests that most samples were grown at a suitable pH.
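The histogram gives a visual impression. The share of samples inside the optimal range can also be computed directly. This is a minimal sketch on a made-up vector, `toy_ph`, whose values are illustrative and not from the data set. On the real data the same expression would be applied to `Crop_recommendation$ph`.

```r
# Illustrative pH readings (not from the data set)
toy_ph <- c(5.2, 5.8, 6.1, 6.4, 6.5, 6.9, 7.3, 6.0)

# TRUE where a reading lies in the optimal 5.5 to 6.5 range (bounds inclusive)
in_range <- toy_ph >= 5.5 & toy_ph <= 6.5

# The mean of a logical vector is the proportion of TRUE values
share_in_range <- mean(in_range)
print(share_in_range)
```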

Correlation Matrix Heatmap among Measurements

Next, we look at which measurements are related to each other. Such relationships can be useful when a crop needs more or less of an attribute. For example, if the pH needs to be lowered and pH and Nitrogen are positively correlated, lowering the nitrogen level in the soil might help.

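library(corrplot)
# correlations among the seven numeric measurements (column 8 is the crop label)
cor_matrix <- cor(Crop_recommendation[-8])
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 45,  main = "Correlation Matrix and Heatmap among measurements", mar = c(0, 0, 2, 0))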

We see that Phosphorus and Potassium have a strong positive correlation.
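The strongest pair can also be extracted from the correlation matrix instead of being read off the heatmap colors. This is a minimal sketch on a small made-up data frame. The column names mirror the data set, but the values in `toy_soil` are illustrative only.

```r
# Illustrative measurements (not from the data set)
toy_soil <- data.frame(
  P  = c(10, 20, 30, 40, 50, 60),
  K  = c(12, 19, 33, 41, 48, 62),
  ph = c(6.5, 6.1, 6.8, 6.0, 6.6, 6.3)
)

cm <- cor(toy_soil)

# Blank out the diagonal and upper triangle so each pair appears once
cm[upper.tri(cm, diag = TRUE)] <- NA

# Locate the entry with the largest absolute correlation
idx <- which(abs(cm) == max(abs(cm), na.rm = TRUE), arr.ind = TRUE)
strongest_pair <- c(rownames(cm)[idx[1, "row"]], colnames(cm)[idx[1, "col"]])
strongest_value <- cm[idx[1, "row"], idx[1, "col"]]

print(strongest_pair)
print(round(strongest_value, 3))
```

Running the same steps on `cor_matrix` would give a number for how strong the Phosphorus and Potassium relationship is.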

Testing for Variability and Outliers using PCA

This figure explores the relationships among crops. This can be used to see which crops can be grown together, and which crops are outliers in the data.

# Perform PCA
pca <- prcomp(Crop_recommendation[-8], scale. = TRUE)

# Extract the PCA results
df_pca <- data.frame(pca$x[,1:2])

# Add the labels to the PCA results
df_pca$label <- Crop_recommendation$label

# Create the scatter plot
ggplot(df_pca, aes(x = PC1, y = PC2, color = label)) +
  geom_point() +
  ggtitle("Decomposed using PCA")

We can see that most crops fall within the same area as one another, showing little variability between them. However, apple and grapes are outliers.
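A two-component scatter plot shows only part of the variation, so it helps to check how much variance PC1 and PC2 capture before reading too much into the clusters. This is a minimal sketch that uses R's built-in `iris` measurements as a stand-in for the crop data. The object names `toy_pca` and `var_explained` are chosen here for illustration.

```r
# Stand-in data: the four numeric columns of R's built-in iris data set
toy_pca <- prcomp(iris[1:4], scale. = TRUE)

# Each component's variance is its squared standard deviation;
# dividing by the total gives the proportion of variance explained
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

# Share of total variance visible in a PC1 vs PC2 scatter plot
first_two <- sum(var_explained[1:2])

print(round(var_explained, 3))
print(round(first_two, 3))
```

If `first_two` is low for the crop data, crops that overlap in the plot may still differ along components that are not shown.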

Conclusion

Visualizing the data shows that most crops are grown at temperatures above the standard bioactivity range, and that most are grown at suitable pH conditions. Phosphorus and Potassium are highly correlated, and apples and grapes favor different conditions than most other crops.