Overview

Our work explores the impact of climate change on farming, specifically how temperature, rainfall, and CO2 emissions influence crop yields. This is an essential issue today, as farmers around the world are facing changing climate conditions that affect food production. To guide ourselves, we relied on various modules from our course, particularly Module 1, which helped us understand how to define our variables and structure our assignment.

Crop Yield: The amount of crops produced per hectare.
Average Temperature: The mean temperature recorded over a certain period.
Precipitation: The total rainfall (measured in millimeters).
CO2 Emissions: The amount of carbon dioxide released into the atmosphere, measured in metric tons.

Why This Matters?

Climate change is a critical issue now, and its impact on farming is something that cannot be ignored. As we look into this data, we hope to reveal meaningful insights that can help farmers better adapt to changing weather patterns. Finding a suitable dataset was one of the early challenges we faced, but after some research, we found a dataset that provides a global view of how farming is changing.

Exploring Climate Effects on Agriculture:

Our Method: Our work seeks to understand how key climate factors (temperature, rainfall, and CO2 emissions) are affecting agriculture, particularly crop yields.
What We Wanted to Know: How does warmer weather change how much food farmers can grow? Does more or less rain make a difference to crops? How are these changes affecting farmers’ incomes and people’s food supply?
Our Approach: To answer these questions, we used statistical tools like hypothesis testing and regression modeling to look for patterns in the data, drawing on Module 3 on probability and uncertainty.
RPubs link information: http://rpubs.com/Hemanth_Gowda23/1233783

Dataset Overview

The dataset provides a broad view of how climate change has influenced agricultural outcomes globally from 1998 to 2024, which is why we picked it for our assignment. It records many important things like temperature, rain, CO2, and how these relate to farming and money in different places.
We liked this dataset because we can compare how climate change affects different crops and areas. It helps us see the big picture of climate and farming worldwide. We were a bit worried that we don’t know exactly how the data was collected. But we think it’s acceptable for this assignment because the dataset covers so many countries and continents, and this wide coverage suggests it can show global farming trends reasonably well.

Where the Data Came From:

We used a dataset titled climate_change_impact_on_agriculture_2024.csv, which includes information on temperature, precipitation, CO2 emissions, crop yields, and economic impacts across different regions. While we didn’t collect this data ourselves, it seemed comprehensive enough to fit the needs of our analysis.

Sampling and Reliability:

One challenge we encountered was the lack of information on the sampling method used in the dataset. Without this, it was hard to assess the reliability of the data fully. We assumed the dataset was representative because it covered a wide range of geographic areas, which suggested diversity in the sample, even though we didn’t have information about the exact sampling method.
Module 5 on sampling taught us the importance of knowing how data is collected, but since that wasn’t available, we relied on the broad coverage as a reasonable indicator of reliability. While we recognized the limitations of this assumption, it gave us enough confidence to proceed cautiously with our analysis.

Our understanding of the key variables:

Dependent variable: Crop_Yield_MT_per_HA (crop yield per hectare). This is the main measure of interest, representing the amount of crops produced per hectare of land.
Numerical variables: temperature (°C), precipitation (mm), CO2 emissions (metric tons), crop yield (MT/HA), economic impact (million USD).
Categorical variables: Region, Country, Crop type.
Economic factor: Economic_Impact_Million_USD.

Dataset Preparation

Before we could start analyzing, we had to clean up our data and make sure it was in good shape. Here’s what we did:

What’s in Our Data?

We used a file called “climate_change_impact_on_agriculture_2024.csv”. It’s like a big spreadsheet with lots of information about how climate change is affecting farming around the world from 1998 to 2024.

It includes things like:

Numbers: Temperature, rainfall, pollution (CO2), how much food was grown, and how much money was made or lost.
Categories: Which part of the world, which country, and what kind of crop

Datasets like this sometimes have missing pieces of information. We couldn’t just leave them empty, so our plan was:

For number columns: fill the gaps with the middle value (median).
For category columns: fill the gaps with the most common answer.
Make sure all the names are spelled the same way.
Rename some columns so that they make more sense.

As the checks further down show, this dataset turned out to have no missing values, so no gaps actually needed filling.
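
The cleaning plan above can be sketched in base R. This is a minimal illustration on a small made-up data frame, not part of our analysis code: the helper names `standardise_name`, `impute_median`, and `impute_mode` and the toy values are assumptions for the example. The `stringr` functions `str_trim()` and `str_to_title()` could be used for the spelling step instead.

```r
# Toy data with gaps and inconsistent spelling (made-up values, for illustration only)
toy <- data.frame(
  Country = c("India", " india", "China", NA, "India"),
  Yield   = c(1.7, NA, 2.1, 3.9, 2.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim spaces and capitalise the first letter of single-word names
standardise_name <- function(x) {
  ok <- !is.na(x)
  trimmed <- trimws(x[ok])
  x[ok] <- paste0(toupper(substring(trimmed, 1, 1)), tolower(substring(trimmed, 2)))
  x
}

# Hypothetical helper: replace NA in a numeric vector with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical helper: replace NA in a character vector with the most common value
impute_mode <- function(x) {
  counts <- table(x)  # table() leaves out NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

# Standardise spelling first so that " india" and "India" count as one category
toy$Country <- impute_mode(standardise_name(toy$Country))
toy$Yield   <- impute_median(toy$Yield)

print(toy)
```

Standardising the names before imputing matters here: otherwise differently spelled versions of the same country would be counted as separate categories when picking the most common value.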

Source link: https://www.kaggle.com/Our_datasetsets/waqi786/climate-change-impact-on-agriculture

# Loading necessary libraries for our tasks. 
library(dplyr) # For data manipulation
library(stringr) # For string handling tasks
library(ggplot2) # For data visualization
library(corrplot) # For correlation matrix plotting

Loading the dataset, displaying its structure, and showing the first few rows. (Update the file path for your own machine.)

climate_data <- read.csv("C:/Users/Hemanth Gowda/Downloads/archive (2)/climate_change_impact_on_agriculture_2024.csv")

str(climate_data)  # To display structure of the dataset

## 'data.frame':    10000 obs. of  15 variables:
##  $ Year                       : int  2001 2024 2001 2001 1998 2019 1997 2021 2012 2018 ...
##  $ Country                    : chr  "India" "China" "France" "Canada" ...
##  $ Region                     : chr  "West Bengal" "North" "Ile-de-France" "Prairies" ...
##  $ Crop_Type                  : chr  "Corn" "Corn" "Wheat" "Coffee" ...
##  $ Average_Temperature_C      : num  1.55 3.23 21.11 27.85 2.19 ...
##  $ Total_Precipitation_mm     : num  447 2914 1302 1154 1627 ...
##  $ CO2_Emissions_MT           : num  15.2 29.8 25.8 13.9 11.8 ...
##  $ Crop_Yield_MT_per_HA       : num  1.74 1.74 1.72 3.89 1.08 ...
##  $ Extreme_Weather_Events     : int  8 8 5 5 9 5 2 4 1 1 ...
##  $ Irrigation_Access_.        : num  14.5 11.1 84.4 94.1 95.8 ...
##  $ Pesticide_Use_KG_per_HA    : num  10.1 33.1 27.4 14.4 44.4 ...
##  $ Fertilizer_Use_KG_per_HA   : num  14.8 23.2 65.5 87.6 88.1 ...
##  $ Soil_Health_Index          : num  83.2 54 67.8 91.4 49.6 ...
##  $ Adaptation_Strategies      : chr  "Water Management" "Crop Rotation" "Water Management" "No Adaptation" ...
##  $ Economic_Impact_Million_USD: num  808 616 797 790 402 ...

head(climate_data) # To show the first few records

print("Column names before renaming:")

## [1] "Column names before renaming:"

print(colnames(climate_data))

##  [1] "Year"                        "Country"                    
##  [3] "Region"                      "Crop_Type"                  
##  [5] "Average_Temperature_C"       "Total_Precipitation_mm"     
##  [7] "CO2_Emissions_MT"            "Crop_Yield_MT_per_HA"       
##  [9] "Extreme_Weather_Events"      "Irrigation_Access_."        
## [11] "Pesticide_Use_KG_per_HA"     "Fertilizer_Use_KG_per_HA"   
## [13] "Soil_Health_Index"           "Adaptation_Strategies"      
## [15] "Economic_Impact_Million_USD"

# Renaming specific columns
colnames(climate_data)[colnames(climate_data) == "Soil_Health_Index"] <- "Soil_Quality_Index"
colnames(climate_data)[colnames(climate_data) == "Year"] <- "Recording_Year"

Printing the column names after renaming

print("Column names after renaming:")

## [1] "Column names after renaming:"

print(colnames(climate_data))

##  [1] "Recording_Year"              "Country"                    
##  [3] "Region"                      "Crop_Type"                  
##  [5] "Average_Temperature_C"       "Total_Precipitation_mm"     
##  [7] "CO2_Emissions_MT"            "Crop_Yield_MT_per_HA"       
##  [9] "Extreme_Weather_Events"      "Irrigation_Access_."        
## [11] "Pesticide_Use_KG_per_HA"     "Fertilizer_Use_KG_per_HA"   
## [13] "Soil_Quality_Index"          "Adaptation_Strategies"      
## [15] "Economic_Impact_Million_USD"

# Counting the missing values to check for incomplete entries
climate_data$missing_count <- rowSums(is.na(climate_data))
missing_count <- colSums(is.na(climate_data))
print(missing_count)

##              Recording_Year                     Country 
##                           0                           0 
##                      Region                   Crop_Type 
##                           0                           0 
##       Average_Temperature_C      Total_Precipitation_mm 
##                           0                           0 
##            CO2_Emissions_MT        Crop_Yield_MT_per_HA 
##                           0                           0 
##      Extreme_Weather_Events         Irrigation_Access_. 
##                           0                           0 
##     Pesticide_Use_KG_per_HA    Fertilizer_Use_KG_per_HA 
##                           0                           0 
##          Soil_Quality_Index       Adaptation_Strategies 
##                           0                           0 
## Economic_Impact_Million_USD               missing_count 
##                           0                           0

# Checking each column for special numeric values (infinite or NaN)
special_values_check <- sapply(climate_data, function(x) sum(is.infinite(x) | is.nan(x)))
print(special_values_check)

##              Recording_Year                     Country 
##                           0                           0 
##                      Region                   Crop_Type 
##                           0                           0 
##       Average_Temperature_C      Total_Precipitation_mm 
##                           0                           0 
##            CO2_Emissions_MT        Crop_Yield_MT_per_HA 
##                           0                           0 
##      Extreme_Weather_Events         Irrigation_Access_. 
##                           0                           0 
##     Pesticide_Use_KG_per_HA    Fertilizer_Use_KG_per_HA 
##                           0                           0 
##          Soil_Quality_Index       Adaptation_Strategies 
##                           0                           0 
## Economic_Impact_Million_USD               missing_count 
##                           0                           0

str(climate_data)

## 'data.frame':    10000 obs. of  16 variables:
##  $ Recording_Year             : int  2001 2024 2001 2001 1998 2019 1997 2021 2012 2018 ...
##  $ Country                    : chr  "India" "China" "France" "Canada" ...
##  $ Region                     : chr  "West Bengal" "North" "Ile-de-France" "Prairies" ...
##  $ Crop_Type                  : chr  "Corn" "Corn" "Wheat" "Coffee" ...
##  $ Average_Temperature_C      : num  1.55 3.23 21.11 27.85 2.19 ...
##  $ Total_Precipitation_mm     : num  447 2914 1302 1154 1627 ...
##  $ CO2_Emissions_MT           : num  15.2 29.8 25.8 13.9 11.8 ...
##  $ Crop_Yield_MT_per_HA       : num  1.74 1.74 1.72 3.89 1.08 ...
##  $ Extreme_Weather_Events     : int  8 8 5 5 9 5 2 4 1 1 ...
##  $ Irrigation_Access_.        : num  14.5 11.1 84.4 94.1 95.8 ...
##  $ Pesticide_Use_KG_per_HA    : num  10.1 33.1 27.4 14.4 44.4 ...
##  $ Fertilizer_Use_KG_per_HA   : num  14.8 23.2 65.5 87.6 88.1 ...
##  $ Soil_Quality_Index         : num  83.2 54 67.8 91.4 49.6 ...
##  $ Adaptation_Strategies      : chr  "Water Management" "Crop Rotation" "Water Management" "No Adaptation" ...
##  $ Economic_Impact_Million_USD: num  808 616 797 790 402 ...
##  $ missing_count              : num  0 0 0 0 0 0 0 0 0 0 ...

Breaking Down the Numbers with Visuals

In this section, we’ll look into how we approached descriptive statistics and visualizations to summarize and explore the relationships between the key climate and agricultural variables in the dataset. Throughout this stage, we referred to multiple module notes to guide our analysis and ensure the accuracy of our results.

We picked Average Temperature, Total Precipitation, CO2 Emissions, Crop Yield, and Economic Impact because they directly show how climate change affects farming. Crop yield will tell us how much production has happened, and the economic impact will show us the financial ups and downs caused by climate changes. Together, these factors will help us understand how climate affects both the land and the economy.

Average Temperature (°C): The climate factor that directly affects crop growth and productivity.
Total Precipitation (mm): The rainfall levels that are critical for crop yields.
CO2 Emissions (MT): This variable reflects the impact of industrial activity on climate change.
Crop Yield (MT per Hectare): A key measure of agricultural productivity.
Economic Impact (Million USD): Represents the financial losses or gains caused by climate impacts on agriculture.

In Module 1, we learnt the importance of identifying key variables before performing any deeper analysis. We explored descriptive statistics, such as the mean, median, and standard deviation for these variables to summarize their central tendencies and spread.

summary_data <- climate_data %>%
  summarise(
    # Average temperature in degrees Celsius
    mean_temp = mean(Average_Temperature_C, na.rm = TRUE),
    median_temp = median(Average_Temperature_C, na.rm = TRUE),
    sd_temp = sd(Average_Temperature_C, na.rm = TRUE),
    # Total precipitation in millimeters
    mean_precip = mean(Total_Precipitation_mm, na.rm = TRUE),
    median_precip = median(Total_Precipitation_mm, na.rm = TRUE),
    sd_precip = sd(Total_Precipitation_mm, na.rm = TRUE),
    # CO2 emissions in metric tons
    mean_co2 = mean(CO2_Emissions_MT, na.rm = TRUE),
    median_co2 = median(CO2_Emissions_MT, na.rm = TRUE),
    sd_co2 = sd(CO2_Emissions_MT, na.rm = TRUE),
    # Crop yield in metric tons per hectare
    mean_yield = mean(Crop_Yield_MT_per_HA, na.rm = TRUE),
    median_yield = median(Crop_Yield_MT_per_HA, na.rm = TRUE),
    sd_yield = sd(Crop_Yield_MT_per_HA, na.rm = TRUE)
  )
# Display the calculated summary statistics
print(summary_data)

##   mean_temp median_temp  sd_temp mean_precip median_precip sd_precip mean_co2
## 1   15.2413      15.175 11.46695    1611.664       1611.16  805.0168 15.24661
##   median_co2   sd_co2 mean_yield median_yield  sd_yield
## 1       15.2 8.589423   2.240017         2.17 0.9983415

Our understanding from the above summary statistics

Rain: It rains about 1,612 mm on average, with a standard deviation of about 805 mm. So the amount of rain varies a lot from record to record.
Temperature: It’s around 15°C on average, but with a standard deviation of about 11.5°C it can be much hotter or colder. It’s like some places are having a nice spring day while others are in the middle of summer or winter.
CO2 emissions: The average is about 15.2 metric tons (the column is CO2_Emissions_MT, so this is emissions in MT, not a concentration in ppm). The standard deviation is about 8.6 MT, which is roughly 56% of the mean, so relative to its average CO2 varies about as much as rain does (roughly 50% of the mean).
Crop yield: The mean yield is about 2.24 metric tons per hectare and the median is 2.17, with a standard deviation of about 1.0. The mean and median being close suggests the yields are not strongly skewed.
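
To compare how spread out variables are when they are measured in different units, one option is the coefficient of variation (standard deviation divided by the mean). The sketch below only reuses the means and standard deviations printed in the summary table above; the vector names are made up for the example. Note that this ratio is of limited meaning for temperature in Celsius, because zero on that scale is arbitrary.

```r
# Means and standard deviations copied from the summary table above
means <- c(temp = 15.2413, precip = 1611.664, co2 = 15.24661, yield = 2.240017)
sds   <- c(temp = 11.46695, precip = 805.0168, co2 = 8.589423, yield = 0.9983415)

# Coefficient of variation: standard deviation relative to the mean
cv <- sds / means
print(round(cv, 2))
```

On this relative scale, rainfall, CO2 emissions, and crop yield all vary by roughly half of their mean, so none of them stands out as far more variable than the others.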

Visual Insights

In Module 2, we learned the significance of choosing the right visualizations to tell a data-driven story. This module helped us select the most effective plots to highlight the key relationships in our dataset.

We created a bar chart to compare extreme weather events across adaptation strategies, a scatter plot to look at the relationship between pesticide use and crop yield, and a bar chart to compare the average economic impact across the regions in the dataset. The relationship between temperature and crop yield is plotted later, in the regression section.

Visualizing the Impact of Adaptation Strategies with a Bar Chart

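# Average number of extreme weather events for each adaptation strategy
# (summarising first avoids stacking thousands of rows into one bar)
ggplot(climate_data %>% 
         group_by(Adaptation_Strategies) %>%
         summarise(mean_events = mean(Extreme_Weather_Events, na.rm = TRUE)), 
       aes(x = Adaptation_Strategies, y = mean_events, fill = Adaptation_Strategies)) +
  geom_bar(stat = "identity") +
  labs(title = "Extreme Weather Events by Adaptation Strategy", x = "Adaptation Strategies", y = "Average Extreme Weather Events") +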
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  
  ylim(0, 10) +
  scale_fill_manual(values = c(
    "Crop Rotation" = "mediumseagreen",   
    "Drought-resistant Crops" = "brown", 
    "No Adaptation" = "black",        
    "Organic Farming" = "yellowgreen",         
    "Water Management" = "lightblue"    
  ))

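The bar chart compares extreme weather events across the different adaptation strategies. In our plot, “No Adaptation” appeared to be associated with fewer events than strategies like crop rotation or water management. This is an association only: adaptation strategies are a response to extreme weather rather than a cause of it, so the chart should not be read as showing that a strategy changes the number of events.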

Visualizing the Relationship Between Crop Yield and Pesticide Use

# Note: head(10) plots only the first 10 records, so this is a small illustrative sample
ggplot(climate_data %>% head(10), aes(x = Pesticide_Use_KG_per_HA, y = Crop_Yield_MT_per_HA)) +
  geom_point(color = "blue", size = 3) +
  labs(title = "Crop Yield vs. Pesticide Use", x = "Pesticide Use (KG per HA)", y = "Crop Yield (MT per Hectare)") +
  theme_minimal()

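The scatter plot of pesticide use against crop yield uses only the first 10 records, so any pattern is tentative. In this small sample, yields appear higher at moderate pesticide levels than at the highest levels, but 10 points are too few to conclude that heavy pesticide use reduces productivity.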

Visualizing the Average Economic Impact by Region in a Bar Chart

ggplot(climate_data %>% 
         group_by(Region) %>% 
        summarise(mean_economic_impact = mean(Economic_Impact_Million_USD, na.rm = TRUE)), 
      aes(x = Region, y = mean_economic_impact)) +
  geom_bar(stat = "identity", fill = "orange") +
 labs(title = "Average Economic Impact by Region", x = "Region", y = "Economic Impact (Million USD)") +
 theme_minimal() +
 theme(axis.text.x = element_text(angle = 45, hjust = 1))

The economic impact bar chart shows us that most regions face similar effects from climate change, with some regions like North Central and Pampas seeing slightly higher impacts.

Testing the Hypothesis: Comparing Crop Yields in India and China with Confidence Intervals

We wanted to examine whether corn yields differed significantly between India and China. Module 7 covered the basics of hypothesis testing, particularly how to set up null and alternative hypotheses.
To compare mean corn yields between the two countries, we used a two-sample t-test. Note that the test only tells us whether the mean yields differ; it does not by itself show that climate factors cause any difference.

Null hypothesis (H₀): the mean crop yields in India and China are equal. \[H_0: \mu_{India} = \mu_{China}\]

Alternative hypothesis (H₁): the mean crop yields in India and China are not equal, indicating a potential difference between the two countries’ crop yields. \[H_1: \mu_{India} \neq \mu_{China}\]

These hypotheses were tested using a two-sample t-test to determine if the observed difference in the means of crop yields is statistically significant. Before applying the t-test, we used the Shapiro-Wilk test to check whether the crop yield data follows a normal distribution, which is an assumption of the t-test.

Two-Sample t-test

We used a two-sample t-test to compare the means of crop yields between India and China. The equation we used for the t-test is: \[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Where:
\(\bar{X}_1\) and \(\bar{X}_2\) are the sample means for India and China.
\(s_1^2\) and \(s_2^2\) are the sample variances for India and China.
\(n_1\) and \(n_2\) are the sample sizes for India and China.

We calculated a 95% confidence interval to estimate the range of values within which the true difference in crop yields lies. The equation for the confidence interval is: \[CI = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

Where: \(\bar{X}_1 - \bar{X}_2\) is the difference in the sample means. \(t_{\alpha/2}\) is the critical value from the t-distribution for a 95% confidence level.
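
The two formulas above can be checked by hand against R's built-in `t.test()`, which uses the unequal-variance (Welch) version by default. This is a minimal sketch on made-up data, not our dataset: `x1` and `x2` are simulated stand-ins for the two countries' yields. The degrees of freedom come from the Welch-Satterthwaite approximation, which is needed to look up the critical value \(t_{\alpha/2}\).

```r
set.seed(42)  # made-up data, for illustration only
x1 <- rnorm(40, mean = 2.1, sd = 1.0)  # stand-in for one country's yields
x2 <- rnorm(50, mean = 2.3, sd = 0.9)  # stand-in for the other country's yields

# Per-group variance of the mean: s^2 / n
v1 <- var(x1) / length(x1)
v2 <- var(x2) / length(x2)

# Standard error of the difference in means (unequal variances)
se_diff <- sqrt(v1 + v2)

# t statistic from the formula above
t_stat <- (mean(x1) - mean(x2)) / se_diff

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))

# 95% confidence interval for the difference in means
t_crit <- qt(0.975, df = df_welch)
ci <- (mean(x1) - mean(x2)) + c(-1, 1) * t_crit * se_diff

# Compare with R's built-in Welch test
built_in <- t.test(x1, x2)
print(c(manual_t = t_stat, built_in_t = unname(built_in$statistic)))
print(rbind(manual = ci, built_in = as.numeric(built_in$conf.int)))
```

The manual and built-in values should agree, which confirms that the formulas describe what `t.test()` computes when it is called with its default settings.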

# Filtering dataset for Corn yields in India and China
corn_data <- climate_data %>% filter(Crop_Type == "Corn" & Country %in% c("India", "China"))
# Checking normality of crop yield with the Shapiro-Wilk test
shapiro.test(corn_data$Crop_Yield_MT_per_HA)

## 
##  Shapiro-Wilk normality test
## 
## data:  corn_data$Crop_Yield_MT_per_HA
## W = 0.96426, p-value = 0.0001014

The p-value (0.0001) is below 0.05, so the Shapiro-Wilk test rejects normality for the pooled corn yields. The t-test that follows is commonly considered reasonably robust to moderate non-normality with samples of this size (188 observations in total), but the result should be read with this caveat in mind.

# Summary statistics for corn yields in India and China
corn_summary <- corn_data %>%
  group_by(Country) %>%
  summarise(
    mean_yield = mean(Crop_Yield_MT_per_HA, na.rm = TRUE),
    sd_yield = sd(Crop_Yield_MT_per_HA, na.rm = TRUE),
    count = n()
  )

Showing Summary Statistics

# Performing a t-test to compare yields between India and China
yield_t_test <- t.test(Crop_Yield_MT_per_HA ~ Country, data = corn_data)
# Displaying the t-test results
print(yield_t_test)

## 
##  Welch Two Sample t-test
## 
## data:  Crop_Yield_MT_per_HA by Country
## t = 0.15795, df = 183.16, p-value = 0.8747
## alternative hypothesis: true difference in means between group China and group India is not equal to 0
## 95 percent confidence interval:
##  -0.2655497  0.3117677
## sample estimates:
## mean in group China mean in group India 
##            2.131677            2.108568

# Extracting and printing the 95% confidence interval for the difference in yields
print(yield_t_test$conf.int)

## [1] -0.2655497  0.3117677
## attr(,"conf.level")
## [1] 0.95

Results: The 95% confidence interval for the difference in mean corn yields (-0.27 to 0.31 MT per hectare) includes zero, and the p-value (0.87) is well above 0.05. Therefore, we failed to reject the null hypothesis: the data do not show a statistically significant difference between the mean corn yields of China (2.13) and India (2.11). The confidence interval gives the range of plausible values for the true difference in yields between the two countries.
We didn’t use Categorical Association because it’s meant for comparing categories, but our data (crop yields, temperature, and precipitation) is continuous. Since we are working with numbers that can take a range of values, Hypothesis Testing and Regression Analysis made more sense and provided clearer insights than a categorical approach would.
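
The decision rule for the t-test can be checked directly from the reported numbers. The short sketch below only copies the p-value and confidence interval from the t-test output above; the variable names are made up for the example. For a two-sided test at the 5% level, the two criteria always agree: the null hypothesis is rejected exactly when the 95% interval excludes zero.

```r
# Values copied from the t-test output above
p_value <- 0.8747
ci_low  <- -0.2655497
ci_high <-  0.3117677
alpha   <- 0.05

# Does the 95% confidence interval for the difference contain zero?
ci_contains_zero <- ci_low <= 0 && ci_high >= 0

# Reject H0 only when the p-value is below the significance level
reject_null <- p_value < alpha

cat("CI contains zero:", ci_contains_zero, "\n")
cat("Reject H0 at the 5% level:", reject_null, "\n")
```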

Analyzing How Climate Factors Affect Crop Yields with Regression

We applied multiple linear regression to model the relationship between crop yield (the dependent variable) and two independent variables: average temperature and total precipitation. This allowed us to estimate how much crop yields change in response to variations in these climate factors. Module 9 covered multiple linear regression, including how to select independent variables and interpret the regression output, and this guided the construction of our model.

Multiple Linear Regression

The regression model to predict the crop yield based on two independent variables (average temperature and total precipitation) is given by the equation: \[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\]

Where:
\(Y\) represents the crop yield (dependent variable),
\(X_1\) is the average temperature,
\(X_2\) is the total precipitation,
\(\beta_0\) is the intercept (the predicted value of crop yield when all independent variables are zero),
\(\beta_1\) and \(\beta_2\) are the regression coefficients for temperature and precipitation, respectively,
\(\epsilon\) is the error term, representing the variability in crop yields not explained by the independent variables.

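Coefficient Interpretation

Each regression coefficient \(\beta_i\) tells us how much the dependent variable (crop yield) changes for a one-unit increase in the corresponding independent variable, holding all other variables constant. With a single predictor, the slope estimate is: \[ \hat{\beta}_1 = \frac{\text{Cov}(X_1, Y)}{\text{Var}(X_1)} \] Where: \(\text{Cov}(X_1, Y)\) is the covariance between the independent variable \(X_1\) and the dependent variable \(Y\), and \(\text{Var}(X_1)\) is the variance of the independent variable \(X_1\).

With two or more predictors, this simple ratio generally no longer applies when the predictors are correlated with each other. The coefficients are instead estimated jointly by least squares: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} \] where \(\mathbf{X}\) is the matrix of predictor values with a leading column of ones for the intercept.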
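
To illustrate how the coefficient formula relates to what `lm()` computes, the sketch below uses made-up data in which the two predictors are deliberately correlated. It is an illustration under those assumptions, not our dataset: the names `x1`, `x2`, and `y` and the simulated values are hypothetical. It shows that Cov/Var reproduces the slope of a one-predictor regression, while the coefficients of a two-predictor regression come from solving the normal equations.

```r
set.seed(1)  # made-up data, for illustration only
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)               # x2 is deliberately correlated with x1
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)

# Simple regression: the slope equals Cov(X, Y) / Var(X)
slope_formula <- cov(x1, y) / var(x1)
slope_lm      <- unname(coef(lm(y ~ x1))["x1"])

# Multiple regression: coefficients solve the normal equations (X'X) b = X'y
X           <- cbind(Intercept = 1, x1 = x1, x2 = x2)
beta_normal <- drop(solve(t(X) %*% X, t(X) %*% y))
beta_lm     <- coef(lm(y ~ x1 + x2))

print(c(cov_var = slope_formula, simple_lm = slope_lm))
print(rbind(normal_equations = beta_normal, lm = unname(beta_lm)))
```

Because `x2` is correlated with `x1`, the one-predictor slope for `x1` absorbs part of the effect of `x2` and differs from the `x1` coefficient in the two-predictor model. This is why "holding all other variables constant" matters when reading a multiple regression.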

Confidence Intervals for Coefficients

To estimate the uncertainty around the regression coefficients, we can also calculate confidence intervals: \[\hat{\beta}_i \pm t_{\alpha/2} \times SE(\hat{\beta}_i)\]

Where:
\(\hat{\beta}_i\) is the estimated regression coefficient,
\(t_{\alpha/2}\) is the critical value from the t-distribution for a given confidence level (typically 95%),
\(SE(\hat{\beta}_i)\) is the standard error of the regression coefficient.
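
The interval formula above can be applied by hand to a fitted model and compared with R's `confint()`. This is a minimal sketch on simulated data, not our dataset: the variables `temp`, `precip`, and `yield` and the numbers used to generate them are assumptions chosen only to resemble the setting.

```r
set.seed(7)  # made-up data, for illustration only
n      <- 120
temp   <- runif(n, 0, 35)
precip <- runif(n, 200, 3000)
yield  <- 1.5 + 0.02 * temp + 0.0001 * precip + rnorm(n, sd = 1)

fit <- lm(yield ~ temp + precip)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
estimate   <- coef_table[, "Estimate"]
std_err    <- coef_table[, "Std. Error"]

# Critical value uses the residual degrees of freedom of the model
t_crit <- qt(0.975, df = df.residual(fit))

manual_ci <- cbind(lower = estimate - t_crit * std_err,
                   upper = estimate + t_crit * std_err)

print(manual_ci)
print(confint(fit, level = 0.95))
```

The two printed tables should match, since `confint()` on an `lm` fit uses the same estimate, standard error, and t critical value.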

Visualizing the Relationship Between Temperature and Crop Yield with a Regression Line

ggplot(corn_data, aes(x = Average_Temperature_C, y = Crop_Yield_MT_per_HA)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(title = "Linearity Check: Crop Yield vs. Temperature", x = "Average Temperature (°C)", y = "Crop Yield (MT per Hectare)") +
  theme_minimal()

The scatter plot shows a small positive link between temperature and crop yield for the corn records from India and China. The data points are scattered widely around the regression line because temperature alone isn’t a strong predictor of crop yield. We believe the spread suggests that other factors, like rainfall, soil type, or farming methods, also play a big role in determining crop output. In other words, yield outcomes are influenced by a combination of different variables, not just temperature, which leads to the variability in the data.

Multiple linear regression: Modeling the effect of temperature and precipitation on crop yield

reg_model <- lm(Crop_Yield_MT_per_HA ~ Average_Temperature_C + Total_Precipitation_mm, data = corn_data)
summary(reg_model)

## 
## Call:
## lm(formula = Crop_Yield_MT_per_HA ~ Average_Temperature_C + Total_Precipitation_mm, 
##     data = corn_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.69124 -0.73614 -0.09887  0.64441  2.39503 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.718e+00  1.821e-01   9.436  < 2e-16 ***
## Average_Temperature_C  1.892e-02  6.663e-03   2.840  0.00502 ** 
## Total_Precipitation_mm 8.218e-05  8.673e-05   0.947  0.34462    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9797 on 185 degrees of freedom
## Multiple R-squared:  0.04892,    Adjusted R-squared:  0.03863 
## F-statistic: 4.757 on 2 and 185 DF,  p-value: 0.009666

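The regression model explains about 5% of the variation in corn yield (R-squared = 0.049, adjusted R-squared = 0.039). The model as a whole is statistically significant (F = 4.757, p = 0.0097), but most of the variation in yield is left unexplained.

Temperature: Significant positive effect on yield (about 0.019 MT per hectare for each additional degree Celsius, p = 0.005)
Precipitation: Negligible, non-significant effect (p = 0.345)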
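
To put the temperature coefficient in context, the sketch below works only from numbers printed in the regression output above (estimate, standard error, residual degrees of freedom, and R-squared). The variable names are made up for the example, and the 10 degree comparison is just an illustrative choice.

```r
# Values copied from the regression output above
beta_temp <- 1.892e-02   # estimate for Average_Temperature_C
se_temp   <- 6.663e-03   # its standard error
df_resid  <- 185         # residual degrees of freedom
r_squared <- 0.04892     # multiple R-squared

# 95% confidence interval for the temperature coefficient
t_crit  <- qt(0.975, df = df_resid)
ci_temp <- beta_temp + c(-1, 1) * t_crit * se_temp
print(round(ci_temp, 4))

# Predicted difference in yield (MT per hectare) for a 10 degree Celsius
# difference in temperature, holding precipitation constant
print(10 * beta_temp)

# Share of yield variation the model leaves unexplained
print(1 - r_squared)
```

The interval stays above zero, which matches the significant p-value for temperature, while a 10 degree difference corresponds to only about 0.19 MT per hectare and roughly 95% of the variation in yield remains unexplained.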

Visualizing Diagnostic Plots to Test Model Accuracy

The Residuals vs Fitted plot checks if the model’s errors are scattered randomly and evenly. The Q-Q plot checks if these errors follow a normal pattern.

par(mfrow = c(1, 2))  # Set layout for two plots
# Plotting Residuals vs Fitted (which = 1)
plot(reg_model, which = 1)
# Plotting Q-Q Plot (which = 2)
plot(reg_model, which = 2)

par(mfrow = c(1, 1))

The Main Takeaways and discussion on limitations and strengths

We thought temperature and rainfall would be the main things affecting how much food farms can grow, but it turns out it’s not that simple. While temperature does matter, the two together explain only about 5% of the variation in yield, so they don’t tell the whole story. Other things, like soil quality, irrigation, or fertilizer use, likely have a bigger influence. So, to really understand what affects crop yields, we need to look at more than just temperature and rainfall and consider all the important factors together.
We think some farming practices can help when the weather gets extreme. Things like managing water better and changing which crops farmers plant could make a difference, although our charts alone can’t prove this.
Money matters: Climate change isn’t just affecting crops, it’s hitting farmers’ wallets too. To help with this, we think financial institutions should reduce the interest rates on loans for farmers, and governments should invest in things like better weather forecasting tools and new farming technology.
We’ve learned a lot, but there’s still more to figure out. We need to look closer at why some areas are struggling more than others and find ways to help them.
The data we used might not represent all farming regions or crop types, which means our findings might not apply everywhere. For example, rain-dependent regions may see different results.
Our work tackles the real-world problem of how climate change is impacting farming. As temperatures rise, these findings are important for farmers and decision-makers looking to adapt to new climate conditions. We used reliable methods, like hypothesis testing and regression analysis, to ensure our results are backed by good statistical practices. We also cleaned and prepared the data carefully to make sure the analysis was accurate.
By looking at both temperature and rainfall, we think we gave a more complete picture of how different climate factors affect agriculture.
It would be useful to see how different regions are affected by climate change, as this would help in creating specific solutions for each area. Research could also explore how temperature and rainfall together impact crops and how they interact with other factors like soil and water management.

Final Thoughts

The takeaway message from our work is that temperature changes are already impacting crop yields, and without proactive measures, we think these effects will likely become more severe in the future.

References

citation("dplyr"); citation("stringr"); citation("ggplot2"); citation("corrplot")

## To cite package 'dplyr' in publications use:
## 
##   Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
##   Grammar of Data Manipulation_. R package version 1.1.4,
##   <https://CRAN.R-project.org/package=dplyr>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {dplyr: A Grammar of Data Manipulation},
##     author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan},
##     year = {2023},
##     note = {R package version 1.1.4},
##     url = {https://CRAN.R-project.org/package=dplyr},
##   }

## To cite package 'stringr' in publications use:
## 
##   Wickham H (2023). _stringr: Simple, Consistent Wrappers for Common
##   String Operations_. R package version 1.5.1,
##   <https://CRAN.R-project.org/package=stringr>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {stringr: Simple, Consistent Wrappers for Common String Operations},
##     author = {Hadley Wickham},
##     year = {2023},
##     note = {R package version 1.5.1},
##     url = {https://CRAN.R-project.org/package=stringr},
##   }

## To cite ggplot2 in publications, please use
## 
##   H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
##   Springer-Verlag New York, 2016.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     author = {Hadley Wickham},
##     title = {ggplot2: Elegant Graphics for Data Analysis},
##     publisher = {Springer-Verlag New York},
##     year = {2016},
##     isbn = {978-3-319-24277-4},
##     url = {https://ggplot2.tidyverse.org},
##   }

## To cite corrplot in publications use:
## 
##   Taiyun Wei and Viliam Simko (2024). R package 'corrplot':
##   Visualization of a Correlation Matrix (Version 0.94). Available from
##   https://github.com/taiyun/corrplot
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{corrplot2024,
##     title = {R package 'corrplot': Visualization of a Correlation Matrix},
##     author = {Taiyun Wei and Viliam Simko},
##     year = {2024},
##     note = {(Version 0.94)},
##     url = {https://github.com/taiyun/corrplot},
##   }

Course Website, Astral Theory, 2024. [Online]. Available: https://astral-theory-157510.appspot.com/secured/index.html This link includes all the modules referenced in our work.
“Grammarly: Online Writing Assistant,” Grammarly, 2024. [Online]. Available: https://www.grammarly.com/
R Project, “Other R Documentation,” R Project, 2024. [Online]. Available: https://www.r-project.org/other-docs.html
QuillBot, “Paraphrasing Tool,” QuillBot, 2024. [Online]. Available: https://quillbot.com/
Waqi786, “Climate Change Impact on Agriculture,” Kaggle, 2024. [Online]. Available: https://www.kaggle.com/Our_datasetsets/waqi786/climate-change-impact-on-agriculture
W3Schools, “R Tutorial,” W3Schools, 2024. [Online]. Available: https://www.w3schools.com/r/

Growing Through Change: The Earth’s Farms Face Rising Heat

26 Years of Change (1998-2024)