Initialization:

Load the required packages and the climate change dataset, with package startup messages suppressed in the output.

library(tidyverse)
library(pwr)
library(broom)
library(readr)
# for the multicollinearity check, since the car package is not working
library(performance)
df_main <- read.csv("climate_change_dataset.csv")

df_main |> head()
df_main |> str()
## 'data.frame':    1000 obs. of  10 variables:
##  $ Year                       : int  2006 2019 2014 2010 2007 2020 2006 2018 2022 2010 ...
##  $ Country                    : chr  "UK" "USA" "France" "Argentina" ...
##  $ Avg.Temperature...C.       : num  8.9 31 33.9 5.9 26.9 32.3 30.7 33.9 27.8 18.3 ...
##  $ CO2.Emissions..Tons.Capita.: num  9.3 4.8 2.8 1.8 5.6 1.4 11.6 6 16.6 1.9 ...
##  $ Sea.Level.Rise..mm.        : num  3.1 4.2 2.2 3.2 2.4 2.7 3.9 4.5 1.5 3.5 ...
##  $ Rainfall..mm.              : int  1441 2407 1241 1892 1743 2100 1755 827 1966 2599 ...
##  $ Population                 : int  530911230 107364344 441101758 1069669579 124079175 1202028857 586706107 83947380 980305187 849496137 ...
##  $ Renewable.Energy....       : num  20.4 49.2 33.3 23.7 12.5 49.4 41.9 17.7 8.2 7.5 ...
##  $ Extreme.Weather.Events     : int  14 8 9 7 4 12 10 1 4 5 ...
##  $ Forest.Area....            : num  59.8 31 35.5 17.7 17.4 47.2 50.5 56.6 43.4 48.7 ...
# inspect the column names read.csv() produced (spaces and symbols become dots)
names(df_main)
##  [1] "Year"                        "Country"                    
##  [3] "Avg.Temperature...C."        "CO2.Emissions..Tons.Capita."
##  [5] "Sea.Level.Rise..mm."         "Rainfall..mm."              
##  [7] "Population"                  "Renewable.Energy...."       
##  [9] "Extreme.Weather.Events"      "Forest.Area...."

Binary Variable Creation:

We will use Extreme Weather Events as our binary variable. It is a count rather than a naturally binary column, but no column in the dataset is binary, so whichever column we choose has to be converted. Extreme Weather Events is a good choice because modeling it may tell us what drives these events and how they might be mitigated.

#make binary variable using median split
df_main <- df_main |>
  mutate(
    High_Extreme_Weather = ifelse(
      `Extreme.Weather.Events` > median(`Extreme.Weather.Events`, na.rm = TRUE),
      1, 0
    )
  )

# class distribution of the new binary variable
df_main |> count(High_Extreme_Weather)

We split the Extreme Weather Events column at its median: observations with more events than the median are coded 1 (high extreme weather), and all others, including those exactly at the median, are coded 0. With the count converted to a binary outcome, we move to the next step.
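The tie rule matters: because the code uses a strict `>`, any observation exactly at the median lands in group 0. The sketch below applies the same rule in base R to the first nine `Extreme.Weather.Events` values printed by `str()` above; it is a toy vector chosen for illustration, not the full column.

```r
# Toy vector: first nine Extreme.Weather.Events values from the str() output
events <- c(14, 8, 9, 7, 4, 12, 10, 1, 4)

# Same rule as the mutate() above: strictly greater than the median -> 1
cutoff <- median(events)
high   <- ifelse(events > cutoff, 1, 0)

cutoff        # median of this toy vector
table(high)   # class counts; the value equal to the median is coded 0
```

Here the median is 8, so the observation with exactly 8 events is coded 0 and only four of the nine values are coded 1. With a discrete count, several rows can sit exactly at the median, so the two classes need not be equal in size; the `count()` call above checks the balance on the full data.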

Defining Explanatory Variables:

We now select four predictor variables for our model: Avg Temperature, CO2 Emissions, Sea Level Rise, and Renewable Energy, with High_Extreme_Weather as the response. We chose these because rising CO2 can affect the planet's average temperature, which can lead to higher sea levels, which in turn can affect weather patterns. We also include renewable energy because it may affect CO2 emission levels and, through them, all of the effects above.

df_model <- df_main |>
  select(
    High_Extreme_Weather,
    `Avg.Temperature...C.`,
    `CO2.Emissions..Tons.Capita.`,
    `Sea.Level.Rise..mm.`,
    `Renewable.Energy....`
  ) |>
  drop_na()
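Because these predictors were chosen as links in one causal chain (CO2 to temperature to sea level), they may be correlated with each other, which is what the multicollinearity check mentioned in the setup code is for. With the `performance` package loaded, `check_collinearity(model)` reports variance inflation factors (VIFs) once the model below is fitted. As a minimal base-R sketch of what a VIF measures, the block regresses each predictor on the others and computes 1 / (1 - R^2). The data are simulated, and the variable names and correlation structure are assumptions for illustration only.

```r
set.seed(42)
n <- 1000

# Simulated predictors (hypothetical): temp depends on co2, sea depends on temp,
# renew is independent of the others
co2   <- rnorm(n)
temp  <- 0.8 * co2 + rnorm(n, sd = 0.6)
sea   <- 0.5 * temp + rnorm(n)
renew <- rnorm(n)
preds <- data.frame(co2, temp, sea, renew)

# VIF for each predictor: regress it on all the others, then 1 / (1 - R^2)
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_manual(preds)
round(vifs, 2)
```

A VIF near 1 (as for the independent `renew` here) means a predictor is essentially uncorrelated with the others; larger values mean it overlaps with the other predictors, which inflates the variance of its estimated coefficient.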

Building the Logistic Regression Model:

With the variables selected, we now fit the logistic regression model (a binomial GLM, which uses the logit link by default):

model <- glm(
  High_Extreme_Weather ~ 
    `Avg.Temperature...C.` +
    `CO2.Emissions..Tons.Capita.` +
    `Sea.Level.Rise..mm.` +
    `Renewable.Energy....`,
  data = df_model,
  family = "binomial"
)

summary(model)
## 
## Call:
## glm(formula = High_Extreme_Weather ~ Avg.Temperature...C. + CO2.Emissions..Tons.Capita. + 
##     Sea.Level.Rise..mm. + Renewable.Energy...., family = "binomial", 
##     data = df_model)
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                 -0.615847   0.298003  -2.067   0.0388 *
## Avg.Temperature...C.         0.009162   0.007520   1.218   0.2231  
## CO2.Emissions..Tons.Capita.  0.004001   0.011407   0.351   0.7258  
## Sea.Level.Rise..mm.          0.069157   0.055985   1.235   0.2167  
## Renewable.Energy....        -0.003031   0.004943  -0.613   0.5398  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1368.8  on 999  degrees of freedom
## Residual deviance: 1365.0  on 995  degrees of freedom
## AIC: 1375
## 
## Number of Fisher Scoring iterations: 4
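The coefficients above are on the log-odds (logit) scale. To turn them into a probability, form the linear predictor and apply the inverse logit, which is what `predict(model, newdata, type = "response")` does for the fitted model. The sketch below does this by hand with the rounded estimates from the summary output; the predictor values are hypothetical, picked only to show the calculation.

```r
# Rounded estimates copied from summary(model) above
b <- c(intercept = -0.615847, temp = 0.009162, co2 = 0.004001,
       sea = 0.069157, renew = -0.003031)

# Hypothetical observation: 20 degrees C, 10 t CO2 per capita,
# 3 mm sea level rise, 25% renewable energy
x <- c(intercept = 1, temp = 20, co2 = 10, sea = 3, renew = 25)

eta <- sum(b * x)   # linear predictor (log-odds)
p   <- plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
c(log_odds = eta, probability = p)
```

For this hypothetical observation the predicted probability of being in the high extreme weather group is roughly 0.435.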

Coefficients Interpretation and Analysis:

model |> tidy()

Insights: This model estimates the probability that an observation has a high number of extreme weather events (above the median) given the four predictor variables. First, note that the p-value for average temperature is 0.2231. Since 0.2231 > 0.05, the coefficient is not statistically significant at the 5% level: each row of the summary tests the null hypothesis that the coefficient is zero, and here we fail to reject it because there is insufficient evidence of an association between average temperature and high extreme weather.
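The p-values in the summary come from Wald z-tests of the null hypothesis that each coefficient is zero: z is the estimate divided by its standard error, and the two-sided p-value is 2 * P(Z > |z|) under a standard normal. The sketch below reproduces the `z value` and `Pr(>|z|)` columns from the rounded estimates and standard errors printed above.

```r
# Estimates and standard errors copied from summary(model) above
est <- c(intercept = -0.615847, temp = 0.009162, co2 = 0.004001,
         sea = 0.069157, renew = -0.003031)
se  <- c(intercept = 0.298003, temp = 0.007520, co2 = 0.011407,
         sea = 0.055985, renew = 0.004943)

z <- est / se             # Wald z statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal

round(cbind(z = z, p = p), 4)
```

Only the intercept falls below 0.05, matching the single `*` in the summary table.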

The p-value for CO2 emissions is 0.7258 > 0.05, again indicating insufficient evidence of a statistical association between CO2 emissions and high extreme weather. Its coefficient is 0.00400, meaning each additional ton of CO2 per capita raises the log-odds of high extreme weather by about 0.004, which is a very small effect. Given the large p-value, however, this effect cannot be distinguished from zero. That does not prove the two are unrelated; it only means the data show no detectable relationship.
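Because logistic regression coefficients are changes in log-odds, exponentiating them gives odds ratios, which are often easier to read. `tidy(model, exponentiate = TRUE)` from broom reports these directly for the fitted model; the sketch below simply applies `exp()` to the rounded estimates printed above.

```r
# Slope estimates copied from summary(model) above (log-odds per unit)
est <- c(temp = 0.009162, co2 = 0.004001, sea = 0.069157, renew = -0.003031)

odds_ratio <- exp(est)                # multiplicative change in odds per one-unit increase
pct_change <- 100 * (odds_ratio - 1)  # same change expressed as a percentage

round(cbind(odds_ratio, pct_change), 4)
```

For CO2 the odds ratio is about 1.004: each additional ton per capita multiplies the odds of high extreme weather by roughly 1.004, a change of about 0.4%, and the p-value says even this small effect cannot be distinguished from zero.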

For Sea Level Rise, the p-value is 0.2167 and the coefficient is 0.06916. The positive coefficient suggests that higher sea level rise is associated with more extreme weather. Its per-unit coefficient is larger than those for CO2 emissions and average temperature, although the predictors are measured in different units (mm, tons per capita, degrees C), so raw coefficient sizes are not directly comparable. Because 0.2167 > 0.05, we still do not have enough statistical evidence to conclude that sea level rise is linked to extreme weather events.
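Raw coefficients are per one unit of each predictor, and a millimetre of sea level rise is not comparable to a degree of temperature or a ton of CO2 per capita. One rough way to compare them is to multiply each coefficient by a plausible change in its predictor. The sketch below uses the spread (max - min) of the first ten values shown in the `str()` output as that change; ten rows are only an illustration, and on the real data a statistic such as each column's standard deviation in `df_model` would be a sounder choice.

```r
# First ten values of each predictor, copied from the str() output (illustration only)
temp  <- c(8.9, 31, 33.9, 5.9, 26.9, 32.3, 30.7, 33.9, 27.8, 18.3)
co2   <- c(9.3, 4.8, 2.8, 1.8, 5.6, 1.4, 11.6, 6, 16.6, 1.9)
sea   <- c(3.1, 4.2, 2.2, 3.2, 2.4, 2.7, 3.9, 4.5, 1.5, 3.5)
renew <- c(20.4, 49.2, 33.3, 23.7, 12.5, 49.4, 41.9, 17.7, 8.2, 7.5)

# Slope estimates from summary(model) above, in the same order as the list below
est <- c(temp = 0.009162, co2 = 0.004001, sea = 0.069157, renew = -0.003031)

# Spread of each predictor in these ten rows
spread <- sapply(list(temp = temp, co2 = co2, sea = sea, renew = renew),
                 function(x) diff(range(x)))

# Change in log-odds when moving across that spread
log_odds_change <- est * spread
round(log_odds_change, 3)
```

Measured this way, temperature's effect over its spread is slightly larger than sea level rise's, so any ranking of "strength" depends on the scale chosen, and none of these effects is statistically significant in any case.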

Finally, we consider renewable energy. The p-value is 0.5398 and the coefficient is -0.00303, a slightly negative association between renewable energy share and high extreme weather. Taken at face value, this would suggest that as renewable energy use goes up, the probability of high extreme weather goes down. Unfortunately, we cannot claim this, because the p-value of 0.5398 > 0.05 means the effect is not statistically significant, so there is no evidence here that increasing renewable energy use would reduce extreme weather events.

Overall, none of the four selected variables is a statistically significant predictor of our target variable, high extreme weather. Some variables, like sea level rise and temperature, show positive relationships with extreme weather, but their high p-values mean these effects are not reliable. From this we conclude that either the predictor variables we chose are weak predictors of extreme weather events, or other variables contain information that may be critical for revealing a relationship. We may also need a different, more flexible modeling approach.
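Individual Wald tests can miss a joint effect, so a natural complement is a likelihood-ratio test of the whole model against an intercept-only model. On the fitted object this would mean fitting an intercept-only model on `df_model` (called `null_model` here, a hypothetical name) and running `anova(null_model, model, test = "Chisq")`. The sketch below does the same arithmetic from the null and residual deviances printed in the summary; those are rounded, so the result is approximate.

```r
# Deviances and degrees of freedom copied from summary(model) above (rounded)
null_dev  <- 1368.8; null_df  <- 999
resid_dev <- 1365.0; resid_df <- 995

lr_stat <- null_dev - resid_dev   # likelihood-ratio chi-square statistic
lr_df   <- null_df - resid_df     # one degree of freedom per predictor
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Share of the null deviance explained; for a 0/1 outcome this is McFadden's pseudo R^2
pseudo_r2 <- 1 - resid_dev / null_dev

c(lr_stat = lr_stat, df = lr_df, p_value = p_value, pseudo_r2 = pseudo_r2)
```

The statistic is about 3.8 on 4 degrees of freedom (p of roughly 0.43), and the pseudo R^2 is under 0.3%, so the four predictors together explain almost none of the variation in the outcome.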

Confidence Interval:

We compute a 95% confidence interval for the Average Temperature coefficient:

target_summary <- summary(model)$coefficients

beta <- target_summary["Avg.Temperature...C.", "Estimate"]
se   <- target_summary["Avg.Temperature...C.", "Std. Error"]

# 95% Wald CI: estimate +/- 1.96 * standard error
lower <- beta - 1.96 * se
upper <- beta + 1.96 * se

lower
## [1] -0.005576984
upper
## [1] 0.02390153

Here we calculated the 95% confidence interval for the average temperature coefficient, which is on the log-odds scale. The lower bound is -0.00558 and the upper bound is 0.02390, so the interval is (−0.00558, 0.02390). It gives a range of plausible values for the true effect of a one-degree increase in average temperature on the log-odds of high extreme weather. Note that the interval is wide relative to the point estimate of 0.00916, and that it runs from negative to positive, so it contains zero. This means the effect of average temperature could be positive (higher average temperature increases the odds of extreme weather), negative (higher average temperature decreases them), or zero (no effect), which is consistent with its p-value of 0.2231.
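The same Wald interval can be computed for every slope at once, using the exact normal quantile `qnorm(0.975)` in place of 1.96, and exponentiated to put it on the odds-ratio scale. On the fitted object, `confint.default(model)` returns these Wald intervals, while `confint(model)` uses profile likelihood and can differ slightly. The sketch below works from the rounded estimates in the summary output.

```r
# Estimates and standard errors copied from summary(model) above
est <- c(temp = 0.009162, co2 = 0.004001, sea = 0.069157, renew = -0.003031)
se  <- c(temp = 0.007520, co2 = 0.011407, sea = 0.055985, renew = 0.004943)

z_crit <- qnorm(0.975)  # about 1.96 for a 95% interval
lower  <- est - z_crit * se
upper  <- est + z_crit * se

ci <- cbind(lower, upper)
round(ci, 5)        # log-odds scale
round(exp(ci), 4)   # odds-ratio scale
```

Every interval contains 0 on the log-odds scale (equivalently, 1 on the odds-ratio scale), which is the interval counterpart of the non-significant p-values in the summary.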