Project Objective

The objective of this project is to identify relationships between various meteorological variables. This pursuit has two main goals: first, to explore correlations and the significance of relationships between climatic factors; second, to build and compare multiple linear regression models to determine the most relevant predictors of rainfall. This approach allows testing variable selection methods and performance criteria (adjusted R², MSE, AIC, Mallows’ Cp, etc.) in a controlled setting, providing a pedagogical exercise and a methodological foundation transferable to real-world climate data analysis.

Packages

The following packages are required for the analysis. Install them using: install.packages('package_name').

library(pastecs)
library(prettyR)
library(psych)
library(kableExtra)
library(dplyr)
library(purrr)
library(tidyr)
library(Hmisc)
library(corrplot)
library(leaps)
library(MASS)
library(olsrr)
library(AICcmodavg)

Data Preparation

A synthetic climate dataset is generated to simulate meteorological variables. The dataset includes 100 observations with 20 variables, such as rainfall, temperature, humidity, and wind speed.

set.seed(123)   # For reproducibility
n <- 100        # Number of observations
data <- data.frame(
  rainfall       = rnorm(n, mean = 50, sd = 20),   # mm
  temperature    = rnorm(n, mean = 27, sd = 5),    # °C
  mslp           = rnorm(n, mean = 1013, sd = 10), # hPa
  humidity       = rnorm(n, mean = 70, sd = 15),   # %
  wind_speed     = rnorm(n, mean = 5, sd = 2),     # m/s
  wind_gust      = rnorm(n, mean = 10, sd = 4),
  wind_dir       = runif(n, min = 0, max = 360),   # degrees
  sunshine       = rnorm(n, mean = 8, sd = 3),     # hours
  cloud_cover    = rnorm(n, mean = 50, sd = 20),   # %
  evapotransp    = rnorm(n, mean = 4, sd = 1.2),   # mm
  soil_moisture  = rnorm(n, mean = 30, sd = 10),   # %
  dew_point      = rnorm(n, mean = 20, sd = 3),    # °C
  vapor_pressure = rnorm(n, mean = 25, sd = 5),    # hPa
  radiation      = rnorm(n, mean = 200, sd = 40),  # W/m²
  snowfall       = rnorm(n, mean = 2, sd = 1),     # mm
  cyclone_freq   = rpois(n, lambda = 1),           # count/year
  drought_index  = rnorm(n, mean = 0, sd = 1),     # normalized index
  heatwaves      = rpois(n, lambda = 3),           # count/year
  storm_surge    = rnorm(n, mean = 0.5, sd = 0.2), # m
  sea_temp       = rnorm(n, mean = 28, sd = 2)     # °C
)

# Display dataset dimensions and summary
dim(data)
## [1] 100  20
head(data)
##   rainfall temperature     mslp humidity wind_speed wind_gust wind_dir sunshine
## 1 38.79049    23.44797 1034.988 59.27137   4.852888  7.592429 309.0929 2.184486
## 2 45.39645    28.28442 1026.124 58.70967   2.662697  6.025206 319.4585 8.321571
## 3 81.17417    25.76654 1010.349 55.92192   3.730503 14.107140 176.0729 9.826337
## 4 51.41017    25.26229 1018.432 54.21230   4.942317 13.004245 258.5131 3.647527
## 5 52.58575    22.24191 1008.857 63.44261   6.341392  3.963334 175.2140 9.441877
## 6 84.30130    26.77486 1008.238 74.96769   1.698907  9.619410 355.9352 5.515477
##   cloud_cover evapotransp soil_moisture dew_point vapor_pressure radiation
## 1    80.76860    4.674721      28.58738  20.99430       24.71515  242.0681
## 2    47.80579    3.883105      19.94622  19.43046       20.21075  224.9162
## 3    60.22942    5.219746      31.56156  21.41148       27.95531  217.3448
## 4    54.27916    2.612599      32.33634  17.14496       25.86552  215.4434
## 5    46.27759    6.785032      33.55588  23.47373       31.99892  251.6529
## 6    47.59212    3.275763      13.78142  21.75412       25.58730  159.9096
##    snowfall cyclone_freq drought_index heatwaves storm_surge sea_temp
## 1 1.8535725            3    -0.8209867         2  0.06747837 26.59699
## 2 0.5664378            1    -0.3072572         3  0.46396250 29.76447
## 3 1.2093922            1    -0.9020980         4  0.78204784 27.73326
## 4 2.8851125            0     0.6270687         2  0.62869373 25.75864
## 5 2.9030761            2     1.1203550         2  0.33574829 28.92238
## 6 4.0055733            1     2.1272136         3  0.19081667 31.04829
round(stat.desc(data), 2)
##              rainfall temperature      mslp humidity wind_speed wind_gust
## nbr.val        100.00      100.00    100.00   100.00     100.00    100.00
## nbr.null         0.00        0.00      0.00     0.00       0.00      0.00
## nbr.na           0.00        0.00      0.00     0.00       0.00      0.00
## min              3.82       16.73    995.43    33.01      -0.32     -1.24
## max             93.75       43.21   1035.93   108.57       9.79     19.72
## range           89.93       26.47     40.50    75.56      10.12     20.96
## sum           5180.81     2646.23 101420.47  6945.67     521.17    983.08
## median          51.24       25.87   1013.36    69.95       5.33      9.65
## mean            51.81       26.46   1014.20    69.46       5.21      9.83
## SE.mean          1.83        0.48      0.95     1.56       0.20      0.38
## CI.mean.0.95     3.62        0.96      1.88     3.09       0.39      0.75
## var            333.29       23.38     90.23   242.79       3.92     14.10
## std.dev         18.26        4.83      9.50    15.58       1.98      3.75
## coef.var         0.35        0.18      0.01     0.22       0.38      0.38
##              wind_dir sunshine cloud_cover evapotransp soil_moisture dew_point
## nbr.val        100.00   100.00      100.00      100.00        100.00    100.00
## nbr.null         0.00     0.00        0.00        0.00          0.00      0.00
## nbr.na           0.00     0.00        0.00        0.00          0.00      0.00
## min              0.42     1.25       -0.16        1.69          4.51     12.87
## max            359.60    16.08      103.70        7.09         58.32     29.55
## range          359.18    14.82      103.86        5.40         53.82     16.68
## sum          17947.68   811.39     4993.08      417.62       3016.34   2008.01
## median         188.70     8.34       49.99        4.15         30.55     19.82
## mean           179.48     8.11       49.93        4.18         30.16     20.08
## SE.mean         10.59     0.32        1.96        0.13          1.08      0.30
## CI.mean.0.95    21.02     0.64        3.90        0.25          2.15      0.59
## var          11225.00    10.24      385.64        1.59        116.89      8.70
## std.dev        105.95     3.20       19.64        1.26         10.81      2.95
## coef.var         0.59     0.39        0.39        0.30          0.36      0.15
##              vapor_pressure radiation snowfall cyclone_freq drought_index
## nbr.val              100.00    100.00   100.00       100.00        100.00
## nbr.null               0.00      0.00     0.00        38.00          0.00
## nbr.na                 0.00      0.00     0.00         0.00          0.00
## min                   14.01    107.45    -0.70         0.00         -2.25
## max                   35.97    335.61     4.01         4.00          2.13
## range                 21.96    228.16     4.70         4.00          4.38
## sum                 2463.16  19908.79   200.67        97.00          3.55
## median                24.69    197.59     2.25         1.00          0.05
## mean                  24.63    199.09     2.01         0.97          0.04
## SE.mean                0.49      3.81     0.10         0.10          0.09
## CI.mean.0.95           0.98      7.57     0.19         0.19          0.18
## var                   24.19   1454.54     0.95         0.94          0.79
## std.dev                4.92     38.14     0.97         0.97          0.89
## coef.var               0.20      0.19     0.49         1.00         25.06
##              heatwaves storm_surge sea_temp
## nbr.val         100.00      100.00   100.00
## nbr.null          7.00        0.00     0.00
## nbr.na            0.00        0.00     0.00
## min               0.00       -0.11    23.91
## max              10.00        1.02    32.73
## range            10.00        1.13     8.82
## sum             301.00       49.51  2801.97
## median            3.00        0.49    28.10
## mean              3.01        0.50    28.02
## SE.mean           0.19        0.02     0.19
## CI.mean.0.95      0.37        0.04     0.37
## var               3.48        0.05     3.45
## std.dev           1.87        0.22     1.86
## coef.var          0.62        0.44     0.07
describe(data, num.desc=c("mean","max","min","sd","var","median","valid.n"), xname=NA, horizontal=TRUE)
## data 
## 
##  20  Variables      100  Observations
## --------------------------------------------------------------------------------
## rainfall 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    51.81    20.71    24.70    28.64 
##      .25      .50      .75      .90      .95 
##    40.12    51.24    63.84    75.29    81.33 
## 
## lowest : 3.81662 10.6677 16.2661 19.0249 24.6921
## highest: 84.3013 85.7383 91.0017 93.3791 93.7467
## --------------------------------------------------------------------------------
## temperature 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    26.46    5.393    19.41    20.55 
##      .25      .50      .75      .90      .95 
##    22.99    25.87    29.34    32.29    36.24 
## 
## lowest : 16.7338 18.6603 18.9106 18.9923 19.1393
## highest: 36.5455 36.9861 37.5005 37.6423 43.2052
## --------------------------------------------------------------------------------
## mslp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1     1014    10.81    999.5   1002.7 
##      .25      .50      .75      .90      .95 
##   1007.7   1013.4   1020.6   1028.5   1031.7 
## 
## lowest : 995.435 996.325 997.671 999.047 999.402
## highest: 1032.02 1032.55 1033.02 1034.99 1035.93
## --------------------------------------------------------------------------------
## humidity 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    69.46    17.73    42.99    49.88 
##      .25      .50      .75      .90      .95 
##    59.06    69.95    80.33    88.65    92.90 
## 
## lowest : 33.0115 37.6353 39.2167 39.7868 40.1088
## highest: 95.0658 95.2665 100.564 106.252 108.572
## --------------------------------------------------------------------------------
## wind_speed 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    5.212    2.214    1.726    2.648 
##      .25      .50      .75      .90      .95 
##    4.207    5.330    6.443    7.595    8.275 
## 
## lowest : -0.321846 -0.286298 0.550025  0.895326  1.69891  
## highest: 8.42461   8.44852   8.55901   8.95084   9.7949   
## --------------------------------------------------------------------------------
## wind_gust 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    9.831    4.252    4.154    5.210 
##      .25      .50      .75      .90      .95 
##    7.413    9.651   12.366   14.444   15.503 
## 
## lowest : -1.2391 1.717   2.23393 3.96333 3.96897
## highest: 15.9361 16.6077 17.4396 17.7834 19.7209
## --------------------------------------------------------------------------------
## wind_dir 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    179.5    122.7    15.13    28.41 
##      .25      .50      .75      .90      .95 
##    89.80   188.70   267.21   314.66   340.85 
## 
## lowest : 0.416095 1.66973  9.36386  11.4326  14.5371 
## highest: 343.532  350.209  353.356  355.935  359.598 
## --------------------------------------------------------------------------------
## sunshine 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    8.114    3.608    2.184    3.409 
##      .25      .50      .75      .90      .95 
##    5.895    8.339   10.029   11.563   13.250 
## 
## lowest : 1.25285 1.3681  1.76453 2.11488 2.16905
## highest: 13.8987 15.1242 15.1488 15.3742 16.0751
## --------------------------------------------------------------------------------
## cloud_cover 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    49.93    22.05    14.88    20.91 
##      .25      .50      .75      .90      .95 
##    38.16    49.99    62.96    71.28    80.35 
## 
## lowest : -0.158356 9.24635   10.8359   12.2735   12.9477  
## highest: 80.7686   82.3109   87.5573   93.2283   103.697  
## --------------------------------------------------------------------------------
## evapotransp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    4.176    1.441    2.328    2.631 
##      .25      .50      .75      .90      .95 
##    3.239    4.148    5.014    5.950    6.344 
## 
## lowest : 1.68983 1.76773 1.86403 1.9769  2.24938
## highest: 6.40802 6.78503 6.9856  7.06363 7.09054
## --------------------------------------------------------------------------------
## soil_moisture 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    30.16    12.11    13.77    17.33 
##      .25      .50      .75      .90      .95 
##    21.62    30.55    35.03    45.27    48.46 
## 
## lowest : 4.50657 8.88792 11.5153 13.2887 13.5151
## highest: 52.9962 53.0506 54.1621 57.9739 58.3223
## --------------------------------------------------------------------------------
## dew_point 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    20.08    3.322    15.67    16.93 
##      .25      .50      .75      .90      .95 
##    18.03    19.82    21.73    23.62    24.77 
## 
## lowest : 12.8678 13.9836 14.7549 14.9244 15.2947
## highest: 25.3097 25.3373 25.7151 27.2735 29.5521
## --------------------------------------------------------------------------------
## vapor_pressure 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    24.63    5.606    15.85    18.82 
##      .25      .50      .75      .90      .95 
##    21.19    24.69    28.05    30.93    32.20 
## 
## lowest : 14.0054 14.1294 14.1773 14.9066 15.3278
## highest: 32.2641 32.412  34.7683 35.3235 35.968 
## --------------------------------------------------------------------------------
## radiation 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    199.1    42.54    131.8    151.9 
##      .25      .50      .75      .90      .95 
##    175.7    197.6    224.6    245.4    257.4 
## 
## lowest : 107.451 120.237 120.338 125.453 130.915
## highest: 258.399 259.727 270.93  276.088 335.615
## --------------------------------------------------------------------------------
## snowfall 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    2.007      1.1   0.2964   0.6705 
##      .25      .50      .75      .90      .95 
##   1.3572   2.2457   2.6620   3.1414   3.3082 
## 
## lowest : -0.695329  -0.168418  0.00741423 0.0313384  0.258978  
## highest: 3.36467    3.37305    3.58043    3.90014    4.00557   
## --------------------------------------------------------------------------------
## cyclone_freq 
##        n  missing distinct     Info     Mean      Gmd 
##      100        0        5    0.893     0.97    1.028 
##                                    
## Value         0    1    2    3    4
## Frequency    38   35   21    4    2
## Proportion 0.38 0.35 0.21 0.04 0.02
## 
## For the frequency table, variable is rounded to the nearest 0
## --------------------------------------------------------------------------------
## drought_index 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1  0.03549    1.009 -1.43218 -1.11495 
##      .25      .50      .75      .90      .95 
## -0.54047  0.05388  0.64071  1.11228  1.53470 
## 
## lowest : -2.24869 -2.1449  -1.58499 -1.55785 -1.54609
## highest: 1.57785  1.76202  1.76937  1.9727   2.12721 
## --------------------------------------------------------------------------------
## heatwaves 
##        n  missing distinct     Info     Mean      Gmd 
##      100        0        9    0.966     3.01    2.051 
##                                                        
## Value         0    1    2    3    4    5    6    8   10
## Frequency     7   12   27   20   10   15    7    1    1
## Proportion 0.07 0.12 0.27 0.20 0.10 0.15 0.07 0.01 0.01
## 
## For the frequency table, variable is rounded to the nearest 0
## --------------------------------------------------------------------------------
## storm_surge 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1   0.4951   0.2474   0.1479   0.2299 
##      .25      .50      .75      .90      .95 
##   0.3378   0.4911   0.6750   0.7756   0.8260 
## 
## lowest : -0.109572 0.007485  0.0674784 0.0812721 0.112012 
## highest: 0.852909  0.858085  0.873731  0.876568  1.01545  
## --------------------------------------------------------------------------------
## sea_temp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      100        0      100        1    28.02    2.115    25.18    25.70 
##      .25      .50      .75      .90      .95 
##    26.66    28.10    29.03    30.54    31.05 
## 
## lowest : 23.911  24.2366 24.3302 24.6937 24.8818
## highest: 31.1316 31.1749 31.3802 31.8795 32.7301
## --------------------------------------------------------------------------------

Correlation Analysis

This section explores correlations between meteorological variables using Pearson correlation coefficients and significance tests. A custom function cor_stars is used to display correlations with significance stars, and a correlation plot is generated for visualization.

# Correlation matrix
lowerCor(data)
##                rnfll tmprt mslp  hmdty wnd_s wnd_g wnd_d snshn cld_c evptr
## rainfall        1.00                                                      
## temperature    -0.05  1.00                                                
## mslp           -0.13  0.03  1.00                                          
## humidity       -0.04  0.04 -0.04  1.00                                    
## wind_speed     -0.19 -0.13 -0.02 -0.02  1.00                              
## wind_gust      -0.06  0.11  0.02 -0.09  0.21  1.00                        
## wind_dir        0.07  0.00  0.07 -0.02 -0.16 -0.10  1.00                  
## sunshine        0.16 -0.04  0.20 -0.10  0.05  0.05  0.04  1.00            
## cloud_cover     0.00 -0.20  0.07 -0.19 -0.06  0.02 -0.06  0.01  1.00      
## evapotransp     0.17 -0.05 -0.09  0.18 -0.06 -0.20  0.05 -0.15 -0.03  1.00
## soil_moisture   0.12  0.04  0.01  0.01 -0.01  0.05  0.00  0.14 -0.08  0.11
## dew_point       0.01 -0.11 -0.01  0.09  0.01  0.04 -0.10  0.09  0.03  0.03
## vapor_pressure -0.09 -0.04 -0.01  0.06  0.00  0.06 -0.05  0.11 -0.01  0.00
## radiation       0.05 -0.04 -0.09  0.06  0.03 -0.10 -0.02 -0.01  0.14  0.29
## snowfall        0.06 -0.12 -0.01 -0.03 -0.17  0.10 -0.06  0.02  0.19 -0.05
## cyclone_freq    0.05 -0.07  0.11 -0.05  0.13 -0.02 -0.12  0.15  0.03 -0.02
## drought_index   0.07 -0.05 -0.08  0.13  0.10  0.20 -0.18 -0.09 -0.10  0.01
## heatwaves      -0.01 -0.16  0.07 -0.06  0.09  0.21 -0.17 -0.02  0.05 -0.11
## storm_surge     0.03 -0.16 -0.09  0.00 -0.11  0.09 -0.14  0.07  0.21  0.03
## sea_temp       -0.14 -0.12 -0.01 -0.24 -0.03 -0.01 -0.26 -0.05 -0.01 -0.05
##                sl_ms
## rainfall            
## temperature         
## mslp                
## humidity            
## wind_speed          
## wind_gust           
## wind_dir            
## sunshine            
## cloud_cover         
## evapotransp         
## soil_moisture   1.00
## dew_point       0.03
## vapor_pressure -0.05
## radiation       0.10
## snowfall        0.12
## cyclone_freq    0.07
## drought_index  -0.07
## heatwaves      -0.04
## storm_surge    -0.06
## sea_temp       -0.12
##                dw_pn vpr_p radtn snwfl cycl_ drgh_ htwvs strm_ s_tmp
## dew_point       1.00                                                
## vapor_pressure  0.18  1.00                                          
## radiation       0.20  0.10  1.00                                    
## snowfall        0.20  0.00 -0.04  1.00                              
## cyclone_freq   -0.04 -0.02  0.07 -0.08  1.00                        
## drought_index   0.23  0.03 -0.02  0.02 -0.10  1.00                  
## heatwaves      -0.01  0.06  0.04  0.05  0.06  0.02  1.00            
## storm_surge    -0.01 -0.13  0.00  0.05 -0.18  0.01 -0.04  1.00      
## sea_temp        0.03 -0.05 -0.22  0.16 -0.06  0.03  0.16 -0.08  1.00
# Custom function to display correlations with significance stars
#cor_stars <- function(dt, digits = 3){
#  names <- names(dt)
#  n <- length(names)
#  x <- character(0)
#  for(i in 1:n) x <- c(x, rep(names[i], n-i))
#  y <- character(0)
#  for(i in 1:(n-1)) y <- c(y, names[(i+1):n])
#  df <- tibble(x=x, y=y)
#  df <- df %>%
#    mutate(r = map2_dbl(.x = x, .y = y, ~ cor(dt %>% pull(.x), dt %>% pull(.y))),
#           p = map2_dbl(.x = x, .y = y, ~ cor.test(dt %>% pull(.x), dt %>% pull(.y))$p.value))
#  df <- df %>%
#    mutate(stars = case_when(p < 0.05 & p >= 0.01 ~ "*  ",
#                             p < 0.01 & p >= 0.001 ~ "** ",
#                             p < 0.001 ~ "***",
#                             is.na(r) ~ "   ",
#                             TRUE ~ "   "),
#           fr = format(r, digits = digits-1, nsmall = digits-1))
#  df <- df %>%
#    mutate(cs = paste0(fr, stars))
#  r_tab <- df %>%
#    select(x, y, cs) %>%
#     pivot_wider(id_cols = y, names_from = x, values_from = cs)
#  r_tab <- r_tab %>%
#    mutate(across(everything(), ~ replace_na(.x, ""))) %>%
#    rename("term" = "y")
#  r_tab <- data.frame(r_tab)
#  return(r_tab)
#}

# Display correlation table
#cor_stars(data) %>%
#  kbl() %>%
#  kable_classic_2(full_width = FALSE)
#cor_stars(data, digits = 2)

# Additional correlation outputs
round(cor(data, method="pearson"), 3)
##                rainfall temperature   mslp humidity wind_speed wind_gust
## rainfall          1.000      -0.050 -0.129   -0.044     -0.193    -0.056
## temperature      -0.050       1.000  0.031    0.044     -0.131     0.114
## mslp             -0.129       0.031  1.000   -0.045     -0.025     0.018
## humidity         -0.044       0.044 -0.045    1.000     -0.019    -0.090
## wind_speed       -0.193      -0.131 -0.025   -0.019      1.000     0.207
## wind_gust        -0.056       0.114  0.018   -0.090      0.207     1.000
## wind_dir          0.073      -0.003  0.074   -0.017     -0.159    -0.101
## sunshine          0.157      -0.039  0.197   -0.104      0.047     0.053
## cloud_cover       0.000      -0.195  0.065   -0.192     -0.061     0.024
## evapotransp       0.172      -0.052 -0.088    0.182     -0.060    -0.195
## soil_moisture     0.121       0.044  0.012    0.011     -0.007     0.050
## dew_point         0.015      -0.107 -0.012    0.091      0.013     0.036
## vapor_pressure   -0.093      -0.039 -0.009    0.063      0.003     0.064
## radiation         0.055      -0.044 -0.092    0.060      0.025    -0.095
## snowfall          0.055      -0.123 -0.015   -0.032     -0.171     0.100
## cyclone_freq      0.053      -0.073  0.113   -0.055      0.125    -0.017
## drought_index     0.069      -0.053 -0.079    0.131      0.102     0.201
## heatwaves        -0.006      -0.164  0.074   -0.064      0.095     0.212
## storm_surge       0.033      -0.159 -0.092    0.003     -0.109     0.091
## sea_temp         -0.145      -0.123 -0.011   -0.242     -0.028    -0.008
##                wind_dir sunshine cloud_cover evapotransp soil_moisture
## rainfall          0.073    0.157       0.000       0.172         0.121
## temperature      -0.003   -0.039      -0.195      -0.052         0.044
## mslp              0.074    0.197       0.065      -0.088         0.012
## humidity         -0.017   -0.104      -0.192       0.182         0.011
## wind_speed       -0.159    0.047      -0.061      -0.060        -0.007
## wind_gust        -0.101    0.053       0.024      -0.195         0.050
## wind_dir          1.000    0.042      -0.064       0.046         0.005
## sunshine          0.042    1.000       0.014      -0.154         0.141
## cloud_cover      -0.064    0.014       1.000      -0.034        -0.077
## evapotransp       0.046   -0.154      -0.034       1.000         0.112
## soil_moisture     0.005    0.141      -0.077       0.112         1.000
## dew_point        -0.104    0.089       0.028       0.030         0.032
## vapor_pressure   -0.052    0.112      -0.007      -0.005        -0.054
## radiation        -0.018   -0.010       0.144       0.290         0.099
## snowfall         -0.056    0.024       0.193      -0.050         0.125
## cyclone_freq     -0.125    0.150       0.030      -0.016         0.067
## drought_index    -0.182   -0.094      -0.104       0.011        -0.065
## heatwaves        -0.170   -0.021       0.053      -0.110        -0.045
## storm_surge      -0.140    0.070       0.214       0.029        -0.064
## sea_temp         -0.261   -0.047      -0.007      -0.046        -0.125
##                dew_point vapor_pressure radiation snowfall cyclone_freq
## rainfall           0.015         -0.093     0.055    0.055        0.053
## temperature       -0.107         -0.039    -0.044   -0.123       -0.073
## mslp              -0.012         -0.009    -0.092   -0.015        0.113
## humidity           0.091          0.063     0.060   -0.032       -0.055
## wind_speed         0.013          0.003     0.025   -0.171        0.125
## wind_gust          0.036          0.064    -0.095    0.100       -0.017
## wind_dir          -0.104         -0.052    -0.018   -0.056       -0.125
## sunshine           0.089          0.112    -0.010    0.024        0.150
## cloud_cover        0.028         -0.007     0.144    0.193        0.030
## evapotransp        0.030         -0.005     0.290   -0.050       -0.016
## soil_moisture      0.032         -0.054     0.099    0.125        0.067
## dew_point          1.000          0.175     0.200    0.197       -0.042
## vapor_pressure     0.175          1.000     0.104    0.000       -0.024
## radiation          0.200          0.104     1.000   -0.036        0.066
## snowfall           0.197          0.000    -0.036    1.000       -0.081
## cyclone_freq      -0.042         -0.024     0.066   -0.081        1.000
## drought_index      0.233          0.026    -0.021    0.021       -0.101
## heatwaves         -0.006          0.058     0.042    0.054        0.056
## storm_surge       -0.011         -0.130     0.001    0.053       -0.181
## sea_temp           0.028         -0.054    -0.218    0.155       -0.059
##                drought_index heatwaves storm_surge sea_temp
## rainfall               0.069    -0.006       0.033   -0.145
## temperature           -0.053    -0.164      -0.159   -0.123
## mslp                  -0.079     0.074      -0.092   -0.011
## humidity               0.131    -0.064       0.003   -0.242
## wind_speed             0.102     0.095      -0.109   -0.028
## wind_gust              0.201     0.212       0.091   -0.008
## wind_dir              -0.182    -0.170      -0.140   -0.261
## sunshine              -0.094    -0.021       0.070   -0.047
## cloud_cover           -0.104     0.053       0.214   -0.007
## evapotransp            0.011    -0.110       0.029   -0.046
## soil_moisture         -0.065    -0.045      -0.064   -0.125
## dew_point              0.233    -0.006      -0.011    0.028
## vapor_pressure         0.026     0.058      -0.130   -0.054
## radiation             -0.021     0.042       0.001   -0.218
## snowfall               0.021     0.054       0.053    0.155
## cyclone_freq          -0.101     0.056      -0.181   -0.059
## drought_index          1.000     0.023       0.012    0.034
## heatwaves              0.023     1.000      -0.041    0.157
## storm_surge            0.012    -0.041       1.000   -0.075
## sea_temp               0.034     0.157      -0.075    1.000
round(corr.test(data)$p, 3)
##                rainfall temperature  mslp humidity wind_speed wind_gust
## rainfall          0.000       1.000 1.000    1.000      1.000     1.000
## temperature       0.625       0.000 1.000    1.000      1.000     1.000
## mslp              0.200       0.763 0.000    1.000      1.000     1.000
## humidity          0.663       0.665 0.658    0.000      1.000     1.000
## wind_speed        0.055       0.195 0.806    0.849      0.000     1.000
## wind_gust         0.577       0.257 0.857    0.374      0.039     0.000
## wind_dir          0.472       0.974 0.467    0.866      0.114     0.316
## sunshine          0.119       0.697 0.049    0.304      0.641     0.597
## cloud_cover       1.000       0.052 0.518    0.056      0.546     0.810
## evapotransp       0.087       0.605 0.382    0.070      0.555     0.052
## soil_moisture     0.229       0.661 0.905    0.910      0.949     0.619
## dew_point         0.886       0.291 0.903    0.368      0.894     0.720
## vapor_pressure    0.360       0.699 0.929    0.536      0.974     0.529
## radiation         0.588       0.661 0.361    0.555      0.805     0.346
## snowfall          0.585       0.222 0.883    0.749      0.089     0.324
## cyclone_freq      0.603       0.470 0.262    0.588      0.214     0.865
## drought_index     0.497       0.597 0.437    0.194      0.315     0.044
## heatwaves         0.950       0.103 0.463    0.528      0.348     0.035
## storm_surge       0.743       0.114 0.361    0.974      0.279     0.369
## sea_temp          0.151       0.225 0.910    0.015      0.783     0.935
##                wind_dir sunshine cloud_cover evapotransp soil_moisture
## rainfall          1.000    1.000       1.000       1.000         1.000
## temperature       1.000    1.000       1.000       1.000         1.000
## mslp              1.000    1.000       1.000       1.000         1.000
## humidity          1.000    1.000       1.000       1.000         1.000
## wind_speed        1.000    1.000       1.000       1.000         1.000
## wind_gust         1.000    1.000       1.000       1.000         1.000
## wind_dir          0.000    1.000       1.000       1.000         1.000
## sunshine          0.680    0.000       1.000       1.000         1.000
## cloud_cover       0.525    0.894       0.000       1.000         1.000
## evapotransp       0.649    0.126       0.737       0.000         1.000
## soil_moisture     0.963    0.162       0.448       0.268         0.000
## dew_point         0.304    0.377       0.785       0.768         0.754
## vapor_pressure    0.606    0.266       0.943       0.963         0.592
## radiation         0.855    0.923       0.154       0.003         0.326
## snowfall          0.582    0.815       0.055       0.623         0.217
## cyclone_freq      0.217    0.135       0.768       0.871         0.507
## drought_index     0.070    0.350       0.303       0.917         0.518
## heatwaves         0.091    0.833       0.603       0.275         0.659
## storm_surge       0.166    0.491       0.032       0.771         0.528
## sea_temp          0.009    0.646       0.943       0.653         0.216
##                dew_point vapor_pressure radiation snowfall cyclone_freq
## rainfall           1.000          1.000     1.000    1.000        1.000
## temperature        1.000          1.000     1.000    1.000        1.000
## mslp               1.000          1.000     1.000    1.000        1.000
## humidity           1.000          1.000     1.000    1.000        1.000
## wind_speed         1.000          1.000     1.000    1.000        1.000
## wind_gust          1.000          1.000     1.000    1.000        1.000
## wind_dir           1.000          1.000     1.000    1.000        1.000
## sunshine           1.000          1.000     1.000    1.000        1.000
## cloud_cover        1.000          1.000     1.000    1.000        1.000
## evapotransp        1.000          1.000     0.658    1.000        1.000
## soil_moisture      1.000          1.000     1.000    1.000        1.000
## dew_point          0.000          1.000     1.000    1.000        1.000
## vapor_pressure     0.082          0.000     1.000    1.000        1.000
## radiation          0.046          0.304     0.000    1.000        1.000
## snowfall           0.049          0.998     0.726    0.000        1.000
## cyclone_freq       0.679          0.813     0.514    0.422        0.000
## drought_index      0.019          0.796     0.833    0.839        0.316
## heatwaves          0.956          0.564     0.678    0.592        0.580
## storm_surge        0.911          0.199     0.994    0.600        0.072
## sea_temp           0.781          0.594     0.029    0.123        0.563
##                drought_index heatwaves storm_surge sea_temp
## rainfall               1.000     1.000       1.000        1
## temperature            1.000     1.000       1.000        1
## mslp                   1.000     1.000       1.000        1
## humidity               1.000     1.000       1.000        1
## wind_speed             1.000     1.000       1.000        1
## wind_gust              1.000     1.000       1.000        1
## wind_dir               1.000     1.000       1.000        1
## sunshine               1.000     1.000       1.000        1
## cloud_cover            1.000     1.000       1.000        1
## evapotransp            1.000     1.000       1.000        1
## soil_moisture          1.000     1.000       1.000        1
## dew_point              1.000     1.000       1.000        1
## vapor_pressure         1.000     1.000       1.000        1
## radiation              1.000     1.000       1.000        1
## snowfall               1.000     1.000       1.000        1
## cyclone_freq           1.000     1.000       1.000        1
## drought_index          0.000     1.000       1.000        1
## heatwaves              0.817     0.000       1.000        1
## storm_surge            0.907     0.685       0.000        1
## sea_temp               0.737     0.118       0.458        0
cor.test(data[,1], data[,2], alternative="two.sided")
## 
##  Pearson's product-moment correlation
## 
## data:  data[, 1] and data[, 2]
## t = -0.49095, df = 98, p-value = 0.6246
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2435805  0.1483291
## sample estimates:
##         cor 
## -0.04953215
# Correlation plot
M <- cor(data)
col <- colorRampPalette(c("red", "white", "blue"))(5)
corrplot(M, col=col, type="lower", method="color", order="original", diag=FALSE, tl.col="black", tl.srt=90)

Variable Selection

This section applies variable selection techniques to identify the most relevant predictors for rainfall, using methods such as best subset selection.

# Verify dataset
cbind(names(data))
##       [,1]            
##  [1,] "rainfall"      
##  [2,] "temperature"   
##  [3,] "mslp"          
##  [4,] "humidity"      
##  [5,] "wind_speed"    
##  [6,] "wind_gust"     
##  [7,] "wind_dir"      
##  [8,] "sunshine"      
##  [9,] "cloud_cover"   
## [10,] "evapotransp"   
## [11,] "soil_moisture" 
## [12,] "dew_point"     
## [13,] "vapor_pressure"
## [14,] "radiation"     
## [15,] "snowfall"      
## [16,] "cyclone_freq"  
## [17,] "drought_index" 
## [18,] "heatwaves"     
## [19,] "storm_surge"   
## [20,] "sea_temp"
dim(data)
## [1] 100  20
# Multiple linear regression (full model)
fitall <- lm(rainfall ~ temperature + mslp + humidity + wind_speed, data = data)
summary(fitall)
## 
## Call:
## lm(formula = rainfall ~ temperature + mslp + humidity + wind_speed, 
##     data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -48.51 -11.77  -0.67  10.99  47.02 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 334.94695  194.64831   1.721   0.0885 .
## temperature  -0.26476    0.37877  -0.699   0.4863  
## mslp         -0.25840    0.19123  -1.351   0.1798  
## humidity     -0.05976    0.11662  -0.512   0.6095  
## wind_speed   -1.90243    0.92462  -2.058   0.0424 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.04 on 95 degrees of freedom
## Multiple R-squared:  0.06282,    Adjusted R-squared:  0.02336 
## F-statistic: 1.592 on 4 and 95 DF,  p-value: 0.1827
# Best subset selection
selols2 <- ols_step_best_subset(fitall)
write.table(selols2, "OLS.txt", row.names = TRUE, col.names = TRUE)

Model Development

Multiple linear regression models are constructed to predict rainfall based on selected meteorological variables. A reduced model is also built for comparison.

# Reduced model
fit <- lm(rainfall ~ temperature + mslp + humidity, data = data)

Model Comparison

The developed models are compared using performance metrics such as adjusted R², Mean Squared Error (MSE), Akaike Information Criterion (AIC), and Mallows’ Cp. Variance Inflation Factor (VIF) is calculated to check for multicollinearity.

# Model fit indicators
round(summary(fit)$r.squared, 3)        # R-Squared
## [1] 0.021
round(summary(fit)$adj.r.squared, 3)    # Adjusted R-Squared
## [1] -0.01
round(mean(fit$residuals^2), 3)         # Mean Squared Error (MSE)
## [1] 323.011
# Model selection criteria
ols_aic(fit, method = c("R", "STATA", "SAS"))
## [1] 871.5563
round(ols_mallows_cp(fit, fitall), 3)   # Mallows' Cp
## [1] 7.233
round(ols_apc(fit), 3)
## [1] 1.061
# AIC comparison
models <- list(fitall, fit)
mod.names <- c('Full Model', 'Reduced Model')
aictab(cand.set = models, modnames = mod.names)
## 
## Model selection based on AICc:
## 
##               K   AICc Delta_AICc AICcWt Cum.Wt      LL
## Full Model    6 870.10       0.00   0.74   0.74 -428.60
## Reduced Model 5 872.19       2.09   0.26   1.00 -430.78
# Variance Inflation Factor
ols_vif_tol(fit)
##     Variables Tolerance      VIF
## 1 temperature 0.9970173 1.002992
## 2        mslp 0.9969258 1.003084
## 3    humidity 0.9959417 1.004075

Pedagogical Value

This project serves as an educational exercise, demonstrating the application of statistical methods and regression modeling in a controlled environment. The methodology developed here can be applied to real-world climate data analysis, providing insights into climate patterns and informing predictive modeling. author: “Abdi-Basid ADAN” email: “