The objective of this project is to identify relationships between various meteorological variables. This pursuit has two main goals: first, to explore correlations and the significance of relationships between climatic factors; second, to build and compare multiple linear regression models to determine the most relevant predictors of rainfall. This approach allows testing variable selection methods and performance criteria (adjusted R², MSE, AIC, Mallows’ Cp, etc.) in a controlled setting, providing a pedagogical exercise and a methodological foundation transferable to real-world climate data analysis.
The following packages are required for the analysis. Install them
using: install.packages('package_name').
library(pastecs)
library(prettyR)
library(psych)
library(kableExtra)
library(dplyr)
library(purrr)
library(tidyr)
library(Hmisc)
library(corrplot)
library(leaps)
library(MASS)
library(olsrr)
library(AICcmodavg)
A synthetic climate dataset is generated to simulate meteorological variables. The dataset includes 100 observations with 20 variables, such as rainfall, temperature, humidity, and wind speed.
set.seed(123) # For reproducibility
n <- 100 # Number of observations
data <- data.frame(
rainfall = rnorm(n, mean = 50, sd = 20), # mm
temperature = rnorm(n, mean = 27, sd = 5), # °C
mslp = rnorm(n, mean = 1013, sd = 10), # hPa
humidity = rnorm(n, mean = 70, sd = 15), # %
wind_speed = rnorm(n, mean = 5, sd = 2), # m/s
wind_gust = rnorm(n, mean = 10, sd = 4),
wind_dir = runif(n, min = 0, max = 360), # degrees
sunshine = rnorm(n, mean = 8, sd = 3), # hours
cloud_cover = rnorm(n, mean = 50, sd = 20), # %
evapotransp = rnorm(n, mean = 4, sd = 1.2), # mm
soil_moisture = rnorm(n, mean = 30, sd = 10), # %
dew_point = rnorm(n, mean = 20, sd = 3), # °C
vapor_pressure = rnorm(n, mean = 25, sd = 5), # hPa
radiation = rnorm(n, mean = 200, sd = 40), # W/m²
snowfall = rnorm(n, mean = 2, sd = 1), # mm
cyclone_freq = rpois(n, lambda = 1), # count/year
drought_index = rnorm(n, mean = 0, sd = 1), # normalized index
heatwaves = rpois(n, lambda = 3), # count/year
storm_surge = rnorm(n, mean = 0.5, sd = 0.2), # m
sea_temp = rnorm(n, mean = 28, sd = 2) # °C
)
# Display dataset dimensions and summary
dim(data)
## [1] 100 20
head(data)
## rainfall temperature mslp humidity wind_speed wind_gust wind_dir sunshine
## 1 38.79049 23.44797 1034.988 59.27137 4.852888 7.592429 309.0929 2.184486
## 2 45.39645 28.28442 1026.124 58.70967 2.662697 6.025206 319.4585 8.321571
## 3 81.17417 25.76654 1010.349 55.92192 3.730503 14.107140 176.0729 9.826337
## 4 51.41017 25.26229 1018.432 54.21230 4.942317 13.004245 258.5131 3.647527
## 5 52.58575 22.24191 1008.857 63.44261 6.341392 3.963334 175.2140 9.441877
## 6 84.30130 26.77486 1008.238 74.96769 1.698907 9.619410 355.9352 5.515477
## cloud_cover evapotransp soil_moisture dew_point vapor_pressure radiation
## 1 80.76860 4.674721 28.58738 20.99430 24.71515 242.0681
## 2 47.80579 3.883105 19.94622 19.43046 20.21075 224.9162
## 3 60.22942 5.219746 31.56156 21.41148 27.95531 217.3448
## 4 54.27916 2.612599 32.33634 17.14496 25.86552 215.4434
## 5 46.27759 6.785032 33.55588 23.47373 31.99892 251.6529
## 6 47.59212 3.275763 13.78142 21.75412 25.58730 159.9096
## snowfall cyclone_freq drought_index heatwaves storm_surge sea_temp
## 1 1.8535725 3 -0.8209867 2 0.06747837 26.59699
## 2 0.5664378 1 -0.3072572 3 0.46396250 29.76447
## 3 1.2093922 1 -0.9020980 4 0.78204784 27.73326
## 4 2.8851125 0 0.6270687 2 0.62869373 25.75864
## 5 2.9030761 2 1.1203550 2 0.33574829 28.92238
## 6 4.0055733 1 2.1272136 3 0.19081667 31.04829
round(stat.desc(data), 2)
## rainfall temperature mslp humidity wind_speed wind_gust
## nbr.val 100.00 100.00 100.00 100.00 100.00 100.00
## nbr.null 0.00 0.00 0.00 0.00 0.00 0.00
## nbr.na 0.00 0.00 0.00 0.00 0.00 0.00
## min 3.82 16.73 995.43 33.01 -0.32 -1.24
## max 93.75 43.21 1035.93 108.57 9.79 19.72
## range 89.93 26.47 40.50 75.56 10.12 20.96
## sum 5180.81 2646.23 101420.47 6945.67 521.17 983.08
## median 51.24 25.87 1013.36 69.95 5.33 9.65
## mean 51.81 26.46 1014.20 69.46 5.21 9.83
## SE.mean 1.83 0.48 0.95 1.56 0.20 0.38
## CI.mean.0.95 3.62 0.96 1.88 3.09 0.39 0.75
## var 333.29 23.38 90.23 242.79 3.92 14.10
## std.dev 18.26 4.83 9.50 15.58 1.98 3.75
## coef.var 0.35 0.18 0.01 0.22 0.38 0.38
## wind_dir sunshine cloud_cover evapotransp soil_moisture dew_point
## nbr.val 100.00 100.00 100.00 100.00 100.00 100.00
## nbr.null 0.00 0.00 0.00 0.00 0.00 0.00
## nbr.na 0.00 0.00 0.00 0.00 0.00 0.00
## min 0.42 1.25 -0.16 1.69 4.51 12.87
## max 359.60 16.08 103.70 7.09 58.32 29.55
## range 359.18 14.82 103.86 5.40 53.82 16.68
## sum 17947.68 811.39 4993.08 417.62 3016.34 2008.01
## median 188.70 8.34 49.99 4.15 30.55 19.82
## mean 179.48 8.11 49.93 4.18 30.16 20.08
## SE.mean 10.59 0.32 1.96 0.13 1.08 0.30
## CI.mean.0.95 21.02 0.64 3.90 0.25 2.15 0.59
## var 11225.00 10.24 385.64 1.59 116.89 8.70
## std.dev 105.95 3.20 19.64 1.26 10.81 2.95
## coef.var 0.59 0.39 0.39 0.30 0.36 0.15
## vapor_pressure radiation snowfall cyclone_freq drought_index
## nbr.val 100.00 100.00 100.00 100.00 100.00
## nbr.null 0.00 0.00 0.00 38.00 0.00
## nbr.na 0.00 0.00 0.00 0.00 0.00
## min 14.01 107.45 -0.70 0.00 -2.25
## max 35.97 335.61 4.01 4.00 2.13
## range 21.96 228.16 4.70 4.00 4.38
## sum 2463.16 19908.79 200.67 97.00 3.55
## median 24.69 197.59 2.25 1.00 0.05
## mean 24.63 199.09 2.01 0.97 0.04
## SE.mean 0.49 3.81 0.10 0.10 0.09
## CI.mean.0.95 0.98 7.57 0.19 0.19 0.18
## var 24.19 1454.54 0.95 0.94 0.79
## std.dev 4.92 38.14 0.97 0.97 0.89
## coef.var 0.20 0.19 0.49 1.00 25.06
## heatwaves storm_surge sea_temp
## nbr.val 100.00 100.00 100.00
## nbr.null 7.00 0.00 0.00
## nbr.na 0.00 0.00 0.00
## min 0.00 -0.11 23.91
## max 10.00 1.02 32.73
## range 10.00 1.13 8.82
## sum 301.00 49.51 2801.97
## median 3.00 0.49 28.10
## mean 3.01 0.50 28.02
## SE.mean 0.19 0.02 0.19
## CI.mean.0.95 0.37 0.04 0.37
## var 3.48 0.05 3.45
## std.dev 1.87 0.22 1.86
## coef.var 0.62 0.44 0.07
describe(data, num.desc=c("mean","max","min","sd","var","median","valid.n"), xname=NA, horizontal=TRUE)
## data
##
## 20 Variables 100 Observations
## --------------------------------------------------------------------------------
## rainfall
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 51.81 20.71 24.70 28.64
## .25 .50 .75 .90 .95
## 40.12 51.24 63.84 75.29 81.33
##
## lowest : 3.81662 10.6677 16.2661 19.0249 24.6921
## highest: 84.3013 85.7383 91.0017 93.3791 93.7467
## --------------------------------------------------------------------------------
## temperature
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 26.46 5.393 19.41 20.55
## .25 .50 .75 .90 .95
## 22.99 25.87 29.34 32.29 36.24
##
## lowest : 16.7338 18.6603 18.9106 18.9923 19.1393
## highest: 36.5455 36.9861 37.5005 37.6423 43.2052
## --------------------------------------------------------------------------------
## mslp
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 1014 10.81 999.5 1002.7
## .25 .50 .75 .90 .95
## 1007.7 1013.4 1020.6 1028.5 1031.7
##
## lowest : 995.435 996.325 997.671 999.047 999.402
## highest: 1032.02 1032.55 1033.02 1034.99 1035.93
## --------------------------------------------------------------------------------
## humidity
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 69.46 17.73 42.99 49.88
## .25 .50 .75 .90 .95
## 59.06 69.95 80.33 88.65 92.90
##
## lowest : 33.0115 37.6353 39.2167 39.7868 40.1088
## highest: 95.0658 95.2665 100.564 106.252 108.572
## --------------------------------------------------------------------------------
## wind_speed
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 5.212 2.214 1.726 2.648
## .25 .50 .75 .90 .95
## 4.207 5.330 6.443 7.595 8.275
##
## lowest : -0.321846 -0.286298 0.550025 0.895326 1.69891
## highest: 8.42461 8.44852 8.55901 8.95084 9.7949
## --------------------------------------------------------------------------------
## wind_gust
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 9.831 4.252 4.154 5.210
## .25 .50 .75 .90 .95
## 7.413 9.651 12.366 14.444 15.503
##
## lowest : -1.2391 1.717 2.23393 3.96333 3.96897
## highest: 15.9361 16.6077 17.4396 17.7834 19.7209
## --------------------------------------------------------------------------------
## wind_dir
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 179.5 122.7 15.13 28.41
## .25 .50 .75 .90 .95
## 89.80 188.70 267.21 314.66 340.85
##
## lowest : 0.416095 1.66973 9.36386 11.4326 14.5371
## highest: 343.532 350.209 353.356 355.935 359.598
## --------------------------------------------------------------------------------
## sunshine
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 8.114 3.608 2.184 3.409
## .25 .50 .75 .90 .95
## 5.895 8.339 10.029 11.563 13.250
##
## lowest : 1.25285 1.3681 1.76453 2.11488 2.16905
## highest: 13.8987 15.1242 15.1488 15.3742 16.0751
## --------------------------------------------------------------------------------
## cloud_cover
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 49.93 22.05 14.88 20.91
## .25 .50 .75 .90 .95
## 38.16 49.99 62.96 71.28 80.35
##
## lowest : -0.158356 9.24635 10.8359 12.2735 12.9477
## highest: 80.7686 82.3109 87.5573 93.2283 103.697
## --------------------------------------------------------------------------------
## evapotransp
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 4.176 1.441 2.328 2.631
## .25 .50 .75 .90 .95
## 3.239 4.148 5.014 5.950 6.344
##
## lowest : 1.68983 1.76773 1.86403 1.9769 2.24938
## highest: 6.40802 6.78503 6.9856 7.06363 7.09054
## --------------------------------------------------------------------------------
## soil_moisture
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 30.16 12.11 13.77 17.33
## .25 .50 .75 .90 .95
## 21.62 30.55 35.03 45.27 48.46
##
## lowest : 4.50657 8.88792 11.5153 13.2887 13.5151
## highest: 52.9962 53.0506 54.1621 57.9739 58.3223
## --------------------------------------------------------------------------------
## dew_point
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 20.08 3.322 15.67 16.93
## .25 .50 .75 .90 .95
## 18.03 19.82 21.73 23.62 24.77
##
## lowest : 12.8678 13.9836 14.7549 14.9244 15.2947
## highest: 25.3097 25.3373 25.7151 27.2735 29.5521
## --------------------------------------------------------------------------------
## vapor_pressure
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 24.63 5.606 15.85 18.82
## .25 .50 .75 .90 .95
## 21.19 24.69 28.05 30.93 32.20
##
## lowest : 14.0054 14.1294 14.1773 14.9066 15.3278
## highest: 32.2641 32.412 34.7683 35.3235 35.968
## --------------------------------------------------------------------------------
## radiation
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 199.1 42.54 131.8 151.9
## .25 .50 .75 .90 .95
## 175.7 197.6 224.6 245.4 257.4
##
## lowest : 107.451 120.237 120.338 125.453 130.915
## highest: 258.399 259.727 270.93 276.088 335.615
## --------------------------------------------------------------------------------
## snowfall
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 2.007 1.1 0.2964 0.6705
## .25 .50 .75 .90 .95
## 1.3572 2.2457 2.6620 3.1414 3.3082
##
## lowest : -0.695329 -0.168418 0.00741423 0.0313384 0.258978
## highest: 3.36467 3.37305 3.58043 3.90014 4.00557
## --------------------------------------------------------------------------------
## cyclone_freq
## n missing distinct Info Mean Gmd
## 100 0 5 0.893 0.97 1.028
##
## Value 0 1 2 3 4
## Frequency 38 35 21 4 2
## Proportion 0.38 0.35 0.21 0.04 0.02
##
## For the frequency table, variable is rounded to the nearest 0
## --------------------------------------------------------------------------------
## drought_index
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 0.03549 1.009 -1.43218 -1.11495
## .25 .50 .75 .90 .95
## -0.54047 0.05388 0.64071 1.11228 1.53470
##
## lowest : -2.24869 -2.1449 -1.58499 -1.55785 -1.54609
## highest: 1.57785 1.76202 1.76937 1.9727 2.12721
## --------------------------------------------------------------------------------
## heatwaves
## n missing distinct Info Mean Gmd
## 100 0 9 0.966 3.01 2.051
##
## Value 0 1 2 3 4 5 6 8 10
## Frequency 7 12 27 20 10 15 7 1 1
## Proportion 0.07 0.12 0.27 0.20 0.10 0.15 0.07 0.01 0.01
##
## For the frequency table, variable is rounded to the nearest 0
## --------------------------------------------------------------------------------
## storm_surge
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 0.4951 0.2474 0.1479 0.2299
## .25 .50 .75 .90 .95
## 0.3378 0.4911 0.6750 0.7756 0.8260
##
## lowest : -0.109572 0.007485 0.0674784 0.0812721 0.112012
## highest: 0.852909 0.858085 0.873731 0.876568 1.01545
## --------------------------------------------------------------------------------
## sea_temp
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 28.02 2.115 25.18 25.70
## .25 .50 .75 .90 .95
## 26.66 28.10 29.03 30.54 31.05
##
## lowest : 23.911 24.2366 24.3302 24.6937 24.8818
## highest: 31.1316 31.1749 31.3802 31.8795 32.7301
## --------------------------------------------------------------------------------
This section explores correlations between meteorological variables
using Pearson correlation coefficients and significance tests. A custom
function cor_stars is used to display correlations with
significance stars, and a correlation plot is generated for
visualization.
# Correlation matrix
lowerCor(data)
## rnfll tmprt mslp hmdty wnd_s wnd_g wnd_d snshn cld_c evptr
## rainfall 1.00
## temperature -0.05 1.00
## mslp -0.13 0.03 1.00
## humidity -0.04 0.04 -0.04 1.00
## wind_speed -0.19 -0.13 -0.02 -0.02 1.00
## wind_gust -0.06 0.11 0.02 -0.09 0.21 1.00
## wind_dir 0.07 0.00 0.07 -0.02 -0.16 -0.10 1.00
## sunshine 0.16 -0.04 0.20 -0.10 0.05 0.05 0.04 1.00
## cloud_cover 0.00 -0.20 0.07 -0.19 -0.06 0.02 -0.06 0.01 1.00
## evapotransp 0.17 -0.05 -0.09 0.18 -0.06 -0.20 0.05 -0.15 -0.03 1.00
## soil_moisture 0.12 0.04 0.01 0.01 -0.01 0.05 0.00 0.14 -0.08 0.11
## dew_point 0.01 -0.11 -0.01 0.09 0.01 0.04 -0.10 0.09 0.03 0.03
## vapor_pressure -0.09 -0.04 -0.01 0.06 0.00 0.06 -0.05 0.11 -0.01 0.00
## radiation 0.05 -0.04 -0.09 0.06 0.03 -0.10 -0.02 -0.01 0.14 0.29
## snowfall 0.06 -0.12 -0.01 -0.03 -0.17 0.10 -0.06 0.02 0.19 -0.05
## cyclone_freq 0.05 -0.07 0.11 -0.05 0.13 -0.02 -0.12 0.15 0.03 -0.02
## drought_index 0.07 -0.05 -0.08 0.13 0.10 0.20 -0.18 -0.09 -0.10 0.01
## heatwaves -0.01 -0.16 0.07 -0.06 0.09 0.21 -0.17 -0.02 0.05 -0.11
## storm_surge 0.03 -0.16 -0.09 0.00 -0.11 0.09 -0.14 0.07 0.21 0.03
## sea_temp -0.14 -0.12 -0.01 -0.24 -0.03 -0.01 -0.26 -0.05 -0.01 -0.05
## sl_ms
## rainfall
## temperature
## mslp
## humidity
## wind_speed
## wind_gust
## wind_dir
## sunshine
## cloud_cover
## evapotransp
## soil_moisture 1.00
## dew_point 0.03
## vapor_pressure -0.05
## radiation 0.10
## snowfall 0.12
## cyclone_freq 0.07
## drought_index -0.07
## heatwaves -0.04
## storm_surge -0.06
## sea_temp -0.12
## dw_pn vpr_p radtn snwfl cycl_ drgh_ htwvs strm_ s_tmp
## dew_point 1.00
## vapor_pressure 0.18 1.00
## radiation 0.20 0.10 1.00
## snowfall 0.20 0.00 -0.04 1.00
## cyclone_freq -0.04 -0.02 0.07 -0.08 1.00
## drought_index 0.23 0.03 -0.02 0.02 -0.10 1.00
## heatwaves -0.01 0.06 0.04 0.05 0.06 0.02 1.00
## storm_surge -0.01 -0.13 0.00 0.05 -0.18 0.01 -0.04 1.00
## sea_temp 0.03 -0.05 -0.22 0.16 -0.06 0.03 0.16 -0.08 1.00
# Custom function to display correlations with significance stars
#cor_stars <- function(dt, digits = 3){
# names <- names(dt)
# n <- length(names)
# x <- character(0)
# for(i in 1:n) x <- c(x, rep(names[i], n-i))
# y <- character(0)
# for(i in 1:(n-1)) y <- c(y, names[(i+1):n])
# df <- tibble(x=x, y=y)
# df <- df %>%
# mutate(r = map2_dbl(.x = x, .y = y, ~ cor(dt %>% pull(.x), dt %>% pull(.y))),
# p = map2_dbl(.x = x, .y = y, ~ cor.test(dt %>% pull(.x), dt %>% pull(.y))$p.value))
# df <- df %>%
# mutate(stars = case_when(p < 0.05 & p >= 0.01 ~ "* ",
# p < 0.01 & p >= 0.001 ~ "** ",
# p < 0.001 ~ "***",
# is.na(r) ~ " ",
# TRUE ~ " "),
# fr = format(r, digits = digits-1, nsmall = digits-1))
# df <- df %>%
# mutate(cs = paste0(fr, stars))
# r_tab <- df %>%
# select(x, y, cs) %>%
# pivot_wider(id_cols = y, names_from = x, values_from = cs)
# r_tab <- r_tab %>%
# mutate(across(everything(), ~ replace_na(.x, ""))) %>%
# rename("term" = "y")
# r_tab <- data.frame(r_tab)
# return(r_tab)
#}
# Display correlation table
#cor_stars(data) %>%
# kbl() %>%
# kable_classic_2(full_width = FALSE)
#cor_stars(data, digits = 2)
# Additional correlation outputs
round(cor(data, method="pearson"), 3)
## rainfall temperature mslp humidity wind_speed wind_gust
## rainfall 1.000 -0.050 -0.129 -0.044 -0.193 -0.056
## temperature -0.050 1.000 0.031 0.044 -0.131 0.114
## mslp -0.129 0.031 1.000 -0.045 -0.025 0.018
## humidity -0.044 0.044 -0.045 1.000 -0.019 -0.090
## wind_speed -0.193 -0.131 -0.025 -0.019 1.000 0.207
## wind_gust -0.056 0.114 0.018 -0.090 0.207 1.000
## wind_dir 0.073 -0.003 0.074 -0.017 -0.159 -0.101
## sunshine 0.157 -0.039 0.197 -0.104 0.047 0.053
## cloud_cover 0.000 -0.195 0.065 -0.192 -0.061 0.024
## evapotransp 0.172 -0.052 -0.088 0.182 -0.060 -0.195
## soil_moisture 0.121 0.044 0.012 0.011 -0.007 0.050
## dew_point 0.015 -0.107 -0.012 0.091 0.013 0.036
## vapor_pressure -0.093 -0.039 -0.009 0.063 0.003 0.064
## radiation 0.055 -0.044 -0.092 0.060 0.025 -0.095
## snowfall 0.055 -0.123 -0.015 -0.032 -0.171 0.100
## cyclone_freq 0.053 -0.073 0.113 -0.055 0.125 -0.017
## drought_index 0.069 -0.053 -0.079 0.131 0.102 0.201
## heatwaves -0.006 -0.164 0.074 -0.064 0.095 0.212
## storm_surge 0.033 -0.159 -0.092 0.003 -0.109 0.091
## sea_temp -0.145 -0.123 -0.011 -0.242 -0.028 -0.008
## wind_dir sunshine cloud_cover evapotransp soil_moisture
## rainfall 0.073 0.157 0.000 0.172 0.121
## temperature -0.003 -0.039 -0.195 -0.052 0.044
## mslp 0.074 0.197 0.065 -0.088 0.012
## humidity -0.017 -0.104 -0.192 0.182 0.011
## wind_speed -0.159 0.047 -0.061 -0.060 -0.007
## wind_gust -0.101 0.053 0.024 -0.195 0.050
## wind_dir 1.000 0.042 -0.064 0.046 0.005
## sunshine 0.042 1.000 0.014 -0.154 0.141
## cloud_cover -0.064 0.014 1.000 -0.034 -0.077
## evapotransp 0.046 -0.154 -0.034 1.000 0.112
## soil_moisture 0.005 0.141 -0.077 0.112 1.000
## dew_point -0.104 0.089 0.028 0.030 0.032
## vapor_pressure -0.052 0.112 -0.007 -0.005 -0.054
## radiation -0.018 -0.010 0.144 0.290 0.099
## snowfall -0.056 0.024 0.193 -0.050 0.125
## cyclone_freq -0.125 0.150 0.030 -0.016 0.067
## drought_index -0.182 -0.094 -0.104 0.011 -0.065
## heatwaves -0.170 -0.021 0.053 -0.110 -0.045
## storm_surge -0.140 0.070 0.214 0.029 -0.064
## sea_temp -0.261 -0.047 -0.007 -0.046 -0.125
## dew_point vapor_pressure radiation snowfall cyclone_freq
## rainfall 0.015 -0.093 0.055 0.055 0.053
## temperature -0.107 -0.039 -0.044 -0.123 -0.073
## mslp -0.012 -0.009 -0.092 -0.015 0.113
## humidity 0.091 0.063 0.060 -0.032 -0.055
## wind_speed 0.013 0.003 0.025 -0.171 0.125
## wind_gust 0.036 0.064 -0.095 0.100 -0.017
## wind_dir -0.104 -0.052 -0.018 -0.056 -0.125
## sunshine 0.089 0.112 -0.010 0.024 0.150
## cloud_cover 0.028 -0.007 0.144 0.193 0.030
## evapotransp 0.030 -0.005 0.290 -0.050 -0.016
## soil_moisture 0.032 -0.054 0.099 0.125 0.067
## dew_point 1.000 0.175 0.200 0.197 -0.042
## vapor_pressure 0.175 1.000 0.104 0.000 -0.024
## radiation 0.200 0.104 1.000 -0.036 0.066
## snowfall 0.197 0.000 -0.036 1.000 -0.081
## cyclone_freq -0.042 -0.024 0.066 -0.081 1.000
## drought_index 0.233 0.026 -0.021 0.021 -0.101
## heatwaves -0.006 0.058 0.042 0.054 0.056
## storm_surge -0.011 -0.130 0.001 0.053 -0.181
## sea_temp 0.028 -0.054 -0.218 0.155 -0.059
## drought_index heatwaves storm_surge sea_temp
## rainfall 0.069 -0.006 0.033 -0.145
## temperature -0.053 -0.164 -0.159 -0.123
## mslp -0.079 0.074 -0.092 -0.011
## humidity 0.131 -0.064 0.003 -0.242
## wind_speed 0.102 0.095 -0.109 -0.028
## wind_gust 0.201 0.212 0.091 -0.008
## wind_dir -0.182 -0.170 -0.140 -0.261
## sunshine -0.094 -0.021 0.070 -0.047
## cloud_cover -0.104 0.053 0.214 -0.007
## evapotransp 0.011 -0.110 0.029 -0.046
## soil_moisture -0.065 -0.045 -0.064 -0.125
## dew_point 0.233 -0.006 -0.011 0.028
## vapor_pressure 0.026 0.058 -0.130 -0.054
## radiation -0.021 0.042 0.001 -0.218
## snowfall 0.021 0.054 0.053 0.155
## cyclone_freq -0.101 0.056 -0.181 -0.059
## drought_index 1.000 0.023 0.012 0.034
## heatwaves 0.023 1.000 -0.041 0.157
## storm_surge 0.012 -0.041 1.000 -0.075
## sea_temp 0.034 0.157 -0.075 1.000
round(corr.test(data)$p, 3)
## rainfall temperature mslp humidity wind_speed wind_gust
## rainfall 0.000 1.000 1.000 1.000 1.000 1.000
## temperature 0.625 0.000 1.000 1.000 1.000 1.000
## mslp 0.200 0.763 0.000 1.000 1.000 1.000
## humidity 0.663 0.665 0.658 0.000 1.000 1.000
## wind_speed 0.055 0.195 0.806 0.849 0.000 1.000
## wind_gust 0.577 0.257 0.857 0.374 0.039 0.000
## wind_dir 0.472 0.974 0.467 0.866 0.114 0.316
## sunshine 0.119 0.697 0.049 0.304 0.641 0.597
## cloud_cover 1.000 0.052 0.518 0.056 0.546 0.810
## evapotransp 0.087 0.605 0.382 0.070 0.555 0.052
## soil_moisture 0.229 0.661 0.905 0.910 0.949 0.619
## dew_point 0.886 0.291 0.903 0.368 0.894 0.720
## vapor_pressure 0.360 0.699 0.929 0.536 0.974 0.529
## radiation 0.588 0.661 0.361 0.555 0.805 0.346
## snowfall 0.585 0.222 0.883 0.749 0.089 0.324
## cyclone_freq 0.603 0.470 0.262 0.588 0.214 0.865
## drought_index 0.497 0.597 0.437 0.194 0.315 0.044
## heatwaves 0.950 0.103 0.463 0.528 0.348 0.035
## storm_surge 0.743 0.114 0.361 0.974 0.279 0.369
## sea_temp 0.151 0.225 0.910 0.015 0.783 0.935
## wind_dir sunshine cloud_cover evapotransp soil_moisture
## rainfall 1.000 1.000 1.000 1.000 1.000
## temperature 1.000 1.000 1.000 1.000 1.000
## mslp 1.000 1.000 1.000 1.000 1.000
## humidity 1.000 1.000 1.000 1.000 1.000
## wind_speed 1.000 1.000 1.000 1.000 1.000
## wind_gust 1.000 1.000 1.000 1.000 1.000
## wind_dir 0.000 1.000 1.000 1.000 1.000
## sunshine 0.680 0.000 1.000 1.000 1.000
## cloud_cover 0.525 0.894 0.000 1.000 1.000
## evapotransp 0.649 0.126 0.737 0.000 1.000
## soil_moisture 0.963 0.162 0.448 0.268 0.000
## dew_point 0.304 0.377 0.785 0.768 0.754
## vapor_pressure 0.606 0.266 0.943 0.963 0.592
## radiation 0.855 0.923 0.154 0.003 0.326
## snowfall 0.582 0.815 0.055 0.623 0.217
## cyclone_freq 0.217 0.135 0.768 0.871 0.507
## drought_index 0.070 0.350 0.303 0.917 0.518
## heatwaves 0.091 0.833 0.603 0.275 0.659
## storm_surge 0.166 0.491 0.032 0.771 0.528
## sea_temp 0.009 0.646 0.943 0.653 0.216
## dew_point vapor_pressure radiation snowfall cyclone_freq
## rainfall 1.000 1.000 1.000 1.000 1.000
## temperature 1.000 1.000 1.000 1.000 1.000
## mslp 1.000 1.000 1.000 1.000 1.000
## humidity 1.000 1.000 1.000 1.000 1.000
## wind_speed 1.000 1.000 1.000 1.000 1.000
## wind_gust 1.000 1.000 1.000 1.000 1.000
## wind_dir 1.000 1.000 1.000 1.000 1.000
## sunshine 1.000 1.000 1.000 1.000 1.000
## cloud_cover 1.000 1.000 1.000 1.000 1.000
## evapotransp 1.000 1.000 0.658 1.000 1.000
## soil_moisture 1.000 1.000 1.000 1.000 1.000
## dew_point 0.000 1.000 1.000 1.000 1.000
## vapor_pressure 0.082 0.000 1.000 1.000 1.000
## radiation 0.046 0.304 0.000 1.000 1.000
## snowfall 0.049 0.998 0.726 0.000 1.000
## cyclone_freq 0.679 0.813 0.514 0.422 0.000
## drought_index 0.019 0.796 0.833 0.839 0.316
## heatwaves 0.956 0.564 0.678 0.592 0.580
## storm_surge 0.911 0.199 0.994 0.600 0.072
## sea_temp 0.781 0.594 0.029 0.123 0.563
## drought_index heatwaves storm_surge sea_temp
## rainfall 1.000 1.000 1.000 1
## temperature 1.000 1.000 1.000 1
## mslp 1.000 1.000 1.000 1
## humidity 1.000 1.000 1.000 1
## wind_speed 1.000 1.000 1.000 1
## wind_gust 1.000 1.000 1.000 1
## wind_dir 1.000 1.000 1.000 1
## sunshine 1.000 1.000 1.000 1
## cloud_cover 1.000 1.000 1.000 1
## evapotransp 1.000 1.000 1.000 1
## soil_moisture 1.000 1.000 1.000 1
## dew_point 1.000 1.000 1.000 1
## vapor_pressure 1.000 1.000 1.000 1
## radiation 1.000 1.000 1.000 1
## snowfall 1.000 1.000 1.000 1
## cyclone_freq 1.000 1.000 1.000 1
## drought_index 0.000 1.000 1.000 1
## heatwaves 0.817 0.000 1.000 1
## storm_surge 0.907 0.685 0.000 1
## sea_temp 0.737 0.118 0.458 0
cor.test(data[,1], data[,2], alternative="two.sided")
##
## Pearson's product-moment correlation
##
## data: data[, 1] and data[, 2]
## t = -0.49095, df = 98, p-value = 0.6246
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2435805 0.1483291
## sample estimates:
## cor
## -0.04953215
# Correlation plot
M <- cor(data)
col <- colorRampPalette(c("red", "white", "blue"))(5)
corrplot(M, col=col, type="lower", method="color", order="original", diag=FALSE, tl.col="black", tl.srt=90)
This section applies variable selection techniques to identify the most relevant predictors for rainfall, using methods such as best subset selection.
# Verify dataset
cbind(names(data))
## [,1]
## [1,] "rainfall"
## [2,] "temperature"
## [3,] "mslp"
## [4,] "humidity"
## [5,] "wind_speed"
## [6,] "wind_gust"
## [7,] "wind_dir"
## [8,] "sunshine"
## [9,] "cloud_cover"
## [10,] "evapotransp"
## [11,] "soil_moisture"
## [12,] "dew_point"
## [13,] "vapor_pressure"
## [14,] "radiation"
## [15,] "snowfall"
## [16,] "cyclone_freq"
## [17,] "drought_index"
## [18,] "heatwaves"
## [19,] "storm_surge"
## [20,] "sea_temp"
dim(data)
## [1] 100 20
# Multiple linear regression (full model)
fitall <- lm(rainfall ~ temperature + mslp + humidity + wind_speed, data = data)
summary(fitall)
##
## Call:
## lm(formula = rainfall ~ temperature + mslp + humidity + wind_speed,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.51 -11.77 -0.67 10.99 47.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 334.94695 194.64831 1.721 0.0885 .
## temperature -0.26476 0.37877 -0.699 0.4863
## mslp -0.25840 0.19123 -1.351 0.1798
## humidity -0.05976 0.11662 -0.512 0.6095
## wind_speed -1.90243 0.92462 -2.058 0.0424 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.04 on 95 degrees of freedom
## Multiple R-squared: 0.06282, Adjusted R-squared: 0.02336
## F-statistic: 1.592 on 4 and 95 DF, p-value: 0.1827
# Best subset selection
selols2 <- ols_step_best_subset(fitall)
write.table(selols2, "OLS.txt", row.names = TRUE, col.names = TRUE)
Multiple linear regression models are constructed to predict rainfall based on selected meteorological variables. A reduced model is also built for comparison.
# Reduced model
fit <- lm(rainfall ~ temperature + mslp + humidity, data = data)
The developed models are compared using performance metrics such as adjusted R², Mean Squared Error (MSE), Akaike Information Criterion (AIC), and Mallows’ Cp. Variance Inflation Factor (VIF) is calculated to check for multicollinearity.
# Model fit indicators
round(summary(fit)$r.squared, 3) # R-Squared
## [1] 0.021
round(summary(fit)$adj.r.squared, 3) # Adjusted R-Squared
## [1] -0.01
round(mean(fit$residuals^2), 3) # Mean Squared Error (MSE)
## [1] 323.011
# Model selection criteria
ols_aic(fit, method = c("R", "STATA", "SAS"))
## [1] 871.5563
round(ols_mallows_cp(fit, fitall), 3) # Mallows' Cp
## [1] 7.233
round(ols_apc(fit), 3)
## [1] 1.061
# AIC comparison
models <- list(fitall, fit)
mod.names <- c('Full Model', 'Reduced Model')
aictab(cand.set = models, modnames = mod.names)
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## Full Model 6 870.10 0.00 0.74 0.74 -428.60
## Reduced Model 5 872.19 2.09 0.26 1.00 -430.78
# Variance Inflation Factor
ols_vif_tol(fit)
## Variables Tolerance VIF
## 1 temperature 0.9970173 1.002992
## 2 mslp 0.9969258 1.003084
## 3 humidity 0.9959417 1.004075
This project serves as an educational exercise, demonstrating the application of statistical methods and regression modeling in a controlled environment. The methodology developed here can be applied to real-world climate data analysis, providing insights into climate patterns and informing predictive modeling. author: “Abdi-Basid ADAN” email: “abdi-basid@outlook.com”