Forest fires pose a significant global challenge, resulting in the loss of millions of hectares each year. The detection and forecasting of forest fires emerge as crucial considerations in mitigating the impact of these disasters.
The dataset includes 244 instances that regroup a data of two regions of Algeria,namely the Bejaia region located in the northeast of Algeria and the Sidi Bel-abbes region located in the northwest of Algeria.
The features of the dataset are as follows:
Temperature: temperature noon (temperature max) in Celsius degrees
RH: Relative Humidity in %Â Ws: Wind speed in km/h
Rain: total day in mm
FFMC: Fine Fuel Moisture Code (FFMC) index from the FWI system
DMC: Duff Moisture Code (DMC) index from the FWI system
DC: Drought Code (DC) index from the FWI system
ISI: Initial Spread Index (ISI) index from the FWI system
BUI: Buildup Index (BUI) index from the FWI system
FWI: Fire Weather Index (FWI)
122 instances for each region. The period from June 2012 to September
2012.
The dataset comes from:https://archive.ics.uci.edu/dataset/547/algerian+forest+fires+dataset
Objective 1: predict whether a fire will occur (using classification
methods)
Objective 2: predict potential fire danger situations (using regression
methods)
Objective 1: predict whether a fire will occur (using classification methods)
# Load the data
data <- read.csv("C:/Users/Zee/Desktop/FF/Data.csv")
# Examine the structure of the dataset
str(data)
## 'data.frame': 247 obs. of 14 variables:
## $ day : chr "1" "2" "3" "4" ...
## $ month : chr "6" "6" "6" "6" ...
## $ year : chr "2012" "2012" "2012" "2012" ...
## $ Temperature: chr "29" "29" "26" "25" ...
## $ RH : chr "57" "61" "82" "89" ...
## $ Ws : chr "18" "13" "22" "13" ...
## $ Rain : chr "0" "1.3" "13.1" "2.5" ...
## $ FFMC : chr "65.7" "64.4" "47.1" "28.6" ...
## $ DMC : chr "3.4" "4.1" "2.5" "1.3" ...
## $ DC : chr "7.6" "7.6" "7.1" "6.9" ...
## $ ISI : chr "1.3" "1" "0.3" "0" ...
## $ BUI : chr "3.4" "3.9" "2.7" "1.7" ...
## $ FWI : chr "0.5" "0.4" "0.1" "0" ...
## $ Classes : chr "not fire " "not fire " "not fire " "not fire " ...
# Identify the row where Location B is labeled
location_B_row <- which(data$day == "Sidi-Bel Abbes Region Dataset")
# Print rows around the identified label for inspection
print(data[(location_B_row - 2):(location_B_row + 2), ])
## day month year Temperature RH Ws Rain FFMC DMC
## 122 30 9 2012 25 78 14 1.4 45 1.9
## 123
## 124 Sidi-Bel Abbes Region Dataset
## 125 day month year Temperature RH Ws Rain FFMC DMC
## 126 1 6 2012 32 71 12 0.7 57.1 2.5
## DC ISI BUI FWI Classes
## 122 7.5 0.2 2.4 0.1 not fire
## 123
## 124
## 125 DC ISI BUI FWI Classes
## 126 8.2 0.6 2.8 0.2 not fire
# Check consistency of columns
print(colnames(data))
## [1] "day" "month" "year" "Temperature" "RH"
## [6] "Ws" "Rain" "FFMC" "DMC" "DC"
## [11] "ISI" "BUI" "FWI" "Classes"
# Check for missing values
print(sapply(data, function(x) sum(is.na(x))))
## day month year Temperature RH Ws
## 0 0 0 0 0 0
## Rain FFMC DMC DC ISI BUI
## 0 0 0 0 0 0
## FWI Classes
## 0 0
# Find the row where B location is mentioned
location_B_row <- which(data$day == "Sidi-Bel Abbes Region Dataset")
# Combine data from A and B
combined_data <- rbind(data[1:(location_B_row - 2), ], data[(location_B_row + 1):nrow(data), ])
# Check the structure of the combined data
str(combined_data)
## 'data.frame': 245 obs. of 14 variables:
## $ day : chr "1" "2" "3" "4" ...
## $ month : chr "6" "6" "6" "6" ...
## $ year : chr "2012" "2012" "2012" "2012" ...
## $ Temperature: chr "29" "29" "26" "25" ...
## $ RH : chr "57" "61" "82" "89" ...
## $ Ws : chr "18" "13" "22" "13" ...
## $ Rain : chr "0" "1.3" "13.1" "2.5" ...
## $ FFMC : chr "65.7" "64.4" "47.1" "28.6" ...
## $ DMC : chr "3.4" "4.1" "2.5" "1.3" ...
## $ DC : chr "7.6" "7.6" "7.1" "6.9" ...
## $ ISI : chr "1.3" "1" "0.3" "0" ...
## $ BUI : chr "3.4" "3.9" "2.7" "1.7" ...
## $ FWI : chr "0.5" "0.4" "0.1" "0" ...
## $ Classes : chr "not fire " "not fire " "not fire " "not fire " ...
# Export the combined data to a CSV file
write.csv(combined_data, "combined_data.csv", row.names = FALSE)
# Load the data
combined_data <- read.csv("combined_data.csv")
str(combined_data)
## 'data.frame': 245 obs. of 14 variables:
## $ day : chr "1" "2" "3" "4" ...
## $ month : chr "6" "6" "6" "6" ...
## $ year : chr "2012" "2012" "2012" "2012" ...
## $ Temperature: chr "29" "29" "26" "25" ...
## $ RH : chr "57" "61" "82" "89" ...
## $ Ws : chr "18" "13" "22" "13" ...
## $ Rain : chr "0" "1.3" "13.1" "2.5" ...
## $ FFMC : chr "65.7" "64.4" "47.1" "28.6" ...
## $ DMC : chr "3.4" "4.1" "2.5" "1.3" ...
## $ DC : chr "7.6" "7.6" "7.1" "6.9" ...
## $ ISI : chr "1.3" "1" "0.3" "0" ...
## $ BUI : chr "3.4" "3.9" "2.7" "1.7" ...
## $ FWI : chr "0.5" "0.4" "0.1" "0" ...
## $ Classes : chr "not fire " "not fire " "not fire " "not fire " ...
summary(combined_data)
## day month year Temperature
## Length:245 Length:245 Length:245 Length:245
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## RH Ws Rain FFMC
## Length:245 Length:245 Length:245 Length:245
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## DMC DC ISI BUI
## Length:245 Length:245 Length:245 Length:245
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## FWI Classes
## Length:245 Length:245
## Class :character Class :character
## Mode :character Mode :character
# Check for missing values
missing_values <- sapply(combined_data, function(x) sum(is.na(x)))
print(missing_values)
## day month year Temperature RH Ws
## 0 0 0 0 0 0
## Rain FFMC DMC DC ISI BUI
## 0 0 0 0 0 0
## FWI Classes
## 0 0
# Convert numeric variables to appropriate types
numeric_vars <- c("Temperature", "RH", "Ws", "Rain", "FFMC", "DMC", "DC", "ISI", "BUI", "FWI")
combined_data[, numeric_vars] <- lapply(combined_data[, numeric_vars], as.numeric)
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
## Warning in lapply(combined_data[, numeric_vars], as.numeric): NAs introduced by
## coercion
# Clean up classes column
combined_data$Classes <- trimws(combined_data$Classes)
# Remove leading and trailing whitespaces
# Check for outliers or inconsistencies
# Boxplot for numeric variables
numeric_vars <- c("Temperature", "RH", "Ws", "Rain", "FFMC", "DMC", "DC", "ISI", "BUI", "FWI")
par(mfrow = c(2, 5))
for (var in numeric_vars) {
boxplot(combined_data[[var]], main = var, col = "lightblue", border = "black")
}
# Histogram for numeric variables
par(mfrow = c(2, 5)) # Reset the plot layout
for (var in numeric_vars) {
hist(combined_data[[var]], main = var, col = "lightgreen", border = "black")
}
# Save the cleaned data
write.csv(combined_data, "cleaned_combined_data.csv", row.names = FALSE)
# Check for missing values
missing_values <- sapply(combined_data, function(x) sum(is.na(x)))
print(missing_values)
## day month year Temperature RH Ws
## 0 0 0 1 1 1
## Rain FFMC DMC DC ISI BUI
## 1 1 1 2 1 1
## FWI Classes
## 2 0
# Handle missing values if present
# Impute missing values for numeric variables with mean
numeric_vars <- c("Temperature", "RH", "Ws", "Rain", "FFMC", "DMC", "DC", "ISI", "BUI", "FWI")
for (var in numeric_vars) {
combined_data[, var][is.na(combined_data[, var])] <- mean(combined_data[, var], na.rm = TRUE)
}
# Impute missing values for categorical variables with mode
categorical_vars <- c("day", "month", "year", "Classes")
for (var in categorical_vars) {
mode_val <- names(sort(table(combined_data[, var]), decreasing = TRUE))[1]
combined_data[, var][is.na(combined_data[, var])] <- mode_val
}
# Check for missing values again
missing_values <- sapply(combined_data, function(x) sum(is.na(x)))
print(missing_values)
## day month year Temperature RH Ws
## 0 0 0 0 0 0
## Rain FFMC DMC DC ISI BUI
## 0 0 0 0 0 0
## FWI Classes
## 0 0
# Exclude "day," "month," and "year"
combined_data <- combined_data[, !names(combined_data) %in% c("day", "month", "year")]
# Confirm the updated structure
str(combined_data)
## 'data.frame': 245 obs. of 11 variables:
## $ Temperature: num 29 29 26 25 27 31 33 30 25 28 ...
## $ RH : num 57 61 82 89 77 67 54 73 88 79 ...
## $ Ws : num 18 13 22 13 16 14 13 15 13 12 ...
## $ Rain : num 0 1.3 13.1 2.5 0 0 0 0 0.2 0 ...
## $ FFMC : num 65.7 64.4 47.1 28.6 64.8 82.6 88.2 86.6 52.9 73.2 ...
## $ DMC : num 3.4 4.1 2.5 1.3 3 5.8 9.9 12.1 7.9 9.5 ...
## $ DC : num 7.6 7.6 7.1 6.9 14.2 22.2 30.5 38.3 38.8 46.3 ...
## $ ISI : num 1.3 1 0.3 0 1.2 3.1 6.4 5.6 0.4 1.3 ...
## $ BUI : num 3.4 3.9 2.7 1.7 3.9 7 10.9 13.5 10.5 12.6 ...
## $ FWI : num 0.5 0.4 0.1 0 0.5 2.5 7.2 7.1 0.3 0.9 ...
## $ Classes : chr "not fire" "not fire" "not fire" "not fire" ...
# Load necessary libraries for plotting
library(ggplot2)
# EDA for numeric variables
numeric_vars <- c("Temperature", "RH", "Ws", "Rain", "FFMC", "DMC", "DC", "ISI", "BUI", "FWI")
numeric_data <- combined_data[, numeric_vars]
# Pairwise scatterplots
pairs(numeric_data)
# Boxplots for each numeric variable by Classes
for (var in numeric_vars) {
ggplot(combined_data, aes(x = Classes, y = get(var))) +
geom_boxplot(fill = "lightblue", color = "black") +
labs(title = var, x = "Classes", y = var)
}
# Bar plot for the target variable
ggplot(combined_data, aes(x = Classes)) +
geom_bar(fill = "lightcoral") +
labs(title = "Distribution of Classes", x = "Classes", y = "Count")
# Install and load the ROSE package
# install.packages("ROSE", repos = "https://cloud.r-project.org")
library(ROSE)
## Loaded ROSE 0.0-4
# Clean up classes column (remove leading and trailing whitespaces)
combined_data$Classes <- trimws(combined_data$Classes)
# Remove empty and "Classes" levels
combined_data$Classes <- factor(combined_data$Classes, levels = c("fire", "not fire"))
# Check the levels of the target variable
levels(combined_data$Classes)
## [1] "fire" "not fire"
# Perform oversampling
oversampled_data <- ovun.sample(Classes ~ ., data = combined_data, method = "over",
N = 2 * max(table(combined_data$Classes)), seed = 42)$data
# Check the distribution of the target variable after oversampling
table(oversampled_data$Classes)
##
## fire not fire
## 137 137
# Load the caret package
# install.packages("caret", repos = "https://cloud.r-project.org")
library(caret)
## Loading required package: lattice
# Set the seed for reproducibility
set.seed(42)
# Create an index for splitting the data (70% training, 30% testing)
index <- createDataPartition(oversampled_data$Classes, p = 0.7, list = FALSE)
# Split the data
train_data <- oversampled_data[index, ]
test_data <- oversampled_data[-index, ]
# Check the dimensions of the training and testing sets
dim(train_data)
## [1] 192 11
dim(test_data)
## [1] 82 11
# Load the randomForest package
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
# Specify the formula for the model
formula <- Classes ~ .
# Train the Random Forest model
rf_model <- randomForest(formula, data = train_data, ntree = 100, importance = TRUE)
# Print the model details
print(rf_model)
##
## Call:
## randomForest(formula = formula, data = train_data, ntree = 100, importance = TRUE)
## Type of random forest: classification
## Number of trees: 100
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 3.12%
## Confusion matrix:
## fire not fire class.error
## fire 93 3 0.03125
## not fire 3 93 0.03125
# Make predictions on the test set
predictions <- predict(rf_model, newdata = test_data)
# Evaluate the model performance
confusion_matrix <- confusionMatrix(predictions, test_data$Classes)
print(confusion_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction fire not fire
## fire 41 1
## not fire 0 40
##
## Accuracy : 0.9878
## 95% CI : (0.9339, 0.9997)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9756
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 1.0000
## Specificity : 0.9756
## Pos Pred Value : 0.9762
## Neg Pred Value : 1.0000
## Prevalence : 0.5000
## Detection Rate : 0.5000
## Detection Prevalence : 0.5122
## Balanced Accuracy : 0.9878
##
## 'Positive' Class : fire
##
Objective 2: predict potential fire danger situations (using regression methods)
# Load the data
df <- read.csv("C:/Users/Zee/Documents/FF_o2/cleaned_combined_data.csv")
str(df)
## 'data.frame': 244 obs. of 14 variables:
## $ day : int 1 2 3 4 5 6 7 8 9 10 ...
## $ month : int 6 6 6 6 6 6 6 6 6 6 ...
## $ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ Temperature: int 29 29 26 25 27 31 33 30 25 28 ...
## $ RH : int 57 61 82 89 77 67 54 73 88 79 ...
## $ Ws : int 18 13 22 13 16 14 13 15 13 12 ...
## $ Rain : num 0 1.3 13.1 2.5 0 0 0 0 0.2 0 ...
## $ FFMC : num 65.7 64.4 47.1 28.6 64.8 82.6 88.2 86.6 52.9 73.2 ...
## $ DMC : num 3.4 4.1 2.5 1.3 3 5.8 9.9 12.1 7.9 9.5 ...
## $ DC : num 7.6 7.6 7.1 6.9 14.2 22.2 30.5 38.3 38.8 46.3 ...
## $ ISI : num 1.3 1 0.3 0 1.2 3.1 6.4 5.6 0.4 1.3 ...
## $ BUI : num 3.4 3.9 2.7 1.7 3.9 7 10.9 13.5 10.5 12.6 ...
## $ FWI : num 0.5 0.4 0.1 0 0.5 2.5 7.2 7.1 0.3 0.9 ...
## $ Classes : chr "not fire" "not fire" "not fire" "not fire" ...
summary(df)
## day month year Temperature RH
## Min. : 1.00 Min. :6.0 Min. :2012 Min. :22.00 Min. :21.00
## 1st Qu.: 8.00 1st Qu.:7.0 1st Qu.:2012 1st Qu.:30.00 1st Qu.:52.00
## Median :16.00 Median :7.5 Median :2012 Median :32.00 Median :63.00
## Mean :15.75 Mean :7.5 Mean :2012 Mean :32.17 Mean :61.94
## 3rd Qu.:23.00 3rd Qu.:8.0 3rd Qu.:2012 3rd Qu.:35.00 3rd Qu.:73.25
## Max. :31.00 Max. :9.0 Max. :2012 Max. :42.00 Max. :90.00
## Ws Rain FFMC DMC
## Min. : 6.0 Min. : 0.0000 Min. :28.60 Min. : 0.70
## 1st Qu.:14.0 1st Qu.: 0.0000 1st Qu.:72.08 1st Qu.: 5.80
## Median :15.0 Median : 0.0000 Median :83.50 Median :11.30
## Mean :15.5 Mean : 0.7607 Mean :77.89 Mean :14.67
## 3rd Qu.:17.0 3rd Qu.: 0.5000 3rd Qu.:88.30 3rd Qu.:20.75
## Max. :29.0 Max. :16.8000 Max. :96.00 Max. :65.90
## DC ISI BUI FWI
## Min. : 6.90 Min. : 0.000 Min. : 1.10 Min. : 0.00
## 1st Qu.: 11.55 1st Qu.: 1.400 1st Qu.: 6.00 1st Qu.: 0.70
## Median : 33.10 Median : 3.500 Median :12.25 Median : 4.45
## Mean : 49.28 Mean : 4.774 Mean :16.66 Mean : 7.03
## 3rd Qu.: 68.15 3rd Qu.: 7.300 3rd Qu.:22.52 3rd Qu.:11.38
## Max. :220.40 Max. :19.000 Max. :68.00 Max. :31.10
## Classes
## Length:244
## Class :character
## Mode :character
##
##
##
library(stringr)
df$Classes <- ifelse(str_detect(df$Classes, "not fire"), 0, 1)
col_names <- colnames(df)
drop_cols <- match(c('month','day','year'), col_names)
df <- df[,-drop_cols]
# Data manipulation
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:randomForest':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Indepent and dependent features
X <- df %>% dplyr::select(-FWI)
y <- df$FWI
# Model selection and graphics drawing packages
library(caret)
library(ggplot2)
# Divide the training set and test set
set.seed(42)
split <- createDataPartition(y, p = 0.25, list = FALSE)
X_train <- X[split, ]
X_test <- X[-split, ]
y_train <- y[split]
y_test <- y[-split]
dim(X_train)
## [1] 63 10
dim(X_test)
## [1] 181 10
# install.packages(c("ggplot2", "reshape2"))
library(ggplot2)
library(reshape2)
corr <- cor(X_train)
# Heat maps for plotting correlation coefficient matrices
ggplot(melt(corr), aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), space = "Lab",
name = "Pearson\nCorrelation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1),
axis.text.y = element_text(size = 12)) +
coord_fixed()
# Feature scaling or standardization
X_train_scaled <- scale(X_train)
train_mean <- attr(X_train_scaled, "scaled:center")
train_sd <- attr(X_train_scaled, "scaled:scale")
X_test_scaled <- scale(X_test, center = train_mean, scale = train_sd)
#str(X_test_scaled)
#summary(X_test_scaled)
library(ggplot2)
# Standard scaling
X_train_scaled <- scale(X_train)
X_test_scaled <- scale(X_test)
# Box plots
par(mfrow = c(1, 2), mar = c(5, 4, 2, 1))
boxplot(X_train, main = "X_train before scaling", col = "lightblue")
boxplot(X_train_scaled, main = "X_train after scaling", col = "lightgreen")
par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
# install.packages("lattice")
# Load the lattice package
library(lattice)
# Linear regression
library(caret)
train_data <- cbind.data.frame(y_train, X_train_scaled)
linreg <- lm(y_train ~ ., data = train_data)
test_data <- cbind.data.frame(y_test, X_test_scaled)
y_pred <- predict(linreg, newdata = test_data)
mae <- mean(abs(y_test - y_pred))
score <- cor(y_test, y_pred)^2
cat("Linear Regression :Mean absolute error:", mae, "\n")
## Linear Regression :Mean absolute error: 1.157709
cat("Linear Regression :R-squared score:", score, "\n")
## Linear Regression :R-squared score: 0.9561771
plot(y_test, y_pred, main = "Linear Regression : Scatter plot of y_test vs. y_pred",
xlab = "y_test", ylab = "y_pred", pch = 16)
abline(0, 1, col = "red", lty = 2)
# Lasso regression
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
lasso <- cv.glmnet(as.matrix(X_train_scaled), as.vector(y_train), alpha = 1)
y_pred <- predict(lasso, newx = as.matrix(X_test_scaled), s = "lambda.min")
mae <- mean(abs(y_test - y_pred))
score <- cor(y_test, y_pred)^2
cat("Lasso Regression : Mean absolute error:", mae, "\n")
## Lasso Regression : Mean absolute error: 1.130073
cat("Lasso Regression : R-squared score:", score, "\n")
## Lasso Regression : R-squared score: 0.9592434
plot(y_test, y_pred, main = "Lasso Regression : Scatter plot of y_test vs. y_pred",
xlab = "y_test", ylab = "y_pred", pch = 16)
abline(0, 1, col = "red", lty = 2)
# Lasso regression model with cross-validated alpha selection
# install.packages("Metrics")
library(Metrics)
##
## Attaching package: 'Metrics'
## The following objects are masked from 'package:caret':
##
## precision, recall
lassocv <- cv.glmnet(as.matrix(X_train_scaled), as.vector(y_train), alpha = 1)
mse_path <- as.matrix(lassocv$cvm)
matplot(log(lassocv$lambda), mse_path, type = "l", xlab = "log(lambda)", ylab = "Mean Squared Error",main = "Lasso Cross-Validated Mean Squared Error Path")
# Identify the optimal lambda (minimize cross-validated MSE)
#best_lambda <- lassocv$lambda.min
#cat("Optimal Lambda:", best_lambda, "\n")
y_pred <- predict(lassocv, newx = as.matrix(X_test_scaled), s = "lambda.min", type = "response")
mae <- mae(y_test, y_pred)
score <- cor(y_test, y_pred)^2
cat("Lasso Regression(Alpha) : Mean absolute error:", mae, "\n")
## Lasso Regression(Alpha) : Mean absolute error: 1.131542
cat("Lasso Regression(Alpha) : R-squared score:", score, "\n")
## Lasso Regression(Alpha) : R-squared score: 0.9591337
plot(y_pred, y_test, main = "Lasso Regression(Alpha) : Scatter plot of y_pred vs. y_test",
xlab = "y_pred", ylab = "y_test", pch = 16)
abline(0, 1, col = "red", lty = 2)
#Ridge regression
library(glmnet)
ridge <- cv.glmnet(as.matrix(X_train_scaled), as.vector(y_train), alpha = 0)
y_pred <- predict(ridge, newx = as.matrix(X_test_scaled), s = "lambda.min")
mae <- mean(abs(y_test - y_pred))
score <- cor(y_test, y_pred)^2
cat("Ridge Regression : Mean absolute error:", mae, "\n")
## Ridge Regression : Mean absolute error: 1.306729
cat("Ridge Regression : R-squared score:", score, "\n")
## Ridge Regression : R-squared score: 0.9500863
plot(y_test, y_pred, main = "Ridge Regression : Scatter plot of y_test vs. y_pred",
xlab = "y_test", ylab = "y_pred", pch = 16)
abline(0, 1, col = "red", lty = 2)
#Elastic Net regression
library(glmnet)
elastic <- cv.glmnet(as.matrix(X_train_scaled), as.vector(y_train), alpha = 0.5)
y_pred <- predict(elastic, newx = as.matrix(X_test_scaled), s = "lambda.min")
mae <- mean(abs(y_test - y_pred))
score <- cor(y_test, y_pred)^2
cat("Elastic Net Regression : Mean absolute error:", mae, "\n")
## Elastic Net Regression : Mean absolute error: 1.136838
cat("Elastic Net Regression : R-squared score:", score, "\n")
## Elastic Net Regression : R-squared score: 0.9588857
plot(y_test, y_pred, main = "Elastic Net Regression : Scatter plot of y_test vs. y_pred",
xlab = "y_test", ylab = "y_pred", pch = 16)
abline(0, 1, col = "red", lty = 2)
By combining the results of classification and regression models, forest management authorities can gain a comprehensive understanding of fire risk. By predicting whether and how fires will occur, we can inform proactive measures to help prevent fires and manage forest areas more effectively, thereby contributing to overall forest safety and conservation.
The results of our project can be further applied to: risk assessment for forest management, planning controlled burns and fire resource allocation, and providing early warning to communities about potential fire hazards.
Of course, there is also room for subsequent project improvements: incorporating contextualised data on human activities and land-use change, exploring the use of advanced spatial analysis methods for localised risk assessment, and regularly updating models with new data to adapt to changing environmental conditions.