Introduction

This project involves forecasting the mean temperature of two locations: Ang Mo Kio in Singapore and Finland. We utilize the Prophet library to create predictive models based on historical weather data. We want to see how well Prophet works in different scenarios, and how well it can predict temperature values.

Libraries

# Load necessary libraries
library(ggplot2)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(prophet)

## Loading required package: Rcpp

## Loading required package: rlang

Singapore Weather Data

# Read Singapore data
weather_data <- read.csv("angmokio.csv")
names(weather_data)[names(weather_data) == "Mean.Temperature...C."] <- "Mean_Temperature"
weather_data$Date <- as.Date(with(weather_data, paste(Year, Month, Day, sep = "-")), "%Y-%m-%d")

Finland Weather Data

# Read Finland data
finland_weather_data <- read.csv("Finland.csv")
# Ensure the date column is correctly formatted (mm/dd/yyyy)
finland_weather_data$date <- as.Date(finland_weather_data$date, format = "%m/%d/%Y")

Prepare data for Prophet

singapore_prophet <- weather_data %>%
  select(Date, Mean_Temperature) %>%
  rename(ds = Date, y = Mean_Temperature)

# Filter out NA values
singapore_prophet <- singapore_prophet %>% filter(!is.na(y))
finland_prophet <- finland_weather_data %>%
  select(date, tavg) %>%
  rename(ds = date, y = tavg)

# Filter out NA values
finland_prophet <- finland_prophet %>% filter(!is.na(y))
finland_prophet$ds <- as.Date(finland_prophet$ds)

Raw data for Singapore

ggplot(singapore_prophet, aes(x = ds, y = y)) +
  geom_line(color = "blue") +
  labs(title = "Raw Mean Temperature in Ang Mo Kio, Singapore",
       x = "Date", y = "Mean Temperature (°C)") +
  theme_minimal()

Raw data for Finland

ggplot(finland_prophet, aes(x = ds, y = y)) +
  geom_line(color = "red") +
  labs(title = "Raw Mean Temperature in Finland",
       x = "Date", y = "Mean Temperature (°C)") +
  theme_minimal()

What we can draw from raw data:

We can see here that we have the temperatures of two different countries in two different regions plotted over the span of several years. From this, we can easily see that Finland has a more seasonal variation, with ranges of around 45 degrees Celsius a year. However, this range for Singapore is only just 2 degrees, therefore it is more consistent throughout the year. Therefore, we can see that both graphs have a recurring shape, except the shape for norway is more elongated, implying larger ranges as described above. What I am about to do, is not only predict the temperatures of these two countries over the next year, but also evaluate the accuracies’ of the two graphs. My guess is that prophet may struggle more with finland, due to the large ranges, making the time series less consistent and therefore tougher to track.

Fit Prophet models

singapore_model <- prophet(singapore_prophet,daily.seasonality = TRUE)
finland_model <- prophet(finland_prophet, daily.seasonality = TRUE)  # Enable daily seasonality

Create future data frames for predictions

future_singapore <- make_future_dataframe(singapore_model, periods = 365)
future_finland <- make_future_dataframe(finland_model, periods = 365)

Make predictions

forecast_singapore <- predict(singapore_model, future_singapore)
forecast_finland <- predict(finland_model, future_finland)

Plot the forecasts for Singapore

ggplot(forecast_singapore, aes(x = ds, y = yhat)) +
  geom_line(color = "blue") +
  geom_ribbon(aes(ymin = yhat_lower, ymax = yhat_upper), alpha = 0.2) +
  labs(title = "Forecast of Mean Temperature in Ang Mo Kio, Singapore",
       x = "Date", y = "Mean Temperature (°C)") +
  theme_minimal()

Plot the forecasts for Finland

ggplot(forecast_finland, aes(x = ds, y = yhat)) +
  geom_line(color = "red") +
  geom_ribbon(aes(ymin = yhat_lower, ymax = yhat_upper), alpha = 0.2) +
  labs(title = "Forecast of Mean Temperature in Finland",
       x = "Date", y = "Mean Temperature (°C)") +
  theme_minimal()

Model Evaluation

# Evaluate the model using Mean Absolute Error (MAE)
mae_singapore <- mean(abs(singapore_prophet$y - forecast_singapore$yhat[1:nrow(singapore_prophet)]))
mae_finland <- mean(abs(finland_prophet$y - forecast_finland$yhat[1:nrow(finland_prophet)]))

cat("MAE for Singapore:", mae_singapore, "\n")

## MAE for Singapore: 0.7749165

cat("MAE for Finland:", mae_finland, "\n")

## MAE for Finland: 2.94989

Conclusion

If we look at just the graphs, we can see that prophet has done a good job of tracing the temperatures for the next year, as it had drawn a curve consistent to the yearly- curves before it, thus taking into account seasonal changes. This is due to the seasoning option being enabled. Therefore it shows that prophet is quite effective when calculating future temperatures. However, the forecasting models demonstrated varying accuracy levels, with Singapore showing a lower Mean Absolute Error (MAE) due to its stable tropical climate compared to Finland’s more variable temperatures. This proves that my original hypothesis was right, and that prophet may struggle with more dynamic graphs. The results suggest that while the Prophet model can be effective for both regions, it may need further tuning for locations with greater temperature fluctuations.

Evaluating Prophet’s effectiveness in Modelling temperatures for different regions

Tejas Bantupalli

2024-11-02