Kristin Lussi
2023-9-26
In this presentation, I will demonstrate time series analysis on a dataset which provides daily climate data from January 1, 2013 to April 24, 2017 in the city of Delhi, India. The variables measured are mean temperature (celsius), humidity (g.m^-3), wind speed (kmph), and mean pressure (atm).
library(readr)
library(stringr)
library(dplyr)
library(lubridate)
library(ggplot2)
library(gt)
urlfile <- "https://raw.githubusercontent.com/kristinlussi/607ExtraCredit3/main/DailyDelhiClimateTrain.csv"
delhiClimate <- read_csv(url(urlfile), show_col_types = FALSE)
Here is a glimpse of the data:
## # A tibble: 6 × 5
## date meantemp humidity wind_speed meanpressure
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2013-01-01 10 84.5 0 1016.
## 2 2013-01-02 7.4 92 2.98 1018.
## 3 2013-01-03 7.17 87 4.63 1019.
## 4 2013-01-04 8.67 71.3 1.23 1017.
## 5 2013-01-05 6 86.8 3.7 1016.
## 6 2013-01-06 7 82.8 1.48 1018
Here, we will calculate the yearly averages for each year (2013 - 2016).
yearly_average <- delhiClimate %>%
mutate(year = year(delhiClimate$date)) %>%
group_by(year) %>%
summarize(
meanTemp = mean(meantemp, na.rm = TRUE),
humidity = mean(humidity, na.rm = TRUE),
windSpeed = mean(wind_speed, na.rm = TRUE),
meanPressure = mean(meanpressure, na.rm = TRUE)
)
# filter out year 2017
yearly_average <- yearly_average %>%
filter(year != 2017)
Yearly Averages for Years 2013 to 2016 | ||||
Recorded in Delhi, India | ||||
Year | Mean Temp | Humidity | Wind Speed | Mean Pressure |
---|---|---|---|---|
2013 | 24.79149 | 63.04629 | 6.827253 | 1007.642 |
2014 | 25.01067 | 59.76794 | 6.756148 | 1008.347 |
2015 | 25.11459 | 61.43049 | 6.480603 | 1008.835 |
2016 | 27.10337 | 58.74017 | 7.162480 | 1019.557 |
Here, I calculated the 30-day moving averages for each variable using filter function from the stats package. I used 30 days here instead of 6 because we are looking at a long stretch of data (2013-2016). It is easier to see long-term trends when using a longer moving average.
# calculate 30-day moving average for mean temperature column
delhiClimate$temp_ma <- stats::filter(delhiClimate$meantemp, filter = rep(1/30, 30), sides = 2)
# calculate 30-day moving average for humidity column
delhiClimate$humidity_ma <- stats::filter(delhiClimate$humidity, filter = rep(1/30, 30), sides = 2)
# calculate 30-day moving average for wind speed column
delhiClimate$windspeed_ma <- stats::filter(delhiClimate$wind_speed, filter = rep(1/30, 30), sides = 2)
# calculate 30-day moving average for mean pressure column
delhiClimate$pressure_ma <- stats::filter(delhiClimate$meanpressure, filter = rep(1/30, 30), sides = 2)
ggplot(data = delhiClimate) +
geom_line(aes(x = date, y = temp_ma, color = factor(year(date)), group = year(date)),
na.rm = TRUE, linewidth = 1) +
labs(x = "Year", y = "Mean Temperature (°C)", title = "30 Day Moving Average of Mean Temperature") +
theme_minimal() +
theme(legend.position = "none")
As you can see from the plot, there is a similar trend from year to year, with higher temperatures in the summer months and lower temperatures in the winter months.
ggplot(data = delhiClimate) +
geom_line(aes(x = date, y = humidity_ma, color = factor(year(date)), group = year(date)),
na.rm = TRUE, linewidth = 1) +
labs(x = "Year", y = "Humidity", title = "30 Day Moving Average of Humidity") +
theme_minimal() +
theme(legend.position = "none")
As you can see from the above plot, there is a slight trend in humidity levels from year to year. The humidity is lower in the spring and higher in the summer and winter months.
ggplot(data = delhiClimate) +
geom_line(aes(x = date, y = windspeed_ma, color = factor(year(date)), group = year(date)),
na.rm = TRUE, linewidth = 1) +
labs(x = "Year", y = "Wind Speed (kmph)", title = "30 Day Moving Average of Wind Speed") +
theme_minimal() +
theme(legend.position = "none")
From the above plot, we can see that there is a slight trend in wind speed from year-to-year. Wind speed tends to be highest in the middle of the year (June).
ggplot(data = delhiClimate) +
geom_line(aes(x = date, y = pressure_ma, color = factor(year(date)), group = year(date)),
na.rm = TRUE, linewidth = 1) +
labs(x = "Year", y = "Mean Pressure (atm)", title = "30 Day Moving Average of Mean Pressure") +
theme_minimal() +
theme(legend.position = "none")
As you can see, it seems that there was a trend in the years 2013-2015, but in 2016 there was an unsual spike and decline in mean pressure levels.
From the analysis, we can conclude that there is a trend from year-to-year in regards to mean temperature, humidity, and wind speed. However, we cannot conclude that there is a trend from year-to-year in regards to mean pressure, since there was an unusual spike and decline in 2016 which we are not able to explain with the data provided.
Title: Daily Climate Time Series Data
Author/Uploader: Sumanthvrao
URL: https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data
Title: What is a Trend?
Publisher/Website: TradeStation
URL: https://help.tradestation.com/10_00/eng/tradestationhelp/data/what_trend.html
Title: Moving Average Trading Signal - Stocks
Publisher/Website: Fidelity
URL: https://www.fidelity.com/viewpoints/active-investor/moving-averages#:~:text=A%20longer%20moving%20average%2C%20such,to%20assess%20short%2Dterm%20patterns.