***This document will be updated intermittently with minor improvements.
The data is taken from data.gov.my, a public open data source managed by the Department of Statistics Malaysia (DOSM).
It is derived based on births registered with JPN. Accordingly, if a birth is not registered with JPN (for instance, if a foreigner chooses to register their child in their home country, or if a resident in a remote area does not register the birth of their child), it will not count in this dataset.
This dataset tabulates births by date of birth, rather than by the date of registration with JPN. Therefore, to ensure accuracy, the data is provided with a 1-month lag, since most people do not register their child on the exact day they are born.
The data runs from 1 January 1920 to 31 July 2023, giving 37,833 daily samples (rows).
#Load the data
file_path <- "C:\\Users\\Qandiyas Qassem\\Downloads\\my pet project\\births.csv"
your_data <- read.csv(file_path)
# Display the result
print(head(your_data))
## date births
## 1 1/1/1920 96
## 2 2/1/1920 115
## 3 3/1/1920 111
## 4 4/1/1920 101
## 5 5/1/1920 95
## 6 6/1/1920 91
#Check data type
str(your_data)
## 'data.frame': 37833 obs. of 2 variables:
## $ date : chr "1/1/1920" "2/1/1920" "3/1/1920" "4/1/1920" ...
## $ births: int 96 115 111 101 95 91 85 83 96 123 ...
# Check the range of the data
# Note: these are the smallest and largest daily birth counts, not dates
min_births <- min(your_data$births)
max_births <- max(your_data$births)
# print(c(min = min_births, max = max_births))
print(min_births)
## [1] 43
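Because the date column is read as character strings, a date range needs the strings parsed first. The sketch below is a minimal illustration using only the six rows printed by head() above; the name sample_data and the assumption that the strings are in day/month/year order (suggested by the first number increasing row by row) are mine, not part of the original analysis.

```r
# Six rows copied from the head() output above (not the full dataset)
sample_data <- data.frame(
  date = c("1/1/1920", "2/1/1920", "3/1/1920",
           "4/1/1920", "5/1/1920", "6/1/1920"),
  births = c(96L, 115L, 111L, 101L, 95L, 91L)
)

# Assumption: dates are day/month/year
sample_data$date <- as.Date(sample_data$date, format = "%d/%m/%Y")

# Earliest and latest date in the sample
date_range <- range(sample_data$date)
print(date_range)

# TRUE when every row is exactly one day after the previous one
all_consecutive <- all(diff(sample_data$date) == 1)
print(all_consecutive)

# Number of calendar days from 1 January 1920 to 31 July 2023 inclusive
expected_rows <- length(seq(as.Date("1920-01-01"),
                            as.Date("2023-07-31"), by = "day"))
print(expected_rows)
```

Applied to the full data frame, the same steps would give the true date range, and comparing nrow(your_data) with expected_rows is a quick way to check for missing days.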
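# start is c(year, period within the year); a third element would be ignored
# frequency = 365 treats every year as 365 days, so leap days make the
# time index drift slightly from the calendar
births_ts <- ts(your_data$births, frequency = 365, start = c(1920, 1))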
#births_ts
plot.ts(births_ts)
# Transforming the series
logs_birth_ts <- log(births_ts)
plot.ts(logs_birth_ts)
#Check what kind of data used
class(your_data)#data.frame
## [1] "data.frame"
class(births_ts) # ts
## [1] "ts"
The plots show that daily births in Malaysia are seasonal, and that the size of the fluctuations is not constant over time.
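One common reason for taking logs is that fluctuations which grow with the level of a series become roughly constant in size on the log scale. The sketch below illustrates this on a purely synthetic series; the growth rate, the 20% seasonal swing, and all variable names are assumptions chosen for illustration and are not estimated from the births data.

```r
f <- 365                      # observations per "year"
t <- seq_len(10 * f)          # ten synthetic years

level  <- 100 * exp(0.0005 * t)               # smoothly growing level
season <- 1 + 0.2 * sin(2 * pi * t / f)       # multiplicative seasonal factor
x <- level * season                           # swings scale with the level

first_year <- 1:f
last_year  <- (9 * f + 1):(10 * f)
swing <- function(v) diff(range(v))           # peak-to-trough size

# Deviation from the level on the raw scale and on the log scale
raw_dev <- x - level
log_dev <- log(x) - log(level)

raw_swing <- c(first = swing(raw_dev[first_year]),
               last  = swing(raw_dev[last_year]))
log_swing <- c(first = swing(log_dev[first_year]),
               last  = swing(log_dev[last_year]))

print(raw_swing)   # the last year swings more than the first
print(log_swing)   # the two years swing by the same amount
```

Whether the births series behaves this way is something the log plot above has to show; the sketch only explains what to look for.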
## Exploratory Data Analysis (EDA)
library(tseries)
## Warning: package 'tseries' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
adf.test(births_ts)
## Warning in adf.test(births_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: births_ts
## Dickey-Fuller = -6.0082, Lag order = 33, p-value = 0.01
## alternative hypothesis: stationary
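The null hypothesis of the ADF test is that the time series has a unit root and is non-stationary.
If p < 0.05, we reject the null hypothesis in favour of the alternative that the series is stationary.
Here the printed p-value is 0.01 (and the warning says the true value is smaller), so we reject the unit-root null at the 5% level. Note that adf.test() includes a constant and a linear trend in its regression, so this result indicates stationarity around a trend rather than around a constant mean; the trend and seasonality examined below still matter for modelling.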
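To get a feel for how the test behaves, it can help to run it on series whose properties are known. The following is a minimal sketch on simulated data, assuming the tseries package is installed; the seed, the series length, and the variable names are arbitrary choices for illustration.

```r
library(tseries)

set.seed(42)
n <- 500

white_noise <- rnorm(n)          # stationary by construction
random_walk <- cumsum(rnorm(n))  # has a unit root by construction

# adf.test() warns when the p-value falls outside its lookup table,
# so the warnings are suppressed here to keep the output short
p_noise <- suppressWarnings(adf.test(white_noise))$p.value
p_walk  <- suppressWarnings(adf.test(random_walk))$p.value

cat("White noise p-value:", p_noise, "\n")
cat("Random walk p-value:", p_walk, "\n")
```

A small p-value for the white noise and a larger one for the random walk is the pattern one would typically expect, though any single simulated draw can differ.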
The purpose of decompose() is to separate a time series into seasonal, trend and irregular (random) components.
The function returns a list object containing the estimated seasonal, trend and irregular components.
# Decomposing Time Series
# Separate the series into seasonal, trend and irregular components
library("TTR")
## Warning: package 'TTR' was built under R version 4.3.2
# Decomposing Seasonal Data
#To estimate the trend, seasonal and irregular components of this time series
births_comp <- decompose(births_ts)
print(head(births_comp$seasonal))
## [1] 14.51728 26.42413 52.41870 61.01732 59.16622 38.18981
# Plot the estimated trend, seasonal, and irregular components of the time series
plot(births_comp)
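The trend plot shows the underlying trend of the data. We can see a positive trend, suggesting that daily births have increased over the roughly 104 years covered by the data.
The seasonal plot shows the pattern that repeats at a regular interval. It indicates strong seasonality that persists regardless of individual events. Note that decompose() estimates one average seasonal pattern and repeats it for every period, so this panel cannot show changes in seasonality over time.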
The random plot shows the residuals of the time series after the trend and seasonal parts are removed.
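As a rough illustration of what decompose() returns, the sketch below runs it on a small synthetic series and checks that, for the default additive model, the three components add back up to the original values. The synthetic series, its parameters, and the names x, comp and rebuilt are assumptions for illustration, not the births data.

```r
set.seed(1)
f <- 365                 # same frequency as births_ts
n <- 4 * f               # decompose() needs at least two full periods
t <- seq_len(n)

# Synthetic series: linear trend + yearly sine wave + noise
x <- ts(100 + 0.05 * t + 10 * sin(2 * pi * t / f) + rnorm(n, sd = 2),
        frequency = f)

comp <- decompose(x)     # additive decomposition by default
print(names(comp))

# The trend is a centred moving average, so it is NA near both ends
ok <- !is.na(comp$trend)

# Additive model: observed = trend + seasonal + random
rebuilt <- comp$trend + comp$seasonal + comp$random
print(all.equal(as.numeric(x[ok]), as.numeric(rebuilt[ok])))

# One seasonal value per position in the period
print(length(comp$figure))
```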
To obtain a seasonally adjusted time series, subtract the estimated seasonal component from the original time series.
births_ts_seasonallyadjusted <- births_ts - births_comp$seasonal
plot(births_ts_seasonallyadjusted)
To remove the trend from the dataset, simply subtract the estimated trend from the original time series.
births_ts_detrended <- births_ts - births_comp$trend
plot(births_ts_detrended)
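One practical detail of these subtractions is that the trend from decompose() is a centred moving average, so it is undefined near the start and end of the series, and the detrended series inherits those gaps. The sketch below shows this on a synthetic series with the same frequency; the series and all names in it are assumptions for illustration.

```r
set.seed(2)
f <- 365
n <- 4 * f
t <- seq_len(n)

# Synthetic series: linear trend + yearly sine wave + noise
x <- ts(50 + 0.02 * t + 5 * sin(2 * pi * t / f) + rnorm(n),
        frequency = f)
comp <- decompose(x)

adjusted  <- x - comp$seasonal   # seasonally adjusted
detrended <- x - comp$trend      # detrended

# The seasonal component is defined everywhere, the trend is not
n_na_adjusted  <- sum(is.na(adjusted))
n_na_detrended <- sum(is.na(detrended))
print(n_na_adjusted)
print(n_na_detrended)            # 2 * (f %/% 2) values are missing
```

With a frequency of 365 this means about half a year is lost at each end of the detrended series, which is worth keeping in mind when reading its plot.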
options(max.print = 5)
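# Forecast using Holt-Winters; with beta = FALSE and gamma = FALSE
# this reduces to simple exponential smoothing (level only)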
births_forecasts <- HoltWinters(births_ts, beta=FALSE, gamma=FALSE)
print(births_forecasts)
## Holt-Winters exponential smoothing without trend and without seasonal component.
##
## Call:
## HoltWinters(x = births_ts, beta = FALSE, gamma = FALSE)
##
## Smoothing parameters:
## alpha: 0.1072262
## beta : FALSE
## gamma: FALSE
##
## Coefficients:
## [,1]
## a 1127.321
print(births_forecasts$fitted)
## Time Series:
## Start = c(1920, 2)
## End = c(2023, 238)
## Frequency = 365
## xhat level
## 1920.003 96.00000 96.00000
## 1920.005 98.03730 98.03730
## [ reached getOption("max.print") -- omitted 37830 rows ]
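The first fitted values printed above can be reproduced by hand, which makes the role of alpha clearer. The sketch below is a minimal illustration that applies the simple exponential smoothing recursion to the six births values shown by head() earlier, using the alpha reported in the output; the variable names are mine, and the comparison with HoltWinters() uses only those six values, not the full series.

```r
births <- c(96, 115, 111, 101, 95, 91)   # first six rows of the data
alpha  <- 0.1072262                      # value reported by HoltWinters()

n <- length(births)
level <- numeric(n)
level[1] <- births[1]                    # the level starts at the first value

# New level = weighted average of the new observation and the old level
for (t in 2:n) {
  level[t] <- alpha * births[t] + (1 - alpha) * level[t - 1]
}

# The fitted value at time t is the level after time t - 1
fitted_manual <- level[-n]
print(round(fitted_manual[1:2], 4))      # 96.0000 98.0373, as in the output

# Same recursion through HoltWinters() with alpha fixed rather than estimated
hw <- HoltWinters(ts(births), alpha = alpha, beta = FALSE, gamma = FALSE)
print(all.equal(as.numeric(hw$fitted[, "xhat"]), fitted_manual))
```

Because alpha is close to 0.1, each new observation moves the level by only about a tenth of the gap between the observation and the previous level, which is why the fitted line is much smoother than the data.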
plot(births_forecasts)
## Warning: package 'forecast' was built under R version 4.3.2