***This document will receive intermittent minor updates from time to time.

Data Overview

The data is taken from data.gov.my, a public open data source controlled by the Department of Statistics Malaysia (DOSM).

It is derived from births registered with JPN (Jabatan Pendaftaran Negara, the National Registration Department). Accordingly, if a birth is not registered with JPN (for instance, if a foreigner chooses to register their child in their home country, or if a resident in a remote area does not register the birth of their child), it is not counted in this dataset.

This dataset tabulates births by date of birth rather than by date of registration with JPN. Therefore, to ensure accuracy, the data is published with a one-month lag, since most people do not register their child on the exact day of birth.

The data covers 1st January 1920 to 31st July 2023, for a total of 37833 samples (rows).
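
As a quick sanity check on the row count, the number of calendar days in the stated period can be computed directly. This is only a sketch: it assumes the series has exactly one row per day with no gaps, and the variable names are illustrative. The result agrees with the 37833 observations that `str()` reports.

```r
# Expected number of daily rows between two dates, counting both endpoints
first_day <- as.Date("1920-01-01")
last_day  <- as.Date("2023-07-31")

expected_rows <- as.integer(last_day - first_day) + 1L
print(expected_rows)  # 37833
```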

#Load the data
file_path <- "C:\\Users\\Qandiyas Qassem\\Downloads\\my pet project\\births.csv"
your_data <- read.csv(file_path)

# Display the result
print(head(your_data))
##       date births
## 1 1/1/1920     96
## 2 2/1/1920    115
## 3 3/1/1920    111
## 4 4/1/1920    101
## 5 5/1/1920     95
## 6 6/1/1920     91
#Check data type
str(your_data)
## 'data.frame':    37833 obs. of  2 variables:
##  $ date  : chr  "1/1/1920" "2/1/1920" "3/1/1920" "4/1/1920" ...
##  $ births: int  96 115 111 101 95 91 85 83 96 123 ...
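
Since `str()` shows the `date` column is stored as character, it may be worth converting it to `Date` before checking the date range. The sketch below uses a small hypothetical sample that mirrors the first rows printed above, and it assumes the format is day/month/year (the first field increases by one per row, which suggests it is the day).

```r
# Hypothetical sample mirroring the first rows of the date column
sample_dates <- c("1/1/1920", "2/1/1920", "3/1/1920")

# Assumption: the strings are day/month/year
parsed <- as.Date(sample_dates, format = "%d/%m/%Y")
print(range(parsed))

# On the full data frame the same idea would look like:
# your_data$date <- as.Date(your_data$date, format = "%d/%m/%Y")
# range(your_data$date)
```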
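# Check the range of daily births
# (the date column is still character, so min/max are taken on births here)
min_births <- min(your_data$births)
max_births <- max(your_data$births)
#print(c("Min births:", min_births, "Max births:", max_births))

ts_start <- min_births
ts_end <- max_births
print(ts_start)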
## [1] 43
# start is c(year, position within the year); ts() only uses the first two values
births_ts <- ts(your_data$births, frequency=365, start=c(1920,1))
#births_ts
plot.ts(births_ts)
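
One consequence of `frequency=365` is worth keeping in mind: `ts()` counts positions, not calendar dates, so every leap day pushes the index one step ahead of the calendar. The sketch below is illustrative arithmetic only (the placeholder series of zeros stands in for the real data). It shows that a 37833-point series starting at `c(1920, 1)` ends at position 238 of 2023, even though 31 July is day 212 of that year.

```r
n_obs <- 37833   # number of daily observations reported by str()
freq  <- 365     # frequency passed to ts()

# Position of the last observation in ts() terms
offset   <- n_obs - 1
end_year <- 1920 + offset %/% freq
end_pos  <- offset %% freq + 1

# Calendar day-of-year of the last observation
day_of_year <- as.integer(format(as.Date("2023-07-31"), "%j"))

# Cross-check against ts() itself, using zeros as a placeholder series
placeholder <- ts(numeric(n_obs), frequency = freq, start = c(1920, 1))
print(end(placeholder))
print(c(end_year = end_year, end_pos = end_pos, drift = end_pos - day_of_year))
```

The drift of 26 positions equals the number of leap days between 1920 and 2020, so the axis labels on the plots are approximate rather than exact calendar years.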

# Transforming the series
logs_birth_ts <- log(births_ts)
plot.ts(logs_birth_ts)
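
A log transform is commonly used when the size of the seasonal swings grows with the level of the series. The sketch below uses purely synthetic data (not the births data) with a multiplicative seasonal pattern, to illustrate that the within-cycle swing grows on the raw scale but stays constant on the log scale.

```r
# Synthetic monthly-style series: growing level times a seasonal factor
tt     <- 1:48
level  <- 100 * 1.02^tt
season <- 1 + 0.3 * sin(2 * pi * tt / 12)
x      <- ts(level * season, frequency = 12)

# Swing (max minus min) within each of the four 12-step cycles
swing <- function(v) {
  tapply(v, rep(1:4, each = 12), function(z) diff(range(z)))
}

raw_swing <- swing(as.numeric(x))       # increases from cycle to cycle
log_swing <- swing(as.numeric(log(x)))  # the same in every cycle

print(round(raw_swing, 2))
print(round(log_swing, 4))
```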

# Check the class of each object
class(your_data)  # data.frame
## [1] "data.frame"
class(births_ts)  # ts
## [1] "ts"

The plots suggest that the time series of daily births in Malaysia is seasonal, and that the fluctuations are far from constant in size over time.
## Exploratory Data Analysis (EDA)

Augmented Dickey-Fuller Test

library(tseries)
## Warning: package 'tseries' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
adf.test(births_ts)
## Warning in adf.test(births_ts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  births_ts
## Dickey-Fuller = -6.0082, Lag order = 33, p-value = 0.01
## alternative hypothesis: stationary

The null hypothesis of the ADF test is that the time series has a unit root and is non-stationary.

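If p < 0.05, we reject the null hypothesis and conclude that the series is stationary.

Here the reported p-value is 0.01 (and the warning indicates the true value is smaller), so we reject the null hypothesis and treat the series as stationary for time series modelling. Note that adf.test() includes a constant and a linear trend in its test regression, so this result means stationarity around a trend.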
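
To make the decision rule concrete, the sketch below runs the same test on two synthetic series: white noise (stationary by construction) and a random walk (which has a unit root). It assumes the `tseries` package is installed, as in the analysis above. The data and variable names are illustrative only.

```r
library(tseries)  # assumed installed, as used above

set.seed(42)
n <- 500
white_noise <- rnorm(n)          # stationary by construction
random_walk <- cumsum(rnorm(n))  # has a unit root by construction

# suppressWarnings() hides the "p-value smaller than printed" notice
p_noise <- suppressWarnings(adf.test(white_noise)$p.value)
p_walk  <- suppressWarnings(adf.test(random_walk)$p.value)

alpha <- 0.05
cat("white noise: p =", p_noise, "-> reject unit root:", p_noise < alpha, "\n")
cat("random walk: p =", p_walk,  "-> reject unit root:", p_walk  < alpha, "\n")
```

For white noise the null is rejected. For a random walk the test typically fails to reject, although any single simulated series can behave differently.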

Decomposing Seasonal Data

The purpose of decompose() is to separate a time series into seasonal, trend and irregular (random) components.

This function returns a list object containing the estimates of the seasonal, trend and irregular components.
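
To show what that list contains, the sketch below decomposes R's built-in `co2` dataset (used here only as a stand-in for the births series) and checks that, for the default additive model, the three components add back up to the original series wherever the trend is defined.

```r
# Built-in monthly series from the datasets package, used as a stand-in
comp <- decompose(co2)

# Elements of the returned list
print(names(comp))

# Additive model: x = trend + seasonal + random (trend is NA at both ends)
rebuilt <- comp$trend + comp$seasonal + comp$random
defined <- !is.na(comp$trend)
print(all.equal(as.numeric(co2)[defined], as.numeric(rebuilt)[defined]))

# The seasonal figure is centred, so it sums to (numerically) zero
print(abs(sum(comp$figure)) < 1e-8)
```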

# Decompose the time series into seasonal, trend and irregular components
library("TTR")
## Warning: package 'TTR' was built under R version 4.3.2
# Estimate the trend, seasonal and irregular components of this time series
births_comp <- decompose(births_ts)
print(head(births_comp$seasonal))
## [1] 14.51728 26.42413 52.41870 61.01732 59.16622 38.18981
# Plot the estimated trend, seasonal, and irregular components of the time series
plot(births_comp)

The trend plot shows the underlying trend of the data. We can see a positive trend, suggesting that daily births increased over the roughly 104 years covered by the data.

The seasonal plot shows patterns that repeat at a regular interval. It shows strong seasonality that persists regardless of any particular event.

The random plot shows the residuals of the time series after the trend and seasonal parts are removed.

Results

Seasonally adjusted

Remove the seasonal component from the original time series to obtain a seasonally adjusted time series.

births_ts_seasonallyadjusted <- births_ts - births_comp$seasonal
plot(births_ts_seasonallyadjusted)

Detrended Adjustment

To remove the trend from the dataset, simply subtract the trend component from the original time series.

births_ts_detrended <- births_ts - births_comp$trend
plot(births_ts_detrended)
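
It can help to know how the trend being subtracted is estimated. For an odd frequency, `decompose()` uses a centred moving average spanning one full period, so the trend (and therefore the detrended series) is `NA` for half a period at each end. With `frequency=365` that is 182 days on each side. The sketch below illustrates this on a small synthetic series with a period of 7, where all names and values are made up for illustration.

```r
set.seed(1)
# Synthetic series: linear trend + weekly pattern + noise, 10 cycles of length 7
weekly <- rep(c(3, 1, -2, 0, 2, -1, -3), 10)
x <- ts(10 + 0.5 * (1:70) + weekly + rnorm(70), frequency = 7)

comp <- decompose(x)

# Centred 7-point moving average, computed by hand
manual_trend <- stats::filter(x, rep(1 / 7, 7))

# Same values, with (7 - 1) / 2 = 3 missing points at each end
print(all.equal(as.numeric(comp$trend), as.numeric(manual_trend)))
n_missing <- sum(is.na(comp$trend))
print(n_missing)

detrended <- x - comp$trend
```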

Forecasting within Time Period

options(max.print = 5)
# Forecast using Holt-Winters (beta=FALSE and gamma=FALSE give simple exponential smoothing)
births_forecasts <- HoltWinters(births_ts, beta=FALSE, gamma=FALSE)
print(births_forecasts)
## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = births_ts, beta = FALSE, gamma = FALSE)
## 
## Smoothing parameters:
##  alpha: 0.1072262
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##       [,1]
## a 1127.321
print(births_forecasts$fitted)
## Time Series:
## Start = c(1920, 2) 
## End = c(2023, 238) 
## Frequency = 365 
##                xhat      level
## 1920.003   96.00000   96.00000
## 1920.005   98.03730   98.03730
##  [ reached getOption("max.print") -- omitted 37830 rows ]
plot(births_forecasts)
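
With `beta=FALSE` and `gamma=FALSE`, the fit reduces to simple exponential smoothing: each new level is `alpha` times the latest observation plus `(1 - alpha)` times the previous level, starting from the first observation. The sketch below applies that recursion by hand to the first six counts shown by `head()`, using the `alpha` reported above, and compares it with `HoltWinters()` run on the same six values with `alpha` fixed. It is a check on a tiny subset, not a refit of the full series.

```r
alpha <- 0.1072262                 # smoothing parameter reported above
x <- c(96, 115, 111, 101, 95, 91)  # first six daily counts from head()

# Manual recursion: level[1] = x[1]; level[t] = alpha*x[t] + (1-alpha)*level[t-1]
level <- numeric(length(x))
level[1] <- x[1]
for (t in 2:length(x)) {
  level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
}

# Same model via HoltWinters(), with alpha fixed rather than estimated
hw <- HoltWinters(ts(x), alpha = alpha, beta = FALSE, gamma = FALSE)

# The fitted value at time t is the level after time t - 1
print(round(level, 5))
print(hw$fitted)
```

The first two levels, 96 and 98.0373, match the first two rows of the fitted output above, and the last level plays the role of the coefficient `a`.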

## Warning: package 'forecast' was built under R version 4.3.2