WQD7004 Individual Assignment 1 (Sem2 2020/2021)

Introduction

Covid-19 was declared a pandemic on 11 March 2020 by the World Health Organization. This assignment aims to study the impact of covid-19 on the Malaysia stock market so that the investors can understand the market better. There are a total of over 900 stocks listed under Bursa Malaysia Stock Exchange, but I decide to pick only 30 stocks that constitute the FBM KLCI Index, and choose a one year sample data size for a period of 1-Apr-2020 to 31-March-2021.

Short description of the datasets

I used 3 separate datasets for this assignment. All the datasets were acquired from Yahoo Finance! website. The R codes I used for data gathering and data cleaning steps are described under the data preparation tab. I have uploaded all the datasets used in this assignment to my data.world account.

  1. FBM KLCI stocks list
There are 30 stocks in FBMKLCI index, the list is being reviewed from time to time. For the sake of this assignment, I have downloaded the latest stock constituents from Bursa Malaysia website as at (10-Apr-2021) and converted it to csv. I make use of the treemap package to plot a visualization based on the market capital of each stock. The codes used to generate this plot will be discussed under the Step 4 of Data Preparation tab.
List of FBMKLCI stocks
  1. One year movement of the Kuala Lumpur Composite Index (KLCI)
I downloaded this dataset from Yahoo Finance!, it includes the KLCI index over the period from 01/04/2020 to 31/03/2021, with a total of 245 rows, excluding public holidays. Some data cleaning were done to the csv file, and here is an overview of the overall market performance. The details of the data cleaning and plotting is mentioned under the Data Preparation tab.

Stock market performance from 01-04-2020 to 31-03-2021
  1. Historical stock prices
Due to the time constraint, I choose to study 2 groups of stocks out of 30 stocks from the Index. I will focus on two major sectors, 6 stocks from banking sector, and 3 stocks from healthcare equipment manufacturing sector. Here are the detailed stock lists:

  • Banking Sector
    • CIMB (1023): CIMB GROUP HOLDINGS BERHAD
    • HLBANK (5819): HONG LEONG BANK BERHAD
    • HLFG (1082): HONG LEONG FINANCIAL GROUP BERHAD
    • MAYBANK (1155): MALAYAN BANKING BERHAD
    • PBBANK (1295): PUBLIC BANK BERHAD
    • RHBBANK (1066): RHB BANK BERHAD
  • Healthcare Equipment Manufacturing Sector
    • HARTA (5168): HARTALEGA HOLDINGS BERHAD
    • SUPERMX (7106): SUPERMAX CORPORATION BERHAD
    • TOPGLOV (7113): TOP GLOVE CORPORATION BHD

I pulled the past year’s historical prices of the above stocks from a period of 1/Apr/2020 to 31/Mar/2021. I used R to download the prices of each stock from Yahoo Finance! website, merged and cleaned for further analysis. I will elaborate more on this process under Data Preparation tab.

Packages Info

The following packages are used in this assignment, for data preparation and analysis.

  • dplyr
    • to merge csv files: lapply and bind_rows
    • to convert the data type of dataframe columns: as.numeric and as.Date
  • readr
    • to read all csv files under a directory
    • Usage: list.files(path = “cleaned_files/banking/”, pattern = "*.csv", full.names = TRUE)
  • treemap
    • to draw a treemap
  • RColorBrewer
    • contains a ready-to-use color palettes for creating beautiful graphics (I used for treemap)
    • Usage: palette = brewer.pal(5,“Set1”)
  • data.table
    • this package is to summarize dataframe by group
  • lubridate
    • easier package to format Dates
    • Usage: ymd(healthcaredf$Date)
  • ggplot2
    • to create data visualizations
  • gridExtra
    • to arrange plots in a grid
    • Usage: grid.arrange(p1, p2, p3, p4, p5, p6, ncol=3)
  • scales
    • to tweak axis breaks and labels
    • Usage: scale_x_date(labels=date_format (“%b %y”), breaks=(“2 months”))
  • knitr
    • Usage: kable
    • to present the data frames nicely

Data Preparation

Step 1: Download Historical price files from Yahoo Finance!

To get the historical stock prices for the stocks mentioned under introduction tab, I performed web scraping using R, here are the sample codes. The historical prices of each stock is available on Yahoo Finance! website, i only need to modify 3 parameters and put it in a loop. The params period1 and period2 are from and to date which are Unix epoh time:

  • 1585699200 : Wednesday, April 1, 2020 12:00:00 AM (GMT)
  • 1617148800 : Wednesday, March 31, 2021 12:00:00 AM (GMT)
banking_stock_codes <- c('1023', '5819', '1082', '1155', '1295', '1066')
banking_stock_names <- c('CIMB', 'HLBANK', 'HLFG', 'MAYBANK', 'PBBANK', 'RHBBANK')
for (stk_code in banking_stock_codes){
  url <- paste("https://query1.finance.yahoo.com/v7/finance/download/", stk_code, ".KL?period1=1585699200&period2=1617148800&interval=1d&events=history&includeAdjustedClose=true",sep='');
  filename <- paste(stk_code, ".csv", sep='')
  download.file(url, filename)
}

healthcare_equi_stock_codes <- c('5168', '7106', '7113')
healthcare_equi_stock_names <- c('HARTA', 'SUPERMX', 'TOPGLOV')
for(stk_code in healthcare_equi_stock_codes){
  url <- paste("https://query1.finance.yahoo.com/v7/finance/download/", stk_code, ".KL?period1=1585699200&period2=1617148800&interval=1d&events=history&includeAdjustedClose=true",sep='');
  filename <- paste(stk_code, ".csv", sep='')
  download.file(url, filename)
}
Step 2: Data Manipulation: Add stock code and symbol code columns

The original dataset downloaded from Yahoo Finance! does not come with symbol code column, so I used cbind function to add two new columns to each csv file.

for(i in 1:length(banking_stock_codes)){
  original_filename <- paste(banking_stock_codes[i], ".csv", sep='')
  stock_df <- read.csv(file = original_filename)
  df_len <- nrow(stock_df)
  stock_codes_df <- data.frame(StockCode=rep(banking_stock_codes[i], df_len))
  symbol_df <- data.frame(Symbol=rep(banking_stock_names[i], df_len))
  stock_df <- cbind(stock_df, stock_codes_df, symbol_df)
  filename <- paste(banking_stock_codes[i], "_", banking_stock_names[i], ".csv", sep='')
  write.csv(stock_df, file = filename)
}
Step 3: Merge all the csv files into one master file

For this task, I used dplyr package for lapply and bind_rows functions. The readr package came very handy when i want to list all the files under one directory with list.files function.

library(dplyr)
library(readr)

data_all <- list.files(path = "cleaned_files/banking/",
                       pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows

write.csv(data_all, file = "banking_stocks_KLCI_historical_price.csv")
Step 4: Create a treemap of 30 FBMKLCI stocks based on market cap

The result of this step is being presented under the Introduction tab. I installed two packages: treemap and RColorBrewer for this task.

library(treemap)
library(RColorBrewer)

stocks_list <- read.csv(file="wqd7004assignment/fbm_klci_stocks.csv")
png(filename="treemap_fbmklci_mktcap.png",width=1400, height=800)
treemap(stocks_list,
        index="Symbol",
        vSize="Market.Cap.in.billion.",
        type="index",
        title="A treemap of 30 FBMKLCI stocks based on Market Cap (in billion)",
        palette = brewer.pal(5,"Set1"),
        fontsize.title=30,
        fontsize.labels = 16
)
dev.off()
Step 5: Clean the FBMKLCI Index price data file

The original file downloaded from Yahoo Finance! contains commas in the numbers. We need to transform the columns to numeric type first. For this task, I used dplyr package.

library(dplyr)
fbmklci_df <- read.csv(file = "wqd7004assignment/FBMKLCI_index_one_year_performance.csv")

fbmklci_df$Open <- as.numeric(gsub(",","",fbmklci_df$Open)) #remove commas and convert to numeric
fbmklci_df$High <- as.numeric(gsub(",","",fbmklci_df$High))
fbmklci_df$Low <- as.numeric(gsub(",","",fbmklci_df$Low))
fbmklci_df$Close <- as.numeric(gsub(",","",fbmklci_df$Close))
fbmklci_df$Date <- as.Date(fbmklci_df$Date , format = "%m/%d/%Y")

head(fbmklci_df)
Step 6: Plotting the FBMKLCI one year performance chart

I use ggplot2 to plot a line chart, the results is presented under the Introduction tab

library(ggplot2)
library(dplyr)
png(filename="line_chart_fbmklci.png")
p <- ggplot(fbmklci_df, aes(Date, Close, group = 1)) +
  geom_line(color="#69b3a2") +
  labs(x = "Date", y = "Close Price",
       title = "FBM KLCI Market Performance from 01-04-2020 to 31-03-2021")
p+scale_x_date(date_labels = "%m-%Y")
dev.off()

Data Analysis

As mentioned under the Introduction tab, I will be analysing the historical prices of two sectors separately. Each data set has a daily stock prices of each stock for a period from 01/04/2020 to 31/03/2021.

Banking Sector
I will start with analysing the 6 stocks under the banking sector.
Basic data preparations

I will perform the following operations:
* read from csv file to dataframe
* change the type of Date column to date format
* print out the structure of the dataframe
* print out the head of dataframe

library(lubridate)
bankingdf <- read.csv("banking_stocks_KLCI_historical_price.csv")
bankingdf$Date <- ymd(bankingdf$Date)
str(bankingdf)
## 'data.frame':    1458 obs. of  11 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X1       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date     : Date, format: "2020-04-01" "2020-04-02" ...
##  $ Open     : num  3.59 3.52 3.53 3.5 3.56 3.53 3.54 3.54 3.52 3.51 ...
##  $ High     : num  3.64 3.55 3.54 3.53 3.6 3.55 3.57 3.54 3.53 3.58 ...
##  $ Low      : num  3.46 3.48 3.48 3.46 3.53 3.5 3.52 3.5 3.49 3.5 ...
##  $ Close    : num  3.53 3.52 3.5 3.5 3.56 3.52 3.54 3.52 3.5 3.57 ...
##  $ Adj.Close: num  3.49 3.48 3.46 3.46 3.52 ...
##  $ Volume   : int  15962300 14289900 15139800 20801600 17923000 15139700 15025400 9066600 5813800 14146600 ...
##  $ StockCode: int  1023 1023 1023 1023 1023 1023 1023 1023 1023 1023 ...
##  $ Symbol   : chr  "CIMB" "CIMB" "CIMB" "CIMB" ...
head(bankingdf)
##   X X1       Date Open High  Low Close Adj.Close   Volume StockCode Symbol
## 1 1  1 2020-04-01 3.59 3.64 3.46  3.53  3.492100 15962300      1023   CIMB
## 2 2  2 2020-04-02 3.52 3.55 3.48  3.52  3.482207 14289900      1023   CIMB
## 3 3  3 2020-04-03 3.53 3.54 3.48  3.50  3.462422 15139800      1023   CIMB
## 4 4  4 2020-04-06 3.50 3.53 3.46  3.50  3.462422 20801600      1023   CIMB
## 5 5  5 2020-04-07 3.56 3.60 3.53  3.56  3.521778 17923000      1023   CIMB
## 6 6  6 2020-04-08 3.53 3.55 3.50  3.52  3.482207 15139700      1023   CIMB
Summarise data based on Price and Volume

Next, I will generate the 5-number summary of the data set group by Symbol. I am most interested in Close Price and Volume features. This task can be easily achieved by data.table package, then I saved them into a variable for subsequent usage:

library(data.table)
setDT(bankingdf)
closePriceSummary <- bankingdf[, as.list(summary(Close)), by = Symbol]
volumeSummary <- bankingdf[, as.list(summary(Volume)), by = Symbol]
Analysing Price and Volume movement

Let’s calculate the percentage change of the open prices based on the min. and max prices of each stock, and print out the result in ascending order of change percent.

library(knitr)
closePriceSummary$changePercent = ((closePriceSummary$Max. - closePriceSummary$Min) / closePriceSummary$Max) * 100
#print(closePriceSummary[order(changePercent)])
kable(closePriceSummary[order(changePercent)])
Symbol Min. 1st Qu. Median Mean 3rd Qu. Max. changePercent
MAYBANK 7.000 7.425 7.800 7.781852 8.14 8.91 21.43659
RHBBANK 4.230 4.690 4.950 4.985926 5.35 5.85 27.69231
HLBANK 13.140 14.540 15.000 15.853416 17.77 19.34 32.05791
PBBANK 2.932 3.180 3.468 3.591119 4.10 4.45 34.11236
HLFG 12.300 13.400 14.300 14.977695 16.83 18.68 34.15418
CIMB 2.900 3.410 3.630 3.701440 4.04 4.65 37.63441

From the above calculation, we can see that Maybank stock has the lowest change of 21.43% and CIMB has the highest change of 37.63%.

Next I am going to check the volume movement. I divided the max value by average volume to understand the spikes.

#calculate the max spike level of stock volumes 
volumeSummary <- bankingdf[, as.list(summary(Volume)), by = Symbol]
volumeSummary$Movement <- volumeSummary$Max. / volumeSummary$Mean
#print(volumeSummary[order(Movement)])
kable(volumeSummary[order(Movement)])
Symbol Min. 1st Qu. Median Mean 3rd Qu. Max. Movement
HLBANK 0 579900 948600 1064696.3 1461600 4298600 4.037395
MAYBANK 0 4176750 5987300 7000535.4 8755550 30397300 4.342139
RHBBANK 0 2392050 3648900 4223892.2 5337150 20068600 4.751210
HLFG 0 95900 181700 238316.5 290600 1405900 5.899299
PBBANK 0 12006000 17588500 24402942.8 29304750 223687500 9.166415
CIMB 0 8795150 12710600 15911152.7 19270950 152250200 9.568772

Based on this result, we can see that volume spike is 4 to 9 times above average trading volumes, with CIMB stock being traded 9.57 times above average at the peak time.

Plotting

I am going to plot 2 charts to further study the fluctuations in the price and volume. First, I will plot line charts for each stock:

library(dplyr)
library(ggplot2)
library(gridExtra)

banking_stock_names <- c('CIMB', 'HLBANK', 'HLFG', 'MAYBANK', 'PBBANK', 'RHBBANK')

plotLineChart = function (maindf, mSymbol) {
  p1 <- ggplot(filter(maindf, Symbol == mSymbol), mapping = aes(Date, Close, group = 1)) +
    geom_line(color="#69b3a2") +
    labs(x = "Date Range", y = "Close Price", 
         title = mSymbol)  
  p1 + theme(axis.text.x=element_text(angle = 90, hjust = 0))
  print(p1)
}

png(filename="banking_stock_prices.png", width=1400, height=800)
p1 <- plotLineChart(bankingdf, banking_stock_names[1])
p2 <- plotLineChart(bankingdf, banking_stock_names[2])
p3 <- plotLineChart(bankingdf, banking_stock_names[3])
p4 <- plotLineChart(bankingdf, banking_stock_names[4])
p5 <- plotLineChart(bankingdf, banking_stock_names[5])
p6 <- plotLineChart(bankingdf, banking_stock_names[6])
grid.arrange(p1, p2, p3, p4, p5, p6, ncol=3)
dev.off()
Price movement of banking stocks

Next, let’s plot histogram based on the daily trading volumes of each stock. I use scales package to scale the x axis into the interval of two months each.

#Function to plot histogram
require(scales)
plotHistogram = function (maindf, mSymbol){
  df <- filter(maindf, Symbol == mSymbol) 
  
  plot1 <- ggplot(df, aes(Date, Volume)) +
    geom_bar(stat="identity", na.rm = TRUE, colour = "steelblue") +
    ggtitle(mSymbol) +
    xlab("Date") + ylab("Volume") +
    scale_x_date(labels=date_format ("%b %y"), breaks=("2 months")) +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18)) + 
    theme(axis.text.x=element_text(angle = 90, hjust = 0))
  
  print(plot1)
}
Daily trading volume distribution of banking stocks
Summary

In summary, I think the stocks in banking sector is highly related to the covid-19 loan moratorium announcement by Bank Negara Malaysia for a period of 6 months from April to September 2020. All the banking stocks saw a sharp increase in price once the load moratorium has ended. As for the volume, we can see a lot of volume spike which is almost triple or quadruple of the usual trading volume around June and November 2020 for all banking stocks.

Healthcare Equipment Manufacturing Sector

I am going to analyse the past year historical prices of the three healthcare equipment stocks under FBMKLCI, which is also known as ‘glove stocks’ in layman term. The last year sees a breakthrough in the glove stocks prices due to the global demand during the covid-19 pandemic. Let’s try to study the trend.

Basic data preparations

Three separate csv files obtained from Yahoo Finance! has been merged into one master csv file, here I use lubridate package to convert the date string to Date data type. Let’s print out the head and structure of the final dataset.

library(lubridate)
healthcaredf <- read.csv("healthcare_stocks_KLCI_historical_price.csv")
healthcaredf$Date <- ymd(healthcaredf$Date)
head(healthcaredf)
##   X X1       Date Open High  Low Close Adj.Close  Volume StockCode Symbol
## 1 1  1 2020-04-01 6.88 6.89 6.74  6.75  6.661989 4733000      5168  HARTA
## 2 2  2 2020-04-02 7.10 7.10 6.79  6.80  6.711338 7211700      5168  HARTA
## 3 3  3 2020-04-03 7.00 7.02 6.87  6.90  6.810033 6286800      5168  HARTA
## 4 4  4 2020-04-06 6.93 7.10 6.90  7.04  6.948208 6742100      5168  HARTA
## 5 5  5 2020-04-07 7.10 7.20 7.07  7.15  7.056774 7225000      5168  HARTA
## 6 6  6 2020-04-08 7.20 7.39 7.16  7.27  7.175209 8316400      5168  HARTA
str(healthcaredf)
## 'data.frame':    729 obs. of  11 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X1       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date     : Date, format: "2020-04-01" "2020-04-02" ...
##  $ Open     : num  6.88 7.1 7 6.93 7.1 7.2 7.35 7.5 7.47 7.44 ...
##  $ High     : num  6.89 7.1 7.02 7.1 7.2 7.39 7.54 7.6 7.59 7.55 ...
##  $ Low      : num  6.74 6.79 6.87 6.9 7.07 7.16 7.34 7.44 7.42 7.43 ...
##  $ Close    : num  6.75 6.8 6.9 7.04 7.15 7.27 7.44 7.47 7.42 7.47 ...
##  $ Adj.Close: num  6.66 6.71 6.81 6.95 7.06 ...
##  $ Volume   : int  4733000 7211700 6286800 6742100 7225000 8316400 9007100 6342700 3714800 6004000 ...
##  $ StockCode: int  5168 5168 5168 5168 5168 5168 5168 5168 5168 5168 ...
##  $ Symbol   : chr  "HARTA" "HARTA" "HARTA" "HARTA" ...
Summarise data based on Price and Volume

Next, I will generate the 5-number summary of the data set group by Symbol. I am most interested in Close Price and Volume features, and saved them into a variable for subsequent usage:

library(data.table)
setDT(healthcaredf)
closePriceSummary <- healthcaredf[, as.list(summary(Close)), by = Symbol]
volumeSummary <- healthcaredf[, as.list(summary(Volume)), by = Symbol]
Analysing Price and Volume changes

In this step, I will do some calculations based on the summary dataframe achieved from the previous step. As for price, let’s calculate the percentage change based on 1st quarter average price and max stock price in a year. The minimum price doesn’t look very accurate to me, I have decided to calculate using first quarter average price.

#calculate the percentage change based on 1st quarter average price and max price
closePriceSummary$changePercent = ((closePriceSummary$Max. - closePriceSummary$`1st Qu.`) / closePriceSummary$Max) * 100
#order by
#print(closePriceSummary[order(changePercent)])
kable(closePriceSummary[order(changePercent)])
Symbol Min. 1st Qu. Median Mean 3rd Qu. Max. changePercent
TOPGLOV 2.126666 5.200 6.566666 6.362428 8.276666 9.60000 45.83333
HARTA 6.750000 10.540 12.780000 13.269300 16.480000 20.50000 48.58537
SUPERMX 0.777717 4.135 6.760000 6.254742 8.569567 11.63152 64.45006

We can spot some extraordinary rise for the healthcare equipment sector within one year. The prices had increased 45-65% due to the pandemic during it’s peak time.

#calculate the max spike level of stock volumes 
volumeSummary$Movement <- volumeSummary$Max. / volumeSummary$Mean
#print(volumeSummary[order(Movement)])
kable(volumeSummary[order(Movement)])
Symbol Min. 1st Qu. Median Mean 3rd Qu. Max. Movement
SUPERMX 0 21773730 39491706 46335683 65851132 180783231 3.901598
TOPGLOV 0 28439050 46084900 55746065 73653450 221265400 3.969166
HARTA 0 4869750 7399300 9088844 12148450 38642100 4.251597

The trading volume doesn’t seem to be as volatile as banking sector. But we can still observe at least 4 times above-average trading volume at it’s highest point.

Plotting

I have reused the plotting function mentioned under banking tab.

healthcare_stock_names <- c('HARTA', 'SUPERMX', 'TOPGLOV')

png(filename="healthcare_stock_prices.png", width=1400, height=400)
p1 <- plotLineChart(healthcaredf, healthcare_stock_names[1])
p2 <- plotLineChart(healthcaredf, healthcare_stock_names[2])
p3 <- plotLineChart(healthcaredf, healthcare_stock_names[3])
grid.arrange(p1, p2, p3, ncol=3)
dev.off()
Price movement of healthcare equipment stocks

These three stocks price movement is closely related, a very similar pattern for the whole sector.

Next, let’s plot out the trading volumes:

png(filename="healthcare_stock_volume.png", width=1400, height=400)
p1 <- plotHistogram(healthcaredf, healthcare_stock_names[1])
p2 <- plotHistogram(healthcaredf, healthcare_stock_names[2])
p3 <- plotHistogram(healthcaredf, healthcare_stock_names[3])
grid.arrange(p1, p2, p3, ncol=3)
dev.off()
Daily trading volume distribution of healthcare equipment stocks

The highest trading volumes are observed between August 2020 and September 2020.

Summary

From the charts, we can summarise that the prices of the golve stocks increased exponentially as the rise of covid 19 cased until third quarter of the year 2020. And the sharp decline in healthcare equipment stocks is closely followed by the first announcement of the covid-19 vaccine by Pfizer in November 2020.