Covid-19 was declared a pandemic on 11 March 2020 by the World Health Organization. This assignment aims to study the impact of covid-19 on the Malaysia stock market so that the investors can understand the market better. There are a total of over 900 stocks listed under Bursa Malaysia Stock Exchange, but I decide to pick only 30 stocks that constitute the FBM KLCI Index, and choose a one year sample data size for a period of 1-Apr-2020 to 31-March-2021.
I used 3 separate datasets for this assignment. All the datasets were acquired from Yahoo Finance! website. The R codes I used for data gathering and data cleaning steps are described under the data preparation tab. I have uploaded all the datasets used in this assignment to my data.world account.
I pulled the past year’s historical prices of the above stocks from a period of 1/Apr/2020 to 31/Mar/2021. I used R to download the prices of each stock from Yahoo Finance! website, merged and cleaned for further analysis. I will elaborate more on this process under Data Preparation tab.
The following packages are used in this assignment, for data preparation and analysis.
To get the historical stock prices for the stocks mentioned under introduction tab, I performed web scraping using R, here are the sample codes. The historical prices of each stock is available on Yahoo Finance! website, i only need to modify 3 parameters and put it in a loop. The params period1 and period2 are from and to date which are Unix epoh time:
banking_stock_codes <- c('1023', '5819', '1082', '1155', '1295', '1066')
banking_stock_names <- c('CIMB', 'HLBANK', 'HLFG', 'MAYBANK', 'PBBANK', 'RHBBANK')
for (stk_code in banking_stock_codes){
url <- paste("https://query1.finance.yahoo.com/v7/finance/download/", stk_code, ".KL?period1=1585699200&period2=1617148800&interval=1d&events=history&includeAdjustedClose=true",sep='');
filename <- paste(stk_code, ".csv", sep='')
download.file(url, filename)
}
healthcare_equi_stock_codes <- c('5168', '7106', '7113')
healthcare_equi_stock_names <- c('HARTA', 'SUPERMX', 'TOPGLOV')
for(stk_code in healthcare_equi_stock_codes){
url <- paste("https://query1.finance.yahoo.com/v7/finance/download/", stk_code, ".KL?period1=1585699200&period2=1617148800&interval=1d&events=history&includeAdjustedClose=true",sep='');
filename <- paste(stk_code, ".csv", sep='')
download.file(url, filename)
}
The original dataset downloaded from Yahoo Finance! does not come with symbol code column, so I used cbind function to add two new columns to each csv file.
for(i in 1:length(banking_stock_codes)){
original_filename <- paste(banking_stock_codes[i], ".csv", sep='')
stock_df <- read.csv(file = original_filename)
df_len <- nrow(stock_df)
stock_codes_df <- data.frame(StockCode=rep(banking_stock_codes[i], df_len))
symbol_df <- data.frame(Symbol=rep(banking_stock_names[i], df_len))
stock_df <- cbind(stock_df, stock_codes_df, symbol_df)
filename <- paste(banking_stock_codes[i], "_", banking_stock_names[i], ".csv", sep='')
write.csv(stock_df, file = filename)
}
For this task, I used dplyr package for lapply and bind_rows functions. The readr package came very handy when i want to list all the files under one directory with list.files function.
library(dplyr)
library(readr)
data_all <- list.files(path = "cleaned_files/banking/",
pattern = "*.csv", full.names = TRUE) %>%
lapply(read_csv) %>%
bind_rows
write.csv(data_all, file = "banking_stocks_KLCI_historical_price.csv")
The result of this step is being presented under the Introduction tab. I installed two packages: treemap and RColorBrewer for this task.
library(treemap)
library(RColorBrewer)
stocks_list <- read.csv(file="wqd7004assignment/fbm_klci_stocks.csv")
png(filename="treemap_fbmklci_mktcap.png",width=1400, height=800)
treemap(stocks_list,
index="Symbol",
vSize="Market.Cap.in.billion.",
type="index",
title="A treemap of 30 FBMKLCI stocks based on Market Cap (in billion)",
palette = brewer.pal(5,"Set1"),
fontsize.title=30,
fontsize.labels = 16
)
dev.off()
The original file downloaded from Yahoo Finance! contains commas in the numbers. We need to transform the columns to numeric type first. For this task, I used dplyr package.
library(dplyr)
fbmklci_df <- read.csv(file = "wqd7004assignment/FBMKLCI_index_one_year_performance.csv")
fbmklci_df$Open <- as.numeric(gsub(",","",fbmklci_df$Open)) #remove commas and convert to numeric
fbmklci_df$High <- as.numeric(gsub(",","",fbmklci_df$High))
fbmklci_df$Low <- as.numeric(gsub(",","",fbmklci_df$Low))
fbmklci_df$Close <- as.numeric(gsub(",","",fbmklci_df$Close))
fbmklci_df$Date <- as.Date(fbmklci_df$Date , format = "%m/%d/%Y")
head(fbmklci_df)
I use ggplot2 to plot a line chart, the results is presented under the Introduction tab
library(ggplot2)
library(dplyr)
png(filename="line_chart_fbmklci.png")
p <- ggplot(fbmklci_df, aes(Date, Close, group = 1)) +
geom_line(color="#69b3a2") +
labs(x = "Date", y = "Close Price",
title = "FBM KLCI Market Performance from 01-04-2020 to 31-03-2021")
p+scale_x_date(date_labels = "%m-%Y")
dev.off()
As mentioned under the Introduction tab, I will be analysing the historical prices of two sectors separately. Each data set has a daily stock prices of each stock for a period from 01/04/2020 to 31/03/2021.
I will perform the following operations:
* read from csv file to dataframe
* change the type of Date column to date format
* print out the structure of the dataframe
* print out the head of dataframe
library(lubridate)
bankingdf <- read.csv("banking_stocks_KLCI_historical_price.csv")
bankingdf$Date <- ymd(bankingdf$Date)
str(bankingdf)
## 'data.frame': 1458 obs. of 11 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ X1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Date : Date, format: "2020-04-01" "2020-04-02" ...
## $ Open : num 3.59 3.52 3.53 3.5 3.56 3.53 3.54 3.54 3.52 3.51 ...
## $ High : num 3.64 3.55 3.54 3.53 3.6 3.55 3.57 3.54 3.53 3.58 ...
## $ Low : num 3.46 3.48 3.48 3.46 3.53 3.5 3.52 3.5 3.49 3.5 ...
## $ Close : num 3.53 3.52 3.5 3.5 3.56 3.52 3.54 3.52 3.5 3.57 ...
## $ Adj.Close: num 3.49 3.48 3.46 3.46 3.52 ...
## $ Volume : int 15962300 14289900 15139800 20801600 17923000 15139700 15025400 9066600 5813800 14146600 ...
## $ StockCode: int 1023 1023 1023 1023 1023 1023 1023 1023 1023 1023 ...
## $ Symbol : chr "CIMB" "CIMB" "CIMB" "CIMB" ...
head(bankingdf)
## X X1 Date Open High Low Close Adj.Close Volume StockCode Symbol
## 1 1 1 2020-04-01 3.59 3.64 3.46 3.53 3.492100 15962300 1023 CIMB
## 2 2 2 2020-04-02 3.52 3.55 3.48 3.52 3.482207 14289900 1023 CIMB
## 3 3 3 2020-04-03 3.53 3.54 3.48 3.50 3.462422 15139800 1023 CIMB
## 4 4 4 2020-04-06 3.50 3.53 3.46 3.50 3.462422 20801600 1023 CIMB
## 5 5 5 2020-04-07 3.56 3.60 3.53 3.56 3.521778 17923000 1023 CIMB
## 6 6 6 2020-04-08 3.53 3.55 3.50 3.52 3.482207 15139700 1023 CIMB
Next, I will generate the 5-number summary of the data set group by Symbol. I am most interested in Close Price and Volume features. This task can be easily achieved by data.table package, then I saved them into a variable for subsequent usage:
library(data.table)
setDT(bankingdf)
closePriceSummary <- bankingdf[, as.list(summary(Close)), by = Symbol]
volumeSummary <- bankingdf[, as.list(summary(Volume)), by = Symbol]
Let’s calculate the percentage change of the open prices based on the min. and max prices of each stock, and print out the result in ascending order of change percent.
library(knitr)
closePriceSummary$changePercent = ((closePriceSummary$Max. - closePriceSummary$Min) / closePriceSummary$Max) * 100
#print(closePriceSummary[order(changePercent)])
kable(closePriceSummary[order(changePercent)])
| Symbol | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | changePercent |
|---|---|---|---|---|---|---|---|
| MAYBANK | 7.000 | 7.425 | 7.800 | 7.781852 | 8.14 | 8.91 | 21.43659 |
| RHBBANK | 4.230 | 4.690 | 4.950 | 4.985926 | 5.35 | 5.85 | 27.69231 |
| HLBANK | 13.140 | 14.540 | 15.000 | 15.853416 | 17.77 | 19.34 | 32.05791 |
| PBBANK | 2.932 | 3.180 | 3.468 | 3.591119 | 4.10 | 4.45 | 34.11236 |
| HLFG | 12.300 | 13.400 | 14.300 | 14.977695 | 16.83 | 18.68 | 34.15418 |
| CIMB | 2.900 | 3.410 | 3.630 | 3.701440 | 4.04 | 4.65 | 37.63441 |
From the above calculation, we can see that Maybank stock has the lowest change of 21.43% and CIMB has the highest change of 37.63%.
Next I am going to check the volume movement. I divided the max value by average volume to understand the spikes.
#calculate the max spike level of stock volumes
volumeSummary <- bankingdf[, as.list(summary(Volume)), by = Symbol]
volumeSummary$Movement <- volumeSummary$Max. / volumeSummary$Mean
#print(volumeSummary[order(Movement)])
kable(volumeSummary[order(Movement)])
| Symbol | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Movement |
|---|---|---|---|---|---|---|---|
| HLBANK | 0 | 579900 | 948600 | 1064696.3 | 1461600 | 4298600 | 4.037395 |
| MAYBANK | 0 | 4176750 | 5987300 | 7000535.4 | 8755550 | 30397300 | 4.342139 |
| RHBBANK | 0 | 2392050 | 3648900 | 4223892.2 | 5337150 | 20068600 | 4.751210 |
| HLFG | 0 | 95900 | 181700 | 238316.5 | 290600 | 1405900 | 5.899299 |
| PBBANK | 0 | 12006000 | 17588500 | 24402942.8 | 29304750 | 223687500 | 9.166415 |
| CIMB | 0 | 8795150 | 12710600 | 15911152.7 | 19270950 | 152250200 | 9.568772 |
Based on this result, we can see that volume spike is 4 to 9 times above average trading volumes, with CIMB stock being traded 9.57 times above average at the peak time.
I am going to plot 2 charts to further study the fluctuations in the price and volume. First, I will plot line charts for each stock:
library(dplyr)
library(ggplot2)
library(gridExtra)
banking_stock_names <- c('CIMB', 'HLBANK', 'HLFG', 'MAYBANK', 'PBBANK', 'RHBBANK')
plotLineChart = function (maindf, mSymbol) {
p1 <- ggplot(filter(maindf, Symbol == mSymbol), mapping = aes(Date, Close, group = 1)) +
geom_line(color="#69b3a2") +
labs(x = "Date Range", y = "Close Price",
title = mSymbol)
p1 + theme(axis.text.x=element_text(angle = 90, hjust = 0))
print(p1)
}
png(filename="banking_stock_prices.png", width=1400, height=800)
p1 <- plotLineChart(bankingdf, banking_stock_names[1])
p2 <- plotLineChart(bankingdf, banking_stock_names[2])
p3 <- plotLineChart(bankingdf, banking_stock_names[3])
p4 <- plotLineChart(bankingdf, banking_stock_names[4])
p5 <- plotLineChart(bankingdf, banking_stock_names[5])
p6 <- plotLineChart(bankingdf, banking_stock_names[6])
grid.arrange(p1, p2, p3, p4, p5, p6, ncol=3)
dev.off()
Next, let’s plot histogram based on the daily trading volumes of each stock. I use scales package to scale the x axis into the interval of two months each.
#Function to plot histogram
require(scales)
plotHistogram = function (maindf, mSymbol){
df <- filter(maindf, Symbol == mSymbol)
plot1 <- ggplot(df, aes(Date, Volume)) +
geom_bar(stat="identity", na.rm = TRUE, colour = "steelblue") +
ggtitle(mSymbol) +
xlab("Date") + ylab("Volume") +
scale_x_date(labels=date_format ("%b %y"), breaks=("2 months")) +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18)) +
theme(axis.text.x=element_text(angle = 90, hjust = 0))
print(plot1)
}
In summary, I think the stocks in banking sector is highly related to the covid-19 loan moratorium announcement by Bank Negara Malaysia for a period of 6 months from April to September 2020. All the banking stocks saw a sharp increase in price once the load moratorium has ended. As for the volume, we can see a lot of volume spike which is almost triple or quadruple of the usual trading volume around June and November 2020 for all banking stocks.
I am going to analyse the past year historical prices of the three healthcare equipment stocks under FBMKLCI, which is also known as ‘glove stocks’ in layman term. The last year sees a breakthrough in the glove stocks prices due to the global demand during the covid-19 pandemic. Let’s try to study the trend.
Three separate csv files obtained from Yahoo Finance! has been merged into one master csv file, here I use lubridate package to convert the date string to Date data type. Let’s print out the head and structure of the final dataset.
library(lubridate)
healthcaredf <- read.csv("healthcare_stocks_KLCI_historical_price.csv")
healthcaredf$Date <- ymd(healthcaredf$Date)
head(healthcaredf)
## X X1 Date Open High Low Close Adj.Close Volume StockCode Symbol
## 1 1 1 2020-04-01 6.88 6.89 6.74 6.75 6.661989 4733000 5168 HARTA
## 2 2 2 2020-04-02 7.10 7.10 6.79 6.80 6.711338 7211700 5168 HARTA
## 3 3 3 2020-04-03 7.00 7.02 6.87 6.90 6.810033 6286800 5168 HARTA
## 4 4 4 2020-04-06 6.93 7.10 6.90 7.04 6.948208 6742100 5168 HARTA
## 5 5 5 2020-04-07 7.10 7.20 7.07 7.15 7.056774 7225000 5168 HARTA
## 6 6 6 2020-04-08 7.20 7.39 7.16 7.27 7.175209 8316400 5168 HARTA
str(healthcaredf)
## 'data.frame': 729 obs. of 11 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ X1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Date : Date, format: "2020-04-01" "2020-04-02" ...
## $ Open : num 6.88 7.1 7 6.93 7.1 7.2 7.35 7.5 7.47 7.44 ...
## $ High : num 6.89 7.1 7.02 7.1 7.2 7.39 7.54 7.6 7.59 7.55 ...
## $ Low : num 6.74 6.79 6.87 6.9 7.07 7.16 7.34 7.44 7.42 7.43 ...
## $ Close : num 6.75 6.8 6.9 7.04 7.15 7.27 7.44 7.47 7.42 7.47 ...
## $ Adj.Close: num 6.66 6.71 6.81 6.95 7.06 ...
## $ Volume : int 4733000 7211700 6286800 6742100 7225000 8316400 9007100 6342700 3714800 6004000 ...
## $ StockCode: int 5168 5168 5168 5168 5168 5168 5168 5168 5168 5168 ...
## $ Symbol : chr "HARTA" "HARTA" "HARTA" "HARTA" ...
Next, I will generate the 5-number summary of the data set group by Symbol. I am most interested in Close Price and Volume features, and saved them into a variable for subsequent usage:
library(data.table)
setDT(healthcaredf)
closePriceSummary <- healthcaredf[, as.list(summary(Close)), by = Symbol]
volumeSummary <- healthcaredf[, as.list(summary(Volume)), by = Symbol]
In this step, I will do some calculations based on the summary dataframe achieved from the previous step. As for price, let’s calculate the percentage change based on 1st quarter average price and max stock price in a year. The minimum price doesn’t look very accurate to me, I have decided to calculate using first quarter average price.
#calculate the percentage change based on 1st quarter average price and max price
closePriceSummary$changePercent = ((closePriceSummary$Max. - closePriceSummary$`1st Qu.`) / closePriceSummary$Max) * 100
#order by
#print(closePriceSummary[order(changePercent)])
kable(closePriceSummary[order(changePercent)])
| Symbol | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | changePercent |
|---|---|---|---|---|---|---|---|
| TOPGLOV | 2.126666 | 5.200 | 6.566666 | 6.362428 | 8.276666 | 9.60000 | 45.83333 |
| HARTA | 6.750000 | 10.540 | 12.780000 | 13.269300 | 16.480000 | 20.50000 | 48.58537 |
| SUPERMX | 0.777717 | 4.135 | 6.760000 | 6.254742 | 8.569567 | 11.63152 | 64.45006 |
We can spot some extraordinary rise for the healthcare equipment sector within one year. The prices had increased 45-65% due to the pandemic during it’s peak time.
#calculate the max spike level of stock volumes
volumeSummary$Movement <- volumeSummary$Max. / volumeSummary$Mean
#print(volumeSummary[order(Movement)])
kable(volumeSummary[order(Movement)])
| Symbol | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Movement |
|---|---|---|---|---|---|---|---|
| SUPERMX | 0 | 21773730 | 39491706 | 46335683 | 65851132 | 180783231 | 3.901598 |
| TOPGLOV | 0 | 28439050 | 46084900 | 55746065 | 73653450 | 221265400 | 3.969166 |
| HARTA | 0 | 4869750 | 7399300 | 9088844 | 12148450 | 38642100 | 4.251597 |
The trading volume doesn’t seem to be as volatile as banking sector. But we can still observe at least 4 times above-average trading volume at it’s highest point.
I have reused the plotting function mentioned under banking tab.
healthcare_stock_names <- c('HARTA', 'SUPERMX', 'TOPGLOV')
png(filename="healthcare_stock_prices.png", width=1400, height=400)
p1 <- plotLineChart(healthcaredf, healthcare_stock_names[1])
p2 <- plotLineChart(healthcaredf, healthcare_stock_names[2])
p3 <- plotLineChart(healthcaredf, healthcare_stock_names[3])
grid.arrange(p1, p2, p3, ncol=3)
dev.off()
These three stocks price movement is closely related, a very similar pattern for the whole sector.
Next, let’s plot out the trading volumes:
png(filename="healthcare_stock_volume.png", width=1400, height=400)
p1 <- plotHistogram(healthcaredf, healthcare_stock_names[1])
p2 <- plotHistogram(healthcaredf, healthcare_stock_names[2])
p3 <- plotHistogram(healthcaredf, healthcare_stock_names[3])
grid.arrange(p1, p2, p3, ncol=3)
dev.off()
The highest trading volumes are observed between August 2020 and September 2020.
From the charts, we can summarise that the prices of the golve stocks increased exponentially as the rise of covid 19 cased until third quarter of the year 2020. And the sharp decline in healthcare equipment stocks is closely followed by the first announcement of the covid-19 vaccine by Pfizer in November 2020.