title: “Quiz” author: “Nguyen Bao Quynh Trang” date: “2022-10-26” output: html_document

Input & View

First, we input the data ETF by function “read.table”. Then using “view” to display the data. We name the data as “data” for short.

data <- read.table('~/data/dataquiz.txt', header= TRUE)
head(data)
##   證券代碼   簡稱   年月日 收盤價.元.
## 1       50 ????50 20100104    37.5723
## 2       52   ???? 20100104    23.1720
## 3       56  ????? 20100104    11.9283
## 4       50 ????50 20100105    37.5723
## 5       52   ???? 20100105    23.2037
## 6       56  ????? 20100105    11.8775

Cleaning the raw data

Then, we clean the raw data:

data <- data[,-2]
as.data.table(data)
##       證券代碼   年月日 收盤價.元.
##    1:       50 20100104    37.5723
##    2:       52 20100104    23.1720
##    3:       56 20100104    11.9283
##    4:       50 20100105    37.5723
##    5:       52 20100105    23.2037
##   ---                             
## 8858:       52 20211229   122.8468
## 8859:       56 20211229    30.8785
## 8860:       50 20211230   140.1584
## 8861:       52 20211230   122.8010
## 8862:       56 20211230    30.8510
colnames(data) <- c('id','date','close')
data1 <- reshape2::dcast(data,date~id)
## Using close as value column: use value.var to override.
str(data)
## 'data.frame':    8862 obs. of  3 variables:
##  $ id   : int  50 52 56 50 52 56 50 52 56 50 ...
##  $ date : int  20100104 20100104 20100104 20100105 20100105 20100105 20100106 20100106 20100106 20100107 ...
##  $ close: num  37.6 23.2 11.9 37.6 23.2 ...
str(data1)
## 'data.frame':    2954 obs. of  4 variables:
##  $ date: int  20100104 20100105 20100106 20100107 20100108 20100111 20100112 20100113 20100114 20100115 ...
##  $ 50  : num  37.6 37.6 38.3 38.2 38.4 ...
##  $ 52  : num  23.2 23.2 23.7 23.5 23.4 ...
##  $ 56  : num  11.9 11.9 12.1 12 12.1 ...
dat <- data1 %>% mutate(year= as.numeric(substr(date,1,4))) %>% filter(year>= 2010 & year <=2020)
apply(is.na(dat),2,sum) 
## date   50   52   56 year 
##    0    0    0    0    0
str(dat) 
## 'data.frame':    2710 obs. of  5 variables:
##  $ date: int  20100104 20100105 20100106 20100107 20100108 20100111 20100112 20100113 20100114 20100115 ...
##  $ 50  : num  37.6 37.6 38.3 38.2 38.4 ...
##  $ 52  : num  23.2 23.2 23.7 23.5 23.4 ...
##  $ 56  : num  11.9 11.9 12.1 12 12.1 ...
##  $ year: num  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
etf.xts <- xts(dat[,-1],order.by= as.Date(as.character(dat$date),format='%Y%m%d')) 

##Conclusion

Finally, we display 6 first columns of the data

final <- etf.xts[,-4]
head(final,6)
##                 50      52      56
## 2010-01-04 37.5723 23.1720 11.9283
## 2010-01-05 37.5723 23.2037 11.8775
## 2010-01-06 38.2705 23.7433 12.0552
## 2010-01-07 38.1708 23.4894 11.9537
## 2010-01-08 38.4035 23.4195 12.0805
## 2010-01-11 38.5033 23.5529 12.2074