title: “Quiz” author: “Nguyen Bao Quynh Trang” date: “2022-10-26” output: html_document
First, we input the data ETF by function “read.table”. Then using “view” to display the data. We name the data as “data” for short.
data <- read.table('~/data/dataquiz.txt', header= TRUE)
head(data)
## 證券代碼 簡稱 年月日 收盤價.元.
## 1 50 ????50 20100104 37.5723
## 2 52 ???? 20100104 23.1720
## 3 56 ????? 20100104 11.9283
## 4 50 ????50 20100105 37.5723
## 5 52 ???? 20100105 23.2037
## 6 56 ????? 20100105 11.8775
Then, we clean the raw data:
data <- data[,-2]
as.data.table(data)
## 證券代碼 年月日 收盤價.元.
## 1: 50 20100104 37.5723
## 2: 52 20100104 23.1720
## 3: 56 20100104 11.9283
## 4: 50 20100105 37.5723
## 5: 52 20100105 23.2037
## ---
## 8858: 52 20211229 122.8468
## 8859: 56 20211229 30.8785
## 8860: 50 20211230 140.1584
## 8861: 52 20211230 122.8010
## 8862: 56 20211230 30.8510
colnames(data) <- c('id','date','close')
data1 <- reshape2::dcast(data,date~id)
## Using close as value column: use value.var to override.
str(data)
## 'data.frame': 8862 obs. of 3 variables:
## $ id : int 50 52 56 50 52 56 50 52 56 50 ...
## $ date : int 20100104 20100104 20100104 20100105 20100105 20100105 20100106 20100106 20100106 20100107 ...
## $ close: num 37.6 23.2 11.9 37.6 23.2 ...
str(data1)
## 'data.frame': 2954 obs. of 4 variables:
## $ date: int 20100104 20100105 20100106 20100107 20100108 20100111 20100112 20100113 20100114 20100115 ...
## $ 50 : num 37.6 37.6 38.3 38.2 38.4 ...
## $ 52 : num 23.2 23.2 23.7 23.5 23.4 ...
## $ 56 : num 11.9 11.9 12.1 12 12.1 ...
dat <- data1 %>% mutate(year= as.numeric(substr(date,1,4))) %>% filter(year>= 2010 & year <=2020)
apply(is.na(dat),2,sum)
## date 50 52 56 year
## 0 0 0 0 0
str(dat)
## 'data.frame': 2710 obs. of 5 variables:
## $ date: int 20100104 20100105 20100106 20100107 20100108 20100111 20100112 20100113 20100114 20100115 ...
## $ 50 : num 37.6 37.6 38.3 38.2 38.4 ...
## $ 52 : num 23.2 23.2 23.7 23.5 23.4 ...
## $ 56 : num 11.9 11.9 12.1 12 12.1 ...
## $ year: num 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
etf.xts <- xts(dat[,-1],order.by= as.Date(as.character(dat$date),format='%Y%m%d'))
##Conclusion
Finally, we display 6 first columns of the data
final <- etf.xts[,-4]
head(final,6)
## 50 52 56
## 2010-01-04 37.5723 23.1720 11.9283
## 2010-01-05 37.5723 23.2037 11.8775
## 2010-01-06 38.2705 23.7433 12.0552
## 2010-01-07 38.1708 23.4894 11.9537
## 2010-01-08 38.4035 23.4195 12.0805
## 2010-01-11 38.5033 23.5529 12.2074