ECO 4051 Financial Econometrics
Assignment 1: Analyzing financial data in R
Q1: downloading the data and analyzing prices
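library(quantmod)   # getSymbols(), Ad(), to.monthly()
library(fBasics)    # basicStats()
library(ggplot2)    # ggplot()
library(plotly)     # ggplotly()
getSymbols("PKI", from="1990-01-01")
[1] "PKI"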
price<-Ad(to.monthly(PKI))
length(price)
[1] 388
periodicity(price)
Monthly periodicity from Jan 1990 to Apr 2022
head(price, 5)
         PKI.Adjusted
Jan 1990 5.414520
Feb 1990 5.452516
Mar 1990 5.680496
Apr 1990 5.537663
May 1990 6.053236
tail(price, 5)
         PKI.Adjusted
Dec 2021 200.9804
Jan 2022 172.1700
Feb 2022 179.6100
Mar 2022 174.4600
Apr 2022 172.4100
plot(price, ylab="price")
plot(log(price), col="red", ylab="log.price")
Log price and price
- Log price vs. price:
- On the log scale, equal vertical distances correspond to equal percentage changes, so the log-price plot shows relative price movements. On the raw price plot, the same percentage change looks small when the price is low and large when the price is high.
- Both plots show a decline in price from January 2002 to approximately 2004, but the log-price plot shows the size of that decline in percentage terms, whereas the price plot, whose scale is dominated by the much higher recent prices, makes the decline look small.
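To illustrate the point about percentage changes, here is a minimal sketch using made-up prices (the numbers and variable names are assumptions for illustration, not PKI data). Two moves of very different dollar size but the same percentage size give the same change in log price.

```r
# Hypothetical prices, chosen only for illustration (not PKI data)
early <- c(5, 10)      # price doubles from a low level
late  <- c(100, 200)   # price doubles from a high level

# Change in the price level: very different for the two episodes
level.change.early <- diff(early)
level.change.late  <- diff(late)

# Change in the log price: identical, because both moves are a doubling
log.change.early <- diff(log(early))
log.change.late  <- diff(log(late))

print(c(level.change.early, level.change.late))
print(c(log.change.early, log.change.late))
print(all.equal(log.change.early, log.change.late))
```

On a raw price axis the first move would be nearly invisible next to the second, while on a log axis the two moves have the same height.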
Q2: creating the returns and calculating the descriptive statistics
log.ret<-100*diff(log(price))
basicStats(log.ret)
            PKI.Adjusted
nobs 388.000000
NAs 1.000000
Minimum -36.362227
Maximum 34.120887
1. Quartile -3.438049
3. Quartile 6.039423
Mean 0.894261
Median 1.722185
Sum 346.079115
SE Mean 0.494124
LCL Mean -0.077250
UCL Mean 1.865772
Variance 94.489288
Stdev 9.720560
Skewness -0.560986
Kurtosis 2.243362
plot(log.ret, col="slateblue4", ylab="stock return")
Summary Stats for log returns
There are 388 monthly observations, of which 1 (the first row) is missing because the return calculation needs a previous price. That leaves 387 returns.
The stock had its highest monthly return (34.12%) in July 2002 and its lowest (-36.36%) in August 2000.
The 1st quartile shows that 25% of the returns fell below -3.438049%, and the 3rd quartile shows that 75% of the returns fell below 6.039423%.
The mean is the average monthly return, 0.894261%.
The median of 1.722185% means that half of the months had returns above this value and half had returns below it.
Sum is the total of the monthly log returns over the whole sample (346.079115%). Because log returns add up over time, this is also the cumulative log return from January 1990 to April 2022.
The standard error of the mean (SE Mean, 0.494124) measures how much the sample mean return would vary from one sample to another. It is the standard deviation divided by the square root of the number of observations.
The lower confidence limit (LCL Mean, -0.077250) is the lower bound of the 95% confidence interval for the mean return.
The upper confidence limit (UCL Mean, 1.865772) is the upper bound of that interval. Because the interval contains zero, the mean monthly return is not significantly different from zero at the 5% level.
The variance (94.489288) is the average squared deviation of the returns from their mean. It is measured in squared percent, so it is hard to interpret directly.
The standard deviation (the square root of the variance) is 9.720560%, in the same units as the returns. It is large relative to the mean of 0.894261%, which indicates that the stock's monthly returns are volatile.
Skewness of -0.560986 indicates that the return distribution is skewed to the left, with a longer tail of negative returns.
basicStats reports excess kurtosis, for which a normal distribution has a value of 0 (equivalently, a raw kurtosis of 3). The value of 2.243362 is above 0, so the return distribution has fatter tails than a normal distribution and extreme returns are more likely than normality would imply.
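As a rough check on the SE Mean, LCL Mean and UCL Mean rows, the sketch below recomputes them from the mean, standard deviation and observation count in the basicStats() output above. It assumes the limits are a t-based 95% confidence interval for the mean, which is my understanding of the fBasics default; the variable names are my own.

```r
# Values copied from the basicStats() output above
n        <- 388 - 1     # nobs minus the one NA
mean.ret <- 0.894261    # Mean
sd.ret   <- 9.720560    # Stdev

# Standard error of the mean: standard deviation / sqrt(number of returns)
se.mean <- sd.ret / sqrt(n)

# 95% confidence limits: mean +/- t quantile * standard error
# (assumption: t distribution with n - 1 degrees of freedom)
t.crit <- qt(0.975, df = n - 1)
lcl <- mean.ret - t.crit * se.mean
ucl <- mean.ret + t.crit * se.mean

print(round(c(SE = se.mean, LCL = lcl, UCL = ucl), 4))
```

The results should agree closely with the reported rows, which supports reading LCL and UCL as confidence limits for the mean.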
Log returns for PKI’s adjusted closing prices
Q3: Plotting
date<-time(price)
p1<-ggplot(price, aes(date, PKI.Adjusted))+geom_point(alpha=0.8, color="hotpink4")+
labs(x="year", y="stock price", caption="source: Yahoo", subtitle="Monthly Basis")+theme_bw()
ggplotly(p1)
p2<-ggplot(log.ret, aes(date, PKI.Adjusted))+geom_line(color="darkslategrey")+
labs(x="year", y="stock returns", caption="source: Yahoo", subtitle="Monthly Basis")
ggplotly(p2)
# overlay a normal density using the sample mean and standard deviation of the returns
p3<-ggplot(log.ret, aes(PKI.Adjusted))+geom_histogram(aes(y=after_stat(density)), bins=50, color="red", fill="grey45")+
stat_function(fun=dnorm, color="darkorchid", size=1.5, args=list(mean=mean(log.ret, na.rm=TRUE), sd=sd(log.ret, na.rm=TRUE)))+
labs(x="stock returns")+theme_bw()
ggplotly(p3)
Discussion:
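- The histogram of the returns is roughly bell-shaped, so a normal distribution is a reasonable first approximation. It is not exactly normal, though: the negative skewness (-0.560986) and positive excess kurtosis (2.243362) in the summary statistics point to a longer left tail and more extreme returns than a normal distribution would produce.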
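One way to make the comparison with the normal distribution concrete is to compute skewness and excess kurtosis for a sample that really is normal and for one with fatter tails. The sketch below uses simulated draws, not the PKI returns. The helper functions `skew` and `excess.kurt` are hypothetical base-R versions written for illustration, and may differ slightly from the estimators fBasics uses.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Moment-based sample skewness (hypothetical helper, base R only)
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: raw kurtosis minus 3, so a normal gives about 0
excess.kurt <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

normal.draws <- rnorm(10000)        # normal benchmark
fat.draws    <- rt(10000, df = 5)   # Student t with 5 df: fatter tails

print(c(skew = skew(normal.draws), excess.kurt = excess.kurt(normal.draws)))
print(c(skew = skew(fat.draws), excess.kurt = excess.kurt(fat.draws)))
```

The normal draws should give values near zero for both statistics, while the fat-tailed draws should give a clearly positive excess kurtosis, which is the pattern the PKI returns show.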
Q4: correlation
getSymbols("^GSPC", from="1990-01-01")
[1] "^GSPC"
SP<-Ad(to.monthly(GSPC))
GSPC.ret<-100*diff(log(SP))
PKI.GSPC<-merge(log.ret, GSPC.ret)
cor(PKI.GSPC, use='complete.obs')
              PKI.Adjusted GSPC.Adjusted
PKI.Adjusted     1.0000000     0.5001094
GSPC.Adjusted    0.5001094     1.0000000
PKIGSPC.df<-data.frame(Date=time(PKI.GSPC), coredata(PKI.GSPC))
# use the data frame's own columns (PKI.Adjusted, GSPC.Adjusted) rather than the global xts objects
p4<-ggplot(PKIGSPC.df, aes(PKI.Adjusted, GSPC.Adjusted))+geom_point(alpha=0.8, color="grey66")+geom_smooth(method="lm", se=FALSE)+labs(x="PKI returns", y="S&P500 returns")+
geom_vline(xintercept=0)+geom_hline(yintercept=0)+theme_gray()
ggplotly(p4)
Correlation Discussion
The positive correlation coefficient of 0.50 indicates that PKI's returns tend to move in the same direction as the S&P 500's: when the S&P 500 goes up, PKI's returns tend to go up as well, and vice versa.
When PKI had positive returns, the S&P 500's returns tended to be positive as well, and vice versa.
The points are clustered around the center, with the fitted regression line running through them, but I would like to fit a non-linear regression as well to see whether a curve fits the data better.
The residual variation appears to be heteroskedastic.
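The link between the correlation coefficient and the fitted line, and the idea of trying a non-linear fit, can be sketched as follows. This uses simulated returns, not the PKI and S&P 500 data, and every name and parameter value in it is an assumption made for illustration. A squared term is only one simple way to allow for curvature.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated monthly returns in percent (NOT the PKI / S&P 500 data)
market <- rnorm(300, mean = 0.7, sd = 4.5)
stock  <- 0.2 + 1.1 * market + rnorm(300, sd = 8)
sim    <- data.frame(stock = stock, market = market)

# Correlation is symmetric and unit-free
r <- cor(sim$stock, sim$market)

# Linear fit with the same orientation as the scatter plot:
# stock returns on the x-axis, market returns on the y-axis
fit.lin <- lm(market ~ stock, data = sim)

# The OLS slope equals r * sd(y) / sd(x)
slope.from.r <- r * sd(sim$market) / sd(sim$stock)
print(all.equal(unname(coef(fit.lin)["stock"]), slope.from.r))

# Simple non-linear alternative: add a squared term and compare with an F-test
fit.quad <- lm(market ~ stock + I(stock^2), data = sim)
print(anova(fit.lin, fit.quad))
```

A small p-value in the anova() table would suggest that the curve fits better than the straight line. With data simulated from a linear relationship, as here, the squared term would normally not be needed.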