FINANCIAL
ECONOMETRICS
Homework 4
1 IN-CLASS LAB
1.1 Lab 1
First, we import the file ‘UKHP.xls’ into RStudio.
library(readxl)
UKHP <- read_excel("C:/Users/MyNgo/Documents/Fecoooo/UKHP.xls", col_types = c("date", "numeric"))
View(UKHP)
Then we rename the second column of “UKHP” from “Average House Price” to “hp”.
names(UKHP)[2] = "hp"
UKHP
## # A tibble: 327 × 2
## Month hp
## <dttm> <dbl>
## 1 1991-01-01 00:00:00 53052.
## 2 1991-02-01 00:00:00 53497.
## 3 1991-03-01 00:00:00 52893.
## 4 1991-04-01 00:00:00 53677.
## 5 1991-05-01 00:00:00 54386.
## 6 1991-06-01 00:00:00 55107.
## 7 1991-07-01 00:00:00 54541.
## 8 1991-08-01 00:00:00 54041.
## 9 1991-09-01 00:00:00 53259.
## 10 1991-10-01 00:00:00 53467.
## # … with 317 more rows
Next, we compute some summary statistics.
summary(UKHP)
## Month hp
## Min. :1991-01-01 00:00:00.00 Min. : 49602
## 1st Qu.:1997-10-16 12:00:00.00 1st Qu.: 61654
## Median :2004-08-01 00:00:00.00 Median :150946
## Mean :2004-07-31 17:54:29.72 Mean :124660
## 3rd Qu.:2011-05-16 12:00:00.00 3rd Qu.:169239
## Max. :2018-03-01 00:00:00.00 Max. :211756
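We create a new column, “dhp”, containing the simple percentage return of the UK average house price.
# UKHP$hp[-nrow(UKHP)] drops the last price, so each difference is divided by the previous month's price
UKHP$dhp = c(NA, 100 * diff(UKHP$hp) / UKHP$hp[-nrow(UKHP)])
UKHP
## # A tibble: 327 × 3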
## Month hp dhp
## <dttm> <dbl> <dbl>
## 1 1991-01-01 00:00:00 53052. NA
## 2 1991-02-01 00:00:00 53497. 0.839
## 3 1991-03-01 00:00:00 52893. -1.13
## 4 1991-04-01 00:00:00 53677. 1.48
## 5 1991-05-01 00:00:00 54386. 1.32
## 6 1991-06-01 00:00:00 55107. 1.33
## 7 1991-07-01 00:00:00 54541. -1.03
## 8 1991-08-01 00:00:00 54041. -0.917
## 9 1991-09-01 00:00:00 53259. -1.45
## 10 1991-10-01 00:00:00 53467. 0.389
## # … with 317 more rows
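As a quick check of the simple-return calculation, here is a minimal sketch on a toy vector. It assumes the first four prices from the table above, which are rounded to whole pounds as displayed, so the results only approximate the dhp column. The name `hp_toy` is introduced here purely for illustration.

```r
# First four house prices as displayed in the table above (rounded values)
hp_toy <- c(53052, 53497, 52893, 53677)

# Simple percentage return: 100 * (P_t - P_{t-1}) / P_{t-1}
# hp_toy[-length(hp_toy)] drops the last price, leaving the lagged prices
dhp_toy <- c(NA, 100 * diff(hp_toy) / hp_toy[-length(hp_toy)])

print(round(dhp_toy, 2))  # approximately NA 0.84 -1.13 1.48
```

The first element is NA because no return can be computed for the first month.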
Then we plot the house price series.
par(cex.axis = 1, cex.lab = 1, lwd = 1)
plot(UKHP$Month, UKHP$hp, type = 'l', xlab = "Date", ylab = "Average House Price", col = 'red')
We also plot histograms of the house price and of its simple return.
The histogram of the returns looks roughly bell-shaped, which suggests that the returns are approximately normally distributed.
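The code for the histograms is not shown in this report. A minimal sketch of how they might be produced follows. It assumes simulated stand-in data (the hypothetical vectors `hp_demo` and `dhp_demo`) so that it runs on its own; in the lab one would pass `UKHP$hp` and `UKHP$dhp` instead. The Shapiro-Wilk test at the end is an optional addition, not part of the original lab, for checking the visual impression of normality.

```r
# Simulated stand-in data (NOT the UKHP data); replace with UKHP$hp and UKHP$dhp
set.seed(1)
hp_demo  <- cumsum(c(50000, rnorm(99, mean = 300, sd = 600)))
dhp_demo <- c(NA, 100 * diff(hp_demo) / hp_demo[-length(hp_demo)])

# Histogram of the price level
hist(hp_demo, main = "Histogram of House Price", xlab = "Average House Price", col = "lightblue")

# Histogram of the simple returns (hist() ignores the leading NA)
hist(dhp_demo, main = "Histogram of Simple Return", xlab = "Return (%)", col = "lightblue")

# Optional formal check: Shapiro-Wilk test of normality on the returns
sw <- shapiro.test(na.omit(dhp_demo))
print(sw$p.value)
```

A large p-value means the test does not reject normality, which is not the same as proving that the returns are normally distributed.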
1.2 Lab 2
We now use the data in ‘SandPhedge.xls’.
We transform the spot and futures price series into continuously compounded percentage returns.
SandPhedge$rspot = c(NA,100*diff(log(SandPhedge$Spot)))
SandPhedge$rfutures = c(NA,100*diff(log(SandPhedge$Futures)))
print(SandPhedge)
## # A tibble: 247 × 5
## Date Spot Futures rspot rfutures
## <dttm> <dbl> <dbl> <dbl> <dbl>
## 1 1997-09-01 00:00:00 947. 954. NA NA
## 2 1997-10-01 00:00:00 915. 924 -3.51 -3.25
## 3 1997-11-01 00:00:00 955. 955 4.36 3.30
## 4 1997-12-01 00:00:00 970. 979. 1.56 2.51
## 5 1998-01-01 00:00:00 980. 988. 1.01 0.864
## 6 1998-02-01 00:00:00 1049. 1050. 6.81 6.16
## 7 1998-03-01 00:00:00 1102. 1110. 4.87 5.55
## 8 1998-04-01 00:00:00 1112. 1119. 0.904 0.785
## 9 1998-05-01 00:00:00 1091. 1091. -1.90 -2.58
## 10 1998-06-01 00:00:00 1134. 1143 3.87 4.68
## # … with 237 more rows
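For reference, the percentage log return used here can be written out and related to the simple percentage return. The symbols \(P_t\) (price at time \(t\)), \(r_t\) (log return) and \(R_t\) (simple return) are notation introduced for this note only.

```latex
\[
r_t = 100 \cdot \ln\!\left(\frac{P_t}{P_{t-1}}\right)
    = 100 \cdot \ln\!\left(1 + \frac{R_t}{100}\right),
\qquad
R_t = 100 \cdot \frac{P_t - P_{t-1}}{P_{t-1}}
\]
```

For small price changes the two measures are close, since \(\ln(1+x) \approx x\) when \(x\) is near zero.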
# Extract the two return columns
print(SandPhedge[c("rspot","rfutures")])
## # A tibble: 247 × 2
## rspot rfutures
## <dbl> <dbl>
## 1 NA NA
## 2 -3.51 -3.25
## 3 4.36 3.30
## 4 1.56 2.51
## 5 1.01 0.864
## 6 6.81 6.16
## 7 4.87 5.55
## 8 0.904 0.785
## 9 -1.90 -2.58
## 10 3.87 4.68
## # … with 237 more rows
Next, we compute summary statistics for the two return series.
summary(SandPhedge[c("rspot","rfutures")])
## rspot rfutures
## Min. :-18.5636 Min. :-18.9447
## 1st Qu.: -1.8314 1st Qu.: -1.9314
## Median : 0.9185 Median : 0.9976
## Mean : 0.4168 Mean : 0.4140
## 3rd Qu.: 3.2765 3rd Qu.: 3.1336
## Max. : 10.2307 Max. : 10.3872
## NA's :1 NA's :1
# Create a scatter plot of rspot against rfutures
plot(SandPhedge$rfutures, SandPhedge$rspot, col = 'blue',
xlab = 'Futures Contract Return', ylab = 'Stock Return',
main = 'Futures Contract Return vs Stock Return', pch = 19)
# Fit the regression of rspot on rfutures and add the fitted line to the plot
lm_returns <- lm(rspot ~ rfutures, data = SandPhedge)
alpha <- lm_returns$coefficients[1]
beta <- lm_returns$coefficients[2]
lines(SandPhedge$rfutures, alpha + beta * SandPhedge$rfutures, lwd = 2, col = 'red')
summary(lm_returns)
##
## Call:
## lm(formula = rspot ~ rfutures, data = SandPhedge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.45284 -0.16401 0.00236 0.23692 2.33789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.013077 0.029473 0.444 0.658
## rfutures 0.975077 0.006654 146.543 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4602 on 244 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9888, Adjusted R-squared: 0.9887
## F-statistic: 2.147e+04 on 1 and 244 DF, p-value: < 2.2e-16
\[rspot = 0.013077 + 0.975077 \cdot rfutures\]
# Hypothesis test with null hypothesis beta = 1 (linearHypothesis() comes from the car package)
library(car)
linearHypothesis(lm_returns, c("rfutures=1"))
## Linear hypothesis test
##
## Hypothesis:
## rfutures = 1
##
## Model 1: restricted model
## Model 2: rspot ~ rfutures
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 245 54.656
## 2 244 51.684 1 2.9718 14.03 0.0002246 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The corresponding p-value, shown in the last column, is 0.0002246. As this is considerably smaller than 0.05, we reject the null hypothesis that the slope coefficient equals 1 at the 5% significance level.
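With a single restriction, the F-test above is equivalent to a two-sided t-test, and the F-statistic is the square of the t-statistic. As a minimal sketch, the figures can be reproduced by hand from the rounded estimate, standard error and residual degrees of freedom in the regression summary. The variable names below are chosen for this illustration only.

```r
# Values copied from the summary(lm_returns) output above (rounded as printed)
beta_hat <- 0.975077
se_beta  <- 0.006654
df_resid <- 244

t_stat  <- (beta_hat - 1) / se_beta             # t-statistic for H0: beta = 1
f_stat  <- t_stat^2                             # F-statistic for one linear restriction
p_value <- 2 * pt(-abs(t_stat), df = df_resid)  # two-sided p-value

print(c(t = t_stat, F = f_stat, p = p_value))
```

Because the inputs are rounded, the result matches the reported F of 14.03 only up to rounding error.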
# Create a scatter plot of Spot against Futures (price levels)
plot(SandPhedge$Futures, SandPhedge$Spot, col = 'blue',
xlab = 'Futures Price', ylab = 'Spot Price',
main = 'Futures Price vs Spot Price', pch = 19)
# Fit a linear regression model of Spot on Futures
lm_prices <- lm(Spot ~ Futures, data = SandPhedge)
alpha <- lm_prices$coefficients[1]
beta <- lm_prices$coefficients[2]
# Add the regression line to the plot
abline(alpha, beta, lwd = 2, col = 'red')
summary(lm_prices)
##
## Call:
## lm(formula = Spot ~ Futures, data = SandPhedge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.576 -1.996 1.436 4.309 16.612
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.8378335 1.4889725 -1.906 0.0578 .
## Futures 1.0016065 0.0009993 1002.331 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.908 on 245 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9998
## F-statistic: 1.005e+06 on 1 and 245 DF, p-value: < 2.2e-16
\[Spot = -2.8378335 + 1.0016065 \cdot Futures\]
We then test the null hypothesis β = 1.
linearHypothesis(lm_prices, c("Futures=1"))
## Linear hypothesis test
##
## Hypothesis:
## Futures = 1
##
## Model 1: restricted model
## Model 2: Spot ~ Futures
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 246 11816
## 2 245 11693 1 123.35 2.5846 0.1092
With an F-statistic of 2.58 and a corresponding p-value of 0.1092, we cannot reject the null hypothesis at the 5% significance level.
1.3 Lab 3
We now use the data in ‘capm.xls’.
## # A tibble: 194 × 7
## Date SANDP FORD GE MICROSOFT ORACLE USTB3M
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2002-01-01 00:00:00 1130. 15.3 37.2 31.9 17.3 1.68
## 2 2002-02-01 00:00:00 1107. 14.9 38.5 29.2 16.6 1.76
## 3 2002-03-01 00:00:00 1147. 16.5 37.4 30.2 12.8 1.83
## 4 2002-04-01 00:00:00 1077. 16 31.5 26.1 10.0 1.75
## 5 2002-05-01 00:00:00 1067. 17.6 31.1 25.5 7.92 1.76
## 6 2002-06-01 00:00:00 990. 16 29.0 27.4 9.47 1.73
## 7 2002-07-01 00:00:00 912. 13.5 32.2 24.0 10.0 1.71
## 8 2002-08-01 00:00:00 916. 11.8 30.2 24.5 9.59 1.65
## 9 2002-09-01 00:00:00 815. 9.8 24.6 21.9 7.86 1.66
## 10 2002-10-01 00:00:00 886. 8.46 25.2 26.7 10.2 1.61
## # … with 184 more rows
# Summary statistics
summary(capm)
## Date SANDP FORD
## Min. :2002-01-01 00:00:00.00 Min. : 729.6 Min. : 1.870
## 1st Qu.:2006-01-08 18:00:00.00 1st Qu.:1127.2 1st Qu.: 8.527
## Median :2010-01-16 12:00:00.00 Median :1323.5 Median :11.740
## Mean :2010-01-15 09:46:23.50 Mean :1468.9 Mean :11.430
## 3rd Qu.:2014-01-24 06:00:00.00 3rd Qu.:1854.7 3rd Qu.:14.168
## Max. :2018-02-01 00:00:00.00 Max. :2816.4 Max. :17.650
## GE MICROSOFT ORACLE USTB3M
## Min. : 8.51 Min. :16.15 Min. : 7.86 Min. :0.010
## 1st Qu.:20.72 1st Qu.:25.87 1st Qu.:13.95 1st Qu.:0.080
## Median :26.95 Median :28.35 Median :22.71 Median :0.480
## Mean :26.59 Mean :34.75 Mean :25.97 Mean :1.241
## 3rd Qu.:32.70 3rd Qu.:38.27 3rd Qu.:36.48 3rd Qu.:1.725
## Max. :41.40 Max. :95.01 Max. :51.59 Max. :5.160
# Continuously compounded returns for the S&P500 index, FORD, GE, MICROSOFT and ORACLE
capm$rsandp = c(NA,100*diff(log(capm$SANDP)))
capm$rford = c(NA,100*diff(log(capm$FORD)))
capm$rge = c(NA,100*diff(log(capm$GE)))
capm$rmsoft = c(NA,100*diff(log(capm$MICROSOFT)))
capm$roracle = c(NA,100*diff(log(capm$ORACLE)))
capm
## # A tibble: 194 × 12
## Date SANDP FORD GE MICROSOFT ORACLE USTB3M rsandp rford
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2002-01-01 00:00:00 1130. 15.3 37.2 31.9 17.3 1.68 NA NA
## 2 2002-02-01 00:00:00 1107. 14.9 38.5 29.2 16.6 1.76 -2.10 -2.78
## 3 2002-03-01 00:00:00 1147. 16.5 37.4 30.2 12.8 1.83 3.61 10.3
## 4 2002-04-01 00:00:00 1077. 16 31.5 26.1 10.0 1.75 -6.34 -3.02
## 5 2002-05-01 00:00:00 1067. 17.6 31.1 25.5 7.92 1.76 -0.912 9.81
## 6 2002-06-01 00:00:00 990. 16 29.0 27.4 9.47 1.73 -7.52 -9.81
## 7 2002-07-01 00:00:00 912. 13.5 32.2 24.0 10.0 1.71 -8.23 -17.2
## 8 2002-08-01 00:00:00 916. 11.8 30.2 24.5 9.59 1.65 0.487 -13.5
## 9 2002-09-01 00:00:00 815. 9.8 24.6 21.9 7.86 1.66 -11.7 -18.3
## 10 2002-10-01 00:00:00 886. 8.46 25.2 26.7 10.2 1.61 8.29 -14.7
## # … with 184 more rows, and 3 more variables: rge <dbl>, rmsoft <dbl>,
## # roracle <dbl>
# Transform the annual T-bill yields into monthly figures (risk-free rate)
capm$USTB3M = capm$USTB3M/12
# Subtract the risk-free rate to obtain the excess returns
capm$ersandp = capm$rsandp - capm$USTB3M
capm$erford = capm$rford - capm$USTB3M
capm$erge = capm$rge - capm$USTB3M
capm$ermsoft = capm$rmsoft - capm$USTB3M
capm$eroracle = capm$roracle - capm$USTB3M
capm
## # A tibble: 194 × 17
## Date SANDP FORD GE MICROSOFT ORACLE USTB3M rsandp rford
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2002-01-01 00:00:00 1130. 15.3 37.2 31.9 17.3 0.14 NA NA
## 2 2002-02-01 00:00:00 1107. 14.9 38.5 29.2 16.6 0.147 -2.10 -2.78
## 3 2002-03-01 00:00:00 1147. 16.5 37.4 30.2 12.8 0.152 3.61 10.3
## 4 2002-04-01 00:00:00 1077. 16 31.5 26.1 10.0 0.146 -6.34 -3.02
## 5 2002-05-01 00:00:00 1067. 17.6 31.1 25.5 7.92 0.147 -0.912 9.81
## 6 2002-06-01 00:00:00 990. 16 29.0 27.4 9.47 0.144 -7.52 -9.81
## 7 2002-07-01 00:00:00 912. 13.5 32.2 24.0 10.0 0.142 -8.23 -17.2
## 8 2002-08-01 00:00:00 916. 11.8 30.2 24.5 9.59 0.137 0.487 -13.5
## 9 2002-09-01 00:00:00 815. 9.8 24.6 21.9 7.86 0.138 -11.7 -18.3
## 10 2002-10-01 00:00:00 886. 8.46 25.2 26.7 10.2 0.134 8.29 -14.7
## # … with 184 more rows, and 8 more variables: rge <dbl>, rmsoft <dbl>,
## # roracle <dbl>, ersandp <dbl>, erford <dbl>, erge <dbl>, ermsoft <dbl>,
## # eroracle <dbl>
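The two transformations above can be summarised in formulas. The symbols \(y_t\) (annual T-bill yield in percent), \(r_{f,t}\) (monthly risk-free rate), \(r_{i,t}\) (log return of asset \(i\)) and \(er_{i,t}\) (excess return) are notation assumed for this note. The numerical example uses the first row of the table, where USTB3M changes from 1.68 to 0.14.

```latex
\[
r_{f,t} = \frac{y_t}{12},
\qquad
er_{i,t} = r_{i,t} - r_{f,t},
\qquad
\text{e.g. } r_{f,\,2002\text{-}01} = \frac{1.68}{12} = 0.14
\]
```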
Before running the CAPM regression, we plot the data series to examine whether they appear to move together. We do this for the S&P500 and the Ford series. The following two lines will produce the graph in figure below.
In order to get an idea about the association between two series, a scatter plot might be more informative.
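The plotting code is not included in this report. A minimal sketch of both plots follows. It assumes a simulated stand-in data frame, `capm_demo`, with hypothetical values so that it runs on its own; in the lab one would use the `capm` tibble and its `Date`, `ersandp` and `erford` columns.

```r
# Simulated stand-in data (NOT the capm data); column names mirror the lab's
set.seed(42)
n <- 60
capm_demo <- data.frame(
  Date    = seq(as.Date("2002-02-01"), by = "month", length.out = n),
  ersandp = rnorm(n, mean = 0.3, sd = 4)
)
capm_demo$erford <- 1.5 * capm_demo$ersandp + rnorm(n, sd = 10)

# Time-series plot: do the two excess-return series move together?
plot(capm_demo$Date, capm_demo$ersandp, type = 'l', col = 'red',
     ylim = range(capm_demo$ersandp, capm_demo$erford),
     xlab = 'Date', ylab = 'Excess return (%)')
lines(capm_demo$Date, capm_demo$erford, col = 'black')
legend('topright', legend = c('S&P500', 'Ford'), col = c('red', 'black'), lty = 1)

# Scatter plot of the Ford excess return against the market excess return
plot(capm_demo$ersandp, capm_demo$erford, pch = 19,
     xlab = 'ersandp', ylab = 'erford')
```

Setting `ylim` from both series keeps the more volatile series from being clipped when it is added with `lines()`.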
We see from this scatter plot that there appears to be a weak association between ersandp and erford. We can also create similar scatter plots for the other data series and the S&P500. For the case of the Ford stock, the CAPM regression equation takes the form
\[R_{Ford} - r_f = \alpha + \beta (R_M - r_f) + \epsilon\]
Here, the dependent variable (y) is the excess return of Ford, “erford”, and it is regressed on a constant as well as the excess market return, “ersandp”. Hence, we specify the CAPM regression as
lm_capm = lm(erford ~ ersandp, data = capm)
summary(lm_capm)
##
## Call:
## lm(formula = erford ~ ersandp, data = capm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.727 -5.027 -1.080 3.482 65.145
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9560 0.7931 -1.205 0.23
## ersandp 1.8898 0.1916 9.862 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.98 on 191 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.3374, Adjusted R-squared: 0.3339
## F-statistic: 97.26 on 1 and 191 DF, p-value: < 2.2e-16
We then test the null hypothesis β = 1.
linearHypothesis(lm_capm, c("ersandp=1"))
## Linear hypothesis test
##
## Hypothesis:
## ersandp = 1
##
## Model 1: restricted model
## Model 2: erford ~ ersandp
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 192 25618
## 2 191 23020 1 2598.5 21.56 6.365e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-statistic of 21.56, with a corresponding p-value of 0.000006, implies that the null hypothesis of the CAPM beta of the Ford stock being 1 is convincingly rejected. Hence the estimated beta of 1.89 is significantly different from 1.
2 FPT and VNINDEX
In this section, we use the price series of FPT and the VN-Index shown below:
## # A tibble: 977 × 2
## Date vnindex_price
## <chr> <dbl>
## 1 14-02-2023 1044.
## 2 13-02-2023 1055.
## 3 10-02-2023 1064.
## 4 09-02-2023 1072.
## 5 08-02-2023 1066.
## 6 07-02-2023 1089.
## 7 06-02-2023 1077.
## 8 03-02-2023 1078.
## 9 02-02-2023 1076.
## 10 01-02-2023 1111.
## # … with 967 more rows
## # A tibble: 977 × 2
## Date fpt_price
## <chr> <dbl>
## 1 14-02-2023 80.7
## 2 13-02-2023 80.7
## 3 10-02-2023 80.7
## 4 09-02-2023 81.2
## 5 08-02-2023 80.7
## 6 07-02-2023 80.5
## 7 06-02-2023 80.1
## 8 03-02-2023 81.9
## 9 02-02-2023 82
## 10 01-02-2023 83.5
## # … with 967 more rows
# Simple percentage returns; diff() works on consecutive rows, and the rows are listed newest first
FPT$srFPT = c(NA, 100*diff(FPT$fpt_price)/FPT$fpt_price[-nrow(FPT)])
Riskfree = 0.03962 # Vietnam 5-year government bond
# Log percentage return minus the risk-free rate (excess return)
FPT$rFPT = c(NA, 100*diff(log(FPT$fpt_price))) - Riskfree
VNINDEX$srVNINDEX = c(NA, 100*diff(VNINDEX$vnindex_price)/VNINDEX$vnindex_price[-nrow(VNINDEX)])
VNINDEX$rVNINDEX = c(NA, 100*diff(log(VNINDEX$vnindex_price))) - Riskfree
data = merge(VNINDEX, FPT, by = 'Date') # VNINDEX ~ x; FPT ~ y
head(data,10)
## Date vnindex_price srVNINDEX rVNINDEX fpt_price srFPT
## 1 01-02-2021 1056.61 2.0376433 1.9775411 43.59 -1.2460353
## 2 01-02-2023 1111.18 3.2723961 3.1803733 83.50 1.8292683
## 3 01-03-2021 1168.47 -1.4921976 -1.5430629 52.97 -2.1791320
## 4 01-03-2022 1490.13 -0.5771361 -0.6184279 76.16 -0.1049318
## 5 01-04-2019 980.76 -0.7860156 -0.8287410 23.10 -0.6878762
## 6 01-04-2020 662.53 -2.6020611 -2.6761336 23.88 -2.8478438
## 7 01-04-2021 1194.59 -1.7687690 -1.8242187 54.02 -2.5085725
## 8 01-04-2022 1489.48 -1.7778481 -1.8334617 87.34 -3.6088732
## 9 01-06-2020 864.47 -1.6160788 -1.6688998 32.64 -0.6997262
## 10 01-06-2021 1328.05 -0.7273244 -0.7696023 68.06 -3.2551528
## rFPT
## 1 -1.2934835
## 2 1.7731185
## 3 -2.2428458
## 4 -0.1446069
## 5 -0.7298730
## 6 -2.9288016
## 7 -2.5801935
## 8 -3.7152234
## 9 -0.7418058
## 10 -3.3489314
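The price tables list the newest date first and store `Date` as text, so `diff()` on the raw rows takes differences running backwards in time. A minimal sketch of one way to obtain returns in chronological order follows. It assumes a toy extract of the first four FPT rows shown above (the data frame `fpt_demo` is hypothetical) and parses the dates with the day-month-year format seen in the table.

```r
# Toy extract in the same layout as the FPT table above (newest row first)
fpt_demo <- data.frame(
  Date      = c("14-02-2023", "13-02-2023", "10-02-2023", "09-02-2023"),
  fpt_price = c(80.7, 80.7, 80.7, 81.2)
)

# Parse the text dates and sort the rows from oldest to newest
fpt_demo$Date <- as.Date(fpt_demo$Date, format = "%d-%m-%Y")
fpt_demo <- fpt_demo[order(fpt_demo$Date), ]

# Simple percentage return relative to the previous trading day
fpt_demo$srFPT <- c(NA, 100 * diff(fpt_demo$fpt_price) /
                          fpt_demo$fpt_price[-nrow(fpt_demo)])
print(fpt_demo)
```

Parsing the dates also makes a later `merge()` result sort chronologically rather than alphabetically by the date string.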
summary(data)
## Date vnindex_price srVNINDEX rVNINDEX
## Length:977 Min. : 659.2 Min. :-4.75399 Min. :-4.91033
## Class :character 1st Qu.: 959.9 1st Qu.:-0.67244 1st Qu.:-0.71433
## Mode :character Median :1046.3 Median :-0.10827 Median :-0.14795
## Mean :1113.1 Mean : 0.00545 Mean :-0.04333
## 3rd Qu.:1284.1 3rd Qu.: 0.45005 3rd Qu.: 0.40942
## Max. :1528.6 Max. : 7.15179 Max. : 6.86800
## NA's :1 NA's :1
## fpt_price srFPT rFPT
## Min. :22.55 Min. :-6.5382 Min. :-6.8014
## 1st Qu.:32.53 1st Qu.:-0.9983 1st Qu.:-1.0429
## Median :52.97 Median :-0.1123 Median :-0.1520
## Mean :54.30 Mean :-0.1110 Mean :-0.1673
## 3rd Qu.:77.06 3rd Qu.: 0.7104 3rd Qu.: 0.6683
## Max. :95.18 Max. : 7.5110 Max. : 7.2027
## NA's :1 NA's :1
From the summary we get:
Maximum simple return of FPT: 7.5110
Maximum simple return of VNIndex: 7.15179
Minimum simple return of FPT: -6.5382
Minimum simple return of VNIndex: -4.75399
lm_data = lm(fpt_price ~ vnindex_price, data = data)
summary(lm_data)
##
## Call:
## lm(formula = fpt_price ~ vnindex_price, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.879 -10.139 -3.840 4.424 33.160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.955347 2.365651 -17.73 <2e-16 ***
## vnindex_price 0.086475 0.002088 41.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.88 on 975 degrees of freedom
## Multiple R-squared: 0.6377, Adjusted R-squared: 0.6373
## F-statistic: 1716 on 1 and 975 DF, p-value: < 2.2e-16
We then test the null hypothesis β = 1.
linearHypothesis(lm_data, c("vnindex_price=1"))
## Linear hypothesis test
##
## Hypothesis:
## vnindex_price = 1
##
## Model 1: restricted model
## Model 2: fpt_price ~ vnindex_price
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 976 37102042
## 2 975 187942 1 36914100 191502 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (Pr(>F)) is less than 2.2e-16, which is below any commonly used significance level, there is strong evidence to reject the null hypothesis that the coefficient on vnindex_price equals 1. The summary above reports a slope of 0.086475 for this regression in price levels. A CAPM beta of 0.68210 < 1, which does not appear in the output shown here, would imply that FPT is a defensive stock.
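A CAPM beta is normally estimated from excess returns rather than from price levels. As a minimal sketch of that regression, the code below uses simulated data in place of the `rFPT` and `rVNINDEX` columns. The true beta of 0.7 is an arbitrary choice for the simulation, not an estimate for FPT, and the test of β = 1 is done by hand so that no extra package is needed.

```r
# Simulated excess returns (NOT the FPT/VN-Index data); true beta set to 0.7 arbitrarily
set.seed(123)
n <- 500
rVNINDEX <- rnorm(n, mean = 0, sd = 1.2)
rFPT     <- 0.7 * rVNINDEX + rnorm(n, sd = 1.0)

# CAPM-style regression of the stock's excess return on the market's excess return
lm_beta <- lm(rFPT ~ rVNINDEX)
est     <- coef(summary(lm_beta))["rVNINDEX", ]

beta_hat <- unname(est["Estimate"])
t_stat   <- (beta_hat - 1) / unname(est["Std. Error"])       # H0: beta = 1
p_value  <- 2 * pt(-abs(t_stat), df = df.residual(lm_beta))  # two-sided p-value

print(c(beta = beta_hat, t = t_stat, p = p_value))
```

With the real data, one would replace the simulated vectors with `data$rFPT` and `data$rVNINDEX`. A beta that is significantly below 1 would support calling the stock defensive.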