For each of the 4 stocks, the null hypothesis is that \(\beta\) = 0, tested against the two-sided alternative that \(\beta\) is not 0.
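For reference, the regression behind this test can be written out as below. This is a sketch of the standard CAPM excess-return regression as the code sets it up (each stock's excess return regressed on the S&P excess return); the subscripts and the error symbol \(u_{i,t}\) are notation assumed here, not taken from the original.

```latex
\[
R_{i,t} - R_{f,t} = \alpha_i + \beta_i \,(R_{m,t} - R_{f,t}) + u_{i,t},
\qquad
H_0:\ \beta_i = 0 \quad \text{vs.} \quad H_1:\ \beta_i \neq 0
\]
```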
#Problem 1
require(mosaic)
require(MASS)
require(openintro)
setwd("C:\\Users\\J\\Desktop\\R-stuffs\\data-files")
#READ IN DATAFRAME
capm = read.csv("capm.csv")
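#CONVERT USTB3M TO DECIMAL
#RF HAS ONE MORE OBSERVATION THAN THE DIFFERENCED RETURNS BELOW, SO THE SUBTRACTION RECYCLES THE SHORTER VECTOR; RF[-1] WOULD ALIGN THE TWO SERIES
RF = capm$USTB3M/100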
#CONVERT PRICE TO CONTINUOUSLY COMPOUNDED RETURNS
SANDP_RETURNS = diff(log(capm$SANDP))
FORD_RETURNS = diff(log(capm$FORD))
GE_RETURNS = diff(log(capm$GE))
MSOFT_RETURNS = diff(log(capm$MICROSOFT))
ORACLE_RETURNS = diff(log(capm$ORACLE))
#GET RM IN EXCESS RETURNS
SANDP_RETURNS = SANDP_RETURNS - RF
FORD_RETURNS = FORD_RETURNS - RF
GE_RETURNS = GE_RETURNS - RF
MSOFT_RETURNS = MSOFT_RETURNS - RF
ORACLE_RETURNS = ORACLE_RETURNS - RF
#CREATE DATA FRAMES
FORD.DATA = data.frame(SANDP_RETURNS = SANDP_RETURNS, FORD_RETURNS = FORD_RETURNS )
GE.DATA = data.frame(SANDP_RETURNS = SANDP_RETURNS, GE_RETURNS = GE_RETURNS )
MSOFT.DATA = data.frame(SANDP_RETURNS = SANDP_RETURNS, MSOFT_RETURNS = MSOFT_RETURNS )
ORACLE.DATA = data.frame(SANDP_RETURNS = SANDP_RETURNS, ORACLE_RETURNS = ORACLE_RETURNS)
#UPPER-TAIL AREAS FOR TWO-SIDED TESTS AT THE 90%, 95% AND 99% CONFIDENCE LEVELS
significance.level = c(.05, 0.025, .005)
t_crit = qt(1-significance.level, df = 134)
#CRITICAL VALUES: 90% = 1.656, 95% = 1.978, 99% = 2.613
#FORD
FORDFIT = lm(FORD_RETURNS ~ SANDP_RETURNS, data = FORD.DATA)
summary(FORDFIT)
Residuals:
Min 1Q Median 3Q Max
-0.49646 -0.06669 -0.00409 0.05178 0.62584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01166 0.01117 1.044 0.298
SANDP_RETURNS 1.97179 0.21822 9.036 1.58e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1253 on 134 degrees of freedom
Multiple R-squared: 0.3786, Adjusted R-squared: 0.374
F-statistic: 81.65 on 1 and 134 DF, p-value: 1.58e-15
Recall that the critical values are 90% = 1.656, 95% = 1.978, and 99% = 2.613. In Ford's case, the t-value for \(\beta\) is 9.036, which exceeds all three critical values, so we reject the null hypothesis that \(\beta\) = 0 at the 90%, 95% and 99% confidence levels. However, the t-value for \(\alpha\) is 1.044, so we fail to reject the null that \(\alpha\) = 0 at any of the 3 confidence levels.
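As a check on the reported t-values: because the null value is 0, each t-value is just the estimate divided by its standard error. Using the Ford numbers from the summary output above:

```latex
\[
t_{\hat\beta} = \frac{\hat\beta - 0}{\operatorname{se}(\hat\beta)} = \frac{1.97179}{0.21822} \approx 9.036,
\qquad
t_{\hat\alpha} = \frac{\hat\alpha - 0}{\operatorname{se}(\hat\alpha)} = \frac{0.01166}{0.01117} \approx 1.044
\]
```

The first value is well above the largest critical value (2.613), while the second is below the smallest (1.656).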
#GE
GEFIT = lm(GE_RETURNS ~ SANDP_RETURNS, data = GE.DATA)
summary(GEFIT)
Residuals:
Min 1Q Median 3Q Max
-0.176746 -0.028076 -0.002496 0.037119 0.214075
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001015 0.005120 0.198 0.843
SANDP_RETURNS 1.295586 0.099992 12.957 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05741 on 134 degrees of freedom
Multiple R-squared: 0.5561, Adjusted R-squared: 0.5528
F-statistic: 167.9 on 1 and 134 DF, p-value: < 2.2e-16
Recall that the critical values are 90% = 1.656, 95% = 1.978 and 99% = 2.613. In GE's case, the t-value for \(\beta\) is 12.957, which exceeds all three critical values, so we reject the null hypothesis that \(\beta\) = 0 at the 90%, 95% and 99% confidence levels. However, the t-value for \(\alpha\) is 0.198, so we fail to reject the null that \(\alpha\) = 0 at any of the 3 confidence levels.
#MSOFT
MSOFTFIT = lm(MSOFT_RETURNS ~ SANDP_RETURNS, data = MSOFT.DATA)
summary(MSOFTFIT)
Residuals:
Min 1Q Median 3Q Max
-0.144113 -0.036321 -0.002089 0.029645 0.212258
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001683 0.004811 -0.35 0.727
SANDP_RETURNS 0.996292 0.093957 10.60 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05394 on 134 degrees of freedom
Multiple R-squared: 0.4563, Adjusted R-squared: 0.4522
F-statistic: 112.4 on 1 and 134 DF, p-value: < 2.2e-16
Recall that the critical values are 90% = 1.656, 95% = 1.978 and 99% = 2.613. In Microsoft's case, the t-value for \(\beta\) is 10.60, which exceeds all three critical values, so we reject the null hypothesis that \(\beta\) = 0 at the 90%, 95% and 99% confidence levels. However, the t-value for \(\alpha\) is -0.35, so we fail to reject the null that \(\alpha\) = 0 at any of the 3 confidence levels.
#ORACLE
ORACLEFIT = lm(ORACLE_RETURNS ~ SANDP_RETURNS, data = ORACLE.DATA)
summary(ORACLEFIT)
Residuals:
Min 1Q Median 3Q Max
-0.301732 -0.038969 0.002515 0.036819 0.255207
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.003060 0.006496 0.471 0.638
SANDP_RETURNS 1.052564 0.126883 8.296 1.01e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.07285 on 134 degrees of freedom
Multiple R-squared: 0.3393, Adjusted R-squared: 0.3344
F-statistic: 68.82 on 1 and 134 DF, p-value: 1.015e-13
Recall that the critical values are 90% = 1.656, 95% = 1.978 and 99% = 2.613. In Oracle's case, the t-value for \(\beta\) is 8.296, which exceeds all three critical values, so we reject the null hypothesis that \(\beta\) = 0 at the 90%, 95% and 99% confidence levels. However, the t-value for \(\alpha\) is 0.471, so we fail to reject the null that \(\alpha\) = 0 at any of the 3 confidence levels.
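The four results can be collected in one place. The sketch below recomputes the t-values from the estimates and standard errors printed in the summary outputs above and compares them with the critical values; the data frame `results` and its column names are made up for this illustration.

```r
# Estimates and standard errors copied from the four summary() outputs above.
results <- data.frame(
  stock    = c("FORD", "GE", "MSOFT", "ORACLE"),
  alpha    = c(0.01166, 0.001015, -0.001683, 0.003060),
  se_alpha = c(0.01117, 0.005120, 0.004811, 0.006496),
  beta     = c(1.97179, 1.295586, 0.996292, 1.052564),
  se_beta  = c(0.21822, 0.099992, 0.093957, 0.126883)
)

# Two-sided critical values with 134 residual degrees of freedom.
crit_90 <- qt(1 - 0.05,  df = 134)  # 90% confidence level
crit_99 <- qt(1 - 0.005, df = 134)  # 99% confidence level

# The null value is 0 for both coefficients, so t = estimate / standard error.
results$t_alpha <- results$alpha / results$se_alpha
results$t_beta  <- results$beta  / results$se_beta

# Strictest test for beta (99%), most lenient test for alpha (90%).
results$reject_beta_99  <- abs(results$t_beta)  > crit_99
results$reject_alpha_90 <- abs(results$t_alpha) > crit_90

print(results[, c("stock", "t_alpha", "t_beta",
                  "reject_alpha_90", "reject_beta_99")])
```

Checking \(\beta\) against the strictest critical value and \(\alpha\) against the most lenient one is enough to cover all three levels at once.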
We are testing the hypothesis that \(\beta\) = 1 by setting the true beta to a value close to 1 and running a simulation.
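One way to package a single run of this test is sketched below. The function `test_slope` and its arguments are hypothetical names introduced for this illustration. It also assumes the residual degrees of freedom of the fitted model are the appropriate choice for the critical values, which is a design choice of this sketch.

```r
# Hypothetical helper: simulate y = alpha + beta * x + u once and
# test H0: slope = null against a two-sided alternative.
test_slope <- function(n, beta, null = 1, alpha = 0,
                       tail.area = c(.05, .025, .005)) {
  x <- runif(n)                 # regressor, uniform on (0, 1)
  u <- rnorm(n)                 # standard normal errors
  y <- alpha + beta * x + u
  fit <- lm(y ~ x)
  est <- coef(summary(fit))["x", "Estimate"]
  se  <- coef(summary(fit))["x", "Std. Error"]
  t.stat <- (est - null) / se   # distance from the null in standard errors
  t.crit <- qt(1 - tail.area, df = df.residual(fit))
  list(estimate = est, se = se, t = t.stat,
       reject = abs(t.stat) > t.crit)
}

set.seed(12345)
res <- test_slope(n = 20000, beta = 0.97)
res$t
res$reject   # one TRUE/FALSE per level: 90%, 95%, 99%
```

Because the random draws are made in a different order than in the script below, the numbers from this sketch will not match the printed output exactly.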
#Problem 2
require(mosaic)
require(MASS)
require(openintro)
setwd("C:\\Users\\J\\Desktop\\R-stuffs\\data-files")
set.seed(12345)
# set up the parameters; these determine the PRF
T = 100
alpha = 0.0
beta = .97
x = runif(n=T)
Null = 1 #value of beta under the null hypothesis
#run the main simulation loop
M = 1000 #base sample size; iteration i draws M * i observations
beta.hat = rep(0, 20) #slope estimate from each iteration
for(i in 1:20)
{
size = M * i
u = rnorm(size)
x = runif(size)
y = alpha + beta*x+u
srf = lm(y ~ x ) #sample regression function
beta.hat.i = coef(srf)[2] #slope coefficient from this iteration
beta.hat[i] = beta.hat.i
}
#standard error of the slope from the final fit (i = 20, size = 20 * M)
se.beta = coef(summary(srf))[, "Std. Error"][2]
#t-test of the null hypothesis beta = Null
t.beta = (beta.hat.i - Null)/se.beta
significance.level = c(.05, 0.025, .005) #upper-tail areas for two-sided 90%, 95% and 99% tests
t_crit = qt(1-significance.level, df = T - 2)
#call values
summary(srf)
se.beta
t_crit
abs(t.beta)
print(c(i, size, as.character(abs(t.beta) > t_crit)))
#False = Fail to Reject
#True = Reject
#T = 1000
significance.level = c(.05, 0.025, .005)
t_crit = qt(1-significance.level, df = T - 2)
#critical values with T = 1000 (df = 998): 90% = 1.646, 95% = 1.962, 99% = 2.581
Residuals:
Min 1Q Median 3Q Max
-3.8500 -0.6720 -0.0035 0.6782 4.0064
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002228 0.014141 0.158 0.875
x 0.961344 0.024529 39.191 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.003 on 19998 degrees of freedom
Multiple R-squared: 0.07133, Adjusted R-squared: 0.07128
F-statistic: 1536 on 1 and 19998 DF, p-value: < 2.2e-16
> se.beta
x
0.02452948
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
1.575901
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "20000" "FALSE" "FALSE" "FALSE"
> #False = Fail to Reject
> #True = Reject
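Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, with a true \(\beta\) of 0.97 and 20,000 observations in the final fit, the t-value is 1.576, so we fail to reject the null hypothesis that \(\beta\) = 1 at all three confidence levels.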
#rerun the main simulation loop with a larger sample
M = 10000 #base sample size; the final fit uses 20 * M = 200,000 observations
Residuals:
Min 1Q Median 3Q Max
-4.6714 -0.6752 0.0011 0.6739 4.6111
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002468 0.004470 0.552 0.581
x 0.969184 0.007730 125.372 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9996 on 199998 degrees of freedom
Multiple R-squared: 0.07287, Adjusted R-squared: 0.07286
F-statistic: 1.572e+04 on 1 and 199998 DF, p-value: < 2.2e-16
> se.beta
x
0.007730443
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
3.986294
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "2e+05" "TRUE" "TRUE" "TRUE"
> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, the t-value is 3.986, so we reject the null hypothesis that \(\beta\) = 1 at the 90%, 95% and 99% confidence levels.
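The switch from "fail to reject" to "reject" comes from the standard error shrinking as the sample grows. The sketch below uses the large-sample approximation se ≈ sigma / (sqrt(n) * sd(x)), assuming sigma = 1 (standard normal errors) and sd(x) = sqrt(1/12) for a uniform regressor on (0, 1). The function name `approx_se` is made up for this illustration.

```r
# Assumed approximation: se(beta.hat) ~ sigma / (sqrt(n) * sd(x)),
# with sigma = 1 (u is standard normal) and sd(x) = sqrt(1/12) for x ~ U(0, 1).
approx_se <- function(n, sigma = 1, sd_x = sqrt(1 / 12)) {
  sigma / (sqrt(n) * sd_x)
}

n  <- c(20000, 200000)   # final sample sizes for M = 1000 and M = 10000
se <- approx_se(n)

# Rough size of the t-statistic when the true slope is 0.97 and the null is 1.
expected_t <- abs(0.97 - 1) / se

data.frame(n = n, approx_se = se, expected_t = expected_t)
```

The approximate standard errors are close to the reported values of 0.02452948 and 0.007730443. Multiplying the sample size by 10 divides the standard error by about sqrt(10), so the same 0.03 gap between the true slope and the null becomes roughly 3.2 times larger in t-value terms.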
T = 100
alpha = 0.0
beta = .98
x = runif(n=T)
Null = 1
Residuals:
Min 1Q Median 3Q Max
-3.8500 -0.6720 -0.0035 0.6782 4.0064
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002228 0.014141 0.158 0.875
x 0.971344 0.024529 39.599 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.003 on 19998 degrees of freedom
Multiple R-squared: 0.07271, Adjusted R-squared: 0.07266
F-statistic: 1568 on 1 and 19998 DF, p-value: < 2.2e-16
> se.beta
x
0.02452948
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
1.168228
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "20000" "FALSE" "FALSE" "FALSE"
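> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, with a true \(\beta\) of 0.98 and 20,000 observations in the final fit, the t-value is 1.168, so we fail to reject the null hypothesis that \(\beta\) = 1 at all three confidence levels.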
#rerun the main simulation loop with a larger sample
M = 10000 #base sample size; the final fit uses 20 * M = 200,000 observations
Residuals:
Min 1Q Median 3Q Max
-4.6714 -0.6752 0.0011 0.6739 4.6111
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002468 0.004470 0.552 0.581
x 0.979184 0.007730 126.666 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9996 on 199998 degrees of freedom
Multiple R-squared: 0.07426, Adjusted R-squared: 0.07426
F-statistic: 1.604e+04 on 1 and 199998 DF, p-value: < 2.2e-16
> se.beta
x
0.007730443
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
2.692707
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "2e+05" "TRUE" "TRUE" "TRUE"
> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, the t-value is 2.693, so we reject the null hypothesis that \(\beta\) = 1 at the 90%, 95% and 99% confidence levels.
T = 100
alpha = 0.0
beta = .99
x = runif(n=T)
Null = 1
Residuals:
Min 1Q Median 3Q Max
-3.8500 -0.6720 -0.0035 0.6782 4.0064
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002228 0.014141 0.158 0.875
x 0.981344 0.024529 40.007 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.003 on 19998 degrees of freedom
Multiple R-squared: 0.0741, Adjusted R-squared: 0.07406
F-statistic: 1601 on 1 and 19998 DF, p-value: < 2.2e-16
> se.beta
x
0.02452948
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
0.7605556
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "20000" "FALSE" "FALSE" "FALSE"
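> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, with a true \(\beta\) of 0.99 and 20,000 observations in the final fit, the t-value is 0.761, so we fail to reject the null hypothesis that \(\beta\) = 1 at all three confidence levels.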
#rerun the main simulation loop with a larger sample
M = 10000 #base sample size; the final fit uses 20 * M = 200,000 observations
Residuals:
Min 1Q Median 3Q Max
-4.6714 -0.6752 0.0011 0.6739 4.6111
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002468 0.004470 0.552 0.581
x 0.989184 0.007730 127.960 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9996 on 199998 degrees of freedom
Multiple R-squared: 0.07567, Adjusted R-squared: 0.07567
F-statistic: 1.637e+04 on 1 and 199998 DF, p-value: < 2.2e-16
> se.beta
x
0.007730443
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
1.39912
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "2e+05" "FALSE" "FALSE" "FALSE"
> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, the t-value is 1.399, so we still fail to reject the null hypothesis that \(\beta\) = 1 at all three confidence levels, even with 200,000 observations.
#rerun the main simulation loop with an even larger sample
M = 100000 #base sample size; the final fit uses 20 * M = 2,000,000 observations
Residuals:
Min 1Q Median 3Q Max
-4.9765 -0.6756 0.0004 0.6754 4.7521
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004196 0.0014159 0.296 0.767
x 0.9881531 0.0024524 402.933 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.001 on 1999998 degrees of freedom
Multiple R-squared: 0.07508, Adjusted R-squared: 0.07508
F-statistic: 1.624e+05 on 1 and 1999998 DF, p-value: < 2.2e-16
> se.beta
x
0.002452398
> t_crit
[1] 1.646382 1.962344 2.580765
> abs(t.beta)
x
4.830761
> print(c(i, size, as.character(abs(t.beta) > t_crit)))
[1] "20" "2e+06" "TRUE" "TRUE" "TRUE"
> #False = Fail to Reject
> #True = Reject
Recall that the critical values are 90% = 1.646, 95% = 1.962 and 99% = 2.581. In this case, the t-value is 4.831, so we reject the null hypothesis that \(\beta\) = 1 at the 90%, 95% and 99% confidence levels.
When a frequentist views the results, they might say, “if the null hypothesis were true, the probability of drawing a sample with an estimate this far from our null would be < X%, so I am going to conclude that the null hypothesis is incorrect.” Frequentist analysis is heavily dependent on the distribution generating the data, but indirectly so. Because of this, it is very easy to misinterpret the results: with a large enough sample we reject the null hypothesis even when, in reality, the null value (\(\beta\) = 1) is very close to the true value (for example, \(\beta\) = 0.99).
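The point about sample size can be illustrated by repeating the experiment many times at each sample size and recording how often the null is rejected. This is a minimal sketch under the same data-generating assumptions as the simulation above (uniform x, standard normal errors). The function `rejection_rate`, the sample sizes, and the choice of 100 repetitions are hypothetical and chosen only to keep the run short.

```r
# Hypothetical helper: share of simulated samples in which H0: beta = null
# is rejected at the 5% level (two-sided), for a given sample size n.
rejection_rate <- function(n, beta, null = 1, reps = 100) {
  rejected <- logical(reps)
  for (r in seq_len(reps)) {
    x <- runif(n)
    y <- beta * x + rnorm(n)    # alpha = 0, as in the simulation above
    fit <- lm(y ~ x)
    est <- coef(summary(fit))["x", "Estimate"]
    se  <- coef(summary(fit))["x", "Std. Error"]
    t.stat <- (est - null) / se
    rejected[r] <- abs(t.stat) > qt(0.975, df = df.residual(fit))
  }
  mean(rejected)
}

set.seed(12345)
sizes <- c(1000, 20000, 100000)
rates <- sapply(sizes, rejection_rate, beta = 0.97)
data.frame(n = sizes, rejection_rate = rates)
```

With the true slope fixed at 0.97, the share of rejections should rise with the sample size, even though the distance between the truth and the null never changes.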