---
title: "Workshop 3 - Financial Econometrics 1"
author: Stefan Schweitzer A01209755
output: html_notebook
---
## Introduction to Hypothesis testing


1 Data downloading

Using the getSymbols function, download monthly data for Apple (AAPL) and Microsoft (MSFT) from 2016 to date.
```{r}
library(quantmod)
getSymbols(c("AAPL", "MSFT"), from="2016-01-01", periodicity="monthly", src="yahoo")
```

2 Return Calculation

Calculate the continuously compounded (cc) returns for each stock. Remember that cc returns can be calculated as the first difference of the log of adjusted prices:
```{r}
r_AAPL <- diff(log(Ad(AAPL)))
r_MSFT <- diff(log(Ad(MSFT)))

head(r_AAPL, 5)
```
```{r}
head(r_MSFT, 5)
```
```{r}
r_AAPL = na.omit(r_AAPL)
r_MSFT = na.omit(r_MSFT)
```
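As a quick check of the return definition above, the sketch below uses a short vector of hypothetical prices (not real AAPL or MSFT data). It shows that the first difference of log prices equals the log of one plus the simple return, and that cc returns add up over time. On a plain numeric vector `diff()` simply drops the first observation. On the xts objects above it keeps the first date with an `NA`, which is why `na.omit` is needed.

```{r}
# Hypothetical adjusted prices (NOT real AAPL/MSFT data), one per month
prices <- c(100, 105, 102, 110)

# cc returns as in this notebook: first difference of log prices
cc_returns <- diff(log(prices))

# Simple (discrete) returns for comparison
simple_returns <- prices[-1] / prices[-length(prices)] - 1

# cc returns are the log of the gross simple returns
all.equal(cc_returns, log(1 + simple_returns))

# cc returns are additive over time: their sum is the log of total growth
all.equal(sum(cc_returns), log(prices[length(prices)] / prices[1]))
```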

3 Hypothesis test for one-sample mean

Run a t-test to check whether the mean return of a stock is greater than zero.
To do a hypothesis test, we usually follow these steps:

a. DEFINE THE VARIABLE OF ANALYSIS.
b. WRITE THE NULL AND THE ALTERNATIVE HYPOTHESES.
c. CALCULATE THE STANDARD ERROR, WHICH IS THE STANDARD DEVIATION OF THE VARIABLE OF ANALYSIS.
d. CALCULATE THE t-STATISTIC (t-VALUE), AND EXPLAIN/INTERPRET IT.
e. WRITE YOUR CONCLUSION OF THE t-TEST.

We will follow these steps for Apple's returns.

a. Defining the variable of analysis

In this case, the variable of analysis is the mean return of Apple. We want to test whether the mean return of Apple is greater than zero, and our initial belief is that it is.

b. Writing the null and alternative hypotheses

The alternative hypothesis is always our belief, and the null hypothesis is the opposite of our belief. The null hypothesis is usually named H0, while the alternative hypothesis is named Ha. We define the hypotheses as follows:

H0: mean(r_AAPL) = 0

Ha: mean(r_AAPL) > 0

In any hypothesis test we assume that the null hypothesis is true. The alternative hypothesis is the belief for which we want to provide evidence.
In other words, we start by being very skeptical about our belief: we assume that we are wrong and that the null hypothesis is true.
The purpose of any hypothesis test is then to provide strong evidence against the null hypothesis, so that we can say with a certain confidence (level of probability) that our alternative hypothesis is likely to be true.
How do we provide this evidence against the null hypothesis?
We assume that H0 is true, collect sample data, calculate the variable of analysis with the data, and then calculate the t-statistic of the test. So what is the t-statistic?
The t-statistic or t-value is the standardized distance between the variable of analysis (calculated with the data) and the value stated in the null hypothesis. This distance is measured in standard deviations of the variable of analysis.
For this example, I can rewrite the definition of the t-statistic as follows:
The t-statistic or t-value is the standardized distance between the mean return of Apple (calculated with historical returns) and zero (the value stated in H0). This distance is measured in standard deviations of Apple's mean return.
The standard deviation of the variable of analysis is usually called the standard error of the test.
Therefore, we first need to calculate the standard error of the test, and then the t-statistic.
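In symbols, the definition above can be written as a short formula. This is a sketch in standard notation, and the symbols are introduced here rather than in the original notes: $\bar{r}$ is Apple's sample mean return, $s$ the sample standard deviation of the returns, $N$ the number of monthly returns, and $\mu_0$ the value stated in H0 (here, 0).

$$
SE = \frac{s}{\sqrt{N}}, \qquad t = \frac{\bar{r} - \mu_0}{SE} = \frac{\bar{r} - 0}{s / \sqrt{N}}
$$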

c. Calculate the standard error of the test

The standard error of the test is the standard deviation of the variable of analysis.
For Apple returns, the standard error is the standard deviation of Apple's **mean return**. Note that the standard deviation of Apple returns is not the same as the standard deviation of Apple's **mean return**.
From the Central Limit Theorem we learned that the standard deviation of the mean of a group is much smaller than the standard deviation of the individuals.
In this case, the standard deviation of Apple's mean return (the mean of a group of returns) must be much smaller than the standard deviation of Apple's individual historical returns.
How can we calculate the standard error, which is the standard deviation of the mean of a group? The standard deviation of the mean of a group is equal to the standard deviation of the individuals divided by the square root of N (N = the number of elements in the group). This holds when the individual observations are independent; the Central Limit Theorem adds that, for large N, the distribution of the mean is approximately normal.
For Apple's mean return, the standard error is therefore the standard deviation of Apple's historical returns divided by the square root of the number of historical periods. We can calculate it manually as follows:

1. Set N equal to the number of rows of the historical return dataset.
2. Calculate the standard error: the standard deviation of the returns divided by the square root of N.
3. Note that `sd` is the R function that calculates the standard deviation of a variable.
```{r}
N = nrow(r_AAPL)
se_AAPL <- sd(r_AAPL) / sqrt(N)
se_AAPL
```
We got a standard error of about 0.0101, which is much smaller than the standard deviation of the monthly Apple returns (about 0.083). The ratio between the two is exactly 1/sqrt(N).
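The 1/sqrt(N) rule can be checked with a small simulation. This is only a sketch under assumed values: 67 monthly returns, normally distributed with mean 0 and an assumed monthly standard deviation of 0.08. The name `N_sim` is hypothetical and was chosen so that it does not overwrite the notebook's `N`. The standard deviation of the 10,000 simulated sample means should be close to `sigma / sqrt(N_sim)`.

```{r}
set.seed(42)
N_sim <- 67      # assumed sample size (67 monthly returns), for illustration only
sigma <- 0.08    # assumed monthly volatility, roughly in line with Apple's returns

# Draw 10,000 hypothetical samples of N_sim returns and keep each sample mean
sample_means <- replicate(10000, mean(rnorm(N_sim, mean = 0, sd = sigma)))

sd(sample_means)     # simulated standard deviation of the mean
sigma / sqrt(N_sim)  # theoretical standard error: sigma / sqrt(N)
```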

d. Calculation of the t-statistic (also called t-value)

We will calculate the t-statistic (a) manually and (b) using the `t.test` function.
We start with the manual calculation to better understand what the t-statistic is.
Since the t-statistic is the standardized distance between the actual value of the variable of analysis and the value stated in the null hypothesis:

```{r}
t_val <- (mean(r_AAPL) - 0) / se_AAPL
t_val
```
The numerator of t_val is the distance between Apple's mean return and the hypothetical value stated in H0, which is zero. To measure this distance in standard deviations of Apple's mean return, we divide it by the standard error. The result is the standardized distance between the actual Apple mean return and zero, the hypothetical mean return stated in the null hypothesis H0.

Now calculate the t-statistic using the t.test function:
```{r}
ttest_AAPL <- t.test(as.numeric(r_AAPL), alternative = "greater")
ttest_AAPL
```
We can display only the t-value and the corresponding p-value of the test:
```{r}
cat("t-value from t.test =", ttest_AAPL$statistic, "\n")
```
```{r}
cat("p-value =", ttest_AAPL$p.value, "\n")
```
WE GOT THE SAME t-VALUE USING THE t.test FUNCTION compared with our MANUAL CALCULATION!
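The p-value reported by `t.test` can also be reproduced by hand from the t-Student distribution with N - 1 degrees of freedom, using `pt()` for the upper-tail probability. The helper `manual_t_greater` below is a hypothetical name used only for illustration. It is checked on simulated returns (not real data).

```{r}
# Hypothetical helper: one-sample t-test with alternative "greater", done by hand
manual_t_greater <- function(x, mu0 = 0) {
  x <- as.numeric(x)
  n <- length(x)
  se <- sd(x) / sqrt(n)                                 # standard error of the mean
  t_stat <- (mean(x) - mu0) / se                        # standardized distance from mu0
  p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # upper-tail probability
  c(t = t_stat, p = p_val)
}

# Simulated returns (NOT real data), only to check the helper against t.test
set.seed(1)
sim_r <- rnorm(67, mean = 0.02, sd = 0.08)
manual_t_greater(sim_r)
sim_tt <- t.test(sim_r, alternative = "greater")
c(t = unname(sim_tt$statistic), p = sim_tt$p.value)
```

With the notebook's data, `manual_t_greater(r_AAPL)` should return the same values as `t_val` and `ttest_AAPL$p.value`.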

d.1. Interpretation of the t-statistic

The t-value of the test is about 2.83.

As we mentioned, the t-value or t-statistic measures the standardized distance between the actual value of the variable of analysis and the hypothetical value stated in the null hypothesis.
In this case, Apple's actual mean return is about 2.83 standard errors (standard deviations of the mean return) away from zero, the hypothetical mean return stated in H0.
As a rule of thumb, when the t-value is greater than 2, we have strong statistical evidence (at about the 95% confidence level or higher) to reject the null hypothesis H0. Why is this the case? Let's quickly review the t-Student distribution vs the z (standard normal) distribution.
Under H0, the t-statistic follows a t-Student distribution (exactly when the returns are normally distributed, and approximately otherwise). The t-Student distribution is very similar to the normal distribution, but it has fatter tails because the standard error is estimated from the sample rather than known; the smaller the sample, the fatter the tails. (This is a different issue from the fat tails that real financial returns often show compared to the normal distribution.)
When the sample size is greater than 30, the t-Student distribution is almost the same as the normal z distribution.
Since the t-statistic follows a t-Student distribution under H0, a t-statistic greater than 2 means that the observed mean return is far away from the hypothetical distribution centered at zero. If H0 were true, the probability of getting a t-value of 2 or more would be only about 2.5% (about 2.3% for the normal distribution). In other words, if the hypothetical distribution with mean = 0 were true, it would be very unlikely to observe a mean that is 2 or more standard errors away from the true value. That is why a t-value of 2 or greater is considered strong evidence against the null hypothesis.
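The tail probabilities behind this rule of thumb can be computed with R's distribution functions. The 66 degrees of freedom below assume 67 monthly returns. The exact value depends on how many months were downloaded.

```{r}
# Probability of a value of 2 or more in the upper tail
pnorm(2, lower.tail = FALSE)          # standard normal z: about 0.023
pt(2, df = 66, lower.tail = FALSE)    # t-Student with 66 df: slightly larger
pt(2, df = 5, lower.tail = FALSE)     # t-Student, small sample: fatter tail

# Critical values for a 5% test with 66 degrees of freedom
qt(0.95, df = 66)    # one-sided (alternative = "greater")
qt(0.975, df = 66)   # two-sided: very close to 2
```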

e. Conclusion of the test

Since the t-value of the test is greater than 2, we have strong statistical evidence to reject the null hypothesis that Apple's mean return is zero. Therefore, AAPL's mean return is statistically greater than 0.
A more detailed interpretation uses the p-value and the confidence level of the test:
The p-value of the test is about 0.0031, which is far below 0.05 (and even below 0.01). This means that, if H0 were true, the probability of getting a t-value of about 2.83 or higher would be only about 0.3%. Therefore, we reject the null hypothesis and conclude that AAPL's mean return is statistically greater than 0 at the 95% and also at the 99% confidence level (1 - p-value is about 99.69%).

## Hypothesis test for comparing 2 sample means
Do a hypothesis test to check whether the Apple mean monthly return is greater than the Microsoft mean monthly return.
We follow the same steps of hypothesis testing we described in the previous example:

3.4.1 Defining the variable of analysis

In this case, the variable of analysis is the difference between two mean returns: Apple's mean return minus Microsoft's mean return. If this difference is greater than zero, then Apple's mean return is greater than Microsoft's mean return.
We start by calculating the mean return of each stock:

```{r}
mean_AAPL_r <- mean(r_AAPL)
mean_MSFT_r <- mean(r_MSFT)

print(mean_AAPL_r)
```
```{r}
print(mean_MSFT_r)
```

3.4.2 Writing the null and alternative hypotheses

Remember that the alternative hypothesis (Ha) is always our belief, and the null hypothesis (H0) is the opposite of our belief.
Since the sample mean return of Apple is higher than that of Microsoft, we start by believing that Apple offers significantly higher average monthly returns than Microsoft. Then:

H0: mean(r_AAPL) = mean(r_MSFT)

Ha: mean(r_AAPL) > mean(r_MSFT)

However, the null hypothesis has to be stated as a variable equal to a specific value. Since our variable of study is the difference between both means, we rearrange the equations to leave a number on the right-hand side:

H0: mean(r_AAPL) - mean(r_MSFT) = 0

Ha: mean(r_AAPL) - mean(r_MSFT) > 0

We can define our variable of study as meandif:

meandif = mean(r_AAPL) - mean(r_MSFT)

Then, the final setup of the hypotheses is:

H0: meandif = 0

Ha: meandif > 0

In this case, the variable of study is meandif, the difference between 2 means. The mean returns of AAPL and MSFT are random variables, so meandif is also a random variable.
To calculate the t-value of this test, we have to estimate the standard error, which is the standard deviation of meandif.

3.4.3 Calculate the standard error of the test

In this case, the standard error is the standard deviation of the difference between both means (meandif). Remember that the mean returns of each stock are random variables.
From basic probability theory, if two random variables (here, Apple's mean return and Microsoft's mean return) are independent, then the variance of their difference is the SUM of their variances! This sounds counter-intuitive.

WHY IS THE VARIANCE OF THE DIFFERENCE OF 2 RANDOM VARIABLES THE SUM OF THE 2 VARIANCES INSTEAD OF THE DIFFERENCE OF BOTH VARIANCES? DO YOUR OWN RESEARCH AND BRIEFLY EXPLAIN.
EVERY TIME SOMETHING HAPPENS AT RANDOM, WHETHER IT ADDS TO THE PILE OR SUBTRACTS FROM IT, UNCERTAINTY (VARIANCE) INCREASES.
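A short derivation from standard probability theory supports this answer (it is not taken from the original notes). Write $\mu_X = E[X]$ and $\mu_Y = E[Y]$:

$$
\begin{aligned}
\operatorname{Var}(X - Y) &= E\big[\big((X - \mu_X) - (Y - \mu_Y)\big)^2\big] \\
&= E\big[(X - \mu_X)^2\big] + E\big[(Y - \mu_Y)^2\big] - 2\,E\big[(X - \mu_X)(Y - \mu_Y)\big] \\
&= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)
\end{aligned}
$$

The squared deviation of $Y$ enters with a plus sign whether $Y$ is added or subtracted. Only the covariance term changes sign. If $X$ and $Y$ are independent, $\operatorname{Cov}(X, Y) = 0$ and the variances add. For the two sample means, with $s^2$ the sample variances and $N$ the number of months for each stock, the independence assumption gives the standard error of meandif:

$$
SE(\text{meandif}) = \sqrt{\frac{s^2_{AAPL}}{N_{AAPL}} + \frac{s^2_{MSFT}}{N_{MSFT}}}
$$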



RESEARCH 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


PERSONAL NOTES
This is an excerpt I found to be extremely useful.
"The Pythagorean Theorem of Statistics
Quick. What’s the most important theorem in statistics? That’s easy. It’s the central limit theorem (CLT), hands down. Okay, how about the second most important theorem? I say it’s the fact that for the sum or difference of independent random variables, variances add:"

$$\operatorname{Var}(X \pm Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$

"I like to refer to this statement as the Pythagorean theorem of statistics for several reasons:
When written in terms of standard deviations, it looks like the Pythagorean theorem:"

$$SD(X \pm Y) = \sqrt{SD(X)^2 + SD(Y)^2}$$

"Just as the Pythagorean theorem applies only to right triangles, this relationship applies only to independent random variables.
The name helps students remember both the relationship and the restriction.
As you may suspect, this analogy is more than a mere coincidence. There’s a nice geometric model that represents random variables as vectors whose lengths correspond to their standard deviations. When the variables are independent, the vectors are orthogonal. Then the standard deviation of the sum or difference of the variables is the hypotenuse of a right triangle."

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
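A quick simulation illustrates the "Pythagorean" relationship. The variables below are simulated, and the names `x_sim`, `y_sim` and `z_sim` are hypothetical. They have standard deviations 3 and 4, so the difference of the independent pair should have a standard deviation close to 5. The last lines show that the covariance term has to be included when the variables are correlated.

```{r}
set.seed(123)
n_sim <- 100000
x_sim <- rnorm(n_sim, sd = 3)   # hypothetical variable with SD 3
y_sim <- rnorm(n_sim, sd = 4)   # independent of x_sim, SD 4

var(x_sim - y_sim)   # close to 3^2 + 4^2 = 25
sd(x_sim - y_sim)    # close to 5: the "hypotenuse" of a 3-4-5 triangle

# With correlated variables the covariance term no longer vanishes
z_sim <- 0.5 * x_sim + rnorm(n_sim, sd = 2)        # positively correlated with x_sim
var(x_sim - z_sim)                                 # much less than var(x_sim) + var(z_sim)
var(x_sim) + var(z_sim) - 2 * cov(x_sim, z_sim)    # matches var(x_sim - z_sim)
```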



3.4.4 Calculation of the t-statistic of the test

Here we calculate the t-value manually in R:
```{r}
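# Standard error of meandif, assuming independent samples: sqrt(var_A/N_A + var_M/N_M)
se_diff <- sqrt(var(as.numeric(r_AAPL)) / nrow(r_AAPL) + var(as.numeric(r_MSFT)) / nrow(r_MSFT))
# Standardized distance between meandif and 0 (named t_diff so base::t() is not masked)
t_diff <- (mean_AAPL_r - mean_MSFT_r - 0) / se_diff
t_diff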
```
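`t.test` with `var.equal = FALSE` runs Welch's test. It uses the standard error of the difference of means together with non-integer Welch-Satterthwaite degrees of freedom. The sketch below computes both quantities by hand. `welch_by_hand` is a hypothetical helper name, and it is checked against `t.test` on simulated series of different lengths (not real data).

```{r}
# Hypothetical helper: Welch two-sample t-statistic and its
# Welch-Satterthwaite degrees of freedom, computed by hand
welch_by_hand <- function(a, b) {
  a <- as.numeric(a); b <- as.numeric(b)
  va <- var(a) / length(a)   # squared standard error of mean(a)
  vb <- var(b) / length(b)   # squared standard error of mean(b)
  t_stat <- (mean(a) - mean(b)) / sqrt(va + vb)
  df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
  c(t = t_stat, df = df)
}

# Simulated returns of different lengths (NOT real data), to compare with t.test
set.seed(99)
sim_a <- rnorm(67, mean = 0.03, sd = 0.08)
sim_b <- rnorm(60, mean = 0.027, sd = 0.06)
welch_by_hand(sim_a, sim_b)
sim_welch <- t.test(sim_a, sim_b, var.equal = FALSE)
c(t = unname(sim_welch$statistic), df = unname(sim_welch$parameter))
```

With the notebook's data, `welch_by_hand(r_AAPL, r_MSFT)` should give the same t-value and degrees of freedom as `ttest`.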
```{r}
ttest <- t.test(as.numeric(r_AAPL), as.numeric(r_MSFT), alternative = "greater", paired = FALSE, var.equal = FALSE)
ttest
```
We can also display only the t-value and the corresponding p-value:
```{r}
cat("t-value from t.test =", ttest$statistic, "\n")
```
```{r}
cat("p-value =", ttest$p.value, "\n")
```
We got the same t-value as in the manual calculation.
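The standard error in 3.4.3 assumes that the two mean returns are independent. Monthly returns of two large technology stocks measured over the same months are typically positively correlated, and in that case the -2 Cov term matters. A common alternative is a paired t-test on the monthly differences. The sketch below uses simulated series driven by a hypothetical common "market" factor (not real data). For the notebook's own series, the analogous call would be `t.test(as.numeric(r_AAPL), as.numeric(r_MSFT), paired = TRUE, alternative = "greater")`, assuming both series cover the same months.

```{r}
set.seed(7)
n_months <- 67
sim_market <- rnorm(n_months, mean = 0.01, sd = 0.04)            # hypothetical common factor
sim_aapl <- sim_market + rnorm(n_months, mean = 0.01, sd = 0.05) # simulated, NOT real AAPL
sim_msft <- sim_market + rnorm(n_months, mean = 0.01, sd = 0.04) # simulated, NOT real MSFT
cor(sim_aapl, sim_msft)   # positive: both series share the market factor

# Monthly differences: their variance already includes the -2*Cov term
d_sim <- sim_aapl - sim_msft
all.equal(var(d_sim), var(sim_aapl) + var(sim_msft) - 2 * cov(sim_aapl, sim_msft))

# Paired t-test on the differences vs. a Welch test that ignores the covariance
paired_tt <- t.test(sim_aapl, sim_msft, paired = TRUE, alternative = "greater")
welch_tt <- t.test(sim_aapl, sim_msft, var.equal = FALSE, alternative = "greater")
c(paired = unname(paired_tt$statistic), welch = unname(welch_tt$statistic))

mean(d_sim) / (sd(d_sim) / sqrt(n_months))   # paired t-statistic by hand
```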

3.4.5 Interpretation of the t-statistic/t-value

We got a t-value of about 0.15. This means that the difference between both mean returns is only about 0.15 standard errors away from zero. The actual distribution of our variable of analysis is therefore too close to the hypothetical distribution centered at a difference of 0 to reject the null hypothesis.

3.4.6 Conclusion of the test

Since the t-value of the test is less than 2 and the p-value is greater than 0.05, the null hypothesis (H0) cannot be rejected. Therefore, we conclude that there is no statistically significant difference between the average monthly returns of AAPL and MSFT over this period. This does not prove that the two means are equal. The data simply do not provide enough evidence that they differ.


## Reading
Regression Analysis is perhaps the single most important Business Statistics tool used in the industry. Regression is the engine behind a multitude of data analytics applications used for many forms of forecasting and prediction.
