StockAnalysis

MSFT stock analysis

In this section we will analyse MSFT returns and perform various statistical tests in order to better understand the characteristics of MSFT returns and their distribution over the last two and a half years, from 01/01/2019 to 06/11/2021.

Dataset

The table below shows the data we’re working with. The data sets used in this analysis were obtained with the quantmod library.

##            MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2021-06-03    245.22    246.34   243.00     245.71    25307700
## 2021-06-04    247.76    251.65   247.51     250.79    25281100
## 2021-06-07    249.98    254.09   249.81     253.81    23079200
## 2021-06-08    255.16    256.01   252.51     252.57    22455000
## 2021-06-09    253.81    255.53   253.21     253.59    17937600
## 2021-06-10    254.29    257.46   253.67     257.24    24563600

Visualizing the dataset

Let’s now plot the stock price for that period. Visual inspection alone, without any statistical tests, already gives a feel for the MSFT price movement: before H1 of 2020 (when Covid-19 started to take its toll) there was a clear upward trend, and the global pandemic then caused a sudden drop in price. Later on, we will compare MSFT performance to the S&P500 index, which we will use as a benchmark.

What is also interesting is that in H2 of 2020 and in H1 of 2021 we can again clearly see an upward trend, which resulted in a surge of the stock price.

On the second plot, we inspect both the volume and the stock price. The first thing that stands out is the major sell-off in H1 of 2020, when the Covid pandemic struck. What is also interesting is that if we closely examine the plot, we will see that some trading sessions had huge volumes that moved the price, regardless of the direction.

MSFT returns

In this section we’ll work with MSFT returns, performing various statistical tests to estimate the distribution of returns and so get a better idea of the stock’s risk-reward ratio, potential tail risks and other interesting properties.

Calculating and visualizing the returns

Let’s first calculate the simple returns for MSFT:

Daily returns
Weekly returns
Monthly returns
Quarterly returns
Yearly returns
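
As a rough illustration of what a simple return is, here is a minimal Python sketch (not the code used in this report). It assumes returns are computed from closing prices as R_t = P_t / P_(t-1) - 1, and uses the six MSFT closing prices from the table above. The function name `simple_returns` is made up for this example.

```python
# Closing prices copied from the MSFT.Close column of the table above.
closes = [245.71, 250.79, 253.81, 252.57, 253.59, 257.24]


def simple_returns(prices):
    """Return R_t = P_t / P_(t-1) - 1 for each consecutive pair of prices."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


daily = simple_returns(closes)

# Simple returns compound: the return over the whole window is the product
# of (1 + R_t) minus 1, which equals last price / first price - 1.
compounded = 1.0
for r in daily:
    compounded *= 1 + r
period_return = compounded - 1
```

The same compounding logic is what turns daily returns into weekly, monthly, quarterly or yearly returns: only the window over which prices are compared changes.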

Now let’s plot the previously calculated returns. There are some interesting things we can see on the plots:

Daily returns - we can notice some volatility clustering, meaning that there are periods with high volatility, periods with low volatility and periods with moderate volatility. This tells us that the variance of these returns is not constant; volatility varies over time.
Weekly returns - we can see similar behaviour here as well, but it is not as obvious as for the daily returns. For both the daily and weekly returns we can clearly see that the period with the highest volatility was H1 of 2020, for obvious reasons. That period contained both the highest and the lowest daily and weekly returns of the whole two-and-a-half-year period.
Quarterly returns - as previously mentioned, H1 of 2020 was the most volatile period; that was when the whole market went down, and around the transition from Q2 to Q3 the market started to recover. What is interesting is that Q1 of that year had the lowest quarterly return (nearly 0%) and Q2 of the same year had the highest quarterly return (nearly 30%) over the two and a half years covered by our data set.
Yearly returns - the most important takeaway here is that MSFT stock returns in 2019 (an astonishing figure of nearly 60%) exceeded both the returns in 2020 (around 40%) and in H1 of 2021 (around 5%).

Plotting returns:

Analysing daily returns

Here we can see some statistical properties of our data set in more detail. Some of the interesting things we could notice are:

The highest daily return (14.22%) and the lowest daily return (-14.74%) that MSFT stock had during the period of 615 trading days.
A simple estimate of the future MSFT daily return, calculated as the mean, is 0.18%.
The first quartile (-0.0073) and the third quartile (0.0112) indicate that returns are roughly symmetric, with a small potential skewness. We’ll calculate that statistically in one of the next steps.

##             daily.returns
## nobs           615.000000
## NAs              0.000000
## Minimum         -0.147390
## Maximum          0.142169
## 1. Quartile     -0.007274
## 3. Quartile      0.011151
## Mean             0.001753
## Median           0.001484
## Sum              1.078209
## SE Mean          0.000823
## LCL Mean         0.000137
## UCL Mean         0.003369
## Variance         0.000416
## Stdev            0.020407
## Skewness        -0.093339
## Kurtosis        10.134167

Let’s now calculate the average daily return and volatility, together with their annualized values.

The expected annual return is 44.2%, while the expected annual volatility is 32.4%.

## [1] "Average daily return:" "0.0018"

## [1] "Annualized return:" "0.442"

## [1] "Volatility:" "0.02"

## [1] "Annualized volatility:" "0.324"
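
The annualized figures can be reproduced from the summary statistics with the usual scaling rules. The Python sketch below is not the report's own code. It assumes 252 trading days per year, multiplies the mean daily return by the number of days, and multiplies the daily standard deviation by the square root of the number of days.

```python
import math

TRADING_DAYS = 252  # assumption: typical number of trading days in a year

mean_daily = 0.001753   # "Mean" from the summary table
stdev_daily = 0.020407  # "Stdev" from the summary table

# The mean scales linearly with time; the standard deviation scales with
# the square root of time (this treats daily returns as uncorrelated).
annual_return = mean_daily * TRADING_DAYS
annual_volatility = stdev_daily * math.sqrt(TRADING_DAYS)
```

Under these assumptions the results round to 0.442 and 0.324, in line with the printed output.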

Zero mean test

We calculated that the expected daily return is 0.18%. Let’s do a zero mean test (the null hypothesis is that returns are zero on average). The p-value for the test is 0.0335, which is lower than the significance level of 0.05. Because of that, we reject the null hypothesis and accept the alternative, that the returns on average are not zero.

We already calculated the average daily return, which is not zero. But because the return is small relative to its standard error, the p-value of the test is 0.033, only slightly below 0.05. With the same volatility and sample size, a somewhat smaller average daily return would not be statistically significant, and we would fail to reject the null hypothesis. Failing to reject would not prove that the average daily return is 0; it would only mean that the data are consistent with a zero mean.

## 
##  One Sample t-test
## 
## data:  MSFT_xts.retDaily
## t = 2.1305, df = 614, p-value = 0.03352
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.0001371854 0.0033691845
## sample estimates:
##   mean of x 
## 0.001753185
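
The t statistic in the output can be checked by hand from the summary statistics: it is the sample mean divided by its standard error. This Python sketch is not the report's own code. It uses only the standard library, so the p-value comes from a normal approximation to the t distribution (an assumption that is reasonable with 614 degrees of freedom, but it will not match the printed p-value exactly).

```python
import math
from statistics import NormalDist

n = 615                   # number of daily returns ("nobs")
mean_daily = 0.001753185  # sample mean from the test output
stdev_daily = 0.020407    # "Stdev" from the summary table

standard_error = stdev_daily / math.sqrt(n)
t_stat = mean_daily / standard_error

# Two-sided p-value, using the standard normal as a stand-in for the
# t distribution with n - 1 degrees of freedom.
p_value_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
```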

Central moments

Skewness (the third standardized moment) is -0.093. As the skewness is negative and small, returns are slightly skewed to the left. We’ll also perform a skewness test to check whether it is statistically significant.

The obtained p-value for the test (0.345) is greater than the significance level of 0.05, so we cannot reject the null hypothesis (H0 is that skewness is 0, as for a normal distribution). The slight negative skewness is therefore not statistically significant.

## [1] "Skewness:"           "-0.0933389252898925"

## [1] "P-value:"          "0.344666740793146"

Kurtosis (the fourth standardized moment) is 10.13. Here we’re calculating excess kurtosis, for which the reference value is 0, the same as for skewness. As the kurtosis is large and positive, the returns distribution is leptokurtic: it has fatter tails than the normal distribution, indicating potential tail risk. Returns more than three standard deviations away from the mean are considered tail events.

As for skewness, we’ll also perform a kurtosis test. The obtained p-value is less than the significance level, so we reject the null hypothesis (H0 is that excess kurtosis is 0, as for a normal distribution).

## [1] "Kurtosis:"        "10.1341665251272"

## [1] "P-value:" "0"
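
The printed p-values are consistent with the standard large-sample z-tests, in which the sample skewness has variance of about 6/n and the sample excess kurtosis has variance of about 24/n under normality. It is an assumption that the original analysis used exactly these tests. The Python sketch below, with a hypothetical helper `moment_z_test`, applies them to the printed estimates.

```python
import math
from statistics import NormalDist


def moment_z_test(estimate, n, variance_factor):
    """Two-sided z-test of H0: moment = 0, with Var(estimate) ~ variance_factor / n."""
    standard_error = math.sqrt(variance_factor / n)
    z = estimate / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


n = 615  # number of daily returns

# Skewness: asymptotic variance 6/n under normality.
skew_z, skew_p = moment_z_test(-0.0933389, n, 6)

# Excess kurtosis: asymptotic variance 24/n under normality.
kurt_z, kurt_p = moment_z_test(10.1341665, n, 24)
```

With these inputs the skewness p-value comes out at about 0.345, and the kurtosis p-value is numerically indistinguishable from 0.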

Distribution of returns

For simplification purposes, especially in academia, it is often assumed that returns follow a normal distribution. As we have seen from the kurtosis, assuming a normal distribution could be dangerous because it understates the tail risk of rare and extreme events.

The Jarque-Bera normality test is a statistical test that checks whether a sample follows a normal distribution, taking into account deviations in both skewness and kurtosis. For MSFT returns the p-value is extremely small (less than 2.2e-16), so we reject the null hypothesis of normality.

## 
## Title:
##  Jarque - Bera Normalality Test
## 
## Test Results:
##   STATISTIC:
##     X-squared: 2654.9042
##   P VALUE:
##     Asymptotic p Value: < 2.2e-16 
## 
## Description:
##  Mon Jun 21 01:52:26 2021 by user:
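
To make the link between the test and the two moments concrete, here is a minimal Python sketch of the textbook Jarque-Bera statistic, JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and K the sample excess kurtosis, both computed from moments with denominator n. It is not the library routine that produced the output above, and the sample returns are hypothetical. Under normality JB is approximately chi-squared with 2 degrees of freedom, whose upper tail probability is exp(-JB/2).

```python
import math


def jarque_bera(xs):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera test."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with denominator n.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    statistic = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Survival function of a chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical daily returns with two extreme observations (not MSFT data).
sample = [-0.15, -0.01, -0.005, -0.002, 0.0, 0.001, 0.002, 0.004, 0.008, 0.01, 0.14]
jb_stat, jb_p = jarque_bera(sample)
```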

To get a better feeling for all the metrics we calculated previously, we can plot the histogram of MSFT daily returns.

Judging by the histogram alone, and ignoring the outliers, one might say that returns look roughly normal. As the skewness is nearly zero, the histogram looks quite symmetric.

To better compare the distribution of MSFT daily returns with the normal distribution, we will estimate the PDF and plot it together with a normal distribution.

As the Jarque-Bera normality test already indicated, we can clearly see that MSFT returns deviate from the normal distribution.
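
One common way to estimate a PDF from data is a Gaussian kernel density estimate. The Python sketch below is a simplified stand-in, not the estimator used for the plot. The sample values are hypothetical, and the bandwidth follows the normal reference rule of thumb, 1.06 * sd * n^(-1/5), which is an assumption.

```python
from statistics import NormalDist, mean, stdev


def gaussian_kde(sample, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    kernel = NormalDist()
    n = len(sample)

    def density(x):
        return sum(kernel.pdf((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

    return density


# Hypothetical daily returns with two extreme observations (not MSFT data).
sample = [-0.147, -0.021, -0.007, -0.002, 0.001, 0.003, 0.011, 0.019, 0.142]

# Normal reference rule of thumb for the bandwidth (an assumption).
bandwidth = 1.06 * stdev(sample) * len(sample) ** (-1 / 5)

kde = gaussian_kde(sample, bandwidth)

# Normal distribution with the same mean and standard deviation, for comparison.
fitted_normal = NormalDist(mean(sample), stdev(sample))
```

Evaluating `kde(x)` and `fitted_normal.pdf(x)` on the same grid of return values gives the two curves to compare.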

Comparing to S&P500

In the previous sections we examined MSFT returns in great detail and performed various statistical tests.

Now let’s compare MSFT performance to the S&P500 index. Here is the S&P500 data:

##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume
## 2021-06-03   4191.43   4204.39  4167.93    4192.85  4579450000
## 2021-06-04   4206.05   4233.45  4206.05    4229.89  3487070000
## 2021-06-07   4229.34   4232.34  4215.66    4226.52  3835570000
## 2021-06-08   4233.81   4236.74  4208.41    4227.26  3943870000
## 2021-06-09   4232.99   4237.09  4218.74    4219.55  3902870000
## 2021-06-10   4228.56   4249.74  4220.34    4239.18  3502480000

We can plot both the MSFT and S&P500 price movements. Recall that at the beginning of our analysis we noticed an upward trend in MSFT stock until Q2 of 2020, when the stock price corrected, and that the upward trend resumed from H2 of 2020. We notice the same pattern for the S&P500.

Just by looking at the plot, MSFT and the S&P500 seem highly correlated, but in the following steps we’ll perform a formal statistical test to measure the level of linear dependence.

Regression

To find out the level of linear dependence between MSFT and S&P500 returns, we’ll do a simple linear regression.

We can see that on average MSFT (0.175%) has a higher daily return than the S&P500 (0.099%).

As we saw from the plot, there seemed to be a high correlation between them. The calculated correlation (standardized covariance) is indeed high, with a value of 0.85.

## [1] "Average daily returns:"

## [1] "MSFT (%)" "0.1753"

## [1] "S&P500 (%)" "0.0991"

## [1] "Returns correlation:" "0.854913092606122"
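
"Standardized covariance" means the covariance of the two return series divided by the product of their standard deviations. A minimal Python sketch of that definition follows. The paired returns are hypothetical, not the MSFT and S&P500 data.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    stdev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    stdev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return covariance / (stdev_x * stdev_y)


# Hypothetical paired daily returns (not the data used in this report).
index_returns = [0.010, -0.005, 0.015, -0.012, 0.002]
stock_returns = [0.012, -0.008, 0.021, -0.015, 0.004]

r = correlation(index_returns, stock_returns)
```

The result always lies between -1 and 1, which is what makes it comparable across pairs of assets with very different volatilities.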

Now let’s do the simple linear regression of MSFT returns on S&P500 returns. We want to find out whether there is a positive or negative association between them and perform a formal statistical test.

From the regression summary, we get the values of the intercept and the slope coefficient. The estimated coefficient is 1.144, meaning that a 1 percentage point move in the S&P500 daily return is associated on average with a 1.144 percentage point move in the MSFT daily return. A t-test is automatically performed to check whether the coefficient is statistically significant (the null hypothesis is that the coefficient is zero). The p-value is less than 2e-16, so we reject the null hypothesis with high confidence. The intercept (0.0006) is not statistically significant (p-value 0.149), and the R-squared of 0.73 means that S&P500 returns explain about 73% of the variance of MSFT returns.

The scatter plot of the returns, which also contains the regression line, once again clearly shows the high level of positive association between MSFT and S&P500 returns.

## 
## Call:
## lm(formula = MSFT_xts.retDaily ~ SNP_xts.retDaily)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.037829 -0.005745 -0.000226  0.005483  0.047407 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.0006193  0.0004281   1.446    0.149    
## SNP_xts.retDaily 1.1440170  0.0280386  40.802   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0106 on 613 degrees of freedom
## Multiple R-squared:  0.7309, Adjusted R-squared:  0.7304 
## F-statistic:  1665 on 1 and 613 DF,  p-value: < 2.2e-16
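
For a regression with a single predictor, the least-squares estimates have a closed form: the slope is the covariance of the two series divided by the variance of the predictor, and the intercept makes the line pass through the two means. The Python sketch below uses hypothetical data and a hypothetical helper `ols_fit` (it is not the `lm` call above). It also checks that, in this one-predictor setting, R-squared equals the squared correlation.

```python
def ols_fit(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - residual_ss / total_ss
    return slope, intercept, r_squared


# Hypothetical paired daily returns (not the data used in this report).
index_returns = [0.010, -0.005, 0.015, -0.012, 0.002]
stock_returns = [0.012, -0.008, 0.021, -0.015, 0.004]
slope, intercept, r_squared = ols_fit(index_returns, stock_returns)

# With one predictor, R-squared is the square of the correlation, so the
# reported correlation of about 0.8549 implies an R-squared of about 0.7309.
r_squared_from_report = 0.854913092606122 ** 2
```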

Conditional PDF and CDF estimation

In this section we’ll use an estimator from the np library (Nonparametric Kernel Smoothing) to estimate the PDF and CDF of MSFT returns conditional on the S&P500 returns. We want to find out where the mass of the density and of the distribution is located.

As we already know from the previous steps, the returns are highly correlated and there is a positive association between them. That is why most of the mass of the density is located along the diagonal: if the S&P500 return is positive, there is a high likelihood that the MSFT return will be positive as well. The peaks are also distributed along the diagonal.

Now we estimate and plot the conditional CDF. Here we observe the same patterns we described for the conditional PDF.
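
The idea behind a kernel estimate of a conditional density can be sketched in a few lines. Each observation gets a weight according to how close its S&P500 return is to the conditioning value, and the weighted observations then form a kernel density in the MSFT return. The Python code below is a simplified illustration with hypothetical data and fixed, hand-picked bandwidths. It does not reproduce the np library's estimator or its bandwidth selection.

```python
from statistics import NormalDist


def conditional_density(xs, ys, bandwidth_x, bandwidth_y):
    """Return f(y, x): a kernel estimate of the density of y given x."""
    kernel = NormalDist()

    def density(y, x):
        # Weight each observation by how close its x value is to the target x.
        weights = [kernel.pdf((x - xi) / bandwidth_x) for xi in xs]
        total_weight = sum(weights)
        if total_weight == 0:
            return 0.0
        weighted = sum(
            w * kernel.pdf((y - yi) / bandwidth_y) for w, yi in zip(weights, ys)
        )
        return weighted / (bandwidth_y * total_weight)

    return density


# Hypothetical paired daily returns with a positive association.
index_returns = [-0.02, -0.01, 0.0, 0.01, 0.02]
stock_returns = [-0.025, -0.012, 0.001, 0.011, 0.024]

# Fixed bandwidths chosen by hand for illustration (an assumption).
f = conditional_density(index_returns, stock_returns, 0.01, 0.01)

# Given a positive index return, a positive stock return is more likely
# than a negative one of the same size.
density_up = f(0.01, 0.01)
density_down = f(-0.01, 0.01)
```

Summing `f(y, x)` over a fine grid of `y` values up to a given level approximates the corresponding conditional CDF.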

Statistics and Financial Data Analysis

A work by: Nikola Krivacevic, Aleksandar Milinkovic and Milos Milunovic

Entire forecasting project on github

(https://github.com/mcf-long-short/statistics-stocks-forecasting)