# !pip install yfinance
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
import iguizarFuncs as ig

Efficient Financial Markets
Disclaimer: This document is intended for educational purposes only. It does not constitute financial advice. It is part of my Risk Financial Management course at Tec de Monterrey.
1 Intro
1.1 The Efficient Market Hypothesis
The description of the efficient market hypothesis (EMH) here follows Mishkin and Eakins (2024). The hypothesis states that prices in financial markets fully reflect all available information. To understand it, the authors define the arithmetic rate of return from holding a stock from time \(t\) to \(t+1\) as:
\[ R = \frac{P_{t+1}-P_t}{P_t} \] that is, the rate of capital gains (assuming no other cash payments).
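As a quick numerical illustration, the snippet below computes the arithmetic return for a hypothetical price move (the prices are made up, not market data), together with the log return \(\ln(P_{t+1}/P_t) = \ln(1+R)\), which is the measure used in the application section. For small price changes the two are close.

```python
import math

# Hypothetical prices at t and t+1 (illustrative numbers, not market data)
p_t, p_t1 = 100.0, 103.0

# Arithmetic rate of return: R = (P_{t+1} - P_t) / P_t
arith_return = (p_t1 - p_t) / p_t

# Log return: r = ln(P_{t+1} / P_t) = ln(1 + R)
log_return = math.log(p_t1 / p_t)

print(f"Arithmetic return: {arith_return:.4f}")  # prints 0.0300
print(f"Log return:        {log_return:.4f}")    # prints 0.0296
```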
At the start of the period, however, \(P_{t+1}\) is unknown. Investors do hold an expectation of the future price, \(P^e_{t+1}\), so the expected return is:
\[ R^e = \frac{P^e_{t+1}-P_t}{P_t} \]
The efficient market hypothesis (EMH) views these expectations as the optimal forecast of the return \((R^{of})\), that is, the best guess of the future using all available information \([R^e = R^{of}]\).
The supply-and-demand analysis of a financial market teaches us that the expected return on a stock will converge to the equilibrium return, \(R^*\), that equates the quantity demanded to the quantity supplied. Then, if the market is in equilibrium, \(R^e = R^*\).
Therefore, in an efficient market:
\[ R^{of}=R^* \]
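That is, current prices in a financial market will be set so that the optimal forecast of a stock’s return using all available information equals the security’s equilibrium return. If the optimal forecast exceeds the equilibrium return, investors buy the stock, bidding up its current price and lowering its forecast return; the opposite happens when the forecast falls short:
\[ \text{If } R^{of} > R^* \Rightarrow P_t \uparrow \ \rightarrow \ R^{of} \downarrow \]
\[ \text{If } R^{of} < R^* \Rightarrow P_t \downarrow \ \rightarrow \ R^{of} \uparrow \]
until \(R^{of}=R^*\).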
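A worked example of this adjustment, under simplifying assumptions (no dividends, and the expected future price \(P^e_{t+1}\) held fixed while only \(P_t\) moves): setting \(R^{of} = R^*\) in the return formula gives the equilibrium price \(P_t = P^e_{t+1}/(1+R^*)\). All numbers below are hypothetical.

```python
# Hypothetical inputs (illustrative only)
p_expected = 110.0   # P^e_{t+1}: optimal forecast of next period's price
r_star = 0.05        # R^*: equilibrium return
p_now = 100.0        # P_t: current price before any adjustment

# Optimal forecast of the return at the current price
r_of = (p_expected - p_now) / p_now
print(f"R^of at P_t = {p_now:.2f}: {r_of:.2%}")  # prints 10.00%, above R^* = 5%

# Buying pressure raises P_t until R^of = R^*, i.e. P_t = P^e_{t+1} / (1 + R^*)
p_equilibrium = p_expected / (1 + r_star)
r_of_after = (p_expected - p_equilibrium) / p_equilibrium
print(f"Equilibrium P_t: {p_equilibrium:.2f}, R^of = {r_of_after:.2%}")
```

The price rises from 100 to roughly 104.76, at which point the forecast return has fallen back to the 5% equilibrium return.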
More simply: in an efficient market a stock’s price fully reflects all available information.
A stronger view of the EMH holds that an efficient market is not only one in which expectations are optimal forecasts using all available information, but also one in which prices reflect the true fundamental (intrinsic) value of the securities. Thus, under this view, all prices in an efficient market are always correct and reflect market fundamentals.
In favor of the EMH: Random walk
A random walk describes the movements of a variable whose future changes cannot be predicted (are random) because, given today’s value, the variable is just as likely to fall as to rise. Under the EMH, all price changes are driven by information that cannot be anticipated, so they must be uncorrelated over time. Hence, stock prices should approximately follow a random walk.
More formally, \(y_t\) follows a random walk if: \[ y_t = y_{t-1} + \epsilon_t \] where:
\(y_t:\) is the (log) stock price at time \(t\)
\(\epsilon_t:\) is the random error term at time \(t\), independent and identically distributed (i.i.d.) with mean zero and constant variance, hence with no autocorrelation.
The return, \(y_t - y_{t-1} = \epsilon_t\), is therefore unpredictable from past returns.
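To see what this implies, the sketch below simulates a random walk for the log price with NumPy (the Gaussian shocks, the fixed seed, and the 1% volatility are arbitrary choices) and checks that the first difference, the return, is essentially uncorrelated with its own lag, while the price level itself is extremely persistent.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000
sigma = 0.01                         # assumed daily volatility of the shocks

eps = rng.normal(0.0, sigma, n)      # i.i.d. shocks epsilon_t
y = np.log(100.0) + np.cumsum(eps)   # y_t = y_{t-1} + epsilon_t, starting near log(100)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

returns = np.diff(y)                 # y_t - y_{t-1} = epsilon_t
print(f"Lag-1 autocorrelation of returns:    {lag1_autocorr(returns):.4f}")  # close to 0
print(f"Lag-1 autocorrelation of log prices: {lag1_autocorr(y):.4f}")        # close to 1
```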
Against the EMH: Mean reversion and trend
Mean reversion means that stocks with low returns today tend to have high returns in the future, and vice versa: stocks that have done poorly in the past are more likely to do well in the future, because mean reversion implies a predictable positive change in their future price. A trend occurs when a movement in one direction is followed by another movement in the same direction. Either pattern suggests that stock prices do not follow a random walk.
More formally, let \(y_t\) now denote the return and consider a simple first-order autoregressive, AR(1), process:
\[ y_t = \rho y_{t-1} + \epsilon_t \]
If \(0<\rho<1\), then trend
If \(-1<\rho<0\), then mean reversion
If \(\rho = 0\), returns are white noise, the case consistent with the EMH.
As we have seen earlier, when \(\rho \ne 0\), the time aggregation problem arises, which must be accounted for when measuring financial risk.
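The sketch below simulates AR(1) returns for a positive and a negative \(\rho\) (the values 0.3 and -0.3, the Gaussian shocks, and the 10-day horizon are arbitrary choices) and reports two quantities: the sample lag-1 autocorrelation, which recovers the sign of \(\rho\), and a variance ratio, the variance of 10-day summed returns divided by 10 times the daily variance. Under white noise the ratio is about 1; with a trend it exceeds 1 and with mean reversion it falls below 1, which is one way to see the time aggregation problem.

```python
import numpy as np

def simulate_ar1(rho, n, sigma=0.01, seed=0):
    """Simulate y_t = rho * y_{t-1} + eps_t with Gaussian shocks."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)
    y = np.empty(n)
    y[0] = eps[0]
    for t in range(1, n):
        y[t] = rho * y[t - 1] + eps[t]
    return y

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

def variance_ratio(x, h):
    """Variance of non-overlapping h-period sums divided by h times the one-period variance."""
    m = len(x) // h
    sums = x[: m * h].reshape(m, h).sum(axis=1)
    return sums.var(ddof=1) / (h * x.var(ddof=1))

results = {}
for label, rho in [("trend", 0.3), ("mean reversion", -0.3)]:
    y = simulate_ar1(rho, n=20_000)
    results[label] = (lag1_autocorr(y), variance_ratio(y, h=10))
    acf1, vr = results[label]
    print(f"{label:15s} rho={rho:+.1f}  lag-1 acf={acf1:+.3f}  VR(10)={vr:.2f}")
```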
Note:
Dissatisfaction with using the EMH to explain events like 1987’s Black Monday, when stock markets around the world plummeted and the Dow Jones Industrial Average (DJIA) suffered its largest one-day percentage drop in history (22.6%), gave rise to the field of behavioral finance.
Behavioral finance is a field of study that combines insights from psychology, sociology, anthropology and other social sciences with financial theory to understand how human behavior affects financial decision-making and markets. It challenges the traditional view of financial markets as perfectly rational and efficient, acknowledging that investors often act irrationally due to biases and emotions.
2 Testing the EMH
Recall: Autocorrelation measures the relationship between a variable and its own lagged values. Under the Efficient Markets Hypothesis, stock returns should not be predictable; that is, they should not be autocorrelated.
Time series that show no autocorrelation are called white noise. We can check whether a process is white noise by examining its autocorrelations \((\rho)\).
Formally, we employ the Ljung and Box (1978) test of the null hypothesis:
\[ \rho_1 = \rho_2 = \dots = \rho_\tau = 0 \]
for \(\tau \ge 1\). If the null is not rejected, the evidence is consistent with the EMH.
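For reference, the Ljung-Box statistic is \(Q = T(T+2)\sum_{k=1}^{m}\hat\rho_k^2/(T-k)\), where \(T\) is the sample size, \(m\) the number of lags and \(\hat\rho_k\) the sample autocorrelation at lag \(k\); under the null, \(Q\) is approximately \(\chi^2\) with \(m\) degrees of freedom. The sketch below computes it with NumPy and the standard library only; the \(\chi^2\) tail probability uses the closed form available for integer degrees of freedom. In practice, `acorr_ljungbox` from statsmodels, used in the application, does this for you.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """P(chi-square with integer df > x), via the closed-form series."""
    if df % 2 == 0:
        term = math.exp(-x / 2)
        total = term
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def ljung_box(x, m):
    """Return (Q, p-value) for H0: rho_1 = ... = rho_m = 0."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(d[k:] * d[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (T - k)
    q *= T * (T + 2)
    return q, chi2_sf(q, m)

rng = np.random.default_rng(1)
white_noise = rng.normal(size=500)   # the null is true for this series
q, p = ljung_box(white_noise, m=5)
print(f"Q = {q:.3f}, p-value = {p:.3f}")
```

As a check, `chi2_sf(3.659679, 5)` is approximately 0.5994, in line with the p-value statsmodels reports for AAPL in the application.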
3 Application
Data will be obtained from Yahoo Finance. Make sure you have installed the yfinance library (!pip install yfinance) before importing it; you only have to do this once. I have also written some useful functions in a file named “iguizarFuncs.py”; please make sure this file is in your working directory so it can be imported.
3.1 One stock
Import the libraries (the import statements are listed at the top of this document):
Download the data for one stock (AAPL):
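# Download daily closing prices for AAPL; auto_adjust=True returns split- and dividend-adjusted prices
# With recent yfinance versions, ['Close'] yields a one-column DataFrame whose column is named 'AAPL'
Data = yf.download('AAPL', start='2023-09-30', end='2024-09-30', auto_adjust=True, progress=False)['Close']
# Reset index to make the date a regular column
Data = Data.reset_index()
# Clean the 'Date' column (keep the calendar date, drop the time)
Data['Date'] = pd.to_datetime(Data['Date']).dt.date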
Plot the price data:
plt.figure(figsize=(7, 5))
plt.plot(Data['Date'], Data['AAPL'], label='AAPL')  # label is needed for plt.legend()
plt.title("Time Series of the stock's prices")
plt.xlabel("Date", fontsize=10)
plt.ylabel("Price", fontsize=10)
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

Calculate the log returns:
# Log return r_t = ln(P_t / P_{t-1}); the first row is NaN because it has no previous price
Data['Log_Returns'] = np.log(Data['AAPL'] / Data['AAPL'].shift(1))

Plot the returns:
plt.figure(figsize=(7, 5))
plt.plot(Data['Date'], Data['Log_Returns'], label='AAPL log returns')
plt.axhline(0, color='black', linestyle='--', linewidth=0.5)
plt.title("Time series of the stock's returns")
plt.xlabel("Date")
plt.ylabel("Returns")
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

The Ljung-Box test
We can now apply the Ljung-Box test. The analyst chooses the number of lags; in this case we use 5. The null hypothesis
\[ \rho_1 = \rho_2 = \rho_3 = \rho_4 = \rho_5=0 \]
states that returns over the past 5 trading days are uncorrelated with the current return. If the null hypothesis were rejected, we would have statistical evidence that the return on at least one of the past 5 days is significantly correlated with today’s return, implying a deviation from the EMH.
# Run Ljung-Box test on log returns with 5 lags
ljung_box_results = acorr_ljungbox(Data['Log_Returns'].dropna(), lags=[5], return_df=True)
# Display the test results
print(ljung_box_results)

    lb_stat  lb_pvalue
5  3.659679   0.599376
Since the p-value (0.599) is well above any conventional significance level (1%, 5% or 10%), we fail to reject the null hypothesis. For this particular stock, the test does not detect significant autocorrelation, supporting the idea that returns are serially uncorrelated, a result consistent with the Efficient Market Hypothesis (EMH).
3.2 Multiple stocks
To generalize this approach to multiple stocks, we will use the file “toUScompanies.csv” that lists the top US companies by market cap. Please make sure you have loaded this file.
- Specify the period and tickers of interest.
companies = pd.read_csv('toUScompanies.csv')
tickers = companies['Symbol'].tolist()
start_date = '2023-09-30'
end_date = '2024-09-30'

The function ‘get_prices’ extracts daily prices for multiple tickers over a specified period:
nStocks = ig.get_prices(tickers, start_date, end_date)
nStocks.head(5)

| Ticker | Date | AAPL | ABBV | ABNB | ABT | ADBE | ADI | ADP | AMAT | AMD | ... | UBER | UNH | UNP | UPS | V | VRTX | VZ | WFC | WMT | XOM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-10-02 | 172.485809 | 138.997025 | 136.559998 | 92.424889 | 521.130005 | 170.950928 | 231.582275 | 137.814804 | 103.269997 | ... | 45.680000 | 502.766541 | 196.067673 | 144.168121 | 228.643341 | 347.829987 | 28.193148 | 38.061131 | 52.395565 | 109.820274 |
| 1 | 2023-10-03 | 171.145615 | 138.228180 | 127.730003 | 92.482803 | 507.029999 | 167.131012 | 232.598114 | 135.058731 | 100.080002 | ... | 44.509998 | 497.783508 | 197.170700 | 142.931702 | 226.211273 | 345.149994 | 28.388437 | 37.157883 | 52.065022 | 110.010231 |
| 2 | 2023-10-04 | 172.396454 | 138.471970 | 127.410004 | 92.347656 | 518.419983 | 169.352798 | 236.042297 | 137.607376 | 104.070000 | ... | 44.939999 | 498.907135 | 195.709686 | 142.903809 | 228.593918 | 352.970001 | 27.997854 | 37.446159 | 52.690102 | 105.897789 |
| 3 | 2023-10-05 | 173.637360 | 138.246933 | 124.989998 | 92.878654 | 516.440002 | 167.705978 | 235.442490 | 137.587631 | 102.910004 | ... | 44.610001 | 504.388428 | 194.132523 | 142.587738 | 230.828247 | 355.140015 | 28.246408 | 37.763252 | 52.061752 | 103.513908 |
| 4 | 2023-10-06 | 176.198624 | 138.987640 | 126.360001 | 93.535172 | 526.679993 | 169.528229 | 238.364227 | 138.585327 | 107.239998 | ... | 45.779999 | 512.771606 | 195.893509 | 143.415131 | 232.370529 | 360.619995 | 27.969883 | 38.138004 | 51.187946 | 101.785339 |
5 rows × 111 columns
The function ‘get_returns’, in turn, helps us obtain returns for multiple tickers over a specified period:
nStocksRet = ig.get_returns(tickers, start_date, end_date)
nStocksRet.head(5)

| Ticker | Date | AAPL | ABBV | ABNB | ABT | ADBE | ADI | ADP | AMAT | AMD | ... | UBER | UNH | UNP | UPS | V | VRTX | VZ | WFC | WMT | XOM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-10-03 | -0.007800 | -0.005547 | -0.066845 | 0.000627 | -0.027429 | -0.022599 | 0.004377 | -0.020201 | -0.031377 | ... | -0.025947 | -0.009961 | 0.005610 | -0.008613 | -0.010694 | -0.007735 | 0.006903 | -0.024017 | -0.006329 | 0.001728 |
| 1 | 2023-10-04 | 0.007282 | 0.001762 | -0.002508 | -0.001463 | 0.022216 | 0.013206 | 0.014699 | 0.018695 | 0.039094 | ... | 0.009614 | 0.002255 | -0.007437 | -0.000195 | 0.010478 | 0.022404 | -0.013854 | 0.007728 | 0.011934 | -0.038099 |
| 2 | 2023-10-05 | 0.007172 | -0.001626 | -0.019177 | 0.005734 | -0.003827 | -0.009772 | -0.002544 | -0.000143 | -0.011209 | ... | -0.007370 | 0.010927 | -0.008091 | -0.002214 | 0.009727 | 0.006129 | 0.008838 | 0.008432 | -0.011997 | -0.022768 |
| 3 | 2023-10-06 | 0.014643 | 0.005344 | 0.010901 | 0.007044 | 0.019634 | 0.010807 | 0.012333 | 0.007225 | 0.041214 | ... | 0.025889 | 0.016484 | 0.009030 | 0.005786 | 0.006659 | 0.015313 | -0.009838 | 0.009875 | -0.016927 | -0.016840 |
| 4 | 2023-10-09 | 0.008416 | 0.005852 | 0.011097 | -0.001239 | 0.004943 | -0.003743 | 0.015306 | -0.000998 | -0.002521 | ... | -0.007234 | 0.003234 | 0.009047 | 0.000454 | -0.002556 | -0.015144 | 0.019262 | 0.000252 | -0.003651 | 0.034393 |
5 rows × 111 columns
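The source of `get_returns` lives in iguizarFuncs.py and is not shown here. Judging from the output (for example, AAPL’s first value, -0.007800, equals \(\ln(171.145615/172.485809)\)), it returns daily log returns with the first date dropped. A minimal pandas sketch of that transformation, applied to a prices table laid out like `nStocks`, could look like this; `prices_to_log_returns` is a hypothetical helper, not part of iguizarFuncs.

```python
import numpy as np
import pandas as pd

def prices_to_log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: daily log returns from a wide table with a 'Date' column
    and one price column per ticker (same layout as nStocks)."""
    px = prices.set_index("Date")
    rets = np.log(px / px.shift(1)).dropna(how="all")  # first row has no previous price
    return rets.reset_index()

# First three rows of nStocks for two tickers, copied from the table above
prices = pd.DataFrame({
    "Date": ["2023-10-02", "2023-10-03", "2023-10-04"],
    "AAPL": [172.485809, 171.145615, 172.396454],
    "ABBV": [138.997025, 138.228180, 138.471970],
})
print(prices_to_log_returns(prices).round(6))
```

Using `dropna(how="all")` drops only the all-missing first row, so a ticker with a shorter price history keeps its NaNs instead of removing dates for every other ticker.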
- Use the function ‘ljung_box_test’ to run the Ljung-Box test of the EMH on each stock, and store the results in a data frame named lb_results. The function lets us choose the number of lags; we will continue using 5:
lags = 5
lb_results = ig.ljung_box_test(tickers, start_date, end_date, lags)
# Display the results
lb_results

| | Ticker | Ljung-Box Statistic | P-value |
|---|---|---|---|
| 0 | AAPL | 3.659644 | 0.599381 |
| 1 | ABBV | 0.964368 | 0.965388 |
| 2 | ABNB | 5.066243 | 0.407849 |
| 3 | ABT | 3.803183 | 0.578086 |
| 4 | ADBE | 1.699569 | 0.888954 |
| ... | ... | ... | ... |
| 105 | VRTX | 8.923362 | 0.112160 |
| 106 | VZ | 9.525372 | 0.089856 |
| 107 | WFC | 6.704654 | 0.243548 |
| 108 | WMT | 1.462580 | 0.917347 |
| 109 | XOM | 5.994274 | 0.306776 |
110 rows × 3 columns
Filter for the stocks whose results contradict the EMH. In this example, we keep stocks with a p-value below 0.20, that is, those for which the null hypothesis is rejected at a relatively lenient 20% significance level (80% confidence).
no_stocks = lb_results[lb_results['P-value'] < 0.20]
print(f"{no_stocks.shape[0]} stocks reject the EMH")

28 stocks reject the EMH
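Some perspective on this count: with 110 tests at the 20% level, about \(0.20 \times 110 = 22\) rejections would be expected by chance even if the EMH held for every stock. The snippet below computes that benchmark and, treating the 110 tests as independent (a strong simplification, since stock returns are correlated with one another), the binomial probability of seeing 28 or more rejections when every null is true.

```python
from math import comb

n_tests = 110     # number of stocks tested
alpha = 0.20      # significance level used in the filter
observed = 28     # rejections reported above

expected = alpha * n_tests
# P(X >= observed) for X ~ Binomial(n_tests, alpha), assuming independent tests
p_at_least = sum(comb(n_tests, k) * alpha**k * (1 - alpha)**(n_tests - k)
                 for k in range(observed, n_tests + 1))

print(f"Expected false rejections: {expected:.1f}")
print(f"P(>= {observed} rejections | all nulls true): {p_at_least:.3f}")
```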
We can now export the list of tickers that contradict the EMH to an Excel file for future reference:
no_stocks.to_excel('no_stocks.xlsx', index=False)

Or select these companies from the original list:
select_companies = companies[companies['Symbol'].isin(no_stocks['Ticker'])]
select_companies

| | Name | Symbol | Country | Sector | Market Cap |
|---|---|---|---|---|---|
| 5 | Amazon.com Inc | AMZN | US | Consumer Discretionary | 2.030020e+12 |
| 6 | Meta Platforms Inc | META | US | Communication Services | 1.499610e+12 |
| 7 | Berkshire Hathaway Inc | BRK-B | US | Financials | 9.813140e+11 |
| 8 | Berkshire Hathaway Inc | BRK-A | US | Financials | 9.813140e+11 |
| 10 | Broadcom Inc | AVGO | US | Information Technology | 8.257580e+11 |
| 14 | Visa Inc | V | US | Financials | 5.738230e+11 |
| 17 | Oracle Corp | ORCL | US | Information Technology | 4.846800e+11 |
| 18 | Mastercard Inc | MA | US | Financials | 4.750540e+11 |
| 19 | Procter & Gamble Co | PG | US | Consumer Staples | 3.916550e+11 |
| 24 | Bank of America Corp | BAC | US | Financials | 3.254660e+11 |
| 36 | Thermo Fisher Scientific Inc | TMO | US | Health Care | 2.102390e+11 |
| 37 | McDonald's Corp | MCD | US | Consumer Discretionary | 2.099230e+11 |
| 43 | Morgan Stanley | MS | US | Financials | 1.907780e+11 |
| 44 | Texas Instruments Inc | TXN | US | Information Technology | 1.903750e+11 |
| 45 | General Electric Co | GE | US | Industrials | 1.891850e+11 |
| 47 | Qualcomm Inc | QCOM | US | Information Technology | 1.873800e+11 |
| 52 | Verizon Communications Inc | VZ | US | Communication Services | 1.741730e+11 |
| 57 | Comcast Corp | CMCSA | US | Communication Services | 1.638650e+11 |
| 71 | Charles Schwab Corp | SCHW | US | Financials | 1.302400e+11 |
| 76 | Boston Scientific Corp | BSX | US | Health Care | 1.238550e+11 |
| 77 | Vertex Pharmaceuticals Inc | VRTX | US | Health Care | 1.224950e+11 |
| 81 | Palo Alto Networks Inc | PANW | US | Information Technology | 1.171250e+11 |
| 83 | United Parcel Service Inc | UPS | US | Industrials | 1.148200e+11 |
| 84 | Analog Devices Inc | ADI | US | Information Technology | 1.146350e+11 |
| 94 | Regeneron Pharmaceuticals Inc | REGN | US | Health Care | 1.018900e+11 |
| 99 | Intel Corp | INTC | US | Information Technology | 9.595344e+10 |
| 101 | Elevance Health Inc | ELV | US | Health Care | 9.529940e+10 |
| 105 | KLA Corp | KLAC | US | Information Technology | 9.278362e+10 |
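At the 20% significance level used here, the Ljung-Box test rejects the null of no autocorrelation for these stocks, which is statistical evidence against the EMH for them.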