Efficient Financial Markets

Isai Guizar

iguizar@tec.mx

Disclaimer: This document is intended for educational purposes only. It does not constitute financial advice.

1 Intro

1.1 The Efficient Market Hypothesis

The description of the efficient market hypothesis (EMH) here follows Mishkin and Eakins (2024). The hypothesis states that prices in financial markets fully reflect all available information. To understand it, the authors define the arithmetic rate of return from holding a stock from time \(t\) to \(t+1\) as:

\[ R = \frac{P_{t+1}-P_t}{P_t} \]

where \(P_t\) is the stock price at time \(t\); that is, the rate of capital gain, assuming no dividends or other cash payments.
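As a quick check of the formula, the sketch below computes the arithmetic return for a single holding period and for a short price series. The prices are hypothetical and, as in the text, no dividends or other cash payments are assumed.

```python
import numpy as np

def arithmetic_return(p_t: float, p_next: float) -> float:
    """Rate of capital gain from t to t+1, assuming no dividends or other cash payments."""
    return (p_next - p_t) / p_t

# Hypothetical purchase at 100 and sale one period later at 105
r_single = arithmetic_return(100.0, 105.0)
print(r_single)  # 0.05

# The same formula applied element-wise to a hypothetical price series:
# 3 prices give 2 one-period returns
prices = np.array([100.0, 105.0, 99.75])
r_series = (prices[1:] - prices[:-1]) / prices[:-1]
print(r_series)
```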

At the start of the period, however, the price \(P_{t+1}\) is unknown. Investors do hold expectations about it, \(P^e_{t+1}\), so the expected return is:

\[ R^e = \frac{P^e_{t+1}-P_t}{P_t} \]

The EMH views these expectations as the optimal forecast of the return, \(R^{of}\), that is, the best guess of the future using all available information: \(R^e = R^{of}\).

The supply-and-demand analysis of a financial market teaches us that the expected return on the financial asset will converge to the equilibrium return, \(R^*\), that equates the quantity demanded to the quantity supplied. Then, if the market is in equilibrium, \(R^e = R^*\).

Therefore, in an efficient market:

\[ R^{of}=R^* \]

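That is, current prices in a financial market are set so that the optimal forecast of a stock's return using all available information equals the security's equilibrium return. If the two differ, investors buy (or sell) the security and the current price \(P_t\) moves until the gap closes: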

\[ \text{If } R^{of} > R^* \Rightarrow P_t \uparrow \ \rightarrow \ R^{of} \downarrow \]

\[ \text{If } R^{of} < R^* \Rightarrow P_t \downarrow \ \rightarrow \ R^{of} \uparrow \]

until \(R^{of}=R^*\).
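Setting \(R^{of} = R^*\) in the return formula, \((P^e_{t+1}-P_t)/P_t = R^*\), and solving for the current price gives \(P_t = P^e_{t+1}/(1+R^*)\). The sketch below checks this numerically and mimics the adjustment shown by the arrows with a hypothetical partial-adjustment rule, \(P_t \leftarrow P_t + \lambda\,[P^e_{t+1} - (1+R^*)P_t]\), under which the price rises while \(R^{of} > R^*\) and falls while \(R^{of} < R^*\). The rule, the speed \(\lambda\) and all numbers are illustrative assumptions, not part of Mishkin and Eakins (2024).

```python
def optimal_forecast_return(p_t: float, p_next_expected: float) -> float:
    """R^of = (P^e_{t+1} - P_t) / P_t."""
    return (p_next_expected - p_t) / p_t

def equilibrium_price(p_next_expected: float, r_star: float) -> float:
    """Current price at which R^of equals R*: P_t = P^e_{t+1} / (1 + R*)."""
    return p_next_expected / (1.0 + r_star)

def adjust_price(p_t: float, p_next_expected: float, r_star: float,
                 speed: float = 0.5, steps: int = 50) -> list[float]:
    """Hypothetical partial-adjustment rule: P_t rises while R^of > R*
    (buying pressure) and falls while R^of < R* (selling pressure)."""
    path = [p_t]
    for _ in range(steps):
        p_t = p_t + speed * (p_next_expected - (1.0 + r_star) * p_t)
        path.append(p_t)
    return path

# Hypothetical inputs: expected price next period 110, equilibrium return 10%
p_exp, r_star = 110.0, 0.10
print(f"Equilibrium price: {equilibrium_price(p_exp, r_star):.2f}")  # 100.00

for start in (90.0, 110.0):  # an underpriced and an overpriced starting point
    final = adjust_price(start, p_exp, r_star)[-1]
    print(f"start {start:.0f}: R^of = {optimal_forecast_return(start, p_exp):.3f}"
          f" -> final price {final:.2f}, R^of = {optimal_forecast_return(final, p_exp):.3f}")
```

With \(\lambda = 0.5\) and \(R^* = 0.10\), each step shrinks the distance to the equilibrium price by a factor of \(1 - \lambda(1+R^*) = 0.45\), so both starting points converge to 100.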

More simply: in an efficient market a stock’s price fully reflects all available information.

A stronger view of the EMH holds that an efficient market is not only one in which expectations are optimal forecasts using all available information, but also one in which prices reflect the true fundamental (intrinsic) value of the securities. Thus, in an efficient market, all prices are always correct and reflect market fundamentals.

In favor of the EMH: Random walk

A random walk describes the movements of a variable whose future changes cannot be predicted (are random) because, given today's value, the variable is just as likely to fall as to rise. Under the EMH, price changes are driven only by new information, which by definition cannot be anticipated, so they must be uncorrelated over time. Stock prices should therefore approximately follow a random walk.

More formally, \(y_t\) represents a random walk if: \[ y_t = y_{t-1} + \epsilon_t \] where:

\(y_t:\) is the (log) price of the stock at time \(t\), so that its change, \(y_t - y_{t-1} = \epsilon_t\), is the return

\(\epsilon_t:\) is the random error term at time \(t\), independent and identically distributed (i.i.d.) with mean zero, constant variance, and no autocorrelation.
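To see why a random walk in prices implies unpredictable returns, the sketch below simulates \(y_t = y_{t-1} + \epsilon_t\), treating \(y_t\) as a log price, and compares the lag-1 autocorrelation of the level \(y_t\) with that of its changes \(y_t - y_{t-1} = \epsilon_t\). Normal shocks with a 1% standard deviation and the seed are arbitrary assumptions.

```python
import numpy as np

def lag1_corr(x: np.ndarray) -> float:
    """Sample correlation between x_t and x_{t-1}."""
    return float(np.corrcoef(x[1:], x[:-1])[0, 1])

rng = np.random.default_rng(seed=42)
eps = rng.normal(loc=0.0, scale=0.01, size=2000)  # i.i.d. shocks with mean zero

y = np.cumsum(eps)   # random walk in (log) prices: y_t = y_{t-1} + eps_t
dy = np.diff(y)      # price changes (returns): dy_t = eps_t

print(f"lag-1 correlation of the level y_t: {lag1_corr(y):.3f}")   # close to 1
print(f"lag-1 correlation of the changes:   {lag1_corr(dy):.3f}")  # close to 0
```

The level is highly persistent, yet knowing yesterday's change says essentially nothing about today's.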

Against the EMH: Mean reversion and trend

Mean reversion: stocks with low returns today tend to have high returns in the future, and vice versa. Stocks that have done poorly in the past are therefore more likely to do well in the future, because mean reversion implies a predictable positive change in their future price. A trend, in contrast, occurs when a movement in one direction is followed by another in the same direction. Both patterns suggest that stock prices do not follow a random walk.

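More formally, consider a simple first-order autoregressive process, where \(y_t\) now denotes the return at time \(t\):

\[ y_t = \rho y_{t-1} + \epsilon_t \]

If \(0<\rho<1\), then trend (positive autocorrelation)

If \(-1<\rho<0\), then mean reversion (negative autocorrelation)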
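Whether \(\rho\) is positive or negative can be estimated from data. The sketch below simulates returns from this AR(1) with a hypothetical \(\rho = 0.3\) (trend) and \(\rho = -0.3\) (mean reversion), then estimates \(\rho\) by least squares of \(y_t\) on \(y_{t-1}\) without an intercept, as in the model above. Sample size, shock scale and seed are arbitrary choices.

```python
import numpy as np

def simulate_ar1(rho: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Simulate returns y_t = rho * y_{t-1} + eps_t, starting from y_0 = 0."""
    eps = rng.normal(0.0, 0.01, size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + eps[t]
    return y

def estimate_rho(y: np.ndarray) -> float:
    """Least-squares slope of y_t on y_{t-1}, without intercept."""
    x, z = y[:-1], y[1:]
    return float(np.dot(x, z) / np.dot(x, x))

rng = np.random.default_rng(seed=0)
for rho in (0.3, -0.3):
    rho_hat = estimate_rho(simulate_ar1(rho, n=5000, rng=rng))
    label = "trend" if rho_hat > 0 else "mean reversion"
    print(f"true rho = {rho:+.1f}, estimated rho = {rho_hat:+.3f} -> {label}")
```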

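Note: when \(\rho \ne 0\), the time aggregation problem arises: the variance of multi-period returns is no longer proportional to the horizon, so risk measured over one period cannot simply be scaled up (for example, by the square root of time). This must be accounted for when measuring financial risk.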
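For a stationary AR(1), the ratio between the variance of a \(k\)-period return (the sum of \(k\) one-period returns) and \(k\) times the one-period variance is \(VR(k) = 1 + 2\sum_{j=1}^{k-1}(1 - j/k)\rho^j\), which equals 1 when \(\rho = 0\). The sketch below evaluates this formula and checks it against a simulation; \(k=5\), the values of \(\rho\), unit-variance normal shocks and the seed are illustrative assumptions. The simulation uses `scipy.signal.lfilter` to run the AR(1) recursion without a Python loop.

```python
import numpy as np
from scipy.signal import lfilter

def variance_ratio(rho: float, k: int) -> float:
    """Var(k-period return) / (k * Var(1-period return)) for a stationary AR(1)."""
    j = np.arange(1, k)
    return float(1.0 + 2.0 * np.sum((1.0 - j / k) * rho ** j))

def simulated_variance_ratio(rho: float, k: int, n_blocks: int = 100_000, seed: int = 0) -> float:
    """Same ratio estimated from a simulated AR(1), using non-overlapping k-period sums."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n_blocks * k + 500)    # unit-variance shocks, 500 extra for burn-in
    y = lfilter([1.0], [1.0, -rho], eps)[500:]   # y_t = rho * y_{t-1} + eps_t, burn-in dropped
    k_sums = y.reshape(n_blocks, k).sum(axis=1)  # k-period returns
    return float(k_sums.var() / (k * y.var()))

for rho in (0.0, 0.2, -0.2):
    print(f"rho = {rho:+.1f}: formula {variance_ratio(rho, 5):.3f}, "
          f"simulated {simulated_variance_ratio(rho, 5):.3f}")
# Formula values: 1.000 (no autocorrelation), 1.375 (trend), 0.722 (mean reversion)
```

With a trend, scaling one-day volatility by \(\sqrt{5}\) would understate weekly risk; with mean reversion it would overstate it.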

Important: Behavioral Finance

Dissatisfaction with the EMH as an explanation of events such as Black Monday in 1987, when stock markets around the world plummeted and the Dow Jones Industrial Average (DJIA) suffered its largest one-day percentage drop in history (22.6%), gave rise to the field of behavioral finance.

Behavioral finance is a field of study that combines insights from psychology, sociology, anthropology and other social sciences with financial theory to understand how human behavior affects financial decision-making and markets. It challenges the traditional view of financial markets as perfectly rational and efficient, acknowledging that investors often act irrationally due to biases and emotions.


2 Testing the EMH

Recall: autocorrelation measures the relationship between a variable and its own lagged values. Under the EMH, stock returns should not be predictable; that is, they should not be autocorrelated.

Time series that show no autocorrelation are called white noise. We can evaluate whether a process is white noise by examining its autocorrelations \((\rho)\).

Formally, we use the Ljung and Box (1978) test of the null hypothesis:

\[ H_0: \rho_1 = \rho_2 = \dots = \rho_\tau = 0 \]

for a chosen number of lags \(\tau \ge 1\). If the null is not rejected, the evidence is consistent with the EMH.
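The sample autocorrelation at lag \(k\) is usually estimated as \(\hat\rho_k = \sum_{t=k+1}^{T}(y_t-\bar y)(y_{t-k}-\bar y) / \sum_{t=1}^{T}(y_t-\bar y)^2\), and the Ljung and Box (1978) statistic combines the first \(\tau\) of them as \(Q = T(T+2)\sum_{k=1}^{\tau}\hat\rho_k^2/(T-k)\). Under the null, \(Q\) is approximately \(\chi^2\) with \(\tau\) degrees of freedom, so large values of \(Q\) (small p-values) reject it. The NumPy/SciPy sketch below implements both formulas to expose the mechanics; for the same series and lag, its output should match the lb_stat and lb_pvalue columns of statsmodels' acorr_ljungbox up to rounding. The simulated series are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def sample_acf(y, max_lag: int) -> np.ndarray:
    """Sample autocorrelations rho_1..rho_max_lag (demeaned; denominator uses all T observations)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:-k]) / denom for k in range(1, max_lag + 1)])

def ljung_box(y, lags: int) -> tuple[float, float]:
    """Ljung-Box Q statistic and p-value for H0: rho_1 = ... = rho_lags = 0."""
    T = len(y)
    rho = sample_acf(y, lags)
    q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, lags + 1)))
    p_value = chi2.sf(q, df=lags)  # upper tail of a chi-square with `lags` degrees of freedom
    return float(q), float(p_value)

# Toy series whose autocorrelations are easy to check by hand
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))  # 0.4 at lag 1, -0.1 at lag 2

rng = np.random.default_rng(seed=7)
white_noise = rng.normal(size=250)                 # about one year of daily observations
ma1 = np.convolve(white_noise, [1.0, 0.8])[:250]   # hypothetical autocorrelated series: w_t + 0.8 w_{t-1}

for name, series in [("white noise", white_noise), ("MA(1)", ma1)]:
    q, p = ljung_box(series, lags=5)
    print(f"{name:>11}: Q = {q:6.2f}, p-value = {p:.4f}")
```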

3 Application

We will test the Efficient Market Hypothesis (EMH) for individual stocks and then generalize the procedure to a portfolio of N stocks. The ultimate goal is to identify stocks whose returns lead us to reject the EMH, so that we can build a portfolio of assets with potentially predictable returns.

Data are obtained from Yahoo Finance through the yfinance library. Make sure the library is installed (!pip install yfinance) before importing it; you only need to do this once. I have also written some useful functions in a file named “iguizarFuncs.py”; make sure this file is in your working directory so that it can be imported.

3.1 One stock

Import the libraries:

# !pip install yfinance
import yfinance as yf
import pandas   as pd
import numpy    as np 
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
import iguizarFuncs as ig

Download the data for one stock (AAPL):

Data = yf.download('AAPL', start='2023-09-30', end='2024-09-30', progress=False)['Close']

# Reset index to make the date a regular column 
Data = Data.reset_index()

# Clean the 'Date' column
Data['Date'] = pd.to_datetime(Data['Date']).dt.date

Plot the price data:

plt.figure(figsize=(7, 5))
plt.plot(Data['Date'], Data['AAPL'], label='AAPL')  # label needed for plt.legend()
plt.title("Time series of the stock's prices")
plt.xlabel("Date", fontsize=10)
plt.ylabel("Price", fontsize=10)
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

Calculate the log returns:

Data['Log_Returns'] = np.log(Data['AAPL'] / Data['AAPL'].shift(1))
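The code uses log returns, \(r_t = \ln(P_t/P_{t-1})\), rather than the arithmetic return \(R\) defined in Section 1.1. The two are linked by \(r = \ln(1+R)\), so they are nearly identical for small daily price changes, and log returns add up over time. The sketch below checks both properties on a hypothetical price series, using NumPy arrays instead of the pandas column above.

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.5, 100.2, 102.0])  # hypothetical closing prices

arith = prices[1:] / prices[:-1] - 1.0       # arithmetic returns R_t
log_ret = np.log(prices[1:] / prices[:-1])   # log returns r_t, as in Data['Log_Returns']

print(np.round(arith, 5))
print(np.round(log_ret, 5))

# r_t = ln(1 + R_t): for small moves the two differ only slightly
print(np.max(np.abs(log_ret - arith)))
# Log returns are additive: their sum equals the log of the total gross return
print(log_ret.sum(), np.log(prices[-1] / prices[0]))
```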

Plot the returns:

plt.figure(figsize=(7, 5))

plt.plot(Data['Date'], Data['Log_Returns'], label='AAPL log returns')  # label needed for plt.legend()
plt.axhline(0, color='black', linestyle='--', linewidth=0.5)
plt.title("Time series of the stock's returns")
plt.xlabel("Date")
plt.ylabel("Returns")
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

The Ljung-Box test

We can now apply the Ljung-Box test. The analyst chooses the number of lags; here we use 5 (one trading week). The null hypothesis is then:

\[ \rho_1 = \rho_2 = \rho_3 = \rho_4 = \rho_5=0 \]

states that the returns of the past 5 days have no (linear) influence on today's return. If the null hypothesis were rejected, we would have statistical evidence that the return of at least one of the past 5 days is significantly correlated with today's return, implying a deviation from the EMH.

# Run Ljung-Box test on log returns with 5 lags
ljung_box_results = acorr_ljungbox(Data['Log_Returns'].dropna(), lags=[5], return_df=True)

# Display the test results
print(ljung_box_results)
    lb_stat  lb_pvalue
5  3.659663   0.599378

Since the p-value (about 0.60) is well above any conventional significance level (1%, 5% or 10%), we fail to reject the null hypothesis. For this stock and period, the test detects no significant autocorrelation in returns, a result consistent with the Efficient Market Hypothesis (EMH). Note that this is consistent with uncorrelated returns; it does not establish that returns are independent.

3.2 Multiple stocks

To generalize this approach to multiple stocks, we will use the file “toUScompanies.csv” that lists the top US companies by market cap. Please make sure you have loaded this file.

  1. Specify the period and tickers of interest.
companies = pd.read_csv('toUScompanies.csv')
tickers   = companies['Symbol'].tolist()

start_date = '2023-09-30'
end_date   = '2024-09-30'

The function ‘get_prices’ extracts daily prices for multiple tickers over a specified period:

nStocks = ig.get_prices(tickers, start_date, end_date)
nStocks.head(5)

1 Failed download:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')
Ticker Date AAPL ABBV ABNB ABT ADBE ADI ADP AMAT AMD ... UBER UNH UNP UPS V VRTX VZ WFC WMT XOM
0 2023-10-02 171.898010 136.820404 136.559998 91.142830 521.130005 168.909195 227.767929 136.763214 103.269997 ... 45.680000 492.779694 192.550980 136.827042 227.412842 347.829987 26.810520 37.435558 52.041309 106.879936
1 2023-10-03 170.562393 136.063614 127.730003 91.199959 507.029999 165.134888 228.767014 134.028152 100.080002 ... 44.509998 487.895599 193.634232 135.653580 224.993881 345.149994 26.996239 36.547161 51.713001 107.064819
2 2023-10-04 171.808960 136.303604 127.410004 91.066681 518.419983 167.330109 232.154449 136.557388 104.070000 ... 44.939999 488.996857 192.199402 135.627106 227.363678 352.970001 26.624809 36.830696 52.333851 103.062469
3 2023-10-05 173.045624 136.082092 124.989998 91.590317 516.440002 165.702972 231.564514 136.537766 102.910004 ... 44.610001 494.369324 190.650558 135.327133 229.585999 355.140015 26.861172 37.142574 51.709751 100.742416
4 2023-10-06 175.598160 136.811203 126.360001 92.237724 526.679993 167.503448 234.438156 137.527878 107.239998 ... 45.779999 502.586029 192.379944 136.112366 231.119965 360.619995 26.598207 37.511162 50.841850 99.060143

5 rows × 111 columns

The function ‘get_returns’, in turn, obtains daily returns for multiple tickers over a specified period:

nStocksRet = ig.get_returns(tickers, start_date, end_date)
nStocksRet.head(5)

1 Failed download:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')
Ticker Date AAPL ABBV ABNB ABT ADBE ADI ADP AMAT AMD ... UBER UNH UNP UPS V VRTX VZ WFC WMT XOM
0 2023-10-03 -0.007800 -0.005547 -0.066845 0.000627 -0.027429 -0.022598 0.004377 -0.020201 -0.031377 ... -0.025947 -0.009961 0.005610 -0.008613 -0.010694 -0.007735 0.006903 -0.024018 -0.006328 0.001728
1 2023-10-04 0.007282 0.001762 -0.002508 -0.001462 0.022216 0.013206 0.014699 0.018695 0.039094 ... 0.009614 0.002255 -0.007438 -0.000195 0.010478 0.022404 -0.013854 0.007728 0.011934 -0.038099
2 2023-10-05 0.007172 -0.001627 -0.019177 0.005733 -0.003827 -0.009772 -0.002544 -0.000143 -0.011209 ... -0.007370 0.010927 -0.008091 -0.002214 0.009727 0.006129 0.008838 0.008432 -0.011997 -0.022768
3 2023-10-06 0.014643 0.005344 0.010901 0.007044 0.019634 0.010807 0.012333 0.007225 0.041214 ... 0.025889 0.016484 0.009030 0.005786 0.006659 0.015313 -0.009838 0.009875 -0.016927 -0.016840
4 2023-10-09 0.008416 0.005852 0.011097 -0.001239 0.004943 -0.003743 0.015306 -0.000998 -0.002521 ... -0.007234 0.003234 0.009047 0.000453 -0.002556 -0.015144 0.019262 0.000252 -0.003651 0.034393

5 rows × 111 columns

  2. Use the function ‘ljung_box_test’ to apply the Ljung-Box test to each stock and store the results in a data frame named lb_results. The function lets us choose the number of lags; we continue using 5:
lags = 5
lb_results = ig.ljung_box_test(tickers, start_date, end_date, lags)

# Display the results
lb_results

3 Failed downloads:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')
['TMUS']: ConnectionError("Failed to perform, curl: (7) Failed to connect to fc.yahoo.com port 443 after 72 ms: Couldn't connect to server. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.")
['AMAT']: ConnectionError("Failed to perform, curl: (7) Failed to connect to fc.yahoo.com port 443 after 74 ms: Couldn't connect to server. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.")
     Ticker  Ljung-Box Statistic   P-value
0      AAPL             3.659634  0.599382
1      ABBV             0.964419  0.965384
2      ABNB             5.066243  0.407849
3       ABT             3.803216  0.578081
4      ADBE             1.699569  0.888954
...     ...                  ...       ...
105    VRTX             8.923362  0.112160
106      VZ             9.525351  0.089857
107     WFC             6.704683  0.243546
108     WMT             1.462589  0.917346
109     XOM             5.994265  0.306777

110 rows × 3 columns

Filter the stocks whose results contradict the EMH. In this example we use a lenient threshold, p-value < 0.20; that is, we keep the stocks for which the null hypothesis is rejected at the 20% significance level, a relatively low confidence level of 80%.

no_stocks = lb_results[lb_results['P-value'] < 0.20]


print(f"{no_stocks.shape[0]} stocks reject the EMH")
28 stocks reject the EMH
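Because the test is run once per stock, some rejections are expected by chance alone. As a rough benchmark, if the null held for every stock and the tests were independent (returns of different stocks are in fact correlated, so this is only an approximation), the number of rejections at significance level \(\alpha\) would follow a binomial distribution with mean \(N\alpha\); with the 110 tickers and \(\alpha = 0.20\) used above, that mean is 22. The sketch below computes it, together with the binomial probability of seeing at least the 28 rejections reported above.

```python
from scipy.stats import binom

n_tests = 110   # tickers tested above
alpha = 0.20    # significance level used in the filter
observed = 28   # rejections reported above

expected = n_tests * alpha
# P(X >= observed) for X ~ Binomial(n_tests, alpha); binom.sf(k) returns P(X > k)
p_at_least = binom.sf(observed - 1, n_tests, alpha)
print(f"Expected rejections if the null held everywhere: {expected:.0f}")
print(f"P(at least {observed} rejections by chance): {p_at_least:.3f}")
```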

We can now export the list of tickers that contradict the EMH to an Excel file for future reference (writing .xlsx files requires an engine such as openpyxl):

no_stocks.to_excel('no_stocks.xlsx', index=False)

We can also retrieve their details from the original list of companies:

select_companies = companies[companies['Symbol'].isin(no_stocks['Ticker'])]
select_companies
     Name                           Symbol  Country  Sector                    Market Cap
5    Amazon.com Inc                 AMZN    US       Consumer Discretionary  2.030020e+12
6    Meta Platforms Inc             META    US       Communication Services  1.499610e+12
7    Berkshire Hathaway Inc         BRK-B   US       Financials              9.813140e+11
8    Berkshire Hathaway Inc         BRK-A   US       Financials              9.813140e+11
10   Broadcom Inc                   AVGO    US       Information Technology  8.257580e+11
14   Visa Inc                       V       US       Financials              5.738230e+11
17   Oracle Corp                    ORCL    US       Information Technology  4.846800e+11
18   Mastercard Inc                 MA      US       Financials              4.750540e+11
19   Procter & Gamble Co            PG      US       Consumer Staples        3.916550e+11
24   Bank of America Corp           BAC     US       Financials              3.254660e+11
36   Thermo Fisher Scientific Inc   TMO     US       Health Care             2.102390e+11
37   McDonald's Corp                MCD     US       Consumer Discretionary  2.099230e+11
43   Morgan Stanley                 MS      US       Financials              1.907780e+11
44   Texas Instruments Inc          TXN     US       Information Technology  1.903750e+11
45   General Electric Co            GE      US       Industrials             1.891850e+11
47   Qualcomm Inc                   QCOM    US       Information Technology  1.873800e+11
52   Verizon Communications Inc     VZ      US       Communication Services  1.741730e+11
57   Comcast Corp                   CMCSA   US       Communication Services  1.638650e+11
71   Charles Schwab Corp            SCHW    US       Financials              1.302400e+11
76   Boston Scientific Corp         BSX     US       Health Care             1.238550e+11
77   Vertex Pharmaceuticals Inc     VRTX    US       Health Care             1.224950e+11
81   Palo Alto Networks Inc         PANW    US       Information Technology  1.171250e+11
83   United Parcel Service Inc      UPS     US       Industrials             1.148200e+11
84   Analog Devices Inc             ADI     US       Information Technology  1.146350e+11
94   Regeneron Pharmaceuticals Inc  REGN    US       Health Care             1.018900e+11
99   Intel Corp                     INTC    US       Information Technology  9.595344e+10
101  Elevance Health Inc            ELV     US       Health Care             9.529940e+10
105  KLA Corp                       KLAC    US       Information Technology  9.278362e+10

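At the 20% significance level, the Ljung-Box test rejects the null of no autocorrelation for these stocks, which is statistical evidence against the EMH for them.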

References

Ljung, G. M., and G. E. P. Box. 1978. “On a Measure of Lack of Fit in Time Series Models.” Biometrika 65 (2): 297–303. https://doi.org/10.1093/biomet/65.2.297.
Mishkin, Frederic S., and Stanley G. Eakins. 2024. Financial Markets and Institutions. Pearson.