Efficient Financial Markets

Isai Guizar

iguizar@tec.mx

Disclaimer: This document is intended for educational purposes only. It does not constitute financial advice.

1 Intro

1.1 The Efficient Market Hypothesis

The description of the efficient market hypothesis (EMH) here follows Mishkin and Eakins (2024). It states that prices in financial markets fully reflect all available information. To understand the hypothesis, the authors define the arithmetic rate of return from holding a stock from time \(t\) to \(t+1\) as:

\[ R = \frac{P_{t+1}-P_t}{P_t} \] that is, the rate of capital gains (assuming no other cash payments).

At the start of the period, however, the price \(P_{t+1}\) is unknown. Investors nonetheless form an expectation of it, \(P^e_{t+1}\), so the expected return is:

\[ R^e = \frac{P^e_{t+1}-P_t}{P_t} \]
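As a quick numerical illustration of the two definitions above, the following sketch computes a realized and an expected return. The prices are made-up values, and the function name `simple_return` is hypothetical (it is not part of the course files).

```python
def simple_return(p_now: float, p_next: float) -> float:
    """Arithmetic rate of return (P_{t+1} - P_t) / P_t, assuming no cash payments."""
    return (p_next - p_now) / p_now


# Illustrative (made-up) prices
p_t = 100.0              # price today
p_next_actual = 104.0    # price actually observed at t+1
p_next_expected = 110.0  # investors' expectation of the price at t+1

realized = simple_return(p_t, p_next_actual)    # R
expected = simple_return(p_t, p_next_expected)  # R^e

print(f"Realized return R:   {realized:.2%}")
print(f"Expected return R^e: {expected:.2%}")
```

The only difference between the two calls is whether the next-period price is the observed one or the expected one.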

The efficient market hypothesis views these expectations as the optimal forecast of the return, \(R^{of}\), that is, the best guess of the future using all available information: \(R^e = R^{of}\).

The supply-and-demand analysis of a financial market teaches us that the expected return on the financial asset will converge to the equilibrium return, \(R^*\), that equates the quantity demanded to the quantity supplied. Then, if the market is in equilibrium, \(R^e = R^*\).

Therefore, in an efficient market:

\[ R^{of}=R^* \]

That is, current prices in a financial market will be set so that the optimal forecast of a security’s return using all available information equals the security’s equilibrium return. If the two differ, the price adjusts:

\[ \text{If } R^{of} > R^* \Rightarrow P_t \uparrow \ \rightarrow \ R^{of} \downarrow \]

\[ \text{If } R^{of} < R^* \Rightarrow P_t \downarrow \ \rightarrow \ R^{of} \uparrow \]

until \(R^{of}=R^*\).
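The adjustment mechanism can be sketched numerically. In the example below, all numbers are made up, and the partial-adjustment rule (the price moves by half of the gap between \(R^{of}\) and \(R^*\) at each step) is an assumption chosen only to visualize convergence. The source does not specify any particular adjustment dynamics.

```python
def optimal_forecast_return(p_now: float, p_expected: float) -> float:
    """R^of = (P^e_{t+1} - P_t) / P_t for a given current price."""
    return (p_expected - p_now) / p_now


p_expected = 110.0  # made-up optimal forecast of next period's price
r_star = 0.05       # made-up equilibrium return R*
price = 100.0       # made-up starting price, so R^of = 10% > R*

# Price at which R^of equals R* exactly: P_t = P^e_{t+1} / (1 + R*)
p_equilibrium = p_expected / (1 + r_star)

# Stylized adjustment (an assumption): the price rises when R^of > R*
# and falls when R^of < R*, closing part of the gap at each step.
for _ in range(50):
    r_of = optimal_forecast_return(price, p_expected)
    price *= 1 + 0.5 * (r_of - r_star)

print(f"Equilibrium price: {p_equilibrium:.4f}")
print(f"Price after adjustment: {price:.4f}")
print(f"R^of after adjustment: {optimal_forecast_return(price, p_expected):.4%}")
```

Because the starting price implies \(R^{of} > R^*\), the price is bid up until the forecast return falls to the equilibrium return.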

More simply: in an efficient market a stock’s price fully reflects all available information.

A stronger version of the EMH adds a second condition: not only are expectations optimal forecasts using all available information, but prices also reflect the true fundamental (intrinsic) value of the securities. Thus, under this view, all prices in an efficient market are always correct and reflect market fundamentals.

In favor of the EMH: Random-Walk

A random walk describes the movements of a variable whose future changes cannot be predicted (are random) because, given today’s value, the variable is just as likely to fall as to rise. Under the EMH, all price changes are due to information that cannot be anticipated, and thus must be uncorrelated over time. Stock prices should then approximately follow a random walk.

More formally, \(y_t\) follows a random walk if:

\[ y_t = y_{t-1} + \epsilon_t \]

where:

\(y_t\) is the (log) price at time \(t\), so that the change \(y_t - y_{t-1} = \epsilon_t\) is the return,

\(\epsilon_t\) is the random error term at time \(t\), independent and identically distributed (i.i.d.) with mean zero and constant variance, and therefore not autocorrelated.
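To make the definition concrete, the sketch below simulates a random walk with Gaussian shocks and compares the lag-1 sample autocorrelation of the level with that of its changes. The sample size, shock volatility, seed, and the helper `lag1_autocorr` are illustrative assumptions.

```python
import random


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


random.seed(42)  # fixed seed so the run is reproducible
n_obs = 5000
shocks = [random.gauss(0.0, 0.01) for _ in range(n_obs)]  # i.i.d., mean zero

# Build the random walk: y_t = y_{t-1} + eps_t, starting from y_0 = 0
y = [0.0]
for eps in shocks:
    y.append(y[-1] + eps)

# The changes y_t - y_{t-1} recover the shocks
changes = [y[t] - y[t - 1] for t in range(1, len(y))]

print(f"Lag-1 autocorrelation of the level:   {lag1_autocorr(y):.3f}")
print(f"Lag-1 autocorrelation of the changes: {lag1_autocorr(changes):.3f}")
```

The level is highly persistent, while its changes are close to uncorrelated, which is the property the EMH tests look for.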

Against the EMH: Mean reversion and trend

Stocks with low returns today tend to have high returns in the future, and vice versa. Hence, stocks that have done poorly in the past are more likely to do well in the future, because mean reversion implies a predictable positive change in the future price. A trend, in contrast, occurs when a movement in one direction is followed by another movement in the same direction. Either pattern suggests that stock prices do not follow a random walk.

More formally, using a simple first-order autoregressive process, where \(y_t\) denotes the return at time \(t\):

\[ y_t = \rho y_{t-1} + \epsilon_t \]

If \(0<\rho<1\), then trend.

If \(-1<\rho<0\), then mean reversion.
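The two cases can be illustrated by simulating the autoregressive process for a positive, a zero, and a negative value of \(\rho\), and checking that the lag-1 sample autocorrelation recovers its sign and approximate size. The function names, the values of \(\rho\), the shock volatility, and the seed are illustrative assumptions.

```python
import random


def simulate_ar1(rho, n_obs, sigma=0.01, seed=0):
    """Simulate y_t = rho * y_{t-1} + eps_t with Gaussian shocks, starting at 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n_obs):
        y.append(rho * y[-1] + rng.gauss(0.0, sigma))
    return y[1:]  # drop the artificial starting value


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


for rho, label in [(0.3, "trend"), (0.0, "no predictability"), (-0.3, "mean reversion")]:
    returns = simulate_ar1(rho, n_obs=5000)
    print(f"rho = {rho:+.1f} ({label}): lag-1 autocorrelation = {lag1_autocorr(returns):+.3f}")
```

With \(\rho = 0\) the process reduces to uncorrelated shocks, the case consistent with the EMH.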

Note: when \(\rho \ne 0\), the time aggregation problem arises, which must be accounted for when measuring financial risk.
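One common way to see the time aggregation problem, assuming the note refers to scaling risk across horizons, is that the square-root-of-time rule for volatility holds only when returns are uncorrelated. For the autoregressive process above, the variance of a sum of \(k\) consecutive returns equals \(k\) times the one-period variance multiplied by \(1 + 2\sum_{j=1}^{k-1}(1 - j/k)\rho^j\). The sketch below evaluates that factor. The daily volatility, the horizon, and the function name `variance_ratio` are illustrative assumptions.

```python
import math


def variance_ratio(rho: float, k: int) -> float:
    """Var(sum of k AR(1) returns) divided by k * Var(one return)."""
    return 1.0 + 2.0 * sum((1 - j / k) * rho ** j for j in range(1, k))


daily_vol = 0.01  # made-up one-day return volatility
k = 21            # horizon in days (roughly one trading month)

naive_vol = daily_vol * math.sqrt(k)  # square-root-of-time rule
for rho in (-0.2, 0.0, 0.2):
    adjusted_vol = daily_vol * math.sqrt(k * variance_ratio(rho, k))
    print(f"rho = {rho:+.1f}: naive = {naive_vol:.4f}, adjusted = {adjusted_vol:.4f}")
```

Positive autocorrelation makes the multi-period risk larger than the naive rule suggests, and negative autocorrelation makes it smaller.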

Behavioral Finance

Dissatisfaction with the EMH as an explanation for events like Black Monday in 1987, when stock markets around the world plummeted and the Dow Jones Industrial Average (DJIA) experienced its largest one-day percentage drop in history (22.6%), gave rise to the field of behavioral finance.

Behavioral finance is a field of study that combines insights from psychology, sociology, anthropology and other social sciences with financial theory to understand how human behavior affects financial decision-making and markets. It challenges the traditional view of financial markets as perfectly rational and efficient, acknowledging that investors often act irrationally due to biases and emotions.

2 Testing the EMH

Recall: Autocorrelation measures the relationship between a variable and its own lagged values. Under the Efficient Market Hypothesis, stock returns should not be predictable, that is, they should not be autocorrelated.

Time series that show no autocorrelation are called white noise. We can check whether a process is white noise by examining its autocorrelations \((\rho_k)\) at lags \(k = 1, 2, \dots\)

Formally, we employ the Ljung and Box (1978) test to verify the null hypothesis:

\[ \rho_1 = \rho_2 = \dots =\rho_\tau = 0 \]

for \(\tau \ge 1\). If the null is not rejected, the evidence is consistent with the EMH.
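For intuition, the Ljung-Box statistic is \(Q = n(n+2)\sum_{k=1}^{\tau}\hat\rho_k^2/(n-k)\), which under the null is approximately chi-square distributed with \(\tau\) degrees of freedom. The sketch below computes \(Q\) by hand on simulated data, using only the standard library. The function names, the simulated series, and the seed are illustrative assumptions. The constant 11.0705 is the 5% chi-square critical value for 5 degrees of freedom.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))


CHI2_CRIT_5PCT_DF5 = 11.0705  # 5% critical value, chi-square with 5 d.o.f.

rng = random.Random(1)
white_noise = [rng.gauss(0.0, 0.01) for _ in range(500)]

# Autocorrelated returns: AR(1) with coefficient 0.5 (made-up value)
ar1 = [0.0]
for _ in range(500):
    ar1.append(0.5 * ar1[-1] + rng.gauss(0.0, 0.01))
ar1 = ar1[1:]

for name, series in [("white noise", white_noise), ("AR(1), rho = 0.5", ar1)]:
    q = ljung_box_q(series, max_lag=5)
    print(f"{name}: Q = {q:.2f}, reject at 5%: {q > CHI2_CRIT_5PCT_DF5}")
```

In practice a library routine such as `acorr_ljungbox` from statsmodels also reports the p-value. The manual version is only meant to show what the statistic measures.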

3 Application

We will test the Efficient Market Hypothesis (EMH) for individual stocks and then generalize the procedure to a portfolio of N stocks. The ultimate goal is to identify stocks whose return patterns reject the EMH, enabling us to build a portfolio composed of assets with potentially predictable returns.

Data will be obtained from Yahoo Finance. Make sure you have installed the yfinance library (!pip install yfinance) before importing it; you only have to do this once. I have also written some useful functions in a file named “iguizarFuncs.py”; please make sure this file is loaded.

3.1 One stock

Import the libraries:

# !pip install yfinance
import yfinance as yf
import pandas   as pd
import numpy    as np 
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
import iguizarFuncs as ig

Download the data for one stock (AAPL):

Data = yf.download('AAPL', start='2023-09-30', end='2024-09-30', progress=False)['Close']

# Reset index to make the date a regular column 
Data = Data.reset_index()

# Clean the 'Date' column
Data['Date'] = pd.to_datetime(Data['Date']).dt.date

Plot the price data:

plt.figure(figsize=(7, 5))
# Label the line so that plt.legend() has an entry to display
plt.plot(Data['Date'], Data['AAPL'], label='AAPL')
plt.title("Time Series of the stock's prices")
plt.xlabel("Date", fontsize = 10)
plt.ylabel("Price", fontsize=10)
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

Calculate the log returns:

Data['Log_Returns'] = np.log(Data['AAPL'] / Data['AAPL'].shift(1))
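The log return used above relates to the arithmetic return \(R\) defined earlier through \(\ln(P_{t+1}/P_t) = \ln(1+R)\), and log returns add up across periods. The sketch below checks both properties on a short made-up price series, using only the standard library. In the pandas version, `shift(1)` leaves the first row without a previous price, so the first log return is missing.

```python
import math

prices = [100.0, 102.0, 99.0, 103.0]  # made-up daily prices

# Arithmetic and log returns for each consecutive pair of prices
simple_returns = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]
log_returns = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

for r, lr in zip(simple_returns, log_returns):
    print(f"simple = {r:+.4%}, log = {lr:+.4%}, ln(1 + simple) = {math.log(1 + r):+.4%}")

# Log returns are additive: their sum is the log return over the whole period
total_log_return = sum(log_returns)
print(f"Sum of log returns:         {total_log_return:.6f}")
print(f"ln(last price/first price): {math.log(prices[-1] / prices[0]):.6f}")
```

For small daily moves the two return measures are numerically close, which is why either is commonly used in autocorrelation tests.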

Plot the returns:

plt.figure(figsize=(7, 5))

# Label the line so that plt.legend() has an entry to display
plt.plot(Data['Date'], Data['Log_Returns'], label='AAPL log returns')
plt.axhline(0, color='black', linestyle='--', linewidth=0.5)
plt.title("Time series of the stock's returns")
plt.xlabel("Date")
plt.ylabel("Returns")
plt.xticks(rotation=45)
plt.tight_layout()
plt.legend()
plt.show()

The Ljung-Box test

We can now apply the Ljung-Box test. The analyst decides the number of lags; in this case we choose 5. The null hypothesis is then:

\[ \rho_1 = \rho_2 = \rho_3 = \rho_4 = \rho_5=0 \]

which states that the returns of the past 5 days are not correlated with current returns. If the null hypothesis were rejected, we would have statistical evidence that returns from at least one of the past 5 days are significantly correlated with today’s return, implying a deviation from the EMH.

# Run Ljung-Box test on log returns with 5 lags
ljung_box_results = acorr_ljungbox(Data['Log_Returns'].dropna(), lags=[5], return_df=True)

# Display the test results
print(ljung_box_results)

    lb_stat  lb_pvalue
5  3.659663   0.599378

Since the p-value (0.599) is above every conventional significance level, we fail to reject the null hypothesis. For this particular stock, the test does not detect significant autocorrelation at the first 5 lags (p-value ≥ 0.10), so returns appear serially uncorrelated over this sample, a result consistent with the Efficient Market Hypothesis (EMH).

3.2 Multiple stocks

To generalize this approach to multiple stocks, we will use the file “toUScompanies.csv” that lists the top US companies by market cap. Please make sure you have loaded this file.

Specify the period and tickers of interest.

companies = pd.read_csv('toUScompanies.csv')
tickers   = companies['Symbol'].tolist()

start_date = '2023-09-30'
end_date   = '2024-09-30'

The function ‘get_prices’ is useful to extract daily prices for multiple tickers over a specified period:

nStocks = ig.get_prices(tickers, start_date, end_date)
nStocks.head(5)


1 Failed download:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')

Ticker	Date	AAPL	ABBV	ABNB	ABT	ADBE	ADI	ADP	AMAT	AMD	...	UBER	UNH	UNP	UPS	V	VRTX	VZ	WFC	WMT	XOM
0	2023-10-02	171.898010	136.820404	136.559998	91.142830	521.130005	168.909195	227.767929	136.763214	103.269997	...	45.680000	492.779694	192.550980	136.827042	227.412842	347.829987	26.810520	37.435558	52.041309	106.879936
1	2023-10-03	170.562393	136.063614	127.730003	91.199959	507.029999	165.134888	228.767014	134.028152	100.080002	...	44.509998	487.895599	193.634232	135.653580	224.993881	345.149994	26.996239	36.547161	51.713001	107.064819
2	2023-10-04	171.808960	136.303604	127.410004	91.066681	518.419983	167.330109	232.154449	136.557388	104.070000	...	44.939999	488.996857	192.199402	135.627106	227.363678	352.970001	26.624809	36.830696	52.333851	103.062469
3	2023-10-05	173.045624	136.082092	124.989998	91.590317	516.440002	165.702972	231.564514	136.537766	102.910004	...	44.610001	494.369324	190.650558	135.327133	229.585999	355.140015	26.861172	37.142574	51.709751	100.742416
4	2023-10-06	175.598160	136.811203	126.360001	92.237724	526.679993	167.503448	234.438156	137.527878	107.239998	...	45.779999	502.586029	192.379944	136.112366	231.119965	360.619995	26.598207	37.511162	50.841850	99.060143

5 rows × 111 columns

The function ‘get_returns’, in turn, helps us obtain returns for multiple tickers over a specified period:

nStocksRet = ig.get_returns(tickers, start_date, end_date)
nStocksRet.head(5)


1 Failed download:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')

Ticker	Date	AAPL	ABBV	ABNB	ABT	ADBE	ADI	ADP	AMAT	AMD	...	UBER	UNH	UNP	UPS	V	VRTX	VZ	WFC	WMT	XOM
0	2023-10-03	-0.007800	-0.005547	-0.066845	0.000627	-0.027429	-0.022598	0.004377	-0.020201	-0.031377	...	-0.025947	-0.009961	0.005610	-0.008613	-0.010694	-0.007735	0.006903	-0.024018	-0.006328	0.001728
1	2023-10-04	0.007282	0.001762	-0.002508	-0.001462	0.022216	0.013206	0.014699	0.018695	0.039094	...	0.009614	0.002255	-0.007438	-0.000195	0.010478	0.022404	-0.013854	0.007728	0.011934	-0.038099
2	2023-10-05	0.007172	-0.001627	-0.019177	0.005733	-0.003827	-0.009772	-0.002544	-0.000143	-0.011209	...	-0.007370	0.010927	-0.008091	-0.002214	0.009727	0.006129	0.008838	0.008432	-0.011997	-0.022768
3	2023-10-06	0.014643	0.005344	0.010901	0.007044	0.019634	0.010807	0.012333	0.007225	0.041214	...	0.025889	0.016484	0.009030	0.005786	0.006659	0.015313	-0.009838	0.009875	-0.016927	-0.016840
4	2023-10-09	0.008416	0.005852	0.011097	-0.001239	0.004943	-0.003743	0.015306	-0.000998	-0.002521	...	-0.007234	0.003234	0.009047	0.000453	-0.002556	-0.015144	0.019262	0.000252	-0.003651	0.034393

5 rows × 111 columns

Use the function ‘ljung_box_test’ to test the EMH using the Ljung-Box test for each of the stocks, and store the results in a data frame named lb_results. Note that the function allows us to choose the number of lags; we will continue using 5:

lags = 5
lb_results = ig.ljung_box_test(tickers, start_date, end_date, lags)

# Display the results
lb_results


3 Failed downloads:
['FI']: YFPricesMissingError('possibly delisted; no price data found  (1d 2023-09-30 -> 2024-09-30) (Yahoo error = "No data found, symbol may be delisted")')
['TMUS']: ConnectionError("Failed to perform, curl: (7) Failed to connect to fc.yahoo.com port 443 after 72 ms: Couldn't connect to server. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.")
['AMAT']: ConnectionError("Failed to perform, curl: (7) Failed to connect to fc.yahoo.com port 443 after 74 ms: Couldn't connect to server. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.")

	Ticker	Ljung-Box Statistic	P-value
0	AAPL	3.659634	0.599382
1	ABBV	0.964419	0.965384
2	ABNB	5.066243	0.407849
3	ABT	3.803216	0.578081
4	ADBE	1.699569	0.888954
...	...	...	...
105	VRTX	8.923362	0.112160
106	VZ	9.525351	0.089857
107	WFC	6.704683	0.243546
108	WMT	1.462589	0.917346
109	XOM	5.994265	0.306777

110 rows × 3 columns

Filter for those stocks that contradict the EMH. In this example, we use a p-value < 0.20; that is, we select the stocks for which the null hypothesis is rejected at the 20% significance level (a relatively low confidence level of 80%).

no_stocks = lb_results[lb_results['P-value'] < 0.20]


print(f"{no_stocks.shape[0]} stocks reject the EMH")

28 stocks reject the EMH

We can now export the list of tickers that contradict the EMH to an Excel file for future reference:

no_stocks.to_excel('no_stocks.xlsx', index=False)

Or select from the original list of companies:

select_companies = companies[companies['Symbol'].isin(no_stocks['Ticker'])]
select_companies

	Name	Symbol	Country	Sector	Market Cap
5	Amazon.com Inc	AMZN	US	Consumer Discretionary	2.030020e+12
6	Meta Platforms Inc	META	US	Communication Services	1.499610e+12
7	Berkshire Hathaway Inc	BRK-B	US	Financials	9.813140e+11
8	Berkshire Hathaway Inc	BRK-A	US	Financials	9.813140e+11
10	Broadcom Inc	AVGO	US	Information Technology	8.257580e+11
14	Visa Inc	V	US	Financials	5.738230e+11
17	Oracle Corp	ORCL	US	Information Technology	4.846800e+11
18	Mastercard Inc	MA	US	Financials	4.750540e+11
19	Procter & Gamble Co	PG	US	Consumer Staples	3.916550e+11
24	Bank of America Corp	BAC	US	Financials	3.254660e+11
36	Thermo Fisher Scientific Inc	TMO	US	Health Care	2.102390e+11
37	McDonald's Corp	MCD	US	Consumer Discretionary	2.099230e+11
43	Morgan Stanley	MS	US	Financials	1.907780e+11
44	Texas Instruments Inc	TXN	US	Information Technology	1.903750e+11
45	General Electric Co	GE	US	Industrials	1.891850e+11
47	Qualcomm Inc	QCOM	US	Information Technology	1.873800e+11
52	Verizon Communications Inc	VZ	US	Communication Services	1.741730e+11
57	Comcast Corp	CMCSA	US	Communication Services	1.638650e+11
71	Charles Schwab Corp	SCHW	US	Financials	1.302400e+11
76	Boston Scientific Corp	BSX	US	Health Care	1.238550e+11
77	Vertex Pharmaceuticals Inc	VRTX	US	Health Care	1.224950e+11
81	Palo Alto Networks Inc	PANW	US	Information Technology	1.171250e+11
83	United Parcel Service Inc	UPS	US	Industrials	1.148200e+11
84	Analog Devices Inc	ADI	US	Information Technology	1.146350e+11
94	Regeneron Pharmaceuticals Inc	REGN	US	Health Care	1.018900e+11
99	Intel Corp	INTC	US	Information Technology	9.595344e+10
101	Elevance Health Inc	ELV	US	Health Care	9.529940e+10
105	KLA Corp	KLAC	US	Information Technology	9.278362e+10

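At the 20% significance level, the statistical evidence rejects the null hypothesis of no autocorrelation, and hence the EMH, for these stocks over the sample period.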
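When reading this list, it helps to keep in mind how many rejections a 20% threshold would produce by chance across 110 separate tests. The sketch below computes the expected number of rejections if every null hypothesis were true, and the probability of seeing at least 28 of them. It assumes the tests are independent, which stock returns do not satisfy exactly, so the result is a rough benchmark only. The function name `binom_tail` is hypothetical.

```python
import math

n_tests = 110      # number of stocks tested in the text
alpha = 0.20       # p-value threshold used in the text
n_rejections = 28  # rejections reported in the text


def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Expected number of rejections if no stock were truly autocorrelated
expected_by_chance = alpha * n_tests

# Chance of at least n_rejections rejections under the same assumption
tail_probability = binom_tail(n_tests, alpha, n_rejections)

print(f"Expected rejections by chance: {expected_by_chance:.0f}")
print(f"P(at least {n_rejections} rejections by chance): {tail_probability:.3f}")
```

A stricter threshold, or re-running the test on a different sample period, would show which of the selected stocks reject the null consistently.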

References

Ljung, G. M., and G. E. P. Box. 1978. “On a Measure of Lack of Fit in Time Series Models.” Biometrika 65 (2): 297–303. https://doi.org/10.1093/biomet/65.2.297.

Mishkin, Frederic S, and Stanley G Eakins. 2024. Financial Markets and Institutions. Pearson.