Rational Agent and Behavioral Finance

FZ2024 Financial Modeling and Programming

Author: Sergio Castellanos-Gamboa, PhD

Affiliation: Tecnológico de Monterrey

Published: November 11, 2025

0.1 Before you begin: important instructions for all Workshops

Welcome to our workshop series! Please read these instructions carefully before starting any activity. Following these guidelines will make your work smoother and ensure that your submissions are graded without issues.

0.1.1 Working environment

We will use Google Colab for all workshops. Colab runs Python in the cloud — you don’t need to install anything locally.

Access Colab at: https://colab.research.google.com/
Sign in with your institutional Google account for access to all features.
Always save a copy of the notebook to your Google Drive:
- Go to File → Save a copy in Drive.

0.1.2 Loading data

You may work with datasets provided by the instructor or public datasets online. You will receive instructions each time to load the data with Python code. However, it is a good idea to store files, like data or your own notes, in a dedicated Google Drive folder:

Create a folder in your Google Drive named fz2024_workshops (or similar).
Upload your datasets there.

0.1.3 Output and submission format

After completing the workshop, export your notebook as PDF:
In Colab: File → Print → Save as PDF.
Submit both the PDF file and the .ipynb notebook through Canvas.
Include all outputs, tables, and graphs in your PDF — make sure you run all cells before exporting.
Name your PDF file using the following format: Lastname_Firstname_WorkshopX.pdf

0.1.4 Deadlines

All assignments must be uploaded to Canvas before the stated deadline. Late submissions are not accepted. Once you have read and understood these instructions, you are ready to begin the workshop!

1 Overview

This workshop is the first of three workshops that together explore how to build a portfolio grounded in Rational Agent Theory, Behavioural Finance, and the Market Anomalies literature. The process unfolds in three analytical stages:

Filter 1 – Market Efficiency: Test whether each stock’s historical information helps forecast its price. This step identifies inefficient markets, where past returns contain predictive signals.
Filter 2 – Market Anomalies: Examine systematic patterns—such as momentum or trend-following behaviour—that contradict the Efficient Market Hypothesis. Here, the strategy buys assets with upward trends and sells those trending downward.
Filter 3 – Portfolio Allocation: Optimize the portfolio composition using only the assets that pass the previous filters.

Unlike the traditional Markowitz framework, which focuses solely on optimizing asset weights, this three-part approach first conducts stock selection through theoretical filters. Each filter reflects assumptions derived from the rational agent and behavioural perspectives, allowing students to connect empirical testing with economic theory before moving into optimization.

This workshop designs a pre-allocation stock filter using (i) Efficient Market Hypothesis (EMH) tests on the conditional mean of returns, and (ii) a Breusch–Pagan test on the conditional variance of returns. The resulting list of stocks passes to the next stage (trend/anomaly filter) before portfolio optimization. This is not a pure Markowitz exercise; we first select assets under theory-driven assumptions, then optimize weights.

2 Introduction to rational agent and behavioural finance

The Efficient Market Hypothesis (EMH) posits that asset prices “fully reflect” available information. In its strict form, past returns should not help predict current returns (Fama, 1970). By contrast, behavioural finance documents departures from full rationality—heuristics, over/under-reaction, limits to arbitrage—that can generate predictable patterns in returns (Burton & Shah, 2013).

From an empirical perspective, a practical first step is to ask: Can yesterday’s return help forecast today’s return? If no, this is consistent with EMH (at least for the conditional mean). If yes, the asset exhibits predictability, suggesting a possible inefficiency exploitable by systematic strategies (Aldridge, 2010).

Dimension	Rational Agent / Efficient Markets	Behavioural Finance
Core Logic	Markets are efficient and prices fully reflect all available information. Agents act rationally, updating beliefs and maximizing expected utility.	Markets are not fully efficient because investors are subject to cognitive biases, emotions, and social dynamics that distort decisions and prices.
View of Human Behavior	Individuals are homo economicus: rational, consistent, and optimizing.	Individuals are boundedly rational: influenced by heuristics, overconfidence, loss aversion, mental accounting, and social imitation.
Price Formation	Prices adjust instantaneously to new information through arbitrage and competition among rational traders.	Prices may deviate from fundamentals due to collective biases, slow information diffusion, or feedback loops (herding, momentum, bubbles).
Market Outcomes	Random walk of prices; predictability is negligible in the short run. Any abnormal pattern is quickly arbitraged away.	Persistent anomalies (momentum, overreaction, calendar effects) can exist, as markets do not always self-correct efficiently.
Predictability of Returns	Past prices or returns contain no useful information about future prices; returns are unpredictable.	Some predictability may exist because human reactions to information are not purely rational or immediate.
Role of Emotions and Psychology	Minimal—emotions are assumed irrelevant to market equilibrium.	Central—fear, greed, regret, and overconfidence drive much of observed market behavior.
Policy and Strategy Implications	Focus on diversification, passive investing, and risk–return optimization under rational expectations.	Incorporate investor behavior, sentiment, and cognitive limits into strategy design, forecasting, and regulation.
Narrative Essence	“Markets are rational, investors are disciplined, and prices tell the truth.”	“Markets are stories, investors are human, and psychology shapes the truth.”

The rational agent model provides the benchmark of how markets should work under ideal conditions, while behavioural finance explains why they often do not. Modern finance uses both: EMH as a theoretical foundation, and behavioural insights to interpret deviations from it.

In what follows, we implement two simple tests frequently taught in introductory econometrics/finance courses (Wooldridge, 2020):

EMH mean test (OLS with lagged returns): If past returns have no predictive power, their coefficients should be statistically insignificant.
EMH variance test (Breusch–Pagan): If the variance of returns is unrelated to past information/explanatory variables, the model is homoscedastic. Heteroscedasticity can be interpreted as structure in second moments, which, under certain trading schemes, may be used for prediction or risk timing (Breusch & Pagan, 1979).

We apply these tests asset-by-asset to filter a large ticker set down to a shortlist for the following topics in this course (anomaly/trend filter).

2.1 Refresher: logs and log returns

Log returns are widely used because they (i) are time-additive, (ii) approximate arithmetic returns for small changes, and (iii) connect naturally to continuous compounding.

Price at date t: P_t > 0.
Log price: \ln P_t.
Log return:
r_t^{\log} \equiv \ln P_t - \ln P_{t-1} = \ln\!\left(\frac{P_t}{P_{t-1}}\right).
Relationship to the arithmetic return r_t^{\text{arith}} = \frac{P_t}{P_{t-1}} - 1:
r_t^{\log} \approx r_t^{\text{arith}} for small returns; exactly,
1 + r_t^{\text{arith}} = e^{r_t^{\log}}.

If P_{t-1}=100 and P_t=101, then r_t^{\log} = \ln(101/100) \approx 0.00995 \approx 0.995\%, while r_t^{\text{arith}} = 0.01 = 1\%.

In the code below we will compute log returns by differencing the log price series.
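Before turning to the workshop data, the relationships above can be checked numerically. The following is a minimal sketch that reproduces the 100 to 101 example and illustrates time-additivity; the three-price series is made up for illustration and is not part of the workshop dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day price series (illustration only, not workshop data)
prices = pd.Series([100.0, 101.0, 99.5])

log_ret = np.log(prices).diff().dropna()     # r_t^log = ln P_t - ln P_{t-1}
arith_ret = prices.pct_change().dropna()     # r_t^arith = P_t / P_{t-1} - 1

print(log_ret.round(5).tolist())
print(arith_ret.round(5).tolist())

# Exact link between the two definitions: 1 + r^arith = exp(r^log)
assert np.allclose(1 + arith_ret, np.exp(log_ret))

# Time-additivity: daily log returns sum to the log return over the whole period
total_log_ret = np.log(prices.iloc[-1] / prices.iloc[0])
assert np.isclose(log_ret.sum(), total_log_ret)
```

Arithmetic returns do not add up across days in this way, which is one reason log returns are convenient for time-series work.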

3 Setup and data

We use daily close prices for a broad set of large-cap tickers (firms whose total market value of outstanding shares typically exceeds $10 billion USD). As usual, let’s first load all the libraries we will need for the workshop. It is always a good idea to keep them grouped so they are easier to track.

# Load libraries for the workshop
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms

# Additionals
from statsmodels.stats.diagnostic import het_breuschpagan # Breusch-Pagan test for homoscedasticity
from statsmodels.tools.tools import add_constant # Add constant for regression loop

Now, we will load the dataset from an online repository. Column names are in the form TICKER.Close; the index is the trading date. The sample spans roughly 2020-01 to 2022-05.

data_url = "https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/1Rational_agent.csv"
data = pd.read_csv(data_url, index_col=0)  # index_col=0: the first column (date) becomes the index

# Show data
data

	AAPL.Close	MSFT.Close	GOOG.Close	GOOGL.Close	AMZN.Close	TSLA.Close	BRK.A.Close	BRK.B.Close	FB.Close	TSM.Close	...	TMUS.Close	PM.Close	AMD.Close	LIN.Close	TXN.Close	CRM.Close	BMY.Close	UPS.Close	RLLCF.Close	QCOM.Close
date
01/02/2020	75.087502	160.619995	1367.369995	1368.680054	1898.010010	86.052002	342261	228.389999	209.779999	60.040001	...	78.589996	85.190002	49.099998	210.740005	129.570007	166.990005	63.340000	116.790001	0.0046	88.690002
01/03/2020	74.357498	158.619995	1360.660034	1361.520020	1874.969971	88.601997	339155	226.179993	208.669998	58.060001	...	78.169998	85.029999	48.599998	205.259995	127.849998	166.169998	62.779999	116.720001	0.0100	87.019997
01/06/2020	74.949997	159.029999	1394.209961	1397.810059	1902.880005	90.307999	340210	226.990005	212.600006	57.389999	...	78.620003	86.019997	48.389999	204.389999	126.959999	173.449997	62.980000	116.199997	0.0217	86.510002
01/07/2020	74.597504	157.580002	1393.339966	1395.109985	1906.859985	93.811996	338901	225.919998	213.059998	58.320000	...	78.919998	86.400002	48.250000	204.830002	129.410004	176.000000	63.930000	116.000000	0.0126	88.970001
01/08/2020	75.797501	160.089996	1404.319946	1405.040039	1891.969971	98.428001	339188	225.990005	215.220001	58.750000	...	79.419998	88.040001	47.830002	207.389999	129.759995	177.330002	63.860001	116.660004	0.0099	88.709999
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
05/20/2022	137.589996	252.559998	2186.260010	2178.159912	2151.820068	663.900024	456500	304.049988	193.539993	90.779999	...	126.040001	101.150002	93.500000	315.179993	169.809998	159.649994	76.190002	171.039993	0.0072	131.600006
05/23/2022	143.110001	260.649994	2233.330078	2229.760010	2151.139893	674.900024	464510	310.200012	196.229996	91.500000	...	129.889999	102.910004	95.070000	320.420013	169.929993	160.320007	76.699997	174.389999	0.0085	132.119995
05/24/2022	140.360001	259.619995	2118.520020	2119.399902	2082.000000	628.159973	463606	309.170013	181.279999	88.720001	...	129.220001	106.620003	91.160004	320.489990	167.860001	156.929993	77.129997	174.110001	0.0075	128.529999
05/25/2022	140.520004	262.519989	2116.790039	2116.100098	2135.500000	658.799988	462890	308.640015	183.830002	90.410004	...	131.440002	108.570000	92.650002	315.850006	170.009995	159.649994	77.239998	173.860001	0.0080	131.229996
05/26/2022	143.779999	265.899994	2165.919922	2155.850098	2221.550049	707.729980	468805	312.500000	191.630005	91.000000	...	132.740005	108.070000	98.750000	320.329987	174.130005	162.460007	77.589996	178.380005	0.0085	134.839996

606 rows × 100 columns
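The index printed above is read as plain text (for example 05/20/2022, which is month/day/year). If you later need date-aware operations such as slicing by year or resampling, one option is to convert it. The sketch below does this on a tiny stand-in table built from the first three rows shown above; `sample` is a hypothetical name, and stripping the `.Close` suffix is an optional convenience, not something the workshop requires.

```python
import pandas as pd

# Tiny stand-in for the workshop table (values copied from the output above)
sample = pd.DataFrame(
    {
        "AAPL.Close": [75.087502, 74.357498, 74.949997],
        "MSFT.Close": [160.619995, 158.619995, 159.029999],
    },
    index=["01/02/2020", "01/03/2020", "01/06/2020"],
)
sample.index.name = "date"

# Convert the text index to real dates (month/day/year)
sample.index = pd.to_datetime(sample.index, format="%m/%d/%Y")

# Optional: strip the ".Close" suffix so columns are plain ticker symbols
sample.columns = sample.columns.str.replace(".Close", "", regex=False)

print(sample.index.min(), sample.index.max())
print(sample.columns.tolist())
```

The same two lines could be applied to `data`, but note that the rest of this workshop refers to columns by their original `TICKER.Close` names.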

4 EMH mean test on historical returns (one asset)

We test whether lagged log returns have predictive power for today’s log return using a simple OLS:

r_t = \alpha + \beta_1 r_{t-1} + \beta_2 r_{t-2} + \varepsilon_t,

where r_t denotes log return. Under EMH (mean efficiency), \beta_1 = \beta_2 = 0.

# Choose a single asset for illustration
stock = "AAPL.Close"

# Compute log returns
stock_price = data[stock].dropna()
r = np.log(stock_price).diff() # First differences of natural logarithm of price.

# Build the dataset for the regression with *lags as past information*

df = pd.DataFrame({
    "r": r,
    "lag_1": r.shift(1),  # First lag
    "lag_2": r.shift(2)   # Second lag
})

df.head()

	r	lag_1	lag_2
date
01/02/2020	NaN	NaN	NaN
01/03/2020	-0.009770	NaN	NaN
01/06/2020	0.007937	-0.009770	NaN
01/07/2020	-0.004714	0.007937	-0.009770
01/08/2020	0.015958	-0.004714	0.007937

Now that we have the dataset with the returns for the first stock and the first two lags of that return, we can estimate an OLS regression:

df = df.dropna() # Remember to drop all missing values

# OLS regression for EMH test
X = sm.add_constant(df[["lag_1", "lag_2"]])
y = df["r"]

ols = sm.OLS(y, X).fit()
print(ols.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      r   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                  0.031
Method:                 Least Squares   F-statistic:                     10.65
Date:                Tue, 11 Nov 2025   Prob (F-statistic):           2.85e-05
Time:                        15:32:01   Log-Likelihood:                 1418.9
No. Observations:                 603   AIC:                            -2832.
Df Residuals:                     600   BIC:                            -2819.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0012      0.001      1.315      0.189      -0.001       0.003
lag_1         -0.1779      0.041     -4.356      0.000      -0.258      -0.098
lag_2          0.0286      0.041      0.700      0.484      -0.052       0.109
==============================================================================
Omnibus:                       63.982   Durbin-Watson:                   1.997
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              385.101
Skew:                          -0.189   Prob(JB):                     2.38e-84
Kurtosis:                       6.897   Cond. No.                         47.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

If the p-values on lag_1 and lag_2 are large (e.g., > 0.10), we fail to reject the EMH in mean for this asset. If one or both are statistically significant, we find predictability inconsistent with EMH’s strict form.

For the stock selected in this example, the Efficient Market Hypothesis (EMH) mean test yields mixed evidence. The first lag (r_{t-1}) is statistically significant at the 1% level, while the second lag (r_{t-2}) is not significant at any conventional level.

In practical terms, yesterday’s return appears to help predict today’s return, but the return from two days ago does not. Under the decision rule used in this workshop, we treat the EMH (in its weak form) as clearly rejected only when both lag coefficients are significant, implying that historical returns consistently contain predictive information. Note, however, that the F-statistic reported in the summary above (10.65, with p-value 2.85e-05) tests \beta_1 = \beta_2 = 0 jointly and rejects it, so this asset already shows some evidence of mean predictability.

Conversely, if both lags were insignificant, we would fail to reject the EMH, concluding that the stock’s past returns provide no useful information for forecasting future returns—consistent with an efficient market.

Since one coefficient is significant and the other is not, we treat the result as inconclusive under the workshop's rule. When faced with such mixed outcomes, it is advisable to complement the test with additional diagnostics, such as variance-based tests (e.g., the Breusch–Pagan test), to evaluate whether patterns exist in the volatility of returns even when mean predictability is unclear.

5 EMH variance test (one asset)

We use the Breusch–Pagan (BP) test (Breusch & Pagan, 1979) to assess whether the error variance depends on the regressors (here, lagged returns). Under homoscedasticity (EMH-consistent variance), the BP test’s null holds.

5.0.1 Breusch–Pagan (BP) Test

The Breusch–Pagan test (Breusch & Pagan, 1979) examines whether the variance of the residuals from a regression model depends on the explanatory variables.

It starts from the standard regression model:

y_t = X_t \beta + \varepsilon_t

where the residuals \varepsilon_t are assumed to have constant variance under the null hypothesis of homoscedasticity.

The BP test models the squared residuals as a function of the regressors:

\hat{\varepsilon}_t^2 = \delta_0 + \delta_1 X_{1t} + \delta_2 X_{2t} + \dots + \delta_k X_{kt} + u_t

The test statistic is computed as:

LM = n R^2

where n is the sample size and R^2 is the coefficient of determination from the auxiliary regression above.

An equivalent version uses the F-statistic from the same auxiliary regression of the squared residuals on the explanatory variables. The test statistic is:

F = \frac{(R^2 / k)}{[(1 - R^2) / (n - k - 1)]}

where R^2 is from the auxiliary regression, k is the number of explanatory variables, and n is the sample size. In practice, we are testing the following null hypothesis:

\delta_1 = \delta_2 = \dots = \delta_k = 0

Null hypothesis (H_0): the variance of the residuals is constant (homoscedasticity).
Alternative hypothesis (H_1): the variance of the residuals depends on the regressors (heteroscedasticity).

# BP test using OLS residuals and the OLS design matrix
bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(ols.resid, ols.model.exog)

bp_results = pd.DataFrame({
    "statistic": ["LM", "F"],
    "value":     [bp_lm, bp_f],
    "p_value":   [bp_lm_p, bp_f_p]
})
bp_results

	statistic	value	p_value
0	LM	14.521472	0.000703
1	F	7.402890	0.000667

The Breusch–Pagan test generates two statistics with their p-values: the Lagrange multiplier (LM) statistic and the F-statistic. Usually, both lead to the same conclusion. In this workshop, we focus on the F-test. In the context of the Efficient Market Hypothesis (EMH), the decision rule is:

If the F-test p-value is large (e.g., p \ge 0.1), we fail to reject H_0, suggesting that the stock’s return variance is constant and consistent with market efficiency.
If the F-test p-value is small (e.g., p < 0.1), we reject H_0, implying that the variance changes with past information — evidence of heteroscedasticity and potential market inefficiency.

In our case, the F-test p-value is significant at the 1% level, so we reject the null hypothesis of homoscedasticity. This means the model is heteroscedastic, and the variance of returns depends on the explanatory variables—in this case, the lagged returns.

Interpreted through the lens of the EMH, this result suggests that past returns help explain the variance of today’s return, indicating that the market for this stock is not fully efficient. In other words, historical information contains patterns that could be used to forecast changes in volatility or risk, which in turn may affect expected returns. From a portfolio construction perspective, this makes the stock eligible for inclusion in the next stage of analysis, since it exhibits predictable structure inconsistent with EMH.

In the following section, we extend this same test to all available stocks, identifying those that display similar inefficiencies. These filtered assets will then form the candidate set for the next step—examining market anomalies and refining our portfolio composition.

6 EMH variance test across many assets (filter)

We now apply the same pipeline to all tickers, collecting BP F-statistics and p-values. The filter keeps assets with p < 0.10.

tickers = list(data.columns) # Tickers names
records = [] # Create the object to store the results

for col in tickers:               # Start the loop  
    series = data[col].dropna()   # Drop missing prices before computing returns
    
    r = np.log(series).diff()     # Log returns
    
    # Build the regression dataframe (return plus two lags) and drop rows with NaN
    df = pd.DataFrame({          
        "r": r,
        "lag_1": r.shift(1),
        "lag_2": r.shift(2)
    }).dropna()

    # Vector of independent variables with constant for regression
    X = add_constant(df[["lag_1", "lag_2"]]) 
    # Dependent variable
    y = df["r"]
    
    # Regression to run Breusch-Pagan Test
    fit = sm.OLS(y, X).fit()
    lm, lmp, fstat, fp = sms.het_breuschpagan(fit.resid, fit.model.exog)
    records.append((col, fstat, fp))
    
# Create data frame to filter
bp_df = pd.DataFrame(records, columns=["ticker", "F_stat", "p_value"]).set_index("ticker")
# Show the top of the table
bp_df.head(10)

	F_stat	p_value
ticker
AAPL.Close	7.402890	6.667956e-04
MSFT.Close	5.709427	3.497252e-03
GOOG.Close	5.608142	3.862781e-03
GOOGL.Close	5.482668	4.369222e-03
AMZN.Close	0.486758	6.148581e-01
TSLA.Close	3.699726	2.529626e-02
BRK.A.Close	13.298510	2.232942e-06
BRK.B.Close	17.135412	5.793682e-08
FB.Close	0.152859	8.582842e-01
TSM.Close	6.901224	1.088404e-03

Filter: keep assets with p < 0.10.

alpha = 0.10
selected = bp_df[bp_df["p_value"] < alpha]
# Organize by p-value
selected = selected.sort_values("p_value")
selected.head() # Observations with lowest p-value

	F_stat	p_value
ticker
WFC.PQ.Close	85.605410	1.965355e-33
BML.PH.Close	43.868318	1.654730e-18
JNJ.Close	37.951483	3.020436e-16
PG.Close	36.181957	1.459002e-15
BAC.PE.Close	24.481575	6.016778e-11

selected.tail() # Observations with highest p-value

	F_stat	p_value
ticker
LIN.Close	3.014815	0.049799
CICHY.Close	2.897345	0.055942
AZNCF.Close	2.807902	0.061124
RYDAF.Close	2.754262	0.064461
BMY.Close	2.502647	0.082722

Check how many stocks remain after the filter.

selected.shape  # Check the size of the new dataset

(79, 2)

The tables above show the shortlist for the next topic (trend/anomaly filter). Note that this is an introductory filter; it does not correct for multiple testing and should be interpreted as a pedagogical first pass.
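To see what a multiple-testing correction would look like, the sketch below applies the Benjamini–Hochberg procedure from statsmodels to five p-values copied from the tables above. This is only an illustration of the mechanics: adjusted p-values depend on how many tests are in the family, so a real application would pass all p-values in `bp_df`, and the choice of Benjamini–Hochberg is an assumption, not the workshop's method.

```python
import pandas as pd
from statsmodels.stats.multitest import multipletests

# Five p-values copied from the tables above (a real run would use all of bp_df)
p_values = pd.Series({
    "AAPL.Close": 6.667956e-04,
    "AMZN.Close": 6.148581e-01,
    "FB.Close": 8.582842e-01,
    "TSLA.Close": 2.529626e-02,
    "BMY.Close": 0.082722,
})

alpha = 0.10
reject, p_adj, _, _ = multipletests(p_values.values, alpha=alpha, method="fdr_bh")

adjusted = pd.DataFrame({
    "p_value": p_values,
    "p_adjusted": p_adj,
    "raw_filter": p_values < alpha,   # the workshop rule: p < 0.10
    "bh_filter": reject,              # Benjamini-Hochberg at the same level
})
print(adjusted)
```

In this five-ticker subset, BMY passes the raw p < 0.10 filter but not the adjusted one, which shows how borderline cases can drop out once the number of tests is taken into account.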

7 References

Aldridge, I. (2010). High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems. Wiley.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294.
Burton, E., & Shah, S. (2013). Behavioral Finance: Understanding the Social, Cognitive, and Economic Debates. Wiley.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383–417.
Wooldridge, J. M. (2020). Introductory Econometrics: A Modern Approach (7th ed.). Cengage.