Creating Measures of Environmental Uncertainty

Dynamism

The dynamism measure has two dimensions: the rate of change (frequency) and the quantum of change (magnitude). Following Dess and Beard (1984), we regress the stock market index on a linear time trend, estimated over a rolling window of one quarter, to measure stock market dynamism during the Great Recession. The choice of window is justified on both behavioral and economic grounds. First, fund launches typically involve strategic planning, regulatory approval, and marketing preparation, all of which unfold over months rather than days; in fact, the average time between two fund launches is 86 days. Second, a quarterly window (about 90 calendar days, or roughly 60 trading days) smooths out short-term market noise while capturing the sustained uncertainty to which fund firms are more likely to respond. This horizon also aligns with the decision-making and reporting cycles of financial institutions, making it an appropriate horizon for assessing how prolonged market stress influenced fund initiation during a systemic crisis such as the Great Recession.

To measure the rate and quantum of change, we first obtain the residuals from regressing the daily stock market index on a linear time trend over the \(w=90\) days up to the fund launch date \(t\): \[y_s = \alpha + \beta s + \varepsilon_s, \quad \text{for } s = t-w, \dots, t-1,\] where \(y_s\) is the stock market index on day \(s\), \(\alpha\) and \(\beta\) are the intercept and trend coefficients, and \(\varepsilon_s\) is the residual.
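A minimal sketch of the detrending step on one window, assuming a simulated index (the series, `n_days`, and the seed are hypothetical, not the study's data):

```r
# Simulated daily index for one window (hypothetical data, not the study's series)
set.seed(1)
n_days <- 60                           # roughly one quarter of trading days
trend  <- seq_len(n_days)              # linear time trend s = 1, ..., n
index  <- 3000 + 5 * trend + cumsum(rnorm(n_days, sd = 20))

# Regress the index on the trend: y_s = alpha + beta * s + e_s
fit <- lm(index ~ trend)
eps <- residuals(fit)                  # detrended series used by the dynamism measures

coef(fit)                              # estimated alpha (intercept) and beta (trend)
```

Because the regression includes an intercept and the trend, the residuals average zero and are uncorrelated with time, so they keep only the fluctuations around the trend.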

The quantum of change measures the magnitude of volatility in the firm's environment: a higher magnitude of volatility implies a higher level of uncertainty. It reflects the fact that larger increases or decreases in the environment entail more uncertainty than smaller ones. To measure the quantum of change, we use the standard deviation of the residuals, with larger variation indicating a greater magnitude of change and therefore more uncertainty. Because the residuals of a regression with an intercept have mean zero, this is \[ \text{Magnitude}_t = \sqrt{\frac{1}{w-1} \sum_{s=t-w}^{t-1} \varepsilon_s^2}. \]

The rate of change captures the pace of change in an industry. It differs from the quantum of change in that it reflects how often the path of the industry is altered. To determine how often the financial market changes direction, we compute the autocorrelation coefficient of the residuals. A positive autocorrelation coefficient corresponds to a long, smooth wave, and a negative one to a sawtooth series with frequent changes of direction; the former reflects a low rate of change and the latter a high one. To ensure that higher scores reflect higher rates of change, we multiply the autocorrelation coefficient by \(-1\).
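To illustrate how the two dimensions behave, the sketch below uses two hypothetical residual paths (a smooth wave and a day-to-day sawtooth); the series and the helper `lag1()` are illustrative assumptions. Scaling a path changes its quantum but not its rate of change.

```r
# Two hypothetical residual series of equal length
s        <- 1:60
smooth   <- sin(2 * pi * s / 60)      # one long, smooth wave
sawtooth <- rep(c(1, -1), 30)         # changes direction every day

# Lag-1 autocorrelation as the Pearson correlation of consecutive values
lag1 <- function(x) cor(x[-length(x)], x[-1])

# Rate of change: autocorrelation multiplied by -1
rate_smooth   <- -lag1(smooth)        # negative score: low rate of change
rate_sawtooth <- -lag1(sawtooth)      # score of 1: high rate of change

# Quantum of change: standard deviation of the residuals
quantum_smooth <- sd(smooth)
quantum_large  <- sd(3 * smooth)      # same path, swings three times as large
rate_large     <- -lag1(3 * smooth)   # rate is unchanged by the scaling
```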

The rate of change can also be measured by the number of sign changes between consecutive residuals. This measure captures how often the market's deviation from trend changes direction, but not how large the deviations are: \[ \text{SignFlips}_t = \sum_{s=t-w+1}^{t-1} \mathbb{I}\left( \text{sign}(\varepsilon_s) \neq \text{sign}(\varepsilon_{s-1}) \right). \] Alternatively, the rate of change can be measured by the first-order autocorrelation of the residuals, which captures the persistence of changes: \[ \rho_1 = \frac{\sum_{s=t-w+1}^{t-1} (\varepsilon_s - \bar{\varepsilon})(\varepsilon_{s-1} - \bar{\varepsilon})}{\sum_{s=t-w}^{t-1} (\varepsilon_s - \bar{\varepsilon})^2}, \] where \(\bar{\varepsilon}\) is the mean residual in the window; the rate-of-change score is \(-\rho_1\).
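The two formulas can be checked on a short hypothetical residual vector. `rho1()` follows the \(\rho_1\) formula term by term and matches base R's `acf()`; the code in this section instead uses the Pearson correlation of consecutive residuals, which is close to but not identical with \(\rho_1\) because it centers and scales the two lagged subseries separately. The vector `e` and the helper names are illustrative assumptions.

```r
# Number of sign changes between consecutive residuals (s = t-w+1, ..., t-1)
sign_flips <- function(e) sum(sign(e[-1]) != sign(e[-length(e)]))

# Lag-1 autocorrelation exactly as in the rho_1 formula: lagged cross-products
# around the full-window mean, divided by the full-window sum of squares
rho1 <- function(e) {
  d <- e - mean(e)
  sum(d[-1] * d[-length(d)]) / sum(d^2)
}

e <- c(1.2, 0.8, -0.5, -1.1, 0.3, 0.9, -0.2, -0.6, 0.1, 0.5)   # hypothetical residuals
sign_flips(e)
rho1(e)
acf(e, lag.max = 1, plot = FALSE)$acf[2]   # same estimator as rho1()
cor(e[-length(e)], e[-1])                  # Pearson variant, as in auto_cor()
```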

# storage for the three dynamism measures (one value per fund launch)
n_launch <- nrow(uncertainty)
quantum <- numeric(n_launch)
rate_direction <- numeric(n_launch)
rate_auto <- numeric(n_launch)

# number of sign changes between consecutive residuals
# (a move to or from an exact 0 also counts as a change)
count_sign_flips <- function(x) {
  s <- sign(x)
  sum(s[-1] != s[-length(s)])
}

# lag-1 autocorrelation, computed as the Pearson correlation of consecutive values
auto_cor <- function(x) cor(x[-length(x)], x[-1])

for (i in seq_len(n_launch)) {
  # trading days in the window (t - 89, t] ending on the launch date
  t_i <- uncertainty$issue_date[i]
  in_window <- stock_index$time <= t_i & stock_index$time > t_i - 89
  index <- stock_index$index[in_window]
  # detrend: regress the index on a linear time trend
  trend <- seq_along(index)
  residual_i <- residuals(lm(index ~ trend))
  quantum[i] <- sd(residual_i)
  rate_direction[i] <- count_sign_flips(residual_i)
  rate_auto[i] <- auto_cor(residual_i)
}

uncertainty$quantum <- quantum
uncertainty$rate_direction <- rate_direction
# multiply by -1 so that higher values mean a higher rate of change
uncertainty$rate_auto <- -rate_auto
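The loop above can be exercised end to end on simulated data. This is a sketch under stated assumptions: `dates`, `index`, `launch`, and `window_measures()` are hypothetical, weekends stand in for non-trading days, and the window uses the same (t - 89, t] rule. `sapply()` is used because iterating over a Date vector this way keeps the Date class.

```r
# Hypothetical inputs: a weekday-only daily index and three launch dates
set.seed(42)
dates <- seq(as.Date("2008-01-01"), as.Date("2008-12-31"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]   # drop Saturdays and Sundays
index <- 3000 + cumsum(rnorm(length(dates), sd = 30))
launch <- as.Date(c("2008-04-15", "2008-07-01", "2008-10-20"))

# Dynamism measures for one launch date, using the window (launch_date - w, launch_date]
window_measures <- function(launch_date, w = 89) {
  in_window <- dates <= launch_date & dates > launch_date - w
  y <- index[in_window]
  e <- residuals(lm(y ~ seq_along(y)))             # detrended residuals
  c(quantum        = sd(e),
    rate_direction = sum(sign(e[-1]) != sign(e[-length(e)])),
    rate_auto      = -cor(e[-length(e)], e[-1]))   # sign flipped as in the text
}

measures <- t(sapply(launch, window_measures))     # one row per launch date
measures
```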

Complexity

To capture the complexity of the investment environment, we measure the cross-sectional dispersion of industry performance using the CSI 300 industry indexes. For each fund launch date \(t\), we compute the standard deviation of the daily opening levels of the industry indexes over the calendar week (Monday to Sunday) containing \(t\), pooling all industry-day observations. This reflects how heterogeneous or fragmented market conditions are across sectors: \[ \text{Complexity}_t = \text{StdDev}\left\{ y_{d}^{(k)} : k = 1, \dots, K;\ d \in W(t) \right\} \] where \(y_{d}^{(k)}\) is the opening level of industry index \(k\) on trading day \(d\), \(W(t)\) is the set of trading days in the week containing \(t\), and \(K\) is the number of industries.
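A toy illustration of the complexity formula, assuming three hypothetical industry indexes observed on the five trading days of one week (codes and levels are made up). The pooled standard deviation mirrors the `complexity` loop in the code that follows; the average of per-day cross-sectional standard deviations is shown only as an alternative reading for comparison, not as the measure used.

```r
# Hypothetical opening levels of K = 3 industry indexes over one trading week
week <- data.frame(
  date = rep(as.Date("2008-09-15") + 0:4, times = 3),
  code = rep(c("A", "B", "C"), each = 5),
  open = c(2100, 2110, 2090, 2080, 2120,
           3500, 3480, 3520, 3510, 3490,
           1500, 1510, 1495, 1505, 1490)
)

# Pooled dispersion over all industry-day observations in the week
complexity_pooled <- sd(week$open)

# Alternative: cross-sectional sd on each day, averaged over the week
daily_sd <- tapply(week$open, week$date, sd)
complexity_daily_mean <- mean(daily_sd)
```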

The industry indexes are well suited to this purpose because they represent the leading sectors of China's equity market, covering a wide range of industries among the 300 large-cap A-shares in the CSI 300. This makes them particularly relevant for the open-end fund market in China, where funds often benchmark against or allocate across these industries. A high standard deviation implies greater sectoral divergence, and hence a more complex and uncertain environment for fund managers to navigate, whereas a low value indicates more uniform market conditions. For launches before 6 July 2007, we instead use the CSI Mainland industry indexes, multiplied by a constant so that their dispersion is comparable with that of the CSI 300 indexes.

library(tidyverse)
library(lubridate)  # wday()

# first available date of each industry index, with its display name
industry_dates <- industry_index %>% arrange(date) %>% 
  group_by(index_code) %>% summarise(date = first(date)) %>% 
  left_join(industry_list, by = "index_code") %>% arrange(display_name)

# after 2007-07-05 we use CSI 300
industry_specific_csi300 <- c(
  "000952", # 300地产 - Real Estate
  "000908", # 300能源 - Energy
  "000910", # 300工业 - Industrials
  "000911", # 300可选 - Consumer Discretionary
  "000912", # 300消费 - Consumer Staples
  "000913", # 300医药 - Healthcare
  "000914", # 300金融 - Financials
  "000915", # 300信息 - IT
  "000916", # 300电信 - Telecom
  "000917"  # 300公用 - Utilities
)

# before that date we use the CSI Mainland industry indexes
mainland_index_codes <- c(
  "000949",  # Mainland Agriculture Index (内地农业)
  "000948",  # Mainland Real Estate Index (内地地产)
  "000942",  # Mainland Consumption Index (内地消费)
  "000944",  # Mainland Resources Index (内地资源)
  "000945"   # Mainland Transportation Index (内地运输)
)

# get data of these index codes
industry_index_1 <- industry_index %>% 
  filter(index_code %in% c(industry_specific_csi300, 
                           mainland_index_codes)) %>% 
  mutate(day_of_week = weekdays(date))

# Monday of the week containing each date, as an Excel serial date number
get_monday <- function(x) {
  day_of_week <- wday(x, week_start = 1)  # Monday = 1, ..., Sunday = 7
  monday <- x - (day_of_week - 1)
  # R counts days from 1970-01-01; Excel counts from 1899-12-30, 25569 days earlier
  excel_monday <- as.numeric(monday) + 25569
  return(excel_monday)
}
industry_index_1 <- industry_index_1 %>% 
  mutate(mondays = get_monday(date)) %>% 
  mutate(sundays = mondays+6)

# ratio of the sd of CSI 300 to CSI Mainland levels from 2007-07-06 onward
sd_1 <- industry_index_1 %>% 
  filter(index_code %in% mainland_index_codes & date >= "2007-07-06") %>%
  summarise(sd_1 = sd(open)) %>% unlist()
sd_2 <-  industry_index_1 %>% 
  filter(index_code %in% industry_specific_csi300 & date >= "2007-07-06") %>%
  summarise(sd_2 = sd(open)) %>% unlist()
sd_ratio <- sd_2/sd_1
rm(sd_1, sd_2)

# combine: rescaled CSI Mainland levels before 2007-07-06, CSI 300 levels from then on
industry_index_2 <- industry_index_1 %>% filter(
  (index_code %in% mainland_index_codes & date < "2007-07-06") | 
  (index_code %in% industry_specific_csi300 & date >= "2007-07-06")) %>% 
  mutate(open = ifelse(date < "2007-07-06", 1.004515*open, open))

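# create the complexity variable: for each launch date, the sd of all
# industry-day opening levels in the week (Monday to Sunday) containing it;
# for() drops any Date class, so issue_date is compared as a number with
# the Excel-serial week bounds
complexity <- c()
unique_industry <- c()  # number of industries observed that week (for checking)
for (i in uncertainty$issue_date) {
  temp_data <- industry_index_2 %>% 
    filter(i >= mondays & i <= sundays) 
  complexity <- c(complexity, sd(temp_data$open))
  unique_industry <- c(unique_industry, length(unique(temp_data$index_code)))
}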

# pack into data 
uncertainty$complexity <- complexity
rm(complexity, unique_industry, temp_data)
data <- left_join(data, uncertainty, by = "issue_date")
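Two details of the code above can be checked in isolation: how the launch week and its Excel serial bounds are formed, and why multiplying the pre-2007 levels by a constant rescales their dispersion. This base-R sketch uses a hypothetical launch date and made-up levels; `week_monday()` and `to_excel()` are illustrative helpers, not the code's own functions, and 1.004515 is the constant used above.

```r
# Monday of the week containing a date; format(x, "%u") gives 1 = Monday, ..., 7 = Sunday
week_monday <- function(x) x - (as.integer(format(x, "%u")) - 1)

# Excel serial number: R counts days from 1970-01-01, Excel from 1899-12-30
to_excel <- function(x) as.numeric(x) + 25569

launch <- as.Date("2008-09-17")          # a Wednesday (hypothetical launch date)
mon    <- week_monday(launch)            # Monday 2008-09-15
sun    <- mon + 6                        # Sunday 2008-09-21
bounds <- to_excel(c(mon, sun))          # week bounds on the Excel scale

# for() iterates over the underlying numbers, so the Date class is lost
loop_class <- character(0)
for (d in launch) loop_class <- c(loop_class, class(d))

# Multiplying levels by a constant scales their standard deviation by that constant
x         <- c(2100, 2110, 2090, 2080, 2120)
ratio     <- 1.004515
scaled_sd <- sd(ratio * x)               # equals ratio * sd(x)
```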

Running the Partner Selection Regressions

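We first run a lasso logistic regression to select which bank and firm fixed effects to include. The control variables enter with a penalty factor of zero, so they are never shrunk and the lasso only decides which fixed-effect dummies to keep.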
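The design matrix and penalty vector for this step can be sketched in base R on a small hypothetical data set (`df`, `x1`, `x2`, and the id levels are assumptions). With `- 1` in the formula, `model.matrix()` keeps every bank dummy and drops the reference firm level; the penalty factor is 0 for the controls and 1 for the dummies. The `glmnet` call itself is left as a comment.

```r
# Hypothetical data: two controls plus bank and firm identifiers
set.seed(7)
df <- data.frame(
  x1     = rnorm(12),
  x2     = rnorm(12),
  bankid = factor(rep(c("1", "2", "3"), each = 4)),
  firmid = factor(rep(c("10", "11", "12", "13"), times = 3))
)

X_vars <- as.matrix(df[, c("x1", "x2")])                 # unpenalized controls
X_fe   <- model.matrix(~ bankid + firmid - 1, data = df) # fixed-effect dummies

# 0 = never shrunk (controls), 1 = standard lasso penalty (fixed effects)
penalty_factor <- c(rep(0, ncol(X_vars)), rep(1, ncol(X_fe)))

# With glmnet installed and a 0/1 response y, the selection step would be e.g.
# glmnet::cv.glmnet(cbind(X_vars, X_fe), y, family = "binomial",
#                   alpha = 1, penalty.factor = penalty_factor)
```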

# load data 
load("partner_selection.RData")
# the variables to use
idx <- c(3,7:12,14:19)

# run the basic model 
model_0 <- glm(matching ~ ., data = data[,c(idx,26,28,29,4:5)], 
               family = "binomial")
summary(model_0)

# run the lasso regression with glmnet
library(glmnet)
X_vars <- as.matrix(data[,c(idx[-1],26,28,29)])
X_firm <- model.matrix(~bankid+firmid-1, data = data)
y <- as.factor(data$matching)
penalty_factor <- c(rep(0, ncol(X_vars)), rep(1, ncol(X_firm)))

# run the cross-validation
set.seed(123456789)
cvfit <- cv.glmnet(cbind(X_vars, X_firm), y, family = "binomial", alpha = 1, penalty.factor = penalty_factor)

# Best lambda
best_lambda <- cvfit$lambda.min

# refit the model 
model_lasso <- glmnet(cbind(X_vars, X_firm), y, family = "binomial", alpha = 1, lambda = best_lambda, penalty.factor = penalty_factor)

# coefficients at lambda.min; zero_firm_ids lists the bank and firm dummies shrunk to exactly zero
coefs <- coef(model_lasso)
firm_names <- colnames(X_firm)
zero_firm_ids <- firm_names[which(coefs[firm_names, 1] == 0)]

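# merge bank and firm levels into common groups based on the lasso results;
# as.character() keeps the labels (ifelse() on a factor would return its integer codes)
data <- data %>% mutate(
  new_bankid = as.factor(ifelse(bankid %in% c(31, 36), 43, as.character(bankid))),
  new_firmid = as.factor(ifelse(firmid %in% c(13, 32, 38, 42, 52, 60), 64, as.character(firmid)))
)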

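After the lasso step, we estimate four logit models: a main-effects model and a model that interacts lag_match with the three uncertainty measures, each with and without the refined bank and firm fixed effects.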
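A minimal sketch of the interaction specification on simulated data (the data-generating process, `sim`, and its coefficients are hypothetical). It shows that `matching ~ . + lag_match:complexity` adds the product term to the main effects, and that coefficients can be selected by name, which does not depend on their position in the model.

```r
# Hypothetical data with a negative lag_match x complexity interaction
set.seed(2024)
n <- 2000
sim <- data.frame(
  lag_match  = rbinom(n, 1, 0.3),
  complexity = rnorm(n)
)
eta <- -1 + 2 * sim$lag_match + 0.1 * sim$complexity -
  0.6 * sim$lag_match * sim$complexity
sim$matching <- rbinom(n, 1, plogis(eta))

# Main effects from "." plus the interaction term
fit  <- glm(matching ~ . + lag_match:complexity, data = sim, family = "binomial")
keep <- c("lag_match", "complexity", "lag_match:complexity")
coef(fit)[keep]
```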

# continue with `data` from the lasso step, which contains new_bankid and new_firmid
# the variables to use
idx <- c(3, 7:12, 14:19)
base_cols <- names(data)[c(idx, 26, 28, 29)]
fe_cols <- c("new_bankid", "new_firmid")  # refined fixed effects

# run the basic model without fixed effects
model_0 <- glm(matching ~ ., data = data[, base_cols], 
               family = "binomial")
# the model with interactions
model_1 <- glm(matching ~ . + lag_match:quantum + lag_match:rate_auto + 
                 lag_match:complexity, 
               data = data[, base_cols], 
               family = "binomial")

# the model with the refined fixed effects
model_f0 <- glm(matching ~ ., data = data[, c(base_cols, fe_cols)], 
                family = "binomial")
# the model with interactions and the refined fixed effects
model_f1 <- glm(matching ~ . + lag_match:quantum + lag_match:rate_auto + 
                  lag_match:complexity, 
                data = data[, c(base_cols, fe_cols)], 
                family = "binomial")

# print the models
models <- list("Main No FE" = model_0,
               "Main + FE" = model_f0,
               "Interaction No FE" = model_1,
               "Interaction + FE" = model_f1)
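# coefficients to display, selected by name rather than by position
coef_shown <- c("lag_match", "quantum", "rate_auto", "complexity",
                "lag_match:quantum", "lag_match:rate_auto", "lag_match:complexity")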

library(modelsummary)


modelsummary(models, coef_map = coef_shown, stars = TRUE)

                      Main No FE   Main + FE   Interaction No FE   Interaction + FE
lag_match             3.326***     2.842***    4.418***            5.005***
                      (0.050)      (0.052)     (0.673)             (0.707)
quantum               0.572***     0.380***    0.575***            0.378**
                      (0.061)      (0.112)     (0.064)             (0.117)
rate_auto             2.404***     1.678***    2.191***            1.117*
                      (0.383)      (0.490)     (0.447)             (0.562)
complexity            -0.151*      -0.044      -0.048              0.101
                      (0.073)      (0.122)     (0.080)             (0.128)
lag_match:quantum                              -0.145              -0.130
                                               (0.129)             (0.135)
lag_match:rate_auto                            0.276               1.373
                                               (0.828)             (0.863)
lag_match:complexity                           -0.518***           -0.616***
                                               (0.156)             (0.162)
Num.Obs.              24402        24402       24402               24402
AIC                   13969.1      12835.7     13962.3             12820.3
BIC                   14098.8      13751.3     14116.2             13760.2
Log.Lik.              -6968.572    -6304.842   -6962.139           -6294.172
RMSE                  0.29         0.28        0.29                0.28
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
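As a worked reading of the interaction in the "Interaction + FE" column, assuming lag_match is a 0/1 indicator of a prior partnership: the log-odds slope of complexity is 0.101 for pairs without a prior match and 0.101 - 0.616 = -0.515 for pairs with one. Exponentiating gives the multiplicative change in the odds of matching per one-unit increase in complexity, holding the other covariates fixed. The coefficients are copied from the table; the variable names are illustrative.

```r
# Coefficients from the "Interaction + FE" column
b_complexity  <- 0.101
b_interaction <- -0.616

slope_new   <- b_complexity                  # lag_match = 0
slope_prior <- b_complexity + b_interaction  # lag_match = 1: -0.515

odds_ratio_new   <- exp(slope_new)    # about 1.106
odds_ratio_prior <- exp(slope_prior)  # about 0.60
```

Under this reading, higher sectoral complexity is associated with roughly 40% lower odds of matching among prior partners, while the main effect of complexity for new pairs is small and not statistically significant (0.101, standard error 0.128).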