Introduction

This is my attempt to replicate the results of Table IV in Berry, Levinsohn, and Pakes (1995) using Python. The following sections provide a brief overview of the BLP model and the corresponding Python code used to estimate it. The main code I rely on for this replication exercise comes from http://www.ivan-li.com/code/blp_1995. I also try to follow the procedure of Chris Conlon and Jeff Gortmaker's pyblp package, described in their working paper: https://jeffgortmaker.com/files/pyblp.pdf.

Import the necessary Python packages

Below are the packages you need to import in Python before running the code.
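A minimal set, inferred from the packages actually used in the code below:

import numpy as np
import pandas as pd
import scipy.linalg
import pyblp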

1. BLP model description

The BLP model provides a framework that enables one to obtain estimates of demand and cost parameters for a class of oligopolistic, differentiated-products markets. These estimates can be obtained using only widely available product-level and aggregate consumer-level data, and they are consistent with a structural model of equilibrium in an oligopolistic industry. The framework is based on:

  • A joint distribution of consumer characteristics and product attributes that determines preferences over the products marketed;
  • Price taking assumptions on the part of consumers;
  • Nash equilibrium assumptions on the part of producers.

The main contributions of the BLP model are:

  • Using aggregate data to incorporate consumer heterogeneity, such as consumers with different incomes having different tastes;
  • Proposing the BLP instruments to address the endogeneity of price with respect to unobserved product attributes.

1.1 Utility and demand

Below is the common notation used in the BLP model.

  • There are \(t = 1, 2, \cdots, T\) markets
  • Each market has consumers \(i = 1, 2, \cdots, I_t\)
  • Each market has \(j = 1, 2, \cdots, J\) products
  • Each product has \(k = 1, 2, \cdots, K\) attributes

Given this notation, the utility derived by consumer \(i\) from consuming product \(j\) is: \[U(\zeta_i, p_j, x_{j}, \xi_{j}; \theta)\] where \(\theta\) is a \(k\)-vector of parameters to be estimated, \(\zeta\) represents a vector of consumer characteristics, \(p\) represents the price of the product, and \(x\) and \(\xi\) are, respectively, observed and unobserved (by the econometrician) product attributes. Consumer \(i\) chooses good \(j\) if and only if \[U(\zeta_i, p_j, x_{j}, \xi_{j}; \theta) \geq U(\zeta_i, p_r, x_{r}, \xi_{r}; \theta), \quad \text{for}\quad r = 0, 1, \cdots, J, \] where alternatives \(r = 0, 1, \cdots, J\) represent purchases of the competing differentiated products. Alternative zero, or the outside alternative, represents the option of not purchasing any of those products. Let \[A_{j} = \{\zeta : U(\zeta, p_j, x_{j}, \xi_{j}; \theta) \geq U(\zeta, p_r, x_{r}, \xi_{r}; \theta), \quad \text{for}\quad r = 0, 1, \cdots, J\}. \] That is, \(A_{j}\) is the set of values of \(\zeta\) that induces the choice of good \(j\). Then, the market share of good \(j\) as a function of the characteristics of all the goods competing in the market is given by

\[s_{j}(p, x, \xi ; \theta)=\int_{\zeta \in A_{j}} P_{0}(d \zeta),\] where \(P_{0}(d \zeta)\) provides the density of \(\zeta\) in the population. BLP adopt a Cobb-Douglas form for consumer \(i\)'s utility from consuming product \(j\) in market \(t\), in which the function \(G(\cdot)\) is linear and allows for interactions between individual and product characteristics:

\[U\left(\zeta_{i}, p_{j}, x_{j}, \xi_{j} ; \theta\right)=\left(y_{i}-p_{j}\right)^{\alpha} G\left(x_{j}, \xi_{j}, \nu_{i}\right) e^{\epsilon(i, j)},\] where \(y\) is income, and \(\epsilon\) provides the effect of the interactions of unobserved product and individual characteristics.

Random Coefficients

The BLP paper assumes that \(G(\cdot)\) is linear and has the random-coefficients specification discussed above. Let \(u_{ij} = \log[U_{ij}]\); then

\[u_{ij}= \alpha \log(y_{i} - p_{j}) + x_{j} \bar{\beta}+\xi_{j}+\sum_{k} \sigma_{k} x_{j k} \nu_{i k}+\epsilon_{i j}\] where \((\zeta_{i}, \epsilon_{i} ) = (\nu_{i1}, \cdots, \nu_{iK}, \epsilon_{i0}, \cdots, \epsilon_{iJ})\) is a mean-zero vector of random variables with a known distribution function. A first-order Taylor expansion of \(\log(y_{i} - p_{j})\) gives \(\log(y_{i}) - p_{j}/y_{i}\), so that price enters approximately linearly and the utility obtained from consuming good \(j\) can be expressed as

\[u_{ij}= \alpha \log(y_{i}) + \alpha_{new} p_{j} + x_{j} \bar{\beta}+\xi_{j}+\sum_{k} \sigma_{k} x_{j k} \nu_{i k}+\epsilon_{i j}\]

Define

\[\delta_{j} = \alpha_{new} p_{j} + x_{j} \bar{\beta} +\xi_{j} \] and a deviation from that mean

\[\mu_{ij} = \alpha \log(y_{i}) + \sum_{k} \sigma_{k} x_{j k} \nu_{i k},\] which depends on the interaction between consumer preferences and product characteristics. Given the assumption of Type I extreme value errors, we can analytically integrate out \(\epsilon_{ij}\) and arrive at the probability that consumer \(i\) chooses product \(j\):

\[\hat{s}_{i j}\left(\nu_{i}\right)=\frac{\exp \left\{\delta_{j}+\mu_{i j}\left(\nu_{i}\right)\right\}}{\sum_{l} \exp \left\{\delta_{l}+\mu_{i l}\left(\nu_{i}\right)\right\}}\] BLP consider two models. In the first, \(\mu_{ij} = 0\), which reduces to the standard logit model. In the second, \(\mu_{ij} \neq 0\), which leads to the random-coefficients model. Here we focus on the second case.

The main estimation procedure:

  • Treat \(\delta_j\) as a single component
  • Once \(\delta_j\) is estimated, we can use OLS regression to estimate the parameters in \(\delta_{j} =\alpha_{new} p_{j} + x_{j} \bar{\beta} +\xi_{j}\)
  • Iterate the above two steps

Price Endogeneity and Instrumental Variables

Based on the expression for the market share in the logit case (\(\mu_{ij} = 0\)), we have the regression model:

\[\ln(s_{j}) - \ln(s_{0}) = -\alpha p_{j} + x_{j} \bar{\beta} + \xi_{j}.\] We can then estimate \(\alpha\) and \(\bar{\beta}\), and recover \(\xi\), using ordinary OLS. Here, the structural error that we are trying to minimize is \(\xi_{jt}\), i.e. the unobserved product quality. This will also be the error term that we minimize in the random-coefficients setup. For the OLS estimation, we have an endogeneity problem, since cars with high unobserved quality will tend to have higher prices as well: \[\text{Cov} (p_{j}, \xi_{j}) > 0.\] This endogeneity biases the OLS estimates. A simple remedy is to instrument for price in the logit model. BLP propose using three sets of instruments:

  • The observed characteristics of product \(j\) itself (which are assumed orthogonal to the unobserved characteristics);
  • The sums of the characteristics of all other models marketed by the same firm in a given market;
  • The sums of the characteristics of all other models in a given market, across competing firms.

A mistake was later found by Shapiro and Gentzkow in how BLP calculated these instruments: BLP multiplied each product characteristic by the number of models the firm sells in each market, rather than summing the characteristics across models. I follow this mistaken calculation so as to match BLP's original results.

2. Computation steps and corresponding Python codes

First we need to import the data in Python and define some variables for later use.

# Read the dataframe
df = pd.read_csv("C:/Users/zgong/OneDrive/Desktop/IO II/blp_replication/blp_1995_data.csv")
df = df.drop(df.columns[0], axis = 1)

# create logged versions of the continuous variables
df[["ln_hpwt", "ln_space", "ln_mpg", "ln_mpd", "ln_price"]] = \
    df[["hpwt", "space", "mpg", "mpd", "price"]].apply(lambda x: np.log(x))

# time trend (market index plus 70); use with non-pyblp instruments
df["trend"] = df.market.map(lambda x: x + 70)
# df["trend"] = df.market

df["cons"] = 1

df["s_0"] = np.log(1 - df.share.groupby(df["model_year"]).transform("sum"))

df["s_i"] = np.log(df.share)
df["dif"] = df.s_i - df.s_0
df["dif_2"] = np.log(df.share) - np.log(df.share_out)
df["ln_price"] = np.log(df.price)

df.head()

# load pyblp's product data, in case we want to use their instruments

product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION)

# demand variables
X = df[["cons", "hpwt", "air", "mpd", "space"]].values

# supply variables
W = df[["cons", "ln_hpwt", "air", "ln_mpg", "ln_space", "trend"]].values

# price
p = df.price.values

# initial delta_0 estimate: log(share) - log(share outside good)
delta_0 = df.dif_2.values

# number of goods per market
J = df.groupby("year")["cons"].sum().values

# number of draws per market
N = 500

# number of markets
T = len(J)

# Estimated log income means for years 1971 - 1990
incomeMeans = [2.01156, 2.06526, 2.07843, 2.05775, 2.02915, 2.05346, 2.06745,
               2.09805, 2.10404, 2.07208, 2.06019, 2.06561, 2.07672, 2.10437, 2.12608, 2.16426,
               2.18071, 2.18856, 2.21250, 2.18377]

# standard deviation of log income, taken as given from BLP (1995)
sigma_v = 1.72

# number of characteristics with a random coefficient:
#  according to Table IV these are the constant, hp/wt, air, mp$, and size
#  (the price coefficient enters separately as the sixth element of theta_2)
k = 5

# market index for each observation (length J x T)
markets = df.market.values

# unique markets
marks = np.unique(df.market)

# firms
firms = np.reshape(df.firmid.values, (-1,1))

Step 1:

  • For a given \(\theta_2\) (and initial draws of \(\nu_i\) and \(D_{i}\)), we define a function to compute household deviations from mean utility, \(\mu_{ijt}(x_{jt}, p_{jt}, \nu_{i}, D_{i}; \theta_{2})\).
  • Define a function to compute indirect utility given parameters
    • \(x\): matrix of demand characteristics
    • \(v\): monte carlo draws of N simulations
    • \(p\): price vector
    • \(y\): income of individuals
    • delta: guess for the mean utility
    • theta_2: non-linear params
    • \(J\): vector of number of goods per market
    • \(T\): number of markets
    • \(N\): number of simulations
  • For given mean utility \(\delta_{jt}\) and \(\theta_2\), we define a function to compute predicted market shares; a sketch of such a function follows the formula below. The market share can be approximated by simulation with

\[s_{j t}\left(\delta_{t}, \theta_{2}\right)=\frac{1}{NS} \sum_{i=1}^{NS} \frac{\exp \left\{\delta_{j t}+\mu_{i j t}\right\}}{\sum_{k=0}^{J} \exp \left\{\delta_{k t}+\mu_{i k t}\right\}},\] where \(\delta_{0t} = \mu_{i0t} = 0\) for the outside good.
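Below is a minimal sketch of such a share simulator, consistent with the formula above. The interface matches the compute_share(X, V, p, y, delta, theta_2, J, T, N) calls that appear later, but the internal details, in particular the ordering of theta_2 (the five sigmas followed by the price/income term alpha) and the shapes of the draws V and incomes y, are my assumptions rather than the original code.

def compute_share(X, V, p, y, delta, theta_2, J, T, N):
    # V[t]: (5, N) Monte Carlo draws nu_i for market t
    # y[t]: (N,) simulated incomes for market t
    # p is unused here: in the Taylor-expanded specification price enters through delta
    sigma = np.asarray(theta_2[:5])
    alpha = theta_2[5]
    total = int(np.sum(J))
    q = np.zeros((total, N))   # individual choice probabilities
    s = np.zeros(total)        # simulated market shares
    start = 0
    for t in range(T):
        rows = slice(start, start + int(J[t]))
        # mu_ijt = alpha*log(y_i) + sum_k sigma_k * x_jk * nu_ik
        mu = alpha * np.log(y[t])[None, :] + X[rows] @ (sigma[:, None] * V[t])
        num = np.exp(delta[rows].reshape(-1, 1) + mu)
        # outside good: delta_0t = mu_i0t = 0, so its exp term is 1
        q[rows] = num / (1 + num.sum(axis=0, keepdims=True))
        s[rows] = q[rows].mean(axis=1)
        start += int(J[t])
    return [q, s]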

Step 2:

  • Contraction Mapping: given \(\theta_2\), search for \(\delta_t\) such that the market shares computed in Step 1 equal the observed market shares, \(s_{jt} = s_{jt}(\delta_t, \theta_2)\). This is a non-linear system of equations that is solved numerically using the contraction mapping proposed by BLP (1995):

\[\delta_{t}^{h+1}=\delta_{t}^{h}+\ln s_{j t}-\ln s_{j t}\left(\delta_{t}^{h}, \theta_{2}\right)\]

  • Iteration continues until \(||\delta_{t}^{h+1}- \delta_{t}^{h}||\) is below a specified tolerance level; a sketch of this fixed-point iteration is given below.
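A minimal sketch of the contraction, matching the solve_delta(s, X, V, p, y, delta, theta_2, J, T, N, tol) call used inside the objective function below (the sup-norm stopping rule is my choice):

def solve_delta(s, X, V, p, y, delta, theta_2, J, T, N, tol):
    # iterate delta <- delta + log(observed shares) - log(simulated shares)
    error = 1.0
    while error > tol:
        s_hat = compute_share(X, V, p, y, delta, theta_2, J, T, N)[1]
        delta_new = delta + np.log(s) - np.log(s_hat)
        error = np.max(np.abs(delta_new - delta))
        delta = delta_new
    return delta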

Step 3:

  • Calculate the marginal costs given probabilities and shares (a sketch of this calc_mc routine appears just before the objective function below)
    • q_s: output of compute_share, a list containing the probabilities matrix (q) and the shares vector (s)
    • firms: vector of the firms operating in each market (length \(J\times T\))
    • marks: vector of unique markets (length \(T\))
    • markets: vector indicating which market each observation belongs to (length \(J\times T\))
  • From \(\delta_t\), estimate the linear parameters \(\theta_1\) using the fact that \(\delta_{jt}(s_{jt}, \theta_{2}) - (x_{jt}\bar{\beta} + \alpha_{new} p_{jt}) = \xi_{jt}\). The IV moment conditions are \(E[Z'\xi] = 0\).

\[\hat{\theta}_{1}=\left(X_{1}^{\prime} Z W^{-1} Z^{\prime} X_{1}\right)^{-1} X_{1}^{\prime} Z W^{-1} Z^{\prime} \delta\left(\theta_{2}\right)\] Note: here \(X_1\) denotes the product characteristics that enter the linear part of the estimation, \(Z\) the instruments for the endogenous variables, and \(W\) a consistent estimate of \(E[Z'\xi \xi' Z]\) (not to be confused with the supply-side characteristics matrix W in the code). The instruments for the BLP paper can be generated as follows.
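Below is a sketch of one way to build the demand- and supply-side instrument matrices Z and Z_s used in the code that follows. build_instruments is my own helper, and it reproduces the counts-based calculation noted in Section 1 (each characteristic multiplied by the number of models, rather than summed across models); whether the own product is excluded from the counts is also my assumption.

def build_instruments(data, cols):
    # own product characteristics
    own = data[cols].values
    # number of models per firm-market and per market
    firm_count = data.groupby(["market", "firmid"])["cons"].transform("sum").values
    mkt_count = data.groupby("market")["cons"].transform("sum").values
    # counts-based "sums": characteristic times the number of other
    #  own-firm models, and times the number of rival-firm models
    same_firm = own * (firm_count - 1)[:, None]
    rivals = own * (mkt_count - firm_count)[:, None]
    return np.hstack((own, same_firm, rivals))

# demand- and supply-side instruments
Z = build_instruments(df, ["cons", "hpwt", "air", "mpd", "space"])
Z_s = build_instruments(df, ["cons", "ln_hpwt", "air", "ln_mpg", "ln_space", "trend"])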

  • Compute GMM objective function

\[Q\left(\theta_{2}\right)=\hat{\xi}\left(\theta_{2}\right)^{\prime} Z W^{-1} Z^{\prime} \hat{\xi}\left(\theta_{2}\right)\] where \(\hat{\xi}\left(\theta_{2}\right)\) are the GMM residuals and \(W^{-1}\) is the GMM weighting matrix.

# Construct the GMM estimator
#  (baseData holds the linear demand-side regressors X1: characteristics plus price)
zxw1 = Z.T @ baseData

# first-stage estimate with an identity weighting matrix
bx1 = np.linalg.inv(zxw1.T @ zxw1) @ zxw1.T @ Z.T @ delta_0

# estimated error 
e = delta_0 - baseData @ bx1

g_ind = e.reshape((-1,1)) * Z

demean = g_ind - g_ind.mean(axis=0).reshape((1,-1))

vg = demean.T @ demean / demean.shape[0]

# weighting matrix
w0 = np.linalg.inv(vg)

# second-stage estimate using the optimal weighting matrix w0
t3c2 = np.linalg.inv(zxw1.T @ w0 @ zxw1) @ zxw1.T @ w0 @ Z.T @ delta_0

# obtain block-diag matrix of supply and demand instruments
z = scipy.linalg.block_diag(Z,Z_s)

# Recommended initial weighting matrix from Aviv's appendix
w1 = np.linalg.inv(z.T @ z)
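The objective function below relies on compute_share and solve_delta sketched above, plus a calc_mc routine that recovers marginal costs from the Nash pricing first-order conditions, \(mc = p - \Delta^{-1} s\). The following is a heavily simplified sketch of calc_mc with the inputs listed in Step 3; the individual price sensitivity \(\alpha/y_i\) follows the Taylor expansion in Section 1.1, and the exact derivative formulas in the original code may differ.

def calc_mc(q_s, firms, p, y, alpha, J, T, N, marks, markets):
    q, s = q_s
    mc = np.zeros(len(p))
    for idx, m in enumerate(marks):
        sub = markets == m
        q_t = q[sub]                     # (J_t, N) choice probabilities
        a_i = alpha / y[idx]             # (N,) individual price sensitivities
        # share/price gradient: d s_r / d p_j = E[a_i q_ij q_ir] - 1{j=r} E[a_i q_ij]
        grad = (a_i * q_t) @ q_t.T / N
        np.fill_diagonal(grad, np.diag(grad) - (a_i * q_t).mean(axis=1))
        # ownership mask: only own-firm cross derivatives enter Delta
        same_firm = firms[sub] == firms[sub].T
        Delta = -grad * same_firm
        # Nash pricing FOC: s - Delta (p - mc) = 0  =>  mc = p - Delta^{-1} s
        mc[sub] = p[sub] - np.linalg.solve(Delta, s[sub])
    return mc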

def objective(theta_2, s, X, V, p, y, J, T, N, marks, markets, tol, 
              Z, Z_s, W, weigh, firms):
    
    obs = np.sum(J)
    
    # d is a container object holding the current guess of delta, so each
    #  call warm-starts the contraction at the previously converged value
    d.delta = solve_delta(s, X, V, p, y, d.delta, theta_2, J, T, N, tol)
    
    # obtain the actual implied quantities and shares from converged delta
    q_s = compute_share(X, V, p, y, d.delta, theta_2, J, T, N)
    
    # calculate marginal costs
    mc = calc_mc(q_s, firms, p, y, theta_2[5], J, T, N, marks, markets).reshape((-1,1))
    
    # since we are using both demand and supply side variables,
    #  we want to stack the estimated delta and estimated mc
    y2 = np.vstack((d.delta.reshape((-1,1)), np.log(mc)))
    
    # create a characteristics matrix that includes both supply and demand sides,
    #  with demand characteristics in the top-left block and supply in the bottom-right
    x = scipy.linalg.block_diag(X,W)
    
    # create the matrix of supply and demand instruments, again with
    #  demand instruments in the top-left block and supply in the bottom-right
    z = scipy.linalg.block_diag(Z,Z_s)
    
    # get linear parameters (this FOC is from Aviv's appendix)
    b = np.linalg.inv(x.T @ z @ weigh @ z.T @ x) @ (x.T @ z @ weigh @ z.T @ y2)
    
    # Step 3: get the stacked error terms xi (demand) and omega (supply)
    xi_w = y2 - x @ b
    
    # compute g_bar for the GMM objective
    g = z.T @ xi_w / obs
    
    obj = float(obs**2 * g.T @ weigh @ g)
   
    print([theta_2, obj])
    
    return obj

Step 4:

  • Minimize \(Q(\theta_2)\) over \(\theta_2\), with Steps 1-3 nested inside every trial of \(\theta_2\). The initial computation uses the initial weighting matrix w1 defined above; a sketch of the call is given below.
  • Computation time: around 5 minutes.
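A sketch of the outer search, assuming the Monte Carlo draws V and simulated incomes y have been constructed as described in Step 1; the warm-start container d, the all-ones starting values, the tolerance, and the choice of Nelder-Mead are all illustrative, not necessarily what the original code uses.

from types import SimpleNamespace
from scipy.optimize import minimize

s = df.share.values                        # observed market shares
d = SimpleNamespace(delta=delta_0.copy())  # warm-start container used inside objective

theta_2_init = np.ones(6)                  # illustrative starting values

res = minimize(
    objective, theta_2_init,
    args=(s, X, V, p, y, J, T, N, marks, markets, 1e-12, Z, Z_s, W, w1, firms),
    method="Nelder-Mead",
)
# the second stage repeats this call with the optimal weighting matrix,
#  recomputed from the first-stage residuals, in place of w1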

  • We next search for the optimal parameters with the optimal weighting matrix, computed from the first-stage residuals.
  • Computation time: around 9 minutes.