Is the data set up correctly?

It is unclear whether the dataset is appropriately constructed. The unit of observation is a bureaucrat-firm dyad-year, so a dynamic network. The authors state on page 3 that “for every revolving-door bureaucrat and the connected firm pair, we create 21 rows for each year between 1997 and 2017.” Does this imply that only connected bureaucrat-firm pairs are expanded over time?

This fear is compounded by the illustration of the data on page 16 and the claim on page 13 that “only USTR officials with prior/post connections with private sector firms are included.” But then on page 15 it says that “to be included in the sample, a bureaucrat must have a revolving-door history (entry to or/and exit from the USTR) and connected firms must have served at least once on an advisory committee under the USTR during the period.”

So how, exactly, was the data constructed? Three non-identical descriptions were given. One particularly concerning aspect is the requirement that “connected firms must have served at least once on an advisory committee under the USTR during the period,” because both serving on an advisory committee and the count of advisory committees sat on are dependent variables. THOU SHALT NOT SELECT (or sample) ON THE DEPENDENT VARIABLE!
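To make the concern concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the variable names (`connected`, `seats`), the Poisson data-generating process, and the parameter values are my assumptions, not anything taken from the manuscript. It also simplifies the selection rule to "keep a unit only if its outcome is at least one," which is cruder than the firm-level rule quoted above.

```r
# Hypothetical simulation; no values here come from the manuscript.
set.seed(42)
n <- 20000
connected <- rbinom(n, 1, 0.3)                  # assumed covariate of interest
seats <- rpois(n, exp(-1 + 0.7 * connected))    # assumed count outcome

full <- data.frame(connected, seats)
selected <- subset(full, seats >= 1)            # keep only units with Y >= 1

# Difference in mean seats between connected and unconnected units
b_full <- unname(coef(lm(seats ~ connected, data = full))["connected"])
b_selected <- unname(coef(lm(seats ~ connected, data = selected))["connected"])

c(full = b_full, selected = b_selected)
```

Under these assumptions the estimate from the selected sample should come out noticeably smaller than the one from the full sample, because dropping the zeros compresses the difference between the two groups.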

My gut says that the only correct way of approaching the construction of this data is to (1) get all bureaucrat names relevant for the period, (2) get all firm names relevant for the period, and then (3) expand that grid (in R, for example, with the expand.grid function). I.e. the “base” dataset should look something like:

all_bureaucrats <- c("a","b")
all_firms <- c("T","S")
all_years <- 1997:2000

expand.grid(crats = all_bureaucrats,
            firms = all_firms,
            year = all_years) -> example

example
##    crats firms year
## 1      a     T 1997
## 2      b     T 1997
## 3      a     S 1997
## 4      b     S 1997
## 5      a     T 1998
## 6      b     T 1998
## 7      a     S 1998
## 8      b     S 1998
## 9      a     T 1999
## 10     b     T 1999
## 11     a     S 1999
## 12     b     S 1999
## 13     a     T 2000
## 14     b     T 2000
## 15     a     S 2000
## 16     b     S 2000

I.e. all combinations of \(i\), \(j\), and \(t\). If there was selection on the dependent variable then the results will be biased. There are, however, some other related issues worth noting.

Missing Data Argument

On page 13 the authors note that USTR officials with and without career information are systematically different and that only 459 out of 825 USTR officials have such information posted online. They then claim that “the missing data would not pose a serious threat to our empirical findings” because “officials without online career information are more likely to be career bureaucrats” and as such are likely to be excluded from the dataset anyway, since the data are constructed to include only USTR officials with prior/post connections with private sector firms.

Since I address the problems with constructing data in that way above, I’ll focus on the specific claim regarding missing data here. The claim that they are more likely to be career bureaucrats seems difficult to substantiate for two reasons. First, you don’t observe their career trajectories, so technically isn’t that exactly the thing which is unknown? Second, two sentences before it is shown that they have on average just two more years at the USTR than those with online career information. That is hardly a career.

A better argument is to note that missing data does not induce bias when the probability of being a complete case does not depend on Y. This is the thing that has to be argued, and it may or may not be the case. See, and perhaps cite, Arel-Bundock and Pelc (2018) or Pepinsky (2018) for additional details and a recent discussion targeting political scientists. Funnily enough, there is also a proof in Paul Allison’s 2001 green book titled Missing Data which is more general than these recent Political Analysis articles.

The idea is that we want to estimate the conditional distribution of \(Y\) given a vector of predictor variables \(X\); \(f(Y|X)\). Let \(A=1\) if the observation is a complete case and 0 otherwise. Listwise deletion estimates \(f(Y|X,A=1)\). From the definition of conditional probability we know that

\[ \begin{aligned} f(Y|X,A=1) &= \frac{f(Y,X,A=1)}{f(X,A=1)} \\ &= \frac{Pr(A=1|Y,X)f(Y|X)f(X)}{Pr(A=1|X)f(X)} \end{aligned} \]

So all we have to assume is that \(Pr(A=1|Y,X) = Pr(A=1|X)\), i.e., that the probability of a complete case does not depend on \(Y\) but may depend on \(X\). So then it is immediate that \(f(Y|X,A=1) = f(Y|X)\) and so the complete case estimator is completely fine. Note that this applies to any regression procedure, not just linear regression.
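A quick way to see the condition at work is to simulate both cases. The sketch below is illustrative only: the linear data-generating process, the logistic missingness mechanisms, and all parameter values are my assumptions. `keep_x` makes the probability of being a complete case depend on \(X\) alone, while `keep_y` makes it depend on \(Y\).

```r
# Hypothetical simulation; the model and parameter values are assumptions.
set.seed(7)
n <- 20000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)            # true slope on x is 2

keep_x <- runif(n) < plogis(x)       # Pr(A = 1) depends on X only
keep_y <- runif(n) < plogis(y)       # Pr(A = 1) depends on Y

# Complete-case (listwise deletion) slope under each mechanism
b_keep_x <- unname(coef(lm(y ~ x, subset = keep_x))["x"])
b_keep_y <- unname(coef(lm(y ~ x, subset = keep_y))["x"])

c(depends_on_x = b_keep_x, depends_on_y = b_keep_y)
```

Under these assumptions the first slope should sit close to the true value of 2 even though roughly half the sample is discarded, while the second should be pulled away from it. That is the distinction the authors need to argue for in their own data.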

So again, the argument that you have to make here is something along the lines of: firm lobbying activity is unrelated to missingness. This is already not the case because of the selection on the dependent variable implied in the previous section. It may also be violated if, for example, firm lobbying activity is related to missingness on online career information, perhaps because lobbying activity is part of the revolving door which brings these people into and out of firms and agencies. On these points I have no substantive input, but it is along these lines that you will have to argue.

Specification issues

My final comment regards your use of dyad fixed effects while omitting the constituent unit terms. Effectively the sort of data you are working with are network data. In such data it is common to include terms capturing the effect of unit \(i\) (we will call this a “sender” effect), the effect of unit \(j\) (call this a “receiver” effect) and the dyadic effect (representing that there is something about the pair rather than just the members). Omitting these “sender” and “receiver” effects will introduce bias insofar as they are correlated with both the response and the covariate (this is exactly omitted variable bias). And so the claim that “our results are robust to individual-level and firm-level time-invariant confounders” is false.

The newest, fanciest, and probably not the best thing related to this would be the “Additive and Multiplicative Effects Network” model introduced by Minhas, Hoff, and Ward (2019). One complication is that you have a temporal component as well. You should check out the recent article on potential issues with the two-way fixed effects model by Kropko and Kubinec (2020), which shows that there is a flaw with exactly the interpretation you wish to give to the model you estimate.

This matter of interpretation aside, there is no mechanical reason why you cannot estimate a simplified version of the AMEN model using vanilla mixed model software like lme4. Depending on the structure of your (perhaps rectified) data this may not be necessary since the point of regularizing (whether with latent factors or random effects) is to overcome the degrees of freedom problem.
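As a rough sketch of what I have in mind, the additive part of such a model can be written with crossed random intercepts for sender, receiver, and dyad. The data below are simulated and every name (`crat`, `firm`, `dyad`, `x`, `y`) is hypothetical; the multiplicative latent-factor part of AMEN is not included, and the block assumes the lme4 package is installed.

```r
# Hypothetical simulated panel; nothing here comes from the manuscript.
set.seed(1)
d <- expand.grid(crat = paste0("b", 1:15),
                 firm = paste0("f", 1:15),
                 year = 1997:2001,
                 stringsAsFactors = FALSE)
d$dyad <- paste(d$crat, d$firm, sep = "_")

# Sender, receiver, and dyad intercepts, drawn independently of x
a <- setNames(rnorm(15), unique(d$crat))
b <- setNames(rnorm(15), unique(d$firm))
g <- setNames(rnorm(225, sd = 0.5), unique(d$dyad))

d$x <- rnorm(nrow(d))
d$y <- 0.5 * d$x + unname(a[d$crat] + b[d$firm] + g[d$dyad]) + rnorm(nrow(d))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Crossed random intercepts: sender, receiver, and dyad
  fit <- lme4::lmer(y ~ x + (1 | crat) + (1 | firm) + (1 | dyad), data = d)
  print(lme4::fixef(fit))
  print(lme4::VarCorr(fit))
}
```

Keep in mind that random intercepts are assumed to be uncorrelated with the covariates, which is a stronger assumption than the fixed effects version makes, so this is a convenience for sparse data rather than a fix for confounding.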