This is a work in progress, but I wanted to post it as a placeholder in the meantime.
This is a blog post about missing data in networks, be they social, biological, informational, or otherwise. Given my background, I will focus on social science examples, but everything in this write-up applies just as easily to networks in other fields. This is partly why I love this topic: graphs are ubiquitous structures across so many different disciplines. The study of missing data has a long history (see Rubin, 1976; Little and Rubin, 1987; or Rubin, 1987) and has often focused on questions such as "What is the cause or mechanism of the missingness?" or "How can we recover or impute the missing values to get as close to the Truth* as possible?" In this post, I will not focus on the traditional missing data mechanisms (e.g., MCAR, MAR, MNAR). Instead, I will walk through simulating network data in which a node's connections are the mechanism causing the missingness. As I note later, missing data on its own can be complex, and extending it to networks can be headache-inducing. Therefore, I aim for simplicity throughout this write-up.
*without losing our arm and brother in the process. If you get this reference, thank you.
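To make "a node's connections cause the missingness" a bit more concrete, here is a minimal Python sketch. It is not the simulation used later in this post; it is an illustration under stated assumptions. It generates an undirected Erdős–Rényi random graph with the standard library only, then flags each node as missing with a probability that rises with its degree. The function names `erdos_renyi` and `degree_dependent_missingness`, and the linear rule `P(missing) = min(1, base + slope * degree)` with its `base` and `slope` parameters, are all hypothetical choices made for illustration.

```python
import random


def erdos_renyi(n, p, rng):
    """Undirected G(n, p) random graph as an adjacency dict (node -> set of neighbors)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair considered once, no self-loops
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_dependent_missingness(adj, base=0.05, slope=0.05, rng=None):
    """Flag each node as missing with a probability that grows with its degree.

    Hypothetical rule (an assumption for illustration): P(missing) = min(1, base + slope * degree).
    Returns a dict mapping node -> True if the node is missing.
    """
    rng = rng or random.Random()
    return {
        node: rng.random() < min(1.0, base + slope * len(nbrs))
        for node, nbrs in adj.items()
    }


def mean_degree(adj, nodes):
    """Average degree of the given nodes (NaN if the list is empty)."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes) if nodes else float("nan")


# Usage: simulate one network and compare the degrees of observed vs. missing nodes.
rng = random.Random(42)
graph = erdos_renyi(n=200, p=0.03, rng=rng)
missing = degree_dependent_missingness(graph, rng=rng)

observed = [v for v, is_missing in missing.items() if not is_missing]
dropped = [v for v, is_missing in missing.items() if is_missing]

print(f"missing: {len(dropped)} of {len(graph)} nodes")
print(f"mean degree (observed): {mean_degree(graph, observed):.2f}")
print(f"mean degree (missing):  {mean_degree(graph, dropped):.2f}")
```

Because well-connected nodes are more likely to drop out under this rule, the observed nodes will tend to have a lower average degree than the full network, which is exactly the kind of bias that makes connection-driven missingness tricky. A linear rule is just the simplest choice; any increasing function of degree (or of another network measure) would illustrate the same idea.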