Do you have missing data in your dataset? … Don’t ignore it.

Missing data is a common issue in statistical analysis, and it can cause bias or inaccurate results. In these posts, we will explore various types of missing data mechanisms to understand how to identify them and address the issue.

Let’s start with the concept of missing completely at random (MCAR).

In a Couple of Words

Data are missing completely at random (MCAR) when the probability that a value is missing is unrelated to any other value in the dataset, observed or unobserved.

Example

Suppose you’re conducting a survey about people’s income and education level. You collect data from a random sample of individuals, but some respondents leave certain fields blank. In an MCAR scenario, whether a field is left blank is not related to the respondent’s income, education level, or any other factor. For instance, consider a situation where you have collected the following data:

Example of MCAR Data.

Respondent  Income  Education_Level
1           50000   Bachelor’s
2           75000   Master’s
3           NA      High School
4           60000   Doctorate
5           42000   NA
6           NA      Master’s

The missing values (the incomes of Respondents 3 and 6, and the education level of Respondent 5) show no apparent pattern: they are not tied to any particular income or education level. This is what we would expect when data are missing completely at random (MCAR). Keep in mind that six rows cannot prove the mechanism; the table only illustrates what MCAR data can look like.
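
To make the idea of a missingness pattern concrete, here is a minimal sketch that encodes the example table in plain Python and derives a 0/1 missingness indicator for each field. The helper names (`missing_indicator`, `missing_rate`) and the use of `None` for NA are assumptions made for this illustration, not part of any particular library.

```python
# Survey table from the example; None stands in for NA.
rows = [
    {"respondent": 1, "income": 50000, "education": "Bachelor's"},
    {"respondent": 2, "income": 75000, "education": "Master's"},
    {"respondent": 3, "income": None,  "education": "High School"},
    {"respondent": 4, "income": 60000, "education": "Doctorate"},
    {"respondent": 5, "income": 42000, "education": None},
    {"respondent": 6, "income": None,  "education": "Master's"},
]


def missing_indicator(rows, field):
    """Return a 0/1 list: 1 where `field` is missing, 0 where it is observed."""
    return [1 if row[field] is None else 0 for row in rows]


def missing_rate(rows, field):
    """Share of rows in which `field` is missing."""
    flags = missing_indicator(rows, field)
    return sum(flags) / len(flags)


r_income = missing_indicator(rows, "income")
r_education = missing_indicator(rows, "education")

print("R_income:   ", r_income)      # [0, 0, 1, 0, 0, 1]
print("R_education:", r_education)   # [0, 0, 0, 0, 1, 0]
print("income missing rate:   ", round(missing_rate(rows, "income"), 2))     # 0.33
print("education missing rate:", round(missing_rate(rows, "education"), 2))  # 0.17
```

With a realistically sized sample, these indicators are the starting point for checking whether missingness is associated with the observed variables, which would speak against MCAR.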

Graphical representations of (a) missing completely at random (MCAR), (b) missing at random (MAR), and (c) missing not at random (MNAR) in a univariate missing-data pattern. X represents variables that are completely observed, Y represents a variable that is partly missing, Z represents the component of the causes of missingness unrelated to X and Y, and R represents the missingness.
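
The three mechanisms in the figure can also be sketched as a small simulation. This is a toy example under stated assumptions: X and Y are synthetic Gaussian variables, the deletion probabilities (0.3, 0.6, 0.1) are arbitrary, and the random draw inside `observed_mean` plays the role of Z, the part of the missingness that is unrelated to X and Y. All names are hypothetical.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

N = 20_000
# X: fully observed covariate; Y: variable that will be partly missing.
x = [random.gauss(0, 1) for _ in range(N)]
y = [xi + random.gauss(0, 1) for xi in x]


def observed_mean(y, p_missing):
    """Mean of Y over the units that remain observed.

    p_missing[i] is the probability that y[i] is deleted (R = 1).
    The uniform draw acts as Z: chance unrelated to X and Y.
    """
    kept = [yi for yi, p in zip(y, p_missing) if random.random() >= p]
    return mean(kept)


# (a) MCAR: missingness ignores both X and Y.
p_mcar = [0.3] * N
# (b) MAR: missingness depends only on the fully observed X.
p_mar = [0.6 if xi > 0 else 0.1 for xi in x]
# (c) MNAR: missingness depends on the partly unobserved Y itself.
p_mnar = [0.6 if yi > 0 else 0.1 for yi in y]

true_mean = mean(y)
results = {
    "MCAR": observed_mean(y, p_mcar),
    "MAR": observed_mean(y, p_mar),
    "MNAR": observed_mean(y, p_mnar),
}

print(f"true mean of Y: {true_mean:.3f}")
for name, value in results.items():
    print(f"{name}: observed mean {value:.3f}, shift {value - true_mean:+.3f}")
```

In this setup the observed mean under MCAR stays close to the true mean, while under MAR and MNAR it is pulled downward because large values are deleted more often. The difference between the last two is that the MAR shift can be corrected using the observed X, whereas the MNAR shift cannot be corrected from the observed data alone.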