About Mean & Normal Distribution

The mean’s importance can be described in two ways. On one hand, it is an essential tool for summarizing and interpreting data, and it underpins statistical analysis across many disciplines. On the other hand, the debate around the mean shows how easily statistical concepts are misapplied or misinterpreted when they are not understood carefully. Raper underscores the mean’s role not just as a statistical tool but as a focal point in the broader conversation about effective data analysis and communication.

There are situations where the mean might not be the most useful measure for decision-making:

Skewed Distributions: In a highly skewed distribution, whether to the right (positively skewed) or to the left (negatively skewed), the mean is pulled in the direction of the tail. In such cases the median, which is the middle value when the data are ordered, often gives a better sense of the data’s central tendency. In a right-skewed distribution, for example, a few very large values in the long right tail raise the mean above the median, so the mean overstates what a typical observation looks like.
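
The effect of a long right tail can be sketched in a few lines of Python. This is only an illustration under assumed data: the log-normal distribution, its parameters, the sample size, and the seed are all hypothetical choices, not values taken from the discussion above.

```python
import random
import statistics

# Hypothetical data: 10,000 draws from a log-normal distribution, a common
# stand-in for right-skewed quantities. Parameters are illustrative only.
rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

mean_value = statistics.mean(samples)
median_value = statistics.median(samples)

# Fraction of observations that lie below the mean.
below_mean = sum(x < mean_value for x in samples) / len(samples)

print(f"mean:             {mean_value:.2f}")
print(f"median:           {median_value:.2f}")
print(f"share below mean: {below_mean:.0%}")
```

For this particular distribution the theoretical median is 1 and the theoretical mean is about 1.65, so the sample mean should sit well above the sample median, with roughly two thirds of the observations falling below the mean.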

Presence of Outliers: Outliers are extreme values that differ markedly from the other observations. The mean is sensitive to them, and even a few can shift it substantially, whereas the median is robust and changes little, if at all. Consider a set of exam scores that follows a normal distribution: adding a few extremely high scores raises the mean noticeably while the median barely moves, so the mean is affected by outliers even when the underlying data are normally distributed.
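
The exam-score scenario can be tried out with a short simulation. The numbers here are assumptions made for illustration: 50 scores drawn from a normal distribution with mean 70 and standard deviation 8, plus three hypothetical recorded values of 250 standing in for the extreme scores.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical exam scores, roughly normal around 70 with a spread of 8.
scores = [rng.gauss(70, 8) for _ in range(50)]

# Three hypothetical extreme values added to the same data set.
outliers = [250, 250, 250]
scores_with_outliers = scores + outliers

# How far each summary moves once the outliers are included.
mean_shift = statistics.mean(scores_with_outliers) - statistics.mean(scores)
median_shift = statistics.median(scores_with_outliers) - statistics.median(scores)

print(f"mean moved by:   {mean_shift:.2f}")
print(f"median moved by: {median_shift:.2f}")
```

Three values out of 53 are enough to move the mean by several points, while the median only steps up to a neighboring score in the middle of the ordered data.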

Categorical or Ordinal Data: For categorical data (e.g., colors, types), the values are labels rather than numbers, so a mean cannot be computed. For ordinal data (e.g., ratings), the values can be ranked, but the gaps between adjacent ranks are not necessarily equal, so a mean is hard to interpret. Instead, we use modes or frequency distributions to analyze such data.
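
As a minimal sketch of the alternatives, the snippet below builds a frequency distribution and finds the mode for a made-up list of colors, then does the same for a made-up list of ratings. The data and the rating scale (`poor` to `excellent`) are hypothetical. The final step, taking the median rating by rank, goes slightly beyond the text above; it is a common option for ordinal data because it relies only on the ordering.

```python
from collections import Counter
import statistics

# Categorical data (hypothetical): labels with no numeric meaning.
colors = ["red", "blue", "blue", "green", "blue", "red"]
color_counts = Counter(colors)            # frequency distribution
color_mode = statistics.mode(colors)      # most frequent label

print("color frequencies:", dict(color_counts))
print("most common color:", color_mode)

# Ordinal data (hypothetical): labels with an order but no fixed spacing.
scale = ["poor", "fair", "good", "excellent"]  # assumed ordering
ratings = ["poor", "good", "fair", "good", "excellent", "good", "fair"]
rating_counts = Counter(ratings)

for level in scale:
    print(f"{level:<10}{rating_counts[level]}")

# Median by rank: uses only the ordering, never the distance between levels.
ranks = sorted(scale.index(r) for r in ratings)
median_rating = scale[statistics.median_low(ranks)]
print("median rating:", median_rating)
```

`statistics.median_low` is used so that the result is always one of the observed ranks, which keeps the answer a valid label even when the number of ratings is even.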

Bimodal or Multimodal Distributions: In a distribution with more than one peak (mode), the mean may not fall near any of the peaks, which makes it a poor representation of the data’s most common values. In a bimodal distribution, the data cluster around two different values, so the mean can land in the gap between the modes, where few observations lie, and it conveys nothing about the distribution’s two-peaked shape.
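
A small simulation makes the bimodal case concrete. The two clusters below (centered at 20 and 60, each with standard deviation 2 and 500 points) are hypothetical values chosen so the peaks are clearly separated.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical bimodal data: two well-separated clusters of equal size.
low_cluster = [rng.gauss(20, 2) for _ in range(500)]
high_cluster = [rng.gauss(60, 2) for _ in range(500)]
data = low_cluster + high_cluster

mean_value = statistics.mean(data)

# Count the observations that lie within 5 units of the mean.
near_mean = sum(abs(x - mean_value) <= 5 for x in data)

print(f"mean: {mean_value:.1f}")
print(f"observations within 5 units of the mean: {near_mean} of {len(data)}")
```

The mean lands close to 40, midway between the two clusters, in a region where essentially none of the data lies. Reporting it alone would suggest a typical value that almost no observation resembles.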