2023-02-08

Descriptive Statistics

  • The central tendency is the extent to which all the data values group around a typical or central value.

  • The variation is the amount of dispersion or scattering of the values around that central value.

  • The shape is the pattern of the distribution of values from the lowest value to the highest value.

Measures of Central Tendency: The Mean

  • The arithmetic mean (often just called the “mean”) is the most common measure of central tendency

  • Mean = sum of values divided by the number of values

  • Affected by extreme values (outliers)
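
A minimal Python sketch of the definition above and of the outlier effect. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample (illustrative values only)
values = [11, 12, 13, 14, 15]

# Mean = sum of values divided by the number of values
manual_mean = sum(values) / len(values)
print(manual_mean)     # 13.0
print(mean(values))    # 13

# Replace the largest value with a more extreme one: the mean is pulled upward
with_outlier = [11, 12, 13, 14, 20]
outlier_mean = mean(with_outlier)
print(outlier_mean)    # 14
```

Changing a single value from 15 to 20 moves the mean from 13 to 14, even though the other four values are unchanged.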

Measures of Central Tendency: The Median

  • In an ordered array, the median is the “middle” number (50% above, 50% below)

  • Not affected by extreme values

Measures of Central Tendency: Locating the Median

  • When the values are in numerical order (smallest to largest), the median is located at position (n + 1)/2. This is the position of the median, not its value:

  • If the number of values is odd, the median is the middle number

  • If the number of values is even, the median is the average of the two middle numbers
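
The two rules above can be sketched in Python as follows. The function name `median_of` and the sample values are hypothetical; the standard library's `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [15, 11, 13, 12, 14]          # hypothetical data
even_sample = [11, 12, 13, 14, 15, 100]    # hypothetical data with an outlier

median_odd = median_of(odd_sample)
median_even = median_of(even_sample)
print(median_odd)    # 13
print(median_even)   # 13.5
```

The extreme value 100 in the second sample barely moves the median. The sketch assumes a non-empty list.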

Measures of Central Tendency: The Mode

  • Value that occurs most often
  • Not affected by extreme values
  • Used for either numerical or categorical (nominal) data
  • There may be no mode
  • There may be several modes
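
A short Python sketch of these properties. The helper `modes` is hypothetical. It is written by hand because `statistics.multimode` returns every value when none repeats, whereas the convention here is to report no mode in that case.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # all values occur once: report no mode
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3, 4, 4]))               # [2, 4]  (several modes)
print(modes([1, 2, 3, 4]))                     # []      (no mode)
print(modes(["red", "blue", "red", "green"]))  # ['red'] (categorical data)
```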

Measures of Central Tendency: Which Measure to Choose?

  • The mean is generally used, unless extreme values (outliers) exist.
  • The median is often preferred when outliers are present, because it is not sensitive to extreme values. For example, home prices for a region are usually reported as a median, so a few very expensive homes do not distort the figure.
  • In some situations it makes sense to report both the mean and the median.

Measures of Central Tendency: Review Example

Measures of Central Tendency: Summary

Measures of Variation

Measures of Variation: The Range

  • Simplest measure of variation
  • Difference between the largest and the smallest values: Range = largest value − smallest value

Measures of Variation: Why the Range Can Be Misleading

  • Ignores the way in which data are distributed

  • Sensitive to outliers
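
Both weaknesses can be illustrated with a small Python sketch. The function name `value_range` and all data values are hypothetical.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Two hypothetical samples with the same range but different distributions
evenly_spread = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]
range_even = value_range(evenly_spread)
range_clustered = value_range(clustered)
print(range_even, range_clustered)   # 5 5

# A single outlier dominates the range
with_outlier = [1, 2, 3, 4, 5, 120]
range_outlier = value_range(with_outlier)
print(range_outlier)                 # 119
```

The first two samples report the same range although one is spread evenly and the other is concentrated at 10. In the third sample the range reflects only the outlier.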

Measures of Variation: The Sample Variance

  • Low variation: more points close to the mean

  • High variation: more points far from the mean

  • The variance therefore measures how far the values lie from the mean

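  • Approximately the average of the squared deviations of the values from the mean ("approximately" because the sum of squared deviations is divided by n − 1 rather than n)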
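
Written out, the sample variance is S² = Σ(Xᵢ − X̄)² / (n − 1), where X̄ is the sample mean and n the sample size. A minimal Python sketch follows. The function name `sample_variance` and the data are hypothetical, and `statistics.variance` is used only as a cross-check.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

sample = [10, 12, 14, 15, 17, 18, 18, 24]   # hypothetical data, mean = 16
s2 = sample_variance(sample)
print(round(s2, 4))                # 18.5714
print(round(variance(sample), 4))  # 18.5714
```

Here the squared deviations sum to 130, and dividing by n − 1 = 7 gives about 18.57. The sketch assumes at least two values.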

Measures of Variation: The Sample Standard Deviation

  • Most commonly used measure of variation
  • Shows variation about the mean
  • Is the square root of the variance
  • Has the same units as the original data
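
As a sketch, the sample standard deviation S = √S² can be computed in Python as shown below. Both samples are hypothetical and share the same mean, so the difference in S reflects spread alone.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Two hypothetical samples with the same mean (16) but different spread
wide = [10, 12, 14, 15, 17, 18, 18, 24]
narrow = [14, 15, 15, 16, 16, 17, 17, 18]

# Standard deviation = square root of the sample variance
s_wide = sqrt(variance(wide))
print(round(s_wide, 4))          # 4.3095
print(round(stdev(wide), 4))     # 4.3095

# Same mean, smaller standard deviation: values sit closer to the mean
s_narrow = stdev(narrow)
print(mean(wide) == mean(narrow), s_narrow < s_wide)   # True True
```

Because S is in the same units as the data, a value of about 4.31 can be read directly as a typical distance from the mean.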

Measures of Variation: Comparing Standard Deviations

Locating Extreme Outliers: Z-Score

  • Suppose the mean math SAT score is 490, with a standard deviation of 100.
  • Compute the Z-score for a test score of 620.
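
The Z-score is the number of standard deviations a value lies from the mean: Z = (X − X̄) / S. A minimal Python sketch of the SAT example follows. The function names are hypothetical, and the cutoff of 3 is a commonly used rule of thumb that is assumed here, since it is not stated above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def is_extreme_outlier(z, cutoff=3.0):
    # Assumed rule of thumb: |Z| greater than 3 flags an extreme outlier
    return abs(z) > cutoff

z = z_score(620, mean=490, std_dev=100)
print(z)                       # 1.3
print(is_extreme_outlier(z))   # False
```

A score of 620 is 1.3 standard deviations above the mean, which would not be flagged as an extreme outlier under the assumed cutoff.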

General Descriptive Statistics Using RStudio

Summary Statistics from the Grocery Dataset

     Spend         FamilySize         Age
 Min.   : 456    Min.   :1.000    Min.   :23.00
 1st Qu.: 862    1st Qu.:2.000    1st Qu.:31.25
 Median : 994    Median :2.500    Median :39.00
 Mean   :1085    Mean   :2.667    Mean   :40.90
 3rd Qu.:1259    3rd Qu.:3.000    3rd Qu.:46.00
 Max.   :2136    Max.   :5.000    Max.   :69.00
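
For readers without RStudio, the same six statistics can be approximated with a short Python sketch. The Grocery data are not reproduced here, so `family_size` holds hypothetical values and `six_number_summary` is an illustrative name. The `"inclusive"` quantile method is used on the assumption that it matches the default interpolation behind R's quartiles.

```python
from statistics import mean, median, quantiles

def six_number_summary(values):
    """Min, quartiles, mean and max, similar in spirit to R's summary()."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": q2,
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

family_size = [1, 2, 2, 3, 3, 5]   # hypothetical values, not the Grocery data
summary = six_number_summary(family_size)
for name, value in summary.items():
    print(f"{name:8}: {value:.3f}")
```

In the table above, the mean of Spend (1085) exceeds its median (994), which suggests that a few large spending values pull the mean upward.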

Numerical Descriptive Measures for a Population

  • Descriptive statistics discussed previously described a sample, not the population.

  • Summary measures describing a population, called parameters, are denoted with Greek letters.

  • Important population parameters are the population mean, variance, and standard deviation.

Numerical Descriptive Measures for a Population: The Mean µ

  • The population mean is the sum of the values in the population divided by the population size, N

Numerical Descriptive Measures for a Population: The Variance σ²

  • Average of the squared deviations of the values from the population mean µ, dividing by the population size N
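
In symbols, µ = ΣXᵢ / N and σ² = Σ(Xᵢ − µ)² / N. The Python sketch below treats a small hypothetical list as the entire population and contrasts the result with the sample formula, which divides by n − 1 instead.

```python
from statistics import pvariance, variance

# Hypothetical values, treated here as the whole population
population = [10, 12, 14, 15, 17, 18, 18, 24]
N = len(population)

# Population mean: sum of the values divided by the population size N
mu = sum(population) / N
print(mu)                       # 16.0

# Population variance: average squared deviation from mu (divide by N)
sigma2 = sum((x - mu) ** 2 for x in population) / N
print(sigma2)                   # 16.25
print(pvariance(population))    # 16.25

# The sample formula on the same values divides by N - 1 and is larger
print(variance(population) > sigma2)   # True
```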