Descriptive Statistics

Goal: to organize and summarize information contained in a variable or multiple variables

Graphical Description
Numerical Summaries

  • Univariate Analysis
    • descriptions and summaries of a single variable, e.g., income
  • Bivariate Analysis
    • analysis of the relationship between two variables, e.g., income and level of education
  • Multivariate Analysis
    • analysis of the relationships among more than two variables, e.g., income, level of education, and gender

Univariate Categorical Variable

Frequency Distribution

A frequency distribution (or frequency table) shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.
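A minimal Python sketch of tallying a frequency table from raw categorical data. The list `observations` is hypothetical example data (not from the source), and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency) pairs in order of first appearance."""
    counts = Counter(values)  # counts occurrences of each category
    return list(counts.items())

# Hypothetical raw observations of a categorical variable.
observations = ["giraffe", "monkey", "giraffe", "orangutan", "monkey", "giraffe"]

for category, freq in frequency_table(observations):
    print(f"{category:<10} {freq}")
```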

Relative Frequency Distribution

  • A variation of the basic frequency distribution is a relative frequency distribution, in which each class frequency is replaced by a relative frequency.

\[ \text{relative frequency of a class} = \frac{\text{frequency of a class}}{\text{sum of all frequencies}} \]

\[ \bbox[yellow,5px] { \color{black} { \begin{array}{c|c|c} \text{Animal} & \text{Frequency} & \text{Relative Frequency} \\ \hline \text{giraffes} & \text{20} & 35\% \\ \text{orangutans} & \text{14} & 25\% \\ \text{monkeys} & \text{23} & 40\% \end{array} } } \]
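The relative frequencies can be checked with a short Python sketch. The class counts come from the animal table; the helper name `relative_frequencies` is hypothetical. Dividing each count by the total 20 + 14 + 23 = 57 gives roughly 35%, 25%, and 40%, which sum to 100%.

```python
def relative_frequencies(freqs):
    """Map each class to its frequency divided by the sum of all frequencies."""
    total = sum(freqs.values())
    return {cls: f / total for cls, f in freqs.items()}

# Class frequencies from the animal table.
animals = {"giraffes": 20, "orangutans": 14, "monkeys": 23}

rel = relative_frequencies(animals)
for cls, r in rel.items():
    print(f"{cls:<11} {r:.0%}")
# giraffes    35%
# orangutans  25%
# monkeys     40%
```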

Displaying Frequency Distribution: Bar Chart
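Bar charts are usually drawn with a plotting library such as matplotlib. The dependency-free Python sketch below instead renders one text bar per category, with bar length equal to the class frequency. It reuses the animal counts; the function name `text_bar_chart` is hypothetical.

```python
def text_bar_chart(freqs, symbol="#"):
    """Return one text row per category: label, a bar of symbols, and the count."""
    width = max(len(label) for label in freqs)  # pad labels so bars line up
    return [f"{label:<{width}} | {symbol * f} {f}" for label, f in freqs.items()]

animals = {"giraffes": 20, "orangutans": 14, "monkeys": 23}
for row in text_bar_chart(animals):
    print(row)
```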

Univariate Continuous Variable

Email Data

Number of characters in \(50\) email messages, sorted in increasing order.

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    1    1    1     1
[2,]    1    2    2    3    3    3    4    4    5     5
[3,]    5    6    6    7    7    7    9    9    9    10
[4,]   10   10   11   11   12   14   14   16   17    22
[5,]   25   25   25   26   26   27   29   42   43    64


Frequency distribution of number of characters in \(50\) email messages.

A cumulative frequency distribution is a variation of the basic frequency distribution in which the frequency for each class is the sum of the frequencies for that class and all previous classes.

Characters  Freq  Cum. Freq  Rel. Freq
         0     6          6       0.12
         1     5         11       0.10
         2     2         13       0.04
         3     3         16       0.06
         4     2         18       0.04
         5     3         21       0.06
         6     2         23       0.04
         7     3         26       0.06
         9     3         29       0.06
        10     3         32       0.06
        11     2         34       0.04
        12     1         35       0.02
        14     2         37       0.04
        16     1         38       0.02
        17     1         39       0.02
        22     1         40       0.02
        25     3         43       0.06
        26     2         45       0.04
        27     1         46       0.02
        29     1         47       0.02
        42     1         48       0.02
        43     1         49       0.02
        64     1         50       0.02
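The frequency, cumulative frequency, and relative frequency columns can be reproduced from the raw email data with a short Python sketch: one class per distinct value, a running total for the cumulative frequency, and each frequency divided by n = 50 for the relative frequency. The helper name `cumulative_table` is illustrative, not from the source.

```python
from collections import Counter
from itertools import accumulate

# Number of characters in the 50 email messages listed earlier.
email_chars = [
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    1, 2, 2, 3, 3, 3, 4, 4, 5, 5,
    5, 6, 6, 7, 7, 7, 9, 9, 9, 10,
    10, 10, 11, 11, 12, 14, 14, 16, 17, 22,
    25, 25, 25, 26, 26, 27, 29, 42, 43, 64,
]

def cumulative_table(data):
    """Rows of (value, frequency, cumulative frequency, relative frequency)."""
    counts = sorted(Counter(data).items())  # one class per distinct value, ascending
    values = [v for v, _ in counts]
    freqs = [f for _, f in counts]
    cum = list(accumulate(freqs))           # running total of the frequencies
    n = len(data)
    return [(v, f, c, f / n) for v, f, c in zip(values, freqs, cum)]

for value, freq, cum_freq, rel in cumulative_table(email_chars):
    print(f"{value:>10}  {freq:>4}  {cum_freq:>9}  {rel:>9.2f}")
```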

Dotplot

A dot plot uses dots to show the frequency, or number of occurrences, of each value in a data set. The higher the stack of dots above a value, the more often that value occurs.
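A dot plot is normally drawn with vertical stacks; as a rough text sketch, the Python below prints one row per integer value, with one `o` per occurrence, so longer rows mean more occurrences. It assumes integer-valued data; `data` is a hypothetical sample and `text_dot_plot` a hypothetical helper name.

```python
from collections import Counter

def text_dot_plot(data):
    """One row per integer value from min to max, with one 'o' per occurrence."""
    counts = Counter(data)  # missing values count as 0, giving an empty row
    return [f"{value:>3} | {'o' * counts[value]}"
            for value in range(min(data), max(data) + 1)]

# Hypothetical integer-valued sample.
data = [2, 3, 3, 5, 5, 5, 6]
for row in text_dot_plot(data):
    print(row)
```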

Histogram

A histogram is a graph consisting of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The height of each bar equals the frequency of its class.
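The counting behind a histogram can be sketched in Python: each value is assigned to one of several equal-width classes, and the classes are tallied. The class width of 10 and the helper name `histogram_counts` are assumptions chosen for illustration; for the email data this gives class counts of 29, 10, 8, 0, 2, 0, and 1.

```python
# Number of characters in the 50 email messages listed earlier.
email_chars = [
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    1, 2, 2, 3, 3, 3, 4, 4, 5, 5,
    5, 6, 6, 7, 7, 7, 9, 9, 9, 10,
    10, 10, 11, 11, 12, 14, 14, 16, 17, 22,
    25, 25, 25, 26, 26, 27, 29, 42, 43, 64,
]

def histogram_counts(data, start, width, num_bins):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if 0 <= k < num_bins:          # values outside the classes are ignored
            counts[k] += 1
    return counts

start, width, num_bins = 0, 10, 7
counts = histogram_counts(email_chars, start, width, num_bins)
for k, c in enumerate(counts):
    low = start + k * width
    # Integer data, so class [low, low + width) is labelled low to low + width - 1.
    print(f"{low:>2}-{low + width - 1:<2} | {'#' * c}")
```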

Cumulative Frequency Histogram
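In a cumulative frequency histogram, each bar's height is the running total of the class frequencies up to and including that class. A Python sketch, assuming the 50 email values listed earlier are grouped into width-10 classes (0-9, 10-19, ..., 60-69), which gives the class counts 29, 10, 8, 0, 2, 0, 1:

```python
from itertools import accumulate

# Class counts for width-10 classes 0-9, 10-19, ..., 60-69 of the email data.
bin_counts = [29, 10, 8, 0, 2, 0, 1]

# Each cumulative count is the sum of its own class and all earlier classes.
cumulative = list(accumulate(bin_counts))

for k, c in enumerate(cumulative):
    upper = 10 * k + 9  # largest value in class k
    print(f"<= {upper:>2} | {'#' * c} {c}")
```

The last bar always reaches the total number of observations, here 50.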

Shape of a Distribution

Modes

A mode is represented by a prominent peak in the distribution.

  1. Unimodal Distribution
     – It has a single prominent peak.
  2. Bimodal Distribution
     – It has two prominent peaks.
  3. Multimodal Distribution
     – It has more than two prominent peaks.
  4. Uniform Distribution
     – All the bins have the same frequency, or at least close to the same frequency.
     – It is a distribution without a mode.
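Deciding which peaks are "prominent" is a judgment call. As a rough Python sketch, the hypothetical helper below counts local peaks in a list of histogram bin counts. It assumes a bin is a peak when it is higher than the bin on its left and at least as high as the bin on its right, and it calls a histogram uniform when all counts lie within `tolerance` of each other. It does not judge prominence, so small bumps in a sparse tail also count as peaks.

```python
def classify_modality(counts, tolerance=0):
    """Rough modality label for a histogram given its bin counts."""
    # Assumption: near-equal counts everywhere means a uniform (mode-less) shape.
    if max(counts) - min(counts) <= tolerance:
        return "uniform"
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        # Assumption: strictly above the left neighbour and not below the right
        # one, so a flat-topped peak is counted only once.
        if c > left and c >= right:
            peaks += 1
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"

print(classify_modality([1, 3, 7, 3, 1]))                # unimodal
print(classify_modality([2, 6, 2, 1, 5, 2]))             # bimodal
print(classify_modality([4, 4, 5, 4, 4], tolerance=1))   # uniform
```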

Symmetry

The histogram for a symmetric distribution will look the same on the left and the right of its center.
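One rough way to check symmetry in code is to compare each bin count with its mirror image about the center of the histogram. The Python sketch below assumes the center is the middle of the bin range; the `tolerance` parameter and the helper name `is_symmetric` are illustrative assumptions. The last list is the email data grouped into width-10 classes.

```python
def is_symmetric(counts, tolerance=0):
    """True if every bin count is within `tolerance` of its mirror-image bin."""
    return all(abs(a - b) <= tolerance for a, b in zip(counts, reversed(counts)))

print(is_symmetric([1, 4, 9, 4, 1]))               # True
print(is_symmetric([2, 5, 8, 6, 2], tolerance=1))  # True
print(is_symmetric([29, 10, 8, 0, 2, 0, 1]))       # False
```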

Skew

  • A histogram is skewed right if the longer tail is on the right side of the mode.
  • A histogram is skewed left if the longer tail is on the left side of the mode.
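The definition above compares tail lengths on either side of the mode. A minimal Python sketch of that comparison, assuming a tail's length is the distance from the mode to the most extreme value on that side (other skewness measures exist and can disagree on borderline data); the helper name is hypothetical. For the email data the mode is 0 and the largest value is 64, so the distribution is skewed right.

```python
from statistics import mode

# Number of characters in the 50 email messages listed earlier.
email_chars = [
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    1, 2, 2, 3, 3, 3, 4, 4, 5, 5,
    5, 6, 6, 7, 7, 7, 9, 9, 9, 10,
    10, 10, 11, 11, 12, 14, 14, 16, 17, 22,
    25, 25, 25, 26, 26, 27, 29, 42, 43, 64,
]

def skew_direction(data):
    """Compare the tail lengths to the left and right of the mode."""
    m = mode(data)                # most frequent value
    left_tail = m - min(data)     # assumption: tail length = distance to the extreme
    right_tail = max(data) - m
    if right_tail > left_tail:
        return "skewed right"
    if left_tail > right_tail:
        return "skewed left"
    return "no skew by this measure"

print(skew_direction(email_chars))  # skewed right
```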

Outlier

  • An outlier is a data value that is far above or far below the rest of the data values.
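The definition leaves "far" informal. One widely used convention, swapped in here rather than taken from the source, is Tukey's 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The Python sketch below needs Python 3.8+ for `statistics.quantiles`; the helper name `iqr_outliers` is hypothetical. With the default `exclusive` quartile method, the email data gives Q1 = 2 and Q3 = 16.25, so values above 37.625 are flagged; other quartile methods can shift the fences slightly.

```python
from statistics import quantiles

# Number of characters in the 50 email messages listed earlier.
email_chars = [
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    1, 2, 2, 3, 3, 3, 4, 4, 5, 5,
    5, 6, 6, 7, 7, 7, 9, 9, 9, 10,
    10, 10, 11, 11, 12, 14, 14, 16, 17, 22,
    25, 25, 25, 26, 26, 27, 29, 42, 43, 64,
]

def iqr_outliers(data, k=1.5):
    """Values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(iqr_outliers(email_chars))  # [42, 43, 64]
```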

Next


Chapter 3: Describing, Exploring, and Comparing Data