DATA PRESENTATION

Data Presentation and Visualization

Data Visualization is a term used to describe the use of graphical displays to:

summarize data
present data

Data becomes more comprehensible and more useful when it is organized and presented well.

Data Patterns in Graphs

Data patterns are commonly described in terms of the:

Center: the value with about half of the observations on either side of it.

Spread: variability of the data

Shape: can be described by the characteristics:

Symmetry

Number of Peaks

Skewness

Other data patterns:

Unusual Features: gaps in the data or outliers.
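
As a minimal sketch of how these descriptions map to numbers, the base R calls below summarize the center, spread, and unusual values of the car-registration data used in Example 2 of this section. Using the median, the range, the interquartile range, and `boxplot.stats()` as the measures is an assumption made for illustration; other measures would serve equally well.

```r
# Car-registration counts from Example 2 of this section.
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

# Center: about half of the observations lie on either side of the median.
center <- median(cars)

# Spread: the range and the interquartile range describe variability.
spread.range <- range(cars)
spread.iqr <- IQR(cars)

# Unusual features: boxplot.stats() reports values lying more than
# 1.5 times the box length beyond the hinges as potential outliers.
outliers <- boxplot.stats(cars)$out

center        # 1.5
spread.range  # 0 4
spread.iqr
outliers      # empty: no value is flagged for this data set
```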

Summarizing Qualitative and Quantitative Data for a Single Variable

FREQUENCY DISTRIBUTION TABLE

Shows how often each value (or set of values) of the variable in question occurs in a data set.
A tabular summary of data showing the frequency (or number) of observations in each class.

Relative Frequency Distribution

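Gives a tabular summary of data showing the relative frequency of each class, that is, the class frequency divided by the total number of observations.

Percent Frequency Distribution

Presents the percent frequency of each class, that is, the relative frequency multiplied by 100.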

Example 1
Create a Frequency Distribution Table for the data on soft drink purchases presented in the following table.

Purchase	Purchase	Purchase	Purchase
Coke Classic	Sprite	Pepsi	Diet Coke
Coke Classic	Coke Classic	Pepsi	Diet Coke
Coke Classic	Diet Coke	Coke Classic	Coke Classic
Coke Classic	Diet Coke	Pepsi	Coke Classic
Coke Classic	Dr. Pepper	Dr. Pepper	Sprite
Coke Classic	Diet Coke	Pepsi	Diet Coke
Pepsi	Coke Classic	Pepsi	Pepsi
Coke Classic	Pepsi	Coke Classic	Coke Classic
Pepsi	Dr. Pepper	Pepsi	Pepsi
Sprite	Coke Classic	Coke Classic	Coke Classic
Sprite	Dr. Pepper	Diet Coke	Dr. Pepper
Pepsi	Coke Classic	Pepsi	Sprite
Coke Classic	Diet Coke

The R Script:

#install.packages("readr")
#install.packages("pander")
library(readr)

library(pander)

# Import "purchase.csv" data and store it in 'pchase'.
pchase <- read.csv("purchase.csv")

# Determine the frequency of each category.
pchase.freq <- table(pchase)
pander(pchase.freq)

Coke Classic	Diet Coke	Dr. Pepper	Pepsi	Sprite
19	8	5	13	5

# Create the Frequency Distribution Table.
freq.dist <- cbind(pchase.freq)
colnames(freq.dist) <- c("Frequency")

The Tabular Output

# Generate created Frequency Distribution Table.
pander(freq.dist)

	Frequency
Coke Classic	19
Diet Coke	8
Dr. Pepper	5
Pepsi	13
Sprite	5

Relative Frequency Distribution Table

The R script:

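# Divide each frequency by the total number of observations.
data.relfreq <- pchase.freq / nrow(pchase)
# Transform the result to column format and label the column.
relfreq.dist <- cbind(data.relfreq)
colnames(relfreq.dist) <- c("Relative Frequency")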

The Tabular Output

pander(relfreq.dist)

	Relative Frequency
Coke Classic	0.38
Diet Coke	0.16
Dr. Pepper	0.1
Pepsi	0.26
Sprite	0.1
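
The same proportions can also be obtained with base R's `prop.table()`. The sketch below is an alternative, not the script used above: it rebuilds the purchases from the frequencies shown in the table so that it runs without `purchase.csv`, and the object names `drinks`, `drink.freq`, and `drink.dist` are made up for illustration.

```r
# Rebuild the 50 purchases from the frequencies reported above
# (assumption: this stands in for the contents of purchase.csv).
drinks <- rep(c("Coke Classic", "Diet Coke", "Dr. Pepper", "Pepsi", "Sprite"),
              times = c(19, 8, 5, 13, 5))

drink.freq <- table(drinks)

# prop.table() divides each count by the total number of observations.
drink.relfreq <- prop.table(drink.freq)
drink.pctfreq <- drink.relfreq * 100

# Combine the three summaries into one table.
drink.dist <- cbind(Frequency = drink.freq,
                    "Relative Frequency" = drink.relfreq,
                    "Percent Frequency" = drink.pctfreq)
drink.dist
```

A quick sanity check on any relative frequency distribution is that its entries sum to 1 (and the percent frequencies to 100).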

Example 2
A survey was taken on Aurora Avenue. In each of 20 homes, residents were asked how many cars were registered to their household. The results were recorded as follows: 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

The R Script:

# Create the data vector.
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

# Determine the frequencies.
car.freq <- table(cars)
# Compute the relative and percent frequencies, respectively.
car.relfreq <- car.freq / sum(car.freq)
car.pctfreq <- car.relfreq * 100

# Create the tabular output.
car.freqdist <- cbind(car.freq, car.relfreq, car.pctfreq)
colnames(car.freqdist) <- c("Frequency", "Relative Frequency", "Percent Frequency")

Complete Frequency Distribution

# Generate the tabular output.
pander(car.freqdist)

	Frequency	Relative Frequency	Percent Frequency
0	4	0.2	20
1	6	0.3	30
2	5	0.25	25
3	3	0.15	15
4	2	0.1	10

Grouped Frequency Distribution Table

Individual data values are classified into categories called class intervals.
Not advisable as a basis for data analysis, because the individual values are no longer available once they are grouped.
Created simply as a convenient means of organizing and summarizing data.

Steps in Creating a Grouped FDT:

Determine the number of classes, k.
(Use Sturges’ formula: k = 1 + 3.322 log n, where n is the number of observations and log is the base-10 logarithm.)

Calculate the class size (or class width), c.

Enumerate the class intervals.

Tally the observations.
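
The first three steps can be sketched in R as follows, using the summary values of the rent data from the example in this section (70 observations ranging from 425 to 615). Rounding k to the nearest whole number and rounding the class width up are assumed conventions, and the variable names are chosen only for illustration.

```r
# Values taken from the rent example in this section:
# 70 observations, smallest value 425, largest value 615.
n <- 70
x.min <- 425
x.max <- 615

# Step 1: Sturges' formula, k = 1 + 3.322 * log10(n).
# Rounding to the nearest whole number is an assumed convention.
k <- round(1 + 3.322 * log10(n))

# Step 2: class width, rounded up so that k classes cover the whole range.
c.width <- ceiling((x.max - x.min) / k)

# Step 3: class limits, starting at the smallest value.
breaks <- seq(x.min, by = c.width, length.out = k + 1)

k        # 7
c.width  # 28
breaks   # 425 453 481 509 537 565 593 621
```

The fourth step, tallying, then amounts to assigning each observation to an interval with `cut()` and counting with `table()`.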

Example
Consider the following data set presented in Example 3 of the module:
425, 430, 430, 435, 435, 435, 435, 435, 440, 440, 440, 440, 440, 445, 445, 445, 445, 445, 450, 450, 450, 450, 450, 450, 450, 460, 460, 460, 465, 465, 465, 470, 470, 472, 475, 475, 475, 480, 480, 480, 480, 485, 490, 490, 490, 500, 500, 500, 500, 510, 510, 515, 525, 525, 525, 535, 549, 550, 570, 570, 575, 575, 580, 590, 600, 600, 600, 600, 615, 615

Create a frequency distribution table with 7 class intervals.

The R script

# Load necessary packages.
library(readr)
library(pander)

# Import the data.
rent <- read.csv("rent.csv")

# Generate the class limits: 7 classes of width 28, starting at 425.
breaks <- seq(425, 621, by = 28)

# Create the class intervals and assign the data values to them.
# right = FALSE makes each interval closed on the left, e.g. [425,453).
classint <- cut(rent$Rent, breaks, right = FALSE)

# Determine the frequency of each class interval.
freq <- table(classint)
# Transform the table to column format.
freq.dist <- cbind(freq)
# Label the column of frequencies.
colnames(freq.dist) <- c("Frequency")

The Tabular Output

# Generate the Grouped FDT.
pander(freq.dist)

	Frequency
[425,453)	25
[453,481)	16
[481,509)	8
[509,537)	7
[537,565)	2
[565,593)	6
[593,621)	6
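
For readers without `rent.csv`, the grouped table can be reproduced from the values listed in the example. The sketch below is an assumption-laden stand-in for the imported file: it rebuilds the 70 observations with `rep()`, and the names `rent.values` and `grouped.dist` are hypothetical. It also appends a relative frequency column, computed the same way as for ungrouped data.

```r
# Distinct rent values from the example and how often each one occurs.
values <- c(425, 430, 435, 440, 445, 450, 460, 465, 470, 472, 475, 480,
            485, 490, 500, 510, 515, 525, 535, 549, 550, 570, 575, 580,
            590, 600, 615)
counts <- c(1, 2, 5, 5, 5, 7, 3, 3, 2, 1, 3, 4,
            1, 3, 4, 2, 1, 3, 1, 1, 1, 2, 2, 1,
            1, 4, 2)
rent.values <- rep(values, times = counts)

# Same class limits as in the script above: 7 classes of width 28.
breaks <- seq(425, 621, by = 28)
classint <- cut(rent.values, breaks, right = FALSE)

# Tally the observations and add relative frequencies.
freq <- table(classint)
grouped.dist <- cbind(Frequency = freq,
                      "Relative Frequency" = round(freq / sum(freq), 3))
grouped.dist
```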

Bar Chart

Used to display qualitative data summarized in a frequency, relative frequency, or percent frequency distribution.
In a vertical bar chart, the horizontal axis shows the categories and the vertical axis shows the value (frequency, relative frequency, or percent frequency).

Example for Data on Soft Drink Purchases

The R Script:

# install.packages("tidyverse")
# install.packages("forcats")
# Load necessary packages.
library(readr)
library(tidyverse)
library(forcats)

# Import "purchase.csv" file.
purchase <- read.csv("purchase.csv")

The Chart:

# Create the chart. Store it in 'bar1'.
bar1 <- ggplot(purchase, aes(x = Purchase)) + geom_bar(width = .5) +
  ggtitle("Soft Drink Purchases")

# Present the generated chart.
bar1

The Graphical Output (ordered bars):

# fct_infreq() reorders the categories from most to least frequent.
bar2 <- ggplot(mutate(purchase, Purchase = fct_infreq(Purchase))) +
  geom_bar(aes(x = Purchase), width = .5) + ggtitle("Soft Drink Purchases")
bar2
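
A comparable ordered chart can be drawn with base R graphics alone. This is a sketch of an alternative, not the approach used above: it starts from the frequencies already tabulated for the soft drink data rather than from `purchase.csv`, and the names `drink.freq` and `ordered.freq` are illustrative.

```r
# Frequencies from the soft drink frequency distribution table.
drink.freq <- c("Coke Classic" = 19, "Diet Coke" = 8, "Dr. Pepper" = 5,
                "Pepsi" = 13, "Sprite" = 5)

# Order the categories from most to least frequent.
ordered.freq <- sort(drink.freq, decreasing = TRUE)

# Draw a vertical bar chart: categories on the horizontal axis,
# frequencies on the vertical axis.
barplot(ordered.freq, main = "Soft Drink Purchases",
        xlab = "Purchase", ylab = "Frequency")
```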

Pie Chart

Provides another graphical device for presenting the frequency, relative frequency, or percent frequency distributions for qualitative data.

The pie chart makes use of sectors of a circle, where the numerical value represented by each sector can be a frequency, a relative frequency, or a percent frequency.

The angle of a sector is proportional to the frequency of each of the categories of the variable.
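
To make the proportionality concrete, the sketch below computes the sector angles for the soft drink data as relative frequency times 360 degrees and then draws the chart with base R's `pie()`. The object names and the label format are assumptions chosen for illustration.

```r
# Frequencies from the soft drink frequency distribution table.
drink.freq <- c("Coke Classic" = 19, "Diet Coke" = 8, "Dr. Pepper" = 5,
                "Pepsi" = 13, "Sprite" = 5)

# Relative frequency of each category.
drink.relfreq <- drink.freq / sum(drink.freq)

# Sector angle in degrees: the relative frequency times 360.
# For Coke Classic this works out to 0.38 * 360 = 136.8 degrees.
drink.angle <- drink.relfreq * 360
drink.angle

# Draw the pie chart, labeling each sector with its percent frequency.
sector.labels <- paste0(names(drink.freq), " (", round(drink.relfreq * 100), "%)")
pie(drink.freq, labels = sector.labels, main = "Soft Drink Purchases")
```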

The Dot Plot

Similar to a bar graph.
The height, represented by the number of dots, equals the number of items in a certain category.

Example (Data on the number of cars registered to each household).
Data: 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

The R Script

library(ggplot2)
library(readr)

# Import the "cars.csv" data file.
cars <- read.csv("cars.csv")

# Generate the plot. aes(cars) refers to a column named 'cars' in the imported data.
dplot <- ggplot(cars, aes(cars)) + geom_dotplot(binwidth = 0.25)

The Output Plot

dplot

Using the “stripchart” function

The R Script

# Manually construct the data vector.
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

# Create the dot plot. stripchart() draws the plot directly,
# so its result does not need to be stored.
stripchart(cars, method = "stack", at = c(0.05),
           pch = 20, cex = 3.2, las = 1, frame.plot = FALSE,
           xlim = c(0, 5), main = "Number of Cars Registered")

Stem-and-Leaf Plot

It shows both the rank order of the data and the shape of the distribution.

Useful for compactly displaying many numeric values while keeping each individual value visible.

Example 1

Data: 22, 29, 22, 31, 20, 12, 14, 24, 13, 4, 2, 1

The R Script

# Create the data vector
data <- c(22, 29, 22, 31, 20, 12, 14, 24, 13, 4, 2, 1)

# Create the stem-and-leaf plot
stem(data, scale = 1)

  The decimal point is 1 digit(s) to the right of the |

  0 | 124
  1 | 234
  2 | 02249
  3 | 1

Example 2

Data: 8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8

The R Script

# Create the data vector
data <- c(8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8)

# Create the stem-and-leaf plot
stem(data, scale = 1)

  The decimal point is at the |

   8 | 68
   9 | 14
  10 | 2
  11 | 07

Crosstabulation

A crosstabulation is a tabular summary of data for two variables.
It is also called a contingency table.

The two variables can be:

both qualitative
both quantitative
one qualitative and one quantitative

Example: Consider the data below (named “example.csv”):

Respondent	Gender	Age	Education
1	Male	20	Bachelor’s Degree
2	Male	18	Undergraduate
3	Female	19	Undergraduate
4	Male	25	Bachelor’s Degree
5	Female	37	Master’s Degree
6	Female	15	Undergraduate
7	Female	40	PhD
8	Male	43	Bachelor’s Degree
9	Male	60	Bachelor’s Degree
10	Female	65	Master’s Degree
11	Female	42	Bachelor’s Degree

For the given data:

Create a crosstabulation of Gender vs Education.
Create a crosstabulation of Age vs Education, where each respondent is classified by age as a “Teen” (15 to 19 years of age), an “Adult” (20 to 59 years of age), or a “Senior” (60 years of age and above).
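
One possible approach to both items is sketched below using base R's `table()` and `cut()`. The data frame is typed in by hand from the table above so that the sketch runs without “example.csv”; the names `survey`, `AgeGroup`, `gender.educ`, and `age.educ` are hypothetical. It assumes ages are whole numbers, so that "15 to 19" can be written as the interval [15, 20).

```r
# Data entered by hand from the example table.
survey <- data.frame(
  Gender = c("Male", "Male", "Female", "Male", "Female", "Female",
             "Female", "Male", "Male", "Female", "Female"),
  Age = c(20, 18, 19, 25, 37, 15, 40, 43, 60, 65, 42),
  Education = c("Bachelor's Degree", "Undergraduate", "Undergraduate",
                "Bachelor's Degree", "Master's Degree", "Undergraduate",
                "PhD", "Bachelor's Degree", "Bachelor's Degree",
                "Master's Degree", "Bachelor's Degree")
)

# Item 1: crosstabulation of two qualitative variables.
gender.educ <- table(survey$Gender, survey$Education)
gender.educ

# Item 2: classify the quantitative variable Age into groups first.
# right = FALSE gives the intervals [15,20), [20,60), and [60,Inf).
survey$AgeGroup <- cut(survey$Age, breaks = c(15, 20, 60, Inf),
                       right = FALSE,
                       labels = c("Teen", "Adult", "Senior"))
age.educ <- table(survey$AgeGroup, survey$Education)
age.educ
```

Each cell of a crosstabulation counts the respondents who fall into that row category and that column category, so the cells of either table sum to the 11 respondents.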