Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.12 226.70 305.79 326.43 387.83 936.66 2
Data Visualizations, Boxplots, Outliers
2024-09-05
Today’s plan 📋
Review Question about Measures of Variability
A few minutes for R Questions 🪄
More about Boxplots
Outliers
IQR - Inter-quartile Range
UL (Upper Limit) and LL (Lower Limit) to detect outliers
In-class Exercises
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
I will post R/RStudio files on Posit Cloud that you can access through provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
I will demo how to download completed work so that you can use this allotment efficiently.
For those who want to go further with R/RStudio:
Session ID: MAS261f24
In Lecture 3 we discussed the measures of variability, including TSS, variance, standard deviation, CV, and range.
Recall that all of these measures, except range, are closely related and can be calculated from each other.
An online electronics store has a selection of 31 \((n=31)\) different headphones. The sample mean \((\overline{X})\) price is $97.
Recall that \(SD = \sqrt{Var}\) and \(CV = \frac{SD}{\overline{X}}\)
Also recall that the R console (lower left pane) can be used like a calculator
If the variance in these prices is 1256, what is the coefficient of variation, CV? Round answer to two decimal places.
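As a sketch of how the R console can work through this question, the two formulas above can be applied directly to the values given. The object names `var_price`, `mean_price`, `sd_price`, and `cv_price` are illustrative choices, not names from the course files.

```r
# Values given in the question
var_price  <- 1256   # sample variance of the headphone prices
mean_price <- 97     # sample mean price in dollars

# SD is the square root of the variance
sd_price <- sqrt(var_price)

# CV is the SD divided by the mean
cv_price <- sd_price / mean_price

# Round to two decimal places, as the question asks
round(cv_price, 2)
```

Saving each intermediate value makes it easy to check the SD before computing the CV.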
Annotated with Five Number Summary
The technical term is a Box and Whiskers Plot. This version shows where the top whisker ends, with outliers drawn as circles above it.
Terminology
Notes
It is useful, but not required, to look at a boxplot (or histogram) to examine the data distribution.
We will introduce histograms at the end of this lecture.
Data visualizations can indicate if there are high or low outliers present.
Find Q1 (25th Percentile) and Q3 (75th Percentile) using the summary command.
Calculate IQR, the Interquartile Range, \(IQR = Q3 - Q1\)
Calculate the LL, Lower Limit and UL, Upper Limit for determining outliers.
\(LL = Q1 - 1.5\times IQR\)
\(UL = Q3 + 1.5\times IQR\)
Examine values in sorted data to determine which values are
High outliers, values above the UL
Low outliers, values below the LL
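The steps above can be sketched in R as follows. The data vector `x` is made up purely for illustration, and the object names (`q1`, `q3`, `iqr`, `ll`, `ul`) are assumptions rather than names from the course files. `quantile()` is used here to pull out the same quartiles that `summary()` reports as "1st Qu." and "3rd Qu.".

```r
# Hypothetical data, made up for illustration only
x <- c(2, 40, 42, 45, 47, 50, 52, 55, 58, 60, 110)

# Step 1: find Q1 and Q3 (summary() labels them "1st Qu." and "3rd Qu.")
summary(x)
q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)

# Step 2: IQR = Q3 - Q1
iqr <- q3 - q1

# Step 3: lower and upper limits for determining outliers
ll <- q1 - 1.5 * iqr
ul <- q3 + 1.5 * iqr

# Step 4: examine the sorted data against the limits
sort(x)
high_outliers <- x[x > ul]   # values above the UL
low_outliers  <- x[x < ll]   # values below the LL
```

Looking at the sorted values alongside `ll` and `ul` gives the same answer as the logical filters and makes it easy to count outliers by eye.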
Based on the boxplots below, which market, domestic or foreign, has a higher median for the lifetime gross data?
A. Domestic
B. Foreign
C. The median values for these two markets are approximately equal.
Which statement(s) are true about all three movie gross markets?
A. There are no outliers in these data.
B. There are only low outliers in these data.
C. There are only high outliers in these data.
D. There are low and high outliers in these data.
NOTE: All saved calculations are enclosed in parentheses so they will ALSO be displayed.
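A minimal illustration of this convention, using made-up quartile values and hypothetical object names (`q1_demo`, `q3_demo`, `iqr_demo`):

```r
# Made-up quartile values, for illustration only
q1_demo <- 40
q3_demo <- 60

# Assignment alone saves the result but prints nothing
iqr_demo <- q3_demo - q1_demo

# Wrapping the same assignment in parentheses saves it AND prints it
(iqr_demo <- q3_demo - q1_demo)
```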
Use the summary command to find Q1 (25th Percentile or Quantile) and Q3 (75th Percentile or Quantile).
Examine domestic data to determine if there are
High outliers (values above ul_dom)
Low outliers (values below ll_dom)
Q4: How many LOW outliers are in the domestic gross data?
Instructions
Use the previous example and the summary command to find Q1 and Q3 for the foreign gross data.
Find the IQR, LL, and UL for the foreign gross data.
Examine the sorted foreign gross data to determine how many high outliers are present.
Boxplots and side-by-side boxplots are great for comparing the central tendency and variability of two or more groups of data.
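As a rough sketch of such a comparison in base R, the formula interface of `boxplot()` draws one box per group on a shared axis. The data frame `scores` and its columns `value` and `group` are hypothetical, made up for illustration only.

```r
# Hypothetical data for two groups, made up for illustration only
scores <- data.frame(
  value = c(10, 12, 14, 16, 18, 20, 25, 30, 35, 40),
  group = rep(c("A", "B"), each = 5)
)

# The formula value ~ group draws one box per group on a shared axis
bp <- boxplot(value ~ group, data = scores,
              main = "Side-by-side boxplots (hypothetical data)",
              ylab = "Value")

# boxplot() also returns the plotted statistics; row 3 holds the medians
bp$stats[3, ]
```

Because both boxes share one axis, differences in the medians (the center lines) and in the spread (the box heights) can be read directly from the plot.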
Another tool for examining the entire distribution of values is a histogram.
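A minimal sketch of a histogram in base R, using a made-up vector `x_hist` (a hypothetical name and hypothetical values), might look like this:

```r
# Hypothetical data, made up for illustration only
x_hist <- c(2, 40, 42, 45, 47, 50, 52, 55, 58, 60, 110)

# hist() groups the values into bins and draws one bar per bin
h <- hist(x_hist,
          main = "Histogram (hypothetical data)",
          xlab = "Value")

# The returned object stores the bin boundaries and the count in each bin
h$breaks
h$counts
```

Every observation falls into exactly one bin, so the counts add up to the number of values in the data.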
Next lecture we’re going to examine both categorical and quantitative data.
For categorical data, we’ll look at
Frequency Tables and Terminology
Bar Charts
Pie Charts
For quantitative data we’ll talk more about histograms
How are they created by the software?
How can we modify what the software is doing to better understand the data?
What does the shape of the histogram tell us? For example, are the data
More about Boxplots
Identifying outliers visually from a boxplot
Identifying outliers numerically
Defining and calculating the Inter-quartile Range (IQR)
Defining upper and lower limits for determining outliers
Using lower and upper limits to identify outliers
Introduction to histograms
Submit any Engagement Question or Comment about material from Lecture 4 by midnight today (the day of lecture).