2023-09-06
Today’s plan 📋
Review Question about Measures of Variability
A few minutes for R Questions 🪄
More about Boxplots
Outliers
IQR - Inter-quartile Range
UL (Upper Limit) and LL (Lower Limit) to detect outliers
In-class Exercises
Review: You have two options to facilitate your introduction to R and RStudio:
If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.
If you are nervous about coding: Choose Option 2.
For both options: I can help with download/install issues during office hours.
What I do: I maintain a Posit Cloud account for helping students but I do most of my work on my laptop.
NOTE: We will use R and RStudio in class during MOST lectures.
In lecture 3 we discussed the measures of variability, including TSS, variance, standard deviation, CV, and range.
Recall that all of these measures, except range, are closely related and can be calculated from each other.
An online electronics store has a selection of 31 \((n=31)\) different headphones. The sample mean \((\overline{X})\) price is $97.
Recall that \(SD = \sqrt{Var}\) and \(CV = \frac{SD}{\overline{X}}\)
Also recall that the R console (lower left pane) can be used like a calculator
If the variance in these prices is 1256, what is the coefficient of variation, CV? Round your answer to two decimal places.
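The two formulas above can be typed directly into the R console. This is a minimal sketch; the object names `var_price`, `mean_price`, `sd_price`, and `cv_price` are placeholders chosen for illustration and are not from the lecture code.

```r
# Values given in the question
var_price  <- 1256   # variance of the 31 headphone prices
mean_price <- 97     # sample mean price in dollars

# SD is the square root of the variance
sd_price <- sqrt(var_price)

# CV is the SD divided by the mean, rounded to two decimal places
# (the outer parentheses save the result AND display it)
(cv_price <- round(sd_price / mean_price, 2))
```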
The technical term is a Box and Whisker Plot. This version shows where the top whisker ends and plots circles above it for values beyond the whisker.
It is useful, but not required, to look at a boxplot (or histogram) to examine the data distribution.
We will introduce histograms at the end of this lecture.
Data visualizations can indicate if there are high or low outliers present.
Find Q1 (25th Percentile) and Q3 (75th Percentile) using the summary command.
Calculate IQR, the Interquartile Range, \(IQR = Q3 - Q1\)
Calculate the LL, Lower Limit and UL, Upper Limit for determining outliers.
\(LL = Q1 - 1.5\times IQR\)
\(UL = Q3 + 1.5\times IQR\)
Examine values in sorted data to determine which values are
HIGH Outliers, values above the UL
Low Outliers, values below the LL
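The steps above can be sketched in R as follows. The `prices` vector is hypothetical data made up for illustration, and all object names are assumptions. The sketch uses `quantile()` to pull out Q1 and Q3 as single values; by default it gives the same quartiles that `summary()` displays.

```r
# Hypothetical data, made up for illustration only
prices <- c(12, 45, 48, 50, 52, 55, 57, 60, 62, 65, 140)

# summary() displays Min, Q1 (1st Qu.), Median, Mean, Q3 (3rd Qu.), and Max
summary(prices)

# quantile() returns Q1 and Q3 as single values we can calculate with
q1 <- unname(quantile(prices, 0.25))
q3 <- unname(quantile(prices, 0.75))

# Interquartile range: IQR = Q3 - Q1
(iqr <- q3 - q1)

# Lower and upper limits for determining outliers
(ll <- q1 - 1.5 * iqr)
(ul <- q3 + 1.5 * iqr)

# Sort the data, then pick out the values beyond each limit
sorted_prices <- sort(prices)
(high_outliers <- sorted_prices[sorted_prices > ul])
(low_outliers  <- sorted_prices[sorted_prices < ll])
```

Typing Q1 and Q3 by hand from the `summary()` output works too; `quantile()` just avoids retyping the numbers.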
Based on the boxplots below, which market, domestic or foreign, has a higher median for the lifetime gross data?
A. Domestic
B. Foreign
C. The median values for these two markets are approximately equal.
Which statement(s) are true about all three movie gross markets?
A. There are no outliers in these data.
B. There are only low outliers in these data.
C. There are only high outliers in these data.
D. There are low and high outliers in these data.
NOTE: All Saved calculations are enclosed in parentheses so they will ALSO be displayed.
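A short illustration of the difference, using made-up numbers and a placeholder name (`iqr_example`) that is not from the lecture code:

```r
# Without parentheses: the result is saved but nothing is displayed
iqr_example <- 61 - 49

# With parentheses: the result is saved AND displayed in the console
(iqr_example <- 61 - 49)
```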
Use the summary command to find Q1 (25th Percentile or Quantile) and Q3 (75th Percentile or Quantile).
Examine domestic data to determine if there are
HIGH outliers (values above ul_dom)
LOW outliers (values below ll_dom)
Q4: How many LOW outliers are in the domestic gross data?
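One way to count outliers instead of scanning the sorted values by eye is to sum a logical comparison. This is only a sketch: the `domestic` vector below is hypothetical and is NOT the lecture data, and every name other than `ll_dom` and `ul_dom` is an assumption.

```r
# Hypothetical domestic gross values (millions of dollars), NOT the lecture data
domestic <- c(5, 210, 225, 240, 250, 260, 275, 290, 300, 320, 700)

# Q1, Q3, and IQR for the domestic values
q1_dom  <- unname(quantile(domestic, 0.25))
q3_dom  <- unname(quantile(domestic, 0.75))
iqr_dom <- q3_dom - q1_dom

# Lower and upper limits, saved and displayed
(ll_dom <- q1_dom - 1.5 * iqr_dom)
(ul_dom <- q3_dom + 1.5 * iqr_dom)

# A comparison returns TRUE/FALSE for each value; sum() counts the TRUEs
(n_low_dom  <- sum(domestic < ll_dom))
(n_high_dom <- sum(domestic > ul_dom))
```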
Instructions
Use the summary command to find Q1 and Q3 for the foreign gross data.
Boxplots and side-by-side boxplots are great for comparing the central tendency and variability of two or more groups of data.
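A side-by-side boxplot can be drawn with base R's `boxplot()` and a formula. This is a minimal sketch with hypothetical data; the data frame `gross` and its columns `market` and `amount` are assumptions, and the lecture's own plots may have been made with a different function or package.

```r
# Hypothetical gross values (millions of dollars), made up for illustration
gross <- data.frame(
  market = rep(c("Domestic", "Foreign"), each = 6),
  amount = c(210, 225, 240, 260, 290, 700,
             150, 300, 340, 380, 420, 460)
)

# amount ~ market draws one box per market on a shared axis
boxplot(amount ~ market, data = gross,
        main = "Lifetime Gross by Market (hypothetical data)",
        xlab = "Market",
        ylab = "Gross (millions of dollars)")
```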
Another tool for examining the entire distribution of values is a histogram.
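As a first look, base R's `hist()` groups the values into intervals (bins) and draws one bar per bin, with bar height showing how many values fall in that bin. The data and the names `prices` and `price_hist` below are hypothetical.

```r
# Hypothetical data, made up for illustration only
prices <- c(12, 45, 48, 50, 52, 55, 57, 60, 62, 65, 140)

# Draw the histogram; hist() also returns the bin information it used
price_hist <- hist(prices,
                   main = "Histogram of Prices (hypothetical data)",
                   xlab = "Price")

# Number of values counted in each bin
price_hist$counts
```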
Next lecture we’re going to examine both categorical and quantitative data.
For categorical data, we’ll look at
Frequency Tables and Terminology
Bar Charts
Pie Charts
For quantitative data we’ll talk more about histograms
How are they created by the software?
How can we modify what the software is doing to better understand the data?
What does the shape of the histogram tell us? For example, are the data
More about Boxplots
Identifying outliers visually from a boxplot
Identifying outliers numerically
Defining and calculating the Inter-quartile Range (IQR)
Defining upper and lower limits for determining outliers
Using lower and upper limits to identify outliers
Introduction to histograms
To submit an Engagement Question or Comment about material from Lecture 4: Submit by midnight today (day of lecture). Click on the link next to the ❓ under Lecture 4.