POLS3316, Instructor: Tom Hanna, Fall 2023, University of Houston
2023-09-27
Plan for today
Lecture, Discussion, a little bit of code + A short overview of scalars, vectors, matrices + Some practice computing the measures so far + More on Standard Deviation and Variance + Percentiles, Deciles, Quartiles, Ranges
Data sets for those who needed help (and others)
- I'll post several to Canvas with topics
- Pick one and use it
Measures of Dispersion Continued
Yesterday we covered variance in detail and I told you that standard deviation is the square root of variance. Why do we care?
Example
Two companies with the same mean salary - $70,000.
Company A has a variance of 2,500.
Company B has a variance of 250,000.
Example: Assumptions:
In most companies entry level workers with no seniority are paid the least and workers with the most seniority are paid the most.
You have the same chance of getting hired at Company A or Company B
Salaries are normally distributed
As a new college graduate with interviews at both companies on the same, which is more appealing?
The distributions: code
#Simulating data for company A and B from the mean and standard deviationset.seed(7385)A <-rnorm(n=10000, mean=70000, sd=50)B <-rnorm(n=10000, mean=70000, sd=500)A_mean<-mean(A)stdA<-sqrt(var(A))B_mean<-mean(B)stdB<-sqrt(var(B))hist(A, density=20, prob=TRUE,main="Histogram with normal curve", col ="lightblue") curve(dnorm(x, mean=A_mean, sd=stdA), col ="blue", add=TRUE)
hist(B, density=20, prob=TRUE, col ="red") curve(dnorm(x, mean=B_mean, sd=stdB), col ="red", add=TRUE)
The distributions: code 2
Company A
Company B
The issue?
The variance vastly overstates the difference
Why? Because in variance the units are squared
The answer
By taking the square root of the variance, we get back to the original unit of measure. So, in the example:
Company A has a standard deviation of 50.
Company B has a standard deviation of 500.
The answer
One standard deviation from the mean is $69,950 at Company A.
One standard deviation from the mean is $69,500 at Company B.
The answer
Because of some rules we’ll discuss more later, 99.7% of employees are within 3 standard deviations of the mean.
So, 99.7% of Company A employees make between $69,850 and $70,150 a year.
99.7% of Company B employees make between $68,500 and $71,500 a year.
So, you probably better look hard at things besides salary because there isn’t that much difference after all.
Other Measures
Here are a few other measures you should be aware of both for the midterm and final tests and because they are commonly used:
Quartiles: divides data into four chunks
Deciles: divides into 10 chunks
Percentiles: divides into 100 chunks
Interquartile Range: Between the 1st and 3rd quartiles
The bottom line of the box is the first quartile. The top line is the third quartile. The heavy center line is the median. The “whiskers” show the largest or smallest observation that falls within a distance of 1.5 times the box size from the nearest box edge or “hinge”.** In this case, this plot doesn’t really tell us a whole lot, but it can be useful when the distributions are more varied.
#The code probs = seq(.1, .9, by = .1) is just telling R that #instead of the default of quartiles we want to split the data by #a sequence running from .1 to .9 and separate by .1. Another way #of saying that is 0.1, 0.2, 0.3, 0.4 and so on up to 0.9, or ten #ranks - deciles. quantile((A), probs =seq(.1, .9, by = .1))