Guiding Question #1: Is a Normal model appropriate for the given data?

Guiding Question #2: Do we have major reasons to doubt that the sample could have come from a Normal population? (This is an important assumption for t procedures in AP Stats.)

  1. Look at a histogram or dot plot of the data
  2. Look at a Normal Probability Plot: is it roughly linear?
set.seed(16)  ## to generate the same graphs I generated, use this command. To generate different random samples, change this number to whatever you'd like. I chose 16 because it's my lucky number!
x<-rnorm(100)
hist(x)

qqnorm(x)

y<-rexp(100)  ## extra credit: this is an exponential distribution. What transformation could we apply so that a Normal approximation is appropriate? Scroll to the bottom to see the answer.
hist(y)

qqnorm(y)

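chi<-rchisq(100,df=2)  ## a chi-square sample; named chi rather than c so it doesn't share a name with R's built-in c() function
hist(chi)

qqnorm(chi)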
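
"Roughly linear" can be hard to judge by eye. One option (an addition to the plots above, not something they require) is to overlay a reference line with qqline(), which by default draws a line through the first and third quartiles. The sketch below redraws the three kinds of samples side by side; the list name `samples` and the plot titles are just illustrative choices.

```r
set.seed(16)                      # same seed as above, so the draws are reproducible
samples <- list(
  Normal      = rnorm(100),
  Exponential = rexp(100),
  ChiSquare2  = rchisq(100, df = 2)
)

old_par <- par(mfrow = c(1, 3))   # three plots side by side
for (name in names(samples)) {
  qqnorm(samples[[name]], main = paste("NPP:", name))
  qqline(samples[[name]], col = "red")   # line through the 1st and 3rd quartiles
}
par(old_par)                      # restore the previous plot layout
```

Points that bend away from the red line at one or both ends are the usual sign of skewness or unusual tails.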

Let’s generate several examples of Normal Probability Plots (NPPs) made from samples of a population which we KNOW is Normally distributed.

for(i in 1:9){
  set.seed(i)
  x<-rnorm(20)        ## change the sample size to different sizes to see how the NPP is affected
  par(mfrow=c(1,2))
  hist(x,main="R.S. from NORMAL pop.")
  qqnorm(x)
}
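
To see the effect of sample size without editing the loop by hand, here is a minimal sketch that draws one Normal sample at each of several sizes and plots the NPPs in a 2-by-2 grid. The sizes in `sizes` are arbitrary illustrative values, and the seed is just one choice.

```r
set.seed(1)
sizes <- c(10, 20, 50, 200)       # illustrative sample sizes, chosen arbitrarily

old_par <- par(mfrow = c(2, 2))   # 2-by-2 grid of plots
for (n in sizes) {
  x_n <- rnorm(n)                 # random sample of size n from a Normal population
  qqnorm(x_n, main = paste("Normal pop., n =", n))
  qqline(x_n)                     # reference line through the quartiles
}
par(old_par)                      # restore the previous plot layout
```

Try a few different seeds: the small samples tend to wander away from the line much more than the large ones, even though every sample here really does come from a Normal population.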

Now, let’s take a look at some examples of Normal Probability Plots from a population that we know is not Normal. For these examples, we’ll look at random samples from an Exponential distribution.

for(i in 1:10){
  set.seed(i)
  x<-rexp(20)
  par(mfrow=c(1,2))
  hist(x,main="R.S. from Exponential Pop.")
  qqnorm(x)
}
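
If you'd like a number to go with the picture, one possible summary (a supplement to the plots, not part of the AP Stats toolkit) is the correlation between the two axes of the NPP: the closer it is to 1, the straighter the plot. The helper `npp_cor` below is a hypothetical name; it relies on qqnorm() returning the plotted coordinates when called with plot.it = FALSE.

```r
# Hypothetical helper: correlation between the theoretical Normal quantiles
# and the sample values that qqnorm() would plot.
npp_cor <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)   # list with x = theoretical, y = sample
  cor(qq$x, qq$y)
}

set.seed(1)
n_rep <- 500                         # number of simulated samples per population
cor_normal <- replicate(n_rep, npp_cor(rnorm(20)))
cor_exp    <- replicate(n_rep, npp_cor(rexp(20)))

summary(cor_normal)
summary(cor_exp)

boxplot(list(Normal = cor_normal, Exponential = cor_exp),
        ylab = "NPP correlation (n = 20)")
```

The two boxplots overlap, which is a reminder that with n = 20 a single plot cannot always tell the two populations apart.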

Now, let’s simulate some samples from a Cauchy population distribution:

for(i in 1:9){
  set.seed(i)
  x<-rcauchy(20)
  par(mfrow=c(1,2))
  hist(x,main="R.S. from CAUCHY Pop.")
  qqnorm(x)
}
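
The Cauchy histograms are usually dominated by a few extreme values, because the Cauchy distribution has very heavy tails. As a rough check, the sketch below counts how often observations get flagged by the 1.5 * IQR boxplot rule in Normal versus Cauchy samples. The helper name `outlier_share` is hypothetical; boxplot.stats() is the function R's boxplot() uses to find the points it draws as outliers.

```r
# Hypothetical helper: share of a sample flagged by the 1.5 * IQR boxplot rule
outlier_share <- function(v) length(boxplot.stats(v)$out) / length(v)

set.seed(1)
n_rep <- 500                         # number of simulated samples per population
share_normal <- replicate(n_rep, outlier_share(rnorm(20)))
share_cauchy <- replicate(n_rep, outlier_share(rcauchy(20)))

mean(share_normal)                   # average share flagged, Normal samples
mean(share_cauchy)                   # average share flagged, Cauchy samples
```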

Finally, let’s simulate some samples from a Uniform population distribution. This distribution can be difficult to identify in small samples!

for(i in 1:9){
  set.seed(i)
  x<-runif(200)   ## with a large sample size (n=200 here), this is clearly not Normal. Change the sample size to 20 though and you'll see that it can be very difficult to identify that the samples do not come from a Normal distribution
  par(mfrow=c(1,2))
  hist(x,main="R.S. from UNIFORM Pop.")
  qqnorm(x)
}
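
To compare the two sample sizes directly, here is a small sketch that puts one Uniform sample of size 20 next to one of size 200. The variable names and the seed are illustrative choices.

```r
set.seed(1)
x_small <- runif(20)              # small sample from a Uniform(0, 1) population
x_large <- runif(200)             # large sample from the same population

old_par <- par(mfrow = c(1, 2))   # two plots side by side
qqnorm(x_small, main = "Uniform pop., n = 20")
qqline(x_small)
qqnorm(x_large, main = "Uniform pop., n = 200")
qqline(x_large)
par(old_par)                      # restore the previous plot layout
```

In the larger sample, look for the plot flattening out at both ends: a Uniform population has shorter tails than a Normal one, so the smallest and largest values are less extreme than a Normal model would predict.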

Extra Credit Discussion

hist(y)

logy<-log(y)   ## since the sample comes from an exponential population, a log transformation seems reasonable
hist(logy)  ## the transformed data are much less skewed...

qqnorm(logy)  ## the QQ plot is more linear. Do you think t methods are appropriate?
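
One way to put a number on "less skewed" is to compute a sample skewness before and after the transformation. Base R has no built-in skewness function, so the helper `skew` below is a hypothetical, simple moment-based version, and a large simulated sample (`y_big`, a name chosen for this sketch) is used so the estimates are stable.

```r
# Hypothetical helper: a simple moment-based sample skewness
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

set.seed(16)
y_big <- rexp(10000)          # large sample so the estimates are stable

skew(y_big)                   # positive: the raw data are right-skewed
skew(log(y_big))              # negative: the log leaves some left skew behind
```

The log reduces the size of the skew but flips its direction, so it is worth keeping that in mind when you decide whether the transformed data look Normal enough for t methods.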