Markdown Author: Jessie Bell, 2023
Libraries Used: car (for Levene's Test)
Answers: purple
Calculating probabilities using the normal distribution with z scores
ht <- seq(150, 190, .01) #seq creates a sequence of numbers that say, "start at 150, end at 190 with 0.01 increments"
plot(ht, dnorm(ht, 170, 8), type="l", col="#00c19a")
Answer The probability of a value under the Standard Normal Curve (SNC) taking on the value of -1.25 or smaller is 0.106
meanht <- 170 #see plot function above
sdht <- 8 #see plot function above
(160-meanht)/sdht
## [1] -1.25
#checks out!
#Calculate the prob of a value under the SNC taking on the value of -1.25 or smaller
pnorm(-1.25)
## [1] 0.1056498
Answer The probability of a randomly selected individual being shorter than 165 cm is 0.266
a <- (165-meanht)/sdht
a
## [1] -0.625
anew <- pnorm(a)
anew
## [1] 0.2659855
plot(ht, dnorm(ht, 170, 8), type="l", col="#e68613")
abline(v=165, col=2) #not necessary for you to have, but good for you to visualize
Answer The probability of a randomly selected individual being taller than 175 cm is 1 - 0.734 = 0.266, which checks out since 165 and 175 are equidistant from the mean of 170, so the two tail probabilities should match.
b <- (175-meanht)/sdht
b
## [1] 0.625
bnew <- 1-pnorm(b) #because now you want the right tail
bnew
## [1] 0.2659855
#another way to calculate this is:
pnorm(b, lower.tail = F) #where you tell R to calculate the upper tail instead of the lower tail.
## [1] 0.2659855
plot(ht, dnorm(ht, 170, 8), type="l", col="lightgreen")
abline(v=175, col=2) #not necessary for you to have, but good for you to visualize
Answer The probability of a randomly selected individual being between 165 and 175 cm tall is the probability that the value falls in neither of the tails we calculated in a and b. Since the total probability is 1, add the two tails and subtract from 1 to get the middle. That middle probability is 0.468.
c <- 1-(anew+bnew)
c #checks out! 68.26% of a normal distribution lies within 1 standard deviation of the mean, and 165 to 175 spans only about ±0.625 standard deviations, so a probability a bit under 0.68 makes sense.
## [1] 0.4680289
plot(ht, dnorm(ht, 170, 8), type="l", col="tomato")
abline(v=165, col=2)
abline(v=175, col=2) #not necessary for you to have, but good for you to visualize
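If you'd rather skip the z-scores entirely, pnorm can take the mean and sd directly; a quick alternative sketch that gives the same middle probability:
pnorm(175, mean = 170, sd = 8) - pnorm(165, mean = 170, sd = 8) #same 0.468 as above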
Helpful phrase: “If the p is low, the null has got to go.”
avglifespan_hrs <- 1200
n_lightbulbs <- 36
mean_sample_hrs <- 1150
sd_sample_hrs <- 100
H0: μ = 1200
HA: μ < 1200
lightbulbs <- seq(800, 1500, .1)
plot(lightbulbs, dnorm(lightbulbs, 1150, 100), type="l", col="plum")
bulb <- (mean_sample_hrs-avglifespan_hrs)/(sd_sample_hrs/sqrt(n_lightbulbs)) #z = (xbar - mu0)/(sd/sqrt(n)); the sample mean's standard error uses sqrt(n)
bulb
## [1] -3
# For a one-tailed (lower-tail) test with alpha = 0.05
critical <- qnorm(0.05) #about -1.645
p <- pnorm(bulb)
p
## [1] 0.001349898
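To finish the test off with the mnemonic above ("if the p is low, the null has got to go"), compare p to alpha, or equivalently compare the z statistic to the critical value. A quick sketch of that last step:
alpha <- 0.05
p < alpha        #if TRUE, reject H0 in favor of HA; if FALSE, fail to reject
bulb < critical  #same decision from the critical-value side (lower tail, so "less than")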
Testing for normality graphically below. Please know that normality is one of the three assumptions you are making when you run a t-test.
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6, 33.6,
36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2, 32.5,
34.1, 33.8, 32.7, 36.7, 42.6, 43.2, 59.4, 18.6, 43.6, 56.0, 51.6, 24.3, 25.8, 53.3, 47.6, 32.7)
rangeData <- c(72.9, 40.9, 36.7, 64.2, 104.2, 33.6, 55.1, 44.3, 40.0, 91.1, 78.8) #not sure why this is a part of the lab?
mean(Mutant)
## [1] 34.49167
sd(Mutant)
## [1] 5.311507
hist(Mutant, ylim = c(0,.1), freq = F, col="pink")
curve(dnorm(x, 34.5, 5.3), add=T)
Answer The distribution for Wildtype below has a bit of a right skew, while the Mutant distribution above looks pretty normal. Because Wildtype seems like it could violate our normality assumption, we can also check for normality with other tools like a qqplot or the Shapiro-Wilk test; you will see examples of this in Part II problem 5 and Part III problem 6.
mean(Wildtype)
## [1] 37.16207
sd(Wildtype)
## [1] 9.950033
hist(Wildtype, ylim = c(0,.06), freq = F, col="steelblue")
curve(dnorm(x, 37.2, 9.95), add=T)
Answer Notice that about 68% of the data lie within one standard deviation of the mean under the normal curve.
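If you want to sanity-check that 68% figure yourself, something like this works (just a sketch):
pnorm(1) - pnorm(-1) #0.6826895, the area within one SD under the standard normal curve
mean(abs(Wildtype - mean(Wildtype)) <= sd(Wildtype)) #proportion of Wildtype values within one sample SD of the sample mean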
Answer The Q-Q plots below both look roughly linear, suggesting that both samples are approximately normally distributed.
qqnorm(Mutant, main="Mutant", col="pink")
qqline(Mutant)
qqnorm(Wildtype, main="Wildtype", col="steelblue")
qqline(Wildtype)
Testing for normality with statistical tests: SHAPIRO-WILK
Shapiro-Wilk hypotheses: H0: the data are normal; HA: the data are not normal
Answer The p-value for both Mutant & Wildtype is greater than 0.05 (p > 0.05), so we fail to reject the null: there is no evidence that the data are not normally distributed. In case you still aren’t sure, plot a friggin’ histogram! In fact, I always begin there. Visualize your data first. ALWAYS.
shapiro.test(Mutant)
##
## Shapiro-Wilk normality test
##
## data: Mutant
## W = 0.95859, p-value = 0.4107
shapiro.test(Wildtype)
##
## Shapiro-Wilk normality test
##
## data: Wildtype
## W = 0.96723, p-value = 0.4872
hist(Mutant, main="Mutant Distribution", col="pink")
hist(Wildtype, main="Wildtype Distribution", col="steelblue")
Notes: A histogram of your data, a qqplot of your data, AND the Shapiro-Wilk test are all just telling you whether your data violate the assumption of normality. You should also be testing equal variance in this lab; since the lab itself does not include an example, one using Levene's Test is worked below.
You must ensure 3 things before running a t-test:
1. your data are random, aka INDEPENDENT. Read Ch. 1 & Ch. 4 of your text for more information on randomness. For now, we assume our data are random since the data are already collected, and randomness happens in experimental design – which you will read more about in Ch. 13.
2. your data are approximately normal: histogram, qqplot, or the Shapiro-Wilk test
3. your data have equal variance: jitter/stripchart (a sketch of this follows the Levene's example below), or Levene's Test (can be done in R by adding the package “car” to your library and following the code below):
#install.packages("car")
library(car)
## Loading required package: carData
# Data
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6, 33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2, 32.5, 34.1, 33.8, 32.7, 36.7, 42.6, 43.2, 59.4, 18.6, 43.6, 56.0, 51.6, 24.3, 25.8, 53.3, 47.6, 32.7)
# Combine data into a data frame
WildData <- data.frame(Group = rep(c("Mutant", "Wildtype"), times = c(length(Mutant), length(Wildtype))),
Value = c(Mutant, Wildtype))
# Perform Levene's test
levene_test_result <- leveneTest(Value ~ Group, data = WildData)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
print(levene_test_result)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 5.3948 0.02422 *
## 51
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#the p value is less than 0.05, so we reject our null hypothesis in favor of the alternative. Our data do not have equal variance, so we may not have all of the assumptions met to even run a t-test. In these examples we will proceed because the violation is due to the small sample sizes.
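For the jitter/stripchart check from item 3 above, and for the t-test itself, here is a rough sketch using the WildData frame we just built. Note that R's t.test() applies Welch's correction by default (var.equal = FALSE), so it does not assume equal variance:
stripchart(Value ~ Group, data = WildData, method = "jitter",
           vertical = TRUE, pch = 19, col = c("pink", "steelblue")) #eyeball whether the two groups have similar spread
t.test(Value ~ Group, data = WildData) #Welch two-sample t-test; set var.equal = TRUE only if you are comfortable assuming equal variance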
Working with island.csv
islandData <- read.csv("island.csv")
Try this on your own and I will upload the new key next week.
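While you wait for the key, a sensible first step with any new data frame is simply to look at it. A generic sketch (I am not assuming anything about which columns island.csv contains):
str(islandData)     #column names and types
head(islandData)    #first few rows
summary(islandData) #quick summaries of each column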