####Problem Set 9
##Katelyn Burton
library(visualize)
#What percent of a standard normal distribution N(µ = 0, SD = 1) is found in each region? Be sure to draw a graph.
#Z < -1.35
visualize.norm(stat=-1.35,mu=0,sd=1,section="lower")

#About 8.85% of the standard normal distribution lies below Z = -1.35.
#Z > 1.48
visualize.norm(stat=1.48,mu=0,sd=1,section="upper")

#About 6.94% of the standard normal distribution lies above Z = 1.48.
#0.4 < Z < 1.5
visualize.norm(stat=c(0.4,1.5),mu=0,sd=1,section="bounded")

#About 27.8% of the standard normal distribution lies between Z = 0.4 and Z = 1.5.
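#A quick cross-check of the three shaded areas above using base R's pnorm(). This is only a sketch; the variable names are illustrative and not part of the original assignment.
p_lower   <- pnorm(-1.35)                      # P(Z < -1.35)
p_upper   <- pnorm(1.48, lower.tail = FALSE)   # P(Z > 1.48)
p_between <- pnorm(1.5) - pnorm(0.4)           # P(0.4 < Z < 1.5)
round(100 * c(p_lower, p_upper, p_between), 2) # areas as percentages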
#|Z| > 2
#|Z| > 2 means Z > 2 or Z < -2, i.e. the two tails outside the interval from -2 to 2.
visualize.norm(stat=c(-2,2),mu=0,sd=1,section="tails")

#About 4.6% of the standard normal distribution lies in the two tails; the remaining 95.4% lies between -2 and 2.
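#A short numeric check for |Z| > 2, assuming only base R. By symmetry the two tails have equal area, so the tail probability is twice the lower tail; the central region is its complement. Variable names are illustrative.
p_tails  <- 2 * pnorm(-2)   # P(|Z| > 2), about 0.0455
p_center <- 1 - p_tails     # P(-2 < Z < 2), about 0.9545
c(tails = p_tails, center = p_center)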
#(a) Write down the short-hand for these two normal distributions.
#Verbal: N(μ=151,σ=7)
#Quantitative: N(μ=153,σ=7.67)
#(b) What is Sophia's Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section?
#Verbal: Z = (160 - 151)/7 = 1.285714
#Quantitative: Z = (157 - 153)/7.67 = 0.5215124
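#The same Z-scores can be computed in R rather than by hand. A minimal sketch: z_score() is a hypothetical helper written for this check, not a function from any package.
z_score <- function(x, mean, sd) (x - mean) / sd
z_verbal_check <- z_score(160, mean = 151, sd = 7)      # Verbal
z_quant_check  <- z_score(157, mean = 153, sd = 7.67)   # Quantitative
c(verbal = z_verbal_check, quantitative = z_quant_check)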
#Draw a standard normal distribution curve and mark these two Z-scores.
curve(dnorm, from = -5, to=5)
abline(v=1.285714, col="blue")
abline(v=0.5215124, col="red")
text(1.285714+1, 0.3, "Verbal: 1.29",col="blue")
text(0.5215124-1.5, 0.1, "Quantitative: 0.52", col="red")

#What do these Z-scores tell you?
#The Z-scores tell me how many standard deviations above the mean Sophia scored on each section: 1.2857 standard deviations above the mean on the Verbal test and 0.5215 standard deviations above the mean on the Quantitative test.
#Relative to others, which section did she do better on?
#Relative to others, Sophia did better on the Verbal section than she did on the Quantitative section, since that has the higher Z-score.
#Find her percentile scores for the two exams.
#Verbal: Using the lookup table we have 0.9015 (z-score = 1.29). So she was in the 90th percentile.
#Quantitative: Using the lookup table we have 0.6985 (Z-score = 0.52). Rounding, she was in the 70th percentile.
#We can also use pnorm() to get the percentiles for the normal distribution.
pnorm(1.285714)
## [1] 0.9007286
pnorm(0.5215124)
## [1] 0.6989951
#What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?
#Given the above, 10% of test takers did better than her on the Verbal Reasoning section (100-90) and 30% of test takers did better than her on the Quantitative Reasoning section (100-70).
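#The share of test takers who scored higher can also be read directly from the upper tail, without subtracting from 100. A sketch using the unrounded Z-scores from part (b); variable names are illustrative.
above_verbal <- pnorm(1.285714, lower.tail = FALSE)    # about 0.099
above_quant  <- pnorm(0.5215124, lower.tail = FALSE)   # about 0.301
round(100 * c(verbal = above_verbal, quantitative = above_quant), 1)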
#Explain why simply comparing her raw scores from the two sections would lead to the incorrect conclusion that she did better on the Quantitative Reasoning section.
#I don’t think it would, but it could. In the current problem, the raw Verbal score is higher than the raw Quantitative score. However, it is possible for someone to have a higher raw score in one section than the other, but in fact do worse in that section relative to other test takers. This can occur when the two sections have very different means and standard deviations, or even slightly different ones as in this problem.
#For example, if we lower her Verbal score to 156, then her new Z-score for Verbal is (156-151)/7 = 0.7142857. Notice that even though this raw score is lower than her Quantitative score of 157, its Z-score is still higher than the Z-score of her Quantitative score (0.5215). In other words, she still would have done better on the Verbal section, even though the raw score is lower. This highlights the importance of standardizing values before comparing them, so that the comparison is made in the same units (in this case, standard deviations from the mean).
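#The hypothetical Verbal score of 156 from the example above can be checked in R. This is a sketch of that what-if only; 156 is not Sophia's actual score, and the variable names are illustrative.
z_verbal_whatif <- (156 - 151) / 7       # hypothetical Verbal Z-score
z_quant_actual  <- (157 - 153) / 7.67    # actual Quantitative Z-score
156 < 157                                # TRUE: the raw Verbal score is lower
z_verbal_whatif > z_quant_actual         # TRUE: the standardized Verbal score is higher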
#If the distributions of the scores on these exams are not nearly normal, would your answers to parts (b) - (f) change? Explain your reasoning.
#Parts b-d are still doable since Z-scores can be calculated for any kind of distribution and used to compare values. However, (e) and (f) ask us about the percentiles, and so we would need to know what distribution we are using in order to calculate these using our Z-scores. Otherwise, we would have to say that (e) and (f) are unanswerable without more information.
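#A small illustration of why percentiles depend on the distribution, using a made-up right-skewed example that is not part of the problem: an exponential distribution with rate 1 (mean 1, SD 1). The Z-score is still well defined, but pnorm() gives the wrong percentile.
skew_value <- 2
skew_z <- (skew_value - 1) / 1             # Z-score: 1 standard deviation above the mean
pct_if_normal <- pnorm(skew_z)             # percentile if normality is wrongly assumed
pct_actual <- pexp(skew_value, rate = 1)   # percentile under the exponential distribution
c(assumed_normal = pct_if_normal, actual = pct_actual)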
#Score at the 80th percentile on the Quantitative section: z = qnorm(.8) = .84 (rounded), then score = z * SD + mean.
quanmean = 153
quansd = 7.67
z80 <- (.84 * quansd) + quanmean
show(z80)
## [1] 159.4428
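#As a cross-check, qnorm() accepts mean and sd directly, which avoids rounding the Z-score to .84. A minimal sketch; the variable name is illustrative. The result differs slightly from the hand calculation because the Z-score is not rounded.
score80_check <- qnorm(0.8, mean = 153, sd = 7.67)
score80_check   # about 159.46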
#Scoring worse than 70% of test takers on the Verbal section means being at the 30th percentile: z = qnorm(.3) = -.5244005, rounded to -.52 below.
verbmean = 151
verbsd = 7
z30 <- (-.52*verbsd) + verbmean
show(z30)
## [1] 147.36
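#The same cutoff can be obtained in one step with qnorm(), again without rounding the Z-score. A sketch only; the variable name is illustrative.
score30_check <- qnorm(0.3, mean = 151, sd = 7)
score30_check   # about 147.33, close to the hand-rounded 147.36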
#What is the probability of observing an 83 F temperature or higher in LA during a randomly chosen day in June?
pnorm(83, mean = 77 , sd = 5, lower.tail = FALSE)
## [1] 0.1150697
visualize.norm(stat = 83, mu = 77, sd = 5, section = "upper")

#The probability of observing a temperature of 83 F or higher in LA on a randomly chosen day in June is .1150, or about 11.5%. The Z-score for 83 F is (83 - 77)/5 = 1.2.
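#The Z-score and the tail probability can be tied together explicitly: standardizing 83 F and using the standard normal gives the same answer as passing mean and sd to pnorm(). A sketch with illustrative variable names.
z_temp <- (83 - 77) / 5                                         # 1.2
p_from_z   <- pnorm(z_temp, lower.tail = FALSE)                 # standard normal upper tail
p_from_raw <- pnorm(83, mean = 77, sd = 5, lower.tail = FALSE)  # same area on the raw scale
all.equal(p_from_z, p_from_raw)                                 # TRUE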
#How cool are the coldest 10% of the days (days with lowest average high temperature) during June in LA?
#z = qnorm(.1) = -1.281552
#Solving Z = (x - mean)/SD for x with Z rounded to -1.28: x = (-1.28 * 5) + 77 = 70.6
pnorm(70.6, mean = 77, sd = 5, lower.tail = TRUE)
## [1] 0.1002726
visualize.norm(stat = 70.6, mu = 77, sd = 5, section = "lower")

#The coldest 10% of June days in LA have a high temperature of about 70.6 F or lower. The Z-score for this cutoff is -1.28.
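#The 10th-percentile cutoff can also be computed in one call by giving qnorm() the mean and SD, which avoids rounding the Z-score. A minimal sketch; the variable name is illustrative.
temp10_check <- qnorm(0.1, mean = 77, sd = 5)
temp10_check   # about 70.59 F, matching the hand calculation of 70.6 F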