# The data folder contains the same data in 3 different formats; import any one of them, since the contents are identical.
# The variables in the data set are defined as follows:
# • launch : numbers the temperature-sorted observations from 1 to 23.
# • temp : temperature in degrees Fahrenheit at the time of launch.
# • o_ring_probs : counts the number of O-ring partial failures experienced on the flight.
# 1 Print the measures of center (like mean, median, mode, …), spread (like sd, min, max, …) and shape (skewness, kurtosis, …) for the variables in the data.
# HINT: You can use the describe function in the “psych” package for this.
data <- read.csv('./challenger-2.csv') # Read the data from the CSV file
library(psych)
summary(data)
##      launch          temp         incident          o_ring_probs
##  Min.   : 1.0   Min.   :53.60   Length:23          Min.   :0.0000
##  1st Qu.: 6.5   1st Qu.:66.20   Class :character   1st Qu.:0.0000
##  Median :12.0   Median :69.80   Mode  :character   Median :0.0000
##  Mean   :12.0   Mean   :69.02                      Mean   :0.4348
##  3rd Qu.:17.5   3rd Qu.:74.30                      3rd Qu.:1.0000
##  Max.   :23.0   Max.   :80.60                      Max.   :3.0000
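# summary() reports center and range but not the standard deviation or the shape measures.
# A minimal base-R sketch of moment-based skewness and excess kurtosis follows.
# skew_m() and kurt_m() are hypothetical helper names, and toy_x is an illustrative
# vector, not the Challenger data. Packages such as psych may apply small-sample
# corrections, so their values can differ slightly from these.
skew_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5     # third standardized moment
}
kurt_m <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # fourth standardized moment minus 3 (excess kurtosis)
}
toy_x <- c(1, 2, 3, 4, 5)       # symmetric toy values, so the skewness is 0
c(sd = sd(toy_x), skew = skew_m(toy_x), kurtosis = kurt_m(toy_x))
# With the data loaded, the same helpers could be applied to a numeric column, e.g. skew_m(data$temp).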
#There are four levels of measurement we use: Nominal, Ordinal, Interval, and Ratio.
#Nominal variables refer to data that can be classified into distinct categories without any order or ranking. Examples of nominal variables include gender, nationality, or hair color.
#Ordinal variables arrange data in order, but the intervals between positions are not necessarily equal. Examples include ranking of preferences, class grades like A, B, C, etc., or satisfaction levels.
#Interval variables have equal intervals between values but no true zero, allowing for meaningful addition and subtraction. Typical examples of interval variables include temperature in Celsius and dates.
#Ratio variables have a true zero point, making operations like multiplication and division meaningful. Examples of ratio variables include height, weight, and age.
# The "launch" variable is a categorical variable that signifies the order in which observations were recorded. It is not a continuous or numeric variable but rather serves as a sequence identifier for the observations. While there is a numerical aspect to it, it does not imply a specific measurement scale or equal intervals. As a result, it is considered an ordinal variable.
# The "temp" variable represents the temperature in degrees Fahrenheit at the time of the launch. It is a continuous numeric variable, but 0°F is an arbitrary reference point rather than a true zero (absolute zero is -459.67°F), making it an interval variable. Differences between temperatures are meaningful (addition and subtraction), but it does not make sense to say that one temperature is "twice as hot" as another.
# The "incident" variable is a categorical variable that represents the presence or absence of an incident with O-rings. It has two categories, "Yes" and "No," with no inherent order or meaningful numerical values. Therefore, it is a nominal variable.
# The "o_ring_probs" variable represents the count of O-ring partial failures on a flight. It is a discrete numeric variable with a true zero point (i.e., a flight with zero partial failures indicates the absence of partial failures). You can perform mathematical operations, such as addition, subtraction, multiplication, and division, on this variable, which makes it a ratio variable.
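# One way these measurement levels might be encoded in R, as a sketch on illustrative
# values (toy_incident and toy_grade are hypothetical, not taken from the data set):
toy_incident <- factor(c("Yes", "No", "No"))   # nominal: unordered categories
toy_grade <- factor(c("B", "A", "C"),
                    levels = c("C", "B", "A"),
                    ordered = TRUE)            # ordinal: ranked categories
is.ordered(toy_incident)      # FALSE: no ranking among the categories
is.ordered(toy_grade)         # TRUE: order comparisons are defined
toy_grade[1] < toy_grade[2]   # TRUE, since B ranks below A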
#1.3 Third, provide an appropriate graph for the variable o_ring_probs. Interpret. Boxplot is acceptable, though histogram would be better.
# Plotting Histogram
hist(data$o_ring_probs)
boxplot(temp ~ incident,
        data = data,
        horizontal = TRUE,
        main = "Boxplot of Temp vs Incident",
        xlab = "Temp",
        col = "blue")
# Launches with an incident tended to occur at lower temperatures than launches without one, which suggests that lower temperatures increase the probability of an incident.
which(data$incident == "No")
## [1] 5 6 7 8 9 10 12 14 15 16 17 19 20 21 22 23
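# The first observation with no incident is row 5. Because launch numbers the temperature-sorted observations, this is the 5th launch in temperature order (assuming the rows follow the launch numbering), not necessarily the 5th chronologically.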
sum(data$temp > 65 & data$incident == "Yes")
## [1] 3
# There are 3 incidents above 65 degrees F
# 2.1 What is the probability that an individual is actually a liar given that the polygraph detected him/her as such? Solve using a Bayesian equation. If you are not sure, you can try to solve as with the tree or table method for partial credit.
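# Parameters
liar <- 0.59 # P(polygraph flags a liar | liar)
TT <- 0.90   # P(polygraph clears a truth teller | truth teller)
poly <- 0.20 # P(liar): share of screened individuals who lie
# Bayes' rule: P(liar | flagged) = P(flagged | liar) * P(liar) / P(flagged)
Prob_liar <- (liar * poly) / ((liar * poly) + ((1 - TT) * (1 - poly)))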
round(Prob_liar, digits = 4)
## [1] 0.596
# Rounded to 4 digits, the probability that an individual flagged by the polygraph is actually a liar is about 59.6%.
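# The same Bayes calculation can be wrapped in a small function, as a sketch.
# bayes_posterior() and its argument names are hypothetical; "sensitivity" stands for
# P(flagged | liar) and "specificity" for P(cleared | truth teller).
bayes_posterior <- function(prior, sensitivity, specificity) {
  true_pos <- sensitivity * prior                # liar and flagged
  false_pos <- (1 - specificity) * (1 - prior)   # truth teller but flagged
  true_pos / (true_pos + false_pos)
}
post_liar <- bayes_posterior(prior = 0.20, sensitivity = 0.59, specificity = 0.90)
round(post_liar, digits = 4)   # 0.596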
# Let L be the event that the individual is a liar.
# Let P be the event that the individual is identified as a liar by the polygraph.
# The probability statement we are trying to solve is P(L ∪ P).
# This can be calculated using the formula for the union of two events: P(L ∪ P) = P(L) + P(P) − P(L ∩ P)
# P(L) is the prior probability of being a liar (20% or 0.20).
# P(P) is the total probability of testing positive, regardless of whether the individual is a liar or not.
# P(L ∩ P) is the probability of an individual being a liar and testing positive, which we've previously calculated using sensitivity.
# P(L) = 0.20, P(L ∩ P) = P(Positive | Liar) × P(Liar) = 0.59 × 0.20
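liar <- 0.59 # P(polygraph flags a liar | liar)
TT <- 0.90   # P(polygraph clears a truth teller | truth teller)
poly <- 0.20 # P(liar): share of screened individuals who lie
# P(L ∪ P) = P(L) + P(P and not L): liars, plus truth tellers who are wrongly flagged
Prob_Lie <- ((1 - TT) * (1 - poly)) + poly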
round(Prob_Lie, digits = 4)
## [1] 0.28
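# As a cross-check, the union can also be computed term by term from
# P(L ∪ P) = P(L) + P(P) − P(L ∩ P). This is a sketch; the p_* names are hypothetical.
p_L <- 0.20                                      # P(L)
p_P_given_L <- 0.59                              # P(P | L)
p_P_given_notL <- 1 - 0.90                       # P(P | not L)
p_L_and_P <- p_P_given_L * p_L                   # P(L ∩ P) = 0.118
p_P <- p_L_and_P + p_P_given_notL * (1 - p_L)    # total probability of a positive = 0.198
p_union <- p_L + p_P - p_L_and_P
round(p_union, digits = 4)   # 0.28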
# To find the probability that the MRI machine will fail after 8 years, we first find the probability that it does not fail during the 8 years, which is the probability of zero events in a Poisson distribution:
# P(X = 0) = (e^(−λt) * (λt)^0) / 0!
# With λ = 1/10 per year and t = 8 years, λt = 0.8, so this simplifies to: P(X = 0) = e^(−0.8)
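lambda_t <- (1 / 10) * 8 # rate of 1/10 per year times 8 years
k <- 0 # number of failures
# Probability of no failure during the 8 years; ppois(0, .) = P(X <= 0) = P(X = 0)
prob_0 <- ppois(k, lambda_t)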
round(prob_0, digits = 4)
## [1] 0.4493
#The probability that the MRI machine will not fail within the first 8 years is approximately 44.93%.
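# A short sketch confirming that the closed form e^(−0.8) matches the Poisson
# probability mass at zero, and giving the complement (at least one failure in 8 years).
# The names rate_per_year, years, p_none and p_at_least_one are hypothetical.
rate_per_year <- 1 / 10
years <- 8
p_none <- exp(-rate_per_year * years)                       # e^(−λt)
isTRUE(all.equal(p_none, dpois(0, rate_per_year * years)))  # TRUE
p_at_least_one <- 1 - p_none
round(p_at_least_one, digits = 4)   # 0.5507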
Exp <- lambda_t
Exp
## [1] 0.8
SD <- sqrt(Exp)
round(SD, digits = 4)
## [1] 0.8944
# The expected value (mean number of failures) over this 8-year period is 0.8, and the standard deviation is approximately 0.8944.
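# Binomial model: P(X = 0) = (1 − p)^n = 0.9^8
Prob_F <- 1/10 # probability of a failure in a given year
Prob_NF <- 1 - Prob_F # probability of no failure in a given year
n <- 8 # Years
Prob <- dbinom(0, n, Prob_F) # P(X = 0), which equals Prob_NF^n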
round(Prob, digits = 4)
## [1] 0.4305
#The probability that the MRI machine will not fail within the first 8 years, modeled as a Binomial distribution, is approximately 43.05%
Exp <- n*Prob_F
Exp
## [1] 0.8
SD <- sqrt(Exp * (1 - Prob_F)) # sqrt(n * p * (1 - p))
round(SD, digits = 4)
## [1] 0.8485
# The expected value, which represents the average number of failures, is 0.8. The standard deviation, which measures the variability around this expected value, is approximately 0.8485
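# The two models can be put side by side, as a sketch. Both have mean 0.8, but the
# binomial variance n*p*(1-p) is smaller than the Poisson variance λt.
# The names p_fail_year, n_years, binom_* and pois_* are hypothetical.
p_fail_year <- 0.1
n_years <- 8
binom_none <- (1 - p_fail_year)^n_years       # P(X = 0) under the binomial model
pois_none <- exp(-n_years * p_fail_year)      # P(X = 0) under the Poisson model
binom_sd <- sqrt(n_years * p_fail_year * (1 - p_fail_year))
pois_sd <- sqrt(n_years * p_fail_year)
round(c(binomial = binom_none, poisson = pois_none), 4)   # 0.4305 and 0.4493
round(c(binomial = binom_sd, poisson = pois_sd), 4)       # 0.8485 and 0.8944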
#The probability of getting any one question correct by random guessing is 1/4.
#The probability of getting any one question wrong is 3/4.
Prob_C <- 0.25 # Probability she answers correctly
Prob_W <- 0.75 # Probability she answers incorrectly
# Two wrong answers followed by the first correct answer on the third question
Prob <- (Prob_W^2) * Prob_C
round(Prob,digits = 4)
## [1] 0.1406
# There is a 14.06% chance that the first question Robin gets right is the third question.
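# This is a geometric probability, so it can be cross-checked with dgeom(), whose first
# argument is the number of failures before the first success. A sketch; p_correct and
# p_first_on_third are hypothetical names.
p_correct <- 0.25
p_first_on_third <- dgeom(2, prob = p_correct)   # 2 wrong answers, then 1 correct
isTRUE(all.equal(p_first_on_third, (1 - p_correct)^2 * p_correct))   # TRUE
round(p_first_on_third, digits = 4)   # 0.1406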
#X ~ Binomial(n = 5, p = 1/4)
#Here, n is the number of trials (5 questions), and p is the probability of success (getting one question correct, which is 1/4).
# To find the probability that Robin gets exactly 3 or exactly 4 questions right, we calculate the probabilities for these two scenarios and add them together, since the two events are mutually exclusive.
prob<- dbinom(4,5,0.25)+dbinom(3,5,0.25)
prob
## [1] 0.1025391
# The probability is 0.1025391 that Robin gets exactly 3 or exactly 4 questions right
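# The same sum can be written with a vectorized dbinom() call or as a difference of
# cumulative probabilities, P(X <= 4) − P(X <= 2). A sketch; the p_3_or_4_* names are
# hypothetical.
p_3_or_4_sum <- sum(dbinom(3:4, size = 5, prob = 0.25))
p_3_or_4_cdf <- pbinom(4, size = 5, prob = 0.25) - pbinom(2, size = 5, prob = 0.25)
isTRUE(all.equal(p_3_or_4_sum, p_3_or_4_cdf))   # TRUE
p_3_or_4_sum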
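# P(exactly 2 correct out of 5) from the binomial formula
N <- 5
K <- 2
p <- 0.25 # named p rather than pi to avoid masking R's built-in constant
choose(n = N, k = K) * p^K * (1 - p)^(N - K)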
## [1] 0.2636719
dbinom(x = 2,
       size = 5,
       prob = 0.25)
## [1] 0.2636719
# 5
# 5a Speeding on the Interstate 5 Freeway (I-5) in California. The distribution of passenger vehicle speeds traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.
# 5a1 What percent of passenger vehicles travel slower than 80 miles/hour? Define the random variable X, and write the probability statement.
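# X = speed of a passenger vehicle on I-5 (miles/hour), X ~ N(72.6, 4.78); we want P(X < 80)
speed_cutoff <- 80
mean_ <- 72.6
sd_ <- 4.78
round(pnorm(speed_cutoff, mean_, sd_), digits = 4)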
## [1] 0.9392
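# Equivalently, the speed can be standardized first and looked up on the standard
# normal curve. A sketch; z_80 is a hypothetical name.
z_80 <- (80 - 72.6) / 4.78    # number of standard deviations above the mean
round(z_80, digits = 2)       # 1.55
isTRUE(all.equal(pnorm(z_80), pnorm(80, 72.6, 4.78)))   # TRUE
round(pnorm(z_80), digits = 4)   # 0.9392, i.e. about 93.92% travel slower than 80 miles/hour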
# 5.2 What percent of passenger vehicles travel between 68 and 78 miles/hour? Does this make sense? Justify.
round(pnorm(78,72.6,4.78)-pnorm(68,72.6,4.78),digits = 4)
## [1] 0.7028
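# One way to justify the result, as a sketch: 68 and 78 lie roughly one standard
# deviation on either side of the mean, so the answer should be close to the
# 68% that falls within one standard deviation of a normal mean.
# The names z_lo, z_hi and within_1sd are hypothetical.
z_lo <- (68 - 72.6) / 4.78              # about -0.96
z_hi <- (78 - 72.6) / 4.78              # about 1.13
within_1sd <- pnorm(1) - pnorm(-1)      # about 0.6827
round(c(z_lo = z_lo, z_hi = z_hi, within_1sd = within_1sd), 4)
round(pnorm(z_hi) - pnorm(z_lo), digits = 4)   # 0.7028, slightly above 0.6827 because z_hi > 1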
# Share of passenger vehicles traveling faster than 70 miles/hour: P(X > 70)
1 - pnorm(70, 72.6, 4.78)
## [1] 0.7067562
# 5b N(μ = 4313, σ = 583) for Men, Ages 30 - 34 group.
# 5b1 The cutoff time for the fastest 5% of athletes in the men’s group, i.e. those who took the shortest 5% of time to finish.
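p_fastest <- 0.05 # fastest 5% = lowest 5% of finishing times
mu <- 4313
sigma <- 583
cutoff <- qnorm(p_fastest, mean = mu, sd = sigma) # 5th percentile of finishing time
round(cutoff, digits = 4)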
## [1] 3354.05
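# The cutoff can be checked in two ways, as a sketch: by building it from the standard
# normal quantile (μ + z * σ), and by confirming that 5% of the distribution lies below it.
# The names mu_men, sigma_men and cutoff_5 are hypothetical.
mu_men <- 4313
sigma_men <- 583
cutoff_5 <- mu_men + qnorm(0.05) * sigma_men   # z for the 5th percentile is about -1.645
isTRUE(all.equal(cutoff_5, qnorm(0.05, mu_men, sigma_men)))   # TRUE
pnorm(cutoff_5, mu_men, sigma_men)             # 0.05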
# 10th percentile of N(μ = 5261, σ = 807): the value below which 10% of the distribution falls
qnorm(0.10, 5261, 807)
## [1] 4226.788