In 1986, the Challenger space shuttle exploded during “throttle up” due to the catastrophic failure of the O-rings (seals) around a rocket booster. The (real) data on all space shuttle launches prior to the Challenger disaster are in the file challenger.csv. The data folder contains the same data in 3 different formats; import any of them, since the contents are identical. The variables in the data set are defined as follows:
• launch: numbers the temperature-sorted observations from 1 to 23.
• temp: temperature in degrees Fahrenheit at the time of launch.
• incident: coded “Yes” if there was an incident with an O-ring, and “No” otherwise.
• o_ring_probs: the number of O-ring partial failures experienced on the flight.
Load the data into R or Python and answer the following questions. Include all R code.
setwd("C:/Users/LENOVO/Downloads/Data Analytics/HW 1 Titanic/titanic")
library(psych)
library(readxl)
my_data <- read_excel("challenger.xlsx")
summary(my_data)
## launch temp incident o_ring_probs
## Min. : 1.0 Min. :53.60 Length:23 Min. :0.0000
## 1st Qu.: 6.5 1st Qu.:66.20 Class :character 1st Qu.:0.0000
## Median :12.0 Median :69.80 Mode :character Median :0.0000
## Mean :12.0 Mean :69.02 Mean :0.4348
## 3rd Qu.:17.5 3rd Qu.:74.30 3rd Qu.:1.0000
## Max. :23.0 Max. :80.60 Max. :3.0000
#Finding center, spread and shape using psych package
psych::describe(my_data$launch)
psych::describe(my_data$temp)
psych::describe(my_data$o_ring_probs)
The variable launch is nominal: although its values are numeric, they only serve as identifiers for the observations. (Because the numbering follows the temperature sorting, launch could arguably also be treated as ordinal.)
The variable temp is interval: it is numeric with equal intervals, but it has no true zero, since 0 °F does not mean an absence of temperature.
The variable incident is nominal (a binary Yes/No category).
The variable o_ring_probs is ratio: it is a count, so its values are equally spaced and it has a true zero (0 means no O-ring partial failures on the flight).
hist(my_data$o_ring_probs)
boxplot(temp~incident, data = my_data, horizontal = TRUE, main = "Temp vs Incident", xlab = "Temperature")
Based on this boxplot, launches with an O-ring incident tended to occur at lower temperatures than launches without one. This suggests that cold weather raises the risk of O-ring problems, so launches should be avoided on cold days.
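The boxplot is a visual comparison; printing the group medians puts a number behind that reading. A minimal sketch, assuming a data frame with the `temp` and `incident` columns shown in the summary above; the function name `temp_medians_by_incident` is hypothetical.

```r
# Median launch temperature (deg F) within each incident group.
# Assumes columns `temp` (numeric) and `incident` ("Yes"/"No"), as in my_data.
# temp_medians_by_incident is a hypothetical helper name.
temp_medians_by_incident <- function(df) {
  # tapply() splits df$temp by incident label and applies median() to each group
  tapply(df$temp, df$incident, median, na.rm = TRUE)
}
```

Calling `temp_medians_by_incident(my_data)` returns one median per incident level; a lower median for the "Yes" group is the numeric counterpart of the pattern in the boxplot.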
mintemp <- min(my_data$temp[my_data$incident == "No"])
# Lowest launch temperature among launches with no O-ring incident:
mintemp
## [1] 66.2
Three or fewer incidents occurred at temperatures above 65 degrees F.
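The count above can be computed directly instead of read off a plot. This is a small sketch, assuming the `temp` and `incident` columns described in the problem statement; `count_incidents_above` is a hypothetical helper, and "above" is taken as strictly greater than the threshold.

```r
# Count launches that had an O-ring incident ("Yes") and a launch
# temperature strictly above `threshold` (deg F).
# count_incidents_above is a hypothetical helper name.
count_incidents_above <- function(df, threshold) {
  # Logical AND of the two conditions; sum() counts the TRUE values
  sum(df$incident == "Yes" & df$temp > threshold, na.rm = TRUE)
}
```

With the data loaded above, `count_incidents_above(my_data, 65)` gives the number of incident launches warmer than 65 °F.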
The sensitivity and specificity of the polygraph has been a subject of study and debate for years. A 2001 study of the use of polygraph for screening purposes suggested that the probability of detecting an actual liar was .59 (sensitivity) and that the probability of detecting an actual “truth teller” was .90 (specificity). We estimate that about 20% of individuals selected for the screening polygraph will lie.
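Bayes' rule ties these three numbers together. Writing L for "the individual lies" and D for "the polygraph flags the individual as a liar" (notation introduced here), the false-positive rate is P(D | L^c) = 1 - 0.90 = 0.10, and the posterior probability of lying given a detection is:

```latex
\begin{aligned}
P(L \mid D) &= \frac{P(D \mid L)\,P(L)}{P(D \mid L)\,P(L) + P(D \mid L^{c})\,P(L^{c})} \\
            &= \frac{0.59 \times 0.20}{0.59 \times 0.20 + 0.10 \times 0.80}
             = \frac{0.118}{0.198} \approx 0.596
\end{aligned}
```

The denominator, 0.198, is the overall probability that a randomly selected individual is flagged as a liar.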
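pdLiar = .59                 # sensitivity: P(detected as liar | liar)
pdTruther = .90              # specificity: P(detected as truth teller | truth teller)
pliar = .20                  # prior: P(liar)
pTruther = 1-pliar           # P(truth teller)
pnotTruther = 1- pdTruther   # false-positive rate: P(detected as liar | truth teller)
pnotLiar = 1-pdLiar          # false-negative rate: P(detected as truth teller | liar)
p_a = pliar                  # P(A): the individual is a liar
p_b = pTruther*pnotTruther + pliar*pdLiar   # P(B): the polygraph detects a liar
p_ba = pdLiar                # P(B | A): detection given the individual is a liar
p_ab = p_ba*p_a/p_b          # Bayes' rule: P(A | B)
# Probability that an individual is actually a liar given that the polygraph detected him/her:
p_ab
## [1] 0.5959596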
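# Probability that a randomly selected individual is identified as a liar:
# P(truth teller and flagged) + P(liar and flagged)
p_b = pTruther * pnotTruther   # truth teller wrongly flagged
p_a = pliar * pdLiar           # liar correctly flagged
p_AorB = p_a + p_b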
p_AorB
## [1] 0.198
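A quick Monte Carlo simulation offers an independent check on both polygraph answers. This is a sketch under the same inputs as above (sensitivity 0.59, specificity 0.90, 20% liars); the sample size and seed are arbitrary choices. The first simulated value should be close to 0.198, and the second should be close to the Bayes' rule result 0.118 / 0.198 ≈ 0.596.

```r
# Monte Carlo check of the polygraph calculations
set.seed(1)                 # arbitrary seed, for reproducibility
n_people    <- 200000       # arbitrary simulation size
sensitivity <- 0.59         # P(flagged | liar)
specificity <- 0.90         # P(not flagged | truth teller)
p_lie       <- 0.20         # P(liar)

# Each simulated person is a liar with probability p_lie
is_liar <- runif(n_people) < p_lie
# A liar is flagged with probability = sensitivity; a truth teller is
# (wrongly) flagged with probability = 1 - specificity
flagged <- ifelse(is_liar,
                  runif(n_people) < sensitivity,
                  runif(n_people) < 1 - specificity)

p_flagged_sim         <- mean(flagged)           # approximates P(flagged)
p_liar_given_flag_sim <- mean(is_liar[flagged])  # approximates P(liar | flagged)
print(c(p_flagged_sim, p_liar_given_flag_sim))
```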
Your organization owns an expensive Magnetic Resonance Imaging (MRI) machine. This machine has a manufacturer’s expected lifetime of 10 years, i.e., the machine fails once in 10 years, so the probability of the machine failing in any given year is 1/10.
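# Poisson model: X = number of failures in 8 years, X ~ Poisson(lambda)
# PMF: P(X = k) = exp(-lambda) * lambda^k / k!
# Expected value: E(X) = lambda
# Standard deviation: sd(X) = sqrt(lambda)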
lambda <- 8 / 10   # expected failures in 8 years: 8 years x 1/10 failure per year
# Number of failures of interest (0 failures in 8 years)
k <- 0
# Probability of 0 failures in 8 years using the Poisson distribution
prob_0 <- ppois(k, lambda)
# Expected Value
expected_value <- lambda
# Standard Deviation
standard_deviation <- sqrt(lambda)
# Print the results
#Probability that the machine will fail after 8 years (0 failures in 8 years):
print(prob_0)
## [1] 0.449329
# Standard deviation of failures in 8 years:
print(standard_deviation)
## [1] 0.8944272
# Expected value of failures in 8 years:
print(expected_value)
## [1] 0.8
# Binomial model: each of the n = 8 years fails independently with probability p_f
p_f <- 1/10
p_s <- 1 - p_f
n <- 8
# Probability of 0 failures in 8 years: P(X = 0)
prob_0_failures_in_8_years <- dbinom(0, n, p_f)
# Expected value
expected_value <- n * p_f
# Standard deviation
standard_deviation <- sqrt(n * p_s * p_f)
# Print the results
print(prob_0_failures_in_8_years)
## [1] 0.4304672
#Expected value
print(expected_value)
## [1] 0.8
#sd
print(standard_deviation)
## [1] 0.8485281
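As a sanity check on the two models, the probability of zero failures has a simple closed form in each case: e^(-λ) for the Poisson and (1 - p)^n for the binomial. The sketch below reuses the document's inputs (λ = 0.8, p = 0.1, n = 8; the name `n_years` is introduced here) and compares those formulas against the built-in distribution functions.

```r
# Closed-form checks for "0 failures in 8 years"
lambda  <- 0.8   # expected failures in 8 years (8 * 1/10)
p_fail  <- 0.1   # yearly failure probability
n_years <- 8

poisson_closed  <- exp(-lambda)          # Poisson: P(X = 0) = e^(-lambda)
binomial_closed <- (1 - p_fail)^n_years  # Binomial: P(X = 0) = (1 - p)^n

# The built-in functions should agree with the formulas
print(all.equal(poisson_closed, ppois(0, lambda)))
print(all.equal(binomial_closed, dbinom(0, n_years, p_fail)))
```

The two values differ slightly because the binomial model allows at most one failure per year, while the Poisson model allows any number of failures in the period; with a small yearly failure probability they stay close.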
In a multiple choice quiz there are 5 questions and 4 choices for each question (a, b, c, d). Robin has not studied for the quiz at all, and decides to randomly guess the answers.
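Since each of Robin's 5 guesses is independent with a 1/4 chance of being right, the number of correct answers follows a Binomial(5, 0.25) distribution. Tabulating the whole distribution once makes the individual answers below easy to cross-check. This is a small sketch; the names `quiz_dist` and `probs` are introduced here.

```r
# Full distribution of X = number of correct answers out of 5,
# guessing among 4 choices per question (p = 0.25)
x <- 0:5
probs <- dbinom(x, size = 5, prob = 0.25)
quiz_dist <- data.frame(correct = x, probability = round(probs, 4))
print(quiz_dist)
```

The probabilities sum to 1, and the most likely outcome is a single correct answer.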
pr <- 0.25 ##Probability that the answer is right
pw <- (1-pr) ##Probability that the answer is wrong
# Probability of two wrong answers followed by a right one
(pw^2)*pr
## [1] 0.140625
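The product above has the form of a geometric probability: two failures followed by the first success. Assuming, as the product suggests, that this part asks for the probability that Robin's first correct answer is on question 3, R's `dgeom(x, prob)`, which gives P(x failures before the first success), should reproduce it.

```r
# (1 - p)^2 * p written out by hand vs. the geometric distribution
p_right   <- 0.25
manual    <- (1 - p_right)^2 * p_right
via_dgeom <- dgeom(2, prob = p_right)   # 2 failures before the first success
print(c(manual = manual, dgeom = via_dgeom))
```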
# X = number of correct answers; X ~ Binomial(n = 5, p = 0.25), X = 0, 1, ..., 5
# P(X = 4)
dbinom(4,5,0.25)
## [1] 0.01464844
# P(X = 3)
dbinom(3,5,0.25)
## [1] 0.08789063
# P(X = 3 or X = 4)
sum(dbinom(3:4, 5, 0.25))
## [1] 0.1025391
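# P(X = 2) from the binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
choose(n = 5, k = 2) * 0.25^2 * (1-0.25)^(5-2)
## [1] 0.2636719
N <- 5
K <- 2
p <- 0.25   # named p rather than pi, so R's built-in constant pi is not overwritten
choose(n = N, k = K) * p^K * (1-p)^(N-K)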
## [1] 0.2636719
# dbinom() gives the same answer as the formula above
dbinom(x = 2,
size = 5,
prob = 0.25
)
## [1] 0.2636719
# P(gets the majority of the questions right) = P(X >= 3) = 1 - P(X <= 2)
1-pbinom(2,5,0.25)
## [1] 0.1035156
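The same upper-tail probability can be obtained in three equivalent ways. This sketch shows them side by side; `lower.tail = FALSE` asks `pbinom()` for P(X > q) directly, which avoids the subtraction and is more accurate when the tail is tiny.

```r
# Three equivalent ways to get P(X >= 3) for X ~ Binomial(5, 0.25)
p_complement <- 1 - pbinom(2, size = 5, prob = 0.25)                 # 1 - P(X <= 2)
p_upper_tail <- pbinom(2, size = 5, prob = 0.25, lower.tail = FALSE) # P(X > 2)
p_direct_sum <- sum(dbinom(3:5, size = 5, prob = 0.25))              # P(3) + P(4) + P(5)
print(c(p_complement, p_upper_tail, p_direct_sum))
```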
Question 5: Normal Distribution (Week 3)
The distribution of passenger vehicle speeds traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.
a1. What percent of passenger vehicles travel slower than 80 miles/hour? Define the random variable X, and write the probability statement.
# X = speed of a randomly selected passenger vehicle (mph); X is normal with mean 72.6 and sd 4.78
x = 80
mu = 72.6
sigma = 4.78
pnorm(x, mean = mu, sd = sigma)   # P(X < 80)
## [1] 0.939203
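Let X be the speed of a randomly selected passenger vehicle on I-5, so X is normally distributed with mean 72.6 mph and standard deviation 4.78 mph. Then P(X < 80) ≈ 0.9392: approximately 93.92% of passenger vehicles on I-5 travel slower than 80 mph.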
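The same probability can be found by standardizing, which is how it would be read from a z-table. This sketch computes z = (x - μ) / σ and applies the standard normal CDF; it should match the direct `pnorm()` call above.

```r
# Standardize: z = (x - mu) / sigma, then use the standard normal CDF
mu    <- 72.6
sigma <- 4.78
z <- (80 - mu) / sigma
p_standardized <- pnorm(z)                         # P(Z < z)
p_direct       <- pnorm(80, mean = mu, sd = sigma) # P(X < 80)
print(c(z = z, standardized = p_standardized, direct = p_direct))
```

A vehicle at 80 mph is about 1.55 standard deviations above the mean speed.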
pnorm(80, mean = 72.6, sd = 4.78)
## [1] 0.939203
pnorm(60, mean = 72.6, sd = 4.78)
## [1] 0.004194693
r = pnorm(80, mean = 72.6, sd = 4.78) -
pnorm(60, mean = 72.6, sd = 4.78)
round(r, 4)
## [1] 0.935
Therefore, approximately 93.5% of passenger vehicles on I-5 travel between 60 mph and 80 mph.
1 - pnorm(70, mean = 72.6, sd = 4.78)
## [1] 0.7067562
Therefore, approximately 70.68% of passenger vehicles on I-5 travel above the 70 mph speed limit.
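A simulation offers a quick plausibility check on the exact answer. This is a sketch assuming, as the code above does, a 70 mph speed limit; the sample size and seed are arbitrary. The simulated share of speeds above 70 mph should land close to the exact upper-tail probability.

```r
# Simulate vehicle speeds and compare the share above 70 mph with the exact value
set.seed(42)                                    # arbitrary seed
speeds <- rnorm(100000, mean = 72.6, sd = 4.78) # arbitrary sample size
sim_above_70   <- mean(speeds > 70)
exact_above_70 <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)  # P(X > 70)
print(c(simulated = sim_above_70, exact = exact_above_70))
```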
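# qnorm() returns x with P(X < x) = 0.05 (the lower 5th percentile);
# if the upper 5%, P(X > x) = 0.05, is wanted instead, add lower.tail = FALSE
p = 0.05
mu = 4313
sigma = 583
qnorm(p, mean = mu, sd = sigma)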
## [1] 3354.05
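`qnorm()` inverts `pnorm()`: the quantile is x = μ + zσ, where z is the standard normal quantile for the same lower-tail probability. This sketch reproduces the answer by hand and confirms it with a round trip through `pnorm()`.

```r
# Quantile by hand (x = mu + z * sigma) vs. qnorm(), plus a round-trip check
mu    <- 4313
sigma <- 583
z        <- qnorm(0.05)                       # standard normal 5th percentile
x_manual <- mu + z * sigma
x_qnorm  <- qnorm(0.05, mean = mu, sd = sigma)
# Round trip: the lower-tail area at x should be 0.05 again
round_trip <- pnorm(x_qnorm, mean = mu, sd = sigma)
print(c(manual = x_manual, qnorm = x_qnorm, area = round_trip))
```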
# P(X > x) = 0.1  =>  P(X <= x) = 1 - 0.1 = 0.9
p = 0.9
mu = 5621
sigma = 807
qnorm(p, mean = mu, sd = sigma)
## [1] 6655.212
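Instead of converting P(X > x) = 0.1 into a lower-tail probability by hand, `qnorm()` can work with the upper tail directly through `lower.tail = FALSE`. This sketch checks that both formulations give the same cutoff, and that the area above it is 0.1.

```r
# Lower-tail and upper-tail formulations of the same quantile
mu    <- 5621
sigma <- 807
x_lower_form <- qnorm(0.9, mean = mu, sd = sigma)                      # P(X <= x) = 0.9
x_upper_form <- qnorm(0.1, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > x) = 0.1
upper_area   <- pnorm(x_upper_form, mean = mu, sd = sigma, lower.tail = FALSE)
print(c(lower_form = x_lower_form, upper_form = x_upper_form, upper_area = upper_area))
```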