Random Variables

Binomial

A Binomial distribution describes the number of successes in a fixed number n of independent Bernoulli trials, each with the same probability of success p.

\[ X = Bin(n,p) \\ n = \textrm{number of trials} \\ p = \textrm{probability of success} \\ P(x) = \binom{n}{x} p^x (1-p)^{n-x} \\ E[X] = np \\ V[X] = np(1-p) \\ Desv[X] = \sqrt{np(1-p)} \\ \]

A company called Birrus Mac has deployed a malicious worm on the net. The effect of that malware on the file system is unknown. Suppose any PC can be infected. The probability of infection for any local PC is 0.595, and the probability that the worm does not corrupt a file is 0.385.
What do you think will happen if your organization has 11 local PCs and the average number of files on each is 22? Treat each situation as a separate problem.

Could you please solve the following questions:
1. Simulate the infection of local PC using 50000 experiments and show a table with the estimated PMF and CDF. Compare with the theoretical results.

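library(data.table)  # provides data.table()

nPC <- 50000            # number of simulated experiments
size <- 11              # local PCs per experiment
probInfection <- 0.595
infectionSim <- rbinom(nPC, size, probInfection)
# Theoretical PMF and CDF evaluated at each simulated count
# (one row per draw, not relative-frequency estimates)
PMF <- dbinom(infectionSim, size, probInfection)
CDF <- pbinom(infectionSim, size, probInfection)
dTable <- data.table(PMF=PMF,CDF=CDF)
dTable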

##               PMF        CDF
##     1: 0.22336694 0.48101866
##     2: 0.22336694 0.48101866
##     3: 0.23439740 0.71541606
##     4: 0.02515794 0.03169119
##     5: 0.23439740 0.71541606
##    ---                      
## 49996: 0.15203968 0.25765173
## 49997: 0.17218081 0.88759687
## 49998: 0.07392085 0.10561205
## 49999: 0.22336694 0.48101866
## 50000: 0.17218081 0.88759687
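
The table above evaluates the theoretical PMF and CDF at every simulated draw. An estimated PMF and CDF would instead come from relative frequencies of the simulated counts. The following is a minimal sketch of that comparison in base R; the names `draws`, `support` and `comparison` and the seed are illustrative assumptions, not part of the original solution.

```r
# Estimate the PMF/CDF of the number of infected PCs from simulated draws
set.seed(1)                      # arbitrary seed, only for reproducibility
nSim <- 50000
size <- 11
probInfection <- 0.595

draws <- rbinom(nSim, size, probInfection)
support <- 0:size

# Relative frequency of each value in the support (zero counts included)
pmfEst <- as.numeric(table(factor(draws, levels = support))) / nSim
cdfEst <- cumsum(pmfEst)

comparison <- data.frame(
  x       = support,
  pmfEst  = pmfEst,
  pmfTheo = dbinom(support, size, probInfection),
  cdfEst  = cdfEst,
  cdfTheo = pbinom(support, size, probInfection)
)
print(round(comparison, 4))
```

With 50000 draws, the estimated and theoretical columns should agree to roughly two decimal places.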

Compute the Expected value, the Variance and the Deviation of the number of corrupted files. Use at least two different methods

nFiles <- 22 
probCorruption <- 1 - 0.385

#Expected value
Eproblem2 <- nFiles*probCorruption
vectortest <- array(1:nFiles)
for(i in 1:nFiles){
  vectortest[i] <- dbinom(i,nFiles, probCorruption)
}
arrayValues <- array(1:nFiles)
Eproblem22<-weighted.mean(arrayValues, vectortest)
Eproblem22

## [1] 13.53

#Variance
Vproblem2 <- nFiles*probCorruption*(1-probCorruption)
vectortest2 <- array(1:nFiles)
Vproblem22 <- 0
for(j in 1:nFiles){
  Vproblem22 <- Vproblem22 + (((arrayValues[j] - Eproblem22)^2)*vectortest[j])
}
Vproblem22

## [1] 5.20905

#Standard Deviation
Desvproblem2 <- sqrt(nFiles*probCorruption*(1-probCorruption))
Desvproblem22 <- sqrt(Vproblem22)
Desvproblem22

## [1] 2.282334

What is the probability that at least one file is corrupted?

prob1fileCorrupted <- pbinom(1,nFiles,probCorruption,lower.tail = FALSE) + dbinom(1,nFiles,probCorruption)
prob1fileCorrupted

## [1] 1

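What is the probability that at least 6 files are not corrupted?

# P(at least 6 not corrupted) = P(Y >= 6) with Y ~ Bin(nFiles, 1 - probCorruption),
# i.e. pbinom(5, nFiles, 1 - probCorruption, lower.tail = FALSE).
# Caution: the dbinom() term below uses probCorruption, so it adds
# P(6 corrupted) rather than P(6 not corrupted).
prob6fileNotCorrupted <- pbinom(6,nFiles,1-probCorruption, lower.tail = FALSE) + dbinom(6,nFiles,probCorruption)
prob6fileNotCorrupted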

## [1] 0.8054039

Show the cdf of the number of corrupted files

corruptionSim <- rbinom(nPC, nFiles, probCorruption)
CDFCorrupt <- pbinom(corruptionSim, nFiles, probCorruption)
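
The two lines above compute CDF values at simulated points but do not display anything. One way to show the CDF is to evaluate it on the whole support and draw it as a step function. This is a sketch under that assumption; `xFiles` and `cdfCorrupt` are illustrative names.

```r
# CDF of the number of corrupted files, evaluated on the full support 0..22
nFiles <- 22
probCorruption <- 1 - 0.385

xFiles <- 0:nFiles
cdfCorrupt <- pbinom(xFiles, nFiles, probCorruption)

# Step plot, since the CDF of a discrete variable is piecewise constant
plot(xFiles, cdfCorrupt, type = "s",
     xlab = "Corrupted files", ylab = "CDF",
     main = "CDF of the number of corrupted files")
```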

What is the probability that all the computers are infected?

probAllPCInfected <- dbinom(size,size,probInfection)
probAllPCInfected

## [1] 0.003308923

What is the probability that at least 4 PC are not infected ?

prob4fileNotInfected <- pbinom(4,size,1-probInfection, lower.tail = FALSE) + dbinom(4,size,1-probInfection)
prob4fileNotInfected

## [1] 0.7154161

What is the probability that no file is corrupted?

prob0FileCorrupted <- dbinom(0,nFiles,probCorruption)
prob0FileCorrupted

## [1] 7.588152e-10

Geometric Distribution

A geometric distribution is defined as the number of trials until the first success is observed. In other words, it is the number of Bernoulli experiments needed to obtain the first successful outcome.

\[ X = Geom(p) \\ p = \textrm{probability of success} \\ P(x) = p (1-p)^{x-1} \textrm{ ; } x\geq 1 \\ E[X] = \frac{1}{p} \\ V[X] = \frac{1-p}{p^2} \\ Desv[X] = \sqrt{\frac{1-p}{p^2}} \\ \]
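
The formula above counts trials (x ≥ 1), whereas R's `dgeom`, `pgeom` and `rgeom` count the failures before the first success (starting at 0). A short check of the shift by one, with `p` and `attempts` chosen only for illustration:

```r
# R's geometric functions count failures before the first success,
# so "x attempts" corresponds to "x - 1 failures"
p <- 0.595
attempts <- 1:6

pmfFormula <- p * (1 - p)^(attempts - 1)   # P(x) from the definition above
pmfR       <- dgeom(attempts - 1, p)       # same values via dgeom

print(all.equal(pmfFormula, pmfR))
```

The same shift applies to `pgeom` (use `attempts - 1`) and to `rgeom` (add 1 to each draw to get attempts).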

A group of coders is preparing to get a position at Google. They already know that no bugs are permitted at all. They can code in Python, which Google supports, or in Javi (a new Java implementation no one likes). What do you think will happen if the probability of making a bug when coding in Python is 0.595 and the probability of writing good code in Javi is 0.562?

You are coding different methods, programs and functions. Could you please solve the following questions, given that you need to write a good method or program in order to start interviews with Google:
1. Simulate the 1st Bug code (Python) using 5000 experiments and show visually the estimated PMF and CDF. Compare with the theoretical results.

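nExperiments <- 5000 
probBugPython <- 0.595   # probability of a bug in a Python attempt
probBugJavi <- 0.562     # per the statement, 0.562 is the probability of GOOD code in Javi
# rgeom() returns the number of failures before the first success,
# so the number of attempts is bugSimPy + 1
bugSimPy <- rgeom(nExperiments, probBugPython)
# Theoretical PMF and CDF evaluated at each simulated value
PMFgeom <- dgeom(bugSimPy,probBugPython)
CDFgeom <- pgeom(bugSimPy,probBugPython)
dTable1Py <- data.table(PMF=PMFgeom,CDF=CDFgeom)
dTable1Py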

##              PMF       CDF
##    1: 0.24097500 0.8359750
##    2: 0.24097500 0.8359750
##    3: 0.59500000 0.5950000
##    4: 0.24097500 0.8359750
##    5: 0.59500000 0.5950000
##   ---                     
## 4996: 0.59500000 0.5950000
## 4997: 0.00648324 0.9955870
## 4998: 0.01600800 0.9891038
## 4999: 0.24097500 0.8359750
## 5000: 0.59500000 0.5950000
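
The question asks for a visual comparison, while the table above lists theoretical values per draw. A possible sketch of the visual version follows: relative frequencies of the simulated number of attempts are plotted next to the theoretical PMF, then both CDFs as step functions. The seed and the names `attemptsSim`, `pmfEst` and `pmfTheo` are assumptions made for this example.

```r
# Estimated vs theoretical PMF/CDF of attempts until the first Python bug
set.seed(1)                       # arbitrary seed, only for reproducibility
nExperiments <- 5000
probBugPython <- 0.595

# rgeom() counts failures before the first success; add 1 to get attempts
attemptsSim <- rgeom(nExperiments, probBugPython) + 1
support <- 1:max(attemptsSim)

pmfEst  <- as.numeric(table(factor(attemptsSim, levels = support))) / nExperiments
pmfTheo <- dgeom(support - 1, probBugPython)

# Side-by-side bars: estimated vs theoretical PMF
barplot(rbind(pmfEst, pmfTheo), beside = TRUE, names.arg = support,
        legend.text = c("Estimated", "Theoretical"),
        xlab = "Attempts", ylab = "PMF")

# Step functions: estimated (solid) vs theoretical (dashed) CDF
plot(support, cumsum(pmfEst), type = "s", xlab = "Attempts", ylab = "CDF")
lines(support, pgeom(support - 1, probBugPython), type = "s", lty = 2)
```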

2. Simulate the 1st Bug code (Python) using 1000 experiments and show a table with the estimated PMF and CDF. Compare with the theoretical results.

nExperiments2 <- 1000 
bugSimPy2 <- rgeom(nExperiments2, probBugPython)
PMFgeom2 <- dgeom(bugSimPy2,probBugPython)
CDFgeom2 <- pgeom(bugSimPy2,probBugPython)
dTable1Py2 <- data.table(PMF=PMFgeom2,CDF=CDFgeom2)
dTable1Py2

##              PMF       CDF
##    1: 0.09759488 0.9335699
##    2: 0.59500000 0.5950000
##    3: 0.59500000 0.5950000
##    4: 0.59500000 0.5950000
##    5: 0.59500000 0.5950000
##   ---                     
##  996: 0.24097500 0.8359750
##  997: 0.59500000 0.5950000
##  998: 0.03952592 0.9730958
##  999: 0.59500000 0.5950000
## 1000: 0.59500000 0.5950000

Show the cdf of the number of attempts up to and including the first No Bug code (Python).

CDFgeom2 <- pgeom(bugSimPy,1-probBugPython)

Show the cdf of the number of attempts up to and including the first Bug code (Javi).

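# probBugJavi holds 0.562, the probability of good code in Javi,
# so a bug in Javi has probability 1 - probBugJavi
bugSimJavi <- rgeom(nExperiments, 1 - probBugJavi)
CDFgeom3 <- pgeom(bugSimJavi, 1 - probBugJavi)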

Simulate the 1st No Bug code (Python) using 5000 experiments and show visually the estimated PMF and CDF. Compare with the theoretical results.

# Reuses the draws in bugSimPy; the PMF and CDF are evaluated with
# success probability 1 - probBugPython (a bug-free attempt)
PMFgeomNoBugs <- dgeom(bugSimPy,1-probBugPython)
CDFgeom2NoBugs <- pgeom(bugSimPy,1-probBugPython)
dTable1PyNoBugs <- data.table(PMF=PMFgeomNoBugs,CDF=CDFgeom2NoBugs)
dTable1PyNoBugs

##              PMF       CDF
##    1: 0.24097500 0.6459750
##    2: 0.24097500 0.6459750
##    3: 0.40500000 0.4050000
##    4: 0.24097500 0.6459750
##    5: 0.40500000 0.4050000
##   ---                     
## 4996: 0.40500000 0.4050000
## 4997: 0.03020229 0.9556287
## 4998: 0.05076015 0.9254264
## 4999: 0.24097500 0.6459750
## 5000: 0.40500000 0.4050000

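What is the probability that exactly 3 attempts are needed to get a program with no bugs (Javi)?

# probBugJavi holds 0.562, the probability of good code in Javi (the success here).
# dgeom() counts failures before the first success, so 3 attempts means 2 failures.
prob3Attemps <- dgeom(2,probBugJavi)
prob3Attemps

## [1] 0.1078163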

What is the probability that at least 2 attempts are needed to get a program with no bugs (Javi)?

# At least 2 attempts means at least 1 failure before the first success: P(failures > 0)
prob2Attemps <- pgeom(0,probBugJavi,lower.tail = FALSE)
prob2Attemps

## [1] 0.438

Plot the pmf of the number of attempts up to and including the first Bug code (Python)

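# Plot the PMF against the number of attempts (dgeom counts failures, hence attempts - 1)
attemptsPy <- 1:10
plot(attemptsPy, dgeom(attemptsPy - 1, probBugPython), type = "h", lwd = 2, col = "grey", xlab = "Attempts up to and including the first bug (Python)", ylab = "PMF", main = "PMF of Python 1st Bug")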

HyperGeometric Distribution

A Hypergeometric distribution is defined as the number of successes in a sample of size n drawn without replacement from a population of N elements, k of which are successes.

\[ X = HyperGeom(N,k,n) \\ N = \textrm{Total number of elements} \\ k = \textrm{successful elements} \\ n = \textrm{sample size} \\ \\ P(x) = \frac{ \binom{k}{x} \binom{N-k}{n-x} } {\binom{N}{n}} \\ \\ E[X] = \frac{nk}{N} \\ V[X] = \frac{k(N-k)n(N-n)}{N^2(N-1)} \\ Desv[X] = \sqrt{\frac{k(N-k)n(N-n)}{N^2(N-1)}} \\ \]

A big company has two main departments: the Engineers Department and the Economists Department. Between both departments there are 79 employees who can be promoted to the Board of Directors (BOD). However, only 7 positions are available, and 33 of the employees are Engineers.

Can you try to solve these questions under the assumption that all employees are equally likely to join the BOD?

Simulate the number of Engineers on the BOD using 2e+05 experiments and show a table with the estimated PMF and CDF. Compare with the theoretical results.

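NTotalEmployees <- 79
kPositions <- 7       # sample size (seats on the BOD)
nEngineers <- 33      # successes in the population
nEconomists <- NTotalEmployees - nEngineers 
experiments <- 200000
# rhyper(nn, m, n, k) expects m = successes in the population, n = failures and k = draws,
# i.e. rhyper(experiments, nEngineers, nEconomists, kPositions) for this problem.
# Caution: the calls below pass (NTotalEmployees, kPositions, nEngineers) instead,
# so the table does not describe the number of Engineers on the BOD.
BODEngSim <- rhyper(experiments,NTotalEmployees,kPositions,nEngineers)
PMFhyper <- dhyper(BODEngSim,NTotalEmployees,kPositions,nEngineers)
CDFhyper <- phyper(BODEngSim,NTotalEmployees,kPositions,nEngineers)
dTableBODEngSim <- data.table(PMF= PMFhyper, CDF = CDFhyper)
dTableBODEngSim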

##                PMF        CDF
##      1: 0.28199089 0.83031717
##      2: 0.28199089 0.83031717
##      3: 0.02868738 1.00000000
##      4: 0.29733733 0.54832629
##      5: 0.06086670 0.07258656
##     ---                      
## 199996: 0.29733733 0.54832629
## 199997: 0.29733733 0.54832629
## 199998: 0.28199089 0.83031717
## 199999: 0.29733733 0.54832629
## 200000: 0.14099544 0.97131262

What is the probability that exactly 3 are Economists?

prob3Economists <- dhyper(3,nEconomists,nEngineers,kPositions)
prob3Economists

## [1] 0.2142871

What is the probability that fewer than 3 are Economists?

# "Fewer than 3" means at most 2, so the CDF is evaluated at 2
prob3lessEconomists <- phyper(2,nEconomists,nEngineers,kPositions)
prob3lessEconomists

## [1] 0.1037904

Compute the Expected value, the Variance and the Deviation of the number of Engineers on BOD. Use at least two different methods

ExpectedEngineers <- (nEngineers*kPositions)/NTotalEmployees
ExpectedEngineers

## [1] 2.924051

VariancEngineers <- (kPositions*(NTotalEmployees-kPositions)*nEngineers*(NTotalEmployees-nEngineers))/((NTotalEmployees^2)*(NTotalEmployees-1))
VariancEngineers

## [1] 1.571642

DesvEngineers <- sqrt(VariancEngineers)
DesvEngineers

## [1] 1.253651
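
The results above use the closed-form formulas only. A second method, sketched here, sums over the support with `dhyper`, and a third uses simulation. The argument order follows `rhyper(nn, m, n, k)` with m = Engineers, n = Economists and k = seats; the seed and the names `meanSum`, `varSum` and `sim` are illustrative assumptions.

```r
# Moments of the number of Engineers on the BOD by two further methods
nEngineers  <- 33
nEconomists <- 79 - 33
kPositions  <- 7

# Method 2: sum over the support using the theoretical PMF
x   <- 0:kPositions
pmf <- dhyper(x, nEngineers, nEconomists, kPositions)
meanSum <- sum(x * pmf)
varSum  <- sum((x - meanSum)^2 * pmf)
sdSum   <- sqrt(varSum)

# Method 3: simulation (arbitrary seed, only for reproducibility)
set.seed(1)
sim <- rhyper(200000, nEngineers, nEconomists, kPositions)

print(c(mean = meanSum, var = varSum, sd = sdSum))
print(c(mean = mean(sim), var = var(sim), sd = sd(sim)))
```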

What is the probability that more than 3 are Engineers?

prob3Engineers <- phyper(3,nEngineers,nEconomists,kPositions, lower.tail = FALSE)
prob3Engineers

## [1] 0.3180776

What is the probability that fewer than 4 are Engineers?

# "Fewer than 4" means at most 3, so the CDF is evaluated at 3
prob4lessEngineers <- phyper(3,nEngineers,nEconomists,kPositions)
prob4lessEngineers

## [1] 0.6819224

What is the probability that at least one of them is an Economist?

prob1Economists <- phyper(1,nEconomists,nEngineers,kPositions, lower.tail = FALSE) +dhyper(1,nEconomists,nEngineers,kPositions)
prob1Economists

## [1] 0.9985262

What is the probability that there is no Engineer on the Board of Directors?

prob0Engineers <- dhyper(0,nEngineers,nEconomists,kPositions)
prob0Engineers

## [1] 0.01846472

Poisson Distribution

A Poisson distribution models the number of events that occur within a fixed period of time, assuming the events occur independently at a constant average rate.

\[ X = Pois(\lambda) \\ \lambda = \textrm{average number of events per interval} \\ x = \textrm{number of events} \\ \\ P(x) = e^{-\lambda}\frac{\lambda^x}{x!} \\ \\ E[X] = \lambda \\ V[X] = \lambda \\ Desv[X] = \sqrt{\lambda} \\ \]

Poisson Approximation of the Binomial Distribution

When the number of trials n is large and the probability of success p is small, the Poisson distribution with \(\lambda = np\) can be used to approximate Binomial probabilities effectively.
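
As a quick illustration of the approximation, the sketch below compares both PMFs for a hypothetical case with n = 1000 and p = 0.003 (values chosen only for this example, not taken from the exercises).

```r
# Compare Binomial(n, p) with Poisson(lambda = n * p) for large n, small p
n <- 1000        # hypothetical number of trials
p <- 0.003       # hypothetical success probability
lambda <- n * p

x <- 0:10
approxTable <- data.frame(
  x     = x,
  binom = dbinom(x, n, p),
  pois  = dpois(x, lambda)
)
maxErr <- max(abs(approxTable$binom - approxTable$pois))

print(round(approxTable, 5))
print(maxErr)
```

The largest pointwise difference shrinks as n grows with np held fixed.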

App Entertainment has uploaded a new app to Google Play and the Apple Store. The expected number of paid downloads per hour is 67.

Solve the following questions:

Plot the pmf of the number of downloads per minute.

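lambdaHour <- 67
lambdaMin <- 67/60
# Evaluate the PMF on 0:10, starting at 0 downloads;
# the probability beyond 10 downloads per minute is negligible
xDownloads <- 0:10
PMFpois <- dpois(xDownloads,lambdaMin)
plot(xDownloads,PMFpois,type = "h",main = "PMF of the number of downloads per minute", xlab = "Downloads",ylab ="PMF")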

2. What is the probability that we have between 58 and 59 downloads per hour.

prob5859 <- dpois(59,lambdaHour) + dpois(58,lambdaHour)
prob5859

## [1] 0.05928642

What is the probability that there are more than 20 downloads per hour?

probMore20 <- ppois(20,lambdaHour, lower.tail = FALSE)
probMore20

## [1] 1

Simulate the number of downloads per minute using 5000 experiments and show a table with the estimated PMF and CDF. Compare with the theoretical results.

nExperiments2 <- 5000 
DownPerMinutSim <- rpois(nExperiments2, lambdaMin)
# Theoretical PMF and CDF evaluated at each simulated count
PMFpois<- dpois(DownPerMinutSim,lambdaMin)
CDFpois <- ppois(DownPerMinutSim,lambdaMin)
dTable1Pois <- data.table(PMF=PMFpois,CDF=CDFpois)
dTable1Pois

##              PMF       CDF
##    1: 0.32736921 0.3273692
##    2: 0.36556228 0.6929315
##    3: 0.36556228 0.6929315
##    4: 0.32736921 0.3273692
##    5: 0.32736921 0.3273692
##   ---                     
## 4996: 0.07597264 0.9730097
## 4997: 0.32736921 0.3273692
## 4998: 0.36556228 0.6929315
## 4999: 0.32736921 0.3273692
## 5000: 0.32736921 0.3273692

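What is the probability that we have between 34 and 81 downloads per minute?

# As written, this evaluates P(X <= 81) - P(X >= 58), which is not P(34 <= X <= 81).
# The interval probability is
#   ppois(33, lambdaMin, lower.tail = FALSE) - ppois(81, lambdaMin, lower.tail = FALSE),
# which is vanishingly small for lambdaMin = 67/60.
prob3481 <- ppois(81,lambdaMin) - (ppois(58,lambdaMin,lower.tail = FALSE )+dpois(58,lambdaMin))
prob3481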

## [1] 1

Compute the expected value, the variance and the standard deviation of the number of downloads per hour. Use at least two different methods.

# The question is per hour, so the rate is lambdaHour
Epoisson <- lambdaHour
Epoisson

## [1] 67

Vpoisson <- lambdaHour
Vpoisson

## [1] 67

Desv <- sqrt(lambdaHour)
Desv

## [1] 8.185353

#The Poisson support is unbounded, so a second method has to truncate the sum over x at a large value or rely on simulation.
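
A second method can be sketched by summing over a truncated support. The cut-off of 300 is an assumption made here; for a rate of 67 per hour the probability mass beyond it is negligible.

```r
# Moments of the hourly downloads by summing over a truncated support
lambdaHour <- 67
x   <- 0:300                      # truncation point chosen for illustration
pmf <- dpois(x, lambdaHour)

meanSum <- sum(x * pmf)
varSum  <- sum((x - meanSum)^2 * pmf)
sdSum   <- sqrt(varSum)

print(c(mean = meanSum, var = varSum, sd = sdSum))
```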

What is the probability that there are exactly 64 downloads per minute?

prob64 <- dpois(64, lambdaMin)
prob64

## [1] 3.011121e-87

What is the probability that there are fewer than 23 downloads per hour?

prob23 <- ppois(23, lambdaHour) - dpois(23, lambdaHour)
prob23

## [1] 1.561691e-10

Joint Distribution

The table below gives the joint pmf of X1 (rows a to c) and Y1 (columns A to D).
	A	B	C	D
a	0.11	0.11	0.10	0.10
b	0.09	0.11	0.08	0.09
c	0.06	0.05	0.05	0.05

You should compute:

The marginal distribution of X1

# Marginal of X1: sum each row of the joint table
marginalDistribX1a <- 0.11 + 0.11 + 0.10 + 0.10
marginalDistribX1a

## [1] 0.42

marginalDistribX1b <- 0.09 + 0.11 + 0.08 + 0.09
marginalDistribX1b

## [1] 0.37

marginalDistribX1c <- 0.06 + 0.05 + 0.05 + 0.05
marginalDistribX1c

## [1] 0.21

The marginal distribution of Y1

# Marginal of Y1: sum each column of the joint table
marginalDistribY1A <- 0.11 + 0.09 + 0.06
marginalDistribY1A

## [1] 0.26

marginalDistribY1B <- 0.11 + 0.11 + 0.05
marginalDistribY1B

## [1] 0.27

marginalDistribY1C <- 0.10 + 0.08 + 0.05
marginalDistribY1C

## [1] 0.23

marginalDistribY1D <- 0.10 + 0.09 + 0.05
marginalDistribY1D

## [1] 0.24
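
The marginals can also be obtained in one step by storing the joint table as a matrix. This sketch assumes, as the row and column sums above do, that rows correspond to X1 and columns to Y1; `jointPMF` is an illustrative name.

```r
# Joint PMF from the table above: rows = X1 (a, b, c), columns = Y1 (A, B, C, D)
jointPMF <- matrix(
  c(0.11, 0.11, 0.10, 0.10,
    0.09, 0.11, 0.08, 0.09,
    0.06, 0.05, 0.05, 0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(X1 = c("a", "b", "c"), Y1 = c("A", "B", "C", "D"))
)

marginalX1 <- rowSums(jointPMF)   # sum over Y1 for each value of X1
marginalY1 <- colSums(jointPMF)   # sum over X1 for each value of Y1

print(marginalX1)
print(marginalY1)
```

Both marginals sum to 1, which is a useful sanity check on the table.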

The Expected value of X1

Expeca <- 0.10*contA+0.10*contB+0.10*contC+0.11*contD
Expecb <- 0.10*contA+0.06*contB+0.10*contC+0.09*contD
Expecc <- 0.06*contA+0.06*contB+0.05*contC+0.06*contD
ExpecX1Total <- Expeca+Expecb+Expecc
ExpecX1Total

## [1] 246.76

The Expected value of Y1

ExpecA <- 0.10*conta+0.10*contb+0.06*contc
ExpecB <- 0.10*conta+0.06*contb+0.06*contc
ExpecC <- 0.10*conta+0.10*contb+0.05*contc
ExpecD <- 0.11*conta+0.09*contb+0.06*contc
ExpecY1Total <- ExpecA+ExpecB+ExpecC+ExpecD
ExpecY1Total

## [1] 349.34

The Variance of X1

Vara <- (((contA-Expeca)^2)*0.10)+(((contB-Expeca)^2)*0.10)+(((contC-Expeca)^2)*0.10)+(((contD-Expeca)^2)*0.11)
Varb <- (((contA-Expecb)^2)*0.10)+(((contB-Expecb)^2)*0.06)+(((contC-Expecb)^2)*0.10)+(((contD-Expecb)^2)*0.09)
Varc <- (((contA-Expecc)^2)*0.06)+(((contB-Expecc)^2)*0.06)+(((contC-Expecc)^2)*0.05)+(((contD-Expecc)^2)*0.06)
VarX1Total <- Vara+Varb+Varc
VarX1Total

## [1] 26805.88

The Variance of Y1

VarA <- (((conta-ExpecA)^2)*0.10)+(((contb-ExpecA)^2)*0.10)+(((contc-ExpecA)^2)*0.06)
VarB <- (((conta-ExpecB)^2)*0.10)+(((contb-ExpecB)^2)*0.06)+(((contc-ExpecB)^2)*0.06)
VarC <- (((conta-ExpecC)^2)*0.10)+(((contb-ExpecC)^2)*0.10)+(((contc-ExpecC)^2)*0.05)
VarD <- (((conta-ExpecD)^2)*0.11)+(((contb-ExpecD)^2)*0.09)+(((contc-ExpecD)^2)*0.06)
VarY1Total <- VarA+VarB+VarC+VarD
VarY1Total

## [1] 75827.84
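
For reference, the expected value of a variable is the sum of each value times its marginal probability, and the variance is the probability-weighted squared deviation from that overall mean. The sketch below applies this to X1 using the joint table shown at the start of this section. The numeric values 1, 2, 3 assigned to a, b, c are hypothetical, chosen only so that the example runs; they are not the values used in the code above.

```r
# Expected value and variance of X1 from the joint PMF (rows = X1, columns = Y1)
jointPMF <- matrix(
  c(0.11, 0.11, 0.10, 0.10,
    0.09, 0.11, 0.08, 0.09,
    0.06, 0.05, 0.05, 0.05),
  nrow = 3, byrow = TRUE
)
x1Values <- c(a = 1, b = 2, c = 3)   # hypothetical numeric values for a, b, c

pX1    <- rowSums(jointPMF)                    # marginal PMF of X1
meanX1 <- sum(x1Values * pX1)                  # E[X1]
varX1  <- sum((x1Values - meanX1)^2 * pX1)     # deviations from the overall mean
varX1Alt <- sum(x1Values^2 * pX1) - meanX1^2   # equivalent: E[X1^2] - E[X1]^2

print(c(mean = meanX1, var = varX1, varAlt = varX1Alt))
```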

Plot the joint distribution

xPrima   <- seq(-200,446,length=100)
yPrima   <- dnorm(xPrima,Expeca, sqrt(Vara))
yPrimb   <- dnorm(xPrima,Expecb, sqrt(Varb))
yPrimc   <- dnorm(xPrima,Expecc, sqrt(Varc))
plot(xPrima,yPrima, type="l", lwd=2, col="red", main="Joint Distrib x1", xlab="x1 Distrib", ylab="Frequency")
lines(xPrima,yPrimb, col="green", lwd=2)
lines(xPrima,yPrimc, col="blue", lwd=2)

seqGraf   <- seq(-200,446,length=100)
yPrimA   <- dnorm(seqGraf,ExpecA, sqrt(VarA))
yPrimB   <- dnorm(seqGraf,ExpecB, sqrt(VarB))
yPrimC   <- dnorm(seqGraf,ExpecC, sqrt(VarC))
yPrimD   <- dnorm(seqGraf,ExpecD, sqrt(VarD))
plot(seqGraf,yPrimA, type="l", lwd=1, col="red", main="Joint Distrib y1", xlab="y1 Distrib", ylab="Frequency")
lines(seqGraf,yPrimB, col="green", lwd=1)
lines(seqGraf,yPrimC, col="blue", lwd=1)
lines(seqGraf,yPrimD, col="purple", lwd=1)

Joint Distribution 2

NASA and the Jet Propulsion Laboratory are working on new testing software called XX. It can detect problems in the code written by software engineers. Nevertheless, they still use an old testing approach called YY to detect coding errors. Note that the number to the right of the parenthesis is the value of the random variable that accounts for the number of errors detected. Can you try to solve the following questions?

Note: We think the questions as stated are wrong, because the probabilities to be solved are not the expected ones. To resolve this, we have changed exercises 4, 5, 6 and 8 (swapping YY and XX).

Compute the expected number of errors detected by XX

contA2 <- 14
contB2 <- 16
contC2 <- 19
contD2 <- 44
contE2 <- 49

Expeca2 <- 0.008*contA2+0.011*contB2+0.002*contC2+0.176*contD2+0.062*contE2
Expecb2 <- 0.005*contA2+0.010*contB2+0.002*contC2+0.177*contD2+0.069*contE2
Expecc2 <- 0.010*contA2+0.008*contB2+0.001*contC2+0.151*contD2+0.053*contE2
Expecd2 <- 0.006*contA2+0.012*contB2+0.002*contC2+0.167*contD2+0.068*contE2
ExpecX1Total2 <- Expeca2+Expecb2+Expecc2+Expecd2
ExpecX1Total2

## [1] 43.067

What is the probability that the number of errors detected by YY is greater than or equal to 44, given that the number of errors detected by XX is less than 42?

# Conditional probability: P(YY >= 44 and XX < 42) / P(XX < 42)
probMore44Less42 <- (0.176+0.062)/(0.008+0.011+0.002+0.176+0.062)
probMore44Less42

## [1] 0.9189189
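
A conditional probability is the joint probability of both events divided by the probability of the conditioning event. The sketch below rebuilds the joint PMF as a matrix from the probabilities and values that appear in the code of this section (rows for the variable with values 38, 42, 43, 56 and columns for the one with values 14, 16, 19, 44, 49). The matrix layout and the names `jointPMF2`, `rowVals` and `colVals` are assumptions of this example.

```r
# Joint PMF reconstructed from the probabilities used in this section
jointPMF2 <- matrix(
  c(0.008, 0.011, 0.002, 0.176, 0.062,
    0.005, 0.010, 0.002, 0.177, 0.069,
    0.010, 0.008, 0.001, 0.151, 0.053,
    0.006, 0.012, 0.002, 0.167, 0.068),
  nrow = 4, byrow = TRUE,
  dimnames = list(rowVar = c("38", "42", "43", "56"),
                  colVar = c("14", "16", "19", "44", "49"))
)
rowVals <- as.numeric(rownames(jointPMF2))
colVals <- as.numeric(colnames(jointPMF2))

# P(column variable >= 44 | row variable < 42) = joint / marginal of the condition
pJoint <- sum(jointPMF2[rowVals < 42, colVals >= 44])
pCond  <- sum(jointPMF2[rowVals < 42, , drop = FALSE])
condProb <- pJoint / pCond

print(condProb)
```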

Compute the Variance and Standard Deviation of the number of errors detected by YY

conta2 <- 38
contb2 <- 42
contc2 <- 43
contd2 <- 56

ExpecA2 <- 0.008*conta2+0.005*contb2+0.010*contc2+0.006*contd2
ExpecB2 <- 0.011*conta2+0.010*contb2+0.008*contc2+0.012*contd2
ExpecC2 <- 0.002*conta2+0.002*contb2+0.001*contc2+0.002*contd2
ExpecD2 <- 0.176*conta2+0.177*contb2+0.151*contc2+0.167*contd2
ExpecE2 <- 0.062*conta2+0.069*contb2+0.053*contc2+0.068*contd2
ExpecY1Total2 <- ExpecA2+ExpecB2+ExpecC2+ExpecD2+ExpecE2
ExpecY1Total2

## [1] 44.757

# Marginal probability of each value (38, 42, 43, 56): sum of its joint probabilities
valsY2 <- c(conta2, contb2, contc2, contd2)
margY2 <- c(0.008+0.011+0.002+0.176+0.062,
            0.005+0.010+0.002+0.177+0.069,
            0.010+0.008+0.001+0.151+0.053,
            0.006+0.012+0.002+0.167+0.068)
# Variance: probability-weighted squared deviations from the overall mean
VarY1Total2 <- sum(((valsY2-ExpecY1Total2)^2)*margY2)
VarY1Total2

## [1] 46.74595

DesvY1Total2 <- sqrt(VarY1Total2)
DesvY1Total2

## [1] 6.837101

What is the probability that the number of errors detected by XX is less than 42

probLess42 <- 0.008+0.011+0.002+0.176+0.062
probLess42

## [1] 0.259

What is the probability that the number of errors detected by XX is less than or equal to 42?

probLessEqual42 <- probLess42+0.005+0.010+0.002+0.177+0.069
probLessEqual42

## [1] 0.522

What is the probability that the number of errors detected by XX is greater than or equal to 56?

probMoreEqual56 <- 0.006+0.012+0.002+0.167+0.068
probMoreEqual56

## [1] 0.255

Compute the expected number of errors detected by YY

ExpecA2 <- 0.008*conta2+0.005*contb2+0.010*contc2+0.006*contd2
ExpecB2 <- 0.011*conta2+0.010*contb2+0.008*contc2+0.012*contd2
ExpecC2 <- 0.002*conta2+0.002*contb2+0.001*contc2+0.002*contd2
ExpecD2 <- 0.176*conta2+0.177*contb2+0.151*contc2+0.167*contd2
ExpecE2 <- 0.062*conta2+0.069*contb2+0.053*contc2+0.068*contd2
ExpecY1Total2 <- ExpecA2+ExpecB2+ExpecC2+ExpecD2+ExpecE2
ExpecY1Total2

## [1] 44.757

What is the probability that the number of errors detected by YY is less than 16?

probLess16 <- 0.008+0.005+0.010+0.006
probLess16

## [1] 0.029

Normal Distribution

The gaussian (or bell-shaped) distribution is the most important continuous distribution. Why? Normality arises naturally in many real contexts ranging from physical to biological, from engineering to social measurement situations. The central limit theorem (CLT) states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution. Normality is also important in statistical inference.

Where it first Came From

The normal curve was first developed mathematically in 1733 by Abraham de Moivre as an approximation to the binomial distribution. Laplace used the normal curve in 1783 to describe the distribution of errors.

Shape

Symmetric smooth form with a single mode that is also the location of the mean and median. Either side of the mode there is a point of inflection of the bell curve which is one unit (one standard deviation) from the mean. Beyond this point the curve extends towards the x-axis asymptotically, with a theoretical extent to infinity in both directions.

The normal distribution is described by two parameters, the mean \(\mu\) and the standard deviation \(\sigma\), and the Normal model is written \(X \sim Norm(\mu, \sigma)\).

\[ X \sim Norm(\mu,\sigma) \\ \mu = \textrm{ mean or expectation of the distribution} \\ \sigma = \textrm{ standard deviation} \\ f_X(x) = \frac{1}{\sigma \sqrt {2\pi }}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \\ E[X] = \mu \\ V[X] = \sigma^2 \\ Desv[X] = \sigma \\ \]

Note that the mean of the distribution is μ=5949 and sigma is equal to σ=2078.

The marks of 7000 computing engineers (CE) are normally distributed with a mean equal to 39 and a standard deviation of 2. The marks of mechanical engineers (ME) also follow a normal distribution.
The percentage of ME that get a mark under 0.0569 is 7.2 % and 19.1 % of them get a mark over 6.2.
Can you try to solve the following questions?
Observation: All questions are related to the CE students if no further information is given.
1. What is the probability that the RV is greater than 34.6?

mu <- 39
sigma <- 2
prob346Norm <- pnorm(34.6, mu, sigma, lower.tail= FALSE)
prob346Norm

## [1] 0.9860966

We want to identify the students with lower marks. What is the mark that leaves 6 % of the students below it?

prob6Norm <- qnorm(0.06, mu, sigma)
prob6Norm

## [1] 35.89045

What is the probability that the random variable lies between 41.5 and 43.1?

prob415431Norm <- pnorm(43.1,mu, sigma) - pnorm(41.5, mu, sigma)
prob415431Norm

## [1] 0.08546756

What is the probability that the random variable lies between 39.4 and 43.8?

prob394438Norm <- pnorm(43.8,mu, sigma) - pnorm(39.4, mu, sigma)
prob394438Norm

## [1] 0.4125428

What is the probability that the random variable lies between 33.2 and 35.1?

prob332351Norm <- pnorm(35.1,mu, sigma) - pnorm(33.2, mu, sigma)
prob332351Norm

## [1] 0.02372225

Find the value of the RV. (x) if we know that the probability that RV is less than (x) is 0.976

prob0976Norm <- qnorm(0.976, mu, sigma)
prob0976Norm

## [1] 42.95474

We want to identify the students with lower marks. What is the mark that leaves 10 % of the students below it?

prob10Norm <- qnorm(0.1, mu, sigma)
prob10Norm

## [1] 36.4369

We want to identify the students with lower marks. What is the mark that leaves 0.75 % of the students below it?

prob075Norm <- qnorm(0.0075, mu, sigma)
prob075Norm

## [1] 34.13524
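
The statement also gives two tail probabilities for the ME marks (7.2 % below 0.0569 and 19.1 % above 6.2) without the parameters of that distribution. A possible way to recover them, sketched here, is to standardize both conditions and solve the two linear equations x = μ + zσ. The names `muME` and `sigmaME` are introduced only for this example.

```r
# Recover the ME mean and standard deviation from two quantile conditions
x1 <- 0.0569; p1 <- 0.072        # P(X < x1) = 0.072
x2 <- 6.2;    p2 <- 1 - 0.191    # P(X > x2) = 0.191, so P(X < x2) = 0.809

z1 <- qnorm(p1)                  # standard normal quantiles
z2 <- qnorm(p2)

# x1 = mu + z1 * sigma and x2 = mu + z2 * sigma
sigmaME <- (x2 - x1) / (z2 - z1)
muME    <- x1 - z1 * sigmaME

print(c(mu = muME, sigma = sigmaME))

# Check: both tail probabilities are reproduced
print(pnorm(x1, muME, sigmaME))
print(pnorm(x2, muME, sigmaME, lower.tail = FALSE))
```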