Random Variables Sharon Cabrera & Marc Ribas
February 15, 2016
This assignment is about the use of random variables. Here we write down some of the formulas needed, and then it is up to you to solve the stated problems. This assignment will help you understand why Einstein said that God does not play dice, but we certainly do.
TO DO LIST
1.-Solve the problems by hand
2.-Check the results using R as the system to solve these problems
3.-Write an R markdown document .Rmd with the solutions done by hand and the code used to check the answers.
4.-The .Rmd output has to be .Html
5.-You are going to mix text and code. Make all your code echoed (visible)
6.-Verify that your final .Rmd file is accurate, complete and well documented.
7.-Make sure your report respects the same format, order and appearance as this one. This means you have to add the information necessary to solve the problem by hand and also the way you do it using R
Observation: You can add any additional graphic or link that helps in understanding the way you solve the problems
******************** GROUP NAMES *****************************
A Binomial distribution describes the number of successes in a sequence of n independent Bernoulli trials, each with the same probability of success p. \[ X \sim Bin(n,p) \\ n = \textrm{number of trials} \\ p = \textrm{probability of success} \\ P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \\ E[X] = np \\ V[X] = np(1-p) \\ Desv[X] = \sqrt{np(1-p)} \]
A company called Birrus Mac has deployed a malicious worm on the net. The effects of that malware on the file system are unknown. Suppose any PC can be infected. The probability of an infection in any local PC is 0.579 and the probability that it does not corrupt your files is 0.74.
What do you think will happen if your organization has 13 local PCs and the average number of files on them is 25? Could you please solve the following questions:
1.-What is the probability that we have between 9 and 12 PCs infected?
# X = number of infected PCs, X ~ Bin(13, 0.579); P(9 <= X <= 12)
d <- sum(dbinom(9:12, size = 13, prob = 0.579))
d
2.-What is the probability that at least 19 files are not corrupted?
# Y = number of uncorrupted files out of 25, Y ~ Bin(25, 0.74); P(Y >= 19) = 1 - P(Y <= 18)
pbinom(18, size = 25, prob = 0.74, lower.tail = FALSE)
3.-Compute the Expected value, the Variance and the Deviation of the number of infected PCs. Use at least two different methods
probBin <-0.579
pcs<-13
#Expected value
expBin <- pcs*probBin
expBin
## [1] 7.527
#Variance
varBin<- expBin*(1-probBin)
varBin
## [1] 3.168867
#var()
#Deviation
devBin<- sqrt(varBin)
devBin
## [1] 1.780131
#sd()
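The exercise asks for at least two methods. As a cross-check of the closed-form values above, the same moments can be obtained by summing over the pmf. A minimal sketch; the names `kBin`, `pmfBin`, `expBin2`, `varBin2` and `devBin2` are illustrative and not part of the original solution:

```r
# Second method: moments computed directly from the pmf of Bin(13, 0.579)
pcs <- 13
probBin <- 0.579
kBin <- 0:pcs                                    # all possible numbers of infected PCs
pmfBin <- dbinom(kBin, size = pcs, prob = probBin)
expBin2 <- sum(kBin * pmfBin)                    # E[X] = sum of x * P(x)
varBin2 <- sum(kBin^2 * pmfBin) - expBin2^2      # V[X] = E[X^2] - E[X]^2
devBin2 <- sqrt(varBin2)                         # Desv[X]
c(expBin2, varBin2, devBin2)
```

The three values should agree with `expBin`, `varBin` and `devBin` up to floating-point error.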
4.-What is the probability that at least 4 PCs are not infected?
# Z = number of PCs not infected, Z ~ Bin(13, 1 - 0.579); P(Z >= 4) = 1 - P(Z <= 3)
pP <- pbinom(3, size = 13, prob = 1 - 0.579, lower.tail = FALSE)
pP
5.-What is the probability that we have between 11 and 18 files not corrupted?
# P(11 <= Y <= 18) for Y ~ Bin(25, 0.74): add the pmf over the whole range
r <- sum(dbinom(11:18, size = 25, prob = 0.74))
r
A group of coders are preparing themselves to get a position at Google. They already know that no bugs are permitted at all. They can decide to code in Python, which Google supports, or in Javi (a new Java implementation no one likes). What do you think will happen if the probability that you make a bug coding in Python is 0.71 and the probability of writing good code in Javi is 0.0717? You are coding different methods and programs. Could you please solve the following questions, given that you need to write a good method or program in order to start interviews with Google:
1.What is the probability that exactly 1 attempt is needed to get a program with no bugs (Javi)?
prob <- 0.0717
# dgeom(x, prob) counts the failures before the first success, so 1 attempt means x = 0
j <- dgeom(0, prob)
j
## [1] 0.0717
2.Compute the Expected value, the Variance and the Deviation of the number of attempts up to the 1st Bug code (Javi). Use at least two different methods
#Expected value
expectVal <- 1/prob
expectVal
## [1] 13.947
#Variance
v<- (1-prob)/(prob^2)
v
## [1] 180.5718
#Deviation
dev<- sqrt(v)
dev
## [1] 13.4377
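As a second method, the same three quantities can be approximated by summing over the pmf of the number of attempts instead of using the closed-form formulas. A minimal sketch; the truncation point 5000 and the names `attemptsGeo`, `pmfGeo`, `expectVal2`, `v2` and `dev2` are assumptions made for illustration:

```r
# Second method: truncated sums over the pmf of the number of attempts
prob <- 0.0717
attemptsGeo <- 1:5000                            # the tail beyond 5000 attempts is negligible
pmfGeo <- dgeom(attemptsGeo - 1, prob)           # dgeom counts failures before the first success
expectVal2 <- sum(attemptsGeo * pmfGeo)          # E[X]
v2 <- sum(attemptsGeo^2 * pmfGeo) - expectVal2^2 # V[X] = E[X^2] - E[X]^2
dev2 <- sqrt(v2)
c(expectVal2, v2, dev2)
```

The results should match `1/prob`, `(1-prob)/prob^2` and its square root closely.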
3.Plot the pmf of the number of attempts up to and including the first Bug code (Javi)
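# pmf of the number of attempts (1, 2, ...), using the success probability prob defined above
attempts <- 1:60
plot(attempts, dgeom(attempts - 1, prob), type = "h", xlab = "Number of attempts", ylab = "Probability")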
A big company has two main departments: the Engineers Department and the Economists Department. Between both departments there are 62 employees who can be promoted to the Board of Directors (BOD). However, only 4 positions are available, and 37 of the employees are Engineers.
Can you try to solve these questions under the assumption that all employees are equally likely to join the BOD?
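m <- 37   # engineers
n <- 25   # economists (62 - 37)
N <- 62   # total number of candidates
k <- 4    # positions available on the BOD
# Probability that no engineer joins the BOD
x <- dhyper(0, m = 37, n = 25, k = 4, log = FALSE)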
x
## [1] 0.02267655
## Create a cumulative mass function
#n <- 25 # Sample size
#p <- x # Success probability
#X <- c() # Empty vector for storage
## Create a cumulative mass function
#cdf <- function(p, k) {
# cum.prob <- 0
# for (i in 1:k) {
# if (k == 0) {
# cum.prob <- 0
# return(cum.prob)
# break
# }
# cum.prob <- cum.prob + p*(1-p)^(i-1)
# }
# return(cum.prob)
#}
#iteration <- 0
#repeat{
# iteration <- iteration + 1
# u <- runif(1) ## Generate a U(0,1) value
#i <- 0
# repeat { ## Find which x satisfies F(x)<=u<=F(x+1)
# if (cdf(p, i) <= u) {
# if (u <= cdf(p, i + 1)) {
# X <- c(X, i)
# break
# }
#}
# i <- i + 1
#}
# if (iteration == n) {
# break
# }
#}
#X
#?????
#cdf <- phyper(25, n, m, k, lower.tail = TRUE, log.p = FALSE)
#cdf
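# Expected number of engineers on the BOD
expecValueEng <- (m*k)/N
expecValueEng
## [1] 2.387097
# Probability that no economist joins the BOD
y <- dhyper(0, n, m, k, log = FALSE)
y
## [1] 0.1183931
# Expected number of economists on the BOD
expecValueEco <- (n*k)/N
expecValueEco
## [1] 1.612903
# Probability that all 4 positions go to engineers
x2 <- dhyper(4, m, n, k, log = FALSE)
x2
## [1] 0.1183931
# Probability that all 4 positions go to economists
y2 <- dhyper(4, n, m, k, log = FALSE)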
y2
## [1] 0.02267655
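The variance and the deviation of the number of engineers on the BOD can be obtained in two ways, from the closed-form hypergeometric formula and from the pmf. A minimal sketch; the names `mEng`, `nEco`, `nTotal`, `kSeats`, `seats`, `pmfEng`, `varEng`, `expEng2` and `varEng2` are illustrative and not part of the original solution:

```r
# Engineers on the board: X ~ Hypergeometric(m = 37, n = 25, k = 4)
mEng <- 37; nEco <- 25; nTotal <- mEng + nEco; kSeats <- 4
# Method 1: closed-form variance, including the finite-population correction
varEng <- kSeats * (mEng / nTotal) * (nEco / nTotal) * (nTotal - kSeats) / (nTotal - 1)
# Method 2: moments summed over the pmf
seats <- 0:kSeats
pmfEng <- dhyper(seats, mEng, nEco, kSeats)
expEng2 <- sum(seats * pmfEng)
varEng2 <- sum(seats^2 * pmfEng) - expEng2^2
c(varEng, varEng2, sqrt(varEng))
```

Because the number of economists is 4 minus the number of engineers, it has the same variance and deviation.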
#cdfEng <- phyper(25, m, n, k, lower.tail = TRUE, log.p = FALSE)
#cdfEng
A Poisson distribution models the number of events (successes) that occur within a fixed period of time.
App entertainment has uploaded a new app to Google Play and the Apple Store. The expected number of paid downloads per day is 86. Solve the following questions:
1.What is the probability of more than 80 downloads per day?
# P(X > 80) is the upper tail of Poisson(86)
probPoi <- ppois(80, 86, lower.tail = FALSE)
probPoi
## [1] 0.7195079
2.What is the probability of exactly 9 downloads per day?
probPoi2<-dpois(9, 86)
probPoi2
## [1] 3.17247e-26
3.What is the probability that we have between 75 and 76 downloads per day?
# P(75 <= X <= 76): add the pmf at both endpoints
probPoi3 <- sum(dpois(75:76, lambda = 86))
probPoi3
4.What is the probability that we have between 56 and 64 downloads per day?
# P(56 <= X <= 64): add the pmf over the whole range, endpoints included
probPoi4 <- sum(dpois(56:64, lambda = 86))
probPoi4
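5.What is the probability that we have between 38 and 73 downloads per hour?
lambda <- 86/24   # expected downloads per hour
# P(38 <= X <= 73), taken from upper tails so that the tiny value is not rounded to 0
probPoi5 <- ppois(37, lambda, lower.tail = FALSE) - ppois(73, lambda, lower.tail = FALSE)
probPoi5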
6.Show the cdf of the number of downloads per hour
# cdf F(k) = P(X <= k) of the number of downloads per hour, X ~ Poisson(86/24)
kHour <- 0:15
plot(kHour, ppois(kHour, 86/24), type = "s", xlab = "Downloads per hour", ylab = "F(k)")
7.Compute the Expected value, the Variance and the Deviation of the number of downloads per hour. Use at least two different methods
# Expected value -> E[X] = lambda
# Variance -> Var(X) = lambda
lambda <- 86/24   # hourly rate: 86 downloads per day over 24 hours
lambda
## [1] 3.583333
# Standard deviation
sd7 <- sqrt(lambda)
sd7
## [1] 1.892969
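A second method for the hourly figures is to sum over the pmf instead of using the identities E[X] = Var(X) = lambda. A minimal sketch, assuming the 86 daily downloads are spread evenly over 24 hours; the truncation point 200 and the names `lambdaHour`, `counts`, `pmfPoi`, `expPoi2` and `varPoi2` are illustrative:

```r
# Second method: moments of the downloads per hour from the Poisson pmf
lambdaHour <- 86 / 24                            # assumed hourly rate
counts <- 0:200                                  # the tail beyond 200 is negligible for this rate
pmfPoi <- dpois(counts, lambdaHour)
expPoi2 <- sum(counts * pmfPoi)                  # E[X]
varPoi2 <- sum(counts^2 * pmfPoi) - expPoi2^2    # Var(X) = E[X^2] - E[X]^2
c(expPoi2, varPoi2, sqrt(varPoi2))
```

Both the mean and the variance should come out equal to the hourly rate, up to floating-point error.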
1.The marginal distribution of X1
p <- matrix(c(.12,.08,.04,.12,.11,.04,.11,.08,.04,.11,.09,.05),ncol=4)
#pmf
sum(p)
## [1] 0.99
#marginal distribution
# Automatically adds rows and columns and augments the matrix with sums
addmargins(p)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.12 0.12 0.11 0.11 0.46
## [2,] 0.08 0.11 0.08 0.09 0.36
## [3,] 0.04 0.04 0.04 0.05 0.17
## [4,] 0.24 0.27 0.23 0.25 0.99
#The marginal probabilities can be picked off as row and column sums
px <- margin.table(p,1);px
## [1] 0.46 0.36 0.17
2.The marginal distribution of Y1
py <- margin.table(p,2);py
## [1] 0.24 0.27 0.23 0.25
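3.The Expected value of X1
# The levels of X1 (a, b, c) have no numeric values in the statement;
# coding them as 1, 2, 3 is an assumption made here
xvals <- 1:3
# The entries of px*xvals are px[i]*xvals[i]. Expectation is the sum of these
EX <- sum(px*xvals); EX
## [1] 1.69
4.The Expected value of Y1
# Assumed coding of the levels A, B, C, D as 1, 2, 3, 4
yvals <- 1:4
EY <- sum(py*yvals); EY
## [1] 2.47
5.The Variance of X1
VX <- sum(px*xvals^2) - EX^2; VX
## [1] 0.5739
6.The Variance of Y1
VY <- sum(py*yvals^2) - EY^2; VY
## [1] 1.2891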
Plot the joint distribution
dim(p)
## [1] 3 4
#library(ggplot2)
#library(lattice)
xmin <- min(p)
xmax <- max(p)
image(p, col = rev(heat.colors(12)))
x1 = sample(letters[1:3], 1000, prob = c(.45,.35,.20), TRUE)
y1 = sample(LETTERS[1:4], 1000, TRUE)
x<-(table(x1, y1)/1000)
knitr::kable(x, digits = 2, caption = "A table with the joint pmf of X1 and Y1.")
NASA and the Jet Propulsion Laboratory work on a new testing software called XX. It can detect problems with the code written by some software engineers. Nevertheless they use an old testing approach to detect coding errors called YY.
Can you try to solve the following questions?:
1.What is the probability that the number of errors detected by XX is less than 43
2.What is the probability that the number of errors detected by YY is greater than or equal to 28 given that the number of errors detected by XX is less than or equal to 43
3.What is the probability that the number of errors detected by YY is greater than or equal to 28
4.What is the probability that the number of errors detected by YY is greater than or equal to 28
5.What is the probability that the number of errors detected by XX is greater than 28
6.What is the probability that the number of errors detected by YY is greater than or equal to 91 given that the number of errors detected by XX is less than 43
The Gaussian (or bell-shaped) distribution is the most important continuous distribution. Why? Normality arises naturally in many real contexts, ranging from physical to biological and from engineering to social measurement situations. The central limit theorem (CLT) states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution. Normality is also important in statistical inference.
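The CLT statement can be illustrated with a small simulation. A minimal sketch, assuming sums of independent U(0,1) variables; the seed, the sizes `nTerms` and `nSums` and all variable names are arbitrary choices made for illustration:

```r
# Sums of uniform variables, standardised, behave approximately like N(0, 1)
set.seed(1)                                      # arbitrary seed, only for reproducibility
nTerms <- 30                                     # variables added in each sum
nSums <- 10000                                   # number of simulated sums
sums <- replicate(nSums, sum(runif(nTerms)))
# A U(0,1) variable has mean 1/2 and variance 1/12
z <- (sums - nTerms / 2) / sqrt(nTerms / 12)
# Expect values close to 0, 1 and pnorm(1.96)
c(mean(z), sd(z), mean(z <= 1.96))
hist(z, breaks = 40, freq = FALSE, main = "Standardised sums of 30 uniforms")
curve(dnorm(x), add = TRUE)                      # standard normal density for comparison
```

The histogram should follow the overlaid normal density closely even though each individual term is uniform.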
Where it first Came From
The normal curve was first developed mathematically in 1733 by Abraham de Moivre as an approximation to the binomial distribution. Laplace used the normal curve in 1783 to describe the distribution of errors.
Shape
Symmetric smooth form with a single mode that is also the location of the mean and median. Either side of the mode there is a point of inflection of the bell curve which is one unit (one standard deviation) from the mean. Beyond this point the curve extends towards the x-axis asymptotically, with a theoretical extent to infinity in both directions.
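The location of the inflection points can be checked from the density itself. A short derivation, writing \(\mu\) for the mean and \(\sigma\) for the standard deviation: \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \\ f'(x) = -\frac{x-\mu}{\sigma^2} f(x) \\ f''(x) = \left( \frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2} \right) f(x) \\ f''(x) = 0 \iff x = \mu \pm \sigma \]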
The normal distribution is described by two parameters: the mean \(\mu\) and the standard deviation \(\sigma\), and the Normal model is written \(X \sim Norm(\mu, \sigma)\).
Note that the mean of the distribution is \(\mu = 5791\) and the standard deviation is \(\sigma = 2332\).
The marks of 71299138 computing engineers (CE) are normally distributed with a mean equal to 54 and a standard deviation of 10. The marks of mechanical engineers (ME) also follow a normal distribution. The percentage of ME that get a mark under 0.0951 is 0.5 % and 5.1 % of them get a mark over 6.2.
Can you try to solve the following questions?:
Observation: All questions are related to the CE students if no further information is given
s<-qnorm(0.5, mean = 5791, sd = 2332, lower.tail = TRUE, log.p = FALSE)
s
## [1] 5791
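1.How many students will get marks under 27
# P(X < 27) for the CE marks, X ~ Norm(54, 10)
xnorm <- pnorm(27, mean = 54, sd = 10, lower.tail = TRUE)
xnorm
# Expected number of students among the 71299138 CE students
71299138 * xnorm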
2.What is the probability that the RV. is less than 29.7
# P(X < 29.7) is the lower tail
ynorm <- pnorm(29.7, mean = 54, sd = 10, lower.tail = TRUE)
ynorm
3.What is the probability that the random variable lies between 69.7 and 76.3
# P(69.7 < X < 76.3) = F(76.3) - F(69.7)
anorm <- pnorm(76.3, mean = 54, sd = 10) - pnorm(69.7, mean = 54, sd = 10)
anorm
4.Find the value of the RV. (x) if we know that the probability that the RV. is greater than (x) is 0.12
# Upper-tail quantile: the x with P(X > x) = 0.12
enorm <- qnorm(0.12, mean = 54, sd = 10, lower.tail = FALSE)
enorm
5.What are the model parameters for the mechanical engineers (ME) students
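One possible way to answer this question is to turn the two stated percentages into quantile equations, \(0.0951 = \mu + \sigma z_{0.005}\) and \(6.2 = \mu + \sigma z_{0.949}\), and solve the resulting linear system. A minimal sketch; the names `zLow`, `zHigh`, `sigmaME` and `muME` are illustrative:

```r
# Standard normal quantiles matching the two stated percentages
zLow <- qnorm(0.005)                             # 0.5 % of ME marks are under 0.0951
zHigh <- qnorm(1 - 0.051)                        # 5.1 % of ME marks are over 6.2
# Two equations, two unknowns: 0.0951 = mu + sigma*zLow and 6.2 = mu + sigma*zHigh
sigmaME <- (6.2 - 0.0951) / (zHigh - zLow)
muME <- 0.0951 - sigmaME * zLow
c(muME, sigmaME)
# Check: both stated percentages should be recovered
pnorm(0.0951, mean = muME, sd = sigmaME)
pnorm(6.2, mean = muME, sd = sigmaME, lower.tail = FALSE)
```

The two check values should return 0.005 and 0.051 respectively.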
6.What is the probability that the RV. is less than 14.3
# P(X < 14.3) is the lower tail
gnorm <- pnorm(14.3, mean = 54, sd = 10, lower.tail = TRUE)
gnorm