Random Walk: A particle moves along the line in a random walk. That is, the particle starts at the origin (position 0) and moves either right or left in independent steps of length 1. If the particle moves to the right with probability 0.6, its movement at the \(i\)-th step is a random variable \(X_i\) with distribution \[P(X_i=1)=0.6,\qquad P(X_i=-1)=0.4\]
set.seed(50)
sample(c(1,-1),size=10,replace=TRUE,prob=c(0.6,0.4)) # 10 steps of being either 1 or -1
## [1] -1 1 1 -1 1 1 -1 -1 1 1
Let \(W\) be the number of steps to the right in this example. Count \(W\) in this example, and find the exact probability that the particle moves right for \(W\) out of 10 steps. Show your work.
The particle steps to the right 6 times in this example, so \(W=6\); the net movement is two steps to the right of the origin. The exact probability of \(W\) equaling 6 out of 10 steps is \(\frac{10!}{6!\,4!}(0.6)^6(0.4)^4 = 210(0.6)^6(0.4)^4 = 0.2508\).
\(P(W \le 7) = p(0)+p(1)+p(2)+p(3)+p(4)+p(5)+p(6)+p(7) = 0.832710\)
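As a quick check of these hand calculations, the same binomial probabilities can be computed in R with dbinom and pbinom:
dbinom(6, 10, 0.6) # P(W = 6), approximately 0.2508
pbinom(7, 10, 0.6) # P(W <= 7) = p(0)+...+p(7), approximately 0.8327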
Refer to problem 1.
Since \(n>30\), we can use the central limit theorem to approximate the sampling distribution of \(\bar{X}\) with a Normal distribution. The mean of this sampling distribution is 0.6 and its standard deviation is 0.0489897. The z-scores for 0.5 and 0.7 are \(-2.04\) and \(2.04\), so using a z-score table, the probability that \(\bar{X}\) falls between 0.5 and 0.7 is \(0.9793-0.0207 = 0.9586\).
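A minimal R check of this Normal approximation, assuming (as the stated standard deviation \(0.0489897=\sqrt{0.6\times 0.4/100}\) suggests) that \(\bar{X}\) is the proportion of right steps over \(n=100\) steps:
se <- sqrt(0.6*0.4/100) # standard deviation of the sample proportion, approximately 0.049
pnorm(0.7, 0.6, se) - pnorm(0.5, 0.6, se) # approximately 0.959, close to the table-based 0.9586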
\(P(\text{at least 180 steps to the right of the origin}) = p(340)+p(341)+\cdots+p(499)+p(500)\)
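This tail sum can be evaluated exactly with pbinom; the sketch below assumes, as the sum above suggests, 500 total steps with \(W\) the number of right steps, so that a final position at least 180 to the right requires \(W \ge 340\):
1 - pbinom(339, 500, 0.6) # P(W >= 340), the exact binomial upper-tail probability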
set.seed(100)
X<-sample(c(1,-1),size=100,replace=TRUE,prob=c(0.6,0.4)) # simulation of 100 steps
Y=0 # the particle's movement at the current step
Position = 0 # the particle's current position
plot(0,0,xlim=c(0,100),ylim=c(-10,50),type="n")
for(i in 1:100){
  Y<-X[i] # the particle's movement at step i
  Position = Position + Y # update the position by the i-th step
  points(i,Position) # plot the position at step i
}
Fuel Efficiency: Computers in some vehicles calculate various quantities related to performance. One of these is the fuel efficiency, or gas mileage, usually expressed as miles per gallon (mpg). For one vehicle equipped in this way, the miles per gallon were recorded each time the gas tank was filled, and the computer was then reset. Here are the mpg values for a random sample of 20 records:
mpg<-c(41.5,50.7,36.6,37.3,34.2,45.0,48.0,43.2,47.7,42.2,43.2,44.6,48.4,46.4,46.8,39.2,37.3,43.5,44.3,43.3)
print(mpg)
## [1] 41.5 50.7 36.6 37.3 34.2 45.0 48.0 43.2 47.7 42.2 43.2 44.6 48.4 46.4
## [15] 46.8 39.2 37.3 43.5 44.3 43.3
Suppose that the standard deviation is known to be \(\sigma=3.5\) mpg.
The standard deviation of the sample mean is \(\sigma/\sqrt{n} = 3.5/\sqrt{20} = 0.78262\) mpg.
summary(mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.20 40.92 43.40 43.17 46.50 50.70
sd(mpg)
## [1] 4.414939
mean(mpg)
## [1] 43.17
hist(mpg, xlab = "MPG", xlim = c(30,55))
Given the summary and the histogram, I believe a Normal distribution is appropriate for this sample. The data show no significant skew and \(n>15\), so we can proceed with caution and construct a confidence interval based on the Normal distribution.
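As an additional, optional check on the Normality assumption (a small sketch, not part of the required output), a Normal quantile plot can be drawn:
qqnorm(mpg) # Normal quantile plot of the 20 mpg records
qqline(mpg) # reference line; points close to the line support approximate Normality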
For a 95% CI, \(z^\ast = 1.96\) and our sample mean is \(\bar{x} = 43.17\). The CI is \(\bar{x} \pm\) margin of error, i.e. \(43.17 \pm (1.96)(0.78262)\), which gives a confidence interval of 41.636 to 44.704 mpg. We can say with 95% confidence that the true mean fuel efficiency for this vehicle lies within this interval.
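The same interval can be computed directly in R as a check of the hand calculation, using the known \(\sigma = 3.5\):
se <- 3.5/sqrt(20) # standard deviation of the sample mean, approximately 0.78262
mean(mpg) + c(-1, 1)*qnorm(0.975)*se # 95% CI, approximately (41.64, 44.70)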
Multiple Confidence Intervals: As we prepare to take a sample and compute a 95% confidence interval, we know that the probability that the interval we compute will cover the parameter is 0.95. If we plan to use several such intervals, however, our confidence that all of them will cover the parameter is less than 95%.
Refer to Problem 3. Suppose the miles per gallon for that vehicle follows a Normal distribution with mean \(\mu=43\) and standard deviation \(\sigma=3.5\), and another vehicle's mpg follows a Normal distribution with mean \(\mu=38\) and standard deviation \(\sigma=4\).
set.seed(5)
T=15 # we draw 15 samples
conf1=NULL;conf2=NULL
for(i in 1:T){
  sample1<-rnorm(20,43,3.5) # sample of size 20 from the first distribution
  sample2<-rnorm(20,38,4) # sample of size 20 from the 2nd distribution
  z=1.282 # z* for an 80% confidence level
  mean1<-mean(sample1) # sample mean for 1st
  upper1<-mean1+z*3.5/sqrt(20) # upper bound of confidence interval for 1st
  lower1<-mean1-z*3.5/sqrt(20) # lower bound of confidence interval for 1st
  conf1=rbind(conf1,c(L=lower1,U=upper1))
  # confidence interval for the 2nd distribution
  mean2<-mean(sample2) # sample mean for 2nd
  upper2<-mean2+z*4/sqrt(20) # upper bound of confidence interval for 2nd
  lower2<-mean2-z*4/sqrt(20) # lower bound of confidence interval for 2nd
  conf2=rbind(conf2,c(L=lower2,U=upper2))
}
conf1=data.frame(conf1);conf2=data.frame(conf2);
par(mfrow=c(1,2))
# plot confidence intervals for 1st distribution
plot(rep(43,T),1:T,xlim=c(40,46),col="red",type="l",xlab="Normal(43,3.5)", ylab="") # where the true mean lies
for (i in 1:T){
  lines(x=c(conf1$L[i],conf1$U[i]),y=c(i,i)) # draw confidence interval from lower to upper
}
# plot confidence intervals for 2nd distribution
plot(rep(38,T),1:T,xlim=c(35,41),col="red",type="l",xlab="Normal(38,4)", ylab="") # true mean for 2nd distribution
for (i in 1:T){
  lines(x=c(conf2$L[i],conf2$U[i]),y=c(i,i)) # draw confidence interval from lower to upper
}
For the first distribution, 10 of the 15 intervals cover the true mean. For the 2nd distribution, 12 of the 15 intervals cover the true mean. The proportion of trials in which both confidence intervals simultaneously cover their true means is 8/15.
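Instead of reading coverage off the plots, the counts can also be computed from the conf1 and conf2 data frames built above (a small sketch):
cover1 <- conf1$L <= 43 & 43 <= conf1$U # does each 80% interval cover mu = 43?
cover2 <- conf2$L <= 38 & 38 <= conf2$U # does each 80% interval cover mu = 38?
c(sum(cover1), sum(cover2), sum(cover1 & cover2)) # individual and simultaneous coverage counts out of T = 15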
As we lower the confidence level, more and more of our intervals will fail to include the true mean because \(z^\ast\) is smaller, making the intervals narrower. So for the first distribution the estimate is \(0.8 \times 10/15 = 0.5333\), and for the second distribution it is \(0.8 \times 12/15 = 0.64\). The probability that both intervals cover their true means (treating the two distributions as independent) is then \(0.5333 \times 0.64 \approx 0.34\).
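To see how \(z^\ast\) shrinks as the confidence level is lowered, which is why narrower intervals miss the true mean more often, compare the critical values:
qnorm(1 - c(0.20, 0.10, 0.05)/2) # z* for 80%, 90%, and 95% confidence: about 1.28, 1.64, 1.96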
P-value: A test of the null hypothesis \(H_0:\mu=\mu_0\) gives a test statistic \(z=1.73\).
For the one-sided alternative \(H_a:\mu>\mu_0\), a z-score of 1.73 gives a P-value of \(1-0.9582 = 0.0418\). At the 5% significance level, we reject the null hypothesis. At the 10% significance level, we still reject the null hypothesis. At the 1% significance level, we fail to reject the null hypothesis.
For the one-sided alternative \(H_a:\mu<\mu_0\), the P-value is \(1-0.0418 = 0.9582\). At all significance levels asked in this question, we fail to reject the null hypothesis since the P-value is so large.
For the two-sided alternative \(H_a:\mu\neq\mu_0\), the P-value is \(2(0.0418) = 0.0836\). At a significance level of 5%, we fail to reject the null hypothesis. At a significance level of 10%, the P-value is below 0.10, so we reject the null hypothesis. At the 1% significance level, which is stricter still, we again fail to reject the null hypothesis.
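All three P-values can be verified in R; the three lines below correspond to the alternatives \(\mu>\mu_0\), \(\mu<\mu_0\), and \(\mu\neq\mu_0\):
z <- 1.73
1 - pnorm(z) # Ha: mu > mu0, P-value approximately 0.0418
pnorm(z) # Ha: mu < mu0, P-value approximately 0.9582
2*(1 - pnorm(z)) # Ha: mu != mu0, P-value approximately 0.0836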