Homework 6

Problem 1

Random Walk: A particle moves along the line in a random walk. That is, the particle starts at the origin (position 0) and moves either right or left in independent steps of length 1. If the particle moves to the right with probability 0.6, its movement at the \(i\)-th step is a random variable \(X_i\) with distribution \[P(X_i=1)=0.6\] \[P(X_i=-1)=0.4\]

We can use R to demonstrate this random walk. Below is an example of 10 steps of the random walk:

set.seed(50)
sample(c(1,-1),size=10,replace=TRUE,prob=c(0.6,0.4)) # 10 steps of being either 1 or -1

##  [1] -1  1  1 -1  1  1 -1 -1  1  1

Let \(W\) be the number of steps to the right. Count \(W\) in this example, and find the exact probability that a particle moves right \(W\) times out of 10 steps. Show your work.

The particle steps to the right 6 times in this example, so \(W=6\). Its net displacement is two steps to the right of the origin (6 steps right minus 4 steps left). Since \(W\) follows a Binomial distribution with \(n=10\) and \(p=0.6\), the exact probability is \[P(W=6)=\frac{10!}{6!\,4!}(0.6)^6(0.4)^4=210(0.6)^6(0.4)^4\approx 0.2508\]
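As a quick check of the count and the binomial probability, here is a minimal sketch in R. The vector `steps` is copied by hand from the printed output above rather than regenerated, because `sample()` can return different draws under different R versions; the names `steps`, `W`, and `p_w6` are illustrative.

```r
# Steps copied from the printed output of the 10-step example
steps <- c(-1, 1, 1, -1, 1, 1, -1, -1, 1, 1)

# W = number of steps to the right
W <- sum(steps == 1)

# Exact probability P(W = 6) under Binomial(n = 10, p = 0.6)
p_w6 <- dbinom(W, size = 10, prob = 0.6)

# The same value written out from the formula
p_formula <- choose(10, W) * 0.6^W * 0.4^(10 - W)

print(W)
print(round(p_w6, 4))
```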

Suppose the particle moves 10 steps. What is the probability that it moves right at most 7 times?

\(P(W\le 7)=p(0)+p(1)+p(2)+p(3)+p(4)+p(5)+p(6)+p(7)=0.832710\), where \(p(w)=P(W=w)\) for \(W\sim\text{Binomial}(10,\,0.6)\).
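The cumulative sum can be checked with R's binomial functions. This is a small sketch; `p_at_most_7` and `p_sum` are illustrative names.

```r
# P(W <= 7) for W ~ Binomial(10, 0.6), using the cumulative distribution function
p_at_most_7 <- pbinom(7, size = 10, prob = 0.6)

# Same quantity as an explicit sum of p(0), ..., p(7)
p_sum <- sum(dbinom(0:7, size = 10, prob = 0.6))

print(round(p_at_most_7, 6))
```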

Problem 2

Refer to problem 1.

Let \(\bar{X}\) denote the average movement toward the right per step, \(\bar{X}=\frac{1}{k}(X_1+X_2+\cdots+X_k)\). Use the central limit theorem to find the approximate probability that the average movement when \(k=100\) is within \(0.6 \pm 0.1\). Show your work.

Since \(k=100>30\), the central limit theorem lets us treat \(\bar{X}\) as approximately Normal. Treating each step as an indicator of a move to the right (1 for right, 0 for left), so that \(\bar{X}\) is the proportion of right moves, the mean of \(\bar{X}\) is 0.6 and its standard deviation is \(\sqrt{(0.6)(0.4)/100}=0.0489897\). The z-scores for 0.5 and 0.7 are \((0.5-0.6)/0.0489897=-2.04\) and \((0.7-0.6)/0.0489897=2.04\). From a z table, \(P(0.5\le\bar{X}\le 0.7)\approx 0.9793-0.0207=0.9586\).
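The table lookup can be reproduced with `pnorm()`. This sketch assumes, as the answer above does, that \(\bar{X}\) has mean 0.6 and standard deviation 0.0489897, which corresponds to averaging a 1/0 indicator of a right move. Because the z-scores are not rounded to 2.04 here, the result may differ from the table-based value in the fourth decimal place.

```r
p <- 0.6    # probability of a step to the right
k <- 100    # number of steps

# Standard deviation of the sample proportion of right moves
se <- sqrt(p * (1 - p) / k)

# z-scores for the endpoints 0.5 and 0.7
z_low  <- (0.5 - p) / se
z_high <- (0.7 - p) / se

# Normal approximation to P(0.5 <= X-bar <= 0.7)
prob_within <- pnorm(z_high) - pnorm(z_low)

print(round(c(se = se, z_low = z_low, z_high = z_high), 4))
print(prob_within)
```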

Let \(Y\) denote the position of the particle after \(k\) steps, \(Y=X_1+X_2+\cdots+X_k\). Find the approximate probability that the position of the particle after 500 steps is at least 180 to the right. Show your work.

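Let \(W\) be the number of steps to the right out of 500, so \(Y=W-(500-W)=2W-500\) and \(Y\ge 180\) exactly when \(W\ge 340\). The exact probability is \(P(Y\ge 180)=P(W\ge 340)=p(340)+p(341)+\cdots+p(499)+p(500)\), where \(p(w)=P(W=w)\). By the central limit theorem, \(W\) is approximately Normal with mean \(500(0.6)=300\) and standard deviation \(\sqrt{500(0.6)(0.4)}\approx 10.954\), so \(z=(340-300)/10.954\approx 3.65\) and \(P(Y\ge 180)\approx P(Z\ge 3.65)\approx 0.0001\).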
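A position of at least 180 after 500 steps corresponds to at least 340 steps to the right, since \(Y=2W-500\) when \(W\) counts the right steps. The sketch below computes the Normal approximation for \(P(W\ge 340)\) and, for comparison, the exact binomial tail. No continuity correction is applied, which is an assumption of this sketch; the variable names are illustrative.

```r
k <- 500    # number of steps
p <- 0.6    # probability of a step to the right

# Y = 2W - k, so Y >= 180 is equivalent to W >= (180 + k) / 2
w_min <- (180 + k) / 2

# Mean and standard deviation of W ~ Binomial(k, p)
mu_w <- k * p
sd_w <- sqrt(k * p * (1 - p))

# Central limit theorem approximation (no continuity correction)
z <- (w_min - mu_w) / sd_w
p_clt <- pnorm(z, lower.tail = FALSE)

# Exact binomial tail P(W >= 340) = 1 - P(W <= 339)
p_exact <- pbinom(w_min - 1, size = k, prob = p, lower.tail = FALSE)

print(c(z = z, p_clt = p_clt, p_exact = p_exact))
```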

Simulate the random walk for 100 steps and plot the particle’s position (\(Y\)) vs. step \(k\). You can modify the following code:

set.seed(100)
X<-sample(c(1,-1),size=100,replace=TRUE,prob=c(0.6,0.4)) # simulation of 100 steps
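Y=0   # movement at the current step
Position = 0   # running position of the particle
plot(0,0,xlim=c(0,100),ylim=c(-10,50),type="n",xlab="Step k",ylab="Position Y")
for(i in 1:100){
  # modify the next line
  Y<-X[i]     # the particle's movement (+1 or -1) at step i
  Position = Position + Y   # the particle's position after step i
  points(i,Position) # plot the position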
}
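The loop above accumulates the steps one at a time. An equivalent vectorized sketch uses `cumsum()`, which returns the running total of the steps, i.e. the position after each step. The name `position` is illustrative, and drawing a line instead of points is a presentation choice rather than part of the assignment.

```r
set.seed(100)
X <- sample(c(1, -1), size = 100, replace = TRUE, prob = c(0.6, 0.4))

# Position after each step k is the running sum of the steps
position <- cumsum(X)

# Plot position Y against step k
plot(1:100, position, type = "l", xlab = "Step k", ylab = "Position Y")
```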

Problem 3

Fuel Efficiency: Computers in some vehicles calculate various quantities related to performance. One of these is the fuel efficiency, or gas mileage, usually expressed as miles per gallon (mpg). For one vehicle equipped in this way, the miles per gallon were recorded each time the gas tank was filled, and the computer was then reset. Here are the mpg values for a random sample of 20 records:

mpg<-c(41.5,50.7,36.6,37.3,34.2,45.0,48.0,43.2,47.7,42.2,43.2,44.6,48.4,46.4,46.8,39.2,37.3,43.5,44.3,43.3)
print(mpg)

##  [1] 41.5 50.7 36.6 37.3 34.2 45.0 48.0 43.2 47.7 42.2 43.2 44.6 48.4 46.4
## [15] 46.8 39.2 37.3 43.5 44.3 43.3

Suppose that the standard deviation is known to be \(\sigma=3.5\)mpg.

What is \(\sigma_{\bar{X}}\), the standard deviation of \(\bar{X}\)?

\(\sigma_{\bar{X}}=\sigma/\sqrt{n}=3.5/\sqrt{20}=0.78262\)

Examine the data for skewness and other signs of non-Normality. Show your plots and numerical summaries. Do you think it is reasonable to construct a confidence interval based on the Normal distribution? Explain your answer.

summary(mpg)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   34.20   40.92   43.40   43.17   46.50   50.70

sd(mpg)

## [1] 4.414939

mean(mpg)

## [1] 43.17

hist(mpg, xlab = "MPG", xlim = c(30,55))

Given the summary and the histogram, a Normal model appears reasonable for this sample. The mean (43.17) and median (43.40) are close, the data show no significant skew, and \(n=20>15\). We can proceed with caution and construct a confidence interval based on the Normal distribution.
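Two further checks that could support this judgment are a Normal quantile plot and a simple moment-based skewness coefficient. The sketch below is one way to produce them; the skewness formula used here (third central moment divided by the 1.5 power of the second central moment) is one common definition, chosen as an assumption for illustration.

```r
mpg <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2,
         43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)

# Normal quantile plot: points close to the line suggest approximate Normality
qqnorm(mpg)
qqline(mpg)

# Moment-based skewness: values near 0 indicate a roughly symmetric sample
dev <- mpg - mean(mpg)
skew <- mean(dev^3) / mean(dev^2)^1.5

# Gap between mean and median as a second, cruder symmetry check
mean_median_gap <- mean(mpg) - median(mpg)

print(c(skewness = skew, mean_minus_median = mean_median_gap))
```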

Give a 95% confidence interval for \(\mu\), the true mean of mpg for this vehicle.

For a 95% confidence interval, \(z^*=1.96\) and the sample mean is \(\bar{x}=43.17\). The interval is \(\bar{x}\pm z^*\sigma_{\bar{X}}=43.17\pm(1.96)(0.78262)=43.17\pm 1.534\), which gives 41.636 to 44.704 mpg. We can say with 95% confidence that the true mean fuel efficiency of this vehicle lies within this interval.
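The standard error and the interval can be computed directly in R. This sketch uses `qnorm(0.975)` for the critical value instead of the rounded 1.96, so the endpoints agree with the hand calculation to about three decimal places; the names `se`, `z_star`, and `ci` are illustrative.

```r
mpg <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2,
         43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)

sigma <- 3.5            # known population standard deviation
n <- length(mpg)

# Standard deviation of the sample mean
se <- sigma / sqrt(n)

# Critical value for 95% confidence
z_star <- qnorm(0.975)

# z confidence interval: x-bar +/- z* x sigma / sqrt(n)
ci <- mean(mpg) + c(-1, 1) * z_star * se

print(round(se, 5))
print(round(ci, 3))
```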

Problem 4

Multiple Confidence Intervals: As we prepare to take a sample and compute a 95% confidence interval, we know that the probability that the interval we compute will cover the parameter is 0.95. If we plan to use several such intervals, however, our confidence that all of them will cover the parameter is less than 95%.

Refer to Problem 3. Suppose the miles per gallon for that vehicle follows a Normal distribution with mean \(\mu=43\) and standard deviation \(\sigma=3.5\), and another vehicle’s mpg follows a Normal distribution with mean \(\mu=38\) and standard deviation \(\sigma=4\).

Use R to draw 15 samples of size 20 from each of the two Normal distributions and construct 80% confidence intervals for each sample. Modify the following code and make a plot of your confidence intervals for each distribution.

set.seed(5)
T=15     # we draw 15 samples
conf1=NULL;conf2=NULL
for(i in 1:T){
  sample1<-rnorm(20,43,3.5)  # sample for first distribution
  sample2<-rnorm(20,38,4)    # sample for 2nd distribution
  z=1.282         # Z-score for 80% confidence level
  mean1<-mean(sample1)       #  sample mean for 1st
  upper1<-mean1+z*3.5/sqrt(20)  # upper bound of confidence interval for 1st
  lower1<-mean1-z*3.5/sqrt(20)  # lower bound of confidence interval for 1st
  conf1=rbind(conf1,c(L=lower1,U=upper1)) 
  
  # write your own code of constructing confidence interval for 2nd distribution
  mean2<-mean(sample2)
  upper2<-mean2+z*4/sqrt(20)
  lower2<-mean2-z*4/sqrt(20)
  conf2=rbind(conf2,c(L=lower2,U=upper2))
}

conf1=data.frame(conf1);conf2=data.frame(conf2);

par(mfrow=c(1,2))

#  plot confidence intervals for 1st distribution

plot(rep(43,T),1:T,xlim=c(40,46),col="red",type="l",xlab="Normal(43,3.5)", ylab="") # where the true mean lies
for (i in 1:T){
  lines(x=c(conf1$L[i],conf1$U[i]),y=c(i,i))  # draw confidence interval from lower to upper
}

#  write your code for plotting confidence intervals for 2nd distribution
plot(rep(38,T),1:T,xlim=c(35,41),col="red",type="l",xlab="Normal(38,4)", ylab="") # true mean for 2nd distribution
for (i in 1:T){
  lines(x=c(conf2$L[i],conf2$U[i]),y=c(i,i))
}

What’s the proportion of confidence intervals covering the true mean for the 1st distribution? What’s the proportion of confidence intervals covering the true mean for the 2nd distribution? What’s the proportion of sample pairs in which both confidence intervals simultaneously cover their true means?

For the first distribution, 10/15 of the intervals cover the true mean. For the second distribution, 12/15 cover the true mean. The proportion of sample pairs in which both confidence intervals simultaneously cover their true means is 8/15.
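Rather than counting from the plot, the coverage proportions can be computed in code. The sketch below repeats the simulation with the same seed, sample sizes, and order of `rnorm()` calls as the code above, on the assumption that this reproduces the same intervals; no output is asserted here, and `cover1`, `cover2`, and the `prop_*` names are illustrative.

```r
set.seed(5)
n_samples <- 15     # number of samples per distribution
z <- 1.282          # z-score for 80% confidence, as in the code above

cover1 <- logical(n_samples)
cover2 <- logical(n_samples)

for (i in seq_len(n_samples)) {
  sample1 <- rnorm(20, 43, 3.5)   # sample from the 1st distribution
  sample2 <- rnorm(20, 38, 4)     # sample from the 2nd distribution

  # An interval covers the true mean when the sample mean is within
  # one margin of error of it
  cover1[i] <- abs(mean(sample1) - 43) <= z * 3.5 / sqrt(20)
  cover2[i] <- abs(mean(sample2) - 38) <= z * 4 / sqrt(20)
}

prop1     <- mean(cover1)            # coverage proportion, 1st distribution
prop2     <- mean(cover2)            # coverage proportion, 2nd distribution
prop_both <- mean(cover1 & cover2)   # both intervals cover simultaneously

print(c(first = prop1, second = prop2, both = prop_both))
```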

Theoretically, if we construct one 80% confidence interval for each of the two distributions, what is the probability that both confidence intervals cover their true means? What is the probability that at least one confidence interval covers its true mean?

Each 80% confidence interval covers its true mean with probability 0.8, and the two samples are drawn independently. The probability that both intervals cover their true means is therefore \(0.8\times 0.8=0.64\). The probability that at least one interval covers its true mean is \(1-P(\text{neither covers})=1-(0.2)(0.2)=0.96\).
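Assuming the two samples are independent and each 80% interval covers its true mean with probability 0.8, the joint probabilities follow from the multiplication rule. The sketch below computes them and adds a Monte Carlo check; the number of replications (20000) and the seed are arbitrary choices for illustration, and the sample means are drawn directly from their sampling distributions.

```r
conf_level <- 0.80

# Theoretical probabilities under independence
p_both <- conf_level^2                 # both intervals cover
p_any  <- 1 - (1 - conf_level)^2       # at least one interval covers

# Monte Carlo check
set.seed(1)
reps <- 20000
z <- qnorm(1 - (1 - conf_level) / 2)   # critical value for 80% confidence
se1 <- 3.5 / sqrt(20)
se2 <- 4 / sqrt(20)
xbar1 <- rnorm(reps, 43, se1)          # sample means, 1st distribution
xbar2 <- rnorm(reps, 38, se2)          # sample means, 2nd distribution
cover1 <- abs(xbar1 - 43) <= z * se1
cover2 <- abs(xbar2 - 38) <= z * se2

sim_both <- mean(cover1 & cover2)
sim_any  <- mean(cover1 | cover2)

print(c(p_both = p_both, sim_both = sim_both, p_any = p_any, sim_any = sim_any))
```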

Problem 5

P-value: A test of the null hypothesis \(H_0:\mu=\mu_0\) gives a test statistic \(z=1.73\).

What is the P-value if the alternative is \(H_{a}:\mu>\mu_0\)? Do you reject the null at significance level of 5%? How about significance level of 10% or 1%?

With a z-score of 1.73, the P-value is \(P(Z>1.73)=1-0.9582=0.0418\). At the 5% significance level, we reject the null hypothesis. At the 10% significance level, we also reject the null hypothesis. At the 1% significance level, we fail to reject the null hypothesis.

What is the P-value if the alternative is \(H_{a}:\mu<\mu_0\)? Do you reject the null at significance level of 5%? How about significance level of 10% or 1%?

With a z-score of 1.73, the P-value is \(P(Z<1.73)=1-0.0418=0.9582\). At all of the significance levels asked about in this question, we fail to reject the null hypothesis, since the P-value is so large.

What is the P-value if the alternative is \(H_{a}:\mu \neq \mu_0\)? Do you reject the null at significance level of 5%? How about significance level of 10% or 1%?

The P-value is \(2P(Z>1.73)=2(0.0418)=0.0836\). At the 5% significance level, we fail to reject the null hypothesis, since \(0.0836>0.05\). At the 10% significance level, we reject the null hypothesis, since \(0.0836<0.10\). At the 1% significance level, we fail to reject the null hypothesis, since \(0.0836>0.01\).
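The three P-values and the corresponding decisions can be checked with `pnorm()`. This is a minimal sketch; the names `p_upper`, `p_lower`, `p_two_sided`, and `decisions` are illustrative, and the decision rule is the usual one of rejecting when the P-value is below the significance level.

```r
z <- 1.73
alpha <- c(0.10, 0.05, 0.01)    # significance levels to compare against

# P-values for the three alternatives
p_upper     <- pnorm(z, lower.tail = FALSE)            # H_a: mu > mu_0
p_lower     <- pnorm(z)                                # H_a: mu < mu_0
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)   # H_a: mu != mu_0

# Reject H_0 when the P-value is below the significance level
decisions <- rbind(
  upper     = p_upper < alpha,
  lower     = p_lower < alpha,
  two_sided = p_two_sided < alpha
)
colnames(decisions) <- paste0("alpha=", alpha)

print(round(c(p_upper, p_lower, p_two_sided), 4))
print(decisions)
```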