Statistical Inference - Course Project (part I)

We simulate 1000 samples of 40 draws each from an exponential distribution, compute the mean of each sample, and compare the center of these simulated means with the theoretical center of the distribution.

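   nsim <- 1000  # number of simulations
   n <- 40       # sample size (observations per simulation)
   lambda <- 0.2 # rate parameter of the exponential distribution
   
   # simulated data: the mean of each of the nsim samples of size n
   simdata <- sapply(1:nsim, function(x){
           mean(rexp(n, lambda))
   })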
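A minimal sketch of the same simulation that draws all values at once with `matrix()` and `rowMeans()` and fixes the random seed so the run can be reproduced. The seed value and the name `sim_means` are illustrative assumptions, so its numbers will differ slightly from the results reported below.

```r
set.seed(2014)   # arbitrary seed (assumption), only for reproducibility
nsim   <- 1000   # number of simulations
n      <- 40     # observations per simulation
lambda <- 0.2    # rate parameter of the exponential distribution

# Draw all nsim * n values at once, one simulated sample per row
draws <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim, ncol = n)

# Each row mean is one simulated sample mean (hypothetical name `sim_means`)
sim_means <- rowMeans(draws)
summary(sim_means)
```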

Mean, standard deviation and variance of the simulated means:

  simdata_mean <- mean(simdata)
  cat("mean = ", simdata_mean, "\n")

## mean =  5.035

  simdata_sd <- sd(simdata)
  cat("standard deviation = ", simdata_sd, "\n")

## standard deviation =  0.8145

  sim_var <- var(simdata)
  cat("variance = ", sim_var, "\n")

## variance =  0.6634

Theoretical mean of the exponential distribution, 1/lambda:

   theorotical_mean <- 1/lambda
   theorotical_mean

## [1] 5

The theoretical mean (5) and the simulated mean (5.0352) are very close.
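The agreement is expected: the sample mean is an unbiased estimator of the population mean, and an exponential distribution with rate $\lambda$ has mean $1/\lambda$:

```latex
E[\bar{X}] = E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5
```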

Next, we show how variable the simulated means are and compare their variance with the theoretical variance of the sample mean.

   # theoretical variance of the sample mean: (1/lambda)^2 / n
   theorotical_var <- ((1/lambda)/sqrt(n))^2
   theorotical_var

## [1] 0.625
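The theoretical value follows from the variance of the exponential distribution, $1/\lambda^2$, divided by the sample size $n$ because the draws are independent. Its square root is the theoretical standard error, which can be compared with the simulated standard deviation of 0.8145:

```latex
\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X)}{n}
  = \frac{(1/\lambda)^2}{n} = \frac{5^2}{40} = 0.625,
\qquad
\operatorname{SE}(\bar{X}) = \sqrt{0.625} \approx 0.791
```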

   sim_var

## [1] 0.6634

The variance of the simulated means (0.6634) is slightly larger than the theoretical variance (0.625).
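Whether a gap like 0.6634 versus 0.625 is more than simulation noise can be checked by repeating the entire 1000-simulation experiment. The sketch below assumes 200 repetitions and an arbitrary seed (both hypothetical choices); since the sample variance is unbiased, `sim_vars` should be centred near 0.625, and its spread shows how much the simulated variance moves from one run to the next.

```r
set.seed(42)     # arbitrary seed (assumption)
nsim   <- 1000
n      <- 40
lambda <- 0.2
reps   <- 200    # hypothetical number of repetitions of the whole experiment

# One repetition: nsim sample means of size n, then the variance of those means
one_run_var <- function() {
  means <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))
  var(means)
}

sim_vars   <- replicate(reps, one_run_var())
var_theory <- (1 / lambda)^2 / n   # theoretical variance of the sample mean, 0.625

summary(sim_vars)
```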

Next, we show that the distribution of the simulated means is approximately normal.

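Generate a sample of the same size from a normal distribution with the theoretical mean 1/lambda and standard deviation (1/lambda)/sqrt(n):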

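  # normal sample with the mean and standard error predicted by the CLT
  normal <- rnorm(nsim, mean = theorotical_mean, sd = theorotical_mean/sqrt(n))
  # stack both samples in one data frame, labelled by distribution
  data <- c(simdata, normal)
  dist <- as.factor(c(rep('simulation', length(simdata)), rep('normal', length(normal))))
  df <- data.frame(val = data, type = dist)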

Plot both distributions, the normal sample and the simulated means:

  library(ggplot2)
  # overlaid histograms and density curves, one colour per distribution (column `type`)
  # (after_stat() and linewidth require ggplot2 >= 3.4)
  ggplot(df, aes(x = val, fill = type)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = .1, colour = "black",
                   alpha = .3, position = "identity") +
    geom_density(alpha = .2) +
    # red: theoretical mean 1/lambda; blue: mean of the simulated means
    geom_vline(xintercept = theorotical_mean, colour = "red", linewidth = 1) +
    geom_vline(xintercept = mean(simdata), colour = "blue", linewidth = 1) +
    scale_x_continuous(breaks = 1:10) +
    scale_y_continuous(name = "Density")

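[Figure: histogram and density curves of the simulated and normal distributions, with vertical lines at the theoretical mean (red) and the simulated mean (blue)]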
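A normal Q-Q plot is another common graphical check of approximate normality. This sketch regenerates simulated means (with an arbitrary seed, an assumption), standardises them with the theoretical mean $1/\lambda$ and standard error $(1/\lambda)/\sqrt{n}$, and plots them against standard normal quantiles with base R's `qqnorm()` and `qqline()`. Because the mean of 40 exponential draws keeps a little right skew (skewness $2/\sqrt{40} \approx 0.32$), some departure in the upper tail would not be surprising.

```r
set.seed(2014)   # arbitrary seed (assumption)
nsim   <- 1000
n      <- 40
lambda <- 0.2

# nsim simulated sample means, each from n exponential draws
sim_means <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))

# Standardise with the theoretical mean and standard error from the CLT
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Points lying close to the reference line suggest approximate normality
qqnorm(z, main = "Normal Q-Q plot of standardised simulated means")
qqline(z, col = "red")
```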

   t.test(simdata,normal)

## 
##  Welch Two Sample t-test
## 
## data:  simdata and normal
## t = -0.086, df = 1998, p-value = 0.9315
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.07443  0.06818
## sample estimates:
## mean of x mean of y 
##     5.035     5.038

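The simulated distribution and the normal (theoretical) distribution are shown in teal and red, respectively. The blue vertical line marks the mean of the simulated means and the red vertical line marks the theoretical mean; the two are very close to each other. The Welch t-test (p-value = 0.9315) likewise finds no significant difference between the means of the two samples, although it compares only their means, not the shapes of the distributions.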
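Since the t-test compares only means, a two-sample Kolmogorov-Smirnov test (`ks.test()` in base R's stats package) can complement it by comparing the entire empirical distributions. The sketch regenerates both samples under an arbitrary seed (an assumption) and uses the hypothetical names `sim_means` and `norm_sample` so the objects above are not overwritten; a large p-value would mean no detectable difference in shape, not proof that the two distributions are identical.

```r
set.seed(2014)   # arbitrary seed (assumption)
nsim   <- 1000
n      <- 40
lambda <- 0.2

# Hypothetical names, so the objects `simdata` and `normal` above are not overwritten
sim_means   <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))
norm_sample <- rnorm(nsim, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

# Two-sample Kolmogorov-Smirnov test: compares the full empirical distributions
ks <- ks.test(sim_means, norm_sample)
ks
```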

Evaluate the coverage of the confidence interval for 1/lambda, $\bar{X} \pm 1.96\, S/\sqrt{n}$, where $S$ is the sample standard deviation.

The confidence interval is…

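   # interval mean +/- 1.96 * sd / sqrt(n), computed from the pooled values in `data`
   # (the simulated means together with the normal sample)
   CI <- mean(data) + c(-1, 1) * 1.96 * sd(data) / sqrt(n)
   CI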

## [1] 4.785 5.289

The confidence interval for 1/lambda is [4.7849, 5.2887], which contains the true value 1/lambda = 5.
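The interval above is a single interval. Coverage is the proportion of intervals, each computed from its own simulated sample, that contain the true value $1/\lambda = 5$. A minimal sketch, assuming an arbitrary seed and applying $\bar{X} \pm 1.96\,S/\sqrt{n}$ to each of 1000 fresh samples of size 40; because the exponential distribution is skewed and $n$ is moderate, the estimated coverage may come out somewhat below the nominal 95%.

```r
set.seed(2014)   # arbitrary seed (assumption)
nsim      <- 1000
n         <- 40
lambda    <- 0.2
true_mean <- 1 / lambda

# One simulated sample of size n per row
draws <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim)
xbar  <- rowMeans(draws)        # sample mean of each row
s     <- apply(draws, 1, sd)    # sample standard deviation of each row

# Xbar +/- 1.96 * S / sqrt(n) for every simulated sample
lower <- xbar - 1.96 * s / sqrt(n)
upper <- xbar + 1.96 * s / sqrt(n)

# Coverage: share of intervals that contain the true mean 1/lambda
coverage <- mean(lower <= true_mean & true_mean <= upper)
coverage
```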

Noha Elprince

October 24, 2014