Title: "Assignment 1 - Statistical Inference"
Author: "ahlulwulus"
Date: "November 21, 2015"
Output: pdf_document
This is the report for part 1 (simulation). The problem statement is defined as follows:
The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations.
First, we collect the means of 1,000 simulated samples, each of size n = 40, drawn from an exponential distribution with lambda = 0.2.
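As a quick sanity check of the statement above, the following sketch draws one large exponential sample and compares its empirical mean and standard deviation to 1/lambda. The seed, the sample size of 100,000, and the variable names are illustrative assumptions and are not part of the original report.

```r
# Illustrative check: for rate lambda, mean and sd should both be near 1/lambda.
set.seed(42)                       # arbitrary seed, not the one used in the report
lambda <- 0.2
draws <- rexp(100000, rate = lambda)

empirical_mean <- mean(draws)
empirical_sd <- sd(draws)

# Show the theoretical value next to both empirical estimates.
print(c(theoretical = 1 / lambda, mean = empirical_mean, sd = empirical_sd))
```

Both estimates should land close to 5, the value of 1/lambda for lambda = 0.2.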
set.seed(1)
library(data.table)
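Define a function that returns the mean of one sample of size n = 40 from the aforementioned distribution. The argument x is always 1 here; it exists only so that apply() can call the function once per row.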
sim_exp = function(x){return(x * mean(rexp(40, 0.2)))}
Then collect 1,000 sample means of size n = 40 with lambda = 0.2 by applying the function to each row of a 1,000-row table of ones.
sim = apply(data.table(rep.int(1, 1000)), 1, sim_exp)
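The same collection step can be sketched more directly with replicate(), which avoids the dummy table of ones. This is an alternative formulation, not the code used in the report; the names n_sims and sim_alt are hypothetical. Because it calls rexp() in the same order, it is expected to reproduce sim when started from the same seed, although that has not been verified here.

```r
# Alternative sketch: draw 1,000 samples of size 40 and keep each sample's mean.
set.seed(1)
lambda <- 0.2    # rate parameter
n <- 40          # size of each sample
n_sims <- 1000   # number of simulated samples (hypothetical name)

sim_alt <- replicate(n_sims, mean(rexp(n, rate = lambda)))

length(sim_alt)  # one mean per simulated sample
```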
Show where the distribution is centered and compare it to the theoretical center of the distribution.
Show how variable it is and compare it to the theoretical variance of the distribution.
These two questions can be answered by calculating the observed and theoretical mean and variance of the sample means. The theoretical mean is 1/lambda = 5. The theoretical variance of the sample mean is the square of its standard error, which is the population standard deviation 1/lambda divided by the square root of the sample size n = 40.
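Written out for lambda = 0.2 and n = 40, the theoretical values work out as follows. The symbols \bar{X} (sample mean) and \sigma (population standard deviation) are notation introduced here for illustration.

```latex
\begin{aligned}
E[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{5^2}{40} = 0.625, \\
\operatorname{SE}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}}
  = \frac{5}{\sqrt{40}} \approx 0.791 .
\end{aligned}
```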
theoretical_variance = round(((1/.2) / sqrt(40))^2,3)
actual_variance = round(var(sim),3)
theoretical_mean = 1/.2
actual_mean = round(mean(sim),3)
compare_var = rbind(theoretical_mean,actual_mean,theoretical_variance, actual_variance)
Comparison of the theoretical and observed mean and variance:
compare_var
##                       [,1]
## theoretical_mean     5.000
## actual_mean          4.990
## theoretical_variance 0.625
## actual_variance      0.611
As we can see, the observed variance (0.611) is close to the theoretical variance (0.625), and the observed mean (4.990) is close to the theoretical mean (5).
Generate the Histogram
As we can see from this histogram, the observed density of the sample means is very similar to the normal density, so it is fair to say that the sample means are approximately normally distributed, as the Central Limit Theorem predicts.
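The plotting code is not shown in this report. A minimal sketch of one way to produce such a histogram with base R graphics follows, assuming the sample means are regenerated as above. The variable names, the number of breaks, and the labels are illustrative choices, not the author's.

```r
# Sketch: density histogram of the sample means with the theoretical normal curve.
set.seed(1)
lambda <- 0.2
n <- 40
sample_means <- replicate(1000, mean(rexp(n, rate = lambda)))  # hypothetical name

# freq = FALSE puts the histogram on the density scale so the curve is comparable.
h <- hist(sample_means, breaks = 30, freq = FALSE,
          main = "Distribution of 1,000 sample means (n = 40)",
          xlab = "Sample mean")

# Overlay the normal density with mean 1/lambda and sd (1/lambda)/sqrt(n).
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, lwd = 2)
```

On the density scale the bar areas sum to 1, so the bars and the overlaid curve can be compared directly.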
Calculate a 95% interval around the observed mean, using the theoretical standard error
ci = mean(sim) + (c(-1,1)* (1.96 * ((1/0.2) / sqrt(40))))
ci
## [1] 3.440509 6.539541
Calculate the coverage, i.e. the proportion of sample means that fall inside the interval
sum(between(sim, ci[1], ci[2])) / length(sim)
## [1] 0.949
The coverage is approximately 94.9%, which is close to the nominal 95%. This is expected because the sample means are approximately normally distributed.
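The 95% figure that the observed coverage is compared against can be checked directly from the standard normal distribution. The sketch below is illustrative only, with hypothetical variable names. It shows that the multiplier 1.96 is a rounded 97.5% quantile and that, under exact normality, an interval of plus or minus 1.96 standard errors contains about 95% of the values.

```r
# Exact 97.5% quantile of the standard normal; 1.96 is this value rounded.
z <- qnorm(0.975)

# Probability mass within +/- 1.96 standard deviations of the mean.
coverage_196 <- pnorm(1.96) - pnorm(-1.96)

print(c(z = z, coverage_196 = coverage_196))
```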