Appendix B - Exercises

This is an example on how you can use R to solve some of the exercises in your textbook. You can look at some videos on how to download and install R here: http://dist.stat.tamu.edu/pub/rvideos/

B.5

Lets create \( X_{n} \) from formula (B.52):

x <- rep(0, 40)
x[1] = 79
for (i in 2:40) x[i] = (263 * x[i - 1] + 71)%%100
x = x/100
x
##  [1] 0.79 0.48 0.95 0.56 0.99 0.08 0.75 0.96 0.19 0.68 0.55 0.36 0.39 0.28
## [15] 0.35 0.76 0.59 0.88 0.15 0.16 0.79 0.48 0.95 0.56 0.99 0.08 0.75 0.96
## [29] 0.19 0.68 0.55 0.36 0.39 0.28 0.35 0.76 0.59 0.88 0.15 0.16

The random number generator has generated 40 numbers that should be independent draws of a uniform random variable on the (0,1) interval. A histogram is used to see if \( X_{n} \) is truly a random variable.

bins = seq(0, 1, by = 0.05)
hist(x, breaks = bins, col = "lightblue")

plot of chunk unnamed-chunk-2

These nubers do not look random, there is a clear pattern. Some intervals do not contain data, while others are overrepresented. A truly uniform random (0,1) variable should look like:

u <- runif(40)
hist(u, breaks = bins, col = "red")

plot of chunk unnamed-chunk-3

B.8

The data definition file is found here: http://www.principlesofeconometrics.com/poe4/data/def/uniform1.def

This is the data description:

uniform1.def

u1 u2

obs: 1000 uniform random numbers

u1 = uniform random numbers using seed 12345
u2 = uniform random numbers using seed 1010101

Values generated in Stata 11.1 using function runiform()

Variable |       Obs   |     Mean   | Std. Dev.   |    Min    |    Max
      u1 |      1000   |  .5082264  |  .2930674   | .0012705  | .9997851
      u2 |      1000   |  .5090838  |  .2880467   | .0001617  | .9993329

Lets download the data from the textbook's web page.

uniform1 <- read.table("http://www.principlesofeconometrics.com/poe4/data/dat/uniform1.dat", 
    header = FALSE, sep = "", na.strings = "NA", dec = ".", strip.white = TRUE)
names(uniform1)[c(1, 2)] <- c("u1", "u2")
summary(uniform1)
##        u1                u2          
##  Min.   :0.00127   Min.   :0.000162  
##  1st Qu.:0.24761   1st Qu.:0.280193  
##  Median :0.51443   Median :0.515708  
##  Mean   :0.50823   Mean   :0.509084  
##  3rd Qu.:0.77665   3rd Qu.:0.761179  
##  Max.   :0.99979   Max.   :0.999333  
cat(sd(uniform1$u1), sd(uniform1$u2))
## 0.2931 0.288

A plot of the tvo variables:

attach(uniform1)
plot(u1, u2)

plot of chunk unnamed-chunk-5

detach(uniform1)
uniform1$z1 <- with(uniform1, sqrt(-2 * log(u1)) * cos(2 * pi * u2))
uniform1$z2 <- with(uniform1, sqrt(-2 * log(u1)) * sin(2 * pi * u2))
attach(uniform1)
hist(z1, col = "green")

plot of chunk unnamed-chunk-6

cat(mean(z1), sd(z1))
## -0.01719 1.015
hist(z2, col = "purple")

plot of chunk unnamed-chunk-6

cat(mean(z2), sd(z2))
## -0.03593 0.9632

Histograms are not the best method for determining the shape of a distribution, mainly because they are so strongly affected by the number of bins used. Kernel density plots are more efficient.

plot(density(z1))  # Kernel Density Plot of z1

plot of chunk unnamed-chunk-7

plot(density(z2))  # Kernel Density Plot of z2

plot of chunk unnamed-chunk-7

Both z1 and z2 seem quite normal distributed.

c) Plot z1 and z2 and calculate the correlation.

plot(z2, z1)

plot of chunk unnamed-chunk-8

z = data.frame(z1, z2)
cor(z)
##         z1      z2
## z1  1.0000 -0.0244
## z2 -0.0244  1.0000
cor.test(uniform1$z1, uniform1$z2, alternative = "two.sided", method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  uniform1$z1 and uniform1$z2 
## t = -0.7712, df = 998, p-value = 0.4408
## alternative hypothesis: true correlation is not equal to 0 
## 95 percent confidence interval:
##  -0.08627  0.03765 
## sample estimates:
##     cor 
## -0.0244 
## 

They are not correlated.

Lets do a formal test for normality (http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test)

shapiro.test(z1)
## 
##  Shapiro-Wilk normality test
## 
## data:  z1 
## W = 0.9993, p-value = 0.9688
## 
shapiro.test(z2)
## 
##  Shapiro-Wilk normality test
## 
## data:  z2 
## W = 0.9987, p-value = 0.6771
## 

The null hypothesis is that the population is normally distributed. If the p-value is less than the chosen \( \alpha \)-level, then the null hypothesis is rejected.