This is an example on how you can use R to solve some of the exercises in your textbook. You can look at some videos on how to download and install R here: http://dist.stat.tamu.edu/pub/rvideos/
Lets create \( X_{n} \) from formula (B.52):
x <- rep(0, 40)
x[1] = 79
for (i in 2:40) x[i] = (263 * x[i - 1] + 71)%%100
x = x/100
x
## [1] 0.79 0.48 0.95 0.56 0.99 0.08 0.75 0.96 0.19 0.68 0.55 0.36 0.39 0.28
## [15] 0.35 0.76 0.59 0.88 0.15 0.16 0.79 0.48 0.95 0.56 0.99 0.08 0.75 0.96
## [29] 0.19 0.68 0.55 0.36 0.39 0.28 0.35 0.76 0.59 0.88 0.15 0.16
The random number generator has generated 40 numbers that should be independent draws of a uniform random variable on the (0,1) interval. A histogram is used to see if \( X_{n} \) is truly a random variable.
bins = seq(0, 1, by = 0.05)
hist(x, breaks = bins, col = "lightblue")
These nubers do not look random, there is a clear pattern. Some intervals do not contain data, while others are overrepresented. A truly uniform random (0,1) variable should look like:
u <- runif(40)
hist(u, breaks = bins, col = "red")
The data definition file is found here: http://www.principlesofeconometrics.com/poe4/data/def/uniform1.def
This is the data description:
uniform1.def
u1 u2
obs: 1000 uniform random numbers
u1 = uniform random numbers using seed 12345
u2 = uniform random numbers using seed 1010101
Values generated in Stata 11.1 using function runiform()
Variable | Obs | Mean | Std. Dev. | Min | Max
u1 | 1000 | .5082264 | .2930674 | .0012705 | .9997851
u2 | 1000 | .5090838 | .2880467 | .0001617 | .9993329
Lets download the data from the textbook's web page.
uniform1 <- read.table("http://www.principlesofeconometrics.com/poe4/data/dat/uniform1.dat",
header = FALSE, sep = "", na.strings = "NA", dec = ".", strip.white = TRUE)
names(uniform1)[c(1, 2)] <- c("u1", "u2")
summary(uniform1)
## u1 u2
## Min. :0.00127 Min. :0.000162
## 1st Qu.:0.24761 1st Qu.:0.280193
## Median :0.51443 Median :0.515708
## Mean :0.50823 Mean :0.509084
## 3rd Qu.:0.77665 3rd Qu.:0.761179
## Max. :0.99979 Max. :0.999333
cat(sd(uniform1$u1), sd(uniform1$u2))
## 0.2931 0.288
A plot of the tvo variables:
attach(uniform1)
plot(u1, u2)
detach(uniform1)
uniform1$z1 <- with(uniform1, sqrt(-2 * log(u1)) * cos(2 * pi * u2))
uniform1$z2 <- with(uniform1, sqrt(-2 * log(u1)) * sin(2 * pi * u2))
attach(uniform1)
hist(z1, col = "green")
cat(mean(z1), sd(z1))
## -0.01719 1.015
hist(z2, col = "purple")
cat(mean(z2), sd(z2))
## -0.03593 0.9632
Histograms are not the best method for determining the shape of a distribution, mainly because they are so strongly affected by the number of bins used. Kernel density plots are more efficient.
plot(density(z1)) # Kernel Density Plot of z1
plot(density(z2)) # Kernel Density Plot of z2
Both z1 and z2 seem quite normal distributed.
c) Plot z1 and z2 and calculate the correlation.
plot(z2, z1)
z = data.frame(z1, z2)
cor(z)
## z1 z2
## z1 1.0000 -0.0244
## z2 -0.0244 1.0000
cor.test(uniform1$z1, uniform1$z2, alternative = "two.sided", method = "pearson")
##
## Pearson's product-moment correlation
##
## data: uniform1$z1 and uniform1$z2
## t = -0.7712, df = 998, p-value = 0.4408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08627 0.03765
## sample estimates:
## cor
## -0.0244
##
They are not correlated.
Lets do a formal test for normality (http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test)
shapiro.test(z1)
##
## Shapiro-Wilk normality test
##
## data: z1
## W = 0.9993, p-value = 0.9688
##
shapiro.test(z2)
##
## Shapiro-Wilk normality test
##
## data: z2
## W = 0.9987, p-value = 0.6771
##
The null hypothesis is that the population is normally distributed. If the p-value is less than the chosen \( \alpha \)-level, then the null hypothesis is rejected.