## Part 1: Independent or Dependent Variables?

1)

a) Scatter and contour plots for X and Y. X and Y appear independent: the points fill the unit square evenly with no trend, and the density contours show no tilt or elongation in any particular direction.
library(MASS)  # provides kde2d()

X = runif(1000)
Y = runif(1000)

# Scatter plot of the pairs (X, Y)
plot(X, Y)

# 2D kernel density estimate of (X, Y) on a 50 x 50 grid, drawn as contours
xy.kde = kde2d(X, Y, n = 50)
contour(xy.kde)
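A numeric companion to the plots, as a minimal sketch: if X and Y are independent, their sample correlation should be near 0, and the joint probability P(X < 0.5, Y < 0.5) should be close to the product of the two marginal probabilities. The seed and the 0.5 cutoffs are arbitrary choices for illustration, and neither check proves independence on its own.

```r
set.seed(1)  # assumed seed, only for reproducibility
X <- runif(1000)
Y <- runif(1000)

# Independent variables have (population) correlation 0
r_xy <- cor(X, Y)

# Independence also means joint probabilities factor into marginals
p_joint <- mean(X < 0.5 & Y < 0.5)
p_prod  <- mean(X < 0.5) * mean(Y < 0.5)

c(correlation = r_xy, joint = p_joint, product = p_prod)
```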

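b) Scatter and contour plots for U = X - Y and V = X + Y. The points fill a diamond (the unit square rotated 45 degrees) rather than a rectangle, and both U and V are concentrated near the middle of their ranges. The range of U depends on V: when V is near 0 or 2, U must be near 0, while for V near 1, U can be anywhere in (-1, 1). So U and V are not independent, even though they are uncorrelated (Cov(U, V) = Var(X) - Var(Y) = 0).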
U = X - Y
V = X + Y

# Scatter plot of the pairs (U, V)
plot(U, V)

# Contours of the 2D kernel density estimate of (U, V)
uv.kde = kde2d(U, V, n = 50)
contour(uv.kde)
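The dependence between U and V can also be checked numerically. This is a sketch with an assumed seed and arbitrary cutoffs (0.3 and 0.15): the sample correlation of U and V is near 0, yet every point lies inside the diamond |U| <= min(V, 2 - V), and the spread of U is much smaller when V is near 0 than when V is near 1.

```r
set.seed(1)  # assumed seed, only for reproducibility
X <- runif(1000)
Y <- runif(1000)
U <- X - Y
V <- X + Y

# Uncorrelated: Cov(U, V) = Var(X) - Var(Y) = 0, so this is near 0
r_uv <- cor(U, V)

# ...but not independent: every point lies in the diamond |U| <= min(V, 2 - V)
# (small tolerance for floating-point rounding)
in_diamond <- all(abs(U) <= pmin(V, 2 - V) + 1e-12)

# The spread of U depends on V: narrow near the corner V = 0, wide near V = 1
sd_edge   <- sd(U[V < 0.3])
sd_center <- sd(U[abs(V - 1) < 0.15])

c(correlation = r_uv, sd_edge = sd_edge, sd_center = sd_center)
```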

2)

a) Scatter and contour plots for Z1 and Z2. Z1 and Z2 appear independent: the points form a round cloud centered at the origin, and the density contours are roughly concentric circles with no tilt.
Z1 = rnorm(1000)
Z2 = rnorm(1000)

# Scatter plot of the pairs (Z1, Z2)
plot(Z1, Z2)

# Contours of the 2D kernel density estimate of (Z1, Z2)
z.kde = kde2d(Z1, Z2, n = 50)
contour(z.kde)

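b) U and V are both centered at 0 and, with variance 2 each, are more spread out than Z1 and Z2. The scatter plot is still a round cloud and the contours are roughly circles, so U and V appear independent. This matches the theory: U and V are jointly normal with Cov(U, V) = Var(Z1) - Var(Z2) = 0, and jointly normal variables that are uncorrelated are independent. This is the opposite of part 1, where the same transformation of uniform variables produced dependence.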

U = Z1 - Z2
V = Z1 + Z2

# Scatter plot of the pairs (U, V)
plot(U, V)

# Contours of the 2D kernel density estimate of (U, V)
uv.kde = kde2d(U, V, n = 50)
contour(uv.kde)
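For normal Z1 and Z2, the sample covariance matrix of (U, V) should be close to diag(2, 2), and the spread of U should look the same whether V is low or high. A minimal sketch, with an assumed seed and arbitrary cutoffs at V = -1 and V = 1:

```r
set.seed(1)  # assumed seed, only for reproducibility
Z1 <- rnorm(1000)
Z2 <- rnorm(1000)
U <- Z1 - Z2
V <- Z1 + Z2

# Theory: Var(U) = Var(V) = 2 and Cov(U, V) = 0
S <- cov(cbind(U, V))

# Independence: the spread of U should not depend on V
sd_low  <- sd(U[V < -1])
sd_high <- sd(U[V > 1])

S
c(sd_low = sd_low, sd_high = sd_high)
```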

## Part 2: Exploring Distributions

1)

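a) Chi-squared with 1 and 10 degrees of freedom. The sample means and variances below are close to the theoretical values, mean = k and variance = 2k for k degrees of freedom.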
x <- rchisq(1000, 1)
plot(density(x))
hist(x, freq=FALSE, add=TRUE)

mean(x)
## [1] 0.9766505
var(x)
## [1] 1.877138

x <- rchisq(1000, 10)
plot(density(x))
hist(x, freq=FALSE, add=TRUE)

mean(x)
## [1] 9.945407
var(x)
## [1] 19.2963
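As a sketch (the seed is assumed, so the numbers will not match the output above exactly), the comparison with the theoretical mean k and variance 2k can be tabulated for both degrees of freedom in one step:

```r
set.seed(1)  # assumed seed, only for reproducibility

# Simulated vs. theoretical mean and variance of chi-squared(k)
moments <- t(sapply(c(1, 10), function(k) {
  x <- rchisq(1000, k)
  c(df = k, mean = mean(x), theory_mean = k,
    var = var(x), theory_var = 2 * k)
}))
moments
```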

2)

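a) With df = 1 (the Cauchy distribution), the tails are so heavy that a few extreme draws stretch the x-axis, so the density curve looks like a narrow spike at 0 even though the distribution is actually more spread out than the normal. With df = 30 (part b), the distribution looks roughly normal and centered at 0. Both the rt() draws and the constructed t.ratio are symmetric about 0, so neither is skewed. Since t.ratio = z/sqrt(v/df) has exactly a t distribution with df degrees of freedom, any differences between the two plots are sampling noise.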
t1 <- rt(1000, df=1)
plot(density(t1))
hist(t1, freq=FALSE, add=TRUE)

z <- rnorm(1000)
v <- rchisq(1000, 1)

t.ratio <- z/sqrt(v/1)
plot(density(t.ratio))
hist(t.ratio, freq=FALSE, add=TRUE)
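Because the t distribution with 1 degree of freedom has no finite mean or variance, quartiles are a better yardstick than moments. This sketch (seed assumed) checks that both the rt() draws and the constructed ratio have quartiles near qt(c(0.25, 0.75), 1), which are -1 and 1, and that about 6% of draws fall beyond plus or minus 10. Those extreme draws are what stretch the x-axis of the density plot.

```r
set.seed(1)  # assumed seed, only for reproducibility
t1 <- rt(1000, df = 1)
z <- rnorm(1000)
v <- rchisq(1000, 1)
t.ratio <- z / sqrt(v / 1)

# Sample quartiles vs. theoretical quartiles of t with 1 df (-1 and 1)
q_theory <- qt(c(0.25, 0.75), df = 1)
q_rt     <- quantile(t1, c(0.25, 0.75))
q_ratio  <- quantile(t.ratio, c(0.25, 0.75))

# Heavy tails: share of draws with |t| > 10, vs. the theoretical 2 * P(T < -10)
tail_share  <- mean(abs(t.ratio) > 10)
tail_theory <- 2 * pt(-10, df = 1)

rbind(theory = q_theory, rt = q_rt, ratio = q_ratio)
c(tail_share = tail_share, tail_theory = tail_theory)
```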

b)

t1 <- rt(1000, df=30)
plot(density(t1))
hist(t1, freq=FALSE, add=TRUE)

z <- rnorm(1000)
v <- rchisq(1000, 30)

t.ratio <- z/sqrt(v/30)
plot(density(t.ratio))
hist(t.ratio, freq=FALSE, add=TRUE)
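With 30 degrees of freedom the t distribution has variance df/(df - 2) = 30/28, about 1.07, only slightly more than the standard normal. A sketch that checks the sample variance of t.ratio against this value and overlays the theoretical t and normal densities (the seed and colors are arbitrary choices):

```r
set.seed(1)  # assumed seed, only for reproducibility
z <- rnorm(1000)
v <- rchisq(1000, 30)
t.ratio <- z / sqrt(v / 30)

# Var(T) = df / (df - 2) for df > 2
var_theory <- 30 / 28
var_sample <- var(t.ratio)
c(sample = var_sample, theory = var_theory)

# Simulated density with the t(30) density (red) and N(0, 1) density (blue, dashed)
plot(density(t.ratio))
curve(dt(x, df = 30), add = TRUE, col = "red")
curve(dnorm(x), add = TRUE, col = "blue", lty = 2)
```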

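c) The 95th percentile (the 0.95 quantile) is the value below which 95% of the distribution's probability lies. More generally, the p quantile q of a random variable T satisfies P(T <= q) = p. In the output below, the t quantiles shrink toward qnorm(0.95) = 1.645 as df increases: the heavier tails at small df push the cutoff farther from 0.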

qnorm(0.95)
## [1] 1.644854
qt(.95, 1)
## [1] 6.313752
qt(.95, 2)
## [1] 2.919986
qt(.95, 3)
## [1] 2.353363
qt(.95, 10)
## [1] 1.812461
qt(.95, 20)
## [1] 1.724718
qt(.95, 30)
## [1] 1.697261
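The quantiles above can be computed in one vectorized call, and pt() confirms the defining property P(T <= q) = 0.95. The extra values df = 100 and df = 1000 are added here only for illustration, to show the t quantile approaching qnorm(0.95):

```r
dfs <- c(1, 2, 3, 10, 20, 30, 100, 1000)

# 0.95 quantile of t for each df (qt is vectorized over df)
q95 <- qt(0.95, df = dfs)
names(q95) <- dfs
q95
qnorm(0.95)

# By definition, the CDF evaluated at the 0.95 quantile is 0.95
p_check <- pt(q95, df = dfs)
```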

3)

u <- rf(1000, 3, 7)
plot(density(u))
hist(u, freq=FALSE, add=TRUE)

v <- rf(1000, 3, 27)
plot(density(v))
hist(v, freq=FALSE, add=TRUE)
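The mean of an F(df1, df2) variable is df2/(df2 - 2) when df2 > 2: 1.4 for F(3, 7) and 27/25 = 1.08 for F(3, 27). Because F(3, 7) has an infinite fourth moment, its sample variance is unstable, so this sketch (seed assumed) compares spread using the interquartile range instead:

```r
set.seed(1)  # assumed seed, only for reproducibility
u <- rf(1000, 3, 7)
v <- rf(1000, 3, 27)

# Mean of F(df1, df2) is df2 / (df2 - 2) for df2 > 2
mean_theory <- c(u = 7 / 5, v = 27 / 25)
mean_sample <- c(u = mean(u), v = mean(v))

# Interquartile range: theoretical (from qf) vs. sample
iqr_theory <- c(u = diff(qf(c(0.25, 0.75), 3, 7)),
                v = diff(qf(c(0.25, 0.75), 3, 27)))
iqr_sample <- c(u = IQR(u), v = IQR(v))

rbind(mean_sample, mean_theory, iqr_sample, iqr_theory)
```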

qf(0.95, 3, 7)
## [1] 4.346831
qf(0.95, 7, 3)
## [1] 8.886743

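a) F(3, 7) has a much longer right tail than F(3, 27): with only 7 denominator degrees of freedom, occasional very large values stretch the x-axis, which can make its peak look narrower on the plot. As the denominator degrees of freedom grow, the distribution concentrates around 1 (its mean is df2/(df2 - 2), which is 1.4 for df2 = 7 and about 1.08 for df2 = 27). Both distributions are right-skewed.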

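b) Swapping the degrees of freedom changes the 0.95 quantile substantially: qf(0.95, 3, 7) = 4.35 but qf(0.95, 7, 3) = 8.89, because the F distribution is not symmetric in df1 and df2. The larger quantile comes from the smaller denominator df: with df2 = 3, the denominator (a chi-squared variable divided by its df) is often small, which produces very large ratios and a heavier right tail. In general, fewer degrees of freedom (not more) mean more spread and a larger 0.95 quantile.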
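Two properties explain these numbers. First, if F follows F(df1, df2), then 1/F follows F(df2, df1), so qf(p, df1, df2) = 1 / qf(1 - p, df2, df1); swapping the degrees of freedom does not give back the same quantile. Second, with df1 fixed, the 0.95 quantile falls as the denominator df grows. A sketch checking both (the grid of df2 values is an arbitrary choice):

```r
q_37 <- qf(0.95, 3, 7)  # 4.346831 in the output above
q_73 <- qf(0.95, 7, 3)  # 8.886743 in the output above

# Reciprocal identity: qf(p, d1, d2) == 1 / qf(1 - p, d2, d1)
recip_ok <- isTRUE(all.equal(q_37, 1 / qf(0.05, 7, 3), tolerance = 1e-6))

# With df1 fixed at 3, fewer denominator df give a larger 0.95 quantile
q_by_df2 <- qf(0.95, 3, c(3, 7, 27, 1000))
names(q_by_df2) <- c(3, 7, 27, 1000)

recip_ok
q_by_df2
```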