library(MASS)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x dplyr::select() masks MASS::select()
set.seed(123)
X <- runif(1000, 0, 1)
Y <- runif(1000, 0, 1)
U <- X - Y
V <- X + Y
plot(X, Y)
XY.kde <- kde2d(X, Y)
contour(XY.kde)
X and Y appear to be independent. The scatterplot fills the unit square evenly and the density contours show no systematic trend, so knowing the value of X tells us nothing about the value of Y.
UV.kde <- kde2d(U, V)
plot(U, V)
contour(UV.kde)
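These plots show no linear trend, so U and V are uncorrelated, but they are not independent. The points fill a diamond with corners at (0, 0), (1, 1), (0, 2), and (-1, 1), so the possible values of V depend on U: when |U| is close to 1, V is forced to be close to 1.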
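A numerical check can complement the plots. The sketch below is standalone (running it inline would advance the random number stream and change later outputs); the names `x_chk`, `y_chk`, `u_chk`, and `v_chk` are illustrative and not part of the original analysis. It compares the correlation of U and V with how the range of V shrinks when |U| is large. A correlation near zero only rules out a linear relationship; it does not by itself establish independence.

```r
set.seed(123)
x_chk <- runif(1000, 0, 1)
y_chk <- runif(1000, 0, 1)
u_chk <- x_chk - y_chk
v_chk <- x_chk + y_chk

# Linear association between U and V: expected to be close to 0
cor(u_chk, v_chk)

# Range of V overall versus range of V when |U| is large
range(v_chk)
range(v_chk[abs(u_chk) > 0.8])

# Every point lies in the diamond |U| + |V - 1| <= 1
all(abs(u_chk) + abs(v_chk - 1) <= 1 + 1e-12)
```

If U and V were independent, restricting U would not change the range of V.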
Z1 <- rnorm(1000, 0, 1)
Z2 <- rnorm(1000, 0, 1)
U2 <- Z1 - Z2
V2 <- Z1 + Z2
plot(Z1, Z2)
Z1Z2.kde <- kde2d(Z1, Z2)
contour(Z1Z2.kde)
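These plots suggest that Z1 and Z2 are independent. Unlike the uniform example, the points concentrate near the origin and the contours are roughly circular, which is what two independent standard normal variables should look like.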
plot(U2, V2)
U2V2.kde <- kde2d(U2, V2)
contour(U2V2.kde)
U2 and V2 also appear to be independent: the scatterplot is a round cloud centered at the origin, the contours are roughly circular, and I cannot see any clear pattern.
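A short calculation supports what the plots suggest. It assumes that Z1 and Z2 are independent with equal variance, which holds for the simulated standard normals:

\[
\operatorname{Cov}(U_2, V_2)
  = \operatorname{Cov}(Z_1 - Z_2,\; Z_1 + Z_2)
  = \operatorname{Var}(Z_1) - \operatorname{Var}(Z_2)
  = 1 - 1 = 0
\]

Because U2 and V2 are linear combinations of jointly normal variables, zero covariance implies independence in this case. That implication is specific to the normal setting and does not hold for distributions in general.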
X <- rchisq(1000, 1)
plot(density(X))
hist(X, freq = FALSE, add = TRUE)
mean(X)
## [1] 0.9853859
var(X)
## [1] 2.12063
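The sample mean of X is the sum of all the values of X divided by the sample size n. The sample variance is the usual one: \[s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\] For a chi-square distribution with k degrees of freedom, the theoretical mean is k and the theoretical variance is 2k, so the values above (0.985 and 2.12) are close to the expected 1 and 2.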
x_10 <- rchisq(1000, 10)
plot(density(x_10))
hist(x_10, freq = FALSE, add = TRUE)
mean(x_10)
## [1] 10.02632
var(x_10)
## [1] 19.01786
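The sample statistics for both chi-square samples can be compared with the theoretical mean k and variance 2k. The sketch below is standalone and draws fresh samples, so its numbers will not match the outputs above exactly. `chisq_summary` is a hypothetical helper introduced only for this illustration. It computes the mean and variance by hand and next to R's built-in `mean()` and `var()`.

```r
# Hypothetical helper: summarise one simulated chi-square sample
chisq_summary <- function(k, n = 1000) {
  x <- rchisq(n, df = k)
  m <- sum(x) / n                       # sample mean by hand
  v <- sum((x - m)^2) / (n - 1)         # sample variance by hand
  c(df = k,
    sample_mean = m, theory_mean = k,
    sample_var = v, theory_var = 2 * k,
    builtin_mean = mean(x), builtin_var = var(x))
}

set.seed(123)
res <- t(sapply(c(1, 10), chisq_summary))
res
```

The hand-computed columns should agree with the built-in ones up to floating-point rounding, and both should be near the theoretical values.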
t1 <- rt(1000, df = 1)
plot(density(t1))
hist(t1, freq = FALSE, add = TRUE)
z <- rnorm(1000)
v <- rchisq(1000, 1)
t.ratio <- z / sqrt(v / 1)
plot(density(t.ratio))
hist(t.ratio, freq = FALSE, add = TRUE)
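Both samples come from the same distribution: a standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom follows a t distribution, here with 1 degree of freedom. The two plots look different only because this distribution has very heavy tails, so a few extreme draws stretch the axis and can make one sample look skewed. The distribution itself is symmetric about 0.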
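Because extreme values dominate the density plots, quartiles give a steadier comparison between the two constructions. The sketch below is standalone, with fresh draws and illustrative names `t_direct` and `t_ratio_chk`. It compares the sample quartiles of each construction with the theoretical quartiles of the t distribution with 1 degree of freedom, which are -1, 0, and 1.

```r
set.seed(123)
n <- 1000
t_direct    <- rt(n, df = 1)                           # drawn directly
t_ratio_chk <- rnorm(n) / sqrt(rchisq(n, df = 1) / 1)  # built as Z / sqrt(V / df)

probs <- c(0.25, 0.5, 0.75)
rbind(theory = qt(probs, df = 1),
      direct = quantile(t_direct, probs),
      ratio  = quantile(t_ratio_chk, probs))
```

If the two constructions are equivalent, both sets of sample quartiles should land near the theoretical row, even though their extremes can differ a lot.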
t2 <- rt(1000, df = 30)
plot(density(t2))
hist(t2, freq = FALSE, add = TRUE)
z2 <- rnorm(1000)
v2 <- rchisq(1000, 30)
t2.ratio <- z2 / sqrt(v2 / 30)
plot(density(t2.ratio))
hist(t2.ratio, freq = FALSE, add = TRUE)
A quantile is the value below which a given proportion of a distribution lies. The 95th percentile, for example, is the point with 95% of the distribution below it (to its left) and the remaining 5% above.
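The definition can be checked directly, since `pnorm()` is the inverse of `qnorm()`. This is a small sketch, and `q95` is an illustrative name:

```r
q95 <- qnorm(0.95)   # 95th percentile of the standard normal

pnorm(q95)           # proportion of the distribution below q95: 0.95
1 - pnorm(q95)       # proportion above q95: 0.05
```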
qnorm(0.95)
## [1] 1.644854
# d=1
qt(.95, 1)
## [1] 6.313752
# d=2
qt(.95, 2)
## [1] 2.919986
# d=3
qt(.95, 3)
## [1] 2.353363
# d=10
qt(.95, 10)
## [1] 1.812461
# d=20
qt(.95, 20)
## [1] 1.724718
# d=30
qt(.95, 30)
## [1] 1.697261
I notice that as the degrees of freedom increase, the 0.95 quantile of the t distribution decreases and gets closer and closer to the standard normal value of 1.645. Even at 30 degrees of freedom it is still a bit higher (1.697), because the t distribution has heavier tails than the standard normal.
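The quantiles above can be collected in one table, extended with two larger degrees of freedom (100 and 1000, added here only for illustration) to show how they behave relative to the standard normal quantile:

```r
dfs <- c(1, 2, 3, 10, 20, 30, 100, 1000)
q_t <- qt(0.95, dfs)     # 0.95 quantile of t for each degrees of freedom
q_z <- qnorm(0.95)       # 0.95 quantile of the standard normal

data.frame(df = dfs,
           t_quantile = q_t,
           excess_over_normal = q_t - q_z)
```

The `excess_over_normal` column stays positive but shrinks as the degrees of freedom grow.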
u <- rf(1000, 3, 7)
plot(density(u))
hist(u, freq = FALSE, add = TRUE)
v <- rf(1000, 3, 27)
plot(density(v))
hist(v, freq = FALSE, add = TRUE)
These two look very similar near the peak. Increasing the second (denominator) degrees of freedom from 7 to 27 mainly thins the right tail, which is hard to judge from these plots because the horizontal axes are set by the largest values in each sample.
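Since the plots are hard to compare by eye, the theoretical means and upper quantiles give a clearer picture of what the second degrees of freedom does. The sketch below uses the fact that an F(d1, d2) distribution has mean d2 / (d2 - 2) when d2 > 2. The names `d1`, `d2`, and `f_compare` are illustrative.

```r
d1 <- 3
d2 <- c(7, 27)

f_compare <- data.frame(
  d2   = d2,
  mean = d2 / (d2 - 2),        # theoretical mean, valid for d2 > 2
  q95  = qf(0.95, d1, d2),     # 0.95 quantile
  q99  = qf(0.99, d1, d2)      # 0.99 quantile
)
f_compare
```

The differences appear mostly in the upper quantiles, which is where a density plot is least informative.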
# F(3, 7)
qf(.95, 3, 7)
## [1] 4.346831
# F(7, 3)
qf(.95, 7, 3)
## [1] 8.886743
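I notice that the 0.95 quantile of the F(3, 7) distribution is roughly half that of the F(7, 3) distribution. The first input is the numerator degrees of freedom and the second is the denominator degrees of freedom: an F(d1, d2) variable is the ratio of two independent chi-square variables, each divided by its own degrees of freedom. Swapping the inputs inverts that ratio, which gives a different distribution, so the quantiles differ.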
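The link between the two quantiles above can be made precise. If F follows an F(3, 7) distribution, then 1 / F follows an F(7, 3) distribution, so the upper quantile of one is the reciprocal of the lower quantile of the other. A minimal check, with illustrative variable names:

```r
q_37  <- qf(0.95, 3, 7)       # 0.95 quantile of F(3, 7)
q_73  <- qf(0.95, 7, 3)       # 0.95 quantile of F(7, 3)
recip <- 1 / qf(0.05, 3, 7)   # reciprocal of the 0.05 quantile of F(3, 7)

c(q_37 = q_37, q_73 = q_73, recip = recip)
all.equal(q_73, recip)        # should be TRUE up to numerical tolerance
```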