The chi-squared distribution with df degrees of freedom is the distribution of the sum of the squares of df independent standard normal random variables.
The chi-squared distribution is positively skewed, with values between \(0\) and \(\infty\). The mean is \(\mu = df\) and the variance is \(\sigma^2 = 2\,df\). When \(df \ge 2\), the density reaches its maximum (the mode) at \(\chi^2 = df - 2\); for a sample of size \(n\) with \(df = n - 1\), that is \((n - 1) - 2\). As \(df \to \infty\), the distribution approaches the normal distribution. Because the chi-squared distribution is skewed right, its lower and upper critical values are not symmetric about the mean, so calculate a separate lower and upper \(\chi_{df}^2\) value to create confidence intervals.
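As a minimal sketch of why two separate critical values are needed, the snippet below builds a 95% confidence interval for a population variance. The sample x is made up for illustration, and the interval \(\left((n-1)s^2/\chi^2_{upper},\ (n-1)s^2/\chi^2_{lower}\right)\) assumes the data come from a normal distribution.

```r
# Hypothetical sample; the values are invented for illustration only.
x <- c(4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.1, 4.6, 5.9, 5.0)

n     <- length(x)
df    <- n - 1          # degrees of freedom for the sample variance
s2    <- var(x)         # sample variance
alpha <- 0.05           # 95% confidence level

# The two critical values sit at different distances from the mean (df)
# because the distribution is skewed right.
chi_lower <- qchisq(alpha / 2, df = df)
chi_upper <- qchisq(1 - alpha / 2, df = df)

# Confidence interval for the population variance (assumes normal data).
# Note that the upper critical value produces the lower bound.
ci <- c(lower = df * s2 / chi_upper, upper = df * s2 / chi_lower)
ci
```

The resulting interval is not centered on the sample variance: the upper bound lies farther from s2 than the lower bound does.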
dchisq, pchisq, qchisq, and rchisq
R function dchisq(x, df) is the density of the \(\chi^2\) distribution at x when the degrees of freedom is df. Because the distribution is continuous, this is a density, not the probability of \(\chi^2\) equalling x. R function pchisq(q, df, lower.tail) is the cumulative probability of a value less than or equal to q (lower.tail = TRUE, the left tail) or greater than q (lower.tail = FALSE, the right tail). R function qchisq(p, df, lower.tail) is the value of x at the pth quantile, that is, the value with cumulative probability p to its left when lower.tail = TRUE. R function rchisq(n, df) returns n random numbers from the chi-squared distribution.
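The four functions can be tried side by side. This is a short sketch using base R only; the choices df = 5, x = 3, the seed, and the number of draws are arbitrary values picked for illustration.

```r
df <- 5

dens    <- dchisq(3, df = df)                      # density at x = 3
p_left  <- pchisq(3, df = df)                      # P(X <= 3)
p_right <- pchisq(3, df = df, lower.tail = FALSE)  # P(X > 3)

# qchisq is the inverse of pchisq: it recovers 3 (up to floating-point error)
x_back <- qchisq(p_left, df = df)

# Random draws; the sample mean and variance should be close to df and 2 * df
set.seed(1)
draws <- rchisq(10000, df = df)
c(mean = mean(draws), variance = var(draws))
```

The left-tail and right-tail probabilities sum to 1, and the simulated mean and variance give a quick check of the \(\mu = df\) and \(\sigma^2 = 2\,df\) properties.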
The chi-squared distribution has numerous applications. The chi-squared test of population variance tests whether sample data are consistent with a hypothesized population variance. The chi-squared goodness of fit test tests whether observed counts in categorical data match a hypothesized distribution, and the chi-squared test of independence tests whether two categorical variables are independent.
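The two categorical tests can be sketched with chisq.test() from base R's stats package. All counts below are hypothetical, chosen only to show the calls; the fair-die hypothesis and the 2 x 3 table layout are assumptions for the example.

```r
# Goodness of fit: hypothetical counts from 120 die rolls, tested against
# the hypothesis that all six faces are equally likely.
observed <- c(18, 22, 16, 25, 19, 20)
gof <- chisq.test(observed, p = rep(1 / 6, 6))

# Independence: hypothetical 2 x 3 contingency table of group by preference.
tab <- matrix(c(30, 20, 25, 25, 15, 35), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              preference = c("X", "Y", "Z")))
ind <- chisq.test(tab)

# Degrees of freedom: k - 1 for goodness of fit,
# (rows - 1) * (cols - 1) for independence.
gof$parameter
ind$parameter

# The reported p-value is the right-tail chi-squared probability
# of the test statistic.
pchisq(unname(gof$statistic), df = unname(gof$parameter), lower.tail = FALSE)
gof$p.value
```

In both cases the p-value comes from the right tail, which is why lower.tail = FALSE is used when reproducing it with pchisq().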
library(dplyr)
library(ggplot2)
library(tidyr)

# Evaluate the chi-squared density on a grid from 0 to 70 for three df values
data.frame(chisq = 0:7000 / 100) %>%
  mutate(df_05 = dchisq(x = chisq, df = 5),
         df_15 = dchisq(x = chisq, df = 15),
         df_30 = dchisq(x = chisq, df = 30)) %>%
  # Reshape to long format: one row per (chisq, df) combination
  pivot_longer(cols = -chisq, names_to = "df", values_to = "density") %>%
  # Draw one density curve per df value
  ggplot() +
  geom_line(aes(x = chisq, y = density, color = df)) +
  labs(title = "Chi-Square at Various Degrees of Freedom",
       x = "Chi-square",
       y = "Density")