Dear teaching team, I think I am learning R Markdown a bit, and it seems to be a game changer for sure. Happy to have you throughout the PHS! I learned that a wiki or ChatGPT can give LaTeX-formatted formulas with ease. The “\” sign seems to introduce symbols such as the Greek letters.
The random variable is the count of exonerations per department, which takes discrete values.
For this random variable, let us assume a Poisson distribution. Hence the probability mass function is as follows.
\(P(Y=y_i) = \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}\)
\(Y\)- random variable,
\(y_i\)- specific observation of the random variable,
\(\lambda\)- mean per draw
Let us demonstrate this for the ten observed counts, using a value of lambda=2.
(exp(-2*10)*2^(2+1+2+1+1+3+0+1+3+0))/(factorial(2)*factorial(1)*factorial(2)*factorial(1)*factorial(1)*factorial(3)*factorial(0)*factorial(1)*factorial(3)*factorial(0))
## [1] 2.345135e-07
Therefore the likelihood of the ten observed counts at lambda=2 is 2.345135e-07 (as I understand it, this is the probability of the data at the exact point 2.00, not at 2.0001).
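As a cross-check on the long expression above, here is a minimal sketch that gets the same likelihood from R's built-in Poisson density `dpois()`. The vector name `y` and the helper names are my own, chosen for illustration.

```r
# Observed exoneration counts for the 10 departments (same values as above)
y <- c(2, 1, 2, 1, 1, 3, 0, 1, 3, 0)

# Joint likelihood at lambda = 2: product of the individual Poisson probabilities
lik_at_2 <- prod(dpois(y, lambda = 2))
lik_at_2

# Same quantity written out by hand, as in the long expression above
by_hand <- exp(-2 * length(y)) * 2^sum(y) / prod(factorial(y))
all.equal(lik_at_2, by_hand)
```

Using `prod(dpois(...))` avoids typing every factorial by hand, which makes it easier to try other values of lambda.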
\(P(Y_1=y_1, Y_2=y_2, \dots, Y_{10}=y_{10} | \lambda) = \prod_{i=1}^{10} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}\)
The observations are independent (one occurrence does not affect another) and identically distributed (every department follows the same distribution), so it is plausible in this case to write the joint probability distribution as a product.
\(L(\lambda | Y_1,Y_2,\dots, Y_{10})=\frac{e^{-10\lambda}\lambda^{14}}{144}\)
\(l(\lambda)=\log\left(\frac{e^{-10\lambda}\lambda^{14}}{144}\right)\)
\(l(\lambda)=-10\lambda+14\log(\lambda)-\log(144)\)
Now differentiate with respect to \(\lambda\) (although I could not recall this math from memory, I have done all the LaTeX typing myself)
\(\frac{dl(\lambda)}{d\lambda}=-10+\frac{14}{\lambda}\)
\(-10+\frac{14}{\lambda}=0\)
\(\lambda=1.4\)
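The calculus result can also be checked numerically. The sketch below, which assumes the same ten counts, maximizes the Poisson log-likelihood with `optimize()`; the search interval `c(0.01, 4)` is an arbitrary choice of mine.

```r
y <- c(2, 1, 2, 1, 1, 3, 0, 1, 3, 0)

# Poisson log-likelihood as a function of lambda
loglik <- function(lambda) sum(dpois(y, lambda = lambda, log = TRUE))

# Search for the maximum on the interval [0.01, 4]
fit <- optimize(loglik, interval = c(0.01, 4), maximum = TRUE)
fit$maximum   # should land very close to 1.4
```

The numerical answer agrees with the derivative only up to the default tolerance of `optimize()`, so small differences in the later decimals are expected.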
curve(
expr=(exp(-10*x)*x^(14))/(144),
from=0,
to=4,
main="Likelihood function",
ylab="likelihood",
xlab="lambda"
)
abline(v=1.4, col="brown")
curve(
expr=log((exp(-10*x)*x^(14))/(144)),
from=0,
to=4,
main="Log likelihood function",
ylab="log likelihood",
xlab="lambda"
)
abline(v=1.4, col="brown")
The maximum point of the log-likelihood function corresponds to the same value of λ=1.4 that maximized the likelihood function.
sum(2,1,2,1,1,3,0,1,3,0)/10
## [1] 1.4
The sample mean gives the same value, 1.4, as the maximum likelihood estimate.
\(\left(\frac{n^{2}}{\sum Y_i}\right)^{-1}=\frac{\sum Y_i}{n^{2}}=\frac{14}{100}=0.14\)
\(\bar{\lambda}\pm Z_{1-\alpha/2}\sqrt{0.14}=1.4\pm1.96\sqrt{0.14}\)
1.96*sqrt(0.14)
## [1] 0.7333648
1.4+1.96*sqrt(0.14)
## [1] 2.133365
1.4-1.96*sqrt(0.14)
## [1] 0.6666352
Estimate: 1.4 (95% CI: 0.67, 2.13)
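The interval can be reproduced in one place with the sketch below. It assumes the observed-information form sum(y)/lambda^2 evaluated at the estimate, which equals n^2/sum(y), and uses `qnorm(0.975)` instead of the rounded 1.96. The variable names are mine.

```r
y <- c(2, 1, 2, 1, 1, 3, 0, 1, 3, 0)
lambda_hat <- mean(y)

# Observed information at the estimate: sum(y) / lambda_hat^2 = n^2 / sum(y)
info <- sum(y) / lambda_hat^2
var_hat <- 1 / info             # inverse information, 0.14
z <- qnorm(0.975)               # about 1.96

ci <- lambda_hat + c(-1, 1) * z * sqrt(var_hat)
round(ci, 2)
```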
Across repeated samples drawn with the same methods, 95 per cent of the confidence intervals constructed this way would contain the true population mean. For our sample, that interval runs from 0.67 to 2.13.
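The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch under assumptions of mine: the true mean is set to 1.4, the sample size stays at 10, and the seed and number of repetitions are arbitrary. It counts how often the interval built from each simulated sample contains the true mean.

```r
set.seed(1)             # arbitrary seed, for reproducibility only
true_lambda <- 1.4      # assumed "true" mean for the simulation
n <- 10
reps <- 10000

covered <- replicate(reps, {
  y_sim <- rpois(n, true_lambda)          # one simulated sample of 10 departments
  est <- mean(y_sim)                      # estimate from that sample
  half <- qnorm(0.975) * sqrt(est / n)    # half-width of the 95% interval
  (est - half <= true_lambda) && (true_lambda <= est + half)
})

mean(covered)   # share of intervals that contain the true mean
```

With only 10 observations the share may sit somewhat below the nominal 95 per cent, since the normal approximation is rough for small samples.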
We have a wide confidence interval due to the small sample size, making our estimate less precise. Secondly, we assumed the same distribution and dispersion across departments, which may not strictly hold in social contexts.
The central limit theorem could be used to construct the confidence interval instead. However, our confidence interval should still be valid, since we have shown that the maximum likelihood estimate equals the sample mean.
#Lambda1
"mean"; sum(2,1,2,1,1,3,0,1,3,0)/10; "variance"; var(c(2,1,2,1,1,3,0,1,3,0)); "bias"; sum(2,1,2,1,1,3,0,1,3,0)/(10)-1.4; "MSE"; var(c(2,1,2,1,1,3,0,1,3,0))+(sum(2,1,2,1,1,3,0,1,3,0)/(10)-1.4)^2
## [1] "mean"
## [1] 1.4
## [1] "variance"
## [1] 1.155556
## [1] "bias"
## [1] 0
## [1] "MSE"
## [1] 1.155556
#Lambda2
"mean"; 2; "var"; var(c(1.4,2)); "bias"; 2-1.4; "MSE"; var(c(1.4,2))+(1.4-2)^2
## [1] "mean"
## [1] 2
## [1] "var"
## [1] 0.18
## [1] "bias"
## [1] 0.6
## [1] "MSE"
## [1] 0.54
#Lambda3
"mean"; 1/10+sum(2,1,2,1,1,3,0,1,3,0)/10; "var"; var(c(2,1,2,1,1,3,0,1,3,0)); "bias"; (1/10+sum(2,1,2,1,1,3,0,1,3,0)/10)-1.4; "MSE"; var(c(2,1,2,1,1,3,0,1,3,0))+((1/10+sum(2,1,2,1,1,3,0,1,3,0)/10)-1.4)^2
## [1] "mean"
## [1] 1.5
## [1] "var"
## [1] 1.155556
## [1] "bias"
## [1] 0.1
## [1] "MSE"
## [1] 1.165556
Let us arrange a table for the values we computed.
matrix(c(
"Lambda", "Mean", "Variance", "Bias", "MSE",
"L1", 1.4, 1.155556, 0, 1.155556,
"L2", 2, 0.18, 0.6, 0.54,
"L3", 1.5, 1.155556, 0.1, 1.165556
), nrow = 4, byrow = TRUE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "Lambda" "Mean" "Variance" "Bias" "MSE"
## [2,] "L1" "1.4" "1.155556" "0" "1.155556"
## [3,] "L2" "2" "0.18" "0.6" "0.54"
## [4,] "L3" "1.5" "1.155556" "0.1" "1.165556"
According to the above estimations, the R function “var” applies the n-1 correction for samples. However, I couldn’t intuitively decide which formula is appropriate for the variance: dividing the sum of squared deviations by n gives 1.04 by hand calculation, versus the 1.156 shown above. Since we know all 10 values in this setting, I wasn’t sure whether we may go with 1.04 or should go with the sample variance. Since we never truly observe the population variance in real life, I have chosen the sample variance for this exercise. I would greatly appreciate it if the teaching team could enlighten me on this subject.
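To make the two candidate numbers concrete, here is a short sketch using the same ten counts. The last line is an addition of mine: it shows the plug-in value of lambda/n, which is the variance of the sample mean as an estimator and is a different quantity from the spread of the counts themselves.

```r
y <- c(2, 1, 2, 1, 1, 3, 0, 1, 3, 0)
n <- length(y)
ss <- sum((y - mean(y))^2)   # sum of squared deviations from the mean

ss / (n - 1)   # what var(y) returns (n - 1 correction)
ss / n         # the hand calculation that divides by n
mean(y) / n    # plug-in lambda / n: variance of the sample mean under the Poisson model
```

The first two lines both describe how spread out the individual counts are, while the third describes how much the estimate itself would vary from sample to sample.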
Sorry, my final answer to 2a is here.
\(E(\bar\lambda_1)=\lambda\), \(var=\frac{\lambda}{n}\), \(bias=0\), \(MSE=\frac{\lambda}{n}\)
\(E(\bar\lambda_2)=\frac{\lambda}{n}\), \(var=\frac{\lambda}{n^2}\), \(bias=\frac{\lambda}{n}-\lambda\), \(MSE=\frac{\lambda}{n^2}+(\frac{\lambda}{n}-\lambda)^2\)
\(E(\bar\lambda_3)=\lambda+\frac{1}{n}\), \(var=\frac{\lambda}{n}\), \(bias=\frac{1}{n}\), \(MSE=\frac{\lambda}{n}+\frac{1}{n^2}\)
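To see what these formulas give numerically, the sketch below plugs in values. It assumes lambda = 1.4 (the estimate from earlier, used here as if it were the true mean) and n = 10; the data frame and its column names are mine.

```r
lambda <- 1.4   # assumption: plug in the earlier estimate as the true mean
n <- 10

results <- data.frame(
  estimator = c("lambda_1", "lambda_2", "lambda_3"),
  variance  = c(lambda / n, lambda / n^2, lambda / n),
  bias      = c(0, lambda / n - lambda, 1 / n)
)

# MSE = variance + bias^2 for each estimator
results$MSE <- results$variance + results$bias^2
results
```

Laying the three estimators out side by side makes it easier to compare how much of each MSE comes from variance and how much from bias.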
Lambda 1 is an unbiased estimator of the true population mean in this example, and this setting is also the most efficient, as we sample all the units of analysis for the study. However, it is a resource- and labor-intensive approach.
Lambda 2 is a very biased estimator of the true population mean, and this setting is also less efficient due to the low sample size. As we sample only one department, we could save enormous resources and time, but it yields a very biased estimate.
Lambda 3 is a less biased estimator of the true population mean than Lambda 2, and this setting is also more efficient than Lambda 2. It is a more resource- and labor-intensive approach than Lambda 2. (Although I don’t fully understand how the 1/n+lambda term arises mathematically, I conceptualize it this way: as the sample size grows, the bias of 1/n matters less and our estimate gets closer to the truth.)