This note collects everything from our discussion: variance, standard deviation, standard error, population vs. sample (\(n\) vs. \(n-1\)), the pooled t-test vs. Welch’s t-test, Levene’s test, the decision rules for left-, right-, and two-tailed tests, and plots that illustrate them.
Goal: make the concepts clear and provide R code you can run directly on your own data.
Population data: 50, 55, 60, 65, 70
\[ \mu = 60,\quad \sigma^2 = 50,\quad \sigma = \sqrt{50} \approx 7.07 \]
Sample: 50, 55, 60 (drawn from the population)
\[ \bar x = 55,\quad s^2 = \frac{(50-55)^2 + (55-55)^2 + (60-55)^2}{3-1} = 25,\quad s=5 \]
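Bottom line: for a population, divide by \(N\); for a sample, divide by \(n-1\). The \(n-1\) compensates for the fact that deviations are measured from the sample mean \(\bar x\) rather than the true mean \(\mu\), which would otherwise make the variance estimate too small on average.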
pop <- c(50, 55, 60, 65, 70)
mu <- mean(pop)
var_pop <- mean((pop - mu)^2)                           # population variance: divide by N
sd_pop <- sqrt(var_pop)
samp <- c(50, 55, 60)
xbar <- mean(samp)
var_samp <- sum((samp - xbar)^2) / (length(samp) - 1)   # sample variance: divide by n-1
sd_samp <- sqrt(var_samp)
se_samp <- sd_samp / sqrt(length(samp))                 # standard error of the mean
data.frame(
  Measure = c("Population Mean", "Population Variance", "Population SD",
              "Sample Mean", "Sample Variance", "Sample SD", "Sample SE"),
  Value = c(mu, var_pop, sd_pop, xbar, var_samp, sd_samp, se_samp)
)
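As a quick cross-check, R's built-in var() and sd() always use the \(n-1\) denominator, so they reproduce the sample formulas directly, while the population variance has to be rescaled by \((n-1)/n\). The sketch below assumes the same toy data; pop_var and se are illustrative helper names, not base R functions.

```r
pop  <- c(50, 55, 60, 65, 70)
samp <- c(50, 55, 60)

# var() and sd() use the n-1 (sample) denominator
var_samp <- var(samp)   # 25
sd_samp  <- sd(samp)    # 5

# Hypothetical helper: population variance (divide by N), derived from var()
pop_var <- function(x) var(x) * (length(x) - 1) / length(x)
var_pop <- pop_var(pop) # 50

# Hypothetical helper: standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))
se_samp <- se(samp)     # 5 / sqrt(3), about 2.887
```

Calling var(pop) directly would give 62.5, not 50, because it treats the five values as a sample.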
Why is it needed? To check whether the two groups have equal variances.
- \(H_0\): variances equal
- \(H_1\): variances not equal
There are two ways to do this in R:
car::leveneTest (requires the car package to be installed):
# install.packages("car")
library(car)
group <- factor(rep(c("A","B"), each=5))
x <- c(85,90,88,75,95, 80,70,78,65,74)
car::leveneTest(x ~ group)
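A manual version in base R (absolute deviations from each group's median, then a one-way ANOVA; this matches the default center = median of car::leveneTest):
lev_test <- function(values, groups) {
  # groups: a factor giving the group of each value
  med_by_g <- tapply(values, groups, median)          # median of each group
  z <- abs(values - med_by_g[as.character(groups)])   # distance from own group's median
  # one-way ANOVA on z
  fit <- aov(z ~ groups)
  summary(fit)
}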
group <- factor(rep(c("A","B"), each=5))
x <- c(85,90,88,75,95, 80,70,78,65,74)
lev_test(x, group)
## Df Sum Sq Mean Sq F value Pr(>F)
## groups 1 0.4 0.40 0.021 0.887
## Residuals 8 149.2 18.65
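Interpretation: if the p-value is above 0.05, there is no evidence against equal variances, so you may treat them as equal; otherwise treat them as unequal. Here p = 0.887, so the equal-variance assumption is reasonable.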
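To use the Levene result in code rather than reading it off the printed table, the p-value can be pulled out of the ANOVA summary. This is a minimal base R sketch on the same data; the names z, lev_p, and equal_var are illustrative.

```r
group <- factor(rep(c("A", "B"), each = 5))
x <- c(85, 90, 88, 75, 95,  80, 70, 78, 65, 74)

# Absolute deviation of each value from its own group's median
z <- abs(x - ave(x, group, FUN = median))
fit <- aov(z ~ group)

# summary(fit)[[1]] is the ANOVA table; its first row is the group effect
lev_p <- summary(fit)[[1]][["Pr(>F)"]][1]
equal_var <- lev_p > 0.05   # TRUE for this data (p is about 0.887)
```

The resulting logical could be passed on as the var.equal argument of t.test().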
Data (the example from our discussion):
- Group A: 85, 90, 88, 75, 95
- Group B: 80, 70, 78, 65, 74
\[ t = \frac{\bar X_1 - \bar X_2}{s_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},\quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2},\quad df = n_1+n_2-2 \]
A <- c(85,90,88,75,95)
B <- c(80,70,78,65,74)
t.test(A, B, var.equal = TRUE) # pooled t-test
##
## Two Sample t-test
##
## data: A and B
## t = 3.0756, df = 8, p-value = 0.01522
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.30297 23.09703
## sample estimates:
## mean of x mean of y
## 86.6 73.4
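The pooled formula can be checked by hand against the t.test() output. The following is a sketch that applies the formula term by term; the variable names are illustrative.

```r
A <- c(85, 90, 88, 75, 95)
B <- c(80, 70, 78, 65, 74)
n1 <- length(A); n2 <- length(B)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(A) + (n2 - 1) * var(B)) / (n1 + n2 - 2)
df  <- n1 + n2 - 2

t_stat <- (mean(A) - mean(B)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

c(t = t_stat, df = df, p = p_two)
```

The values should agree with the printed t = 3.0756, df = 8, and p-value = 0.01522.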
\[ t = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},\; df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)} \]
t.test(A, B, var.equal = FALSE) # Welch (default)
##
## Welch Two Sample t-test
##
## data: A and B
## t = 3.0756, df = 7.6897, p-value = 0.01597
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.233029 23.166971
## sample estimates:
## mean of x mean of y
## 86.6 73.4
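The Welch statistic and its fractional degrees of freedom can be reproduced the same way. This sketch follows the Welch formula directly; v1 and v2 are illustrative names for the per-group terms \(s_i^2/n_i\).

```r
A <- c(85, 90, 88, 75, 95)
B <- c(80, 70, 78, 65, 74)

v1 <- var(A) / length(A)   # s1^2 / n1
v2 <- var(B) / length(B)   # s2^2 / n2

t_welch  <- (mean(A) - mean(B)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(A) - 1) + v2^2 / (length(B) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

c(t = t_welch, df = df_welch, p = p_welch)
```

Because both groups have the same size, the t statistic is identical to the pooled one; only the degrees of freedom (and therefore the p-value) change.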
Rule of thumb: if Levene's p > 0.05 you may use the pooled test; if p < 0.05 use Welch. Welch is always a safe choice because it does not assume equal variances.
For the same \(\alpha\) and df, the one-tailed critical value is smaller in magnitude than the two-tailed one, because a two-tailed test splits the error rate across both tails (\(\alpha/2\) on each side).
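A quick numerical illustration with qt(), assuming \(\alpha = 0.05\) and df = 9 (the same values used in the plots in this note):

```r
alpha <- 0.05
df <- 9

t_one <- qt(1 - alpha, df)       # right-tailed critical value, about 1.833
t_two <- qt(1 - alpha / 2, df)   # two-tailed critical value, about 2.262

t_one < t_two                    # TRUE: the one-tailed cutoff is closer to zero
```

A statistic such as t = 2.0 would therefore fall in the rejection region of the right-tailed test but not of the two-tailed test.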
The function below draws the left-, right-, or two-tailed rejection region with a single call.
curve_t <- function(df = 9, type = c("left", "right", "two"),
                    alpha = 0.05, t_value = NULL) {
  type <- match.arg(type)
  xs <- seq(-5, 5, length.out = 1000)
  ys <- dt(xs, df)                       # t density on the grid
  plot(xs, ys, type = "l", lwd = 2, col = "blue",
       ylab = "Density", xlab = "t",
       main = paste0("t (df=", df, "), ", type, "-tailed"))
  shade <- rgb(1, 0, 0, 0.5)             # semi-transparent red for the rejection region
  if (type == "left") {
    tcrit <- qt(alpha, df)
    polygon(c(min(xs), xs[xs <= tcrit], tcrit),
            c(0, ys[xs <= tcrit], 0), col = shade, border = NA)
    abline(v = tcrit, lty = 2)
  } else if (type == "right") {
    tcrit <- qt(1 - alpha, df)
    polygon(c(tcrit, xs[xs >= tcrit], max(xs)),
            c(0, ys[xs >= tcrit], 0), col = shade, border = NA)
    abline(v = tcrit, lty = 2)
  } else {
    tcrit <- qt(1 - alpha / 2, df)       # alpha is split across both tails
    xl <- -tcrit; xr <- tcrit
    polygon(c(min(xs), xs[xs <= xl], xl),
            c(0, ys[xs <= xl], 0), col = shade, border = NA)
    polygon(c(xr, xs[xs >= xr], max(xs)),
            c(0, ys[xs >= xr], 0), col = shade, border = NA)
    abline(v = xl, lty = 2); abline(v = xr, lty = 2)
  }
  if (!is.null(t_value)) {
    # mark the observed statistic and report it next to the critical value
    abline(v = t_value, col = "purple", lwd = 2, lty = 3)
    legend("topright",
           legend = c(sprintf("t_crit = %.3f", tcrit),
                      sprintf("t_value = %.3f", t_value)),
           bty = "n")
  }
}
par(mfrow=c(1,3))
curve_t(df=9, type="left", alpha=0.05, t_value=-2.5)
curve_t(df=9, type="right", alpha=0.05, t_value= 2.3)
curve_t(df=9, type="two", alpha=0.05, t_value= 2.5)
par(mfrow=c(1,1))
Left-tailed (is the mean lower?):
x <- c(68,65,70,66,64)
t.test(x, mu=70, alternative = "less")
##
## One Sample t-test
##
## data: x
## t = -3.1568, df = 4, p-value = 0.01714
## alternative hypothesis: true mean is less than 70
## 95 percent confidence interval:
## -Inf 68.89607
## sample estimates:
## mean of x
## 66.6
Right-tailed (is the mean higher?):
y <- c(0.62,0.55,0.68,0.60,0.66,0.71,0.63,0.58,0.65,0.69)
t.test(y, mu=0.50, alternative = "greater")
##
## One Sample t-test
##
## data: y
## t = 8.5311, df = 9, p-value = 6.599e-06
## alternative hypothesis: true mean is greater than 0.5
## 95 percent confidence interval:
## 0.6075622 Inf
## sample estimates:
## mean of x
## 0.637
Two-tailed (is the mean different?):
milk <- c(495, 505, 498, 503, 492, 510, 501, 497, 499, 504)
t.test(milk, mu=500, alternative="two.sided")
##
## One Sample t-test
##
## data: milk
## t = 0.23886, df = 9, p-value = 0.8166
## alternative hypothesis: true mean is not equal to 500
## 95 percent confidence interval:
## 496.6117 504.1883
## sample estimates:
## mean of x
## 500.4
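The three decision rules differ only in which tail area counts as the p-value. The sketch below recomputes the p-values of the three examples from the t statistic using pt(); p_from_t and one_sample_t are hypothetical helper names introduced for illustration, and the rule applied is "reject \(H_0\) when p < \(\alpha\)".

```r
# Hypothetical helper: p-value for a t statistic under each alternative
p_from_t <- function(t, df, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         less      = pt(t, df),                      # area to the left of t
         greater   = pt(t, df, lower.tail = FALSE),  # area to the right of t
         two.sided = 2 * pt(-abs(t), df))            # both tails
}

# Hypothetical helper: one-sample t statistic
one_sample_t <- function(x, mu) (mean(x) - mu) / (sd(x) / sqrt(length(x)))

x    <- c(68, 65, 70, 66, 64)
y    <- c(0.62, 0.55, 0.68, 0.60, 0.66, 0.71, 0.63, 0.58, 0.65, 0.69)
milk <- c(495, 505, 498, 503, 492, 510, 501, 497, 499, 504)

p_left  <- p_from_t(one_sample_t(x, 70),     length(x) - 1,    "less")
p_right <- p_from_t(one_sample_t(y, 0.50),   length(y) - 1,    "greater")
p_two   <- p_from_t(one_sample_t(milk, 500), length(milk) - 1, "two.sided")

alpha <- 0.05
c(left = p_left < alpha, right = p_right < alpha, two = p_two < alpha)
```

With \(\alpha = 0.05\), the left- and right-tailed examples reject \(H_0\), while the two-tailed milk example does not, in line with the t.test() outputs shown earlier.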
Knitting this Rmd file produces an HTML note with the formulas, explanations, code, and plots all in one place. Plug in your own data and run it. Good luck! ✨