There are four principal assumptions:
linearity of the relationship between dependent and independent variables.
statistical independence of the errors (in particular, no correlation between the errors of different observations).
homoscedasticity (constant variance) of the errors for all x.
normality of the error distribution.
If the independence assumption is violated (for example, when the errors are positively autocorrelated), the estimated standard errors tend to underestimate the true standard errors, so the associated p-values come out too low.
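The effect of violated independence can be illustrated with a small simulation. This is only a sketch under stated assumptions: the AR(1) error process, the trending regressor, the coefficient values, and the helper names `slope_and_nominal_se` and `ar1_errors` are all made up for illustration and do not come from the text.

```python
import math
import random

def slope_and_nominal_se(X, Y):
    """OLS slope and its textbook standard error (which assumes independent errors)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_xx = sum((a - x_bar) ** 2 for a in X)
    b2 = sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y)) / sum_xx
    b1 = y_bar - b2 * x_bar
    rss = sum((b - b1 - b2 * a) ** 2 for a, b in zip(X, Y))
    return b2, math.sqrt(rss / (n - 2) / sum_xx)

def ar1_errors(n, rho, rng):
    """Positively autocorrelated errors: u_t = rho * u_{t-1} + eps_t."""
    u = [rng.gauss(0, 1) / math.sqrt(1 - rho ** 2)]  # draw from the stationary law
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))
    return u

rng = random.Random(0)          # fixed seed so the run is repeatable
n, rho, reps = 50, 0.8, 400     # assumed settings, chosen for illustration
X = list(range(n))              # a trending regressor

slopes, nominal_ses = [], []
for _ in range(reps):
    u = ar1_errors(n, rho, rng)
    Y = [1.0 + 0.5 * a + e for a, e in zip(X, u)]
    b2, se = slope_and_nominal_se(X, Y)
    slopes.append(b2)
    nominal_ses.append(se)

mean_slope = sum(slopes) / reps
# Spread of the slope across replications: the "true" standard error.
true_sd = math.sqrt(sum((s - mean_slope) ** 2 for s in slopes) / (reps - 1))
avg_nominal_se = sum(nominal_ses) / reps

print("empirical sd of slope:", true_sd)
print("average reported se  :", avg_nominal_se)
```

In this setup the reported standard error should come out clearly smaller than the spread of the slope across replications, which is what makes the p-values too optimistic.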
Only the prediction errors need to be normally distributed, not x or y themselves. However, if x or y has an extremely asymmetric or long-tailed distribution, it may be hard to fit them into a linear model whose errors are normally distributed.
Regression estimates and/or predicts the population mean (expectation) of the dependent variable ($Y_i$) for known or fixed values of the explanatory variable ($X_i$). The population regression line (PRL) is the locus of the conditional expectations of Y given $X_i$.
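$$E(Y \mid X_i) = f(X_i) = \beta_1 + \beta_2 X_i$$
For a given $X_i$ this is an unknown but fixed value, which can be estimated.
The errors $u_i = Y_i - E(Y \mid X_i)$ have equal variance:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
(using a hat to indicate sample estimates, with residuals $e_i = Y_i - \hat{Y}_i$)
$$Y_i = \hat\beta_1 + \hat\beta_2 X_i + e_i$$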
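since
$$u_i \sim N(0, \sigma^2)$$
(in the sample, the residuals $e_i$ and the estimate $\hat\sigma^2$ take the place of $u_i$ and $\sigma^2$) and the errors are i.i.d. (independent and identically distributed: they all have the same probability distribution and are independent of each other),
$$u_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$
then
$$Y_i - (\beta_1 + \beta_2 X_i) \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$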
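Hence $\hat\beta_1$ and $\hat\beta_2$ are found by minimizing the residual sum of squares $Q = \sum (Y_i - \hat{Y}_i)^2$:
$$\min Q = \sum (Y_i - \hat{Y}_i)^2 = \sum \left(Y_i - (\hat\beta_1 + \hat\beta_2 X_i)\right)^2 = f(\hat\beta_1, \hat\beta_2)$$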
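The solution, with lower-case letters denoting deviations from the sample means ($x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$), is
$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2}$$
$$\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}$$
$$\operatorname{var}(\hat\beta_2) = \sigma^2_{\hat\beta_2} = \frac{1}{\sum x_i^2} \cdot \sigma^2$$
$$\operatorname{var}(\hat\beta_1) = \sigma^2_{\hat\beta_1} = \frac{\sum X_i^2}{n \sum x_i^2} \cdot \sigma^2$$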
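The deviation-form estimators can be sketched in a few lines of plain Python. The function name `ols_simple` and the five data points are made up for illustration; they are not from the text.

```python
def ols_simple(X, Y):
    """Return (beta1_hat, beta2_hat) for the model Y = beta1 + beta2 * X + u."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from the sample means (the lower-case x_i and y_i).
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    beta2_hat = sum_xy / sum_xx              # slope
    beta1_hat = y_bar - beta2_hat * x_bar    # intercept
    return beta1_hat, beta2_hat

# Made-up illustration data.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 6.2, 7.9, 9.6]
beta1_hat, beta2_hat = ols_simple(X, Y)
print(beta1_hat, beta2_hat)  # approximately 0.42 and 1.86
```

The variances of the estimators are not computed here because they depend on the unknown $\sigma^2$, which has to be replaced by an estimate.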
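(for the sample)
$$\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$$
$$e_i = Y_i - \hat{Y}_i$$
$$\hat\sigma^2 = \frac{\sum e_i^2}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}$$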
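$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$$
$$TSS = ESS + RSS$$
$$r^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$$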
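The decomposition TSS = ESS + RSS and the two equivalent forms of $r^2$ can be checked numerically. The sketch below uses made-up data and divides the residual sum of squares by $n-2$, the degrees of freedom of the simple regression.

```python
# Made-up illustration data.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 6.2, 7.9, 9.6]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS fit in deviation form.
beta2_hat = (sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y))
             / sum((a - x_bar) ** 2 for a in X))
beta1_hat = y_bar - beta2_hat * x_bar
Y_hat = [beta1_hat + beta2_hat * a for a in X]

TSS = sum((b - y_bar) ** 2 for b in Y)              # total sum of squares
ESS = sum((f - y_bar) ** 2 for f in Y_hat)          # explained sum of squares
RSS = sum((b - f) ** 2 for b, f in zip(Y, Y_hat))   # residual sum of squares
sigma2_hat = RSS / (n - 2)                          # estimate of the error variance
r2 = ESS / TSS

print(TSS, ESS + RSS)      # the two agree up to rounding
print(r2, 1 - RSS / TSS)   # both forms of r^2 agree
```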
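since
$$\hat\beta_2 \sim N(\beta_2, \sigma^2_{\hat\beta_2}), \qquad \hat\beta_1 \sim N(\beta_1, \sigma^2_{\hat\beta_1})$$
and
$$S_{\hat\beta_2} = \sqrt{\frac{1}{\sum x_i^2}} \cdot \hat\sigma, \qquad S_{\hat\beta_1} = \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}} \cdot \hat\sigma$$
therefore, where the second equality in each line holds under the null hypothesis that the coefficient is zero,
$$t^*_{\hat\beta_2} = \frac{\hat\beta_2 - \beta_2}{S_{\hat\beta_2}} = \frac{\hat\beta_2}{S_{\hat\beta_2}} = \frac{\hat\beta_2}{\sqrt{\frac{1}{\sum x_i^2}} \cdot \hat\sigma} \sim t(n-2)$$
$$t^*_{\hat\beta_1} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} = \frac{\hat\beta_1}{S_{\hat\beta_1}} = \frac{\hat\beta_1}{\sqrt{\frac{\sum X_i^2}{n \sum x_i^2}} \cdot \hat\sigma} \sim t(n-2)$$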
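since the $Y_i$ are independent with $Y_i \sim N(\beta_1 + \beta_2 X_i, \sigma^2)$, the sums of squares scaled by $\sigma^2$ are chi-squared distributed (the first under $H_0: \beta_2 = 0$):
$$\frac{ESS}{\sigma^2} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sigma^2} \sim \chi^2(df_{ESS}), \qquad \frac{RSS}{\sigma^2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(df_{RSS})$$
with $df_{ESS} = 1$ and $df_{RSS} = n-2$, therefore
$$F^* = \frac{ESS/df_{ESS}}{RSS/df_{RSS}} = \frac{MSS_{ESS}}{MSS_{RSS}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2 / df_{ESS}}{\sum (Y_i - \hat{Y}_i)^2 / df_{RSS}} = \frac{\hat\beta_2^2 \sum x_i^2}{\sum e_i^2 / (n-2)} = \frac{\hat\beta_2^2 \sum x_i^2}{\hat\sigma^2} \sim F(df_{ESS}, df_{RSS})$$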
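A short numerical sketch of the t and F statistics follows. The data are made up, and both statistics test a zero coefficient. In simple regression the last expression for $F^*$ equals the square of the slope's t statistic, which the code checks.

```python
import math

# Made-up illustration data.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 6.2, 7.9, 9.6]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

sum_xx = sum((a - x_bar) ** 2 for a in X)
beta2_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y)) / sum_xx
beta1_hat = y_bar - beta2_hat * x_bar
Y_hat = [beta1_hat + beta2_hat * a for a in X]

ESS = sum((f - y_bar) ** 2 for f in Y_hat)
RSS = sum((b - f) ** 2 for b, f in zip(Y, Y_hat))
sigma_hat = math.sqrt(RSS / (n - 2))

# Standard errors of the two coefficients.
se_beta2 = sigma_hat * math.sqrt(1 / sum_xx)
se_beta1 = sigma_hat * math.sqrt(sum(a * a for a in X) / (n * sum_xx))

# t statistics under H0: coefficient = 0; each has n - 2 degrees of freedom.
t_beta2 = beta2_hat / se_beta2
t_beta1 = beta1_hat / se_beta1

# F statistic with (1, n - 2) degrees of freedom.
F = (ESS / 1) / (RSS / (n - 2))

print(t_beta1, t_beta2)
print(F, t_beta2 ** 2)   # equal up to rounding
```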
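since
$$\mu_{\hat{Y}_0} = E(\hat{Y}_0) = E(\hat\beta_1 + \hat\beta_2 X_0) = \beta_1 + \beta_2 X_0 = E(Y \mid X_0)$$
and
$$\operatorname{var}(\hat{Y}_0) = \sigma^2_{\hat{Y}_0} = \operatorname{var}(\hat\beta_1 + \hat\beta_2 X_0) = \sigma^2 \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right)$$
therefore
$$\hat{Y}_0 \sim N(\mu_{\hat{Y}_0}, \sigma^2_{\hat{Y}_0}), \qquad \hat{Y}_0 \sim N\left( E(Y \mid X_0),\ \sigma^2 \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right) \right)$$
then construct a t statistic to estimate the confidence interval (CI) of the mean, where $S_{\hat{Y}_0}$ is the estimated standard deviation of $\hat{Y}_0$:
$$t_{\hat{Y}_0} = \frac{\hat{Y}_0 - E(Y \mid X_0)}{S_{\hat{Y}_0}} \sim t(n-2)$$
$$\hat{Y}_0 - t_{1-\alpha/2}(n-2) \cdot S_{\hat{Y}_0} \le E(Y \mid X_0) \le \hat{Y}_0 + t_{1-\alpha/2}(n-2) \cdot S_{\hat{Y}_0}$$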
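For an individual value $Y_0$, since
$$(Y_0 - \hat{Y}_0) \sim N(\mu_{(Y_0 - \hat{Y}_0)}, \sigma^2_{(Y_0 - \hat{Y}_0)}), \qquad (Y_0 - \hat{Y}_0) \sim N\left( 0,\ \sigma^2 \left( 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right) \right)$$
construct the t statistic
$$t_{(Y_0 - \hat{Y}_0)} = \frac{Y_0 - \hat{Y}_0}{S_{(Y_0 - \hat{Y}_0)}} \sim t(n-2)$$
with
$$S_{(Y_0 - \hat{Y}_0)} = \sqrt{\hat\sigma^2 \left( 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right)}$$
which differs from the standard error of the mean prediction, $S_{\hat{Y}_0} = \sqrt{\hat\sigma^2 \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right)}$, only by the extra 1 inside the bracket. Therefore
$$\hat{Y}_0 - t_{1-\alpha/2}(n-2) \cdot S_{(Y_0 - \hat{Y}_0)} \le (Y_0 \mid X_0) \le \hat{Y}_0 + t_{1-\alpha/2}(n-2) \cdot S_{(Y_0 - \hat{Y}_0)}$$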
It is harder to predict your weight from your age than to predict the mean weight of people of your age, so the interval for an individual prediction is wider than the interval for the mean prediction.
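The difference between the two intervals can be seen numerically. In the sketch below the function name `intervals` and the data are made up, and the critical value $t_{0.975}(3) = 3.182$ is taken from a t table rather than computed, to keep the example free of extra dependencies.

```python
import math

def intervals(X, Y, x0, t_crit):
    """Return (mean_ci, pred_int) at x0, each as a (lower, upper) tuple."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_xx = sum((a - x_bar) ** 2 for a in X)
    beta2_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y)) / sum_xx
    beta1_hat = y_bar - beta2_hat * x_bar
    rss = sum((b - (beta1_hat + beta2_hat * a)) ** 2 for a, b in zip(X, Y))
    sigma2_hat = rss / (n - 2)

    y0_hat = beta1_hat + beta2_hat * x0
    # Common term 1/n + (x0 - x_bar)^2 / sum(x_i^2).
    h0 = 1 / n + (x0 - x_bar) ** 2 / sum_xx
    se_mean = math.sqrt(sigma2_hat * h0)          # for E(Y | X0)
    se_indiv = math.sqrt(sigma2_hat * (1 + h0))   # for an individual Y0

    mean_ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pred_int = (y0_hat - t_crit * se_indiv, y0_hat + t_crit * se_indiv)
    return mean_ci, pred_int

# Made-up illustration data; n = 5, so n - 2 = 3 degrees of freedom.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 6.2, 7.9, 9.6]
T_CRIT = 3.182  # t_{0.975}(3), looked up in a t table

mean_ci, pred_int = intervals(X, Y, x0=4.5, t_crit=T_CRIT)
print("95% CI for the mean     :", mean_ci)
print("95% prediction interval :", pred_int)
```

Both intervals are centred on the same $\hat{Y}_0$; only the standard error differs.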
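The multiple regression model with k coefficients is
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$
or, in matrix form,
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$
$$\underset{(n \times 1)}{y} = \underset{(n \times k)}{X}\ \underset{(k \times 1)}{\beta} + \underset{(n \times 1)}{u}$$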
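because
$$u \sim N(0, \sigma^2 I)$$
for the population (in the sample, the residual vector $e$ takes the place of $u$), therefore, writing $\sigma_{ij} = \operatorname{cov}(u_i, u_j)$,
$$\begin{aligned}
\text{var-cov}(u) = E(uu') &= \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix} && \leftarrow (E(u_i) = 0) \\
&= \begin{bmatrix} \sigma^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma^2 \end{bmatrix} && \leftarrow (\operatorname{var}(u_i) = \sigma^2) \\
&= \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} && \leftarrow (\operatorname{cov}(u_i, u_j) = 0,\ i \ne j) \\
&= \sigma^2 \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \sigma^2 I
\end{aligned}$$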
$$Q = \sum e_i^2 = e'e = (y - X\hat\beta)'(y - X\hat\beta) = y'y - 2\hat\beta' X'y + \hat\beta' X'X \hat\beta$$
(population=sample)
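$$\begin{aligned}
\frac{\partial Q}{\partial \hat\beta} &= 0 \\
\frac{\partial (y'y - 2\hat\beta' X'y + \hat\beta' X'X \hat\beta)}{\partial \hat\beta} &= 0 \\
-2X'y + 2X'X\hat\beta &= 0 \\
-X'y + X'X\hat\beta &= 0 \\
X'X\hat\beta &= X'y \\
\hat\beta &= (X'X)^{-1} X'y
\end{aligned}$$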
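Substituting $y = X\beta + u$ gives $\hat\beta = \beta + (X'X)^{-1}X'u$, so $E(\hat\beta) = \beta$ and
$$\begin{aligned}
\text{var-cov}(\hat\beta) &= E\left( (\hat\beta - E(\hat\beta))(\hat\beta - E(\hat\beta))' \right) \\
&= E\left( (\hat\beta - \beta)(\hat\beta - \beta)' \right) \\
&= E\left( ((X'X)^{-1}X'u)((X'X)^{-1}X'u)' \right) \\
&= E\left( (X'X)^{-1}X'uu'X(X'X)^{-1} \right) \\
&= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\
&= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}
\end{aligned}$$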
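(for the sample) where
$$\hat\sigma^2 = \frac{\sum e_i^2}{n-k} = \frac{e'e}{n-k}, \qquad E(\hat\sigma^2) = \sigma^2$$
therefore
$$S^2_{ij}(\hat\beta) = \hat\sigma^2 (X'X)^{-1} = \frac{e'e}{n-k} (X'X)^{-1}$$
which is the estimated variance-covariance matrix of the coefficients.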
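The matrix formulas translate almost directly into NumPy. The following is a minimal sketch, assuming NumPy is available; the data (n = 6 observations, k = 3 coefficients) are made up for illustration.

```python
import numpy as np

# Made-up data: first column of ones for the intercept, then two regressors.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
    [1.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n, k = X.shape

XtX = X.T @ X
# Solve the normal equations X'X beta = X'y instead of inverting X'X for beta.
beta_hat = np.linalg.solve(XtX, X.T @ y)

e = y - X @ beta_hat                             # residual vector
sigma2_hat = (e @ e) / (n - k)                   # e'e / (n - k)
cov_beta_hat = sigma2_hat * np.linalg.inv(XtX)   # sigma2_hat * (X'X)^{-1}

print(beta_hat)
print(cov_beta_hat)
```

Solving the normal equations with `np.linalg.solve` is a design choice for numerical stability; the explicit inverse is only formed where the formula for the covariance matrix needs it.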
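$$TSS = y'y - n\bar{Y}^2$$
$$RSS = e'e = y'y - \hat\beta' X'y$$
$$ESS = \hat\beta' X'y - n\bar{Y}^2$$
$$R^2 = \frac{ESS}{TSS} = \frac{\hat\beta' X'y - n\bar{Y}^2}{y'y - n\bar{Y}^2}$$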
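because $u \sim N(0, \sigma^2 I)$,
$$\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$$
therefore (for the test on all coefficients, written as a vector and read element by element, with $S^2_{\hat\beta}$ as given above)
$$t_{\hat\beta} = \frac{\hat\beta - \beta}{S_{\hat\beta}} \sim t(n-k)$$
(for an individual coefficient test, under the null hypothesis that the coefficient is zero)
$$t^*_{\hat\beta} = \frac{\hat\beta}{\sqrt{S^2_{ij}(\hat\beta_{kk})}} \quad \text{where} \quad S^2_{ij}(\hat\beta_{kk}) = [s^2_{\hat\beta_1}, s^2_{\hat\beta_2}, \cdots, s^2_{\hat\beta_k}]'$$
These variances are the diagonal elements of the matrix $S^2(\hat\beta)$.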
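Extracting the diagonal and forming the individual t statistics might look like the sketch below. The function name `coef_t_stats` and the data are made up, and each statistic tests the hypothesis that its coefficient is zero.

```python
import numpy as np

def coef_t_stats(X, y):
    """Return (beta_hat, standard errors, t statistics for H0: beta_j = 0)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - k)
    cov = sigma2_hat * XtX_inv        # S^2(beta_hat)
    se = np.sqrt(np.diag(cov))        # square roots of the diagonal elements
    return beta_hat, se, beta_hat / se

# Made-up data: intercept column plus two regressors.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
    [1.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

beta_hat, se, t_stats = coef_t_stats(X, y)
print(se)
print(t_stats)   # each is compared against t(n - k)
```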
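Unrestricted model:
$$u_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \qquad Y_i \sim N(\beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki},\ \sigma^2)$$
$$RSS_U = \sum (Y_i - \hat{Y}_i)^2, \qquad \frac{RSS_U}{\sigma^2} \sim \chi^2(n-k)$$
Restricted model (all slope coefficients are zero, so $\hat{Y}_i = \bar{Y}$):
$$u_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \qquad Y_i \overset{\text{i.i.d.}}{\sim} N(\beta_1, \sigma^2)$$
$$RSS_R = \sum (Y_i - \hat{Y}_i)^2, \qquad \frac{RSS_R}{\sigma^2} \sim \chi^2(n-1)$$
F test:
$$F^* = \frac{(RSS_R - RSS_U)/(k-1)}{RSS_U/(n-k)} = \frac{ESS_U/df_{ESS_U}}{RSS_U/df_{RSS_U}} \sim F(df_{ESS_U}, df_{RSS_U})$$
$$F^* = \frac{ESS_U/df_{ESS_U}}{RSS_U/df_{RSS_U}} = \frac{(\hat\beta' X'y - n\bar{Y}^2)/(k-1)}{(y'y - \hat\beta' X'y)/(n-k)}$$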
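The two expressions for $F^*$ can be compared numerically. This sketch uses made-up data and relies on the fact that the restricted, intercept-only model fits every observation with $\bar{Y}$, so its residual sum of squares equals TSS.

```python
import numpy as np

# Made-up data: intercept column plus two regressors.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
    [1.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n, k = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_bar = y.mean()

# Sums of squares in matrix form.
TSS = y @ y - n * y_bar ** 2
ESS = beta_hat @ X.T @ y - n * y_bar ** 2
RSS_U = y @ y - beta_hat @ X.T @ y
R2 = ESS / TSS

# F statistic from the matrix expressions.
F = (ESS / (k - 1)) / (RSS_U / (n - k))

# Same statistic from restricted vs. unrestricted residual sums of squares.
RSS_R = np.sum((y - y_bar) ** 2)
F_alt = ((RSS_R - RSS_U) / (k - 1)) / (RSS_U / (n - k))

print(R2)
print(F, F_alt)   # equal up to rounding
```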
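since, with $X_0$ a $1 \times k$ row vector of regressor values,
$$E(\hat{Y}_0) = E(X_0\hat\beta) = X_0\beta = E(Y_0)$$
$$\begin{aligned}
\operatorname{var}(\hat{Y}_0) &= E(X_0\hat\beta - X_0\beta)^2 \\
&= E\left( X_0(\hat\beta - \beta)(\hat\beta - \beta)'X_0' \right) \\
&= X_0\, E\left( (\hat\beta - \beta)(\hat\beta - \beta)' \right) X_0' \\
&= \sigma^2 X_0 (X'X)^{-1} X_0'
\end{aligned}$$
and
$$\hat{Y}_0 \sim N(\mu_{\hat{Y}_0}, \sigma^2_{\hat{Y}_0}), \qquad \hat{Y}_0 \sim N\left( E(Y_0 \mid X_0),\ \sigma^2 X_0 (X'X)^{-1} X_0' \right)$$
construct a t statistic
$$t_{\hat{Y}_0} = \frac{\hat{Y}_0 - E(Y \mid X_0)}{S_{\hat{Y}_0}} \sim t(n-k)$$
therefore
$$\hat{Y}_0 - t_{1-\alpha/2}(n-k) \cdot S_{\hat{Y}_0} \le E(Y \mid X_0) \le \hat{Y}_0 + t_{1-\alpha/2}(n-k) \cdot S_{\hat{Y}_0}$$
where
$$S_{\hat{Y}_0} = \sqrt{\hat\sigma^2 X_0 (X'X)^{-1} X_0'}, \qquad \hat\sigma^2 = \frac{e'e}{n-k}$$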
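since
$$e_0 = Y_0 - \hat{Y}_0$$
and
$$E(e_0) = E(Y_0 - \hat{Y}_0) = E(X_0\beta + u_0 - X_0\hat\beta) = E\left( u_0 - X_0(\hat\beta - \beta) \right) = E\left( u_0 - X_0(X'X)^{-1}X'u \right) = 0$$
and, because $u_0$ is independent of $u$,
$$\operatorname{var}(e_0) = E(Y_0 - \hat{Y}_0)^2 = E(e_0^2) = E\left( u_0 - X_0(X'X)^{-1}X'u \right)^2 = \sigma^2 \left( 1 + X_0 (X'X)^{-1} X_0' \right)$$
and
$$e_0 \sim N(\mu_{e_0}, \sigma^2_{e_0}), \qquad e_0 \sim N\left( 0,\ \sigma^2 (1 + X_0 (X'X)^{-1} X_0') \right)$$
construct a t statistic
$$t_{e_0} = \frac{\hat{Y}_0 - Y_0}{S_{e_0}} \sim t(n-k)$$
therefore
$$\hat{Y}_0 - t_{1-\alpha/2}(n-k) \cdot S_{Y_0 - \hat{Y}_0} \le (Y_0 \mid X_0) \le \hat{Y}_0 + t_{1-\alpha/2}(n-k) \cdot S_{Y_0 - \hat{Y}_0}$$
where
$$S_{Y_0 - \hat{Y}_0} = S_{e_0} = \sqrt{\hat\sigma^2 \left( 1 + X_0 (X'X)^{-1} X_0' \right)}, \qquad \hat\sigma^2 = \frac{e'e}{n-k}$$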
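Both matrix-form intervals can be computed together, since they share $X_0 (X'X)^{-1} X_0'$. In this sketch the function name `matrix_intervals`, the data, and the point `x0` are made up, and the critical value $t_{0.975}(3) = 3.182$ is taken from a t table for $n - k = 3$ degrees of freedom.

```python
import numpy as np

def matrix_intervals(X, y, x0, t_crit):
    """Return (mean_ci, pred_int) at the row vector x0, each as (lower, upper)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - k)

    y0_hat = x0 @ beta_hat
    h0 = x0 @ XtX_inv @ x0                      # X0 (X'X)^{-1} X0'
    se_mean = np.sqrt(sigma2_hat * h0)          # for E(Y | X0)
    se_indiv = np.sqrt(sigma2_hat * (1 + h0))   # for an individual Y0

    mean_ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pred_int = (y0_hat - t_crit * se_indiv, y0_hat + t_crit * se_indiv)
    return mean_ci, pred_int

# Made-up data: intercept column plus two regressors (n = 6, k = 3).
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
    [1.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
x0 = np.array([1.0, 3.5, 3.5])   # hypothetical new observation
T_CRIT = 3.182                   # t_{0.975}(3), looked up in a t table

mean_ci, pred_int = matrix_intervals(X, y, x0, T_CRIT)
print("95% CI for the mean     :", mean_ci)
print("95% prediction interval :", pred_int)
```

As in the simple regression case, the prediction interval contains the confidence interval for the mean because of the extra 1 under the square root.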