This part focuses on panel data. Panel data consist of repeated observations on the same individuals over time. We are dealing with micro panel data (many individuals, few time periods), so we do not have to worry about the unit-root issues that arise in time-series analysis. This note has two parts: Part 1 replicates the material from Lecture 4, and Part 2 solves the Lab exercise.
Before doing any analysis, you have to clean your data, that is, organize it in a form the computer can read and manipulate. For this step you can use any tool: Stata, R, or Excel. I will focus on Stata and R. Later you will see that R has many advantages for cleaning and reshaping data.
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. use bhps_1991.dta
. append using bhps_1992.dta
. append using bhps_1993.dta
.
. sort pid
.
. describe
Contains data from bhps_1991.dta
obs: 29,709
vars: 142 3 Oct 2008 12:53
size: 5,496,165
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
adoid byte %8.0g adoid date of interview: day
adoim byte %8.0g adoim date of interview: month
aplbornc byte %8.0g aplbornc country of birth
asex byte %8.0g asex sex
apaju byte %8.0g apaju father not working when resp.
aged 14
apasoc int %8.0g apasoc father's occupation (soc), resp.
aged 14
apasemp byte %8.0g apasemp father self employed, resp. aged
14
apaboss byte %8.0g apaboss father had employees, resp. aged
14
apamngr byte %8.0g apamngr father was manager, resp. aged 14
amaju byte %8.0g amaju mother not working when resp.
aged 14
amasoc int %8.0g amasoc mother's occupation (soc), resp.
aged 14
amasemp byte %8.0g amasemp mother self employed, resp. aged
14
amaboss byte %8.0g amaboss mother had employees, resp. aged
14
amamngr byte %8.0g amamngr mother was manager, resp. aged 14
amlstat byte %8.0g amlstat present legal marital status
aschool byte %8.0g aschool never went to /still at school
ascend byte %8.0g ascend school leaving age
asctype byte %8.0g asctype type of school attended
ascnow byte %8.0g ascnow still at school
afetype byte %8.0g afetype type of further education
attended
afenow byte %8.0g afenow still in further education
afeend byte %8.0g afeend further education leaving age
asmoker byte %8.0g asmoker smoker
ancigs byte %8.0g ancigs number of cigarettes smoked
arach16 byte %8.0g arach16 responsible for dependent child
under 16
ajbhas byte %8.0g ajbhas did paid work last week
ajboff byte %8.0g ajboff no work last week but has job
ajboffy byte %8.0g ajboffy reason off work last week
ajbsoc int %8.0g ajbsoc occupation (soc): current main
job
ajbsic int %8.0g ajbsic industry (sic) of employer:
current job
ajbsemp byte %8.0g ajbsemp employee or self-employed:
current job
ajbsize byte %8.0g ajbsize number employed at workplace:
current jo
ajbhrs byte %8.0g ajbhrs no. of hours normally worked per
week
atujbpl byte %8.0g atujbpl union or staff association at
workplace
atuin1 byte %8.0g atuin1 member of workplace union
ajbbgd byte %8.0g ajbbgd day started current job
ajbbgm byte %8.0g ajbbgm month started current job
ahunurs byte %8.0g ahunurs who cares for ill children
ajbstat byte %8.0g ajbstat current labour force status
arace byte %8.0g arace ethnic group membership
pid long %12.0g cross-wave person identifier
aregion byte %8.0g aregion region / metropolitan area
aage byte %8.0g aage age at date of interview
anchild byte %8.0g anchild number of own children in
household
aqfedhi byte %8.0g aqfedhi highest educational qualification
aqfvoc byte %8.0g aqfvoc has vocational qualifications
aqfachi byte %8.0g aqfachi highest academic qualification
ajbft byte %8.0g ajbft employed full time
apaygu double %10.0g apaygu usual gross pay per month:
current job
acjsten int %8.0g acjsten length (days) of current labour
market
ayr2uk4 int %8.0g ayr2uk4 year came to britain: 4 digit
ajbbgy4 int %8.0g ajbbgy4 year started current job: 4 digit
bdoid byte %8.0g date of interview: day
bdoim byte %8.0g bdoim date of interview: month
bivlyr byte %8.0g bivlyr ic: interviewed last year
bsex byte %8.0g bsex sex
bjbstat byte %8.0g bjbstat current economic activity
bplbornc byte %8.0g bplbornc country of birth
brace byte %8.0g brace ethnic group membership
bschool byte %8.0g bschool never went to /still at school
bscend byte %8.0g bscend school leaving age
bsctype byte %8.0g bsctype type of school attended
bscnow byte %8.0g bscnow still at school
bfetype byte %8.0g bfetype type of further education
attended
bfenow byte %8.0g bfenow still in further education
bfeend byte %8.0g bfeend further education leaving age
bsmoker byte %8.0g bsmoker smoker
bncigs byte %8.0g bncigs number of cigarettes smoked
bmlstat byte %8.0g bmlstat present legal marital status
bjbhas byte %8.0g bjbhas did paid work last week
bjboff byte %8.0g bjboff no work last week but has job
bjboffy byte %8.0g bjboffy reason off work last week
bjbsoc int %8.0g bjbsoc occupation (soc): current main
job
bjbsic int %8.0g bjbsic industry (sic) of employer:
current job
bjbsemp byte %8.0g bjbsemp employee or self-employed:
current job
bjbsize byte %8.0g bjbsize no. employed at workplace:
current job
bjbhrs byte %8.0g bjbhrs no. of hours normally worked per
week
bjbbgd byte %8.0g bjbbgd day started current job
bjbbgm byte %8.0g bjbbgm month started current job
btujbpl byte %8.0g btujbpl union or staff association at
workplace
btuin1 byte %8.0g btuin1 member of workplace union
bjbed byte %8.0g bjbed had work related training since
1.9.91
bhunurs byte %8.0g bhunurs who cares for ill children
bage byte %8.0g bage age at date of interview
bnchild byte %8.0g bnchild number of own children in
household
brach16 byte %8.0g brach16 whether responsible adult for
child
bsampst byte %8.0g bsampst sample membership status
bregion byte %8.0g bregion region / metropolitan area
bqfedhi byte %8.0g bqfedhi highest educational qualification
bqfvoc byte %8.0g bqfvoc has vocational qualifications
bqfachi byte %8.0g bqfachi highest academic qualification
bjbft byte %8.0g bjbft employed full time
bpaygu double %10.0g bpaygu usual gross pay per month:
current job
bcjsten int %8.0g bcjsten length (days) of current labour
market
bdoiy4 int %8.0g bdoiy4 date of interview: 4 digit year
byr2uk4 int %8.0g byr2uk4 year came to britain: 4 digit
bjbbgy4 int %8.0g bjbbgy4 year started current job: 4 digit
cdoid byte %8.0g date of interview: day
cdoim byte %8.0g cdoim date of interview: month
civievr byte %8.0g civievr ever interviewed
csex byte %8.0g csex sex
cjbstat byte %8.0g cjbstat current economic activity
cmlstat byte %8.0g cmlstat present legal marital status
cplbornc byte %8.0g cplbornc country of birth
crace byte %8.0g crace ethnic group membership
cschool byte %8.0g cschool never went to /still at school
cscend byte %8.0g cscend school leaving age
csctype byte %8.0g csctype type of school attended
cscnow byte %8.0g cscnow still at school
cfetype byte %8.0g cfetype type of further education
attended
cfenow byte %8.0g cfenow still in further education
cfeend byte %8.0g cfeend further education leaving age
csmoker byte %8.0g csmoker smoker
cncigs byte %8.0g cncigs number of cigarettes smoked
cjbhas byte %8.0g cjbhas did paid work last week
cjboff byte %8.0g cjboff no work last week but has job
cjboffy byte %8.0g cjboffy reason off work last week
cjbsoc int %8.0g cjbsoc occupation (soc): current main
job
cjbsic int %8.0g cjbsic industry (sic) of employer:
current job
cjbsemp byte %8.0g cjbsemp employee or self-employed:
current job
cjbsize byte %8.0g cjbsize no. employed at workplace:
current job
cjbhrs byte %8.0g cjbhrs no. of hours normally worked per
week
cjbbgd byte %8.0g cjbbgd day started current job
cjbbgm byte %8.0g cjbbgm month started current job
ctujbpl byte %8.0g ctujbpl union or staff association at
workplace
ctuin1 byte %8.0g ctuin1 member of workplace union
cjbed byte %8.0g cjbed had work related training since
1.9.92
chunurs byte %8.0g chunurs who cares for ill children
cage byte %8.0g cage age at date of interview
cnchild byte %8.0g cnchild number of own children in
household
crach16 byte %8.0g crach16 whether responsible adult for
child
csampst byte %8.0g csampst sample membership status
cregion byte %8.0g cregion region / metropolitan area
cqfedhi byte %8.0g cqfedhi highest educational qualification
cqfvoc byte %8.0g cqfvoc has vocational qualifications
cqfachi byte %8.0g cqfachi highest academic qualification
cjbft byte %8.0g cjbft employed full time
cpaygu double %10.0g cpaygu usual gross pay per month:
current job
ccjsten int %8.0g ccjsten length (days) current labour
market sp.
cdoiy4 int %8.0g cdoiy4 date of interview: 4 digit year
cyr2uk4 int %8.0g cyr2uk4 year came to britain: 4 digit
cjbbgy4 int %8.0g cjbbgy4 year started current job: 4 digit
-------------------------------------------------------------------------------
Sorted by: pid
Note: Dataset has changed since last saved.
.
. * If you check the variable names carefully, you will see the pattern a_, b_, c_:
. * the part after the prefix a/b/c is the same in every wave.
. * a means 1991; b means 1992; c means 1993
.
. * Let us list some data to take a look.
.
.
. list pid *age *sex if _n <= 10, sepby(pid)
+----------------------------------------------------------+
| pid aage bage cage asex bsex csex |
|----------------------------------------------------------|
1. | 10002251 91 . . female . . |
|----------------------------------------------------------|
2. | 10004491 . 29 . . male . |
3. | 10004491 28 . . male . . |
|----------------------------------------------------------|
4. | 10004521 . . 28 . . male |
5. | 10004521 26 . . male . . |
6. | 10004521 . 27 . . male . |
|----------------------------------------------------------|
7. | 10007857 57 . . female . . |
8. | 10007857 . . 59 . . female |
9. | 10007857 . 59 . . female . |
|----------------------------------------------------------|
10. | 10014578 . 55 . . female . |
+----------------------------------------------------------+
.
. * _n is the number of the current observation, so "if _n <= 10" keeps the first 10 rows.
.
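As you can see, age and sex each appear as three separate variables (aage, bage, cage and asex, bsex, csex). This is because we appended three waves whose variable names carry different prefixes, so Stata stacked them in separate columns and filled the rest with missing values. To clean the data, we need a common name for each variable across waves. For details on rename and renpfix, see their Stata help files.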
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. clear
. use bhps_1991
. renpfix a
. generate year=1991
. save wave1, replace
file wave1.dta saved
.
. * renpfix a strips the prefix a from every variable name that begins with it
.
. clear
. use bhps_1992
. renpfix b
. generate year=1992
. save wave2, replace
file wave2.dta saved
.
. clear
. use bhps_1993
. renpfix c
. generate year=1993
. save wave3, replace
file wave3.dta saved
.
. list pid year sex age if _n <= 10, sepby(pid)
+--------------------------------+
| pid year sex age |
|--------------------------------|
1. | 10004521 1993 male 28 |
|--------------------------------|
2. | 10007857 1993 female 59 |
|--------------------------------|
3. | 20002092 1993 female 26 |
|--------------------------------|
4. | 10014578 1993 female 56 |
|--------------------------------|
5. | 10014608 1993 male 59 |
|--------------------------------|
6. | 10016813 1993 male 37 |
|--------------------------------|
7. | 10016848 1993 female 33 |
|--------------------------------|
8. | 10017933 1993 female 51 |
|--------------------------------|
9. | 10017968 1993 male 48 |
|--------------------------------|
10. | 10019057 1993 female 61 |
+--------------------------------+
.
. * now we have year = 1993, and no aage, cage, asex, etc.
.
. ** The data is quite clean now.
.
Once the data are clean, we can combine the waves.
.
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. use wave1
. append using wave2
. append using wave3
. sort pid year
. list pid year sex age if _n <= 10, sepby(pid)
+--------------------------------+
| pid year sex age |
|--------------------------------|
1. | 10002251 1991 female 91 |
|--------------------------------|
2. | 10004491 1991 male 28 |
3. | 10004491 1992 male 29 |
|--------------------------------|
4. | 10004521 1991 male 26 |
5. | 10004521 1992 male 27 |
6. | 10004521 1993 male 28 |
|--------------------------------|
7. | 10007857 1991 female 57 |
8. | 10007857 1992 female 59 |
9. | 10007857 1993 female 59 |
|--------------------------------|
10. | 10014578 1991 female 54 |
+--------------------------------+
.
. save wave_final, replace
file wave_final.dta saved
.
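The cleaning and appending steps above repeat the same commands for each wave, so they can also be written as a loop. The sketch below is a minimal illustration, not the lecture's own code: it builds three tiny toy files (the `toy_` file names and all values are hypothetical stand-ins for the BHPS waves) so that it runs on its own as a do-file. It uses `rename a* *`, the pattern form of rename available in newer Stata versions, which here does the same job as `renpfix a`.

```stata
* Toy wave files that mimic the a/b/c prefix scheme (hypothetical values).
clear
input pid aage asex
1 30 1
2 45 2
end
save toy_1991, replace

clear
input pid bage bsex
1 31 1
2 46 2
end
save toy_1992, replace

clear
input pid cage csex
1 32 1
end
save toy_1993, replace

* One pass per wave: strip the prefix, tag the year, save a clean copy.
local year = 1991
foreach p in a b c {
    use toy_`year', clear
    rename `p'* *              // same effect here as: renpfix `p'
    generate year = `year'
    save toy_wave`year', replace
    local ++year
}

* Stack the clean waves into one long panel.
use toy_wave1991, clear
append using toy_wave1992 toy_wave1993
sort pid year
list, sepby(pid)
```

With the real data, `toy_` would be replaced by `bhps_`. The pattern rename fails if stripping the prefix would create a duplicate variable name, which is a useful safety check.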
The crucial points to remember when creating panel data are:
You need one variable that identifies each unit and one variable that identifies each time period.
You can append cross-section datasets to each other to create the panel only if all your variables have the same names in every wave.
We can use all the basic commands we discussed in Lecture 2 to analyse panel data. However, it is crucial to realise that we now have repeated observations on the same individuals.
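One way to check the first point is to ask Stata whether the unit and time variables together identify every row. The sketch below uses a toy file with one deliberate duplicate (all values are hypothetical) and is meant to be run as a do-file.

```stata
* Toy person-year file with one deliberate duplicate (hypothetical values).
clear
input pid year
1 1991
1 1992
2 1991
2 1991
end

duplicates report pid year        // tabulates how many copies each pid-year has
capture noisily isid pid year     // fails: pid 2 appears twice in 1991
display "isid return code: " _rc  // nonzero signals the key is not unique

* For illustration only: keep one copy per pid-year, then re-check.
duplicates drop pid year, force
isid pid year                     // silent when pid and year identify each row
```

In real data, duplicates should be inspected before anything is dropped. The `force` option is used here only to keep the toy example short.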
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. use wave_final
.
. tab sex
sex | Freq. Percent Cum.
----------------+-----------------------------------
male | 13,939 46.92 46.92
female | 15,770 53.08 100.00
----------------+-----------------------------------
Total | 29,709 100.00
.
. tab year sex
| sex
year | male female | Total
-----------+----------------------+----------
1991 | 4,833 5,431 | 10,264
1992 | 4,630 5,215 | 9,845
1993 | 4,476 5,124 | 9,600
-----------+----------------------+----------
Total | 13,939 15,770 | 29,709
.
. xtset pid year
panel variable: pid (unbalanced)
time variable: year, 1991 to 1993, but with gaps
delta: 1 unit
.
. * set the dataset as panel data
.
. xtdes
pid: 10002251, 10004491, ..., 37763717 n = 11754
year: 1991, 1992, ..., 1993 T = 3
Delta(year) = 1 unit
Span(year) = 3 periods
(pid*year uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
1 1 2 3 3 3 3
Freq. Percent Cum. | Pattern
---------------------------+---------
8170 69.51 69.51 | 111
1045 8.89 78.40 | 1..
800 6.81 85.21 | 11.
615 5.23 90.44 | ..1
566 4.82 95.25 | .11
309 2.63 97.88 | .1.
249 2.12 100.00 | 1.1
---------------------------+---------
11754 100.00 | XXX
.
. drop if paygu <= 0
(15,037 observations deleted)
.
. xtsum paygu
Variable | Mean Std. Dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
paygu overall | 962.6754 719.4627 8.666667 13010.01 | N = 14672
between | 698.7412 8.666667 9173.723 | n = 6590
within | 173.1106 -1205.659 4988.588 | T-bar = 2.2264
.
. * be careful about 'between' and 'within'
.
. * see this example
.
. xttab region
Overall Between Within
region | Freq. Percent Freq. Percent Percent
----------+-----------------------------------------------------
inner lo | 573 3.91 293 4.45 93.40
outer lo | 940 6.41 434 6.59 95.47
r. of so | 2841 19.36 1299 19.71 97.66
south we | 1303 8.88 595 9.03 98.60
east ang | 511 3.48 234 3.55 98.43
east mid | 1169 7.97 539 8.18 97.71
west mid | 527 3.59 254 3.85 98.95
r. of we | 788 5.37 354 5.37 98.59
greater | 620 4.23 277 4.20 98.32
merseysi | 283 1.93 131 1.99 98.85
r. of no | 660 4.50 295 4.48 96.38
south yo | 376 2.56 167 2.53 97.50
west yor | 523 3.56 242 3.67 98.28
r. of yo | 493 3.36 222 3.37 96.62
tyne & w | 345 2.35 159 2.41 98.22
r. of no | 610 4.16 270 4.10 98.46
wales | 694 4.73 314 4.76 99.26
scotland | 1416 9.65 660 10.02 99.14
----------+-----------------------------------------------------
Total | 14672 100.00 6739 102.26 97.79
(n = 6590)
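The between variation is the variation in individual means across individuals. Averaging over time removes the time dimension, so between is about differences among individuals.
The within variation tells us how much the value varies over time for each person, so within is about changes over time.
In the xttab example above, the Overall columns tell us that there are 573 person-years in which the person lives in region 1 (inner London), which is 3.91% of all person-years. The Between columns count persons: 293 individuals lived in inner London in at least one wave. The Within column says that, among those individuals, 93.40% of their observations were recorded in inner London.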
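To make the between/within split concrete, the following sketch rebuilds the two pieces by hand on a toy panel. The variable names and values are hypothetical. Person 1's value changes over time, while person 2's is constant.

```stata
* Toy balanced panel: two people, three years (hypothetical values).
clear
input id year y
1 1991 10
1 1992 12
1 1993 14
2 1991 20
2 1992 20
2 1993 20
end
xtset id year

egen double ybar_i = mean(y), by(id)    // each person's own mean over time
egen double ybar   = mean(y)            // grand mean over all person-years
egen byte   first  = tag(id)            // flags one row per person

* Between: spread of the individual means, one value per person.
summarize ybar_i if first

* Within: deviations from the person's own mean, re-centred on the grand mean.
generate double y_within = y - ybar_i + ybar
summarize y_within

xtsum y     // compare with the two summaries above
```

Because the within series is re-centred on the grand mean, its minimum can be negative even when the variable itself is always positive, as in the xtsum output for paygu above.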
Stata has built-in time-series operators for calculating lags, leads, and differences. For example, L.varname is the first lag of varname, F.varname is the first lead, D.varname is the first difference, and L3.varname is the three-period lag. These operators require the data to be declared with xtset (or tsset) first.
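A small sketch of these operators on a toy panel (hypothetical values) follows. Person 2 is missing 1992, which shows how the operators treat gaps, and the last variable shows what goes wrong if you simply take the previous row instead.

```stata
* Toy unbalanced panel: person 2 has no 1992 record (hypothetical values).
clear
input id year y
1 1991 10
1 1992 12
1 1993 15
2 1991 20
2 1993 26
end
xtset id year

generate lag_y  = L.y     // value one year earlier, same person
generate lead_y = F.y     // value one year later, same person
generate diff_y = D.y     // y minus L.y

* Naive alternative: the previous row, regardless of person or year.
generate prev_row = y[_n-1]

list, sepby(id)
```

For person 2 in 1993 the lag is missing because there is no 1992 record, whereas `prev_row` silently picks up the 1991 value, and for person 2 in 1991 it even picks up person 1's last value.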
Suppose we have panel data for just two periods t=1, 2. We can write a simple model with a single explanatory variable as \[y_{it} = \beta_{0} + \delta d_{2t} + \beta x_{it} + \alpha_{i} + u_{it} , t = 1, 2 \]
In this notation \(i\) denotes the person, firm, city and so on, and \(t\) denotes the time period. The variable \(d_{2t}\) is a dummy variable which equals 0 when t = 1 and 1 when t = 2. \(\alpha_{i}\) is an unobserved individual effect that does not change over time, and \(u_{it}\) is the idiosyncratic error. If we pool the two periods, we treat the data as if they came from a single cross-section, and treat the variation in \(y_{it}\) and \(x_{it}\) across t in the same way as the variation across i.
Now, let's run a pooled regression.
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. use wave_final
.
. drop if year == 1991
(10,264 observations deleted)
.
. drop if age<16|age>60
(4,292 observations deleted)
. drop if jbed<0
(4,735 observations deleted)
.
. gen training = 1 if jbed ==1
(7,050 missing values generated)
. replace training = 0 if jbed == 2
(7,050 real changes made)
.
. gen female = 1 if sex == 2
(5,319 missing values generated)
. replace female = 0 if sex == 1
(5,319 real changes made)
.
. gen d2 = 1 if year == 1993
(5,298 missing values generated)
. replace d2 = 0 if year == 1992
(5,298 real changes made)
.
. gen lnpay = ln(paygu)
(1,278 missing values generated)
.
. save wave_2year, replace
file wave_2year.dta saved
.
. * the regression will be
.
. regress lnpay d2 female age training
Source | SS df MS Number of obs = 9,140
-------------+---------------------------------- F(4, 9135) = 790.78
Model | 1829.6012 4 457.400299 Prob > F = 0.0000
Residual | 5283.86235 9,135 .578419523 R-squared = 0.2572
-------------+---------------------------------- Adj R-squared = 0.2569
Total | 7113.46354 9,139 .778363447 Root MSE = .76054
------------------------------------------------------------------------------
lnpay | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d2 | .0207247 .0159143 1.30 0.193 -.0104709 .0519202
female | -.708709 .0159335 -44.48 0.000 -.7399422 -.6774758
age | .0153719 .0006934 22.17 0.000 .0140128 .0167311
training | .4393872 .0167418 26.24 0.000 .4065694 .4722049
_cons | 6.258067 .0293321 213.35 0.000 6.200569 6.315564
------------------------------------------------------------------------------
.
. * the same regression, without assuming that observations on the same individual are independent over time
.
. regress lnpay d2 female age training, vce(cluster pid)
Linear regression Number of obs = 9,140
F(4, 5434) = 573.27
Prob > F = 0.0000
R-squared = 0.2572
Root MSE = .76054
(Std. Err. adjusted for 5,435 clusters in pid)
------------------------------------------------------------------------------
| Robust
lnpay | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d2 | .0207247 .0100369 2.06 0.039 .0010483 .040401
female | -.708709 .0198915 -35.63 0.000 -.7477043 -.6697136
age | .0153719 .0009572 16.06 0.000 .0134955 .0172484
training | .4393872 .0179993 24.41 0.000 .4041013 .4746731
_cons | 6.258067 .0402289 155.56 0.000 6.179202 6.336932
------------------------------------------------------------------------------
.
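The indicator variables in the log above are each built with a generate/replace pair. A one-line alternative is sketched below on toy data (hypothetical values, reusing the names sex, jbed and year). One difference from the log is an assumption of this sketch: negative jbed codes are left as missing here instead of being dropped beforehand.

```stata
* Toy data with one missing sex and one negative jbed code (hypothetical values).
clear
input sex jbed year
1  1 1992
2  2 1993
2 -9 1992
.  2 1993
end

* A logical expression returns 1 or 0; the if-qualifier keeps missing as missing.
generate female   = (sex == 2)   if !missing(sex)
generate training = (jbed == 1)  if inlist(jbed, 1, 2)
generate d2       = (year == 1993)

list
```

The `if` guard matters: without it, `sex == 2` evaluates to 0 when sex is missing, so a person with unknown sex would be coded as male.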
In fact, panel data can help us deal with the endogeneity problem.
It turns out that it is straightforward to eliminate any correlation between the fixed effect \(\alpha_{i}\) and \(x_{it}\) using panel data. Writing out the model equation separately for each year, we get:
\[y_{i2} = \beta_{0} + \delta + \beta x_{i2} + \alpha_{i} + u_{i2} , t = 2 \]
\[y_{i1} = \beta_{0} + \beta x_{i1} + \alpha_{i} + u_{i1} , t = 1\]
Subtracting \(y_{i1}\) from \(y_{i2}\) we get:
\[y_{i2} - y_{i1} = \delta + \beta (x_{i2} -x_{i1}) + (u_{i2} - u_{i1})\]
Now the unobserved fixed effect \(\alpha_{i}\) has been differenced away, along with the intercept \(\beta_{0}\).
.
. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1
.
. use wave_2year
.
. xtset pid year
panel variable: pid (unbalanced)
time variable: year, 1992 to 1993
delta: 1 unit
.
. * time-series operators such as D. need xtset first; the setting is stored with
. * the dataset only if you save it after running xtset
.
. list pid year D.lnpay age D.age training D.training if _n <= 10, sepby(pid)
+---------------------------------------------------------------+
| D. D. D.|
| pid year lnpay age age training training |
|---------------------------------------------------------------|
1. | 10007857 1992 . 59 . 1 . |
2. | 10007857 1993 -.0066824 59 0 1 0 |
|---------------------------------------------------------------|
3. | 10014608 1992 . 58 . 0 . |
4. | 10014608 1993 .0862846 59 1 0 0 |
|---------------------------------------------------------------|
5. | 10016813 1992 . 37 . 0 . |
|---------------------------------------------------------------|
6. | 10016848 1992 . 33 . 1 . |
|---------------------------------------------------------------|
7. | 10017933 1992 . 49 . 0 . |
8. | 10017933 1993 . 51 2 0 0 |
|---------------------------------------------------------------|
9. | 10017968 1992 . 46 . 0 . |
10. | 10017968 1993 .1077027 48 2 0 0 |
+---------------------------------------------------------------+
.
. regress D.lnpay D.female D.training D.age
note: D.female omitted because of collinearity
Source | SS df MS Number of obs = 3,705
-------------+---------------------------------- F(2, 3702) = 1.79
Model | .418572903 2 .209286451 Prob > F = 0.1667
Residual | 432.190724 3,702 .116745198 R-squared = 0.0010
-------------+---------------------------------- Adj R-squared = 0.0004
Total | 432.609297 3,704 .116795167 Root MSE = .34168
------------------------------------------------------------------------------
D.lnpay | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female |
D1. | 0 (omitted)
|
training |
D1. | .0016944 .0107622 0.16 0.875 -.019406 .0227948
|
age |
D1. | -.0378549 .020068 -1.89 0.059 -.0772004 .0014905
|
_cons | .0939594 .0207339 4.53 0.000 .0533083 .1346104
------------------------------------------------------------------------------
.
. predict uhat, residuals
(6,713 missing values generated)
.
. ** We can also use within-groups estimator
.
. xtreg lnpay d2 female age training, fe
note: female omitted because of collinearity
Fixed-effects (within) regression Number of obs = 9,140
Group variable: pid Number of groups = 5,435
R-sq: Obs per group:
within = 0.0274 min = 1
between = 0.0481 avg = 1.7
overall = 0.0333 max = 2
F(3,3702) = 34.73
corr(u_i, Xb) = -0.5742 Prob > F = 0.0000
------------------------------------------------------------------------------
lnpay | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d2 | .0939594 .0207339 4.53 0.000 .0533083 .1346104
female | 0 (omitted)
age | -.0378549 .020068 -1.89 0.059 -.0772004 .0014905
training | .0016944 .0107622 0.16 0.875 -.019406 .0227948
_cons | 7.922322 .7131802 11.11 0.000 6.524057 9.320587
-------------+----------------------------------------------------------------
sigma_u | 1.0943318
sigma_e | .24160422
rho | .95352258 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5434, 3702) = 19.58 Prob > F = 0.0000
.
. ** we can use random effect
.
. xtreg lnpay d2 female age training, re
Random-effects GLS regression Number of obs = 9,140
Group variable: pid Number of groups = 5,435
R-sq: Obs per group:
within = 0.0143 min = 1
between = 0.2305 avg = 1.7
overall = 0.2216 max = 2
Wald chi2(4) = 1671.92
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lnpay | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d2 | .0369959 .0057385 6.45 0.000 .0257487 .0482432
female | -.7234406 .0212953 -33.97 0.000 -.7651786 -.6817026
age | .0172896 .0009032 19.14 0.000 .0155193 .0190599
training | .0945188 .0101153 9.34 0.000 .0746932 .1143444
_cons | 6.267822 .0354034 177.04 0.000 6.198433 6.337212
-------------+----------------------------------------------------------------
sigma_u | .74015203
sigma_e | .24160422
rho | .90370698 (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
. ** Let us check the residuals (uhat comes from the first-difference regression)
.
. qnorm uhat
.
. graph export "xtreg_re1.png", replace
(file xtreg_re1.png written in PNG format)
.
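In the log above, the first-difference regression and the fixed-effects (within) regression give identical slopes: for example, .0016944 on training in both, and the first-difference constant .0939594 equals the fixed-effects coefficient on d2. With exactly two periods this is not a coincidence. A short sketch, for individuals observed in both years and writing \(\bar{y}_{i} = (y_{i1} + y_{i2})/2\):
\[y_{i2} - \bar{y}_{i} = \tfrac{1}{2}(y_{i2} - y_{i1}), \qquad y_{i1} - \bar{y}_{i} = -\tfrac{1}{2}(y_{i2} - y_{i1})\]
The same holds for \(x_{it}\) and \(d_{2t}\), so the within-transformed data are just the first differences scaled by \(\pm\tfrac{1}{2}\), and least squares on either version yields the same \(\hat{\delta}\) and \(\hat{\beta}\). Individuals observed in only one year have no first difference and a zero within deviation, so they do not affect the slope estimates in either approach.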
This is the end of Part 1. You should read more in the Wooldridge textbook and CT (2009) to understand the logic behind the methods and models.