Julio César Martínez Sánchez
jcms2665@gmail.com
Load and filter the dataset
Subpopulations (Problems)
use "C:\Users\JC\Desktop\Estadística\BUAP\sdemt215.dta", clear
gen filtro=((c_res==1 | c_res==3) & r_def==0 & (eda>=15 & eda<=98))
tab filtro [fw=fac], m
filtro | Freq. Percent Cum.
------------+-----------------------------------
0 | 34,673,635 28.22 28.22
1 | 88,192,253 71.78 100.00
------------+-----------------------------------
Total |122,865,888 100.00
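Stata analyzes complex survey data with the svy prefix. Its complete documentation is in the Stata Survey Data Reference Manual. To identify the design variables, consult the ENOE file description. Here the primary sampling unit is upm, the stratum is est_d, and the expansion factor fac is used as the probability weight. The design is declared once with svyset:
svyset upm [pw=fac], strata(est_d) vce(linearized)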
pweight: fac
VCE: linearized
Single unit: missing
Strata 1: est_d
SU 1: upm
FPC 1:
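To show how the three design elements declared above (strata, PSUs and weights) enter the standard errors, here is a minimal Python sketch of the linearized variance of a weighted total. It assumes PSUs sampled with replacement within strata and no finite population correction, which is consistent with the empty FPC line. The function name and the toy records are hypothetical. The sketch illustrates the textbook formula; it is not Stata's internal code.

```python
from collections import defaultdict


def linearized_total(records):
    """Weighted total and its linearized variance for a stratified cluster sample.

    records: iterable of (stratum, psu, weight, y) tuples.
    Assumes PSUs drawn with replacement within strata (no FPC) and at least
    two PSUs per stratum.
    """
    # z[h][i] = sum of weight * y over the observations in PSU i of stratum h
    z = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        z[stratum][psu] += weight * y

    total, variance = 0.0, 0.0
    for stratum, psus in z.items():
        values = list(psus.values())
        n_h = len(values)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has a single PSU")
        mean_h = sum(values) / n_h
        total += sum(values)
        # Between-PSU spread within the stratum, multiplied by n_h / (n_h - 1)
        variance += n_h / (n_h - 1) * sum((v - mean_h) ** 2 for v in values)
    return total, variance


# Hypothetical toy sample: y = 1 flags a person in the category being counted.
sample = [
    ("A", 1, 100, 1), ("A", 1, 100, 0), ("A", 2, 120, 1),
    ("B", 3, 80, 1), ("B", 4, 90, 0), ("B", 4, 90, 1),
]
total, variance = linearized_total(sample)
print(total, variance ** 0.5)
```

Only PSU-level sums of weight × y matter for the variance. This is why Stata needs upm and est_d in addition to the weight.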
svy, subpop(filtro): tab clase2, format(%11.3g) count se cv ci level(90)
Number of strata = 446 Number of obs = 403865
Number of PSUs = 18440 Population size = 122865888
Subpop. no. of obs = 291231
Subpop. size = 88192253
Design df = 17994
----------------------------------------------------------------------
clase2 | count se cv lb ub
----------+-----------------------------------------------------------
0 | 0 0 0 0
1 | 50336088 212129 .421 49987150 50685026
2 | 2287633 43791 1.91 2215599 2359667
3 | 5884296 88324 1.5 5739009 6029583
4 | 29684236 175752 .592 29395134 29973338
|
Total | 88192253
----------------------------------------------------------------------
Key: count = weighted counts
se = linearized standard errors of weighted counts
cv = coefficients of variation of weighted counts
lb = lower 90% confidence bounds for weighted counts
ub = upper 90% confidence bounds for weighted counts
Table contains a zero in the marginals.
Statistics cannot be computed.
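The cv and bound columns follow from the count and its standard error:

- cv = 100 × se / count.
- The 90% bounds are count ± t × se. Stata takes t from a Student distribution with the design degrees of freedom, which equal PSUs minus strata: 18,440 - 446 = 17,994.

The Python sketch below reproduces the row clase2 = 1. It substitutes the normal quantile for t, which is a close approximation with this many degrees of freedom, so its bounds differ from Stata's in the last digits. The helper names are hypothetical.

```python
from statistics import NormalDist


def cv_percent(estimate, se):
    """Coefficient of variation in percent, as in Stata's cv column."""
    return 100 * se / estimate


def symmetric_ci(estimate, se, level=0.90):
    """estimate ± z * se, using the normal quantile as a stand-in for Stata's t."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


# Row clase2 = 1 of the table above
count, se = 50336088, 212129
print(round(cv_percent(count, se), 3))  # 0.421
lower, upper = symmetric_ci(count, se)
print(round(lower), round(upper))
```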
gen int f2=((c_res==1 | c_res==3) & r_def==0 & (eda>=15 & eda<=97))
svy, subpop(f2): mean eda if (clase1==1), level(90)
estat cv
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 446 Number of obs = 178640
Number of PSUs = 18416 Population size = 53111321
Subpop. no. obs = 177130
Subpop. size = 52592728
Design df = 17970
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [90% Conf. Interval]
-------------+------------------------------------------------
eda | 38.87578 .0576619 38.78093 38.97063
--------------------------------------------------------------
------------------------------------------------
| Linearized
| Mean Std. Err. CV (%)
-------------+----------------------------------
eda | 38.87578 .0576619 .148324
------------------------------------------------
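The subpop() option estimates within the subpopulation without discarding the rest of the sample. Observations outside it stay in the design with a zero contribution, so every PSU still counts in the variance. By contrast, observations excluded with if are removed before estimation. That is why the output above reports Number of obs = 178640 and 18416 PSUs rather than the full 403865 observations and 18440 PSUs.

The minimal Python sketch below illustrates the zero-contribution idea for a weighted mean, using the textbook linearization of a ratio. The function name and the five-record sample are hypothetical. It assumes PSUs drawn with replacement and no FPC.

```python
from collections import defaultdict


def domain_mean(records):
    """Weighted mean of y within a subpopulation, with its linearized SE.

    records: list of (stratum, psu, weight, y, member) tuples for the WHOLE
    sample; member is True for subpopulation members. Non-members are kept and
    contribute zero, mirroring the idea of subpop(). Assumes at least two PSUs
    per stratum, PSUs drawn with replacement, and no FPC.
    """
    size = sum(w for _, _, w, _, m in records if m)  # estimated subpopulation size
    mean = sum(w * y for _, _, w, y, m in records if m) / size

    # Linearized score of the ratio estimator, summed by PSU (zero for non-members)
    z = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, m in records:
        z[stratum][psu] += (w * (y - mean) / size) if m else 0.0

    variance = 0.0
    for psus in z.values():
        values = list(psus.values())
        n_h = len(values)
        centre = sum(values) / n_h
        variance += n_h / (n_h - 1) * sum((v - centre) ** 2 for v in values)
    return mean, variance ** 0.5


# Hypothetical sample: (stratum, psu, weight, age, member)
sample = [
    ("A", 1, 10, 30, True), ("A", 1, 10, 50, False), ("A", 2, 20, 40, True),
    ("B", 3, 10, 20, True), ("B", 4, 30, 60, False),
]
mean, se = domain_mean(sample)
print(mean, round(se, 4))
```

In this toy sample, PSU 4 of stratum B holds no subpopulation members. Dropping it, as an if restriction would, leaves stratum B with a single PSU and no way to compute its variance contribution.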
svy, subpop(filtro): prop clase1, level(90)
estat cv
Survey: Proportion estimation
Number of strata = 446 Number of obs = 403865
Number of PSUs = 18440 Population size = 122865888
Subpop. no. obs = 291231
Subpop. size = 88192253
Design df = 17994
--------------------------------------------------------------
| Linearized
| Proportion Std. Err. [90% Conf. Interval]
-------------+------------------------------------------------
clase1 |
0 | . (no observations)
1 | .5966932 .0014983 .5942287 .5991578
2 | .4033068 .0014983 .4008422 .4057713
--------------------------------------------------------------
------------------------------------------------
| Linearized
| Proportion Std. Err. CV (%)
-------------+----------------------------------
clase1 |
0 | (omitted)
1 | .5966932 .0014983 .251098
2 | .4033068 .0014983 .3715
------------------------------------------------
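In this subpopulation clase1 takes only two values, so the second proportion is the complement of the first. Both rows therefore share the same standard error (.0014983), and only the CV changes. A short Python check of that arithmetic, with a hypothetical helper name:

```python
def binary_proportion_cvs(p1, se):
    """CVs (in percent) of a proportion and of its complement, which share one SE."""
    p2 = 1 - p1
    return 100 * se / p1, 100 * se / p2


cv_1, cv_2 = binary_proportion_cvs(0.5966932, 0.0014983)
print(round(cv_1, 4), round(cv_2, 4))  # 0.2511 0.3715
```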
Problems can arise when analyzing very small subpopulations. In particular, when the coefficient of variation is requested, Stata reports no standard errors and displays the following message:
Note: missing standard errors because of stratum with single sampling unit.
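The standard errors go missing because of how the linearized variance is built. Each stratum contributes its between-PSU spread multiplied by n_h / (n_h - 1), and that factor is undefined when a stratum has n_h = 1 PSU. When the estimation sample is small, some strata can end up with a single PSU. The if (sex==2 & eda==14) restriction used next, for example, leaves 1823 observations spread over 120 strata.

The Python sketch below lists such strata from (stratum, PSU) pairs. The function name is hypothetical. Within Stata, the svydescribe command reports the same information (see its single option).

```python
from collections import defaultdict


def single_psu_strata(pairs):
    """Return the strata that contain exactly one distinct PSU.

    pairs: iterable of (stratum, psu) for the observations in the estimation sample.
    """
    psus_by_stratum = defaultdict(set)
    for stratum, psu in pairs:
        psus_by_stratum[stratum].add(psu)
    return sorted(s for s, psus in psus_by_stratum.items() if len(psus) == 1)


# Hypothetical estimation sample: stratum 2 has two observations but only one PSU.
pairs = [(1, 101), (1, 102), (2, 201), (2, 201), (3, 301), (3, 302)]
print(single_psu_strata(pairs))  # [2]
```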
gen ti=((c_res==1 | c_res==3) & r_def==0 & (eda>=12 & eda<15) & clase2==1)
tab ti [fw=fac]
svy, subpop(ti): tab rama if (sex==2 & eda==14), format(%11.3g) count se cv ci level(90)
ti | Freq. Percent Cum.
------------+-----------------------------------
0 |122,390,514 99.61 99.61
1 | 475,374 0.39 100.00
------------+-----------------------------------
Total |122,865,888 100.00
(running tabulate on estimation sample)
Number of strata = 120 Number of obs = 1823
Number of PSUs = 1551 Population size = 539903
Subpop. no. of obs = 175
Subpop. size = 54411
Design df = 1431
----------------------
rama | count
----------+-----------
0 | 0
2 | 8916
3 | 21301
4 | 17608
6 | 5109
7 | 1477
|
Total | 54411
----------------------
Key: count = weighted counts
Table contains a zero in the marginals.
Statistics cannot be computed.
Note: 284 strata omitted because they contain no subpopulation members.
Note: missing standard errors because of stratum with single sampling unit.
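The solution is to deal with the strata that have a single sampling unit, for example by creating "pseudo-strata". Stata's svyset offers several methods for this through its singleunit() option: missing (the default, which produces the missing standard errors above), certainty, scaled, or centered. The commands below use single(sca), that is, the scaled method.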
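The following Python sketch makes the difference between three of these methods concrete. It combines per-stratum variance contributions according to our reading of the svyset documentation:

- missing leaves the variance undefined.
- certainty lets single-PSU strata contribute nothing.
- scaled gives each single-PSU stratum the average contribution of the strata with several PSUs.

The centered method re-centers single-PSU strata at the grand mean. It needs the raw data, so it is left out. Function and argument names are hypothetical.

```python
import math


def combine_strata(contributions, method="missing"):
    """Total linearized variance from per-stratum contributions.

    contributions: dict stratum -> variance contribution, or None for a stratum
    with a single PSU (whose contribution cannot be computed).
    """
    known = [v for v in contributions.values() if v is not None]
    n_single = len(contributions) - len(known)
    if n_single == 0:
        return sum(known)
    if method == "missing":
        return math.nan  # default behaviour: no standard error
    if method == "certainty":
        return sum(known)  # single-PSU strata add zero variance
    if method == "scaled":
        if not known:
            return math.nan  # nothing to average
        average = sum(known) / len(known)
        return sum(known) + n_single * average
    raise ValueError(f"unsupported method: {method}")


contributions = {"A": 4.0, "B": 2.0, "C": None}
for method in ("missing", "certainty", "scaled"):
    print(method, combine_strata(contributions, method))
```

Under this reading, scaled is the certainty result multiplied by (number of strata) / (strata with several PSUs). In the example that is 6.0 × 3 / 2 = 9.0.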
svyset, clear
svyset upm [pw=fac], strata(est_d) vce(linearized) single(sca)
svy, subpop(ti): tab rama if (sex==2 & eda==14), format(%11.3g) count se cv ci level(90)
pweight: fac
VCE: linearized
Single unit: scaled
Strata 1: est_d
SU 1: upm
FPC 1:
(running tabulate on estimation sample)
Number of strata = 120 Number of obs = 1823
Number of PSUs = 1551 Population size = 539903
Subpop. no. of obs = 175
Subpop. size = 54411
Design df = 1431
----------------------------------------------------------------------
rama | count se cv lb ub
----------+-----------------------------------------------------------
0 | 0 0 0 0
2 | 8916 2178 24.4 5331 12501
3 | 21301 3759 17.6 15114 27488
4 | 17608 3503 19.9 11842 23374
6 | 5109 1653 32.3 2389 7829
7 | 1477 976 66.1 -130 3084
|
Total | 54411
----------------------------------------------------------------------
Key: count = weighted counts
se = linearized standard errors of weighted counts
cv = coefficients of variation of weighted counts
lb = lower 90% confidence bounds for weighted counts
ub = upper 90% confidence bounds for weighted counts
Table contains a zero in the marginals.
Statistics cannot be computed.
Note: 284 strata omitted because they contain no subpopulation members.
Note: variance scaled to handle strata with a single sampling unit.
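With the scaled method the standard errors are available, but several CVs are large. The lower 90% bound for rama 7 is also negative, which is not meaningful for a count and comes from using a symmetric interval.

The Python sketch below grades each row by its CV. The cut-offs are an assumption based on ranges INEGI commonly uses: below 15% is high, 15% to below 30% is moderate, and 30% or more is low. Check the current ENOE documentation before relying on them. The normal quantile again stands in for Stata's t with 1431 design degrees of freedom.

```python
from statistics import NormalDist


def precision_level(cv):
    """Classify a CV (in percent); the 15/30 thresholds are an assumption."""
    if cv < 15:
        return "high"
    if cv < 30:
        return "moderate"
    return "low"


# rama: (weighted count, standard error) from the scaled table above
rows = {2: (8916, 2178), 3: (21301, 3759), 4: (17608, 3503), 6: (5109, 1653), 7: (1477, 976)}
z = NormalDist().inv_cdf(0.95)  # about 1.645; Stata uses t with 1431 df
for rama, (count, se) in rows.items():
    cv = 100 * se / count
    lower = count - z * se
    print(rama, round(cv, 1), precision_level(cv), round(lower))
```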
svyset, clear
svyset upm [pw=peso], strata(est_d) vce(linearized) single(sca)
svy, subpop(filtro): tab clase2, format(%11.3g) count se cv ci level(90)
pweight: peso
VCE: linearized
Single unit: scaled
Strata 1: est_d
SU 1: upm
FPC 1:
(running tabulate on estimation sample)
Number of strata = 446 Number of obs = 403865
Number of PSUs = 18440 Population size = 122742516
Subpop. no. of obs = 233328
Subpop. size = 88214725
Design df = 17994
----------------------------------------------------------------------
clase2 | count se cv lb ub
----------+-----------------------------------------------------------
0 | 0 0 0 0
1 | 50473590 437163 .866 49754484 51192696
2 | 2323904 52983 2.28 2236751 2411057
3 | 5809150 109485 1.88 5629053 5989247
4 | 29608081 300994 1.02 29112964 30103198
|
Total | 88214725
----------------------------------------------------------------------
Key: count = weighted counts
se = linearized standard errors of weighted counts
cv = coefficients of variation of weighted counts
lb = lower 90% confidence bounds for weighted counts
ub = upper 90% confidence bounds for weighted counts
Table contains a zero in the marginals.
Statistics cannot be computed.
svyset, clear
use "C:\Users\JC\Desktop\Estadística\BUAP\sdemt214.dta", clear
svyset upm [pw=peso], strata(est_d) vce(linearized) single(sca)
gen filtro=((c_res==1 | c_res==3) & r_def==0 & (eda>=15 & eda<=98))
svy, subpop(filtro): tab clase2, format(%11.3g) count se cv ci level(90)
pweight: peso
VCE: linearized
Single unit: scaled
Strata 1: est_d
SU 1: upm
FPC 1:
(running tabulate on estimation sample)
Number of strata = 446 Number of obs = 406088
Number of PSUs = 18438 Population size = 122211594
Subpop. no. of obs = 231470
Subpop. size = 86670766
Design df = 17992
----------------------------------------------------------------------
clase2 | count se cv lb ub
----------+-----------------------------------------------------------
0 | 0 0 0 0
1 | 49178167 434500 .884 48463441 49892893
2 | 2502742 54858 2.19 2412504 2592980
3 | 5798316 109008 1.88 5619005 5977627
4 | 29191541 300429 1.03 28697355 29685727
|
Total | 86670766
----------------------------------------------------------------------
Key: count = weighted counts
se = linearized standard errors of weighted counts
cv = coefficients of variation of weighted counts
lb = lower 90% confidence bounds for weighted counts
ub = upper 90% confidence bounds for weighted counts
Table contains a zero in the marginals.
Statistics cannot be computed.
svyset, clear
use "C:\Users\JC\Desktop\Estadística\BUAP\sdemt215.dta", clear
svyset upm [pw=fac], strata(est_d) vce(linearized) single(sca)
pweight: fac
VCE: linearized
Single unit: scaled
Strata 1: est_d
SU 1: upm
FPC 1:
Ordinal variables
gen int ocupado=(clase2==1)
gen int sexo = round(sex)
gen int nivel = round(niv_ins)
gen int econ = round(e_con)
gen int edad7 = round(eda7c)
(7072 missing values generated)
(89944 missing values generated)
logit ocupado anios_esc i.edad7 i.sexo i.econ if ((c_res==1 | c_res==3) & r_def==0 & (eda>=15 & eda<=97))
logit, or
estimates store modelo_1
Iteration 0: log likelihood = -197836.53
Iteration 1: log likelihood = -160326.33
Iteration 2: log likelihood = -159937.61
Iteration 3: log likelihood = -159936.41
Iteration 4: log likelihood = -159936.41
Logistic regression Number of obs = 291072
LR chi2(13) = 75800.24
Prob > chi2 = 0.0000
Log likelihood = -159936.41 Pseudo R2 = 0.1916
------------------------------------------------------------------------------
ocupado | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
anios_esc | .0319044 .0009674 32.98 0.000 .0300084 .0338004
|
edad7 |
2 | 1.681314 .0163315 102.95 0.000 1.649305 1.713323
3 | 2.373402 .0186586 127.20 0.000 2.336832 2.409973
4 | 2.448655 .0193802 126.35 0.000 2.410671 2.48664
5 | 1.964312 .0199039 98.69 0.000 1.925301 2.003323
6 | .5258125 .020226 26.00 0.000 .4861703 .5654547
|
2.sexo | -1.616895 .0093264 -173.37 0.000 -1.635175 -1.598616
|
econ |
2 | .527701 .0258567 20.41 0.000 .4770228 .5783793
3 | .5518173 .0371811 14.84 0.000 .4789438 .6246908
4 | -.1593692 .0249566 -6.39 0.000 -.2082832 -.1104553
5 | -.2012 .0137958 -14.58 0.000 -.2282392 -.1741608
6 | -.0908275 .014951 -6.08 0.000 -.1201309 -.0615241
9 | -1.845994 .58981 -3.13 0.002 -3.002001 -.6899878
|
_cons | -.5598089 .0201312 -27.81 0.000 -.5992653 -.5203525
------------------------------------------------------------------------------
Logistic regression Number of obs = 291072
LR chi2(13) = 75800.24
Prob > chi2 = 0.0000
Log likelihood = -159936.41 Pseudo R2 = 0.1916
------------------------------------------------------------------------------
ocupado | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
anios_esc | 1.032419 .0009987 32.98 0.000 1.030463 1.034378
|
edad7 |
2 | 5.372611 .0877427 102.95 0.000 5.203362 5.547366
3 | 10.73385 .2002787 127.20 0.000 10.3484 11.13366
4 | 11.57277 .2242831 126.35 0.000 11.14143 12.02082
5 | 7.130003 .1419151 98.69 0.000 6.857211 7.413649
6 | 1.691833 .034219 26.00 0.000 1.626077 1.760248
|
2.sexo | .1985141 .0018514 -173.37 0.000 .1949183 .2021762
|
econ |
2 | 1.695031 .043828 20.41 0.000 1.61127 1.783146
3 | 1.736406 .0645614 14.84 0.000 1.614368 1.867668
4 | .8526815 .02128 -6.39 0.000 .811977 .8954264
5 | .8177489 .0112815 -14.58 0.000 .7959338 .8401618
6 | .9131752 .0136529 -6.08 0.000 .8868043 .9403303
9 | .1578683 .0931123 -3.13 0.002 .0496876 .5015822
|
_cons | .5713182 .0115013 -27.81 0.000 .549215 .594311
------------------------------------------------------------------------------
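logit, or only re-displays the same model on the odds-ratio scale:

- Each odds ratio is exp(coef).
- Its standard error is exp(coef) × se (delta method).
- The bounds are the exponentiated coefficient bounds.
- z and P>|z| are unchanged.

The Python sketch below checks this for the 2.sexo row. The helper name is hypothetical.

```python
import math


def to_odds_ratio(coef, se, lower, upper):
    """Map a logit coefficient, its SE and CI bounds to the odds-ratio scale."""
    odds = math.exp(coef)
    return odds, odds * se, math.exp(lower), math.exp(upper)


# 2.sexo row of the unweighted logit: Coef., Std. Err., and 95% bounds
odds, se_or, lower, upper = to_odds_ratio(-1.616895, 0.0093264, -1.635175, -1.598616)
print(round(odds, 4), round(se_or, 6), round(lower, 4), round(upper, 4))  # 0.1985 0.001851 0.1949 0.2022
```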
gen f3=((c_res==1 | c_res==3) & r_def==0 & (eda>=15 & eda<=97))
svy, subpop(f3): logit ocupado anios_esc i.edad7 i.sexo i.econ
logit, or
estimates store modelo_2
(running logit on estimation sample)
Survey: Logistic regression
Number of strata = 446 Number of obs = 403865
Number of PSUs = 18440 Population size = 122865888
Subpop. no. of obs = 291072
Subpop. size = 88140066
Design df = 17994
F( 13, 17982) = 1695.87
Prob > F = 0.0000
------------------------------------------------------------------------------
| Linearized
ocupado | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
anios_esc | .028724 .002202 13.04 0.000 .0244079 .03304
|
edad7 |
2 | 1.675739 .0275872 60.74 0.000 1.621666 1.729813
3 | 2.347418 .030226 77.66 0.000 2.288172 2.406663
4 | 2.424468 .0311733 77.77 0.000 2.363365 2.48557
5 | 2.007155 .0329516 60.91 0.000 1.942567 2.071743
6 | .6043582 .0346322 17.45 0.000 .5364758 .6722406
|
2.sexo | -1.752686 .0152963 -114.58 0.000 -1.782668 -1.722704
|
econ |
2 | .5827273 .0384924 15.14 0.000 .5072785 .658176
3 | .5755434 .0588421 9.78 0.000 .4602072 .6908797
4 | -.1326115 .0362316 -3.66 0.000 -.203629 -.0615941
5 | -.1801903 .0196649 -9.16 0.000 -.2187353 -.1416453
6 | -.0166378 .023152 -0.72 0.472 -.062018 .0287424
9 | -1.815822 1.500444 -1.21 0.226 -4.756835 1.125192
|
_cons | -.52711 .0347248 -15.18 0.000 -.5951739 -.4590461
------------------------------------------------------------------------------
Survey: Logistic regression
Number of strata = 446 Number of obs = 403865
Number of PSUs = 18440 Population size = 122865888
Subpop. no. of obs = 291072
Subpop. size = 88140066
Design df = 17994
F( 13, 17982) = 1695.87
Prob > F = 0.0000
------------------------------------------------------------------------------
| Linearized
ocupado | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
anios_esc | 1.02914 .0022661 13.04 0.000 1.024708 1.033592
|
edad7 |
2 | 5.342743 .1473915 60.74 0.000 5.061514 5.639598
3 | 10.45853 .3161192 77.66 0.000 9.856901 11.09687
4 | 11.29622 .3521401 77.77 0.000 10.62665 12.00797
5 | 7.442114 .2452298 60.91 0.000 6.976634 7.938651
6 | 1.830077 .0633795 17.45 0.000 1.70997 1.958621
|
2.sexo | .1733078 .002651 -114.58 0.000 .1681888 .1785826
|
econ |
2 | 1.790916 .0689366 15.14 0.000 1.660765 1.931267
3 | 1.778097 .104627 9.78 0.000 1.584402 1.99547
4 | .8758052 .0317318 -3.66 0.000 .815765 .9402645
5 | .8351112 .0164223 -9.16 0.000 .8035344 .867929
6 | .9834998 .02277 -0.72 0.472 .9398659 1.029159
9 | .1627041 .2441284 -1.21 0.226 .0085928 3.080807
|
_cons | .5903085 .0204983 -15.18 0.000 .5514667 .6318861
------------------------------------------------------------------------------
estimates table modelo_1 modelo_2, b(%6.2f) star(0.05 0.01 .001) eform
Variable | modelo_1 modelo_2
-------------+--------------------------
anios_esc | 1.03*** 1.03***
|
edad7 |
2 | 5.37*** 5.34***
3 | 10.73*** 10.46***
4 | 11.57*** 11.30***
5 | 7.13*** 7.44***
6 | 1.69*** 1.83***
|
sexo |
2 | 0.20*** 0.17***
|
econ |
2 | 1.70*** 1.79***
3 | 1.74*** 1.78***
4 | 0.85*** 0.88***
5 | 0.82*** 0.84***
6 | 0.91*** 0.98
9 | 0.16** 0.16
|
_cons | 0.57*** 0.59***
----------------------------------------
legend: * p<.05; ** p<.01; *** p<.001
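In the comparison, 6.econ and 9.econ are significant in modelo_1 but not in modelo_2. For 9.econ the change lies mainly in the standard error. For 6.econ the coefficient also moves toward zero. In modelo_2 the standard errors reflect the weights, strata and PSUs together.

The Python sketch below recomputes the ratio of standard errors and the two-sided p-values from the printed numbers. It uses the normal distribution, whereas Stata's survey logit uses t with 17,994 design degrees of freedom, so p-values can differ slightly. The helper name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p(coef, se):
    """Two-sided p-value of coef / se under a standard normal reference."""
    return 2 * (1 - NormalDist().cdf(abs(coef / se)))


# (coef, se) in modelo_1 (unweighted logit) and modelo_2 (svy logit)
rows = {
    "anios_esc": ((0.0319044, 0.0009674), (0.028724, 0.002202)),
    "2.sexo": ((-1.616895, 0.0093264), (-1.752686, 0.0152963)),
    "6.econ": ((-0.0908275, 0.014951), (-0.0166378, 0.023152)),
    "9.econ": ((-1.845994, 0.58981), (-1.815822, 1.500444)),
}
for name, ((b1, se1), (b2, se2)) in rows.items():
    print(f"{name:10} se ratio {se2 / se1:.2f}  "
          f"p1 {two_sided_p(b1, se1):.3f}  p2 {two_sided_p(b2, se2):.3f}")
```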