CLASS 14. Poisson, Negative Binomial and Gamma Regression in Python

Author

Gerson Rivera

Publication date

August 7, 2024

A first look at Gamma regression

import statsmodels.api as sm

data=sm.datasets.scotland.load_pandas()

data.endog

0     60.3
1     52.3
2     53.4
3     57.0
4     68.7
5     48.8
6     65.5
7     70.5
8     59.1
9     62.7
10    51.6
11    62.0
12    68.4
13    69.2
14    64.7
15    75.0
16    62.1
17    67.2
18    67.7
19    52.7
20    65.7
21    72.2
22    47.4
23    51.3
24    63.6
25    50.7
26    51.6
27    56.2
28    67.6
29    58.9
30    74.7
31    67.3
Name: YES, dtype: float64

data.exog

	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0
10	682.0	23.8	96.0	83.0	9483.0	14.6	16231.6
11	837.0	22.1	111.0	74.5	12656.0	11.6	18497.7
12	599.0	19.9	117.0	83.8	8298.0	15.1	11920.1
13	680.0	21.5	121.0	77.6	9265.0	13.7	14620.0
14	747.0	22.5	109.0	77.9	8314.0	14.4	16807.5
15	982.0	19.4	137.0	65.3	9483.0	13.3	19050.8
16	719.0	25.9	109.0	80.9	8298.0	14.9	18622.1
17	831.0	18.5	138.0	80.2	9483.0	14.6	15373.5
18	858.0	19.4	119.0	84.8	12656.0	14.3	16645.2
19	652.0	27.2	108.0	86.4	13566.0	14.6	17734.4
20	718.0	23.7	115.0	73.5	9483.0	15.0	17016.6
21	787.0	20.8	126.0	74.7	9483.0	14.9	16369.6
22	515.0	26.8	106.0	87.8	8298.0	15.3	13802.0
23	732.0	23.0	103.0	86.6	9611.0	13.8	16836.0
24	783.0	20.5	125.0	78.5	9483.0	14.1	16051.5
25	612.0	23.7	100.0	80.6	9033.0	13.3	14504.4
26	486.0	23.2	117.0	84.8	8298.0	15.9	11275.2
27	765.0	23.6	105.0	79.2	9483.0	13.7	18054.0
28	793.0	21.7	125.0	78.4	9483.0	14.5	17208.1
29	776.0	23.0	110.0	77.2	9265.0	13.6	17848.0
30	978.0	19.3	130.0	71.5	9483.0	15.3	18875.4
31	792.0	21.2	126.0	82.2	12656.0	15.1	16790.4

data.endog_name

'YES'

data.exog_name

['COUTAX', 'UNEMPF', 'MOR', 'ACT', 'GDP', 'AGE', 'COUTAX_FEMALEUNEMP']

Load modules and data

import statsmodels.api as sm

data = sm.datasets.scotland.load_pandas()

data.exog = sm.add_constant(data.exog)

Instantiate a gamma family model with the default link function.

gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

gamma_results = gamma_model.fit()

print(gamma_results.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:           InversePower   Scale:                       0.0035843
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Wed, 07 Aug 2024   Deviance:                     0.087389
Time:                        18:17:02   Pearson chi2:                   0.0860
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9800
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0178      0.011     -1.548      0.122      -0.040       0.005
COUTAX              4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
UNEMPF                 0.0020      0.001      3.824      0.000       0.001       0.003
MOR                -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
ACT                    0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
GDP                -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
AGE                   -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
COUTAX_FEMALEUNEMP -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
======================================================================================

C:\Users\Usuario\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsmodels\genmod\generalized_linear_model.py:308: DomainWarning: The InversePower link function does not respect the domain of the Gamma family.
  warnings.warn((f"The {type(family.link).__name__} link function "

data.data

	YES	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	60.3	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	52.3	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	53.4	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	57.0	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	68.7	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	48.8	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	65.5	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	70.5	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	59.1	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	62.7	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0
10	51.6	682.0	23.8	96.0	83.0	9483.0	14.6	16231.6
11	62.0	837.0	22.1	111.0	74.5	12656.0	11.6	18497.7
12	68.4	599.0	19.9	117.0	83.8	8298.0	15.1	11920.1
13	69.2	680.0	21.5	121.0	77.6	9265.0	13.7	14620.0
14	64.7	747.0	22.5	109.0	77.9	8314.0	14.4	16807.5
15	75.0	982.0	19.4	137.0	65.3	9483.0	13.3	19050.8
16	62.1	719.0	25.9	109.0	80.9	8298.0	14.9	18622.1
17	67.2	831.0	18.5	138.0	80.2	9483.0	14.6	15373.5
18	67.7	858.0	19.4	119.0	84.8	12656.0	14.3	16645.2
19	52.7	652.0	27.2	108.0	86.4	13566.0	14.6	17734.4
20	65.7	718.0	23.7	115.0	73.5	9483.0	15.0	17016.6
21	72.2	787.0	20.8	126.0	74.7	9483.0	14.9	16369.6
22	47.4	515.0	26.8	106.0	87.8	8298.0	15.3	13802.0
23	51.3	732.0	23.0	103.0	86.6	9611.0	13.8	16836.0
24	63.6	783.0	20.5	125.0	78.5	9483.0	14.1	16051.5
25	50.7	612.0	23.7	100.0	80.6	9033.0	13.3	14504.4
26	51.6	486.0	23.2	117.0	84.8	8298.0	15.9	11275.2
27	56.2	765.0	23.6	105.0	79.2	9483.0	13.7	18054.0
28	67.6	793.0	21.7	125.0	78.4	9483.0	14.5	17208.1
29	58.9	776.0	23.0	110.0	77.2	9265.0	13.6	17848.0
30	74.7	978.0	19.3	130.0	71.5	9483.0	15.3	18875.4
31	67.3	792.0	21.2	126.0	82.2	12656.0	15.1	16790.4

data.names

['YES', 'COUTAX', 'UNEMPF', 'MOR', 'ACT', 'GDP', 'AGE', 'COUTAX_FEMALEUNEMP']

data

data.data.to_csv('data.csv')
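By default to_csv also writes the DataFrame index as an unnamed first column; reading the file back turns it into the 'Unnamed: 0' column that appears later in this notebook. A sketch of two standard ways to avoid it, using an in-memory buffer instead of a real file (csv_roundtrip is a hypothetical helper written for this example):

```python
import io
import pandas as pd

def csv_roundtrip(df, write_kwargs=None, read_kwargs=None):
    """Write df to an in-memory CSV and read it back (hypothetical helper)."""
    buf = io.StringIO()
    df.to_csv(buf, **(write_kwargs or {}))
    buf.seek(0)
    return pd.read_csv(buf, **(read_kwargs or {}))

df = pd.DataFrame({"YES": [60.3, 52.3], "COUTAX": [712.0, 643.0]})

# Default: the index is written and comes back as 'Unnamed: 0'
default_cols = csv_roundtrip(df).columns.tolist()
# Option 1: do not write the index at all
no_index_cols = csv_roundtrip(df, {"index": False}).columns.tolist()
# Option 2: write it, but read the first column back as the index
index_col_cols = csv_roundtrip(df, read_kwargs={"index_col": 0}).columns.tolist()
print(default_cols, no_index_cols, index_col_cols)
```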

dataend=data.data.iloc[:,0]
dataend

0     60.3
1     52.3
2     53.4
3     57.0
4     68.7
5     48.8
6     65.5
7     70.5
8     59.1
9     62.7
10    51.6
11    62.0
12    68.4
13    69.2
14    64.7
15    75.0
16    62.1
17    67.2
18    67.7
19    52.7
20    65.7
21    72.2
22    47.4
23    51.3
24    63.6
25    50.7
26    51.6
27    56.2
28    67.6
29    58.9
30    74.7
31    67.3
Name: YES, dtype: float64

# Subset of five explanatory variables (the constant column is not included)
data1=data.exog.loc[:,['COUTAX','UNEMPF','MOR','ACT','GDP']]

data1.columns

Index(['COUTAX', 'UNEMPF', 'MOR', 'ACT', 'GDP'], dtype='object')

Gamma Regression in Python

import pandas as pd

# The index saved by to_csv is read back as an extra 'Unnamed: 0' column
dat=pd.read_csv('data Gamma.csv')

dat.head()

	Unnamed: 0	YES	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	0	60.3	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1	52.3	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	2	53.4	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	3	57.0	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	4	68.7	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0

dat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0          32 non-null     int64  
 1   YES                 32 non-null     float64
 2   COUTAX              32 non-null     float64
 3   UNEMPF              32 non-null     float64
 4   MOR                 32 non-null     float64
 5   ACT                 32 non-null     float64
 6   GDP                 32 non-null     float64
 7   AGE                 32 non-null     float64
 8   COUTAX_FEMALEUNEMP  32 non-null     float64
dtypes: float64(8), int64(1)
memory usage: 2.4 KB

dat.dtypes

Unnamed: 0              int64
YES                   float64
COUTAX                float64
UNEMPF                float64
MOR                   float64
ACT                   float64
GDP                   float64
AGE                   float64
COUTAX_FEMALEUNEMP    float64
dtype: object

dat_endog=dat['YES']
dat_endog

0     60.3
1     52.3
2     53.4
3     57.0
4     68.7
5     48.8
6     65.5
7     70.5
8     59.1
9     62.7
10    51.6
11    62.0
12    68.4
13    69.2
14    64.7
15    75.0
16    62.1
17    67.2
18    67.7
19    52.7
20    65.7
21    72.2
22    47.4
23    51.3
24    63.6
25    50.7
26    51.6
27    56.2
28    67.6
29    58.9
30    74.7
31    67.3
Name: YES, dtype: float64

dat_exog=dat.drop(columns='YES')

dat_exog

	Unnamed: 0	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	0	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	2	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	3	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	4	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	5	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	6	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	7	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	8	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	9	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0
10	10	682.0	23.8	96.0	83.0	9483.0	14.6	16231.6
11	11	837.0	22.1	111.0	74.5	12656.0	11.6	18497.7
12	12	599.0	19.9	117.0	83.8	8298.0	15.1	11920.1
13	13	680.0	21.5	121.0	77.6	9265.0	13.7	14620.0
14	14	747.0	22.5	109.0	77.9	8314.0	14.4	16807.5
15	15	982.0	19.4	137.0	65.3	9483.0	13.3	19050.8
16	16	719.0	25.9	109.0	80.9	8298.0	14.9	18622.1
17	17	831.0	18.5	138.0	80.2	9483.0	14.6	15373.5
18	18	858.0	19.4	119.0	84.8	12656.0	14.3	16645.2
19	19	652.0	27.2	108.0	86.4	13566.0	14.6	17734.4
20	20	718.0	23.7	115.0	73.5	9483.0	15.0	17016.6
21	21	787.0	20.8	126.0	74.7	9483.0	14.9	16369.6
22	22	515.0	26.8	106.0	87.8	8298.0	15.3	13802.0
23	23	732.0	23.0	103.0	86.6	9611.0	13.8	16836.0
24	24	783.0	20.5	125.0	78.5	9483.0	14.1	16051.5
25	25	612.0	23.7	100.0	80.6	9033.0	13.3	14504.4
26	26	486.0	23.2	117.0	84.8	8298.0	15.9	11275.2
27	27	765.0	23.6	105.0	79.2	9483.0	13.7	18054.0
28	28	793.0	21.7	125.0	78.4	9483.0	14.5	17208.1
29	29	776.0	23.0	110.0	77.2	9265.0	13.6	17848.0
30	30	978.0	19.3	130.0	71.5	9483.0	15.3	18875.4
31	31	792.0	21.2	126.0	82.2	12656.0	15.1	16790.4

dat_exog=sm.add_constant(dat_exog)
dat_exog.head(10)

	const	Unnamed: 0	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	1.0	0	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1.0	1	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	1.0	2	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	1.0	3	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	1.0	4	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	1.0	5	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	1.0	6	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	1.0	7	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	1.0	8	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	1.0	9	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0

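Instantiate a gamma family model with the default link function. Note that dat_exog still contains the 'Unnamed: 0' column (the row index written by to_csv), so it enters this model as a spurious predictor.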

gam_mod = sm.GLM(dat_endog, dat_exog, family=sm.families.Gamma())

resul= gam_mod.fit()

print(resul.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       23
Model Family:                   Gamma   Df Model:                            8
Link Function:           InversePower   Scale:                       0.0035840
Method:                          IRLS   Log-Likelihood:                -82.603
Date:                Wed, 07 Aug 2024   Deviance:                     0.084423
Time:                        18:17:03   Pearson chi2:                   0.0824
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9805
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0149      0.012     -1.253      0.210      -0.038       0.008
Unnamed: 0          2.032e-05   2.24e-05      0.909      0.364   -2.35e-05    6.42e-05
COUTAX               4.78e-05   1.63e-05      2.924      0.003    1.58e-05    7.98e-05
UNEMPF                 0.0020      0.001      3.729      0.000       0.001       0.003
MOR                -7.609e-05   2.75e-05     -2.771      0.006      -0.000   -2.23e-05
ACT                    0.0001   4.24e-05      2.360      0.018     1.7e-05       0.000
GDP                -1.288e-07   1.25e-07     -1.028      0.304   -3.74e-07    1.17e-07
AGE                   -0.0006      0.000     -2.309      0.021      -0.001   -8.64e-05
COUTAX_FEMALEUNEMP  -2.36e-06    7.5e-07     -3.146      0.002   -3.83e-06    -8.9e-07
======================================================================================

C:\Users\Usuario\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsmodels\genmod\generalized_linear_model.py:308: DomainWarning: The InversePower link function does not respect the domain of the Gamma family.
  warnings.warn((f"The {type(family.link).__name__} link function "

dat.head(6)

	Unnamed: 0	YES	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	0	60.3	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1	52.3	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	2	53.4	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	3	57.0	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	4	68.7	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	5	48.8	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2

data.exog

	const	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	1.0	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1.0	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	1.0	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	1.0	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	1.0	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	1.0	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	1.0	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	1.0	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	1.0	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	1.0	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0
10	1.0	682.0	23.8	96.0	83.0	9483.0	14.6	16231.6
11	1.0	837.0	22.1	111.0	74.5	12656.0	11.6	18497.7
12	1.0	599.0	19.9	117.0	83.8	8298.0	15.1	11920.1
13	1.0	680.0	21.5	121.0	77.6	9265.0	13.7	14620.0
14	1.0	747.0	22.5	109.0	77.9	8314.0	14.4	16807.5
15	1.0	982.0	19.4	137.0	65.3	9483.0	13.3	19050.8
16	1.0	719.0	25.9	109.0	80.9	8298.0	14.9	18622.1
17	1.0	831.0	18.5	138.0	80.2	9483.0	14.6	15373.5
18	1.0	858.0	19.4	119.0	84.8	12656.0	14.3	16645.2
19	1.0	652.0	27.2	108.0	86.4	13566.0	14.6	17734.4
20	1.0	718.0	23.7	115.0	73.5	9483.0	15.0	17016.6
21	1.0	787.0	20.8	126.0	74.7	9483.0	14.9	16369.6
22	1.0	515.0	26.8	106.0	87.8	8298.0	15.3	13802.0
23	1.0	732.0	23.0	103.0	86.6	9611.0	13.8	16836.0
24	1.0	783.0	20.5	125.0	78.5	9483.0	14.1	16051.5
25	1.0	612.0	23.7	100.0	80.6	9033.0	13.3	14504.4
26	1.0	486.0	23.2	117.0	84.8	8298.0	15.9	11275.2
27	1.0	765.0	23.6	105.0	79.2	9483.0	13.7	18054.0
28	1.0	793.0	21.7	125.0	78.4	9483.0	14.5	17208.1
29	1.0	776.0	23.0	110.0	77.2	9265.0	13.6	17848.0
30	1.0	978.0	19.3	130.0	71.5	9483.0	15.3	18875.4
31	1.0	792.0	21.2	126.0	82.2	12656.0	15.1	16790.4

# New observation labelled 'Jaime', with the same values as row 0 of the data
df={'const':1.0,
    'Unnamed: 0':0,
    'COUTAX':712,
    'UNEMPF':21,
    'MOR':105,
    'ACT':82.4,
    'GDP':13566,
    'AGE':12.3,
    'COUTAX_FEMALEUNEMP':14952}

df=pd.DataFrame(df,index=['Jaime'])

df

	const	Unnamed: 0	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
Jaime	1.0	0	712	21	105	82.4	13566	12.3	14952

resul.predict(df)

Jaime    58.273143
dtype: float64

resul.aic

np.float64(183.20643451945216)

dat_exog=dat.drop(columns=['YES','Unnamed: 0'])

dat_exog=sm.add_constant(dat_exog)
dat_exog.head(10)

	const	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
0	1.0	712.0	21.0	105.0	82.4	13566.0	12.3	14952.0
1	1.0	643.0	26.5	97.0	80.2	13566.0	15.3	17039.5
2	1.0	679.0	28.3	113.0	86.3	9611.0	13.9	19215.7
3	1.0	801.0	27.1	109.0	80.4	9483.0	13.6	21707.1
4	1.0	753.0	22.0	115.0	64.7	9265.0	14.6	16566.0
5	1.0	714.0	24.3	107.0	79.0	9555.0	13.8	17350.2
6	1.0	920.0	21.2	118.0	72.2	9611.0	13.3	19504.0
7	1.0	779.0	20.5	114.0	75.2	9483.0	14.5	15969.5
8	1.0	771.0	23.2	102.0	81.1	9483.0	14.2	17887.2
9	1.0	724.0	20.5	112.0	80.3	12656.0	13.7	14842.0

Refit the gamma model with the default link function, this time without the spurious 'Unnamed: 0' column.

gam_mod2 = sm.GLM(dat_endog, dat_exog, family=sm.families.Gamma())

resul2= gam_mod2.fit()

resul2.summary()

C:\Users\Usuario\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsmodels\genmod\generalized_linear_model.py:308: DomainWarning: The InversePower link function does not respect the domain of the Gamma family.
  warnings.warn((f"The {type(family.link).__name__} link function "

Generalized Linear Model Regression Results
Dep. Variable:	YES	No. Observations:	32
Model:	GLM	Df Residuals:	24
Model Family:	Gamma	Df Model:	7
Link Function:	InversePower	Scale:	0.0035843
Method:	IRLS	Log-Likelihood:	-83.017
Date:	Wed, 07 Aug 2024	Deviance:	0.087389
Time:	18:17:03	Pearson chi2:	0.0860
No. Iterations:	6	Pseudo R-squ. (CS):	0.9800
Covariance Type:	nonrobust

	coef	std err	z	P>|z|	[0.025	0.975]
const	-0.0178	0.011	-1.548	0.122	-0.040	0.005
COUTAX	4.962e-05	1.62e-05	3.060	0.002	1.78e-05	8.14e-05
UNEMPF	0.0020	0.001	3.824	0.000	0.001	0.003
MOR	-7.181e-05	2.71e-05	-2.648	0.008	-0.000	-1.87e-05
ACT	0.0001	4.06e-05	2.757	0.006	3.23e-05	0.000
GDP	-1.468e-07	1.24e-07	-1.187	0.235	-3.89e-07	9.56e-08
AGE	-0.0005	0.000	-2.159	0.031	-0.001	-4.78e-05
COUTAX_FEMALEUNEMP	-2.427e-06	7.46e-07	-3.253	0.001	-3.89e-06	-9.65e-07

resul.aic

np.float64(183.20643451945216)

resul2.aic

np.float64(182.03440432213847)
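For a GLM, statsmodels computes AIC = -2·logL + 2k, where k = df_model + 1 is the number of estimated coefficients. Plugging in the log-likelihoods printed in the two summaries reproduces both values: dropping 'Unnamed: 0' costs only 0.414 in log-likelihood but saves one parameter (a penalty of 2), so the AIC falls by 1.172. A worked sketch (glm_aic is a hypothetical helper):

```python
def glm_aic(llf, n_coef):
    """AIC = -2*logL + 2*k, with k = number of estimated coefficients."""
    return -2.0 * llf + 2.0 * n_coef

# Log-likelihood and df_model + 1, read off the summaries above
aic_with_index = glm_aic(-82.603, 9)     # model resul (includes 'Unnamed: 0')
aic_without_index = glm_aic(-83.017, 8)  # model resul2
print(round(aic_with_index, 3), round(aic_without_index, 3))
```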

df={'const':1.0,
    'COUTAX':712,
    'UNEMPF':21,
    'MOR':105,
    'ACT':82.4,
    'GDP':13566,
    'AGE':12.3,
    'COUTAX_FEMALEUNEMP':14952}

df=pd.DataFrame(df,index=["F"])

df

	const	COUTAX	UNEMPF	MOR	ACT	GDP	AGE	COUTAX_FEMALEUNEMP
F	1.0	712	21	105	82.4	13566	12.3	14952

resul2.predict(df)

F    57.804315
dtype: float64

Poisson Regression in Python

#!pip install pyreadstat

import pandas as pd
import numpy as np
import statsmodels.api as sm
import pyreadstat

df=pd.read_spss('poisson data2.sav')
df.head()

	salary	manager	genderid	worksatisf	stress	numb.absent
0	12.0	yes	identified as male	21.0	3.0	0.0
1	7.0	no	identified as female	14.0	4.0	0.0
2	10.0	yes	identified as female	6.0	6.0	0.0
3	13.0	yes	identified as male	19.0	5.0	1.0
4	8.0	no	identified as male	10.0	7.0	1.0

# 'numb.absent' contains a dot, which a patsy formula would read as attribute access
df2=df.rename(columns={'numb.absent':'numabsent'})

df2.dtypes

salary         float64
manager       category
genderid      category
worksatisf     float64
stress         float64
numabsent      float64
dtype: object

f = """numabsent ~ salary+C(manager)+C(genderid)+
       worksatisf+stress"""

from patsy import dmatrices

respuesta, predictores = dmatrices(f, df2, return_type='dataframe')

respuesta.head()

	numabsent
0	0.0
1	0.0
2	0.0
3	1.0
4	1.0

pois_results = sm.GLM(respuesta, predictores,
                      family=sm.families.Poisson()).fit()

pois_results.aic

np.float64(173.13598143045402)

pois_results.summary()

Generalized Linear Model Regression Results
Dep. Variable:	numabsent	No. Observations:	50
Model:	GLM	Df Residuals:	44
Model Family:	Poisson	Df Model:	5
Link Function:	Log	Scale:	1.0000
Method:	IRLS	Log-Likelihood:	-80.568
Date:	Wed, 07 Aug 2024	Deviance:	16.800
Time:	18:17:03	Pearson chi2:	13.1
No. Iterations:	5	Pseudo R-squ. (CS):	0.6609
Covariance Type:	nonrobust

	coef	std err	z	P>|z|	[0.025	0.975]
Intercept	1.6127	0.604	2.671	0.008	0.429	2.796
C(manager)[T.yes]	-0.1469	0.174	-0.846	0.397	-0.487	0.193
C(genderid)[T.identified as male]	-0.1364	0.159	-0.857	0.391	-0.448	0.175
salary	-0.0749	0.036	-2.110	0.035	-0.144	-0.005
worksatisf	-0.0615	0.032	-1.908	0.056	-0.125	0.002
stress	0.0606	0.025	2.420	0.016	0.012	0.110
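With the log link, exp(β) is a rate ratio: the multiplicative change in the expected number of absences per one-unit increase in a predictor, holding the others fixed. A sketch using the coefficients printed in the Poisson summary above:

```python
import math

# Poisson coefficients (log scale) copied from the summary above
coefs = {"salary": -0.0749, "worksatisf": -0.0615, "stress": 0.0606}

# exp(beta): multiplicative change in the expected count per one-unit increase
rate_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, rr in rate_ratios.items():
    print(f"{name}: rate ratio {rr:.3f}, {(rr - 1) * 100:+.1f}% per unit")
```

For salary this gives about 0.93, roughly 7% fewer expected absences per additional unit of salary.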

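# alpha is fixed by hand: the GLM does not estimate it, and the derivation
# of this value is not shown in this notebook
nb_results = sm.GLM(respuesta, predictores,
                    family=sm.families.NegativeBinomial(alpha=0.20213936671179472)).fit()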

nb_results.summary()

Generalized Linear Model Regression Results
Dep. Variable:	numabsent	No. Observations:	50
Model:	GLM	Df Residuals:	44
Model Family:	NegativeBinomial	Df Model:	5
Link Function:	Log	Scale:	1.0000
Method:	IRLS	Log-Likelihood:	-91.011
Date:	Wed, 07 Aug 2024	Deviance:	11.620
Time:	18:17:03	Pearson chi2:	8.10
No. Iterations:	6	Pseudo R-squ. (CS):	0.4826
Covariance Type:	nonrobust

	coef	std err	z	P>|z|	[0.025	0.975]
Intercept	1.6125	0.790	2.041	0.041	0.064	3.161
C(manager)[T.yes]	-0.1229	0.232	-0.530	0.596	-0.577	0.332
C(genderid)[T.identified as male]	-0.1213	0.215	-0.564	0.573	-0.543	0.300
salary	-0.0840	0.046	-1.819	0.069	-0.174	0.006
worksatisf	-0.0604	0.042	-1.437	0.151	-0.143	0.022
stress	0.0622	0.033	1.888	0.059	-0.002	0.127

nb_results.aic

np.float64(194.0226456547587)

nb_results.pearson_chi2/nb_results.df_resid

np.float64(0.18414903796550222)

nb_results.df_resid

np.int64(44)
