library(gcmr)
library(ggplot2)
library(glmnet)Loading required package: Matrix
Loaded glmnet 4.1-7
Pacotes usados:
library(gcmr)
library(ggplot2)
library(glmnet)Loading required package: Matrix
Loaded glmnet 4.1-7
ETL dos dados:
dados = read.csv("http://leg.ufpr.br/~lucambio/CE090/20231S/DRS.dat", header = TRUE, sep = ",")
dados$treat <- as.factor(dados$treat)
dados$type <- as.factor(dados$type)
dados$ind <- as.factor(dados$ind)Treinando modelos:
Poisson:
copulapois <- gcmr(status ~ . -Obs -id, marginal = poisson.marg(),
cormat = cluster.cormat(id, type = "exc"), data = dados)summary(copulapois)
Call:
gcmr(formula = status ~ . - Obs - id, data = dados, marginal = poisson.marg(),
cormat = cluster.cormat(id, type = "exc"))
Coefficients marginal model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.087980 0.184866 -0.476 0.63414
ind2 0.267516 0.082319 3.250 0.00116 **
obs_time -0.031644 0.003716 -8.517 < 2e-16 ***
treat1 -0.159813 0.111571 -1.432 0.15203
treat2 -0.111531 0.100803 -1.106 0.26854
age -0.002087 0.011257 -0.185 0.85290
type2 0.228088 0.345206 0.661 0.50879
Coefficients Gaussian copula:
Estimate Std. Error z value Pr(>|z|)
tau 0.91049 0.04124 22.08 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
log likelihood = 228.06, AIC = 472.11
plot(copulapois)Binomial:
copulaBin <- gcmr(status ~ . -Obs -id, marginal = binomial.marg(), cormat = cluster.cormat(id, type = "exc"), data = dados)summary(copulaBin)
Call:
gcmr(formula = status ~ . - Obs - id, data = dados, marginal = binomial.marg(),
cormat = cluster.cormat(id, type = "exc"))
Coefficients marginal model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.425575 0.461513 5.256 1.47e-07 ***
ind2 0.891029 0.231518 3.849 0.000119 ***
obs_time -0.093177 0.009285 -10.035 < 2e-16 ***
treat1 -0.444397 0.288025 -1.543 0.122853
treat2 -0.278507 0.273096 -1.020 0.307817
age -0.015060 0.020521 -0.734 0.462998
type2 0.740572 0.618865 1.197 0.231438
Coefficients Gaussian copula:
Estimate Std. Error z value Pr(>|z|)
tau 0.82426 0.09618 8.57 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
log likelihood = 148.01, AIC = 312.03
plot(copulaBin)Gaussiano:
copula <- gcmr(status ~ . -Obscid, marginal = gaussian.marg(), cormat = cluster.cormat(id, type = "exc"), data = dados)Warning in terms.formula(fi, ...): 'varlist' has changed (from nvar=8) to new 9
after EncodeVars() -- should no longer happen!
Warning in terms.formula(fi, ...): 'varlist' has changed (from nvar=8) to new 9
after EncodeVars() -- should no longer happen!
Warning in terms.formula(fi, ...): 'varlist' has changed (from nvar=8) to new 9
after EncodeVars() -- should no longer happen!
summary(copula)
Call:
gcmr(formula = status ~ . - Obscid, data = dados, marginal = gaussian.marg(),
cormat = cluster.cormat(id, type = "exc"))
Coefficients marginal model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9384445 0.0710656 13.205 < 2e-16 ***
Obs -0.0012850 0.0044170 -0.291 0.771
id 0.0002520 0.0010151 0.248 0.804
ind2 0.1422006 0.0291257 4.882 1.05e-06 ***
obs_time -0.0157891 0.0009685 -16.302 < 2e-16 ***
treat1 -0.0606403 0.0370243 -1.638 0.101
treat2 -0.0553727 0.0371285 -1.491 0.136
age -0.0014056 0.0028169 -0.499 0.618
type2 0.0902787 0.0844710 1.069 0.285
sigma 0.3716866 0.0149176 24.916 < 2e-16 ***
Coefficients Gaussian copula:
Estimate Std. Error z value Pr(>|z|)
tau 0.44532 0.06226 7.153 8.48e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
log likelihood = 143.62, AIC = 309.25
plot(copula)Como pode ser notado, o tempo observado foi a variável mais significativa nos dados. O modelo que teve melhor ajuste foi o de Poisson.
ETL dos dados:
data <- read.csv(file = "http://leg.ufpr.br/~lucambio/MSM/cigarettes.txt",
header = TRUE, sep = "")
data$Gender <- as.factor(data$Gender)
data$Brand <- as.factor(data$Brand)Modelo sem interação:
summary(glm(y ~ ., family = poisson, data = data))
Call:
glm(formula = y ~ ., family = poisson, data = data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.86527 1.07137 1.741 0.0817 .
Price 0.30959 0.32024 0.967 0.3337
GenderM -0.01568 0.22505 -0.070 0.9444
BrandB -0.87602 0.37957 -2.308 0.0210 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 63.647 on 11 degrees of freedom
Residual deviance: 46.311 on 8 degrees of freedom
AIC: 105.56
Number of Fisher Scoring iterations: 5
Modelo com interação:
summary(glm(y ~ . + Price:Brand, family = poisson, data = data))
Call:
glm(formula = y ~ . + Price:Brand, family = poisson, data = data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.7204 2.6941 2.494 0.0126 *
Price -1.1348 0.8028 -1.414 0.1575
GenderM 0.2615 0.2491 1.050 0.2939
BrandB -6.6896 2.9766 -2.247 0.0246 *
Price:BrandB 1.6522 0.8394 1.968 0.0490 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 63.647 on 11 degrees of freedom
Residual deviance: 42.361 on 7 degrees of freedom
AIC: 103.61
Number of Fisher Scoring iterations: 5
Sem a análise da interação, somente há efeito da marca. No entanto, quando se adciona a interação, é possível notar que a interação preço e marca é significativa.
Adcionando curvatura no preço:
m = glm(y~.+Price:Brand+I(Price^2), family = poisson, data = data)
summary(m)
Call:
glm(formula = y ~ . + Price:Brand + I(Price^2), family = poisson,
data = data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -183.8193 46.7926 -3.928 8.55e-05 ***
Price 103.9964 25.7996 4.031 5.56e-05 ***
GenderM -2.3613 0.6767 -3.489 0.000484 ***
BrandB -105.5351 24.7177 -4.270 1.96e-05 ***
I(Price^2) -14.2874 3.5069 -4.074 4.62e-05 ***
Price:BrandB 25.7770 6.0386 4.269 1.97e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 63.647 on 11 degrees of freedom
Residual deviance: 22.688 on 6 degrees of freedom
AIC: 85.941
Number of Fisher Scoring iterations: 5
plot(m)Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Ao adcionar a curvatura no preço, todas as variáveis se tornam significativas. Nesse ponto, deve-se tomar cuidado com um eventual overfitting.
Preparando os dados:
data$Gender <- as.numeric(data$Gender)
data$Brand <- as.numeric(data$Brand)
m <- as.matrix(data[,2:4])
r <- data$yValor ótimo encontrado:
mod <- cv.glmnet(m, r, alpha = 0)Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
fold
(mod$lambda.min)[1] 18.24972