Assignment2: Log-linear Regression Models

Reserach title: Social contact patterns during the COVID-19 crisis in Luxembourg

Background:

The purpose if the present analysis is to evaluate the pattern of social contacts during COVID-19 pandemic in Luxembourg. The social contacts were heavily affected by different epidemiological measures that were taken place during the pandemic to control the spread of infections. Several different measures have been implemented to reduce the number of physical contacts including stay-home order, wearing of face-mask, easing restrictions, reopening of secondary and elementary schools. These measures were executed at different time points targeting different contact places including work, schools, bars/restaurants, and supermarkets. Therefore the social contacts of people with different characteristics who were predominated in those places affected differently at different points of time. The purpose of the present analysis is to determine the factors that influence the rate of social contacts during COVID-19 pandemic.

Method:

The secondary survery dataset was used for the present analysis. The dataset was imported from this link: “https://doi.org/10.1371/journal.pone.0237128.s006”. The dataset included a total of 6766 participants who were recruited through social media (Twitter, Facebook) followers and readers of the science.lu website. Information were collected from a total of 5,644 participants during and 1,119 participants after the pandemic through the survey link. The dataset contains information of respondents ID, age, nationality, language, places of social contacts, household size, wearing of facemask, language, survey date, and conversation number. Data were collected in six different survey weaves.

library(readr)
rawdata <-
  read_csv(
    "https://doi.org/10.1371/journal.pone.0237128.s006",
    col_types = cols(
      RespondentID = col_character(),
      household_N = col_character(),
      conv_no_mask = col_character(),
      natcat = col_character(),
      place_of_contact = col_character()
    )
  )

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ dplyr   1.0.5
## ✓ tibble  3.1.0     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ purrr   0.3.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

cleandat <- rawdata %>%
  mutate(Household_size = as.character(
    recode(
     household_N,
      "5 or more" = "5",
      "5 ou plus" = "5",
      "5 und mehr" = "5"
      
    )
  )) %>%
  mutate(Contact_N =as.numeric(
    recode(
      conversation_N,
      "10 or more" = "10",
      "10 ou plus" = "10",
      "10 und mehr" = "10",
      "25 or more" = "25",
      "25 ou plus" = "25",
      "25 und mehr" = "25",
      "9-Jun" = "8"
    )
  ))  %>%
  
  mutate(
    work = as.character(grepl("work", place_of_contact)),
    supermarket = as.character(grepl("supermarket", place_of_contact)),
    school = as.character(grepl("school", place_of_contact)),
    bar = as.character(grepl("res_bar", place_of_contact)),
    leisure = as.character(grepl("fun", place_of_contact)),
    home = as.character(grepl("home", place_of_contact)),
    other = as.character(grepl("other", place_of_contact))
  ) %>%  
  mutate(Work = recode(work, "TRUE" = "Yes", "FALSE" = "No")) %>%
    mutate(Supermarket = (recode(
    supermarket, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(School = (recode(
    school, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(Bar = (recode(
    bar, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(Leisure = (recode(
    leisure, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(Home = (recode(
    home, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(Other = (recode(
    other, "TRUE" = "Yes", "FALSE" = "No"
  ))) %>%
  mutate(date = dmy(date)) %>%
  mutate(Age = recode(
    agecat,
    "65-74" = "65+",
    "75-84" = "65+",
    "85-94" = "65+"
  ))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

breaks <-
  as.Date(
    c(
      "2020-03-23",
      "2020-04-02",
      "2020-04-16",
      "2020-05-01",
      "2020-06-12",
      "2020-06-25",
      "2020-06-28"
    )
  )
cleandat <- cleandat %>%
  mutate(Lockdown = ifelse(month(date) <= 5, "During", "After")) %>%
  mutate(survey.week = cut(
    date,
    breaks,
    labels = c("March 25", "April 2", "April 16", "May 1", "June 12", "June 25")
  )) %>%
  mutate(Language = recode(
    language,
    "DE" = "German",
    "EN" = "English",
    "FR" = "French"
  )) %>%
  mutate(
    Nationality = recode(
      natcat,
      "LU" = "Luxembourgish",
      "FR" = "French",
      "PT" = "Portuguese",
      "BE" = "Belgian",
      "DE" = "German",
      "IT" = "Italian"
    )
  )

Several variables within the dataset were re-coded for the present analysis. The age categories greater included people 65 years and older were merged into a single category(65+). For the household size, 5 or more were re-coded as 5. The conversation number 10 or more were re-coded as 10, 25 or more was re-coded as 25, 6-9 was re-coded as 8. The “survey.week” variable was created by breaking the date into 6 categories by matching the six different survey weaves. In addition, the “date” variable was also used to create a “Lockdown” variable with two different categories:During Lockdown, and After Lockdown. The “place of contact” variable was used to created 7 different variables:“home”, “work”, “schools”, “bars/restaurants”, and “supermarkets”, leisure, and “others” .

Statistical Analysis: Both univariate and bivariate analysis were executed to examine the pattern of social contacts during and after lockdown. Log-linear multivariate Poisson regression model different regression model was used to examined the factors influencing the patterns of social contacts during the pandemic. The analysis was repeated by using the Negative Binomial Distribution, Zero-inflated Poisson Distribution, and Zero-inflated Negative Binomial Distribution to determine the best fitted model. Several model diagnostic analysis were conducted to examine the differences between the fitted models.

Results:

Table 1. Characteristics of the participants:

epitable1<-tableone::CreateTableOne(vars=c(
  "Contact_N",
  "Age",
  "Household_size",
  "Nationality",
  "Language",
  "Work",
  "Supermarket",
  "School",
  "Bar",
  "Leisure"
), data=cleandat, strata ="Lockdown")

print(epitable1)

##                        Stratified by Lockdown
##                         After        During       p      test
##   n                     1119         5647                    
##   Contact_N (mean (SD)) 4.83 (5.29)  1.05 (2.04)  <0.001     
##   Age (%)                                         <0.001     
##      13-17                 2 ( 0.2)    44 ( 0.8)             
##      18-24                40 ( 3.6)   233 ( 4.1)             
##      25-34               180 (16.1)   977 (17.3)             
##      35-44               304 (27.2)  1767 (31.3)             
##      45-54               296 (26.5)  1381 (24.5)             
##      55-64               206 (18.4)   961 (17.0)             
##      65+                  89 ( 8.0)   281 ( 5.0)             
##   Household_size (%)                               0.228     
##      0                   162 (14.5)   751 (13.3)             
##      1                   313 (28.0)  1492 (26.4)             
##      2                   248 (22.2)  1204 (21.3)             
##      3                   225 (20.1)  1261 (22.3)             
##      4                   110 ( 9.8)   652 (11.5)             
##      5                    61 ( 5.5)   287 ( 5.1)             
##   Nationality (%)                                  0.001     
##      Belgian              42 ( 3.8)   197 ( 5.3)             
##      French               80 ( 7.3)   320 ( 8.7)             
##      German               27 ( 2.5)   103 ( 2.8)             
##      Italian              20 ( 1.8)   121 ( 3.3)             
##      Luxembourgish       736 (67.4)  2221 (60.3)             
##      others              144 (13.2)   526 (14.3)             
##      Portuguese           43 ( 3.9)   195 ( 5.3)             
##   Language (%)                                    <0.001     
##      English             290 (25.9)  1712 (30.3)             
##      French              303 (27.1)  1931 (34.2)             
##      German              526 (47.0)  2004 (35.5)             
##   Work = Yes (%)         410 (36.6)   547 ( 9.7)  <0.001     
##   Supermarket = Yes (%)  199 (17.8)   681 (12.1)  <0.001     
##   School = Yes (%)        62 ( 5.5)     0 ( 0.0)  <0.001     
##   Bar = Yes (%)          166 (14.8)     0 ( 0.0)  <0.001     
##   Leisure = Yes (%)      202 (18.1)   330 ( 5.8)  <0.001

The table showed that there was significant difference between the average number of social contacts. The mean and the standard deviation of the social contacts were much lower during lockdown comapred to after lockdown. Significant age difference was found among the participants during and after lockdown. The proportion of respondents aged 35-44 was highly representative in both periods of lockdown. More than two third of the respondents were Luxembourgish and they were the highly representative group for during and after lockdown. Almost half of the participants used German as their survey language in case of after lockdown period, while one-third of the respondents used it in case during the lockdown period. During the lockdown period, most of the social contacts were associated with supermarket followed by work place. However, in case of after lockdown, most of the social contacts were related to work, followed by leisure activities.

## Warning: Removed 84 rows containing non-finite values (stat_bin).

Identifying the best fitted models:

##Poisson regression

library()
summary(
  pois.m <-
    glm(
      Contact_N ~ Age + Language+survey.week,
      family = poisson(link = "log"),
      data = cleandat
    )
)

## 
## Call:
## glm(formula = Contact_N ~ Age + Language + survey.week, family = poisson(link = "log"), 
##     data = cleandat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9179  -1.4769  -1.1886   0.2859   7.6978  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.54953    0.14889  -3.691 0.000223 ***
## Age18-24             0.20908    0.15341   1.363 0.172917    
## Age25-34             0.29181    0.14777   1.975 0.048297 *  
## Age35-44             0.13055    0.14726   0.887 0.375326    
## Age45-54             0.17896    0.14734   1.215 0.224528    
## Age55-64            -0.02629    0.14812  -0.177 0.859145    
## Age65+              -0.30938    0.15320  -2.019 0.043442 *  
## LanguageFrench       0.45734    0.02517  18.171  < 2e-16 ***
## LanguageGerman       0.20417    0.02541   8.036 9.32e-16 ***
## survey.weekApril 2   0.07140    0.04157   1.718 0.085837 .  
## survey.weekApril 16  0.26357    0.03623   7.275 3.45e-13 ***
## survey.weekMay 1     0.50993    0.03418  14.917  < 2e-16 ***
## survey.weekJune 12   1.73737    0.03060  56.784  < 2e-16 ***
## survey.weekJune 25   1.83835    0.03426  53.653  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26332  on 6671  degrees of freedom
## Residual deviance: 19715  on 6658  degrees of freedom
##   (94 observations deleted due to missingness)
## AIC: 28364
## 
## Number of Fisher Scoring iterations: 6

##Negative Binomial regression
summary(negbin.m <-
  MASS::glm.nb(Contact_N ~ Age + Language + survey.week, data = cleandat))

## 
## Call:
## MASS::glm.nb(formula = Contact_N ~ Age + Language + survey.week, 
##     data = cleandat, init.theta = 0.5099527109, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7082  -1.0560  -0.9231   0.1575   3.0399  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.50795    0.26397  -1.924  0.05432 .  
## Age18-24             0.26006    0.27745   0.937  0.34859    
## Age25-34             0.32100    0.26332   1.219  0.22281    
## Age35-44             0.05572    0.26166   0.213  0.83136    
## Age45-54             0.14559    0.26211   0.555  0.57859    
## Age55-64            -0.02877    0.26345  -0.109  0.91305    
## Age65+              -0.36262    0.27453  -1.321  0.18654    
## LanguageFrench       0.46396    0.05222   8.885  < 2e-16 ***
## LanguageGerman       0.15342    0.05220   2.939  0.00329 ** 
## survey.weekApril 2   0.04885    0.06931   0.705  0.48091    
## survey.weekApril 16  0.25541    0.06211   4.112 3.92e-05 ***
## survey.weekMay 1     0.51387    0.06118   8.399  < 2e-16 ***
## survey.weekJune 12   1.75610    0.06881  25.521  < 2e-16 ***
## survey.weekJune 25   1.85168    0.08816  21.005  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.51) family taken to be 1)
## 
##     Null deviance: 7428.1  on 6671  degrees of freedom
## Residual deviance: 6062.0  on 6658  degrees of freedom
##   (94 observations deleted due to missingness)
## AIC: 20957
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.5100 
##           Std. Err.:  0.0162 
## 
##  2 x log-likelihood:  -20926.7630

##Zero-inflated Poisson model

summary(zin.pois.m<-pscl::zeroinfl(Contact_N ~ Age + Language + survey.week|1, data = cleandat, dist="poisson"))

## 
## Call:
## pscl::zeroinfl(formula = Contact_N ~ Age + Language + survey.week | 1, 
##     data = cleandat, dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9139 -0.7443 -0.6876  0.2606  9.2209 
## 
## Count model coefficients (poisson with log link):
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.60370    0.17745   3.402 0.000669 ***
## Age18-24             0.01930    0.18103   0.107 0.915076    
## Age25-34             0.07149    0.17548   0.407 0.683699    
## Age35-44            -0.02745    0.17506  -0.157 0.875394    
## Age45-54            -0.03440    0.17513  -0.196 0.844288    
## Age55-64            -0.22904    0.17592  -1.302 0.192934    
## Age65+              -0.49495    0.18096  -2.735 0.006235 ** 
## LanguageFrench       0.33445    0.02769  12.078  < 2e-16 ***
## LanguageGerman       0.09535    0.02784   3.425 0.000614 ***
## survey.weekApril 2   0.09212    0.05046   1.826 0.067924 .  
## survey.weekApril 16  0.17746    0.04304   4.123 3.74e-05 ***
## survey.weekMay 1     0.33059    0.04013   8.238  < 2e-16 ***
## survey.weekJune 12   1.02649    0.03621  28.352  < 2e-16 ***
## survey.weekJune 25   1.12393    0.03934  28.568  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.03598    0.02874  -1.252    0.211
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 20 
## Log-likelihood: -1.218e+04 on 15 Df

##Zero-inflated Negative Binomial model
summary(zin.nb.m<-pscl::zeroinfl(Contact_N ~ Age + Language + survey.week|1, data = cleandat, dist="negbin"))

## 
## Call:
## pscl::zeroinfl(formula = Contact_N ~ Age + Language + survey.week | 1, 
##     data = cleandat, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6934 -0.5823 -0.5378  0.1693  8.4047 
## 
## Count model coefficients (negbin with log link):
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.50802    0.26173  -1.941  0.05225 .  
## Age18-24             0.26019    0.27563   0.944  0.34519    
## Age25-34             0.32113    0.26114   1.230  0.21881    
## Age35-44             0.05584    0.25937   0.215  0.82954    
## Age45-54             0.14571    0.25987   0.561  0.57501    
## Age55-64            -0.02865    0.26140  -0.110  0.91274    
## Age65+              -0.36251    0.27258  -1.330  0.18354    
## LanguageFrench       0.46394    0.05231   8.869  < 2e-16 ***
## LanguageGerman       0.15341    0.05208   2.946  0.00322 ** 
## survey.weekApril 2   0.04891    0.06922   0.707  0.47983    
## survey.weekApril 16  0.25544    0.06212   4.112 3.92e-05 ***
## survey.weekMay 1     0.51388    0.06116   8.403  < 2e-16 ***
## survey.weekJune 12   1.75604    0.06914  25.398  < 2e-16 ***
## survey.weekJune 25   1.85163    0.08833  20.963  < 2e-16 ***
## Log(theta)          -0.67333    0.03178 -21.185  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -9.796     12.928  -0.758    0.449
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.51 
## Number of iterations in BFGS optimization: 37 
## Log-likelihood: -1.046e+04 on 16 Df

A log-linear multivariate Poisson regression model was fitted to estimate the rate of social contacts for three covariates: age, language, and survey.week. Since there was a high correlation between nationality and language, nationality was not included in the model. This model was repeated using the assumption of Negative Binomial Distribution, Zero-inflated Poisson Distribution, and Zero-inflated Negative Binomial Distribution to examine over-dispersion and zero inflation.

fm <- list("Poisson" =  pois.m, "Negative Binomial" = negbin.m, "ZIP" = zin.pois.m, "ZINB" = zin.nb.m)
ct<-sapply(fm, function(x) coef(x)[1:14])
 
print(ct)

##                         Poisson Negative Binomial         ZIP        ZINB
## (Intercept)         -0.54952985       -0.50795444  0.60370243 -0.50802376
## Age18-24             0.20908247        0.26006400  0.01930465  0.26018817
## Age25-34             0.29180979        0.32100163  0.07149481  0.32112674
## Age35-44             0.13055254        0.05572069 -0.02745122  0.05583990
## Age45-54             0.17895764        0.14558861 -0.03439673  0.14570513
## Age55-64            -0.02628511       -0.02876578 -0.22904166 -0.02864636
## Age65+              -0.30937767       -0.36262416 -0.49495284 -0.36251111
## LanguageFrench       0.45734361        0.46395746  0.33445210  0.46393953
## LanguageGerman       0.20417347        0.15341852  0.09535427  0.15340824
## survey.weekApril 2   0.07140055        0.04885493  0.09211570  0.04891256
## survey.weekApril 16  0.26357172        0.25541050  0.17745793  0.25543503
## survey.weekMay 1     0.50992820        0.51386791  0.33059110  0.51388300
## survey.weekJune 12   1.73736977        1.75609575  1.02649473  1.75603715
## survey.weekJune 25   1.83835177        1.85167945  1.12393244  1.85162829

The coefficients, standard error, and corresponding p-values from different models were compared. The age group 65+ was found to be a significant covariate according to the log-linear multivariate Poisson regression model and zero inflated log-linear multivariate Poisson regression model. However, the variable age was not significant for negative binomial model and zero-inflated negative binomial model. In addition the robust estimation of standard error was calculated to see the differences between the standard errors and p-values of the fitted models and their robust estimation. The robust standard errors for the parameter estimates were calculated to obtain the robust standard errors and calculated the p-values by using the sandwich package of R. The results suggested that the age variable was no longer significant for the robust estimation of Poisson model and zero-inflated Poisson model. Therefore, we can assume that Poisson model and zero-inflated Poisson model might violate the distribution assumption.

library(sandwich)
library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

coeftest(pois.m, vcov=sandwich)

## 
## z test of coefficients:
## 
##                      Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)         -0.549530   0.354447 -1.5504 0.1210490    
## Age18-24             0.209082   0.363674  0.5749 0.5653475    
## Age25-34             0.291810   0.352013  0.8290 0.4071195    
## Age35-44             0.130553   0.351158  0.3718 0.7100590    
## Age45-54             0.178958   0.351361  0.5093 0.6105230    
## Age55-64            -0.026285   0.351952 -0.0747 0.9404663    
## Age65+              -0.309378   0.357693 -0.8649 0.3870804    
## LanguageFrench       0.457344   0.057354  7.9741 1.535e-15 ***
## LanguageGerman       0.204173   0.054243  3.7640 0.0001672 ***
## survey.weekApril 2   0.071401   0.083092  0.8593 0.3901757    
## survey.weekApril 16  0.263572   0.071587  3.6818 0.0002316 ***
## survey.weekMay 1     0.509928   0.066125  7.7115 1.243e-14 ***
## survey.weekJune 12   1.737370   0.064105 27.1018 < 2.2e-16 ***
## survey.weekJune 25   1.838352   0.074089 24.8126 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(negbin.m, vcov=sandwich)

## 
## z test of coefficients:
## 
##                      Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)         -0.507954   0.342547 -1.4829 0.1381079    
## Age18-24             0.260064   0.351750  0.7393 0.4596979    
## Age25-34             0.321002   0.340137  0.9437 0.3453018    
## Age35-44             0.055721   0.339158  0.1643 0.8695020    
## Age45-54             0.145589   0.339501  0.4288 0.6680464    
## Age55-64            -0.028766   0.340153 -0.0846 0.9326055    
## Age65+              -0.362624   0.346202 -1.0474 0.2948986    
## LanguageFrench       0.463957   0.058361  7.9498 1.869e-15 ***
## LanguageGerman       0.153419   0.055975  2.7408 0.0061284 ** 
## survey.weekApril 2   0.048855   0.081506  0.5994 0.5489039    
## survey.weekApril 16  0.255411   0.071376  3.5784 0.0003457 ***
## survey.weekMay 1     0.513868   0.066136  7.7698 7.860e-15 ***
## survey.weekJune 12   1.756096   0.063585 27.6179 < 2.2e-16 ***
## survey.weekJune 25   1.851679   0.072770 25.4457 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(zin.pois.m, vcov=sandwich)

## 
## t test of coefficients:
## 
##                            Estimate Std. Error t value  Pr(>|t|)    
## count_(Intercept)          0.603702   0.414028  1.4581   0.14485    
## count_Age18-24             0.019305   0.418756  0.0461   0.96323    
## count_Age25-34             0.071495   0.408907  0.1748   0.86121    
## count_Age35-44            -0.027451   0.408458 -0.0672   0.94642    
## count_Age45-54            -0.034397   0.408750 -0.0842   0.93294    
## count_Age55-64            -0.229042   0.409287 -0.5596   0.57576    
## count_Age65+              -0.494953   0.414153 -1.1951   0.23209    
## count_LanguageFrench       0.334452   0.059528  5.6184 2.005e-08 ***
## count_LanguageGerman       0.095354   0.055654  1.7133   0.08670 .  
## count_survey.weekApril 2   0.092116   0.093730  0.9828   0.32575    
## count_survey.weekApril 16  0.177458   0.078109  2.2719   0.02312 *  
## count_survey.weekMay 1     0.330591   0.070148  4.7128 2.493e-06 ***
## count_survey.weekJune 12   1.026495   0.070911 14.4758 < 2.2e-16 ***
## count_survey.weekJune 25   1.123932   0.078719 14.2778 < 2.2e-16 ***
## zero_(Intercept)          -0.035977   0.032326 -1.1130   0.26577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(zin.nb.m, vcov=sandwich)

## 
## t test of coefficients:
## 
##                            Estimate Std. Error  t value  Pr(>|t|)    
## count_(Intercept)         -0.508024   0.335866  -1.5126  0.130434    
## count_Age18-24             0.260188   0.345939   0.7521  0.452005    
## count_Age25-34             0.321127   0.333645   0.9625  0.335844    
## count_Age35-44             0.055840   0.332578   0.1679  0.866667    
## count_Age45-54             0.145705   0.332995   0.4376  0.661720    
## count_Age55-64            -0.028646   0.333965  -0.0858  0.931647    
## count_Age65+              -0.362511   0.340028  -1.0661  0.286407    
## count_LanguageFrench       0.463940   0.058087   7.9870 1.616e-15 ***
## count_LanguageGerman       0.153408   0.055466   2.7658  0.005694 ** 
## count_survey.weekApril 2   0.048913   0.081039   0.6036  0.546152    
## count_survey.weekApril 16  0.255435   0.071305   3.5823  0.000343 ***
## count_survey.weekMay 1     0.513883   0.066007   7.7853 8.012e-15 ***
## count_survey.weekJune 12   1.756037   0.064106  27.3927 < 2.2e-16 ***
## count_survey.weekJune 25   1.851628   0.072999  25.3652 < 2.2e-16 ***
## zero_(Intercept)          -9.795565   0.596082 -16.4333 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(pois.m, negbin.m )

## Likelihood ratio test
## 
## Model 1: Contact_N ~ Age + Language + survey.week
## Model 2: Contact_N ~ Age + Language + survey.week
##   #Df LogLik Df Chisq Pr(>Chisq)    
## 1  14 -14168                        
## 2  15 -10463  1  7409  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(negbin.m, zin.pois.m )

## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "negbin", updated model is of class "zeroinfl"

## Likelihood ratio test
## 
## Model 1: Contact_N ~ Age + Language + survey.week
## Model 2: Contact_N ~ Age + Language + survey.week | 1
##   #Df LogLik Df  Chisq Pr(>Chisq)    
## 1  15 -10463                         
## 2  15 -12179  0 3431.8  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(negbin.m, zin.nb.m )

## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "negbin", updated model is of class "zeroinfl"

## Likelihood ratio test
## 
## Model 1: Contact_N ~ Age + Language + survey.week
## Model 2: Contact_N ~ Age + Language + survey.week | 1
##   #Df LogLik Df Chisq Pr(>Chisq)
## 1  15 -10463                    
## 2  16 -10463  1 0.012     0.9129

The likelihood ratio test (lrtest) was conducted to examine the differences between models. The significant differences were found between Poisson model and negative binomial model, as well as Zero-inflated Poisson model and negative binomial model. However, there was no statistically significant difference between negative binomial model and zero-inflated negative binomial model.

with(negbin.m, cbind(res.deviance = deviance, df = df.residual,
               p = pchisq(deviance, df.residual, lower.tail=FALSE)))

##      res.deviance   df         p
## [1,]     6061.981 6658 0.9999999

with(pois.m, cbind(res.deviance = deviance, df = df.residual,
                     p = pchisq(deviance, df.residual, lower.tail=FALSE)))

##      res.deviance   df p
## [1,]     19715.33 6658 0

The residual deviance was used to perform a goodness of fit test for the overall model both for the Possion model and Negative Binomial model. The test was found to be significant for the Poisson model while it was insignificant for the Negative Binomial model. There this results suggested that the Negative Binomial model can be the better fitted model over the Poisson model.

## Warning in plot.window(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "conf.int" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in box(...): "conf.int" is not a graphical parameter

## Warning in title(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "conf.int" is not a
## graphical parameter

## Warning in title(sub = sub.caption, ...): "conf.int" is not a graphical
## parameter

## Warning in plot.window(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "conf.int" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in box(...): "conf.int" is not a graphical parameter

## Warning in title(...): "conf.int" is not a graphical parameter

## Warning in title(sub = sub.caption, ...): "conf.int" is not a graphical
## parameter

## Warning in plot.window(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "conf.int" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in box(...): "conf.int" is not a graphical parameter

## Warning in title(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "conf.int" is not a
## graphical parameter

## Warning in title(sub = sub.caption, ...): "conf.int" is not a graphical
## parameter

## Warning in plot.window(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "conf.int" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "conf.int" is not a
## graphical parameter

## Warning in box(...): "conf.int" is not a graphical parameter

## Warning in title(...): "conf.int" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "conf.int" is not a
## graphical parameter

## Warning in title(sub = sub.caption, ...): "conf.int" is not a graphical
## parameter

Further, the fitted values of the Negative binomial model and the zero-inflated negative binomial model were plotted to see the differences between these two models. There was no observable differences between these two models.

Since there was no observable differences between Negative binomial and Zero-inflated negative binomial model, the negative binomial model can be considered as the best fitted model and can be used to interpret the results.

Interpretation of the results from the best fitted models:

	Number of Social Contacts
Predictors	Incidence Rate Ratios	std. Error	CI	p
(Intercept)	0.60	0.16	0.36 – 1.02	0.054
Age18-24	1.30	0.36	0.75 – 2.21	0.349
Age25-34	1.38	0.36	0.81 – 2.28	0.223
Age35-44	1.06	0.28	0.63 – 1.74	0.831
Age45-54	1.16	0.30	0.68 – 1.91	0.579
Age55-64	0.97	0.26	0.57 – 1.61	0.913
Age [65+]	0.70	0.19	0.40 – 1.18	0.187
Language [French]	1.59	0.08	1.44 – 1.76	<0.001
Language [German]	1.17	0.06	1.05 – 1.29	0.003
survey.week [April 2]	1.05	0.07	0.92 – 1.20	0.481
survey.week [April 16]	1.29	0.08	1.14 – 1.46	<0.001
survey.week [May 1]	1.67	0.10	1.48 – 1.89	<0.001
survey.week [June 12]	5.79	0.40	5.06 – 6.64	<0.001
survey.week [June 25]	6.37	0.56	5.37 – 7.59	<0.001
Observations	6672
R² Nagelkerke	0.276
Deviance	6061.981
AIC	20956.763
log-Likelihood	-10463.381

The reference groups for the variables age, language, and survey.week were 13-17 years, English, and March 25 respectively. The results showed that compared to the English language category, the rate of social contacts were 1.59 and 1.17 times greater respectively for French and German. The rate ratio of the social contacts were comparatively higher after lockdown periods (June 12 and June 25) than during lockdown periods (April 2, April 16, May 1). Whereas the rate of social contacts were 1.29, and 1.67 times higher for April 16 and May 1 respectively compared to March 25, the rates were 5.79 and 6.37 time higher for June 12 and June 25.

##Conclusion:

The results showed that the language and the survey week which actually reflects the periods of lockdown were the significant factors that affted the rates of social contacts.

Since there was no significant differences between negative binomial model and zero inflated negative binomial model, I found it little hard to determine which model should I use for displaying and interpreting the results. It is still unclear to me what would be the magnitude of counts of zeros to consider zero-inflated model as the appropriate model.

Assignment2: Log-linear Regression Models

Mst Afroza Parvin

11/1/2020