This data comes from a German health survey from 1998. There are 1,127 observations on 3 variables, including: numvisit, the number of visits to doctor during 1998; badh, the patient’s perceived health status (1=patient claims to be in bad health; 0=not in bad health); and age of patient.
I would expect older patients to visit the doctor more frequently. I would also expect patients who claim to be in bad health to visit more than patients who claim to not be in bad health.
library(Zelig)
library(ZeligChoice)
library(faraway)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.4
library(readr)
health <- read_csv("/Users/meredithpowers/Desktop/badhealth.csv")
## Warning: Missing column names filled in: 'X1' [1]
tibble::glimpse(health)
## Observations: 1,127
## Variables: 4
## $ X1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ numvisit <int> 30, 20, 16, 20, 15, 15, 13, 15, 15, 40, 15, 13, 12, 1...
## $ badh <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,...
## $ age <int> 58, 54, 44, 57, 33, 28, 37, 31, 30, 47, 51, 40, 40, 4...
str(health)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1127 obs. of 4 variables:
## $ X1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ numvisit: int 30 20 16 20 15 15 13 15 15 40 ...
## $ badh : int 0 0 0 0 0 0 0 0 0 1 ...
## $ age : int 58 54 44 57 33 28 37 31 30 47 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 4
## .. ..$ X1 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ numvisit: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ badh : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ age : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
head(health)
h1 <- zelig(numvisit ~ badh + age + age*badh, model = "poisson", data = health, cite = F)
summary(h1)
## Model:
##
## Call:
## z5$zelig(formula = numvisit ~ badh + age + age * badh, data = health)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5892 -1.8947 -0.6480 0.6023 9.9147
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.346885 0.081416 4.261 2.04e-05
## badh 1.568949 0.181533 8.643 < 2e-16
## age 0.008503 0.002086 4.077 4.57e-05
## badh:age -0.010918 0.004196 -2.602 0.00926
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4020.3 on 1126 degrees of freedom
## Residual deviance: 3458.6 on 1123 degrees of freedom
## AIC: 5633.8
##
## Number of Fisher Scoring iterations: 6
##
## Next step: Use 'setx' method
Clearly, health status greatly increases the number of visits to the doctor – that’s an increase of 1.5 for health alone. Age also has an increase, though not as dramatic.
data(health)
## Warning in data(health): data set 'health' not found
a.range = min(health$age):max(health$age)
h1$setrange(age=a.range)
h1$sim()
h1$graph()
## age chosen as the x-axis variable. Use the var argument to specify a different variable.
As age increases, the number of visits to the doctor increased as well.