Introduction

In this homework, we use Zelig 5 syntax for Count Data Model. It is thought that regular visits to the doctor’s office will generally keep an individual in good health as they are aware of whats going on with their bodies. By this logic, the frequency of doctor’s visits should remain aproximately the same at all stages of life (age). I argue that the frequency of visits to the doctor’s office doesn’t weigh much on the individual’s health as does other factors like lifestyle and age. Meaning that their will be an increase of visits to the doctor’s office if those elements are factored in. Since lifestlye is too broad a term to quanify in a dataset, we focus on how age affects the number of doctors visits.

Data

The data of this homework is the doctor visits dataset from Econometric Academy, Data source: https://sites.google.com/site/econometricsacademy/econometrics-models/count-data-models

file: count_docvisit.csv

Results

library(Zelig)
library(tidyr)
library(dplyr)
library(ggplot2)

hw7 <- read.csv("count_docvisit.csv")

head(hw7)
##   docvis private medicaid age educyr offer  ssiratio physician
## 1      4       1        0  85     15     0 0.3945056         4
## 2      6       0        0  83      8     0 1.0000000         6
## 3      2       0        1  82     11     0 1.0000000         2
## 4     11       1        0  77     13     0 0.2051624        11
## 5      3       1        0  76     14     0 0.2169230         3
## 6      2       0        0  69     12     0 0.2192556         2
##   nonphysician female phylim actlim income totchr insured age2     linc bh
## 1            0      1      0      0 27.883      3       1 7225 3.328017  0
## 2            0      1      1      1 11.628      2       0 6889 2.453416  1
## 3            1      1      1      0  3.456      2       0 6724 1.240112  1
## 4            9      0      0      0 38.974      3       1 5929 3.662895  0
## 5            5      1      0      0 36.861      1       1 5776 3.607154  0
## 6            4      0      0      0 34.120      1       0 4761 3.529884  0
##     ldocvis  ldocvisa
## 1 1.3862940 1.3862940
## 2 1.7917590 1.7917590
## 3 0.6931472 0.6931472
## 4 2.3978950 2.3978950
## 5 1.0986120 1.0986120
## 6 0.6931472 0.6931472
z.out1 <- zelig(docvis~ private + medicaid + age + educyr, model = "poisson", data = hw7, cite = FALSE)

summary(z.out1) # similar to slide 4.3 of week 7
## Model: 
## 
## Call:
## z5$zelig(formula = docvis ~ private + medicaid + age + educyr, 
##     data = hw7)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.4707  -2.1271  -0.7913   0.8882  23.9975  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.7446607  0.0797939   9.332   <2e-16
## private     0.1542169  0.0142399  10.830   <2e-16
## medicaid    0.2911347  0.0184861  15.749   <2e-16
## age         0.0102186  0.0009912  10.309   <2e-16
## educyr      0.0251894  0.0018551  13.579   <2e-16
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22873  on 3676  degrees of freedom
## Residual deviance: 22367  on 3672  degrees of freedom
## AIC: 34022
## 
## Number of Fisher Scoring iterations: 5
## 
## Next step: Use 'setx' method
# Age effect, similar to slide 4.4 of week 7
age.range = min(hw7$age):max(hw7$age)
x <- setx(z.out1, age = age.range)
s <- sim(z.out1, x = x)
ci.plot(s)

# Age effect matching slide 4.5 in week 7

x <- setx(z.out1, age = mean(hw7$age))
x1 <- setx(z.out1, age = mean(hw7$age) + sd(hw7$age))
s2 <- sim(z.out1, x = x, x1 = x1)
summary(s2)
## 
##  sim x :
##  -----
## ev
##          mean         sd      50%     2.5%    97.5%
## [1,] 6.753438 0.04351615 6.753168 6.667124 6.833751
## pv
##       mean       sd 50% 2.5% 97.5%
## [1,] 6.721 2.507877   7    2    12
## 
##  sim x1 :
##  -----
## ev
##         mean         sd      50%     2.5%    97.5%
## [1,] 7.20528 0.06352202 7.204672 7.084568 7.324758
## pv
##       mean       sd 50% 2.5% 97.5%
## [1,] 7.218 2.677619   7    2    13
## fd
##           mean         sd       50%      2.5%     97.5%
## [1,] 0.4518413 0.04655713 0.4516489 0.3628303 0.5413828

As we can see from the the plots above, number of doctor visits increase as age increases.

We can also infer from the summary table above that one standard deviation increase in age results in 0.4566348 times ( or 6.76 %) more number of doctor visits, which is significant.

### Similar to slide 4.7 of week 7
#### 25th, 50th and 75th quantile of age
# 25th
xl <- setx(z.out1, age = quantile(hw7$age, 0.25))
xl1 <- setx(z.out1, age = quantile(hw7$age, 0.25) + sd(hw7$age))
sl <- sim(z.out1, x=xl, x1=xl1)
fdl <- sl$get_qi(xvalue = "x1", qi="fd") # Getting frist difference

# 50th
xm <- setx(z.out1, age = quantile(hw7$age, 0.50))
xm1 <- setx(z.out1, age = quantile(hw7$age, 0.50) + sd(hw7$age))
sm <- sim(z.out1, x=xm, x1=xm1)
fdm <- sm$get_qi(xvalue = "x1", qi="fd")# Getting frist difference

# 75th
xh <- setx(z.out1, age = quantile(hw7$age, 0.75))
xh1 <- setx(z.out1, age = quantile(hw7$age, 0.75) + sd(hw7$age))
sh <- sim(z.out1, x=xh, x1=xh1)
fdh <- sh$get_qi(xvalue = "x1", qi="fd")# Getting frist difference
# slide 4.8 - 4.10 of week 7
dfd <- data.frame(cbind(fdl, fdm, fdh))
head(dfd)
##          X1        X2        X3
## 1 0.4669089 0.4107643 0.4022919
## 2 0.4200526 0.4866277 0.4942046
## 3 0.3831460 0.4551574 0.4410064
## 4 0.4063586 0.3940795 0.5013202
## 5 0.3765695 0.4802049 0.4616960
## 6 0.3276288 0.4086539 0.5015198
### Putting them together

tidd <- dfd %>% 
  gather(quantile, dif, 1:3)
head(tidd)
##   quantile       dif
## 1       X1 0.4669089
## 2       X1 0.4200526
## 3       X1 0.3831460
## 4       X1 0.4063586
## 5       X1 0.3765695
## 6       X1 0.3276288
tidd %>% 
  group_by(quantile) %>% 
  summarise(mean = mean(dif), sd = sd(dif))
## # A tibble: 3 x 3
##   quantile  mean     sd
##   <chr>    <dbl>  <dbl>
## 1 X1       0.430 0.0413
## 2 X2       0.455 0.0449
## 3 X3       0.478 0.0496
#### ggplot

ggplot(tidd, aes(dif)) + geom_histogram() + facet_grid(~quantile)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusion

In the above graphs and tables, the three quantile groups ( 25th, 50th and 75th) produce similar results. However, older age group (e.g. 75th quantile) has higher increase in average number of doctor visits per unit increase in standard deviation of age compared to younger age group (e.g. 25th quantile) Thus supporting my idea that age is a better indicator of how frequent a doctor’s visit may be. Moreover the implications being that as one gets older, the more health issues they may have (hence the increased doctor’s visits).