In this homework, we use Zelig 5 syntax for Count Data Model. It is thought that regular visits to the doctor’s office will generally keep an individual in good health as they are aware of whats going on with their bodies. By this logic, the frequency of doctor’s visits should remain aproximately the same at all stages of life (age). I argue that the frequency of visits to the doctor’s office doesn’t weigh much on the individual’s health as does other factors like lifestyle and age. Meaning that their will be an increase of visits to the doctor’s office if those elements are factored in. Since lifestlye is too broad a term to quanify in a dataset, we focus on how age affects the number of doctors visits.
The data of this homework is the doctor visits dataset from Econometric Academy, Data source: https://sites.google.com/site/econometricsacademy/econometrics-models/count-data-models
file: count_docvisit.csv
library(Zelig)
library(tidyr)
library(dplyr)
library(ggplot2)
hw7 <- read.csv("count_docvisit.csv")
head(hw7)
## docvis private medicaid age educyr offer ssiratio physician
## 1 4 1 0 85 15 0 0.3945056 4
## 2 6 0 0 83 8 0 1.0000000 6
## 3 2 0 1 82 11 0 1.0000000 2
## 4 11 1 0 77 13 0 0.2051624 11
## 5 3 1 0 76 14 0 0.2169230 3
## 6 2 0 0 69 12 0 0.2192556 2
## nonphysician female phylim actlim income totchr insured age2 linc bh
## 1 0 1 0 0 27.883 3 1 7225 3.328017 0
## 2 0 1 1 1 11.628 2 0 6889 2.453416 1
## 3 1 1 1 0 3.456 2 0 6724 1.240112 1
## 4 9 0 0 0 38.974 3 1 5929 3.662895 0
## 5 5 1 0 0 36.861 1 1 5776 3.607154 0
## 6 4 0 0 0 34.120 1 0 4761 3.529884 0
## ldocvis ldocvisa
## 1 1.3862940 1.3862940
## 2 1.7917590 1.7917590
## 3 0.6931472 0.6931472
## 4 2.3978950 2.3978950
## 5 1.0986120 1.0986120
## 6 0.6931472 0.6931472
z.out1 <- zelig(docvis~ private + medicaid + age + educyr, model = "poisson", data = hw7, cite = FALSE)
summary(z.out1) # similar to slide 4.3 of week 7
## Model:
##
## Call:
## z5$zelig(formula = docvis ~ private + medicaid + age + educyr,
## data = hw7)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.4707 -2.1271 -0.7913 0.8882 23.9975
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.7446607 0.0797939 9.332 <2e-16
## private 0.1542169 0.0142399 10.830 <2e-16
## medicaid 0.2911347 0.0184861 15.749 <2e-16
## age 0.0102186 0.0009912 10.309 <2e-16
## educyr 0.0251894 0.0018551 13.579 <2e-16
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22873 on 3676 degrees of freedom
## Residual deviance: 22367 on 3672 degrees of freedom
## AIC: 34022
##
## Number of Fisher Scoring iterations: 5
##
## Next step: Use 'setx' method
# Age effect, similar to slide 4.4 of week 7
age.range = min(hw7$age):max(hw7$age)
x <- setx(z.out1, age = age.range)
s <- sim(z.out1, x = x)
ci.plot(s)
# Age effect matching slide 4.5 in week 7
x <- setx(z.out1, age = mean(hw7$age))
x1 <- setx(z.out1, age = mean(hw7$age) + sd(hw7$age))
s2 <- sim(z.out1, x = x, x1 = x1)
summary(s2)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 6.753438 0.04351615 6.753168 6.667124 6.833751
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 6.721 2.507877 7 2 12
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 7.20528 0.06352202 7.204672 7.084568 7.324758
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 7.218 2.677619 7 2 13
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 0.4518413 0.04655713 0.4516489 0.3628303 0.5413828
As we can see from the the plots above, number of doctor visits increase as age increases.
We can also infer from the summary table above that one standard deviation increase in age results in 0.4566348 times ( or 6.76 %) more number of doctor visits, which is significant.
### Similar to slide 4.7 of week 7
#### 25th, 50th and 75th quantile of age
# 25th
xl <- setx(z.out1, age = quantile(hw7$age, 0.25))
xl1 <- setx(z.out1, age = quantile(hw7$age, 0.25) + sd(hw7$age))
sl <- sim(z.out1, x=xl, x1=xl1)
fdl <- sl$get_qi(xvalue = "x1", qi="fd") # Getting frist difference
# 50th
xm <- setx(z.out1, age = quantile(hw7$age, 0.50))
xm1 <- setx(z.out1, age = quantile(hw7$age, 0.50) + sd(hw7$age))
sm <- sim(z.out1, x=xm, x1=xm1)
fdm <- sm$get_qi(xvalue = "x1", qi="fd")# Getting frist difference
# 75th
xh <- setx(z.out1, age = quantile(hw7$age, 0.75))
xh1 <- setx(z.out1, age = quantile(hw7$age, 0.75) + sd(hw7$age))
sh <- sim(z.out1, x=xh, x1=xh1)
fdh <- sh$get_qi(xvalue = "x1", qi="fd")# Getting frist difference
# slide 4.8 - 4.10 of week 7
dfd <- data.frame(cbind(fdl, fdm, fdh))
head(dfd)
## X1 X2 X3
## 1 0.4669089 0.4107643 0.4022919
## 2 0.4200526 0.4866277 0.4942046
## 3 0.3831460 0.4551574 0.4410064
## 4 0.4063586 0.3940795 0.5013202
## 5 0.3765695 0.4802049 0.4616960
## 6 0.3276288 0.4086539 0.5015198
### Putting them together
tidd <- dfd %>%
gather(quantile, dif, 1:3)
head(tidd)
## quantile dif
## 1 X1 0.4669089
## 2 X1 0.4200526
## 3 X1 0.3831460
## 4 X1 0.4063586
## 5 X1 0.3765695
## 6 X1 0.3276288
tidd %>%
group_by(quantile) %>%
summarise(mean = mean(dif), sd = sd(dif))
## # A tibble: 3 x 3
## quantile mean sd
## <chr> <dbl> <dbl>
## 1 X1 0.430 0.0413
## 2 X2 0.455 0.0449
## 3 X3 0.478 0.0496
#### ggplot
ggplot(tidd, aes(dif)) + geom_histogram() + facet_grid(~quantile)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In the above graphs and tables, the three quantile groups ( 25th, 50th and 75th) produce similar results. However, older age group (e.g. 75th quantile) has higher increase in average number of doctor visits per unit increase in standard deviation of age compared to younger age group (e.g. 25th quantile) Thus supporting my idea that age is a better indicator of how frequent a doctor’s visit may be. Moreover the implications being that as one gets older, the more health issues they may have (hence the increased doctor’s visits).