Linear models are (to me; other people might dispute this) the most useful applied statistical technique. However, they are not without their limitations. A few limitations come to mind:
Additive response models don’t make much sense if the response is discrete or strictly positive.
Additive error models often don’t make sense either, for example if the outcome has to be positive while the error can be both positive and negative. Once you add the error in, the model can produce negative responses even though the observed data can never be negative.
Transformations are often hard to interpret.
There’s value in modeling the data on the scale that it was collected, or on the scale that’s of interest.
Particularly interpretable transformations, natural logarithms specifically, aren’t applicable for negative or zero values. You can get around that, of course, but then you no longer have the natural log of the data; you have the natural log of some function of the data.
GLMs were introduced in a 1972 RSSB paper by Nelder and Wedderburn.
GLMs involve three components: an exponential family distribution for the response, a linear predictor, and a link function that connects the mean of the response to the linear predictor. As a first example, consider the Gaussian linear model:
Assume that \(Y_i \sim N(\mu_i, \sigma^2)\) (the Gaussian distribution is an exponential family distribution.)
Define the linear predictor to be \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\).
Define the link function \(g\) so that \(g(\mu) = \eta\). There is always such a function, and that is why it’s called the link function: it links the mean of the \(Y\)’s to the linear predictor. For this Gaussian model the link is the identity, \(g(\mu) = \mu\).
This yields the same likelihood model as our additive error Gaussian linear model
\[ Y_i = \sum_{k=1}^p X_{ik} \beta_k + \epsilon_{i} \] where \(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\)
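As a quick sketch (with simulated data, not data from the text), we can check that a Gaussian GLM with its default identity link reproduces the ordinary least squares fit:

# Simulated data: a Gaussian GLM (identity link) matches lm()
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
coef(lm(y ~ x))
coef(glm(y ~ x, family = gaussian))  # same estimates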
Assume that our outcome is Bernoulli, \(Y_i \sim Bernoulli(\mu_i)\) so that \(E[Y_i] = \mu_i\) where \(0\leq \mu_i \leq 1\).
We define linear predictor \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\). It looks like our regression model.
The link function that links the mean of the \(Y\)’s to the linear predictor is \(g(\mu) = \eta = \log\left( \frac{\mu}{1 - \mu}\right)\). Here \(g\) is the (natural) log odds, referred to as the logit.
Note that we can invert the logit function as \[ \mu_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} ~~~\mbox{and}~~~ 1 - \mu_i = \frac{1}{1 + \exp(\eta_i)} \] Thus, according to the Bernoulli likelihood, the likelihood is \[ \prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1-y_i} = \exp\left(\sum_{i=1}^n y_i \eta_i \right) \prod_{i=1}^n (1 + \exp(\eta_i))^{-1} \]
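As a quick numerical check (using base R’s plogis, which implements the inverse logit), inverting the logit recovers \(\mu\):

# Check that the logit and its inverse behave as stated above
mu <- 0.73
eta <- log(mu / (1 - mu))                      # the logit, same as qlogis(mu)
c(plogis(eta), exp(eta) / (1 + exp(eta)), mu)  # all three agree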
Let’s do a Poisson example:
Assume that \(Y_i \sim Poisson(\mu_i)\) so that \(E[Y_i] = \mu_i\) where \(0\leq \mu_i\)
We define linear predictor \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
And the link function \(g(\mu) = \eta = \log(\mu)\)
And recall that \(e^x\) is the inverse of \(\log(x)\) so that \[ \mu_i = e^{\eta_i} \]
Thus, the likelihood is \[ \prod_{i=1}^n (y_i !)^{-1} \mu_i^{y_i}e^{-\mu_i} \propto \exp\left(\sum_{i=1}^n y_i \eta_i - \sum_{i=1}^n \mu_i\right) \]
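As a quick sanity check of the Poisson density used above, dpois matches \((y!)^{-1} \mu^{y} e^{-\mu}\) for a single observation:

# Check the Poisson probability mass function for one observation
y <- 4; mu <- 2.5
c(dpois(y, mu), mu^y * exp(-mu) / factorial(y))  # identical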
It’s interesting to note that in each of these cases, the only way in which the likelihood depends on the data is through
\[\sum_{i=1}^n y_i \eta_i = \sum_{i=1}^n y_i\sum_{k=1}^p X_{ik} \beta_k = \sum_{k=1}^p \beta_k\sum_{i=1}^n X_{ik} y_i \]
Thus we don’t actually need the full data, only \(\sum_{i=1}^n X_{ik} y_i\). This simplification is a consequence of choosing so-called ‘canonical’ link functions; if we didn’t choose those functions, it wouldn’t be the case.
(This has to be derived.) All of these models achieve their maximum likelihood at the root of the so-called normal equations, which is nothing but the solution of this equation: \[ 0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{Var(Y_i)}W_i \] where the \(W_i\) are the derivatives of the inverse of the link function.
For the linear model \(Var(Y_i) = \sigma^2\) is constant.
For Bernoulli case \(Var(Y_i) = \mu_i (1 - \mu_i)\)
For the Poisson case \(Var(Y_i) = \mu_i\).
In the latter cases, it is often relevant to have a more flexible variance model, even if it doesn’t correspond to an actual likelihood \[ 0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i (1 - \mu_i ) } W_i ~~~\mbox{and}~~~ 0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i} W_i \]
These are called quasi-likelihood normal equations. The whole point is to relax the strict assumptions relating the means and the variances in those two models. They simply take the standard normal equations and throw an extra dispersion parameter \(\phi\) into the denominator. Quasi-likelihood fits inherit most of the properties of GLMs; they just don’t correspond to a specific likelihood model.
These normal equations have to be solved iteratively, resulting in \(\hat \beta_k\) and, if included, \(\hat \phi\).
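Here is a minimal sketch (with simulated overdispersed counts, not data from the text) of a quasi-Poisson fit in R, which solves the same normal equations but also estimates the dispersion parameter \(\phi\):

# Simulated counts that are overdispersed relative to the Poisson
set.seed(3)
x <- rnorm(200)
y <- rnbinom(200, mu = exp(1 + 0.5 * x), size = 2)
fit <- glm(y ~ x, family = quasipoisson)
summary(fit)$dispersion  # the estimated phi; about 1 for a true Poisson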
Predicted linear predictor responses can be obtained as \(\hat \eta = \sum_{k=1}^p X_k \hat \beta_k\)
Therefore predicted mean responses as \(\hat \mu = g^{-1}(\hat \eta)\)
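In R, predict() on a fitted glm object returns both scales; here is a minimal sketch with simulated Poisson data (not data from the text):

# type = "link" gives the linear predictor, type = "response" gives g^{-1} of it
set.seed(1)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)
eta_hat <- predict(fit, type = "link")
mu_hat  <- predict(fit, type = "response")
max(abs(mu_hat - exp(eta_hat)))  # essentially zero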
Coefficients are interpreted as \[ g(E[Y | X_k = x_k + 1, X_{\sim k} = x_{\sim k}]) - g(E[Y | X_k = x_k, X_{\sim k}=x_{\sim k}]) = \beta_k \] or the change in the link function of the expected response per unit change in \(X_k\), holding the other regressors constant.
For the iterative solving, variations on the Newton-Raphson algorithm are used.
Asymptotics are usually used for inference.
Finally many of the ideas from linear models can be brought over to GLMs.
Frequently we care about outcomes that have two values. These are called binary, Bernoulli, or 0/1 outcomes. The most common binary outcome in clinical research, for example, is diseased versus not diseased.
A collection of exchangeable binary outcomes for the same covariate data is called a binomial outcome. So, instead of analyzing the individual zeros and ones, we might analyze the count of successes out of the number of trials, and that would just be called a binomial outcome (see the sketch below).
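As a minimal sketch (with made-up counts, not the Ravens data), exchangeable binary outcomes sharing a covariate value can be fit as binomial counts by giving glm a two-column cbind(successes, failures) response:

# Made-up counts of successes and failures at three covariate values
wins   <- c(3, 5, 7)
losses <- c(7, 5, 3)
x      <- c(1, 2, 3)
fit <- glm(cbind(wins, losses) ~ x, family = binomial)
coef(fit)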
Let’s look at an example using the Ravens data. First we download the data and look at the first couple of rows.
library(knitr)
library(printr)
URL <- "https://dl.dropboxusercontent.com/u/7710864/data/ravensData.rda"
#download.file(URL, destfile= "./data/ravensData.rda",method="curl")
load("./data/ravensData.rda")
kable(head(ravensData),align = 'c')
| ravenWinNum | ravenWin | ravenScore | opponentScore |
|---|---|---|---|
| 1 | W | 24 | 9 |
| 1 | W | 38 | 35 |
| 1 | W | 28 | 13 |
| 1 | W | 34 | 31 |
| 1 | W | 44 | 13 |
| 0 | L | 23 | 24 |
Imagine trying to fit a linear regression model to this. It wouldn’t work very well: if the errors are continuous, the additive error term makes the modeled outcome take values other than 0 and 1, even though the observed data can only ever be 0 or 1.
\[ RW_i = b_0 + b_1 RS_i + e_i \]
\(RW_i\) - 1 if a Ravens win, 0 if not
\(RS_i\) - Number of points Ravens scored
\(b_0\) - probability of a Ravens win if they score 0 points
\(b_1\) - increase in probability of a Ravens win for each additional point
\(e_i\) - residual variation due to everything we didn’t measure
Nevertheless, you can still fit the linear regression:
lmRavens <- lm(ravensData$ravenWinNum ~ ravensData$ravenScore)
summary(lmRavens)$coef
| | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|
| (Intercept) | 0.2850317 | 0.2566432 | 1.110615 | 0.2813504 |
| ravensData$ravenScore | 0.0158992 | 0.0090590 | 1.755069 | 0.0962526 |
I don’t think it’s a particularly great thing to do, but I would also say it’s not a horrible thing to do either. If you’re interested in predicting a Ravens win versus a loss and you just threshold the linear regression estimate, this isn’t the worst thing in the world to do; it’s just not very interpretable. Here we’re interested in interpretation.
So, from a machine learning perspective, depending on the loss function, a linear regression might be perfectly reasonable for binary outcomes. But for interpretability you want something different, so let’s talk about odds, which give a way to re-frame the model in a particularly interpretable way.
Binary Outcome 0/1
\[RW_i\]
Probability (0,1) : We could talk about the probability that the Ravens won, given the parameters and how many points they scored:
\[\rm{Pr}(RW_i | RS_i, b_0, b_1 )\]
Odds \((0,\infty)\) : The odds, remember, is the probability over one minus the probability:
\[\frac{\rm{Pr}(RW_i | RS_i, b_0, b_1 )}{1-\rm{Pr}(RW_i | RS_i, b_0, b_1)}\]
Log odds \((-\infty,\infty)\) : And the log odds of course is just the log of the odds, but the log odds is also termed the logit :
\[\log\left(\frac{\rm{Pr}(RW_i | RS_i, b_0, b_1 )}{1-\rm{Pr}(RW_i | RS_i, b_0, b_1)}\right)\]
So linear regression models this
\[ RW_i = b_0 + b_1 RS_i + e_i \]
or
\[ E[RW_i | RS_i, b_0, b_1] = b_0 + b_1 RS_i\]
and Logistic regression models
\[ \rm{Pr}(RW_i | RS_i, b_0, b_1) = \frac{\exp(b_0 + b_1 RS_i)}{1 + \exp(b_0 + b_1 RS_i)}\]
or
\[ \log\left(\frac{\rm{Pr}(RW_i | RS_i, b_0, b_1 )}{1-\rm{Pr}(RW_i | RS_i, b_0, b_1)}\right) = b_0 + b_1 RS_i \]
So here is the equation for logistic regression, rewritten:
\[ \log\left(\frac{\rm{Pr}(RW_i | RS_i, b_0, b_1 )}{1-\rm{Pr}(RW_i | RS_i, b_0, b_1)}\right) = b_0 + b_1 RS_i \]
And the interpretations would be:
\(b_0\) - Log odds of a Ravens win if they score zero points
\(b_1\) - Log odds ratio of win probability for each point scored (compared to zero points). (By the way, we don’t have any other covariates here, but if we had other covariates, we’d have to add the phrase with all the covariates held fixed.)
\(\exp(b_1)\) - Odds ratio of win probability for each point scored (compared to zero points)
Now let’s talk about interpreting odds. Odds are very interpretable, at least as interpretable as probabilities are, and I like to think of them in terms of gambling experiments.
Imagine that you are playing a game where you flip a coin with success probability \(p\). If it comes up heads, you win \(X\). If it comes up tails, you lose \(Y\). What should we set \(X\) and \(Y\) to for the game to be fair? By fair we mean:
\[ E[earnings]= X p - Y (1 - p) = 0 \]
By the way, your opponent’s expected earnings would be the negative of \(E[earnings]\).
Which implies this equation:
\[\frac{Y}{X} = \frac{p}{1 - p}\]
Let’s explore this equation:
The odds answer the question “How much should you be willing to pay for a \(p\) probability of winning a dollar?”
(If \(p > 0.5\) you have to pay more if you lose than you get if you win.) For the game to be fair, if you have a very high likelihood of winning, you should have to pay out a lot when you lose; your opponent has a very low likelihood of winning, so \(Y\) is going to have to be bigger.
(If \(p < 0.5\) you have to pay less if you lose than you get if you win.) If you’re betting on a long shot, something with a very low probability of happening, then you should win more than someone making a much surer bet.
And that’s why, when you go to a horse track, for example, they quote the odds against the horse, and you use them to figure out your payout: if it’s 10 to 1, you win 10 times the amount you put in. So odds are uniquely interpretable.
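As a quick numerical check of the fair-game equation above: with \(p = 2/3\) you should risk \(Y = 2\) dollars to win \(X = 1\), and the expected earnings are then zero.

# Fair game: Y/X equals the odds p/(1 - p)
p <- 2/3; X <- 1
Y <- X * p / (1 - p)
c(odds = p / (1 - p), expected_earnings = X * p - Y * (1 - p))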
You can use the manipulate function in R to investigate the logistic regression curve.
library(manipulate)
x <- seq(-10, 10, length = 1000)
manipulate(
plot(x, exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x)),
type = "l", lwd = 3, frame = FALSE),
beta1 = slider(-2, 2, step = .1, initial = 2),
beta0 = slider(-2, 2, step = .1, initial = 0)
)
So, let’s look at our logistic regression for the Ravens data and how we fit it. In the code below, family = "binomial" says that it’s going to be a logistic regression.
logRegRavens <- glm(ravensData$ravenWinNum ~ ravensData$ravenScore,family="binomial")
summary(logRegRavens)
##
## Call:
## glm(formula = ravensData$ravenWinNum ~ ravensData$ravenScore,
## family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7575 -1.0999 0.5305 0.8060 1.4947
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.68001 1.55412 -1.081 0.28
## ravensData$ravenScore 0.10658 0.06674 1.597 0.11
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 24.435 on 19 degrees of freedom
## Residual deviance: 20.895 on 18 degrees of freedom
## AIC: 24.895
##
## Number of Fisher Scoring iterations: 5
So -1.68 is the log odds of the Ravens winning when they score zero points. Of course that’s negative: when they score nothing, you wouldn’t expect them to win. If you wanted the actual odds of them winning, you would compute \(\exp(-1.68)\).
And 0.106 is the increase in the log odds per point that they score. If we had included other variables, that interpretation would be holding those other variables constant.
So here are the fitted probabilities. They have to lie along an S curve, which is what the inverse logit function looks like. If you plug in a score, this gives you the estimated probability of a win.
plot(ravensData$ravenScore,logRegRavens$fitted,pch=19,col="blue",
xlab="Score",ylab="Prob Ravens Win")
Here we exponentiate the coefficients to put them on the relative (odds) scale.
You can also exponentiate the confidence interval to get upper and lower bounds on that scale, which is a good thing to do. I prefer to interpret the coefficients on the relative scale rather than on the log scale.
exp(logRegRavens$coeff)
## (Intercept) ravensData$ravenScore
## 0.1863724 1.1124694
exp(confint(logRegRavens))
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 0.0056750 | 3.106384 |
| ravensData$ravenScore | 0.9962297 | 1.303304 |
The anova function also works for GLMs.
anova(logRegRavens,test="Chisq")
| | Df | Deviance | Resid. Df | Resid. Dev | Pr(>Chi) |
|---|---|---|---|---|---|
| NULL | NA | NA | 19 | 24.43457 | NA |
| ravensData$ravenScore | 1 | 3.539826 | 18 | 20.89475 | 0.0599117 |
In this output if you fit multiple models, just like with linear models, it adds them sequentially. Ideally you are doing this with nested models.
So the deviance here is the measure of improvement in model fit relative to the previous model: subtracting successive residual deviances gives the deviance in the table (24.43 - 20.89 ≈ 3.54).
In this case, the chance of a chi-squared random variable with 1 degree of freedom exceeding 3.54 is pretty far out in the tail: there’s only about a 6% chance of getting a value more extreme (\(P(\chi^2_1 > 3.54) \approx 0.06\)).
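You can check that tail probability directly in R; the deviance value comes from the anova table above.

# P(chi-squared with 1 df exceeds 3.54), matching Pr(>Chi) above
pchisq(3.539826, df = 1, lower.tail = FALSE)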
Let’s finish by talking a little bit about interpreting odds ratios.
Odds ratios are not probabilities, which I think is clear, and odds are of course not probabilities either, but they are a meaningful scale. People have a hard time interpreting probabilities, and they have an equivalently hard time interpreting odds. The one thing I do like about odds is the way they relate directly back to gambling, as in the example we covered earlier.
Whether you look at relative odds or at relative probabilities, you are making the same kind of comparison between groups. The ratio of the probabilities themselves is called a relative risk.
Odds also have an interesting use in retrospective studies. What I mean by that: imagine you want to study a treatment, or something like smoking, and you can’t follow people prospectively over time because it takes too long. If you wanted to study smoking and lung cancer, you couldn’t, say, follow people over 15 years, determine exactly how much they smoked, and then determine who gets lung cancer. That’s too hard.
A much easier way would be to go to hospital records, get the names of a group of people who died from lung cancer along with their information (the age at which they died and so on), find some controls, maybe matched controls, and look backwards, perhaps contacting their families or something like that to ascertain whether or not they smoked and how much.
Or you could go to living patients and get their smoking history; that would be another kind of study. In both of these cases you are looking at the outcome as a fixed, non-random quantity: for example, you sample 50 people who have lung cancer and 50 people who don’t, and those numbers are non-random, but you’d like to treat lung cancer status as if it were the outcome. There’s an interesting fact that you can use the odds to treat it as the outcome anyway.
Many data take the form of counts, and we can’t necessarily model counts in the same way that we model other outcomes with linear regression or binary regression. The web site hits analyzed below are one example.
Data may also be in the form of rates. What we mean by rates then is a count per unit time.
Percentages that arise from binomial-type processes can also be modeled by Poisson models, especially when the probability is very low and the sample size is very large.
In these cases, linear regression with a transformation is an option.
The Poisson distribution is a useful model for counts and rates.
Again, a rate here is a count per some amount of monitoring time; modeling web traffic, as we do below, is one example use of the Poisson distribution.
We say random variable \(X \sim Poisson(t\lambda)\) if \[ P(X = x) = \frac{(t\lambda)^x e^{-t\lambda}}{x!} \] For \(x = 0, 1, \ldots\).
The mean of the Poisson is \(E[X] = t\lambda\), thus \(E[X / t] = \lambda\)
The variance of the Poisson is \(Var(X) = t\lambda\).
The Poisson tends to a normal as \(t\lambda\) gets large, as shown in the code below:
par(mfrow = c(1, 3))
plot(0 : 10, dpois(0 : 10, lambda = 2), type = "h", frame = FALSE)
plot(0 : 20, dpois(0 : 20, lambda = 10), type = "h", frame = FALSE)
plot(0 : 200, dpois(0 : 200, lambda = 100), type = "h", frame = FALSE)
Below we verify numerically that the mean and the variance are equal:
x <- 0 : 10000; lambda = 3
mu <- sum(x * dpois(x, lambda = lambda))
sigmasq <- sum((x - mu)^2 * dpois(x, lambda = lambda))
c(mu, sigmasq)
## [1] 3 3
Consider the daily counts of visits to Jeff Leek’s web site:
http://biostat.jhsph.edu/~jleek/
Since the unit of time is always one day, set \(t = 1\); then the Poisson mean is interpreted as web hits per day. (If we set \(t = 24\), it would be web hits per hour.)
Website data
Below we load the data and create a new column for the Julian dates:
download.file("https://dl.dropboxusercontent.com/u/7710864/data/gaData.rda",destfile="./data/gaData.rda",method="curl")
load("./data/gaData.rda")
gaData$julian <- julian(gaData$date)
head(gaData)
| date | visits | simplystats | julian |
|---|---|---|---|
| 2011-01-01 | 0 | 0 | 14975 |
| 2011-01-02 | 0 | 0 | 14976 |
| 2011-01-03 | 0 | 0 | 14977 |
| 2011-01-04 | 0 | 0 | 14978 |
| 2011-01-05 | 0 | 0 | 14979 |
| 2011-01-06 | 0 | 0 | 14980 |
(The data come from Google Analytics; see the rga package: http://skardhamar.github.com/rga/)
Plot data
Here is a plot of the Julian date against the number of visits:
plot(gaData$julian,gaData$visits,pch=19,col="darkgrey",xlab="Julian",ylab="Visits")
We could model this as a linear regression:
\[ NH_i = b_0 + b_1 JD_i + e_i \]
\(NH_i\) - number of hits to the website
\(JD_i\) - day of the year (Julian day)
\(b_0\) - number of hits on Julian day 0 (1970-01-01)
\(b_1\) - increase in number of hits per unit day
\(e_i\) - variation due to everything we didn’t measure
Linear regression line
Here is the plot of the fit, and it looks good!
plot(gaData$julian,gaData$visits,pch=19,col="darkgrey",xlab="Julian",ylab="Visits")
lm1 <- lm(gaData$visits ~ gaData$julian)
abline(lm1,col="red",lwd=3)
Taking the natural log of the outcome has a specific interpretation.
Consider the model
\[ \log(NH_i) = b_0 + b_1 JD_i + e_i \]
\(NH_i\) - number of hits to the website
\(JD_i\) - day of the year (Julian day)
\(b_0\) - log number of hits on Julian day 0 (1970-01-01)
\(b_1\) - increase in log number of hits per unit day
\(e_i\) - variation due to everything we didn’t measure
\(e^{E[\log(Y)]}\) is the (population) geometric mean of \(Y\).
With no covariates, this is estimated by \(e^{\frac{1}{n}\sum_{i=1}^n \log(y_i)} = (\prod_{i=1}^n y_i)^{1/n}\)
When you take the natural log of outcomes and fit a regression model, your exponentiated coefficients estimate things about geometric means.
\(e^{\beta_0}\) estimated geometric mean hits on day 0
\(e^{\beta_1}\) estimated relative increase or decrease in geometric mean hits per day
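As a quick numerical check of the geometric-mean identity above, with a few made-up positive values:

# exp of the mean log equals the geometric mean
y <- c(2, 8, 4, 16)
c(exp(mean(log(y))), prod(y)^(1 / length(y)))  # both about 5.657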
There’s a problem with logs when you have zero counts; adding a constant works:
round(exp(coef(lm(I(log(gaData$visits + 1)) ~ gaData$julian))), 5)
## (Intercept) gaData$julian
## 0.00000 1.00231
Just to compare linear vs. Poisson regression:
Linear
\[ NH_i = b_0 + b_1 JD_i + e_i \]
or
\[ E[NH_i | JD_i, b_0, b_1] = b_0 + b_1 JD_i\]
Poisson/log-linear
\[ \log\left(E[NH_i | JD_i, b_0, b_1]\right) = b_0 + b_1 JD_i \]
or
\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 + b_1 JD_i\right) \]
Just to reiterate :
\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 + b_1 JD_i\right) \]
Which is of course :
\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 \right)\exp\left(b_1 JD_i\right) \]
Meaning that if \(JD_i\) is increased by one unit, \(E[NH_i | JD_i, b_0, b_1]\) is multiplied by \(\exp\left(b_1\right)\)
Poisson regression in R is pretty simple:
plot(gaData$julian,gaData$visits,pch=19,col="darkgrey",xlab="Julian",ylab="Visits")
glm1 <- glm(gaData$visits ~ gaData$julian,family="poisson")
abline(lm1,col="red",lwd=3)
lines(gaData$julian,glm1$fitted,col="blue",lwd=3)
The GLM fit is in blue, and it is non-linear.
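To get the multiplicative interpretation described above, you can exponentiate the fitted Poisson coefficients (using the glm1 object fit above): \(\exp(\hat b_1)\) is the estimated relative increase in expected hits per one-day increase in the Julian date.

# Coefficients on the multiplicative (relative) scale
exp(coef(glm1))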
There is some concern about the mean-variance relationship in this dataset:
plot(glm1$fitted,glm1$residuals,pch=19,col="grey",ylab="Residuals",xlab="Fitted")
We can use sandwich variance estimates to get confidence intervals based on so-called model-agnostic standard errors:
library(sandwich)
confint.agnostic <- function (object, parm, level = 0.95, ...)
{
cf <- coef(object); pnames <- names(cf)
if (missing(parm))
parm <- pnames
else if (is.numeric(parm))
parm <- pnames[parm]
a <- (1 - level)/2; a <- c(a, 1 - a)
pct <- stats:::format.perc(a, 3)
fac <- qnorm(a)
ci <- array(NA, dim = c(length(parm), 2L), dimnames = list(parm,
pct))
ses <- sqrt(diag(sandwich::vcovHC(object)))[parm]
ci[] <- cf[parm] + ses %o% fac
ci
}
(The confint.agnostic function above is adapted from http://stackoverflow.com/questions/3817182/vcovhc-and-confidence-interval)
So below are the confidence intervals computed the usual way, as well as using the robust standard errors discussed above:
confint(glm1)
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | -34.34658 | -31.1597157 |
| gaData$julian | 0.00219 | 0.0023965 |
confint.agnostic(glm1)
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | -36.3626746 | -29.136997 |
| gaData$julian | 0.0020581 | 0.002528 |
Let’s now talk about modeling rates. Here \(NHSS_i\) denotes the count in the simplystats column (hits associated with Simply Statistics), and \(NH_i\) is, as before, the total number of hits to the web site:
\[ E[NHSS_i | JD_i, b_0, b_1]/NH_i = \exp\left(b_0 + b_1 JD_i\right) \]
\[ \log\left(E[NHSS_i | JD_i, b_0, b_1]\right) - \log(NH_i) = b_0 + b_1 JD_i \]
\[ \log\left(E[NHSS_i | JD_i, b_0, b_1]\right) = \log(NH_i) + b_0 + b_1 JD_i \] so \(\log(NH_i)\) enters the linear predictor as a known offset, which is exactly what the offset argument does in the code below.
glm2 <- glm(gaData$simplystats ~ julian(gaData$date),offset=log(visits+1),
family="poisson",data=gaData)
plot(julian(gaData$date),glm2$fitted,col="blue",pch=19,xlab="Date",ylab="Fitted Counts")
points(julian(gaData$date),glm1$fitted,col="red",pch=19)
Here the same model’s fitted rates are plotted (fitted counts divided by total visits):
glm2 <- glm(gaData$simplystats ~ julian(gaData$date),offset=log(visits+1),
family="poisson",data=gaData)
plot(julian(gaData$date),gaData$simplystats/(gaData$visits+1),col="grey",xlab="Date",
ylab="Fitted Rates",pch=19)
lines(julian(gaData$date),glm2$fitted/(gaData$visits+1),col="blue",lwd=3)
Adapted from the course on regression models offered by Brian Caffo at Johns Hopkins Bloomberg School of Public Health