This week we practiced more with logistic regression and correlated data.
We spent Thursday's class with Dr. McNamara.
Here is an example of a violation of the independence assumption:
- In a survey about political views, the responses submitted by a certain friend group may be correlated as they might discuss issues together and agree on some topics.
Correlated data is encountered in nearly every field. In education, scores of students who share a teacher are typically more similar than scores of students who have had a different teacher. In a study measuring depression indices weekly over the course of a month, we usually find that the four measures for the same patient tend to be more similar to one another than to depression indices from other patients. In political polling, opinions from members of the same household are usually more similar than opinions of members from other randomly selected households.
# Tree Growth example
The first model is a standard linear model that assumes all observations are independent.
The second model takes into account the correlation among observations within a transect.
As we saw in the case of binary outcomes, the standard errors of the coefficients are larger when we take correlation into account. The t-statistic for tubes is smaller, reducing our enthusiasm for the tubes effect. This more conservative conclusion arises because the observations within a transect are correlated and therefore not independent, as the naive model assumes.
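
To see why ignoring within-group correlation shrinks standard errors, here is a minimal simulation sketch. It is not the tree growth data or the models from class: the number of transects, trees per transect, and both standard deviations are made-up values, and it compares standard errors of an overall mean rather than of a regression coefficient. The cluster-aware version simply treats each transect mean as one independent observation, which is a simpler stand-in for a random effects model.

```python
import random
import statistics

random.seed(1)

# All four settings below are hypothetical, chosen only for illustration.
N_GROUPS = 20     # number of transects
GROUP_SIZE = 10   # trees per transect
GROUP_SD = 2.0    # spread of the transect-level shifts
NOISE_SD = 1.0    # tree-to-tree spread within a transect

# Each transect gets its own shift, so trees in the same transect are correlated.
groups = []
for _ in range(N_GROUPS):
    shift = random.gauss(0, GROUP_SD)
    groups.append([shift + random.gauss(0, NOISE_SD) for _ in range(GROUP_SIZE)])

all_values = [y for group in groups for y in group]

# Naive SE of the mean: pretends all 200 trees are independent.
naive_se = statistics.stdev(all_values) / len(all_values) ** 0.5

# Cluster-aware SE of the mean: uses the 20 transect means as the independent units.
group_means = [statistics.mean(group) for group in groups]
cluster_se = statistics.stdev(group_means) / N_GROUPS ** 0.5

print(f"naive SE:         {naive_se:.3f}")
print(f"cluster-aware SE: {cluster_se:.3f}")
```

With settings like these, where most of the variation is between transects, the cluster-aware standard error should come out clearly larger than the naive one, which matches the pattern described above.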
These are the steps to follow when working with correlated data:
- Identify the grouping units
- State the response(s) measured and the variable type
- Write a sentence describing the within-group correlation
- Identify fixed and random effects
In conclusion:
The structure of a data set may imply that outcomes are correlated. Correlated outcomes provide less information than independent outcomes, resulting in effective sample sizes that are less than the total number of observations. Neglecting to take correlation into account may lead to underestimated standard errors for coefficients, and therefore to overstated significance and precision. Correlation is likely and should be accounted for if basic observational units (e.g., pups, trees) are aggregated in ways that would lead us to expect units within groups to be similar.
There are two ways to account for correlation:
- Incorporate a dispersion parameter
- Include random effects
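
The point about effective sample size in the conclusion can be made concrete with a small sketch. It assumes equal-sized groups and uses the standard design effect formula, n_eff = n / (1 + (m - 1) * icc), where m is the group size and icc is the intraclass correlation. The function name and the example numbers are hypothetical and were not part of class.

```python
def effective_sample_size(n_groups: int, group_size: int, icc: float) -> float:
    """Approximate effective sample size for equal-sized groups.

    Uses the design effect 1 + (group_size - 1) * icc, where icc is the
    intraclass correlation (assumed here to lie between 0 and 1).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    total = n_groups * group_size
    design_effect = 1 + (group_size - 1) * icc
    return total / design_effect


# Hypothetical example: 20 groups of 10 observations each.
for icc in (0.0, 0.1, 0.5, 1.0):
    n_eff = effective_sample_size(20, 10, icc)
    print(f"icc = {icc:.1f} -> effective sample size = {n_eff:.1f}")
```

With no correlation the 200 observations count in full. When observations within a group are perfectly correlated, each group contributes only as much information as a single observation, so the effective sample size falls to the number of groups.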