The purpose of this analysis was to reproduce the Centers for Disease Control and Prevention's (CDC) linear trend analysis on complex sample survey data.
Youth Risk Behavior Surveillance System (YRBS) 1991-2011 single-year microdata was used for this analysis. The recommended way to download the microdata in R is to use Anthony Damico's automation scripts. Before combining multiple single-year files, first read the CDC manual on the subject.
Microdata was analyzed using the R survey package to produce estimates similar to those of SAS-callable SUDAAN. For this analysis, R was set to produce conservative standard errors for strata that contain a single primary sampling unit (a "lonely PSU"), similar to setting the MISSUNIT option in SUDAAN. You can read more about the lonely PSU on the survey package website.
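A minimal sketch of that setup, run on simulated data rather than the YRBS files. The column names `psu`, `stratum`, `weight`, and `smoke`, and the choice of `"adjust"` for the `survey.lonely.psu` option, are assumptions made for illustration; the text does not state which setting was used.

```r
library(survey)

# Assumed setting: centre a single-PSU stratum at the grand mean
# instead of stopping with an error. The survey documentation
# describes this as a conservative choice.
options(survey.lonely.psu = "adjust")

set.seed(42)
# Simulated stand-in for one survey year: 5 strata x 2 PSUs x 20 students.
toy <- expand.grid(obs = 1:20, psu = 1:2, stratum = 1:5)
toy$weight <- runif(nrow(toy), 0.5, 2)
toy$smoke  <- rbinom(nrow(toy), 1, 0.3)

# Drop one PSU so that stratum 5 is left with a lonely PSU.
toy <- subset(toy, !(stratum == 5 & psu == 2))

des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~weight,
                 data = toy, nest = TRUE)

# Weighted prevalence with a design-based standard error.
est <- svymean(~smoke, des)
print(est)
```

With the default setting (`"fail"`), the same estimate would stop with an error because stratum 5 has only one PSU.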
In this example, we examined change over time in smoking prevalence among youth. Unadjusted prevalence rates (see Figure 1) suggested a significant change in smoking prevalence. Epidemiological models, however, typically control for possible confounding variables such as sex and race.
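Unadjusted prevalence by survey year can be estimated along these lines. This is a sketch on simulated data; the variable names and the simulated trend (flat through 1999, then declining) are assumptions chosen only so the example runs.

```r
library(survey)
options(survey.lonely.psu = "adjust")

set.seed(1)
years <- seq(1991, 2011, by = 2)   # 11 biennial survey years

# Simulated pooled file: per year, 5 strata x 2 PSUs x 20 students.
yrbs <- expand.grid(obs = 1:20, psu = 1:2, stratum = 1:5, year = years)
yrbs$stratum <- yrbs$year * 10 + yrbs$stratum   # stratum ids unique across years
yrbs$weight  <- runif(nrow(yrbs), 0.5, 2)
p_smoke      <- ifelse(yrbs$year <= 1999, 0.35,
                       0.35 - 0.015 * (yrbs$year - 1999))
yrbs$smoke   <- rbinom(nrow(yrbs), 1, p_smoke)

des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~weight,
                 data = yrbs, nest = TRUE)

# Design-based prevalence and standard error for each survey year.
unadjusted <- svyby(~smoke, ~year, des, svymean)
print(unadjusted)
```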
Orthogonal polynomial coefficients were fitted to test for trends. The regression outputs in Table 1 are ordered by the highest-order time term in the model: Model 1 includes the linear (t11l) and quadratic (t11q) terms, and Model 2 includes the linear term only. Because the quadratic term is significant, Table 1 suggests that a quadratic trend may best describe the relationship between smoking prevalence and time. Whether to test beyond linear trends, however, is a decision for the individual researcher. It can be driven by theoretical issues, existing literature, or the availability of data.
| | Model 1 | Model 2 |
|---|---|---|
| (Intercept) | 0.22 (0.03)*** | 0.21 (0.03)*** |
| sex1 | -0.08 (0.02)*** | -0.09 (0.02)*** |
| raceeth2 | -0.08 (0.04)* | -0.08 (0.04)* |
| raceeth3 | 0.18 (0.03)*** | 0.19 (0.03)*** |
| raceeth4 | -0.12 (0.04)** | -0.13 (0.04)** |
| grade2 | 0.23 (0.03)*** | 0.24 (0.03)*** |
| grade3 | 0.38 (0.03)*** | 0.38 (0.03)*** |
| grade4 | 0.57 (0.03)*** | 0.58 (0.03)*** |
| t11l | -1.36 (0.05)*** | -1.33 (0.05)*** |
| t11q | -0.37 (0.05)*** | |
| Deviance | 195227.73 | 194818.48 |
| Dispersion | 1.00 | 1.00 |
| Num. obs. | 151096 | 151096 |
| \*\*\* p < 0.001, \*\* p < 0.01, \* p < 0.05 | | |
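The orthogonal time terms in Table 1 can be built with base R's `contr.poly()`. The sketch below uses simulated data, a single covariate, and a `quasibinomial()` family; those choices and all variable names other than `t11l` and `t11q` are assumptions, not the original model specification.

```r
library(survey)
options(survey.lonely.psu = "adjust")

set.seed(1)
years <- seq(1991, 2011, by = 2)   # 11 equally spaced time points

# Simulated pooled file: per year, 5 strata x 2 PSUs x 20 students.
yrbs <- expand.grid(obs = 1:20, psu = 1:2, stratum = 1:5, year = years)
yrbs$stratum <- yrbs$year * 10 + yrbs$stratum
yrbs$weight  <- runif(nrow(yrbs), 0.5, 2)
yrbs$sex     <- factor(sample(1:2, nrow(yrbs), replace = TRUE))
p_smoke      <- ifelse(yrbs$year <= 1999, 0.35,
                       0.35 - 0.015 * (yrbs$year - 1999))
yrbs$smoke   <- rbinom(nrow(yrbs), 1, p_smoke)

# Orthogonal polynomial contrasts: column 1 is linear, column 2 quadratic.
cp <- contr.poly(length(years))
yrbs$t11l <- cp[match(yrbs$year, years), 1]
yrbs$t11q <- cp[match(yrbs$year, years), 2]

des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~weight,
                 data = yrbs, nest = TRUE)

# Highest-order model first, then the linear-only model.
m_quad <- svyglm(smoke ~ sex + t11l + t11q, design = des,
                 family = quasibinomial())
m_lin  <- svyglm(smoke ~ sex + t11l, design = des,
                 family = quasibinomial())
print(summary(m_quad)$coefficients)
```

Because the contrast columns are mutually orthogonal, the test of the quadratic term is not confounded with the linear term.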
The adjusted prevalence and standard error were produced using the svypredmeans function, which emulates the PREDMARG statement in SUDAAN.
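A sketch of how `svypredmeans()` is typically called: fit the adjustment model without the grouping variable, then ask for predictive margins over that variable. The data are simulated and the single covariate `sex` is an assumption; the original models also adjust for race and grade.

```r
library(survey)
options(survey.lonely.psu = "adjust")

set.seed(1)
years <- seq(1991, 2011, by = 2)

# Simulated pooled file: per year, 5 strata x 2 PSUs x 20 students.
yrbs <- expand.grid(obs = 1:20, psu = 1:2, stratum = 1:5, year = years)
yrbs$stratum <- yrbs$year * 10 + yrbs$stratum
yrbs$weight  <- runif(nrow(yrbs), 0.5, 2)
yrbs$sex     <- factor(sample(1:2, nrow(yrbs), replace = TRUE))
p_smoke      <- ifelse(yrbs$year <= 1999, 0.35,
                       0.35 - 0.015 * (yrbs$year - 1999))
yrbs$smoke   <- rbinom(nrow(yrbs), 1, p_smoke)

des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~weight,
                 data = yrbs, nest = TRUE)

# Adjustment model: covariates only, no time variable.
adj_model <- svyglm(smoke ~ sex, design = des, family = quasibinomial())

# Predictive marginal means (adjusted prevalence) for each survey year.
pm <- svypredmeans(adj_model, ~factor(year))
adjusted <- data.frame(prevalence = coef(pm), se = SE(pm))
print(adjusted)
```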
The original CDC analysis requires third-party software, the National Cancer Institute's Joinpoint Regression Program, which runs only on selected platforms. Dr. Vito Muggeo helped us with an R alternative based on his segmented package.
Carrying out a trend analysis required creating new weights to fit a piecewise linear regression. Figure 3 shows the relationship between the variance at each datum and its weight; smaller circles indicate a larger variance and therefore a lower weight.
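One way to construct such weights is inverse-variance weighting on the log scale. This is an assumption about the weighting scheme, and the prevalences and standard errors below are made-up values, not YRBS results. By the delta method, the variance of log(mean) is approximately (se / mean)^2, so its inverse is (mean / se)^2.

```r
# Made-up adjusted prevalences and standard errors (NOT survey results).
years <- seq(1991, 2011, by = 2)
adj <- data.frame(
  year = years,
  mean = 0.35 * 0.96^pmax(years - 1999, 0),
  se   = rep(c(0.010, 0.020), length.out = length(years))
)

# Inverse-variance weight for log(mean): larger SE gives a smaller weight.
adj$wgt <- (adj$mean / adj$se)^2

# Weighted log-linear fit; precise points pull the line harder.
fit <- lm(log(mean) ~ year, data = adj, weights = wgt)
print(adj)
print(coef(fit))
```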
Our analysis returned results similar to those of the Joinpoint Regression Program: the estimated change point was 1999, which marked the start of a decreasing trend in smoking prevalence with an annual percent change (APC) of -3.92.
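To make the change point and APC concrete, here is a base R sketch. It uses a plain grid search over candidate years, which is a simplification and not the iterative estimator in the segmented package or the Joinpoint program. The series is noise-free toy data built to be flat through 1999 and to fall 4% per year afterwards. On the log scale a slope of b per year corresponds to APC = 100 * (exp(b) - 1).

```r
years <- seq(1991, 2011, by = 2)

# Noise-free toy series: flat through 1999, then falling 4% per year.
d <- data.frame(year = years,
                prev = 0.35 * 0.96^pmax(years - 1999, 0))

# Continuous piecewise-linear fit on the log scale with a hinge at psi.
fit_at <- function(psi) {
  d$hinge <- pmax(d$year - psi, 0)   # modifies a local copy of d
  lm(log(prev) ~ year + hinge, data = d)
}

# Try every interior survey year and keep the smallest residual sum of squares.
candidates <- years[2:(length(years) - 1)]
sse <- sapply(candidates, function(psi) sum(residuals(fit_at(psi))^2))
best_psi <- candidates[which.min(sse)]

# Slope before the change point is b_year; after it, b_year + b_hinge.
b <- coef(fit_at(best_psi))
apc_before <- 100 * (exp(b[["year"]]) - 1)
apc_after  <- 100 * (exp(b[["year"]] + b[["hinge"]]) - 1)

print(best_psi)
print(round(c(before = apc_before, after = apc_after), 2))
```

Weights such as the inverse-variance weights described above could be passed to `lm()` through its `weights` argument; they are omitted here to keep the sketch short.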
After identifying the change point for smoking prevalence, we created two regression models. The first model covered the years up to the change point (i.e., 1991 to 1999). The second model covered the years from the change point forward (i.e., 1999 to 2011). According to the regression output, there was no significant linear change during 1991-1999; the linear time trend variable (t5l) was not significant. The 1999-2011 period, however, showed a significant linear decrease (t7l), which is consistent with the APC estimate from the previous step.
| | Model 1 | Model 2 |
|---|---|---|
| (Intercept) | 0.62 (0.05)*** | -0.04 (0.04) |
| sex1 | -0.06 (0.03)* | -0.09 (0.02)*** |
| raceeth2 | -0.12 (0.05)* | -0.06 (0.05) |
| raceeth3 | 0.18 (0.05)*** | 0.19 (0.04)*** |
| raceeth4 | -0.16 (0.06)* | -0.15 (0.05)** |
| grade2 | 0.28 (0.05)*** | 0.26 (0.03)*** |
| grade3 | 0.36 (0.06)*** | 0.40 (0.04)*** |
| grade4 | 0.51 (0.06)*** | 0.65 (0.04)*** |
| t5l | 0.04 (0.06) | |
| t7l | | -0.99 (0.06)*** |
| Deviance | 83192.21 | 128939.00 |
| Dispersion | 1.00 | 1.00 |
| Num. obs. | 68769 | 96973 |
| \*\*\* p < 0.001, \*\* p < 0.01, \* p < 0.05 | | |
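The two period-specific models can be sketched as follows. The 1991-1999 period has five survey years and the 1999-2011 period has seven, hence the names `t5l` and `t7l`. The data are simulated, and the single covariate and `quasibinomial()` family are assumptions for illustration.

```r
library(survey)
options(survey.lonely.psu = "adjust")

set.seed(1)
years <- seq(1991, 2011, by = 2)

# Simulated pooled file: per year, 5 strata x 2 PSUs x 20 students.
yrbs <- expand.grid(obs = 1:20, psu = 1:2, stratum = 1:5, year = years)
yrbs$stratum <- yrbs$year * 10 + yrbs$stratum
yrbs$weight  <- runif(nrow(yrbs), 0.5, 2)
yrbs$sex     <- factor(sample(1:2, nrow(yrbs), replace = TRUE))
p_smoke      <- ifelse(yrbs$year <= 1999, 0.35,
                       0.35 - 0.015 * (yrbs$year - 1999))
yrbs$smoke   <- rbinom(nrow(yrbs), 1, p_smoke)

# Linear orthogonal contrast within each period (NA outside the period).
yrbs$t5l <- contr.poly(5)[match(yrbs$year, years[1:5]), 1]    # 1991-1999
yrbs$t7l <- contr.poly(7)[match(yrbs$year, years[5:11]), 1]   # 1999-2011

des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~weight,
                 data = yrbs, nest = TRUE)

# Subset the design object, not the data frame, so variance
# estimation still uses the full design information.
early <- subset(des, year <= 1999)
late  <- subset(des, year >= 1999)

m_early <- svyglm(smoke ~ sex + t5l, design = early, family = quasibinomial())
m_late  <- svyglm(smoke ~ sex + t7l, design = late,  family = quasibinomial())

print(summary(m_early)$coefficients)
print(summary(m_late)$coefficients)
```

The change-point year belongs to both subsets, which is why the two observation counts in the table sum to more than the pooled total.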
This analysis can complement qualitative evaluation of prevalence changes observed in surveillance data by providing quantitative evidence, such as when a change point occurred. A limitation, however, is that it cannot explain why or how changes in trends occur.