Data Analysis Plan

Summary of Study Objective

The objective of this study is to predict the number of times a person has had contact with police in the past 30 days from a set of predictor variables. The study is longitudinal, with 4 measurement occasions per person, and the outcome is a count of police contacts. Because the study is observational and involves no intervention, the analysis will focus on how the predictors are associated with the count of police contacts over time, rather than on causal effects.

Data Cleaning Steps

  1. Handling Missing Data:
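    • Check for missing values in the outcome variable and predictors, both overall and by measurement occasion.
    • Mixed-effects models fitted by maximum likelihood use every observed occasion, so participants with some missing outcomes can be retained (valid if the data are missing at random). Consider multiple imputation for missing predictors, and state in advance the threshold for excluding cases with excessive missing data.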
  2. Outlier Detection:
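    • Identify and examine outliers in the count variable and predictors (e.g., implausibly high counts or values outside a predictor's valid range).
    • Correct or remove values confirmed to be data-entry errors. Because large counts are expected in a right-skewed count outcome and may be genuine, prefer a distribution that accommodates them (e.g., negative binomial) over transforming or excluding the outcome.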
  3. Data Transformation:
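    • Ensure that predictors are in the correct format (e.g., categorical predictors are coded as factor variables).
    • Center and scale continuous predictors where needed; this eases interpretation of the intercept and can help the GLMM fitting algorithm converge.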
  4. Check for Consistency:
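    • Verify that data entries are consistent across measurement occasions (e.g., time-invariant variables such as sex or date of birth do not change within a person, and each person has at most one record per occasion).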
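The checks above can be sketched in base R as follows. This is a minimal illustration rather than a prescribed procedure: it assumes the data are in long format (one row per person per occasion), and the column names `id`, `occasion`, `count`, and `sex` (standing in for any time-invariant variable), the helper functions, and the toy data frame are all hypothetical.

```r
# Minimal data-cleaning checks for long-format panel data (base R only).
# Column names (id, occasion, count, sex) are hypothetical placeholders.

# Number of missing values in each column
missing_summary <- function(df) {
  colSums(is.na(df))
}

# Number of occasions with a non-missing outcome, per person
occasions_observed <- function(df, outcome = "count", id = "id") {
  tapply(!is.na(df[[outcome]]), df[[id]], sum)
}

# IDs whose time-invariant variable takes more than one value across occasions
inconsistent_ids <- function(df, var, id = "id") {
  n_distinct <- tapply(df[[var]], df[[id]],
                       function(x) length(unique(na.omit(x))))
  names(n_distinct)[n_distinct > 1]
}

# Row indices where the count is not a non-negative whole number
invalid_counts <- function(x) {
  which(!is.na(x) & (x < 0 | x != round(x)))
}

# Hypothetical toy data: 3 people x 4 occasions
toy <- data.frame(
  id       = rep(1:3, each = 4),
  occasion = rep(1:4, times = 3),
  count    = c(0, 2, NA, 1,  5, 3, 4, 6,  0, 0, -1, 1),
  sex      = c(rep("F", 4), rep("M", 4), "F", "F", "M", "F")
)

missing_summary(toy)          # one NA in count
occasions_observed(toy)       # person 1 has 3 observed occasions
inconsistent_ids(toy, "sex")  # person 3 changes sex code
invalid_counts(toy$count)     # row 11 holds a negative count
```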

General Procedures and Best Practices

  1. Data Exploration:
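    • Perform exploratory data analysis (EDA) to understand the distribution of the count variable (including the proportion of zeros) and of the predictors.
    • Plot counts over time (e.g., the mean count per occasion and individual trajectories) to check for trends or patterns.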
  2. Model Checking:
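    • Assess model fit using diagnostic plots (e.g., residuals against fitted values) and statistical tests.
    • Check for overdispersion (variance larger than the Poisson model implies), e.g., by comparing the sum of squared Pearson residuals with the residual degrees of freedom; a ratio well above 1 favors a negative binomial model.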
  3. Reporting Results:
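    • Report estimates, confidence intervals, and p-values for predictor effects, exponentiating coefficients so that effects are expressed as rate ratios (multiplicative changes in the expected count).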
    • Include visualizations (e.g., predicted counts over time) to illustrate findings.
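A minimal base-R sketch of the exploration and overdispersion checks above. The function names, column roles, and toy values are hypothetical. A variance-to-mean ratio above 1 in the raw counts is only a rough screen, because part of that extra variance may reflect stable differences between people that a person-level random effect would absorb; the Pearson-based ratio on a fitted model is the more relevant check.

```r
# Per-occasion summary of a count outcome (base R only).
# 'count' and 'occasion' are vectors of equal length (hypothetical inputs).
summarise_counts <- function(count, occasion) {
  stats <- lapply(split(count, occasion), function(y) {
    y <- y[!is.na(y)]
    c(n = length(y), mean = mean(y), var = var(y),
      prop_zero = mean(y == 0), var_mean_ratio = var(y) / mean(y))
  })
  do.call(rbind, stats)  # one row per occasion
}

# Pearson dispersion statistic for a fitted model:
# sum of squared Pearson residuals divided by residual degrees of freedom.
dispersion_ratio <- function(pearson_resid, df_resid) {
  sum(pearson_resid^2) / df_resid
}

# Hypothetical toy data: 4 people observed at 2 occasions
example_counts   <- c(0, 1, 0, 3,  0, 2, 1, 5)
example_occasion <- rep(1:2, each = 4)
summarise_counts(example_counts, example_occasion)
```

With a model fitted in lme4, the inputs could be `residuals(poisson_model, type = "pearson")` and `df.residual(poisson_model)`; treat the result as approximate, since residual degrees of freedom are not sharply defined for mixed models.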

Statistical Approach

Given that the data is longitudinal with repeated measures and the outcome variable is a count, a suitable approach is a generalized linear mixed-effects model (GLMM) with a Poisson or negative binomial distribution.

Model Formula

For a Poisson GLMM, the model can be specified as:

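\[ \log(\text{Expected Count}_{ij}) = \beta_0 + \beta_1 \text{Predictor}_{ij} + \beta_2 \text{Time}_{ij} + u_{i}, \qquad u_{i} \sim N(0, \sigma_u^2) \]

where:

  • \(\text{Expected Count}_{ij}\) is the expected count of police contacts for individual \(i\) at measurement occasion \(j\) (\(j = 1, \dots, 4\)).
  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the coefficient for the predictor variable; \(e^{\beta_1}\) is the rate ratio for a one-unit increase in the predictor.
  • \(\beta_2\) is the change in the log expected count per measurement occasion (occasion can instead be entered as a factor if the trend is not linear).
  • \(u_{i}\) is the random intercept for individual \(i\), accounting for the correlation among repeated measurements on the same person. Measurement occasion enters through the fixed time effect instead, since 4 occasions are too few to estimate an occasion-level variance reliably.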
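To illustrate why repeated counts from the same person are correlated, the sketch below simulates data from a Poisson model with a person-level random intercept \(u_i\) and then correlates the counts from two occasions. The function name, parameter values, sample size, and column names are arbitrary assumptions for illustration only; setting `sigma_u = 0` removes the person effect, and the correlation should then be near zero.

```r
# Simulate counts from a Poisson random-intercept model (base R only):
#   log(mu_ij) = beta0 + beta1 * x_ij + u_i,   u_i ~ N(0, sigma_u^2)
# All parameter values are arbitrary, chosen only for illustration.
simulate_poisson_glmm <- function(n_people = 500, n_occasions = 4,
                                  beta0 = -0.5, beta1 = 0.4, sigma_u = 1,
                                  seed = 1) {
  set.seed(seed)  # note: resets the global random number stream
  id       <- rep(seq_len(n_people), each = n_occasions)
  occasion <- rep(seq_len(n_occasions), times = n_people)
  x  <- rnorm(n_people * n_occasions)           # a time-varying predictor
  u  <- rnorm(n_people, mean = 0, sd = sigma_u) # one random intercept per person
  mu <- exp(beta0 + beta1 * x + u[id])          # expected count per row
  data.frame(id = id, occasion = occasion, x = x,
             count = rpois(length(mu), mu))
}

sim <- simulate_poisson_glmm()

# Reshape to wide format (columns count.1 ... count.4) and correlate two occasions
wide <- reshape(sim[, c("id", "occasion", "count")], idvar = "id",
                timevar = "occasion", direction = "wide")
cor(wide$count.1, wide$count.2)  # expected to be positive: both share u_i
```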

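For a negative binomial GLMM, the linear predictor is the same, but an additional dispersion parameter \(\theta\) lets the variance exceed the mean: conditional on the random effect, \(\text{Var}(Y_{ij}) = \mu_{ij} + \mu_{ij}^2/\theta\), where \(Y_{ij}\) is the observed count and \(\mu_{ij}\) its expected value, whereas the Poisson model forces \(\text{Var}(Y_{ij}) = \mu_{ij}\). As \(\theta \to \infty\), the negative binomial model reduces to the Poisson model.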
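The difference between the two variance functions can be checked by simulation. This sketch assumes an expected count of 2 and \(\theta = 1.5\) (both arbitrary) and relies on R's `rnbinom()` with `size` and `mu` arguments using the same parameterization, with variance `mu + mu^2/size`; `nb_variance()` is a hypothetical helper.

```r
# Compare Poisson and negative binomial counts with the same mean (base R only).
# mu and theta are illustrative values, not estimates from this study.
nb_variance <- function(mu, theta) mu + mu^2 / theta

set.seed(2)
mu    <- 2
theta <- 1.5
y_pois <- rpois(1e5, lambda = mu)
y_nb   <- rnbinom(1e5, size = theta, mu = mu)  # 'size' plays the role of theta

# Both means should be near 2; the Poisson variance near 2,
# the negative binomial variance near nb_variance(2, 1.5) = 2 + 4/1.5
c(mean_pois = mean(y_pois), var_pois = var(y_pois),
  mean_nb = mean(y_nb), var_nb = var(y_nb),
  theoretical_nb = nb_variance(mu, theta))
```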

Example R Code

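# Load the required package (glmer() and glmer.nb() are both provided by lme4)
library(lme4)

# Data must be in long format: one row per person per measurement occasion.
# 'your_data', 'count', 'predictor1', 'predictor2', 'time', and 'id' are placeholder names.

# Fit a Poisson GLMM with a random intercept for each individual
poisson_model <- glmer(count ~ predictor1 + predictor2 + time + (1 | id),
                       family = poisson, data = your_data)

# If overdispersion is a concern, fit a negative binomial GLMM
negbin_model <- glmer.nb(count ~ predictor1 + predictor2 + time + (1 | id),
                         data = your_data)

# Summaries of the fitted models
summary(poisson_model)
summary(negbin_model)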
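Because the model works on the log scale, reported effects are easier to read as rate ratios. The helper below is a hypothetical sketch, not part of lme4: it converts log-scale estimates and standard errors into rate ratios with Wald confidence intervals and two-sided p-values. The numbers in the example are made up for illustration.

```r
# Convert log-scale fixed effects to rate ratios with Wald confidence intervals.
# 'est' and 'se' are log-scale estimates and standard errors (base R only).
rate_ratio_table <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # 1.96 for a 95% interval
  data.frame(
    rate_ratio = exp(est),
    lower      = exp(est - z * se),
    upper      = exp(est + z * se),
    p_value    = 2 * pnorm(-abs(est / se)),  # two-sided Wald z-test
    row.names  = names(est)
  )
}

# Illustrative numbers only (not results from this study)
rate_ratio_table(est = c(predictor1 = log(1.5)), se = c(predictor1 = 0.1))
```

For a fitted model, the inputs could be `fixef(negbin_model)` and `sqrt(diag(as.matrix(vcov(negbin_model))))`. Wald intervals are a fast default; profile-likelihood intervals from `confint()` are slower but can be more accurate.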