Quiz 3: Analyzing 911 Response Data Using Regression
Attribute table, Shape files
“Everything is related to everything else, but near things are more related than distant things.” (Tobler, 1970)
Geographically Weighted Regression (GWR) is a local spatial regression method that allows the relationships you are modeling to vary across the study area.
To demonstrate how the regression tools work, we will analyze 911 emergency call data for a portion of the Portland, Oregon metropolitan area.
Suppose we have a community that is spending a large portion of its public resources responding to 911 emergency calls. Projections are telling them that their community’s population is going to double in size over the next 10 years. If they can better understand some of the factors contributing to high call volumes now, perhaps they can implement strategies to help reduce 911 calls in the future.
Step 1 - Getting started
- We will import ArcMap map documents (.mxd files) that have been saved locally into ArcGIS Pro.
- In this map document you will notice several data frames containing layers of data (shapefiles) for the Portland, Oregon metropolitan study area:
  - Stations
  - 911Calls
  - Streets
  - ObsData911calls
  - frame
  - CallHotSpots
- Once these are imported and activated, we can visualize them on the map below.
- In the map, each point represents a single call into a 911 emergency call center.
- This is real data representing over 2,000 calls.
- By selecting layers individually, we can also see what is happening on each layer. For instance, if we turn on the frame and Stations layers, we get the view below showing exactly where the stations are located:
Step 2 - Examining Hotspot Analysis
- If we deactivate the Streets layer, we can clearly see the distribution of the 911 calls, as shown below.
- Focusing on the CallHotSpots layer and examining the data frame gives a clear picture of the observed hot spots.
Running the hot spot analysis
- Click the Analysis tab.
- From there, select Geoprocessing.
- Search for **Hot Spot Analysis (Getis-Ord Gi*)**.
- On the Parameters tab, set:
  - CallHotSpots as the input features
  - ICOUNT as the input field
- Maintain the other fields as they are.
The results
- After running the analysis, we get the output map shown above.
- Areas with high call volumes are shown in red (hot spots); areas getting very few calls are shown in blue (cold spots).
- The green crosses are the existing locations for the police and fire units tasked with responding to these 911 calls.
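To make the red and blue classes less of a black box, here is a minimal sketch of the Getis-Ord Gi* statistic that the tool reports as a z-score for each feature. It assumes binary weights within a fixed distance band; the 3 x 3 grid, the counts, and the function name `getis_ord_gi_star` are hypothetical and are not taken from the CallHotSpots layer or from the ArcGIS implementation.

```python
import math

def getis_ord_gi_star(values, coords, band):
    """Return a Gi* z-score for every feature, using a fixed distance band."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in coords:
        # Binary weights: 1 for every feature within the band, including i itself.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator else 0.0)
    return scores

# Hypothetical 3 x 3 grid of call counts; high counts cluster around (0, 0).
coords = [(x, y) for y in range(3) for x in range(3)]
counts = [9, 8, 1, 8, 7, 1, 1, 1, 0]
z = getis_ord_gi_star(counts, coords, band=1.0)

for (x, y), score in zip(coords, z):
    print(f"cell ({x}, {y}): Gi* z-score = {score:.2f}")
```

A large positive z-score marks a feature surrounded by high values (a hot spot), and a large negative z-score marks a cold spot; values near zero show no significant clustering.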
Step 3: Running a Simple Ordinary Least Squares Regression
What are the factors that contribute to high volumes of 911 calls? To help answer this question, we’ll use the regression tools in ArcGIS.
- Here we will focus on ObsData911Calls. If we look at its attribute table, we notice that it has an outcome variable, Calls, and some potential predictor variables.
- Go to the Analysis tab.
- Select Geoprocessing.
- Search for the Ordinary Least Squares (OLS) tool.
- Select ObsData911Calls as the Input Feature Class.
- Select UniqID from ObsData911Calls as the Unique ID Field.
- Select Calls as the Dependent Variable.
- For the simple model, select only Pop as the explanatory variable.
- Specify the folder in which to save the Output Report File.
- The default OLS output is a map showing how well the model performed using only the population variable to explain 911 call volumes.
- The red areas are under predictions (where the actual number of calls is higher than the model predicted); the blue areas are over predictions (actual call volumes are lower than predicted).
- The Adjusted R-Squared value is 0.393460, or 39%. This indicates that, using population alone, the model explains 39% of the call volume story.
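The quantities reported above can be sketched in a few lines. The example below fits a one-variable least squares line and computes R-squared and Adjusted R-squared; the population and call values are hypothetical, so the result will not match the 0.393460 reported by the tool.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual - predicted; positive means an under prediction.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of explanatory variables
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return slope, intercept, r2, adj_r2, residuals

# Hypothetical tract populations and call counts, for illustration only.
pop = [1200, 2500, 3100, 4000, 5200, 6100]
calls = [10, 30, 22, 45, 38, 60]
slope, intercept, r2, adj_r2, residuals = simple_ols(pop, calls)

print(f"slope={slope:.4f}, R2={r2:.3f}, adjusted R2={adj_r2:.3f}")
```

Adjusted R-squared is always a little lower than R-squared because it penalizes the number of explanatory variables, which is why it is the better figure for comparing the simple model with the multiple regression model later on.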
Step 4: Finding key variables
Scatterplot matrix
We will experiment with the scatter plot matrix graph to explore the relationships between call volumes and other candidate explanatory variables.
- Open the Chart Properties pane
- Select three or more numeric fields for analysis
- Configure the layout:
- Lower left: Display scatterplots, R² values, or Pearson’s r
- Upper right: Show scatterplots, R² values, or leave blank
- Diagonal: Add histograms or field names
- Optionally, add trend lines (linear or nonlinear) to each scatter plot
- Customize axes, sorting, and appearance as needed
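The Pearson’s r values that the scatter plot matrix can display are straightforward to reproduce. The sketch below computes r for every pair of fields; the field values are hypothetical and only the field names Calls, Pop, and Jobs come from this exercise.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for three numeric fields.
fields = {
    "Calls": [10, 30, 22, 45, 38, 60],
    "Pop": [1200, 2500, 3100, 4000, 5200, 6100],
    "Jobs": [300, 150, 900, 400, 1200, 800],
}
names = list(fields)
r_matrix = {(a, b): pearson_r(fields[a], fields[b]) for a in names for b in names}

# Print the matrix row by row, one row per field.
for a in names:
    print(a.ljust(6), "  ".join(f"{r_matrix[(a, b)]:6.3f}" for b in names))
```

Candidate explanatory variables with a strong correlation to Calls are worth trying in the model, while two explanatory variables that correlate strongly with each other may turn out to be redundant.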
Step 5 - A properly specified multiple Regression model
Dependent and Explanatory variables should be numeric fields containing a variety of values. OLS cannot solve when variables have the same value (all the values for a field are 9.0, for example). Linear regression methods, such as OLS, are not appropriate for predicting binary outcomes (for example, all of the values for the dependent variable are either 1 or 0).
The Unique ID field links model predictions to each feature. Consequently, the Unique ID values must be unique for every feature, and typically should be a permanent field that remains with the feature class. If you don’t have a Unique ID field, you can create one by adding a new integer field to your feature class table and calculating the field values to be equal to the FID/OID field.
The functionality of the OLS tool is also included in the Generalized Linear Regression tool.
- Go to the Analysis tab.
- Select Geoprocessing.
- Search for the Ordinary Least Squares (OLS) tool.
- Select ObsData911Calls as the Input Feature Class.
- Select UniqID from ObsData911Calls as the Unique ID Field.
- Select Calls as the Dependent Variable.
- For the multiple regression model, select several explanatory variables: Pop, Jobs, LowEduc, and Dst2UrbCen.
- Specify the folder in which to save the Output Report File.
Results and Interpretations
Step 6: The 6 things you gotta check!
1. Regression Coefficients
- A positive coefficient means the relationship is positive; a negative coefficient means the relationship is negative.
- The coefficient for the Pop variable is positive. This means that as the number of people goes up, the number of 911 calls also goes up. A positive coefficient is what we expect.
- If the coefficient for the population variable were negative, we would not trust our model.
2. Variance Inflation Factor
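- The VIF values for all variables are low, indicating no substantial multicollinearity (redundancy) among the explanatory variables. As a rule of thumb from the ArcGIS documentation, values larger than about 7.5 signal redundancy.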
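As an illustration of what the VIF measures, the sketch below uses the simplest case of two explanatory variables, where each variable's VIF equals 1 / (1 - R²) and R² comes from regressing one variable on the other. The variable names and values are hypothetical; with more than two explanatory variables, R² would come from regressing each variable on all the others.

```python
def r_squared(x, y):
    """R-squared from regressing y on the single variable x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def vif_pair(x1, x2):
    """VIF shared by two explanatory variables: 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r_squared(x1, x2))

# Hypothetical explanatory variables.
pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
households = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]  # nearly a copy of pop
distance = [5.0, 2.0, 1.0, 1.0, 2.0, 5.0]    # uncorrelated with pop

vif_redundant = vif_pair(pop, households)
vif_independent = vif_pair(pop, distance)

print(f"pop vs households: VIF = {vif_redundant:.1f}")
print(f"pop vs distance:   VIF = {vif_independent:.1f}")
```

A VIF close to 1 means the variable tells a story the other variables do not; a very large VIF means two or more variables are telling the same story and one of them should be dropped.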
3. Statistical significance
- Probability and Robust Probability measure coefficient statistical significance. An asterisk next to the probability tells you the coefficient is significant.
- If a variable is not significant, it is not helping the model, and unless theory tells us that the variable is critical, we should remove it.
- When the Koenker (BP) statistic is statistically significant, you can only trust the Robust Probability column to determine if a coefficient is significant or not.
- Small probabilities are “better” (more significant) than large probabilities.
4. Jarque-Bera test
- The Jarque-Bera test measures whether or not the residuals from a regression model are normally distributed.
- When it IS statistically significant, your model is biased.
- The Jarque-Bera test is NOT statistically significant here, which is consistent with normally distributed residuals.
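For readers curious about what the test computes, here is a minimal sketch of the Jarque-Bera statistic, which combines the skewness and kurtosis of the residuals. The two residual lists are hypothetical, and the p-value uses the large-sample chi-squared approximation with 2 degrees of freedom, so it is only indicative for samples this small.

```python
import math

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical residuals: one roughly symmetric set, one strongly skewed set.
symmetric = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
skewed = [-1, -1, -1, -1, -1, -1, -1, -1, -1, 9]

jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)

print(f"symmetric: JB = {jb_sym:.3f}, p = {p_sym:.3f}")
print(f"skewed:    JB = {jb_skew:.3f}, p = {p_skew:.6f}")
```

A small p-value (the skewed case) corresponds to a statistically significant Jarque-Bera test, the situation in which the model predictions are biased.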
5. Check Model performance
The Adjusted \(R^2\) value is much higher for this new model, \(0.831080\), indicating this model explains \(83\%\) of the 911 call volume story. This is a big improvement over the model that only used Population.
Step 7: Running GWR
- When the Koenker test is statistically significant, as it is here, it indicates that relationships between some or all of your explanatory variables and your dependent variable are non-stationary.
- This means, for example, that the population variable might be an important predictor of 911 call volumes in some locations of your study, but perhaps a weak predictor in other locations.
- Whenever you notice that the Koenker test is statistically significant, it indicates you will likely improve model results by moving to Geographically Weighted Regression.
Running the GWR model
- Go to Geoprocessing.
- Run the Geographically Weighted Regression tool with the following parameters:
  - Input feature class: ObsData911Calls
  - Dependent variable: Calls
  - Explanatory variables: Pop, Jobs, LowEduc, Dst2UrbCen
- Fill in the other fields as shown in the images below and run the model.
- Examine the attribute table as shown in the image below.
- Using the AICc method, the tool finds that calibrating each local regression equation with 50 neighbors yields optimal results (minimized bias and maximized model fit).
- Notice that the Adjusted \(R^2\) value is higher for GWR than it was for our best OLS model (OLS was 83%; GWR is about 86.6%). The AICc value is also lower for the GWR model, which indicates a better fit.
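The idea behind GWR, one regression per location with nearby features weighted more heavily, can be illustrated with a deliberately simplified sketch. It assumes a single explanatory variable and a fixed-bandwidth Gaussian kernel, whereas the tool run above uses four variables and a neighborhood of 50 features; the coordinates, values, and the function name `gwr_local_fits` are hypothetical.

```python
import math

def gwr_local_fits(coords, x, y, bandwidth):
    """Fit a distance-weighted simple regression at every location."""
    fits = []
    for ui, vi in coords:
        # Gaussian kernel: weights decay smoothly with distance from (ui, vi).
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))  # (local intercept, local slope)
    return fits

# Hypothetical transect of 10 locations: calls rise 1 per unit of population
# in the west (u < 5) and 3 per unit in the east, a non-stationary relationship.
coords = [(float(u), 0.0) for u in range(10)]
pop = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
calls = [p if u < 5 else 3 * p for u, p in enumerate(pop)]

fits = gwr_local_fits(coords, pop, calls, bandwidth=1.5)
for (u, _), (intercept, slope) in zip(coords, fits):
    print(f"u = {u:.0f}: local slope = {slope:.2f}")
```

A single global OLS line would average the two regimes; the local slopes recover a value near 1 at the western end and near 3 at the eastern end, which is the kind of spatial variation the GWR coefficient surfaces are meant to reveal.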
Spatial Autocorrelation
Run the Spatial Autocorrelation tool on the Standardized Residuals in the Output Feature Class.
- The residuals are randomly distributed, which indicates that we have a properly specified model.
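The index behind this check is Global Moran’s I. The sketch below computes only the index for a hypothetical chain of six features whose immediate neighbors share a weight of 1; the tool additionally reports a z-score and p-value, which are omitted here.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical chain of 6 features; each feature neighbors the one before and after it.
n = 6
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

clustered = [2, 2, 1, -1, -2, -2]      # similar residuals sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always have opposite signs

i_clustered = morans_i(clustered, weights)
i_alternating = morans_i(alternating, weights)

print(f"clustered residuals:   I = {i_clustered:.3f}")
print(f"alternating residuals: I = {i_alternating:.3f}")
```

Under spatial randomness the expected value is -1 / (n - 1), close to zero for large datasets. Clustered residuals (a clearly positive index) would suggest that a key explanatory variable is missing from the model.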
Step 8: GWR Predictions
GWR may also be used to predict values for a future time or for locations within the study area where you have X values, but don’t know what the Y values are. In this next step we will explore using GWR to predict future 911 call volumes.
Double-click the GWR tool and ensure that the parameters are set as shown in the image below. Notice that the model is calibrated using the variables we’ve been using all along, but that the explanatory variables for the predictions are new. The new variables represent projected population, job, and education variables for some time in the future.
We run the model. When it finishes, we can toggle on and off the layers representing the actual 911 call data (ObsData911Calls), the model predictions for the current year (GWRPredictionsCY), and the future 911 call volume predictions (GWRPredictionsFY) to observe the differences.
We began by applying Ordinary Least Squares (OLS) regression to test whether population alone could explain variation in 911 emergency call volumes. Using the scatterplot matrix, we explored additional explanatory variables that could strengthen the model. OLS diagnostics allowed us to assess whether the model was properly specified. The Koenker test indicated statistically significant non-stationarity, suggesting that relationships between variables varied across space. This led us to use Geographically Weighted Regression (GWR) to improve the model’s performance.
Through this analysis, we gained several important insights for the community:
Using Hot Spot Analysis, we evaluated how well fire and police units are currently located in relation to 911 call demand.
With OLS, we identified the main factors driving call volumes. Where those factors pointed to potential interventions or policy actions, GWR helped us identify the neighborhoods where such efforts would likely be most effective.
Finally, GWR allowed us to predict future 911 call volumes, enabling us to anticipate demand and providing a benchmark for evaluating the effectiveness of interventions once they are implemented.