Data Science Now Challenge

Annual Donor Retention Scoring

By Joe Hennen & Paula Baingana

06-13-2022

Executive Summary

The organization’s retention rate is fairly consistent from year to year, in that it appears to retain approximately 76%-77% of its donors. Retained donors generate an average of $276m each year, and new or renewed donors generate $67m, for an average of approximately $340 million in gifts each year from FY17 to FY21.

This project sought to identify trends in donor retention and upgrade potential, as well as to determine a solicitation strategy for donors who have not yet given in FY22 that maximizes the impact and available budget of the solicitation.

In order to help answer these questions, we analyzed prospect biographical information at the entity and household levels, and we developed several metrics based on the available prospect giving data:

  • Gift Recency, Frequency, Monetary (RFM) Score - RFM scoring helped us identify the best donors by scoring their recency, frequency, and monetary amount of giving. Additionally, this was used as a primary segmentation strategy to determine the prospects that would be the best targets for high- and low-touch mailers.
  • Gift Velocity - This was applied to all prospects in order to identify trends in donor giving trajectories and to determine the prospects who may be more likely to upgrade their giving in FY22.
  • Predicted Gift Size - We calculated a predicted gift size for donors who had not yet made a gift in FY22, based on their giving trajectory and recent giving, in order to help determine the most cost-effective solicitation strategy.
  • Likelihood - We developed a score for prospects to determine the likelihood that they would make a gift in FY22, as well as to identify the biographical or giving-related characteristics that were most closely related to retention.

Our analysis suggested that giving-related factors were most closely associated with retention, and that biographical characteristics such as age, prospect type, or geography were less important. Based on this, our segmentation strategy was primarily giving-based, and we matched RFM and giving velocity scores with high- and low-touch strategies to develop target segments:

  1. High-touch mailers would be sent out to 1882 households. These are a mix of loyal, upgradeable, new, and lapsed donors who tend to make higher value donations.
  2. Low-touch mailers would be sent out to 259 households. These are loyal, upgradeable donors who tend to make lower value donations.

In order to maximize response rate and stay within budget, mailers were targeted towards annual prospects in the 50+ age range based in the United States. This resulted in a total solicitation cost of $4,964.

Note: Metrics are defined in the Appendix.


Background

You and your team work for a US-based nonprofit organization whose current fiscal year is coming to an end. Leadership would like your annual giving team to make one last push to retain donors who have not yet given this fiscal year, and has allocated a budget of $5,000 for this retention mailing. There are two options for the mailpieces:

  • A “standard” mailer which costs $1.00 per piece.
  • A “high-touch” mailer which costs $2.50 per piece.

Your challenge is to identify which donors to include in this retention effort and develop a strategy that you feel will make the most of your available budget. It is up to you to decide how many mailers of each type are sent out and to which donors they are sent.

Data Preparation and Analysis

The Dataset:

The challenge dataset was provided in three .CSV files containing the following information:

  1. Household data: Household ID, prospect “type”, staff manager, address, latitude/longitude, and a do not solicit indicator.
  2. Giving data: Household ID and FY17-22 total giving.
  3. Entity data: Individual ID, household ID, birthday, deceased indicator, capacity source, capacity, race, and job title.

Data Preparation & Tools

In order to prepare the dataset for analysis, we took the following steps:

  1. Joined the datasets based on household ID and removed duplicates, which resulted in approximately 100,000 entities and 60,000 households
  2. Removed all entities/households with “do not solicit” flags
  3. Removed deceased entities and dual-deceased households

This resulted in 67,738 entities and 43,682 active households to solicit donations from, which comprised our working dataset for this project.
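
The preparation steps above can be sketched in a few lines. This is a minimal Python illustration only: the actual preparation was done in SQL and R, and the field names (`household_id`, `entity_id`, `do_not_solicit`, `deceased`) and sample rows are hypothetical stand-ins for the challenge files.

```python
# Hypothetical miniature versions of the household and entity files.
households = [
    {"household_id": 1, "do_not_solicit": False},
    {"household_id": 2, "do_not_solicit": True},
    {"household_id": 3, "do_not_solicit": False},
]
entities = [
    {"entity_id": 10, "household_id": 1, "deceased": False},
    {"entity_id": 11, "household_id": 1, "deceased": True},
    {"entity_id": 12, "household_id": 2, "deceased": False},
    {"entity_id": 13, "household_id": 3, "deceased": True},
    {"entity_id": 14, "household_id": 3, "deceased": True},
    {"entity_id": 10, "household_id": 1, "deceased": False},  # duplicate row
]


def prepare(households, entities):
    """Join entities to households, then drop duplicates, do-not-solicit
    households, and deceased entities."""
    household_by_id = {h["household_id"]: h for h in households}
    seen_entity_ids = set()
    active_entities = []
    for entity in entities:
        if entity["entity_id"] in seen_entity_ids:
            continue  # step 1: skip duplicate rows
        seen_entity_ids.add(entity["entity_id"])
        household = household_by_id.get(entity["household_id"])
        if household is None or household["do_not_solicit"]:
            continue  # step 2: skip "do not solicit" households
        if entity["deceased"]:
            continue  # step 3: skip deceased entities
        active_entities.append({**household, **entity})
    # A household stays active only if at least one living member remains,
    # so dual-deceased households drop out automatically.
    active_households = sorted({e["household_id"] for e in active_entities})
    return active_entities, active_households


active_entities, active_households = prepare(households, entities)
print(len(active_entities), active_households)
```

In this toy data only entity 10 in household 1 survives all three filters.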
Additional considerations:

  • Mailability: While we included households without a mailable address at the analysis stage of this project in order to understand our constituent base, they are excluded at the point at which we determine which households to target with mailers, since they are not reachable through this method.
  • Prospect type details: We also considered excluding all prospects except for those classified as “annual” in type at this stage, as well as those assigned to a gift officer or those with giving over a certain amount that might make them fall under the assignment of a gift officer, since the focus of this mailer is annual giving. However, we decided to include them for analysis and focus on them at the final stage when targeting mailers.
  • Alignment of prospect capacity source: In some households, the “capacity source” for each individual varied. For example, one entity might have a screening capacity source and the other an institutional capacity source, or no capacity source. We assumed that the “institutional” capacity source would be an internal/researcher/gift officer assigned rating and matched capacity sources between members of the household in a hierarchy of institutional > screening > no source.
  • Calculated age fields: Additionally, we calculated the current ages of all entities based on the provided birthday. We wanted to add age as a field both to better understand our constituent base and to potentially aid in targeting mailers; older prospects may have a better response rate to a direct-mail campaign than younger ones. We also added a “max age” field for households that applied the older individual’s age to the household.

Once prepared, the data was loaded into an Oracle APEX database for ease of manipulation, access, and analysis.
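
As an illustration of the capacity-source hierarchy and the calculated age fields described above, here is a minimal Python sketch. The function names, the use of `None` for "no source", and the sample dates are assumptions made for the example, not the code used in the project.

```python
from datetime import date

# Assumed ranking: institutional > screening > no source (None).
SOURCE_RANK = {"institutional": 2, "screening": 1, None: 0}


def household_capacity_source(sources):
    """Return the highest-ranked capacity source among household members."""
    return max(sources, key=lambda source: SOURCE_RANK[source])


def age_on(birthday, as_of):
    """Age in whole years on the given date."""
    years = as_of.year - birthday.year
    # Subtract a year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def household_max_age(birthdays, as_of):
    """Apply the older individual's age to the whole household."""
    return max(age_on(birthday, as_of) for birthday in birthdays)


report_date = date(2022, 6, 13)
print(household_capacity_source(["screening", "institutional"]))
print(household_max_age([date(1960, 6, 14), date(1950, 1, 1)], report_date))
```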

Note: Since we have only received information on the households and entities as of their current status, i.e. in FY22, we have no sense of whether or not households have changed over the course of their available giving history (since FY17). For the purposes of this project, we assume that all households have remained the same in terms of all information provided on them over the course of their provided giving history.

Tools & Software Platforms

The following tools were used:

  • SQL, Oracle Application Express: Used in loading the files into local databases and preliminary data manipulation.
  • R, RStudio: This was the main platform for analysis, modelling, and reporting. The final report was written entirely in RMarkdown.
  • Excel: Secondary platform for analysis.

Exploratory Prospect Analysis

Approach
In order to understand our prospect base for the purposes of this project, we decided to focus our exploratory analysis at the entity (demographic) and household (financial) levels, and in the following areas:

  • Demographic information (calculated age, race/ethnicity, location)
  • Prospect information (prospect capacity, capacity source, prospect type)
  • Giving information (giving totals, averages, consecutive years)

Note: We considered creating a category for job codes that could be used in modeling. Given the wide variety of job types and limited time for this project, as well as the limited expected impact of job codes on likelihood, we did not standardize or analyze job fields.


1. Household & Entity Summaries


2. Giving Summaries

The organization’s retention rate is fairly consistent from year to year, in that it appears to retain approximately 76%-77% of its donors. Retained donors generate an average of $276m each year, and new or renewed donors generate $67m, for an average of approximately $340 million in gifts each year from FY17 to FY21.

Fiscal_year  Total_gifts    Number_of_gifts  Average_Giving  Annual_Retention
FY2017       $346,351,384   41,597           $8,326
FY2018       $346,160,876   41,538           $8,334          77%
FY2019       $336,184,202   41,558           $8,090          77%
FY2020       $345,175,823   41,413           $8,335          76%
FY2021       $342,471,123   41,549           $8,243          77%
FY2022       $104,694,326   12,446           $8,412          23%
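
For readers who want to reproduce a retention figure like those in the table, the following Python sketch shows one common definition: the share of the prior year's donor households that gave again in the current year. This definition and the toy data are assumptions for illustration; the report's own figures were produced in SQL and R.

```python
def retention_rate(giving, prior_fy, current_fy):
    """Share of prior-year donor households that gave again in the current year.

    `giving` maps a household ID to a dict of fiscal year -> total giving.
    Returns None when there were no donors in the prior year.
    """
    prior_donors = {hh for hh, gifts in giving.items() if gifts.get(prior_fy, 0) > 0}
    if not prior_donors:
        return None
    retained = {hh for hh in prior_donors if giving[hh].get(current_fy, 0) > 0}
    return len(retained) / len(prior_donors)


# Hypothetical giving history for five households.
toy_giving = {
    "H1": {"FY17": 100, "FY18": 120},
    "H2": {"FY17": 50, "FY18": 50},
    "H3": {"FY17": 500},               # gave in FY17 only (not retained)
    "H4": {"FY17": 25, "FY18": 30},
    "H5": {"FY18": 1000},              # new donor in FY18
}
print(retention_rate(toy_giving, "FY17", "FY18"))
```

Here three of the four FY17 donors gave again in FY18, so the function returns 0.75; the new FY18 donor does not affect the rate.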

Segmentation and Scoring

To understand prospect giving behavior and retention/likelihood, we developed an RFM (Recency-Frequency-Monetary) score and applied this to the households who had not yet made a gift in FY22. We used RFM scoring because this is an industry standard method for segmentation related to direct marketing and customer retention.

Additionally, we developed a giving velocity score for all prospects. Velocity is a useful metric that helps us understand past giving patterns and a possible future trajectory. From velocity, we can tell whether there is increased giving, reduced giving, or same-rate giving over a time period. Additionally, this can help us identify the prospects that have the best chance of being retained or upgraded. We calculated velocity as the average giving of last three years divided by average giving of last five years.

We then developed a predictive/likelihood score using the FY22 givers as a sample group, with all primary household-level demographic variables and our secondary giving-related factors (R/F/M, giving velocity) as variables. The variables that had the strongest effect were gift frequency, giving in the prior fiscal year, and having a gift velocity score greater than 1. Demographic variables had little to no impact.

Mailing Segments

The following segments were created to determine the type of mailer a household would receive. The segments were built using the velocity (V) and RFM scores.

Segment     Description
High-touch  • First-time, larger donors
            • Upgradeable, higher value donors, V > 1
            • Lapsed one year high value donors
            • New donors, higher value
Low-touch   • Loyal givers, lower value, retainable
No-touch    • Lapsed 2+ years
            • Never givers
            • All others

RFM Scoring

To develop the RFM scores, we used the following scales:

Recency: (1-5 scale)

  • Based on the last fiscal year in which they made a gift, e.g. a Recency score of 5 means they last donated in FY21. Likewise a 1 means they last donated in FY17 or have not given before.

Frequency: (1-5 scale)

  • Based on the number of fiscal years in which they made a gift from FY17 to FY21.

Monetary: (1-5 scale)

  • A quintile model based on the sum of all household gifts made from FY17 to FY21. This means that the population’s giving was distributed into 5 equal groups, with the bottom 20% of the population having a score of 1 and the top 20% of the population having a score of 5.
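
The three scales can be sketched as follows. This is a minimal Python illustration under stated assumptions: households with no giving are clamped to a Frequency of 1, and the Monetary quintile is assigned by rank so that each score covers roughly 20% of households. The helper names and toy data are hypothetical.

```python
YEARS = ["FY17", "FY18", "FY19", "FY20", "FY21"]


def recency_score(gifts):
    """5 if the last gift was in FY21, down to 1 for FY17 or never."""
    for score, fy in zip(range(5, 0, -1), reversed(YEARS)):
        if gifts.get(fy, 0) > 0:
            return score
    return 1


def frequency_score(gifts):
    """Number of fiscal years with a gift, clamped to at least 1 (assumption)."""
    return max(1, sum(1 for fy in YEARS if gifts.get(fy, 0) > 0))


def monetary_scores(giving):
    """Rank-based quintiles of total FY17-FY21 giving: 1 = bottom 20%, 5 = top 20%."""
    totals = {hh: sum(g.get(fy, 0) for fy in YEARS) for hh, g in giving.items()}
    ranked = sorted(totals, key=totals.get)
    n = len(ranked)
    return {hh: 1 + (i * 5) // n for i, hh in enumerate(ranked)}


def rfm_codes(giving):
    """Concatenate the three scores into a code such as '555'."""
    monetary = monetary_scores(giving)
    return {
        hh: f"{recency_score(g)}{frequency_score(g)}{monetary[hh]}"
        for hh, g in giving.items()
    }


# Hypothetical households.
toy_giving = {
    "A": {fy: 100 for fy in YEARS},   # gives every year
    "B": {"FY21": 1000},              # one large, recent gift
    "C": {"FY17": 50},                # one small, old gift
    "D": {"FY19": 200, "FY20": 200},
    "E": {},                          # never gave
}
codes = rfm_codes(toy_giving)
print(codes)
```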

RFM Segmentation Logic

  1. Best donors/“most engaged”: RFM of 555.
    (Recency = 5, Frequency = 5, and Monetary = 5). These donors have given the most recently, the most frequently, and the highest amounts.

  2. Loyal Upgrade Targets, Higher Value: RFM of 554, 553, 545, 543. These are donors who have given most recently and give frequently at medium amounts.

  3. New, high value donors: RFM of 535, 534, 533, 415.
    These are newer, promising donors who have given only a few times but very recently, and they contributed larger gifts.

  4. Low value, loyal donors: RFM of 551, 552, 542, 541.
    The donors who have given recently and frequently but have not made large gifts.

  5. Lapsed 1 year, high value: RFM of 445, 444, 443.
    Donors who give frequently and in large amounts, but it has been one year since their last gift.

Note: Not all possible RFM combinations were seen in this dataset. The groups were therefore based on the available combinations.
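
The segmentation logic above amounts to a lookup from RFM code to segment. The Python sketch below uses the code lists exactly as given in this section; the dictionary and function names are hypothetical, and any code not listed falls into "All others".

```python
# Segment -> RFM codes, copied from the segmentation logic above.
SEGMENT_CODES = {
    "Best donors": ["555"],
    "Loyal upgrade targets, higher value": ["554", "553", "545", "543"],
    "New, high value donors": ["535", "534", "533", "415"],
    "Low value, loyal donors": ["551", "552", "542", "541"],
    "Lapsed 1 year, high value": ["445", "444", "443"],
}

# Invert to a code -> segment lookup.
CODE_TO_SEGMENT = {
    code: segment for segment, codes in SEGMENT_CODES.items() for code in codes
}


def segment_for(rfm_code):
    """Return the segment for an RFM code, or 'All others' if it is not listed."""
    return CODE_TO_SEGMENT.get(rfm_code, "All others")


for code in ["555", "415", "542", "444", "111"]:
    print(code, "->", segment_for(code))
```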

 

RFM Score Distribution - A Multifaceted Heat Map Representation


 

Velocity & Upgraded Giving

For our purposes, we calculated giving velocity at the household level as the average amount donated in the last three years divided by average amount donated in the last 5 years.
This formula gave us the most robust velocity that was consistent with giving patterns.  

\[ Velocity =\frac{Average(FY2021,FY2020,FY2019)}{Average(FY2021,FY2020,FY2019,FY2018,FY2017)} \]  

A higher velocity is indicative of increased giving over the years, while a low velocity indicates that giving totals are generally shrinking year over year.
Household Velocity categories were defined as follows:

  • V < 1 : ‘At Risk’
  • V = 1 : ‘Steady’
  • V > 1 : ‘Rising Star’
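
A minimal Python sketch of the velocity formula and the three categories follows. Two details are assumptions made for the example: households with no giving in FY17-FY21 have an undefined velocity (returned as `None` and labelled "No giving"), and "V = 1" is tested with a small floating-point tolerance.

```python
import math

RECENT_YEARS = ["FY19", "FY20", "FY21"]
ALL_YEARS = ["FY17", "FY18", "FY19", "FY20", "FY21"]


def velocity(gifts):
    """Average giving over the last three years divided by the five-year average."""
    five_year_avg = sum(gifts.get(fy, 0) for fy in ALL_YEARS) / len(ALL_YEARS)
    if five_year_avg == 0:
        return None  # assumption: velocity is undefined for never-givers
    three_year_avg = sum(gifts.get(fy, 0) for fy in RECENT_YEARS) / len(RECENT_YEARS)
    return three_year_avg / five_year_avg


def velocity_category(v):
    """Map a velocity to the household categories defined above."""
    if v is None:
        return "No giving"  # hypothetical label, not part of the report
    if math.isclose(v, 1.0):
        return "Steady"
    return "Rising Star" if v > 1 else "At Risk"


steady = {fy: 100 for fy in ALL_YEARS}
new_donor = {"FY21": 500}
lapsed = {"FY17": 500}
for gifts in (steady, new_donor, lapsed, {}):
    v = velocity(gifts)
    print(v, velocity_category(v))
```

Note that with this formula the velocity is bounded between 0 (all giving in FY17-FY18) and 5/3 (all giving in FY19-FY21).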

Below is a chart showing how velocity relates to donations in 2022.


Likelihood Using Logistic Regression

A likelihood score was developed, which predicted who is most likely to give in FY22. To create this score, a logistic regression model was built using the following features:

  • Capacity
  • Gift Frequency
  • Gift Size (Monetary score)
  • Location (US based or international)
  • Maximum age in Household (living persons)
  • Donated in previous year (yes/no)
  • Gift Velocity

The result was a model with greater than 70% accuracy, in which the following features were the most important in determining who is most likely to give in FY2022:

  • Gift frequency
  • Donated in previous year
  • Gift velocity score of V>1

This was critical in assessing how we used demographic data. Since demographic data had almost no impact on likelihood, we focused more on giving history.
Likelihood also helped provide more information on our expected returns.
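
To make the likelihood score concrete, here is a small, self-contained Python sketch of logistic regression fitted by batch gradient descent. It is not the model from this project, which was built in R on the full feature list above. The three features (scaled gift frequency, gave last year, velocity above 1), the toy training rows, and the learning-rate settings are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n_rows, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            error = sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / n_rows
    return weights, intercept


def likelihood(weights, intercept, row):
    """Predicted probability (0 to 1) that the household gives."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))


# Hypothetical rows: [frequency / 5, gave last year (0/1), velocity > 1 (0/1)]
X = [
    [1.0, 1, 1], [1.0, 1, 0], [0.8, 1, 1], [0.6, 1, 0],
    [0.6, 0, 1], [0.4, 0, 0], [0.2, 0, 0], [0.2, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = gave in the target year

weights, intercept = fit_logistic(X, y)
loyal = likelihood(weights, intercept, [1.0, 1, 1])
lapsed = likelihood(weights, intercept, [0.2, 0, 0])
print(round(loyal, 2), round(lapsed, 2))
```

On this toy data the frequent, recent donor receives a much higher score than the infrequent, lapsed one, which mirrors the pattern described above.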

Results & Recommendations

Segmentation Results

We matched relevant/possible RFM scores to the key segments. We filtered out households that did not have a mailing address on file, resulting in approximately 33,000 eligible households (defined as those that did not yet give in FY22, that are not deceased, that are mailable, and excluding “do not solicits”).

We chose the segments most likely to give in order to minimize potential waste and maximize response rate. Our segments had a much higher average likelihood score than the remaining prospects, with the exception of the New High Value segment, which only had ~1 year of giving data. Donors in that segment are still considered very high value, so they were retained in the mailer plan.
 
This resulted in the following key segment counts:

Segment                                    Count  RFM_Scores               Average_Likelihood
Best donors                                2940   555                      97%
Loyal upgrade targets, higher value (V>1)  1164   554, 553, 545, 544, 543  96%
New donors, higher value                   130    535, 534, 533, 415       4%
Loyal, retention, lower value (V>1)        591    552, 551, 542, 541       96%
Lapsed one year high value                 1110   445, 444, 443            97%
All other                                  26889  all other                61%

Cost-Benefit Analysis

Predicted Gift Size
We developed a predicted gift size model for FY22 to help us target the mailing strategy. This allowed us to:
a) Determine how we should further segment our high-touch or low-touch mailers, and;
b) Determine potential ROI when selecting a mix of approaches.
This was calculated using a fairly simple method, based on prospects’ average giving over the past three years and their giving velocity:

      Average giving over the past three years x giving velocity
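
As a worked example of this formula, the short Python sketch below computes a predicted gift for one hypothetical household. The function name and giving amounts are illustrative assumptions; velocity is computed as defined earlier in this report.

```python
RECENT_YEARS = ["FY19", "FY20", "FY21"]
ALL_YEARS = ["FY17", "FY18", "FY19", "FY20", "FY21"]


def predicted_gift(gifts):
    """Average giving over the past three years multiplied by giving velocity."""
    three_year_avg = sum(gifts.get(fy, 0) for fy in RECENT_YEARS) / 3
    five_year_avg = sum(gifts.get(fy, 0) for fy in ALL_YEARS) / 5
    if five_year_avg == 0:
        return 0.0  # assumption: no giving history means no predicted gift
    return three_year_avg * (three_year_avg / five_year_avg)


# Hypothetical household whose giving stepped up from $100 to $200 in FY19.
example = {"FY17": 100, "FY18": 100, "FY19": 200, "FY20": 200, "FY21": 200}
print(predicted_gift(example))
```

For this household the three-year average is $200, the five-year average is $160, the velocity is 1.25, and the predicted FY22 gift is $250.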

This revealed a few key trends:

  • Predicted giving was highest for better-scored and more likely donors.
  • Prospects based in the US made up approximately 89% of total predicted giving for annual.
  • Of the annual, US-based group, average predicted giving for prospects in our segments was nearly 3x that of other prospects.

We examined the total predicted giving and the cost to mail to each segment across all annual/US-based prospects by simulating combinations of three age brackets (<50, 50+, and 65+), and mailing types (high-touch only, low-touch only, mixed/hybrid).

Since relatively few prospects fell into our low-touch segment, a low-touch only strategy did not make sense. The highest-performing low-touch scenario yielded only approximately $300,000 in predicted giving, and its cost ($420) fell far below our budget.

In the high-touch only and mixed strategies, mailing to all annual, US-based prospects cost approximately $8,000, which exceeded our budget. Targeting any single age bracket came in under budget, however. We found that in both approaches, the 50+ age group had the highest predicted giving ($6 million) compared to the under-50s ($3.5 million) and the 65+ group ($5 million).

The highest-performing strategy in terms of predicted giving and maximizing available budget ($4,964) was a mixed strategy targeting the 50+ age group, raising an estimated $6.4 million in predicted FY22 gifts from ~2,100 donors in a 100% response/conversion rate scenario, with the bulk of the revenue being generated by the high-touch group ($6.2 million).
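
The mailing cost of the recommended mix can be checked directly from the per-piece prices in the Background section and the household counts in the Executive Summary (1,882 high-touch and 259 low-touch). The Python sketch below works in cents to avoid floating-point rounding; the function and variable names are illustrative.

```python
HIGH_TOUCH_CENTS = 250   # $2.50 per piece
LOW_TOUCH_CENTS = 100    # $1.00 per piece
BUDGET_CENTS = 500_000   # $5,000 budget


def mailing_cost_cents(high_touch_households, low_touch_households):
    """Total cost, in cents, of mailing one piece to each household."""
    return (high_touch_households * HIGH_TOUCH_CENTS
            + low_touch_households * LOW_TOUCH_CENTS)


total_cents = mailing_cost_cents(1882, 259)
print(f"${total_cents / 100:,.2f}", total_cents <= BUDGET_CENTS)
```

This gives $4,705.00 for the high-touch pieces plus $259.00 for the low-touch pieces, or $4,964.00 in total, leaving $36.00 of the budget unspent.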

We chose these segments intentionally to maximize potential retention, and so while actual generated revenue/gifts may be lower, this strategy is estimated to have the highest potential response rate.


Recommendations

Our recommendations are based on answering the following key questions:

1. Which donors are more likely to be retained than others?

We found that donors who scored highly on giving-related metrics (e.g., RFM, velocity) scored higher in our likelihood model, and in particular those who gave most frequently and most recently. Biographical factors (such as age, geography, prospect type) were minimally impactful.

2. What might be the best segmentation strategies for this appeal? Should the organization send out only one type of mailer or opt for a combination?

The exploratory analysis showed us that the most predictive, high-level factors in retention are related to our prospects’ giving patterns, rather than other demographic factors. For that reason, we decided to use a giving-focused scoring model (based on RFM & velocity) as the primary segmentation approach when determining how to target our mailers.

Additionally, RFM provides data points that enable us to match scores with high-touch or low-touch approaches. When determining whether to use a mix of high and low-touch approaches, or to use one approach entirely, we conducted research into donor direct-mail response rates to determine the types of segments where each strategy would be most impactful (given limited funds). We found that:

  • High-touch mailers are generally more useful in retaining/upgrading high-value donors; retaining new, high-value donors; and recapturing lapsed, high-value donors
  • Low-touch mailers are more useful in upgrading loyal but lower-value donors who appear to be increasing their giving year over year.

Based on this, we developed the following criteria for segments that would respond best to each strategy:

 

Segment: High-touch (personalized appeals)
  Description:
    • First-time, larger donors
    • Loyal upgrade targets, higher value – upgradeable (V>1)
    • Lapsed one year high value donors
    • New donors, higher value
  Strategy:
    • Retain/upgrade/steward
    • Upgrade
    • Retain
    • Recapture

Segment: Low-touch (generic appeals)
  Description:
    • Loyal givers, lower value, upgradeable (V>1)
  Strategy:
    • Upgrade

Segment: No-touch (no additional appeal)
  Description:
    • Lapsed 2+ years
    • Never givers
    • All others

We then matched relevant/possible RFM scores to these key segments. We chose the segments most likely to give in order to minimize potential waste and maximize response rate. Our segments had a much higher average likelihood score (>96%) than the remaining prospects (with the exception of the New High Value segment, which only had ~1 year of giving data).

We created a predicted gift size for FY22 to determine the total cost of mailing to these groups. This would allow us to a) determine how we should further segment our high- or low-touch mailers and b) determine potential ROI. This was calculated using a fairly simple method, based on prospects’ average giving over the past three years and their giving velocity.

This process resulted in a significant number of prospects, far beyond what we could affordably mail to. In order to narrow down these groups further, we applied additional filters in the following way:

  1. Filtered by prospects with an “annual” prospect type only.
    Rationale: The primary focus of this initiative is to target annual-level prospects.
  2. Filtered by prospects based in the US.
    Rationale: Maximize the cost-effectiveness of mailings, given the cost of mailing outside the United States. There are more cost-effective ways to reach those donors. More importantly, analysis showed that US-based donors make up the vast majority (89%) of all annual giving.

In conclusion, the best segmentation strategies would include a giving-focused approach with several layers, starting with RFM, then velocity and likelihood. Further filtering based on prospect type, age group and location to stay within budget provided the most ideal segments.
The organization should opt for a combination mailer.

3. Are there donors who not only appear likely to be retained but also appear likely to upgrade their giving?

Yes! Donors with a velocity greater than 1 (“Rising Stars”) were great candidates for upgraded giving. As such, solicitation to these groups should suggest a possible increase in their annual contribution. In our analysis, the loyal donors who had a high velocity are most likely to upgrade their giving.

4. Is it possible to develop a score or metric to indicate the likelihood of a donor being retained?

Yes! Using a logistic regression model, a likelihood score ranging from 0 to 1 (or 0% - 100%) was assigned to every household based on the data provided.

Conclusion & Further Considerations

In conclusion, we were able to come in under, and very close to, budget ($4,964) while targeting the key segments of annual giving prospects that we believe are most likely to respond to the types of mailers available to the organization, which would maximize the return on investment for this solicitation. While this strategy leaves out the vast majority of prospects, we recognize that there are other, more appropriate ways to reach them that might fall outside the scope of this solicitation effort.

While we used a fairly top-down approach to segmentation, we also considered using a clustering or classification algorithm. However, we found that this approach might take more time than was available to determine a useable solicitation strategy, and giving-oriented/RFM segmentation provided an easier method for matching high- and low-touch strategies.


   

Mailing List Depiction using Oracle Application Express


Appendix

Definitions

RFM – RFM (Recency, Frequency, Monetary) is a widely-used marketing analysis method to identify an organization’s best customers (or donors, in the case of philanthropic and nonprofit organizations). The analysis involves scoring customers according to three metrics: how recently they made a purchase, how often they make purchases, and the size of those purchases. A common scale for each category is 1 to 5, with 5 being the highest. Customers who score the highest in each category are considered the organization’s best customers – these customers would have an RFM score of 5-5-5. RFM scoring is a basic method for helping a company understand their customer base and determine a customer’s likelihood of doing business with the company again; or which donors are most likely to give again. RFM then helps a company or nonprofit segment their constituent base into more manageable groups and target them in direct marketing campaigns.

 
Giving Velocity – Giving velocity is a score that measures the rate of change in a donor’s giving patterns over a given period of time. Velocity can be calculated in a variety of ways, but it is generally determined by taking recent giving and comparing it to past giving. For example, one can divide the size of a prospect’s most recent gift by the average giving in a recent period, or compare an average of recent gifts to an average over a longer period of time. Similar to RFM, this scoring approach helps nonprofits understand individual donor behavior. If a donor’s giving velocity is increasing or increasing significantly, that may indicate a change in capacity or affinity for the organization. Velocity helps an organization understand how its donors’ behavior is changing, and it is to some extent forward-looking, in that it can indicate what a donor might do in the future.

\[ Velocity =\frac{Average(FY2021,FY2020,FY2019)}{Average(FY2021,FY2020,FY2019,FY2018,FY2017)} \]  

Likelihood - This is the probability that a prospect will give in the current year based on giving history and other factors. It is presented as a percentage, with 100% likelihood meaning that a donor will almost certainly donate that year.