NGO donations and catastrophic events: an example

library(pacman)  # p_load() installs any missing packages, then loads them
p_load('tidyverse', 'lubridate', 'kableExtra', 'magrittr')

This is a long post. I've done my best to remove unnecessary details, but it is still long. If the moderators wish, I can reword or shorten it.

NOTE 1: I don't want to be a free-rider, nor am I looking for ready-made solutions: suggestions about the most relevant statistical approaches, or references to studies with a similar design or data structure, are more than enough.

NOTE 2: All examples, names and amounts are completely fictitious. Countries are used only to make the explanation less abstract.

I have two datasets:

  1. microdata on individual donations to a specific NGO: date, amount transferred, the donor's country of residence, etc.
  2. a dataset of catastrophic events (earthquakes, floods, forest fires, etc.): date, country where the event occurred, number of victims, etc.

I'd like to check whether these events stimulate donations.

A toy example of the transaction microdata:

# Toy transaction microdata: one row per donation
donations_df <- tibble(
  transaction_id = c(1, 2, 3, 4, 5, 6, 7),
  amount = c(100, 90, 150, 175, 125, 111, 60),
  date = c('2010-01-01', '2010-01-03', '2010-01-01', '2010-01-03', '2010-01-05', '2010-01-03', '2010-01-03'),
  country = c('Albania', 'Albania', 'Benin', 'Benin', 'Benin', 'Croatia', 'Croatia')
) %>%
  mutate(date = parse_date_time(date, 'Y-m-d'))  # character -> POSIXct (UTC)
donations_df %>% kbl %>% kable_classic_2()
transaction_id  amount  date        country
             1     100  2010-01-01  Albania
             2      90  2010-01-03  Albania
             3     150  2010-01-01  Benin
             4     175  2010-01-03  Benin
             5     125  2010-01-05  Benin
             6     111  2010-01-03  Croatia
             7      60  2010-01-03  Croatia

A toy example of the event dataset:

# Toy event data: one row per catastrophic event
events_df <- tibble(
  event_id = c(1, 2, 3),
  date = c('2010-01-01', '2010-01-03', '2010-01-01'),
  country = c('Albania', 'Benin', 'USA'),
  n_victims = c(10, 20, 15)
) %>%
  mutate(date = parse_date_time(date, 'Y-m-d'))
events_df %>% kbl %>% kable_classic_2()
event_id  date        country  n_victims
       1  2010-01-01  Albania         10
       2  2010-01-03  Benin           20
       3  2010-01-01  USA             15
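# Keep a copy of the event date: after joining on (country, date) it is carried
# forward within each country to compute the days since the last event
events_df %<>% mutate(incident_date = date)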

The transaction database is relatively large (thousands of transactions per country), while events are rare: about 10, at most 20, per year.

So several questions arise:

  1. General question: I'd primarily like to know whether donations increased after an event. This effect (if any) seems to be short-lived: I'd expect donation levels to increase in the week after the event and then return to pre-event levels. Is there a common or standardized way to analyze this?

  2. More specific questions: So far, I've collapsed the transaction dataset by summing the donations per country per day and merged the result with the event dataset, so that for each country and day we can see how many days have passed since the last event. I then regressed the daily donation total (‘sum_amount’) on the number of days since the last event (since_last) and on its inverse, rev_since_last = 1/(since_last + 1), where the +1 avoids a division by zero on the day of an event. Here is the example (the real data contain thousands of records per country, so once collapsed there are no NAs in the merged dataset after the first event in a country):

  donations_df %>%
    group_by(country, date) %>%
    summarise(sum_amount = sum(amount), .groups = 'drop_last') %>%  # daily total; result stays grouped by country
    left_join(events_df, by = c('country', 'date')) %>%
    arrange(country, date) %>%
    fill(incident_date) %>%                                          # carry the last event date forward within each country
    mutate(since_last = interval(incident_date, date) / days(1),    # days since the last event
           rev_since_last = 1 / (since_last + 1)) %>%
    select(-c(n_victims, incident_date)) %>%
    kbl(digits = 2) %>%
    kable_classic_2()
country  date        sum_amount  event_id  since_last  rev_since_last
Albania  2010-01-01         100         1           0            1.00
Albania  2010-01-03          90        NA           2            0.33
Benin    2010-01-01         150        NA          NA              NA
Benin    2010-01-03         175         2           0            1.00
Benin    2010-01-05         125        NA           2            0.33
Croatia  2010-01-03         171        NA          NA              NA
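For question 1, one common framework is an event-study (distributed-lag) regression: build a complete country-by-day panel, add one column per day relative to an event (a few leads as a placebo check, then lags 0 to 7), and regress daily donations on these columns plus country and day fixed effects. Each lag coefficient then estimates the extra donations k days after an event relative to unexposed days, so the shape and duration of the response are estimated rather than imposed. The sketch below is a minimal base-R version under stated assumptions: it rebuilds the toy data, the window (-2 to 7), the column names and the formula are all illustrative choices, and fitting only makes sense on the real panel.

```r
# Toy data from the example above, rebuilt in base R with Date columns.
donations <- data.frame(
  amount  = c(100, 90, 150, 175, 125, 111, 60),
  date    = as.Date(c("2010-01-01", "2010-01-03", "2010-01-01", "2010-01-03",
                      "2010-01-05", "2010-01-03", "2010-01-03")),
  country = c("Albania", "Albania", "Benin", "Benin", "Benin", "Croatia", "Croatia")
)
events <- data.frame(
  date    = as.Date(c("2010-01-01", "2010-01-03", "2010-01-01")),
  country = c("Albania", "Benin", "USA")
)

# 1. Complete country-by-day panel: days without donations get an amount of 0.
days  <- seq(min(donations$date), max(donations$date), by = "day")
panel <- expand.grid(country = sort(unique(donations$country)), date = days,
                     stringsAsFactors = FALSE)
daily <- aggregate(amount ~ country + date, data = donations, FUN = sum)
panel <- merge(panel, daily, by = c("country", "date"), all.x = TRUE)
panel$amount[is.na(panel$amount)] <- 0

# 2. Relative-day columns (assumed window): lag_k counts events in the same
#    country k days before day t; lead_k counts events k days after day t
#    (placebo: their coefficients should be close to 0).
rel_days  <- -2:7
key_panel <- paste(panel$country, panel$date)
for (k in rel_days) {
  key_events <- paste(events$country, events$date + k)
  name <- if (k < 0) paste0("lead_", -k) else paste0("lag_", k)
  panel[[name]] <- vapply(key_panel, function(x) sum(key_events == x),
                          integer(1), USE.NAMES = FALSE)
}

# 3. Event-study formula with country and day fixed effects.
rel_cols <- grep("^(lead|lag)_", names(panel), value = TRUE)
f <- reformulate(c(rel_cols, "factor(country)", "factor(date)"), response = "amount")
# fit <- lm(f, data = panel)   # on the real panel; coef of lag_k = extra donations on day k
print(panel[, c("country", "date", "amount", "lag_0", "lag_2")])
```

Day fixed effects absorb shocks common to all countries (holidays, NGO campaigns), but they also absorb any cross-border response to a large event, which is why point 2.3 needs separate treatment. Because the columns count events rather than flag them, overlapping events add up.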

2.1. The first problem is that since_last (the number of days since the last catastrophic event) keeps growing until the next event in the same country, so if nothing happens, it grows indefinitely. Although I expect the actual effect to be fairly short-lived, I suspect that capturing it as ‘number of days since the last event’ overestimates the importance of this factor for donation levels. I also tried a specification in which ‘rev_since_last’ is set to 0 once more than 7 days have passed since the event, but of course this cutoff looks completely arbitrary.
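One way to avoid both the endlessly growing since_last and a hand-picked 7-day cutoff is to let the effect decay exponentially, with exposure equal to the sum of exp(-(t - t_e) / tau) over past events e in the same country, and to choose the decay parameter tau from the data, for example by profiling the residual sum of squares over a grid. This is a swapped-in technique, not the approach from the question. The sketch assumes base R, the toy data above (days without donations set to 0), a hypothetical helper `decay_exposure()` and an arbitrary grid of tau values.

```r
# Toy daily panel from the example above; days without donations are set to 0.
panel <- data.frame(
  country = rep(c("Albania", "Benin", "Croatia"), each = 5),
  date    = rep(seq(as.Date("2010-01-01"), by = "day", length.out = 5), times = 3),
  amount  = c(100, 0, 90, 0, 0,   150, 0, 175, 0, 125,   0, 0, 171, 0, 0)
)
events <- data.frame(
  date    = as.Date(c("2010-01-01", "2010-01-03", "2010-01-01")),
  country = c("Albania", "Benin", "USA")
)

# Hypothetical helper: sum of exp(-elapsed / tau) over past events in the same
# country. Each event contributes 1 on its own day, then decays towards 0, so
# the variable never grows without bound and overlapping events add up.
decay_exposure <- function(panel, events, tau) {
  vapply(seq_len(nrow(panel)), function(i) {
    elapsed <- as.numeric(panel$date[i] - events$date)
    past    <- events$country == panel$country[i] & elapsed >= 0
    sum(exp(-elapsed[past] / tau))
  }, numeric(1))
}

# Profile the residual sum of squares over an (arbitrary) grid of tau values.
taus <- c(1, 2, 3, 5, 7, 14)
rss <- sapply(taus, function(tau) {
  panel$exposure <- decay_exposure(panel, events, tau)
  deviance(lm(amount ~ exposure + factor(country), data = panel))
})
best_tau  <- taus[which.min(rss)]
half_life <- best_tau * log(2)   # days until the effect has halved
```

On the toy data the selected tau means nothing; on the real panel it gives an interpretable half-life in place of the arbitrary cutoff. The same grid search could be applied to the cutoff width itself if a hard window is preferred.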

2.2. Another big problem with this approach is that when two events overlap, the later one simply cancels out the earlier one, whereas in real life they naturally reinforce each other. For example, if the first event occurs on January 1, since_last is 2 on January 3; if a second event occurs on January 4, since_last drops back to 0.
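The calendar from point 2.2 (two events in the same country, on 1 and 4 January) makes the difference concrete: since_last resets at every new event, while a count of events in the last 7 days adds overlapping events up. A minimal base-R sketch; the country is hypothetical and the 7-day window is an assumption.

```r
# Hypothetical country with two events, on 1 and 4 January.
days   <- seq(as.Date("2010-01-01"), as.Date("2010-01-12"), by = "day")
events <- as.Date(c("2010-01-01", "2010-01-04"))

# elapsed[t, e] = days between event e and day t (negative = event still ahead)
elapsed <- outer(as.numeric(days), as.numeric(events), "-")

# since_last, as in the question: resets to 0 at every new event
since_last <- apply(elapsed, 1, function(l) if (any(l >= 0)) min(l[l >= 0]) else NA_real_)

# Alternative: number of events in the last 7 days (lags 0..6); overlaps add up
n_last_7 <- rowSums(elapsed >= 0 & elapsed <= 6)

print(data.frame(date = days, since_last, n_last_7))
```

From 4 to 7 January n_last_7 equals 2, reflecting both events, and it returns to 0 once both windows have closed, whereas since_last keeps growing.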

2.3. Another problem is the nested structure of the data: events and transactions occur in specific countries. An event in country A at time \(t\) should increase donations mostly in country A at \(t+1\). It would be naive to expect an event in Albania to substantially increase donations in New Zealand, but it may. This is especially relevant for large events: say a major earthquake in country A increases donations in country B. So if I cluster the data by country, I may be too restrictive.
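Rather than matching events only to the donor's own country, one option is to split exposure into own-country and foreign events and let the data decide how much foreign events matter; weighting foreign events by size (here log(1 + n_victims), an assumption) lets large disasters count more. The helper `exposure()` and the weighting are hypothetical. If "cluster" refers to clustered standard errors, note that clustering by country changes only the inference, not which events are allowed to affect which donors.

```r
# Toy panel and events from the example (victim counts included).
panel <- expand.grid(country = c("Albania", "Benin", "Croatia"),
                     date = seq(as.Date("2010-01-01"), by = "day", length.out = 5),
                     stringsAsFactors = FALSE)
events <- data.frame(
  date      = as.Date(c("2010-01-01", "2010-01-03", "2010-01-01")),
  country   = c("Albania", "Benin", "USA"),
  n_victims = c(10, 20, 15)
)

# Hypothetical helper: weighted count of events within `window` days before
# day t, either in the donor's own country (own = TRUE) or elsewhere (own = FALSE).
exposure <- function(panel, events, own, window = 0:6,
                     weight = rep(1, nrow(events))) {
  vapply(seq_len(nrow(panel)), function(i) {
    elapsed <- as.numeric(panel$date[i] - events$date)
    same    <- events$country == panel$country[i]
    hit     <- elapsed %in% window & (if (own) same else !same)
    sum(weight[hit])
  }, numeric(1))
}

w <- log1p(events$n_victims)   # assumption: larger events weigh more
panel$own_events     <- exposure(panel, events, own = TRUE)
panel$foreign_events <- exposure(panel, events, own = FALSE)
panel$foreign_wtd    <- exposure(panel, events, own = FALSE, weight = w)
print(panel)

# A spillover specification on the real (hypothetical) panel could be, e.g.,
#   lm(amount ~ own_events + foreign_wtd + factor(country), data = real_panel)
# Day fixed effects are left out: foreign exposure is (almost) the total number
# of recent events minus own events, so it would be collinear with them.
```

The foreign coefficient then measures the cross-border effect directly; a significant own-country effect alongside a near-zero foreign one would support restricting attention to the donor's own country.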

2.4. The “catastrophic” events are relatively rare. I wonder whether I'm artificially inflating their importance by creating this since_last variable, which is defined for every day after the first event in a country.
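Rarity is less of a concern when exposure is zero outside a short window: with about 10 to 20 events per year and a 7-day window, at most 70 to 140 country-days per year are exposed, out of 365 days for every country in the panel. To check whether an estimated window effect could arise by chance, one option is a placebo (permutation) test: keep the event countries, reassign event dates at random, re-estimate the effect many times and compare the real estimate with this placebo distribution. The sketch below runs on simulated data with no true effect, not on real data; the helpers `in_window()` and `window_effect()`, the window width and the number of permutations are assumptions.

```r
set.seed(1)
# Simulated panel (assumption, NOT real data): 3 countries x 365 days, no true effect.
days  <- seq(as.Date("2010-01-01"), by = "day", length.out = 365)
panel <- expand.grid(country = c("A", "B", "C"), date = days, stringsAsFactors = FALSE)
panel$amount <- rpois(nrow(panel), lambda = 1000)
events <- data.frame(country = sample(c("A", "B", "C"), 15, replace = TRUE),
                     date    = sample(days, 15))

# Hypothetical helper: 1 if an event hit the same country within the last 7 days (lags 0..6).
in_window <- function(panel, events, window = 0:6) {
  keys <- unlist(lapply(window, function(k) paste(events$country, events$date + k)))
  as.integer(paste(panel$country, panel$date) %in% keys)
}

# Hypothetical helper: window effect from a regression with country fixed effects.
window_effect <- function(panel, events) {
  panel$post <- in_window(panel, events)
  unname(coef(lm(amount ~ post + factor(country), data = panel))["post"])
}

observed <- window_effect(panel, events)

# Placebo distribution: keep event countries, draw event dates at random.
n_perm  <- 200
placebo <- replicate(n_perm, {
  fake      <- events
  fake$date <- sample(days, nrow(events))
  window_effect(panel, fake)
})
p_value       <- (1 + sum(abs(placebo) >= abs(observed))) / (n_perm + 1)
share_exposed <- mean(in_window(panel, events))
```

A small p-value would indicate that the real event windows stand out from random windows of the same size and number; `share_exposed` shows how small a fraction of the panel actually drives the estimate.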