Introduction

Can we determine the characteristics of addresses that borrow on Anchor? Using on-chain data, we can obtain a lot of information about addresses. We can know how often they swap tokens, their voting participation habits, their account balance, and much more. What if all this information could be used to estimate the odds of an address borrowing on Anchor?

In this project I build a logistic regression model using on-chain data. The model takes Terra addresses and estimates their odds of borrowing on Anchor. Using the results, I determine which variables are associated with borrowing on Anchor.

Gathering the Data

I query Flipside’s terra schema for all addresses that deposited on Anchor Earn sometime between 2021-10-21 and 2021-12-22 (the deposit period).

The addresses are then separated into two possible outcomes:

addresses that borrowed on Anchor sometime in the following two months (the borrowing period, from 2021-12-22 to 2022-02-21), or
addresses that did not borrow on Anchor during the borrowing period.

I also gather a total of eleven variables for each address. These variables tell us about the blockchain history of each address. Nine of them are used as inputs to the model:

balance_dec_20: wallet balance of the address right before the borrowing period.
nb_deposits: the number of times the address has deposited into Anchor during the deposit period.
wallet_age: number of days since the address was created on Terra. The snapshot is December 20, 2021 (right before the borrowing period).
nb_swaps: total number of swap transactions before the borrowing period.
nb_transfers: total number of transfers before the borrowing period.
votes: number of times the address has voted on a Terra governance proposal.
anc_airdrop_claimed: number of ANC airdrops claimed during the deposit period.
anc_voting: a binary variable ('voted' or 'did_not_vote') indicating whether the address voted on an Anchor governance proposal anytime before the borrowing period.
deposit_amount_usd: the total amount of deposits to Anchor during the deposit period.

The following variables are also part of the dataset but are not used in the analysis:

nb_borrows: the number of times the address has taken a loan on Anchor during the borrowing period.
borrowed_amount_usd: amount borrowed on Anchor.

In summary, we have a list of addresses that deposited on Anchor during a two-month period. We then look at which ones borrowed during the following two months. How are the borrowers different from the non-borrowers? Can we predict the odds of borrowing using on-chain data?

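# Packages used by the code in this document.
# PullVelocityData() is not defined here and is assumed to be provided elsewhere.
library(dplyr)
library(ggplot2)
library(knitr)
depositors.data.1 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/c27aa873-aa94-43a8-b4ee-c8c9e077b744/data/latest")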
depositors.data.2 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6f551570-b570-44f9-a68a-5df693855cf8/data/latest")
borrowers.data.1 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/985bc297-24c0-4e3f-a7b3-5692511d5c5b/data/latest")
borrowers.data.2 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6f4dcef8-20c2-4b84-9019-42403a8685ac/data/latest")
anc_voting_and_claims.data <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6280e078-c8da-4f82-8b88-68e2a76c93e4/data/latest")

# lunatic score can be imported as well. I chose not to do so mostly because
# the score that I have is dated February 21, 2022.
# lunatic_scores_feb_21_all_data <- read.csv("~/Data sciences/Terra - profile of anchor borrowers/Profile of Anchor Borrowers/lunatic_scores_feb_21_all_data.csv")

#combining depositors data sets into one
depositors.data.combined <- depositors.data.1 %>% left_join(depositors.data.2)
#combining borrowers data sets into one
borrowers.data.combined <- borrowers.data.1 %>% left_join(borrowers.data.2)
# converting borrowed amount into numeric
depositors.data.combined <- transform(depositors.data.combined, borrowed_amount_usd = as.numeric(borrowed_amount_usd))
# converting number of borrowing transactions into integer
depositors.data.combined <- transform(depositors.data.combined, nb_borrows = as.integer(nb_borrows))
# renaming column 'borrowers' to 'depositors' to allow joining tables
renamed_borrowers <- rename(borrowers.data.combined, depositors = borrowers)
# combining borrowers data set with depositors data set
data.combined <- depositors.data.combined %>% full_join(renamed_borrowers)
#renaming 'depositors' column to 'address'. Address now contains both depositors who did not borrow as well as depositors who did borrow.
data.combined <- rename(data.combined, address = depositors)
# replace some NA values with 0
data.combined <- data.combined %>% mutate(deposit_amount_usd = coalesce(deposit_amount_usd, 0))
data.combined <- data.combined %>% mutate(balance_dec_20 = coalesce(balance_dec_20, 0))
data.combined <- data.combined %>% mutate(nb_deposits = coalesce(nb_deposits, 0))
data.combined <- data.combined %>% mutate(nb_swaps = coalesce(nb_swaps, 0))
data.combined <- data.combined %>% mutate(nb_transfers = coalesce(nb_transfers, 0))
data.combined <- data.combined %>% mutate(votes = coalesce(votes, 0))
data.combined <- data.combined %>% mutate(wallet_age = coalesce(wallet_age, 0))
#renaming column name for easier joining of the data frame.
anc_voting_and_claims.data <- rename(anc_voting_and_claims.data, address = depositors)
data.combined <- data.combined  %>% left_join(anc_voting_and_claims.data)
data.combined <- data.combined %>% mutate(anc_airdrop_claimed = coalesce(anc_airdrop_claimed, 0))
data.combined <- data.combined %>% mutate(anc_voting = coalesce(anc_voting, 'did_not_vote'))
# transform some columns as factors (anc voting: 'voted' or 'did_not_vote')
data.combined <- transform(data.combined, anc_voting = as.factor(anc_voting))
# creating the outcome variable 'borrowed'. It has a binary outcome ('yes' or 'no'). It tells us if the address borrowed on Anchor during the borrowing period. 
data.combined <- data.combined %>% mutate(borrowed = case_when(nb_borrows == 0 ~ "no",
                                              nb_borrows > 0 ~ "yes"))
data.combined <- transform(data.combined, borrowed = as.factor(borrowed))
# I think it is best to ignore the lunatic score for this analysis because there are a lot of NA values.
# Regression reveals that column 'n_gov_stakes_anc' does seem to impact the odds of borrowing.
# It is possible that our variable entitled 'anc_voting' may have a similar predictive effect.
# data.combined <- data.combined %>% left_join(lunatic_scores_feb_21_all_data)

#transform some numerical columns into integer
data.combined <- transform(data.combined, nb_swaps = as.integer(nb_swaps))
# Removing addresses that did not make a deposit to Anchor during the two-month period.
# This improves the predictive power of the logistic regression model, likely because the data is more consistent and complete.
# Why are these addresses in there? It may be that they deposited on Anchor before or after the deposit period.
data.combined <- filter(data.combined, nb_deposits > 0)
# moving some columns to improve readability 
data.combined <- data.combined %>% dplyr::select("address", "borrowed", everything())
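The join, fill, and labelling steps above can be tried on a toy table. The sketch below is only an illustration: the addresses and values are made up, and the table layout is an assumption, not the actual Flipside query result. It also shows one thing worth checking in the real data: case_when() returns NA when no condition matches, so an address with a missing nb_borrows would get a missing outcome rather than "no".

```r
library(dplyr)

# Hypothetical toy tables; the addresses and values are made up.
depositors <- tibble(
  address     = c("terra_a", "terra_b", "terra_c", "terra_d"),
  nb_deposits = c(2, 1, 4, NA),
  nb_borrows  = c(0L, 3L, NA, 0L)
)
claims <- tibble(
  address             = "terra_b",
  anc_airdrop_claimed = 12.5,
  anc_voting          = "voted"
)

toy <- depositors %>%
  left_join(claims, by = "address") %>%
  mutate(
    # Missing counts and labels are filled with a neutral default.
    nb_deposits         = coalesce(nb_deposits, 0),
    anc_airdrop_claimed = coalesce(anc_airdrop_claimed, 0),
    anc_voting          = coalesce(anc_voting, "did_not_vote"),
    # No condition matches a missing nb_borrows, so the result is NA.
    borrowed = case_when(nb_borrows == 0 ~ "no",
                         nb_borrows > 0 ~ "yes")
  ) %>%
  # Keep only addresses with at least one deposit.
  filter(nb_deposits > 0)

print(toy)
```

Counting the missing outcomes with sum(is.na(data.combined$borrowed)) before fitting would show whether this case occurs in the real dataset.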

Dataset

Sample of the resulting dataset:

kable(head(data.combined, 10), booktabs = TRUE)

address	borrowed	balance_dec_20	deposit_amount_usd	nb_deposits	wallet_age	nb_swaps	nb_transfers	anc_airdrop_claimed	anc_voting
terra13zqnq2dqg83swee9sen64ccwd7j83eshwwxpl9	no	2.826809	783.4332	2	318	0	4	0.000	voted
terra18u5m7pym8m8mjt0wzxgw9zwz56k02a69tlnzdq	no	76577.473066	780.9653	5	322	49	2	225.883	did_not_vote
terra1qszfqcmhdx4xd4zxkt00eqdygmvrgjcv5k2y5n	no	3279.395541	779.6089	1	103	0	2	0.000	did_not_vote
terra1y63jcxyrks9rzxs9rlvk4crne2r3w9rgaur7uj	no	18.288084	755.2106	3	95	0	0	0.000	did_not_vote
terra1dztz06tcrrgsdrt0vdh4exgey57nuyv02fcujf	no	790.990974	787.1846	1	118	0	0	0.000	did_not_vote
terra1l5qz5sua6fnnfmmuu5gx0knhgp5n2ke2aamnlp	no	85.088722	771.2856	1	77	0	0	0.000	did_not_vote
terra1u9yywa6qgr0xeq08syg2zkdgp7fqxd03aqqqsz	no	4014.449751	762.1542	5	150	1	3	0.000	did_not_vote
terra1ska44nk04frzahs2xl6ckalz7haf29j9h3f4yu	no	782.475071	779.8921	2	87	0	0	0.000	did_not_vote
terra1j8j79g5fs55xvaj38unynusy50y375jla3aj9p	no	1.250684	760.1646	1	84	0	0	0.000	did_not_vote
terra1qqnaal2vdwmqh2p8yrypx07eyzm6e75ghdhscp	no	788.442828	778.5851	3	314	0	0	0.000	did_not_vote

The full dataset has 66,682 rows.

Link to the SQL code and query result on Flipside Crypto (the page may take some time to load)

Logistic Regression

I build a model using a logistic regression with multiple input variables. This model takes nine variables as input and assesses their association with the outcome variable. The outcome is whether the address borrowed on Anchor during the borrowing period.

logistics <- glm(borrowed ~ balance_dec_20 + deposit_amount_usd + nb_deposits + wallet_age + nb_swaps + nb_transfers + votes + anc_airdrop_claimed + anc_voting, data = data.combined, family = 'binomial')

Plotting the Model

I plot the model’s predictions against the real borrowing outcomes.

predicted.data <- data.frame(
  probability.of.borrowed=logistics$fitted.values,
  borrowed=data.combined$borrowed)

predicted.data <- predicted.data[
  order(predicted.data$probability.of.borrowed, decreasing=FALSE),]
predicted.data$rank <- 1:nrow(predicted.data)

ggplot(data=predicted.data, aes(x=rank, y=probability.of.borrowed)) +
  geom_point(aes(color=borrowed), alpha=1, shape=4, stroke=2) +
  xlab("Address") +
  ylab("Predicted Probability of Borrowing")

On the x-axis are all 66,682 addresses in the dataset, ranked by predicted probability. On the y-axis is the predicted probability of borrowing on Anchor. The model assigns a probability of borrowing to each address.

The color represents actual outcomes. Red points have not borrowed, and turquoise points have borrowed on Anchor.

The model has given a predicted probability of borrowing below 25% to about 60,000 addresses. In actuality, most of these addresses did not borrow.

Most of the addresses with a predicted probability above 50% have indeed borrowed.

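Judged visually, the model separates borrowers from non-borrowers reasonably well, although this plot is not a formal measure of accuracy.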
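A visual check like this one can be complemented with a confusion matrix and an AUC. The sketch below uses simulated stand-in data, not the Anchor dataset, and the 0.5 threshold is an arbitrary assumption. The same steps could be applied to logistics$fitted.values and data.combined$borrowed.

```r
set.seed(42)

# Simulated stand-in data; none of these values come from the Anchor dataset.
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-2 + 1.5 * x))
toy <- data.frame(x = x, borrowed = factor(ifelse(y == 1, "yes", "no")))

fit <- glm(borrowed ~ x, data = toy, family = "binomial")
p_hat <- fitted(fit)  # predicted probability of "yes"

# Confusion matrix at a 0.5 threshold (the threshold is an assumption).
predicted <- factor(ifelse(p_hat > 0.5, "yes", "no"), levels = c("no", "yes"))
confusion <- table(actual = toy$borrowed, predicted = predicted)
print(confusion)

accuracy <- sum(diag(confusion)) / sum(confusion)

# AUC via the rank (Mann-Whitney) formula: the probability that a randomly
# chosen borrower gets a higher score than a randomly chosen non-borrower.
r <- rank(p_hat)
n_yes <- sum(toy$borrowed == "yes")
n_no <- sum(toy$borrowed == "no")
auc <- (sum(r[toy$borrowed == "yes"]) - n_yes * (n_yes + 1) / 2) / (n_yes * n_no)

cat("accuracy:", round(accuracy, 3), " AUC:", round(auc, 3), "\n")
```

Because borrowers are a minority of the addresses, accuracy alone can look high even for a model that never predicts borrowing, which is why the AUC is reported alongside it.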

Interpreting the model:

The summary function gives us detailed information about the model.

summary(logistics)

## 
## Call:
## glm(formula = borrowed ~ balance_dec_20 + deposit_amount_usd + 
##     nb_deposits + wallet_age + nb_swaps + nb_transfers + votes + 
##     anc_airdrop_claimed + anc_voting, family = "binomial", data = data.combined)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.1956  -0.5573  -0.5023  -0.4743   2.1619  
## 
## Coefficients:
##                       Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)         -2.350e+00  2.255e-02 -104.209  < 2e-16 ***
## balance_dec_20      -2.120e-09  4.733e-09   -0.448    0.654    
## deposit_amount_usd  -5.166e-09  1.090e-09   -4.742 2.12e-06 ***
## nb_deposits          6.808e-02  2.207e-03   30.843  < 2e-16 ***
## wallet_age           1.994e-03  1.114e-04   17.898  < 2e-16 ***
## nb_swaps            -1.774e-04  3.925e-04   -0.452    0.651    
## nb_transfers         2.377e-04  1.597e-04    1.488    0.137    
## votes               -7.954e-03  5.231e-03   -1.521    0.128    
## anc_airdrop_claimed  9.304e-07  1.638e-06    0.568    0.570    
## anc_votingvoted      1.759e+00  3.298e-02   53.346  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62343  on 66681  degrees of freedom
## Residual deviance: 55873  on 66672  degrees of freedom
## AIC: 55893
## 
## Number of Fisher Scoring iterations: 5

The ‘Coefficients’ section provides information on the effect size of each input variable.

On the right we see the p-value of each variable, Pr(>|z|), followed by a significance code of up to three stars. Variables marked *** have a p-value below 0.001 and are statistically significant.

The column ‘Estimate’ is the effect that each variable has on the outcome. Variables with a negative Estimate are inversely associated with the probability of borrowing (and vice versa). For example, the greater the value of ‘deposit_amount_usd’, the lower the odds of this address borrowing on Anchor.

For all addresses, the regression model goes through each input variable and adjusts the odds of borrowing according to the estimate coefficients above.

The size of the estimate coefficient measures its impact on the outcome, expressed on the log-odds scale per unit of the variable. For example, the estimate coefficient for ‘deposit_amount_usd’ is very small. It is only -5.166e-09, which in decimal notation is -0.000000005166. In plain language, for every $1 deposited into Anchor during the deposit period, the log-odds of borrowing during the borrowing period are reduced by 0.000000005166.
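Since logistic regression coefficients are on the log-odds scale, exponentiating a coefficient gives the multiplicative change in the odds. The short sketch below applies this to the ‘deposit_amount_usd’ coefficient for a few deposit sizes. The deposit sizes are arbitrary illustrations, and the calculation holds all other inputs fixed.

```r
# Coefficient for deposit_amount_usd, copied from the summary output above.
beta_deposit <- -5.166e-09

# exp(beta * delta) is the multiplicative change in the odds of borrowing
# when deposit_amount_usd increases by delta dollars, other inputs held fixed.
delta_usd <- c(1, 1e4, 1e6, 1e8)  # illustrative deposit increases
odds_ratio <- exp(beta_deposit * delta_usd)

print(data.frame(delta_usd = delta_usd,
                 odds_ratio = odds_ratio,
                 pct_change_in_odds = 100 * (odds_ratio - 1)))
```

On this scale, an extra $1,000,000 deposited lowers the odds of borrowing by roughly half a percent, which is why the variable is statistically significant but practically small.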

Median USD deposited by borrowers during the deposit period:

test_borrowed <- data.combined %>% filter(nb_borrows > 0)
test_not_borrowed <- data.combined %>% filter(nb_borrows == 0)
print(median(test_borrowed$deposit_amount_usd))

## [1] 4248.218

Median USD deposited by non-borrowers during the deposit period:

print(median(test_not_borrowed$deposit_amount_usd))

## [1] 1868.959

The median amount deposited into Anchor by borrowers is greater than the median deposited by non-borrowers. Even though the model associates larger deposits into Anchor with lower odds of borrowing, the effect size of this variable is very small.

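‘anc_voting’ is the variable that has the largest effect on the odds of borrowing. When an address has voted on an Anchor governance proposal anytime before the borrowing period, the log-odds of borrowing increase by 1.759, which multiplies the odds by exp(1.759) ≈ 5.8 with all other inputs held constant. Note that odds and probability are different scales: odds of 1.00 correspond to a 50% probability of borrowing.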
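To make the ‘anc_voting’ coefficient concrete, it can be converted into an odds ratio and into predicted probabilities. The sketch below uses the intercept and the ‘anc_votingvoted’ estimate from the summary output. The reference address, with every numeric input equal to zero, is a hypothetical simplification chosen only to keep the arithmetic short.

```r
# Coefficients copied from the summary output above.
intercept <- -2.350
beta_voted <- 1.759

# Odds ratio for voters versus non-voters, other inputs held fixed.
odds_ratio_voted <- exp(beta_voted)

# plogis() is the inverse logit: it maps log-odds to a probability.
# Hypothetical reference address: every numeric input equal to zero.
p_non_voter <- plogis(intercept)
p_voter <- plogis(intercept + beta_voted)

cat("odds ratio:", round(odds_ratio_voted, 2), "\n")
cat("P(borrow), non-voter:", round(p_non_voter, 3), "\n")
cat("P(borrow), voter:", round(p_voter, 3), "\n")
```

For this reference address the predicted probability moves from about 9% to about 36%. The probability ratio is smaller than the odds ratio, which is expected whenever the outcome is not rare.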

Number of borrowers that voted on an Anchor governance proposal, per 100 borrowers that did not vote:

num_borrowers_voted <- count(filter(test_borrowed, anc_voting == 'voted'))
num_borrowers_did_not_vote <- count(filter(test_borrowed, anc_voting == 'did_not_vote'))
print(num_borrowers_voted / num_borrowers_did_not_vote * 100)

##           n
## 1: 34.87232

Number of non-borrowers that voted on an Anchor governance proposal, per 100 non-borrowers that did not vote:

print(count(filter(test_not_borrowed, anc_voting == 'voted')) / count(filter(test_not_borrowed, anc_voting == 'did_not_vote'))* 100)

##           n
## 1: 4.329054

Anchor depositors that voted on an Anchor governance proposal have roughly 8 times the odds of borrowing compared to non-voters (34.87 / 4.33 ≈ 8.1, an unadjusted odds ratio).
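The two figures printed above come from dividing the number of voters by the number of non-voters within each group, so they are ratios per 100 non-voters. The sketch below converts them into the share of each group that voted and into an unadjusted odds ratio. It uses only the two printed values and does not touch the underlying data.

```r
# Ratios printed above: voters per 100 non-voters within each group.
ratio_borrowers <- 34.87232
ratio_non_borrowers <- 4.329054

# Convert each ratio into the share of the group that voted.
share_borrowers_voted <- ratio_borrowers / (100 + ratio_borrowers)
share_non_borrowers_voted <- ratio_non_borrowers / (100 + ratio_non_borrowers)

# The ratio of the two odds is the unadjusted odds ratio between
# voting and borrowing.
odds_ratio <- ratio_borrowers / ratio_non_borrowers

cat("share of borrowers that voted:", round(100 * share_borrowers_voted, 1), "%\n")
cat("share of non-borrowers that voted:", round(100 * share_non_borrowers_voted, 1), "%\n")
cat("unadjusted odds ratio:", round(odds_ratio, 2), "\n")
```

The unadjusted odds ratio of about 8 is larger than the adjusted value implied by the regression coefficient, since the regression also accounts for the other input variables.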

Takeaways

The on-chain history of Anchor depositors is associated with their odds of borrowing on Anchor.

Factors that are associated with borrowing include:

Voting on an Anchor governance proposal.
The age of the wallet (older wallets have increased probability of borrowing).
The number of deposit transactions into Anchor.

Factors that are inversely associated with borrowing:

The total amount deposited into Anchor, where higher deposits are associated with not borrowing funds.

Factors that have no statistically significant effect:

Claiming ANC airdrops.
Voting on Terra governance proposals.
Number of swaps completed.
Number of transfers completed.
The balance of the address on December 20, 2021 (before the borrowing period).

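The relative importance of each variable, as computed by caret::varImp (for this kind of model, the absolute value of each coefficient's z statistic), is shown in this chart: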

V = caret::varImp(logistics)

ggplot2::ggplot(V, aes(x=reorder(rownames(V),Overall), y=Overall)) +
    geom_point( color="blue", size=4, alpha=0.6)+
    geom_segment( aes(x=rownames(V), xend=rownames(V), y=0, yend=Overall), 
                  color='skyblue') +
    xlab('Variable')+
    ylab('Overall Importance')+
    theme_light() +
    coord_flip()
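The importance scores in this chart can be approximated without caret. The caret documentation describes the importance for linear models as the absolute value of the t statistic of each parameter, and the sketch below assumes a glm is handled the same way using its z statistics. It runs on simulated data with hypothetical variable names, not on the Anchor dataset.

```r
set.seed(1)

# Simulated stand-in data; the variable names are hypothetical.
n <- 1000
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-1 + 1.2 * toy$x1 + 0.1 * toy$x2))

fit <- glm(y ~ x1 + x2, data = toy, family = "binomial")

# Absolute z statistic of each non-intercept term.
z_abs <- abs(coef(summary(fit))[-1, "z value"])
importance <- data.frame(variable = names(z_abs), overall = unname(z_abs))

# Most important variable first.
print(importance[order(-importance$overall), ])
```

A z statistic reflects both the size of a coefficient and the precision of its estimate, so this ranking indicates how strong the evidence is for each variable rather than how large its effect is.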

An Anchor depositor with high odds of borrowing has the following profile:

has voted on an Anchor governance proposal sometime in the past,
has deposited into Anchor multiple times in the last two months, and
has created its Terra address a long time ago.

Limitations of the analysis:

The ‘balance_dec_20’ variable only includes liquid LUNA, UST and Terra stablecoins.
The ‘deposit_amount_usd’ variable is not a net value. It would be better to replace it with a new variable measuring net deposits into Anchor (deposits - withdrawals).
Similarly, ‘nb_deposits’ (to Anchor) could be paired with an additional variable called ‘nb_withdrawals’ (from Anchor). ‘anc_airdrop_claimed’ could also be paired with a new ‘nb_anc_tokens_sold’ variable.

Further Improvements

Market sentiment changes rapidly in the world of crypto. This may impact transaction patterns. The model could be rebuilt in a different market cycle. Afterwards, we could see if the same input variables had a comparable effect on the odds of borrowing on Anchor. This would give us more confidence in the effect size of the input variables.

Additionally, a new model could be created to predict the amount borrowed by a set of addresses. The amount borrowed by each address will not be precisely right, but a good model may be able to predict the aggregate amount borrowed by a large set of addresses.

Profile of Anchor Borrowers

Zook