Introduction

Can we determine the characteristics of addresses that borrow on Anchor? Using on-chain data, we can obtain a lot of information about addresses. We can know how often they swap tokens, their voting participation habits, their account balance, and much more. What if all this information could be used to estimate the odds of an address borrowing on Anchor?

In this project I build a logistic regression model using on-chain data. The model takes Terra addresses and estimates their odds of borrowing on Anchor. Using the results, I determine which variables are associated with borrowing on Anchor.

Gathering the Data

I query Flipside’s terra schema for all addresses that deposited on Anchor Earn sometime between 2021-10-21 and 2021-12-22 (the deposit period).

The addresses are then separated into two possible outcomes:

  1. addresses that borrowed on Anchor sometime in the following two months (the borrowing period, from 2021-12-22 to 2022-02-21), or
  2. addresses that did not borrow on Anchor during the borrowing period.

I also gather a total of eleven input variables for each address. These variables tell us about the blockchain history of each address.

The following variables are also part of the dataset but are not used in the analysis:

In summary, we have a list of addresses that deposited on Anchor during a two-month period. We then look at which ones borrowed during the following two months. How are the borrowers different from the non-borrowers? Can we predict the odds of borrowing using on-chain data?

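# Packages used by the code below.
# PullVelocityData() is a helper that is not defined in this section; it is assumed to download a Flipside query result as a table.
library(dplyr)
library(ggplot2)
library(knitr)
depositors.data.1 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/c27aa873-aa94-43a8-b4ee-c8c9e077b744/data/latest")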
depositors.data.2 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6f551570-b570-44f9-a68a-5df693855cf8/data/latest")
borrowers.data.1 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/985bc297-24c0-4e3f-a7b3-5692511d5c5b/data/latest")
borrowers.data.2 <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6f4dcef8-20c2-4b84-9019-42403a8685ac/data/latest")
anc_voting_and_claims.data <- PullVelocityData("https://api.flipsidecrypto.com/api/v2/queries/6280e078-c8da-4f82-8b88-68e2a76c93e4/data/latest")

# The Lunatic score can be imported as well. I chose not to do so, mostly because
# the score that I have is dated February 21, 2022.
# lunatic_scores_feb_21_all_data <- read.csv("~/Data sciences/Terra - profile of anchor borrowers/Profile of Anchor Borrowers/lunatic_scores_feb_21_all_data.csv")

#combining depositors data sets into one
depositors.data.combined <- depositors.data.1 %>% left_join(depositors.data.2)
#combining borrowers data sets into one
borrowers.data.combined <- borrowers.data.1 %>% left_join(borrowers.data.2)
# converting borrowed amount into numeric
depositors.data.combined <- transform(depositors.data.combined, borrowed_amount_usd = as.numeric(borrowed_amount_usd))
# converting number of borrowing transactions into integer
depositors.data.combined <- transform(depositors.data.combined, nb_borrows = as.integer(nb_borrows))
# renaming column 'borrowers' to 'depositors' to allow joining tables
renamed_borrowers <- rename(borrowers.data.combined, depositors = borrowers)
# combining borrowers data set with depositors data set
data.combined <- depositors.data.combined %>% full_join(renamed_borrowers)
#renaming 'depositors' column to 'address'. Address now contains both depositors who did not borrow as well as depositors who did borrow.
data.combined <- rename(data.combined, address = depositors)
# replace some NA values with 0
data.combined <- data.combined %>% mutate(deposit_amount_usd = coalesce(deposit_amount_usd, 0))
data.combined <- data.combined %>% mutate(balance_dec_20 = coalesce(balance_dec_20, 0))
data.combined <- data.combined %>% mutate(nb_deposits = coalesce(nb_deposits, 0))
data.combined <- data.combined %>% mutate(nb_swaps = coalesce(nb_swaps, 0))
data.combined <- data.combined %>% mutate(nb_transfers = coalesce(nb_transfers, 0))
data.combined <- data.combined %>% mutate(votes = coalesce(votes, 0))
data.combined <- data.combined %>% mutate(wallet_age = coalesce(wallet_age, 0))
#renaming column name for easier joining of the data frame.
anc_voting_and_claims.data <- rename(anc_voting_and_claims.data, address = depositors)
data.combined <- data.combined  %>% left_join(anc_voting_and_claims.data)
data.combined <- data.combined %>% mutate(anc_airdrop_claimed = coalesce(anc_airdrop_claimed, 0))
data.combined <- data.combined %>% mutate(anc_voting = coalesce(anc_voting, 'did_not_vote'))
# transform some columns as factors (anc voting: 'voted' or 'did_not_vote')
data.combined <- transform(data.combined, anc_voting = as.factor(anc_voting))
# creating the outcome variable 'borrowed'. It has a binary outcome ('yes' or 'no'). It tells us if the address borrowed on Anchor during the borrowing period. 
data.combined <- data.combined %>% mutate(borrowed = case_when(nb_borrows == 0 ~ "no",
                                              nb_borrows > 0 ~ "yes"))
data.combined <- transform(data.combined, borrowed = as.factor(borrowed))
# I think it is best to ignore the Lunatic score for this analysis because it has a lot of NA values.
# (A regression suggests that the column 'n_gov_stakes_anc' does seem to impact the odds of borrowing.
# It is possible that our variable 'anc_voting' has a similar predictive effect.)
# data.combined <- data.combined %>% left_join(lunatic_scores_feb_21_all_data)

#transform some numerical columns into integer
data.combined <- transform(data.combined, nb_swaps = as.integer(nb_swaps))
# Removing addresses that did not make a deposit to Anchor during the two-month deposit period.
# This improves the predictive power of the logistic regression model, likely because the data is more consistent and complete.
# Why are these addresses in there? It may be that they deposited on Anchor before or after the deposit period.
data.combined <- filter(data.combined, nb_deposits > 0)
# moving some columns to improve readability 
data.combined <- data.combined %>% dplyr::select("address", "borrowed", everything())

Dataset

Sample of the resulting dataset:

kable(head(data.combined, 10), booktabs = TRUE)
address borrowed balance_dec_20 borrowed_amount_usd deposit_amount_usd nb_borrows nb_deposits wallet_age nb_swaps nb_transfers votes anc_airdrop_claimed anc_voting
terra13zqnq2dqg83swee9sen64ccwd7j83eshwwxpl9 no 2.826809 0 783.4332 0 2 318 0 4 0 0.000 voted
terra18u5m7pym8m8mjt0wzxgw9zwz56k02a69tlnzdq no 76577.473066 0 780.9653 0 5 322 49 2 0 225.883 did_not_vote
terra1qszfqcmhdx4xd4zxkt00eqdygmvrgjcv5k2y5n no 3279.395541 0 779.6089 0 1 103 0 2 0 0.000 did_not_vote
terra1y63jcxyrks9rzxs9rlvk4crne2r3w9rgaur7uj no 18.288084 0 755.2106 0 3 95 0 0 0 0.000 did_not_vote
terra1dztz06tcrrgsdrt0vdh4exgey57nuyv02fcujf no 790.990974 0 787.1846 0 1 118 0 0 0 0.000 did_not_vote
terra1l5qz5sua6fnnfmmuu5gx0knhgp5n2ke2aamnlp no 85.088722 0 771.2856 0 1 77 0 0 0 0.000 did_not_vote
terra1u9yywa6qgr0xeq08syg2zkdgp7fqxd03aqqqsz no 4014.449751 0 762.1542 0 5 150 1 3 0 0.000 did_not_vote
terra1ska44nk04frzahs2xl6ckalz7haf29j9h3f4yu no 782.475071 0 779.8921 0 2 87 0 0 0 0.000 did_not_vote
terra1j8j79g5fs55xvaj38unynusy50y375jla3aj9p no 1.250684 0 760.1646 0 1 84 0 0 0 0.000 did_not_vote
terra1qqnaal2vdwmqh2p8yrypx07eyzm6e75ghdhscp no 788.442828 0 778.5851 0 3 314 0 0 0 0.000 did_not_vote

The full dataset has 66,682 rows.

Link to the SQL code and query result on Flipside Crypto (the page may take some time to load)

Logistic Regression

I build a model using logistic regression with multiple input variables. This model takes nine variables as input and assesses their association with the outcome variable. The outcome is whether the address borrowed on Anchor during the borrowing period.

logistics <- glm(borrowed ~ balance_dec_20 + deposit_amount_usd + nb_deposits + wallet_age + nb_swaps + nb_transfers + votes + anc_airdrop_claimed + anc_voting, data = data.combined, family = 'binomial')

Plotting the Model

I plot the model’s predictions against the real borrowing outcomes.

predicted.data <- data.frame(
  probability.of.borrowed=logistics$fitted.values,
  borrowed=data.combined$borrowed)

predicted.data <- predicted.data[
  order(predicted.data$probability.of.borrowed, decreasing=FALSE),]
predicted.data$rank <- 1:nrow(predicted.data)

ggplot(data=predicted.data, aes(x=rank, y=probability.of.borrowed)) +
  geom_point(aes(color=borrowed), alpha=1, shape=4, stroke=2) +
  xlab("Address") +
  ylab("Predicted Probability of Borrowing")

The x-axis ranks all 66,682 addresses in the dataset from lowest to highest predicted probability. The y-axis shows the probability of borrowing on Anchor that the model assigns to each address.

The color represents actual outcomes. Red points have not borrowed, and turquoise points have borrowed on Anchor.

The model assigns a probability of borrowing below 25% to about 60,000 addresses. Most of these addresses did not in fact borrow.

Most of the addresses with a predicted probability above 50% did borrow.

Judged by this plot, the model separates borrowers from non-borrowers reasonably well.
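One way to put a number on this visual impression is a confusion matrix at a chosen probability cutoff. The sketch below is not part of the original analysis. It runs on simulated data (the data frame `toy` and the model `toy_model` are stand-ins, not the real dataset), the helper `confusion_at` is hypothetical, and the 0.5 cutoff is an arbitrary choice.

```r
# Simulated stand-in for data.combined (assumption: NOT the real dataset).
set.seed(42)
n <- 2000
toy <- data.frame(
  nb_deposits = rpois(n, 3),
  anc_voting  = factor(sample(c("did_not_vote", "voted"), n,
                              replace = TRUE, prob = c(0.9, 0.1)))
)

# Generate outcomes from a known logistic relationship.
log_odds <- -2.3 + 0.07 * toy$nb_deposits + 1.8 * (toy$anc_voting == "voted")
toy$borrowed <- factor(ifelse(runif(n) < plogis(log_odds), "yes", "no"),
                       levels = c("no", "yes"))

toy_model <- glm(borrowed ~ nb_deposits + anc_voting,
                 data = toy, family = "binomial")

# Hypothetical helper: cross-tabulate predicted vs. actual outcomes.
confusion_at <- function(model, actual, cutoff = 0.5) {
  predicted <- factor(ifelse(fitted(model) >= cutoff, "yes", "no"),
                      levels = c("no", "yes"))
  table(predicted = predicted, actual = actual)
}

cm <- confusion_at(toy_model, toy$borrowed)
print(cm)

# Share of addresses classified correctly at this cutoff.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

With the real model, the same helper could be called as `confusion_at(logistics, data.combined$borrowed)`. Because borrowers are a minority, accuracy alone can look high even for a weak model, so the full table is more informative than the single number.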

Interpreting the model:

The summary function gives us detailed information about the model.

summary(logistics)
## 
## Call:
## glm(formula = borrowed ~ balance_dec_20 + deposit_amount_usd + 
##     nb_deposits + wallet_age + nb_swaps + nb_transfers + votes + 
##     anc_airdrop_claimed + anc_voting, family = "binomial", data = data.combined)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.1956  -0.5573  -0.5023  -0.4743   2.1619  
## 
## Coefficients:
##                       Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)         -2.350e+00  2.255e-02 -104.209  < 2e-16 ***
## balance_dec_20      -2.120e-09  4.733e-09   -0.448    0.654    
## deposit_amount_usd  -5.166e-09  1.090e-09   -4.742 2.12e-06 ***
## nb_deposits          6.808e-02  2.207e-03   30.843  < 2e-16 ***
## wallet_age           1.994e-03  1.114e-04   17.898  < 2e-16 ***
## nb_swaps            -1.774e-04  3.925e-04   -0.452    0.651    
## nb_transfers         2.377e-04  1.597e-04    1.488    0.137    
## votes               -7.954e-03  5.231e-03   -1.521    0.128    
## anc_airdrop_claimed  9.304e-07  1.638e-06    0.568    0.570    
## anc_votingvoted      1.759e+00  3.298e-02   53.346  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62343  on 66681  degrees of freedom
## Residual deviance: 55873  on 66672  degrees of freedom
## AIC: 55893
## 
## Number of Fisher Scoring iterations: 5

The ‘Coefficients’ section provides information on the effect size of each input variable.

On the right we see the p-value of each variable (column ‘Pr(>|z|)’). It is summarized with up to three stars. Variables with *** have a p-value below 0.001 and are statistically significant.
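The p-values can also be read out of a fitted model programmatically rather than by eye. The sketch below is an illustration on simulated data: `toy`, `toy_model`, `x_signal` and `x_noise` are made-up names, and the 0.001 threshold simply mirrors the *** level in the output above.

```r
# Simulated data: one input that drives the outcome and one that does not.
set.seed(1)
n <- 1000
toy <- data.frame(x_signal = rnorm(n), x_noise = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + 1.5 * toy$x_signal))

toy_model <- glm(y ~ x_signal + x_noise, data = toy, family = "binomial")

# coef(summary(...)) returns the 'Coefficients' table as a numeric matrix.
coef_table <- coef(summary(toy_model))
p_values <- coef_table[, "Pr(>|z|)"]

# Names of the terms that would be marked *** (p < 0.001).
significant <- names(p_values)[p_values < 0.001]
print(significant)
```

Applied to the real model, `coef(summary(logistics))` would give the same table that is printed above.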

The column ‘Estimate’ is the effect that each variable has on the log-odds of the outcome. Variables with a negative Estimate are inversely associated with the probability of borrowing (and vice versa). For example, the greater the value of ‘deposit_amount_usd’, the lower the odds of this address borrowing on Anchor.

For each address, the regression model goes through each input variable and adjusts the log-odds of borrowing according to the estimate coefficients above.

The size of an estimate coefficient measures its impact on the outcome per unit of that variable. For example, the estimate coefficient for ‘deposit_amount_usd’ is very small. It is only -5.166e-09, which written out is -0.000000005166. In plain language, for every $1 deposited into Anchor during the deposit period, the log-odds of borrowing during the borrowing period are reduced by 0.000000005166.
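To get a feel for how small this is, it helps to recall that a logistic regression coefficient acts on the log-odds, so exponentiating it gives a multiplicative change in the odds. The sketch below only reuses the coefficient printed in the summary. The helper name `odds_multiplier` and the chosen deposit increases are my own, for illustration.

```r
# Coefficient for deposit_amount_usd, copied from the summary output.
beta_deposit <- -5.166e-09

# Multiplicative change in the odds of borrowing for a given increase
# in the deposited amount (USD), with all other inputs held constant.
odds_multiplier <- function(beta, increase) exp(beta * increase)

increases <- c(1, 1e3, 1e6)
multipliers <- odds_multiplier(beta_deposit, increases)

print(data.frame(increase_usd = increases, odds_multiplier = multipliers))
```

Even an additional $1,000,000 deposited multiplies the odds of borrowing by roughly 0.995, a reduction of about half a percent.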

Median USD deposited by borrowers during the deposit period:

test_borrowed <- data.combined %>% filter(nb_borrows > 0)
test_not_borrowed <- data.combined %>% filter(nb_borrows == 0)
 print(median(test_borrowed$deposit_amount_usd))
## [1] 4248.218

Median USD deposited by non-borrowers during the deposit period:

print(median(test_not_borrowed$deposit_amount_usd))
## [1] 1868.959

The median amount deposited into Anchor by borrowers is greater than the median amount deposited by non-borrowers. Even though, with the other variables held constant, larger deposits into Anchor reduce the odds of borrowing, the effect size of this variable is very small.

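‘anc_voting’ is the variable that has the largest effect on the odds of borrowing. The log-odds of borrowing increase by 1.759 when an address has voted on an Anchor governance proposal at any time before the borrowing period. This multiplies the odds of borrowing by exp(1.759), or about 5.8, with the other variables held constant. Note that odds are not probabilities: odds of 1.00 correspond to a 50% probability of borrowing.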
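The relationship between the coefficient, the odds and the probability can be checked with a few lines of base R. This is a sketch that reuses only the intercept and the ‘anc_votingvoted’ estimate from the summary. It describes a hypothetical address for which every other input is zero, which is an assumption made purely for illustration (real addresses in the dataset have at least one deposit).

```r
# Estimates copied from the summary output.
intercept  <- -2.350
beta_voted <-  1.759

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio_voted <- exp(beta_voted)

# plogis() converts log-odds into a probability: 1 / (1 + exp(-x)).
# Hypothetical address with all other inputs set to zero.
p_did_not_vote <- plogis(intercept)
p_voted        <- plogis(intercept + beta_voted)

print(round(c(odds_ratio = odds_ratio_voted,
              p_did_not_vote = p_did_not_vote,
              p_voted = p_voted), 3))

# Odds of 1 correspond to a probability of one half.
print(plogis(log(1)))
```

For this hypothetical address, voting moves the predicted probability of borrowing from roughly 9% to roughly 36%, while the odds ratio of about 5.8 stays the same whatever the other inputs are.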

Number of borrowers that voted on an Anchor governance proposal, per 100 borrowers that did not vote:

num_borrowers_voted <- count(filter(test_borrowed, anc_voting == 'voted'))
num_borrowers_did_not_vote <- count(filter(test_borrowed, anc_voting == 'did_not_vote'))
print(num_borrowers_voted / num_borrowers_did_not_vote * 100)
##           n
## 1: 34.87232

Number of non-borrowers that voted on an Anchor governance proposal, per 100 non-borrowers that did not vote:

print(count(filter(test_not_borrowed, anc_voting == 'voted')) / count(filter(test_not_borrowed, anc_voting == 'did_not_vote'))* 100)
##           n
## 1: 4.329054

Anchor depositors that voted on an Anchor governance proposal have about 8 times the odds of borrowing compared to non-voters (34.87 / 4.33 ≈ 8.05, before adjusting for the other variables).
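The two figures printed above are ratios of voters to non-voters within each group, so dividing one by the other gives an unadjusted odds ratio. The sketch below reuses only those two printed numbers. The variable names and the helper `share_that_voted` are mine, for illustration.

```r
# Voters per 100 non-voters, copied from the two outputs above.
voters_per_100_borrowers     <- 34.87232
voters_per_100_non_borrowers <- 4.329054

# Unadjusted odds ratio: odds of having voted among borrowers divided by
# the odds among non-borrowers. An odds ratio is symmetric, so this is also
# the odds of borrowing for voters relative to non-voters.
unadjusted_odds_ratio <- voters_per_100_borrowers / voters_per_100_non_borrowers
print(unadjusted_odds_ratio)

# Convert "voters per 100 non-voters" into the share of the group that voted.
share_that_voted <- function(per_100) per_100 / (100 + per_100) * 100

print(share_that_voted(voters_per_100_borrowers))      # percent of borrowers
print(share_that_voted(voters_per_100_non_borrowers))  # percent of non-borrowers
```

Expressed as shares, about 26% of borrowers voted against about 4% of non-borrowers. The unadjusted odds ratio of about 8 is larger than the adjusted one from the model (about 5.8) because the model also accounts for the other input variables.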

Takeaways

The on-chain history of Anchor depositors is associated with their odds of borrowing on Anchor.

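Factors that are associated with borrowing include: having voted on an Anchor governance proposal (anc_voting), the number of deposits into Anchor (nb_deposits), and the age of the wallet (wallet_age).

Factors that are inversely associated with borrowing: the amount deposited into Anchor in USD (deposit_amount_usd), with a very small effect size.

Factors that have no statistically significant effect: balance_dec_20, nb_swaps, nb_transfers, votes, and anc_airdrop_claimed.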

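The relative importance of each variable, which caret::varImp computes for this kind of model from the absolute value of its z-statistic, is shown in this chart: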

V = caret::varImp(logistics)

ggplot2::ggplot(V, aes(x=reorder(rownames(V),Overall), y=Overall)) +
    geom_point( color="blue", size=4, alpha=0.6)+
    geom_segment( aes(x=rownames(V), xend=rownames(V), y=0, yend=Overall), 
                  color='skyblue') +
    xlab('Variable')+
    ylab('Overall Importance')+
    theme_light() +
    coord_flip() 
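As far as I understand the caret documentation, the importance score for a model like this one is the absolute value of each coefficient's z-statistic. Under that assumption, the ranking in the chart can be reproduced from the ‘z value’ column of the summary alone. The sketch below hard-codes those values from the output above and does not call caret.

```r
# z values copied from the 'Coefficients' table (intercept left out).
z_values <- c(
  balance_dec_20      = -0.448,
  deposit_amount_usd  = -4.742,
  nb_deposits         = 30.843,
  wallet_age          = 17.898,
  nb_swaps            = -0.452,
  nb_transfers        =  1.488,
  votes               = -1.521,
  anc_airdrop_claimed =  0.568,
  anc_votingvoted     = 53.346
)

# Assumed importance score: absolute z-statistic, largest first.
importance <- sort(abs(z_values), decreasing = TRUE)
print(importance)
```

Note that this score reflects how confidently each effect is estimated, which is not the same thing as the size of the effect on the odds.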

An Anchor depositor with high odds of borrowing has the following profile:

  1. has voted on an Anchor governance proposal sometime in the past,
  2. has deposited into Anchor multiple times in the last two months, and
  3. has created its Terra address a long time ago.

Limitations of the analysis:

Further Improvements

Market sentiment changes rapidly in the world of crypto. This may impact transaction patterns. The model could be rebuilt in a different market cycle. Afterwards, we could see if the same input variables had a comparable effect on the odds of borrowing on Anchor. This would give us more confidence in the effect size of the input variables.

Additionally, a new model could be created to predict the amount borrowed by a set of addresses. The amount borrowed by each address will not be precisely right, but a good model may be able to predict the aggregate amount borrowed by a large set of addresses.