Trey Oehmler, Fall 2019

Venmo’s Public Data API

This demonstration uses transaction data from the popular peer-to-peer payment app Venmo. The dataset describes over 7 million transactions and can be found in full here. It was assembled by Dan Salmon to show just how public and accessible Venmo transaction data really is. Until recently, Venmo did not throttle requests to its public API endpoint (https://venmo.com/api/v5/public), which allowed anyone to make unlimited API calls and effectively scrape every transaction on Venmo that was listed as public. This has since changed: only 20 transactions are now returned per IP address per minute. Despite this change, Venmo, which positions itself as a social network for payment transfers, has recently been criticized in the press for its “public by default” approach to user data. As privacy becomes a major concern for a growing number of technology companies, Venmo stands out as one of the few services that require users to manually set their account activity to private.

For each transaction, a wealth of information is provided, including the first and last names and usernames of both users involved, as well as a timestamp, the type of transaction (payment or charge), and a note. My analysis works only with the notes and timestamps of each transaction.
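
A minimal sketch of pulling out the two fields used here. It assumes each transaction is a dict with hypothetical `note` and `timestamp` keys and an ISO 8601 timestamp; the real dataset's field names and format may differ.

```python
from datetime import datetime

def note_and_hour(record):
    """Return (note, hour of day) for one transaction record.

    The keys "note" and "timestamp" are hypothetical placeholders for
    whatever the actual dataset schema uses.
    """
    note = record.get("note") or ""  # treat a missing note as empty text
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
    return note, ts.hour
```

The hour is read in whatever time zone the timestamp is expressed in, so hour-of-day comparisons depend on that choice.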

Transaction Notes

Every Venmo transaction requires a “note” describing what the payment or charge is for. Many users enter emojis instead of a written explanation. While little insight can be gained from interpreting emojis by themselves, I’ll propose a method for relating their use to different periods of the day. In doing so, we can begin to see interesting correlations emerge between the use of different words and emojis over a 24-hour span, and draw conclusions about the types of payments that different emojis likely signify.

Exploratory Analysis

Before moving on to emojis, it makes sense to first understand the patterns in how full words are used, starting with which words appear most frequently in transaction notes.

word total
food 135632
uber 91794
rent 60373
love 44188
gas 42680
stuff 42356
bills 41159
august 35115
bill 32851
dinner 31013
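
The counts above could be produced with something like the following sketch. The tokenizer (lowercase runs of the letters a to z) and the optional `stop_words` set are assumptions: very common words such as “for” or “the” are absent from the table, which suggests some filtering, but the exact method is not stated.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of the letters a-z.
WORD_RE = re.compile(r"[a-z]+")

def top_words(notes, n=10, stop_words=frozenset()):
    """Return the n most frequent words across all notes as (word, count) pairs."""
    counts = Counter()
    for note in notes:
        words = WORD_RE.findall(note.lower())
        # Skip very common words (an assumed filtering step).
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)
```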

We can start to see how frequently Venmo is used to send rent payments, split Ubers, pay bills and share the tab from a night out. While this list is interesting, it doesn’t say much about how these use cases differ. Intuitively, paying back a friend for dinner is very different from paying for utilities, but the cumulative totals for these types of payments won’t reveal that difference. This is where timestamps come into play.

If we can understand when a particular type of transaction occurs, we can start to differentiate between types of transactions and use these trends to better interpret which types of transactions different emojis signify.

If we limit our scope to transactions containing the words “rent” and “food”, and plot the distributions of these transactions at one-hour intervals throughout the day, a clear pattern emerges.
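
A sketch of the hourly binning, assuming notes and hours have already been extracted as `(note, hour)` pairs. Treating a note as “containing” a word only when the word appears as a whole word, case-insensitively, is also an assumption.

```python
import re

def hourly_counts(records, word):
    """Count transactions per hour (0-23) whose note contains `word` as a whole word.

    `records` is assumed to be an iterable of (note, hour) pairs.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    counts = [0] * 24
    for note, hour in records:
        if pattern.search(note):
            counts[hour] += 1
    return counts
```

With the word-boundary match, a note such as “parent teacher night” is not counted as a “rent” transaction.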

Payments for rent peak during the morning hours, while payments for food dip during the same period. If we draw these distributions as line charts on the same axes, their inverse relationship becomes even clearer. We can observe a similar phenomenon for the words “bill” and “dinner”, as well as for “night” and “wifi”.
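
Because “food” appears more than twice as often as “rent” in the notes, plotting the two raw distributions together would mostly show the difference in volume. One way to compare their shapes is to convert each to a share of its daily total. The Pearson correlation below is an illustrative addition rather than a measure from this analysis; a value near −1 would indicate an inverse pattern.

```python
import numpy as np

def hourly_share(counts):
    """Convert hourly counts into fractions of the daily total."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def pearson(a, b):
    """Pearson correlation between two hourly distributions of equal length."""
    return float(np.corrcoef(a, b)[0, 1])
```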

These distributions match the intuition that certain types of transactions are more likely to occur at night than during the day and vice versa.

Understanding Emoji Use

How do the distributions of full words help us understand the meaning behind emojis? If we observe similar inverse relationships among emojis, we can speculate about which kind of transaction an emoji represents. By counting the frequency of individual characters and filtering out letters, digits and punctuation, we can see that the most commonly used emojis show a variety similar to that of the most commonly used words.

symbol total
💸 154763
🍕 145517
🏠 111379
❤️ 100480
🌮 81323
🏈 77600
🍺 67078
🎉 62717
😘 58447
🍻 56662
$ 52348
🍷 51473
🚗 50265
49881
🔌 49729
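
The character filtering described above could be sketched with Python's `unicodedata` module. Treating the Unicode categories L (letters), N (numbers) and P (punctuation) as the filtered classes is an assumption; under it, currency symbols such as “$” (category Sc) are kept, as in the table. Counting one code point at a time also splits multi-code-point emoji such as ❤️ (U+2764 followed by the variation selector U+FE0F), so this sketch additionally drops separators, marks and control or format characters.

```python
import unicodedata
from collections import Counter

# Major Unicode categories skipped: L letters, N numbers, P punctuation,
# Z separators (spaces), M marks (e.g. variation selectors) and
# C control/format characters (e.g. newlines, zero-width joiners).
SKIPPED = set("LNPZMC")

def top_symbols(notes, n=15):
    """Count the remaining characters (mostly symbols such as emoji and "$")."""
    counts = Counter()
    for note in notes:
        for ch in note:
            if unicodedata.category(ch)[0] in SKIPPED:
                continue
            counts[ch] += 1
    return counts.most_common(n)
```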

We can visualize the hourly distributions of some of the most used emojis in the following way:

While certain emojis, like pizza (🍕), are quite literal in what they probably represent, others, like money with wings (💸), are vaguer. It’s hard to say intuitively what kind of payment would carry a flying-money icon as its note. Based on the similarity of its distribution to that of the word “rent”, we can infer that this icon is used to describe similar sorts of transactions. The same can be said of the similarity between “food” and the beer emoji (🍺). One striking feature of these distributions is their parabolic shape, which we can interpret as an indication that each word or emoji has a particular time of day when its usage peaks. We can roughly approximate each curve with a quadratic function by performing a polynomial regression.
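
A minimal sketch of that polynomial regression using NumPy's `np.polyfit`, assuming each distribution is a list of 24 hourly counts indexed by hour 0 to 23. The coefficient names follow the quadratic model \(y = t_1 x + t_2 x^2 + b\).

```python
import numpy as np

HOURS = np.arange(24)  # x values: hour of the day, 0 through 23

def fit_quadratic(counts):
    """Least-squares fit of y = t1*x + t2*x**2 + b to 24 hourly counts.

    Returns (t1, t2, b). np.polyfit orders coefficients from the highest
    degree down, so its raw output is [t2, t1, b].
    """
    y = np.asarray(counts, dtype=float)
    t2, t1, b = np.polyfit(HOURS, y, deg=2)
    return float(t1), float(t2), float(b)
```

Note the reordering: forgetting that `np.polyfit` returns the highest-degree coefficient first would silently swap \(t_1\) and \(t_2\).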

For each quadratic, we can model the curve with an equation of the form \(y = t_1 x + t_2 x^2 + b\), where \(x\) is the hour of the day, \(y\) is the number of transactions in that hour, \(t_1\) and \(t_2\) are a pair of coefficients and \(b\) is an intercept. If we take this model to represent the pattern of use for a particular word or emoji, then we can interpret \(t_1\) and \(t_2\) as descriptions of this pattern of use. If we find these coefficients for each emoji and plot them with \(t_1\) on the x-axis and \(t_2\) on the y-axis, we end up with a surprisingly linear result.
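
One way to connect the two coefficients to the idea of a daily peak is the vertex of the fitted parabola. This is a standard calculus step added for illustration, and the numbers in the example are hypothetical:

\[
\frac{dy}{dx} = t_1 + 2 t_2 x = 0 \quad\Longrightarrow\quad x^{*} = -\frac{t_1}{2 t_2}
\]

The curve reaches a maximum at \(x^{*}\) when \(t_2 < 0\) and a minimum when \(t_2 > 0\). For example, \(t_1 = 2\) and \(t_2 = -0.1\) give \(x^{*} = -\frac{2}{2 \cdot (-0.1)} = 10\), a peak around 10:00 am. Rearranged as \(t_2 = -t_1 / (2 x^{*})\), the relation also suggests one possible reading of the linear pattern: symbols whose curves turn at roughly the same hour would fall near a common line through the origin.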

From these visualizations emerges what appears to be a spectrum. We can speculate that this spectrum is indicative of the kinds of transactions that take place on Venmo over the course of a day. On one end of the spectrum are the transactions that are more likely to take place at night, such as those for food, drinks and transit. On the other end are the transactions that are more likely to take place during the day, such as those for rent and bills. Along this spectrum we can mark each emoji and interpret its position. We can model this distribution linearly:
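
The linear model of the coefficient pairs could be fit as an ordinary least-squares line through the \((t_1, t_2)\) points. Using `np.polyfit` with `deg=1` is an assumption about how the line was obtained; the names `m` (slope) and `c` (intercept) are hypothetical.

```python
import numpy as np

def fit_line(t1s, t2s):
    """Ordinary least-squares line t2 = m * t1 + c through the (t1, t2) pairs."""
    m, c = np.polyfit(np.asarray(t1s, dtype=float), np.asarray(t2s, dtype=float), deg=1)
    return float(m), float(c)
```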

We can then map each point in this distribution to a single dimension by projecting it perpendicularly onto the fitted line. If we take the two furthest points to be our maximum and minimum, we can then center this mapping and plot each symbol along this line. Following this process for the most common emojis gives a clearer visualization of the daily trends in their use on Venmo:
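
A sketch of the mapping to one dimension under stated assumptions: each \((t_1, t_2)\) point is projected perpendicularly onto a fitted line \(t_2 = m t_1 + c\), its signed position along the line becomes its coordinate, and the coordinates are rescaled so the two extreme points land at −1 and +1 (one reading of “center this mapping”). The names `m`, `c` and `spectrum_positions` are hypothetical.

```python
import numpy as np

def spectrum_positions(t1s, t2s, m, c):
    """Place each (t1, t2) point on a 1-D scale along the line t2 = m*t1 + c.

    Points are projected perpendicularly onto the line; the signed distance
    along the line is then rescaled so the two extremes map to -1 and +1.
    """
    pts = np.column_stack([t1s, t2s]).astype(float)
    anchor = np.array([0.0, c])                         # a point on the line (t1 = 0)
    direction = np.array([1.0, m]) / np.hypot(1.0, m)   # unit vector along the line
    s = (pts - anchor) @ direction                      # scalar projection of each point
    lo, hi = s.min(), s.max()
    if hi == lo:                                        # all points project to one spot
        return np.zeros_like(s)
    return 2.0 * (s - lo) / (hi - lo) - 1.0
```

Which end of the scale corresponds to daytime and which to nighttime depends on the direction vector's sign convention, so the result may need to be negated to match a chosen orientation.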

Closing Thoughts

This analysis and visualization highlight an interesting phenomenon in the way emojis are used on Venmo. While some emojis and words peak in usage between 6:00 am and 12:00 pm, others peak between 6:00 pm and 12:00 am. What’s particularly remarkable is how consistently this pattern occurs. Even when comparing the hourly distributions of less frequently used emojis, their use over the course of the day seems to fall into one of these two categories. For more frequent symbols, the differences between the daily peak and minimum are much more pronounced, which further supports the idea that Venmo transactions in general can be placed along a spectrum between two poles. It’s not shocking that more rent payments are made during the day than at night, and it makes sense that pizza emojis peak at night, but the degree to which their hourly patterns mirror each other is certainly notable.