Distinct Platform Identity: A Cross-Platform Analysis of Linguistic Norms

Part I: Sentiment and emotional language on Tumblr

Author

Ellie Gómez Tovar

Published

April 21, 2026

Do platforms have personalities?

If you’ve spent time on social media, you’ve probably noticed that different platforms feel different — and not just because of how they look. The writing feels different. The emotional register is different. Tumblr has a reputation for being earnest and intense. Reddit feels like it wants to debate you. Threads is still figuring out what it is.

But is that just vibes, or is there something measurable underneath?

This project explores that question. Using linguistic data collected directly from platform APIs, I’m asking whether the emotional tone of posts — as captured through sentiment analysis — can actually back up those impressions. This is Part I, focusing on Tumblr as the opening case study.


Research Question

What does sentiment analysis reveal about the emotional tone of Tumblr posts, and what might that suggest about Tumblr’s platform-specific voice?


Why start with Tumblr?

Tumblr is an ideal first test case because its reputation is so specific. Users write in ways that are emotionally heightened, highly stylized, and deeply communal. The platform has its own grammar of feeling — irony layered over sincerity, long-form emotional posts tagged with lowercase stream-of-consciousness labels. If any platform’s “personality” should be traceable through language, it’s Tumblr.

That makes it a good anchor for comparison: once we know what Tumblr looks like emotionally, we have a baseline for asking whether Reddit and other platforms are genuinely different.


Collecting the Data

The Tumblr dataset was collected through the platform’s API using tagged posts related to fandom and niche topic discussion. Because Tumblr’s tag endpoint returns a limited number of posts per request, I paginated backward through multiple batches and then deduplicated the results.

Note

Final dataset: 499 Tumblr posts after cleaning and deduplication.
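
The paginate-then-deduplicate step described above can be sketched roughly as follows. This is an illustration in Python, not the project's actual collection script. The function names (collect_tagged, make_tumblr_fetcher, fetch_page) are hypothetical, and the request helper assumes Tumblr's v2 tagged endpoint with its tag, before, and api_key parameters, which should be checked against the current API documentation.

```python
import json
import urllib.parse
import urllib.request


def collect_tagged(fetch_page, max_pages=50):
    """Page backward through tagged posts and drop duplicates by post id.

    fetch_page(before) must return a list of post dicts, each with an
    "id" and a "timestamp"; before=None means "start from the newest".
    """
    seen = set()
    posts = []
    before = None
    for _ in range(max_pages):
        batch = fetch_page(before)
        if not batch:
            break  # no more posts for this tag
        for post in batch:
            if post["id"] not in seen:  # overlapping pages repeat posts
                seen.add(post["id"])
                posts.append(post)
        oldest = min(p["timestamp"] for p in batch)
        if before is not None and oldest >= before:
            break  # cursor did not move; stop rather than loop forever
        before = oldest
    return posts


def make_tumblr_fetcher(tag, api_key):
    """Build a fetch_page function for one tag (assumed endpoint and params)."""
    def fetch_page(before):
        params = {"tag": tag, "api_key": api_key}
        if before is not None:
            params["before"] = before
        url = "https://api.tumblr.com/v2/tagged?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["response"]
    return fetch_page
```

Passing the fetcher in as an argument keeps the paging logic testable without network access: any function that returns batches of post dicts can stand in for the API.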


Cleaning the Corpus

Raw API data is messy. Tumblr’s output included nested JSON structures, HTML-formatted post bodies, and metadata fields that weren’t useful for text analysis. The cleaning process involved:

  • Reducing to the most relevant variables (post text, tags, timestamp, engagement)
  • Stripping HTML tags from post bodies
  • Standardizing and flattening the tags column
  • Exporting a clean CSV for sentiment scoring

This is less glamorous than the analysis itself, but it’s the part that determines whether the results mean anything.
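
For readers who want to see what those cleaning steps look like in practice, here is a minimal sketch using only the Python standard library. It is not the pipeline used for this project. The field names body, tags, timestamp, and note_count are assumptions about the shape of a Tumblr post object, and the helper names are hypothetical.

```python
import csv
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(body):
    """Remove tags, decode entities, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def clean_post(post):
    """Keep the relevant variables and flatten the tag list to one string."""
    return {
        "text": strip_html(post.get("body", "")),
        "tags": "|".join(t.strip().lower() for t in post.get("tags", [])),
        "timestamp": post.get("timestamp"),
        "note_count": post.get("note_count", 0),  # assumed engagement field
    }


def write_csv(posts, path):
    """Export cleaned posts to a CSV ready for sentiment scoring."""
    fields = ["text", "tags", "timestamp", "note_count"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(clean_post(p) for p in posts)
```

Joining tags with a pipe character is one arbitrary choice for flattening; any separator that cannot appear inside a tag would work.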


Sentiment Analysis

Sentiment was scored using the NRC Emotion Lexicon (Mohammad & Turney, 2013), which maps words onto eight emotion categories — trust, anticipation, joy, fear, anger, sadness, surprise, and disgust — plus positive/negative valence.
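
To make the lexicon approach concrete, here is a toy sketch of how word-level emotion counting works. The references list the syuzhet R package, so this Python version is only an illustration of the lookup idea, not the code behind the results. The four-word lexicon is invented for the example and does not reproduce real NRC entries.

```python
import re
from collections import Counter

# Invented mini-lexicon for illustration only; NOT real NRC data.
TOY_LEXICON = {
    "friend": {"trust", "joy", "positive"},
    "excited": {"anticipation", "joy", "positive"},
    "afraid": {"fear", "negative"},
    "awful": {"anger", "disgust", "negative"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_post(text, lexicon):
    """Count how many tokens fall into each emotion or valence category."""
    counts = Counter()
    for token in tokenize(text):
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def corpus_totals(texts, lexicon):
    """Sum per-post counts across a whole corpus."""
    totals = Counter()
    for text in texts:
        totals.update(score_post(text, lexicon))
    return totals
```

Note that a post like "My friend is so excited, not afraid!" still registers one fear word, because a plain lookup ignores negation and context.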

What emotions showed up most?

The chart below shows the total count of each NRC emotion category across all 499 Tumblr posts.

The standout finding: “trust” dominates by a wide margin. Anticipation comes in second, followed by joy and fear at roughly similar levels. This is consistent with Tumblr’s community-oriented culture — a lot of the emotional language in these posts is warm, earnest, and oriented toward shared enthusiasm rather than conflict.

How positive or negative were posts overall?

The distribution of average sentiment scores leans positive, with most posts clustered around neutral-to-mildly-positive. There are very few strongly negative posts in this sample, which tracks with the fandom-discussion context — these are people talking about things they care about.


Comparing Tumblr and Reddit

Now for the interesting part. To see whether Tumblr’s emotional profile is actually distinctive, I compared it to a parallel Reddit corpus (248 posts, collected via search export and cleaned to a matching schema).

A few things stand out:

  • Reddit scores higher on average emotion counts per post across almost every category — especially trust, anticipation, and joy.
  • Tumblr scores higher on average sentiment once scores are normalized by post length rather than taken as raw counts.

This is a subtle but meaningful distinction. Reddit posts pack in more emotional content per post in absolute terms, likely because they tend to be longer and more discursive. But Tumblr posts are tonally more positive on average — the emotional language that does appear skews warmer.
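
The difference between the two measures is easy to see with a small worked example. The sketch below assumes one plausible definition of length-normalized sentiment, (positive words minus negative words) divided by token count. The post does not spell out its exact formula, and the numbers are made up for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class PostScore:
    tokens: int    # total words in the post
    positive: int  # words matched as positive
    negative: int  # words matched as negative


def emotion_words(post):
    """Raw count of valence-bearing words; grows with post length."""
    return post.positive + post.negative


def net_sentiment_per_token(post):
    """Assumed normalization: net valence divided by post length."""
    if post.tokens == 0:
        return 0.0
    return (post.positive - post.negative) / post.tokens


# Hypothetical posts, not drawn from the study data.
long_post = PostScore(tokens=200, positive=12, negative=8)
short_post = PostScore(tokens=20, positive=3, negative=0)

print(emotion_words(long_post), emotion_words(short_post))
print(net_sentiment_per_token(long_post), net_sentiment_per_token(short_post))
```

Here the long post contains 20 emotion words against the short post's 3, yet its net sentiment per token is 4/200 = 0.02 against 3/20 = 0.15. A platform can lead on one measure and trail on the other.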

Tip

Key takeaway: Emotion counts per post are higher on Reddit, but average sentiment is higher on Tumblr. The platforms differ not just in how much emotional language appears, but in what kind.


Interpretation

These results are preliminary, but they do line up with how people tend to describe these platforms anecdotally. Tumblr’s writing style is affectively intense and tag-oriented — posts perform emotion as much as communicate it. Reddit’s style consolidates discussion within threads, generating more total text (and more total emotional words), but within a more debate-forward frame.

A few important caveats:

  • The NRC Lexicon misses sarcasm and slang, which are extremely common on both platforms. Because it scores words at face value, platform-specific irony likely distorts the apparent emotion scores.
  • This is not a time-matched comparison. The Tumblr and Reddit samples weren’t collected from the same time window, which limits direct comparison.
  • Sample size is modest. 499 Tumblr posts and 248 Reddit posts are enough to see patterns, but not enough to make strong claims.

What’s Next

This is Part I of a longer project. Planned next steps include:

  1. Adding a third platform: Bluesky is a good candidate given its continued growth, its distinct community norms, and its largely text-based, microblogging format.
  2. Testing alternative sentiment methods — or manually validating a subset to check how well the NRC Lexicon handles platform-specific language.
  3. More advanced linguistic analysis — deeper part-of-speech tagging, topic modeling, or discourse-level features beyond word-level sentiment.

The goal isn’t just to confirm that platforms feel different — it’s to get specific about how they differ linguistically, and whether those differences are consistent enough to say something meaningful about platform design and culture.


References

  • Mohammad, S. M., & Turney, P. D. (2013). NRC Emotion Lexicon (EmoLex). National Research Council Canada.
  • Jockers, M. (2023). syuzhet: Extracts sentiment and sentiment-derived plot arcs from text [R package]. CRAN.
  • Kunst, J. (2023). RedditExtractoR: Reddit data extraction toolkit [R package]. CRAN.
  • Tumblr. (2026). Tumblr API documentation. Retrieved April 7, 2026, from Tumblr Developers.
  • Dr. Nathan Carpenter
  • Dr. Jieun Shin