2026-03-19

Dataset Overview and Source

Reddit “Dead Internet” Comment Dataset (2026)

This analysis examines 500 records of synthetic Reddit comment metadata generated to simulate the “Dead Internet” environment of 2026.

Data Source: Kaggle — nudratabbas/the-dead-internet-theory-reddit-bot-vs-human

Key Variables:

  • subreddit: Community where comment was posted (ex: r/funny, r/gaming)
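  • account_age_days: Age of the commenting account in days, a proxy for account maturity and reputation
  • reply_delay_seconds: Time from post to comment, in seconds; near-instant replies suggest automation
  • avg_word_length: Average word length of the comment, in characters per word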
  • is_bot_flag: Ground-truth label — Bot (218) or Human (282)
  • bot_type_label: Bot sub-type — AI Summarizer, Engagement Farmer, Reprint Bot, or None (Human)
  • bot_probability: Model-estimated probability of being a bot (0–1)

R Code for Data Preparation

Here’s how the data is loaded and prepared for analysis:

# Load required libraries
library(ggplot2)
library(plotly)
library(dplyr)
# Load the data from local CSV
internet <- read.csv("reddit_dead_internet_analysis_2026.csv")
# Convert categorical columns to factors
internet$subreddit      <- factor(internet$subreddit)
internet$is_bot_flag    <- factor(internet$is_bot_flag,
                                  levels = c("False", "True"),
                                  labels = c("Human", "Bot"))
internet$bot_type_label <- factor(internet$bot_type_label,
                                  levels = c("None (Human)",
                                             "AI Summarizer",
                                             "Engagement Farmer",
                                             "Reprint Bot"))

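One caveat about the flag conversion above: depending on how the CSV encodes is_bot_flag, read.csv may parse values such as True/False as logical TRUE/FALSE rather than as text, in which case the string levels "False" and "True" would not match and the factor would come out as NA. The sketch below is one defensive way to handle both encodings. The helper name to_bot_factor is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert a True/False flag to a Human/Bot factor,
# whether the column arrived as text ("True"/"False") or as logical.
to_bot_factor <- function(x) {
  factor(as.logical(x),
         levels = c(FALSE, TRUE),
         labels = c("Human", "Bot"))
}

# Toy inputs standing in for the two ways the column might be read
flag_as_text    <- c("True", "False", "False")
flag_as_logical <- c(TRUE, FALSE, FALSE)

from_text    <- to_bot_factor(flag_as_text)
from_logical <- to_bot_factor(flag_as_logical)

# Both routes give the same labels, and no NA values appear
print(table(from_text, useNA = "ifany"))
print(identical(from_text, from_logical))
```

Running table(internet$is_bot_flag, useNA = "ifany") after the conversion is a quick way to confirm the 282 Human and 218 Bot counts reported in this deck.
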
3D Plotly: Account Age, Reply Delay, and Word Length

3D Plot Analysis

Key Observations:

  • X-Axis Account Age: Bots average 759 days, humans 1,466 days. Bot points concentrate toward the left at low account age values, reflecting that most bots operate on newly created accounts, while human points spread across the range.

  • Y-Axis Reply Delay: Bots cluster tightly near zero reply delay, averaging about 6 seconds, while humans spread broadly from hundreds to thousands of seconds, averaging 1,823 seconds. The result is an almost complete separation on the Y-axis.

  • Z-Axis Average Word Length: Bots average 6.44 chars/word, while humans cluster in the lower range at 5.05, reflecting the more formal vocabulary patterns of AI-generated text.

  • Combined Effect: Bots cluster tightly near the bottom of the Y-axis (near-zero reply delay) and higher on the Z-axis (longer words), while humans spread out, so the separation is easy to see even in 3D.

Plotly Grouped Bar: Bot vs Human Count by Subreddit

ggplot Boxplot: Reply Delay by Subreddit and Bot Status

ggplot Faceted Scatterplot: Account Age vs Word Length by Bot Type

Statistical Analysis: Summary Statistics

internet %>%
  group_by(is_bot_flag) %>%
  summarise(
    Count         = n(),
    Mean_Age      = round(mean(account_age_days), 1),
    Median_Age    = round(median(account_age_days), 1),
    Mean_Delay    = round(mean(reply_delay_seconds), 1),
    Median_Delay  = round(median(reply_delay_seconds), 1),
    Mean_WordLen  = round(mean(avg_word_length), 2),
    SD_WordLen    = round(sd(avg_word_length), 2)
  )
## # A tibble: 2 × 8
##   is_bot_flag Count Mean_Age Median_Age Mean_Delay Median_Delay Mean_WordLen
##   <fct>       <int>    <dbl>      <dbl>      <dbl>        <dbl>        <dbl>
## 1 Human         282    1466.       1419     1823          1756.         5.05
## 2 Bot           218     759.        120        5.8           6          6.44
## # ℹ 1 more variable: SD_WordLen <dbl>

Summary Statistics: Interpretation

Detailed Findings:

  • Reasonably Balanced Dataset: With 282 humans (56.4%) versus 218 bots (43.6%), the groups are not perfectly balanced, but both are large enough to support reliable statistical comparisons.

  • Reply Delay is the strongest signal: Humans average 1,823 seconds (around 30 minutes) to reply, versus about 6 seconds for bots. That is a roughly 316-fold difference in means and a roughly 293-fold difference in medians (about 1,756 s vs. 6 s), making reply delay the single most discriminating variable among those summarized here.

  • Account Age exposes throwaway bots: Human accounts have a median age of 1,419 days (~4 years) compared to just 120 days for bots, so most bots in this dataset operate on recently created accounts with little behavioral history. The bot mean (759 days) sits far above the bot median, which indicates a minority of much older bot accounts.

  • Word Length is a consistent linguistic tell: Bots average 6.44 characters per word versus 5.05 for humans, with little overlap in range (bots: 5.5–7.5, humans: 4.0–6.0; the two ranges share only 5.5–6.0). In this dataset, bot text favors longer, more formal vocabulary.
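The ratios and gaps quoted in these findings can be rechecked with a few lines of arithmetic. The sketch below copies the group summaries from the summary table and the t-test output in this deck. The median human delay is entered as 1756, the value displayed in the tibble, which may be truncated by a fraction of a second.

```r
# Group summaries copied from the table and test output in this deck
human <- c(mean_delay = 1823.028369, median_delay = 1756,
           median_age = 1419, mean_wordlen = 5.048582)
bot   <- c(mean_delay = 5.761468, median_delay = 6,
           median_age = 120, mean_wordlen = 6.437844)

# How many times longer humans take to reply than bots
delay_mean_ratio   <- human[["mean_delay"]]   / bot[["mean_delay"]]    # about 316
delay_median_ratio <- human[["median_delay"]] / bot[["median_delay"]]  # about 293

# How many times older the median human account is
age_median_ratio <- human[["median_age"]] / bot[["median_age"]]        # about 11.8

# Gap in mean word length, in characters per word
wordlen_gap <- bot[["mean_wordlen"]] - human[["mean_wordlen"]]         # about 1.39

print(round(c(delay_mean_ratio   = delay_mean_ratio,
              delay_median_ratio = delay_median_ratio,
              age_median_ratio   = age_median_ratio,
              wordlen_gap        = wordlen_gap), 2))
```
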

Statistical Analysis: T-Test

## 
##  Welch Two Sample t-test
## 
## data:  avg_word_length by is_bot_flag
## t = -27.159, df = 478.58, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Human and group Bot is not equal to 0
## 95 percent confidence interval:
##  -1.489776 -1.288749
## sample estimates:
## mean in group Human   mean in group Bot 
##            5.048582            6.437844
## 
##  Welch Two Sample t-test
## 
## data:  reply_delay_seconds by is_bot_flag
## t = 29.836, df = 281.01, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Human and group Bot is not equal to 0
## 95 percent confidence interval:
##  1697.373 1937.160
## sample estimates:
## mean in group Human   mean in group Bot 
##         1823.028369            5.761468
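The code that produced this output is not shown on the slide. Output of this form is what R's t.test prints for a formula call such as t.test(avg_word_length ~ is_bot_flag, data = internet), which runs Welch's test by default. The sketch below shows the pattern on simulated data so that it runs on its own. The data frame sim, the seed, and the spreads passed to rnorm, rexp, and runif are illustrative assumptions, not values from the dataset.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the sketch

# Simulated stand-in for the real data: group sizes and means follow the
# deck, but the spreads and distributions are assumptions for illustration.
sim <- data.frame(
  is_bot_flag = factor(rep(c("Human", "Bot"), times = c(282, 218)),
                       levels = c("Human", "Bot")),
  avg_word_length = c(rnorm(282, mean = 5.05, sd = 0.5),
                      rnorm(218, mean = 6.44, sd = 0.6)),
  reply_delay_seconds = c(rexp(282, rate = 1 / 1823),
                          runif(218, min = 1, max = 10))
)

# Welch two-sample t-tests (var.equal = FALSE is the default)
word_test  <- t.test(avg_word_length ~ is_bot_flag, data = sim)
delay_test <- t.test(reply_delay_seconds ~ is_bot_flag, data = sim)

print(word_test)
print(delay_test)
```

Because Human is the first factor level, the estimated difference is Human minus Bot, which is why the word-length t statistic and confidence interval in the output above are negative.
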

T-Test: Interpretation

Comprehensive Analysis:

  • Statistical Significance: Both Welch t-tests give p < 2.2e-16. If humans and bots truly had the same mean, differences this large in average word length (5.05 vs. 6.44 characters) or in reply delay (1,823 s vs. 5.8 s) would be extremely unlikely to arise by chance.

  • Effect Size: For word length, the gap is 1.39 characters per word: bot-generated text uses longer, more formal vocabulary, likely reflecting AI language patterns from training data. For reply delay, the gap in means is about 1,817 seconds (~30 minutes): bots respond near-instantly, whereas humans rarely reply within 10 seconds.

  • Confidence Interval: With 95% confidence, the true mean difference in word length lies between 1.29 and 1.49 characters, a tight interval that reflects the precision afforded by 500 records. For reply delay, the 95% interval runs from 1,697 s to 1,937 s, a gap that is both very large and well estimated.
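As a consistency check, both confidence intervals can be rebuilt from the reported group means, t statistics, and degrees of freedom alone. The standard error is the mean difference divided by t, and the half-width is that standard error times the t quantile. The function name welch_ci is a hypothetical helper for this sketch.

```r
# Hypothetical helper: rebuild a two-sided confidence interval for a
# difference in means from the printed t-test summary values.
welch_ci <- function(mean_a, mean_b, t_stat, df, level = 0.95) {
  diff <- mean_a - mean_b                 # estimated difference (a minus b)
  se   <- abs(diff / t_stat)              # standard error implied by t
  half <- qt(1 - (1 - level) / 2, df) * se
  c(lower = diff - half, upper = diff + half)
}

# Inputs copied from the Welch test output in this deck (Human minus Bot)
word_ci  <- welch_ci(5.048582, 6.437844, t_stat = -27.159, df = 478.58)
delay_ci <- welch_ci(1823.028369, 5.761468, t_stat = 29.836, df = 281.01)

print(round(word_ci, 3))   # compare with -1.489776 and -1.288749
print(round(delay_ci, 1))  # compare with 1697.373 and 1937.160
```

Small discrepancies in the last digits are expected, because the printed t statistics are rounded.
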

Key Insights and Conclusions

Major Findings:

First: Bots have a consistent behavioral fingerprint. Across every subreddit, bots are characterized by near-instant replies, young accounts, and longer-than-average word lengths. The combination of all three is the strongest indicator, but even reply delay alone separates the groups almost perfectly in this dataset.

Second: Linguistic patterns betray AI authorship. The 1.39 character gap in average word length, confirmed by t-test, is consistent with large language models favoring formal vocabulary. This signal persists even when bots successfully mimic karma and posting frequency.

Third: Karma farming is effective camouflage. Bot karma distributions are nearly identical to human ones, so reputation-based detection alone would likely fail. Effective detection requires behavioral timing and linguistic analysis working together.

Implications and Future Directions

Practical Application: Platform moderation could implement real-time reply delay thresholds — any account responding consistently under 15 seconds across multiple posts warrants automated review, regardless of karma or account age.
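A minimal sketch of such a rule follows, under stated assumptions. The account_id column is hypothetical (the variables listed in this deck do not include one), and the function name flag_fast_repliers and the parameters min_comments and min_share are illustrative choices for what "consistently" and "multiple posts" could mean.

```r
# Hypothetical rule: flag accounts whose replies are almost always faster
# than a threshold, provided enough comments have been observed.
flag_fast_repliers <- function(comments, threshold_s = 15,
                               min_comments = 3, min_share = 0.9) {
  delays     <- split(comments$reply_delay_seconds, comments$account_id)
  n_comments <- lengths(delays)
  # Share of each account's replies that arrive under the threshold
  share_fast <- vapply(delays, function(d) mean(d < threshold_s), numeric(1))
  data.frame(
    account_id = names(delays),
    n_comments = as.integer(n_comments),
    share_fast = unname(share_fast),
    flagged    = unname(n_comments >= min_comments & share_fast >= min_share),
    stringsAsFactors = FALSE
  )
}

# Toy data: a1 always replies fast, a2 does so once, a3 has a single comment
toy <- data.frame(
  account_id = c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a3"),
  reply_delay_seconds = c(4, 6, 5, 7, 1200, 8, 1900, 2400, 3)
)

review_queue <- flag_fast_repliers(toy)
print(review_queue)
```

Requiring a minimum number of comments keeps a single quick reply from triggering review, which matters because fast human replies do occasionally happen.
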

Study Limitations:

  • This dataset is synthetic, so the bot behavior patterns reflect how the data was generated and may be easier to detect than those of real bots.
  • Real-world bots may deliberately introduce reply delay variation or word length noise to evade these exact signals. Results should be validated against live Reddit data before operational deployment.

Future Research Directions:

  • Longitudinal studies tracking whether bot account age distributions shift as platforms improve new-account detection.
  • NLP analysis of full comment text beyond average word length to identify deeper stylistic signatures of AI-generated content.
  • Cross-platform comparison to determine whether these behavioral fingerprints generalize beyond Reddit.

Thank You