Debugging Claude Code in RStudio's Terminal

A blow-by-blow account of how a "something's wrong with ANSI codes" symptom turned into a root-cause diagnosis of a WebSocket UTF-8 framing bug in rsession.

📅 June 11, 2026 · ⏱ ~2 hours · Bug found: rstudio/rstudio #17941 · Fable 5 assist

1 - Setup & Initial Symptoms

1

The Problem Statement

Joe is running Claude Code (Anthropic's AI coding CLI, built on ink / React for terminals) inside RStudio's terminal pane. Something is clearly wrong with the rendering (overprinting, missing content, garbled UI), but the root cause is unknown. The question: how do we systematically find it?

Environment: RStudio 2026.04.0+526 · macOS Darwin 25.5.0 (Apple Silicon) · zsh · TERM=xterm-256color · 150×35 terminal pane
2

First Look: Environment Scan

Ran a quick battery of diagnostic commands to understand what the terminal reports about itself before writing any test scripts.

echo "TERM=$TERM"; echo "COLORTERM=$COLORTERM"
tput cols; tput lines; tput colors
infocmp -1 | head -60
Notable findings:
• COLORTERM is empty: truecolor not advertised to apps
• TERM_PROGRAM is empty: no terminal self-identification
• strikethrough (smxx) missing from terminfo
• Tc (truecolor flag) missing from terminfo
• Python and Node.js isatty() return false (expected: these ran piped through Claude Code's Bash tool)
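
The same checks can be gathered in one place. The following is a minimal Python sketch, not part of the original session; the function name `summarize_terminal_env` and the dictionary keys are made up for illustration. It treats `COLORTERM=truecolor` or `COLORTERM=24bit` as "truecolor advertised", which is the convention most truecolor-aware apps follow.

```python
import os
import sys


def summarize_terminal_env(env=None, stream=None):
    """Summarize what the environment advertises to terminal apps.

    `env` defaults to os.environ and `stream` to sys.stdout; both can be
    overridden so the function is easy to exercise outside a real terminal.
    """
    env = os.environ if env is None else env
    stream = sys.stdout if stream is None else stream
    colorterm = env.get("COLORTERM", "")
    return {
        "term": env.get("TERM", ""),
        # Apps commonly look for one of these two values before using 24-bit color.
        "truecolor_advertised": colorterm.lower() in ("truecolor", "24bit"),
        # None means the terminal does not identify itself.
        "term_program": env.get("TERM_PROGRAM", "") or None,
        # False when output is piped, as it was under Claude Code's Bash tool.
        "is_tty": stream.isatty(),
    }
```

Calling `summarize_terminal_env()` from a shell inside the terminal pane (rather than through a pipe) gives the view an interactive app would actually see.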

2 - Terminal Capability Test Script

3

Writing terminal-test.sh

Wrote a comprehensive standalone script covering both programmatic and visual tests. The key insight: color rendering and attribute correctness require human eyes, while things like cursor position reports can be automated.

Sections covered:

  • Environment snapshot (TERM, COLORTERM, window size)
  • Terminfo capability probes via tput
  • CPR round-trip (ESC[6n → ESC[row;colR)
  • Device Attributes query (ESC[c)
  • SGR attributes (bold, dim, italic, underline, blink, strikethrough)
  • 16-color, 256-color, and truecolor gradients
  • Unicode & emoji rendering
  • OSC 8 hyperlinks
  • Cursor shape sequences (DECSCUSR)
4

Communication Problem: The Terminal Is Too Broken to Read

⚠ At this point it became clear that Claude Code's own output in RStudio was unreadable: overprinting, garbled characters, and escape sequences rendering as literal text. Joe couldn't read Claude's responses in the chat window.

Solution: route all instructions through a plain text file (assistant.txt) in the project root. All subsequent diagnostics were written there and Joe would copy-paste commands from it. This workaround was used for the rest of the session.

cat > /Users/jcheng/Development/posit-dev/assistant/assistant.txt
5

Running the Test โ€” Key Results

Joe ran bash /tmp/terminal-test.sh directly in the RStudio terminal and shared a screenshot of the results.

Issues found:
• Overprinting in cursor-shape test: using \r without ESC[2K (erase line) left remnants. (This was a bug in the test script itself, not the terminal.)
• /1;2c appeared after the script: the DA1 terminal response (ESC[?1;2c) leaked into shell stdin because the script didn't fully consume it. Required Ctrl+C to clear.
• COLORTERM unset and Tc missing confirmed

The leaked /1;2c was actually a useful data point: the terminal does respond to Device Attribute queries and identifies as VT100+AVO.
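
A script that sends ESC[c avoids this kind of leak by reading until the reply's final "c" and discarding or interpreting it. The sketch below shows only the parsing half, in Python; reading from the tty in raw mode is left out, and the names `DA1_RESPONSE` and `parse_da1` are hypothetical helpers rather than anything from terminal-test.sh.

```python
import re

# A DA1 reply has the shape ESC [ ? <params separated by ;> c
DA1_RESPONSE = re.compile(rb"\x1b\[\?([0-9;]*)c")


def parse_da1(data):
    """Find a DA1 reply in `data`.

    Returns (params, remaining) where `params` is a list of ints and
    `remaining` is `data` with the reply removed. If no complete reply is
    present, returns (None, data) so the caller can keep reading.
    """
    match = DA1_RESPONSE.search(data)
    if match is None:
        return None, data
    params = [int(p) for p in match.group(1).split(b";") if p]
    remaining = data[:match.start()] + data[match.end():]
    return params, remaining
```

For the reply seen here, `parse_da1(b"\x1b[?1;2c")` yields the parameters 1 and 2, which is the VT100-with-Advanced-Video-Option identification mentioned above.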

3 - Color Fix (Partial Win)

6

Fix: export COLORTERM=truecolor

The most actionable finding was the missing COLORTERM. Added to ~/.zshrc. This would tell truecolor-aware apps (including Claude Code) that 24-bit color is available. Joe tested it.

Result: colors improved but the fundamental overprinting problem persisted. The rendering was still a mess.
7

Width Hypothesis (First Red Herring)

Joe shared a screenshot of the garbled Claude Code UI: it showed diff output and chat elements overlapping vertically. Initial hypothesis: a terminal width mismatch between the PTY and the renderer.

stty size   # → 35 150
tput cols   # → 150
echo $COLUMNS  # → 150
All three sources agreed: 150×35. Width mismatch theory eliminated.

4 - Systematic Cursor Movement Tests

With width ruled out, focus shifted to cursor positioning, the mechanism ink uses for redraws.

8

Scrolling Hypothesis (Second Red Herring)

Theory: a 35-row terminal is short. When ink prints more than 35 rows, the terminal scrolls. Cursor-up can't reach into the scrollback buffer, so ink redraws at the wrong position. Designed a targeted test:

python3 -c "
import sys, time, shutil
rows = shutil.get_terminal_size().lines
for i in range(rows + 5):
    sys.stdout.write('Filler %02d\n' % i)
for i in range(5):
    sys.stdout.write('TARGET %d\n' % i)
sys.stdout.flush(); time.sleep(0.5)
sys.stdout.write('\033[5A')
for i in range(5):
    sys.stdout.write('\033[2KReplaced %d\n' % i)
sys.stdout.flush()
"
Prediction
TARGET lines visible even after cursor-up (scroll clamps cursor at row 1)
Result
PASS: no TARGET lines visible. Cursor-up after scrolling works correctly.
9

Full Cursor Test Battery (All Passed)

Eight more targeted tests, each isolating one possible failure mode:

Test  What it checked                                                        Result
E     Cursor-up over ANSI-colored lines                                      ✓ PASS
F     Line growing past terminal width mid-stream (simulates ink streaming)  ✓ PASS
G     OSC 8 hyperlinks in lines                                              ✓ PASS
H     DECSC/DECRC save-restore cursor                                        ✓ PASS
I     Node.js (not Python) stdout cursor-up                                  ✓ PASS
Wall hit: every synthetic test passed, yet Claude Code's output was still garbled. The bug was not in any escape sequence we could hand-craft. Time for a new strategy.
Model switch to Fable 5
★

Switching to Fable 5

At this point Joe ran /model and switched from Claude Sonnet 4.6 to Fable 5, Anthropic's newest model for complex, long-running tasks. The instruction: "This investigation is not going anywhere, please think outside of the current box and come up with a new strategy."

Fable's insight: stop trying to synthesize the failure with test scripts. All synthetic tests pass because they don't emit the actual bytes Claude Code does. Instead, capture the real byte stream with script -q -r, then replay and analyze it programmatically.

5 - Capturing the Real Byte Stream

10

Strategy Pivot: Record with script

The script command records a terminal session โ€” every byte the PTY emits โ€” into a binary file (BSD script -r / -q format: 24-byte record headers with timestamps, followed by payload bytes).

script -q -r /tmp/claude-session.scr
# Inside: run claude, trigger the bug, exit, then type exit

Parsed the file in Python to extract the output stream separately from input:

import struct
data = open('/tmp/claude-session.scr','rb').read()
# Each record: u64 len, u64 sec, u32 usec, u32 dir ('i'=0x69 / 'o'=0x6f)
pos = 0
while pos + 24 <= len(data):
    ln, sec, usec, d = struct.unpack_from('<QQLl', data, pos)
    ...
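
The elided loop body can be filled in along these lines. This is a sketch under assumptions rather than the parser used in the session: it takes the 24-byte header layout from the comment above (length, seconds, microseconds, direction), assumes little-endian fields as on Apple Silicon, and the function names are made up for illustration.

```python
import struct

# Header layout assumed from the record description: u64 len, u64 sec,
# u32 usec, u32 direction. Little-endian, 24 bytes in total.
HEADER = struct.Struct("<QQII")


def split_script_records(data):
    """Yield (direction, timestamp, payload) for each record in the capture."""
    pos = 0
    while pos + HEADER.size <= len(data):
        length, sec, usec, direction = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        pos += length  # advance past the payload to the next header
        yield chr(direction), sec + usec / 1e6, payload


def output_stream(data):
    """Concatenate only the 'o' (output) payloads, dropping typed input."""
    return b"".join(p for d, _, p in split_script_records(data) if d == "o")
```

Writing `output_stream(data)` to a file gives the terminal-bound byte stream on its own, and keeping the individual payloads from `split_script_records` preserves the record boundaries that matter later in the investigation.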
11

Reference Emulator: pyte

Fed the captured output bytes into pyte, a pure-Python terminal emulator that implements VT100/xterm-style behavior independently of RStudio. This told us what a standards-compliant terminal should display.

import pyte
screen = pyte.Screen(156, 35)
stream = pyte.ByteStream(screen)
stream.feed(open('/tmp/claude-out.bin','rb').read())
# Display the expected screen
for i, line in enumerate(screen.display):
    if line.strip(): print(f"{i:2d}| {line.rstrip()}")
Key finding: pyte rendered the banner, mascot art, separators, and input box perfectly at 156 columns wide. Claude Code's byte output was entirely correct. The problem was somewhere in RStudio's rendering pipeline.
12

Confirming Width: Claude Code Renders for 156 Columns

The separator lines were 156 ─ characters long, and column-absolute positioning sequences reached column 156. Claude Code's PTY reported 156 columns to it. Meanwhile Joe's ruler test showed 150 columns in the renderer, so a 6-column discrepancy was briefly suspected as the cause.

The ruler test showed all three rows rendering correctly. Width mismatch eliminated again. The mystery deepened.

6 - Binary Bisection via Replay

13

Building a Replay Kit

Claude Code's startup emits several distinct bursts of output (each being a separate PTY read → rsession write → websocket frame → xterm.js render cycle). Split the captured output into 9 cumulative-prefix files, one per write record. Each file = everything Claude Code emitted up through record N.

Wrote a replay shell script (/tmp/replay/run.sh) that: sanitizes the terminal with stty sane, disables stale focus-reporting and bracketed-paste modes, cats prefix N, pauses for observation, and advances on Enter.
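
The cumulative-prefix split described above could be produced like this. It is a minimal Python sketch, assuming the output payloads are already available as a list of byte strings in capture order; the `prefix-NN.bin` file naming and both function names are hypothetical, not the ones used in /tmp/replay.

```python
from itertools import accumulate
from pathlib import Path


def cumulative_prefixes(records):
    """Return one blob per record: blob N is records 1..N concatenated."""
    # accumulate() with the default addition concatenates bytes objects.
    return list(accumulate(records))


def write_prefix_files(records, out_dir):
    """Write each cumulative prefix to out_dir and return the paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, blob in enumerate(cumulative_prefixes(records), start=1):
        path = out_dir / f"prefix-{n:02d}.bin"
        path.write_bytes(blob)
        paths.append(path)
    return paths
```

Because each file is a strict extension of the previous one, any difference in what the terminal shows between prefix N-1 and prefix N can be attributed to record N alone.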

⚠ First run revealed more stale terminal state: Enter printed ^M (raw mode stuck on from a previous crashed session), and regaining focus printed ^[[I / ^[[O (focus events enabled and not cleaned up). Added stty sane and printf '\e[?1004l\e[?2004l\e[?25h\e[0m' to fix.
14

Bisection Results

Joe reported which prefixes showed the Claude Code mascot banner:

02 ✓ 03 ✗ 04 ✗ 05 ✗ 06 ✓ 07 ✗ 08 ✗ 09 ✓

The banner toggled on and off. This was compared against what pyte predicted:

Pyte predicted the banner visible in ALL prefixes (02–09). So pyte disagreed with RStudio in 5 out of 8 cases. The bytes were correct; RStudio was mishandling them.
15

Headless xterm.js 6.0.0 Confirms

RStudio bundles xterm.js 6.0.0 (found in src/gwt/tools/build-xterm in the cloned rstudio repo). Installed @xterm/headless@6.0.0 and ran the same prefix sweep.

npm install --legacy-peer-deps @xterm/headless@6.0.0
node -e "
const {Terminal} = require('@xterm/headless');
// feed each prefix, check buffer for '█'
..."
Same result as pyte: headless xterm.js said the banner was visible in all prefixes, regardless of width or height, across a sweep of 100–200 cols and 6–45 rows. The bug was not in how xterm.js renders the bytes. It was happening upstream of xterm.js.

7 - Reading RStudio Source Code

16

Cloning RStudio and Tracing the Terminal Pipeline

Cloned the RStudio source at the exact matching tag and traced how terminal output flows from PTY to screen:

git clone --depth 1 --branch v2026.04.0+526 \
  https://github.com/rstudio/rstudio \
  ~/Development/_gh/rstudio/rstudio

The pipeline:

PTY → rsession (C++) → ConsoleProcess::onStdout()
    → enqueOutputEvent()
    → s_terminalSocket.sendText()  [WebSocket TEXT frame]
    → xterm.js in the browser
!

The Smoking Gun: enqueOutputEvent()

In src/cpp/session/SessionConsoleProcess.cpp, line 645 (at tag v2026.04.0+526):

if (procInfo_->getChannelMode() == Websocket)
{
    s_terminalSocket.sendText(procInfo_->getHandle(), output);
    return;  // <-- Error returned by sendText is silently discarded
}

sendText() calls websocketpp::server::send() with opcode::text. WebSocket text frames must be valid UTF-8. websocketpp validates outgoing text frames: if the payload isn't valid UTF-8, the send fails and returns an error.

Raw PTY reads are arbitrary byte windows over a UTF-8 stream. A read can end in the middle of a multi-byte character, making the chunk individually invalid UTF-8. That's exactly what happened in our capture: a 1024-byte output record ended with the first byte of ─ (U+2500, bytes E2 94 80), and the next record began with the remaining two bytes.

websocketpp refuses to send the frame → the error is silently discarded → the entire ~1 KB chunk is dropped. Any valid ASCII in the same chunk goes down with it. Claude Code's separator lines are 156 × 3-byte box-drawing characters, so a large fraction of its output constantly hits this.
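
How often fixed-size reads cut through a multibyte character can be checked directly. The sketch below is an illustration in Python, not a measurement from the capture: it assumes every read is exactly 1024 bytes (real PTY reads vary in size) and uses a synthetic stream of ten separator lines; the helper names are hypothetical.

```python
def is_valid_utf8(chunk):
    """True if `chunk` decodes as UTF-8 on its own, with no outside context."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def chunk_validity(stream, chunk_size=1024):
    """Split `stream` into fixed-size reads and report validity per chunk."""
    chunks = [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]
    return [is_valid_utf8(c) for c in chunks]


# One separator line: 156 box-drawing characters (3 bytes each) plus a newline.
separator = ("\u2500" * 156 + "\n").encode("utf-8")
stream = separator * 10
validity = chunk_validity(stream)
```

A chunk is invalid if it either starts or ends in the middle of a character, so a single badly placed boundary spoils the chunks on both sides of it. Pure ASCII output never triggers this, which fits the earlier observation that hand-written escape-sequence tests all passed.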

8 - Confirmation Tests

17

First Confirmation Test

Designed a test with a split multibyte character and predicted exactly what RStudio vs. iTerm2 should show:

python3 -c "
import os, time
os.write(1, b'LINE 1: should be VISIBLE\n');  time.sleep(0.5)
os.write(1, b'LINE 2: invisible \xe2');        time.sleep(0.5)
os.write(1, b'\x94\x80 LINE 3: invisible\n');   time.sleep(0.5)
os.write(1, b'LINE 4: should be VISIBLE\n')
"
Prediction (RStudio)
Lines 1 & 4 visible; lines 2 & 3 missing (they are the invalid UTF-8 chunks)
Actual (RStudio)
ALL FOUR lines invisible
"All four invisible" is even more consistent with the theory. Without sleeps between them, the writes coalesce into fewer PTY reads, so the two invalid chunks swallowed the innocent ASCII lines along with them.
18

Second Confirmation (Sleeps Prevent Coalescing)

Re-ran with explicit sleeps between each write to ensure each OS write becomes a separate PTY read chunk:

Prediction
Lines 1 & 4 visible; lines 2 & 3 invisible
Result
CONFIRMED. Lines 1 & 4 visible, lines 2 & 3 completely gone.
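
Both outcomes follow from a very small model of the transport. The sketch below is a Python simplification, not RStudio's code: it assumes each PTY read becomes exactly one text frame and that a frame failing UTF-8 validation is dropped whole. The coalesced grouping shown (lines 1+2 and lines 3+4) is one plausible way the reads could have merged, not something taken from a capture.

```python
def is_valid_utf8(chunk):
    """True if `chunk` decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def deliver_as_text_frames(chunks):
    """Model: one chunk per text frame; invalid frames are silently dropped."""
    return [c for c in chunks if is_valid_utf8(c)]


LINE1 = b"LINE 1: should be VISIBLE\n"
LINE2 = b"LINE 2: invisible \xe2"          # ends with the first byte of U+2500
LINE3 = b"\x94\x80 LINE 3: invisible\n"    # starts with the remaining two bytes
LINE4 = b"LINE 4: should be VISIBLE\n"

# Each write arrives as its own read: only the two split chunks are lost.
separate = deliver_as_text_frames([LINE1, LINE2, LINE3, LINE4])

# Hypothetical coalescing: each invalid chunk takes a valid line down with it.
coalesced = deliver_as_text_frames([LINE1 + LINE2, LINE3 + LINE4])
```

In this model `separate` keeps lines 1 and 4 while `coalesced` is empty, which matches the second and first test runs respectively.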

9 - Workaround & Fix

19

Workaround: Disable WebSockets

RStudio has an alternate output channel for its terminal: RPC mode, which doesn't use WebSocket text frames and therefore doesn't validate UTF-8.

Tools → Global Options → Terminal → uncheck "Connect with WebSockets"

Joe restarted the terminal and ran Claude Code again.

"It works way, way better!" โ€” Claude Code's UI rendered correctly, matching iTerm2 output. The only residual artifact: occasional ๏ฟฝ๏ฟฝ๏ฟฝ where a chunk boundary splits a multibyte character (the RPC path still decodes each chunk independently, so the bytes arrive but the character renders as three replacement chars rather than being assembled).

The ๏ฟฝ๏ฟฝ๏ฟฝ characters are the bytes of โ”€ (U+2500, E2 94 80) each rendered as U+FFFD. Not an em-dash โ€” a box-drawing light horizontal line. The proper fix would carry incomplete trailing UTF-8 bytes to the next chunk before sending, eliminating both the drops and the ๏ฟฝ๏ฟฝ๏ฟฝ.
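
The carry-over idea can be sketched in a few lines. This is an illustration in Python of the general technique, not the patch proposed to RStudio (whose transport code is C++); `split_incomplete_tail` and `Utf8Carry` are hypothetical names. It only holds back an unfinished trailing sequence and does not try to repair bytes that are invalid for other reasons.

```python
def split_incomplete_tail(buf):
    """Return (complete, tail): `tail` is an unfinished trailing UTF-8 sequence."""
    # A UTF-8 character is at most 4 bytes, so an incomplete one is at most 3.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if byte & 0b11000000 == 0b10000000:
            continue  # continuation byte: keep walking back to the lead byte
        if byte >= 0b11110000:
            need = 4
        elif byte >= 0b11100000:
            need = 3
        elif byte >= 0b11000000:
            need = 2
        else:
            need = 1  # plain ASCII
        if need > back:
            return buf[:-back], buf[-back:]
        break
    return buf, b""


class Utf8Carry:
    """Holds back incomplete trailing bytes and prepends them to the next chunk."""

    def __init__(self):
        self._tail = b""

    def push(self, chunk):
        complete, self._tail = split_incomplete_tail(self._tail + chunk)
        return complete
```

Pushing `b"LINE 2 \xe2"` and then `b"\x94\x80 LINE 3\n"` through one `Utf8Carry` instance emits `b"LINE 2 "` first and the reassembled character with the rest of the line second, so each emitted chunk is valid UTF-8 on its own. Python's `codecs.getincrementaldecoder("utf-8")` offers the same buffering behavior when decoding to text is acceptable.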

20

Filing the GitHub Issue

Drafted and filed a detailed bug report with minimal repro, root-cause analysis, code references, and suggested fix.

gh issue create -R rstudio/rstudio \
  --title "Terminal silently drops output chunks with invalid UTF-8 (WebSockets)" \
  --body-file issue-body.md
🐛

rstudio/rstudio #17941

Terminal silently drops entire output chunks containing invalid UTF-8 (split multibyte characters) when using WebSockets

🔍 Root Cause Summary

10 - What Made This Work

✦

Key Diagnostic Techniques

1. Routing around the broken medium. When the terminal itself was too garbled to read responses, using a plain assistant.txt file as the communication channel let the investigation continue without switching tools.

2. The model switch that unstuck things. Switching to Fable 5 mid-investigation introduced the critical insight: synthetic tests will always pass because they don't reproduce the real failure conditions. Capture the actual byte stream instead.

3. script -q -r as a byte-perfect capture tool. Recording a real session and parsing the binary format separated "what bytes were emitted" from "what the terminal did with them."

4. Reference emulator comparison. Feeding the same bytes into pyte (Python) and @xterm/headless (the same xterm.js version RStudio bundles) proved the data was correct before looking at the transport.

5. Reading the source. Once we knew the fault was in the transport layer, cloning the exact RStudio version and reading enqueOutputEvent() took about 10 minutes to find the discarded error and connect it to WebSocket text frame validation.

fin

Investigated June 11, 2026 · Claude Sonnet 4.6 + Fable 5 · Bug filed as rstudio/rstudio #17941