For this project, we rely on Ekşi Sözlük as a primary data source because it offers a uniquely rich, user-generated record of public discourse in Turkey. Ekşi Sözlük functions somewhat like a hybrid between a forum and a wiki: users open “topics” and others contribute entries (comments) under those topics over time. Unlike traditional social media, discussions are organized chronologically within a fixed topic, which makes it especially useful for tracing how reactions evolve around specific events.
One important feature for our research is that authors can revise and update their entries retrospectively. This means posts are not strictly static snapshots; they may reflect evolving interpretations or corrections. While this introduces some complexity, it also provides insight into how narratives and opinions shift over time.
Relevant Topics
We focus on five key topics related to Kızılcık Şerbeti, each capturing different aspects of audience reaction:
https://eksisozluk.com/kizilcik-serbeti-dizi--7435046 This is the main discussion thread, with over 1,600 pages (~16,000 comments). Starting around page 57, users discuss the Nursema episode, which becomes a focal point of controversy. In the 70s pages, discussions shift toward the RTÜK ban, showing how attention transitions from narrative content to regulatory issues.
Together, these threads allow us to capture both large-scale discourse (from the main thread) and event-specific reactions (from smaller, focused threads).
Data Collection Approach
There is currently no established R or Python package designed specifically for scraping Ekşi Sözlük. As a result, I developed a custom scraping script tailored to the platform’s structure.
Here’s what I did in detail:
Targeted the main topic first (the largest dataset), since it contains the bulk of the discussion (~16,000 comments).
Iterated through all available pages (over 1,600), which required handling pagination systematically.
For each entry, I extracted:
Text content (the main body of the comment)
Date
Time
These were selected because they are the most analytically useful fields:
The text allows for content analysis (sentiment, themes, discourse patterns).
The timestamp (date + time) enables temporal analysis, such as tracking spikes in discussion around key events (e.g., the Nursema episode or RTÜK decisions).
Overall, Ekşi Sözlük provides a longitudinal, topic-centered, and user-driven dataset, making it particularly well-suited for studying how public discourse unfolds in response to media events.
Exploratory Data Analysis
In this dataset, we have six variables:
text: This variable contains the full textual content of the entry written by the user. It is the primary variable for analysis, as it captures opinions, reactions, and narratives related to the topic.
page: This indicates the page number on which the entry appears within the topic. Since Ekşi Sözlük organizes entries in a paginated format (typically 10 entries per page), this variable provides a rough sense of position and sequence within the overall discussion.
initial_date: This is the original date when the entry was first posted by the author. It reflects when the comment initially entered the discussion and is crucial for constructing a timeline of reactions.
initial_time: This records the exact time of day (e.g., hour and minute) when the entry was first posted. Combined with initial_date, it allows for fine-grained temporal analysis, such as identifying bursts of activity within a single day.
last_date: This variable captures the date of the most recent edit made to the entry. Because Ekşi Sözlük allows users to revise their posts, this field helps track whether and when content has been updated after its initial publication.
last_time: This is the time of the most recent edit. Together with last_date, it provides a complete timestamp for the latest modification, enabling us to distinguish between original and revised content.
Overall, the distinction between initial and last timestamps is particularly important in this context, as it reflects the platform’s editable nature and allows us to account for temporal dynamics not just in posting behavior, but also in post-publication revisions.
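As an illustration, the four timestamp variables could be derived from a single raw timestamp string along the following lines. This is only a sketch: the string format (`dd.mm.yyyy hh:mm`, with an optional edit stamp after `~`) and the helper name `split_entry_stamp` are assumptions made for the example, not the scraper's actual code.

```r
# Hypothetical helper: split a raw timestamp string into initial/last fields.
# Assumed format: "dd.mm.yyyy hh:mm", optionally followed by " ~ " and an edit
# stamp that is either "dd.mm.yyyy hh:mm" or just "hh:mm" (same-day edit).
split_entry_stamp <- function(stamp) {
  parts <- trimws(strsplit(stamp, "~", fixed = TRUE)[[1]])
  first <- strsplit(parts[1], " ", fixed = TRUE)[[1]]
  initial_date <- as.Date(first[1], format = "%d.%m.%Y")
  initial_time <- first[2]

  if (length(parts) == 1) {
    # Never edited: leave the edit fields missing
    last_date <- as.Date(NA)
    last_time <- NA_character_
  } else {
    edit <- strsplit(parts[2], " ", fixed = TRUE)[[1]]
    if (length(edit) == 2) {
      # Edit stamp carries its own date
      last_date <- as.Date(edit[1], format = "%d.%m.%Y")
      last_time <- edit[2]
    } else {
      # Time-only edit stamp: the edit happened on the posting day
      last_date <- initial_date
      last_time <- edit[1]
    }
  }
  data.frame(initial_date, initial_time, last_date, last_time)
}

split_entry_stamp("15.03.2023 21:40 ~ 16.03.2023 09:12")
```

Keeping the edit fields as `NA` for unedited entries makes it easy to separate original from revised content later with `is.na(last_date)`.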
To contextualize the temporal patterns in the Ekşi Sözlük data, we define three critical dates corresponding to major real-world events that likely influenced online discussions:
ban_date (2023-03-22): This marks the date when Kızılcık Şerbeti faced a temporary broadcasting ban imposed by RTÜK. This event triggered significant public debate, both about the show’s content and broader issues of media regulation. We expect to see a noticeable spike in activity and shifts in sentiment around this date.
election_date (2023-05-28): This corresponds to the second round of the 2023 Turkish presidential election. Although not directly related to the show, national elections often shape public discourse, potentially influencing how audiences interpret themes in the series (e.g., politics, social values). This serves as an important external benchmark.
arrest_date (2025-12-15): This represents the date when the screenwriter of the series was reportedly taken into custody, an event discussed in one of the Ekşi Sözlük threads. While this occurs later than the main broadcast controversies, it provides a useful point for examining how discussions resurface or evolve in response to new developments.
These dates allow us to anchor the dataset in real-world events and conduct event-based temporal analysis, such as:
Comparing pre- and post-event discussion intensity
Identifying shifts in narrative or sentiment
Detecting delayed reactions or renewed attention
In short, they act as reference points for interpreting fluctuations in user-generated content over time.
# Dates to consider
ban_date <- as.Date("2023-03-22")
election_date <- as.Date("2023-05-28")
arrest_date <- as.Date("2025-12-15")
Activity Over Time
Let’s look at the daily number of posts. I also added columns that give the number of days relative to the ban and election dates. The busiest days do not fall immediately after the ban or the election; they appear to cluster around the airing of season finale episodes, which is likely the main driver of engagement.
# A tibble: 20 × 6
initial_date n_posts ban_relative days_label election_relative
<date> <int> <int> <chr> <int>
1 2024-06-07 286 443 +443 days 376
2 2024-06-08 280 444 +444 days 377
3 2023-12-15 203 268 +268 days 201
4 2024-03-09 179 353 +353 days 286
5 2025-01-10 170 660 +660 days 593
6 2024-09-13 148 541 +541 days 474
7 2024-03-15 146 359 +359 days 292
8 2024-12-27 135 646 +646 days 579
9 2024-03-08 130 352 +352 days 285
10 2023-12-01 124 254 +254 days 187
11 2023-04-14 121 23 +23 days -44
12 2023-06-10 115 80 +80 days 13
13 2025-03-14 115 723 +723 days 656
14 2025-04-12 115 752 +752 days 685
15 2024-02-23 112 338 +338 days 271
16 2024-03-29 111 373 +373 days 306
17 2025-09-13 111 906 +906 days 839
18 2023-12-16 110 269 +269 days 202
19 2025-02-28 110 709 +709 days 642
20 2023-06-09 108 79 +79 days 12
# ℹ 1 more variable: days_label_elec <chr>
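The relative-day columns in the table above can be reproduced with plain date arithmetic. The sketch below uses a three-row stand-in data frame (dates and counts copied from the table); the data frame name `daily` is an assumption for illustration.

```r
ban_date      <- as.Date("2023-03-22")
election_date <- as.Date("2023-05-28")

# Toy stand-in for the daily counts; rows copied from the table above
daily <- data.frame(
  initial_date = as.Date(c("2024-06-07", "2023-04-14", "2023-06-10")),
  n_posts      = c(286L, 121L, 115L)
)

# Subtracting two Dates gives a difference in days
daily$ban_relative      <- as.integer(daily$initial_date - ban_date)
daily$election_relative <- as.integer(daily$initial_date - election_date)

# "%+d" forces an explicit sign, so days before the event read as "-44 days"
daily$days_label      <- sprintf("%+d days", daily$ban_relative)
daily$days_label_elec <- sprintf("%+d days", daily$election_relative)
daily
```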
Below, you can find all Ekşi Sözlük posts from 2023, covering the TV show ban and the national election in the following months. The TV ban coincides with a clear rise in engagement. We should also consider the episodes that aired near the end of the year, as these episodes drew higher engagement.
If we include all years in this data, we see that the TV ban and the election are still among the most significant spikes, but there is also a noticeable increase in posts around the end of 2023, which likely corresponds to the airing of new episodes. The arrest of the scriptwriter in September also shows a smaller spike, indicating renewed interest or controversy around that time.
Instead of daily posts, we can also look at the monthly share of posts; again, the highest engagement does not coincide with the ban of the show.
Nursema Effect
I also looked specifically at mentions of Nursema in the posts. Ahead of the RTÜK ban, “nursema” mentions increase and peak around the ban date, which makes sense since the ban was largely driven by the controversy around that storyline. After the ban, however, mentions of Nursema drop sharply and never really recover to pre-ban levels, even during the election period. This suggests that while Nursema was a key driver of discussion leading up to the ban, it did not maintain its prominence in the discourse afterward, possibly because the conversation shifted toward the regulatory and political implications rather than the content of the show itself.
KWIC Approach
I want to start with the keyword-in-context (KWIC) approach. This first tier establishes the empirical baseline: how often do politically salient terms appear in the corpus, and does that frequency change around key external events? We begin with KWIC concordances, which display each hit surrounded by its immediate context window.
This is a deliberate first step — reading concordances before counting anything ensures the keyword lists are capturing the intended phenomenon rather than spurious matches. A word like “özgürlük” (freedom) can appear in both secular and religious framings; the concordance tells you which is dominant in this corpus before you commit to including it in a category.
Once the keyword lists are validated, we count hits per entry and aggregate to weekly rates — hits per 100 entries rather than raw counts, so that weeks with more entries do not artificially inflate frequency.
The resulting time series data, plotted with the ban and election dates as reference lines, gives the reader an immediate visual sense of whether political discourse around the show intensified at politically salient moments. This is descriptive, not inferential — the causal testing comes in tier 4 — but it motivates every subsequent analytical step and provides the most accessible evidence for a general audience.
KWIC is also valuable for checking that our keywords actually capture political engagement rather than casual use. For instance, “kadın” might appear in a purely gossip context rather than a rights context, and the concordance will reveal that quickly.
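The weekly normalisation described in this section (hits per 100 entries) can be sketched in base R as follows. The five entries are invented for illustration and the column names are assumptions; the real input would be the scraped corpus.

```r
# Toy entries (invented for illustration; NOT the project data)
entries <- data.frame(
  initial_date = as.Date(c("2023-03-20", "2023-03-21", "2023-03-22",
                           "2023-03-28", "2023-03-29")),
  text = c("nursema sahnesi", "bu bolum guzeldi",
           "ceza ve nursema, yine nursema", "yeni bolum ne zaman",
           "oyunculuk iyi"),
  stringsAsFactors = FALSE
)

# Count keyword hits per entry (an entry can contain several)
txt <- tolower(entries$text)
entries$hits <- lengths(regmatches(txt, gregexpr("nursema", txt, fixed = TRUE)))

# Assign each entry to the Monday that starts its week ("%u": Monday = 1)
entries$week <- entries$initial_date -
  (as.integer(format(entries$initial_date, "%u")) - 1L)

# Hits per 100 entries, so that busy weeks do not inflate the rate
weekly <- aggregate(hits ~ week, data = entries,
                    FUN = function(x) 100 * sum(x) / length(x))
names(weekly)[2] <- "hits_per_100"
weekly
```

Dividing by the number of entries in each week, rather than reporting raw counts, is what keeps a high-traffic finale week from looking more political than a quiet week with the same share of keyword use.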
KWIC Analysis for Keywords
I decided to look at keywords on religion, secularism, gender, and politics, plus a separate keyword for Nursema. Most matching entries concern religion, gender, and politics, but Nursema is also an important figure here. If we searched for “doga” or “fatih”, I would expect similar results: these are main characters, and it is hard to say whether they are mentioned in a political context or simply as an assessment of the characters.
library(quanteda)  # kwic(), phrase()
library(purrr)     # map()

# Political keyword groups ----
# Define thematically; you can expand each after inspecting KWIC output
pol_keywords <- list(
  religion   = c("başörtü*", "türban*", "dindar*", "muhafazakar*", "namaz*",
                 "imam*", "tarikat*", "müslüman*", "helal*", "haram*"),
  secularism = c("laik*", "atatürk*", "cumhuriyet*", "seküler*", "kemalist*"),
  gender     = c("kadın*", "erkek*", "feminist*", "şiddet*", "taciz*",
                 "namus*", "eşitlik*"),
  # Note: "oy*" also matches tokens such as "oyuncu" and "oyun"; check its KWIC hits
  politics   = c("seçim*", "iktidar*", "muhalefet*", "akp*", "chp*",
                 "erdoğan*", "propaganda*", "siyasi*", "oy*"),
  nursema    = c("nursema*")  # keep separate: proper noun, specific signal
)

# KWIC for qualitative inspection ----
# Do this first: read the concordances before you count anything
# `toks` is the tokens object built earlier in the pipeline
kwic_results <- list(
  religion   = kwic(toks, pattern = phrase(pol_keywords$religion), window = 6),
  secularism = kwic(toks, pattern = phrase(pol_keywords$secularism), window = 6),
  gender     = kwic(toks, pattern = phrase(pol_keywords$gender), window = 6),
  politics   = kwic(toks, pattern = phrase(pol_keywords$politics), window = 6),
  nursema    = kwic(toks, pattern = "nursema*", window = 8)
)

# Quick count per group
map(kwic_results, nrow)
Let’s plot these keywords. Politics is the dominant theme before the ban, which supports our hypothesis. Because the pattern is concentrated in the pre-ban period, it deserves a closer look.
Keyness Analysis
In the KWIC approach, we can talk about how often political keywords appear. In keyness analysis, we can ask a harder question: what do commenters actually do with those words?
Two complementary methods address this.
Collocations identify which words habitually co-occur within a fixed window around a target keyword. If “kadın” (woman) most frequently appears alongside “kapalı” (covered), “namus” (honour), and “dindar” (devout) rather than alongside “eşitlik” (equality) or “özgür” (free), that pattern constitutes empirical evidence of a specific framing — the show’s female characters are being discussed primarily through a religious-conservative lens. Collocations make that framing legible as a quantitative pattern rather than a qualitative assertion.
Keyness analysis compares two sub-corpora — for instance, entries written before and after the May 2023 election — and identifies which words are statistically over- or under-represented in one period relative to the other. We use the log-likelihood ratio (G²) rather than chi-squared because it is more robust when the two corpora differ substantially in size, which is the case here given that the post-election period contains more entries.
A word with a high positive G² score appeared disproportionately more often after the event; a high negative score means it receded. Taken together, collocations and keyness shift the argument from “political keywords are frequent” to “here is the specific vocabulary through which political meaning is constructed in this discourse, and it changed in a systematic direction around these dates.”
So, the logic here is: collocations tell you what travels with your political keywords (framing), and keyness tells you what distinguishes discourse before vs. after each event date (shift). Together they let you say something like “after the election, religious framing intensified while secular vocabulary declined” — with numbers behind it.
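To make the keyness statistic concrete, here is a minimal base-R sketch of a signed log-likelihood ratio for a single word, computed from the full 2×2 table (word vs. all other tokens, target vs. reference corpus). The function name `signed_g2` and the counts in the example call are made up for illustration; the project's own pipeline may rely on a package function instead.

```r
# Signed log-likelihood ratio (G2) for one word across two sub-corpora.
# a, b:            occurrences of the word in the target / reference corpus
# n_target, n_ref: total tokens in each corpus
signed_g2 <- function(a, b, n_target, n_ref) {
  # Rows = corpus (target, reference); columns = (this word, all other tokens)
  observed <- matrix(c(a, b, n_target - a, n_ref - b), nrow = 2)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Cells with zero observations contribute nothing to the sum
  terms <- ifelse(observed > 0, observed * log(observed / expected), 0)
  g2 <- 2 * sum(terms)
  # Positive when the word is relatively more frequent in the target corpus
  if (a / n_target >= b / n_ref) g2 else -g2
}

# Hypothetical counts: 30 hits in 1,000 target tokens vs. 10 in 2,000 reference tokens
signed_g2(30, 10, 1000, 2000)
```

Because the expected counts are scaled by each corpus's total size, the statistic compares relative frequencies, which is why it remains usable when one period contains many more entries than the other.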
Keyness – Pre vs Post Comparison
Prior to the election, political keywords appear to be more frequent (shown in red). Again, this supports Lisel’s theory.
Collocation Plots
Which words are associated with nursema, religion, and secularism themes?
Wordfish Approach
Wordfish is a statistical method that reads through a large collection of texts and automatically arranges them along a single axis based purely on the vocabulary patterns it finds — without being told in advance what to look for. The core intuition is simple: people who are writing from a similar perspective tend to use similar words, and people writing from opposing perspectives tend to use systematically different words. Wordfish exploits this by finding the vocabulary dimension that best separates the texts from one another.
In our case, it reads every Ekşi Sözlük entry and assigns each one a position score — entries that cluster at one end of the axis tend to share a certain vocabulary, and entries at the other end share a different one. Crucially, the method does not know what those vocabularies mean politically — that interpretation is our job. Once the model has run, we inspect which words are pulling entries toward each pole: if religious and conservative terminology loads at one end while secular and oppositional vocabulary loads at the other, we can label the axis accordingly and say the model has recovered a meaningful ideological dimension from the data.
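For reference, the standard Wordfish specification treats word counts as Poisson-distributed. The notation below is the conventional one and is given only as orientation; it is not a description of any custom settings used in this project.

```latex
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \, \theta_i
```

Here $y_{ij}$ is the count of word $j$ in entry $i$, $\alpha_i$ absorbs entry length, $\psi_j$ absorbs the overall frequency of word $j$, $\beta_j$ measures how strongly word $j$ discriminates between the two poles, and $\theta_i$ is the position of entry $i$ on the recovered axis. Inspecting the words with the largest positive and negative $\beta_j$ is what allows us to label the poles.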
To measure the intensity of different political framings across entries, we constructed a custom four-category dictionary based on vocabulary patterns identified in the collocation and keyness analyses.
The religious category captures terms associated with conservative and religious identity, including words relating to the headscarf, piety, religious orders, and Islamic practice.
The secular category captures the opposing discursive tradition, centering on references to Kemalism, the republic, and laicism.
The gender threat category identifies entries foregrounding violence, harassment, and patriarchal norms — relevant because the show’s central dramatic tension revolves around gender relations between its secular and religious characters.
Finally, the political explicit category captures direct references to electoral politics, party names, and political actors, which allows us to distinguish entries that engage with the show’s political subtext implicitly through cultural framing from those that name the political stakes outright.
Each category uses wildcard matching so that inflected forms of a root are captured together — for instance, laik* matches laik, laiklik, and laikçi without requiring each form to be listed separately. This is particularly important for Turkish given its agglutinative morphology, where a single root can surface in dozens of grammatical forms. For each entry, we compute the proportion of tokens matching each category and scale the result to hits per 1,000 tokens, which normalises for entry length and makes scores comparable across entries of different sizes.
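The scoring step can be sketched in base R as follows. The two categories are abbreviated to three patterns each, the helper `score_entry` is hypothetical, and the example sentence is invented; the sketch only illustrates wildcard matching and the per-1,000-token scaling.

```r
# Two of the four categories, abbreviated for illustration
dict <- list(
  religious = c("tarikat*", "namaz*", "imam*"),
  secular   = c("laik*", "cumhuriyet*", "kemalist*")
)

# Hypothetical helper: hits per 1,000 tokens for one entry and one category
score_entry <- function(text, patterns) {
  # Split on whitespace, then strip punctuation from each token
  tokens <- gsub("[[:punct:]]", "", strsplit(tolower(text), "\\s+")[[1]])
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0) return(NA_real_)
  # glob2rx() turns a wildcard such as "laik*" into the anchored regex "^laik"
  regex <- paste(vapply(patterns, utils::glob2rx, character(1)), collapse = "|")
  # Share of matching tokens, scaled to hits per 1,000 tokens
  1000 * mean(grepl(regex, tokens))
}

entry <- "Laiklik ve cumhuriyet tartismasi yine basladi."
sapply(dict, function(p) score_entry(entry, p))
```

In this example two of the six tokens match the secular patterns and none match the religious ones. Very short entries can reach the maximum of 1,000 with a single matching word, which is worth keeping in mind when reading the summary statistics.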
Plotting gender, secularism, conservative, and political categories
Plotting the average weekly score for each category gives a clear picture of how the different political framings fluctuate over time, especially around key events like the ban and the election. The religious framing (in red) spikes prior to the ban, while the politics framing (in purple) increases more gradually after the ban, through the run-up to the election, and into the post-election period.
Again, this is sensitive to the keywords we selected, so we should choose them more deliberately.
Inferential Testing: Interrupted Time Series Approach
In this section, we will ask whether those patterns we observe in the previous sections represent genuine structural shifts or could be explained by random variation. We use two complementary approaches: interrupted time series (ITS) regression, which tests whether each political event produced a statistically significant change in discourse, and correlation/regression models that ask whether dictionary scores and Wordfish positions are systematically related to observable features of the data (time, period, entry length). Together these move the argument from “it looks like discourse changed around the election” to “discourse changed significantly, by this magnitude, at this point in time.”
In this section, we use the ITS approach. ITS is the standard quasi-experimental design for observational time series data when you have a known intervention date but no control group. The model estimates four things: (1) the pre-event trend: was discourse already changing before the event? (2) the level change: did the mean jump immediately at the event? (3) the slope change: did the rate of change shift after the event? (4) residual variation unexplained by the model. We fit this for both event dates simultaneously, using the religious framing score as the primary outcome because tier 3 suggested it was most responsive to political shocks. We then repeat for theta and political_explicit as robustness checks.
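A minimal sketch of a single-event ITS specification, fitted on simulated data, shows how the pre-trend, level change, and slope change map onto regression terms. All numbers and variable names below are invented for illustration and are not the project's data or final model.

```r
set.seed(1)

# Simulated weekly series (NOT the project data): 30 weeks before, 30 after an event
n_weeks    <- 60
event_week <- 31
week       <- seq_len(n_weeks)
post       <- as.integer(week >= event_week)  # 1 from the event onward: level change
weeks_post <- pmax(0, week - event_week)      # weeks since the event: slope change

# True values: baseline 50, pre-trend +0.2 per week, level change -10, slope change +0.5
score <- 50 + 0.2 * week - 10 * post + 0.5 * weeks_post + rnorm(n_weeks, sd = 0.5)

# week       -> pre-event trend
# post       -> immediate jump at the event
# weeks_post -> change in slope after the event
its_fit <- lm(score ~ week + post + weeks_post)
round(coef(its_fit), 2)
```

Handling the ban and the election together would mean adding a second `post`/`weeks_post` pair for the second date. Weekly series are often autocorrelated, so the plain `lm()` standard errors in this sketch should be treated with caution.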
# A tibble: 3 × 6
name mean median sd max zeros
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 political_explicit 140. 0 322. 1000 0.813
2 religious 161. 0 329. 1000 0.774
3 secular 63.3 0 202. 1000 0.881
═══ Period model: religious ═══
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 276.5345 31.5134 8.7751 < 2.2e-16 ***
periodban_to_election -100.9875 25.4180 -3.9731 7.301e-05 ***
periodpost_election -145.1400 21.8178 -6.6524 3.550e-11 ***
log_tokens 3.1549 6.6719 0.4729 0.6364
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
═══ Period model: secular ═══
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.3877 12.2129 4.9446 8.151e-07 ***
periodban_to_election -25.5136 10.0714 -2.5333 0.011363 *
periodpost_election -27.3110 8.7740 -3.1127 0.001875 **
log_tokens 3.1536 2.6518 1.1893 0.234456
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
═══ Period model: political ═══
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 128.3845 29.1857 4.3989 1.135e-05 ***
periodban_to_election 28.3065 22.6824 1.2480 0.2122
periodpost_election -10.0222 18.4388 -0.5435 0.5868
log_tokens 6.5317 6.4673 1.0099 0.3126
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
═══ Period model: theta ═══
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3101731 0.0839916 -3.6929 0.0002266 ***
periodban_to_election 0.0078865 0.0645818 0.1221 0.9028170
periodpost_election 0.4269244 0.0507189 8.4175 < 2.2e-16 ***
log_tokens 0.0111695 0.0206205 0.5417 0.5880945
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Political explicit shows no significant period effects, which is actually interpretable. Direct political references — party names, election vocabulary, erdoğan — do not significantly change across periods. This suggests the explicitly political vocabulary was a stable undercurrent throughout rather than something triggered by events.
Religious framing is our most striking finding. The intercept of 277 represents the pre-ban baseline: religious vocabulary was already heavily present from the start. It then drops significantly in the ban-to-election period (−101, p < 0.001) and drops even further post-election (−145, p < 0.001). This is counterintuitive at first glance, since one might have expected religious framing to increase around a politically charged election. One likely interpretation is that the controversy leading up to the ban had already produced a burst of religious discourse, which is captured in the pre-ban baseline, and that commenters shifted toward other registers as the show continued. Alternatively, the pre-ban period may have had a smaller but more politically engaged audience who wrote in explicitly religious terms, while the post-election audience broadened and diluted that signal.
Secular framing tells a consistent and theoretically meaningful story alongside religious. It also declines significantly in both the ban-to-election period (−25, p = 0.011) and post-election (−27, p = 0.002) relative to pre-ban. Crucially, both religious and secular framing decline over time — which suggests the discourse is not simply shifting from one pole to the other but moving away from explicit ideological vocabulary altogether as the show becomes more mainstream.
QUESTIONS FOR LISEL
In Section “Key Dates in Our Data” – Any suggestions for key dates you want to look at? Also, I primarily focus on 2023 in this analysis. Stretching this to the present could muddy the results and might not be helpful.
In Section “Nursema Effect” – I looked specifically at the nursema keyword, but are there any other keywords we should be looking at?
In Section “KWIC Analysis for Keywords” – any suggestions for keywords or subjects that you want to look at?
In Section “Inferential Testing” – any suggestions on whether this is useful?