1. Examining “15 minute cities” using user-generated text data and sentiment analysis.

The “15 minute city” is a trending topic in urban planning for its focus on walkable communities with equitable access to services. However, it has also become a buzzword among critics who frame it as a vehicle for government control, especially in a post-COVID world. Understanding the public discourse around “15 minute cities” can help planners address misconceptions while building community support for sustainable urban design. In this analysis, I aim to better understand how sentiment toward the “15 minute city” has changed over the past 12 months on Reddit.

library(RedditExtractoR)
library(anytime)
library(magrittr)
library(httr)
library(tidytext)
library(tidyverse)
library(igraph)
library(ggraph)
library(wordcloud2)
library(textdata)
library(sf)
library(tmap)
library(here)
library(dplyr)
library(stringr)
library(stringi)
library(lubridate)
library(sentimentr)
library(syuzhet)

2. Search Reddit threads.

# Using keyword '15 minute city'

threads_1 <- find_thread_urls(keywords = '15 minute city', 
                              sort_by = 'relevance', 
                              period = 'all') %>% 
  drop_na()
## parsing URLs on page 1...
## parsing URLs on page 2...
## parsing URLs on page 3...
rownames(threads_1) <- NULL

colnames(threads_1)
## [1] "date_utc"  "timestamp" "title"     "text"      "subreddit" "comments" 
## [7] "url"
head(threads_1, 3) %>% knitr::kable()
| date_utc   | timestamp  | title | text | subreddit | comments | url |
|------------|------------|-------|------|-----------|---------:|-----|
| 2023-02-13 | 1676258909 | Edmonton’s 15-minute city plan | | Edmonton | 96 | https://www.reddit.com/r/Edmonton/comments/110y6ty/edmontons_15minute_city_plan/ |
| 2023-02-18 | 1676714201 | Be careful visiting the city centre today, a protest against ‘15 minute cities’ is taking place on Broad Street | The usual far right dickheads and nationalists will be out and about… | oxford | 217 | https://www.reddit.com/r/oxford/comments/115bxug/be_careful_visiting_the_city_centre_today_a/ |
| 2024-10-08 | 1728410439 | Bill Gates is manufacturing hurricanes to turn Tampa in a 15-minute city. And the evil plan was leaked through Alexa. | | insanepeoplefacebook | 34 | https://www.reddit.com/r/insanepeoplefacebook/comments/1fz682l/bill_gates_is_manufacturing_hurricanes_to_turn/ |
Save threads_1 as CSV
# write.csv(threads_1, "15mincity.csv", row.names = FALSE)
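Because Reddit search results change over time, the cached CSV can be read back in later sessions instead of re-querying Reddit (a small convenience step, assuming the file saved above):

# Reload the cached search results in a later session
# threads_1 <- read.csv("15mincity.csv")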

3. Clean data and tokenize.

# Filter rows that mention "15 minute city" or "15 minute cities"
# (note: the pattern is case-sensitive and does not match the hyphenated form "15-minute city")
filtered_threads <- threads_1 %>%
  filter(
    str_detect(title, "\\b15 minute city\\b|\\b15 minute cities\\b") |
    str_detect(text, "\\b15 minute city\\b|\\b15 minute cities\\b")
  )

# Remove the one entry from 2022 to only show 2023 and 2024
filtered_threads <- filtered_threads %>%
  filter(timestamp != 1642900768)

# Tokenize
words <- threads_1 %>% 
  unnest_tokens(output = word, input = text, token = "words") 

# Load list of stop words - from the tidytext package
data("stop_words")
# View a random sample of 100 stop words
print(stop_words$word[sample(1:nrow(stop_words), 100)])
##   [1] "more"         "thinks"       "out"          "alone"        "could"       
##   [6] "h"            "gives"        "lately"       "is"           "up"          
##  [11] "opens"        "these"        "new"          "clear"        "did"         
##  [16] "would"        "beside"       "see"          "got"          "please"      
##  [21] "then"         "had"          "any"          "secondly"     "others"      
##  [26] "any"          "viz"          "hardly"       "less"         "away"        
##  [31] "particularly" "shows"        "needed"       "whole"        "parts"       
##  [36] "everything"   "u"            "on"           "us"           "upon"        
##  [41] "overall"      "said"         "perhaps"      "howbeit"      "f"           
##  [46] "faces"        "didn't"       "do"           "herself"      "how's"       
##  [51] "or"           "known"        "but"          "eg"           "various"     
##  [56] "were"         "don't"        "yourselves"   "here's"       "its"         
##  [61] "keeps"        "little"       "too"          "smallest"     "what's"      
##  [66] "a"            "down"         "some"         "having"       "great"       
##  [71] "each"         "than"         "brief"        "few"          "just"        
##  [76] "from"         "furthermore"  "their"        "mr"           "very"        
##  [81] "onto"         "go"           "i've"         "seconds"      "by"          
##  [86] "her"          "different"    "orders"       "she"          "hasn't"      
##  [91] "he'd"         "fully"        "asked"        "few"          "over"        
##  [96] "also"         "look"         "our"          "those"        "you'll"
# Regex that matches URL-type string
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

words_clean <- threads_1 %>% 
  # drop URLs and HTML entities
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  # Tokenization (word tokens)
  unnest_tokens(word, text, token = "words") %>% 
  # drop stop words
  anti_join(stop_words, by = "word") %>% 
  # keep only tokens containing at least one letter (drops purely numeric tokens)
  filter(str_detect(word, "[a-z]"))

# Check the number of rows after removal of the stop words. There should be fewer words now
print(
  glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)
## Before: 10786, After: 4115
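As a quick sanity check on the URL regex, applying it to a toy string (not from the data) shows that both the URL and the HTML-encoded entity are stripped:

# Toy example: the URL and "&amp;" should both be removed, leaving only the
# surrounding words (plus extra whitespace)
str_replace_all("Read this https://example.com/some/page &amp; share it", replace_reg, "")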
Plot of words found in selected threads (not including stop words):
words_clean %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Top 20 word counts")

4. Generate a word cloud that illustrates the frequency of words, excluding the keywords “15 minute city” and “15 minute cities”.

# Filter out keywords
words_clean_filtered <- words_clean %>%
  filter(!word %in% c("15", "minute", "city", "cities"))

# Build a palette of 20 random, fairly saturated HSV colors for the most
# frequent words, then pad with grey so the remaining words are de-emphasized
n <- 20
h <- runif(n, 0, 1)     # hue
s <- runif(n, 0.6, 1)   # saturation
v <- runif(n, 0.3, 0.7) # value (brightness)

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

words_clean_filtered %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2(color = pal, 
             minRotation = 0, 
             maxRotation = 0, 
             ellipticity = 0.8)

5. Conduct a trigram analysis.

# Extract trigrams from the text data
words_ngram <- filtered_threads %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>% 
  select(text) %>% 
  unnest_tokens(paired_words, text, token = "ngrams", n = 3) 

# Separate trigrams into individual words
words_ngram_pair <- words_ngram %>%
  separate(paired_words, into = c("word1", "word2", "word3"), sep = " ")

# Filter trigrams
words_ngram_pair_filtered <- words_ngram_pair %>%
  filter(
    !word1 %in% stop_words$word & 
    !word2 %in% stop_words$word & 
    !word3 %in% stop_words$word, # Remove stop words
    str_detect(word1, "^[a-zA-Z]+$") & 
    str_detect(word2, "^[a-zA-Z]+$") & 
    str_detect(word3, "^[a-zA-Z]+$") # Remove non-alphabetic terms
  )

# Count trigram frequencies
trigram_counts <- words_ngram_pair_filtered %>%
  count(word1, word2, word3, sort = TRUE)

# Display the top 20 trigrams
top_trigrams <- trigram_counts %>%
  head(20)

# Create table
trigram_table <- trigram_counts %>%
  mutate(trigram = paste(word1, word2, word3, sep = " ")) %>%
  select(trigram, n) %>%
  arrange(desc(n)) %>%
  head(20)

trigram_table
##                        trigram n
## 1           cities north texas 2
## 2      fields community suburs 2
## 3      frisco fields community 2
## 4          main street america 2
## 5          minute cities north 2
## 6           north texas frisco 2
## 7          texas frisco fields 2
## 8         add public transport 1
## 9            adding bike lanes 1
## 10            aka credit score 1
## 11   american households owned 1
## 12                anti car isn 1
## 13            anti vax climate 1
## 14   approved emission control 1
## 15 approved insurance coverage 1
## 16      approved licence plate 1
## 17     aux services essentiels 1
## 18            bakery bar parks 1
## 19             bar parks movie 1
## 20  barcelona melbourne oxford 1

Interestingly, the city of Frisco, Texas, located in the Dallas-Fort Worth metroplex, appears in several of the top trigrams. The focus seems to be on Fields, a planned mixed-use community there situated on a 2,544-acre site that will be home to the PGA and the University of North Texas. It is being branded, at least by commenters online, as a model “15 minute city.”
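To verify which threads drive these trigrams, the Frisco-related posts can be pulled directly (a quick check; “frisco” is matched case-insensitively in either the title or the text):

# Inspect the threads that mention Frisco
filtered_threads %>%
  filter(str_detect(title, regex("frisco", ignore_case = TRUE)) |
         str_detect(text, regex("frisco", ignore_case = TRUE))) %>%
  select(date_utc, title, subreddit)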

Trigram visualization
# Create a network graph from trigram data. Note that graph_from_data_frame()
# uses the first two columns (word1, word2) as edge endpoints; word3, n, and
# the trigram label are carried along as edge attributes.
word_network <- trigram_counts %>%
  mutate(trigram_label = paste(word1, word2, word3, sep = " ")) %>%
  graph_from_data_frame(directed = FALSE)

ggraph(word_network, layout = "fr") + 
  geom_edge_link(aes(edge_width = n), edge_alpha = 0.6, show.legend = FALSE) + 
  geom_node_point(color = "darkslategray4", size = 3) + 
  geom_node_text(aes(label = name), vjust = 1.8, size = 4, check_overlap = TRUE) + 
  labs(title = "Trigram Word Network", x = NULL, y = NULL) + 
  theme_void()

Because even the most frequent trigrams appeared only twice, I decided to look into bigrams as well.

# Extract and clean bigrams
top_bigrams <- filtered_threads %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%  
  unnest_tokens(paired_words, text, token = "ngrams", n = 2) %>%
  separate(paired_words, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word, !word2 %in% stop_words$word) %>%
  drop_na(word1, word2) %>% 
  count(word1, word2, sort = TRUE) %>%
  slice_max(n, n = 10) %>%
  mutate(bigram = paste(word1, word2))  

# Create the bar chart
ggplot(top_bigrams, aes(x = reorder(bigram, n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Bigrams (Excluding Stop Words)",
    x = "Bigrams",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Words to exclude
excluded_words <- c("15", "minute", "minutes", "city", "cities")

# Extract and clean bigrams
top_bigrams <- filtered_threads %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%  
  unnest_tokens(paired_words, text, token = "ngrams", n = 2) %>%
  separate(paired_words, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word, !word2 %in% stop_words$word) %>%
  drop_na(word1, word2) %>%  
  filter(!word1 %in% excluded_words, !word2 %in% excluded_words) %>%  # Exclude specific words
  count(word1, word2, sort = TRUE) %>%
  slice_max(n, n = 4) %>%  # Display only top 4 bigrams
  mutate(bigram = paste(word1, word2)) 

# Create the bar chart
ggplot(top_bigrams, aes(x = reorder(bigram, n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 4 Bigrams (Excluding Specified and Stop Words)",
    x = "Bigrams",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

“Government approved” implies a conspiratorial slant in discussions of “15 minute cities,” while “walkable community,” “grocery store,” and “public transport” reflect widely favored features of modern urban planning.
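To see the context in which “government approved” appears, the matching threads can be inspected directly (a quick check that also allows a hyphenated form of the phrase):

# Threads whose text contains "government approved" or "government-approved"
filtered_threads %>%
  filter(str_detect(text, regex("government[- ]approved", ignore_case = TRUE))) %>%
  select(date_utc, title, subreddit)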

6. Sentiment Analysis

I performed a sentiment analysis on the text data using sentimentr, a dictionary-based method that accounts for valence shifters such as negations.
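As a toy illustration of how the method handles negation (not part of the analysis itself), scoring two contrived sentences shows that the negated version should receive a lower score, whereas a plain word-count lexicon would rate both as positive:

# sentimentr's valence shifters let "not" flip the polarity of "like"
sentiment(c("I like the 15 minute city concept.",
            "I do not like the 15 minute city concept."))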

# Initial sentiment analysis on all posts, displaying the top 10 positively scored 
sentiment_15min <- sentiment(filtered_threads$text) %>%
  arrange(desc(sentiment)) 
                    
head(sentiment_15min, 10) %>% 
  knitr::kable()
| element_id | sentence_id | word_count | sentiment |
|-----------:|------------:|-----------:|----------:|
|         62 |           9 |          9 | 1.4252667 |
|         19 |           8 |         21 | 0.9274260 |
|          6 |          10 |          1 | 0.6000000 |
|         30 |           6 |          8 | 0.5656854 |
|          9 |          14 |          5 | 0.4919350 |
|         24 |           1 |          5 | 0.4919350 |
|         25 |           7 |          9 | 0.4666667 |
|         71 |           4 |         15 | 0.4518481 |
|         73 |           4 |         14 | 0.4510033 |
|         37 |           6 |         12 | 0.4330127 |

7. Display 10 sample texts alongside their sentiment scores and evaluate the credibility of the sentiment analysis.

# Set seed and code to randomly select 10 posts from the entire population
set.seed(1234)
senti_15min_filtered <- filtered_threads %>%
  # keep posts with non-empty text that do not contain URLs
  filter(nzchar(text) & !grepl("http[s]?://", text)) %>%
  unnest(text)
random_indices <- sample(nrow(senti_15min_filtered), size = 10)
ten_15min_samples <- senti_15min_filtered[random_indices, ]
# Score each sampled post; sentiment() returns one score per sentence, so only
# the first sentence's score is kept for each post
ten_15min_samples$sentiment_score <- sapply(ten_15min_samples$text, function(text) {
  sentiment(text)$sentiment[1]
})
# Select columns and sort by sentiment 
columns <- c("date_utc", "title", "text", "subreddit", "sentiment_score")
ten_15min_samples_final <- ten_15min_samples[, columns] %>%
  arrange(desc(sentiment_score))
ten_15min_samples_final$title <- strtrim(ten_15min_samples_final$title, 50)
ten_15min_samples_final$text <- strtrim(ten_15min_samples_final$text, 50)
head(ten_15min_samples_final, 10) %>% 
  knitr::kable()
| date_utc   | title | text | subreddit | sentiment_score |
|------------|-------|------|-----------|----------------:|
| 2024-01-28 | What are 15 minute cities? | Like, I literally just saw it in a post mentioned | fuckcars | 0.2000000 |
| 2023-02-19 | 15 Minute Cities - Good or Bad? | I think they’re a good thing and would benefit alo | askTO | 0.1936492 |
| 2023-01-23 | The WEF agenda around 15-minute cities is a plot t | Some cities that are following the WEF urban agend | conspiracy | 0.1745743 |
| 2023-09-30 | What’s so bad about a 15 minute city? | Don’t people want to only walk a few minutes and h | NoStupidQuestions | 0.0000000 |
| 2023-04-21 | What’s with this weird conspiracy shit around 15 m | I don’t own a car. I bought a car after the Gamest | AskConservatives | 0.0000000 |
| 2024-06-03 | Conservatives will whine and cry about ’15 minute | That is all | fuckcars | 0.0000000 |
| 2024-02-01 | This is my take on the topic of 15 minute cities a | Just to clarify for those few that may not know, 1 | fuckcars | -0.1018402 |
| 2024-02-28 | I REALLY wish the right wing conspiracies about 15 | Shit, every time I see a right winger post a consp | fuckcars | -0.1756986 |
| 2024-11-01 | What is it with 15 minute city conspiracy theorist | I went to Facebook to look at 15 minute conspiraci | fuckcars | -0.2946278 |
| 2024-06-23 | 15 minute gulags are now being rolled out in Canad | Do not, under any circumstance accept the mandates | conspiracy_commons | -0.4939306 |

A sample text such as “Do not, under any circumstance accept the mandates” conveys strong negativity and reflects the user’s skepticism toward “15 minute cities” becoming normalized. Neutrality appears in comments such as “It used to be just called just a normal city” and “Don’t people want to only walk a few minutes.”

Positivity is reflected in the comment “I think they’re a good thing and would benefit…”. However, the sample text with the highest sentiment score was “Like, I literally just saw it in a post mentioned,” which, to a human reader, does not indicate strong feelings one way or the other and is best described as neutral.

While the sentiment analysis seems mostly credible, examples such as the highest-scoring one above show that short, out-of-context texts may be scored as positive even when a human reader would judge them neutral. Likewise, a comment such as “It used to be just called just a normal city” could be misclassified because its tone is ambiguous.
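These borderline cases can be re-scored in isolation to see how much of the score comes from individual words rather than context (the exact values depend on the sentimentr lexicon version):

# Re-score the two ambiguous snippets discussed above
sentiment(c("Like, I literally just saw it in a post mentioned",
            "It used to be just called just a normal city"))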

8. Insights derived from the sentiment analysis, supported by three plots.

# Run sentiment analysis code used for 10 samples on all text for plotting
sentiment_15minplots <- filtered_threads %>%
  filter(nzchar(text) & !grepl("http[s]?://", text)) %>%
  unnest(text) %>%
  drop_na()
# As above, score each post; only the first sentence's score is kept
sentiment_15minplots$sentiment_score <-
  sapply(sentiment_15minplots$text, function(text) {
    sentiment(text)$sentiment[1]
  })
columns <- c("date_utc", "title", "text", "subreddit", "sentiment_score")
sentiment_15minplots <- sentiment_15minplots[, columns] %>%
  arrange(desc(sentiment_score))
sentiment_15minplots$title <- strtrim(sentiment_15minplots$title, 50)
sentiment_15minplots$text <- strtrim(sentiment_15minplots$text, 50)
sentiment_15minplots$DoW <- wday(sentiment_15minplots$date_utc, label = TRUE, abbr = FALSE)
sentiment_15minplots <- sentiment_15minplots %>% select(date_utc, DoW, title, text,
                                                            subreddit, sentiment_score)
# Density plot to show distribution of scores
ggplot(sentiment_15minplots, aes(x = sentiment_score)) +
  geom_density(fill = "lightblue", alpha = 0.9) +
  labs(title = "Distribution of Sentiment", x = "Sentiment score", y = "Density") +
  theme_dark()

The sentiment scores range from roughly -0.5 to 0.5, indicating that both positive and negative statements were captured. The distribution peaks just above 0, suggesting most posts are close to neutral with a slight positive lean. Even so, the results may be somewhat skewed by ambiguity in tone and the difficulty of detecting sarcasm.
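One way to quantify this visual impression is a basic numeric summary of the scores:

# Numeric summary of the sentiment score distribution
summary(sentiment_15minplots$sentiment_score)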

Below, the higher number of posts on Mondays may indicate more traffic, and perhaps displeasure with congestion at the start of the work week, leading more users to comment on “15 minute cities.”

# Bar graph of post counts by day of week
sentiment_15minplots %>% 
  ggplot(aes(x = DoW)) +
  geom_bar(fill = 'orange') +
  labs(title = "Posts by day of week", x = "Day of week", y = "Number of posts") +
  theme_classic()

The following violin plot shows how sentiment differs by day, with Sundays exhibiting the widest range of variability. Fewer extreme sentiments appear mid-week (Wednesday and Thursday), while Monday and Friday show more concentrated distributions, likely reflecting more consistent sentiment levels.

Overall, most posts were neutral, without an obvious slant toward positive or negative, especially considering the broad range of subreddits scraped (from the left-leaning r/fuckcars to the right-leaning r/AskConservatives). Further analysis of more user sentiment may be necessary to accurately capture public opinion on a topic as broad and wide-ranging as the 15 minute city.

# Violin plot of sentiment by day of week
ggplot(sentiment_15minplots, aes(x = DoW, y = sentiment_score)) +
  geom_violin(fill = "lightblue", color = "darkblue") +
  labs(title = "Sentiment by Day of Week", x = "Day of week", y = "Sentiment score") +
  theme_dark()
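As a numeric complement to the violin plot, the per-day spread of scores can be summarised directly; the standard deviation gives a rough analogue to the width of each violin:

# Per-day sentiment spread (count, mean, and standard deviation)
sentiment_15minplots %>%
  group_by(DoW) %>%
  summarise(posts = n(),
            mean_sentiment = mean(sentiment_score, na.rm = TRUE),
            sd_sentiment = sd(sentiment_score, na.rm = TRUE))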