Sentiment Analysis: Parking Shortage

Kaiyu Zhou

2022-11-29

1. Problem:

Analyze public sentiment towards parking (shortages) in San Francisco using Reddit threads over the past year.

2. Keyword: “no parking”, subreddit: “sanfrancisco”, period = “year”

Downloading Reddit threads

# using keyword
threads_p <- find_thread_urls(keywords = 'no parking', 
                              sort_by = 'relevance', 
                              period = 'all') %>% 
  drop_na()

## parsing URLs on page 1...
## parsing URLs on page 2...
## parsing URLs on page 3...

rownames(threads_p) <- NULL

colnames(threads_p)

## [1] "date_utc"  "timestamp" "title"     "text"      "subreddit" "comments" 
## [7] "url"

head(threads_p, 3) %>% knitr::kable()

date_utc	timestamp	title	subreddit	comments	url
2023-06-16	1686922167	Working on healthcare is hard enough without having to dodge Teslas parked in the bike lane, in a no-stopping zone, DIRECTLY NEXT TO A PARKING LOT on my commute	boston	368	https://www.reddit.com/r/boston/comments/14awwuv/working_on_healthcare_is_hard_enough_without/
2024-01-17	1705455732	Tobogganing ban at dozens of Toronto parks ignites anger toward city council. ‘People running the city are so out of touch with Canadians. What’s next? You’re gonna remove outdoor rinks? No more hockey’.	Canada_sub	292	https://www.reddit.com/r/Canada_sub/comments/198kghr/tobogganing_ban_at_dozens_of_toronto_parks/
2023-02-24	1677230073	Paul Scholes on Instagram: And we finally have a manager who gets Man Utd, fast and furious, not posession based, no parking the bus or buses and taking no s#it from any f***er!	reddevils	203	https://www.reddit.com/r/reddevils/comments/11an4wx/paul_scholes_on_instagram_and_we_finally_have_a/

# search for subreddits
subreddit_list <- RedditExtractoR::find_subreddits('no parking')

## parsing URLs on page 1...
## parsing URLs on page 2...

#subreddit_list[1:25,c('subreddit','title','subscribers')] %>% knitr::kable()

Check the number of threads found by the keyword per subreddit.

threads_p$subreddit %>% table() %>% sort(decreasing = T) %>% head(20)

## .
## mildlyinfuriating              pics         EarthPorn             funny 
##                15                 8                 4                 4 
## mildlyinteresting      MovieDetails       nottheonion          politics 
##                 4                 4                 4                 4 
##      sanfrancisco         southpark          baseball   britishproblems 
##                 4                 4                 3                 3 
##          CasualUK            Denver          fuckcars   oddlyterrifying 
##                 3                 3                 3                 3 
##      pettyrevenge    Showerthoughts            soccer        television 
##                 3                 3                 3                 3

Search by both the keyword and the subreddit.

# using both subreddit and keyword
threads_sfp <- find_thread_urls(keywords= 'no parking', 
                              subreddit = 'sanfrancisco', 
                              sort_by = 'relevance', 
                              period = 'year') %>% 
  drop_na()

## parsing URLs on page 1...
## parsing URLs on page 2...
## parsing URLs on page 3...

rownames(threads_sfp) <- NULL

head(threads_sfp, 3) %>% knitr::kable()

date_utc	timestamp	title	text	subreddit	comments	url
2024-06-05	1717600561	Map of upcoming events across San Francisco	San Franciscos Open Data portal has a map of upcoming events (next 7 days). Its based on street closure data. It sucks on mobile, but overall it kind of a cool way to see whats happening across the city (Night Markets, Block Parties, street fairs, etc.).

Obviously its not everything happening in the city, just things that get a permit from MTA to shut down a street.

Here’s a link to the map

https://preview.redd.it/jqldyig6sr4d1.png?width=2240&format=png&auto=webp&s=d875c16f8e19fb3c456df6bb6d2f1f79db3c0968 |sanfrancisco | 3|https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ | |2024-06-24 | 1719228923|Not so Wily Coyote Crossing: Watch Out! |I’ve had a weird couple encounters with a youngish looking coyote on Diamond Heights Blvd, coming around the bend from the Clipper turn off.

A couple weeks ago, as I was driving to work, about 9:55 pm, I was driving along the above mentioned stretch of road toward Glen Park. There is open hill on the right side going up and on the left side going down, between two groups of housing. Suddenly, what seemed like a juvenile coyote came dashing across the street from the left side and almost ran in front of my car. I hit my brakes and coyote swerved, and was panic running alongside, between my car and the traffic island. I hit my brakes again and my car nearly stopped and it swerved again, finished crossing the road and ran up the hill. This was just before the intersection of Safira Street.

I know animals often have a regular route they take for hunting and scavenging. Last week on Sunday night, I was coming arounf the same curve and thinking about the coyote. Glanced to my left and there it was again, dashing from the left side of the street, but this time it saw my car and scrambled to turn around and run back the way it came. I hope it learned a valuable lesson from our last encounter and will carefully look both ways before crossing. I wouldn’t like to hit it with my car and hope no one else does, either.

Sorry, no pics. |sanfrancisco | 2|https://www.reddit.com/r/sanfrancisco/comments/1dnbbcv/not_so_wily_coyote_crossing_watch_out/ | |2024-04-27 | 1714240615|How to get a table at italian heritage parade? |Attended SFs annual italian heritage parade last year and had a blast! All the restaurants along the parade route had set tables outside for people to eat and watch the parade go by. I asked a group of people outside Tonys how they got their table, they said they had come every year for the past several years and booked really far in advanced. Does anyone know how to set up these reservations? Should I just call up the restaurants and ask if I can make a reservation even though mid october is months away? Also, does anyone have any insight on how much i can expect to be spending? |sanfrancisco | 6|https://www.reddit.com/r/sanfrancisco/comments/1cekkwl/how_to_get_a_table_at_italian_heritage_parade/ |

Save file for the sentiment analysis.

dir()

##  [1] "major3.Rmd"          "major3.Rproj"        "major3_parking.html"
##  [4] "major3_parking.Rmd"  "sfp_reddit_bert.csv" "threads_1.csv"      
##  [7] "threads_p.csv"       "threads_sf.csv"      "threads_sfp.csv"    
## [10] "threads_sfp_1.csv"   "threads_sfp_2.csv"

write.csv(threads_sfp,"./threads_sfp.csv", row.names = T)

Downloading comments and additional information

# get individual comments
threads_sfp_content <- get_thread_content(threads_sfp$url[1:4])

names(threads_sfp_content)

## [1] "threads"  "comments"

# check upvotes and downvotes
print(threads_sfp_content$threads[,c('upvotes','downvotes','up_ratio')])

##   upvotes downvotes up_ratio
## 1      22         0     0.88
## 2       1         0     0.67
## 3      10         0     0.68
## 4       0         0     0.50

The comments data frame provides information on individual comments.

head(threads_sfp_content$comments, 3) %>% knitr::kable()

url	author	date	timestamp	score	upvotes	downvotes	golds	comment	comment_id
https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/	nelsonhops415	2024-06-05	1717603882	8	8	0	0	Interesting.

Most of these events are on funcheapsf, dothebay, eddie’s list etc but it is super helpful for locals to check in case they don’t get the memo (parking, closures, detours, traffic etc). |1 | |https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ |tetsuo316 |2024-06-05 | 1717623213| 1| 1| 0| 0|/u/nelsonhops415 said it best for me. This is super helpful to know where I might run into snafus taking my kid to the marina. Thanks /u/treehousewest! For anyone doing the escape from alcatraz thing, have fun! |2 | |https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ |ProLoser |2024-07-20 | 1721503539| 1| 1| 0| 0|I actually built this open source webpage that crawls FunCheapSF and puts their events into a database that I use to populate a filterable map https://github.com/ProLoser/funcheapmap/ unfortunately my crawler stopped working, need to fix it. EDIT: FIXED and UPDATED!

Lets you filter by event and date and plan your day around events going on in different parts of the city.

https://preview.redd.it/a5oq1x0v5qdd1.png?width=1410&format=png&auto=webp&s=3d0d8a4a4f1ae0bfb017a15f894ae74d02c5a3f7|3 |

Analyses on posting date/time

look at the number of threads by week for the last 12 months.

# create new column: date
threads_sfp %<>% 
  mutate(date = as.POSIXct(date_utc)) %>%
  filter(!is.na(date))

# number of threads by week
threads_sfp %>% 
  ggplot(aes(x = date)) +
  geom_histogram(color="black", position = 'stack', binwidth = 7*24*3600) +
  scale_x_datetime(date_labels = "%b %y",
                   breaks = seq(min(threads_sfp$date, na.rm = T), 
                                max(threads_sfp$date, na.rm = T), 
                                by = "1 week")) +
  theme_minimal()

weekends or weekdays?

# create new columns: day_of_week, is_weekend
  threads_sfp %<>%  
  mutate(day_of_week = wday(date, label = TRUE)) %>% 
  mutate(is_weekend = ifelse(day_of_week %in% c("Sat", "Sun"), "Weekend", "Weekday"))

# number of threads by time of day
threads_sfp %>% 
  ggplot(aes(x = day_of_week, fill = is_weekend)) +
  geom_bar(color = 'black') +
  scale_fill_manual(values = c("Weekday" = "gray", "Weekend" = "pink")) + 
  theme_minimal()

time of day

print(threads_sfp$timestamp[1])

## [1] 1717600561

print(threads_sfp$timestamp[1] %>% anytime(tz = anytime:::getTZ()))

## [1] "2024-06-05 11:16:01 EDT"

threads_sfp %<>%  
  mutate(time = timestamp %>% 
           anytime(tz = anytime:::getTZ()) %>% 
           str_split('-| |:') %>% 
           sapply(function(x) as.numeric(x[4])))

# number of threads by time of day
threads_sfp %>% 
  ggplot(aes(x = time)) +
  geom_histogram(bins = 24, color = 'black') +
  scale_x_continuous(breaks = seq(0, 24, by=2)) + 
  theme_minimal()

3. Tokenization and stop words

Tokenization

# Word tokenization
words <- threads_sfp %>% 
  unnest_tokens(output = word, input = text, token = "words")

words %>%
  count(word, sort = TRUE) %>%
  top_n(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Unique wordcounts")

## Selecting by n

Stop words

# load list of stop words - from the tidytext package
data("stop_words")
# view random 50 words
print(stop_words$word[sample(1:nrow(stop_words), 100)])

##   [1] "we'll"      "definitely" "sure"       "first"      "from"      
##   [6] "facts"      "think"      "or"         "m"          "other"     
##  [11] "if"         "us"         "does"       "sorry"      "where"     
##  [16] "get"        "new"        "available"  "over"       "above"     
##  [21] "longest"    "three"      "no"         "presents"   "did"       
##  [26] "asked"      "nine"       "going"      "you've"     "looking"   
##  [31] "by"         "different"  "good"       "else"       "hers"      
##  [36] "anybody"    "both"       "these"      "that"       "wherever"  
##  [41] "goes"       "yourself"   "we'd"       "ask"        "went"      
##  [46] "theres"     "have"       "yet"        "an"         "why"       
##  [51] "working"    "should"     "turned"     "an"         "became"    
##  [56] "become"     "isn't"      "etc"        "somehow"    "yourselves"
##  [61] "yet"        "itself"     "many"       "was"        "likely"    
##  [66] "are"        "we've"      "where"      "could"      "something" 
##  [71] "on"         "he"         "since"      "our"        "i"         
##  [76] "seems"      "goods"      "her"        "came"       "possible"  
##  [81] "this"       "he'd"       "obviously"  "become"     "this"      
##  [86] "they've"    "specified"  "anyhow"     "a"          "p"         
##  [91] "up"         "somebody"   "whereafter" "comes"      "ourselves" 
##  [96] "yes"        "perhaps"    "after"      "had"        "latterly"

# Regex that matches URL-type string
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

words_clean <- threads_sfp %>% 
  # drop URLs
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  # Tokenization (word tokens)
  unnest_tokens(word, text, token = "words") %>% 
  # drop stop words
  anti_join(stop_words, by = "word") %>% 
  # drop non-alphabet-only strings
  filter(str_detect(word, "[a-z]"))

# Check the number of rows after removal of the stop words. There should be fewer words now
print(
  glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)

## Before: 57564, After: 21117

words_clean %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Unique wordcounts")

4. Generate Word cloud (except keywords)

Frequency of words before and after removing the stop words using word cloud.

words %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()

words_clean %>% 
  filter(!word %in% c("no", "parking")) %>%
  count(word, sort = TRUE) %>% 
  wordcloud2()

Change colors

n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.7, 1) # vivid
v <- runif(n, 0.4, 0.7) # neither too dark or bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

words_clean %>% 
  filter(!word %in% c("no", "parking")) %>%
  count(word, sort = TRUE) %>% 
  wordcloud2(color = pal, 
             minRotation = 0, 
             maxRotation = 0, 
             ellipticity = 0.8)

5. Tri-grams

words_ngram <- threads_sfp %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  select(text) %>%
  unnest_tokens(output = paired_words,
                input = text,
                token = "ngrams",
                n = 3)

# Show ngrams with sorted values
words_ngram %>%
  count(paired_words, sort = TRUE) %>% 
  head(20) %>% 
  knitr::kable()

paired_words	n
we need to	39
in the city	22
NA	20
a lot of	19
in front of	18
golden gate park	17
i don t	17
out of the	17
in san francisco	15
of the city	15
sf mental health	15
on the street	14
i went to	13
in the area	13
need to be	13
at this point	12
side of the	12
there is a	12
and it was	11
i want to	11

#separate the paired words into two columns
words_ngram_pair <- words_ngram %>%
  separate(paired_words, c("word1", "word2"), sep = " ")

## Warning: Expected 2 pieces. Additional pieces discarded in 56361 rows [1, 2, 3, 4, 5, 6,
## 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# filter rows where there are stop words under word 1 column and word 2 column
words_ngram_pair_filtered <- words_ngram_pair %>%
  # drop stop words
  filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>% 
  # drop non-alphabet-only strings
  filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))

# Filter out words that are not encoded in ASCII
# To see what's ASCII, google 'ASCII table'
library(stringi)
words_ngram_pair_filtered %<>% 
  filter(stri_enc_isascii(word1) & stri_enc_isascii(word2))

# Sort the new bi-gram (n=2) counts:
words_counts <- words_ngram_pair_filtered %>%
  count(word1, word2) %>%
  arrange(desc(n))

head(words_counts, 20) %>% 
  knitr::kable()

word1	word2	n
san	francisco	72
mental	health	41
pjpgauto	webps	29
golden	gate	23
parking	lot	18
affordable	housing	17
gate	park	17
license	plate	15
sf	mental	15
north	beach	14
parking	spot	12
union	square	11
street	parking	10
dolores	park	9
health	services	9
parking	enforcement	9
public	transit	9
4624format	pjpgauto	8
ocean	beach	8
width	4624format	8

Visualize the words occurring in pairs.

# plot word network
words_counts %>%
  filter(n >= 8) %>%
  graph_from_data_frame() %>% # convert to graph
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = .6, edge_width = n)) +
  geom_node_point(color = "darkslategray4", size = 3) +
  geom_node_text(aes(label = name), vjust = 2, color = "red3") +
  labs(title = "Word Networks",
       x = "", y = "")

## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

6. Sentiment Analysis

Import the result

Import the sentiment analysis result data processed from the Colab to R.

# import the data
reddit_sentiment <- read_csv("sfp_reddit_bert.csv")

## New names:
## Rows: 229 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (5): title, text, subreddit, url, bert_label dbl (5): ...1, Unnamed: 0,
## timestamp, comments, bert_score date (1): date_utc
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

# drop NAs
reddit_sentiment %<>% drop_na('bert_label')

Comparison with the dictionary method

Get sentiment scores using the dictionary method for comparison.

# Join thread title and text.
reddit_sentiment %<>%
  mutate(title = replace_na(title, ""),
         text = replace_na(text, ""),
         title_text = str_c(title, text, sep = ". "))

# dictionary method
reddit_sentiment_dictionary <- sentiment_by(reddit_sentiment$title_text)

reddit_sentiment$sentiment_dict <- reddit_sentiment_dictionary %>% pull(ave_sentiment)
reddit_sentiment$word_count <- reddit_sentiment_dictionary %>% pull(word_count)

Check the correlation between the sentiment values from two different methods.

reddit_sentiment %<>% mutate(bert_label_numeric = str_sub(bert_label, 1, 1) %>% as.numeric())

cor(reddit_sentiment$bert_label_numeric, reddit_sentiment$sentiment_dict)

## [1] 0.2272316

0.227 implies a low positive correlation.

ggplot(data = reddit_sentiment, aes(x = bert_label_numeric, y = sentiment_dict)) +
  geom_jitter(width = 0.1, height = 0) +
  geom_line(aes(y = 0), color = '#FFD700', lwd = 1, linetype='dashed') +
  dark_theme_grey()

## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().

BERT: 1 star (negative) vs. 5 stars (positive)

bert_example <- reddit_sentiment %>%
  filter(bert_label_numeric %in% c(1,5)) %>%
  group_by(bert_label) %>%
  arrange(desc(bert_score)) %>%
  slice_head(n = 3) %>%
  ungroup()

# 1 star
bert_example %>% filter(bert_label_numeric == 1) %>% pull(title_text) %>% print()

## [1] "Market St. Closure. I am so sick of this of this nonsense. I see absolutely no reason to close off Market St for any reason let alone a parade or a protest. This street is one of the main thourofares to get to the bay bridge and get out of the city and it is once again closed for a parade which leaves the embarcadero as the only street to get to the bay bridge if you are north of Market. Embarcadero is a parking lot right now. The last time this happened it took me an hour and a half to go 3/4 of a mile. Free speech, peaceful assembly I DONT CARE I DONT CARE I DONT CARE this is nonsense and if I miss the start of the 9er game because of this  I will figuratively rain hell fire upon city hall.  NONSENSE I SAY!!!!!!!!!"                                                                                                                                                                                                                                
## [2] "This Derelict Vehicle!?!?. What can be done about this awful vehicle? My partner and I call it the Shitsubishi. This SUV keeps parking in the Powell street delivery pullouts illegally for days on end and often half blocking the city street. It has countless unpaid parking tickets, its registration has been expired for over 2 years, no front plates, it\031s leaking fluids and it\031s multiple drivers often are seen throwing trash directly from the vehicle right into the street. We report it to 311 as often as we can. If we were in any other city in the US this vehicle would be towed! WTF! We pay our vehicle registrations, our parking tickets and for a residential parking passes. Why is there no consequences for this kind of vehicle and behavior? Now the latest is that they spray painted their own license plate red. Why?  This makes the city look awful to all the riders on the street cars and as locals that lives on Powell we are over it."
## [3] "Rincon Hill Dog Park Police Activity?. Does anyone know what\031s going on near the Rincon Hill dog park (Saturday, 4/6)?  I received a Citizen app alert that a \034pet fell from a bridge\035.  But commenters are saying someone *threw* another\031s dog off the bridge, while others are saying it was not a dog but a *human*.  From the video, there is a large police presence.  Completely and utterly awful no matter.  Just wondering if anyone had insight?"

# 5 star
bert_example %>% filter(bert_label_numeric == 5) %>% pull(title_text) %>% print()

## [1] "Caption this photo. Colma Best Buy parking lot."                                   
## [2] "An excellent visualization of San Francisco's \"progressive crescent\" precincts. "
## [3] "hanging out in North Beach. "

Dictionary method: negative vs. positive

sentimentr_example <- reddit_sentiment %>%
  mutate(sentimentr_abs = abs(sentiment_dict),
         sentimentr_binary = case_when(sentiment_dict > 0 ~ 'positive',
                                       TRUE ~ 'negative')) %>%
  group_by(sentimentr_binary) %>%
  arrange(desc(sentimentr_abs)) %>%
  slice_head(n = 3) %>%
  ungroup() %>%
  arrange(sentiment_dict)

# negative
sentimentr_example %>% filter(sentimentr_binary == 'negative') %>% pull(title_text) %>% print()

## [1] "SFMTA Labor Day Parking Enforcement. "                                                                                                                                                      
## [2] "stolen motorcycle. some one with no soul or regard for life stole my baby.  My 2006 Yamaha fz6 between the panhandle and Buena Vista Park.  She was red and beautiful and I am heartbroken."
## [3] "Memorial Day SFMTA Street Parking Enforcement. "

# positive
sentimentr_example %>% filter(sentimentr_binary == 'positive') %>% pull(title_text) %>% print()

## [1] "New housing will add 160 affordable units to S.F.\031s Haight \024 but no parking spots. "                                                                                                                                    
## [2] "An excellent visualization of San Francisco's \"progressive crescent\" precincts. "                                                                                                                                           
## [3] "you can park anywhere right now, if only anything was open. i\031m glad no one\031s working thanksgiving and that people are out of town with their family but wish I could make better use of all the free parking, just me?"

Visualization

Sentiment distribution

Number of threads by sentiment category.
Negative sentiments (1-star ratings) dominate the dataset, indicating dissatisfaction regarding parking or related issues.

reddit_sentiment %>%
  ggplot(aes(x = bert_label)) +
  geom_bar(fill = "white") +
  dark_theme_gray()

Word counts by sentiment category.

reddit_sentiment %>%
  ggplot(aes(x = bert_label, y = comments)) +
  geom_jitter(height = 0, width = 0.05) +
  stat_summary(fun = mean, geom = "crossbar", width = 0.4, color = "red") +
  dark_theme_gray()

Association between a thread’s sentiment and the number of comments on the thread.

# Remove outliers
reddit_sentiment_rm_outlier <- reddit_sentiment %>%
  group_by(bert_label) %>%
  filter(
    between(
      comments,
      quantile(comments, 0.25) - 1.5 * IQR(comments),
      quantile(comments, 0.75) + 1.5 * IQR(comments)))

# Correlation analysis
cor.test(reddit_sentiment_rm_outlier$comments, reddit_sentiment_rm_outlier$bert_label_numeric)

## 
##  Pearson's product-moment correlation
## 
## data:  reddit_sentiment_rm_outlier$comments and reddit_sentiment_rm_outlier$bert_label_numeric
## t = -0.37591, df = 155, p-value = 0.7075
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1859393  0.1270587
## sample estimates:
##         cor 
## -0.03018012

# Scatterplot
reddit_sentiment_rm_outlier %>%
  ggplot(aes(x = bert_label_numeric, y = comments)) +
  geom_jitter(height = 0, width = 0.05) + 
  geom_smooth(method = 'loess', span = 0.75) +
  dark_theme_gray()

## `geom_smooth()` using formula = 'y ~ x'

Word clouds

Using word clouds to visualize words that are frequently seen in either positive or negative threads

# Stop word removal and tokenization
data("stop_words")
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

reddit_sentiment_clean <- reddit_sentiment %>%
  mutate(title_text = str_replace_all(title_text, replace_reg, "")) %>%
  # tokenize
  unnest_tokens(word, title_text, token = "words") %>%
  # remove stop words
  anti_join(stop_words, by = "word") %>%
  filter(str_detect(word, "[a-z]")) %>%
  filter(!word %in% c('flu','shot','shots')) # You need to replace this with your keyword

# negative text
reddit_sentiment_clean_negative <- reddit_sentiment_clean %>%
  filter(bert_label_numeric %in% c(1,2))
# positive text
reddit_sentiment_clean_positive <- reddit_sentiment_clean %>%
  filter(bert_label_numeric %in% c(4,5))

# Remove words that are commonly seen in both negative and positive threads
reddit_sentiment_clean_negative_unique <- reddit_sentiment_clean_negative %>%
  anti_join(reddit_sentiment_clean_positive, by = 'word')
reddit_sentiment_clean_positive_unique <- reddit_sentiment_clean_positive %>%
  anti_join(reddit_sentiment_clean_negative, by = 'word')

Words appearing in negative threads

# Wordcloud with a custom color palette
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

reddit_sentiment_clean_negative_unique %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color = pal,
       minRotation = -pi/6,
       maxRotation = -pi/6,
       rotateRatio = 1)

Words appearing in positive threads

# Wordcloud with a custom color palette
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

reddit_sentiment_clean_positive_unique %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color = pal,
       minRotation = pi/6,
       maxRotation = pi/6,
       rotateRatio = 1)

Temporal analysis

Create new columns for the temporal analysis: date, year, day_of_week, is_weekend, time

reddit_sentiment %<>%
  mutate(date = as.POSIXct(date_utc)) %>%
  filter(!is.na(date)) %>%
  mutate(year = year(date),
         month = as.Date(floor_date(date, unit = "month")),
         day_of_week = wday(date, label = TRUE),
         is_weekend = ifelse(day_of_week %in% c("Sat", "Sun"), "Weekend", "Weekday"),
         time = timestamp %>%
           anytime(tz = anytime:::getTZ()) %>%
           str_split('-| |:') %>%
           sapply(function(x) as.numeric(x[4])))

Sentiment by month using a stacked bar plot

# sentiment by month
reddit_sentiment %>%
  ggplot(aes(x = month, fill = bert_label)) +
  geom_bar(position = 'stack') +
  scale_x_date(date_labels = "%b %Y",
                 date_breaks = "1 month") +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

proportions.

# sentiment by month
reddit_sentiment %>%
  ggplot(aes(x = month, fill = bert_label)) +
  geom_bar(position = 'fill') +
  scale_x_date(date_labels = "%b %Y",
                 date_breaks = "1 month") +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

Sentiment by day of week.

# sentiment by day
reddit_sentiment %>%
  ggplot(aes(x = day_of_week, fill = bert_label)) +
  geom_bar(position = 'fill') +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

Sentiment by time of day.

reddit_sentiment %>%
  ggplot(aes(x = time, fill = bert_label)) +
  geom_histogram(bins = 24, position = 'fill', color = 'black', lwd = 0.2) +
  scale_x_continuous(breaks = seq(0, 24, by=1)) +
  scale_fill_manual(values = c('#bc5090', '#bc5090', '#ff6361', '#ffa600', '#ffa600')) +
  dark_theme_grey()

## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_bar()`).

ggplot(reddit_sentiment, aes(x = sentiment_dict)) +
    geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
    labs(title = "Distribution of Sentiment Scores", x = "Sentiment Score", y = "Count") +
    theme_minimal()

ggplot(reddit_sentiment, aes(x = sentiment_dict, y = comments)) +
    geom_point(alpha = 0.6, color = "darkred") +
    geom_smooth(method = "loess", color = "blue") +
    labs(title = "Sentiment Score vs. Engagement", x = "Sentiment Score", y = "Comment Count") +
    theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

7. Sample texts and their sentiment scores

library(stringr)

set.seed(123)

sample_texts <- reddit_sentiment %>%
    sample_n(10) %>%
    mutate(short_text = str_sub(text, 1, 1000)) %>%  # Display the first 100 characters
    select(short_text, bert_label_numeric)

# Display the table
knitr::kable(sample_texts, caption = "Sample Texts with Sentiment Scores")

Sample Texts with Sentiment Scores
short_text	bert_label_numeric
The City wants to install meters “pay or permit” devices, to be specific) in almost all parts of Cow Hollow/Marina. A meeting is bring held Monday, the Monday after Thanksgiving. I can see it benefitting certain people (namely those workers who are used to moving their car every few hours), but I think it is yet another money grab.

I believe the rates will be subject to peak demand, as well. I had no idea it is going to encompass such a huge majority of the neighborhood!

https://form.jotform.com/243288802092156?fbclid=IwY2xjawGzLXpleHRuA2FlbQIxMQABHR4SBTdWy08097YNe5D3mgOnm6c1age24BiKn5A6EJKkrhCNiNeqLHG_1w_aem_cQ25-3qYhjqIJ6-6Hfv04w | 2| |i got 2 tickets for not having a front license plate (cause I had gotten a new bumper and was just lazy to install the plate right away) within a month so I know they can’t be getting lucky with parking fairies.

edit: i don’t own a tesla lol | 1| |Just as the title says, Ive parked on the sides before and always use the PayByPhone app. Theres always the no stopping signs on all the parking spots. But I was always confused if youre actually allowed to park there because I wondered why would the hours you need to pay for parking overlap with the no stopping sign? Ive seen plenty of cars parked there and never seen them get a ticket. Anyone have a concrete answer for this? And do you think I can appeal the ticket?

I parked near nicks gyro, at 5pm and paid for an hour. Then got my ticket around 530pm. | 1| |Hi, I am a new driver so I am not too confident about parking with these signs. Today is Thursday, would it be fine if I parked here @5pm today and get the car before 10am tomorrow (Friday)? I do not want to get a ticket.

Any insights would be appreciated! | 2| |My husband and I are taking a very long overdue vacation and booked a trip to SF in June. Were staying near Dolores Park and plan to do a lot of walking and sight seeing during our stay. I would love some feedback and recommendations on the itinerary I have planned (see below). Also, we plan on packing a lot of casual athletic wear (leggings, hoodies, light jackets) so recommendations on what clothes to pack is appreciated as well.

Theres so much we want to do and see that isnt listed on our itinerary but Id love to hear from locals and others who have visited on whats an absolute cant miss in SF. We will be walking a lot and utilizing Uber/Lfyt. So no worries on parking or renting a car.

Wednesday - [ ] Fly in - [ ] Settle into airbnb - [ ] District Tea (Picnic Pack) - [ ] Sunset Picnic at Dolores Park - [ ] Bed early!

Thursday - [ ] Tour: Muir Woods, Napa/Somoma, Golden Gate Bridge (lunch during tour) - [ ] Dinner and exploring on Pier 39

Friday - [ ] | 4| |Get your kids. There’s a bunch of teenage guys doing donuts on scooters in Washington Square Park.

Edit 1: Everyone here claims to care about other people. We claim to be progressive. But the responses to this post are the opposite. No one seems to care about their neighbors in North Beach.

Edit 2: I posted because I’m hoping the parents see as there is a large Reddit population in SF. I did call the police. I’m getting a lot of hate for caring about my park. I made the mistaken assumption that people were kind and cared about their neighbors and city. | 1| | | 4| | | 2| |Will be coming for four days in March and have a rental car. Prices for hotels with no parking in DT area compared to Oakland or further south with parking is about $750+ vs $350+. Is it a hassle to commute into the city each morning and finding parking? Worth the extra price? | 3| |I went to Stonestown yesterday and Round 1 was open (10am-2am). It was a lively with a bunch of claw machines, DDR, arcade games. There was also food and a line to play billiards, bowling, and karaoke. I would highly recommend going if you have no plans today!

Please note that the claw machine games seem to be $2.25 (9 credits), $2.75 (11 credits), and $3.50 (14 credits) a play. Each credit costs $0.25 with a $10 minimum on the playing card and you get bonus credits if you purchase higher increments.

Quick edit: Side note, the main entrance was not open and I had to use the parking entrance downstairs to get in. I went through Target and took the stairs down. | 1|

Credibility of the sentiment analysis outcomes

“The sentiment analysis outcomes are mostly credible, as the majority of scores align with the general tone and content of the sampled texts. However, there are specific cases where the scores could be refined for greater accuracy.”

but I think it is yet another money grab.

2 (Accurate)

i got 2 tickets for not having a front license plate (cause I had gotten a new bumper and was just lazy to install the plate right away) within a month so I know they can’t be getting lucky with parking fairies.

1 (Accurate)

But I was always confused if you’re actually allowed to park there because I wondered why would the hours you need to pay for parking overlap with the no stopping sign? I’ve seen plenty of cars parked there and never seen them get a ticket.

1 (not very accurate)

I do not want to get a ticket. Any insights would be appreciated!

2 (not very accurate)

I would love some feedback and recommendations on the itinerary I have planned

4 (Accurate)

We claim to be progressive. But the responses to this post are the opposite. No one seems to care about their neighbors in North Beach.

1 (Accurate)

Prices for hotels with no parking in DT area compared to Oakland or further south with parking is about $750+ vs 350+. Is it a hassle to commute into the city each morning and finding parking? Worth the extra price?

3 (Accurate)

I would highly recommend going if you have no plans today! Quick edit: Side note, the main entrance was not open and I had to use the parking entrance downstairs to get in.

1 (Not Accurate)

8. Insights derived from the sentiment analysis

The word network plot highlights clusters around “parking,” “street,” “enforcement,” and “lot,” emphasizing common concerns related to enforcement, location, and availability. The clustering around “housing” and “affordable” indicates discussions linking parking issues with broader affordability and housing challenges in urban contexts.

The monthly sentiment distribution reveals low-star (1-2 stars) feedback peaking during summer months (June-August), likely due to heightened frustrations during peak travel and events. Conversely, high-star (4-5 stars) ratings increase in spring (April-May) and fall (September-October), possibly reflecting positive experiences tied to seasonal activities and reduced congestion.

Low Ratings (1 star): Negative sentiment is consistent throughout the week, but slightly higher on weekends (Friday and Saturday), potentially reflecting dissatisfaction with events or services experienced during leisure activities.

High Ratings (5 stars): Positive sentiment remains relatively low and stable across the week, indicating that users rarely express high praise regardless of the day.

## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_bar()`).

Low-Star Periods (1-2 stars): Early mornings (midnight to 9 AM) exhibit the highest proportion of complaints and frustrations, often related to parking issues, enforcement, and safety concerns.

High-Star Periods (4-5 stars): Late mornings to afternoons (10 AM to 5 PM) show increased positivity, likely tied to recommendations, positive experiences, or community engagement.