Sentiment Analysis: Parking Shortage

Kaiyu Zhou

2022-11-29

1. Problem:

Analyze public sentiment towards parking (shortages) in San Francisco using Reddit threads over the past year.

2. Keyword: “no parking”, subreddit: “sanfrancisco”, period = “year”

Downloading Reddit threads

# using keyword
threads_p <- find_thread_urls(keywords = 'no parking', 
                              sort_by = 'relevance', 
                              period = 'all') %>% 
  drop_na()
## parsing URLs on page 1...
## parsing URLs on page 2...
## parsing URLs on page 3...
rownames(threads_p) <- NULL

colnames(threads_p)
## [1] "date_utc"  "timestamp" "title"     "text"      "subreddit" "comments" 
## [7] "url"
head(threads_p, 3) %>% knitr::kable()
date_utc timestamp title text subreddit comments url
2023-06-16 1686922167 Working on healthcare is hard enough without having to dodge Teslas parked in the bike lane, in a no-stopping zone, DIRECTLY NEXT TO A PARKING LOT on my commute boston 368 https://www.reddit.com/r/boston/comments/14awwuv/working_on_healthcare_is_hard_enough_without/
2024-01-17 1705455732 Tobogganing ban at dozens of Toronto parks ignites anger toward city council. ‘People running the city are so out of touch with Canadians. What’s next? You’re gonna remove outdoor rinks? No more hockey’. Canada_sub 292 https://www.reddit.com/r/Canada_sub/comments/198kghr/tobogganing_ban_at_dozens_of_toronto_parks/
2023-02-24 1677230073 Paul Scholes on Instagram: And we finally have a manager who gets Man Utd, fast and furious, not posession based, no parking the bus or buses and taking no s#it from any f***er! reddevils 203 https://www.reddit.com/r/reddevils/comments/11an4wx/paul_scholes_on_instagram_and_we_finally_have_a/
# search for subreddits
subreddit_list <- RedditExtractoR::find_subreddits('no parking')
## parsing URLs on page 1...
## parsing URLs on page 2...
#subreddit_list[1:25,c('subreddit','title','subscribers')] %>% knitr::kable()

Check the number of threads found by the keyword per subreddit.

threads_p$subreddit %>% table() %>% sort(decreasing = T) %>% head(20)
## .
## mildlyinfuriating              pics         EarthPorn             funny 
##                15                 8                 4                 4 
## mildlyinteresting      MovieDetails       nottheonion          politics 
##                 4                 4                 4                 4 
##      sanfrancisco         southpark          baseball   britishproblems 
##                 4                 4                 3                 3 
##          CasualUK            Denver          fuckcars   oddlyterrifying 
##                 3                 3                 3                 3 
##      pettyrevenge    Showerthoughts            soccer        television 
##                 3                 3                 3                 3

Search by both the keyword and the subreddit.

# using both subreddit and keyword
threads_sfp <- find_thread_urls(keywords= 'no parking', 
                              subreddit = 'sanfrancisco', 
                              sort_by = 'relevance', 
                              period = 'year') %>% 
  drop_na()
## parsing URLs on page 1...
## parsing URLs on page 2...
## parsing URLs on page 3...
rownames(threads_sfp) <- NULL

head(threads_sfp, 3) %>% knitr::kable()
date_utc timestamp title text subreddit comments url
2024-06-05 1717600561 Map of upcoming events across San Francisco San Franciscos Open Data portal has a map of upcoming events (next 7 days). Its based on street closure data. It sucks on mobile, but overall it kind of a cool way to see whats happening across the city (Night Markets, Block Parties, street fairs, etc.).

Obviously its not everything happening in the city, just things that get a permit from MTA to shut down a street.

Here’s a link to the map

https://preview.redd.it/jqldyig6sr4d1.png?width=2240&format=png&auto=webp&s=d875c16f8e19fb3c456df6bb6d2f1f79db3c0968 |sanfrancisco | 3|https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ | |2024-06-24 | 1719228923|Not so Wily Coyote Crossing: Watch Out! |I’ve had a weird couple encounters with a youngish looking coyote on Diamond Heights Blvd, coming around the bend from the Clipper turn off.

A couple weeks ago, as I was driving to work, about 9:55 pm, I was driving along the above mentioned stretch of road toward Glen Park. There is open hill on the right side going up and on the left side going down, between two groups of housing. Suddenly, what seemed like a juvenile coyote came dashing across the street from the left side and almost ran in front of my car. I hit my brakes and coyote swerved, and was panic running alongside, between my car and the traffic island. I hit my brakes again and my car nearly stopped and it swerved again, finished crossing the road and ran up the hill. This was just before the intersection of Safira Street.

I know animals often have a regular route they take for hunting and scavenging. Last week on Sunday night, I was coming arounf the same curve and thinking about the coyote. Glanced to my left and there it was again, dashing from the left side of the street, but this time it saw my car and scrambled to turn around and run back the way it came. I hope it learned a valuable lesson from our last encounter and will carefully look both ways before crossing. I wouldn’t like to hit it with my car and hope no one else does, either.

Sorry, no pics. |sanfrancisco | 2|https://www.reddit.com/r/sanfrancisco/comments/1dnbbcv/not_so_wily_coyote_crossing_watch_out/ | |2024-04-27 | 1714240615|How to get a table at italian heritage parade? |Attended SFs annual italian heritage parade last year and had a blast! All the restaurants along the parade route had set tables outside for people to eat and watch the parade go by. I asked a group of people outside Tonys how they got their table, they said they had come every year for the past several years and booked really far in advanced. Does anyone know how to set up these reservations? Should I just call up the restaurants and ask if I can make a reservation even though mid october is months away? Also, does anyone have any insight on how much i can expect to be spending? |sanfrancisco | 6|https://www.reddit.com/r/sanfrancisco/comments/1cekkwl/how_to_get_a_table_at_italian_heritage_parade/ |

Save file for the sentiment analysis.

dir()
##  [1] "major3.Rmd"          "major3.Rproj"        "major3_parking.html"
##  [4] "major3_parking.Rmd"  "sfp_reddit_bert.csv" "threads_1.csv"      
##  [7] "threads_p.csv"       "threads_sf.csv"      "threads_sfp.csv"    
## [10] "threads_sfp_1.csv"   "threads_sfp_2.csv"
write.csv(threads_sfp,"./threads_sfp.csv", row.names = T)

Downloading comments and additional information

# get individual comments
threads_sfp_content <- get_thread_content(threads_sfp$url[1:4])
names(threads_sfp_content)
## [1] "threads"  "comments"
# check upvotes and downvotes
print(threads_sfp_content$threads[,c('upvotes','downvotes','up_ratio')])
##   upvotes downvotes up_ratio
## 1      22         0     0.88
## 2       1         0     0.67
## 3      10         0     0.68
## 4       0         0     0.50

The comments data frame provides information on individual comments.

head(threads_sfp_content$comments, 3) %>% knitr::kable()
url author date timestamp score upvotes downvotes golds comment comment_id
https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ nelsonhops415 2024-06-05 1717603882 8 8 0 0 Interesting.

Most of these events are on funcheapsf, dothebay, eddie’s list etc but it is super helpful for locals to check in case they don’t get the memo (parking, closures, detours, traffic etc). |1 | |https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ |tetsuo316 |2024-06-05 | 1717623213| 1| 1| 0| 0|/u/nelsonhops415 said it best for me. This is super helpful to know where I might run into snafus taking my kid to the marina. Thanks /u/treehousewest! For anyone doing the escape from alcatraz thing, have fun! |2 | |https://www.reddit.com/r/sanfrancisco/comments/1d8sg03/map_of_upcoming_events_across_san_francisco/ |ProLoser |2024-07-20 | 1721503539| 1| 1| 0| 0|I actually built this open source webpage that crawls FunCheapSF and puts their events into a database that I use to populate a filterable map https://github.com/ProLoser/funcheapmap/ unfortunately my crawler stopped working, need to fix it. EDIT: FIXED and UPDATED!

Lets you filter by event and date and plan your day around events going on in different parts of the city.

https://preview.redd.it/a5oq1x0v5qdd1.png?width=1410&format=png&auto=webp&s=3d0d8a4a4f1ae0bfb017a15f894ae74d02c5a3f7|3 |

Analyses on posting date/time

look at the number of threads by week for the last 12 months.

# create new column: date
threads_sfp %<>% 
  mutate(date = as.POSIXct(date_utc)) %>%
  filter(!is.na(date))

# number of threads by week
threads_sfp %>% 
  ggplot(aes(x = date)) +
  geom_histogram(color="black", position = 'stack', binwidth = 7*24*3600) +
  scale_x_datetime(date_labels = "%b %y",
                   breaks = seq(min(threads_sfp$date, na.rm = T), 
                                max(threads_sfp$date, na.rm = T), 
                                by = "1 week")) +
  theme_minimal()

weekends or weekdays?

# create new columns: day_of_week, is_weekend
  threads_sfp %<>%  
  mutate(day_of_week = wday(date, label = TRUE)) %>% 
  mutate(is_weekend = ifelse(day_of_week %in% c("Sat", "Sun"), "Weekend", "Weekday"))

# number of threads by time of day
threads_sfp %>% 
  ggplot(aes(x = day_of_week, fill = is_weekend)) +
  geom_bar(color = 'black') +
  scale_fill_manual(values = c("Weekday" = "gray", "Weekend" = "pink")) + 
  theme_minimal()

time of day

print(threads_sfp$timestamp[1])
## [1] 1717600561
print(threads_sfp$timestamp[1] %>% anytime(tz = anytime:::getTZ()))
## [1] "2024-06-05 11:16:01 EDT"
threads_sfp %<>%  
  mutate(time = timestamp %>% 
           anytime(tz = anytime:::getTZ()) %>% 
           str_split('-| |:') %>% 
           sapply(function(x) as.numeric(x[4])))
# number of threads by time of day
threads_sfp %>% 
  ggplot(aes(x = time)) +
  geom_histogram(bins = 24, color = 'black') +
  scale_x_continuous(breaks = seq(0, 24, by=2)) + 
  theme_minimal()

3. Tokenization and stop words

Tokenization

# Word tokenization
words <- threads_sfp %>% 
  unnest_tokens(output = word, input = text, token = "words")

words %>%
  count(word, sort = TRUE) %>%
  top_n(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Unique wordcounts")
## Selecting by n

Stop words

# load list of stop words - from the tidytext package
data("stop_words")
# view random 50 words
print(stop_words$word[sample(1:nrow(stop_words), 100)])
##   [1] "we'll"      "definitely" "sure"       "first"      "from"      
##   [6] "facts"      "think"      "or"         "m"          "other"     
##  [11] "if"         "us"         "does"       "sorry"      "where"     
##  [16] "get"        "new"        "available"  "over"       "above"     
##  [21] "longest"    "three"      "no"         "presents"   "did"       
##  [26] "asked"      "nine"       "going"      "you've"     "looking"   
##  [31] "by"         "different"  "good"       "else"       "hers"      
##  [36] "anybody"    "both"       "these"      "that"       "wherever"  
##  [41] "goes"       "yourself"   "we'd"       "ask"        "went"      
##  [46] "theres"     "have"       "yet"        "an"         "why"       
##  [51] "working"    "should"     "turned"     "an"         "became"    
##  [56] "become"     "isn't"      "etc"        "somehow"    "yourselves"
##  [61] "yet"        "itself"     "many"       "was"        "likely"    
##  [66] "are"        "we've"      "where"      "could"      "something" 
##  [71] "on"         "he"         "since"      "our"        "i"         
##  [76] "seems"      "goods"      "her"        "came"       "possible"  
##  [81] "this"       "he'd"       "obviously"  "become"     "this"      
##  [86] "they've"    "specified"  "anyhow"     "a"          "p"         
##  [91] "up"         "somebody"   "whereafter" "comes"      "ourselves" 
##  [96] "yes"        "perhaps"    "after"      "had"        "latterly"
# Regex that matches URL-type string
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

words_clean <- threads_sfp %>% 
  # drop URLs
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  # Tokenization (word tokens)
  unnest_tokens(word, text, token = "words") %>% 
  # drop stop words
  anti_join(stop_words, by = "word") %>% 
  # drop non-alphabet-only strings
  filter(str_detect(word, "[a-z]"))

# Check the number of rows after removal of the stop words. There should be fewer words now
print(
  glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)
## Before: 57564, After: 21117
words_clean %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Unique wordcounts")

4. Generate Word cloud (except keywords)

Frequency of words before and after removing the stop words using word cloud.

words %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()
words_clean %>% 
  filter(!word %in% c("no", "parking")) %>%
  count(word, sort = TRUE) %>% 
  wordcloud2()

Change colors

n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.7, 1) # vivid
v <- runif(n, 0.4, 0.7) # neither too dark or bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
words_clean %>% 
  filter(!word %in% c("no", "parking")) %>%
  count(word, sort = TRUE) %>% 
  wordcloud2(color = pal, 
             minRotation = 0, 
             maxRotation = 0, 
             ellipticity = 0.8)

5. Tri-grams

words_ngram <- threads_sfp %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  select(text) %>%
  unnest_tokens(output = paired_words,
                input = text,
                token = "ngrams",
                n = 3)
# Show ngrams with sorted values
words_ngram %>%
  count(paired_words, sort = TRUE) %>% 
  head(20) %>% 
  knitr::kable()
paired_words n
we need to 39
in the city 22
NA 20
a lot of 19
in front of 18
golden gate park 17
i don t 17
out of the 17
in san francisco 15
of the city 15
sf mental health 15
on the street 14
i went to 13
in the area 13
need to be 13
at this point 12
side of the 12
there is a 12
and it was 11
i want to 11
#separate the paired words into two columns
words_ngram_pair <- words_ngram %>%
  separate(paired_words, c("word1", "word2"), sep = " ")
## Warning: Expected 2 pieces. Additional pieces discarded in 56361 rows [1, 2, 3, 4, 5, 6,
## 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# filter rows where there are stop words under word 1 column and word 2 column
words_ngram_pair_filtered <- words_ngram_pair %>%
  # drop stop words
  filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>% 
  # drop non-alphabet-only strings
  filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))

# Filter out words that are not encoded in ASCII
# To see what's ASCII, google 'ASCII table'
library(stringi)
words_ngram_pair_filtered %<>% 
  filter(stri_enc_isascii(word1) & stri_enc_isascii(word2))

# Sort the new bi-gram (n=2) counts:
words_counts <- words_ngram_pair_filtered %>%
  count(word1, word2) %>%
  arrange(desc(n))

head(words_counts, 20) %>% 
  knitr::kable()
word1 word2 n
san francisco 72
mental health 41
pjpgauto webps 29
golden gate 23
parking lot 18
affordable housing 17
gate park 17
license plate 15
sf mental 15
north beach 14
parking spot 12
union square 11
street parking 10
dolores park 9
health services 9
parking enforcement 9
public transit 9
4624format pjpgauto 8
ocean beach 8
width 4624format 8

Visualize the words occurring in pairs.

# plot word network
words_counts %>%
  filter(n >= 8) %>%
  graph_from_data_frame() %>% # convert to graph
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = .6, edge_width = n)) +
  geom_node_point(color = "darkslategray4", size = 3) +
  geom_node_text(aes(label = name), vjust = 2, color = "red3") +
  labs(title = "Word Networks",
       x = "", y = "")
## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

6. Sentiment Analysis

Import the result

Import the sentiment analysis result data processed from the Colab to R.

# import the data
reddit_sentiment <- read_csv("sfp_reddit_bert.csv")
## New names:
## Rows: 229 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (5): title, text, subreddit, url, bert_label dbl (5): ...1, Unnamed: 0,
## timestamp, comments, bert_score date (1): date_utc
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# drop NAs
reddit_sentiment %<>% drop_na('bert_label')

Comparison with the dictionary method

Get sentiment scores using the dictionary method for comparison.

# Join thread title and text.
reddit_sentiment %<>%
  mutate(title = replace_na(title, ""),
         text = replace_na(text, ""),
         title_text = str_c(title, text, sep = ". "))

# dictionary method
reddit_sentiment_dictionary <- sentiment_by(reddit_sentiment$title_text)

reddit_sentiment$sentiment_dict <- reddit_sentiment_dictionary %>% pull(ave_sentiment)
reddit_sentiment$word_count <- reddit_sentiment_dictionary %>% pull(word_count)

Check the correlation between the sentiment values from two different methods.

reddit_sentiment %<>% mutate(bert_label_numeric = str_sub(bert_label, 1, 1) %>% as.numeric())

cor(reddit_sentiment$bert_label_numeric, reddit_sentiment$sentiment_dict)
## [1] 0.2272316

0.227 implies a low positive correlation.

ggplot(data = reddit_sentiment, aes(x = bert_label_numeric, y = sentiment_dict)) +
  geom_jitter(width = 0.1, height = 0) +
  geom_line(aes(y = 0), color = '#FFD700', lwd = 1, linetype='dashed') +
  dark_theme_grey()
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().

  • BERT: 1 star (negative) vs. 5 stars (positive)
bert_example <- reddit_sentiment %>%
  filter(bert_label_numeric %in% c(1,5)) %>%
  group_by(bert_label) %>%
  arrange(desc(bert_score)) %>%
  slice_head(n = 3) %>%
  ungroup()

# 1 star
bert_example %>% filter(bert_label_numeric == 1) %>% pull(title_text) %>% print()
## [1] "Market St. Closure. I am so sick of this of this nonsense. I see absolutely no reason to close off Market St for any reason let alone a parade or a protest. This street is one of the main thourofares to get to the bay bridge and get out of the city and it is once again closed for a parade which leaves the embarcadero as the only street to get to the bay bridge if you are north of Market. Embarcadero is a parking lot right now. The last time this happened it took me an hour and a half to go 3/4 of a mile. Free speech, peaceful assembly I DONT CARE I DONT CARE I DONT CARE this is nonsense and if I miss the start of the 9er game because of this  I will figuratively rain hell fire upon city hall.  NONSENSE I SAY!!!!!!!!!"                                                                                                                                                                                                                                
## [2] "This Derelict Vehicle!?!?. What can be done about this awful vehicle? My partner and I call it the Shitsubishi. This SUV keeps parking in the Powell street delivery pullouts illegally for days on end and often half blocking the city street. It has countless unpaid parking tickets, its registration has been expired for over 2 years, no front plates, it\031s leaking fluids and it\031s multiple drivers often are seen throwing trash directly from the vehicle right into the street. We report it to 311 as often as we can. If we were in any other city in the US this vehicle would be towed! WTF! We pay our vehicle registrations, our parking tickets and for a residential parking passes. Why is there no consequences for this kind of vehicle and behavior? Now the latest is that they spray painted their own license plate red. Why?  This makes the city look awful to all the riders on the street cars and as locals that lives on Powell we are over it."
## [3] "Rincon Hill Dog Park Police Activity?. Does anyone know what\031s going on near the Rincon Hill dog park (Saturday, 4/6)?  I received a Citizen app alert that a \034pet fell from a bridge\035.  But commenters are saying someone *threw* another\031s dog off the bridge, while others are saying it was not a dog but a *human*.  From the video, there is a large police presence.  Completely and utterly awful no matter.  Just wondering if anyone had insight?"
# 5 star
bert_example %>% filter(bert_label_numeric == 5) %>% pull(title_text) %>% print()
## [1] "Caption this photo. Colma Best Buy parking lot."                                   
## [2] "An excellent visualization of San Francisco's \"progressive crescent\" precincts. "
## [3] "hanging out in North Beach. "
  • Dictionary method: negative vs. positive
sentimentr_example <- reddit_sentiment %>%
  mutate(sentimentr_abs = abs(sentiment_dict),
         sentimentr_binary = case_when(sentiment_dict > 0 ~ 'positive',
                                       TRUE ~ 'negative')) %>%
  group_by(sentimentr_binary) %>%
  arrange(desc(sentimentr_abs)) %>%
  slice_head(n = 3) %>%
  ungroup() %>%
  arrange(sentiment_dict)

# negative
sentimentr_example %>% filter(sentimentr_binary == 'negative') %>% pull(title_text) %>% print()
## [1] "SFMTA Labor Day Parking Enforcement. "                                                                                                                                                      
## [2] "stolen motorcycle. some one with no soul or regard for life stole my baby.  My 2006 Yamaha fz6 between the panhandle and Buena Vista Park.  She was red and beautiful and I am heartbroken."
## [3] "Memorial Day SFMTA Street Parking Enforcement. "
# positive
sentimentr_example %>% filter(sentimentr_binary == 'positive') %>% pull(title_text) %>% print()
## [1] "New housing will add 160 affordable units to S.F.\031s Haight \024 but no parking spots. "                                                                                                                                    
## [2] "An excellent visualization of San Francisco's \"progressive crescent\" precincts. "                                                                                                                                           
## [3] "you can park anywhere right now, if only anything was open. i\031m glad no one\031s working thanksgiving and that people are out of town with their family but wish I could make better use of all the free parking, just me?"

Visualization

Sentiment distribution

  • Number of threads by sentiment category.
  • Negative sentiments (1-star ratings) dominate the dataset, indicating dissatisfaction regarding parking or related issues.
reddit_sentiment %>%
  ggplot(aes(x = bert_label)) +
  geom_bar(fill = "white") +
  dark_theme_gray()

  • Word counts by sentiment category.
reddit_sentiment %>%
  ggplot(aes(x = bert_label, y = comments)) +
  geom_jitter(height = 0, width = 0.05) +
  stat_summary(fun = mean, geom = "crossbar", width = 0.4, color = "red") +
  dark_theme_gray()

  • Association between a thread’s sentiment and the number of comments on the thread.
# Remove outliers
reddit_sentiment_rm_outlier <- reddit_sentiment %>%
  group_by(bert_label) %>%
  filter(
    between(
      comments,
      quantile(comments, 0.25) - 1.5 * IQR(comments),
      quantile(comments, 0.75) + 1.5 * IQR(comments)))

# Correlation analysis
cor.test(reddit_sentiment_rm_outlier$comments, reddit_sentiment_rm_outlier$bert_label_numeric)
## 
##  Pearson's product-moment correlation
## 
## data:  reddit_sentiment_rm_outlier$comments and reddit_sentiment_rm_outlier$bert_label_numeric
## t = -0.37591, df = 155, p-value = 0.7075
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1859393  0.1270587
## sample estimates:
##         cor 
## -0.03018012
# Scatterplot
reddit_sentiment_rm_outlier %>%
  ggplot(aes(x = bert_label_numeric, y = comments)) +
  geom_jitter(height = 0, width = 0.05) + 
  geom_smooth(method = 'loess', span = 0.75) +
  dark_theme_gray()
## `geom_smooth()` using formula = 'y ~ x'

Word clouds

Using word clouds to visualize words that are frequently seen in either positive or negative threads

# Stop word removal and tokenization
data("stop_words")
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

reddit_sentiment_clean <- reddit_sentiment %>%
  mutate(title_text = str_replace_all(title_text, replace_reg, "")) %>%
  # tokenize
  unnest_tokens(word, title_text, token = "words") %>%
  # remove stop words
  anti_join(stop_words, by = "word") %>%
  filter(str_detect(word, "[a-z]")) %>%
  filter(!word %in% c('flu','shot','shots')) # You need to replace this with your keyword
# negative text
reddit_sentiment_clean_negative <- reddit_sentiment_clean %>%
  filter(bert_label_numeric %in% c(1,2))
# positive text
reddit_sentiment_clean_positive <- reddit_sentiment_clean %>%
  filter(bert_label_numeric %in% c(4,5))

# Remove words that are commonly seen in both negative and positive threads
reddit_sentiment_clean_negative_unique <- reddit_sentiment_clean_negative %>%
  anti_join(reddit_sentiment_clean_positive, by = 'word')
reddit_sentiment_clean_positive_unique <- reddit_sentiment_clean_positive %>%
  anti_join(reddit_sentiment_clean_negative, by = 'word')
  • Words appearing in negative threads
# Wordcloud with a custom color palette
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

reddit_sentiment_clean_negative_unique %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color = pal,
       minRotation = -pi/6,
       maxRotation = -pi/6,
       rotateRatio = 1)
  • Words appearing in positive threads
# Wordcloud with a custom color palette
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

reddit_sentiment_clean_positive_unique %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color = pal,
       minRotation = pi/6,
       maxRotation = pi/6,
       rotateRatio = 1)

Temporal analysis

Create new columns for the temporal analysis: date, year, day_of_week, is_weekend, time

reddit_sentiment %<>%
  mutate(date = as.POSIXct(date_utc)) %>%
  filter(!is.na(date)) %>%
  mutate(year = year(date),
         month = as.Date(floor_date(date, unit = "month")),
         day_of_week = wday(date, label = TRUE),
         is_weekend = ifelse(day_of_week %in% c("Sat", "Sun"), "Weekend", "Weekday"),
         time = timestamp %>%
           anytime(tz = anytime:::getTZ()) %>%
           str_split('-| |:') %>%
           sapply(function(x) as.numeric(x[4])))
  • Sentiment by month using a stacked bar plot
# sentiment by month
reddit_sentiment %>%
  ggplot(aes(x = month, fill = bert_label)) +
  geom_bar(position = 'stack') +
  scale_x_date(date_labels = "%b %Y",
                 date_breaks = "1 month") +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

proportions.

# sentiment by month
reddit_sentiment %>%
  ggplot(aes(x = month, fill = bert_label)) +
  geom_bar(position = 'fill') +
  scale_x_date(date_labels = "%b %Y",
                 date_breaks = "1 month") +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

  • Sentiment by day of week.
# sentiment by day
reddit_sentiment %>%
  ggplot(aes(x = day_of_week, fill = bert_label)) +
  geom_bar(position = 'fill') +
  scale_fill_brewer(palette = 'PuRd', direction = -1) +
  dark_theme_grey()

  • Sentiment by time of day.
reddit_sentiment %>%
  ggplot(aes(x = time, fill = bert_label)) +
  geom_histogram(bins = 24, position = 'fill', color = 'black', lwd = 0.2) +
  scale_x_continuous(breaks = seq(0, 24, by=1)) +
  scale_fill_manual(values = c('#bc5090', '#bc5090', '#ff6361', '#ffa600', '#ffa600')) +
  dark_theme_grey()
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_bar()`).

ggplot(reddit_sentiment, aes(x = sentiment_dict)) +
    geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
    labs(title = "Distribution of Sentiment Scores", x = "Sentiment Score", y = "Count") +
    theme_minimal()

ggplot(reddit_sentiment, aes(x = sentiment_dict, y = comments)) +
    geom_point(alpha = 0.6, color = "darkred") +
    geom_smooth(method = "loess", color = "blue") +
    labs(title = "Sentiment Score vs. Engagement", x = "Sentiment Score", y = "Comment Count") +
    theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

7. Sample texts and their sentiment scores

library(stringr)

set.seed(123)

sample_texts <- reddit_sentiment %>%
    sample_n(10) %>%
    mutate(short_text = str_sub(text, 1, 1000)) %>%  # Display the first 100 characters
    select(short_text, bert_label_numeric)

# Display the table
knitr::kable(sample_texts, caption = "Sample Texts with Sentiment Scores")
Sample Texts with Sentiment Scores
short_text bert_label_numeric
The City wants to install meters “pay or permit” devices, to be specific) in almost all parts of Cow Hollow/Marina. A meeting is bring held Monday, the Monday after Thanksgiving. I can see it benefitting certain people (namely those workers who are used to moving their car every few hours), but I think it is yet another money grab.

I believe the rates will be subject to peak demand, as well. I had no idea it is going to encompass such a huge majority of the neighborhood!

https://form.jotform.com/243288802092156?fbclid=IwY2xjawGzLXpleHRuA2FlbQIxMQABHR4SBTdWy08097YNe5D3mgOnm6c1age24BiKn5A6EJKkrhCNiNeqLHG_1w_aem_cQ25-3qYhjqIJ6-6Hfv04w | 2| |i got 2 tickets for not having a front license plate (cause I had gotten a new bumper and was just lazy to install the plate right away) within a month so I know they can’t be getting lucky with parking fairies.

edit: i don’t own a tesla lol | 1| |Just as the title says, Ive parked on the sides before and always use the PayByPhone app. Theres always the no stopping signs on all the parking spots. But I was always confused if youre actually allowed to park there because I wondered why would the hours you need to pay for parking overlap with the no stopping sign? Ive seen plenty of cars parked there and never seen them get a ticket. Anyone have a concrete answer for this? And do you think I can appeal the ticket?

I parked near nicks gyro, at 5pm and paid for an hour. Then got my ticket around 530pm. | 1| |Hi, I am a new driver so I am not too confident about parking with these signs. Today is Thursday, would it be fine if I parked here @5pm today and get the car before 10am tomorrow (Friday)? I do not want to get a ticket.

Any insights would be appreciated! | 2| |My husband and I are taking a very long overdue vacation and booked a trip to SF in June. Were staying near Dolores Park and plan to do a lot of walking and sight seeing during our stay. I would love some feedback and recommendations on the itinerary I have planned (see below). Also, we plan on packing a lot of casual athletic wear (leggings, hoodies, light jackets) so recommendations on what clothes to pack is appreciated as well.

Theres so much we want to do and see that isnt listed on our itinerary but Id love to hear from locals and others who have visited on whats an absolute cant miss in SF. We will be walking a lot and utilizing Uber/Lfyt. So no worries on parking or renting a car.

Wednesday - [ ] Fly in - [ ] Settle into airbnb - [ ] District Tea (Picnic Pack) - [ ] Sunset Picnic at Dolores Park - [ ] Bed early!

Thursday - [ ] Tour: Muir Woods, Napa/Somoma, Golden Gate Bridge (lunch during tour) - [ ] Dinner and exploring on Pier 39

Friday - [ ] | 4| |Get your kids. There’s a bunch of teenage guys doing donuts on scooters in Washington Square Park.

Edit 1: Everyone here claims to care about other people. We claim to be progressive. But the responses to this post are the opposite. No one seems to care about their neighbors in North Beach.

Edit 2: I posted because I’m hoping the parents see as there is a large Reddit population in SF. I did call the police. I’m getting a lot of hate for caring about my park. I made the mistaken assumption that people were kind and cared about their neighbors and city. | 1| | | 4| | | 2| |Will be coming for four days in March and have a rental car. Prices for hotels with no parking in DT area compared to Oakland or further south with parking is about $750+ vs $350+. Is it a hassle to commute into the city each morning and finding parking? Worth the extra price? | 3| |I went to Stonestown yesterday and Round 1 was open (10am-2am). It was a lively with a bunch of claw machines, DDR, arcade games. There was also food and a line to play billiards, bowling, and karaoke. I would highly recommend going if you have no plans today!

Please note that the claw machine games seem to be $2.25 (9 credits), $2.75 (11 credits), and $3.50 (14 credits) a play. Each credit costs $0.25 with a $10 minimum on the playing card and you get bonus credits if you purchase higher increments.

Quick edit: Side note, the main entrance was not open and I had to use the parking entrance downstairs to get in. I went through Target and took the stairs down. | 1|

Credibility of the sentiment analysis outcomes

“The sentiment analysis outcomes are mostly credible, as the majority of scores align with the general tone and content of the sampled texts. However, there are specific cases where the scores could be refined for greater accuracy.”

  1. but I think it is yet another money grab.
  • 2 (Accurate)
  1. i got 2 tickets for not having a front license plate (cause I had gotten a new bumper and was just lazy to install the plate right away) within a month so I know they can’t be getting lucky with parking fairies.
  • 1 (Accurate)
  1. But I was always confused if you’re actually allowed to park there because I wondered why would the hours you need to pay for parking overlap with the no stopping sign? I’ve seen plenty of cars parked there and never seen them get a ticket.
  • 1 (not very accurate)
  1. I do not want to get a ticket. Any insights would be appreciated!
  • 2 (not very accurate)
  1. I would love some feedback and recommendations on the itinerary I have planned
  • 4 (Accurate)
  1. We claim to be progressive. But the responses to this post are the opposite. No one seems to care about their neighbors in North Beach.
  • 1 (Accurate)
  1. Prices for hotels with no parking in DT area compared to Oakland or further south with parking is about $750+ vs 350+. Is it a hassle to commute into the city each morning and finding parking? Worth the extra price?
  • 3 (Accurate)
  1. I would highly recommend going if you have no plans today! Quick edit: Side note, the main entrance was not open and I had to use the parking entrance downstairs to get in.
  • 1 (Not Accurate)

8. Insights derived from the sentiment analysis

  • The word network plot highlights clusters around “parking,” “street,” “enforcement,” and “lot,” emphasizing common concerns related to enforcement, location, and availability. The clustering around “housing” and “affordable” indicates discussions linking parking issues with broader affordability and housing challenges in urban contexts.

The monthly sentiment distribution reveals low-star (1-2 stars) feedback peaking during summer months (June-August), likely due to heightened frustrations during peak travel and events. Conversely, high-star (4-5 stars) ratings increase in spring (April-May) and fall (September-October), possibly reflecting positive experiences tied to seasonal activities and reduced congestion.

Low Ratings (1 star): Negative sentiment is consistent throughout the week, but slightly higher on weekends (Friday and Saturday), potentially reflecting dissatisfaction with events or services experienced during leisure activities.

High Ratings (5 stars): Positive sentiment remains relatively low and stable across the week, indicating that users rarely express high praise regardless of the day.

## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_bar()`).

Low-Star Periods (1-2 stars): Early mornings (midnight to 9 AM) exhibit the highest proportion of complaints and frustrations, often related to parking issues, enforcement, and safety concerns.

High-Star Periods (4-5 stars): Late mornings to afternoons (10 AM to 5 PM) show increased positivity, likely tied to recommendations, positive experiences, or community engagement.