<An analysis of which words appear most often in my positive review of the movie <Little Forest>>

install.packages("readr", repos = "https://cran.rstudio.com/")
## package 'readr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
install.packages("tm", repos = "https://cran.rstudio.com/")
## package 'tm' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
library(readr)
## Warning: package 'readr' was built under R version 4.4.3
library(tm)
## Warning: package 'tm' was built under R version 4.4.3
## Loading required package: NLP

Text to analyze

review <- c("I went to heal, but ended up wanting to quit my job.
Living without any breathing room, I couldn’t even take care of my own health,
let alone those around me. Watching the movie made me reflect: Is this really a life 
for myself? I think it was a moment to look back. I hope all the weary youth get a chance
to see it. After watching, I felt a bit of courage that maybe it’s okay to take a rest 
for a while.")

Convert to a data frame

df <- data.frame(sentiment = "positive", text = review, stringsAsFactors = FALSE)
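As a quick check, the result should be a single-row data frame with the two character columns sentiment and text (optional; output not shown here):

str(df)    # 'data.frame': 1 obs. of 2 variables
nrow(df)   # 1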

Text preprocessing

docs <- VCorpus(VectorSource(df$text))                    # build a corpus from the text column
docs <- tm_map(docs, content_transformer(tolower))        # convert everything to lower case
docs <- tm_map(docs, removePunctuation)                   # strip punctuation (ASCII by default)
docs <- tm_map(docs, removeWords, stopwords("english"))   # drop common English stopwords
docs <- tm_map(docs, stripWhitespace)                     # collapse repeated whitespace
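Note that removePunctuation strips only ASCII punctuation by default, which is why the curly apostrophe in "couldn’t" survives into the term list below. An optional extra step could handle it as well; a minimal sketch, not applied in this run (the ucp argument and the substitution pattern are assumptions to verify against the installed tm version):

# Replace curly apostrophes with plain ones before stripping punctuation ...
docs_alt <- tm_map(docs, content_transformer(function(x) gsub("\u2019", "'", x)))
# ... or let removePunctuation match Unicode punctuation classes directly
docs_alt <- tm_map(docs_alt, removePunctuation, ucp = TRUE)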

Create and check the document-term matrix

dtm <- DocumentTermMatrix(docs)   # rows = documents, columns = terms
dtm_mat <- as.matrix(dtm)         # dense matrix of term counts
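Before flattening to a dense matrix, tm's own helpers give a quick overview of the DTM (output omitted here):

inspect(dtm)            # number of documents/terms, sparsity, and sample counts
findFreqTerms(dtm, 2)   # terms occurring at least twice; per the table below, "take" and "watching"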

Word frequency comparison

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
word_freq <- sort(colSums(dtm_mat), decreasing = TRUE)
print(word_freq)
##      take  watching     alone    around      back       bit breathing      care 
##         2         2         1         1         1         1         1         1 
##    chance couldn’t   courage     ended      even      felt       get      heal 
##         1         1         1         1         1         1         1         1 
##    health      hope       job       let      life    living      look      made 
##         1         1         1         1         1         1         1         1 
##     maybe    moment     movie      okay      quit    really   reflect      rest 
##         1         1         1         1         1         1         1         1 
##      room       see     think   wanting     weary      went   without     youth 
##         1         1         1         1         1         1         1         1
word_freq_sorted <- sort(word_freq, decreasing = TRUE)  # word_freq is already sorted; re-sorting is redundant but harmless
top_n <- 10
head(word_freq_sorted, top_n)
##      take  watching     alone    around      back       bit breathing      care 
##         2         2         1         1         1         1         1         1 
##    chance couldn’t 
##         1         1
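The same counts can also be drawn as a simple bar chart; a minimal base-R sketch, not part of the original output:

top_terms <- head(word_freq_sorted, top_n)
par(mar = c(4, 6, 2, 1))            # widen the left margin for the term labels
barplot(rev(top_terms),             # reverse so the most frequent term sits on top
        horiz = TRUE, las = 1,
        col = "steelblue",
        xlab = "Frequency",
        main = "Top terms in the review")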

Create a word cloud

install.packages("wordcloud", repos = "https://cran.rstudio.com/")
## package 'wordcloud' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 4.4.3
## Loading required package: RColorBrewer
set.seed(123)                          # reproducible word placement
wordcloud(words = names(word_freq_sorted),
          freq = word_freq_sorted,
          min.freq = 1,                # plot every word, even those appearing only once
          scale = c(5, 0.3),           # font-size range from the most to the least frequent word
          rot.per = 0.25,              # proportion of words rotated 90 degrees
          colors = colorRampPalette(brewer.pal(8, "Dark2"))(length(word_freq_sorted)),
          random.color = FALSE,        # assign colors by frequency rather than at random
          use.r.layout = FALSE)        # use the C++ collision detection (the default)
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : around could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : courage could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : moment could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : living could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : wanting could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : okay could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : reflect could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : health could not be fit on page. It will not be plotted.
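The "could not be fit on page" warnings mean the largest font size implied by scale is too big for the plotting device, so those words are silently dropped. One common fix is to shrink the size range; a minimal sketch with illustrative, untuned values:

set.seed(123)
wordcloud(words = names(word_freq_sorted),
          freq = word_freq_sorted,
          min.freq = 1,
          scale = c(3, 0.3),          # smaller maximum size so every word fits
          rot.per = 0.25,
          colors = brewer.pal(8, "Dark2"),
          random.color = FALSE)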

The analysis shows that “watching” and “take” were the words that appeared most often in the review (each occurring twice).