<Analyzing which words appear most often in a review of <Little Forest>, a movie I viewed positively>
install.packages("readr", repos = "https://cran.rstudio.com/")
## package 'readr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
install.packages("tm", repos = "https://cran.rstudio.com/")
## package 'tm' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
library(readr)
## Warning: package 'readr' was built under R version 4.4.3
library(tm)
## Warning: package 'tm' was built under R version 4.4.3
## Loading required package: NLP
Text to analyze
review <- c("I went to heal, but ended up wanting to quit my job.
Living without any breathing room, I couldn’t even take care of my own health,
let alone those around me. Watching the movie made me reflect: Is this really a life
for myself? I think it was a moment to look back. I hope all the weary youth get a chance
to see it. After watching, I felt a bit of courage that maybe it’s okay to take a rest
for a while.")
Convert to a data frame
df <- data.frame(sentiment = "positive", text = review, stringsAsFactors = FALSE)
Text preprocessing
docs <- VCorpus(VectorSource(df$text))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)
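The frequency table in this report still lists couldn’t with its apostrophe, even though punctuation removal and stop-word removal both ran. One plausible cause is that the review text uses typographic apostrophes (U+2019), which the stop-word list does not contain and which punctuation removal may not match in every locale. A minimal base R sketch of normalizing such quotes first, where `normalize_quotes` and `sample_text` are hypothetical names introduced only for illustration:

```r
# Hypothetical helper (not part of tm): map typographic quotes to ASCII quotes.
normalize_quotes <- function(x) {
  x <- gsub("\u2018", "'", x, fixed = TRUE)   # left single quote
  x <- gsub("\u2019", "'", x, fixed = TRUE)   # right single quote / apostrophe
  x <- gsub("\u201C", "\"", x, fixed = TRUE)  # left double quote
  x <- gsub("\u201D", "\"", x, fixed = TRUE)  # right double quote
  x
}

# Illustrative sentence, not the original review.
sample_text <- "I couldn\u2019t rest, maybe it\u2019s okay."
normalized <- normalize_quotes(sample_text)
print(normalized)
```

In the tm pipeline, a helper like this could be applied with `tm_map(docs, content_transformer(normalize_quotes))` before `removePunctuation`. Doing so would change the resulting counts, so the output shown in this report reflects the pipeline without it.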
Create and inspect the document-term matrix
dtm <- DocumentTermMatrix(docs)
dtm_mat <- as.matrix(dtm)
Compare word frequencies
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
word_freq <- sort(colSums(dtm_mat), decreasing = TRUE)
print(word_freq)
##      take  watching     alone    around      back       bit breathing      care
##         2         2         1         1         1         1         1         1
##    chance  couldn’t   courage     ended      even      felt       get      heal
##         1         1         1         1         1         1         1         1
##    health      hope       job       let      life    living      look      made
##         1         1         1         1         1         1         1         1
##     maybe    moment     movie      okay      quit    really   reflect      rest
##         1         1         1         1         1         1         1         1
##      room       see     think   wanting     weary      went   without     youth
##         1         1         1         1         1         1         1         1
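As a cross-check on `colSums` over the document-term matrix, the same kind of count can be sketched in base R. This is an approximation under stated assumptions, not a reimplementation of tm: `demo_text` and `demo_stop_words` are made-up illustration values, and the length filter mirrors the `DocumentTermMatrix` default of dropping terms shorter than three characters.

```r
# Illustrative input, not the original review.
demo_text <- "Take a rest and take care after watching"

# Lowercase, then split on anything that is not a letter.
tokens <- strsplit(tolower(demo_text), "[^a-z]+")[[1]]

# Drop tokens shorter than 3 characters (DocumentTermMatrix default behaviour).
tokens <- tokens[nchar(tokens) >= 3]

# Tiny hypothetical stop-word list; stopwords("english") is much longer.
demo_stop_words <- c("and", "after")
tokens <- tokens[!tokens %in% demo_stop_words]

# Count each token and sort from most to least frequent.
demo_freq <- sort(table(tokens), decreasing = TRUE)
print(demo_freq)
```

A table built this way makes it easy to confirm by hand why a given word has the count it does.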
word_freq_sorted <- sort(word_freq, decreasing = TRUE)
top_n <- 10
head(word_freq_sorted, top_n)
##      take  watching     alone    around      back       bit breathing      care
##         2         2         1         1         1         1         1         1
##    chance  couldn’t
##         1         1
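Because every word other than "take" and "watching" appears exactly once, the cut at ten words is arbitrary: the tied words appear in alphabetical order, which is probably just the column order of the matrix carried through the sort. A small sketch of an alternative that keeps every word at or above a count threshold, where `demo_counts` and `min_count` are hypothetical names and the values are copied from the output above:

```r
# A few counts copied from the printed output, for illustration only.
demo_counts <- c(take = 2, watching = 2, alone = 1, around = 1, back = 1)

# Hypothetical threshold: keep words that occur at least this many times.
min_count <- 2

# Logical indexing keeps names, so the result is still a named vector.
frequent <- demo_counts[demo_counts >= min_count]
print(frequent)
```

With the real data, the equivalent expression would be `word_freq_sorted[word_freq_sorted >= min_count]`.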
Visualize as a word cloud
install.packages("wordcloud", repos = "https://cran.rstudio.com/")
## package 'wordcloud' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Public\Documents\ESTsoft\CreatorTemp\RtmpYR4ckc\downloaded_packages
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 4.4.3
## Loading required package: RColorBrewer
set.seed(123)
wordcloud(words = names(word_freq_sorted),
          freq = word_freq_sorted,
          min.freq = 1,
          scale = c(5, 0.3),
          rot.per = 0.25,
          colors = colorRampPalette(brewer.pal(8, "Dark2"))(length(word_freq_sorted)),
          random.color = FALSE,
          use.r.layout = FALSE)
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : around could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : courage could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : moment could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : living could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : wanting could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : okay could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : reflect could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = names(word_freq_sorted), freq = word_freq_sorted,
## : health could not be fit on page. It will not be plotted.
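The "could not be fit on page" warnings mean several words were dropped from the plot. A likely reason is the `scale = c(5, 0.3)` setting combined with the flat frequency distribution. The sketch below assumes that wordcloud sizes words linearly between `scale[2]` and `scale[1]` according to frequency relative to the maximum. That mapping is my understanding of the package, not something confirmed in this report, and `word_size` is a hypothetical helper.

```r
# Hypothetical helper: assumed linear mapping from frequency to text size.
word_size <- function(freq, scale) {
  (scale[1] - scale[2]) * (freq / max(freq)) + scale[2]
}

# A few counts copied from the frequency table, for illustration only.
demo_counts <- c(take = 2, watching = 2, alone = 1)

sizes_original <- word_size(demo_counts, c(5, 0.3))  # setting used in this report
sizes_smaller  <- word_size(demo_counts, c(3, 0.3))  # a smaller, hypothetical setting

print(sizes_original)
print(sizes_smaller)
```

Under this assumption, a word that occurs once is still drawn at more than half the maximum size, and there are 38 such words. Lowering the first element of `scale` or enlarging the plotting device may let all of them fit.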
The analysis shows that “watching” and “take” were the most frequent words, each appearing twice.