Data Source: PTT CVS
We use a web crawler to fetch data from public websites. In this article, the data source is the 1,000 most recent articles (posts) on the PTT CVS (convenience store) board.
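As a rough illustration of the crawling step, the sketch below pulls post links out of the HTML of a board index page using base R only. The helper name `extract_post_urls`, the sample HTML, and the assumed link pattern (`/bbs/CVS/M.<digits>.A.<hex>.html`) are illustrative assumptions, not the crawler actually used for this report.

```r
# Hypothetical helper: extract post URLs from the HTML of a board index page.
# The link pattern is an assumption about how PTT post paths look.
extract_post_urls <- function(html, board = "CVS") {
  html <- paste(html, collapse = "\n")  # accept readLines() output as well
  pattern <- sprintf('href="(/bbs/%s/M\\.[0-9]+\\.A\\.[0-9A-F]+\\.html)"', board)
  hits <- regmatches(html, gregexpr(pattern, html))[[1]]
  if (length(hits) == 0) return(character(0))
  paths <- sub(pattern, "\\1", hits)    # keep only the captured path
  paste0("https://www.ptt.cc", unique(paths))
}

# Made-up sample standing in for a downloaded index page
sample_html <- c(
  '<div class="title"><a href="/bbs/CVS/M.1468283691.A.5C1.html">post one</a></div>',
  '<div class="title"><a href="/bbs/CVS/M.1468283999.A.0AB.html">post two</a></div>',
  '<a href="/bbs/CVS/index1234.html">previous page</a>'
)

post_urls <- extract_post_urls(sample_html)
print(post_urls)
```

In a real crawl, each returned URL would then be downloaded and parsed into the fields shown in the table below (author, title, post time, main text).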
## Source: local data table [1,000 x 8]
##
## # tbl_dt [1,000 x 8]
## board author author_ip
## <chr> <chr> <chr>
## 1 CVS RainIced (我好想念快速的宿網) <NA>
## 2 CVS BlueANSI (藍色) <NA>
## 3 CVS jan777 (jan) 49.217.17.246
## 4 CVS hank7352288 (=彩虹小桶) <NA>
## 5 CVS baiqire (少女小涵) <NA>
## 6 CVS thouloveme (赫赫) 111.240.177.30
## 7 CVS AngryYouth (0-0) <NA>
## 8 CVS tengobo (潶痞) 110.26.224.26
## 9 CVS edina (席那) 123.51.219.125
## 10 CVS Jiapie (小星) 219.87.162.162
## # ... with 990 more rows, and 5 more variables: title <chr>,
## # post_time <time>, post_url <chr>, post_id <chr>, post_text <chr>
Variables:
## [1] "board" "author" "author_ip" "title" "post_time" "post_url"
## [7] "post_id" "post_text"
The period of the posts:
## [1] "2016-04-14 01:51:01 CST" "2016-07-12 08:34:51 CST"
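A minimal sketch of how such a period can be computed, assuming the post times are stored as `POSIXct` values. The vector `post_time` here is toy data: it reuses the two endpoints printed above plus one made-up timestamp in between.

```r
# Toy post times; only the first and last values come from the report.
post_time <- as.POSIXct(
  c("2016-04-14 01:51:01", "2016-05-30 12:00:00", "2016-07-12 08:34:51"),
  tz = "Asia/Taipei"
)

period <- range(post_time)  # earliest and latest post
span_days <- as.numeric(difftime(period[2], period[1], units = "days"))

print(format(period, tz = "Asia/Taipei", usetz = TRUE))
print(span_days)
```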
We’ll focus on the main text of the posts for analysis.
Article Titles
We’re going to dig into how PTT users (known as 鄉民, literally "villagers") discuss convenience stores and what they talk about.
First, take a glimpse at the titles:
## [1] "[問題] 信用卡繳費的存根聯"
## [2] "[商品] [全家]西瓜聖代"
## [3] "[問題] 全家咖啡豆有換過嗎"
## [4] "[問題] 今年7/11沒活動了?"
## [5] "[情報]小七每日一商品,HAPPY GO 100點天天換!"
## [6] "[商品] 7-11 七七乳加巧克力(鳳梨口味)"
Article Categories
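There are mainly 16 types of articles, taken from the bracketed tag at the start of each title (for example 問題 "question", 商品 "product", 情報 "information").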
## [1] "問題" "商品" "情報" "討論" "閒聊" "新聞" "推薦" "公告" "創作" "問卷"
## [11] "贈送" "抱怨" "感想" "爆卦" "食記" "廣宣"
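One way to derive the category from a title is to capture the first bracketed tag with a regular expression. The sketch below uses the six titles shown earlier; the function name `extract_category` is hypothetical, and the report's own extraction code may differ.

```r
titles <- c(
  "[問題] 信用卡繳費的存根聯",
  "[商品] [全家]西瓜聖代",
  "[問題] 全家咖啡豆有換過嗎",
  "[問題] 今年7/11沒活動了?",
  "[情報]小七每日一商品,HAPPY GO 100點天天換!",
  "[商品] 7-11 七七乳加巧克力(鳳梨口味)"
)

# Hypothetical helper: return the first [tag] of each title, or NA if absent.
extract_category <- function(title) {
  m <- regmatches(title, regexec("^[[:space:]]*\\[([^]]+)\\]", title))
  unname(vapply(
    m,
    function(x) if (length(x) >= 2) x[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  ))
}

category <- extract_category(titles)
category_counts <- sort(table(category), decreasing = TRUE)
print(category_counts)
```

Only the first tag is kept, so a title such as "[商品] [全家]西瓜聖代" is counted under 商品 rather than under the brand tag.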
Volume of each category.

View by FamilyMart and 7-11.
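A simple way to split posts by chain is keyword matching on the title or text. The sketch below is an assumption about how that could be done: the keyword lists (全家 for FamilyMart; 7-11, 小七, 統一超商 for 7-11) and the function name `detect_brand` are illustrative choices, not the rules used for the chart.

```r
titles <- c(
  "[問題] 信用卡繳費的存根聯",
  "[商品] [全家]西瓜聖代",
  "[問題] 全家咖啡豆有換過嗎",
  "[問題] 今年7/11沒活動了?",
  "[情報]小七每日一商品,HAPPY GO 100點天天換!",
  "[商品] 7-11 七七乳加巧克力(鳳梨口味)"
)

# Hypothetical helper: label each text by the chain it mentions.
detect_brand <- function(text) {
  is_family <- grepl("全家", text, fixed = TRUE)
  is_seven  <- grepl("7-11|小七|統一超商", text)
  ifelse(is_family & is_seven, "both",
    ifelse(is_family, "FamilyMart",
      ifelse(is_seven, "7-11", "other")))
}

brand <- detect_brand(titles)
print(table(brand))
```

Note that "7/11" in the fourth title is left as "other" here, since it could refer either to the date or to the store; a real rule set would need to decide how to treat such cases.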

Main Text of the Articles
See which keywords are discussed most frequently in the articles.
## <<DocumentTermMatrix (documents: 1000, terms: 886)>>
## Non-/sparse entries: 30225/855775
## Sparsity : 97%
## Maximal term length: 9
## Weighting : term frequency (tf)
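To make the summary above concrete, the sketch below builds a tiny document-term matrix in base R and computes its sparsity, then checks the reported figure. It assumes the posts have already been segmented into words; the toy `docs` list is made up, and the report's own matrix has the print format of the tm package's `DocumentTermMatrix`.

```r
# Toy, pre-segmented documents (assumption: word segmentation already done)
docs <- list(
  c("咖啡", "優惠", "咖啡"),
  c("芒果", "冰棒"),
  c("咖啡", "芒果")
)

vocab <- sort(unique(unlist(docs)))

# Rows are documents, columns are terms, cells are term frequencies (tf)
dtm <- t(vapply(
  docs,
  function(tokens) as.integer(table(factor(tokens, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

sparsity  <- mean(dtm == 0)                       # share of zero cells
top_terms <- sort(colSums(dtm), decreasing = TRUE)

# Sparsity implied by the reported non-/sparse entries: 30225/855775
reported_sparsity <- 855775 / (30225 + 855775)
print(round(100 * reported_sparsity))
```

The reported counts are consistent with the matrix size: 30,225 + 855,775 = 886,000 = 1,000 documents x 886 terms, of which roughly 97% of cells are zero.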

Volume does not necessarily mean positive opinions.
Topic Models
## Warning in eattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

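Topics
We list some of those topics by their keywords (English glosses in parentheses):
咖啡 (coffee), 原價 (original price), 葡萄 (grape), 客人 (customer), 店長 (store manager), 使用 (use), 買一送一 (buy one, get one free), 芒果 (mango), 剛剛 (just now), 男童 (boy), 優惠 (discount), 單獨 (separately), 限定 (limited edition), 可是 (but), 網友 (netizen), 禮券 (gift voucher), 鳳梨 (pineapple), 冰棒 (popsicle), 同事 (coworker), 新聞 (news), 寄杯 (prepaid cups stored for later pickup), line, 汽水 (soda), 每次 (every time), 影片 (video), 期限 (expiry date), 限時 (limited time), 蜂蜜 (honey), 突然 (suddenly), 小孩 (child)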
Cloud Report
