Data Source: PTT CVS

We use a web crawler to fetch data from public websites. In this article, the data source is the 1,000 most recent articles (posts) on the PTT CVS board.

## Source: local data table [1,000 x 8]
## 
## # tbl_dt [1,000 x 8]
##    board                        author      author_ip
##    <chr>                         <chr>          <chr>
## 1    CVS RainIced (我好想念快速的宿網)           <NA>
## 2    CVS               BlueANSI (藍色)           <NA>
## 3    CVS                  jan777 (jan)  49.217.17.246
## 4    CVS       hank7352288 (=彩虹小桶)           <NA>
## 5    CVS            baiqire (少女小涵)           <NA>
## 6    CVS             thouloveme (赫赫) 111.240.177.30
## 7    CVS              AngryYouth (0-0)           <NA>
## 8    CVS                tengobo (潶痞)  110.26.224.26
## 9    CVS                  edina (席那) 123.51.219.125
## 10   CVS                 Jiapie (小星) 219.87.162.162
## # ... with 990 more rows, and 5 more variables: title <chr>,
## #   post_time <time>, post_url <chr>, post_id <chr>, post_text <chr>

Variables:

## [1] "board"     "author"    "author_ip" "title"     "post_time" "post_url" 
## [7] "post_id"   "post_text"

The period of the posts:

## [1] "2016-04-14 01:51:01 CST" "2016-07-12 08:34:51 CST"
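The period above can be obtained by taking the range of the `post_time` column. The following is a minimal base R sketch, not the original code: the vector `post_time` and the middle timestamp are made up for illustration, and the `Asia/Taipei` time zone is an assumption based on the `CST` label in the output.

```r
# Hypothetical sample of post timestamps; only the first and last values
# come from the output above, the middle one is invented for illustration.
post_time <- as.POSIXct(
  c("2016-07-12 08:34:51", "2016-05-20 12:00:00", "2016-04-14 01:51:01"),
  tz = "Asia/Taipei"  # assumed time zone
)

# range() returns the earliest and latest timestamps
period <- range(post_time)
print(period)

# Length of the covered period in days
span_days <- as.numeric(difftime(period[2], period[1], units = "days"))
print(span_days)
```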

We’ll focus on the main text of the posts for analysis.

Article Titles

We’re going to dig into how people on the board (or 鄉民, "villagers," as PTT users are nicknamed) discuss things and what they are talking about.

First take a glimpse:

## [1] "[問題] 信用卡繳費的存根聯"                   
## [2] "[商品] [全家]西瓜聖代"                     
## [3] "[問題] 全家咖啡豆有換過嗎"                   
## [4] "[問題] 今年7/11沒活動了?"                   
## [5] "[情報]小七每日一商品,HAPPY GO 100點天天換!"
## [6] "[商品] 7-11 七七乳加巧克力(鳳梨口味)"

Article Categories

There are mainly 16 types of articles.

##  [1] "問題" "商品" "情報" "討論" "閒聊" "新聞" "推薦" "公告" "創作" "問卷"
## [11] "贈送" "抱怨" "感想" "爆卦" "食記" "廣宣"
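Each title starts with a category tag in square brackets, so the categories can be pulled out with a regular expression and then counted. This is a minimal base R sketch under that assumption, using the six sample titles shown earlier; the variable names are illustrative and this is not necessarily how the original analysis did it.

```r
# Sample titles copied from the glimpse above
titles <- c(
  "[問題] 信用卡繳費的存根聯",
  "[商品] [全家]西瓜聖代",
  "[問題] 全家咖啡豆有換過嗎",
  "[問題] 今年7/11沒活動了?",
  "[情報]小七每日一商品,HAPPY GO 100點天天換!",
  "[商品] 7-11 七七乳加巧克力(鳳梨口味)"
)

# A title has a category only if it begins with "[...]"
has_tag <- grepl("^\\[[^]]+\\]", titles)

# Keep the text inside the first pair of brackets; NA when there is no tag
categories <- ifelse(has_tag, sub("^\\[([^]]+)\\].*$", "\\1", titles), NA)

# Number of posts per category, most frequent first
category_counts <- sort(table(categories), decreasing = TRUE)
print(category_counts)
```

Only the leading tag is used, so a second bracketed tag such as `[全家]` in the second title does not affect the category.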

Volume of each category.

View by FamilyMart and 7-11.
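One simple way to split posts by chain is to look for brand keywords in the title. The sketch below is an assumption about how such a split could be done, not the original method: the keyword lists (`全家` for FamilyMart; `7-11` and `小七` for 7-11) are illustrative and certainly incomplete.

```r
# Sample titles copied from the glimpse above
titles <- c(
  "[問題] 信用卡繳費的存根聯",
  "[商品] [全家]西瓜聖代",
  "[問題] 全家咖啡豆有換過嗎",
  "[問題] 今年7/11沒活動了?",
  "[情報]小七每日一商品,HAPPY GO 100點天天換!",
  "[商品] 7-11 七七乳加巧克力(鳳梨口味)"
)

# Assumed keyword patterns for each chain
is_familymart <- grepl("全家", titles, fixed = TRUE)
is_seven      <- grepl("7-11|小七", titles)

# FamilyMart is checked first, so a title naming both chains
# would be labelled "FamilyMart" in this simplified version
brand <- ifelse(is_familymart, "FamilyMart",
                ifelse(is_seven, "7-11", "other"))

brand_counts <- table(brand)
print(brand_counts)
```

Keyword matching is crude: the fourth title contains "7/11", which may refer to the date July 11 or to the chain, and this sketch leaves it in "other".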

Main Text of the Articles

See which keywords are frequently discussed in the articles.

## <<DocumentTermMatrix (documents: 1000, terms: 886)>>
## Non-/sparse entries: 30225/855775
## Sparsity           : 97%
## Maximal term length: 9
## Weighting          : term frequency (tf)
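In the summary above, sparsity is the share of zero cells: 855,775 of the 1000 × 886 = 886,000 cells are empty, which rounds to 97%. To make the structure concrete, here is a small base R sketch of a term-frequency document-term matrix. It assumes the texts have already been segmented into words (Chinese text needs a word segmenter first), and the three toy documents are invented for illustration.

```r
# Toy documents, already tokenized (hypothetical data)
docs <- list(
  c("咖啡", "優惠", "咖啡"),
  c("芒果", "冰棒"),
  c("咖啡", "禮券")
)

# Vocabulary in order of first appearance
terms <- unique(unlist(docs))

# Count how often each term occurs in each document
counts <- vapply(
  docs,
  function(tokens) tabulate(match(tokens, terms), nbins = length(terms)),
  integer(length(terms))
)

# Rows are documents, columns are terms
dtm <- t(counts)
colnames(dtm) <- terms
print(dtm)

# Non-sparse entries and sparsity, as in the summary above
non_sparse <- sum(dtm != 0)
sparsity   <- 1 - non_sparse / length(dtm)

# Overall term frequencies, most frequent first
term_freq <- sort(colSums(dtm), decreasing = TRUE)
print(term_freq)
```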

Volume does not necessarily mean positive opinions.

Topic Models

## Warning in eattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

Topics

Topic 9 Topic 16 Topic 20 Topic 21 Topic 23
咖啡 原價 葡萄 客人 店長
使用 買一送一 芒果 剛剛 男童
優惠 單獨 限定 可是 網友
禮券 鳳梨 冰棒 同事 新聞
寄杯 line 汽水 每次 影片
期限 限時 蜂蜜 突然 小孩
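A table like the one above lists, for each topic, the terms with the highest weight in that topic. The base R sketch below shows only that last step, under stated assumptions: `topic_term` is a hypothetical topic-by-term weight matrix with invented numbers, standing in for whatever a fitted topic model would provide. It does not fit a model and is not the original code.

```r
# Hypothetical vocabulary and topic-term weights (invented numbers)
terms <- c("咖啡", "優惠", "寄杯", "芒果", "冰棒", "葡萄")
topic_term <- rbind(
  "Topic A" = c(0.40, 0.25, 0.20, 0.05, 0.05, 0.05),
  "Topic B" = c(0.05, 0.05, 0.05, 0.35, 0.20, 0.30)
)
colnames(topic_term) <- terms

# For each topic (row), keep the names of the n highest-weighted terms
n_top <- 3
top_terms <- apply(
  topic_term, 1,
  function(w) names(sort(w, decreasing = TRUE))[seq_len(n_top)]
)

# One column per topic, terms ordered from highest to lowest weight
print(top_terms)
```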

We label some of those topics based on their keywords:

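Topic 9: Coffee promotions and prepaid cups held for later pickup (咖啡優惠寄杯)

Topic 16: Buy-one-get-one-free, point collection, and other promotions (買一送一、集點等促銷活動)

Topic 20: Discussion of limited-edition fruit-flavored products (水果口味限定商品討論)

Topic 21: Interactions between store clerks and customers (店員與客人互動情形)

Topic 23: The incident in which a store manager choked a boy until he passed out (店長掐暈男童事件)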

Cloud Report