Lecture 9: Collocation, twitteR

Collocation

The RMeCab package also provides convenient functions for collocation analysis.

Get the file names in a directory

dirName <- "univ"
files <- list.files(dirName)
# Prepend the directory name to each file name
filesDir <- file.path(dirName, files)
filesDir

## [1] "univ/hiroshima.txt" "univ/kufs.txt"      "univ/kyoto.txt"    
## [4] "univ/osaka1.txt"    "univ/osaka2.txt"    "univ/osaka3.txt"   
## [7] "univ/tokyo.txt"     "univ/waseda.txt"
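list.files() can also return the paths directly and filter by extension. The following is a minimal sketch, assuming a temporary directory and made-up file names (a.txt, b.txt, notes.md are illustrative and not part of the lecture data):

```r
# Build a throwaway directory so the example runs anywhere (file names are hypothetical)
tmpDir <- file.path(tempdir(), "univ_demo")
dir.create(tmpDir, showWarnings = FALSE)
file.create(file.path(tmpDir, c("a.txt", "b.txt", "notes.md")))

# full.names = TRUE prepends the directory; pattern keeps only the .txt files
txtFiles <- list.files(tmpDir, pattern = "\\.txt$", full.names = TRUE)
basename(txtFiles)
```

With full.names = TRUE the separate paste step is no longer needed.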

Read one file (word by word): scan()

what = character() and what = "char" give the same result

filename <- filesDir[1]
txt <- scan(filename, what = "")
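scan() splits on whitespace only, so punctuation stays attached to the word (for example "research,"). A minimal sketch of this behaviour, using the text argument of scan() on a fragment of the Hiroshima text; the normalisation step at the end is an optional addition, not part of the lecture code:

```r
# A fragment of the first sentence, passed as a string instead of a file
sampleText <- "Hiroshima University aims to be a world-class hub of education and research,"
words <- scan(text = sampleText, what = "", quiet = TRUE)
length(words)
words[c(2, 12)]

# Optional normalisation: lower-case and strip leading/trailing punctuation
clean <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", tolower(words))
clean[c(2, 12)]
```

Whether to normalise depends on the analysis: the raw tokens keep "University" and "University." apart, the cleaned ones merge them.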

Supplement: read one file (line by line)

Typing a backslash (\) on a Mac with a Japanese keyboard: Option + ¥

scan(filename, what="", sep = "\n")

## [1] "Hiroshima University aims to be a world-class hub of education and research, to foster excellent human resources to contribute to the community, and developmentally expand science."                                                                                                                                                                        
## [2] "The main campus, covering 252 hectares, is located in Higashi-Hiroshima (Saijo), in a verdant area which is famous for sake brewing. Including campuses in Hiroshima, known as the International City of Peace and Culture, the University includes 11 faculties, 12 graduate schools, a research institute, a university hospital, and 11 attached schools."
## [3] "Hiroshima University's mission of ongoing growth, as based on the five principles, is to create new forms of knowledge, nurture well-rounded human beings, continue self-development, pursue international peace, and collaborate with the local, regional, and international community."                                                                    
## [4] "I'd like to welcome your questions and comments; please feel free to send them to the President's Office. Though it might not be possible for me to provide feedback for every question, I'd certainly be happy to direct them to university operations, etc. Thank you in advance for your continued support of Hiroshima University. "
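For line-by-line reading, readLines() is a common alternative to scan(..., sep = "\n"). A minimal sketch, assuming a temporary file with two illustrative lines (not the lecture data):

```r
# Write two lines to a temporary file (the text is illustrative)
tmpFile <- tempfile(fileext = ".txt")
writeLines(c("Hiroshima University aims to be a world-class hub.",
             "The main campus covers 252 hectares."), tmpFile)

byLineScan <- scan(tmpFile, what = "", sep = "\n", quiet = TRUE)
byLineRead <- readLines(tmpFile)
identical(byLineScan, byLineRead)

# Split each line into words afterwards if both views are needed
wordsPerLine <- lengths(strsplit(byLineRead, "[[:space:]]+"))
wordsPerLine
```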

String search

grep("univ", txt, ignore.case = TRUE)

## [1]   2  62  73  80 156 169

grep("univ", txt, ignore.case = TRUE, value = TRUE)

## [1] "University"   "University"   "university"   "University's"
## [5] "university"   "University."

grep("ed$", txt, ignore.case = TRUE, value = TRUE)

## [1] "located"      "attached"     "based"        "well-rounded"
## [5] "continued"
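The grep() variants used above differ only in what they return. A small sketch on a made-up word vector (the words are illustrative) that puts positions, logical flags, and regular-expression anchors side by side:

```r
words <- c("Hiroshima", "University", "aims", "university",
           "located", "University's", "education")

pos <- grep("univ", words, ignore.case = TRUE)     # positions of the matches
hit <- grepl("univ", words, ignore.case = TRUE)    # one TRUE/FALSE per word
lowerOnly <- grep("^univ", words, value = TRUE)    # case-sensitive, anchored at the start
edWords <- grep("ed$", words, value = TRUE)        # words ending in "ed"
sum(hit)                                           # number of matching tokens
```

Note that "education" does not match "ed$", because $ anchors the pattern at the end of the word.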

Position of the node word

node <- "univ"
# Matching words and their positions in txt
nodeLst <- grep(node, txt, ignore.case = TRUE, value = TRUE)
nodeIndex <- grep(node, txt, ignore.case = TRUE)

Extracting the surrounding words

Positions outside the text must become NA. The first match is at position 2, so nodeIndex - 2 is 0, and txt[0] returns an empty vector: Left2 would be one element short, cbind() would recycle it with a warning, and the L2 column would be shifted up by one row.

# at() is a small helper: out-of-range positions are mapped to NA
at <- function(i) { i[i < 1 | i > length(txt)] <- NA; txt[i] }
Left1 <- at(nodeIndex - 1)
Left2 <- at(nodeIndex - 2)
Right1 <- at(nodeIndex + 1)
Right2 <- at(nodeIndex + 2)

Collocation Matrix

collo <- cbind(Left2, Left1, nodeLst, Right1, Right2)
colnames(collo) <- c("L2","L1","node","R1","R2")
rownames(collo) <- seq_len(nrow(collo))
collo

##   L2           L1          node           R1            R2    
## 1 NA           "Hiroshima" "University"   "aims"        "to"  
## 2 "Culture,"   "the"       "University"   "includes"    "11"  
## 3 "institute," "a"         "university"   "hospital,"   "and" 
## 4 "schools."   "Hiroshima" "University's" "mission"     "of"  
## 5 "them"       "to"        "university"   "operations," "etc."
## 6 "of"         "Hiroshima" "University."  NA            NA    
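The steps above can be wrapped into one function with an adjustable window size. This is a minimal sketch: collocate() and its span argument are hypothetical names introduced here (it is not an RMeCab function), and the sample sentence is made up from words in the Hiroshima text.

```r
# collocate() is a hypothetical helper: one row per match, one column per offset
collocate <- function(tokens, node, span = 2) {
  idx <- grep(node, tokens, ignore.case = TRUE)
  offsets <- -span:span
  cols <- lapply(offsets, function(k) {
    i <- idx + k
    i[i < 1 | i > length(tokens)] <- NA   # out-of-range positions become NA
    tokens[i]
  })
  m <- do.call(cbind, cols)
  colnames(m) <- c(paste0("L", span:1), "node", paste0("R", 1:span))
  m
}

tokens <- scan(text = "Hiroshima University aims to foster human resources and the University includes 11 faculties",
               what = "", quiet = TRUE)
collo <- collocate(tokens, "univ")
collo
table(collo[, "L1"])   # frequency of the words directly left of the node
```

Tabulating a single column, as in the last line, is a simple way to see which words co-occur most often with the node at a given position.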

Implementation in Shiny

library(shiny)
runApp("shiny_apps/app_collocation1")

Read two files (word by word): scan()

what = character() and what = "char" give the same result

filenames <- filesDir[c(1, 8)]
# Read each file word by word and join the results into one vector
txt <- unlist(lapply(filenames, scan, what = ""))
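Because unlist() joins the word vectors end to end, a context window near the join can pick up words from the other file. A minimal sketch of one way to detect this, assuming two temporary files with made-up contents; fileId and crosses are names introduced here for illustration:

```r
# Two throwaway files stand in for the two university texts (contents are illustrative)
f1 <- tempfile(fileext = ".txt"); writeLines("Hiroshima University", f1)
f2 <- tempfile(fileext = ".txt"); writeLines("Waseda University was founded", f2)
filenames <- c(f1, f2)

tokenList <- lapply(filenames, scan, what = "", quiet = TRUE)
txt <- unlist(tokenList)
# Label every token with the index of its source file
fileId <- rep(seq_along(tokenList), lengths(tokenList))

nodeIndex <- grep("univ", txt, ignore.case = TRUE)
right1 <- nodeIndex + 1
right1[right1 > length(txt)] <- NA
R1 <- txt[right1]
# Blank out neighbours that belong to a different file than the node
crosses <- which(fileId[right1] != fileId[nodeIndex])
R1[crosses] <- NA
R1
```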

Implementation in Shiny

runApp("shiny_apps/app_collocation2")

Creating and registering a Twitter application

Required for OAuth authentication

https://apps.twitter.com/

Installing ROAuth and twitteR

  install.packages('twitteR')
  install.packages('ROAuth')

Loading ROAuth and twitteR

  library(twitteR)
  library(ROAuth)

OAuth authentication from twitteR

Download cacert.pem

  download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Credentials (Twitter application)

consr_key <- "***********"
consr_secrt <- "***********"
req_url <- "https://api.twitter.com/oauth/request_token"
acs_url <- "https://api.twitter.com/oauth/access_token"
auth_url <- "https://api.twitter.com/oauth/authorize"

cred <- OAuthFactory$new(consumerKey = consr_key,
                         consumerSecret = consr_secrt,
                         requestURL = req_url,
                         accessURL = acs_url,
                         authURL = auth_url)

handshake: connect as a Twitter client

cred$handshake(cainfo="cacert.pem")

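Obtaining the credentials

acs_token and acs_token_sec are the access token and access token secret issued for the application on the Twitter application page; assign them beforehand in the same way as consr_key and consr_secrt.

setup_twitter_oauth(consr_key, consr_secrt, acs_token, acs_token_sec)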
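Keys and tokens are easier to keep out of shared scripts when they are read from environment variables rather than typed into the code. A minimal sketch in base R: getCredential() and the variable name TWITTER_CONSUMER_KEY are hypothetical names chosen for this example, not part of twitteR or ROAuth.

```r
# getCredential() is a hypothetical helper; it stops early if the variable is missing
getCredential <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || value == "") stop("Environment variable not set: ", name)
  value
}

# Demonstration only: normally the value would be set in ~/.Renviron, not in the script
Sys.setenv(TWITTER_CONSUMER_KEY = "dummy-key")
consr_key <- getCredential("TWITTER_CONSUMER_KEY")
consr_key
```

The other three values can be read in the same way and then passed to setup_twitter_oauth().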

Search example

twitteR application (debugging in progress)