Lecture 3: Basic Statistics and the manipulate Function

Checking the working directory

getwd()

Building a word frequency table from a text

txt<-readLines("shiny.txt")
wordLst<-strsplit(txt,"[[:space:]]|[[:punct:]]")
wordLst<-unlist(wordLst)
wordLst<-tolower(wordLst)
wordLst<- wordLst[wordLst != ""]
wordLst
##  [1] "shiny"       "is"          "an"          "r"           "package"    
##  [6] "that"        "makes"       "it"          "easy"        "to"         
## [11] "build"       "interactive" "web"         "apps"        "straight"   
## [16] "from"        "r"           "you"         "can"         "host"       
## [21] "standalone"  "apps"        "on"          "a"           "webpage"    
## [26] "or"          "embed"       "them"        "in"          "r"          
## [31] "markdown"    "documents"   "or"          "build"       "dashboards" 
## [36] "you"         "can"         "also"        "extend"      "your"       
## [41] "shiny"       "apps"        "with"        "css"         "themes"     
## [46] "htmlwidgets" "and"         "javascript"  "actions"
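
The filtering step `wordLst[wordLst != ""]` matters because splitting on every single whitespace or punctuation character leaves an empty string wherever two delimiters are adjacent (for example a period followed by a space). The following is a minimal sketch on a made-up sentence; the variable names `line`, `tokens`, and `words` are illustrative and not part of the lecture files.

```r
# A made-up one-line text (not the contents of shiny.txt)
line <- "Shiny is an R package. You can host apps, or embed them!"

# Split on any single whitespace or punctuation character
tokens <- strsplit(line, "[[:space:]]|[[:punct:]]")[[1]]

# Adjacent delimiters (". " and ", ") leave empty strings behind
hasEmpty <- any(tokens == "")

# Lower-case the tokens and drop the empty strings
words <- tolower(tokens)
words <- words[words != ""]
print(words)
```

An alternative would be to split on runs of delimiters with a pattern such as `"([[:space:]]|[[:punct:]])+"`, which leaves at most a leading empty string.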

Exercise 1: Writing the function getWordList

Write a function (file name: getWordList.R) that takes a text file name as its argument and returns the word frequencies as a table.

getWordList<- function(fname) {
  txt<-readLines(fname)
  wordLst<-strsplit(txt,"[[:space:]]|[[:punct:]]")
  wordLst<-unlist(wordLst)
  wordLst<-tolower(wordLst)
  wordLst<- wordLst[wordLst != ""]
  wordLst<- sort(table(wordLst), decreasing=TRUE)
  return(wordLst)
}

Result

source("getWordList.R")
getWordList("shiny.txt")
## wordLst
##        apps           r       build         can          or       shiny 
##           3           3           2           2           2           2 
##         you           a     actions        also          an         and 
##           2           1           1           1           1           1 
##         css  dashboards   documents        easy       embed      extend 
##           1           1           1           1           1           1 
##        from        host htmlwidgets          in interactive          is 
##           1           1           1           1           1           1 
##          it  javascript       makes    markdown          on     package 
##           1           1           1           1           1           1 
##  standalone    straight        that        them      themes          to 
##           1           1           1           1           1           1 
##         web     webpage        with        your 
##           1           1           1           1
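
The counting step inside the function can be seen in isolation on a small vector. This is only a sketch with made-up words; `toyWords` and `toyFreq` are illustrative names.

```r
# A small made-up word vector
toyWords <- c("r", "apps", "r", "shiny", "apps", "r")

# table() counts each distinct word; sort() puts the most frequent first
toyFreq <- sort(table(toyWords), decreasing = TRUE)
print(toyFreq)

# The words are stored as the names of the table, the counts as its values
topWord <- names(toyFreq)[1]
topCount <- as.vector(toyFreq)[1]
```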

Counting frequencies (sorted by frequency)

freq<-sort(table(wordLst), decreasing=TRUE)

Relative frequency

The proportion of occurrences of each word when the total is normalized to 1

relative <- freq / sum(freq)
relative
## wordLst
##        apps           r       build         can          or       shiny 
##  0.06122449  0.06122449  0.04081633  0.04081633  0.04081633  0.04081633 
##         you           a     actions        also          an         and 
##  0.04081633  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816 
##         css  dashboards   documents        easy       embed      extend 
##  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816 
##        from        host htmlwidgets          in interactive          is 
##  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816 
##          it  javascript       makes    markdown          on     package 
##  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816 
##  standalone    straight        that        them      themes          to 
##  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816  0.02040816 
##         web     webpage        with        your 
##  0.02040816  0.02040816  0.02040816  0.02040816

Practice: print the result rounded to three decimal places

relative
## wordLst
##        apps           r       build         can          or       shiny 
##       0.061       0.061       0.041       0.041       0.041       0.041 
##         you           a     actions        also          an         and 
##       0.041       0.020       0.020       0.020       0.020       0.020 
##         css  dashboards   documents        easy       embed      extend 
##       0.020       0.020       0.020       0.020       0.020       0.020 
##        from        host htmlwidgets          in interactive          is 
##       0.020       0.020       0.020       0.020       0.020       0.020 
##          it  javascript       makes    markdown          on     package 
##       0.020       0.020       0.020       0.020       0.020       0.020 
##  standalone    straight        that        them      themes          to 
##       0.020       0.020       0.020       0.020       0.020       0.020 
##         web     webpage        with        your 
##       0.020       0.020       0.020       0.020
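
As a quick check of the arithmetic, the relative frequencies of any table should sum to 1. The sketch below uses a made-up word vector; the names `toyWords`, `toyFreq`, `toyRelative`, and `toyRounded` are illustrative. `round()` is one way to limit the number of decimal places, and `prop.table()` is a base R shortcut for the same division.

```r
toyWords <- c("r", "apps", "r", "shiny", "apps", "r")
toyFreq <- sort(table(toyWords), decreasing = TRUE)

# Relative frequency: each count divided by the total count
toyRelative <- toyFreq / sum(toyFreq)

# The proportions add up to 1
total <- sum(toyRelative)

# Round to three decimal places for display
toyRounded <- round(toyRelative, 3)
print(toyRounded)

# prop.table() performs the same division
sameResult <- isTRUE(all.equal(as.vector(prop.table(toyFreq)),
                               as.vector(toyRelative)))
```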

Word frequency distribution (in color)

las: axis label style (las=3 draws the axis labels vertically)

colors <- c("red", "blue", "green")
barplot(freq, las=3, col=colors)
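
Only three colors are given for many bars, so the colors are reused in a cycle. The sketch below makes that recycling explicit with `rep_len()`; the small `toyFreq` vector and the names `barColors`, `usedColors`, and `mids` are illustrative, and `pdf(NULL)` is used only so that the code also runs without a display.

```r
# A small made-up frequency vector
toyFreq <- c(apps = 3, r = 3, build = 2, can = 2, a = 1)
barColors <- c("red", "blue", "green")

# Spell out the cycle: one color per bar
usedColors <- rep_len(barColors, length(toyFreq))

pdf(NULL)  # null device: nothing is written to disk
mids <- barplot(toyFreq, las = 3, col = usedColors)  # returns bar midpoints
invisible(dev.off())

print(usedColors)
```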

Fun exercise 1

plot(0,0,pch=8)

plot(0,0,pch=8,cex=5)

plot(0,0,pch=8,cex=5,col="red")

manipulate package

Interactive plots (the manipulate package runs only inside RStudio)

library(manipulate)

Fun exercise: choosing a color

The picker() function

manipulate(plot(0,0,pch=8,cex=5,col=myColors), myColors=picker("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan") )

Fun exercise: choosing a plot marker

The picker() function

manipulate(
  plot(0,0,pch=myMarkers,cex=5,col=myColors),
  myColors=picker("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan",initial="violet"),
  myMarkers=picker(1,2,3,4,5,6,7,8,initial="5")
)

Fun exercise: choosing the plot size

The slider() function

manipulate(
  plot(0,0,pch=8,cex=mySize,col="blue"),
  mySize=slider(1,10,initial=5)
)

Fun exercise: drawing text 1

plot(0,0,type="n")
text(0,0, "R",cex=1,col="blue")

Fun exercise: choosing a color with the picker() function

When the main body of manipulate() contains more than one statement, wrap it in {}

manipulate({
  plot(0,0,type="n")
  text(0,0, "R",cex=1,col=myColors)
}, 
  myColors=picker("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan") )

Practice: add a control for choosing the plot size

slider() function: min = 1, max = 10, initial value = 3
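
One possible way to approach this practice is to put the drawing code in a small function and let manipulate() supply its arguments. The sketch below is not the official answer: `drawLetter` is a hypothetical helper name, and the check of the `RSTUDIO` environment variable is an assumption about how to detect an RStudio session. Outside RStudio the code falls back to a single static drawing.

```r
# Hypothetical helper: draws the letter "R" with a given color and size
drawLetter <- function(color, size) {
  plot(0, 0, type = "n")
  text(0, 0, "R", cex = size, col = color)
  invisible(list(color = color, size = size))
}

# Assumption: RStudio sets the environment variable RSTUDIO to "1"
inRStudio <- interactive() && identical(Sys.getenv("RSTUDIO"), "1")

if (inRStudio && requireNamespace("manipulate", quietly = TRUE)) {
  library(manipulate)
  manipulate(
    drawLetter(myColors, mySize),
    myColors = picker("red", "yellow", "green", "violet",
                      "orange", "blue", "pink", "cyan"),
    mySize = slider(1, 10, initial = 3)
  )
} else {
  # Fallback: one static drawing on a null device
  pdf(NULL)
  drawLetter("blue", 3)
  invisible(dev.off())
}
```

Keeping the drawing in its own function makes it possible to try it with fixed values before attaching the controls.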


Converting the frequency tables to data frames

freqData <- data.frame(freq)
relativeData <- data.frame(relative)
head(freqData)
head(relativeData)
##   wordLst Freq
## 1    apps    3
## 2       r    3
## 3   build    2
## 4     can    2
## 5      or    2
## 6   shiny    2
##   wordLst  Freq
## 1    apps 0.061
## 2       r 0.061
## 3   build 0.041
## 4     can 0.041
## 5      or 0.041
## 6   shiny 0.041
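
The column names in the output above come from the conversion itself: the first column takes the name of the variable that was tabulated, and the counts go into a column called `Freq`. A minimal sketch on made-up data follows; `toyWords`, `toyFreq`, and `toyData` are illustrative names, and `stringsAsFactors = FALSE` is used here only to keep the words as plain character strings.

```r
toyWords <- c("r", "apps", "r", "shiny", "apps", "r")
toyFreq <- sort(table(toyWords), decreasing = TRUE)

# One row per word; the counts land in the column "Freq"
toyData <- as.data.frame(toyFreq, stringsAsFactors = FALSE)
print(toyData)
print(names(toyData))
```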

Joining the two data frames (merge)

freqMtx <- merge(freqData, relativeData, all=TRUE, by="wordLst")
head(freqMtx)
##   wordLst Freq.x Freq.y
## 1       a      1  0.020
## 2 actions      1  0.020
## 3    also      1  0.020
## 4      an      1  0.020
## 5     and      1  0.020
## 6    apps      3  0.061

Naming the columns

names(freqMtx) <- c("term","raw", "relative")
head(freqMtx)
##      term raw relative
## 1       a   1    0.020
## 2 actions   1    0.020
## 3    also   1    0.020
## 4      an   1    0.020
## 5     and   1    0.020
## 6    apps   3    0.061

Writing to a CSV file

write.csv(freqMtx,"shiny_freq.csv")
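
By default `write.csv()` also writes the row names as an unnamed first column. If that column is not wanted, `row.names = FALSE` suppresses it. The sketch below writes a small made-up data frame to a temporary file and reads it back; `toyMtx`, `outFile`, and `readBack` are illustrative names.

```r
toyMtx <- data.frame(term = c("a", "apps"),
                     raw = c(1L, 3L),
                     relative = c(0.020, 0.061),
                     stringsAsFactors = FALSE)

# Write to a temporary file without the row-name column
outFile <- tempfile(fileext = ".csv")
write.csv(toyMtx, outFile, row.names = FALSE)

# Read it back to confirm the columns survive the round trip
readBack <- read.csv(outFile, stringsAsFactors = FALSE)
print(readBack)
```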

Preview: building a term occurrence matrix from multiple files in the same directory

Directory name

dirName <- "testData"

Getting the list of files in the specified directory

files <- list.files(dirName)
files
## [1] "test1.txt" "test2.txt" "test3.txt"

Building the file paths relative to the working directory

filesDir <- file.path(dirName, files)
filesDir
## [1] "testData/test1.txt" "testData/test2.txt" "testData/test3.txt"
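
Base R offers two common ways to obtain such paths: `file.path()` joins a directory and file names, and `list.files()` with `full.names = TRUE` returns the paths in one step. The sketch below tries both on a temporary directory created just for the example; `tmpDir`, `toyFiles`, `joined`, and `fullPaths` are illustrative names, and the directory stands in for `testData`.

```r
# Create a throwaway directory with three small files
tmpDir <- tempfile("testData")
dir.create(tmpDir)
for (f in c("test1.txt", "test2.txt", "test3.txt")) {
  writeLines("sample text", file.path(tmpDir, f))
}

# Option 1: list the names, then join them with the directory
toyFiles <- list.files(tmpDir)
joined <- file.path(tmpDir, toyFiles)

# Option 2: ask list.files() for the full paths directly
fullPaths <- list.files(tmpDir, full.names = TRUE)

print(basename(fullPaths))
```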

Reading a file

getWordList(filesDir[1])
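
As a rough preview of where this is heading, the same function can be applied to every path with `lapply()`, and the resulting tables can be lined up into a term-by-file matrix. This is only a sketch under assumptions, not the method of the next lecture: the two files are made up and written to a temporary directory, and `freqList`, `allTerms`, and `termMtx` are illustrative names. `getWordList` is repeated here so that the block runs on its own.

```r
getWordList <- function(fname) {
  txt <- readLines(fname)
  wordLst <- unlist(strsplit(txt, "[[:space:]]|[[:punct:]]"))
  wordLst <- tolower(wordLst)
  wordLst <- wordLst[wordLst != ""]
  sort(table(wordLst), decreasing = TRUE)
}

# Two made-up files in a throwaway directory
tmpDir <- tempfile("testData")
dir.create(tmpDir)
writeLines("Shiny apps are R apps.", file.path(tmpDir, "test1.txt"))
writeLines("R is fun.", file.path(tmpDir, "test2.txt"))
paths <- list.files(tmpDir, full.names = TRUE)

# One frequency table per file
freqList <- lapply(paths, getWordList)
names(freqList) <- basename(paths)

# All distinct terms across the files, in alphabetical order
allTerms <- sort(unique(unlist(lapply(freqList, names))))

# One column per file; terms missing from a file get a count of 0
termMtx <- sapply(freqList, function(freq) {
  counts <- setNames(as.vector(freq), names(freq))[allTerms]
  counts[is.na(counts)] <- 0
  unname(counts)
})
rownames(termMtx) <- allTerms
print(termMtx)
```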

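Exercise 2: Writing the function getFreqMtx

Write a function (file name: getFreqMtx.R) that takes a text file name as its argument and returns a data frame in which the word frequencies and relative frequencies are merged.

Expected output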

source("getFreqMtx.R")
res<-getFreqMtx("shiny.txt")
head(res)
##      term raw relative
## 1       a   1    0.020
## 2 actions   1    0.020
## 3    also   1    0.020
## 4      an   1    0.020
## 5     and   1    0.020
## 6    apps   3    0.061