author: Uros Godnov
date: 2014-09-18
University of Primorska
Faculty of Management
Department of Information Science
Data mining
Crude dataset from tm package
Example (first document in the crude corpus)
library(tm)
data("crude")
meta(crude[[1]], type = "corpus")
Metadata:
author : character(0)
datetimestamp: 1987-02-26 17:00:56
description :
heading : DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES
id : 127
language : en
origin : Reuters-21578 XML
topics : YES
lewissplit : TRAIN
cgisplit : TRAINING-SET
oldid : 5670
places : usa
people : character(0)
orgs : character(0)
exchanges : character(0)
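Individual metadata fields can also be looked up by tag name instead of printing the whole record. The following is a minimal sketch, assuming tm 0.6 or later, where `meta()` accepts a tag name as its second argument; the variable names (`doc`, `doc_id`, `doc_heading`, `headings`) are illustrative only.

```r
library(tm)
data("crude")

# First document of the corpus
doc <- crude[[1]]

# Look up single metadata tags by name
doc_id      <- meta(doc, "id")
doc_heading <- meta(doc, "heading")

# Collect the heading of every document in the corpus
headings <- lapply(crude, function(d) meta(d, "heading"))

print(doc_id)
print(doc_heading)
print(length(headings))
```

Using `lapply()` rather than `sapply()` keeps the result a list, so a document without a heading would not change the shape of the result.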
Inspecting matrix (1:5 rows and 2:5 columns)
We can see that the term "demand (tokenized together with its leading quotation mark) occurs once in the document with id 144 and in none of the other four documents.
dtm <- DocumentTermMatrix(crude)
inspect(dtm[1:5, 2:5])
<<DocumentTermMatrix (documents: 5, terms: 4)>>
Non-/sparse entries: 1/19
Sparsity : 95%
Maximal term length: 10
Weighting : term frequency (tf)
     Terms
Docs  "demand "expansion "for "growth
  127       0          0    0       0
  144       1          0    0       0
  191       0          0    0       0
  194       0          0    0       0
  211       0          0    0       0
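The terms in the output above keep their leading quotation mark because punctuation is not stripped by default. As a hedged sketch, one way to get a clean term and list the documents that contain it is to pass `removePunctuation = TRUE` in the `control` list; the names `dtm_clean`, `col`, `counts` and `docs_with_demand` are illustrative, and the exact counts may vary with the installed tm version.

```r
library(tm)
data("crude")

# Build the matrix again, this time stripping punctuation from the terms
dtm_clean <- DocumentTermMatrix(crude,
                                control = list(removePunctuation = TRUE))

# Locate the column for the cleaned term "demand"
col <- which(Terms(dtm_clean) == "demand")

# Term frequency of "demand" per document, as a named vector
counts <- as.matrix(dtm_clean[, col])[, 1]

# Ids of the documents in which the term occurs at least once
docs_with_demand <- names(counts)[counts > 0]
print(docs_with_demand)
```

Looking the column up through `Terms()` avoids relying on a fixed column position, which can shift between tm versions.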
Text mining is the future.
And R is the future.
Let me finish with a joke:
Keep calm and let the professor handle it :).