Andrei Pazniak
2016-05-31
Analysis of the algorithm implementation:
The input text must be preprocessed. The following transformation functions help produce better results.
Example:
library(tm)
# tm_reduce folds the list from right to left, so tolower is applied first
transFuncs <- list(removePunctuation, stripWhitespace, removeNumbers,
                   content_transformer(function(x) iconv(enc2utf8(x), sub = " ")),
                   content_transformer(tolower))
corpus <- tm_map(corpus, tm_reduce, tmFuns = transFuncs)
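To make the effect of these transformations concrete, here is a minimal base-R sketch applied to a single string. It is only an approximation: `clean_text` is a hypothetical helper, the regular expressions are assumed stand-ins for the `tm` transformations, and the steps are written in the order `tm_reduce` would apply them (last list element first).

```r
# Hypothetical helper: base-R approximation of the tm transformation list.
# Steps follow the right-to-left order in which tm_reduce applies them.
clean_text <- function(x) {
  x <- tolower(x)                      # content_transformer(tolower)
  x <- iconv(enc2utf8(x), sub = " ")   # replace non-convertible bytes with a space
  x <- gsub("[[:digit:]]+", "", x)     # approximates removeNumbers
  x <- gsub("[[:space:]]+", " ", x)    # approximates stripWhitespace
  x <- gsub("[[:punct:]]+", "", x)     # approximates removePunctuation
  x
}

cleaned <- clean_text("Hello,  World! 42 times")
print(cleaned)
```

Because whitespace is collapsed before punctuation is removed, stripping a punctuation mark that stands between two spaces can leave a double space behind. If that matters, moving `stripWhitespace` to the front of the list makes it run last.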
Preprocessing steps
| Type | Compressed size | Size |
|---|---|---|
| Input Data | 261 | 509 |
| Frequencies table | 25 | 118 |

| Method | Average time | Attempts | Min memory |
|---|---|---|---|
| Kneser-Ney Smoothing | 90 ms | 100 | 118 MB |
| Naive with Good-Turing Smoothing | 420 ms | 100 | 118 MB |
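The sketch below shows one way an average time over a fixed number of attempts could be measured in R. It is not the procedure used to produce the table above: `predict_next` is a hypothetical stand-in for the real prediction call, and `average_ms` is an assumed helper name.

```r
# Hypothetical stand-in for the model's prediction function;
# replace with the real Kneser-Ney or Good-Turing predictor.
predict_next <- function(phrase) {
  words <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  words[length(words)]
}

# Average elapsed time per call, in milliseconds, over `attempts` runs.
average_ms <- function(f, input, attempts = 100) {
  times <- vapply(seq_len(attempts), function(i) {
    system.time(f(input))[["elapsed"]]
  }, numeric(1))
  mean(times) * 1000
}

avg <- average_ms(predict_next, "the quick brown")
print(avg)
```

`system.time` has coarse resolution for very fast calls, so for functions that finish in well under a millisecond it may be better to time a batch of calls and divide by the batch size.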
Steps to improve the model: