Word Cloud Generator

Project for DDP course

Ambika J
Learner

A Word Cloud Generator App


Problem statement: Given any book, article, program, news story, journal, white paper, etc., a user wants to visualise its high-frequency words as effectively as possible.

A word cloud generator identifies the high-frequency words of any given text file, highlights them and represents them visually.


The key idea is to quickly build and download a word cloud for any file you are interested in.


The app is very simple to use: you upload a file and tweak the settings, and the app generates a downloadable word cloud and frequency table. It also enables analysis of the top 7 frequencies, if you are interested.
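As a rough illustration of the downloadable frequency table, the sketch below writes a small table of word counts to a CSV file using base R only. The sample counts, the column names `word` and `freq`, and the temporary output path are assumptions made for this example; the app's actual download code is not shown in these slides.

```r
# Hypothetical word counts standing in for the app's computed frequencies.
terms <- c(data = 12L, cloud = 9L, word = 15L)

# Build a two-column frequency table, highest count first.
ranked <- sort(terms, decreasing = TRUE)
freq_table <- data.frame(
  word = names(ranked),
  freq = as.integer(ranked),
  stringsAsFactors = FALSE
)

# Write the table to a temporary CSV file (a stand-in for the download).
out_file <- tempfile(fileext = ".csv")
write.csv(freq_table, out_file, row.names = FALSE)

# Read it back, as a spreadsheet user would after downloading.
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
print(round_trip)
```

A CSV file opens directly in Excel, which fits the spreadsheet-based analysis mentioned under Applications.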


Applications, features and future

Applications:

  1. Pitches and presentations: Re-use the downloaded word cloud in presentations, pitches, etc., which saves a considerable amount of time.
  2. Further data analysis: The computed frequency tables can be used in further data analysis. Example: many data analysts use Excel as their computing tool. For those users, it's a handy tool.
  3. Target audience: Content analysts, data miners, publishers, and search and social media teams.

Features and future:

  1. A word cloud is more effective than a line, bar or point plot when the analysis concerns words and their importance.
  2. The top 7 frequencies can be analysed further.
  3. The key differentiator of this app is the flexibility to tweak 3 features and analyse the top 7 frequencies. In the next phase, we will monitor and understand the users' needs and add, modify, delete or automate a few features to make the app more intuitive.
  4. A future upgrade of the app will add the ability to analyse user behaviour using machine learning and predictive analysis. This is planned for phase 2 and phase 3.

Visual Features

Word cloud


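library(tm)         # text-mining helpers
library(wordcloud)  # wordcloud(); also attaches RColorBrewer for brewer.pal()
## word_count() is the app's helper; it computes the word frequencies
terms <- word_count(readLines(file))  # `file`: path of the uploaded text file
col <- brewer.pal(8, "Dark2")         # palette of 8 colours
wordcloud(names(terms), terms,
    rot.per = 0.35, colors = col)     # rot.per: proportion of rotated words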
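The slide does not show the body of word_count(). The sketch below is one possible stand-in written in base R, not the app's actual implementation: the name `word_count_sketch`, the `min_chars` parameter and the tokenisation rule (lower-case, split on anything that is not a letter or apostrophe) are all assumptions made for illustration.

```r
# Hypothetical stand-in for word_count(): returns a named integer vector
# of word frequencies, sorted from most to least frequent.
word_count_sketch <- function(lines, min_chars = 3) {
  # Join all lines and lower-case them so "The" and "the" count together.
  text <- tolower(paste(lines, collapse = " "))
  # Split on runs of characters that are not letters or apostrophes.
  words <- unlist(strsplit(text, "[^a-z']+"))
  # Drop empty strings and very short tokens.
  words <- words[nchar(words) >= min_chars]
  counts <- table(words)
  freq <- as.integer(counts)
  names(freq) <- names(counts)
  sort(freq, decreasing = TRUE)
}

terms <- word_count_sketch(c("The cat saw the other cat", "the end"))
print(terms)
```

The result has the shape the plotting call expects: names(terms) supplies the words and terms supplies their frequencies. A fuller version would probably also remove stop words, which the tm package supports.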

An analysis of top 7 frequencies

NOTE: rCharts has a rendering issue in slidify: the chart options and legends overlap. As a workaround, click the "Stacked" radio button.
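The top 7 selection itself can be sketched in a few lines of base R. This is an illustrative sketch only: the function name `top_n_terms`, the `share` column and the sample counts are assumptions, and the app's interactive rCharts plot is not reproduced here.

```r
# Hypothetical helper: keep the n most frequent words and report each
# word's share of all counted words.
top_n_terms <- function(terms, n = 7) {
  ranked <- sort(terms, decreasing = TRUE)
  top <- head(ranked, n)
  data.frame(
    word  = names(top),
    freq  = as.integer(top),
    share = round(as.integer(top) / sum(terms), 3),
    stringsAsFactors = FALSE
  )
}

# Sample counts invented for this example.
terms <- c(data = 12L, cloud = 9L, word = 15L, text = 4L, file = 7L,
           app = 3L, plot = 6L, user = 5L, shiny = 2L)
top7 <- top_n_terms(terms)
print(top7)
```

A static alternative to the interactive chart would be barplot(top7$freq, names.arg = top7$word), which is part of base R graphics.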

Extendability

  1. The app can be extended to spam classification, for example by building one word cloud for spam and one for ham.
  2. It can be further extended by grouping words under different search terms and tags, and building a new model on top of them.
  3. Analysis of Twitter data is another area we can dive deeper into.
  4. It can also be extended to show project metadata and tags in word cloud form.
  5. We would not limit it to frequencies alone, but would add weighted words as well and build it up for use in prediction logic.

References

  1. Shiny apps gallery
  2. Download references are from a blog by user 'TrigonaMinima'

App: https://neo-r-apps.shinyapps.io/word_cloud_gen