About this Notebook
The Google search data in this notebook comes from a Google account archive.
The steps outlined here to collect and analyze the data may change at any time.
Below are the steps to claim your Google account data.
Data Collection: Claiming your Google Search Data
1) Sign in to your Google account, then go to:
2) Find the link to download your data archive, or go to:

3) Select all Google products to create a complete archive of your data.

4) After selecting the products, choose the file type and maximum archive size so that all of your account data is archived.
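The analysis below expects a data frame named search_data. It should have one row per search and the columns search, year, month, day and hour. This notebook does not show how the archive is parsed into that shape. The following is a minimal sketch of deriving the calendar columns. It assumes the archive has already been read into query strings and POSIXct timestamps. The raw data frame and its sample values are hypothetical.

```r
# Hypothetical parsed archive: one query string and one timestamp per search
raw <- data.frame(
  search = c("R ggplot2 facet_grid", "chicago weather", "tm package stopwords"),
  time = as.POSIXct(c("2016-03-14 09:26:53", "2016-03-15 22:05:10",
                      "2017-11-02 13:45:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Derive the calendar columns used by the plots
search_data <- data.frame(
  search = raw$search,
  year = as.integer(format(raw$time, "%Y")),
  # Month abbreviations as a factor so plots run Jan..Dec, not alphabetically
  month = factor(month.abb[as.integer(format(raw$time, "%m"))], levels = month.abb),
  # weekdays() returns names in the current locale (e.g. "Monday" in English)
  day = weekdays(raw$time),
  hour = as.integer(format(raw$time, "%H")),
  stringsAsFactors = FALSE
)
str(search_data)
```

The hours come from the timestamps' time zone. If the archive stores times in UTC, convert them to your local zone first so the hourly plots reflect your day.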

Data Analysis: Visualizing Google Searches
To get an overall idea of the search volume, we can plot the number of searches per year.
library(ggplot2)
# One bar per year; geom_bar() counts the searches in each year
p <- ggplot(search_data, aes(year))
p + geom_bar()
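With only an x aesthetic, geom_bar() counts the rows for each value of x. You can compute the same counts in base R to find the busiest years before filtering. The sample years below are made up for illustration.

```r
# Hypothetical sample of search years (one entry per search)
search_data <- data.frame(year = c(2015, 2016, 2016, 2017, 2017, 2017))

# Count searches per year: the same counts geom_bar() draws as bar heights
year_counts <- table(search_data$year)
print(year_counts)

# Years ordered from the busiest to the quietest
busiest_years <- names(sort(year_counts, decreasing = TRUE))
print(busiest_years)
```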

After determining the years with the largest search volume, we can plot monthly searches for those years (2015 through 2017 here).
# Keep the busiest years, 2015 through 2017
monthly <- search_data[search_data$year > 2014 & search_data$year < 2018, ]
ggplot(monthly) + geom_bar(aes(x = month, group = year)) +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_grid(. ~ year, scales = "free")
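Each facet in the monthly plot shows searches per month for one year. The sketch below uses made-up sample rows. It applies the same year filter and builds the month-by-year count table that the facets draw. For whole-number years, the condition year > 2014 & year < 2018 keeps exactly 2015, 2016 and 2017.

```r
# Hypothetical sample rows spanning 2014-2018
search_data <- data.frame(
  year = c(2014, 2015, 2015, 2016, 2017, 2018),
  month = factor(c("Jan", "Jan", "Feb", "Jan", "Mar", "Feb"), levels = month.abb)
)

# Same filter as the notebook: keeps 2015, 2016 and 2017
monthly <- search_data[search_data$year > 2014 & search_data$year < 2018, ]

# One row per month, one column per year: the counts behind each facet
month_by_year <- table(monthly$month, monthly$year)
print(month_by_year[c("Jan", "Feb", "Mar"), ])
```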

Another interesting metric is search volume by hour of the day.
p <- ggplot(search_data, aes(hour))
p + geom_bar()
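Hours with no searches simply have no bar in the plot. Fixing the possible hours to 0-23 makes those gaps explicit and gives the peak hour directly. This is a minimal sketch on hypothetical sample hours, assuming hour holds integers from 0 to 23.

```r
# Hypothetical sample of search hours (0-23)
search_data <- data.frame(hour = c(9, 9, 13, 22, 23, 23, 23))

# Fixing the levels to 0:23 keeps hours with no searches as explicit zeros,
# whereas geom_bar() only draws bars for hours that appear in the data
hour_counts <- table(factor(search_data$hour, levels = 0:23))
print(hour_counts)

# Hour with the most searches
peak_hour <- as.integer(names(which.max(hour_counts)))
print(peak_hour)
```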

We can also plot the search data by day of the week to see which days are the most active.
p <- ggplot(search_data, aes(day))
p + geom_bar()
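If day is a character column, ggplot2 orders the axis alphabetically (Friday, Monday, Saturday, ...). Converting it to a factor with explicit levels puts the bars in calendar order. This sketch assumes day holds English weekday names; the sample values are hypothetical.

```r
# Hypothetical sample; assumes day holds English weekday names
search_data <- data.frame(day = c("Tuesday", "Monday", "Sunday", "Monday", "Friday"),
                          stringsAsFactors = FALSE)

# An explicit factor puts the week in calendar order instead of alphabetical order
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
search_data$day <- factor(search_data$day, levels = day_levels)

print(table(search_data$day))
```

Run the conversion before the ggplot calls so every day-of-week plot follows the same order.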

We can take it a step further and break down search times by day of the week.
ggplot(search_data) +
  geom_bar(aes(x = hour, group = day)) +
  facet_grid(. ~ day, scales = "free")
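The faceted plot is a day-by-hour count table drawn as bars. You can also read the busiest hour of each day straight from that table. A sketch on hypothetical sample rows:

```r
# Hypothetical sample rows
search_data <- data.frame(
  day = c("Monday", "Monday", "Monday", "Saturday", "Saturday"),
  hour = c(9, 9, 21, 14, 14),
  stringsAsFactors = FALSE
)

# Rows are days, columns are hours 0-23: the counts behind each facet panel
day_hour <- table(search_data$day, factor(search_data$hour, levels = 0:23))

# Busiest hour within each day
peak_by_day <- apply(day_hour, 1, function(counts) as.integer(names(which.max(counts))))
print(peak_by_day)
```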

We can group the search data by year and day of the week to visualize the overall trend.
library(dplyr)
# Count searches for every (year, day) combination
wkday <- group_by(search_data, year, day) %>% summarize(count = n())
p <- ggplot(wkday, aes(day, count, fill = factor(year)))
p + geom_bar(stat = "identity") + labs(x = "", y = "Search Volume", fill = "Year")
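The group_by/summarize step counts the searches in each (year, day) pair. Here is a sketch of the same counting in base R with aggregate(), which can help when dplyr is not loaded. The sample rows are hypothetical.

```r
# Hypothetical sample rows
search_data <- data.frame(
  year = c(2016, 2016, 2016, 2017, 2017),
  day = c("Monday", "Monday", "Friday", "Monday", "Sunday"),
  stringsAsFactors = FALSE
)

# Base R counterpart of group_by(year, day) %>% summarize(count = n()):
# add a column of ones and sum it within every observed (year, day) pair
wkday <- aggregate(count ~ year + day, data = transform(search_data, count = 1), FUN = sum)
print(wkday)
```

Like the dplyr version, this returns only the combinations that actually occur in the data.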

Reporting: A Wordcloud from Google Search Data
First, we extract the search text and clean it using regular expressions.
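search <- tolower(search_data$search)
# Replace bytes that are not valid ASCII (e.g. accented letters) with spaces
search <- iconv(search, "ASCII", "UTF-8", sub = " ")
# Remove URLs, hashtags, mentions, newlines and double quotes
search <- gsub('(http|https)\\S+\\s*|(#|@)\\S+\\s*|\\n|\\"', " ", search)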
# Remove tokens containing ".com" (e.g. domains), then replace non-alphanumeric characters with spaces
search <- gsub("\\S*\\.com\\S*|[^[:alnum:]]", " ", search)
search <- trimws(search)
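To see what the cleaning does, the sketch below runs three made-up queries through the URL/hashtag/mention pattern and the non-alphanumeric replacement. It then collapses repeated spaces, an extra step not in the original, so the results are easy to compare.

```r
# Hypothetical sample queries
queries <- c("Check https://example.com/page NOW", "#rstats tips @jlroo", "ggplot2: facet_grid?")

cleaned <- tolower(queries)
# Drop URLs, hashtags/mentions, newlines and double quotes
cleaned <- gsub("(http|https)\\S+\\s*|(#|@)\\S+\\s*|\n|\"", " ", cleaned)
# Replace every remaining non-alphanumeric character with a space
cleaned <- gsub("[^[:alnum:]]", " ", cleaned)
# Collapse runs of whitespace and trim the ends (extra step, not in the original)
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```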
After cleaning the text, we can create a text corpus (a large, structured set of texts) and remove punctuation, numbers, and stop words.
library(tm)
search_corpus <- Corpus(VectorSource(search))
search_corpus <- tm_map(search_corpus, content_transformer(removePunctuation))
search_corpus <- tm_map(search_corpus, content_transformer(removeNumbers))
# English stop words plus frequent but uninformative terms from this account's searches
custom_stopwords <- c(stopwords("english"), "chrome", "chicago", "jlroo", "google")
search_corpus <- tm_map(search_corpus, removeWords, custom_stopwords)
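The corpus steps strip digits and drop unwanted words from each search. The base R sketch below shows the idea on plain token vectors. It uses a short hypothetical stop-word list in place of stopwords("english"). It removes every digit character, so a token such as ggplot2 becomes ggplot.

```r
# Hypothetical documents and a short stand-in stop-word list
docs <- c("how to use ggplot2 in chrome", "the best pizza in chicago 2017")
custom_stopwords <- c("how", "to", "in", "the", "chrome", "chicago")

tokens <- strsplit(docs, "\\s+")
tokens <- lapply(tokens, function(tk) {
  tk <- gsub("[[:digit:]]+", "", tk)           # strip digits, even inside words
  tk[nzchar(tk) & !(tk %in% custom_stopwords)] # drop empty tokens and stop words
})
# Rebuild each document from its remaining tokens
filtered <- vapply(tokens, paste, character(1), collapse = " ")
print(filtered)
```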
Next, we create a term-document matrix from the corpus, which gives the word counts needed for word associations and the word cloud.
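search_tdm <- TermDocumentMatrix(search_corpus)
search_matrix <- as.matrix(search_tdm)
# Total count of each term across all searches, most frequent first
word_freqs <- sort(rowSums(search_matrix), decreasing = TRUE)
d <- data.frame(word = names(word_freqs), freq = word_freqs)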
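A term-document matrix has one row per term and one column per document (here, per search). Each cell holds how often that term appears in that document. This is a minimal base R sketch of the structure on three hypothetical documents; it does not reproduce tm's internal sparse representation.

```r
# Hypothetical documents, already cleaned and tokenized on spaces
docs <- c("ggplot2 facet grid", "ggplot2 theme", "chicago weather")
tokens <- strsplit(docs, " ")
terms <- sort(unique(unlist(tokens)))

# Count each term in each document: rows are terms, columns are documents
tdm <- vapply(tokens, function(tk) as.integer(table(factor(tk, levels = terms))),
              integer(length(terms)))
dimnames(tdm) <- list(terms, paste0("doc", seq_along(docs)))
print(tdm)

# Row sums give each term's total frequency across all documents
print(sort(rowSums(tdm), decreasing = TRUE))
```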
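Finally, set frequency thresholds for the word cloud. min.freq drops words that appear fewer than 50 times, max.words keeps at most the 200 most frequent words, and scale sets the range of font sizes.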
library(wordcloud)
wordcloud(d$word, d$freq, min.freq = 50, scale = c(3, 0.5), max.words = 200)
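The min.freq and max.words thresholds decide which words make it into the cloud. The sketch below roughly mirrors that filtering on hypothetical word frequencies. It uses smaller limits so the effect is visible. It is an illustration of the thresholds, not the wordcloud package's own code.

```r
# Hypothetical word frequencies
d <- data.frame(word = c("ggplot2", "pizza", "weather", "stats", "regex"),
                freq = c(120, 75, 50, 20, 5), stringsAsFactors = FALSE)

min_freq <- 50   # like min.freq: drop words used fewer times than this
max_words <- 2   # like max.words: keep at most this many of the most frequent words

kept <- d[d$freq >= min_freq, ]
kept <- head(kept[order(-kept$freq), ], max_words)
print(kept$word)
```

Raising min.freq shrinks the candidate set, while max.words caps the cloud regardless of how many words pass the frequency cut.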

