Data analysis using Google Trends

Google searches are one of the most important datasets ever collected. This is not only the tool to search or get answers but also great mean to understand people around the world. It is digital gold mine which can unravel much unknown information.

Google Trends data is an unbiased sample of our Google search data. It’s anonymized (no one is personally identified), categorized (determining the topic for a search query) and aggregated (grouped together). This allows us to measure interest in a particular topic across search, from around the globe, right down to city-level geography

Objective

Below is a report on “Analysis of Google Trends data”. I gathered Google Trends data different ways and then tried to get meaning out it. Initially, Google search data didn’t seem to be a proper source of information for “serious” research. But as we will see, trails we leave as we try to query on the Internet are extremely revealing. The idea of this project is to analyze multiple topics and some places merge it with other external data.

Data gathering

Although data from Google Trends is nicely formatted and easily available. There is no official API available to query this data. An API to accompany the Google Trends service was announced by Marissa Mayer, then vice president of search-products and user experience at Google. This was announced in 2007, and so far has not been released. Philippe Massicotte has written unofficial gTrendsR API to get this data and package is maintained by him. Fortunately, a package is updated but the documentation is not,

Bitcoin price v/s “Bitcoin” search

Analysing Bitcoin price movement with Google search for term “Bitcoin”

Step 1: Get bitCoin price history

Go to https://charts.bitcoin.com/chart/price
Click on Tools button
Download CSV

#reading 
bitcoinPrice <- read.csv("https://raw.githubusercontent.com/chirag-vithlani/Capstone/master/data/bitcoin/BitCoin_price_updated.csv")

#Formatting date
bitcoinPrice$newdate<-as.Date(bitcoinPrice$Date, "%m/%d/%Y")

#Price started moving only after year 2016, so reading data only from year 2016
newbitcoinPrice<-subset(bitcoinPrice, newdate > as.Date("2016-01-01"))
#converting price movement to percentage, as google search shows relative index value
newbitcoinPrice$pricePercent<-(newbitcoinPrice$Bitcoin.Price*100)/max(newbitcoinPrice$Bitcoin.Price)
head(newbitcoinPrice,2)

##          Date Bitcoin.Price    newdate pricePercent
## 1828 1/2/2016        433.36 2016-01-02     2.257822
## 1829 1/3/2016        433.36 2016-01-03     2.257822

Step 2: Get Bitcoin Google trends data

Go to https://trends.google.com/trends/explore?date=all&geo=US&q=bitcoin
Download CSV file from there

bitcoinTrends <- read.csv("https://raw.githubusercontent.com/chirag-vithlani/Capstone/master/data/bitcoin/BitCoin_Trends_updated.csv")

#Formatting date
bitcoinTrends$newdate<-as.Date(bitcoinTrends$Week, "%m/%d/%Y")

head(bitcoinTrends,2)

##        Week Bitcoin...United.States.    newdate
## 1  1/3/2016                        3 2016-01-03
## 2 1/10/2016                        2 2016-01-10

bitCoinPriceAndTrends<-merge(bitcoinTrends,newbitcoinPrice)

head(bitCoinPriceAndTrends,3)

##      newdate      Week Bitcoin...United.States.      Date Bitcoin.Price
## 1 2016-01-03  1/3/2016                        3  1/3/2016        433.36
## 2 2016-01-10 1/10/2016                        2 1/10/2016        447.46
## 3 2016-01-17 1/17/2016                        2 1/17/2016        381.85
##   pricePercent
## 1     2.257822
## 2     2.331283
## 3     1.989453

Step 3: Plotting

Plotting Bitcoin search Trends v/s price

Below is Bitcoin price and Bitcoin search trends comparison using line chart. It seems there is a correlation between how much people search about Bitcoin and the price of Bitcoin. Also, it shows around December-January the price and searches for Bitcoin was highest. It doesn’t mean more people search for Bitcoin; more its value will be. But reverse might be true.

m <- list(
  l = 50,
  r = 50,
  b = 100,
  t = 100,
  pad = 4
)


plot_ly(x = ~bitCoinPriceAndTrends$newdate) %>%
add_lines(y = ~bitCoinPriceAndTrends$pricePercent, name = "Actual Price Percentage", line = list(shape = "Actual Price Percentage")) %>%
add_lines(y = ~bitCoinPriceAndTrends$Bitcoin...United.States., name = "Google Bitcoin Serach Trends", line = list(shape = "Google Bitcoin Search Trends"))%>%
layout(autosize = F, width = 1022, height = 500, margin = m,font = list(family = "\"Droid Sans\", sans-serif"),
title = "Google Bitcoin Serach Trends Vs Bitcoin Price Percentage Change",xaxis = list(title = "Timeline"),yaxis = list(title = "Bitcoin Percentage"))

Summary

As we can see search for Bitcoin and price both picked around last week of December 2017 and starting of January 2018. Although It’s not a perfect indicator Google Trends sometimes lags and sometimes leads bitcoin’ price.

We can say there’s a strong correlation between bitcoin’s price and the performance of the search term “Bitcoin” on Google. Maybe price follows interest, and therefore, more buyers and greater search volume. If that is the case Trends chart shows a reversal to the uptrend for interest in Bitcoin.

Movie search trends v/s Lifetime Gross

Next I choose to analyze academy awards nominated movies. I wanted to compare is there some pattern in how people search about movies and the business that movie does. So I chose below four movies to do analysis.I got search Google search data from trends tool and the data of business that movie did from BoxOfficeMojo.com

Summary

As we can see “Shape of water” is highest searched movie among all four and that is also movie which had highest gross business.Second highest searched movie is “Three Billboards Outside Ebbing, Missouri” whereas second most earner was movie “Darkest Hour”. Least searched movie among four is “Lady Bird” and that is the one which did least business. so looks like Google search does reasonably good job with choosing first and last winner.

How to…search analysis(USA)

We frequently query “how to ….” on Google, so here is analysis which “how to” query we do the most.

To get “how to query”, we use Gtrends R package. We query “how to ..” term for the year 2012 to 2017 and showing most searched terms using wordcloud.

library(data.table)
# Writing function to display wordcloud

getYearTrends <- function(timeline)
{

  HowTo2017<-gtrends("How to", geo="US", time = timeline)
  HowTo2017<-HowTo2017$related_queries
  return (HowTo2017)

}

gTrendswordcloud <- function(timeline)
{

  #HowTo2017<-gtrends("How to", geo="US", time = timeline)
  HowTo2017<-getYearTrends(timeline)
  HowTo2017$subjectNew<-gsub('%','',HowTo2017$subject)
  HowTo2017[which(HowTo2017[,7]=='<1', arr.ind=TRUE), 7] <-0
  HowTo2017[which(HowTo2017[,7]=='Breakout', arr.ind=TRUE), 7] <-9999
  HowTo2017$subjectNew<-as.numeric(gsub(',','',HowTo2017$subjectNew))
  HowTo2017$subjectNew<-as.numeric(HowTo2017$subjectNew)
  max<-max(subset(HowTo2017, related_queries == 'rising')$subjectNew)
  HowTo2017rising<-subset(HowTo2017,related_queries=='rising')
  HowTo2017rising$subjectNew<-subset(HowTo2017rising,related_queries=='rising')$subjectNew*100/max
  HowTo2017risingTop<-subset(HowTo2017,related_queries=='top')
  HowTo2017<-rbind(HowTo2017rising ,HowTo2017risingTop)
  HowTo2017$subjectNew<-as.integer(HowTo2017$subjectNew)
  HowTo2017Top<-subset(HowTo2017,as.numeric(HowTo2017$subjectNew)>1)
  HowTo2017Top<-subset(HowTo2017Top, select=c("value", "subjectNew"))
  HowTo2017Top$subjectNew<-as.numeric(HowTo2017Top$subjectNew)
  colnames(HowTo2017Top)[2]<-"freq"
  HowTo2017Top<-HowTo2017Top[order(-HowTo2017Top$freq),]
  wordcloud2(data = HowTo2017Top)

}

#gTrendswordcloud("2017-01-01 2017-12-31")

2012	2013	2014	2015	2016	2017

Make GIF image

Make GIF using ezgif.com

Result shows most searched “How to..” queries on Google. The clouds give greater prominence to words that appear more frequently.

library(data.table)
tr2012<-getYearTrends("2012-01-01 2012-12-31")
tr2013<-getYearTrends("2013-01-01 2013-12-31")
tr2014<-getYearTrends("2014-01-01 2014-12-31")
tr2015<-getYearTrends("2015-01-01 2015-12-31")
tr2016<-getYearTrends("2016-01-01 2016-12-31")
tr2017<-getYearTrends("2017-01-01 2017-12-31")

all<-rbind(tr2012,tr2013,tr2014,tr2015,tr2016,tr2017)

#plot_ly(x = df$Var1,y = df$Freq,name = "SF Zoo",type = "bar")

How to queries asked since more than four years

df<-as.data.frame(table(all$value))
df<-subset(df, df$Freq > 4)
df[order(-df$Freq),]

##                           Var1 Freq
## 10            how to boil eggs    6
## 70   how to solve a rubix cube    6
## 74            how to tie a tie    6
## 75    how to train your dragon    6
## 26                 how to draw    5
## 34 how to get away with murder    5

kable(df, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)

	Var1	Freq
10	how to boil eggs	6
26	how to draw	5
34	how to get away with murder	5
70	how to solve a rubix cube	6
74	how to tie a tie	6
75	how to train your dragon	6

How to queries asked only in the year 2012

get2012uniqueQueries <- function(){
allExcept2012<-rbind(tr2013,tr2014,tr2015,tr2016,tr2017)
only2012<-setDT(tr2012)[!allExcept2012, on="value"]
only2012df<-as.data.frame(only2012$value)
colnames(only2012df)="Year 2012 Unique queries"
only2012df<-unique(only2012df)
kable(only2012df, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
}

How to queries asked only in the year 2016

get2016uniqueQueries <- function(){
allExcept2016<-rbind(tr2012,tr2013,tr2014,tr2015,tr2017)
only2016<-setDT(tr2016)[!allExcept2016, on="value"]
only2016df<-as.data.frame(only2016$value)
colnames(only2016df)="Year 2016 Unique queries"
only2016df<-unique(only2016df)
kable(only2016df, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
}

How to queries asked only in the year 2017

get2017uniqueQueries <- function(){
allExcept2017<-rbind(tr2012,tr2013,tr2014,tr2015,tr2016)
only2017<-setDT(tr2017)[!allExcept2017, on="value"]

only2017df<-as.data.frame(only2017$value)
colnames(only2017df)="Year 2017 Unique queries"
only2017df<-unique(only2017df)
kable(only2017df, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
}

Year 2012 Unique queries
how to make out
how to cite
how to make a bow
how to tie a bow tie
how to unlock iphone 4
how to jailbreak iphone 4s
how to make moonshine
how to deactivate facebook
how to get rid of blackheads
how to get rid of hickeys
how to fall asleep fast
how to make guacamole
how to do a waterfall braid

	Year 2016 Unique queries
1	how to pronounce
2	how to vote
3	how to be single
4	how to register to vote
5	how to get rid of acne
6	how long to cook a turkey
7	how to use snapchat
8	how to play pokemon go
9	how to make pancakes
10	how to draw a rose
11	how to cook asparagus
13	how to battle pokemon go

Year 2017 Unique queries
how to screenshot
how to factor
how to gain weight
how to cook rice
how to make slime without glue
how to buy bitcoin
how to make fluffy slime
how to get rid of fruit flies
how to make scrambled eggs
how to watch mayweather vs mcgregor

Summary

Good to see “how to vote” & “how to register to vote” in year 2016, as that year being an election year.
“how to tie a tie”,“how to boil eggs”,“how to draw” queries we keep asking all these years.

Unique How to..search(World): Editor’s choice.

Here are interesting queries in last 12 months.

howToData <- read.csv("https://raw.githubusercontent.com/chirag-vithlani/Capstone/master/data/How_to_Interesting.csv")
howToDataSubSet<-subset(howToData, select = c(1, 4))
colnames(howToDataSubSet)[2] <- "Country"

kable(howToDataSubSet, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)

Topic	Country
how to make paper flowers	Bhutan
how to take pictures of northern lights	Iceland
how to become good teacher	India
how to get twins	Kenya
how to hack facebook	Myanmar
how to make carrot oil	Nigeria
how to handle wife	Pakistan
how to identify AIDS	Sri Lanka
how science is trying to help us eat better	Israel
how to make solar system	Jamaica
how to make a girl like you	Solomon Islands

#Create dataframe with toy data:
  LAND_ISO <- howToData$Country

  value <- howToData$val
 topic<-howToData$Topic
  data <- data.frame(LAND_ISO, value,topic)


g <- list(scope = 'world')

plot_geo(data) %>%
  add_trace(
    z = ~value, locations = ~LAND_ISO, colors = c(Pass="yellow", High="red", Low= "cyan", Sigma= "magenta", Mean='limegreen', Fail="blue", Median="violet"),text = ~paste(howToData$Topic)
  ) %>%
  
  layout(geo = g)%>% hide_colorbar()

How to handle wife

Out of above unique “How to” queries, I found “How to handle wife” quite funny and serious at the same time. It points out gender inequality and wherever we see such query, I expect that location to have high gender inequality. So here we are finding top five such countries.

howToHandleWifeSearch<-gtrends("how to handle wife", time = "today 12-m")

howToHandleWifeSearchHead<-head(howToHandleWifeSearch$interest_by_country,5)
howToHandleWifeSearchHead<-subset(howToHandleWifeSearchHead, select = c(1, 2))
colnames(howToHandleWifeSearchHead)[2] <- "Percentage of Hits"


kable(howToHandleWifeSearchHead, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)

location	Percentage of Hits
Pakistan	100
Sri Lanka	84
United Arab Emirates	64
India	43
Bangladesh	33

Northern Lights

It is a natural light display in the Earth’s sky, predominantly seen in the high-latitude regions like Iceland. That is the reason people from Iceland search “how to take pictures of northern lights”. This was the most amazing thing to know while working on this project.

Source : Wikipedia

How to spell & pronounce

Google Trends is a unique and useful tool we can use to keep track of what people want to know about. When people do Google search; they misspell words and collectively Google can tell which words are misspelled more frequently. Similarly people also use google when they want to know the spelling of some complex word with query “how to spell..”.

Misspelled words

There is article with title “Google reveals top ‘how to spell’ searches by Canadian province” which gave interesting analysis of which words are misspelled by Canadians. Similar data analysis for United States given by Google trends twitter handle.

ICYMI - here's our map of the most misspelled words in America #spellingbee

(corrected legend) pic.twitter.com/2w56NpDgGK
— GoogleTrends (@GoogleTrends) May 30, 2017

Story behind “how to spell sound of sniff”

If we query “how to spell” right now then top Google trends result in United States is “how to spell the sound of a sniff” which is strange result but there is story behind it. Macaroni Tony with Twitter handle @BigBeard_Ali, who is followed by 14K users tweeted below on 11 Feb 2018 .

I got $750 to anybody that can spell the sound of a sniff????????????
— Macaroni Tony (@BigBeard_Ali) February 11, 2018

This tweet had around 6K re-tweets and 10K likes and this led people to search for “how to spell sound of sniff”. I was able to see that this query picked only after this tweet. It is amazing that only one tweet with small amount of prize money can have such ripple effect.

How to pronounce searches in United States

Inspired with above analysis, I decided to do how to pronounce analysis. It would be fun to understand which are the words people find difficult to pronounce.

state_code	top_query	breakout_query
AL	how to pronounce gif	how to pronounce gif
AZ	how to pronounce acai	how to pronounce gif
CA	how do you pronounce	how to pronounce khalid
CO	how to pronounce pho	how to pronounce qatar
CT	how to pronounce gif	how to pronounce acai
FL	how do you pronounce	how to pronounce xxxtentacion
GA	how to pronounce gyro	how to pronounce sza
HI	how to pronounce acai	how to pronounce acai
IL	how to pronounce names	how to pronounce sza
IN	how to pronounce gif	how to pronounce acai
IA	how to pronounce gif	how to pronounce gif
KS	how to pronounce gyro	how to pronounce gif
MD	how to pronounce acai	how to pronounce acai
MA	how to pronounce gyro	how to pronounce nguyen
MI	how to pronounce gyro	how to pronounce sza
MN	how to pronounce names	how to pronounce qatar
MS	how to pronounce gyro	how to pronounce gyro
MO	how to pronounce acai	how to pronounce sza
NV	how to pronounce nevada	how to pronounce gyro
NJ	how to pronounce acai	how to pronounce nguyen
NY	how do you pronounce	how to pronounce xxxtentacion
NC	how to pronounce names	how to pronounce pho
OH	how to pronounce gyro	how to pronounce sza
OR	how to pronounce gyro	how to pronounce gyro
PA	how to pronounce gyro	how to pronounce sza
TX	how do you pronounce	how to pronounce khalid
UT	how to pronounce gif	how to pronounce gif
VA	how to pronounce names	how to pronounce pho
WA	how to pronounce gif	how to pronounce sza
WI	how to pronounce gif	how to pronounce gif
DC	how to pronounce gyro	how to pronounce gyro

In last twelve months, most difficult words people find difficult to pronounce are people’s names.

First is singer-songwriter named “Sza”
Followed by Wonder Woman fame “Gal Gadot”
Followed by American rapper “XXXTentacion

Below is state wise breakdown showing each state find which word difficult to pronounce.

This also contains food related items like ‘pho’, ‘gyro’ and acaí. Map is created through mapchart.net. At world stage top query is “how to pronounce pyeongchang” which hosted the 2018 Winter Olympics and the 2018 Winter Paralympics.

Quit smoking

Each new year we all make resolutions and we keep making them each year. We all search lose weight and quit smoking each year. These search queries peak around January each year.

trends_us = gtrends(c("quit smoking"), geo = c("US"), gprop = "web", time = "2015-01-01 2018-04-30")[[1]]

forcase <- trends_us[,c("date","hits")]
colnames(forcase) <-c("ds","y")
m <- prophet(forcase)

## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

## Initial log joint probability = -4.16404
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
plot(m, forecast)

prophet_plot_components(m, forecast)

Image of above shows the forecasting (using Facebook’s time series forecasting prophet package) of “quit smoking” query which shows this trend continuing in January 2019.Image after that shows same data in different spans. As we can overall trend is over the years is going down. Somehow we search about “quit smoking” more on weekdays.

Sons vs. Daughters : Gender Bias

In 21st century we would like to think that we treat boys and girls same. At least in United States - which is arguably one most the most liberal country in the world. we would like to think that american parents have similar standards and similar dreams for their sons and daughters and there will not be any gender bias. But study from Seth Stephens-Davidowitz suggests that is not the case. Google searches suggest parents have different concerns for male and female children. As he points out and same shown below with Google trends data that our parents expect ( may be unknowingly ) boys to be smarter and girls to be thinner. They are more excited by the intellectual potential of their sons and they are more concerned about the weight and appearance of their daughters.

giftedQuery<- function(query1,query2,timespan)
{
  query11<-gtrends(query1, geo="US", time=timespan)
  query11Avg<-mean(query11$interest_over_time$hits)
  
  query22<-gtrends(query2, geo="US", time=timespan)
  query22Avg<-mean(query22$interest_over_time$hits)
  
  
p <- plot_ly(
  x = c(query1,query2),
  y = c(query11Avg, query22Avg),
  name = "SF Zoo",
  type = "bar"
)%>%
  layout(title = 'Gender bias clearly visible',xaxis = list(title = paste(query1,query2,sep = " <b>v/s</b> ")),yaxis = list(title = 'Average of search'))
p
}

giftedQuery('is my daughter gifted','is my son gifted','today 12-m')
##2017-01-01 2017-12-31
giftedQuery('is my daughter overweight','is my son overweight','2017-01-01 2017-12-31')

Giftedness

Mostly all parens like to believe that their kid has special talent. But as below graph shows people in US queried more “is my son gifted” than “is my daughter gifted”. This search difference is almost 50%

Unfortunately, this is despite the fact that that in real life this is exactly opposite. As David Walsh, an American psychologist, who specializes in parenting, points out as below “Girls talk earlier than boys, have larger pre-school vocabularies, and use more complex sentence structures. Once in school, girls are one to one-and-a-half years ahead of boys in reading and writing. Boys are twice as likely to have a language or reading problem and three to four times more likely to stutter. Girls do better on tests of verbal memory, spelling and verbal fluency. On average, girls utter two to three times more words per day than boys and even speak faster—twice as many words per minute.”

Overweight

Childhood obesity has undoubtedly become one of the most complex public health problems facing future generations. Parents search “Is my daughter overweight?” almost twice as frequently as they search “Is my son overweight?”. Clearly parents worry more about overweight girls than overweight boys. This is despite the fact that trends shows - in reality boys are more overweight than girls.

What pregnant Women Want

Unlike “what women want” which no one can claim to know about,we think we would have good idea “What pregnant Women Want”. It would be the cravings for pickles and chocolate, the avoidance of wine, the nausea and stretch marks, and good supporting husband. It looks like Google trends can put more light on this question and surely we can expect different answers from different parts of the world. We can start with questions about “what pregnant women can do” safely. The top questions women ask in United States are: Can pregnant women “eat shrimp,tuna,fish or crab”, “drink wine,coffee or tea” or “take Tylenol”? There are (comparatively) less frequently asked questions too like “fly”, “take bath” or “paint”.

Can pregnant women eat ____?

pregnantWomenEat<- function(query1,geo1,time1)
{
  query11<-gtrends(query1, geo=geo1, time=time1)
  all<-subset(query11$related_queries,query11$related_queries$related_queries=='top')$value
  #all1<-replace(all,"can pregnant women eat ","")
  
  num<-grep('^can pregnant women eat',all)
  all1<-all[num]
  all1<-gsub("can pregnant women eat", "", all1)
  all1
  all1<-head(all1,3)
  df1<-data.frame(all1)
  names(df1)<-geo1
  
#kable(df1, "html") %>%   kable_styling(bootstrap_options = "striped", full_width = F)

}

But if we inquire same question in other countries, they don’t look much like the United States or one another. Whether pregnant women can “drink wine” is not among the top 10 questions in Canada, Australia or United Kingdom. Australia’s unique concern is mostly related to eating cream cheese. The differences in questions have less to do with what is safe to do and more to do with information coming from different sources in each country including old stories,local custom and neighborhood trivial talk. We can see another clear difference when we look at the top searches for “how to ___ during pregnancy?” In the United States, the top search is (gain or lose) weight related queries whereas in India women are more concerned about how to sleep or weight of baby. In India it is illegal to find gender of baby before birth and that is why one of the top query is to know any trick about knowing gender. While the cultural manifestations of pregnancy may be different, the physical experience tends to be similar everywhere. I tested how often various symptoms were searched in combination with the word “pregnant.” For example, how often is “pregnant” searched in conjunction with “nausea,” “back pain” or “constipation”? Canada’s symptoms were very close to those in the United States. Symptoms in countries like Britain, Australia and India were all roughly similar, too. Preliminary evidence suggests that no part of the world has stumbled upon a diet or environment that drastically reduces a pregnancy symptom. We can extend this analysis and check what expectant fathers are searching for. In the United States, the top searches include “be nice to me my wife is pregnant shirt”

How to____during pregnancy

howTopregnantWomen<- function(query1,geo1)
{
 query11<-gtrends(query1, geo=geo1, time='2017-01-01 2017-12-31')
  all<-subset(query11$related_queries,query11$related_queries$related_queries=='top')$value
  
  num<-grep('during pregnancy',all)
  all1<-all[num]
  num<-grep('^how to',all1)
  all2<-all1[num]
  all3<-gsub("how to ", "", all2)

  all4<-gsub("during pregnancy","",all3)
  all5<-head(all4,7)
  df1<-data.frame(all5)
  names(df1)<-geo1
  
kable(df1, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)

}

US
gain weight
sleep
lose weight
reduce swelling
prevent stretch marks
not gain weight
relieve back pain

IN
sleep
know baby gender
know boy or girl
gain weight
know gender of baby
increase weight
take care

Conclusion

Philosophers speculated about a tool called “cerebroscope,” a mythical device that would display a person’s thoughts on a screen, people have been looking for tools to expose the workings of human nature. Google Trends data is one of such tool which is an anonymous, categorized, and unbiased sample of Google search data. It tracks trillions of searches per year,making it one of the most useful, real-time data indicators of human interest by region and category. Google Trends is most often used to understand brand health and monitor changes in consumer interests along competitive metrics and factors such as seasonality. Search engine query data offer insights into our life on the smallest possible scale of individual actions. In order to investigate whether Internet search volume is correlated with another aspect of our life, I used search volume data provided by the search engine Google. I tried to focus on the incredible amount of information about the localized behavior we can get from Google Trends. We got answers to questions we never asked from people we never considered. On top of that, we got information on historic behavior - we can’t ask panels how they felt many years ago!

One of the reasons why I preferred to use Google Trends as my source for information instead of the standard surveys or focus groups is the fact that we are leveraging the largest panel in the world (the internet). It’s honest, trusted and not influenced/skewed. Google Trends is arguably the best publicly available data we have. I say more trusted because it is default search engine we all use. If people were using the different search engine then this kind of analysis would have been difficult. I admire Google as a company for keeping this kind of data publicly available for free. Taking the meaning of open [ https://googleblog.blogspot.com/2009/12/meaning-of-open.html ] to next level. Due to this openness, anyone can do analysis with one’s own interest. Ideas can come from anyone. Data analysis is no longer restricted elite group of researchers and academias. Opportunities are endless. For many people, Google is more than just a simple search engine - it’s one of their closest confidants. The evidence is provided by Jeremy Ginsberg that Google Trends data can be used to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, an estimate of weekly influenza activity can be reported.

This kind of data analysis can answer taboo subjects like what percent of American men are gay? or issues related to child sexual exploitation or child abuse can be analyzed more reliably because the Internet is the first thing we reach out. The advantage of this data source, of course, is that most people are making these searches in private. Google Trends offer an unprecedented peek into people’s psyches. people can unburden themselves of some wish or fear without a real person reacting in dismay. Most important thing is we should ask the right questions.

Analytics Master’s Research Project

Chirag Vithalani

March 18, 2018

Data analysis using Google Trends

Objective

Data gathering

Bitcoin price v/s “Bitcoin” search

Step 1: Get bitCoin price history

Step 2: Get Bitcoin Google trends data

Step 3: Plotting

Summary

Movie search trends v/s Lifetime Gross

Summary

How to…search analysis(USA)

Make GIF image

How to queries asked since more than four years

How to queries asked only in the year 2012

How to queries asked only in the year 2016

How to queries asked only in the year 2017

Summary

Unique How to..search(World): Editor’s choice.

How to handle wife

Northern Lights

How to spell & pronounce

Misspelled words

Story behind “how to spell sound of sniff”

How to pronounce searches in United States

Quit smoking

Sons vs. Daughters : Gender Bias

Giftedness

Overweight

What pregnant Women Want

Can pregnant women eat ____?

How to____during pregnancy

Conclusion