How to… search analysis (USA)

We frequently search “how to …” on Google, so here is an analysis of which “how to” queries we search for the most.

To get the “how to” queries, we use the gtrendsR package. We query the term “how to” for each year from 2012 to 2017 and show the most searched terms as a word cloud.

library(gtrendsR)    # gtrends()
library(wordcloud2)  # wordcloud2()
library(data.table)  # used later for the anti-joins

# Fetch the queries related to "How to" in the US for one time window
getYearTrends <- function(timeline)
{
  trends <- gtrends("How to", geo = "US", time = timeline)
  return(trends$related_queries)
}

# Draw a word cloud of the top and rising "How to" queries for one time window
gTrendswordcloud <- function(timeline)
{
  howTo <- getYearTrends(timeline)

  # "subject" holds the score as text: strip "%" and ",", map "<1" to 0 and "Breakout" to 9999
  howTo$subjectNew <- gsub('%', '', howTo$subject)
  howTo$subjectNew[howTo$subjectNew == '<1'] <- 0
  howTo$subjectNew[howTo$subjectNew == 'Breakout'] <- 9999
  howTo$subjectNew <- as.numeric(gsub(',', '', howTo$subjectNew))

  # Rescale rising scores to 0-100 so they sit on the same scale as the top scores
  rising <- subset(howTo, related_queries == 'rising')
  rising$subjectNew <- rising$subjectNew * 100 / max(rising$subjectNew)
  top <- subset(howTo, related_queries == 'top')
  howTo <- rbind(rising, top)

  # Keep scores above 1 and shape the data as wordcloud2 expects (word, freq)
  howTo$subjectNew <- as.integer(howTo$subjectNew)
  howToTop <- subset(howTo, subjectNew > 1, select = c("value", "subjectNew"))
  colnames(howToTop)[2] <- "freq"
  howToTop <- howToTop[order(-howToTop$freq), ]
  wordcloud2(data = howToTop)
}

# Example: gTrendswordcloud("2017-01-01 2017-12-31")
Figure: word clouds of the top “How to” queries for 2012, 2013, 2014, 2015, 2016 and 2017.

Make GIF image

The yearly word clouds were combined into an animated GIF using ezgif.com.

The result shows the most searched “How to…” queries on Google. The cloud gives greater prominence to queries with higher search scores.
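The score handling inside gTrendswordcloud() is easiest to follow on a small table. The sketch below uses a few hypothetical rows shaped like the related_queries columns the function reads (subject, related_queries, value); the scores and queries are made up and no API call is made.

```r
# Hypothetical rows shaped like gtrendsR related_queries output (values made up)
howTo <- data.frame(
  subject = c("100", "45", "<1", "+250%", "+1,500%", "Breakout"),
  related_queries = c("top", "top", "top", "rising", "rising", "rising"),
  value = c("how to tie a tie", "how to draw", "how to knit",
            "how to vote", "how to screenshot", "how to buy bitcoin"),
  stringsAsFactors = FALSE
)

# Same cleaning as gTrendswordcloud(): strip "%" and ",", map "<1" to 0 and "Breakout" to 9999
score <- gsub("%", "", howTo$subject)
score[score == "<1"] <- "0"
score[score == "Breakout"] <- "9999"
howTo$score <- as.numeric(gsub(",", "", score))

# Rescale rising scores to 0-100 relative to the largest rising score
isRising <- howTo$related_queries == "rising"
howTo$score[isRising] <- howTo$score[isRising] * 100 / max(howTo$score[isRising])

# Truncate to integers and keep scores above 1, as the word cloud function does
howTo$score <- as.integer(howTo$score)
cloudData <- howTo[howTo$score > 1, c("value", "score")]
names(cloudData)[2] <- "freq"
cloudData <- cloudData[order(-cloudData$freq), ]
print(cloudData)
```

Here “Breakout” is the largest rising score, so it is rescaled to 100, while “<1” becomes 0 and is dropped by the filter.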

library(data.table)
tr2012<-getYearTrends("2012-01-01 2012-12-31")
tr2013<-getYearTrends("2013-01-01 2013-12-31")
tr2014<-getYearTrends("2014-01-01 2014-12-31")
tr2015<-getYearTrends("2015-01-01 2015-12-31")
tr2016<-getYearTrends("2016-01-01 2016-12-31")
tr2017<-getYearTrends("2017-01-01 2017-12-31")

all<-rbind(tr2012,tr2013,tr2014,tr2015,tr2016,tr2017)
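The six calls above differ only in the year, so the time windows can be generated instead of typed. A minimal sketch: only the string building runs offline; the commented lines assume gtrendsR is loaded and the API is reachable, and trendsByYear is a hypothetical name.

```r
# One "YYYY-01-01 YYYY-12-31" window per year, the format passed to getYearTrends()
years <- 2012:2017
timelines <- sprintf("%d-01-01 %d-12-31", years, years)
print(timelines)

# With gtrendsR loaded and network access, the six calls above could become:
# trendsByYear <- setNames(lapply(timelines, getYearTrends), years)
# all <- do.call(rbind, trendsByYear)
```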


“How to” queries appearing more than four times from 2012 to 2017

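library(knitr)       # kable()
library(kableExtra)  # kable_styling() and the %>% pipe

# Count how often each query appears across the six yearly result sets
df <- as.data.frame(table(all$value))
df <- subset(df, Freq > 4)
df[order(-df$Freq), ]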
##                           Var1 Freq
## 10            how to boil eggs    6
## 70   how to solve a rubix cube    6
## 74            how to tie a tie    6
## 75    how to train your dragon    6
## 26                 how to draw    5
## 34 how to get away with murder    5
kable(df, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
    Var1                         Freq
10  how to boil eggs             6
26  how to draw                  5
34  how to get away with murder  5
70  how to solve a rubix cube    6
74  how to tie a tie             6
75  how to train your dragon     6
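table(all$value) counts rows, so a query listed as both a top and a rising query in the same year adds two to its count. A minimal sketch on made-up yearly lists (standing in for tr2012 to tr2017) shows the difference between counting rows and counting distinct years:

```r
# Made-up yearly query lists, standing in for tr2012 to tr2017
yearly <- list(
  "2012" = c("how to tie a tie", "how to boil eggs", "how to tie a tie"),
  "2013" = c("how to tie a tie", "how to boil eggs"),
  "2014" = c("how to tie a tie", "how to draw"),
  "2015" = c("how to boil eggs"),
  "2016" = c("how to tie a tie", "how to boil eggs", "how to vote"),
  "2017" = c("how to boil eggs", "how to draw")
)

# Counting rows, as table(all$value) does: a repeat within one year counts twice
rowCounts <- table(unlist(yearly))

# Counting distinct years: de-duplicate each year before counting
yearCounts <- table(unlist(lapply(yearly, unique)))

names(rowCounts)[rowCounts > 4]    # "how to boil eggs" "how to tie a tie"
names(yearCounts)[yearCounts > 4]  # "how to boil eggs"
```

“how to tie a tie” passes the row-count threshold only because it is listed twice in 2012.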

How to queries asked only in the year 2012

get2012uniqueQueries <- function(){
  # Anti-join: keep 2012 queries whose value appears in no other year.
  # as.data.table() works on a copy, unlike setDT(), which converts tr2012 in place
  allExcept2012 <- rbind(tr2013, tr2014, tr2015, tr2016, tr2017)
  only2012 <- as.data.table(tr2012)[!allExcept2012, on = "value"]
  only2012df <- as.data.frame(only2012$value)
  colnames(only2012df) <- "Year 2012 Unique queries"
  only2012df <- unique(only2012df)
  kable(only2012df, "html") %>%
    kable_styling(bootstrap_options = "striped", full_width = F)
}

How to queries asked only in the year 2016

get2016uniqueQueries <- function(){
  # Anti-join: keep 2016 queries whose value appears in no other year
  allExcept2016 <- rbind(tr2012, tr2013, tr2014, tr2015, tr2017)
  only2016 <- as.data.table(tr2016)[!allExcept2016, on = "value"]
  only2016df <- as.data.frame(only2016$value)
  colnames(only2016df) <- "Year 2016 Unique queries"
  only2016df <- unique(only2016df)
  kable(only2016df, "html") %>%
    kable_styling(bootstrap_options = "striped", full_width = F)
}

How to queries asked only in the year 2017

get2017uniqueQueries <- function(){
  # Anti-join: keep 2017 queries whose value appears in no other year
  allExcept2017 <- rbind(tr2012, tr2013, tr2014, tr2015, tr2016)
  only2017 <- as.data.table(tr2017)[!allExcept2017, on = "value"]
  only2017df <- as.data.frame(only2017$value)
  colnames(only2017df) <- "Year 2017 Unique queries"
  only2017df <- unique(only2017df)
  kable(only2017df, "html") %>%
    kable_styling(bootstrap_options = "striped", full_width = F)
}
Year 2012 Unique queries
how to make out
how to cite
how to make a bow
how to tie a bow tie
how to unlock iphone 4
how to jailbreak iphone 4s
how to make moonshine
how to deactivate facebook
how to get rid of blackheads
how to get rid of hickeys
how to fall asleep fast
how to make guacamole
how to do a waterfall braid
Year 2016 Unique queries
how to pronounce
how to vote
how to be single
how to register to vote
how to get rid of acne
how long to cook a turkey
how to use snapchat
how to play pokemon go
how to make pancakes
how to draw a rose
how to cook asparagus
how to battle pokemon go
Year 2017 Unique queries
how to screenshot
how to factor
how to gain weight
how to cook rice
how to make slime without glue
how to buy bitcoin
how to make fluffy slime
how to get rid of fruit flies
how to make scrambled eggs
how to watch mayweather vs mcgregor
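The three functions that produce these lists are the same anti-join with a different year left out. A minimal base-R sketch of that idea, swapping the data.table not-join for setdiff() and taking the year as an argument; the queries and the helper uniqueToYear() are made up for illustration:

```r
# Made-up queries per year, standing in for the tr2012 to tr2017 results
byYear <- list(
  "2012" = c("how to tie a tie", "how to make moonshine"),
  "2016" = c("how to tie a tie", "how to vote", "how to vote"),
  "2017" = c("how to tie a tie", "how to buy bitcoin")
)

# Hypothetical helper: queries seen in `year` and in no other year.
# setdiff() also drops duplicates, like unique() in the original functions
uniqueToYear <- function(byYear, year) {
  others <- unlist(byYear[names(byYear) != year])
  setdiff(byYear[[year]], others)
}

uniqueToYear(byYear, "2016")  # "how to vote"
uniqueToYear(byYear, "2012")  # "how to make moonshine"
```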

Summary

  • It is good to see “how to vote” and “how to register to vote” in 2016, which was a US presidential election year.
  • We keep asking “how to tie a tie”, “how to boil eggs” and “how to draw” year after year.

Unique “How to…” searches (World): Editor’s choice

Here are some interesting queries from the last 12 months.

howToData <- read.csv("https://raw.githubusercontent.com/chirag-vithlani/Capstone/master/data/How_to_Interesting.csv")
howToDataSubSet<-subset(howToData, select = c(1, 4))
colnames(howToDataSubSet)[2] <- "Country"

kable(howToDataSubSet, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
Topic                                        Country
how to make paper flowers                    Bhutan
how to take pictures of northern lights      Iceland
how to become good teacher                   India
how to get twins                             Kenya
how to hack facebook                         Myanmar
how to make carrot oil                       Nigeria
how to handle wife                           Pakistan
how to identify AIDS                         Sri Lanka
how science is trying to help us eat better  Israel
how to make solar system                     Jamaica
how to make a girl like you                  Solomon Islands
library(plotly)  # plot_geo(), add_trace(), layout(), hide_colorbar()

# Build the map data from the CSV loaded above (not toy data)
LAND_ISO <- howToData$Country
value <- howToData$val
topic <- howToData$Topic
data <- data.frame(LAND_ISO, value, topic)

g <- list(scope = 'world')

# World map of the countries, with each country's query as hover text
plot_geo(data) %>%
  add_trace(
    z = ~value, locations = ~LAND_ISO,
    colors = c("yellow", "red", "cyan", "magenta", "limegreen", "blue", "violet"),
    text = ~topic
  ) %>%
  layout(geo = g) %>%
  hide_colorbar()

How to handle wife

Of the unique “How to” queries above, I found “How to handle wife” both funny and serious. It hints at gender inequality, and I would expect places where this query is common to have high gender inequality. So here we find the top five countries searching for it.

howToHandleWifeSearch<-gtrends("how to handle wife", time = "today 12-m")

howToHandleWifeSearchHead<-head(howToHandleWifeSearch$interest_by_country,5)
howToHandleWifeSearchHead<-subset(howToHandleWifeSearchHead, select = c(1, 2))
colnames(howToHandleWifeSearchHead)[2] <- "Percentage of Hits"


kable(howToHandleWifeSearchHead, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
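location              Percentage of Hits
Pakistan              100
Sri Lanka             84
United Arab Emirates  64
India                 43
Bangladesh            33

(Despite the column name, Google Trends reports these as relative scores, not percentages: 100 marks the country where the query makes up the largest share of all searches, and the other countries are scaled against it.)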
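A simplified sketch of how such relative country scores can be formed, assuming Trends-style scaling in which each country's share of searches is divided by the largest share and multiplied by 100. Countries A to D and their shares are made up.

```r
# Made-up shares: fraction of all searches in each country that match the query
share <- c(A = 0.0020, B = 0.0016, C = 0.0010, D = 0.0005)

# Simplified Trends-style scaling: the country with the highest share gets 100
score <- round(100 * share / max(share))
print(score)  # A 100, B 80, C 50, D 25
```

Read this way, India's 43 means its share of searches for the term is roughly 43% of Pakistan's share, not 43% of all hits.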

Northern Lights

The northern lights are a natural light display in the Earth’s sky, seen predominantly in high-latitude regions such as Iceland. That is why people in Iceland search for “how to take pictures of northern lights”. This was the most amazing thing I learned while working on this project.


Source: Wikipedia


How to spell & pronounce

Google Trends is a unique and useful tool for keeping track of what people want to know. When people search on Google they misspell words, and collectively Google can tell which words are misspelled most often. Similarly, people use Google to check the spelling of a difficult word with queries like “how to spell …”.

Misspelled words

An article titled “Google reveals top ‘how to spell’ searches by Canadian province” gave an interesting analysis of which words Canadians misspell. The Google Trends Twitter account shared a similar analysis for the United States.

Story behind “how to spell sound of sniff”

If we query “how to spell” right now, the top Google Trends result in the United States is “how to spell the sound of a sniff”. It is a strange result, but there is a story behind it. Macaroni Tony, Twitter handle @BigBeard_Ali, who has around 14K followers, posted a tweet on 11 Feb 2018.

The tweet got around 6K retweets and 10K likes, and it led people to search for “how to spell sound of sniff”. I could see that this query picked up only after the tweet. It is amazing that a single tweet offering a small amount of prize money can have such a ripple effect.

How to pronounce searches in United States

Inspired by the above analysis, I decided to do a “how to pronounce” analysis. It would be fun to see which words people find difficult to pronounce.

State  Top query                Breakout query
AL     how to pronounce gif     how to pronounce gif
AZ     how to pronounce acai    how to pronounce gif
CA     how do you pronounce     how to pronounce khalid
CO     how to pronounce pho     how to pronounce qatar
CT     how to pronounce gif     how to pronounce acai
FL     how do you pronounce     how to pronounce xxxtentacion
GA     how to pronounce gyro    how to pronounce sza
HI     how to pronounce acai    how to pronounce acai
IL     how to pronounce names   how to pronounce sza
IN     how to pronounce gif     how to pronounce acai
IA     how to pronounce gif     how to pronounce gif
KS     how to pronounce gyro    how to pronounce gif
MD     how to pronounce acai    how to pronounce acai
MA     how to pronounce gyro    how to pronounce nguyen
MI     how to pronounce gyro    how to pronounce sza
MN     how to pronounce names   how to pronounce qatar
MS     how to pronounce gyro    how to pronounce gyro
MO     how to pronounce acai    how to pronounce sza
NV     how to pronounce nevada  how to pronounce gyro
NJ     how to pronounce acai    how to pronounce nguyen
NY     how do you pronounce     how to pronounce xxxtentacion
NC     how to pronounce names   how to pronounce pho
OH     how to pronounce gyro    how to pronounce sza
OR     how to pronounce gyro    how to pronounce gyro
PA     how to pronounce gyro    how to pronounce sza
TX     how do you pronounce     how to pronounce khalid
UT     how to pronounce gif     how to pronounce gif
VA     how to pronounce names   how to pronounce pho
WA     how to pronounce gif     how to pronounce sza
WI     how to pronounce gif     how to pronounce gif
DC     how to pronounce gyro    how to pronounce gyro
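As a quick check on the table, a minimal sketch that tallies in how many states each word is the breakout query. The vector is copied from the table above, with the common “how to pronounce” prefix dropped.

```r
# Breakout query per state, copied from the table (prefix "how to pronounce" dropped)
breakout <- c(AL = "gif", AZ = "gif", CA = "khalid", CO = "qatar", CT = "acai",
  FL = "xxxtentacion", GA = "sza", HI = "acai", IL = "sza", IN = "acai",
  IA = "gif", KS = "gif", MD = "acai", MA = "nguyen", MI = "sza",
  MN = "qatar", MS = "gyro", MO = "sza", NV = "gyro", NJ = "nguyen",
  NY = "xxxtentacion", NC = "pho", OH = "sza", OR = "gyro", PA = "sza",
  TX = "khalid", UT = "gif", VA = "pho", WA = "sza", WI = "gif", DC = "gyro")

# Number of states in which each word is the breakout query, most common first
counts <- sort(table(breakout), decreasing = TRUE)
print(counts)
```

SZA is the breakout query in seven states and “gif” in six.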

In the last twelve months, the words people found most difficult to pronounce were people’s names.

  • First is the singer-songwriter “SZA”,
  • followed by “Wonder Woman” star “Gal Gadot”,
  • followed by the American rapper “XXXTentacion”.

Below is a state-by-state breakdown showing which word each state finds difficult to pronounce.

The list also contains food items such as ‘pho’, ‘gyro’ and ‘açaí’. The map was created with mapchart.net. Worldwide, the top query is “how to pronounce pyeongchang”, the host of the 2018 Winter Olympics and the 2018 Winter Paralympics.


Quit smoking

Every new year we make resolutions, and we keep making the same ones: each year we search for how to lose weight and how to quit smoking. These searches peak around January every year.
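The January peak can be checked directly, without a model, by averaging interest per calendar month. A minimal sketch: the weekly series below is made up (90 in January, 60 otherwise) and stands in for the trends_us data frame used in the forecasting code, with the same date and hits columns.

```r
# Made-up weekly series standing in for trends_us (columns date and hits):
# interest of 90 in January weeks and 60 otherwise
dates <- seq(as.Date("2015-01-04"), as.Date("2018-04-29"), by = "week")
hits <- ifelse(format(dates, "%m") == "01", 90, 60)

# Average interest per calendar month; the peak month has the highest mean
monthlyMean <- tapply(hits, format(dates, "%m"), mean)
names(which.max(monthlyMean))  # "01"
```

With the real data the same call would be tapply(trends_us$hits, format(trends_us$date, "%m"), mean), assuming hits comes back as a numeric column.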

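library(prophet)  # prophet(), make_future_dataframe(), prophet_plot_components()

# US web-search interest in "quit smoking"; [[1]] picks interest_over_time from the result
trends_us <- gtrends("quit smoking", geo = "US", gprop = "web", time = "2015-01-01 2018-04-30")[[1]]

# prophet expects two columns: ds (date) and y (value)
forcase <- trends_us[, c("date", "hits")]
colnames(forcase) <- c("ds", "y")
m <- prophet(forcase)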
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -4.16404
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
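# Extend the dates 365 days past the data and predict over history plus the future
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
plot(m, forecast)

# Trend and seasonal components of the fitted model
prophet_plot_components(m, forecast)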

The first plot above shows the forecast for the “quit smoking” query, made with prophet, Facebook’s time series forecasting package; it predicts the January peak repeating in January 2019. The second plot breaks the same forecast into its components. As we can see, the overall trend has been declining over the years. Somehow, we search for “quit smoking” more on weekdays.


Sons vs. Daughters: Gender Bias

In the 21st century we would like to think that we treat boys and girls the same, at least in the United States, which is arguably one of the most liberal countries in the world. We would like to think that American parents have similar standards and similar dreams for their sons and daughters, without any gender bias. But research by Seth Stephens-Davidowitz suggests that this is not the case: Google searches show that parents have different concerns for male and female children. As he points out, and as the Google Trends data below also shows, parents expect (perhaps unknowingly) boys to be smarter and girls to be thinner. They are more excited by the intellectual potential of their sons and more concerned about the weight and appearance of their daughters.

library(plotly)  # plot_ly(), layout()

giftedQuery <- function(query1, query2, timespan)
{
  # Request both queries in one call so Google Trends scales them against the same peak;
  # series fetched in separate calls are each scaled to their own peak of 100
  res <- gtrends(c(query1, query2), geo = "US", time = timespan)$interest_over_time
  res$hits <- as.numeric(sub("<1", "0", res$hits))  # very low interest is reported as "<1"
  avg <- tapply(res$hits, res$keyword, mean)

  plot_ly(
    x = c(query1, query2),
    y = c(avg[[query1]], avg[[query2]]),
    type = "bar"
  ) %>%
    layout(title = 'Gender bias clearly visible',
           xaxis = list(title = paste(query1, query2, sep = " <b>v/s</b> ")),
           yaxis = list(title = 'Average of search'))
}

giftedQuery('is my daughter gifted','is my son gifted','today 12-m')
giftedQuery('is my daughter overweight','is my son overweight','2017-01-01 2017-12-31') 
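Google Trends scales the results of each request so that its highest point is 100, so an average from a single-query request describes that query relative to its own peak. A toy illustration with made-up raw counts, ignoring the division by total search volume that Google also applies, shows why comparing two averages needs both queries in the same request:

```r
# Made-up raw weekly search counts; the son query is searched twice as often
son      <- c(200, 400, 300, 100)
daughter <- c(100, 200, 150,  50)

# Scaled separately (each request peaks at 100), the two series look identical
sepSon      <- 100 * son / max(son)
sepDaughter <- 100 * daughter / max(daughter)

# Scaled together (one request, one shared peak), the 2:1 gap survives
jointMax      <- max(son, daughter)
jointSon      <- 100 * son / jointMax
jointDaughter <- 100 * daughter / jointMax

ratios <- c(separate = mean(sepSon) / mean(sepDaughter),
            joint    = mean(jointSon) / mean(jointDaughter))
print(ratios)  # separate 1, joint 2
```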

Giftedness

Almost all parents like to believe that their child has a special talent. But as the graph shows, people in the US searched for “is my son gifted” more often than for “is my daughter gifted”, a difference of almost 50%.
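A percentage difference depends on which query is the baseline, so a figure like “almost 50%” is worth pinning down. A small worked example with hypothetical averages, not the real results:

```r
# Hypothetical average interest values, not the real results
sonAvg <- 60
daughterAvg <- 40

# The same gap expressed against each baseline
moreThanDaughter <- 100 * (sonAvg - daughterAvg) / daughterAvg  # 50: son searched 50% more
lessThanSon <- 100 * (sonAvg - daughterAvg) / sonAvg            # about 33.3: daughter searched a third less
c(moreThanDaughter, lessThanSon)
```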

Unfortunately, this is despite the fact that in real life, at least in language skills, the opposite is true. As David Walsh, an American psychologist who specializes in parenting, points out: “Girls talk earlier than boys, have larger pre-school vocabularies, and use more complex sentence structures. Once in school, girls are one to one-and-a-half years ahead of boys in reading and writing. Boys are twice as likely to have a language or reading problem and three to four times more likely to stutter. Girls do better on tests of verbal memory, spelling and verbal fluency. On average, girls utter two to three times more words per day than boys and even speak faster—twice as many words per minute.”

Overweight

Childhood obesity has undoubtedly become one of the most complex public health problems facing future generations. Parents search “Is my daughter overweight?” almost twice as often as they search “Is my son overweight?”. Clearly, parents worry more about overweight girls than overweight boys. This is despite the fact that, in reality, boys are more often overweight than girls.


What Pregnant Women Want

Unlike “what women want”, which no one can claim to know, we think we have a good idea of “what pregnant women want”: cravings for pickles and chocolate, avoiding wine, nausea and stretch marks, and a supportive husband. Google Trends can shed more light on this question, and we can expect different answers from different parts of the world. We can start with questions about what pregnant women can safely do. The top questions women ask in the United States are whether pregnant women can “eat shrimp, tuna, fish or crab”, “drink wine, coffee or tea” or “take Tylenol”. There are also (comparatively) less frequently asked questions, such as whether they can “fly”, “take a bath” or “paint”.

Can pregnant women eat ____?

# Top three "can pregnant women eat ..." completions for one country (geo1) and time range
pregnantWomenEat <- function(query1, geo1, time1)
{
  res <- gtrends(query1, geo = geo1, time = time1)
  topQueries <- subset(res$related_queries, related_queries == 'top')$value

  # Keep queries starting with the prefix, then strip it and the leftover space
  eatQueries <- grep('^can pregnant women eat', topQueries, value = TRUE)
  eatQueries <- trimws(sub("can pregnant women eat", "", eatQueries))

  # Return a one-column data frame named after the country code
  # (the original assignment was the last expression, so geo1 was returned instead)
  df1 <- data.frame(head(eatQueries, 3))
  names(df1) <- geo1
  df1
}
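To see what the filtering inside pregnantWomenEat() does, here it is applied to a few made-up “top” related queries standing in for the gtrends() result:

```r
# Made-up "top" related queries, standing in for the gtrends() result
topQueries <- c("can pregnant women eat shrimp", "pregnant women diet",
                "can pregnant women eat tuna", "can pregnant women eat crab",
                "can pregnant women eat sushi")

# Keep queries that start with the prefix, then strip it and the leftover space
eatQueries <- grep("^can pregnant women eat", topQueries, value = TRUE)
eatQueries <- trimws(sub("can pregnant women eat", "", eatQueries))
head(eatQueries, 3)  # "shrimp" "tuna" "crab"
```

The ^ anchor drops queries that merely contain the phrase, and head() keeps the order in which Trends ranked them.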

But if we ask the same question in other countries, the results look neither like the United States nor like one another. Whether pregnant women can “drink wine” is not among the top 10 questions in Canada, Australia or the United Kingdom. Australia’s unique concern is mostly about eating cream cheese. The differences in questions have less to do with what is actually safe and more to do with information coming from different sources in each country, including old stories, local customs and neighborhood gossip.

We see another clear difference in the top searches for “how to ___ during pregnancy”. In the United States, the top searches are about gaining or losing weight, whereas in India women are more concerned about how to sleep or about the weight of the baby. In India it is illegal to determine the sex of a baby before birth, which is why one of the top queries asks for tricks to know the baby’s gender.

While the cultural manifestations of pregnancy differ, the physical experience tends to be similar everywhere. I tested how often various symptoms were searched in combination with the word “pregnant”: for example, how often is “pregnant” searched together with “nausea”, “back pain” or “constipation”? Canada’s symptoms were very close to those in the United States, and the symptoms in Britain, Australia and India were all roughly similar, too. Preliminary evidence suggests that no part of the world has stumbled upon a diet or environment that drastically reduces a pregnancy symptom. We can extend this analysis to what expectant fathers search for: in the United States, the top searches include “be nice to me my wife is pregnant shirt”.

How to ____ during pregnancy

# Top seven "how to ... during pregnancy" queries for one country (geo1) in 2017
howTopregnantWomen <- function(query1, geo1)
{
  res <- gtrends(query1, geo = geo1, time = '2017-01-01 2017-12-31')
  topQueries <- subset(res$related_queries, related_queries == 'top')$value

  # Keep "how to ... during pregnancy" queries and strip the fixed words around the middle part
  howTo <- grep('during pregnancy', topQueries, value = TRUE)
  howTo <- grep('^how to', howTo, value = TRUE)
  howTo <- trimws(gsub("during pregnancy", "", gsub("how to ", "", howTo)))

  df1 <- data.frame(head(howTo, 7))
  names(df1) <- geo1

  kable(df1, "html") %>%
    kable_styling(bootstrap_options = "striped", full_width = F)
}
US
gain weight
sleep
lose weight
reduce swelling
prevent stretch marks
not gain weight
relieve back pain
IN
sleep
know baby gender
know boy or girl
gain weight
know gender of baby
increase weight
take care
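howTopregnantWomen() strips the fixed words with two separate substitutions. For queries that end in “during pregnancy”, a single regular expression with a capture group does the same in one step; a sketch on made-up queries:

```r
# Made-up related queries
queries <- c("how to sleep during pregnancy", "how to gain weight during pregnancy",
             "foods to avoid during pregnancy", "how to lose weight")

# Keep only "how to ... during pregnancy" and capture the middle part
pattern <- "^how to (.*) during pregnancy$"
matched <- grep(pattern, queries, value = TRUE)
result <- sub(pattern, "\\1", matched)
result  # "sleep" "gain weight"
```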

Conclusion

Ever since philosophers speculated about a “cerebroscope”, a mythical device that would display a person’s thoughts on a screen, people have been looking for tools that expose the workings of human nature. Google Trends data is one such tool: an anonymous, categorized and unbiased sample of Google search data. It tracks trillions of searches per year, making it one of the most useful real-time indicators of human interest by region and category. Google Trends is most often used to understand brand health and to monitor changes in consumer interest along competitive metrics and factors such as seasonality. Search query data offer insights into our lives at the smallest possible scale: individual actions. To investigate whether internet search volume is correlated with other aspects of our lives, I used search volume data provided by Google. I tried to focus on the incredible amount of information about localized behavior that we can get from Google Trends. We got answers to questions we never asked, from people we never considered. On top of that, we got information on historical behavior; we can’t ask panels how they felt many years ago!

One reason I preferred Google Trends as my source of information over standard surveys or focus groups is that it leverages the largest panel in the world: the internet. It is honest, trusted and not influenced or skewed. Google Trends is arguably the best publicly available data we have. I call it trusted because Google is the default search engine most of us use; if people spread their searches across many different search engines, this kind of analysis would be much harder. I admire Google as a company for keeping this kind of data publicly available for free, taking the meaning of open [ https://googleblog.blogspot.com/2009/12/meaning-of-open.html ] to the next level. Thanks to this openness, anyone can do analysis driven by their own interests. Ideas can come from anyone; data analysis is no longer restricted to an elite group of researchers and academics. The opportunities are endless. For many people, Google is more than a simple search engine: it is one of their closest confidants. Jeremy Ginsberg and colleagues showed that Google search query data can be used to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, an estimate of weekly influenza activity can be reported.
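The flu-tracking idea comes down to regressing one rate on another. Ginsberg and colleagues fit a linear model between the log-odds of an influenza-like-illness physician visit and the log-odds of the fraction of flu-related queries. A minimal sketch of that kind of fit on made-up numbers, using qlogis() and plogis() as the log-odds transform and its inverse; the visit shares are generated from a known line, so the fit simply recovers it:

```r
# Made-up weekly fractions of searches that are flu-related (Q)
Q <- c(0.001, 0.002, 0.004, 0.008, 0.006, 0.003)
# Made-up share of physician visits for influenza-like illness (P),
# generated from a known line on the log-odds scale
P <- plogis(-1 + 0.9 * qlogis(Q))

# Linear model on the log-odds scale: logit(P) = b0 + b1 * logit(Q)
fit <- lm(qlogis(P) ~ qlogis(Q))
coef(fit)

# Estimate the visit share for a new week from its query fraction alone
estimate <- plogis(predict(fit, newdata = data.frame(Q = 0.005)))
estimate
```

With real data the points would scatter around the line, and the fitted coefficients would turn a week's query fraction into an activity estimate before physician reports arrive.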

This kind of data analysis can shed light on taboo subjects, such as what percentage of American men are gay, and issues such as child sexual exploitation or child abuse can be analyzed more reliably, because the internet is the first place we turn to. The advantage of this data source, of course, is that most people make these searches in private. Google Trends offers an unprecedented peek into people’s psyches: people can unburden themselves of a wish or fear without a real person reacting in dismay. Most important of all, we should ask the right questions.