library(tidyverse)
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.2.1     ✓ purrr   0.3.3
✓ tibble  2.1.3     ✓ dplyr   0.8.4
✓ tidyr   1.0.2     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.4.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter()  masks stats::filter()
x purrr::flatten() masks rtweet::flatten()
x dplyr::lag()     masks stats::lag()
library(DT)
library(trendyy)                # Package to access google search data
library(lubridate)               # Handles dates and times

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date

The theory is that what people search for reveals their interests in a way that might be more honest than a survey.

Google Trends has a friendly web interface here: https://trends.google.com/

To start, let’s see how people search for the word “flu” with trendy(), and then glimpse() it.

flu <- trendy("flu")

flu %>% 
  get_interest() %>% 
  glimpse()
Observations: 260
Variables: 7
$ date     <dttm> 2015-03-01, 2015-03-08, 2015-03-15, 2015-03-22, 2015-03-29, 2015-04-05, 2015-04-12, 2015-04-19, 2015-04-26,…
$ hits     <int> 34, 25, 21, 17, 16, 18, 17, 16, 13, 13, 12, 12, 11, 13, 11, 10, 9, 9, 9, 9, 9, 9, 9, 10, 10, 11, 12, 14, 16,…
$ geo      <chr> "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", …
$ time     <chr> "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "tod…
$ keyword  <chr> "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "flu", "fl…
$ gprop    <chr> "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "we…
$ category <chr> "All categories", "All categories", "All categories", "All categories", "All categories", "All categories", …
flu %>% 
  get_interest() %>% 
  datatable()

Graph it:

flu %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line()

To improve the graph, use theme_minimal and include a title:

flu %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line() +
  theme_minimal() +
  labs(title = "google search hits on 'flu'")

We might want to look at repeating seasonal trends, collapsing across different years. Here’s how to do that:

  1. Create a new variable by extracting the month from the date variable with the lubridate package’s month() function.
  2. Use group_by() on the new month variable to collect the months together.
  3. Get the mean number of search hits for each month.

In addition, I used a datatable with the pageLength option set to 12, so all twelve months fit on one page, and formatRound() to round the 2nd column to 1 decimal place.

flu %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%         # Create a new variable called month
  group_by(month) %>%                     # Combine the months across different weeks and years
  summarize(hits_per_month = mean(hits)) %>%         # Get average number of searches per month
  datatable(options = list(pageLength = 12)) %>% 
  formatRound(2, 1)


Instead of sending that to a table, we can send it to a graph. We use x = month and y = hits_per_month. The last line scale_x_discrete(limits = c(1:12)) isn’t necessary, but it ensures that the number for every month shows on the x-axis.

flu %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%            # Create a new variable called month
  group_by(month) %>%                        # Combine months across weeks and years
  summarize(hits_per_month = mean(hits)) %>%      # Average number of searches for each month
  ggplot(aes(x = month, y = hits_per_month)) +    # graph it
  geom_line() +
  scale_x_discrete(limits = c(1:12))

Locations

Limit the search to the US and to a specific time period.

flu_US <- trendy("flu", geo = "US", from = "2015-06-01", to = "2019-06-01")

Now we can look at different regions within the US. For the US, the regions are states.

flu_US %>%
  get_interest_region() %>% 
  datatable()


DMA stands for Designated Market Area. Each DMA represents a single television market and roughly corresponds to a city. Pipe flu_US into get_interest_dma() and then into datatable() in a chunk below:

flu_US %>%
  get_interest_dma() %>%
  datatable()

Flu season in Australia occurs at a different time from the US, so it might be interesting to compare search rates. In geo, you can put both “US” and “AU” inside the parentheses of c().

flu_countries <- trendy("flu", geo = c("US", "AU"), from = "2015-01-01", to = "2020-01-01")

Create another ggplot line graph, but this time add color = geo, which will show different colored lines for the US and Australia.

Pipe flu_countries to get_interest() and then to ggplot(), with x = date, y = hits, color = geo. Make it a line graph. Do that below:

flu_countries %>%
  get_interest() %>%
  ggplot(aes(x = date, y = hits, color = geo)) +
  geom_line()


To summarize by month like we did before, just add geo to month inside group_by(), and then add color = geo inside ggplot(aes()).

flu_countries %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, geo) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = geo)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'flu' over time, by country")

Multiple keywords

We can also search for multiple keywords. Since depression may have some seasonality, let’s compare flu and depression.

flu_depression <- trendy(c("flu", "depression"), geo = "US")

Graph it as above with a line graph, but this time use color = keyword to have different colored lines for the two search terms. Pipe flu_depression to get_interest() to ggplot, with x = date, y = hits, color = keyword.

flu_depression %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line()


Now graph it by month, as above for AU and US. Start with flu_depression, pipe that to get_interest(), and then use the model for AU vs. US above, but to see the different keyword plots, use group_by(month, keyword), and use color = keyword in the ggplot. Select an appropriate title.

flu_depression %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, keyword) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'flu' and 'depression'")

Searching specific Google properties

You can specify a particular Google site to look at with gprop =. Valid options are “web” (default), “news”, “images”, “froogle”, and “youtube”.

depression_videos <- trendy("how to treat depression", gprop = "youtube")

Graph searches by month:

depression_videos %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "YouTube searches for 'how to treat depression' by month")

Assignment: Look up “psychologist” with trendyy.

psy <- trendy("psychologist")

psy %>% 
  get_interest() %>% 
  glimpse()
Observations: 260
Variables: 7
$ date     <dttm> 2015-03-01, 2015-03-08, 2015-03-15, 2015-03-22, 2015-03-29, 2015-04-05, 2015-04-12, 2015-04-19, 2015-04-26,…
$ hits     <int> 68, 67, 66, 67, 61, 63, 66, 66, 65, 66, 66, 65, 61, 62, 62, 60, 61, 58, 58, 61, 63, 61, 61, 64, 66, 73, 74, …
$ geo      <chr> "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", "world", …
$ time     <chr> "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "today+5-y", "tod…
$ keyword  <chr> "psychologist", "psychologist", "psychologist", "psychologist", "psychologist", "psychologist", "psychologis…
$ gprop    <chr> "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "web", "we…
$ category <chr> "All categories", "All categories", "All categories", "All categories", "All categories", "All categories", …
  1. Create a graph of interest in it over time.

psy %>% 
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line()
  2. Create a graph of monthly interest in it.

psy %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%            # Create a new variable called month
  group_by(month) %>%                        # Combine months across weeks and years
  summarize(hits_per_month = mean(hits)) %>%      # Average number of searches for each month
  ggplot(aes(x = month, y = hits_per_month)) +    # graph it
  geom_line() +
  scale_x_discrete(limits = c(1:12))

psy_US <- trendy("psychologist", geo = "US", from = "2015-06-01", to = "2019-06-01")
psy_US %>%
  get_interest_region() %>% 
  datatable()

  3. Create a datatable of interest by DMA.

  4. Compare US vs. Canadian (CA) interest in psychologist by month, and create a line graph.

psy_countries <- trendy("psychologist", geo = c("US", "CA"), from = "2015-01-01", to = "2020-01-01")
psy_countries %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, geo) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = geo)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'psychologist' over time, by country")

  5. Compare interest in psychologist to psychiatrist over time, and create a line graph.
psy_psych <- trendy(c("psychologist", "psychiatrist"), geo = "US")
psy_psych %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, keyword) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'psychologist' and 'psychiatrist'")

  6. Compare interest in psychologist to psychiatrist in google images over time, and create a line graph.

verses <- trendy(c("psychologist", "psychiatrist"), geo = "US", gprop = "images")

Graph it as above with a line graph, using color = keyword to get different colored lines for the two search terms. Pipe verses to get_interest() and then to ggplot(), with x = date, y = hits, color = keyword.

verses %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line()


  7. Annotate the document and publish it to rpubs.com.

---
title: "Google trends"
output: html_notebook
---
```{r}
library(tidyverse)
library(DT)
library(trendyy)                # Package to access google search data
library(lubridate)               # Handles dates and times
```



The theory is that what people search for reveals their interests in a way that might be more honest than a survey.  

Google Trends has a friendly web interface here: https://trends.google.com/

To start, let's see how people search for the word "flu" with trendy(), and then glimpse() it.


```{r}
flu <- trendy("flu")

flu %>% 
  get_interest() %>% 
  glimpse()
```


```{r}
flu %>% 
  get_interest() %>% 
  datatable()
```


Graph it:
```{r}
flu %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line()
```

To improve the graph, use theme_minimal and include a title:
```{r}
flu %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line() +
  theme_minimal() +
  labs(title = "google search hits on 'flu'")
```






We might want to look at repeating seasonal trends, collapsing across different years. Here's how to do that: 

1. Create a new variable by extracting the month from the date variable with the lubridate package's month() function.  
2. Use group_by() on the new month variable to collect the months together.  
3. Get the mean number of search hits for each month.  

In addition, I used a datatable with the pageLength option set to 12, so all twelve months fit on one page, and formatRound() to round the 2nd column to 1 decimal place.

```{r}
flu %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%         # Create a new variable called month
  group_by(month) %>%                     # Combine the months across different weeks and years
  summarize(hits_per_month = mean(hits)) %>%         # Get average number of searches per month
  datatable(options = list(pageLength = 12)) %>% 
  formatRound(2, 1)

```
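
If you want to check what the three steps above do without querying Google, here is a minimal sketch on a tiny made-up data frame. The name `toy` and its values are invented for illustration; they only mimic the date and hits columns that get_interest() returns.

```{r}
library(dplyr)
library(lubridate)

# Made-up weekly rows standing in for the output of get_interest()
toy <- data.frame(
  date = as.POSIXct(c("2019-01-06", "2019-01-13", "2019-02-03",
                      "2020-01-05", "2020-02-02"), tz = "UTC"),
  hits = c(40, 60, 30, 50, 10)
)

toy_monthly <- toy %>%
  mutate(month = month(date)) %>%            # 1 = January, 2 = February
  group_by(month) %>%                        # pool the same month across years
  summarize(hits_per_month = mean(hits))     # one average per month

toy_monthly
```

With these made-up numbers, January averages 50, since (40 + 60 + 50) / 3 = 50, and February averages 20.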


Instead of sending that to a table, we can send it to a graph. We use x = month and y = hits_per_month. The last line scale_x_discrete(limits = c(1:12)) isn't necessary, but it ensures that the number for every month shows on the x-axis.

```{r}
flu %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%            # Create a new variable called month
  group_by(month) %>%                        # Combine months across weeks and years
  summarize(hits_per_month = mean(hits)) %>%      # Average number of searches for each month
  ggplot(aes(x = month, y = hits_per_month)) +    # graph it
  geom_line() +
  scale_x_discrete(limits = c(1:12))

```
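
Because month() returns plain numbers, month is really a continuous variable, and some versions of ggplot2 warn when continuous limits are passed to a discrete scale. One possible alternative, sketched here on made-up values (the `toy_year` data frame is invented for illustration), is scale_x_continuous(breaks = 1:12), which also puts a label under every month.

```{r}
library(ggplot2)

# Made-up monthly averages, only to have something to draw
toy_year <- data.frame(
  month = 1:12,
  hits_per_month = c(60, 55, 35, 25, 18, 15, 14, 15, 20, 30, 35, 45)
)

p <- ggplot(toy_year, aes(x = month, y = hits_per_month)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)   # one tick per month on a numeric axis

p
```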





### Locations

Limit the search to the US and to a specific time period.


```{r}
flu_US <- trendy("flu", geo = "US", from = "2015-06-01", to = "2019-06-01")
```




Now we can look at different regions within the US. For the US, the regions are states.

```{r}
flu_US %>%
  get_interest_region() %>% 
  datatable()

```



DMA stands for Designated Market Area. Each DMA represents a single television market and roughly corresponds to a city. Pipe flu_US into get_interest_dma() and then into datatable() in a chunk below:

```{r}
flu_US %>%
  get_interest_dma() %>%
  datatable()
```






Flu season in Australia occurs at a different time from the US, so it might be interesting to compare search rates. In geo, you can put both "US" and "AU" inside the parentheses of c().

```{r}
flu_countries <- trendy("flu", geo = c("US", "AU"), from = "2015-01-01", to = "2020-01-01")
```


Create another ggplot line graph, but this time add color = geo, which will show different colored lines for the US and Australia.

Pipe flu_countries to get_interest() and then to ggplot(), with x = date, y = hits, color = geo. Make it a line graph. Do that below:

```{r}
flu_countries %>%
  get_interest() %>%
  ggplot(aes(x = date, y = hits, color = geo)) +
  geom_line()


```







To summarize by month like we did before, just add geo to month inside group_by(), and then add color = geo inside ggplot(aes()).

```{r}
flu_countries %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, geo) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = geo)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'flu' over time, by country")

```
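
To see what grouping by two variables produces, here is a small sketch on made-up rows. The `toy_geo` data frame and its values are invented; they only imitate the date, hits, and geo columns of get_interest(). The result has one row for every month and country combination.

```{r}
library(dplyr)
library(lubridate)

# Made-up rows standing in for get_interest() output for two countries
toy_geo <- data.frame(
  date = as.POSIXct(c("2019-01-06", "2019-01-13", "2019-07-07",
                      "2019-01-06", "2019-07-07", "2019-07-14"), tz = "UTC"),
  hits = c(80, 60, 10, 20, 70, 90),
  geo  = c("US", "US", "US", "AU", "AU", "AU")
)

toy_geo_monthly <- toy_geo %>%
  mutate(month = month(date)) %>%
  group_by(month, geo) %>%                      # one group per month and country
  summarize(hits_per_month = mean(hits)) %>%
  ungroup()

toy_geo_monthly
```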


### Multiple keywords

We can also search for multiple keywords. Since depression may have some seasonality, let's compare flu and depression.

```{r}
flu_depression <- trendy(c("flu", "depression"), geo = "US")
```


Graph it as above with a line graph, but this time use color = keyword to have different colored lines for the two search terms. Pipe flu_depression to get_interest() to ggplot, with x = date, y = hits, color = keyword.
```{r}
flu_depression %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line()
  

```








Now graph it by month, as above for AU and US. Start with flu_depression, pipe that to get_interest(), and then use the model for AU vs. US above, but to see the different keyword plots, use group_by(month, keyword), and use color = keyword in the ggplot. Select an appropriate title.

```{r}
flu_depression %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, keyword) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'flu' and 'depression'")

```





### Searching specific Google properties

You can specify a particular Google site to look at with gprop =. Valid options are "web" (default), "news", "images", "froogle", and "youtube".

```{r}
depression_videos <- trendy("how to treat depression", gprop = "youtube")
```



Graph searches by month:

```{r}
depression_videos %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "YouTube searches for 'how to treat depression' by month")

```








Assignment: Look up "psychologist" with trendyy.
```{r}
psy <- trendy("psychologist")

psy %>% 
  get_interest() %>% 
  glimpse()
```
1. Create a graph of interest in it over time.
```{r}
psy %>% 
  get_interest() %>% 
  ggplot(aes(x = date, y = hits)) +
  geom_line()
```

2. Create a graph of monthly interest in it.
```{r}

psy %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%            # Create a new variable called month
  group_by(month) %>%                        # Combine months across weeks and years
  summarize(hits_per_month = mean(hits)) %>%      # Average number of searches for each month
  ggplot(aes(x = month, y = hits_per_month)) +    # graph it
  geom_line() +
  scale_x_discrete(limits = c(1:12))
```

```{r}
psy_US <- trendy("psychologist", geo = "US", from = "2015-06-01", to = "2019-06-01")
psy_US %>%
  get_interest_region() %>% 
  datatable()

```

3. Create a datatable of interest by DMA.


4. Compare US vs. Canadian (CA) interest in psychologist by month, and create a line graph.
```{r}
psy_countries <- trendy("psychologist", geo = c("US", "CA"), from = "2015-01-01", to = "2020-01-01")
```

```{r}
psy_countries %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, geo) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = geo)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'psychologist' over time, by country")
```
5. Compare interest in psychologist to psychiatrist over time, and create a line graph.  
```{r}
psy_psych <- trendy(c("psychologist", "psychiatrist"), geo = "US")
```

```{r}
psy_psych %>%
  get_interest() %>% 
  mutate(month = month(date)) %>%          
  group_by(month, keyword) %>%                              
  summarize(hits_per_month = mean(hits)) %>%           
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +       
  geom_line() +
  scale_x_discrete(limits = c(1:12)) +
  theme_minimal() +
  labs(title = "Internet searches for 'psychologist' and 'psychiatrist'")

```



6. Compare interest in psychologist to psychiatrist in google images over time, and create a line graph.

```{r}
verses <- trendy(c("psychologist", "psychiatrist"), geo = "US", gprop = "images")
```


Graph it as above with a line graph, using color = keyword to get different colored lines for the two search terms. Pipe verses to get_interest() and then to ggplot(), with x = date, y = hits, color = keyword.
```{r}
verses %>%
  get_interest() %>% 
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line()
  

```
7. Annotate the document and publish it to rpubs.com.

