In this Meetup event (https://www.meetup.com/valley-data-analytics-using-r-meetup-group/), I will show you how to scrape Google Trends search interest in “Business Analyst” from January 2024 to June 2024 and display the results on an interactive map. To use a different search query, replace "business analyst" in the argument keyword = "business analyst" with your own keyword. See the third section of this tutorial.
Please note that the gtrendsR package in R is sometimes unstable, and a request to Google Trends can fail with an error. If that happens, you may need to download the data manually from the Google Trends website, then combine the data and create the final map.
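Before falling back to a manual download, it can help to simply retry the request a few times. The sketch below is a minimal, hypothetical helper (with_retry() is not part of gtrendsR); it assumes that a failed gtrends() call signals an ordinary R error and that pausing a few seconds between attempts is acceptable.

```r
# Hypothetical helper (not part of gtrendsR): call f() up to `tries` times,
# pausing `wait` seconds after each failure, then re-throw the last error.
with_retry <- function(f, tries = 3, wait = 5) {
  last_error <- NULL
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back whatever f() returned
    }
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < tries) Sys.sleep(wait)
  }
  stop(last_error)  # every attempt failed
}

# Usage (requires gtrendsR and a network connection):
# businessanalyst <- with_retry(function() gtrends(
#   keyword = "business analyst", geo = "US", time = "2024-01-01 2024-06-01",
#   gprop = "web", low_search_volume = FALSE, onlyInterest = FALSE))
```

If all attempts fail, the last error is raised again, which is your cue to download the data manually instead.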
Note: You will be able to create the following interactive map today and publish it on a free website whose address includes your own name, even without any background in R: https://rpubs.com/utjimmyx/analyst_hits. You can zoom in or out on the final map to see the details for each location. rpubs.com is owned and maintained by Posit (formerly RStudio).
Disclaimer:
Go to RStudio Cloud (now called Posit Cloud). Log in or sign up if necessary. Click New Project to create a new project.
Install Required Packages
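A minimal sketch of this step, assuming the required packages are exactly the ones loaded or called later in this tutorial (including tigris, which is only used through tigris::states()). The helper name missing_packages() is hypothetical.

```r
# Packages loaded or called in this tutorial
# (tigris is only called as tigris::states(), so it is installed but not attached)
required <- c("curl", "gtrendsR", "dplyr", "leaflet", "leafem",
              "leaflet.extras", "mapview", "tigris")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing, so re-running the script is cheap
to_install <- missing_packages(required)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```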
Save your R script. Run the script to make sure it generates the map and displays it in the Viewer pane.
Publish to RPubs
In the Viewer pane, click on the Publish button that appears after running your script.
Select RPubs and log in or create an account if needed. Fill in the required details such as the title and description. Click Publish to share your map online via RPubs. By following these steps, you can create an interactive map using an R script in RStudio Cloud and publish it on RPubs for easy sharing and viewing.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(curl)           # HTTP client for web requests
library(gtrendsR)       # query Google Trends from R
library(dplyr)          # data manipulation, including inner_join()
library(leaflet)        # interactive web maps
library(leafem)         # leaflet/mapview extensions, including addStaticLabels()
library(leaflet.extras) # additional leaflet plugins
library(mapview)        # quick interactive maps of spatial (sf) data
# tigris is called below as tigris::states(), so it only needs to be installed
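Syntax Breakdown
plot(gtrends(...)): The plot() function is a generic function in R for plotting data; it generates a visual representation of the object passed to it. Here it receives the object returned by gtrends(), so it uses gtrendsR's plot method, which charts search interest over time. In the gtrends() call, onlyInterest = FALSE returns interest by region (among other components) in addition to interest over time, and low_search_volume = FALSE leaves out regions with low search volume.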
Additional trivia: for more about how the time argument works in gtrendsR, please see this GitHub issue: https://github.com/PMassicotte/gtrendsR/issues/360.
plot(gtrends(keyword = "business analyst",
geo = "US", time = "2024-01-01 2024-06-01", gprop = "web",
low_search_volume = FALSE, onlyInterest = FALSE))
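As in the call above, a custom date range is passed to gtrends() as one string holding two dates, "YYYY-MM-DD YYYY-MM-DD". The hypothetical helper below builds that string from two dates. Note that an end date of 2024-06-01 stops on June 1; to cover all of June, you could end on 2024-06-30 instead.

```r
# Hypothetical helper: build the "start end" string used for the time argument,
# e.g. "2024-01-01 2024-06-30"
trends_time_range <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  if (end < start) stop("`end` must not be earlier than `start`")
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

time_window <- trends_time_range("2024-01-01", "2024-06-30")
time_window

# Then, for example:
# gtrends(keyword = "business analyst", geo = "US", time = time_window,
#         gprop = "web", low_search_volume = FALSE, onlyInterest = FALSE)
```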
region_trend <- businessanalyst$interest_by_region: This line extracts the interest_by_region component (search interest by state) from the businessanalyst object and assigns it to a new variable, region_trend.
names(region_trend)[names(region_trend) == "location"] <- "NAME": We want the two datasets to match automatically, but the us_geo data stores state names in a column called “NAME”, while the Google Trends data stores them in a column called “location”. This line renames the “location” column of region_trend, the data frame scraped from Google Trends, to “NAME”. The syntax may be hard to follow if you are a beginner; if you only need to do this once, renaming the column manually is perfectly fine.
us_geo <- tigris::states(cb = TRUE, resolution = '20m'): This line fetches geographic boundaries for U.S. states using the tigris package. tigris::states() retrieves state boundaries from the U.S. Census Bureau. cb = TRUE requests the “cartographic boundary” files, which are simplified for mapping. resolution = '20m' selects the 1:20,000,000 scale, the most generalized (and smallest download) of the available options '500k', '5m', and '20m'.
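combineddata <- inner_join(us_geo, region_trend, by = c("NAME" = "NAME")): The inner_join() function from dplyr merges the us_geo and region_trend datasets by matching their “NAME” columns. Only states that appear in both datasets are kept. Because us_geo comes first, the result is still a spatial (sf) object carrying the state boundaries. Since the key column has the same name in both tables, by = "NAME" works as well.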
mapview(combineddata, zcol = "hits"): This line creates an interactive map with the mapview package. zcol = "hits" colors each state by its value in the “hits” column, which is Google Trends' relative interest score on a scale from 0 to 100, where 100 marks the state in which the keyword accounted for the largest share of all searches.
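To see what the rename and the inner join do without contacting Google or the Census Bureau, here is a toy version using two small made-up data frames: state_shapes stands in for us_geo, trends stands in for region_trend, and the hits values are invented for illustration. It uses base R's merge(), which performs an inner join by default, in place of dplyr's inner_join() so that it runs without extra packages.

```r
# Made-up stand-ins: state_shapes plays the role of us_geo (state names in NAME),
# trends plays the role of region_trend (state names in location)
state_shapes <- data.frame(NAME = c("Texas", "Ohio", "Puerto Rico"),
                           STUSPS = c("TX", "OH", "PR"))
trends <- data.frame(location = c("Texas", "Ohio", "Utah"),
                     hits = c(80, 65, 50))

# Same rename as in the tutorial: "location" becomes "NAME"
names(trends)[names(trends) == "location"] <- "NAME"

# Inner join on NAME: Puerto Rico and Utah each appear in only one table, so both are dropped
combined <- merge(state_shapes, trends, by = "NAME")
combined
```

With dplyr loaded, inner_join(state_shapes, trends, by = "NAME") keeps the same two states (in the row order of the left table), and dplyr's rename(trends, NAME = location) is an alternative to the names() assignment.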
businessanalyst <- gtrends(keyword = "business analyst",
geo = "US", time = "2024-01-01 2024-06-01", gprop = "web",
low_search_volume = FALSE, onlyInterest = FALSE)
# Extract search interest by region (state) from the Google Trends results
region_trend <- businessanalyst$interest_by_region
# Rename the "location" column to "NAME" so it matches the state-name column in us_geo
names(region_trend)[names(region_trend) == "location"] <- "NAME"
# Download simplified (cartographic boundary) state shapes from the U.S. Census Bureau
us_geo <- tigris::states(cb = TRUE, resolution = '20m')
## Retrieving data for the year 2021
head(us_geo)
## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 24.49813 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
## STATEFP STATENS AFFGEOID GEOID STUSPS NAME LSAD ALAND
## 1 22 01629543 0400000US22 22 LA Louisiana 00 1.119153e+11
## 2 02 01785533 0400000US02 02 AK Alaska 00 1.478943e+12
## 3 24 01714934 0400000US24 24 MD Maryland 00 2.515199e+10
## 4 55 01779806 0400000US55 55 WI Wisconsin 00 1.402923e+11
## 5 12 00294478 0400000US12 12 FL Florida 00 1.389617e+11
## 6 13 01705317 0400000US13 13 GA Georgia 00 1.494866e+11
## AWATER geometry
## 1 23736382213 MULTIPOLYGON (((-94.04305 3...
## 2 245378425142 MULTIPOLYGON (((179.4813 51...
## 3 6979074857 MULTIPOLYGON (((-76.04621 3...
## 4 29343646672 MULTIPOLYGON (((-86.93428 4...
## 5 45972570361 MULTIPOLYGON (((-81.81169 2...
## 6 4418360134 MULTIPOLYGON (((-85.60516 3...
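# Keep only states present in both datasets; the result keeps the sf geometry
combineddata <- inner_join(us_geo, region_trend, by = c("NAME" = "NAME"))
# Color each state by its Google Trends score
mapview(combineddata, zcol = "hits")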
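Keep in mind that the colors show relative interest, not raw search counts. As a rough illustration of the idea (not Google's exact procedure, which is not public), each state's share of searches for the keyword is divided by the largest share and scaled to 0 to 100. The shares below are made-up numbers.

```r
# Made-up fractions of all searches in each state that were for the keyword
share <- c(Texas = 0.0020, Ohio = 0.0010, Utah = 0.0005)

# Scale so the state with the largest share gets 100; the others are relative to it
hits <- round(100 * share / max(share))
hits
```

So a state with half the score of another has half the share of searches, which is not the same as half as many searches.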
In this step, we add labels to the map automatically: each state is labeled with its “hits” value from the combineddata dataset. You can adjust the label size by changing the textsize argument, for example textsize = "7px" instead of the "5px" used below.
mapview(combineddata, zcol = "hits", label = "hits", layer.name = "Business Analyst - hits") |>
addStaticLabels(label = combineddata$hits, textsize = "5px")
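If you would rather show the state name next to its score, you can build the label text yourself and pass it to addStaticLabels(). The helper make_labels() below is hypothetical; the commented usage assumes combineddata and the packages loaded earlier.

```r
# Hypothetical helper: combine state names and scores into text such as "Texas: 80",
# falling back to "no data" where the score is missing
make_labels <- function(name, hits) {
  ifelse(is.na(hits), paste0(name, ": no data"), paste0(name, ": ", hits))
}

# Usage with the map from this step (requires combineddata, mapview and leafem):
# mapview(combineddata, zcol = "hits", layer.name = "Business Analyst - hits") |>
#   addStaticLabels(label = make_labels(combineddata$NAME, combineddata$hits),
#                   textsize = "7px")
```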
https://rpubs.com/utjimmyx/interactivemapping3
Due to time constraints, we will not cover it here today. Contact me on my LinkedIn or Twitter page if you are interested.
Note that the echo = TRUE parameter was added to the code chunk to print the R code that generated the plot. You can change it to echo = FALSE if you want to prevent R from printing the code.
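Instead of setting echo in every chunk, you can set a default for the whole document in a setup chunk near the top; individual chunks can still override it. This sketch uses knitr's opts_chunk object, which R Markdown relies on when knitting.

```r
# Default for every following chunk: hide the code but still show its output
knitr::opts_chunk$set(echo = FALSE)
```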