The URL Inspection Tool allows you to inspect the status of URLs within Google’s Search Index.
The API for this functionality has been available since early 2022 and is described in Google’s documentation.
To get this information within R, you can use the inspection() function, introduced in searchConsoleR v0.5.0.
The function needs two arguments: the URL to inspect and the siteUrl of a website you have access to. You can copy the siteUrl from the list_websites() results.
library(searchConsoleR) # load library
# auth with email that has access to website
scr_auth()
# list the websites you have access to
websites <- list_websites()
websites
# siteUrl permissionLevel
#1 https://example.website.com/ siteFullUser
#2 sc-domain:code.markedmondson.me siteOwner
You can then query any URL of your website with inspection():
results <- inspection("https://code.markedmondson.me/searchConsoleR/",
siteUrl = "sc-domain:code.markedmondson.me")
The results vary depending on what is available in the API:
==SearchConsoleInspectionResult==
inspectionResultLink: https://search.google.com/search-console/inspect?resource_id=sc-domain:code.markedmondson.me&id=MQGXKOglhDYSizSgJrYmwQ&utm_medium=link&utm_source=api
===indexStatusResult===
$verdict
[1] "PASS"
$coverageState
[1] "Indexed, not submitted in sitemap"
$robotsTxtState
[1] "ALLOWED"
$indexingState
[1] "INDEXING_ALLOWED"
$lastCrawlTime
[1] "2022-01-24 22:24:14 UTC"
$pageFetchState
[1] "SUCCESSFUL"
$googleCanonical
[1] "https://code.markedmondson.me/searchConsoleR/"
$referringUrls
[1] "https://www.zldoty.com/feed/"
$crawledAs
[1] "MOBILE"
===MobileUsabilityResult===
$verdict
[1] "PASS"
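The result is a nested list, so individual fields can be read with `$`. The following is a minimal sketch using a hand-built list that mimics part of the structure printed above; `mock_result` is a stand-in with values copied from that output, not a real API response, and a real `inspection()` result may contain more or fewer fields.

```r
# Mock object mimicking part of the printed structure above.
# Assumption: this is hand-built for illustration, not returned by inspection().
mock_result <- list(
  indexStatusResult = list(
    verdict = "PASS",
    coverageState = "Indexed, not submitted in sitemap",
    robotsTxtState = "ALLOWED",
    crawledAs = "MOBILE"
  )
)

# Read single fields from the nested list
mock_result$indexStatusResult$verdict
mock_result$indexStatusResult$coverageState

# A simple flag for whether the index status verdict is "PASS"
is_pass <- identical(mock_result$indexStatusResult$verdict, "PASS")
is_pass
```

The same `$` access should work on the `results` object returned by `inspection()`, for the fields that the API returned for that URL.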
The index inspection API is subject to usage limits (quotas); check Google’s API documentation for the current values.
If you use this API heavily, you may hit these limits sooner with the default clientId that comes with the package, since that clientId is shared with other users of the package. In that case, it is advised to use your own clientId to send the hits through, which involves creating your own OAuth2 app in Google Cloud Platform. See the googleAuthR setup website for details on this. You don’t need to set any scopes, which are reserved for Cloud services. An example of its usage is in the ‘Speeding up queries’ section below.
You may also want to use a service account rather than your own email, which is recommended for professional use. In that case you can authenticate via a JSON service key (note: this is not the same JSON as the clientId file), which sets the clientId for you, via scr_auth(json = "file_location_client.json").
The responses can be quite slow if you are requesting many URLs in bulk. Keeping the quotas in mind, you can speed things up by using parallelization via the future.apply package.
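When looping over many URLs sequentially, one way to stay under a per-minute limit is to pause between calls. The sketch below is only an illustration: `throttled_lapply()` is a hypothetical helper that is not part of searchConsoleR, and the `per_minute` value is a placeholder that you would replace with a rate suited to your quota.

```r
# Hypothetical helper: call `f` on each element of `x`, pausing between calls
# so that no more than `per_minute` calls are started per minute.
throttled_lapply <- function(x, f, per_minute, ...) {
  delay <- 60 / per_minute          # seconds to wait between calls
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    if (i > 1) Sys.sleep(delay)     # no need to wait before the first call
    out[i] <- list(f(x[[i]], ...))  # list() keeps NULL results in place
  }
  out
}

# Demonstration with a stand-in function instead of inspection()
demo <- throttled_lapply(c("a", "b", "c"), toupper, per_minute = 6000)
demo

# With the package it might be used like this (placeholder rate):
# results <- throttled_lapply(my_urls, inspection, per_minute = 60,
#                             siteUrl = "sc-domain:code.markedmondson.me")
```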
library(searchConsoleR)
scr_auth()
## the top URLs to fetch all at once found via `search_analytics()`
urls <- search_analytics("sc-domain:code.markedmondson.me", dimensions = "page")
## Fetching search analytics for url: sc-domain:code.markedmondson.me dates: 2021-11-06 2022-02-04 dimensions: page dimensionFilterExp: searchType: web aggregationType: auto
top10 <- head(urls$page, 10)
top10
## [1] "https://code.markedmondson.me/googleAnalyticsR/"
## [2] "https://code.markedmondson.me/googleAnalyticsR/articles/reporting-ga4.html"
## [3] "https://code.markedmondson.me/googleAnalyticsR/articles/v4.html"
## [4] "https://code.markedmondson.me/gtm-serverside-webhooks/"
## [5] "https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/"
## [6] "https://code.markedmondson.me/googleAnalyticsR/articles/setup.html"
## [7] "https://code.markedmondson.me/gtm-serverside-cloudrun/"
## [8] "https://code.markedmondson.me/shiny-cloudrun/"
## [9] "https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/"
## [10] "https://code.markedmondson.me/data-privacy-gtm/"
When using quota from the inspection() API, it is polite to use your own clientId by pointing to your clientId JSON file via googleAuthR::gar_set_client().
# loop over the top10 for inspection in parallel manner
library(future.apply)
plan(multisession)
#googleAuthR::gar_set_client("path_to_your_clientid.json")
#✓ Setting client.id from path_to_your_clientid.json
f <- function(url, siteUrl){
  # need to auth non-interactively in each parallel session
  scr_auth(email = "you@youremail.com") # or better - json
  message("Inspection URL: ", url)
  inspection(url, siteUrl)
}
## makes 10 API calls at once
all_data <- future_lapply(top10, f, siteUrl = "sc-domain:code.markedmondson.me")
# see list of 10 results
str(all_data, 1)
## List of 10
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
## $ :List of 3
## ..- attr(*, "class")= chr "inspectionResult"
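When many URLs are requested at once, an error in a single call (for example, a quota error) would normally propagate and stop the whole run. One possible guard, sketched below, is to wrap the worker function in `tryCatch()` so that failures come back as values. `safe_call()` and the `"inspection_error"` class are hypothetical names for illustration, not part of searchConsoleR, and `flaky()` is a stand-in for the real worker function.

```r
# Hypothetical helper: wrap a function so errors are returned, not thrown
safe_call <- function(f) {
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        structure(list(error = conditionMessage(e)),
                  class = "inspection_error")
      }
    )
  }
}

# Stand-in for the worker function: fails for one input
flaky <- function(url) {
  if (url == "bad") stop("quota exceeded")
  paste("ok:", url)
}

res <- lapply(c("good", "bad"), safe_call(flaky))

# Flag which calls failed so they can be retried later
failed <- vapply(res, inherits, logical(1), what = "inspection_error")
failed
```

In the parallel example, the wrapped function could be passed in place of `f`, along the lines of `future_lapply(top10, safe_call(f), siteUrl = ...)`, and the failed URLs retried once quota is available again.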
# extract data from the list
lapply(all_data, function(x) x$indexStatusResult)
## [[1]]
## [[1]]$verdict
## [1] "PASS"
##
## [[1]]$coverageState
## [1] "Indexed, not submitted in sitemap"
##
## [[1]]$robotsTxtState
## [1] "ALLOWED"
##
## [[1]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[1]]$lastCrawlTime
## [1] "2022-01-31 13:46:32 UTC"
##
## [[1]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[1]]$googleCanonical
## [1] "https://code.markedmondson.me/googleAnalyticsR/"
##
## [[1]]$referringUrls
## [1] "https://code.markedmondson.me/googleAnalyticsR"
##
## [[1]]$crawledAs
## [1] "MOBILE"
##
##
## [[2]]
## [[2]]$verdict
## [1] "PASS"
##
## [[2]]$coverageState
## [1] "Indexed, not submitted in sitemap"
##
## [[2]]$robotsTxtState
## [1] "ALLOWED"
##
## [[2]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[2]]$lastCrawlTime
## [1] "2022-02-04 13:30:04 UTC"
##
## [[2]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[2]]$googleCanonical
## [1] "https://code.markedmondson.me/googleAnalyticsR/articles/reporting-ga4.html"
##
## [[2]]$referringUrls
## [1] "https://code.markedmondson.me/googleAnalyticsR/articles/v4.html"
## [2] "https://code.markedmondson.me/googleAnalyticsR/reference/ga_auth.html"
##
## [[2]]$crawledAs
## [1] "MOBILE"
##
##
## [[3]]
## [[3]]$verdict
## [1] "PASS"
##
## [[3]]$coverageState
## [1] "Indexed, not submitted in sitemap"
##
## [[3]]$robotsTxtState
## [1] "ALLOWED"
##
## [[3]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[3]]$lastCrawlTime
## [1] "2022-02-05 08:57:03 UTC"
##
## [[3]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[3]]$googleCanonical
## [1] "https://code.markedmondson.me/googleAnalyticsR/articles/v4.html"
##
## [[3]]$referringUrls
## [1] "https://www.dataquest.io/feed/"
##
## [[3]]$crawledAs
## [1] "MOBILE"
##
##
## [[4]]
## [[4]]$verdict
## [1] "PASS"
##
## [[4]]$coverageState
## [1] "Submitted and indexed"
##
## [[4]]$robotsTxtState
## [1] "ALLOWED"
##
## [[4]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[4]]$lastCrawlTime
## [1] "2022-02-01 12:03:45 UTC"
##
## [[4]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[4]]$googleCanonical
## [1] "https://code.markedmondson.me/gtm-serverside-webhooks/"
##
## [[4]]$userCanonical
## [1] "https://code.markedmondson.me/gtm-serverside-webhooks/"
##
## [[4]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[4]]$referringUrls
## [1] "https://code.markedmondson.me/anti-sampling-google-analytics-api/"
## [2] "https://code.markedmondson.me/datascience-aas/"
##
## [[4]]$crawledAs
## [1] "MOBILE"
##
##
## [[5]]
## [[5]]$verdict
## [1] "PASS"
##
## [[5]]$coverageState
## [1] "Submitted and indexed"
##
## [[5]]$robotsTxtState
## [1] "ALLOWED"
##
## [[5]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[5]]$lastCrawlTime
## [1] "2022-02-07 03:56:14 UTC"
##
## [[5]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[5]]$googleCanonical
## [1] "https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/"
##
## [[5]]$userCanonical
## [1] "https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/"
##
## [[5]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[5]]$referringUrls
## [1] "https://www.joyk.com/dig/redirect/1608386517221791"
## [2] "http://code.markedmondson.me/"
## [3] "https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts"
## [4] "http://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/"
##
## [[5]]$crawledAs
## [1] "MOBILE"
##
##
## [[6]]
## [[6]]$verdict
## [1] "PASS"
##
## [[6]]$coverageState
## [1] "Indexed, not submitted in sitemap"
##
## [[6]]$robotsTxtState
## [1] "ALLOWED"
##
## [[6]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[6]]$lastCrawlTime
## [1] "2022-02-04 10:30:38 UTC"
##
## [[6]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[6]]$googleCanonical
## [1] "https://code.markedmondson.me/googleAnalyticsR/articles/setup.html"
##
## [[6]]$referringUrls
## [1] "https://www.dataquest.io/feed/"
## [2] "https://code.markedmondson.me/googleAnalyticsR/index.html"
## [3] "https://code.markedmondson.me/googleAnalyticsR/articles/v4.html"
##
## [[6]]$crawledAs
## [1] "MOBILE"
##
##
## [[7]]
## [[7]]$verdict
## [1] "PASS"
##
## [[7]]$coverageState
## [1] "Submitted and indexed"
##
## [[7]]$robotsTxtState
## [1] "ALLOWED"
##
## [[7]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[7]]$lastCrawlTime
## [1] "2022-02-03 11:36:00 UTC"
##
## [[7]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[7]]$googleCanonical
## [1] "https://code.markedmondson.me/gtm-serverside-cloudrun/"
##
## [[7]]$userCanonical
## [1] "https://code.markedmondson.me/gtm-serverside-cloudrun/"
##
## [[7]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[7]]$referringUrls
## [1] "http://code.markedmondson.me/"
##
## [[7]]$crawledAs
## [1] "MOBILE"
##
##
## [[8]]
## [[8]]$verdict
## [1] "PASS"
##
## [[8]]$coverageState
## [1] "Submitted and indexed"
##
## [[8]]$robotsTxtState
## [1] "ALLOWED"
##
## [[8]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[8]]$lastCrawlTime
## [1] "2022-02-05 19:24:17 UTC"
##
## [[8]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[8]]$googleCanonical
## [1] "https://code.markedmondson.me/shiny-cloudrun/"
##
## [[8]]$userCanonical
## [1] "https://code.markedmondson.me/shiny-cloudrun/"
##
## [[8]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[8]]$referringUrls
## [1] "http://code.markedmondson.me/"
## [2] "https://code.markedmondson.me/tags/google-app-engine/"
##
## [[8]]$crawledAs
## [1] "MOBILE"
##
##
## [[9]]
## [[9]]$verdict
## [1] "PASS"
##
## [[9]]$coverageState
## [1] "Submitted and indexed"
##
## [[9]]$robotsTxtState
## [1] "ALLOWED"
##
## [[9]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[9]]$lastCrawlTime
## [1] "2022-02-05 17:26:38 UTC"
##
## [[9]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[9]]$googleCanonical
## [1] "https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/"
##
## [[9]]$userCanonical
## [1] "https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/"
##
## [[9]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[9]]$referringUrls
## [1] "https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform"
##
## [[9]]$crawledAs
## [1] "MOBILE"
##
##
## [[10]]
## [[10]]$verdict
## [1] "PASS"
##
## [[10]]$coverageState
## [1] "Submitted and indexed"
##
## [[10]]$robotsTxtState
## [1] "ALLOWED"
##
## [[10]]$indexingState
## [1] "INDEXING_ALLOWED"
##
## [[10]]$lastCrawlTime
## [1] "2022-02-03 00:11:35 UTC"
##
## [[10]]$pageFetchState
## [1] "SUCCESSFUL"
##
## [[10]]$googleCanonical
## [1] "https://code.markedmondson.me/data-privacy-gtm/"
##
## [[10]]$userCanonical
## [1] "https://code.markedmondson.me/data-privacy-gtm/"
##
## [[10]]$sitemap
## [1] "https://code.markedmondson.me/sitemap.xml"
##
## [[10]]$referringUrls
## [1] "https://code.markedmondson.me/r-at-scale-on-google-cloud-platform/"
## [2] "http://code.markedmondson.me/"
## [3] "https://code.markedmondson.me/googleCloudRunner-intro/?utm_source=facebook&utm_medium=social&utm_campaign=dlvr_it"
##
## [[10]]$crawledAs
## [1] "MOBILE"
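As the output above shows, some fields such as userCanonical and sitemap are present for only some URLs. When flattening results into a table, a missing field yields `NULL`, which `data.frame()` cannot place in a column. The sketch below shows one way to handle this; `field_or_na()` and `index_row()` are hypothetical helpers, and `mock` is hand-built data that mimics two of the results above.

```r
# Hypothetical helper: return NA when a field is absent,
# and only the first value when a field holds several
field_or_na <- function(x, name) {
  value <- x[[name]]
  if (is.null(value)) NA_character_ else as.character(value[1])
}

# Build a one-row data.frame from a single inspection result
index_row <- function(res) {
  idx <- res$indexStatusResult
  data.frame(
    googleCanonical = field_or_na(idx, "googleCanonical"),
    coverageState   = field_or_na(idx, "coverageState"),
    sitemap         = field_or_na(idx, "sitemap"),
    stringsAsFactors = FALSE
  )
}

# Mock results mimicking the output above: only the second has `sitemap`
mock <- list(
  list(indexStatusResult = list(
    googleCanonical = "https://code.markedmondson.me/googleAnalyticsR/",
    coverageState = "Indexed, not submitted in sitemap"
  )),
  list(indexStatusResult = list(
    googleCanonical = "https://code.markedmondson.me/shiny-cloudrun/",
    coverageState = "Submitted and indexed",
    sitemap = "https://code.markedmondson.me/sitemap.xml"
  ))
)

df_fields <- do.call(rbind, lapply(mock, index_row))
df_fields
```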
url_crawltime <- function(x){
  data.frame(googleCanonical = x$indexStatusResult$googleCanonical,
             crawled = as.character(x$indexStatusResult$lastCrawlTime))
}
# make a data.frame of the data via rbind
df <- Reduce(rbind, lapply(all_data, url_crawltime))
knitr::kable(df)
| googleCanonical | crawled |
|---|---|
| https://code.markedmondson.me/googleAnalyticsR/ | 2022-01-31 13:46:32 |
| https://code.markedmondson.me/googleAnalyticsR/articles/reporting-ga4.html | 2022-02-04 13:30:04 |
| https://code.markedmondson.me/googleAnalyticsR/articles/v4.html | 2022-02-05 08:57:03 |
| https://code.markedmondson.me/gtm-serverside-webhooks/ | 2022-02-01 12:03:45 |
| https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/ | 2022-02-07 03:56:14 |
| https://code.markedmondson.me/googleAnalyticsR/articles/setup.html | 2022-02-04 10:30:38 |
| https://code.markedmondson.me/gtm-serverside-cloudrun/ | 2022-02-03 11:36:00 |
| https://code.markedmondson.me/shiny-cloudrun/ | 2022-02-05 19:24:17 |
| https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/ | 2022-02-05 17:26:38 |
| https://code.markedmondson.me/data-privacy-gtm/ | 2022-02-03 00:11:35 |
x$indexStatusResult$lastCrawlTime is parsed to the R date-time class POSIXct, so you can use as.character() to turn it into a string, but it may be preferable to keep it as POSIXct when plotting the data.
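As a small illustration of why keeping POSIXct can be useful, the sketch below sorts by crawl time and computes the time since the last crawl. The `crawls` data frame is mock data copied from two rows of the table above, and `reference` is an arbitrary fixed time chosen for the example; in practice you might use `Sys.time()`.

```r
# Mock data mimicking the table above, with crawl times kept as POSIXct
crawls <- data.frame(
  googleCanonical = c("https://code.markedmondson.me/googleAnalyticsR/",
                      "https://code.markedmondson.me/shiny-cloudrun/"),
  crawled = as.POSIXct(c("2022-01-31 13:46:32", "2022-02-05 19:24:17"),
                       tz = "UTC"),
  stringsAsFactors = FALSE
)

# Fixed reference time chosen for illustration only
reference <- as.POSIXct("2022-02-08 00:00:00", tz = "UTC")

# Date arithmetic works directly on POSIXct values
crawls$days_since_crawl <- as.numeric(
  difftime(reference, crawls$crawled, units = "days")
)

# Order from least to most recently crawled
crawls <- crawls[order(crawls$crawled), ]
crawls

# Plotting with a proper time axis also works on POSIXct, e.g.
# plot(crawls$crawled, crawls$days_since_crawl)
```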