SEO Basics

Basic SEO checkup with R

The purpose of this analysis is to extract SEO insights using R. We will retrieve links, title tags, and other SEO information from several homepages. Tools such as Screaming Frog, SEMrush, or Moz can do the same job, but R is free, so why not use it?

Let’s start with Google’s basic SEO definition (https://support.google.com/webmasters/answer/7451184?hl=en)

“A title tag tells …”

Q1: Why is creating a good title tag so important? According to Google, what are some good practices? Please refer to the intro below:

Reference: https://support.google.com/webmasters/answer/7451184?hl=en

Q2: According to the guidelines given by Google, how would you rate the title tags of the following websites?

#1.1 Install the packages
#install.packages("devtools")
library(devtools)

## Warning: package 'devtools' was built under R version 3.6.3

## Loading required package: usethis

## Warning: package 'usethis' was built under R version 3.6.3

#install_github("pixgarden/xsitemap")
library(xsitemap)
#install.packages("httr")
library(httr)

## Warning: package 'httr' was built under R version 3.6.3

library(XML)
library(plyr)

#1.2 Let’s list the title tags of the following 4 sites.

url <- "https://www.wexinc.com/"
request <- GET(url)
doc <- htmlParse(request, asText = TRUE)
Wex <- xpathSApply(doc, "//title", xmlValue)


url <- "https://www.wonderful.com/"
request <- GET(url)
# Decode the response body explicitly, then parse the resulting HTML text
doc <- htmlParse(content(request, "text", encoding = "ISO-8859-1"), asText = TRUE)
WonderfulFarm <- xpathSApply(doc, "//title", xmlValue)

url <- "https://www.nmsu.edu/"
request <- GET(url)
doc <- htmlParse(request, asText = TRUE)

## No encoding supplied: defaulting to UTF-8.

NMSU <- xpathSApply(doc, "//title", xmlValue)


url <- "https://www.CSUB.edu/"
request <- GET(url)
# Decode the response body explicitly, then parse the resulting HTML text
doc <- htmlParse(content(request, "text", encoding = "ISO-8859-1"), asText = TRUE)
CSUB <- xpathSApply(doc, "//title", xmlValue)

# Now let's look at these title tags
Wex

## [1] "WEX Inc. | Game-changing payment solutions for every business"

WonderfulFarm

## [1] "The Wonderful Company :: Home"

NMSU

## [1] "\r\n\t\t\tNew Mexico State University - BE BOLD. Shape the Future.\r\n\t\t"

CSUB

## [1] "California State University, Bakersfield"
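The four title lookups above repeat the same three steps, so they could be wrapped in a small function. The following is only a sketch: `get_title` is a hypothetical helper, not part of any package, and the two inline HTML strings stand in for downloaded pages so the example runs without network access.

```r
library(XML)

# Hypothetical helper: return the trimmed <title> of a page given as HTML text
get_title <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  title <- xpathSApply(doc, "//title", xmlValue)
  if (length(title) == 0) return(NA_character_)  # page has no <title>
  trimws(title[[1]])
}

# Inline stand-ins for downloaded pages (made-up content)
pages <- c(
  site_a = "<html><head><title>\r\n\t Site A - Home \r\n</title></head><body></body></html>",
  site_b = "<html><head><title>Site B</title></head><body></body></html>"
)

# One title per page, keeping the names of the input vector
titles <- vapply(pages, get_title, character(1))
titles
```

With live pages, the text passed to the helper would come from something like `content(GET(url), "text", encoding = "UTF-8")`, assuming the page really is UTF-8.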

Now let’s do a quick SEO audit using R

#2. Get the source code
url <- "https://www.nmsu.edu/"
request <- GET(url)
doc <- htmlParse(request, asText = TRUE)

## No encoding supplied: defaulting to UTF-8.

#3. Get the title and count the number of characters
PageTitle <- xpathSApply(doc, "//title", xmlValue)
PageTitle

## [1] "\r\n\t\t\tNew Mexico State University - BE BOLD. Shape the Future.\r\n\t\t"

nchar(PageTitle)

## [1] 65
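Note that the count of 65 includes the line breaks and tabs surrounding the title text. A minimal sketch of trimming them before counting, using the title string exactly as printed above:

```r
# The raw title as returned above, including leading/trailing whitespace
raw_title <- "\r\n\t\t\tNew Mexico State University - BE BOLD. Shape the Future.\r\n\t\t"

# trimws() strips spaces, tabs, and line breaks from both ends
clean_title <- trimws(raw_title)

nchar(raw_title)    # 65, whitespace included
nchar(clean_title)  # 56, visible title text only
```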

#4. Get the headings - you can also find the h1 and h2 tags in the page's "view source"
# A class-specific XPath such as "//h2[@class='entry-title h1']" would narrow the match to post titles;
# here we take every h2 on the page.
PostTitles <- data.frame(xpathSApply(doc, "//h2", xmlValue))
PostTitles

##   xpathSApply.doc.....h2...xmlValue.
## 1                      Undergraduate
## 2                           Graduate
## 3                Parents & Families 
## 4                      International
## 5                             Online
## 6                        Recent News
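Since step 4 mentions both h1 and h2, it may help to see how many headings of each level a page has. The sketch below uses a made-up inline HTML snippet rather than a live page; the column name `heading` is an illustrative choice that replaces the auto-generated header seen above.

```r
library(XML)

# Made-up HTML standing in for a downloaded page
html_text <- "<html><body><h1>Main</h1><h2>One</h2><h2> Two </h2><h3>Sub</h3></body></html>"
doc <- htmlParse(html_text, asText = TRUE)

# Count the nodes matched by //h1 ... //h6
tags <- paste0("h", 1:6)
heading_counts <- vapply(
  tags,
  function(tag) length(getNodeSet(doc, paste0("//", tag))),
  integer(1)
)
heading_counts

# h2 texts in a data frame with a readable column name
h2_text <- data.frame(
  heading = trimws(xpathSApply(doc, "//h2", xmlValue)),
  stringsAsFactors = FALSE
)
h2_text
```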

#5. Retrieve the links that are direct children of a <div> (the XPath "//div/a") and make a list of them
hrefs <- xpathSApply(doc, "//div/a", xmlGetAttr, 'href')
hrefs <- data.frame(matrix(unlist(hrefs), byrow=T))
hrefs

##                          matrix.unlist.hrefs...byrow...T.
## 1                                         http://nmsu.edu
## 2                                         http://nmsu.edu
## 3                                         http://nmsu.edu
## 4                                                       #
## 5                       http://admissions.nmsu.edu/apply/
## 6                       http://admissions.nmsu.edu/visit/
## 7                                https://ignite.nmsu.edu/
## 8                                https://inside.nmsu.edu/
## 9                       http://admissions.nmsu.edu/apply/
## 10                      http://admissions.nmsu.edu/visit/
## 11                     https://advancing.nmsu.edu/givenow
## 12                                 http://inside.nmsu.edu
## 13                            http://newscenter.nmsu.edu/
## 14                      https://admissions.nmsu.edu/info/
## 15                     https://admissions.nmsu.edu/visit/
## 16                     https://admissions.nmsu.edu/apply/
## 17 http://nmhedss2.state.nm.us/Dashboard/index.aspx?ID=15

count(hrefs)  # we count how many times each href shows on the homepage (https://rdrr.io/cran/plyr/man/count.html)

##                          matrix.unlist.hrefs...byrow...T. freq
## 1                                                       #    1
## 2                       http://admissions.nmsu.edu/apply/    2
## 3                       http://admissions.nmsu.edu/visit/    2
## 4                                  http://inside.nmsu.edu    1
## 5                             http://newscenter.nmsu.edu/    1
## 6  http://nmhedss2.state.nm.us/Dashboard/index.aspx?ID=15    1
## 7                                         http://nmsu.edu    3
## 8                      https://admissions.nmsu.edu/apply/    1
## 9                       https://admissions.nmsu.edu/info/    1
## 10                     https://admissions.nmsu.edu/visit/    1
## 11                     https://advancing.nmsu.edu/givenow    1
## 12                               https://ignite.nmsu.edu/    1
## 13                               https://inside.nmsu.edu/    1
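The XPath `//div/a` only matches anchors that sit directly inside a `<div>`, so links in lists or navigation menus can be missed. As a sketch, assuming you want every anchor that carries an `href`, the expression `//a[@href]` is broader; the HTML below is made up for illustration, and base R's `table()` is used here in place of `plyr::count()`.

```r
library(XML)

# Made-up HTML: one link inside a div, two inside a list, one anchor without href
html_text <- paste0(
  "<html><body>",
  "<div><a href='/apply/'>Apply</a></div>",
  "<ul><li><a href='/apply/'>Apply again</a></li>",
  "<li><a href='https://example.com/'>External</a></li></ul>",
  "<a name='top'>no href</a>",
  "</body></html>"
)
doc <- htmlParse(html_text, asText = TRUE)

direct_links <- xpathSApply(doc, "//div/a", xmlGetAttr, "href")   # 1 link
all_links    <- xpathSApply(doc, "//a[@href]", xmlGetAttr, "href") # 3 links

# Frequency table with explicit column names
link_counts <- as.data.frame(table(href = all_links),
                             responseName = "freq", stringsAsFactors = FALSE)
link_counts[order(-link_counts$freq), ]
```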

#1. The packages are already loaded, so we can skip step 1 and repeat the audit for a second site

#2. Get the source code
url <- "https://www.wexinc.com/"
request <- GET(url)
doc <- htmlParse(request, asText = TRUE)

#3. Get the title and count the number of characters
PageTitle <- xpathSApply(doc, "//title", xmlValue)
PageTitle

## [1] "WEX Inc. | Game-changing payment solutions for every business"

nchar(PageTitle)

## [1] 61

#4. Get the headings - you can also find the h1 and h2 tags in the page's "view source"
# A class-specific XPath such as "//h2[@class='entry-title h1']" would narrow the match to post titles;
# here we take every h2 on the page.
PostTitles <- data.frame(xpathSApply(doc, "//h2", xmlValue))
PostTitles

##    xpathSApply.doc.....h2...xmlValue.
## 1                      Run Your Fleet
## 2       Streamline Corporate Payments
## 3 Simplify the Business of Healthcare
## 4                            Insights
## 5                       WEX Worldwide
## 6       Still Considering a Gas Card?

#5. Retrieve the links that are direct children of a <div> (the XPath "//div/a") and make a list of them
hrefs <- xpathSApply(doc, "//div/a", xmlGetAttr, 'href')
hrefs <- data.frame(matrix(unlist(hrefs), byrow=T))
hrefs

##                                                                                                     matrix.unlist.hrefs...byrow...T.
## 1                                                                                                                      /get-started/
## 2                                                                                                            https://www.wexinc.com/
## 3                                                                                                                      /get-started/
## 4                                                                                 https://www.wexinc.com/solutions/fleet-management/
## 5                                                                               https://www.wexinc.com/solutions/payment-processing/
## 6                                                                   https://www.wexinc.com/solutions/healthcare-benefits-management/
## 7  https://www.wexinc.com/insights/blog/inside-wex/wexs-robert-deshaies-3-ways-benefits-plans-can-provide-relief-in-uncertain-times/
## 8                                                                                                        /insights/blogs/inside-wex/
## 9  https://www.wexinc.com/insights/blog/inside-wex/wexs-robert-deshaies-3-ways-benefits-plans-can-provide-relief-in-uncertain-times/
## 10                            https://www.wexinc.com/insights/blog/health/5-common-expenses-now-eligible-for-your-hsa-and-fsa-funds/
## 11                                                                                     https://www.wexinc.com/insights/blogs/health/
## 12                            https://www.wexinc.com/insights/blog/health/5-common-expenses-now-eligible-for-your-hsa-and-fsa-funds/
## 13                                                                                                                        /insights/
## 14                                                                                                           https://www.wexinc.com/
## 15                                                                                                                        /covid-19/

count(hrefs)  # we count how many times each href shows on the homepage (https://rdrr.io/cran/plyr/man/count.html)

##                                                                                                     matrix.unlist.hrefs...byrow...T.
## 1                                                                                                                         /covid-19/
## 2                                                                                                                      /get-started/
## 3                                                                                                                         /insights/
## 4                                                                                                        /insights/blogs/inside-wex/
## 5                                                                                                            https://www.wexinc.com/
## 6                             https://www.wexinc.com/insights/blog/health/5-common-expenses-now-eligible-for-your-hsa-and-fsa-funds/
## 7  https://www.wexinc.com/insights/blog/inside-wex/wexs-robert-deshaies-3-ways-benefits-plans-can-provide-relief-in-uncertain-times/
## 8                                                                                      https://www.wexinc.com/insights/blogs/health/
## 9                                                                                 https://www.wexinc.com/solutions/fleet-management/
## 10                                                                  https://www.wexinc.com/solutions/healthcare-benefits-management/
## 11                                                                              https://www.wexinc.com/solutions/payment-processing/
##    freq
## 1     1
## 2     2
## 3     1
## 4     1
## 5     2
## 6     2
## 7     2
## 8     1
## 9     1
## 10    1
## 11    1

Why R?

If you have already heard of the R (or Python) programming language and you are interested in taking your SEO skills to the next level or streamlining your projects, this basic tutorial is probably a good place to start. Originally intended for data scientists and statisticians, the R language has for some years now been reaching unexpected audiences, and the reason is simple. You might want to explore the following resources.