1. Introduction

Whenever I'm searching for jobs, I find myself repeatedly refreshing job websites to check for new vacancies. While this works, it quickly becomes tedious. Some websites are cluttered with advertisements that make navigation difficult, while others, such as MyJobMag, don't always display vacancies in a clear chronological order. As a result, I often spend more time scrolling through listings than actually reviewing opportunities.

I started exploring web scraping as a way to solve this problem. My goal is to automatically collect job postings, store them in my own database, and build a dashboard where I can filter, search, and sort vacancies in a way that suits my workflow. Beyond simplifying my job search, I also see this as an opportunity to learn a practical data acquisition skill that can be applied to many other projects.

These notes document what I'm learning about how web scraping works in R.

2. Reading the Website

These are the packages I will need:

library(rvest)
library(tidyverse)
library(lubridate)
library(janitor)
library(gt)

The website is like a book.

The URL is the book’s address.
The webpage is the book itself.
HTML is the language used to write the book.

The first thing I need to do is download the webpage.

page <- read_html("https://www.myjobmag.co.ke/jobs")

The read_html() function doesn’t return the webpage exactly as I see it in my browser.

Instead, it downloads the HTML that the browser uses to build the page.

I can inspect what was downloaded by checking its class:

class(page)

## [1] "xml_document" "xml_node"

The output shows it’s an xml_document object.

At first it looks strange, but it simply means R has stored the webpage as a structured document that I can search through.

Now that I’ve downloaded the webpage into R, I know the data is somewhere inside the HTML.

The question is:

How do I know where the job titles, company names, dates, or links are stored?

The answer is that I don’t, not until I understand the HTML structure of the webpage.

3. Understanding HTML

Every webpage is written in HTML (HyperText Markup Language).

A very small webpage might look like this:

<html>

<head>
<title>My Website</title>
</head>

<body>

<h1>Hello World</h1>

<p>This is my first webpage.</p>

</body>

</html>

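The browser reads this code and turns it into the webpage we actually see. The <title> is not drawn on the page itself. It appears in the browser tab (here, "My Website"). The <body> is what gets drawn on the page:

Hello World

This is my first webpage.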
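To experiment without hitting the live site, I can feed the tiny page above to read_html() as a string. As far as I know, xml2's read_html() (which rvest uses) treats a string containing markup as literal HTML rather than as a URL, so this is a safe offline sketch. The variable names are my own.

```r
library(rvest)

# The tiny page from above, stored as a string instead of fetched from a URL
small_html <- "<html>
<head><title>My Website</title></head>
<body>
<h1>Hello World</h1>
<p>This is my first webpage.</p>
</body>
</html>"

# A string containing markup is parsed as HTML, not treated as an address
small_page <- read_html(small_html)

small_page %>% html_element("title") %>% html_text2()  # "My Website"
small_page %>% html_element("h1") %>% html_text2()     # "Hello World"
small_page %>% html_element("p") %>% html_text2()      # "This is my first webpage."
```

This is a handy way to test selectors on a small, known piece of HTML before trying them on a real site.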

3.1 Elements

HTML is made up of elements.

Some common ones are:

<title> — page title
<h1> — main heading
<h2> — secondary heading
<p> — paragraph
<img> — image
<a> — hyperlink

If I want to see the title of the MyJobMag page:

page %>%
  html_elements("title")

## {xml_nodeset (1)}
## [1] <title>Kenya Jobs - 1000+ Jobs Posted Daily | MyJobMag</title>\n

If I want to see all level-2 headings:

page %>%
  html_elements("h2")

## {xml_nodeset (28)}
##  [1] <h2><a style=" " href="/job/customer-support-specialist-nathan-digital"> ...
##  [2] <h2><a style=" " href="/job/sales-executive-field-service-technician-mal ...
##  [3] <h2><a style=" " href="/jobs/graduate-trainees-opportunities-at-west-ken ...
##  [4] <h2><a style=" " href="/job/senior-technician-fibre-north-rift-valley-jo ...
##  [5] <h2><a style=" " href="/job/sectional-ip-network-engineer-field-central- ...
##  [6] <h2><a style=" " href="/job/gsm-engineer-western-region-job-grade-2-2-hc ...
##  [7] <h2><a style=" " href="/job/gsm-engineer-central-eastern-job-grade-2-2-h ...
##  [8] <h2><a style=" " href="/job/senior-technician-gsm-central-rift-valley-jo ...
##  [9] <h2><a style=" " href="/job/account-manager-tugende-16">Account Manager  ...
## [10] <h2><a style=" " href="/job/product-business-analyst">Product &amp; Busi ...
## [11] <h2><a style=" " href="/job/sales-intern-food-partners-hcs-affiliates-gr ...
## [12] <h2><a style=" " href="/job/accounts-intern-generations-techzone">Accoun ...
## [13] <h2><a style=" " href="/job/account-assistant-generations-techzone-1">Ac ...
## [14] <h2><a style=" " href="/job/manager-digital-platforms-and-marketplace-gr ...
## [15] <h2><a style=" " href="/job/regional-sales-manager-kenya-burn">Regional  ...
## [16] <h2><a style=" " href="/job/purchasing-officer-medecins-sans-frontieres- ...
## [17] <h2><a style=" " href="/job/procurement-officer-trade-and-development-ba ...
## [18] <h2><a style=" " href="/job/pre-sales-specialist-gcp-abacus">Pre Sales S ...
## [19] <h2><a style=" " href="/job/accountant-arvocap-asset-managers">Accountan ...
## [20] <h2><a style=" " href="/job/data-analyst-in-house-and-contract-broiler-o ...
## ...

page %>%
  html_elements("h2") %>% 
  length()

## [1] 28

This tells me how many <h2> elements are on the page.

For MyJobMag, I noticed that every job title happens to be inside an <h2>.

That isn’t true for every website. Another website might store job titles inside <div>, <span>, or something else.

Finding the right element for each website is a core part of web scraping.

3.2 Nodes

One idea that confused me at first was the concept of nodes.

A node is simply one piece of HTML.

For example,

<h2>
Product Analyst
</h2>

is one node.

This is also a node:

<p>
Salary: KES 120,000
</p>

Even an image is a node:

<img src="companylogo.png">

Basically, every object on a webpage is represented as a node.

When I use

html_elements()

I’m asking rvest to find all the nodes that match a CSS selector. The simplest selector is just a tag name, such as "h2".
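Later in these notes I use selectors that go beyond a bare tag name, such as "h2 a", "li.job-logo img" and "li#job-date". Here is a small sketch of what those pieces mean, run against a made-up fragment that only loosely mimics the MyJobMag structure (the HTML below is hypothetical, not copied from the site):

```r
library(rvest)

# Hypothetical fragment using the same kinds of selectors as the rest of these notes
snippet <- minimal_html('
  <ul class="job-list">
    <li class="job-list-li">
      <h2><a href="/job/a">Data Analyst at Acme</a></h2>
      <ul><li id="job-date">02 July</li></ul>
    </li>
  </ul>')

snippet %>% html_elements("h2") %>% html_text2()          # tag name
snippet %>% html_elements("li.job-list-li") %>% length()   # tag + class (the dot)
snippet %>% html_elements("li#job-date") %>% html_text2()  # tag + id (the hash)
snippet %>% html_elements("h2 a") %>% html_attr("href")    # an <a> anywhere inside an <h2>
```

A space between two parts means "inside", a dot means "has this class", and a hash means "has this id".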

3.3 Tags

Every HTML element has a tag.

The tag below is h2:

<h2>Product Analyst</h2>

The tag below is p:

<p>Salary</p>

Other tags I keep seeing include:

div — section or container
span — small inline container
img — image
ul — unordered list
li — list item
table — table

I’ll probably encounter more as I scrape different websites.

3.3.1 The `<a>` Tag

One tag that appears almost everywhere is <a>.

For example,

<h2>

<a href="/job/product-business-analyst">

Product & Business Analyst

</a>

</h2>

The <a> tag creates a hyperlink.

The text between <a> and </a> is what users click.

The href attribute stores where that link goes.

This means I can extract both the job title and the job URL.

3.3.2 Parent and Child

HTML is organised like a family tree.

For example,

h2
└── a

The <a> element lives inside the <h2> element.

So:

<h2> is the parent
<a> is the child

This relationship is useful because sometimes I need to move from a child node to its parent, or from a parent to its children, while scraping.
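A minimal sketch of moving between parent and child, using the <h2><a> example above. It relies on xml2::xml_parent() and rvest's html_children() and html_name(); the fragment is typed by hand rather than scraped.

```r
library(rvest)
library(xml2)

doc <- minimal_html('<h2><a href="/job/product-business-analyst">Product &amp; Business Analyst</a></h2>')

link <- doc %>% html_element("a")

# Child -> parent: the <a> lives inside the <h2>
link %>% xml_parent() %>% html_name()                            # "h2"

# Parent -> children: the <h2> has a single child, the <a>
doc %>% html_element("h2") %>% html_children() %>% html_name()   # "a"
```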

In summary:

read_html() downloads a webpage and parses its HTML.
HTML is just the code behind a webpage.
Everything on a webpage is represented as nodes.
Nodes have tags like h2, p, div, and a.
html_elements() searches for nodes matching a CSS selector, such as a tag name.
Job titles on MyJobMag happen to be stored inside <h2> elements.
HTML has a tree structure made up of parents and children.

4. Finding the Information I Want

So far, I’ve downloaded the webpage into R.

page <- read_html("https://www.myjobmag.co.ke/jobs")

However, the webpage contains a large number of HTML elements besides the job listings.

The challenge now is finding where the job titles, company names, dates, and links are stored.

To do that, I first need to inspect the webpage.

4.1 Inspecting the Website

Every modern browser lets you inspect the HTML behind a webpage.

For example, in Google Chrome or Microsoft Edge:

Open the webpage.
Right-click on a job title.
Click Inspect.

A panel appears showing the HTML that generated that piece of the page.

This is what I see when I inspect MyJobMag.

This is how I discovered that job titles were inside <h2> tags.

Also, while looking at the HTML, I noticed something interesting:

<a href="/job/product-business-analyst">
  Product & Business Analyst
</a>

Besides the text, the tag also contains:

href="/job/product-business-analyst"

This is called an attribute.

Think of attributes as extra information about an HTML element.

For example:

<img src="logo.png">

The image tag has an attribute called src.

<a href="/jobs">

The anchor tag has an attribute called href.

Attributes usually tell the browser something about the element.
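In rvest, html_attr() reads one named attribute and html_attrs() returns all of them. A small sketch on a hand-written fragment (the logo file name and alt text are made up):

```r
library(rvest)

card <- minimal_html('<a href="/jobs" title="All jobs"><img src="logo.png" alt="Acme logo"></a>')
img  <- card %>% html_element("img")

# One named attribute at a time
img %>% html_attr("src")                          # "logo.png"
card %>% html_element("a") %>% html_attr("href")  # "/jobs"

# Every attribute on the element, as a named character vector
img %>% html_attrs()

# An attribute that does not exist comes back as NA instead of an error
img %>% html_attr("title")                        # NA
```

The NA behaviour matters later: a job card that is missing something produces NA rather than stopping the scrape.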

4.1.1 Job Card

When I inspect the page more carefully, I notice that each job listing is contained within a card.

In HTML, a card is typically a container element that holds all the information about a single item.

For MyJobMag, each job card is an <li> element with the class job-list-li, sitting inside a <ul class="job-list">.

Inside each card, I can find:

The job title and link inside <h2><a>
The company name in the alt attribute of an <img> or the title attribute of an <a>
The date posted as text inside a nested <li> with the id job-date

Here's what a typical job card looks like:

Now that I understand the structure, I can extract data from each card.
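Working card by card matters when a card is missing a field. This sketch uses a hypothetical two-card list in which the second card has no date. Searching the whole page returns one date for two titles, so the vectors no longer line up. Searching inside each card with html_element() returns NA for the missing date, so every card still gets exactly one value.

```r
library(rvest)

# Hypothetical mini job list: the second card has no date
cards_html <- minimal_html('
  <ul class="job-list">
    <li class="job-list-li"><h2><a href="/job/a">Job A</a></h2><ul><li id="job-date">02 July</li></ul></li>
    <li class="job-list-li"><h2><a href="/job/b">Job B</a></h2></li>
  </ul>')

# Page-wide search: only one date comes back for two jobs
cards_html %>% html_elements("li#job-date") %>% html_text2()

# Card by card: html_element() gives NA for the missing date
cards <- cards_html %>% html_elements("li.job-list-li")
cards %>% html_element("li#job-date") %>% html_text2()
```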

4.2 Extracting the Job Title

Job titles sit inside <h2> tags, each wrapping an <a>, so I can pull out their text with html_text2():

page %>%
  html_elements("h2") %>%
  html_text2() %>%
  head(10)

##  [1] "Customer Support Specialist at Nathan Digital"                                               
##  [2] "Sales Executive & Field Service Technician (Male Applicants) at Riset Software & Systems LTD"
##  [3] "Graduate Trainees Opportunities at West Kenya Sugar Limited"                                 
##  [4] "Senior Technician - Fibre - North Rift Valley Job grade 2.1 at HCS Affiliates Group"         
##  [5] "Sectional IP Network Engineer- Field - Central Job Grade 2.2 at HCS Affiliates Group"        
##  [6] "GSM Engineer - Western Region - Job Grade 2.2 at HCS Affiliates Group"                       
##  [7] "GSM Engineer - Central & Eastern - Job Grade 2.2 at HCS Affiliates Group"                    
##  [8] "Senior Technician GSM - Central Rift Valley Job Grade 2.1 at HCS Affiliates Group"           
##  [9] "Account Manager at Tugende Limited"                                                          
## [10] "Product & Business Analyst at a Reputable Company"

4.3 Extracting the Job Link

I also want the job URL.

Since I know it lives inside the href attribute of the <a> tag, I can extract it.

page %>%
  html_elements("h2 a") %>%
  html_attr("href") %>%
  head(10)

##  [1] "/job/customer-support-specialist-nathan-digital"                                         
##  [2] "/job/sales-executive-field-service-technician-male-applicants-riset-software-systems-ltd"
##  [3] "/jobs/graduate-trainees-opportunities-at-west-kenya-sugar-limited"                       
##  [4] "/job/senior-technician-fibre-north-rift-valley-job-grade-2-1-hcs-affiliates-group"       
##  [5] "/job/sectional-ip-network-engineer-field-central-job-grade-2-2-hcs-affiliates-group"     
##  [6] "/job/gsm-engineer-western-region-job-grade-2-2-hcs-affiliates-group"                     
##  [7] "/job/gsm-engineer-central-eastern-job-grade-2-2-hcs-affiliates-group"                    
##  [8] "/job/senior-technician-gsm-central-rift-valley-job-grade-2-1-hcs-affiliates-group"       
##  [9] "/job/account-manager-tugende-16"                                                         
## [10] "/job/product-business-analyst"

Notice that I used "h2 a" instead of just "h2".

This is because the href attribute belongs to the <a> element, not the <h2>.

However, these are relative URLs: they contain only the path, not the website address.

To turn them into complete URLs, I paste the website address in front of each one:

page %>%
  html_elements("h2 a") %>%
  html_attr("href") %>%
  paste0("https://www.myjobmag.co.ke", .) %>%
  head(10)

##  [1] "https://www.myjobmag.co.ke/job/customer-support-specialist-nathan-digital"                                         
##  [2] "https://www.myjobmag.co.ke/job/sales-executive-field-service-technician-male-applicants-riset-software-systems-ltd"
##  [3] "https://www.myjobmag.co.ke/jobs/graduate-trainees-opportunities-at-west-kenya-sugar-limited"                       
##  [4] "https://www.myjobmag.co.ke/job/senior-technician-fibre-north-rift-valley-job-grade-2-1-hcs-affiliates-group"       
##  [5] "https://www.myjobmag.co.ke/job/sectional-ip-network-engineer-field-central-job-grade-2-2-hcs-affiliates-group"     
##  [6] "https://www.myjobmag.co.ke/job/gsm-engineer-western-region-job-grade-2-2-hcs-affiliates-group"                     
##  [7] "https://www.myjobmag.co.ke/job/gsm-engineer-central-eastern-job-grade-2-2-hcs-affiliates-group"                    
##  [8] "https://www.myjobmag.co.ke/job/senior-technician-gsm-central-rift-valley-job-grade-2-1-hcs-affiliates-group"       
##  [9] "https://www.myjobmag.co.ke/job/account-manager-tugende-16"                                                         
## [10] "https://www.myjobmag.co.ke/job/product-business-analyst"
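paste0() works here because every href starts with "/". If a site mixed relative and absolute links, pasting would produce broken URLs. A sketch using xml2::url_absolute(), which resolves each link against a base URL the way a browser would (the example.com link is hypothetical):

```r
library(xml2)

hrefs <- c(
  "/job/account-manager-tugende-16",
  "/jobs/vacant-roles-at-genesis-analytics",
  "https://example.com/elsewhere"   # hypothetical absolute link
)

# Relative paths are joined to the site; absolute URLs are left untouched
url_absolute(hrefs, "https://www.myjobmag.co.ke/jobs")
```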

4.4 Extracting the Company Name

From my inspection, I found that the company name is stored in the alt attribute of an <img> tag or the title attribute of an <a> tag.

# Extracting the company name from the img alt attribute, following the full hierarchy
company_names <- page %>%
  html_elements("ul.job-list li.job-list-li li.job-logo img") %>% # which can simply be coded as "li.job-logo img"
  html_attr("alt") %>%
  str_remove(" logo$") %>%
  str_trim()

# Show first 10
head(company_names, 10)

##  [1] "Nathan Digital"                                   
##  [2] "Riset Software & Systems LTD"                     
##  [3] "West Kenya Sugar Limited"                         
##  [4] "HCS Affiliates Group"                             
##  [5] "HCS Affiliates Group"                             
##  [6] "HCS Affiliates Group"                             
##  [7] "HCS Affiliates Group"                             
##  [8] "HCS Affiliates Group"                             
##  [9] "Tugende"                                          
## [10] "Product & Business Analyst at a Reputable Company"
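The 10th entry shows a catch: when the employer is not named, the logo's alt text is just the job title. One possible fallback, sketched below, is to take the text after the last " at " in the title whenever the alt text equals the title. The two example values are copied from the outputs above; the approach itself is my own assumption, not something the site documents.

```r
library(stringr)

titles    <- c("Account Manager at Tugende Limited",
               "Product & Business Analyst at a Reputable Company")
alt_names <- c("Tugende",
               "Product & Business Analyst at a Reputable Company")

# Greedy ".*" makes the capture start after the LAST " at "
after_at <- str_match(titles, "^.* at (.*)$")[, 2]

# Keep the alt text unless it is just a copy of the title
company <- ifelse(alt_names == titles, after_at, alt_names)
company   # "Tugende" "a Reputable Company"
```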

4.5 Extracting the Date Posted

From my inspection, I found that the date is stored inside an <li> element with the id job-date (an id, not a class, which is why the selector uses #job-date rather than .job-date).

page %>% 
  html_elements("li.job-item") %>% 
  html_element("li#job-date") %>% 
  html_text2()

##  [1] "02 July" "02 July" "02 July" "01 July" "01 July" "01 July" "01 July"
##  [8] "01 July" "01 July" "01 July" "01 July" "01 July" "01 July" "02 July"
## [15] "02 July" "02 July" "02 July" "02 July" "02 July" "02 July" "02 July"
## [22] "02 July" "02 July" "02 July" "02 July"
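These dates have no year. Rather than relying on how parse_date_time() fills in a missing year, I can add one explicitly. A sketch, assuming postings are less than a year old: borrow the current year, and if that puts the date in the future (for example "30 December" seen in early January), step back one year. parse_listing_date() is a hypothetical helper name, and month names such as "July" assume an English date locale.

```r
library(lubridate)

# Hypothetical helper: turn a year-less listing date like "02 July" into a Date
parse_listing_date <- function(x, today = Sys.Date()) {
  parsed <- dmy(paste(x, year(today)), quiet = TRUE)
  # A date later than today must belong to the previous year
  future <- !is.na(parsed) & parsed > today
  if (any(future)) {
    # %m-% rolls back to the last valid day (29 Feb -> 28 Feb) instead of giving NA
    parsed[future] <- parsed[future] %m-% years(1)
  }
  parsed
}

parse_listing_date(c("02 July", "01 July", "30 December"), today = as.Date("2026-07-02"))
```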

4.6 Saving it as a DataFrame

Now I can combine everything into a single data frame:

# 1. Find all the individual job 'cards'
job_cards <- page %>% html_elements("li.job-list-li") 

# 2. Use map_dfr to iterate through each card and extract data
data <- job_cards %>% map_dfr(function(card) {
  
  # Extract components from this specific card
  title <- card %>% html_element("h2") %>% html_text2()
  link  <- card %>% html_element("h2 a") %>% html_attr("href")
  company <- card %>% html_element("li.job-logo img") %>% html_attr("alt")
  date    <- card %>% html_element("li#job-date") %>% html_text2() 
  
  # Return a tibble for this single row
  tibble(
    job_title   = ifelse(is.na(title), NA_character_, title),
    company     = ifelse(is.na(company), NA_character_, str_remove(company, " logo$")),
    date_posted = parse_date_time(date, orders = c("dmy", "dm")) %>% as.Date(),
    link        = paste0("https://www.myjobmag.co.ke", link)
  ) %>% 
    drop_na()
})


data %>% gt()

job_title	company	date_posted	link
Customer Support Specialist at Nathan Digital	Nathan Digital	2026-07-02	https://www.myjobmag.co.ke/job/customer-support-specialist-nathan-digital
Sales Executive & Field Service Technician (Male Applicants) at Riset Software & Systems LTD	Riset Software & Systems LTD	2026-07-02	https://www.myjobmag.co.ke/job/sales-executive-field-service-technician-male-applicants-riset-software-systems-ltd
Graduate Trainees Opportunities at West Kenya Sugar Limited	West Kenya Sugar Limited	2026-07-02	https://www.myjobmag.co.ke/jobs/graduate-trainees-opportunities-at-west-kenya-sugar-limited
Senior Technician - Fibre - North Rift Valley Job grade 2.1 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/senior-technician-fibre-north-rift-valley-job-grade-2-1-hcs-affiliates-group
Sectional IP Network Engineer- Field - Central Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/sectional-ip-network-engineer-field-central-job-grade-2-2-hcs-affiliates-group
GSM Engineer - Western Region - Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/gsm-engineer-western-region-job-grade-2-2-hcs-affiliates-group
GSM Engineer - Central & Eastern - Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/gsm-engineer-central-eastern-job-grade-2-2-hcs-affiliates-group
Senior Technician GSM - Central Rift Valley Job Grade 2.1 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/senior-technician-gsm-central-rift-valley-job-grade-2-1-hcs-affiliates-group
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16
Product & Business Analyst at a Reputable Company	Product & Business Analyst at a Reputable Company	2026-07-01	https://www.myjobmag.co.ke/job/product-business-analyst
Sales Intern- Food Partners at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/sales-intern-food-partners-hcs-affiliates-group
Accounts Intern at Generations Techzone	Generations Techzone	2026-07-01	https://www.myjobmag.co.ke/job/accounts-intern-generations-techzone
Account Assistant at Generations Techzone	Generations Techzone	2026-07-01	https://www.myjobmag.co.ke/job/account-assistant-generations-techzone-1
Manager, Digital Platforms and Marketplace Growth at AA Kenya	AA Kenya	2026-07-02	https://www.myjobmag.co.ke/job/manager-digital-platforms-and-marketplace-growth-aa-kenya
Regional Sales Manager - Kenya at BURN	BURN	2026-07-02	https://www.myjobmag.co.ke/job/regional-sales-manager-kenya-burn
Purchasing Officer at Medecins Sans Frontieres (MSF)	Medecins Sans Frontieres (MSF)	2026-07-02	https://www.myjobmag.co.ke/job/purchasing-officer-medecins-sans-frontieres-msf-3
Procurement Officer at Trade and Development Bank (TDB)	Trade and Development Bank (TDB)	2026-07-02	https://www.myjobmag.co.ke/job/procurement-officer-trade-and-development-bank-tdb
Pre Sales Specialist (GCP) at Abacus	Abacus	2026-07-02	https://www.myjobmag.co.ke/job/pre-sales-specialist-gcp-abacus
Accountant at Arvocap Asset Managers	Arvocap Asset Managers	2026-07-02	https://www.myjobmag.co.ke/job/accountant-arvocap-asset-managers
Data Analyst-In-House And Contract Broiler Operations at Kenchic Limited	Kenchic Limited	2026-07-02	https://www.myjobmag.co.ke/job/data-analyst-in-house-and-contract-broiler-operations-kenchic-limited
Analyst, Solutions at Standard Bank Group	Standard Bank Group	2026-07-02	https://www.myjobmag.co.ke/job/analyst-solutions-standard-bank-group
Lab Technologist - Thika at Kenya Medical Research - KEMRI	Kenya Medical Research - KEMRI	2026-07-02	https://www.myjobmag.co.ke/job/lab-technologist-thika-kemya-medical-research-kemri
Job Vacancies at Absa Bank Limited	Absa Bank Limited	2026-07-02	https://www.myjobmag.co.ke/jobs/job-vacancies-at-absa-bank-limited-5
Vacant Roles at Genesis Analytics	Genesis Analytics	2026-07-02	https://www.myjobmag.co.ke/jobs/vacant-roles-at-genesis-analytics
Vacancies at Vihiga County Government	Vihiga County Government	2026-07-02	https://www.myjobmag.co.ke/jobs/vacancies-at-vihiga-county-government-9

Now I have a clean dataset with:

Job title — extracted from <h2> tags
Company name — extracted from the alt attribute of the company logo <img>
Date posted — extracted from <li id="job-date">
Job link — extracted from the href attribute

Notice that the jobs are not properly sorted by date: postings from 1 July sit between postings from 2 July. That is exactly why I want to scrape them and sort them myself.
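Once the dates are real Date values, sorting is one line with dplyr's arrange(). A sketch on a few illustrative rows taken from the table above:

```r
library(dplyr)

# A few rows in the order the site lists them
jobs <- tibble(
  job_title   = c("Customer Support Specialist", "Account Manager", "Accountant"),
  date_posted = as.Date(c("2026-07-02", "2026-07-01", "2026-07-02"))
)

# Newest postings first
jobs %>% arrange(desc(date_posted))
```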

5. Scraping Individual Job Pages

So far, I’ve extracted information that is visible on the main jobs page, such as the job title and the job URL.

However, some information isn’t displayed until I open a specific job posting.

For example, this job page contains additional information like:

Job Type
Qualification
Experience
Location
Job Field

To scrape these, I first need to download the individual job page.

job_page <- read_html(
  "https://www.myjobmag.co.ke/job/senior-technician-fibre-north-rift-valley-job-grade-2-1-hcs-affiliates-group"
)

Just like before, job_page is an HTML document that I can inspect to find where the information is stored.

5.1 Inspecting the Job Page

When I right-click and inspect the job page, I can see that the job details are organized in a consistent structure:

<ul class="job-key-info">
  <li>
    <span class="jkey-title">Job Type</span>
    <span class="jkey-info">Full Time, Onsite</span>
  </li>
  <li>
    <span class="jkey-title">Qualification</span>
    <span class="jkey-info">BA/BSc/HND</span>
  </li>
  <li>
    <span class="jkey-title">Experience</span>
    <span class="jkey-info">2 years</span>
  </li>
  <li>
    <span class="jkey-title">Location</span>
    <span class="jkey-info">Nairobi</span>
  </li>
  <li>
    <span class="jkey-title">Job Field</span>
    <span class="jkey-info">Engineering / Technical, ICT / Computer</span>
  </li>
</ul>

And the heading section:

<ul class="read-ul">
  <li class="read-head">
    <ul class="read-h1">
      <li>
        <h1>
          Senior Technician - Fibre - North Rift Valley Job grade 2.1 at
          HCS Affiliates Group
        </h1>
      </li>
    </ul>
  </li>
</ul>

5.2 Finding the Real Selectors

From my inspection of the MyJobMag job page, I identified the following selectors:

What I Want	Selector	Notes
Job title & Company	`ul.read-ul li.read-head ul.read-h1 h1`	Extract the text and split it at " at "
All job details	`ul.job-key-info li`	Each detail is an `<li>` inside the `<ul>` with class `job-key-info`
Label (title)	`span.jkey-title`	The label inside each `<li>`
Value (info)	`span.jkey-info`	The value inside each `<li>`
Job description	`div.job-description`	The main description section

5.3 Extracting Job Title and Company

The job title and company name are in the same element, separated by " at ".

# Extract the full heading and squash the stray \r characters and extra spaces
full_heading <- job_page %>%
  html_element("ul.read-ul li.read-head ul.read-h1 h1") %>%
  html_text2() %>%
  str_squish()

full_heading

## [1] "Senior Technician - Fibre - North Rift Valley Job grade 2.1 at HCS Affiliates Group"

Now I can split it:

# Split into job title and company
parts <- str_split(full_heading, " at ")[[1]]
job_title <- parts[1]
company <- ifelse(length(parts) > 1, parts[2], NA)

job_title

## [1] "Senior Technician - Fibre - North Rift Valley Job grade 2.1"

company

## [1] "HCS Affiliates Group"
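str_split() cuts at every " at ", so a job title that itself contains " at " would be split in the wrong place. A sketch that splits only at the last " at ", assuming company names never contain " at ". The second and third headings are hypothetical; a heading with no " at " keeps the whole text as the title and gets NA for the company.

```r
library(stringr)

headings <- c(
  "Senior Technician - Fibre - North Rift Valley Job grade 2.1 at HCS Affiliates Group",
  "Lecturer at School of Business at Acme University",  # hypothetical: " at " twice
  "Vacancy Announcement"                                 # hypothetical: no " at "
)

# Greedy "(.*)" pushes the split to the LAST " at "; no match gives NA in every column
m <- str_match(headings, "^(.*) at (.*)$")

job_title <- ifelse(is.na(m[, 2]), headings, m[, 2])
company   <- m[, 3]
```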

5.4 Extracting Date Posted and Deadline

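The posting date sits in a <div> with the id posted-date, and the deadline sits in a div with the class read-date-sec-li that is the second child of its container. Here's how I extract both: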

date <- job_page %>% 
  html_element("div#posted-date") %>% 
  html_text2() %>% 
  str_remove("Posted: ") %>% 
  str_trim() 

# The deadline is the div with class 'read-date-sec-li' that is also the second child of its parent
deadline <- job_page %>% 
  html_element("div.read-date-sec-li:nth-child(2)") %>% 
  html_text2() %>% 
  str_remove("Deadline:") %>% 
  str_trim()

cat("Date Posted:", date, "\n")

## Date Posted: Jul 1, 2026

cat("Deadline:", deadline)

## Deadline: Not specified
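Both values are still plain text. A sketch of converting them to Date objects with lubridate's mdy(), which understands "Jul 1, 2026"; with quiet = TRUE, text such as "Not specified" simply becomes NA instead of raising a warning. The two input strings are copied from the output above.

```r
library(lubridate)

posted   <- "Jul 1, 2026"
deadline <- "Not specified"

# Month-day-year text becomes a Date; unparseable text becomes NA
posted_date   <- mdy(posted)
deadline_date <- mdy(deadline, quiet = TRUE)

posted_date    # "2026-07-01"
deadline_date  # NA
```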

5.5 Extracting Job Details

Now I can extract all the job details using ul.job-key-info li:

# Extract all job details
job_details <- job_page %>%
  html_elements("ul.job-key-info li")

# Extract labels and values
labels <- job_details %>%
  html_element("span.jkey-title") %>%
  html_text2()

values <- job_details %>%
  html_element("span.jkey-info") %>%
  html_text2()

# Create a data frame
details_df <- data.frame(
  label = labels,
  value = values
) %>% 
  pivot_wider(
    names_from = label, 
    values_from = value
  )

details_df

## # A tibble: 1 × 5
##   `Job Type`         Qualification Experience Location `Job Field`              
##   <chr>              <chr>         <chr>      <chr>    <chr>                    
## 1 Full Time , Onsite BA/BSc/HND    2 years    Nairobi  Engineering / Technical&…
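Some values carry a literal "&nbsp" (the HTML non-breaking-space entity, apparently without its closing semicolon, so it is never decoded) and a " , " separator. A cleaning sketch, using the two raw values shown in these notes:

```r
library(stringr)

raw <- c("Full Time , Onsite", "Engineering / Technical&nbsp , ICT / Computer&nbsp")

clean <- str_remove_all(raw, "&nbsp;?")             # drop the un-decoded entity, with or without ";"
clean <- str_replace_all(clean, "\\s*,\\s*", ", ")  # normalise " , " to ", "
clean <- str_squish(clean)                          # trim and collapse leftover whitespace

clean   # "Full Time, Onsite" "Engineering / Technical, ICT / Computer"
```

The same three steps could be applied to the value column before pivot_wider().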

Here’s how I can store it all in a dataset.

df <- tibble(
  job_title   = job_title,
  company     = company,
  date_posted = date,
  deadline    = deadline
) %>%
  bind_cols(details_df)

# Viewing the data frame
df %>% gt()

job_title	company	date_posted	deadline	Job Type	Qualification	Experience	Location	Job Field
Senior Technician - Fibre - North Rift Valley Job grade 2.1	HCS Affiliates Group	Jul 1, 2026	Not specified	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp

6. Creating a Workflow to Fetch Recent Jobs

To track new opportunities effectively, I need a pipeline that not only extracts job details but also filters them by how recent they are. Now that I understand the basics, and since I don't want to repeat these steps by hand every day, I will wrap the logic in a function that handles extraction, data cleaning, and date filtering automatically.

6.1 The Automated Scraping Pipeline

This workflow iterates through each job card, visits each job's own page to collect the extra details, converts the raw date strings into Date objects, and keeps only the postings from the last days_back days (7 by default). lubridate handles the date arithmetic, and purrr's map_dfr() stitches one row per card into a single data frame that can then be sorted by date.

# Defining the scraper function
myjobmag_jobs <- function(url, days_back = 7) {
  
  base_url <- "https://www.myjobmag.co.ke"
  page <- read_html(url)
  job_cards <- page %>% html_elements("li.job-list-li") 
  
  data <- job_cards %>% map_dfr(function(card) {
    
    # 1. Extracting summary info
    title   <- card %>% html_element("h2") %>% html_text2()
    link    <- card %>% html_element("h2 a") %>% html_attr("href")
    full_url <- paste0(base_url, link)
    company <- card %>% html_element("li.job-logo img") %>% html_attr("alt")
    date_raw <- card %>% html_element("li#job-date") %>% html_text2() 
    
    # 2. Extracting details by visiting the link
    # Using tryCatch to ensure one bad link doesn't break the whole scrape
    details <- tryCatch({
        job_page <- read_html(full_url)
        job_details <- job_page %>% html_elements("ul.job-key-info li")
        
        tibble(
          label = job_details %>% html_element("span.jkey-title") %>% html_text2(),
          value = job_details %>% html_element("span.jkey-info") %>% html_text2()
        ) %>%
          filter(!is.na(label)) %>%
          # Add values_fn = first to resolve the duplicate/list-column error
          pivot_wider(
            names_from = label, 
            values_from = value, 
            values_fn = first 
          )
      }, error = function(e) tibble()) 
    
    # 3. Combining summary and details into one row
    tibble(
      job_title   = title,
      company     = ifelse(is.na(company), NA_character_, str_remove(company, " logo$")),
      date_posted = parse_date_time(date_raw, orders = c("dmy", "dm")) %>% as.Date(),
      link        = full_url
    ) %>%
      bind_cols(details) # This attaches the wide-format details
  })
  
  # Keeping only postings from the last `days_back` days, then tidying location and field
  final_data <- data %>%
    clean_names() %>% 
    filter(date_posted >= (Sys.Date() - days(days_back))) %>%
    drop_na(date_posted) %>% 
    separate_rows(location, sep = ",") %>%
    mutate(
      location = str_trim(location),
      primary_field = str_trim(str_split_i(job_field, "/|,", 1))
    ) %>% 
    arrange(desc(date_posted))
  
  return(final_data)
}

# Executing
recent_jobs <- myjobmag_jobs("https://www.myjobmag.co.ke/jobs", days_back = 7)
recent_jobs %>% gt()

job_title	company	date_posted	link	job_type	qualification	experience	location	job_field	salary_range	primary_field
Customer Support Specialist at Nathan Digital	Nathan Digital	2026-07-02	https://www.myjobmag.co.ke/job/customer-support-specialist-nathan-digital	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Customer Care, Success and Service&nbsp	KSh 50,000 - KSh 100,000/month	Customer Care
Sales Executive & Field Service Technician (Male Applicants) at Riset Software & Systems LTD	Riset Software & Systems LTD	2026-07-02	https://www.myjobmag.co.ke/job/sales-executive-field-service-technician-male-applicants-riset-software-systems-ltd	Full Time , Onsite	BA/BSc/HND , Diploma		Nairobi	Sales and Business Development&nbsp	KSh 16,000 - KSh 30,000/month	Sales and Business Development&nbsp
Graduate Trainees Opportunities at West Kenya Sugar Limited	West Kenya Sugar Limited	2026-07-02	https://www.myjobmag.co.ke/jobs/graduate-trainees-opportunities-at-west-kenya-sugar-limited	Contract	BA/BSc/HND		Nairobi	Engineering / Technical&nbsp , Graduate Jobs&nbsp	NA	Engineering
Manager, Digital Platforms and Marketplace Growth at AA Kenya	AA Kenya	2026-07-02	https://www.myjobmag.co.ke/job/manager-digital-platforms-and-marketplace-growth-aa-kenya	Full Time	BA/BSc/HND , MBA/MSc/MA	8 years	Nairobi	Marketing and Communication&nbsp	NA	Marketing and Communication&nbsp
Regional Sales Manager - Kenya at BURN	BURN	2026-07-02	https://www.myjobmag.co.ke/job/regional-sales-manager-kenya-burn	Full Time	BA/BSc/HND	5 years	Nairobi	Sales and Business Development&nbsp	NA	Sales and Business Development&nbsp
Procurement Officer at Trade and Development Bank (TDB)	Trade and Development Bank (TDB)	2026-07-02	https://www.myjobmag.co.ke/job/procurement-officer-trade-and-development-bank-tdb	Full Time	MBA/MSc/MA	8 years	Nairobi	Procurement / Store-keeping / Supply Chain&nbsp	NA	Procurement
Pre Sales Specialist (GCP) at Abacus	Abacus	2026-07-02	https://www.myjobmag.co.ke/job/pre-sales-specialist-gcp-abacus	Full Time	BA/BSc/HND	7 years	Nairobi	Sales and Business Development&nbsp	NA	Sales and Business Development&nbsp
Accountant at Arvocap Asset Managers	Arvocap Asset Managers	2026-07-02	https://www.myjobmag.co.ke/job/accountant-arvocap-asset-managers	Full Time	BA/BSc/HND , Professional Certificate	3 years	Nairobi	Finance / Accounting / Audit&nbsp	NA	Finance
Data Analyst-In-House And Contract Broiler Operations at Kenchic Limited	Kenchic Limited	2026-07-02	https://www.myjobmag.co.ke/job/data-analyst-in-house-and-contract-broiler-operations-kenchic-limited	Full Time	Diploma	2 years	Nairobi	Data, Business Analysis and AI&nbsp , ICT / Computer&nbsp	NA	Data
Analyst, Solutions at Standard Bank Group	Standard Bank Group	2026-07-02	https://www.myjobmag.co.ke/job/analyst-solutions-standard-bank-group	Full Time	BA/BSc/HND	3 - 4 years	Nairobi	ICT / Computer&nbsp	NA	ICT
Lab Technologist - Thika at Kenya Medical Research - KEMRI	Kenya Medical Research - KEMRI	2026-07-02	https://www.myjobmag.co.ke/job/lab-technologist-thika-kemya-medical-research-kemri	Full Time	BA/BSc/HND , Diploma	2 years	Thika	Science&nbsp	NA	Science&nbsp
Job Vacancies at Absa Bank Limited	Absa Bank Limited	2026-07-02	https://www.myjobmag.co.ke/jobs/job-vacancies-at-absa-bank-limited-5	Full Time	BA/BSc/HND , MBA/MSc/MA		Nairobi	Human Resources / HR&nbsp	NA	Human Resources
Vacant Roles at Genesis Analytics	Genesis Analytics	2026-07-02	https://www.myjobmag.co.ke/jobs/vacant-roles-at-genesis-analytics	Full Time	MBA/MSc/MA , PhD/Fellowship	5 - 10 years	Nairobi	Project and Program Management&nbsp	NA	Project and Program Management&nbsp
Vacancies at Vihiga County Government	Vihiga County Government	2026-07-02	https://www.myjobmag.co.ke/jobs/vacancies-at-vihiga-county-government-9	Full Time	KCSE	2 years	Vihiga	Legal and Regulatory&nbsp	NA	Legal and Regulatory&nbsp
Senior Technician - Fibre - North Rift Valley Job grade 2.1 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/senior-technician-fibre-north-rift-valley-job-grade-2-1-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp	NA	Engineering
Sectional IP Network Engineer- Field - Central Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/sectional-ip-network-engineer-field-central-job-grade-2-2-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	3 - 5 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp	NA	Engineering
GSM Engineer - Western Region - Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/gsm-engineer-western-region-job-grade-2-2-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp	NA	Engineering
GSM Engineer - Central & Eastern - Job Grade 2.2 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/gsm-engineer-central-eastern-job-grade-2-2-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp	NA	Engineering
Senior Technician GSM - Central Rift Valley Job Grade 2.1 at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/senior-technician-gsm-central-rift-valley-job-grade-2-1-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	2 years	Nairobi	Engineering / Technical&nbsp , ICT / Computer&nbsp	NA	Engineering
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Busia	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Eldoret	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Kisumu	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Machakos	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Meru	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Mombasa	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Nakuru	Finance / Accounting / Audit&nbsp	NA	Finance
Account Manager at Tugende Limited	Tugende	2026-07-01	https://www.myjobmag.co.ke/job/account-manager-tugende-16	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Thika	Finance / Accounting / Audit&nbsp	NA	Finance
Product & Business Analyst at a Reputable Company	Product & Business Analyst at a Reputable Company	2026-07-01	https://www.myjobmag.co.ke/job/product-business-analyst	Full Time , Onsite	BA/BSc/HND , Professional Certificate	3 - 10 years	Nairobi	Data, Business Analysis and AI&nbsp , ICT / Computer&nbsp	NA	Data
Sales Intern- Food Partners at HCS Affiliates Group	HCS Affiliates Group	2026-07-01	https://www.myjobmag.co.ke/job/sales-intern-food-partners-hcs-affiliates-group	Full Time , Onsite	BA/BSc/HND	1 - 2 years	Nairobi	Internships &nbsp , Sales and Business Development&nbsp	NA	Internships &nbsp
Accounts Intern at Generations Techzone	Generations Techzone	2026-07-01	https://www.myjobmag.co.ke/job/accounts-intern-generations-techzone	Full Time , Onsite	BA/BSc/HND , Diploma		Nairobi	Finance / Accounting / Audit&nbsp , Internships &nbsp	NA	Finance
Account Assistant at Generations Techzone	Generations Techzone	2026-07-01	https://www.myjobmag.co.ke/job/account-assistant-generations-techzone-1	Full Time , Onsite	BA/BSc/HND , Diploma	1 year	Nairobi	Finance / Accounting / Audit&nbsp	KSh 16,000 - KSh 30,000/month	Finance

7. Next Steps

Now that I have the data, here are some next steps I’m considering:

Store the data in a database (SQLite or PostgreSQL)
Schedule the scraper to run daily using cron or taskscheduleR
Build a dashboard in Shiny to filter and search jobs
Add email alerts for new jobs matching specific keywords
Clean the data further, for example by removing the leftover `&nbsp` text in the job field columns and converting salary text into numbers
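Two of these ideas can be sketched in base R before any database or scheduler is involved: stripping the `&nbsp` residue visible in the table above, and picking out jobs that were not seen in a previous run and whose titles match some keywords. The second part is the filtering half of an email alert; sending the email and writing to SQLite are not shown. This is a minimal sketch, not the scraper's actual code. It assumes the scraped results sit in a data frame with columns named `title`, `link`, `location` and `job_field`. Those column names, and the helper functions `clean_text()`, `new_jobs()` and `match_keywords()`, are hypothetical and should be adapted to the real data frame.

```r
# Minimal base-R sketch. Column names (title, link, location, job_field) and
# the three helper functions are assumptions, not part of the original scraper.

# Remove leftover "&nbsp" entity text and tidy spaces around commas
clean_text <- function(x) {
  x <- gsub("&nbsp;?", " ", x)  # replace the non-breaking-space residue
  x <- gsub("\\s+,", ",", x)    # "Audit  , ICT" -> "Audit, ICT"
  x <- gsub("\\s+", " ", x)     # collapse repeated whitespace
  trimws(x)                     # drop leading/trailing spaces
}

# Keep only rows whose link + location pair was not in the previous run.
# Location is part of the key because one posting (e.g. Tugende's
# Account Manager) shares a single link across several towns.
new_jobs <- function(current, previous) {
  key_now  <- paste(current$link, current$location)
  key_seen <- paste(previous$link, previous$location)
  current[!key_now %in% key_seen, , drop = FALSE]
}

# Keep jobs whose title contains any keyword (case-insensitive regex match)
match_keywords <- function(jobs, keywords) {
  pattern <- paste(keywords, collapse = "|")
  jobs[grepl(pattern, jobs$title, ignore.case = TRUE), , drop = FALSE]
}

# Example with a few rows copied from the table above
previous <- data.frame(
  title     = "Analyst, Solutions at Standard Bank Group",
  link      = "https://www.myjobmag.co.ke/job/analyst-solutions-standard-bank-group",
  location  = "Nairobi",
  job_field = "ICT / Computer&nbsp",
  stringsAsFactors = FALSE
)

current <- data.frame(
  title = c("Analyst, Solutions at Standard Bank Group",
            "Data Analyst-In-House And Contract Broiler Operations at Kenchic Limited",
            "Account Assistant at Generations Techzone"),
  link = c("https://www.myjobmag.co.ke/job/analyst-solutions-standard-bank-group",
           "https://www.myjobmag.co.ke/job/data-analyst-in-house-and-contract-broiler-operations-kenchic-limited",
           "https://www.myjobmag.co.ke/job/account-assistant-generations-techzone-1"),
  location = c("Nairobi", "Nairobi", "Nairobi"),
  job_field = c("ICT / Computer&nbsp",
                "Data, Business Analysis and AI&nbsp , ICT / Computer&nbsp",
                "Finance / Accounting / Audit&nbsp"),
  stringsAsFactors = FALSE
)

current$job_field <- clean_text(current$job_field)
fresh  <- new_jobs(current, previous)                  # jobs not seen before
alerts <- match_keywords(fresh, c("data", "analyst"))  # candidates for an alert
```

Here `fresh` keeps the Kenchic and Generations Techzone rows, and `alerts` narrows that to the Kenchic data analyst role. Because the keywords are combined into a regular expression, any keyword containing characters such as `.` or `+` would need escaping.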

8. References

Wickham H (2022). rvest: Easily Harvest (Scrape) Web Pages. R package version 1.0.3. https://CRAN.R-project.org/package=rvest
Wickham H, Grolemund G (2017). R for Data Science. https://r4ds.had.co.nz/
MyJobMag. https://www.myjobmag.co.ke/

Field Notes: Learning Web Scraping to Simplify Job Searching

Roy Mwavita

2026-07-02
