Introduction

Purpose and research questions

Lifelong learning supports social participation, cognitive health, and well-being in later life. Online learning platforms often celebrate lifelong learning, yet older learners rarely appear as a focus in learning analytics work.

This project uses an EdX course catalog snapshot to study how often courses explicitly reference aging, older adults, or later life. The analysis addresses three research questions.

  • RQ1. Coverage. How common are courses that explicitly reference aging, older adults, or gerontology in this EdX dataset?
  • RQ2. Location. In which subjects and course levels do these aging-related courses appear compared with the rest of the catalog?
  • RQ3. Positioning. Which ideas and keywords are most prominent in aging-related course descriptions, and how do they frame aging and older adulthood?

The intended audience includes:

  • EdX and other MOOC platform staff who make decisions about catalog breadth.
  • Gerontology and adult education programs that might point older learners to MOOCs.
  • Learning analytics practitioners who study equity in access to learning opportunities.

Data source and context

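library(tidyverse)  # loads readr, dplyr, stringr, and the other packages used below

# read the catalog CSV; adjust the path to wherever the file is stored
edx_raw <- readr::read_csv("edx_courses.csv")

glimpse(edx_raw)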
## Rows: 975
## Columns: 16
## $ title              <chr> "How to Learn Online", "Programming for Everybody (…
## $ summary            <chr> "Learn essential strategies for successful online l…
## $ n_enrolled         <dbl> 124980, 293864, 2442271, 129555, 81140, 301793, 328…
## $ course_type        <chr> "Self-paced on your time", "Self-paced on your time…
## $ institution        <chr> "edX", "The University of Michigan", "Harvard Unive…
## $ instructors        <chr> "Nina Huntemann-Robyn Belair-Ben Piscopo", "Charles…
## $ Level              <chr> "Introductory", "Introductory", "Introductory", "In…
## $ subject            <chr> "Education & Teacher Training", "Computer Science",…
## $ language           <chr> "English", "English", "English", "English", "Englis…
## $ subtitles          <chr> "English", "English", "English", "English", "Englis…
## $ course_effort      <chr> "2–3 hours per week", "2–4 hours per week", "6–18 h…
## $ course_length      <chr> "2 Weeks", "7 Weeks", "12 Weeks", "13 Weeks", "4 We…
## $ price              <chr> "FREE-Add a Verified Certificate for $49 USD", "FRE…
## $ course_description <chr> "Designed for those who are new to elearning, this …
## $ course_syllabus    <chr> "Welcome - We start with opportunities to meet your…
## $ course_url         <chr> "https://www.edx.org/course/how-to-learn-online", "…

The dataset contains 975 EdX courses with the following key fields.

  • title: course title
  • summary: short description shown in the catalog
  • course_description: full course description
  • course_syllabus: syllabus or weekly outline where available
  • subject: high-level subject category, such as Social Sciences or Computer Science
  • Level: course level, such as Introductory, Intermediate, or Advanced
  • language: primary language of instruction
  • course_effort: estimated weekly effort text, such as “2–4 hours per week”
  • course_length: expected course length, such as “6 weeks”
  • n_enrolled: enrollment count, read as a numeric column (<dbl>) in the glimpse output above
  • other metadata such as institution, course type, and course URL

The file appears to be a scraped snapshot of the EdX catalog around 2021. It reflects courses that were public on the platform at that time rather than the complete historical catalog. The dataset contains course-level metadata only, not learner-level interaction data.

Data wrangling and feature engineering

Cleaning and preprocessing steps

The analysis builds a combined text field for keyword search and text mining and performs light cleaning of character variables. The goal is to keep transformations simple and transparent.

edx_clean <- edx_raw |>
  mutate(
    across(where(is.character), ~ str_squish(.x)),
    # combined text across title, summary, description, and syllabus;
    # coalesce() keeps a missing field from being pasted in as the literal string "NA"
    text_all = str_to_lower(
      str_squish(
        paste(
          coalesce(title, ""),
          coalesce(summary, ""),
          coalesce(course_description, ""),
          coalesce(course_syllabus, ""),
          sep = " "
        )
      )
    ),
    subject = as.factor(subject),
    Level = as.factor(Level),
    language = as.factor(language)
  )

edx_clean |>
  summarise(
    n_courses = n(),
    missing_title = sum(is.na(title) | title == ""),
    missing_text = sum(is.na(text_all) | text_all == "")
  )

These steps:

  • Standardize spacing in character fields.
  • Lowercase all text for reliable string matching.
  • Combine several descriptive fields into text_all so that searches capture references in the title, summary, detailed description, or syllabus.
  • Convert subject, Level, and language to factors for grouped summaries and plots.

More aggressive preprocessing, such as stemming or removing all high-frequency words, appears later in the text mining section, since those choices affect interpretability.
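The cleaning steps listed above can be seen on a small example. The sketch below uses invented rows rather than the EdX data, and the `toy` and `toy_clean` names are hypothetical. It also guards against one pitfall of `paste()`: a missing field is otherwise pasted in as the literal string "NA".

```r
library(dplyr)
library(stringr)

# Toy rows (invented for illustration); the second syllabus is missing
toy <- tibble(
  title = c("  Healthy   Ageing ", "Intro to R"),
  summary = c("Later life\nwell-being", "Data  skills"),
  course_syllabus = c("Week 1: Older Adults", NA)
)

toy_clean <- toy |>
  mutate(
    # collapse repeated whitespace and trim the ends of every character column
    across(where(is.character), ~ str_squish(.x)),
    # combine fields, replacing a missing syllabus with an empty string
    text_all = str_to_lower(str_squish(paste(
      title, summary, coalesce(course_syllabus, "")
    )))
  )

toy_clean$text_all
```

Each row ends up as one lowercase, single-spaced string, which is the form that keyword matching works on.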

The glimpse output shows that read_csv already parsed n_enrolled as numeric in this file. If a different export stores enrollment as formatted text, it can be converted to a numeric variable.

edx_clean <- edx_clean |>
  mutate(
    n_enrolled_num = readr::parse_number(as.character(n_enrolled))
  )

This project does not rely on enrollment for the main research questions, so the numeric conversion is optional.
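As a quick check of what the conversion does, the sketch below applies `readr::parse_number()` to a few invented enrollment strings. The values and the `enroll_text` name are assumptions for illustration only.

```r
library(readr)

# Hypothetical formatted enrollment strings (not taken from the dataset)
enroll_text <- c("124,980", "2,442,271", "81140", NA)

# parse_number() drops grouping marks and returns a double vector; NA stays NA
enroll_num <- parse_number(enroll_text)
enroll_num
```

Wrapping the column in `as.character()` first, as in the code above, makes the call safe whether the column arrives as text or as numbers.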

Analysis

Key findings and implications

Summary of findings

In this EdX catalog snapshot:

  • Only 8 of 975 courses, about 0.8%, explicitly reference aging, older adults, or later life in their course text.
  • Aging-related courses cluster in Social Sciences, Health and Safety, and Food and Nutrition and appear rarely in technical subjects such as Computer Science, Engineering, or Data Analysis.
  • Most aging-related courses are Introductory, with very few at Intermediate or Advanced levels.
  • Text mining shows that these courses emphasize health, social conditions, rights, and the broader demographic challenge of aging populations.
  • Few courses present aging as a context for ongoing learning in technical fields, digital skills, or creative domains.
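The coverage and location figures in this list could be produced along the following lines. This is a minimal sketch on an invented catalog: the keyword pattern, the `toy_catalog` rows, and the `aging_flag` column name are assumptions, and the original analysis may have used a different keyword list.

```r
library(dplyr)
library(stringr)

# Hypothetical keyword pattern with word boundaries
aging_pattern <- "\\b(aging|ageing|older adults?|later life|gerontolog\\w*)\\b"

# Toy catalog standing in for edx_clean (rows are invented)
toy_catalog <- tibble(
  subject = c("Social Sciences", "Computer Science", "Health & Safety", "Computer Science"),
  Level = c("Introductory", "Introductory", "Introductory", "Advanced"),
  text_all = c(
    "the challenge of aging populations",
    "python programming for everybody",
    "nutrition for older adults",
    "advanced algorithms"
  )
)

flagged <- toy_catalog |>
  mutate(aging_flag = str_detect(text_all, aging_pattern))

# RQ1: how many courses carry the flag, and what share of the catalog
coverage <- flagged |>
  summarise(
    n_courses = n(),
    n_aging = sum(aging_flag),
    pct_aging = 100 * mean(aging_flag)
  )

# RQ2: where flagged and unflagged courses sit by subject and level
by_subject_level <- flagged |>
  count(subject, Level, aging_flag)
```

Applied to the full catalog, the same arithmetic gives 8 / 975, or roughly 0.8%.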

These results suggest that older adults appear in this EdX sample mainly as subjects of health and social care or as part of a demographic and economic challenge rather than as a diverse group of learners with broad interests.
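The framing described here rests on simple term frequencies. A rough sketch of that kind of count, using invented descriptions and a deliberately tiny stop word list (both assumptions, not the inputs used in this project), might look like this:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Invented descriptions standing in for the text_all values of flagged courses
aging_text <- tibble(
  course_id = 1:3,
  text_all = c(
    "health and care for older adults",
    "aging populations and public health policy",
    "rights and social care in later life"
  )
)

# Small hypothetical stop word list; a real analysis would use a fuller one
stop_words_small <- c("and", "for", "in", "the", "of")

term_counts <- aging_text |>
  # split each description into lowercase word tokens (a list column)
  mutate(word = str_extract_all(text_all, "[a-z]+")) |>
  # one row per token
  unnest(word) |>
  filter(!word %in% stop_words_small) |>
  count(word, sort = TRUE)

term_counts
```

Terms that rise to the top of such a table are the ones a reader would then interpret as the dominant framing.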

Potential actions for the target audience

For platform staff and course designers:

  • Identify aging-related gaps in technical and creative domains and design new courses in areas such as digital literacy for later life, data skills for community engagement, or design for an aging society.
  • Build a curated pathway or catalog tag for aging, later adulthood, or longevity learning that makes existing courses more visible to older learners and professionals who work with them.

For gerontology and adult education programs:

  • Partner with MOOC providers to co-design courses that reflect the interests and needs of older adults beyond health and risk management, such as civic engagement, entrepreneurship, and intergenerational learning.
  • Use catalog analytics like this as a baseline when arguing for new course development and partnerships.

For learning analytics practitioners:

  • Extend this catalog-level analysis with learner-level data when available, for example by studying who enrolls in aging-related MOOCs, how engagement patterns compare across age groups, and how course design features support or hinder older learners.
  • Combine text analytics with survey or interview data from older learners about how they find and interpret online learning opportunities.

Limitations, ethics, and next steps

Data and measurement limitations

Several limitations shape these findings.

  • The dataset is a scraped snapshot of the EdX catalog around 2021, not a complete or current representation of all EdX offerings.
  • The analysis treats each course as a single unit and does not inspect course materials, assessments, or forums.
  • The aging flag uses explicit keyword patterns, which miss courses that might be relevant but avoid direct references to aging or older adults.
  • Some flagged courses discuss aging at the population level, not as a focus on older learners themselves.

These constraints mean that the prevalence estimates here describe explicit aging-related language only. They are best read as a lower bound on the number of courses that could serve older learners, not as a definitive count.
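A short example shows how the choice of pattern shapes the flag. The snippets and both patterns below are invented for illustration; the keyword list used in this project is not shown in this report.

```r
library(stringr)

# Invented snippets for illustration
snippets <- c(
  "medical imaging and packaging design",   # "aging" appears only inside other words
  "supporting seniors through retirement",  # relevant, but no explicit keyword
  "healthy ageing across the life course"   # explicit reference
)

# Hypothetical patterns: plain substrings versus whole-word matches
naive_pattern  <- "aging|ageing"
strict_pattern <- "\\b(aging|ageing|older adults?|later life)\\b"

naive_flag  <- str_detect(snippets, naive_pattern)
strict_flag <- str_detect(snippets, strict_pattern)

data.frame(snippets, naive_flag, strict_flag)
```

The substring match wrongly flags the imaging course, while neither pattern catches the course for seniors, which is the kind of miss the keyword limitation refers to.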

Ethical considerations

The analysis works with public course metadata and does not use any identifiable learner data, so privacy risks are low. Yet downstream use of these findings still raises questions.

  • If decision makers treat this snapshot as complete, they may overlook recent courses or closed-cohort offerings aimed at older adults.
  • If aging-related courses are defined only by a narrow set of keywords, course developers might over-optimize descriptions for those terms instead of thinking carefully about inclusive design and content.

A more balanced approach uses catalog analytics as one input among several when planning how to support older learners.

Future work

Future work could:

  • Compare this EdX snapshot to other platforms and to later catalog snapshots to see whether aging-related offerings are growing or static.
  • Incorporate learner data, where accessible and ethical, to examine participation and completion patterns for older learners in both aging-specific and general courses.
  • Extend the aging flag to include related roles and themes such as caregivers and intergenerational learning, paired with manual review, to draw a richer map of where later-life learning appears in the online ecosystem.
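One way to pair an extended flag with manual review is to queue only the courses that the broader pattern adds. The sketch below assumes hypothetical `core_pattern` and `extended_pattern` keyword lists and uses invented rows; it is an illustration of the idea, not a tested extension of this analysis.

```r
library(dplyr)
library(stringr)

# Invented rows for illustration
toy <- tibble(
  title = c(
    "Caring for a Parent with Dementia",
    "Learning Across Generations",
    "Intro to Statistics"
  ),
  text_all = c(
    "a course for family caregivers",
    "intergenerational learning in community settings",
    "descriptive statistics and probability"
  )
)

# Hypothetical keyword lists
core_pattern <- "\\b(aging|ageing|older adults?|later life|gerontolog\\w*)\\b"
extended_pattern <- "\\b(caregiv\\w*|carers?|intergenerational)\\b"

# Courses picked up only by the extended pattern go to a human reviewer
review_queue <- toy |>
  mutate(
    core_flag = str_detect(text_all, core_pattern),
    extended_flag = str_detect(text_all, extended_pattern)
  ) |>
  filter(extended_flag & !core_flag) |>
  select(title, text_all)

review_queue
```

Keeping the two flags separate makes it easy to report the strict count and the reviewed, broader count side by side.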

References

Krumm, A., Means, B., & Bienkowski, M. (2018). Learning analytics goes to school: A collaborative approach to improving education. Routledge.

Nakhaee, M. [imuhammad]. (2010). Edx Courses: A list of online courses on edx.org learning platform. (Version 5) [Dataset]. Kaggle. https://www.kaggle.com/datasets/imuhammad/edx-courses