Lifelong learning supports social participation, cognitive health, and well-being in later life. Online learning platforms often celebrate lifelong learning, yet older learners rarely appear as a focus in learning analytics work.
This project uses an EdX course catalog snapshot to study how often courses explicitly reference aging, older adults, or later life. The analysis addresses three research questions.
The intended audience includes:
```r
library(dplyr)  # provides glimpse() and the verbs used in later chunks

# adjust the path to where the CSV is stored
edx_raw <- readr::read_csv("edx_courses.csv")
glimpse(edx_raw)
## Rows: 975
## Columns: 16
## $ title <chr> "How to Learn Online", "Programming for Everybody (…
## $ summary <chr> "Learn essential strategies for successful online l…
## $ n_enrolled <dbl> 124980, 293864, 2442271, 129555, 81140, 301793, 328…
## $ course_type <chr> "Self-paced on your time", "Self-paced on your time…
## $ institution <chr> "edX", "The University of Michigan", "Harvard Unive…
## $ instructors <chr> "Nina Huntemann-Robyn Belair-Ben Piscopo", "Charles…
## $ Level <chr> "Introductory", "Introductory", "Introductory", "In…
## $ subject <chr> "Education & Teacher Training", "Computer Science",…
## $ language <chr> "English", "English", "English", "English", "Englis…
## $ subtitles <chr> "English", "English", "English", "English", "Englis…
## $ course_effort <chr> "2–3 hours per week", "2–4 hours per week", "6–18 h…
## $ course_length <chr> "2 Weeks", "7 Weeks", "12 Weeks", "13 Weeks", "4 We…
## $ price <chr> "FREE-Add a Verified Certificate for $49 USD", "FRE…
## $ course_description <chr> "Designed for those who are new to elearning, this …
## $ course_syllabus <chr> "Welcome - We start with opportunities to meet your…
## $ course_url <chr> "https://www.edx.org/course/how-to-learn-online", "…
```
The dataset contains 975 EdX courses with the following key fields.
- title: course title
- summary: short description shown in the catalog
- course_description: full course description
- course_syllabus: syllabus or weekly outline, where available
- subject: high-level subject category, such as Social Sciences or Computer Science
- Level: course level, such as Introductory, Intermediate, or Advanced
- language: primary language of instruction
- course_effort: estimated weekly effort as text, such as “2–4 hours per week”
- course_length: expected course length as text, such as “6 weeks”
- n_enrolled: enrollment count; the glimpse() output above shows that read_csv() already parses it as numeric (<dbl>)

The file appears to be a scraped snapshot of the EdX catalog from around 2021. It reflects courses that were public on the platform at that time rather than the complete historical catalog. The dataset contains course-level metadata only, not learner-level interaction data.
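Because course_syllabus and other text fields are filled only where available, it helps to know how many courses contribute little or no text to the keyword search. The sketch below is a minimal base-R illustration, not part of the original analysis: the helper `count_missing_text()` and the small data frame `toy_courses` are hypothetical stand-ins for `edx_raw`.

```r
# Count missing or empty values in selected text columns of a data frame.
# count_missing_text() and toy_courses are hypothetical, for illustration only.
count_missing_text <- function(df, cols) {
  vapply(
    cols,
    function(col) {
      x <- df[[col]]
      # NA, empty, and whitespace-only strings all count as missing
      sum(is.na(x) | trimws(x) == "")
    },
    integer(1)
  )
}

# Hypothetical stand-in for edx_raw with two text fields
toy_courses <- data.frame(
  title = c("How to Learn Online", "Aging Well", "Data Basics"),
  course_syllabus = c("Week 1 - Welcome", NA, "  "),
  stringsAsFactors = FALSE
)

missing_counts <- count_missing_text(toy_courses, c("title", "course_syllabus"))
print(missing_counts)
```

With the real data, the same helper could be called as `count_missing_text(edx_raw, c("title", "summary", "course_description", "course_syllabus"))`; treating whitespace-only strings as missing matters because scraped fields sometimes contain only spaces.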
The analysis builds a combined text field for keyword search and text mining and performs light cleaning of character variables. The goal is to keep transformations simple and transparent.
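```r
library(dplyr)
library(stringr)

edx_clean <- edx_raw |>
  mutate(
    # trim and collapse repeated whitespace in every character column
    across(where(is.character), ~ str_squish(.x)),
    # combined lower-case text across title, summary, description, and syllabus;
    # coalesce() turns NA into "" so a missing field does not become the literal "NA"
    text_all = str_to_lower(str_squish(paste(
      coalesce(title, ""),
      coalesce(summary, ""),
      coalesce(course_description, ""),
      coalesce(course_syllabus, ""),
      sep = " "
    ))),
    subject = as.factor(subject),
    Level = as.factor(Level),
    language = as.factor(language)
  )
```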
```r
edx_clean |>
  summarise(
    n_courses = n(),
    missing_title = sum(is.na(title) | title == ""),
    missing_text = sum(is.na(text_all) | text_all == "")
  )
```
These steps:

- remove leading, trailing, and repeated internal whitespace from every character field with str_squish();
- combine the title, summary, detailed description, and syllabus into a single lower-case field, text_all, so that searches capture references in any of them;
- convert subject, Level, and language to factors for grouped summaries and plots.

More aggressive preprocessing, such as stemming or removing all high-frequency words, appears later in the text mining section, since those choices affect interpretability.
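One way text_all can support the search for explicit references to aging, older adults, or later life is a regular-expression flag. The sketch below is a minimal base-R illustration under stated assumptions: the term list `aging_terms` and the helper `flag_aging()` are hypothetical, and the keywords the analysis actually uses are not shown in this section. The word boundaries (`\\b`) matter because a bare substring search for "aging" would also match words such as "managing" and "engaging".

```r
# Flag texts that explicitly mention aging-related terms.
# aging_terms is a hypothetical starting list, not the analysis's own keywords.
aging_terms <- c("aging", "ageing", "older adults?", "later life")

# \\b word boundaries keep "aging" from matching inside "managing" or "engaging"
aging_pattern <- paste0("\\b(", paste(aging_terms, collapse = "|"), ")\\b")

flag_aging <- function(text, pattern = aging_pattern) {
  # NA text is treated as "no mention" rather than propagating NA
  !is.na(text) & grepl(pattern, text, perl = TRUE, ignore.case = TRUE)
}

# Hypothetical example texts
toy_text <- c(
  "healthy aging and later life",
  "managing and engaging online teams",
  "caregiving for older adults",
  NA,
  "introduction to python"
)
aging_flags <- flag_aging(toy_text)
print(aging_flags)
```

Applied to the cleaned data, `edx_clean$aging_flag <- flag_aging(edx_clean$text_all)` would add a logical column, and `mean(edx_clean$aging_flag)` would give the share of flagged courses. A keyword flag of this kind only counts explicit mentions, so any term list should be checked by reading a sample of matched and unmatched courses.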
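In this snapshot, read_csv() already returns n_enrolled as a number (<dbl>), so it can be used directly. If a different export stores it as formatted text, such as "124,980", readr::parse_number() converts it to a numeric variable.

```r
edx_clean <- edx_clean |>
  mutate(
    # parse only when the column arrives as text; calling as.character() on a
    # double can produce scientific notation (100000 becomes "1e+05")
    n_enrolled_num = if (is.character(n_enrolled)) {
      readr::parse_number(n_enrolled)
    } else {
      as.numeric(n_enrolled)
    }
  )
```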
This project does not rely on enrollment for the main research questions, so the numeric conversion is optional.
In this EdX catalog snapshot:
These results suggest that older adults appear in this EdX sample mainly as subjects of health and social care or as part of a demographic and economic challenge rather than as a diverse group of learners with broad interests.
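Statements about where aging-related courses cluster rest on cross-tabulating a keyword flag against subject. The base-R sketch below shows one way to compute the count and share of flagged courses per subject, assuming a logical flag column (here the hypothetical `aging_flag`) has already been added. The toy rows in `toy_flagged` are invented for illustration and are not results from the EdX snapshot.

```r
# Count and share of flagged courses per subject.
# toy_flagged and aging_flag are hypothetical, for illustration only.
toy_flagged <- data.frame(
  subject = c("Education & Teacher Training", "Education & Teacher Training",
              "Computer Science", "Social Sciences", "Social Sciences",
              "Computer Science"),
  aging_flag = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# one logical vector of flags per subject
by_subject <- split(toy_flagged$aging_flag, toy_flagged$subject)

flag_summary <- data.frame(
  subject = names(by_subject),
  n_courses = vapply(by_subject, length, integer(1)),
  n_flagged = vapply(by_subject, sum, integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
flag_summary$share_flagged <- flag_summary$n_flagged / flag_summary$n_courses

# subjects with the highest share of flagged courses first
flag_summary <- flag_summary[order(-flag_summary$share_flagged), ]
print(flag_summary)
```

Reporting both the count and the share guards against over-reading small subjects, where one or two flagged courses can produce a large share.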
For platform staff and course designers:
For gerontology and adult education programs:
For learning analytics practitioners:
Several limitations shape these findings.
These constraints mean that the prevalence estimates here are lower bounds on explicit aging-related language, not definitive counts of all courses that could serve older learners.
The analysis works with public course metadata and does not use any identifiable learner data, so privacy risks are low. Yet downstream use of these findings still raises questions.
A more balanced approach uses catalog analytics as one input among several when planning how to support older learners.
Future work could:
Krumm, A., Means, B., & Bienkowski, M. (2018). Learning analytics goes to school: A collaborative approach to improving education. Routledge.
Nakhaee, M. [imuhammad]. (2021). Edx Courses: A list of online courses on edx.org learning platform (Version 5) [Dataset]. Kaggle. https://www.kaggle.com/datasets/imuhammad/edx-courses