# Read the dataset
mydata <- read.csv("~/Bootcamp/my_dataset_HW3.csv", header = TRUE)
# Display the first few rows of the dataset
head(mydata, 10)
## Institution Course.Number Launch.Date
## 1 MITx 6.002x 09/05/2012
## 2 MITx 6.00x 09/26/2012
## 3 MITx 3.091x 10/09/2012
## 4 HarvardX CS50x 10/15/2012
## 5 HarvardX PH207x 10/15/2012
## 6 MITx 6.00x 02/04/2013
## 7 MITx 3.091x 02/05/2013
## 8 MITx 14.73x 02/12/2013
## 9 MITx 8.02x 02/18/2013
## 10 HarvardX ER22x 03/02/2013
## Course.Title
## 1 Circuits and Electronics
## 2 Introduction to Computer Science and Programming
## 3 Introduction to Solid State Chemistry
## 4 Introduction to Computer Science
## 5 Health in Numbers: Quantitative Methods in Clinical and Public Health Research
## 6 Introduction to Computer Science and Programming
## 7 Introduction to Solid State Chemistry
## 8 The Challenges of Global Poverty
## 9 Electricity and Magnetism
## 10 Justice
## Instructors
## 1 Khurram Afridi
## 2 Eric Grimson, John Guttag, Chris Terman
## 3 Michael Cima
## 4 David Malan, Nate Hardison, Rob Bowden, Tommy MacWilliam, Zamyla Chan
## 5 Earl Francis Cook, Marcello Pagano
## 6 Larry Rudolph
## 7 Michael Cima
## 8 Esther Duflo, Abhijit Banerjee
## 9 Walter Lewin, John Belcher, Peter Dourmashkin, Ricardo Abbate, Saif Rayyan, George Stephans, Isaac Chuang
## 10 Michael Sandel
## Course.Subject Year Honor.Code.Certificates
## 1 Science, Technology, Engineering, and Mathematics 1 1
## 2 Computer Science 1 1
## 3 Science, Technology, Engineering, and Mathematics 1 1
## 4 Computer Science 1 1
## 5 Government, Health, and Social Science 1 1
## 6 Computer Science 1 1
## 7 Science, Technology, Engineering, and Mathematics 1 1
## 8 Government, Health, and Social Science 1 1
## 9 Science, Technology, Engineering, and Mathematics 1 1
## 10 Humanities, History, Design, Religion, and Education 1 1
## Participants..Course.Content.Accessed. Audited....50..Course.Content.Accessed. Certified
## 1 36105 5431 3003
## 2 62709 8949 5783
## 3 16663 2855 2082
## 4 129400 12888 1439
## 5 52521 10729 5058
## 6 65380 6473 3313
## 7 8270 838 547
## 8 29044 6510 4607
## 9 39178 3543 1722
## 10 58779 9425 5438
## X..Audited X..Certified X..Certified.of...50..Course.Content.Accessed X..Played.Video
## 1 15.04 8.32 54.98 83.2
## 2 14.27 9.22 64.05 89.14
## 3 17.13 12.49 72.85 87.49
## 4 9.96 1.11 11.11 0
## 5 20.44 9.64 47.12 77.45
## 6 9.90 5.07 51.17 82.43
## 7 10.13 6.61 65.16 80.25
## 8 22.41 15.86 70.60 83.24
## 9 9.04 4.40 48.49 85.3
## 10 16.05 9.26 51.07 ---
## X..Posted.in.Forum X..Grade.Higher.Than.Zero Total.Course.Hours..Thousands.
## 1 8.17 28.97 418.94
## 2 14.38 39.50 884.04
## 3 14.42 34.89 227.55
## 4 0.00 1.11 220.90
## 5 15.98 32.52 804.41
## 6 10.30 28.90 639.40
## 7 10.22 23.49 68.11
## 8 13.89 39.38 279.22
## 9 5.86 16.04 380.35
## 10 21.86 20.98 186.61
## Median.Hours.for.Certification Median.Age X..Male X..Female X..Bachelor.s.Degree.or.Higher
## 1 64.45 26 88.28 11.72 60.68
## 2 78.53 28 83.50 16.50 63.04
## 3 61.28 27 70.32 29.68 58.76
## 4 0.00 28 80.02 19.98 58.78
## 5 76.10 32 56.78 43.22 88.33
## 6 84.14 27 83.99 16.01 60.90
## 7 59.29 27 73.30 26.70 58.99
## 8 40.30 30 53.76 46.24 81.94
## 9 107.88 26 85.42 14.58 56.97
## 10 13.67 30 60.42 39.58 69.78
Research question: How is the percentage of certified participants in an online course affected by the institution, the percentage of participants who posted in the forum, the percentage of participants with a grade higher than zero, and the median age of participants?
Unit of observation: an online course
Number of units: 290
There are two institutions, MITx and HarvardX, so Institution is a categorical variable. Participants who actively post in forums might be more likely to earn certificates, because they communicate with other learners. Participants who earn a grade higher than zero might also be more likely to earn a certificate.
While doing this analysis, I was struck by how many people start an online course and then simply give up.
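Written as an equation, the research question corresponds to the linear model below, which is the same specification fitted later with lm() and lm_robust(). The short variable names are introduced here for readability only: Certified_i is % Certified for course i, HarvardX_i equals 1 for HarvardX courses and 0 for MITx courses (it appears as Institution0 in the R output, because MITx is the reference level), Forum_i is % Posted in Forum, GradeAboveZero_i is % Grade Higher Than Zero, and MedianAge_i is Median Age.

```latex
\text{Certified}_i = \beta_0 + \beta_1 \,\text{HarvardX}_i + \beta_2 \,\text{Forum}_i
  + \beta_3 \,\text{GradeAboveZero}_i + \beta_4 \,\text{MedianAge}_i + \varepsilon_i
```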
The source of the dataset: https://www.kaggle.com/datasets/edx/course-study
Description of variables in the dataset:
Institution: The educational institution offering the online course (MITx, HarvardX).
Course Number: The unique identifier for the course.
Launch Date: The date when the course was launched.
Course Title: The title or name of the online course.
Instructors: Names of instructors or educators involved in teaching the course.
Course Subject: The subject category to which the course belongs (e.g., Science, Technology, Engineering, and Mathematics).
Year: The year in which the course was conducted (coded 1 to 4 in the data).
Honor Code Certificates: Binary indicator (1 or 0) denoting whether honor code certificates were offered.
Participants (Course Content Accessed): The total number of participants who accessed the course content.
Audited (> 50% Course Content Accessed): The number of participants who audited more than 50% of the course content.
Certified: The number of participants who successfully completed and earned certification.
% Audited: Percentage of participants who audited the course.
% Certified: Percentage of participants who earned certification.
% Certified of > 50% Course Content Accessed: Percentage of participants who earned certification among those who audited more than 50% of the course content.
% Played Video: Percentage of participants who played course videos.
% Posted in Forum: Percentage of participants who posted in the course forum.
% Grade Higher Than Zero: Percentage of participants who achieved a grade higher than zero.
Total Course Hours (Thousands): The total number of course hours, expressed in thousands.
Median Hours for Certification: The median number of hours taken by participants to achieve certification.
Median Age: The median age of course participants.
% Male: Percentage of male participants.
% Female: Percentage of female participants.
% Bachelor’s Degree or Higher: Percentage of participants with a bachelor’s degree or higher.
# Add a row ID so that units can still be identified after filtering
mydata$ID <- 1:nrow(mydata)
# Convert Institution to a factor (MITx = 1, HarvardX = 0); MITx is the first level, so it is the reference category in regressions
mydata$Institution <- factor(mydata$Institution, levels = c("MITx", "HarvardX"), labels = c(1, 0))
# Display the head of the dataset to check changes
head(mydata)
## Institution Course.Number Launch.Date
## 1 1 6.002x 09/05/2012
## 2 1 6.00x 09/26/2012
## 3 1 3.091x 10/09/2012
## 4 0 CS50x 10/15/2012
## 5 0 PH207x 10/15/2012
## 6 1 6.00x 02/04/2013
## Course.Title
## 1 Circuits and Electronics
## 2 Introduction to Computer Science and Programming
## 3 Introduction to Solid State Chemistry
## 4 Introduction to Computer Science
## 5 Health in Numbers: Quantitative Methods in Clinical and Public Health Research
## 6 Introduction to Computer Science and Programming
## Instructors
## 1 Khurram Afridi
## 2 Eric Grimson, John Guttag, Chris Terman
## 3 Michael Cima
## 4 David Malan, Nate Hardison, Rob Bowden, Tommy MacWilliam, Zamyla Chan
## 5 Earl Francis Cook, Marcello Pagano
## 6 Larry Rudolph
## Course.Subject Year Honor.Code.Certificates
## 1 Science, Technology, Engineering, and Mathematics 1 1
## 2 Computer Science 1 1
## 3 Science, Technology, Engineering, and Mathematics 1 1
## 4 Computer Science 1 1
## 5 Government, Health, and Social Science 1 1
## 6 Computer Science 1 1
## Participants..Course.Content.Accessed. Audited....50..Course.Content.Accessed. Certified
## 1 36105 5431 3003
## 2 62709 8949 5783
## 3 16663 2855 2082
## 4 129400 12888 1439
## 5 52521 10729 5058
## 6 65380 6473 3313
## X..Audited X..Certified X..Certified.of...50..Course.Content.Accessed X..Played.Video
## 1 15.04 8.32 54.98 83.2
## 2 14.27 9.22 64.05 89.14
## 3 17.13 12.49 72.85 87.49
## 4 9.96 1.11 11.11 0
## 5 20.44 9.64 47.12 77.45
## 6 9.90 5.07 51.17 82.43
## X..Posted.in.Forum X..Grade.Higher.Than.Zero Total.Course.Hours..Thousands.
## 1 8.17 28.97 418.94
## 2 14.38 39.50 884.04
## 3 14.42 34.89 227.55
## 4 0.00 1.11 220.90
## 5 15.98 32.52 804.41
## 6 10.30 28.90 639.40
## Median.Hours.for.Certification Median.Age X..Male X..Female X..Bachelor.s.Degree.or.Higher ID
## 1 64.45 26 88.28 11.72 60.68 1
## 2 78.53 28 83.50 16.50 63.04 2
## 3 61.28 27 70.32 29.68 58.76 3
## 4 0.00 28 80.02 19.98 58.78 4
## 5 76.10 32 56.78 43.22 88.33 5
## 6 84.14 27 83.99 16.01 60.90 6
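To see how this factor enters a regression, the small sketch below builds the design matrix for a few hypothetical toy values coded the same way as mydata$Institution. With R's default treatment contrasts, the first level ("1" = MITx) is the reference, so the model gets a single indicator column that equals 1 for HarvardX. This is why the regression output later reports a coefficient named Institution0.

```r
# Toy values (hypothetical), coded the same way as mydata$Institution
inst <- factor(c("MITx", "HarvardX", "MITx"),
               levels = c("MITx", "HarvardX"),
               labels = c(1, 0))

# Treatment contrasts: the first level ("1" = MITx) is the reference,
# so the design matrix has one indicator column that is 1 for HarvardX
mm <- model.matrix(~ inst)
print(mm)
```

If HarvardX should be the reference instead, relevel(inst, ref = "0") would switch the baseline; the coefficient would then change sign and measure MITx relative to HarvardX.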
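# Descriptive statistics (stat.desc() is from the pastecs package);
# Institution is a factor, so its numeric statistics are NA
library(pastecs)
round(stat.desc(mydata[c("Institution", "X..Posted.in.Forum", "X..Grade.Higher.Than.Zero", "Median.Age")], basic = FALSE), 2)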
## Institution X..Posted.in.Forum X..Grade.Higher.Than.Zero Median.Age
## median NA 7.24 19.60 29.00
## mean NA 9.35 21.21 29.30
## SE.mean NA 0.44 0.79 0.24
## CI.mean NA 0.87 1.55 0.47
## var NA 56.51 179.87 16.39
## std.dev NA 7.52 13.41 4.05
## coef.var NA 0.80 0.63 0.14
summary(mydata[c("Institution", "X..Posted.in.Forum", "X..Grade.Higher.Than.Zero", "Median.Age")])
## Institution X..Posted.in.Forum X..Grade.Higher.Than.Zero Median.Age
## 1:161 Min. : 0.000 Min. : 0.00 Min. :22.0
## 0:129 1st Qu.: 3.993 1st Qu.:10.59 1st Qu.:26.0
## Median : 7.245 Median :19.61 Median :29.0
## Mean : 9.348 Mean :21.21 Mean :29.3
## 3rd Qu.:14.107 3rd Qu.:30.90 3rd Qu.:31.0
## Max. :35.280 Max. :52.35 Max. :53.0
The descriptive statistics show the following:
On average, 9.35% of participants in a course posted in the forum. The distribution is right-skewed (the mean of 9.35 is above the median of 7.24), with values ranging from 0% to 35.28%.
On average, only 21.21% of participants earned a grade higher than zero, with values ranging from 0% to 52.35%, so in a typical course roughly four out of five participants never earned any points. The mean of the course median ages is 29.3 years, which suggests that many learners are adults, possibly with some work experience, who study online to improve their skills. Course median ages range from 22 to 53 years, and the middle half of the courses have a median age between 26 and 31 years.
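The right-skew claim rests on the mean being larger than the median. A moment-based skewness coefficient makes this explicit: positive values indicate a long right tail. The sketch below uses a hypothetical helper name (skewness_moment) and a toy vector; on the real data it would be called as skewness_moment(mydata$X..Posted.in.Forum). As far as I know, pastecs::stat.desc(..., norm = TRUE) also reports a skewness value, which may use a slightly different formula.

```r
# Moment-based sample skewness: positive values indicate a right (positive) skew
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population-style standard deviation
  mean((x - m)^3) / s^3
}

# Hypothetical toy vector with a long right tail
toy <- c(0, 2, 3, 4, 5, 7, 30)
c(mean = mean(toy), median = median(toy), skew = skewness_moment(toy))
```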
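# Scatterplot matrix (scatterplotMatrix() is from the car package)
library(car)
# Include the dependent variable so that its relationship with each predictor is visible
mydata_numer <- mydata[, c("X..Certified", "X..Posted.in.Forum", "X..Grade.Higher.Than.Zero", "Median.Age")]
scatterplotMatrix(mydata_numer,
                  smooth = FALSE)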
From the scatterplot matrix, the percentage of certified participants
appears to be positively related to the percentage who posted in
forums, the percentage with a grade higher than zero, and the median age.
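Scatterplots show the direction of each association, and Pearson correlations put a number on it. The sketch below uses a hypothetical helper, cor_with_response(), demonstrated on toy data; for this dataset it could be called as cor_with_response(mydata, "X..Certified", c("X..Posted.in.Forum", "X..Grade.Higher.Than.Zero", "Median.Age")). These are simple pairwise correlations, so they do not control for the other predictors the way the regression does.

```r
# Hypothetical helper: Pearson correlation of the response with each predictor
cor_with_response <- function(df, response, predictors) {
  sapply(predictors, function(p) {
    cor(df[[response]], df[[p]], use = "complete.obs")
  })
}

# Toy data (hypothetical): a rises with y, b falls with y
toy <- data.frame(y = c(1, 2, 3, 4),
                  a = c(2, 4, 6, 8),
                  b = c(4, 3, 2, 1))
cor_with_response(toy, "y", c("a", "b"))
```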
# VIF statistics (vif() is from the car package)
fit_mydata <- lm(X..Certified ~ Institution + X..Posted.in.Forum +
X..Grade.Higher.Than.Zero + Median.Age,
data = mydata)
vif(fit_mydata)
## Institution X..Posted.in.Forum X..Grade.Higher.Than.Zero
## 1.527417 1.596425 1.367748
## Median.Age
## 1.544345
The VIF values range from 1.37 to 1.60. All of them are close to 1 and well below the common rule-of-thumb threshold of 5, so multicollinearity does not appear to be a problem for our regression analysis.
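For a predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below (vif_manual is a hypothetical name) computes this directly for a data frame of numeric predictors. car::vif() handles factor terms such as Institution itself, so to match the output above with this helper you would have to add Institution as a numeric 0/1 column.

```r
# Hypothetical helper: VIF for each column of a data frame of numeric predictors
vif_manual <- function(df) {
  stopifnot(ncol(df) >= 2)
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    f <- reformulate(others, response = v)   # v ~ all other predictors
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

# Toy data (hypothetical): orthogonal predictors give VIF = 1,
# nearly collinear predictors give a large VIF
orth <- data.frame(a = c(1, -1, 1, -1), b = c(1, 1, -1, -1))
coll <- data.frame(x = c(1, 2, 3, 4, 5), z = c(1, 2, 3, 4, 6))
vif_manual(orth)
vif_manual(coll)
```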
# Store each unit's standardized residual and Cook's distance from the full model
mydata$StdResid <- round(rstandard(fit_mydata), 3)
mydata$CooksD <- round(cooks.distance(fit_mydata), 3)
hist(mydata$StdResid,
xlab = "Standardized residuals",
ylab = "Frequency",
main = "Histogram of standardized residuals")
Some standardized residuals fall outside the (-3, 3) range, so the
corresponding units are potential outliers that we need to remove.
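Instead of reading the histogram, the units outside (-3, 3) can be listed directly. The sketch below uses a hypothetical helper, flag_outliers(), and toy data with one planted outlier; on this dataset it would be called as flag_outliers(fit_mydata). Because rstandard() returns a named vector, the result shows the row names of the flagged units.

```r
# Hypothetical helper: indices of units whose standardized residual exceeds the cutoff
flag_outliers <- function(fit, cutoff = 3) {
  r <- rstandard(fit)
  which(abs(r) > cutoff)
}

# Toy data (hypothetical): a straight line with small deterministic noise
# and one large outlier planted at position 10
x <- 1:20
y <- 2 + 0.5 * x + sin(x)
y[10] <- y[10] + 50
toy_fit <- lm(y ~ x)
flag_outliers(toy_fit)
```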
head(mydata[order(-mydata$StdResid),], 10)
## Institution Course.Number Launch.Date
## 188 0 GOV1368.3x 10/01/2015
## 187 0 GOV1368.2x 10/01/2015
## 175 0 HUM1.7x 09/21/2015
## 99 0 SW12.9x 11/20/2014
## 90 0 SW12.8x 10/09/2014
## 57 0 SW12.5x 04/24/2014
## 88 0 HUM2.3x 10/08/2014
## 189 0 GOV1368.4x 10/01/2015
## 75 0 SW12.7x 09/04/2014
## 40 0 SW12.3x 02/13/2014
## Course.Title
## 188 Saving Schools: History, Politics, and Policy of U.S. Education – Accountability and National Standards
## 187 Saving Schools: History, Politics, and Policy of U.S. Education – Teacher Policies
## 175 History of the Book: Monasteries, Schools, and Notaries, Part 2: Introduction to the Transitional Gothic Script
## 99 Communist Liberations
## 90 Creating China: The Birth of a Nation
## 57 From Global Empire to Global Economy
## 88 The Ancient Greek Hero in 24 Hours (Hours 12-15): Cult of Heroes
## 189 Saving Schools: History, Politics, and Policy of U.S. Education – School Choice
## 75 Invasions, Rebellions, and the end of Imperial China
## 40 Cosmopolitan Tang: Aristocratic Culture
## Instructors Course.Subject Year
## 188 Paul Peterson Humanities, History, Design, Religion, and Education 4
## 187 Paul Peterson Humanities, History, Design, Religion, and Education 4
## 175 Beverly Kienzle Humanities, History, Design, Religion, and Education 4
## 99 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 3
## 90 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 3
## 57 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 2
## 88 Gregory Nagy Humanities, History, Design, Religion, and Education 3
## 189 Paul Peterson Humanities, History, Design, Religion, and Education 4
## 75 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 3
## 40 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 2
## Honor.Code.Certificates Participants..Course.Content.Accessed.
## 188 1 492
## 187 1 702
## 175 1 670
## 99 1 4248
## 90 1 4515
## 57 1 5256
## 88 1 1559
## 189 1 511
## 75 1 4662
## 40 1 7422
## Audited....50..Course.Content.Accessed. Certified X..Audited X..Certified
## 188 246 127 50.00 25.81
## 187 348 180 49.57 25.64
## 175 364 191 54.33 28.51
## 99 1835 1442 43.24 33.98
## 90 2081 1528 46.13 33.87
## 57 2649 1686 50.44 32.10
## 88 697 417 44.85 26.83
## 189 212 113 41.49 22.11
## 75 2148 1505 46.10 32.30
## 40 3221 2226 43.43 30.01
## X..Certified.of...50..Course.Content.Accessed X..Played.Video X..Posted.in.Forum
## 188 51.22 54.07 8.54
## 187 51.72 55.98 11.54
## 175 51.37 64.93 7.01
## 99 65.12 79.76 30.14
## 90 62.85 78.36 31.46
## 57 62.21 75.32 28.69
## 88 59.54 43.18 2.90
## 189 53.30 49.71 5.68
## 75 61.45 80.08 33.98
## 40 63.18 77.16 29.14
## X..Grade.Higher.Than.Zero Total.Course.Hours..Thousands. Median.Hours.for.Certification
## 188 25.81 1.18 3.67
## 187 25.64 1.85 4.65
## 175 28.51 3.49 8.25
## 99 50.49 23.03 9.76
## 90 51.05 23.92 9.54
## 57 48.95 24.76 9.09
## 88 33.72 3.20 1.77
## 189 22.11 1.08 4.44
## 75 52.26 22.93 8.68
## 40 50.29 33.76 8.99
## Median.Age X..Male X..Female X..Bachelor.s.Degree.or.Higher ID StdResid CooksD
## 188 30 49.09 50.91 78.86 188 3.013 0.020
## 187 31 48.98 51.02 80.42 187 2.937 0.015
## 175 39 44.76 55.24 77.52 175 2.873 0.055
## 99 37 67.11 32.89 82.98 99 2.832 0.060
## 90 37 64.33 35.67 82.68 90 2.780 0.062
## 57 35 64.35 35.65 82.04 57 2.637 0.043
## 88 34 50.35 49.65 66.91 88 2.447 0.035
## 189 31 48.48 51.52 74.39 189 2.386 0.014
## 75 38 63.87 36.13 82.25 75 2.314 0.052
## 40 34 58.97 41.03 79.40 40 2.159 0.030
After inspecting the residuals, I decided to remove units 168 and 169.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:pastecs':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Remove units 168 and 169 by ID (filter() drops row names, so the rows are renumbered)
removal <- c(168, 169)
mydata <- mydata %>%
  filter(!(ID %in% removal))
hist(mydata$StdResid,
xlab = "Standardized residuals",
ylab = "Frequency",
main = "Histogram of standardized residuals")
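After removing these two units, one standardized residual is still above +3. The StdResid column was computed from the model fitted before the removal and is not updated automatically, so the sorted listing below still shows the unit with ID 188 at 3.013. So we still need to filter out the units whose standardized residuals fall outside (-3, 3).
# Keep units with a standardized residual of at least -3 (lower bound only)
mydata_filtered <- mydata[mydata$StdResid >= -3, ]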
head(mydata[order(-mydata$StdResid),], 5)
## Institution Course.Number Launch.Date
## 186 0 GOV1368.3x 10/01/2015
## 185 0 GOV1368.2x 10/01/2015
## 173 0 HUM1.7x 09/21/2015
## 99 0 SW12.9x 11/20/2014
## 90 0 SW12.8x 10/09/2014
## Course.Title
## 186 Saving Schools: History, Politics, and Policy of U.S. Education – Accountability and National Standards
## 185 Saving Schools: History, Politics, and Policy of U.S. Education – Teacher Policies
## 173 History of the Book: Monasteries, Schools, and Notaries, Part 2: Introduction to the Transitional Gothic Script
## 99 Communist Liberations
## 90 Creating China: The Birth of a Nation
## Instructors Course.Subject Year
## 186 Paul Peterson Humanities, History, Design, Religion, and Education 4
## 185 Paul Peterson Humanities, History, Design, Religion, and Education 4
## 173 Beverly Kienzle Humanities, History, Design, Religion, and Education 4
## 99 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 3
## 90 Peter Bol, Bill Kirby Humanities, History, Design, Religion, and Education 3
## Honor.Code.Certificates Participants..Course.Content.Accessed.
## 186 1 492
## 185 1 702
## 173 1 670
## 99 1 4248
## 90 1 4515
## Audited....50..Course.Content.Accessed. Certified X..Audited X..Certified
## 186 246 127 50.00 25.81
## 185 348 180 49.57 25.64
## 173 364 191 54.33 28.51
## 99 1835 1442 43.24 33.98
## 90 2081 1528 46.13 33.87
## X..Certified.of...50..Course.Content.Accessed X..Played.Video X..Posted.in.Forum
## 186 51.22 54.07 8.54
## 185 51.72 55.98 11.54
## 173 51.37 64.93 7.01
## 99 65.12 79.76 30.14
## 90 62.85 78.36 31.46
## X..Grade.Higher.Than.Zero Total.Course.Hours..Thousands. Median.Hours.for.Certification
## 186 25.81 1.18 3.67
## 185 25.64 1.85 4.65
## 173 28.51 3.49 8.25
## 99 50.49 23.03 9.76
## 90 51.05 23.92 9.54
## Median.Age X..Male X..Female X..Bachelor.s.Degree.or.Higher ID StdResid CooksD
## 186 30 49.09 50.91 78.86 188 3.013 0.020
## 185 31 48.98 51.02 80.42 187 2.937 0.015
## 173 39 44.76 55.24 77.52 175 2.873 0.055
## 99 37 67.11 32.89 82.98 99 2.832 0.060
## 90 37 64.33 35.67 82.68 90 2.780 0.062
# Filter out units with standardized residuals below -3 and above 3
mydata_filtered <- mydata[mydata$StdResid >= -3 & mydata$StdResid <= 3, ]
# Create a new histogram after filtering
hist(mydata_filtered$StdResid,
xlab = "Standardized residuals",
ylab = "Frequency",
main = "Histogram of standardized residuals (filtered)")
Now all standardized residuals in mydata_filtered lie within (-3, 3), so
no unit is flagged as an outlier by this criterion.
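The StdResid values in mydata_filtered still come from fit_mydata, the model fitted on all 290 units; subsetting the data does not recompute them. The sketch below shows one way to refit on the retained units with update() and recompute the standardized residuals, using hypothetical toy data and object names. Note that after a refit, a few units that were inside (-3, 3) can move outside it, so the check may need to be repeated.

```r
# Toy data (hypothetical): a line with small deterministic noise and one planted outlier
x <- 1:20
y <- 2 + 0.5 * x + sin(x)
y[10] <- y[10] + 50
toy <- data.frame(x = x, y = y)

fit_all <- lm(y ~ x, data = toy)

# Keep units whose standardized residual from the original fit is within (-3, 3)
toy$StdResid <- rstandard(fit_all)
toy_filtered <- toy[abs(toy$StdResid) <= 3, ]

# Refit on the retained units and recompute the standardized residuals,
# since the stored StdResid column still refers to fit_all
fit_refit <- update(fit_all, data = toy_filtered)
toy_filtered$StdResid_refit <- rstandard(fit_refit)
summary(toy_filtered$StdResid_refit)
```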
Let’s do the Shapiro-Wilk normality test.
shapiro.test(mydata$StdResid)
##
## Shapiro-Wilk normality test
##
## data: mydata$StdResid
## W = 0.98157, p-value = 0.0009069
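The W statistic is 0.982, which is close to 1, but the p-value (0.0009) is below 0.05, so the test rejects the hypothesis that the standardized residuals are normally distributed. However, the sample is large (288 courses after removing units 168 and 169), and with large samples the Shapiro-Wilk test detects even small departures from normality. By the Central Limit Theorem, the coefficient estimates should still be approximately normally distributed, so we proceed with the regression despite the rejection.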
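A normal Q-Q plot is a common companion to the Shapiro-Wilk test, because it shows where the residuals depart from normality (tails or center) rather than only whether they do. The sketch below wraps both in a hypothetical helper, check_normality(); on this dataset it would be called as check_normality(mydata$StdResid). The toy inputs are deterministic quantiles of a normal and of a skewed (exponential) distribution.

```r
# Hypothetical helper: Shapiro-Wilk test plus an optional normal Q-Q plot
check_normality <- function(resid, alpha = 0.05, plot = TRUE) {
  sw <- shapiro.test(resid)   # requires 3 to 5000 non-missing values
  if (plot) {
    qqnorm(resid, main = "Normal Q-Q plot of standardized residuals")
    qqline(resid, col = "red")
  }
  list(W = unname(sw$statistic),
       p_value = sw$p.value,
       reject_normality = sw$p.value < alpha)
}

# Deterministic toy inputs (hypothetical): exact normal vs. exponential quantiles
normal_like <- qnorm(ppoints(100))
skewed <- qexp(ppoints(100))
str(check_normality(normal_like, plot = FALSE))
str(check_normality(skewed, plot = FALSE))
```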
hist(mydata$CooksD,
xlab = "Cook's distances",
ylab = "Frequency",
main = "Histogram of Cook's distances")
The histogram of Cook's distances shows gaps, with a few units standing
apart from the rest. These units may be influential, so we remove them.
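There is no single standard cutoff for Cook's distance. The code below uses 4 times the mean Cook's distance, and another common rule of thumb is 4/n. The sketch compares the two rules with a hypothetical helper, flag_influential(), on toy data with one planted outlier; on this dataset it would be called as flag_influential(fit_mydata).

```r
# Hypothetical helper: units flagged by two common Cook's distance cutoffs
flag_influential <- function(fit) {
  d <- cooks.distance(fit)
  n <- length(d)
  list(four_over_n     = which(d > 4 / n),        # rule of thumb 4/n
       four_times_mean = which(d > 4 * mean(d)))  # rule used in this analysis
}

# Toy data (hypothetical): a line with small deterministic noise and one outlier
x <- 1:20
y <- 2 + 0.5 * x + sin(x)
y[10] <- y[10] + 50
flag_influential(lm(y ~ x))
```

If the two rules flag different units, that is a sign the choice of cutoff matters and is worth reporting.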
# Threshold for influential units: 4 times the mean Cook's distance
# (4/n is another common rule of thumb)
cook_threshold <- 4 * mean(mydata$CooksD, na.rm = TRUE)
# Identify and remove observations with Cook's distance above the threshold
mydata_cleaned <- mydata[mydata$CooksD <= cook_threshold, ]
# Create a histogram for the cleaned data
hist(mydata_cleaned$CooksD,
xlab = "Cook's distances",
ylab = "Frequency",
main = "Histogram of Cook's distances (Cleaned)",
breaks = 50, # Adjust the number of bins as needed
col = "lightblue", # Adjust the color
probability = TRUE) # Show the density
# Add a density plot
lines(density(mydata_cleaned$CooksD), col = "red", lwd = 2)
Now let’s check the homoscedasticity.
# Load the car package
library(car)
# Scatterplot of residuals against fitted values for fit_mydata (the OLS model fitted on all 290 units)
scatterplot(y = fit_mydata$residuals, x = fit_mydata$fitted.values,
ylab = "Residuals", xlab = "Fitted Values", main = "Residuals vs. Fitted Values")
In the scatterplot, the spread of the residuals is not constant across the fitted values, which suggests heteroscedasticity.
library(olsrr)
ols_test_breusch_pagan(fit_mydata)
##
## Breusch Pagan Test for Heteroskedasticity
## -----------------------------------------
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## ----------------------------------------
## Response : X..Certified
## Variables: fitted values of X..Certified
##
## Test Summary
## -------------------------------
## DF = 1
## Chi2 = 97.99667
## Prob > Chi2 = 4.190871e-23
The Breusch-Pagan test gives Chi2 = 98.00 with p = 4.19e-23, so we reject the null hypothesis of constant variance: the residuals are heteroscedastic.
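The idea behind the test is to check whether the squared residuals can be predicted from the fitted values. The sketch below implements the studentized (Koenker) variant, which is the default of lmtest::bptest(): regress the squared residuals on the fitted values and compare n times the R-squared with a chi-squared distribution with 1 degree of freedom. The helper name bp_koenker is hypothetical, and olsrr may compute its statistic differently by default (for example, in the original non-studentized form), so the numbers need not match the output above.

```r
# Sketch of the studentized (Koenker) Breusch-Pagan test with the fitted values
# as the only auxiliary regressor: statistic = n * R^2, chi-squared with 1 df
bp_koenker <- function(fit) {
  e2 <- residuals(fit)^2
  yhat <- fitted(fit)
  aux <- lm(e2 ~ yhat)
  stat <- length(e2) * summary(aux)$r.squared
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Toy data (hypothetical): the noise grows with x, so the variance is not constant
x <- 1:100
y <- 1 + x + x * sin(1.7 * x)
bp_koenker(lm(y ~ x))
```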
Heteroscedasticity does not bias the OLS coefficient estimates, but it makes the usual standard errors unreliable, so we use heteroscedasticity-robust (HC1) standard errors instead.
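HC1 standard errors come from the "sandwich" formula V = n / (n - k) (X'X)^-1 X' diag(e^2) X (X'X)^-1, where e are the OLS residuals and k is the number of coefficients. The sketch below (hc1_se is a hypothetical name) computes it in base R. Applied to an lm() fit of the same model, it should reproduce the standard errors that lm_robust() reports with se_type = "HC1", although I have not checked that against this dataset. For an intercept-only model the formula reduces to sd(y) / sqrt(n), which gives a simple check.

```r
# Sketch of HC1 (heteroscedasticity-robust) standard errors for an lm fit:
# V = n / (n - k) * (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
hc1_se <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))   # (X'X)^{-1}
  meat <- crossprod(X * e)       # X' diag(e^2) X (row i of X scaled by e_i)
  V <- n / (n - k) * bread %*% meat %*% bread
  sqrt(diag(V))
}

# Toy data (hypothetical); for an intercept-only model HC1 equals sd(y) / sqrt(n)
y <- c(1, 2, 3, 4, 10)
hc1_se(lm(y ~ 1))
sd(y) / sqrt(length(y))
```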
# Refit OLS on the reduced data (units 168 and 169 removed)
fit_mydata <- lm(X..Certified ~ Institution + X..Posted.in.Forum +
                   X..Grade.Higher.Than.Zero + Median.Age,
                 data = mydata)
# Same model with heteroscedasticity-robust (HC1) standard errors
library(estimatr)
fit_robust <- lm_robust(X..Certified ~ Institution + X..Posted.in.Forum +
                          X..Grade.Higher.Than.Zero + Median.Age,
                        data = mydata,
                        se_type = "HC1")
summary(fit_robust)
##
## Call:
## lm_robust(formula = X..Certified ~ Institution + X..Posted.in.Forum +
## X..Grade.Higher.Than.Zero + Median.Age, data = mydata, se_type = "HC1")
##
## Standard error type: HC1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) -8.92816 2.98208 -2.9939 2.997e-03 -14.79803 -3.05828 283
## Institution0 4.34085 0.69126 6.2796 1.274e-09 2.98019 5.70151 283
## X..Posted.in.Forum -0.02143 0.05572 -0.3846 7.008e-01 -0.13111 0.08825 283
## X..Grade.Higher.Than.Zero 0.31022 0.02606 11.9019 9.370e-27 0.25891 0.36153 283
## Median.Age 0.28657 0.11279 2.5409 1.159e-02 0.06457 0.50858 283
##
## Multiple R-squared: 0.5573 , Adjusted R-squared: 0.551
## F-statistic: 51.74 on 4 and 283 DF, p-value: < 2.2e-16
The Multiple R-squared is 0.557, meaning that approximately 55.7% of the variability in the percentage of certified participants is explained by the model (adjusted R-squared: 0.551). The F-statistic for the overall significance of the model is 51.74 on 4 and 283 degrees of freedom, with a p-value below 2.2e-16, so the model as a whole is highly significant.
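In summary, Institution, the percentage with a grade higher than zero, and median age appear to be statistically significant predictors of the percentage certified at the 5% level. Holding the other variables constant, HarvardX courses (Institution0) have a certification rate about 4.34 percentage points higher than MITx courses, each additional percentage point of participants with a grade above zero is associated with about 0.31 more percentage points certified, and each additional year of median age with about 0.29 more points. The percentage posted in forum is not statistically significant: its robust p-value is 0.70 and its 95% confidence interval (-0.131 to 0.088) includes zero. So, against my initial guess, posting in forums does not appear to help participants finish their courses once the other factors are taken into account.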
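The numbers quoted above can be recomputed from the printed output. The sketch below uses the values from the lm_robust() summary: n = 288 (implied by 283 residual degrees of freedom plus 5 coefficients), k = 4 predictors, and the estimate and robust standard error for X..Posted.in.Forum. The helper name adj_r2 is hypothetical.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors k
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values taken from the lm_robust() output above: n = 288 units, k = 4 predictors
adj_r2(0.5573, n = 288, k = 4)

# 95% confidence interval and two-sided p-value for the forum coefficient,
# using t quantiles with 283 residual degrees of freedom
est <- -0.02143
se <- 0.05572
df_resid <- 283
ci <- est + c(-1, 1) * qt(0.975, df_resid) * se
p_value <- 2 * pt(-abs(est / se), df_resid)
round(ci, 3)
round(p_value, 3)
```

The results agree with the printed adjusted R-squared (0.551), the confidence interval (-0.131 to 0.088) and the p-value (0.70), which is a quick way to confirm that the interpretation uses the right columns of the output.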