# Load necessary libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the data
df <- read.csv("~/Documents/STAT 2024/udemy_courses_k.csv")
model_data <- df %>%
  mutate(has_subscribers = ifelse(num_subscribers > 0, 1, 0))
model_simple <- glm(has_subscribers ~ price, data = model_data, family = "binomial")
summary(model_simple)
##
## Call:
## glm(formula = has_subscribers ~ price, family = "binomial", data = model_data)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.533128   0.171290  20.627   <2e-16 ***
## price       0.007469   0.002667   2.801   0.0051 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 693.29 on 3677 degrees of freedom
## Residual deviance: 683.51 on 3676 degrees of freedom
## AIC: 687.51
##
## Number of Fisher Scoring iterations: 7
The intercept (3.533) represents the log-odds of a course having subscribers when the price is zero.
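Log-odds are hard to read directly, so it can help to convert them to a probability. The sketch below hard-codes the two coefficients from the summary output above and applies the inverse logit with `plogis()`. The price of 100 is an arbitrary illustrative value, not something taken from the data.

```r
# Coefficients copied from the summary() output above
intercept <- 3.533128
slope_price <- 0.007469

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_at_zero <- plogis(intercept)

# Predicted probability at a hypothetical price of 100 (illustrative value only)
p_at_100 <- plogis(intercept + slope_price * 100)

round(c(price_0 = p_at_zero, price_100 = p_at_100), 3)
```

The intercept corresponds to a predicted probability of roughly 0.97 for a free course, which shows that almost all courses in the data have at least one subscriber.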
The price coefficient indicates that each one-unit increase in price raises the log-odds of a course having subscribers by approximately 0.0075. Because price is the only predictor in this model, there are no other factors being held constant. The p-value for price is 0.0051, so the effect is statistically significant at the 0.01 level.
The odds ratio for price is exp(0.007469), or approximately 1.0075. Each one-unit increase in price therefore multiplies the odds of having subscribers by about 1.0075, an increase of roughly 0.75%.
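As a minimal sketch of that calculation, the block below exponentiates the price coefficient copied from the summary output. The 10-unit increase is an assumed example chosen only to show how the odds ratio scales over larger price differences.

```r
# Price coefficient copied from the summary() output above
estimate_price <- 0.007469

# Odds ratio for a one-unit increase in price
odds_ratio <- exp(estimate_price)

# Percentage change in the odds per unit of price
pct_change <- (odds_ratio - 1) * 100

# Odds ratio for a 10-unit increase (10 is an illustrative choice);
# on the odds scale effects multiply, so this equals odds_ratio^10
odds_ratio_10 <- exp(10 * estimate_price)

round(c(or_1 = odds_ratio, pct = pct_change, or_10 = odds_ratio_10), 4)
```

With the fitted model in memory, `exp(coef(model_simple))` would give the same odds ratio without copying the number by hand.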
se_price <- summary(model_simple)$coefficients["price", "Std. Error"]
estimate_price <- summary(model_simple)$coefficients["price", "Estimate"]
# 95% confidence interval for 'price'
lower_bound <- estimate_price - 1.96 * se_price
upper_bound <- estimate_price + 1.96 * se_price
ci_price <- c(lower_bound, upper_bound)
ci_price
## [1] 0.002241867 0.012696531
This 95% Wald confidence interval indicates that, for each one-unit increase in price, we can be 95% confident that the true increase in the log-odds of having subscribers lies between 0.0022 and 0.0127.
Because the interval does not include zero, it suggests that price has a statistically significant positive association with the likelihood of having subscribers, consistent with the p-value of 0.0051.
In this data, more expensive courses are somewhat more likely to have subscribers, although the estimated effect per unit of price is small.
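The manual interval above can be cross-checked against base R's `confint.default()`, which computes the same Wald interval. Since the original CSV is not available here, the sketch below uses simulated stand-in data with arbitrary coefficients, so its numbers will not match the output above. It also uses `qnorm(0.975)` in place of the rounded 1.96.

```r
# Simulated stand-in data (NOT the Udemy data); the true coefficients are arbitrary
set.seed(1)
price <- runif(500, 0, 200)
has_subscribers <- rbinom(500, 1, plogis(0.5 + 0.01 * price))
fit <- glm(has_subscribers ~ price, family = "binomial")

# Manual Wald interval using the exact normal quantile rather than 1.96
est <- coef(summary(fit))["price", "Estimate"]
se <- coef(summary(fit))["price", "Std. Error"]
z <- qnorm(0.975)
manual_ci <- c(est - z * se, est + z * se)

# confint.default() returns the same Wald interval as a 1 x 2 matrix
builtin_ci <- confint.default(fit, parm = "price", level = 0.95)

# Exponentiating the bounds expresses the interval as an odds ratio
or_ci_simulated <- exp(manual_ci)

# The same transformation applied to the interval reported above
or_ci_reported <- exp(c(0.002241867, 0.012696531))
```

On the odds-ratio scale, a confidence interval that excludes 1 plays the same role as a log-odds interval that excludes 0.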