A. is_paid, B. content_duration, C. level
A. is_paid: This column likely indicates whether a course is paid or free. However, because the values are TRUE or FALSE, it is unclear whether TRUE means the course is paid or free, or whether additional conditions apply.
B. content_duration: This column represents the duration of a course, but without reading the documentation it is unclear whether the value is measured in hours, minutes, or another unit.
C. level: The values are descriptive, such as “All Levels” or “intermediate level”, but the exact categories and how they are defined need clarification.
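One way to narrow down these three ambiguities before turning to the documentation is to look at the values themselves. The sketch below is only illustrative: the helper name inspect_ambiguous_columns is hypothetical, and it assumes a data frame with is_paid, price, content_duration and level columns, with price stored as a number.

```r
# Hypothetical helper: summarise the three ambiguous columns.
# Assumes columns is_paid, price (numeric), content_duration and level.
inspect_ambiguous_columns <- function(df) {
  list(
    # If TRUE means "paid", the TRUE rows should line up with price > 0.
    is_paid_vs_price = table(
      is_paid = as.character(df$is_paid),
      price_above_zero = df$price > 0
    ),
    # The range is only a hint at the unit: small fractional values would be
    # consistent with hours, values in the hundreds would point to minutes.
    content_duration_range = range(df$content_duration, na.rm = TRUE),
    # The distinct labels show which categories actually occur.
    level_values = sort(unique(as.character(df$level)))
  )
}

# Usage, once the dataset has been read into df:
# inspect_ambiguous_columns(df)
```

The cross-tabulation cannot prove what the column means, but if every TRUE row has a positive price, that supports reading TRUE as "paid".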
A. is_paid: Using TRUE or FALSE instead of “paid” or “free” makes the data easier to filter and takes up less space in the database.
B. content_duration: The course length is measured in hours because that is simpler and makes it easy to compare the lengths of different courses.
C. level: Using words like “Beginner” or “Intermediate” makes it easier for people to understand and choose the right course.
If you misread is_paid, you could miss free courses or enroll in paid courses unintentionally. Assuming the wrong unit for content_duration may lead to miscalculating course length. Misunderstanding level may lead to an improper segmentation of beginner and advanced courses.
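The size of the duration error is easy to see with a small sketch. The function duration_in_minutes below is a hypothetical helper, and its unit argument is the analyst's assumption, not something the dataset states.

```r
# Hypothetical helper: make the assumed unit explicit before doing
# arithmetic on durations. `unit` is an assumption supplied by the analyst.
duration_in_minutes <- function(duration, unit = c("hours", "minutes")) {
  unit <- match.arg(unit)
  if (unit == "hours") duration * 60 else duration
}

# The same raw value leads to results that differ by a factor of 60:
# duration_in_minutes(1.5, "hours")    # 90 minutes
# duration_in_minutes(1.5, "minutes")  # 1.5 minutes
```

Passing the unit explicitly keeps the assumption visible in the code, so it is easy to correct once the documentation confirms the real unit.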
The price column likely indicates the cost of the course, but it is unclear whether the values represent the regular price, a discounted price (Udemy frequently offers discounts on courses), or a combination of both, and whether they include taxes and additional fees.
# Load necessary libraries
library(ggplot2)
# Read the dataset
df <- read.csv('~/Downloads/udemy_courses.csv')
# Create scatter plot for Price vs Number of Subscribers
ggplot(df, aes(x = price, y = num_subscribers)) +
  geom_point(alpha = 0.5, color = 'blue') +
  # Add a red dashed vertical line for free courses (Price = 0)
  geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +
  # Add titles and labels
  ggtitle('Price vs Number of Subscribers') +
  xlab('Price (in USD)') +
  ylab('Number of Subscribers') +
  # Improve layout
  theme_minimal()
The scatter plot above shows the relationship between course price and number of subscribers. One key issue here is the uncertainty surrounding the “price” column.
It is unclear whether the price represents the regular price or a discounted price. This is a critical issue because, if the prices are discounted, the apparent relationship between price and number of subscribers may be misleading.
The red dashed line at price = 0 represents free courses. These tend to attract many subscribers, but without clarity on how discounts are applied, the behavior of paid courses at full versus discounted prices remains unclear.
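The observation about free courses can be checked numerically as well as visually. The following is a minimal sketch, assuming numeric price and num_subscribers columns as used in the plot; compare_free_vs_paid is a hypothetical helper name.

```r
# Hypothetical helper: compare subscriber counts for free (price == 0)
# and paid (price > 0) courses. Assumes numeric price and num_subscribers.
compare_free_vs_paid <- function(df) {
  group <- ifelse(df$price == 0, "free", "paid")
  data.frame(
    group = c("free", "paid"),
    n_courses = c(
      sum(group == "free", na.rm = TRUE),
      sum(group == "paid", na.rm = TRUE)
    ),
    median_subscribers = c(
      median(df$num_subscribers[which(group == "free")], na.rm = TRUE),
      median(df$num_subscribers[which(group == "paid")], na.rm = TRUE)
    )
  )
}

# Usage, once the dataset has been read into df:
# compare_free_vs_paid(df)
```

The median is used here instead of the mean because a few very popular courses could otherwise dominate the comparison. This summary still cannot separate full-price from discounted enrollments, since the data has no column for that.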
Without knowing whether the prices reflect discounts, you could draw the wrong conclusions about which price points attract more subscribers.
To reduce this risk, request clarification on the price column, for example by adding a separate column for “discounted price”.