TABLE OF CONTENTS

Introduction
  Dataset Description
  Assumption based on Dataset
Preprocessing Steps
  Tokenization
  Stopwords
  N-grams
  Word_Frequency
  WordCloud
Insights

Introduction

 This education dataset was extracted from the Coursera website by web scraping with UiPath Studio, and the scraped data were saved as CSV files. It lists free online data science courses together with the skills each course develops, its duration, and its rating.

Dataset Description

str(course)
## 'data.frame':    239 obs. of  4 variables:
##  $ Courses           : chr  "IBM Data Analyst" "Introduction to Data Science" "Data Processing Using Python" "HTML, CSS, and Javascript for Web Developers" ...
##  $ Skills.Will.Learn : chr  "Skills you'll gain: Algebra, Analysis, Apache, Big Data, Business Analysis, Computational Logic, Computer Progr"| __truncated__ "Skills you'll gain: Communication, Computer Programming, Data Analysis, Data Management, Data Mining, Database "| __truncated__ "Skills you'll gain: Statistical Programming, Computer Programming, Python Programming" "Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS" ...
##  $ Rating.And.Reviews: chr  "4.6\n\n(48.7k reviews)" "4.6\n\n(67.6k reviews)" "4.2\n\n(260 reviews)" "4.7\n\n(13.7k reviews)" ...
##  $ Course.Duration   : chr  "Beginner · Professional Certificate · 3+ Months" "Beginner · Specialization · 3+ Months" "Beginner · Course · 1-3 Months" "Mixed · Course · 1-3 Months" ...
summary(course)
##    Courses          Skills.Will.Learn  Rating.And.Reviews Course.Duration   
##  Length:239         Length:239         Length:239         Length:239        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
skim(course)
Data summary
Name                     course
Number of rows           239
Number of columns        4
_______________________
Column type frequency:
character                4
________________________
Group variables          None

Variable type: character

skim_variable       n_missing  complete_rate  min   max  empty  n_unique  whitespace
Courses                     0              1   13    77      0       239           0
Skills.Will.Learn           0              1    0  1113     20       217           0
Rating.And.Reviews          0              1    0    21     21       208           0
Course.Duration             0              1   26    48      0        18           0
names(course)
## [1] "Courses"            "Skills.Will.Learn"  "Rating.And.Reviews"
## [4] "Course.Duration"
newcourse<-course[c("Courses")]
skim() gives a fuller picture of the dataset than str() or summary(): no column has NA values, but Skills.Will.Learn contains 20 empty strings and Rating.And.Reviews contains 21. names() lists the column names in the dataset, and the column selection course[c("Courses")] keeps only the Courses column for the preprocessing steps.
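
The empty column in the skim() output counts empty strings, which R does not treat as NA. The sketch below shows how the two can be checked separately in base R. The data frame toy is a three-row stand-in built for illustration (titles and values copied from the output above, with some fields blanked on purpose), not the real dataset.

```r
# Toy stand-in for the scraped data (illustrative only, not the real file).
toy <- data.frame(
  Courses = c("IBM Data Analyst",
              "Introduction to Data Science",
              "Data Processing Using Python"),
  Skills.Will.Learn = c("Skills you'll gain: Web Design, Html", "",
                        "Skills you'll gain: Statistical Programming"),
  Rating.And.Reviews = c("4.6\n\n(48.7k reviews)", "", ""),
  stringsAsFactors = FALSE
)

# Empty strings per column: this is what skim() reports as "empty".
empty_counts <- colSums(toy == "")

# NA values per column: this is what skim() reports as "n_missing".
missing_counts <- colSums(is.na(toy))

print(empty_counts)
print(missing_counts)
```

If the empty strings should be treated as missing in later steps, one option is to recode them with `toy[toy == ""] <- NA` before analysis.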

Assumption based on the dataset

From this dataset I chose the Courses attribute for text preprocessing. The goal is to find the words with the highest and the lowest frequency in the course titles. The assumption is that the more often a word appears in the titles, the more free courses on that topic are available on Coursera. The dataset contains 239 distinct courses, all related to data science.
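
As a rough illustration of this idea, the sketch below counts word frequencies in a few course titles. It assumes the dplyr and tidytext packages are installed; the data frame titles and the variable names are hypothetical, and the three titles are taken from the str() output above.

```r
library(dplyr)
library(tidytext)

# Three titles from the dataset, used here as a small example.
titles <- data.frame(
  Courses = c("IBM Data Analyst",
              "Introduction to Data Science",
              "Data Processing Using Python"),
  stringsAsFactors = FALSE
)

# Split titles into lowercase words and count each word.
word_counts <- titles %>%
  unnest_tokens(word, Courses) %>%
  count(word, sort = TRUE)

# Words with the highest and the lowest count (ties are kept).
most_common  <- filter(word_counts, n == max(n))
least_common <- filter(word_counts, n == min(n))

print(most_common)
print(least_common)
```

Function words such as "to" are counted here as well, which is why stop words are normally removed before the frequencies are interpreted.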

Preprocessing Steps

Tokenization

##            word
## 1           ibm
## 2          data
## 3       analyst
## 4  introduction
## 5            to
## 6          data
## 7       science
## 8          data
## 9    processing
## 10        using
##                word
## 1306         thread
## 1307 implementation
## 1308          build
## 1309              a
## 1310        twitter
## 1311          clone
## 1312          front
## 1313            end
## 1314           with
## 1315        reactjs

N-Grams

data_3gram<-unnest_tokens(course,word,Courses,token = "ngrams",n=3)
head(data_3gram,20)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Skills.Will.Learn
## 1  Skills you'll gain: Algebra, Analysis, Apache, Big Data, Business Analysis, Computational Logic, Computer Programming, Computer Programming Tools, Correlation And Dependence, Data Analysis, Data Analysis Software, Data Management, Data Mining, Data Visualization, Data Visualization Software, Data Warehousing, Database Administration, Database Application, Databases, Econometrics, Exploratory Data Analysis, Extract, Transform, Load, General Statistics, Machine Learning, Mathematical Theory & Analysis, Mathematics, Microsoft Excel, NoSQL, Operating Systems, Plot (Graphics), Probability & Statistics, Python Programming, Regression, SQL, Spreadsheet Software, Statistical Analysis, Statistical Machine Learning, Statistical Programming, Statistical Visualization, System Programming, Theoretical Computer Science
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Skills you'll gain: Communication, Computer Programming, Data Analysis, Data Management, Data Mining, Database Administration, Database Application, Databases, General Statistics, Machine Learning, Marketing, Probability & Statistics, Python Programming, R Programming, Regression, SPSS, SQL, Statistical Programming
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Skills you'll gain: Communication, Computer Programming, Data Analysis, Data Management, Data Mining, Database Administration, Database Application, Databases, General Statistics, Machine Learning, Marketing, Probability & Statistics, Python Programming, R Programming, Regression, SPSS, SQL, Statistical Programming
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Skills you'll gain: Statistical Programming, Computer Programming, Python Programming
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Skills you'll gain: Statistical Programming, Computer Programming, Python Programming
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS
## 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS
## 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS
## 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS
## 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Skills you'll gain: Web Design, Html, HTML and CSS, Web Development, CSS
## 11                    Skills you'll gain: Algebra, Algorithms, Analysis, Business Analysis, Cloud API, Cloud Computing, Communication, Computational Logic, Computer Programming, Computer Programming Tools, Correlation And Dependence, Data Analysis, Data Management, Data Mining, Data Structures, Data Visualization, Database Administration, Database Application, Databases, Econometrics, Exploratory Data Analysis, Extract, Transform, Load, General Statistics, Machine Learning, Machine Learning Algorithms, Marketing, Mathematical Theory & Analysis, Mathematics, Plot (Graphics), Probability & Statistics, Python Programming, R Programming, Regression, SPSS, SQL, Spreadsheet Software, Statistical Analysis, Statistical Machine Learning, Statistical Programming, Statistical Visualization, Theoretical Computer Science
## 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Skills you'll gain: Data Management, Data Visualization, Data Analysis, Statistical Analysis, NoSQL, Data Warehousing, Big Data, Data Mining, Business Analysis, Extract, Transform, Load, Databases, Apache, Analysis, General Statistics
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Skills you'll gain: Data Management, Data Visualization, Data Analysis, Statistical Analysis, NoSQL, Data Warehousing, Big Data, Data Mining, Business Analysis, Extract, Transform, Load, Databases, Apache, Analysis, General Statistics
## 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Skills you'll gain: Bayesian Statistics, Theoretical Computer Science, Mathematical Theory & Analysis, Probability, Factorial, Computational Logic, Graph Theory, Probability Distribution, Probability & Statistics, Mathematics, General Statistics, Algebra
## 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Skills you'll gain: Bayesian Statistics, Theoretical Computer Science, Mathematical Theory & Analysis, Probability, Factorial, Computational Logic, Graph Theory, Probability Distribution, Probability & Statistics, Mathematics, General Statistics, Algebra
## 16                                                                                   Skills you'll gain: Distributed Computing Architecture, Deep Learning, Computer Architecture, Statistical Machine Learning, Computer Programming, Theoretical Computer Science, Algorithms, Machine Learning Algorithms, Applied Machine Learning, Data Analysis, Computer Vision, Differential Equations, Artificial Neural Networks, Other Programming Languages, Estimation, Calculus, General Statistics, Dimensionality Reduction, Probability Distribution, Probability & Statistics, Linear Algebra, Security Engineering, Network Security, Data Mining, Econometrics, Data Analysis Software, Mathematics, Natural Language Processing, Feature Engineering, Geostatistics, Machine Learning, Support Vector Machine, Computer Networking, Regression
## 17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Skills you'll gain: Data Analysis, Spreadsheet, Spreadsheet Software, Business Analysis, Analysis
## 18                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Skills you'll gain: Data Analysis, Spreadsheet, Spreadsheet Software, Business Analysis, Analysis
## 19                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Skills you'll gain: Data Analysis, Spreadsheet, Spreadsheet Software, Business Analysis, Analysis
## 20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Skills you'll gain: Data Analysis, Spreadsheet, Spreadsheet Software, Business Analysis, Analysis
##        Rating.And.Reviews                                 Course.Duration
## 1  4.6\n\n(48.7k reviews) Beginner · Professional Certificate · 3+ Months
## 2  4.6\n\n(67.6k reviews)           Beginner · Specialization · 3+ Months
## 3  4.6\n\n(67.6k reviews)           Beginner · Specialization · 3+ Months
## 4    4.2\n\n(260 reviews)                  Beginner · Course · 1-3 Months
## 5    4.2\n\n(260 reviews)                  Beginner · Course · 1-3 Months
## 6  4.7\n\n(13.7k reviews)                     Mixed · Course · 1-3 Months
## 7  4.7\n\n(13.7k reviews)                     Mixed · Course · 1-3 Months
## 8  4.7\n\n(13.7k reviews)                     Mixed · Course · 1-3 Months
## 9  4.7\n\n(13.7k reviews)                     Mixed · Course · 1-3 Months
## 10 4.7\n\n(13.7k reviews)                     Mixed · Course · 1-3 Months
## 11 4.6\n\n(91.5k reviews) Beginner · Professional Certificate · 3+ Months
## 12  4.8\n\n(6.3k reviews)                  Beginner · Course · 1-3 Months
## 13  4.8\n\n(6.3k reviews)                  Beginner · Course · 1-3 Months
## 14 4.5\n\n(10.2k reviews)                  Beginner · Course · 1-3 Months
## 15 4.5\n\n(10.2k reviews)                  Beginner · Course · 1-3 Months
## 16  4.9\n\n(170k reviews)                      Mixed · Course · 3+ Months
## 17   4.3\n\n(385 reviews)    Beginner · Rhyme Project · Less Than 2 Hours
## 18   4.3\n\n(385 reviews)    Beginner · Rhyme Project · Less Than 2 Hours
## 19   4.3\n\n(385 reviews)    Beginner · Rhyme Project · Less Than 2 Hours
## 20   4.3\n\n(385 reviews)    Beginner · Rhyme Project · Less Than 2 Hours
##                           word
## 1             ibm data analyst
## 2         introduction to data
## 3              to data science
## 4        data processing using
## 5      processing using python
## 6                 html css and
## 7           css and javascript
## 8           and javascript for
## 9           javascript for web
## 10          for web developers
## 11            ibm data science
## 12        introduction to data
## 13           to data analytics
## 14           data science math
## 15         science math skills
## 16                        <NA>
## 17    introduction to business
## 18        to business analysis
## 19     business analysis using
## 20 analysis using spreadsheets
tail(data_3gram,20)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                 Skills.Will.Learn
## 831                                                                                                                                                                                                                                                                           Skills you'll gain: Data Management, Computer Programming, Theoretical Computer Science, Data Structures, Algorithms, Statistical Programming, Python Programming, Computer Program
## 832                                                                                                                                                                                                                                                                           Skills you'll gain: Data Management, Computer Programming, Theoretical Computer Science, Data Structures, Algorithms, Statistical Programming, Python Programming, Computer Program
## 833                                                                                                                                                                                                                                                                           Skills you'll gain: Data Management, Computer Programming, Theoretical Computer Science, Data Structures, Algorithms, Statistical Programming, Python Programming, Computer Program
## 834                                                                                                                                                                                                                                                                                                                                                                                                  Skills you'll gain: Theoretical Computer Science, Algorithms
## 835                                                                                                                                                                                                                                                                                                                                                                                                  Skills you'll gain: Theoretical Computer Science, Algorithms
## 836                                                                                                                                                                                                                                                                                                                                                                                                  Skills you'll gain: Theoretical Computer Science, Algorithms
## 837                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## 838                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## 839                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## 840                                                                                                                                                                                 Skills you'll gain: Data Management, Modeling, Pricing, Advertising, Marketing, Accounting, Research and Design, Theoretical Computer Science, Strategy and Operations, Communication, Finance, Flow Network, Analysis, Data Structures, Decision Tree, Cash Flow, Investment
## 841                                                                                                                                                                                 Skills you'll gain: Data Management, Modeling, Pricing, Advertising, Marketing, Accounting, Research and Design, Theoretical Computer Science, Strategy and Operations, Communication, Finance, Flow Network, Analysis, Data Structures, Decision Tree, Cash Flow, Investment
## 842                                                                                                                                                                                 Skills you'll gain: Data Management, Modeling, Pricing, Advertising, Marketing, Accounting, Research and Design, Theoretical Computer Science, Strategy and Operations, Communication, Finance, Flow Network, Analysis, Data Structures, Decision Tree, Cash Flow, Investment
## 843                                                                                                                                                                                 Skills you'll gain: Data Management, Modeling, Pricing, Advertising, Marketing, Accounting, Research and Design, Theoretical Computer Science, Strategy and Operations, Communication, Finance, Flow Network, Analysis, Data Structures, Decision Tree, Cash Flow, Investment
## 844 Skills you'll gain: Simulation, Modeling, Architecture, Business Process Management, Marketing, Strategy and Operations, Theoretical Computer Science, Manufacturing Process Management, Version Control, Process Analysis, Virtual Reality, Entrepreneurship, Computer-Aided Design, Algorithms, Human Computer Interaction, Communication, Design and Product, Business Analysis, Product Design, Process, Computer Graphics, Operations Research, Internet
## 845                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
## 846                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
## 847                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
## 848                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
## 849                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
## 850                                                                                                                                                                                                                                                                                                Skills you'll gain: Web, Cloud Computing, Computer Programming, Angular, Web Development, React (web framework), Cloud Applications, Front-End Web Development
##       Rating.And.Reviews                                  Course.Duration
## 831  4.4\n\n(32 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 832  4.4\n\n(32 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 833  4.4\n\n(32 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 834                                    Intermediate · Course · 1-3 Months
## 835                                    Intermediate · Course · 1-3 Months
## 836                                    Intermediate · Course · 1-3 Months
## 837                                    Intermediate · Course · 1-3 Months
## 838                                    Intermediate · Course · 1-3 Months
## 839                                    Intermediate · Course · 1-3 Months
## 840  4.6\n\n(75 reviews)                    Beginner · Course · 1-4 Weeks
## 841  4.6\n\n(75 reviews)                    Beginner · Course · 1-4 Weeks
## 842  4.6\n\n(75 reviews)                    Beginner · Course · 1-4 Weeks
## 843  4.6\n\n(75 reviews)                    Beginner · Course · 1-4 Weeks
## 844 4.7\n\n(430 reviews)                   Beginner · Course · 1-3 Months
## 845  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 846  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 847  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 848  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 849  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
## 850  4.5\n\n(25 reviews) Intermediate · Rhyme Project · Less Than 2 Hours
##                              word
## 831        solver using recursion
## 832            using recursion in
## 833           recursion in python
## 834           data structures and
## 835     structures and algorithms
## 836             and algorithms ii
## 837           data structures and
## 838     structures and algorithms
## 839            and algorithms iii
## 840  applying investment decision
## 841     investment decision rules
## 842            decision rules for
## 843            rules for startups
## 844 digital thread implementation
## 845               build a twitter
## 846               a twitter clone
## 847           twitter clone front
## 848               clone front end
## 849                front end with
## 850              end with reactjs

Stopwords

tokens_stop <- data_tokens %>% filter(!word %in% stop_words$word)
head(tokens_stop,20)
##            word
## 1           ibm
## 2          data
## 3       analyst
## 4  introduction
## 5          data
## 6       science
## 7          data
## 8    processing
## 9        python
## 10         html
## 11          css
## 12   javascript
## 13          web
## 14   developers
## 15          ibm
## 16         data
## 17      science
## 18 introduction
## 19         data
## 20    analytics
tail(tokens_stop,20)
##               word
## 950     structures
## 951     algorithms
## 952             ii
## 953           data
## 954     structures
## 955     algorithms
## 956            iii
## 957       applying
## 958     investment
## 959       decision
## 960          rules
## 961       startups
## 962        digital
## 963         thread
## 964 implementation
## 965          build
## 966        twitter
## 967          clone
## 968          front
## 969        reactjs

Word frequency

word_freq<-tokens_stop %>% count(word,sort = TRUE)
head(word_freq,10)
##            word  n
## 1          data 66
## 2        python 26
## 3       science 26
## 4  introduction 23
## 5      learning 18
## 6       machine 18
## 7      analysis 15
## 8         excel 13
## 9        google 10
## 10   processing  9
tail(word_freq,10)
##          word n
## 444       wix 1
## 445      word 1
## 446 workbench 1
## 447  workflow 1
## 448 workloads 1
## 449 workplace 1
## 450 workspace 1
## 451 wrangling 1
## 452    writer 1
## 453        xg 1
  The R workflow in this report covers tokenization, stop-word removal, n-grams, word frequency, and a word cloud. Using the unnest_tokens command, I split the text in the Courses attribute into individual tokens. I then removed stop words such as "of", "in", and "an" by filtering the tokens against the stop_words list, and counted the remaining words to get the frequencies shown above.
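The steps described here can be sketched end to end on a small example. This is a minimal sketch, assuming the dplyr and tidytext packages are installed; the `toy` data frame and the `toy_*` names are hypothetical stand-ins that reuse two course titles from the dataset, not the full `course` data.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-row stand-in for the Courses column of the dataset
toy <- data.frame(
  Courses = c("Introduction to Data Science", "Data Processing Using Python"),
  stringsAsFactors = FALSE
)

# Tokenization: one lowercase word per row
toy_tokens <- toy %>% unnest_tokens(word, Courses)

# Stop-word removal: drop every token found in tidytext's stop_words list
toy_clean <- toy_tokens %>% filter(!word %in% stop_words$word)

# Word frequency: count each remaining word, most frequent first
toy_freq <- toy_clean %>% count(word, sort = TRUE)

# N-grams: three consecutive words per row (trigrams)
toy_trigrams <- toy %>% unnest_tokens(word, Courses, token = "ngrams", n = 3)

print(toy_freq)
print(toy_trigrams)
```

Stop words are removed only from the single-word tokens. The trigrams are built from the raw titles, which is why phrases such as "using recursion in" still contain stop words.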

WordCloud

wordcloud2(data = word_freq, size = 1.2, color = 'random-light', backgroundColor = 'lightgreen')
Finally, wordcloud2 presents the repeated words visually, drawing words with higher frequencies larger.

Insights

  As I read this dataset, a few words are repeated with the highest frequencies: data, science, python, learning, analysis, fundamentals, processing, and programming. This suggests that most of the free courses offered on Coursera are analytics-oriented. At the same time, some words have very low frequencies, such as probability, management, thinking, marketing, and cybersecurity. These low-frequency words help identify the topics for which Coursera offers the fewest free courses.
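One way to separate the high-frequency words from the rare ones is sketched below. This is a minimal sketch, assuming dplyr is installed; `freq_sample` is a hypothetical excerpt that copies five rows from the word-frequency output above, and the cut-offs (top 3, n equal to 1) are illustrative choices rather than part of the original analysis.

```r
library(dplyr)

# Hypothetical excerpt of word_freq, with counts copied from the output above
freq_sample <- data.frame(
  word = c("data", "python", "science", "wix", "workbench"),
  n = c(66, 26, 26, 1, 1),
  stringsAsFactors = FALSE
)

# Most frequent words: the three rows with the largest counts
top_words <- freq_sample %>% slice_max(n, n = 3)

# Rare words: those that appear exactly once
rare_words <- freq_sample %>% filter(n == 1)

print(top_words)
print(rare_words)
```

Applied to the full `word_freq` table, the same two calls would list the dominant topics and the topics covered by only one course title.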