Load data source into an R dataframe using MySQL RDBMS

Each data has a time constraint and had time box, but the efforts to have raw data to provide downstream on this project. The following data resource from Google, Indeed and Kaggle Job Boards generalize the approach to initiate a search for “data science” terms to collect the results, identify the skills and a preliminary filtering of noise and irrelevant data. The results set is provided to transforming the data with the following variables: Skill Type, Skill name, rating and company name.

# 2. variables Settings
db_user <- 'root'
db_password <- ''
db_name <- 'skills'
db_host <- '127.0.0.1' # localhost for local access
db_port <- 3306

# 3. Read data from db
mydb <-  dbConnect(MySQL(), user = db_user, password = db_password,
                 dbname = db_name, host = db_host, port = db_port)
s <- paste0(" SELECT SkillType, SkillName, Rating,
                        CASE
                            WHEN Company = 1 THEN 'Google'
                            ELSE 'NA'
                        END AS Company
              FROM GoogleData
              UNION
              SELECT SkillType, SkillName, Rating,
                        CASE
                            WHEN Company = 2 THEN 'Indeed'
                            ELSE 'NA'
                        END AS Company
              FROM IndeedData
              UNION
              SELECT SkillType, SkillName, Rating,
                        CASE
                            WHEN Company = 3 THEN 'Kaggle'
                            ELSE 'NA'
                        END AS Company
              FROM KaggleData;")

rs <- dbSendQuery(mydb, s)
df <-  fetch(rs, n = -1)
on.exit(dbDisconnect(mydb))
## Warning: Closing open result sets
#uf <- df %>% gather(SkillName, n, Rating:Company)
kable(df) %>%
  kable_styling() %>%
  scroll_box(height = "500px")
SkillType SkillName Rating Company
communication ai law 7 Google
business analysis 10 Google
math analytics 11 Google
programming big data 42 Google
programming cloud 2 Google
business creativity 3 Google
programming data engineering 4 Google
programming data mining 3 Google
programming data natives 8 Google
programming devops 2 Google
business entrepreneurship 2 Google
programming fintech 2 Google
programming galaxql 2 Google
business government 3 Google
programming Hadoop 12 Google
business information 9 Google
programming IOS 2 Google
programming Java 2 Google
programming machine learning 15 Google
business manufacturing 2 Google
programming MATLAB 2 Google
visualization media 6 Google
programming mobile development 2 Google
programming network 2 Google
programming NOSQL 5 Google
math pattern discovery 2 Google
business policy 3 Google
programming Python 7 Google
programming r 7 Google
business retail 2 Google
programming RSS 2 Google
programming Ruby 2 Google
programming SAS 2 Google
business security and risk strategy 2 Google
programming SQL 6 Google
programming storage 2 Google
programming tableau 2 Google
programming web development 2 Google
programming agent-based Simulation 32 Indeed
programming ANOVA 138 Indeed
math Bayes networks 74 Indeed
math Bayesian Statistics 175 Indeed
programming big data 44 Indeed
programming BUGS 160 Indeed
programming C 80 Indeed
math calculus 91 Indeed
programming cloud tech 106 Indeed
programming clustering 165 Indeed
math continuous Simulation 149 Indeed
math convex Optimization 17 Indeed
programming CSS 93 Indeed
programming DBA 143 Indeed
business decision trees 117 Indeed
visualization design 15 Indeed
programming discrete Simulation 94 Indeed
programming Distributed Data 94 Indeed
visualization experimental design 152 Indeed
business forecasting 49 Indeed
math general linear model 133 Indeed
math geographic covariates 0 Indeed
programming GIS 233 Indeed
business global Optimization 94 Indeed
visualization Graphical Models 122 Indeed
programming Hadoop 76 Indeed
programming Hive 106 Indeed
programming HTML 73 Indeed
math integer Optimization 96 Indeed
programming JAVA 70 Indeed
programming JavaScript 83 Indeed
programming JSON 146 Indeed
math linear algebra 69 Indeed
programming Machine Learning 98 Indeed
visualization Map/Reduce 109 Indeed
visualization mapping 134 Indeed
math Math 76 Indeed
programming MATLAB 108 Indeed
math MCMC 68 Indeed
math Monte-Carlo Statistics 170 Indeed
math multinomial modeling 16 Indeed
programming MySQL 114 Indeed
programming neural nets 64 Indeed
programming nix 6 Indeed
programming NoSQL 89 Indeed
programming Objective C 129 Indeed
programming Oracle 121 Indeed
programming Perl 96 Indeed
programming Pig 155 Indeed
programming PostgreSQL 124 Indeed
business Product Development 41 Indeed
business project management 25 Indeed
programming Python 52 Indeed
programming R 64 Indeed
programming Rails 132 Indeed
business real analysis 70 Indeed
programming RegEx 86 Indeed
programming Ruby 102 Indeed
programming SAS 52 Indeed
programming Simulation 136 Indeed
communication social networks 172 Indeed
math Spatial Statistics 109 Indeed
programming SPSS 122 Indeed
programming SQL 44 Indeed
programming Stata 107 Indeed
visualization statistical graphics 183 Indeed
math Statistics 85 Indeed
programming Structured Data 75 Indeed
business Surveys and Marketing 130 Indeed
programming SVM 144 Indeed
business Systems Administration 109 Indeed
communication technical writing/publishing 4 Indeed
programming Teradata 127 Indeed
programming text mining 174 Indeed
math time-series analysis 167 Indeed
programming UNIX 89 Indeed
programming Unstructured Data 92 Indeed
visualization visualization 80 Indeed
visualization web-based data-viz 0 Indeed
programming XML 185 Indeed
programming Python 28 Kaggle
math analytics 23 Kaggle
programming machine learning 19 Kaggle
business business 14 Kaggle
programming programming 14 Kaggle
communication team 14 Kaggle
math statistics 13 Kaggle
programming SQL 12 Kaggle
communication communication 11 Kaggle
programming modeling 9 Kaggle
communication interpersonal 9 Kaggle
communication management 9 Kaggle
programming big data 7 Kaggle
visualization design 6 Kaggle
programming Hadoop 6 Kaggle
programming Java 6 Kaggle
business research 4 Kaggle
communication creative 4 Kaggle
math math 3 Kaggle
programming MATLAB 3 Kaggle
programming predictive analytics 3 Kaggle
programming SAS 3 Kaggle
communication leadership 3 Kaggle
business analysis 2 Kaggle
visualization presentation 1 Kaggle
programming R 1 Kaggle

Summary

summary(df)
##   SkillType          SkillName            Rating         
##  Length:144         Length:144         Length:144        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    Company         
##  Length:144        
##  Class :character  
##  Mode  :character

Including Plots

## [1] "business"      "communication" "math"          "programming"  
## [5] "visualization"

## Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
## argument is not numeric or logical: returning NA


Conclusion

The purpose of this exercise is to import three data sets in mysql db and tidy them with above libraries.

  1. Project 3 on rpubs.com, and SQL script and additional files uploaded on github.





Please email to: kleber.perez@live.com for any suggestion.

    DATA607 Project 2 - MSDS 2019 Program.