D607_P03_presentation_DRAFT
Bar Raisers - Large Group Justified
Cite Slide (Letter) for:
To include a plot image, do the following:
- put the reference name of your plot image within the brackets
- put the plot image file name within the parentheses
Example:
Left Column Header
Right Column Header
Summary / Conclusions
Future Exploratory Data Analysis Opportunities
Questions
This project used an exploratory data analysis approach to gather information available in the public domain from leading social media and job-listing websites to answer the question: which data science skills are most valued?
Websites Scraped:
- Twitter
- LinkedIn
- Glassdoor
- Amazon Jobs
CRISP-DM
CRoss-Industry Standard Process for Data Mining
Essential skills and traits of elite data scientists are -
- Critical thinking
- Coding
- Math
- Machine learning, deep learning, AI
- Communication
- Data architecture
- Risk analysis, process improvement, systems engineering
- Problem solving and good business intuition
Reference: Violino, B. (2018, Mar 27). Essential skills and traits of elite data scientists. CIO Magazine
Code snippet:
- Initially scraped all jobs listed within the NYC metropolitan area
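library(dplyr)

# base url for all jobs search in the NYC metropolitan area (ends in "start=" so a paging offset can be appended)
base_url <- "https://www.linkedin.com/jobs/search/?geoId=90000070&location=New%20York%20City%20Metropolitan%20Area&start="
max <- 30  # limit passed to the scraper
# scrape LinkedIn with the project's helper linkedIn_scrape_unique() (defined elsewhere in the project)
df_null_linkedIn <- linkedIn_scrape_unique(base_url, max)
# keep only unique (non-duplicated) job postings
df_null_linkedIn <- df_null_linkedIn %>%
  distinct()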
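Because the base URL ends in `start=`, the scraper presumably appends a result offset to page through the listings. A minimal base-R sketch of that paging idea, assuming (hypothetically) 25 results per page; `build_page_urls()` and `page_size` are illustrative names, not part of the project's `linkedIn_scrape_unique()`:

```r
# Hypothetical helper (not part of the project's scraper): build one search URL
# per results page by appending the offset 0, page_size, 2 * page_size, ...
# Assumption: 25 results per page.
build_page_urls <- function(base_url, n_pages, page_size = 25) {
  offsets <- seq(0, by = page_size, length.out = n_pages)
  paste0(base_url, offsets)
}

base_url <- "https://www.linkedin.com/jobs/search/?geoId=90000070&location=New%20York%20City%20Metropolitan%20Area&start="
urls <- build_page_urls(base_url, n_pages = 3)
print(urls)
```

Each URL can then be fetched in turn; since overlapping pages can return the same posting twice, deduplicating afterwards (as `distinct()` does above) is still worthwhile.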
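Methods of Tidying & Transforming -
- quanteda: Quantitative Analysis of Textual Data (R package)
- Goes beyond basic text mining by converting free text into a document-feature matrix (dfm) of term counts
- Key concepts - corpus structure, tokens, stopwords, document-feature matrix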
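library(quanteda)

# job descriptions as a character vector (as.character also handles factor columns)
jobs_vector <- as.character(df_data$description)
job_corpus <- corpus(jobs_vector)
# tokenize without punctuation, then drop English stopwords (no stemming applied)
job_tokens <- tokens(job_corpus, remove_punct = TRUE)
job_tokens <- tokens_remove(job_tokens, stopwords("english"))
# document-feature matrix: one row per job description, one column per term
dfmat_data <- dfm(job_tokens)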
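To show what the document-feature matrix captures, here is a dependency-free base-R sketch. It is a swapped-in approximation, not quanteda itself: it lowercases the text, splits on non-letter characters, drops a small stopword list, and counts terms across all descriptions, similar in spirit to ranking the columns of `dfmat_data`. The sample descriptions, the stopword list, and `count_terms()` are hypothetical.

```r
# Hypothetical sample job descriptions (stand-ins for df_data$description)
descriptions <- c(
  "Experience with Python and SQL is required.",
  "Strong communication skills and experience with Python.",
  "Knowledge of machine learning and SQL."
)

# Small illustrative stopword list; quanteda's stopwords("english") is much longer
my_stopwords <- c("with", "and", "is", "of", "the", "a")

# Hypothetical helper: base-R approximation of the tokens -> stopwords -> counts steps
count_terms <- function(texts, stop) {
  # lowercase, then split on runs of non-letter characters
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  # drop empty strings and stopwords
  words <- words[nchar(words) > 0 & !words %in% stop]
  # total term frequencies across all descriptions, most frequent first
  counts <- table(words)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

term_counts <- count_terms(descriptions, my_stopwords)
head(term_counts, 3)
```

Unlike a dfm, this collapses all documents into one frequency vector; the dfm keeps per-document counts, which is what allows comparing postings against each other.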
Preview of visualizations:
Based on Twitter #hashtags
some text below image
other points
Ranked bar graphs of #hashtags embedded within extracted tweets
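A ranked hashtag bar graph can be sketched in base R by extracting every `#word` token, normalizing case, and sorting the counts. The tweets and `rank_hashtags()` below are hypothetical stand-ins, not the project's Twitter extraction code.

```r
# Hypothetical sample tweets (stand-ins for the extracted Twitter data)
tweets <- c(
  "Learning #DataScience with #Python today",
  "#rstats tips for #DataScience beginners",
  "New #Python release notes #datascience"
)

# Hypothetical helper: count hashtags across tweets, most frequent first
rank_hashtags <- function(texts) {
  # pull every "#word" token out of each tweet
  tags <- unlist(regmatches(texts, gregexpr("#\\w+", texts, perl = TRUE)))
  # treat #DataScience and #datascience as the same hashtag
  tags <- tolower(tags)
  counts <- table(tags)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

hashtag_counts <- rank_hashtags(tweets)
# ranked bar graph, most frequent hashtag first
barplot(hashtag_counts, las = 2, ylab = "Occurrences")
```

Lowercasing is a design choice: without it, case variants of the same hashtag would split into separate, shorter bars.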
Elaborate on key info features of EDA from Twitter
# put excerpts of code sets here -
Wordcloud
other points
Elaborate on key info features of EDA from LinkedIn
# put excerpts of code sets here
other points
Elaborate on key info features of EDA from Glassdoor
# put excerpts of code sets here
other points
Elaborate on key info features of EDA from Amazon Jobs
# put excerpts of code sets here
According to Anand Rao, global artificial intelligence and innovation lead for data and analytics at consulting firm PwC (Violino, 2018):
“language of choice in data science is moving towards Python, with a substantial following for R as well”
The evidence in the following graph supports this point.
# put excerpts of code sets here
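One way to quantify the Python versus R comparison is the share of job descriptions that mention each language. A base-R sketch on made-up descriptions; the regular expressions are assumptions: "Python" is matched case-insensitively, while "R" is matched only as a standalone capital letter so that ordinary words are not counted. Phrases such as "R&D" would still match, so the R share is approximate.

```r
# Hypothetical sample job descriptions
descriptions <- c(
  "Proficiency in Python or R required.",
  "Build pipelines in python and Spark.",
  "Statistical modeling in R; SQL a plus.",
  "Excel and Tableau reporting.",
  "Python, pandas, and scikit-learn experience."
)

# Hypothetical helper: fraction of descriptions mentioning each language
language_share <- function(texts) {
  has_python <- grepl("\\bpython\\b", texts, ignore.case = TRUE, perl = TRUE)
  # standalone capital R only, not the letter r inside other words
  has_r <- grepl("\\bR\\b", texts, perl = TRUE)
  c(Python = mean(has_python), R = mean(has_r))
}

shares <- language_share(descriptions)
shares
```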
One observation from the data extraction is the number of Data Scientist internship positions with:
# put excerpts of code sets here
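Counting internship postings can be sketched as a case-insensitive whole-word match on job titles. The titles are made up, and the pattern is an assumption about how internships are labeled in the scraped data.

```r
# Hypothetical sample of scraped job titles
titles <- c(
  "Data Scientist Intern",
  "Data Scientist",
  "Data Science Internship - Summer",
  "Internal Audit Analyst",
  "Senior Data Scientist"
)

# match "intern", "interns", "internship", "internships" as whole words, any case;
# the trailing word boundary keeps "Internal" from matching
is_internship <- grepl("\\bintern(ship)?s?\\b", titles, ignore.case = TRUE, perl = TRUE)
n_internships <- sum(is_internship)
n_internships
```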
So, what are the most valued data science skills?
[Pick the items from Violino's 2018 list that our EDA affirms]
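One way to tie the EDA back to Violino's list is to measure how often each skill appears in the scraped job descriptions. The sketch below uses made-up descriptions and hypothetical keyword patterns for four of the listed skills; skills such as critical thinking or business intuition rarely map to a single keyword, so real patterns would need more care.

```r
# Hypothetical keyword patterns for a few of Violino's (2018) skills
skill_patterns <- c(
  "Coding"           = "\\b(python|sql|programming)\\b",
  "Math"             = "\\b(statistics|mathematics|math)\\b",
  "Machine learning" = "\\b(machine learning|deep learning)\\b",
  "Communication"    = "\\bcommunicat\\w*"
)

# Hypothetical sample job descriptions
descriptions <- c(
  "Strong communication skills and Python programming.",
  "Apply machine learning and statistics to product data.",
  "Write SQL queries and present findings; excellent communicator.",
  "Deep learning research with Python."
)

# Hypothetical helper: fraction of descriptions matching each skill pattern
skill_share <- function(texts, patterns) {
  vapply(patterns,
         function(p) mean(grepl(p, texts, ignore.case = TRUE, perl = TRUE)),
         numeric(1))
}

shares <- sort(skill_share(descriptions, skill_patterns), decreasing = TRUE)
shares
```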
Enhancing this framework to consider questions such as: