If we can collect a decent sample of Data Science blogs and read their RSS feeds, we may be able to mine the feeds for category tags and filter those tags down to a list of skills. The idea is that bloggers write about the topics of most interest in Data Science, and Data Science skills are a subset of those topics.
The only real Data Science blog I am familiar with is KDnuggets, which publishes an RSS feed at https://www.kdnuggets.com/feed.
As a proof of concept, this is what I did:
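# Load the packages used below
library(RCurl)  # getURL()
library(XML)    # xmlParse(), xpathSApply(), xmlValue()
library(readr)  # write_file()
library(dplyr)  # %>%, arrange(), desc()
# Use RCurl to get the raw RSS/XML (xmlParse() would not read the https URL directly)
kd <- getURL("https://www.kdnuggets.com/feed")
# Save as a file
write_file(kd, "kd.rss")
# Parse the XML
test <- xmlParse("kd.rss")
# Use XPath to get the value of every <category> element and tabulate the counts
categories <- data.frame(table(xpathSApply(test, "//category", xmlValue)))
# Look at the categories, most frequent first
categories %>% arrange(desc(Freq))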
## Var1 Freq
## 1 2018 Oct Opinions, Interviews 14
## 2 Machine Learning 14
## 3 2018 Oct Tutorials, Overviews 11
## 4 Data Science 10
## 5 Python 9
## 6 Deep Learning 8
## 7 2018 Oct Courses, Education 7
## 8 Jobs 7
## 9 2018 Oct News, Features 6
## 10 AI 6
## 11 2018 Oct Top Stories, Tweets 5
## 12 2018 Oct Meetings 4
## 13 Data Analytics 4
## 14 Mathematics 4
## 15 Algorithms 3
## 16 Analytics 3
## 17 Big Data 3
## 18 MN 3
## 19 Neural Networks 3
## 20 NLP 3
## 21 Online Education 3
## 22 R 3
## 23 Top stories 3
## 24 UnitedHealth Group 3
## 25 2018 Sep Meetings 2
## 26 Austin 2
## 27 Business Analytics 2
## 28 CA 2
## 29 Computer Vision 2
## 30 Data Science Education 2
## 31 Data Scientist 2
## 32 Data Visualization 2
## 33 Image Processing 2
## 34 Image Recognition 2
## 35 KDnuggets 2018 Issues 2
## 36 Linear Regression 2
## 37 London 2
## 38 Master of Science 2
## 39 Minnetonka 2
## 40 MS in Business Analytics 2
## 41 MS in Data Science 2
## 42 Project Manager 2
## 43 San Francisco 2
## 44 Summit 2
## 45 TEXATA 2
## 46 Tom Mitchell 2
## 47 Top tweets 2
## 48 Training 2
## 49 Transfer Learning 2
## 50 TX 2
## 51 2018 Sep News, Features 1
## 52 2018 Sep Opinions, Interviews 1
## 53 a4 Media 1
## 54 ActiveState 1
## 55 Apache Spark 1
## 56 Apps 1
## 57 Atlanta 1
## 58 Beginners 1
## 59 Big Data Analytics 1
## 60 Big Data Hype 1
## 61 Boston 1
## 62 Business Value 1
## 63 Canada 1
## 64 Career 1
## 65 Clark University 1
## 66 Classifier 1
## 67 Cloud Computing 1
## 68 CO 1
## 69 Colorado State University 1
## 70 Computer Science 1
## 71 Cross-validation 1
## 72 Data Engineer 1
## 73 Data Engineering 1
## 74 Data Mining 1
## 75 Data Preprocessing 1
## 76 DataCamp 1
## 77 Datasets 1
## 78 Derived Variables 1
## 79 Developer 1
## 80 Development 1
## 81 DevOps 1
## 82 Director 1
## 83 Drexel University 1
## 84 ebook 1
## 85 Eric Siegel 1
## 86 Europe 1
## 87 Explained 1
## 88 Faculty 1
## 89 Feature Engineering 1
## 90 Flat Minima 1
## 91 Fort Collins 1
## 92 GA 1
## 93 GDPR 1
## 94 GitHub 1
## 95 Government 1
## 96 Green Bay 1
## 97 Hadoop 1
## 98 Healthcare 1
## 99 Hype 1
## 100 IN 1
## 101 Indianapolis 1
## 102 Integration 1
## 103 Intel 1
## 104 IoT 1
## 105 Jan Software 1
## 106 JMP 1
## 107 Kafka 1
## 108 Keras 1
## 109 Learning 1
## 110 Lift charts 1
## 111 Linear Networks 1
## 112 Long Island City 1
## 113 MA 1
## 114 Manager 1
## 115 MathWorks 1
## 116 Meetings 1
## 117 Metrics 1
## 118 Michael Berry 1
## 119 Minneapolis 1
## 120 Mobile 1
## 121 Modeling 1
## 122 MS in Analytics 1
## 123 NE 1
## 124 New York City 1
## 125 Northwestern 1
## 126 NY 1
## 127 O'Reilly 1
## 128 Object Detection 1
## 129 ODSC 1
## 130 Omaha 1
## 131 Optimization 1
## 132 Penn State 1
## 133 Predictive Analytics 1
## 134 Predictive Analytics World 1
## 135 Predictive Modeling 1
## 136 Predictive Models 1
## 137 Privacy 1
## 138 Process 1
## 139 Products 1
## 140 Professor 1
## 141 Psychology 1
## 142 Raspberry Pi 1
## 143 RE.WORK 1
## 144 Recurrent Neural Networks 1
## 145 Redwood City 1
## 146 scikit-learn 1
## 147 Seattle 1
## 148 Segmentation 1
## 149 Semantic Analysis 1
## 150 Sequences 1
## 151 SGD 1
## 152 Small Data 1
## 153 Smart Data 1
## 154 Software 1
## 155 SQL 1
## 156 Syracuse University 1
## 157 TDWI 1
## 158 Tencent 1
## 159 Text Analysis 1
## 160 Text Classification 1
## 161 Toronto 1
## 162 UK 1
## 163 University of Nebraska at Omaha 1
## 164 WA 1
## 165 WI 1
## 166 Worcester 1
## 167 Word Embeddings 1
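The raw tags mix skills with section labels ("2018 Oct Opinions, Interviews") and job-post locations ("MN", "CA"), so some filtering is needed before this reads as a list of skills. Below is a minimal sketch of one way to do that. The two noise rules are my own assumptions and were not part of the proof of concept above; the sample data frame only copies a few rows from the output so the sketch runs on its own.

```r
# A small sample of the tags and counts shown above, so this runs standalone
categories <- data.frame(
  Var1 = c("2018 Oct Opinions, Interviews", "Machine Learning", "Python",
           "MN", "R", "2018 Sep Meetings", "CA", "SQL", "KDnuggets 2018 Issues"),
  Freq = c(14, 14, 9, 3, 3, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical noise rules (assumptions, not part of the original analysis):
# 1. a tag containing a four-digit year is a section or archive label
# 2. a two-letter US state abbreviation is a job-post location
#    (state.abb is a vector built into base R)
is_dated <- grepl("\\b(19|20)[0-9]{2}\\b", categories$Var1, perl = TRUE)
is_state <- categories$Var1 %in% state.abb

# Keep everything that is not flagged as noise, most frequent first
skills <- categories[!(is_dated | is_state), ]
skills <- skills[order(-skills$Freq), ]
skills
```

On this sample the rules keep Machine Learning, Python, R and SQL. Tags such as "Jobs", "London" or "Tom Mitchell" would still slip through on the full list, so a hand-curated stop list or a reference list of skills would probably be needed as well.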
The same approach should work for other reputable Data Science blogs, as long as they publish an RSS feed with category tags.
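To make that concrete, here is a rough sketch of how the counting step could be wrapped in a function and applied to several feeds. `count_categories` is a hypothetical helper name, and the two inline feeds are made-up stand-ins for text downloaded with `getURL`, so the example runs offline. It also assumes RSS 2.0, where each post is an `<item>` with `<category>` children; other feed formats may need a different XPath.

```r
library(XML)  # same parsing package as the proof of concept

# Hypothetical helper: count <category> values in RSS text that has already
# been downloaded (for example with RCurl::getURL)
count_categories <- function(rss_text) {
  doc  <- xmlParse(rss_text, asText = TRUE)
  tags <- unlist(xpathSApply(doc, "//item/category", xmlValue))
  if (length(tags) == 0) {
    # Feed has no category tags: return an empty table with the same columns
    return(data.frame(category = character(), freq = integer()))
  }
  counts <- as.data.frame(table(tags), stringsAsFactors = FALSE)
  names(counts) <- c("category", "freq")
  counts
}

# Two tiny made-up feeds stand in for real downloads
feed_a <- '<rss version="2.0"><channel>
  <item><category>Python</category><category>Deep Learning</category></item>
  <item><category>Python</category></item>
</channel></rss>'
feed_b <- '<rss version="2.0"><channel>
  <item><category>Python</category><category>SQL</category></item>
</channel></rss>'

# Count each feed separately, stack the results, then sum counts per tag
per_feed <- lapply(list(feed_a, feed_b), count_categories)
combined <- aggregate(freq ~ category, data = do.call(rbind, per_feed), FUN = sum)
combined <- combined[order(-combined$freq), ]
combined
```

Summing raw counts lets a prolific blog dominate the totals. Counting the number of feeds in which a tag appears might be a fairer measure once more than a handful of blogs are included.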