Outline

If we can get a decent sample of blogs and read their RSS feeds, we can mine the feeds for category tags and filter those tags down to a list of skills. The idea is that bloggers write about the topics of most interest in Data Science, and Data Science skills are a subset of those topics.

KDnuggets

The only real Data Science blog I am familiar with is KDnuggets. They have an RSS feed at https://www.kdnuggets.com/feed.

As a proof of concept, this is what I did:

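# Load the packages used below
library(RCurl)  # getURL
library(XML)    # xmlParse, xpathSApply, xmlValue
library(readr)  # write_file
library(dplyr)  # %>%, arrange, desc

# Use RCurl to get the raw RSS/XML (XML package won't read https for some reason)
kd <- getURL("https://www.kdnuggets.com/feed")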

# Save as a file
write_file(kd,"kd.rss")

# Parse the XML
test <- xmlParse("kd.rss")

# Use XPath to get the category
categories <- data.frame(table(xpathSApply(test,"//category", xmlValue)))

# Look at the categories
categories %>% arrange(desc(Freq))
##                                Var1 Freq
## 1     2018 Oct Opinions, Interviews   14
## 2                  Machine Learning   14
## 3     2018 Oct Tutorials, Overviews   11
## 4                      Data Science   10
## 5                            Python    9
## 6                     Deep Learning    8
## 7       2018 Oct Courses, Education    7
## 8                              Jobs    7
## 9           2018 Oct News, Features    6
## 10                               AI    6
## 11     2018 Oct Top Stories, Tweets    5
## 12                2018 Oct Meetings    4
## 13                   Data Analytics    4
## 14                      Mathematics    4
## 15                       Algorithms    3
## 16                        Analytics    3
## 17                         Big Data    3
## 18                               MN    3
## 19                  Neural Networks    3
## 20                              NLP    3
## 21                 Online Education    3
## 22                                R    3
## 23                      Top stories    3
## 24               UnitedHealth Group    3
## 25                2018 Sep Meetings    2
## 26                           Austin    2
## 27               Business Analytics    2
## 28                               CA    2
## 29                  Computer Vision    2
## 30           Data Science Education    2
## 31                   Data Scientist    2
## 32               Data Visualization    2
## 33                 Image Processing    2
## 34                Image Recognition    2
## 35            KDnuggets 2018 Issues    2
## 36                Linear Regression    2
## 37                           London    2
## 38                Master of Science    2
## 39                       Minnetonka    2
## 40         MS in Business Analytics    2
## 41               MS in Data Science    2
## 42                  Project Manager    2
## 43                    San Francisco    2
## 44                           Summit    2
## 45                           TEXATA    2
## 46                     Tom Mitchell    2
## 47                       Top tweets    2
## 48                         Training    2
## 49                Transfer Learning    2
## 50                               TX    2
## 51          2018 Sep News, Features    1
## 52    2018 Sep Opinions, Interviews    1
## 53                         a4 Media    1
## 54                      ActiveState    1
## 55                     Apache Spark    1
## 56                             Apps    1
## 57                          Atlanta    1
## 58                        Beginners    1
## 59               Big Data Analytics    1
## 60                    Big Data Hype    1
## 61                           Boston    1
## 62                   Business Value    1
## 63                           Canada    1
## 64                           Career    1
## 65                 Clark University    1
## 66                       Classifier    1
## 67                  Cloud Computing    1
## 68                               CO    1
## 69        Colorado State University    1
## 70                 Computer Science    1
## 71                 Cross-validation    1
## 72                    Data Engineer    1
## 73                 Data Engineering    1
## 74                      Data Mining    1
## 75               Data Preprocessing    1
## 76                         DataCamp    1
## 77                         Datasets    1
## 78                Derived Variables    1
## 79                        Developer    1
## 80                      Development    1
## 81                           DevOps    1
## 82                         Director    1
## 83                Drexel University    1
## 84                            ebook    1
## 85                      Eric Siegel    1
## 86                           Europe    1
## 87                        Explained    1
## 88                          Faculty    1
## 89              Feature Engineering    1
## 90                      Flat Minima    1
## 91                     Fort Collins    1
## 92                               GA    1
## 93                             GDPR    1
## 94                           GitHub    1
## 95                       Government    1
## 96                        Green Bay    1
## 97                           Hadoop    1
## 98                       Healthcare    1
## 99                             Hype    1
## 100                              IN    1
## 101                    Indianapolis    1
## 102                     Integration    1
## 103                           Intel    1
## 104                             IoT    1
## 105                    Jan Software    1
## 106                             JMP    1
## 107                           Kafka    1
## 108                           Keras    1
## 109                        Learning    1
## 110                     Lift charts    1
## 111                 Linear Networks    1
## 112                Long Island City    1
## 113                              MA    1
## 114                         Manager    1
## 115                       MathWorks    1
## 116                        Meetings    1
## 117                         Metrics    1
## 118                   Michael Berry    1
## 119                     Minneapolis    1
## 120                          Mobile    1
## 121                        Modeling    1
## 122                 MS in Analytics    1
## 123                              NE    1
## 124                   New York City    1
## 125                    Northwestern    1
## 126                              NY    1
## 127                        O'Reilly    1
## 128                Object Detection    1
## 129                            ODSC    1
## 130                           Omaha    1
## 131                    Optimization    1
## 132                      Penn State    1
## 133            Predictive Analytics    1
## 134      Predictive Analytics World    1
## 135             Predictive Modeling    1
## 136               Predictive Models    1
## 137                         Privacy    1
## 138                         Process    1
## 139                        Products    1
## 140                       Professor    1
## 141                      Psychology    1
## 142                    Raspberry Pi    1
## 143                         RE.WORK    1
## 144       Recurrent Neural Networks    1
## 145                    Redwood City    1
## 146                    scikit-learn    1
## 147                         Seattle    1
## 148                    Segmentation    1
## 149               Semantic Analysis    1
## 150                       Sequences    1
## 151                             SGD    1
## 152                      Small Data    1
## 153                      Smart Data    1
## 154                        Software    1
## 155                             SQL    1
## 156             Syracuse University    1
## 157                            TDWI    1
## 158                         Tencent    1
## 159                   Text Analysis    1
## 160             Text Classification    1
## 161                         Toronto    1
## 162                              UK    1
## 163 University of Nebraska at Omaha    1
## 164                              WA    1
## 165                              WI    1
## 166                       Worcester    1
## 167                 Word Embeddings    1
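
The tally above mixes topic tags with site section labels (such as "2018 Oct Opinions, Interviews") and job-posting locations (such as "MN" and "TX"). Below is a minimal sketch of the filtering step, assuming those are the two main kinds of noise. The sample data frame is hand-copied from a few rows of the output. The regular expression and the use of R's built-in state.abb vector are assumptions for illustration, not part of the original workflow.

```r
# A few rows copied by hand from the category tally
categories <- data.frame(
  Var1 = c("2018 Oct Opinions, Interviews", "Machine Learning", "Python",
           "MN", "R", "AI", "2018 Sep Meetings", "TX", "SQL"),
  Freq = c(14, 14, 9, 3, 3, 6, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Assumption: section labels start with a year and a three-letter month
is_section <- grepl("^[0-9]{4} [A-Z][a-z]{2} ", categories$Var1)

# Assumption: two-letter US state codes are locations, not skills
# (state.abb ships with base R in the datasets package)
is_state <- categories$Var1 %in% state.abb

# Keep everything that matches neither noise pattern
skills <- categories[!(is_section | is_state), ]
skills
```

City names, universities, and company names would still need a separate lookup or manual review, and the state-code rule would also drop any real skill tag that happens to match a state abbreviation.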

Others?

I am sure the same steps can be repeated for other reputable Data Science blogs that publish an RSS feed with category tags.
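
As a rough sketch of how that repetition might look, the proof of concept can be wrapped in two small helpers and the tags pooled across feeds. The function names extract_categories and tally_categories are hypothetical, and the two inline feeds are made-up stand-ins so the example runs without network access.

```r
library(XML)

# Hypothetical helper: pull every <category> value out of one feed's RSS text
extract_categories <- function(rss_text) {
  doc <- xmlParse(rss_text, asText = TRUE)
  cats <- xpathSApply(doc, "//category", xmlValue)
  # xpathSApply returns an empty list when nothing matches
  if (length(cats) == 0) character(0) else as.character(cats)
}

# Hypothetical helper: pool the tags from several feeds and count them
tally_categories <- function(rss_texts) {
  all_cats <- unlist(lapply(rss_texts, extract_categories))
  counts <- as.data.frame(table(category = all_cats),
                          responseName = "freq", stringsAsFactors = FALSE)
  counts[order(-counts$freq), ]
}

# Made-up stand-in feeds (not real blog data)
feed_a <- "<rss><channel>
  <item><category>Python</category><category>Machine Learning</category></item>
  <item><category><![CDATA[Python]]></category></item>
</channel></rss>"
feed_b <- "<rss><channel>
  <item><category>Python</category><category>SQL</category></item>
</channel></rss>"

pooled <- tally_categories(c(feed_a, feed_b))
pooled
```

With real feeds, the RSS text for each URL could be fetched with RCurl's getURL, as in the KDnuggets example, and the resulting character vector passed to tally_categories.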