pacman::p_load(RPostgreSQL, dplyr, dbplyr, tidyr, magrittr, stringr, udpipe, tm, lattice, tidytext, ggplot2)
# <ourgroupname> is no capitals and no spaces
host="soundsgood.crg53husyk2z.us-east-2.rds.amazonaws.com"
port="5432"
database="soundsgood"
username="soundsgood"
password="soundsgood"
my_db <- src_postgres(database, host=host, port=port, user=username, password=password)
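# pass the database name and port explicitly instead of relying on driver defaults
con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
  dbname = database,
  host = host,
  port = port,
  user = username,
  password = password
)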
comms <- data.frame(tbl(con, 'comms'))
posts <- data.frame(tbl(con, 'posts'))
comms %<>%
arrange(index)
head(comms, 3)
## index sub_post_id
## 1 0 0
## 2 1 0
## 3 2 0
## text
## 1 I am going to be starting a Computer Science Masters with a focus in Data Science/Machine Learning soon and I am interested in applying the skills I learn to cyber security. My question is, should I dedicate a few courses in my masters towards cyber security or should I get a certification or two in cyber security to begin to develop domain knowledge?
## 2 Do you guys think the virus is going to make it even harder to break into DS as a new grad school grad? \n\nI am getting really worried about this
## 3 In my Data Science degree I need to pick a "path"/specialization\n\nThe two I'm interested in are Economics and Social Sciences. The application in the former is fairly obvious, but what are the uses of the latter? I'm not interested in most areas of social science but am more so in things like PolSci and if it counts, IR. Would love to do something like Nate Silver for example\n\nAnyways, would love to hear from Data Scientists who work in Econometrics or some Social Science related fields :)
## likes
## 1 <NA>
## 2 <NA>
## 3 <NA>
comms is a data frame containing the comments posted in response to different Reddit posts. We can identify which comments belong to which post using the index and sub_post_id columns.
head(posts, 3)
## index category
## 1 0 datascience
## 2 1 datascience
## 3 2 datascience
## title
## 1 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 2 My boss proposes infeasible projects and doesnâ\200\231t like confrontation, advice?
## 3 Data Against Covid-19: "We are a community of medical professionals, life scientists and data scientists on a quest to defeat COVID-19."
## body
## 1 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 2 Hey guys, so Iâ\200\231m working as a Data Scientist in real estate. My boss proposes projects with insanely large data heâ\200\231s thrown into google big query. \n\nHe has a very limited background in CS and Iâ\200\231ve had to optimize a lot of his code / ETL pipelines just to make everyone elseâ\200\231s life easier. He got the position as itâ\200\231s a group of friends who started the company, designated himself as the head of data science.\n\nHeâ\200\231s proposing ideas that, in an optimal setting with a large budget, it would be feasible. Iâ\200\231ve talked to him about it and heâ\200\231s dismissed my concerns. \n\nAlarming extra concerns: \n1) he was amazed that I used terminal.\n2) he doesnâ\200\231t understand basic linear algebra. \n\nIâ\200\231m concerned for my safety in the company. If I canâ\200\231t fulfill my bossâ\200\231 project proposals / ideas, Iâ\200\231ll be let go. \n\nPlease, Iâ\200\231d love some advice. Thanks Gang!
## 3
## sub sub_post_id n_comments
## 1 datascience 0 56
## 2 datascience 1 29
## 3 datascience 2 1
posts is a data frame of the original Reddit posts, containing the title and body of each post and the number of comments on it.
p1 = posts %>%
filter(n_comments == 1) %>%
arrange(index)
p2_plus = posts %>%
filter(n_comments > 1) %>%
arrange(index)
To find which comments belong to which post, we merge the two data frames. The merge used here matches rows one to one, so both data frames need the same number of rows. Using the n_comments column, we repeat each post once per comment it received. Posts with exactly one comment (p1) already have the right number of rows, and posts with no comments are dropped by both filters. The following for loop appends each remaining post (p2_plus) n_comments times:
for (row in seq_len(nrow(p2_plus))) {
  # append this post once for every comment it received
  for (n in seq_len(p2_plus$n_comments[row])) {
    p1 = rbind(p1, p2_plus[row, ])
  }
}
p1 %<>%
arrange(index)
head(p1, 3)
## index category
## 1 0 datascience
## 2 0 datascience
## 3 0 datascience
## title
## 1 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 2 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 3 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## body
## 1 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 2 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 3 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## sub sub_post_id n_comments
## 1 datascience 0 56
## 2 datascience 0 56
## 3 datascience 0 56
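The row repetition above can also be written without a loop. The following is a minimal sketch on a made-up data frame (toy_posts and its values are hypothetical, not the real posts table): it repeats each row index n_comments times and indexes the data frame with the result.

```r
# Toy stand-in for `posts`; the values are invented for illustration.
toy_posts <- data.frame(
  sub_post_id = c(0, 1, 2),
  title       = c("Weekly thread", "Boss advice", "Covid data"),
  n_comments  = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Repeat each row index n_comments times, then index the data frame with it.
# A post with n_comments == 0 would be repeated zero times, i.e. dropped.
row_ids  <- rep(seq_len(nrow(toy_posts)), times = toy_posts$n_comments)
expanded <- toy_posts[row_ids, ]
rownames(expanded) <- NULL

nrow(expanded)        # equals sum(toy_posts$n_comments)
expanded$sub_post_id
```

This avoids growing a data frame with rbind inside a loop, which gets slow as the number of rows increases. tidyr::uncount(toy_posts, n_comments, .remove = FALSE) should give an equivalent result.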
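Before we can merge, the index columns need to match. After repeating rows, the index in p1 holds each post's index rather than a unique value per row, so we overwrite it with the comment index. This step assumes both data frames are sorted so that row i of p1 is the post that row i of comms responds to.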
p1 %<>%
mutate(index = comms$index)
head(p1, 3)
## index category
## 1 0 datascience
## 2 1 datascience
## 3 2 datascience
## title
## 1 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 2 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 3 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## body
## 1 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 2 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 3 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## sub sub_post_id n_comments
## 1 datascience 0 56
## 2 datascience 0 56
## 3 datascience 0 56
Each post now appears once per comment, so we can merge the data frames by index:
reddit <- merge(p1, comms, by = 'index')
head(reddit, 3)
## index category
## 1 0 datascience
## 2 1 datascience
## 3 2 datascience
## title
## 1 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 2 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 3 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## body
## 1 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 2 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 3 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## sub sub_post_id.x n_comments sub_post_id.y
## 1 datascience 0 56 0
## 2 datascience 0 56 0
## 3 datascience 0 56 0
## text
## 1 I am going to be starting a Computer Science Masters with a focus in Data Science/Machine Learning soon and I am interested in applying the skills I learn to cyber security. My question is, should I dedicate a few courses in my masters towards cyber security or should I get a certification or two in cyber security to begin to develop domain knowledge?
## 2 Do you guys think the virus is going to make it even harder to break into DS as a new grad school grad? \n\nI am getting really worried about this
## 3 In my Data Science degree I need to pick a "path"/specialization\n\nThe two I'm interested in are Economics and Social Sciences. The application in the former is fairly obvious, but what are the uses of the latter? I'm not interested in most areas of social science but am more so in things like PolSci and if it counts, IR. Would love to do something like Nate Silver for example\n\nAnyways, would love to hear from Data Scientists who work in Econometrics or some Social Science related fields :)
## likes
## 1 <NA>
## 2 <NA>
## 3 <NA>
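An alternative that does not depend on row order is to join on a shared key. The sketch below uses made-up data (toy_posts and toy_comms are hypothetical) and assumes that sub_post_id uniquely identifies a post in both tables; that assumption may not hold for the real data if ids restart for each subreddit.

```r
# Toy stand-ins for `posts` and `comms`; values are invented for illustration.
toy_posts <- data.frame(
  sub_post_id = c(0, 1),
  title       = c("Weekly thread", "Boss advice"),
  stringsAsFactors = FALSE
)
toy_comms <- data.frame(
  index       = 0:3,
  sub_post_id = c(0, 0, 0, 1),
  text        = c("c1", "c2", "c3", "c4"),
  stringsAsFactors = FALSE
)

# Each comment row picks up the columns of the post with the same sub_post_id.
# ASSUMPTION: sub_post_id is unique per post.
joined <- merge(toy_comms, toy_posts, by = "sub_post_id")
joined <- joined[order(joined$index), ]

joined[, c("index", "title", "text")]
```

With a key-based join there is no need to repeat post rows or to overwrite the index column first.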
reddit %<>%
select(-c(index, n_comments, sub_post_id.y, likes, category)) %>%
rename(comments = text, post_id = sub_post_id.x) %>%
mutate(post_id = post_id + 1)
head(reddit, 3)
## title
## 1 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 2 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## 3 Weekly Entering & Transitioning Thread | 15 Mar 2020 - 22 Mar 2020
## body
## 1 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 2 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## 3 _Bleep Bloop_. Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:\n\n* Learning resources (e.g. books, tutorials, videos)\n* Traditional education (e.g. schools, degrees, electives)\n* Alternative education (e.g. online courses, bootcamps)\n* Job search questions (e.g. resumes, applying, career prospects)\n* Elementary questions (e.g. where to start, what next)\n\nWhile you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](https://www.reddit.com/r/datascience/wiki/resources) pages on our wiki. You can also search for [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).\n\n---\n\nI am a bot created by the r/datascience moderators. I'm open source! You can review my [source code on GitHub](https://github.com/vogt4nick/datascience-bot).
## sub post_id
## 1 datascience 1
## 2 datascience 1
## 3 datascience 1
## comments
## 1 I am going to be starting a Computer Science Masters with a focus in Data Science/Machine Learning soon and I am interested in applying the skills I learn to cyber security. My question is, should I dedicate a few courses in my masters towards cyber security or should I get a certification or two in cyber security to begin to develop domain knowledge?
## 2 Do you guys think the virus is going to make it even harder to break into DS as a new grad school grad? \n\nI am getting really worried about this
## 3 In my Data Science degree I need to pick a "path"/specialization\n\nThe two I'm interested in are Economics and Social Sciences. The application in the former is fairly obvious, but what are the uses of the latter? I'm not interested in most areas of social science but am more so in things like PolSci and if it counts, IR. Would love to do something like Nate Silver for example\n\nAnyways, would love to hear from Data Scientists who work in Econometrics or some Social Science related fields :)
We have now removed the unnecessary columns. The merged data set contains the title of the post, the body, the subreddit it belongs to, the post id, and the corresponding comments.
stopwords_regex = paste(stopwords('en'), collapse = '\\b|\\b')
stopwords_regex = paste0('\\b', stopwords_regex, '\\b')
reddit$comments = stringr::str_replace_all(reddit$comments, stopwords_regex, '')
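We remove stopwords to help identify the true keywords in our dataset. The match is case-sensitive and the stopword list is lowercase, so capitalized stopwords such as "The" are left in place.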
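To illustrate the word-boundary pattern used for stopword removal, here is a small base R sketch. The four-word list toy_stopwords is a hypothetical stand-in for stopwords('en'), and unlike the code above it matches case-insensitively and collapses the leftover whitespace.

```r
# Hypothetical short stopword list standing in for tm's stopwords('en').
toy_stopwords <- c("the", "is", "a", "to")

# \\b anchors each alternative at word boundaries, so the "a" inside
# "basics" is not touched.
pattern <- paste0("\\b(", paste(toy_stopwords, collapse = "|"), ")\\b")

comments <- c("The model is a good fit", "How to learn the basics")

# Remove the stopwords, then squeeze repeated spaces and trim the ends.
no_stop <- gsub(pattern, "", comments, ignore.case = TRUE, perl = TRUE)
cleaned <- trimws(gsub("\\s+", " ", no_stop))

cleaned
```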
model <- udpipe_download_model(language = "english")
## Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe to C:/Users/micel/Documents/607/english-ewt-ud-2.4-190531.udpipe
## Visit https://github.com/jwijffels/udpipe.models.ud.2.4 for model license details
udmodel_english <- udpipe_load_model(model)
s <- udpipe_annotate(udmodel_english, reddit$comments)
x <- data.frame(s)
We annotate the comments with a udpipe part-of-speech (POS) tagger so that we can identify different parts of speech and analyze the most frequent words.
stats <- subset(x, upos %in% c("NOUN"))
stats <- txt_freq(stats$token)
stats$key <- factor(stats$key, levels = rev(stats$key))
barchart(key ~ freq, data = head(stats, 20), col = "cadetblue",
main = "Most occurring nouns", xlab = "Freq")
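The noun frequency step can be illustrated on a tiny annotation table. This is a sketch in base R on made-up data (toy_annotation is hypothetical); it mimics the filter-then-count idea and is not the internals of txt_freq.

```r
# Toy stand-in for the udpipe annotation: one row per token with its POS tag.
toy_annotation <- data.frame(
  token = c("data", "is", "great", "data", "science", "jobs", "need", "data"),
  upos  = c("NOUN", "AUX", "ADJ", "NOUN", "NOUN", "NOUN", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only the nouns, count each distinct token, most frequent first.
nouns     <- toy_annotation$token[toy_annotation$upos == "NOUN"]
noun_freq <- as.data.frame(sort(table(nouns), decreasing = TRUE),
                           stringsAsFactors = FALSE)
names(noun_freq) <- c("key", "freq")

noun_freq
```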
stats <- keywords_rake(x = x, term = "lemma", group = "doc_id",
relevant = x$upos %in% c("NOUN", "ADJ"))
stats %<>%
filter(!grepl('Â', keyword))
stats$key <- factor(stats$keyword, levels = rev(stats$keyword))
barchart(key ~ rake, data = head(subset(stats, freq > 2), 20), col = "red",
main = "Keywords identified by RAKE",
xlab = "Rake")
stats <- subset(stats, ngram > 1 & freq > 6)
stats %<>%
filter(!grepl('â', keyword))
stats$key <- factor(stats$keyword, levels = rev(stats$keyword))
barchart(key ~ freq, data = head(stats, 20), col = "magenta",
main = "Keywords - simple noun phrases", xlab = "Frequency")
## XML Data
if (!require('XML')) install.packages('XML')
## Loading required package: XML
library(XML)
# Reading XML file from the web
# (stored in feed_url so the database connection `con` is not overwritten)
feed_url <- 'https://stackoverflow.com/jobs/feed?dr=DataScientist&j=permanent%2ccontract'
job_raw <- readLines(feed_url, warn = FALSE)
# function that scrapes the category nodes, which contain the job skills
Fun1 <- function(xdata){
  dum <- xmlParse(xdata)
  xDf <- xmlToDataFrame(nodes = getNodeSet(dum, "//*/category"), stringsAsFactors = FALSE)
  xDf
}
# calling the function and converting the frequency table to a data frame
skills <- Fun1(job_raw)
skills_tbl <- sort(table(skills), decreasing = TRUE)
skills_tbl <- data.frame(skills_tbl)
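The category extraction can be tried offline on a small inline feed. The XML string below is invented for illustration and only imitates the general shape of a job feed (items with repeated category nodes); the real feed's structure is an assumption here.

```r
library(XML)

# Made-up miniature feed; each <category> node holds one skill tag.
feed_text <- '<rss><channel>
  <item><title>Job A</title><category>python</category><category>sql</category></item>
  <item><title>Job B</title><category>python</category><category>r</category></item>
</channel></rss>'

# Parse the string and pull the text of every category node.
doc    <- xmlParse(feed_text, asText = TRUE)
skills <- xpathSApply(doc, "//item/category", xmlValue)

# Count how often each skill appears, most frequent first.
skill_counts <- sort(table(skills), decreasing = TRUE)
skill_counts
```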
# writing table to database for comparison
# the write call is commented out because the table has already been created; the results are shown below
# <ourgroupname> is no capitals and no spaces
host="soundsgood.crg53husyk2z.us-east-2.rds.amazonaws.com"
port="5432"
database="soundsgood"
username="soundsgood"
password="soundsgood"
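drv <- dbDriver("PostgreSQL")
# pass the database name and port explicitly instead of relying on driver defaults
conc <- DBI::dbConnect(drv,
  dbname = database,
  host = host,
  port = port,
  user = username,
  password = password)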
# write function is commented out
####### dbWriteTable(conc,"skills_xml",skills_tbl)
# results
(skills_db <- tbl(conc, "skills_xml"))
## # Source: table<skills_xml> [?? x 3]
## # Database: postgres 11.0.5
## # [soundsgood@soundsgood.crg53husyk2z.us-east-2.rds.amazonaws.com:5432/soundsgood]
## row.names skills Freq
## <chr> <chr> <int>
## 1 1 python 81
## 2 2 machine-learning 79
## 3 3 r 48
## 4 4 sql 44
## 5 5 hadoop 18
## 6 6 java 15
## 7 7 pandas 14
## 8 8 algorithm 13
## 9 9 c++ 13
## 10 10 amazon-web-services 10
## # ... with more rows
# converting to a data frame
skills_db <- data.frame(skills_db)
# keeping the 20 most frequent skills (sorted explicitly, since SQL row order is not guaranteed)
skills_db <- head(skills_db[order(-skills_db$Freq), ], 20)
# Bar chart of the most frequent skills
ggplot(skills_db, aes(x = reorder(skills, Freq), y = Freq)) +
  geom_bar(stat = 'identity', color = "blue", fill = 'white') +
  coord_flip() +
  labs(y = "Frequency Mentioned", x = "Skill")
dbDisconnect(conc)
## [1] TRUE
# Conclusion
The analysis of the Reddit comments was largely successful: we identified many keywords related to data science, although not all of them would be considered skills. The data gathered from the XML feed is more directly related to skills. In conclusion, both data sources were useful, but it is more difficult to single out skills alone from a large body of free text.