Coffy Andrews-Guo, Joseph Foy, Krutika Patel, Peter Phung
Please refer to the project write-up document for further information.
library(wordcloud)   # also attaches RColorBrewer, which provides brewer.pal()
library(tidyverse)   # attaches dplyr and tidyr, among others
library(RMySQL)      # also attaches DBI
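# Count the courses whose description contains `x` as a substring.
# `user` and `password` are not defined in this document and must be set
# beforehand (for example, read from environment variables).
MyFunction <- function(x){
  con <- dbConnect(
    MySQL(),
    user = user, password = password, host = "localhost", port = 3306, dbname = "data_science"
  )
  on.exit(dbDisconnect(con))  # close the connection even if the query fails
  # Quote the LIKE pattern so a skill containing a quote cannot break the SQL
  pattern <- dbQuoteString(con, paste0("%", x, "%"))
  df <- dbGetQuery(con, paste0("select count(*) as n from course where description like ", pattern))
  return(df)
}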
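To check the counting logic without access to the project's MySQL server, the same substring count can be tried against an in-memory SQLite database. This is a minimal sketch, assuming the DBI and RSQLite packages are installed; the helper `count_courses()` and the three course descriptions are hypothetical stand-ins, not project data. Unlike `MyFunction`, it passes the LIKE pattern as a bound parameter instead of pasting it into the SQL string.

```r
# Minimal sketch: assumes the DBI and RSQLite packages are installed.
# The table `course` and column `description` mirror the MySQL query above;
# the descriptions are hypothetical toy data, not the project's course data.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "course", data.frame(
  description = c(
    "Introduces regression and statistical modeling for auditors.",
    "Covers database design and SQL for accounting systems.",
    "Emphasizes written communication and teamwork."
  ),
  stringsAsFactors = FALSE
))

# Hypothetical helper: count descriptions containing `skill` as a substring.
# The pattern is bound to the ? placeholder, so quotes in `skill`
# cannot alter the SQL statement.
count_courses <- function(con, skill) {
  res <- dbGetQuery(
    con,
    "SELECT COUNT(*) AS n FROM course WHERE description LIKE ?",
    params = list(paste0("%", skill, "%"))
  )
  as.numeric(res$n)
}

skills <- c("Regression", "SQL", "Communication", "Python")
counts <- data.frame(
  Skill = skills,
  Count = vapply(skills, function(s) count_courses(con, s), numeric(1),
                 USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(counts)

dbDisconnect(con)
```

SQLite's LIKE is case-insensitive for ASCII letters, so "Regression" matches "regression" here; MySQL's behavior depends on the column's collation, so the two back ends may give different counts for the same text.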
# Skill list, exported from a spreadsheet to CSV (read.csv already returns a data frame)
csv <- read.csv("https://raw.githubusercontent.com/candrewxs/D607_Project3/main/Accounting%20Attributes%20for%20Data%20Analysis.xlsx%20-%20DATA%20Scientist%20.csv")
words <- as.vector(csv$Technical.Skills)
# Query the course table once per technical skill and collect the counts
counts <- lapply(words, MyFunction)
results <- data.frame(
  Skill = words,
  Count = vapply(counts, function(d) as.numeric(d[[1]]), numeric(1)),
  stringsAsFactors = FALSE
)
results
##                          Skill Count
## 1           Data Visualization    23
## 2                       Python    18
## 3                          SQL    13
## 4             Machine Learning     6
## 5              Microsoft Excel     9
## 6                   Statistics    47
## 7             Machine Learning     6
## 8  Natural Language Processing     0
## 9               Data Wrangling     1
## 10                    Big Data    28
## 11              Linear Algebra     1
## 12                    Calculus     5
## 13          Data Visualization    23
## 14              Data Analytics    49
## 15                    Database    65
## 16                        Java     3
## 17                Data Science     8
## 18                  Blockchain     2
## 19                  Regression    64
## 20                     Predict     6
## 21                      Coding     3
## 22               Text Analysis     0
## 23            Data Preparation     6
## 24                 Data Mining    30
## 25                           R     8
## 26                  Collection    29
## 27               Data Cleaning     1
## 28                    Modeling    59
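# "Data Visualization" and "Machine Learning" are listed twice in the skill list;
# drop the duplicate rows so each word is drawn only once
cloud <- distinct(results)
wordcloud(words = cloud$Skill, freq = cloud$Count, max.words = 200,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))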
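The technical-skill counts can also be ranked directly, which makes the leading skills easier to read than in the cloud, and checked against `wordcloud()`'s `min.freq` argument, which defaults to 3: any skill matched fewer than three times is not drawn. A minimal sketch, assuming the counts are copied by hand from the printed table with the two duplicated skills listed once; `tech`, `ranked`, and `omitted` are illustrative names.

```r
# Technical-skill counts copied from the printed table.
# "Data Visualization" and "Machine Learning" appear twice there; each is listed once here.
tech <- data.frame(
  Skill = c("Data Visualization", "Python", "SQL", "Machine Learning",
            "Microsoft Excel", "Statistics", "Natural Language Processing",
            "Data Wrangling", "Big Data", "Linear Algebra", "Calculus",
            "Data Analytics", "Database", "Java", "Data Science", "Blockchain",
            "Regression", "Predict", "Coding", "Text Analysis",
            "Data Preparation", "Data Mining", "R", "Collection",
            "Data Cleaning", "Modeling"),
  Count = c(23, 18, 13, 6, 9, 47, 0, 1, 28, 1, 5, 49, 65, 3, 8, 2,
            64, 6, 3, 0, 6, 30, 8, 29, 1, 59),
  stringsAsFactors = FALSE
)

# Rank skills from most to least frequently matched
ranked <- tech[order(-tech$Count), ]
print(head(ranked, 5))

# wordcloud() skips words whose frequency is below min.freq (default 3),
# so these skills do not appear in the cloud
omitted <- tech$Skill[tech$Count < 3]
print(omitted)
```

Passing `min.freq = 0` to `wordcloud()` would keep the low-count skills, although words with a frequency of zero carry no information in a cloud.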
words <- as.vector(csv$Soft.Skills)
words <- words[words != ""]
# Query the course table once per soft skill and collect the counts
counts <- lapply(words, MyFunction)
results <- data.frame(
  Skill = words,
  Count = vapply(counts, function(d) as.numeric(d[[1]]), numeric(1)),
  stringsAsFactors = FALSE
)
results
##            Skill Count
## 1           Team    90
## 2  Communication   210
## 3 Business Savvy     0
## 4   Storytelling     3
## 5         Collab     7
wordcloud(words = results$Skill, freq = results$Count, max.words = 200,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
The code above lets us visualize how often data science skills are mentioned in the course descriptions of accounting programs, and therefore how strongly each skill is highlighted in the curriculum. Among technical skills, “Database” (65 matching courses), “Regression” (64), “Modeling” (59), “Data Analytics” (49), and “Statistics” (47) appear most often. Among soft skills, communication is mentioned most often (210 courses). Even so, the courses that mention communication are only a fraction of all the courses in the data set. These findings help show how accounting curricula are currently structured and how schools could improve them to attract employers and students alike.