Members

Coffy Andrews-Guo, Joseph Foy, Krutika Patel, Peter Phung

Please refer to the project write-up document for further information.

library(wordcloud)  # word clouds; also attaches RColorBrewer for brewer.pal()
library(tidyverse)  # includes dplyr and tidyr
library(RMySQL)     # MySQL driver; also attaches DBI

?RMySQL

Query Function

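# Count the courses whose description contains the term `x`.
# `user` and `password` must already be defined in the session.
MyFunction <- function(x){

  con <- dbConnect(
    MySQL(),
    user = user, password = password, host = "localhost", port = 3306, dbname = "data_science"
  )
  # Close the connection even if the query fails.
  on.exit(dbDisconnect(con))

  df <- dbGetQuery(con, paste0("select count(*)
  from course
  where description like '%", x, "%'"))

  return(df)
}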
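
Because `MyFunction` pastes the skill name straight into the SQL text, a skill containing a single quote would break the query. One possible alternative, sketched here under the assumption that the DBI package (which RMySQL builds on) is available, is to let `DBI::sqlInterpolate()` quote the pattern. The helper name `build_count_query` is hypothetical and not part of the original project, and `DBI::ANSI()` stands in for a live connection so the snippet runs without a database.

```r
library(DBI)

# Hypothetical helper: returns the SQL text for counting course
# descriptions that contain `skill`. `con` defaults to a dummy ANSI
# connection; pass the real MySQL connection in practice.
build_count_query <- function(skill, con = ANSI()) {
  sqlInterpolate(
    con,
    "select count(*) from course where description like ?pattern",
    pattern = paste0("%", skill, "%")
  )
}

build_count_query("Python")
build_count_query("Bachelor's")  # the embedded quote is escaped
```

The returned SQL could be passed to `dbGetQuery()` in place of the pasted string. Note that `%` and `_` inside a skill name would still act as `LIKE` wildcards.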

Skills

csv <- read.csv("https://raw.githubusercontent.com/candrewxs/D607_Project3/main/Accounting%20Attributes%20for%20Data%20Analysis.xlsx%20-%20DATA%20Scientist%20.csv")

Technical Skills

words <- as.vector(csv$Technical.Skills)

# Run the count query once per skill.
results <- lapply(words, MyFunction)

# Combine the one-row results and pair each count with its skill name.
results <- data.frame(Skill = words, Count = bind_rows(results)[[1]])
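
The counting and reshaping steps above can be tried without a database. The sketch below is an offline illustration only: `count_courses` is a hypothetical stand-in for `MyFunction` that counts matches in a small made-up vector of descriptions instead of querying MySQL, so the numbers it produces are not project results.

```r
# Made-up course descriptions, for illustration only.
descriptions <- c(
  "Intro to SQL and database design",
  "Regression modeling with Python",
  "Auditing and team communication"
)

# Hypothetical stand-in for MyFunction(): a case-insensitive substring
# count, roughly mirroring `like '%x%'` under a case-insensitive collation.
count_courses <- function(x) {
  sum(grepl(tolower(x), tolower(descriptions), fixed = TRUE))
}

skills <- c("SQL", "Python", "Database", "Calculus")

# vapply() returns one integer per skill, so no counter variable or
# column-binding step is needed.
demo <- data.frame(
  Skill = skills,
  Count = vapply(skills, count_courses, integer(1), USE.NAMES = FALSE)
)
demo
```

Building the data frame in one call also avoids the intermediate column that later has to be dropped.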

results
##                          Skill Count
## 1           Data Visualization    23
## 2                       Python    18
## 3                          SQL    13
## 4             Machine Learning     6
## 5              Microsoft Excel     9
## 6                   Statistics    47
## 7             Machine Learning     6
## 8  Natural Language Processing     0
## 9               Data Wrangling     1
## 10                    Big Data    28
## 11              Linear Algebra     1
## 12                    Calculus     5
## 13          Data Visualization    23
## 14              Data Analytics    49
## 15                    Database    65
## 16                        Java     3
## 17                Data Science     8
## 18                 Blockchain      2
## 19                  Regression    64
## 20                    Predict      6
## 21                      Coding     3
## 22               Text Analysis     0
## 23            Data Preparation     6
## 24                 Data Mining    30
## 25                          R      8
## 26                  Collection    29
## 27               Data Cleaning     1
## 28                    Modeling    59
wordcloud(words = results$Skill, freq = results$Count, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
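
The table above lists “Machine Learning” and “Data Visualization” twice, and a few names appear to carry trailing spaces (for example “Blockchain ”). A minimal sketch of tidying such a table before plotting follows. The three-row `demo` data frame just copies a few rows from the output above, and whether duplicates should be dropped is a judgment call the original analysis does not make.

```r
demo <- data.frame(
  Skill = c("Machine Learning", "Blockchain ", "Machine Learning"),
  Count = c(6, 2, 6)
)

# Trim stray whitespace from the labels, then keep the first row per skill.
demo$Skill <- trimws(demo$Skill)
demo <- demo[!duplicated(demo$Skill), ]
demo
```

Trimming is safest after the queries have run: a trailing space is part of the `like` pattern, so searching for "R" instead of "R " would match far more descriptions.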

Soft Skills

words <- as.vector(csv$Soft.Skills)
words <- words[words != ""]
# Run the count query once per skill.
results <- lapply(words, MyFunction)

# Combine the one-row results and pair each count with its skill name.
results <- data.frame(Skill = words, Count = bind_rows(results)[[1]])

results
##            Skill Count
## 1           Team    90
## 2  Communication   210
## 3 Business Savvy     0
## 4   Storytelling     3
## 5         Collab     7
wordcloud(words = results$Skill, freq = results$Count, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))

Conclusions

The code above lets us count how often data science skills appear in the course descriptions of accounting programs and visualize the results as word clouds. This shows which skills the curriculum highlights. Among the technical skills, “Database” (65), “Regression” (64), and “Modeling” (59) appear in the most course descriptions, followed by “Data Analytics” (49) and “Statistics” (47). Among the soft skills, “Communication” is the most common, with 210 matches. Even so, the courses that mention communication are only a fraction of the courses overall. These findings help explain how the accounting curriculum is currently set up and how schools could improve it to attract employers and students alike.
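
As a quick check on the ranking of technical skills, the counts can be sorted directly. The sketch below retypes a handful of rows from the technical-skills output rather than rerunning the queries, so `top` is an illustrative subset and not the full table.

```r
top <- data.frame(
  Skill = c("Statistics", "Data Analytics", "Database",
            "Regression", "Modeling", "Data Mining"),
  Count = c(47, 49, 65, 64, 59, 30)
)

# Sort from most to least frequently mentioned.
top <- top[order(top$Count, decreasing = TRUE), ]
top$Skill
```

Applying the same `order()` call to the full `results` data frame would rank every skill.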