Load the libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# readr, tidyr, dplyr, and ggplot2 are already attached by library(tidyverse),
# so separate library() calls for them are not needed.

Read the CSV file from GitHub, skipping the first line

df <- read_csv("https://raw.githubusercontent.com/GullitNa/DATA607-Project3/main/kaggle_survey_2020_responses%202.csv",
               skip = 1)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 20036 Columns: 355
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (353): What is your age (# years)?, What is your gender? - Selected Choi...
## dbl   (1): Duration (in seconds)
## lgl   (1): Which of the following business intelligence tools do you use on ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
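The warning and the single `lgl` column above suggest that type guessing went wrong for at least one column. A common cause, assumed here rather than verified against this file, is a column that is empty in the rows readr samples for guessing, so it is typed as logical and later text values fail to parse. The sketch below uses a small inline CSV (hypothetical data, with line 1 assumed to hold short question codes) to show one way to avoid the guess: read everything except the duration as character, silence the column-specification message, and then check `problems()`.

```r
library(readr)

# Toy stand-in for the survey file (hypothetical data): line 1 holds short
# question codes, line 2 holds the full question text, then the responses.
toy_csv <- I(paste0(
  "Q0,Q1,Q2\n",
  "Duration (in seconds),What is your age (# years)?,BI tool\n",
  "1838,35-39,\n",
  "860,30-34,Tableau\n"
))

toy <- read_csv(
  toy_csv,
  skip = 1,                                   # use line 2 as the header, as above
  col_types = cols(
    `Duration (in seconds)` = col_double(),   # keep the numeric column numeric
    .default = col_character()                # no type guessing for the rest
  ),
  show_col_types = FALSE                      # quiet the column specification message
)

# List any recorded parsing issues (an empty tibble means none).
problems(toy)
```

On the real data, the same `col_types` and `show_col_types` arguments could be passed to the `read_csv()` call above, and `problems(df)` would show which rows and columns triggered the warning.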

Inspect the dataset

head(df)
## # A tibble: 6 × 355
##   `Duration (in seconds)` `What is your age (# years)?` What is your gender? -…¹
##                     <dbl> <chr>                         <chr>                   
## 1                    1838 35-39                         Man                     
## 2                  289287 30-34                         Man                     
## 3                     860 35-39                         Man                     
## 4                     507 30-34                         Man                     
## 5                      78 30-34                         Man                     
## 6                     401 30-34                         Man                     
## # ℹ abbreviated name: ¹​`What is your gender? - Selected Choice`
## # ℹ 352 more variables: `In which country do you currently reside?` <chr>,
## #   `What is the highest level of formal education that you have attained or plan to attain within the next 2 years?` <chr>,
## #   `Select the title most similar to your current role (or most recent title if retired): - Selected Choice` <chr>,
## #   `For how many years have you been writing code and/or programming?` <chr>,
## #   `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python` <chr>,
## #   `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R` <chr>, …
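The variable names listed above are the full question texts, and the select-all-that-apply questions appear as one column per option in the form `question - Selected Choice - option`. A rough sketch of how such names could be split into a question and an option follows; the `col_names` vector and the `name_index` table are illustrative names, not part of the original analysis.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)

# A few names copied from the output above (stand-in for colnames(df)).
col_names <- c(
  "What is your age (# years)?",
  "What is your gender? - Selected Choice",
  "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python",
  "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R"
)

name_index <- tibble(column = col_names) %>%
  separate(
    column,
    into = c("question", "option"),
    sep = " - Selected Choice",
    remove = FALSE,        # keep the original name for later lookups
    fill = "right"         # names without the marker get option = NA
  ) %>%
  mutate(
    option = str_remove(option, "^\\s*-\\s*"),   # drop the leading " - "
    option = na_if(str_squish(option), "")       # empty option: single-choice question
  )

name_index
```

With the real data, `tibble(column = colnames(df))` could replace the toy vector, and grouping `name_index` by `question` would show which columns belong to the same multi-select question.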
colnames(df)
##   [1] "Duration (in seconds)"                                                                                                                                                                                                                                                  
##   [2] "What is your age (# years)?"                                                                                                                                                                                                                                            
##   [3] "What is your gender? - Selected Choice"                                                                                                                                                                                                                                 
##   [4] "In which country do you currently reside?"                                                                                                                                                                                                                              
##   [5] "What is the highest level of formal education that you have attained or plan to attain within the next 2 years?"                                                                                                                                                        
##   [6] "Select the title most similar to your current role (or most recent title if retired): - Selected Choice"                                                                                                                                                                
##   [7] "For how many years have you been writing code and/or programming?"                                                                                                                                                                                                      
##   [8] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python"                                                                                                                                                           
##   [9] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R"                                                                                                                                                                
##  [10] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL"                                                                                                                                                              
##  [11] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C"                                                                                                                                                                
##  [12] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++"                                                                                                                                                              
##  [13] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java"                                                                                                                                                             
##  [14] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript"                                                                                                                                                       
##  [15] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia"                                                                                                                                                            
##  [16] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Swift"                                                                                                                                                            
##  [17] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash"                                                                                                                                                             
##  [18] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB"                                                                                                                                                           
##  [19] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None"                                                                                                                                                             
##  [20] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                                            
##  [21] "What programming language would you recommend an aspiring data scientist to learn first? - Selected Choice"                                                                                                                                                             
##  [22] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - Jupyter (JupyterLab, Jupyter Notebooks, etc)"                                                                            
##  [23] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  RStudio"                                                                                                                
##  [24] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Visual Studio / Visual Studio Code"                                                                                     
##  [25] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - Click to write Choice 13"                                                                                                
##  [26] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  PyCharm"                                                                                                                
##  [27] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Spyder"                                                                                                                
##  [28] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Notepad++"                                                                                                             
##  [29] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Sublime Text"                                                                                                          
##  [30] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Vim / Emacs"                                                                                                           
##  [31] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  MATLAB"                                                                                                                 
##  [32] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                    
##  [33] "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                   
##  [34] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Kaggle Notebooks"                                                                                                                          
##  [35] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - Colab Notebooks"                                                                                                                            
##  [36] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - Azure Notebooks"                                                                                                                            
##  [37] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Paperspace / Gradient"                                                                                                                     
##  [38] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Binder / JupyterHub"                                                                                                                       
##  [39] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Code Ocean"                                                                                                                                
##  [40] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  IBM Watson Studio"                                                                                                                         
##  [41] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Amazon Sagemaker Studio"                                                                                                                   
##  [42] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Amazon EMR Notebooks"                                                                                                                      
##  [43] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - Google Cloud AI Platform Notebooks"                                                                                                         
##  [44] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - Google Cloud Datalab Notebooks"                                                                                                             
##  [45] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice -  Databricks Collaborative Notebooks"                                                                                                        
##  [46] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                                       
##  [47] "Which of the following hosted notebook products do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                                      
##  [48] "What type of computing platform do you use most often for your data science projects? - Selected Choice"                                                                                                                                                                
##  [49] "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - GPUs"                                                                                                                                                   
##  [50] "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - TPUs"                                                                                                                                                   
##  [51] "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                                                   
##  [52] "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                                                  
##  [53] "Approximately how many times have you used a TPU (tensor processing unit)?"                                                                                                                                                                                             
##  [54] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Matplotlib"                                                                                                                                     
##  [55] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Seaborn"                                                                                                                                        
##  [56] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Plotly / Plotly Express"                                                                                                                        
##  [57] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Ggplot / ggplot2"                                                                                                                               
##  [58] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Shiny"                                                                                                                                          
##  [59] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  D3 js"                                                                                                                                          
##  [60] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Altair"                                                                                                                                         
##  [61] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Bokeh"                                                                                                                                          
##  [62] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Geoplotlib"                                                                                                                                     
##  [63] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice -  Leaflet / Folium"                                                                                                                               
##  [64] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                                            
##  [65] "What data visualization libraries or tools do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                                           
##  [66] "For how many years have you used machine learning methods?"                                                                                                                                                                                                             
##  [67] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   Scikit-learn"                                                                                                                           
##  [68] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   TensorFlow"                                                                                                                             
##  [69] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Keras"                                                                                                                                   
##  [70] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  PyTorch"                                                                                                                                 
##  [71] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Fast.ai"                                                                                                                                 
##  [72] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  MXNet"                                                                                                                                   
##  [73] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Xgboost"                                                                                                                                 
##  [74] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  LightGBM"                                                                                                                                
##  [75] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  CatBoost"                                                                                                                                
##  [76] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Prophet"                                                                                                                                 
##  [77] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  H2O 3"                                                                                                                                   
##  [78] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Caret"                                                                                                                                   
##  [79] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Tidymodels"                                                                                                                              
##  [80] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  JAX"                                                                                                                                     
##  [81] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - None"                                                                                                                                     
##  [82] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                    
##  [83] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression"                                                                                                                         
##  [84] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests"                                                                                                                      
##  [85] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc)"                                                                                                   
##  [86] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches"                                                                                                                                   
##  [87] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches"                                                                                                                               
##  [88] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc)"                                                                                                                     
##  [89] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks"                                                                                                                         
##  [90] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks"                                                                                                                       
##  [91] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks"                                                                                                                             
##  [92] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc)"                                                                                                               
##  [93] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None"                                                                                                                                                  
##  [94] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other"                                                                                                                                                 
##  [95] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - General purpose image/video tools (PIL, cv2, skimage, etc)"                                                                                     
##  [96] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Image segmentation methods (U-Net, Mask R-CNN, etc)"                                                                                            
##  [97] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Object detection methods (YOLOv3, RetinaNet, etc)"                                                                                              
##  [98] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)"                           
##  [99] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Generative Networks (GAN, VAE, etc)"                                                                                                            
## [100] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                                           
## [101] "Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                                          
## [102] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Word embeddings/vectors (GLoVe, fastText, word2vec)"                                                                       
## [103] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Encoder-decorder models (seq2seq, vanilla transformers)"                                                                   
## [104] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Contextualized embeddings (ELMo, CoVe)"                                                                                    
## [105] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Transformer language models (GPT-3, BERT, XLnet, etc)"                                                                     
## [106] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - None"                                                                                                                      
## [107] "Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                     
## [108] "What is the size of the company where you are employed?"                                                                                                                                                                                                                
## [109] "Approximately how many individuals are responsible for data science workloads at your place of business?"                                                                                                                                                               
## [110] "Does your current employer incorporate machine learning methods into their business?"                                                                                                                                                                                   
## [111] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Analyze and understand data to influence product or business decisions"                                                                          
## [112] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data"                                
## [113] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build prototypes to explore applying machine learning to new areas"                                                                              
## [114] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run a machine learning service that operationally improves my product or workflows"                                                 
## [115] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Experimentation and iteration to improve existing ML models"                                                                                     
## [116] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Do research that advances the state of the art of machine learning"                                                                              
## [117] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - None of these activities are an important part of my role at work"                                                                               
## [118] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Other"                                                                                                                                           
## [119] "What is your current yearly compensation (approximate $USD)?"                                                                                                                                                                                                           
## [120] "Approximately how much money have you (or your team) spent on machine learning and/or cloud computing services at home (or at work) in the past 5 years (approximate $USD)?"                                                                                            
## [121] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Amazon Web Services (AWS)"                                                                                                                 
## [122] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Microsoft Azure"                                                                                                                           
## [123] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Platform (GCP)"                                                                                                               
## [124] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  IBM Cloud / Red Hat"                                                                                                                       
## [125] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Oracle Cloud"                                                                                                                              
## [126] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  SAP Cloud"                                                                                                                                 
## [127] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Salesforce Cloud"                                                                                                                          
## [128] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  VMware Cloud"                                                                                                                              
## [129] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Alibaba Cloud"                                                                                                                             
## [130] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice -  Tencent Cloud"                                                                                                                             
## [131] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - None"                                                                                                                                       
## [132] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                      
## [133] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Amazon EC2"                                                                                                                                   
## [134] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  AWS Lambda"                                                                                                                                   
## [135] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Amazon Elastic Container Service"                                                                                                             
## [136] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Azure Cloud Services"                                                                                                                         
## [137] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Microsoft Azure Container Instances"                                                                                                          
## [138] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Azure Functions"                                                                                                                              
## [139] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Compute Engine"                                                                                                                  
## [140] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Functions"                                                                                                                       
## [141] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Run"                                                                                                                             
## [142] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud App Engine"                                                                                                                      
## [143] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - No / None"                                                                                                                                     
## [144] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                         
## [145] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Amazon SageMaker"                                                                                                                            
## [146] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Amazon Forecast"                                                                                                                             
## [147] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Amazon Rekognition"                                                                                                                          
## [148] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Azure Machine Learning Studio"                                                                                                               
## [149] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Azure Cognitive Services"                                                                                                                    
## [150] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud AI Platform / Google Cloud ML Engine"                                                                                           
## [151] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Video AI"                                                                                                                       
## [152] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Natural Language"                                                                                                               
## [153] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Vision AI"                                                                                                                      
## [154] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - No / None"                                                                                                                                    
## [155] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                        
## [156] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - MySQL"                                                                              
## [157] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - PostgresSQL"                                                                        
## [158] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - SQLite"                                                                             
## [159] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Oracle Database"                                                                    
## [160] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - MongoDB"                                                                            
## [161] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Snowflake"                                                                          
## [162] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - IBM Db2"                                                                            
## [163] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft SQL Server"                                                               
## [164] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Access"                                                                   
## [165] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Azure Data Lake Storage"                                                  
## [166] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Redshift"                                                                    
## [167] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Athena"                                                                      
## [168] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon DynamoDB"                                                                    
## [169] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud BigQuery"                                                              
## [170] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud SQL"                                                                   
## [171] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Firestore"                                                             
## [172] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - None"                                                                               
## [173] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                              
## [174] "Which of the following big data products (relational database, data warehouse, data lake, or similar) do you use most often? - Selected Choice"                                                                                                                         
## [175] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon QuickSight"                                                                                                                        
## [176] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Power BI"                                                                                                                       
## [177] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Google Data Studio"                                                                                                                       
## [178] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Looker"                                                                                                                                   
## [179] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Tableau"                                                                                                                                  
## [180] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Salesforce"                                                                                                                               
## [181] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Einstein Analytics"                                                                                                                       
## [182] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Qlik"                                                                                                                                     
## [183] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Domo"                                                                                                                                     
## [184] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - TIBCO Spotfire"                                                                                                                           
## [185] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Alteryx"                                                                                                                                  
## [186] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Sisense"                                                                                                                                  
## [187] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - SAP Analytics Cloud"                                                                                                                      
## [188] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - None"                                                                                                                                     
## [189] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Other"                                                                                                                                    
## [190] "Which of the following business intelligence tools do you use most often? - Selected Choice"                                                                                                                                                                            
## [191] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automated data augmentation (e.g. imgaug, albumentations)"                                                                   
## [192] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automated feature engineering/selection (e.g. tpot, boruta_py)"                                                              
## [193] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automated model selection (e.g. auto-sklearn, xcessiv)"                                                                      
## [194] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automated model architecture searches (e.g. darts, enas)"                                                                    
## [195] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)"                                                           
## [196] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)"                                                     
## [197] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - No / None"                                                                                                                   
## [198] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                                       
## [199] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Google Cloud AutoML"                                                                                     
## [200] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -  H20 Driverless AI"                                                                                       
## [201] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Databricks AutoML"                                                                                       
## [202] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -  DataRobot AutoML"                                                                                        
## [203] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Tpot"                                                                                                   
## [204] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Auto-Keras"                                                                                             
## [205] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Auto-Sklearn"                                                                                           
## [206] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Auto_ml"                                                                                                
## [207] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Xcessiv"                                                                                                
## [208] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice -   MLbox"                                                                                                  
## [209] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice - No / None"                                                                                                
## [210] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?  (Select all that apply) - Selected Choice - Other"                                                                                                    
## [211] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Neptune.ai"                                                                                                                                              
## [212] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Weights & Biases"                                                                                                                                        
## [213] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Comet.ml"                                                                                                                                                
## [214] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Sacred + Omniboard"                                                                                                                                      
## [215] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  TensorBoard"                                                                                                                                             
## [216] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Guild.ai"                                                                                                                                                
## [217] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Polyaxon"                                                                                                                                                
## [218] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Trains"                                                                                                                                                  
## [219] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice -  Domino Model Monitor"                                                                                                                                    
## [220] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - No / None"                                                                                                                                                
## [221] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Other"                                                                                                                                                    
## [222] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Plotly Dash"                                                                                                                    
## [223] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Streamlit"                                                                                                                      
## [224] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  NBViewer"                                                                                                                       
## [225] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  GitHub"                                                                                                                         
## [226] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Personal blog"                                                                                                                  
## [227] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Kaggle"                                                                                                                         
## [228] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Colab"                                                                                                                          
## [229] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice -  Shiny"                                                                                                                          
## [230] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - I do not share my work publicly"                                                                                                 
## [231] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Other"                                                                                                                           
## [232] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Coursera"                                                                                                                                              
## [233] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - edX"                                                                                                                                                   
## [234] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Kaggle Learn Courses"                                                                                                                                  
## [235] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - DataCamp"                                                                                                                                              
## [236] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Fast.ai"                                                                                                                                               
## [237] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udacity"                                                                                                                                               
## [238] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udemy"                                                                                                                                                 
## [239] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - LinkedIn Learning"                                                                                                                                     
## [240] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Cloud-certification programs (direct from AWS, Azure, GCP, or similar)"                                                                                
## [241] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - University Courses (resulting in a university degree)"                                                                                                 
## [242] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - None"                                                                                                                                                  
## [243] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Other"                                                                                                                                                 
## [244] "What is the primary tool that you use at work or school to analyze data? (Include text response) - Selected Choice"                                                                                                                                                     
## [245] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Twitter (data science influencers)"                                                                                                            
## [246] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Email newsletters (Data Elixir, O'Reilly Data & AI, etc)"                                                                                      
## [247] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Reddit (r/machinelearning, etc)"                                                                                                               
## [248] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Kaggle (notebooks, forums, etc)"                                                                                                               
## [249] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Course Forums (forums.fast.ai, Coursera forums, etc)"                                                                                          
## [250] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - YouTube (Kaggle YouTube, Cloud AI Adventures, etc)"                                                                                            
## [251] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Podcasts (Chai Time Data Science, O’Reilly Data Show, etc)"                                                                                    
## [252] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Blogs (Towards Data Science, Analytics Vidhya, etc)"                                                                                           
## [253] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Journal Publications (peer-reviewed journals, conference proceedings, etc)"                                                                    
## [254] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Slack Communities (ods.ai, kagglenoobs, etc)"                                                                                                  
## [255] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - None"                                                                                                                                          
## [256] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Other"                                                                                                                                         
## [257] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Amazon Web Services (AWS)"                                                                                                          
## [258] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Microsoft Azure"                                                                                                                    
## [259] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Google Cloud Platform (GCP)"                                                                                                        
## [260] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  IBM Cloud / Red Hat"                                                                                                                
## [261] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Oracle Cloud"                                                                                                                       
## [262] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  SAP Cloud"                                                                                                                          
## [263] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  VMware Cloud"                                                                                                                       
## [264] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Salesforce Cloud"                                                                                                                   
## [265] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Alibaba Cloud"                                                                                                                      
## [266] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice -  Tencent Cloud"                                                                                                                      
## [267] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - None"                                                                                                                                
## [268] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Other"                                                                                                                               
## [269] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Amazon EC2"                                                                                                  
## [270] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  AWS Lambda"                                                                                                  
## [271] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Amazon Elastic Container Service"                                                                            
## [272] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Azure Cloud Services"                                                                                        
## [273] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Microsoft Azure Container Instances"                                                                         
## [274] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Azure Functions"                                                                                             
## [275] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Google Cloud Compute Engine"                                                                                 
## [276] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Google Cloud Functions"                                                                                      
## [277] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Google Cloud Run"                                                                                            
## [278] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice -  Google Cloud App Engine"                                                                                     
## [279] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - None"                                                                                                         
## [280] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Other"                                                                                                        
## [281] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Amazon SageMaker"                                                                                           
## [282] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Amazon Forecast"                                                                                            
## [283] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Amazon Rekognition"                                                                                         
## [284] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Azure Machine Learning Studio"                                                                              
## [285] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Azure Cognitive Services"                                                                                   
## [286] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Google Cloud AI Platform / Google Cloud ML Engine"                                                          
## [287] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Google Cloud Video AI"                                                                                      
## [288] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Google Cloud Natural Language"                                                                              
## [289] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice -  Google Cloud Vision AI"                                                                                     
## [290] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - None"                                                                                                        
## [291] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Other"                                                                                                       
## [292] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - MySQL"                                               
## [293] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - PostgresSQL"                                         
## [294] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - SQLite"                                              
## [295] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Oracle Database"                                     
## [296] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - MongoDB"                                             
## [297] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Snowflake"                                           
## [298] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - IBM Db2"                                             
## [299] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft SQL Server"                                
## [300] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Access"                                    
## [301] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Azure Data Lake Storage"                   
## [302] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon Redshift"                                     
## [303] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon Athena"                                       
## [304] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon DynamoDB"                                     
## [305] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud BigQuery"                               
## [306] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud SQL"                                    
## [307] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud Firestore"                              
## [308] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"                                                
## [309] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"                                               
## [310] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Power BI"                                                                                        
## [311] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon QuickSight"                                                                                         
## [312] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Data Studio"                                                                                        
## [313] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Looker"                                                                                                    
## [314] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Tableau"                                                                                                   
## [315] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Salesforce"                                                                                                
## [316] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Einstein Analytics"                                                                                        
## [317] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Qlik"                                                                                                      
## [318] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Domo"                                                                                                      
## [319] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - TIBCO Spotfire"                                                                                            
## [320] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Alteryx"                                                                                                   
## [321] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Sisense"                                                                                                   
## [322] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - SAP Analytics Cloud"                                                                                       
## [323] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"                                                                                                      
## [324] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"                                                                                                     
## [325] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automated data augmentation (e.g. imgaug, albumentations)"                    
## [326] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automated feature engineering/selection (e.g. tpot, boruta_py)"               
## [327] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automated model selection (e.g. auto-sklearn, xcessiv)"                       
## [328] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automated model architecture searches (e.g. darts, enas)"                     
## [329] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)"            
## [330] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Automation of full ML pipelines (e.g. Google Cloud AutoML, H20 Driverless AI)"
## [331] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - None"                                                                         
## [332] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Other"                                                                        
## [333] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -  Google Cloud AutoML"                                                              
## [334] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -  H20 Driverless AI"                                                                
## [335] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -  Databricks AutoML"                                                                
## [336] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -  DataRobot AutoML"                                                                 
## [337] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   Tpot"                                                                            
## [338] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   Auto-Keras"                                                                      
## [339] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   Auto-Sklearn"                                                                    
## [340] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   Auto_ml"                                                                         
## [341] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   Xcessiv"                                                                         
## [342] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice -   MLbox"                                                                           
## [343] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - None"                                                                              
## [344] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?  (Select all that apply) - Selected Choice - Other"                                                                             
## [345] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Neptune.ai"                                                                                                  
## [346] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Weights & Biases"                                                                                            
## [347] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Comet.ml"                                                                                                    
## [348] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Sacred + Omniboard"                                                                                          
## [349] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  TensorBoard"                                                                                                 
## [350] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Guild.ai"                                                                                                    
## [351] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Polyaxon"                                                                                                    
## [352] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Trains"                                                                                                      
## [353] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice -  Domino Model Monitor"                                                                                        
## [354] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - None"                                                                                                         
## [355] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Other"

Rename for clarity

df <- df %>%
  rename(
    job_title = "Select the title most similar to your current role (or most recent title if retired): - Selected Choice",
    experience = "For how many years have you been writing code and/or programming?", 
    country = "In which country do you currently reside?",
    salary = "What is your current yearly compensation (approximate $USD)?",
    education = "What is the highest level of formal education that you have attained or plan to attain within the next 2 years?"
  )

Convert key columns to factors (needed for grouping and for plotting categorical data)

df$job_title <- as.factor(df$job_title)
df$experience <- as.factor(df$experience)
df$salary <- as.factor(df$salary)
df$country <- as.factor(df$country)

Filter out rows with missing salary

df_filtered <- df %>% filter(!is.na(salary))

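Select relevant columns for final dataset (this step starts again from df, so the salary filter above is overwritten; rows with missing salary remain here and are filtered out later, before the salary plots)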

df_filtered <- df %>%
  select(
    job_title,
    experience,
    country,
    salary,
    education,
    #Programming languages (Q7)
    starts_with("What programming languages do you use on a regular basis"),
    #ML tools (Q17)
    starts_with("Which of the following ML algorithms do you use on a regular basis")
  )

Identify Q7 (Programming Languages people regularly use) and Q17 (ML algorithms used regularly) columns

q7_cols <- grep("What programming languages do you use on a regular basis?", names(df_filtered), value = TRUE, fixed = TRUE)
q17_cols <- grep("Which of the following ML algorithms do you use on a regular basis?", names(df_filtered), value = TRUE, fixed = TRUE)
unique_vals <- df_filtered %>%
  select(all_of(q7_cols)) %>%
  unlist() %>%      
  unique()                   


unique_vals <- unique_vals[!is.na(unique_vals)]

unique_vals
##  [1] "Python"     "R"          "SQL"        "C"          "C++"       
##  [6] "Java"       "Javascript" "Julia"      "Swift"      "Bash"      
## [11] "MATLAB"     "None"       "Other"
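
A note on the grep() patterns used to find the Q7 and Q17 columns: by default grep() treats the pattern as a regular expression, so a trailing "?" means "the previous character is optional" rather than a literal question mark. The sketch below uses two made-up strings (not real column names) to show the difference that fixed = TRUE makes.

```r
# Two hypothetical strings; only the first contains a literal "basis?"
x <- c("use on a regular basis? - Python", "use on a regular basi - typo")

# Regex matching: "s?" means an optional "s", so both strings match
regex_match <- grepl("regular basis?", x)

# Literal matching: only the string containing "basis?" matches
fixed_match <- grepl("regular basis?", x, fixed = TRUE)

regex_match
fixed_match
```

With the survey's column names both forms select the same columns, but the literal form states the intent more directly.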

Convert the Q7 columns to 0/1 indicators: 1 if the respondent selected the option (the value is neither NA nor ""), 0 otherwise

Numeric indicators are necessary for rowSums()

df_filtered[q7_cols] <- lapply(df_filtered[q7_cols], function(x) ifelse(x != "" & !is.na(x), 1, 0))

Convert the Q17 columns to 0/1 indicators as well

df_filtered[q17_cols] <- lapply(df_filtered[q17_cols], function(x) ifelse(x != "" & !is.na(x), 1, 0))

Create total programming skills column from Q7 (total number of programming languages each respondent uses)

df_filtered$total_prog_skills <- rowSums(df_filtered[q7_cols], na.rm = TRUE)

Create total ML skills column from Q17 (total number of ML algorithms each respondent uses)

df_filtered$total_ml_skills <- rowSums(df_filtered[q17_cols], na.rm = TRUE)
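
The encoding and row totals above can be checked on a small made-up table. This is only a sketch: the data frame toy and its columns are hypothetical and stand in for the Q7 columns.

```r
# Hypothetical multi-select answers: a cell holds the option label if selected, NA otherwise
toy <- data.frame(
  python = c("Python", NA, "Python"),
  r      = c("R", NA, NA),
  sql    = c(NA, NA, "SQL"),
  stringsAsFactors = FALSE
)
lang_cols <- names(toy)

# 1 if the option was selected (not NA and not an empty string), 0 otherwise
toy[lang_cols] <- lapply(toy[lang_cols], function(x) ifelse(!is.na(x) & x != "", 1, 0))

# Number of selected options per respondent
toy$total <- rowSums(toy[lang_cols])
toy
```

The second respondent selected nothing, so the total for that row is 0 rather than NA.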

View cleaned dataset

view(df_filtered)

Create the combined total skills column (programming languages plus ML algorithms)

df_filtered$total_skills <- df_filtered$total_prog_skills + df_filtered$total_ml_skills

View data set

view(df_filtered)

Summary

str(df_filtered)
## tibble [20,036 × 33] (S3: tbl_df/tbl/data.frame)
##  $ job_title                                                                                                                                                           : Factor w/ 13 levels "Business Analyst",..: 13 4 11 5 11 3 13 13 5 13 ...
##  $ experience                                                                                                                                                          : Factor w/ 7 levels "< 1 years","1-2 years",..: 6 6 3 6 5 1 5 1 6 1 ...
##  $ country                                                                                                                                                             : Factor w/ 55 levels "Argentina","Australia",..: 10 54 1 54 22 16 6 9 13 9 ...
##  $ salary                                                                                                                                                              : Factor w/ 25 levels "$0-999","> $500,000",..: NA 5 7 6 NA NA NA NA 23 NA ...
##  $ education                                                                                                                                                           : chr [1:20036] "Doctoral degree" "Master’s degree" "Bachelor’s degree" "Master’s degree" ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python                                                        : num [1:20036] 1 1 0 1 1 1 1 0 1 1 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R                                                             : num [1:20036] 1 1 0 0 0 1 1 1 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL                                                           : num [1:20036] 1 1 0 1 0 0 0 0 1 1 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C                                                             : num [1:20036] 1 0 0 0 0 0 1 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++                                                           : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java                                                          : num [1:20036] 0 0 1 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript                                                    : num [1:20036] 1 0 1 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia                                                         : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Swift                                                         : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash                                                          : num [1:20036] 0 0 1 1 0 0 0 0 1 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB                                                        : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None                                                          : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other                                                         : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression                      : num [1:20036] 0 1 0 1 0 0 1 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests                   : num [1:20036] 1 0 0 1 0 0 1 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc): num [1:20036] 1 0 0 1 0 0 1 0 1 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches                                : num [1:20036] 1 0 0 1 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches                            : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc)                  : num [1:20036] 1 0 0 1 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks                      : num [1:20036] 1 1 0 0 0 0 1 1 1 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks                    : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks                          : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc)            : num [1:20036] 0 1 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None                                               : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other                                              : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
##  $ total_prog_skills                                                                                                                                                   : num [1:20036] 7 3 3 3 1 2 3 1 3 2 ...
##  $ total_ml_skills                                                                                                                                                     : num [1:20036] 6 3 0 5 0 0 4 1 2 0 ...
##  $ total_skills                                                                                                                                                        : num [1:20036] 13 6 3 8 1 2 7 2 5 2 ...
head(df_filtered[q7_cols], 10)
## # A tibble: 10 × 13
##    What programming languages do…¹ What programming lan…² What programming lan…³
##                              <dbl>                  <dbl>                  <dbl>
##  1                               1                      1                      1
##  2                               1                      1                      1
##  3                               0                      0                      0
##  4                               1                      0                      1
##  5                               1                      0                      0
##  6                               1                      1                      0
##  7                               1                      1                      0
##  8                               0                      1                      0
##  9                               1                      0                      1
## 10                               1                      0                      1
## # ℹ abbreviated names:
## #   ¹​`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python`,
## #   ²​`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R`,
## #   ³​`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL`
## # ℹ 10 more variables:
## #   `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C` <dbl>,
## #   `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++` <dbl>, …

Visualize Skill Distribution

Clean q7 and q17 column names for plotting

#For Q7 and Q17 columns (Programming Languages and ML tools)
clean_q7_names <- sub(".* - ", "", q7_cols)
clean_q17_names <- sub(".* - ", "", q17_cols)
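
As a quick check of the pattern above: ".*" is greedy, so sub(".* - ", "", x) removes everything up to and including the last " - " and keeps only the option label. The sketch below applies it to two of the survey's column names.

```r
raw_names <- c(
  "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python",
  "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches"
)

# Greedy match: strips the question text and "Selected Choice", leaving the label
short_names <- sub(".* - ", "", raw_names)
short_names
```

One caveat, stated as an assumption about the data: an option label that itself contained " - " would be cut at its last " - ".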

Create a new data frame from df_filtered with only the Q7 columns (for plotting only)

df_prog <- df_filtered[q7_cols] # new data frame for plotting
names(df_prog) <- clean_q7_names # rename the columns to the short labels

Create a new data frame from df_filtered with only the Q17 columns (for plotting only)

df_ml <- df_filtered[q17_cols] # new data frame for plotting
names(df_ml) <- clean_q17_names # rename the columns to the short labels

Plot Programming Languages

# Count and plot programming languages 
prog_counts <- data.frame(
  Language = names(df_prog),
  Count = colSums(df_prog, na.rm = TRUE)
)

ggplot(prog_counts, aes(x = Count, y = reorder(Language, Count))) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Most Used Programming Languages",
    x = "Count", y = "Language"
  ) +
  theme_minimal()

Programming language usage as a share of respondents (sorted in descending order of mean)

summary(df_prog)
##      Python             R               SQL               C         
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.7751   Mean   :0.2135   Mean   :0.3761   Mean   :0.1655  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       C++             Java         Javascript         Julia        
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.000   Median :0.000   Median :0.0000   Median :0.00000  
##  Mean   :0.191   Mean   :0.168   Mean   :0.1495   Mean   :0.01308  
##  3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.000   Max.   :1.000   Max.   :1.0000   Max.   :1.00000  
##      Swift               Bash             MATLAB            None        
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.000000   Median :0.00000   Median :0.0000   Median :0.00000  
##  Mean   :0.009882   Mean   :0.08864   Mean   :0.1107   Mean   :0.01028  
##  3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.000000   Max.   :1.00000   Max.   :1.0000   Max.   :1.00000  
##      Other        
##  Min.   :0.00000  
##  1st Qu.:0.00000  
##  Median :0.00000  
##  Mean   :0.09708  
##  3rd Qu.:0.00000  
##  Max.   :1.00000
prog_means <- colMeans(df_prog, na.rm = TRUE)
df_prog_means <- data.frame(
  Language = names(prog_means),
  Mean = as.numeric(prog_means)
)

# Sort in descending order
df_prog_means_sorted <- df_prog_means[order(-df_prog_means$Mean), ]
df_prog_means_sorted
##      Language        Mean
## 1      Python 0.775104811
## 3         SQL 0.376073068
## 2           R 0.213465762
## 5         C++ 0.191006189
## 6        Java 0.168047514
## 4           C 0.165452186
## 7  Javascript 0.149480934
## 11     MATLAB 0.110650829
## 13      Other 0.097075265
## 10       Bash 0.088640447
## 8       Julia 0.013076462
## 12       None 0.010281493
## 9       Swift 0.009882212

Share of respondents using each language: Python 77.51%, R 21.35%, SQL 37.61%, C 16.55%, C++ 19.10%, Java 16.80%, Javascript 14.95%, Julia 1.31%, Swift 0.99%, Bash 8.86%, MATLAB 11.07%, None 1.03%, Other 9.71%.
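
Because each column is a 0/1 indicator, its mean is the proportion of respondents who selected that language, and multiplying by 100 gives the percentage. A minimal sketch using three of the means printed above (the vector prog_share is just a hand-copied illustration):

```r
# Column means copied from the sorted table above
prog_share <- c(Python = 0.775104811, SQL = 0.376073068, R = 0.213465762)

# Convert proportions to percentages with two decimals
prog_pct <- round(100 * prog_share, 2)
prog_pct
```

In the full analysis the same conversion could presumably be applied directly to prog_means.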

Plot ML Tools

# Count and plot ML Tools
ml_counts <- data.frame(
  Tools = names(df_ml),
  Count = colSums(df_ml, na.rm = TRUE)
)

ggplot(ml_counts, aes(x = Count, y = reorder(Tools, Count))) +
  geom_col(fill = "darkgreen") +
  labs(
    title = "Most Used Machine Learning Tools",
    x = "Count", y = "ML Tool"
  ) +
  theme_minimal()

ML tool usage as a share of respondents (sorted in descending order of mean)

summary(df_ml)
##  Linear or Logistic Regression Decision Trees or Random Forests
##  Min.   :0.0000                Min.   :0.0000                  
##  1st Qu.:0.0000                1st Qu.:0.0000                  
##  Median :1.0000                Median :0.0000                  
##  Mean   :0.5271                Mean   :0.4394                  
##  3rd Qu.:1.0000                3rd Qu.:1.0000                  
##  Max.   :1.0000                Max.   :1.0000                  
##  Gradient Boosting Machines (xgboost, lightgbm, etc) Bayesian Approaches
##  Min.   :0.0000                                      Min.   :0.000      
##  1st Qu.:0.0000                                      1st Qu.:0.000      
##  Median :0.0000                                      Median :0.000      
##  Mean   :0.2562                                      Mean   :0.182      
##  3rd Qu.:1.0000                                      3rd Qu.:0.000      
##  Max.   :1.0000                                      Max.   :1.000      
##  Evolutionary Approaches Dense Neural Networks (MLPs, etc)
##  Min.   :0.00000         Min.   :0.0000                   
##  1st Qu.:0.00000         1st Qu.:0.0000                   
##  Median :0.00000         Median :0.0000                   
##  Mean   :0.03648         Mean   :0.1679                   
##  3rd Qu.:0.00000         3rd Qu.:0.0000                   
##  Max.   :1.00000         Max.   :1.0000                   
##  Convolutional Neural Networks Generative Adversarial Networks
##  Min.   :0.0000                Min.   :0.00000                
##  1st Qu.:0.0000                1st Qu.:0.00000                
##  Median :0.0000                Median :0.00000                
##  Mean   :0.2924                Mean   :0.05111                
##  3rd Qu.:1.0000                3rd Qu.:0.00000                
##  Max.   :1.0000                Max.   :1.00000                
##  Recurrent Neural Networks Transformer Networks (BERT, gpt-3, etc)
##  Min.   :0.0000            Min.   :0.00000                        
##  1st Qu.:0.0000            1st Qu.:0.00000                        
##  Median :0.0000            Median :0.00000                        
##  Mean   :0.1731            Mean   :0.06478                        
##  3rd Qu.:0.0000            3rd Qu.:0.00000                        
##  Max.   :1.0000            Max.   :1.00000                        
##       None             Other        
##  Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000  
##  Mean   :0.03673   Mean   :0.02046  
##  3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000
ml_means <- colMeans(df_ml, na.rm = TRUE)
df_ml_means <- data.frame(
  Tools = names(ml_means),
  Mean = as.numeric(ml_means)
)

# Sort in descending order
df_ml_means_sorted <- df_ml_means[order(-df_ml_means$Mean), ]
df_ml_means_sorted
##                                                  Tools       Mean
## 1                        Linear or Logistic Regression 0.52705131
## 2                     Decision Trees or Random Forests 0.43940906
## 7                        Convolutional Neural Networks 0.29242364
## 3  Gradient Boosting Machines (xgboost, lightgbm, etc) 0.25623877
## 4                                  Bayesian Approaches 0.18202236
## 9                            Recurrent Neural Networks 0.17308844
## 6                    Dense Neural Networks (MLPs, etc) 0.16789778
## 10             Transformer Networks (BERT, gpt-3, etc) 0.06478339
## 8                      Generative Adversarial Networks 0.05110801
## 11                                                None 0.03673388
## 5                              Evolutionary Approaches 0.03648433
## 12                                               Other 0.02046317

Most Used Programming Languages by Job Title

# Add job titles to df_prog for grouping
df_prog_long <- df_prog %>% #df_prog is the cleaned df containing only q7 columns
  mutate(job_title = df_filtered$job_title) %>% #adds job title column to df_prog
  pivot_longer(       #convert from wide format (one column per language) to long format (one row per language used per person)
    cols = -job_title, # when converting leave job title column as is and reshape other columns (language and used)
    names_to = "Language",
    values_to = "Used"
  ) %>%
  filter(Used == 1) #filter for only languages that were used 
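
The reshape above can be seen more easily on a small made-up table. In this sketch the data frame toy and its values are hypothetical; only the pivot_longer() and filter() steps mirror the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per respondent, one 0/1 column per language
toy <- data.frame(
  job_title = c("Data Scientist", "Data Analyst"),
  Python    = c(1, 1),
  SQL       = c(0, 1)
)

# Long format: one row per (respondent, language), keeping only languages in use
toy_long <- toy %>%
  pivot_longer(cols = -job_title, names_to = "Language", values_to = "Used") %>%
  filter(Used == 1)
toy_long
```

Two respondents become three rows, one for each language actually used, which is the shape group_by() and summarise() need for counting.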

Exclude “Currently not employed,” “Student,” and “Other” so the plot focuses on specific job titles

# Count by job title and language
prog_counts <- df_prog_long %>%
  filter(!job_title %in% c("Currently not employed", "Student", "Other")) %>%
  group_by(job_title, Language) %>% #group long format by job title and language 
  summarise(Count = n(), .groups = "drop") #counts how many respondents with a specific job used each language 
# Plot
ggplot(prog_counts, aes(x = Count, y = reorder(Language, Count, FUN = sum), fill = job_title)) +
  geom_col() +
  labs(
    title = "Most Used Programming Languages by Job Title",
    x = "Count",
    y = "Programming Language",
    fill = "Job Title"
  ) +
  theme_minimal()
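
One detail worth knowing about reorder() in a stacked bar chart: when a category has several rows (here, one per job title), reorder() summarises them with the mean by default, so the bars are ordered by average segment size rather than by total bar length. The sketch below uses hypothetical counts to show how FUN = sum changes the ordering.

```r
# Hypothetical counts: Python appears for two job titles, SQL for one
lang  <- c("Python", "Python", "SQL")
count <- c(10, 10, 15)

# Default (mean): Python = 10, SQL = 15, so SQL is ranked higher
by_mean <- reorder(lang, count)

# Sum: Python = 20, SQL = 15, so Python is ranked higher, matching the stacked total
by_sum <- reorder(lang, count, FUN = sum)

levels(by_mean)
levels(by_sum)
```

Levels are sorted in increasing order of the summary, and ggplot2 draws the last level at the top of the y axis.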

Most Used ML Tools by Job Title (use plot above for reference)

# Add job titles to df_ml for grouping
df_ml_long <- df_ml %>%
  mutate(job_title = df_filtered$job_title) %>%
  pivot_longer(
    cols = -job_title,
    names_to = "Tool",
    values_to = "Used"
  ) %>%
  filter(Used == 1)

Exclude “Currently not employed,” “Student,” and “Other” so the plot focuses on specific job titles

# Count by job title and tool
ml_counts <- df_ml_long %>%
  filter(!job_title %in% c("Currently not employed", "Student", "Other")) %>%
  group_by(job_title, Tool) %>%
  summarise(Count = n(), .groups = "drop")
# Plot
ggplot(ml_counts, aes(x = Count, y = reorder(Tool, Count, FUN = sum), fill = job_title)) +
  geom_col() +
  labs(
    title = "Most Used ML Tools by Job Title",
    x = "Count",
    y = "ML Tool",
    fill = "Job Title"
  ) +
  theme_minimal()

Salary (United States only)

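Define the salary range order (note that this is applied to df, not df_filtered, and that factor() turns any value not listed in desired_order, such as "$0-999" or "> $500,000" in the raw data, into NA)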

desired_order <- c(
  "0-9999", "10,000-14,999", "15,000-19,999", "20,000-24,999",
  "25,000-29,999", "30,000-39,999", "40,000-49,999", "50,000-59,999",
  "60,000-69,999", "70,000-79,999", "80,000-89,999", "90,000-99,999",
  "100,000-124,999", "125,000-149,999", "150,000-199,999", "200,000-249,999",
  "250,000-299,999", "300,000-499,999", "500,000-999,999", "1,000,000 or more"
)
df$salary <- factor(df$salary, levels = desired_order)
levels(df$salary)
##  [1] "0-9999"            "10,000-14,999"     "15,000-19,999"    
##  [4] "20,000-24,999"     "25,000-29,999"     "30,000-39,999"    
##  [7] "40,000-49,999"     "50,000-59,999"     "60,000-69,999"    
## [10] "70,000-79,999"     "80,000-89,999"     "90,000-99,999"    
## [13] "100,000-124,999"   "125,000-149,999"   "150,000-199,999"  
## [16] "200,000-249,999"   "250,000-299,999"   "300,000-499,999"  
## [19] "500,000-999,999"   "1,000,000 or more"
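
When levels are supplied explicitly, factor() keeps only values that match a level exactly and silently converts everything else to NA, so the printed levels alone do not show whether the recoding matched the data. A small sketch with made-up values (the vectors below are illustrative, not taken from the dataset):

```r
# Hypothetical raw labels, two of which differ from the level spelling
raw_salary <- c("$0-999", "10,000-14,999", "> $500,000")

# Levels that do not include the first and third spellings
salary_levels <- c("0-9999", "10,000-14,999", ">$500,000")

coded <- factor(raw_salary, levels = salary_levels)

# Unmatched labels have become NA
is.na(coded)
```

Comparing sum(is.na(...)) before and after recoding is one simple way to catch mismatched labels.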

Additional cleanup: salary ranges to exclude

excluded_ranges <- c(
  "0-9,999", 
  "2,000-2,999",
  "10,000-14,999", 
  "15,000-19,999", 
  "20,000-24,999",
  "25,000-29,999",
  "30,000-39,999",
  "500,000-999,999",
  "1,000,000 or more"
)

Salary range order for the graphs (highest first)

desc_salary_levels <- c(
  "> $500,000", # spelled with a space, as in the raw data
  "300,000-499,999",
  "250,000-299,999",
  "200,000-249,999",
  "150,000-199,999",
  "125,000-149,999",
  "100,000-124,999",
  "90,000-99,999",
  "80,000-89,999",
  "70,000-79,999",
  "60,000-69,999",
  "50,000-59,999",
  "40,000-49,999",
  "30,000-39,999",
  "25,000-29,999",
  "20,000-24,999",
  "15,000-19,999",
  "10,000-14,999"
)

Salary ranges in the Top 5 Programming languages used

df_us_only <- df_filtered %>%
  filter(country == "United States of America")

df_long <- df_us_only %>%
  pivot_longer(
    cols = all_of(q7_cols),  
    names_to = "Language_raw",
    values_to = "Used"
  ) %>%

  filter(Used == 1) %>%

  mutate(Language = sub(".* - ", "", Language_raw))

df_lang_usage <- df_long %>%
  group_by(Language) %>%
  summarise(Count = n(), .groups = "drop") %>%
  arrange(desc(Count))


head(df_lang_usage, 5)
## # A tibble: 5 × 2
##   Language   Count
##   <chr>      <int>
## 1 Python      1705
## 2 SQL         1061
## 3 R            713
## 4 Bash         357
## 5 Javascript   322
# Usage counts (top 10 languages); df_lang_usage was computed above

head(df_lang_usage, 10)
## # A tibble: 10 × 2
##    Language   Count
##    <chr>      <int>
##  1 Python      1705
##  2 SQL         1061
##  3 R            713
##  4 Bash         357
##  5 Javascript   322
##  6 Other        318
##  7 Java         295
##  8 C++          254
##  9 MATLAB       219
## 10 C            173
# After removing NA salaries and the excluded ranges, the plot shows Other instead of Javascript in the top 5
df_us_only <- df_filtered %>%
  filter(
    country == "United States of America",
    !is.na(salary),
    !salary %in% excluded_ranges
  ) %>%
  mutate(
    salary = factor(salary, levels = desc_salary_levels, ordered = TRUE),
    salary = droplevels(salary) 
  )

df_long <- df_us_only %>%
  pivot_longer(
    cols = all_of(q7_cols),
    names_to = "Language_raw",
    values_to = "Used"
  ) %>%
  filter(Used == 1) %>%
  mutate(Language = sub(".* - ", "", Language_raw))

df_lang_usage <- df_long %>%
  group_by(Language) %>%
  summarise(Count = n(), .groups = "drop") %>%
  arrange(desc(Count))

top_5_langs <- head(df_lang_usage$Language, 5)

df_long_top5 <- df_long %>%
  filter(Language %in% top_5_langs)

df_counts <- df_long_top5 %>%
  group_by(salary, Language) %>%
  summarise(n = n(), .groups = "drop")

ggplot(df_counts, aes(x = n, y = reorder(Language, n), fill = salary)) +
  geom_col(position = "dodge") +
  labs(
    x = "Number of Respondents",
    y = "Programming Language",
    fill = "Salary Range",
    title = "Top 5 Programming Languages vs. Salary Range (US Only)"
  ) +
  theme_minimal()

Salary ranges in the Top 5 ML tools used

df_ml_long <- df_us_only %>%
  pivot_longer(
    cols = all_of(q17_cols),  
    names_to = "MLTool_raw",
    values_to = "Used"
  ) %>%
  filter(Used == 1) %>%           
  mutate(MLTool = sub(".* - ", "", MLTool_raw)) 

df_ml_usage <- df_ml_long %>%
  group_by(MLTool) %>%
  summarise(Count = n(), .groups = "drop") %>%
  arrange(desc(Count))

top_5_tools <- head(df_ml_usage$MLTool, 5)

df_ml_long_top5 <- df_ml_long %>%
  filter(MLTool %in% top_5_tools)

df_counts <- df_ml_long_top5 %>%
  group_by(salary, MLTool) %>%
  summarise(n = n(), .groups = "drop")

ggplot(df_counts, aes(x = n, y = reorder(MLTool, n), fill = salary)) +
  geom_col(position = "dodge") +
  labs(
    x = "Number of Respondents",
    y = "ML Tool",
    fill = "Salary Range",
    title = "Top 5 ML Tools vs. Salary Range (US Only)"
  ) +
  theme_minimal()

Education

Salary based on Education level

df_edu <- df_filtered %>%
  filter(
    country == "United States of America",
    !is.na(salary),
    !is.na(education),
    !salary %in% excluded_ranges
  ) %>%
  mutate(
    salary = factor(salary),
    education = factor(education)
  ) %>%
  droplevels()


df_edu_counts <- df_edu %>%
  group_by(education, salary) %>%
  summarise(n = n(), .groups = "drop")

ggplot(df_edu_counts, aes(x = n, y = reorder(education, n), fill = salary)) +
  geom_col(position = "dodge") +
  labs(
    x = "Number of Respondents",
    y = "Education Level",
    fill = "Salary Range",
    title = "Salary Range by Education Level (US Only)"
  ) +
  theme_minimal()

Education level of respondents based on top 5 programming languages used

top_langs <- df_lang_usage %>%
  slice_max(Count, n = 5) %>%
  pull(Language)

# Display order for education levels, lowest to highest
education_levels <- c(
  "No formal education past high school",
  "Some college/university study without earning a bachelor’s degree",
  "Bachelor’s degree",
  "Master’s degree",
  "Professional degree",
  "Doctoral degree",
  "I prefer not to answer"
)

df_lang_edu <- df_long %>%
  filter(
    country == "United States of America",
    Language %in% top_langs,
    !is.na(education)
  ) %>%
  mutate(
    Language = factor(Language, levels = top_langs),
    education = factor(education, levels = education_levels)
  )

df_lang_edu_counts <- df_lang_edu %>%
  group_by(Language, education) %>%
  summarise(n = n(), .groups = "drop")

ggplot(df_lang_edu_counts, aes(x = n, y = education, fill = Language)) +
  geom_col(position = "dodge") +
  labs(
    title = "Education Level of Respondents by Top 5 Programming Languages",
    x = "Number of Respondents",
    y = "Education Level",
    fill = "Programming Language"
  ) +
  theme_minimal()

Education level of respondents based on top 5 ML Tools used

df_ml_edu_counts <- df_ml_long_top5 %>%
  filter(!is.na(education)) %>%  # drop missing education, as in the language chart
  group_by(MLTool, education) %>%
  summarise(n = n(), .groups = "drop") %>%
  complete(MLTool, education = education_levels, fill = list(n = 0)) %>%
  mutate(education = factor(education, levels = rev(education_levels)))

ggplot(df_ml_edu_counts, aes(x = n, y = education, fill = MLTool)) +
  geom_col(position = "dodge") +
  labs(
    title = "Education Level of US Respondents by Top 5 ML Tools",
    x = "Number of Respondents",
    y = "Education Level",
    fill = "ML Tool"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  theme(legend.position = "bottom")

Conclusion

Based on the visualizations throughout this project, we can infer that Python and SQL are the two most consistently used programming skills, and both remain within the top 5 skills (especially in the US), with R also playing a strong role for many professionals. Neural networks, decision trees, gradient boosting methods, and linear/logistic regression are among the machine learning tools and techniques that appear to be most frequently used. Stronger backgrounds in these programming languages and machine learning frameworks are frequently associated with higher salaries, and many respondents in higher salary categories have at least a bachelor’s or master’s degree. All things considered, the most sought-after set of data science competencies is a strong foundation in Python, SQL, and fundamental machine learning algorithms, particularly those using neural networks and tree-based approaches.
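
The charts above plot raw respondent counts, so a widely used language or tool shows larger bars in every salary range. One way to look at the association with salary more directly might be to compare the share of each salary range within a group. The following is only a minimal sketch on made-up data: `toy_long` and `salary_share` are hypothetical names, and the rows are invented stand-ins for the long-format US table, reusing only the `Language` and `salary` column names from the code above.

```r
library(dplyr)

# Hypothetical toy data (values are made up for illustration only);
# the columns mirror `Language` and `salary` from the long-format table.
toy_long <- tibble::tibble(
  Language = c("Python", "Python", "Python", "Python", "R", "R"),
  salary   = c("100,000-124,999", "100,000-124,999", "50,000-59,999",
               "50,000-59,999", "100,000-124,999", "50,000-59,999")
)

# Share of each salary range within a language, so that groups of
# different sizes can be compared on the same scale.
salary_share <- toy_long %>%
  count(Language, salary, name = "n") %>%
  group_by(Language) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(salary_share)
```

Plotting `share` instead of `n` on the x-axis would then show whether higher salary ranges make up a larger fraction of one group than another, independent of how many respondents each group has.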