library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(tidyr)
library(dplyr)
library(ggplot2)
df <- read_csv("https://raw.githubusercontent.com/GullitNa/DATA607-Project3/main/kaggle_survey_2020_responses%202.csv",
skip = 1)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 20036 Columns: 355
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (353): What is your age (# years)?, What is your gender? - Selected Choi...
## dbl (1): Duration (in seconds)
## lgl (1): Which of the following business intelligence tools do you use on ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df)
## # A tibble: 6 × 355
## `Duration (in seconds)` `What is your age (# years)?` What is your gender? -…¹
## <dbl> <chr> <chr>
## 1 1838 35-39 Man
## 2 289287 30-34 Man
## 3 860 35-39 Man
## 4 507 30-34 Man
## 5 78 30-34 Man
## 6 401 30-34 Man
## # ℹ abbreviated name: ¹`What is your gender? - Selected Choice`
## # ℹ 352 more variables: `In which country do you currently reside?` <chr>,
## # `What is the highest level of formal education that you have attained or plan to attain within the next 2 years?` <chr>,
## # `Select the title most similar to your current role (or most recent title if retired): - Selected Choice` <chr>,
## # `For how many years have you been writing code and/or programming?` <chr>,
## # `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python` <chr>,
## # `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R` <chr>, …
colnames(df)
## [1] "Duration (in seconds)"
## [2] "What is your age (# years)?"
## [3] "What is your gender? - Selected Choice"
## [4] "In which country do you currently reside?"
## [5] "What is the highest level of formal education that you have attained or plan to attain within the next 2 years?"
## [6] "Select the title most similar to your current role (or most recent title if retired): - Selected Choice"
## [7] "For how many years have you been writing code and/or programming?"
## [8] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python"
## [9] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R"
## [10] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL"
## [11] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C"
## [12] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++"
## [13] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java"
## [14] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript"
## [15] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia"
## [16] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Swift"
## [17] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash"
## [18] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB"
## [19] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [20] "What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [21] "What programming language would you recommend an aspiring data scientist to learn first? - Selected Choice"
## [22] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Jupyter (JupyterLab, Jupyter Notebooks, etc)"
## [23] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - RStudio"
## [24] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Visual Studio / Visual Studio Code"
## [25] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Click to write Choice 13"
## [26] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - PyCharm"
## [27] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Spyder"
## [28] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Notepad++"
## [29] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Sublime Text"
## [30] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Vim / Emacs"
## [31] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB"
## [32] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [33] "Which of the following integrated development environments (IDE's) do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [34] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Kaggle Notebooks"
## [35] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Colab Notebooks"
## [36] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Azure Notebooks"
## [37] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Paperspace / Gradient"
## [38] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Binder / JupyterHub"
## [39] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Code Ocean"
## [40] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - IBM Watson Studio"
## [41] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Sagemaker Studio"
## [42] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon EMR Notebooks"
## [43] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud AI Platform Notebooks"
## [44] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Datalab Notebooks"
## [45] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Databricks Collaborative Notebooks"
## [46] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [47] "Which of the following hosted notebook products do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [48] "What type of computing platform do you use most often for your data science projects? - Selected Choice"
## [49] "Which types of specialized hardware do you use on a regular basis? (Select all that apply) - Selected Choice - GPUs"
## [50] "Which types of specialized hardware do you use on a regular basis? (Select all that apply) - Selected Choice - TPUs"
## [51] "Which types of specialized hardware do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [52] "Which types of specialized hardware do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [53] "Approximately how many times have you used a TPU (tensor processing unit)?"
## [54] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Matplotlib"
## [55] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Seaborn"
## [56] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Plotly / Plotly Express"
## [57] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Ggplot / ggplot2"
## [58] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Shiny"
## [59] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - D3 js"
## [60] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Altair"
## [61] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Bokeh"
## [62] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Geoplotlib"
## [63] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Leaflet / Folium"
## [64] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [65] "What data visualization libraries or tools do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [66] "For how many years have you used machine learning methods?"
## [67] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Scikit-learn"
## [68] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - TensorFlow"
## [69] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Keras"
## [70] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - PyTorch"
## [71] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Fast.ai"
## [72] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - MXNet"
## [73] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Xgboost"
## [74] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - LightGBM"
## [75] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - CatBoost"
## [76] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Prophet"
## [77] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - H2O 3"
## [78] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Caret"
## [79] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Tidymodels"
## [80] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - JAX"
## [81] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [82] "Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [83] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression"
## [84] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests"
## [85] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc)"
## [86] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches"
## [87] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches"
## [88] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc)"
## [89] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks"
## [90] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks"
## [91] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks"
## [92] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc)"
## [93] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None"
## [94] "Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other"
## [95] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - General purpose image/video tools (PIL, cv2, skimage, etc)"
## [96] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - Image segmentation methods (U-Net, Mask R-CNN, etc)"
## [97] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - Object detection methods (YOLOv3, RetinaNet, etc)"
## [98] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)"
## [99] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - Generative Networks (GAN, VAE, etc)"
## [100] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [101] "Which categories of computer vision methods do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [102] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - Word embeddings/vectors (GLoVe, fastText, word2vec)"
## [103] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - Encoder-decorder models (seq2seq, vanilla transformers)"
## [104] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - Contextualized embeddings (ELMo, CoVe)"
## [105] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - Transformer language models (GPT-3, BERT, XLnet, etc)"
## [106] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [107] "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [108] "What is the size of the company where you are employed?"
## [109] "Approximately how many individuals are responsible for data science workloads at your place of business?"
## [110] "Does your current employer incorporate machine learning methods into their business?"
## [111] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Analyze and understand data to influence product or business decisions"
## [112] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data"
## [113] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build prototypes to explore applying machine learning to new areas"
## [114] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run a machine learning service that operationally improves my product or workflows"
## [115] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Experimentation and iteration to improve existing ML models"
## [116] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Do research that advances the state of the art of machine learning"
## [117] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - None of these activities are an important part of my role at work"
## [118] "Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Other"
## [119] "What is your current yearly compensation (approximate $USD)?"
## [120] "Approximately how much money have you (or your team) spent on machine learning and/or cloud computing services at home (or at work) in the past 5 years (approximate $USD)?"
## [121] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Web Services (AWS)"
## [122] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Azure"
## [123] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Platform (GCP)"
## [124] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - IBM Cloud / Red Hat"
## [125] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Oracle Cloud"
## [126] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - SAP Cloud"
## [127] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Salesforce Cloud"
## [128] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - VMware Cloud"
## [129] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Alibaba Cloud"
## [130] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Tencent Cloud"
## [131] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [132] "Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [133] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Amazon EC2"
## [134] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - AWS Lambda"
## [135] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Amazon Elastic Container Service"
## [136] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Azure Cloud Services"
## [137] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Microsoft Azure Container Instances"
## [138] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Azure Functions"
## [139] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Compute Engine"
## [140] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Functions"
## [141] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Run"
## [142] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud App Engine"
## [143] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - No / None"
## [144] "Do you use any of the following cloud computing products on a regular basis? (Select all that apply) - Selected Choice - Other"
## [145] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Amazon SageMaker"
## [146] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Amazon Forecast"
## [147] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Amazon Rekognition"
## [148] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Azure Machine Learning Studio"
## [149] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Azure Cognitive Services"
## [150] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud AI Platform / Google Cloud ML Engine"
## [151] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Video AI"
## [152] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Natural Language"
## [153] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Vision AI"
## [154] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - No / None"
## [155] "Do you use any of the following machine learning products on a regular basis? (Select all that apply) - Selected Choice - Other"
## [156] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - MySQL"
## [157] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - PostgresSQL"
## [158] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - SQLite"
## [159] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Oracle Database"
## [160] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - MongoDB"
## [161] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Snowflake"
## [162] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - IBM Db2"
## [163] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft SQL Server"
## [164] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Access"
## [165] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Azure Data Lake Storage"
## [166] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Redshift"
## [167] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon Athena"
## [168] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon DynamoDB"
## [169] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud BigQuery"
## [170] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud SQL"
## [171] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud Firestore"
## [172] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [173] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [174] "Which of the following big data products (relational database, data warehouse, data lake, or similar) do you use most often? - Selected Choice"
## [175] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Amazon QuickSight"
## [176] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Microsoft Power BI"
## [177] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Google Data Studio"
## [178] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Looker"
## [179] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Tableau"
## [180] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Salesforce"
## [181] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Einstein Analytics"
## [182] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Qlik"
## [183] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Domo"
## [184] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - TIBCO Spotfire"
## [185] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Alteryx"
## [186] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Sisense"
## [187] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - SAP Analytics Cloud"
## [188] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - None"
## [189] "Which of the following business intelligence tools do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [190] "Which of the following business intelligence tools do you use most often? - Selected Choice"
## [191] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automated data augmentation (e.g. imgaug, albumentations)"
## [192] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automated feature engineering/selection (e.g. tpot, boruta_py)"
## [193] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automated model selection (e.g. auto-sklearn, xcessiv)"
## [194] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automated model architecture searches (e.g. darts, enas)"
## [195] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)"
## [196] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)"
## [197] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - No / None"
## [198] "Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply) - Selected Choice - Other"
## [199] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Google Cloud AutoML"
## [200] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - H20 Driverless AI"
## [201] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Databricks AutoML"
## [202] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - DataRobot AutoML"
## [203] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Tpot"
## [204] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Auto-Keras"
## [205] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Auto-Sklearn"
## [206] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Auto_ml"
## [207] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Xcessiv"
## [208] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - MLbox"
## [209] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - No / None"
## [210] "Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Selected Choice - Other"
## [211] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Neptune.ai"
## [212] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Weights & Biases"
## [213] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Comet.ml"
## [214] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Sacred + Omniboard"
## [215] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - TensorBoard"
## [216] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Guild.ai"
## [217] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Polyaxon"
## [218] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Trains"
## [219] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Domino Model Monitor"
## [220] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - No / None"
## [221] "Do you use any tools to help manage machine learning experiments? (Select all that apply) - Selected Choice - Other"
## [222] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Plotly Dash"
## [223] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Streamlit"
## [224] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - NBViewer"
## [225] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - GitHub"
## [226] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Personal blog"
## [227] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Kaggle"
## [228] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Colab"
## [229] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Shiny"
## [230] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - I do not share my work publicly"
## [231] "Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply) - Selected Choice - Other"
## [232] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Coursera"
## [233] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - edX"
## [234] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Kaggle Learn Courses"
## [235] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - DataCamp"
## [236] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Fast.ai"
## [237] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udacity"
## [238] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udemy"
## [239] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - LinkedIn Learning"
## [240] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Cloud-certification programs (direct from AWS, Azure, GCP, or similar)"
## [241] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - University Courses (resulting in a university degree)"
## [242] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - None"
## [243] "On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Other"
## [244] "What is the primary tool that you use at work or school to analyze data? (Include text response) - Selected Choice"
## [245] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Twitter (data science influencers)"
## [246] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Email newsletters (Data Elixir, O'Reilly Data & AI, etc)"
## [247] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Reddit (r/machinelearning, etc)"
## [248] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Kaggle (notebooks, forums, etc)"
## [249] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Course Forums (forums.fast.ai, Coursera forums, etc)"
## [250] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - YouTube (Kaggle YouTube, Cloud AI Adventures, etc)"
## [251] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Podcasts (Chai Time Data Science, O’Reilly Data Show, etc)"
## [252] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Blogs (Towards Data Science, Analytics Vidhya, etc)"
## [253] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Journal Publications (peer-reviewed journals, conference proceedings, etc)"
## [254] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Slack Communities (ods.ai, kagglenoobs, etc)"
## [255] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - None"
## [256] "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Other"
## [257] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Amazon Web Services (AWS)"
## [258] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Microsoft Azure"
## [259] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Google Cloud Platform (GCP)"
## [260] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - IBM Cloud / Red Hat"
## [261] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Oracle Cloud"
## [262] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - SAP Cloud"
## [263] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - VMware Cloud"
## [264] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Salesforce Cloud"
## [265] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Alibaba Cloud"
## [266] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Tencent Cloud"
## [267] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - None"
## [268] "Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years? - Selected Choice - Other"
## [269] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Amazon EC2"
## [270] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - AWS Lambda"
## [271] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Amazon Elastic Container Service"
## [272] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Azure Cloud Services"
## [273] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Microsoft Azure Container Instances"
## [274] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Azure Functions"
## [275] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Google Cloud Compute Engine"
## [276] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Google Cloud Functions"
## [277] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Google Cloud Run"
## [278] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Google Cloud App Engine"
## [279] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - None"
## [280] "In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply) - Selected Choice - Other"
## [281] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Amazon SageMaker"
## [282] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Amazon Forecast"
## [283] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Amazon Rekognition"
## [284] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Azure Machine Learning Studio"
## [285] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Azure Cognitive Services"
## [286] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Google Cloud AI Platform / Google Cloud ML Engine"
## [287] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Google Cloud Video AI"
## [288] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Google Cloud Natural Language"
## [289] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Google Cloud Vision AI"
## [290] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - None"
## [291] "In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply) - Selected Choice - Other"
## [292] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - MySQL"
## [293] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - PostgresSQL"
## [294] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - SQLite"
## [295] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Oracle Database"
## [296] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - MongoDB"
## [297] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Snowflake"
## [298] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - IBM Db2"
## [299] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft SQL Server"
## [300] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Access"
## [301] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Azure Data Lake Storage"
## [302] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon Redshift"
## [303] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon Athena"
## [304] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon DynamoDB"
## [305] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud BigQuery"
## [306] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud SQL"
## [307] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud Firestore"
## [308] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"
## [309] "Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"
## [310] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Microsoft Power BI"
## [311] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Amazon QuickSight"
## [312] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Data Studio"
## [313] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Looker"
## [314] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Tableau"
## [315] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Salesforce"
## [316] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Einstein Analytics"
## [317] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Qlik"
## [318] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Domo"
## [319] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - TIBCO Spotfire"
## [320] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Alteryx"
## [321] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Sisense"
## [322] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - SAP Analytics Cloud"
## [323] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"
## [324] "Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"
## [325] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automated data augmentation (e.g. imgaug, albumentations)"
## [326] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automated feature engineering/selection (e.g. tpot, boruta_py)"
## [327] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automated model selection (e.g. auto-sklearn, xcessiv)"
## [328] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automated model architecture searches (e.g. darts, enas)"
## [329] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)"
## [330] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Automation of full ML pipelines (e.g. Google Cloud AutoML, H20 Driverless AI)"
## [331] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"
## [332] "Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"
## [333] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Google Cloud AutoML"
## [334] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - H20 Driverless AI"
## [335] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Databricks AutoML"
## [336] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - DataRobot AutoML"
## [337] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Tpot"
## [338] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Auto-Keras"
## [339] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Auto-Sklearn"
## [340] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Auto_ml"
## [341] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Xcessiv"
## [342] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - MLbox"
## [343] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - None"
## [344] "Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply) - Selected Choice - Other"
## [345] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Neptune.ai"
## [346] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Weights & Biases"
## [347] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Comet.ml"
## [348] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Sacred + Omniboard"
## [349] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - TensorBoard"
## [350] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Guild.ai"
## [351] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Polyaxon"
## [352] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Trains"
## [353] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Domino Model Monitor"
## [354] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - None"
## [355] "In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Other"
df <- df %>%
rename(
job_title = "Select the title most similar to your current role (or most recent title if retired): - Selected Choice",
experience = "For how many years have you been writing code and/or programming?",
country = "In which country do you currently reside?",
salary = "What is your current yearly compensation (approximate $USD)?",
education = "What is the highest level of formal education that you have attained or plan to attain within the next 2 years?"
)
df$job_title <- as.factor(df$job_title)
df$experience <- as.factor(df$experience)
df$salary <- as.factor(df$salary)
df$country <- as.factor(df$country)
df_filtered <- df %>% filter(!is.na(salary))
df_filtered <- df %>%
select(
job_title,
experience,
country,
salary,
education,
#Programming languages (Q7)
starts_with("What programming languages do you use on a regular basis"),
#ML tools (Q17)
starts_with("Which of the following ML algorithms do you use on a regular basis")
)
q7_cols <- grep("What programming languages do you use on a regular basis?", names(df_filtered), value = TRUE)
q17_cols <- grep("Which of the following ML algorithms do you use on a regular basis?", names(df_filtered), value = TRUE)
unique_vals <- df_filtered %>%
select(all_of(q7_cols)) %>%
unlist() %>%
unique()
unique_vals <- unique_vals[!is.na(unique_vals)]
unique_vals
## [1] "Python" "R" "SQL" "C" "C++"
## [6] "Java" "Javascript" "Julia" "Swift" "Bash"
## [11] "MATLAB" "None" "Other"
df_filtered[q7_cols] <- lapply(df_filtered[q7_cols], function(x) ifelse(x != "" & !is.na(x), 1, 0))
df_filtered[q17_cols] <- lapply(df_filtered[q17_cols], function(x) ifelse(x != "" & !is.na(x), 1, 0))
df_filtered$total_prog_skills <- rowSums(df_filtered[q7_cols], na.rm = TRUE)
df_filtered$total_ml_skills <- rowSums(df_filtered[q17_cols], na.rm = TRUE)
view(df_filtered)
df_filtered$total_prog_skills <- rowSums(df_filtered[q7_cols], na.rm = TRUE)
df_filtered$total_ml_skills <- rowSums(df_filtered[q17_cols], na.rm = TRUE)
df_filtered$total_skills <- df_filtered$total_prog_skills + df_filtered$total_ml_skills
view(df_filtered)
str(df_filtered)
## tibble [20,036 × 33] (S3: tbl_df/tbl/data.frame)
## $ job_title : Factor w/ 13 levels "Business Analyst",..: 13 4 11 5 11 3 13 13 5 13 ...
## $ experience : Factor w/ 7 levels "< 1 years","1-2 years",..: 6 6 3 6 5 1 5 1 6 1 ...
## $ country : Factor w/ 55 levels "Argentina","Australia",..: 10 54 1 54 22 16 6 9 13 9 ...
## $ salary : Factor w/ 25 levels "$0-999","> $500,000",..: NA 5 7 6 NA NA NA NA 23 NA ...
## $ education : chr [1:20036] "Doctoral degree" "Master’s degree" "Bachelor’s degree" "Master’s degree" ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python : num [1:20036] 1 1 0 1 1 1 1 0 1 1 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R : num [1:20036] 1 1 0 0 0 1 1 1 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL : num [1:20036] 1 1 0 1 0 0 0 0 1 1 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C : num [1:20036] 1 0 0 0 0 0 1 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++ : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java : num [1:20036] 0 0 1 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript : num [1:20036] 1 0 1 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Swift : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash : num [1:20036] 0 0 1 1 0 0 0 0 1 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression : num [1:20036] 0 1 0 1 0 0 1 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests : num [1:20036] 1 0 0 1 0 0 1 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc): num [1:20036] 1 0 0 1 0 0 1 0 1 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches : num [1:20036] 1 0 0 1 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc) : num [1:20036] 1 0 0 1 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks : num [1:20036] 1 1 0 0 0 0 1 1 1 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks : num [1:20036] 1 0 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc) : num [1:20036] 0 1 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other : num [1:20036] 0 0 0 0 0 0 0 0 0 0 ...
## $ total_prog_skills : num [1:20036] 7 3 3 3 1 2 3 1 3 2 ...
## $ total_ml_skills : num [1:20036] 6 3 0 5 0 0 4 1 2 0 ...
## $ total_skills : num [1:20036] 13 6 3 8 1 2 7 2 5 2 ...
head(df_filtered[q7_cols], 10)
## # A tibble: 10 × 13
## What programming languages do…¹ What programming lan…² What programming lan…³
## <dbl> <dbl> <dbl>
## 1 1 1 1
## 2 1 1 1
## 3 0 0 0
## 4 1 0 1
## 5 1 0 0
## 6 1 1 0
## 7 1 1 0
## 8 0 1 0
## 9 1 0 1
## 10 1 0 1
## # ℹ abbreviated names:
## # ¹`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python`,
## # ²`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R`,
## # ³`What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL`
## # ℹ 10 more variables:
## # `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C` <dbl>,
## # `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++` <dbl>, …
#For Q7 and Q17 columns (Programming Languages and ML tools)
clean_q7_names <- sub(".* - ", "", q7_cols)
clean_q17_names <- sub(".* - ", "", q17_cols)
df_prog <- df_filtered[q7_cols] #new df frame for plotting
names(df_prog) <- clean_q7_names #renames the columns in new df
df_ml <- df_filtered[q17_cols] #new df frame for plotting
names(df_ml) <- clean_q17_names #renames the columns in new df
# Count and plot programming languages
prog_counts <- data.frame(
Language = names(df_prog),
Count = colSums(df_prog, na.rm = TRUE)
)
ggplot(prog_counts, aes(x = Count, y = reorder(Language, Count))) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(
title = "Most Used Programming Languages",
x = "Count", y = "Language"
) +
theme_minimal()
summary(df_prog)
## Python R SQL C
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.7751 Mean :0.2135 Mean :0.3761 Mean :0.1655
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## C++ Java Javascript Julia
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.000 Median :0.000 Median :0.0000 Median :0.00000
## Mean :0.191 Mean :0.168 Mean :0.1495 Mean :0.01308
## 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.000 Max. :1.0000 Max. :1.00000
## Swift Bash MATLAB None
## Min. :0.000000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.000000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.009882 Mean :0.08864 Mean :0.1107 Mean :0.01028
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## Other
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.09708
## 3rd Qu.:0.00000
## Max. :1.00000
prog_means <- colMeans(df_prog, na.rm = TRUE)
df_prog_means <- data.frame(
Language = names(prog_means),
Mean = as.numeric(prog_means)
)
# Sort in descending order
df_prog_means_sorted <- df_prog_means[order(-df_prog_means$Mean), ]
df_prog_means_sorted
## Language Mean
## 1 Python 0.775104811
## 3 SQL 0.376073068
## 2 R 0.213465762
## 5 C++ 0.191006189
## 6 Java 0.168047514
## 4 C 0.165452186
## 7 Javascript 0.149480934
## 11 MATLAB 0.110650829
## 13 Other 0.097075265
## 10 Bash 0.088640447
## 8 Julia 0.013076462
## 12 None 0.010281493
## 9 Swift 0.009882212
Python: 77.51% R: 21.35% SQL: 37.61% C: 16.55% C++: 19.1% Java: 16.8% Javascript: 14.95% Julia: 1.31% Swift: 0.988% Bash: 8.864% MATLAB: 11.07% None: 1.028% Other: 9.708
# Count and plot ML Tools
ml_counts <- data.frame(
Tools = names(df_ml),
Count = colSums(df_ml, na.rm = TRUE)
)
ggplot(ml_counts, aes(x = Count, y = reorder(Tools, Count))) +
geom_bar(stat = "identity", fill = "darkgreen") +
labs(
title = "Most Used Machine Learning Tools",
x = "Count", y = "ML Tool"
) +
theme_minimal()
summary(df_ml)
## Linear or Logistic Regression Decision Trees or Random Forests
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000
## Mean :0.5271 Mean :0.4394
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
## Gradient Boosting Machines (xgboost, lightgbm, etc) Bayesian Approaches
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.000
## Mean :0.2562 Mean :0.182
## 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :1.0000 Max. :1.000
## Evolutionary Approaches Dense Neural Networks (MLPs, etc)
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000
## Mean :0.03648 Mean :0.1679
## 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000
## Convolutional Neural Networks Generative Adversarial Networks
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000
## Mean :0.2924 Mean :0.05111
## 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000
## Recurrent Neural Networks Transformer Networks (BERT, gpt-3, etc)
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000
## Mean :0.1731 Mean :0.06478
## 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000
## None Other
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000
## Mean :0.03673 Mean :0.02046
## 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000
ml_means <- colMeans(df_ml, na.rm = TRUE)
df_ml_means <- data.frame(
Tools = names(ml_means),
Mean = as.numeric(ml_means)
)
# Sort in descending order
df_ml_means_sorted <- df_ml_means[order(-df_ml_means$Mean), ]
df_ml_means_sorted
## Tools Mean
## 1 Linear or Logistic Regression 0.52705131
## 2 Decision Trees or Random Forests 0.43940906
## 7 Convolutional Neural Networks 0.29242364
## 3 Gradient Boosting Machines (xgboost, lightgbm, etc) 0.25623877
## 4 Bayesian Approaches 0.18202236
## 9 Recurrent Neural Networks 0.17308844
## 6 Dense Neural Networks (MLPs, etc) 0.16789778
## 10 Transformer Networks (BERT, gpt-3, etc) 0.06478339
## 8 Generative Adversarial Networks 0.05110801
## 11 None 0.03673388
## 5 Evolutionary Approaches 0.03648433
## 12 Other 0.02046317
# Add job titles to df_prog for grouping
df_prog_long <- df_prog %>% #df_prog is the cleaned df containing only q7 columns
mutate(job_title = df_filtered$job_title) %>% #adds job title column to df_prog
pivot_longer( #convert from wide format (one column per language) to long format (one row per language used per person)
cols = -job_title, # when converting leave job title column as is and reshape other columns (language and used)
names_to = "Language",
values_to = "Used"
) %>%
filter(Used == 1) #filter for only languages that were used
# Count by job title and language
prog_counts <- df_prog_long %>%
filter(!job_title %in% c("Currently not employed", "Student", "Other")) %>%
group_by(job_title, Language) %>% #group long format by job title and language
summarise(Count = n(), .groups = "drop") #counts how many respondents with a specific job used each language
# Plot
ggplot(prog_counts, aes(x = Count, y = reorder(Language, Count), fill = job_title)) +
geom_col() +
labs(
title = "Most Used Programming Languages by Job Title",
x = "Count",
y = "Programming Language",
fill = "Job Title"
) +
theme_minimal()
# Add job titles to df_ml for grouping
df_ml_long <- df_ml %>%
mutate(job_title = df_filtered$job_title) %>%
pivot_longer(
cols = -job_title,
names_to = "Tool",
values_to = "Used"
) %>%
filter(Used == 1)
# Count by job title and tool
ml_counts <- df_ml_long %>%
filter(!job_title %in% c("Currently not employed", "Student", "Other")) %>%
group_by(job_title, Tool) %>%
summarise(Count = n(), .groups = "drop")
# Plot
ggplot(ml_counts, aes(x = Count, y = reorder(Tool, Count), fill = job_title)) +
geom_col() +
labs(
title = "Most Used ML Tools by Job Title",
x = "Count",
y = "ML Tool",
fill = "Job Title"
) +
theme_minimal()
desired_order <- c(
"0-9999", "10,000-14,999", "15,000-19,999", "20,000-24,999",
"25,000-29,999", "30,000-39,999", "40,000-49,999", "50,000-59,999",
"60,000-69,999", "70,000-79,999", "80,000-89,999", "90,000-99,999",
"100,000-124,999", "125,000-149,999", "150,000-199,999", "200,000-249,999",
"250,000-299,999", "300,000-499,999", "500,000-999,999", "1,000,000 or more"
)
df$salary <- factor(df$salary, levels = desired_order)
levels(df$salary)
## [1] "0-9999" "10,000-14,999" "15,000-19,999"
## [4] "20,000-24,999" "25,000-29,999" "30,000-39,999"
## [7] "40,000-49,999" "50,000-59,999" "60,000-69,999"
## [10] "70,000-79,999" "80,000-89,999" "90,000-99,999"
## [13] "100,000-124,999" "125,000-149,999" "150,000-199,999"
## [16] "200,000-249,999" "250,000-299,999" "300,000-499,999"
## [19] "500,000-999,999" "1,000,000 or more"
excluded_ranges <- c(
"0-9,999",
"2,000-2,999",
"10,000-14,999",
"15,000-19,999",
"20,000-24,999",
"25,000-29,999",
"30,000-39,999",
"500,000-999,999",
"1,000,000 or more"
)
desc_salary_levels <- c(
">$500,000",
"300,000-499,999",
"250,000-299,999",
"200,000-249,999",
"150,000-199,999",
"125,000-149,999",
"100,000-124,999",
"90,000-99,999",
"80,000-89,999",
"70,000-79,999",
"60,000-69,999",
"50,000-59,999",
"40,000-49,999",
"30,000-39,999",
"25,000-29,999",
"20,000-24,999",
"15,000-19,999",
"10,000-14,999"
)
#TEST
df_us_only <- df_filtered %>%
filter(country == "United States of America")
df_long <- df_us_only %>%
pivot_longer(
cols = all_of(q7_cols),
names_to = "Language_raw",
values_to = "Used"
) %>%
filter(Used == 1) %>%
mutate(Language = sub(".* - ", "", Language_raw))
df_lang_usage <- df_long %>%
group_by(Language) %>%
summarise(Count = n(), .groups = "drop") %>%
arrange(desc(Count))
head(df_lang_usage, 5)
## # A tibble: 5 × 2
## Language Count
## <chr> <int>
## 1 Python 1705
## 2 SQL 1061
## 3 R 713
## 4 Bash 357
## 5 Javascript 322
df_us_only <- df_filtered %>%
filter(country == "United States of America")
df_long <- df_us_only %>%
pivot_longer(
cols = all_of(q7_cols),
names_to = "Language_raw",
values_to = "Used"
) %>%
filter(Used == 1) %>%
mutate(Language = sub(".* - ", "", Language_raw))
df_lang_usage <- df_long %>%
group_by(Language) %>%
summarise(Count = n(), .groups = "drop") %>%
arrange(desc(Count))
head(df_lang_usage, 5)
## # A tibble: 5 × 2
## Language Count
## <chr> <int>
## 1 Python 1705
## 2 SQL 1061
## 3 R 713
## 4 Bash 357
## 5 Javascript 322
#---------- Useage counts
df_lang_usage <- df_long %>%
group_by(Language) %>%
summarise(Count = n(), .groups = "drop") %>%
arrange(desc(Count))
head(df_lang_usage, 10)
## # A tibble: 10 × 2
## Language Count
## <chr> <int>
## 1 Python 1705
## 2 SQL 1061
## 3 R 713
## 4 Bash 357
## 5 Javascript 322
## 6 Other 318
## 7 Java 295
## 8 C++ 254
## 9 MATLAB 219
## 10 C 173
#------------------- Plotting shows Other instead of Javascript because of NA and outlier removal
df_us_only <- df_filtered %>%
filter(
country == "United States of America",
!is.na(salary),
!salary %in% excluded_ranges
) %>%
mutate(
salary = factor(salary, levels = desc_salary_levels, ordered = TRUE),
salary = droplevels(salary)
)
df_long <- df_us_only %>%
pivot_longer(
cols = all_of(q7_cols),
names_to = "Language_raw",
values_to = "Used"
) %>%
filter(Used == 1) %>%
mutate(Language = sub(".* - ", "", Language_raw))
df_lang_usage <- df_long %>%
group_by(Language) %>%
summarise(Count = n(), .groups = "drop") %>%
arrange(desc(Count))
top_5_langs <- head(df_lang_usage$Language, 5)
df_long_top5 <- df_long %>%
filter(Language %in% top_5_langs)
df_counts <- df_long_top5 %>%
group_by(salary, Language) %>%
summarise(n = n(), .groups = "drop")
ggplot(df_counts, aes(x = n, y = reorder(Language, n), fill = salary)) +
geom_col(position = "dodge") +
labs(
x = "Number of Respondents",
y = "Programming Language",
fill = "Salary Range",
title = "Top 5 Programming Languages vs. Salary Range (US Only)"
) +
theme_minimal()
df_ml_long <- df_us_only %>%
pivot_longer(
cols = all_of(q17_cols),
names_to = "MLTool_raw",
values_to = "Used"
) %>%
filter(Used == 1) %>%
mutate(MLTool = sub(".* - ", "", MLTool_raw))
df_ml_usage <- df_ml_long %>%
group_by(MLTool) %>%
summarise(Count = n(), .groups = "drop") %>%
arrange(desc(Count))
top_5_tools <- head(df_ml_usage$MLTool, 5)
df_ml_long_top5 <- df_ml_long %>%
filter(MLTool %in% top_5_tools)
df_counts <- df_ml_long_top5 %>%
group_by(salary, MLTool) %>%
summarise(n = n(), .groups = "drop")
ggplot(df_counts, aes(x = n, y = reorder(MLTool, n), fill = salary)) +
geom_col(position = "dodge") +
labs(
x = "Number of Respondents",
y = "ML Tool",
fill = "Salary Range",
title = "Top 5 ML Tools vs. Salary Range (US Only)"
) +
theme_minimal()
df_edu <- df_filtered %>%
filter(
country == "United States of America",
!is.na(salary),
!is.na(education),
!salary %in% excluded_ranges
) %>%
mutate(
salary = factor(salary),
education = factor(education)
) %>%
droplevels()
df_edu_counts <- df_edu %>%
group_by(education, salary) %>%
summarise(n = n(), .groups = "drop")
ggplot(df_edu_counts, aes(x = n, y = reorder(education, n), fill = salary)) +
geom_col(position = "dodge") +
labs(
x = "Number of Respondents",
y = "Education Level",
fill = "Salary Range",
title = "Salary Range by Education Level (US Only)"
) +
theme_minimal()
top_langs <- df_lang_usage %>%
slice_max(Count, n = 5) %>%
pull(Language)
df_lang_edu <- df_long %>%
filter(
country == "United States of America",
Language %in% top_langs,
!is.na(education)
) %>%
mutate(
Language = factor(Language, levels = top_langs),
education = factor(education)
)
education_levels <- c(
"No formal education past high school",
"Some college/university study without earning a bachelor’s degree",
"Bachelor’s degree",
"Master’s degree",
"Professional degree",
"Doctoral degree",
"I prefer not to answer"
)
df_lang_edu$education <- factor(df_lang_edu$education, levels = education_levels)
df_lang_edu_counts <- df_lang_edu %>%
group_by(Language, education) %>%
summarise(n = n(), .groups = "drop")
ggplot(df_lang_edu_counts, aes(x = n, y = education, fill = Language)) +
geom_col(position = "dodge") +
labs(
title = "Education Level of Respondents by Top 5 Programming Languages",
x = "Number of Respondents",
y = "Education Level",
fill = "Programming Language"
) +
theme_minimal()
df_ml_edu_counts <- df_ml_long_top5 %>%
group_by(MLTool, education) %>%
summarise(n = n(), .groups = "drop") %>%
complete(MLTool, education = education_levels, fill = list(n = 0)) %>%
mutate(education = factor(education, levels = rev(education_levels)))
ggplot(df_ml_edu_counts, aes(x = n, y = education, fill = MLTool)) +
geom_col(position = "dodge") +
labs(
title = "Education Level of US Respondents by Top 5 ML Tools",
x = "Number of Respondents",
y = "Education Level",
fill = "ML Tool"
) +
theme_minimal() +
scale_fill_brewer(palette = "Set1") +
theme(legend.position = "bottom")
Based on the various visualizations throughout this project, we can infer that programming proficiency in Python and SQL are the two most consistently used skills and its popularity remains within the top 5 skills (especially in the US), including R, refered as a also playing a strong role for many professionals. Neural networks, decision trees, gradient boosting methods, and linear/logistic regression are some of the machine learning tools and techniques that seem to be frequently utilized and appreciated. Stronger backgrounds in these programming languages and machine learning frameworks are frequently associated with better wages, and many responders in higher salary categories have at least a bachelor’s or master’s degree. All things considered, the most sought-after set of data science competencies is a strong foundation in Python, SQL, and fundamental machine learning algorithms—particularly those using neural networks and tree-based approaches.