Home

Participation over the years

Education Distribution of the Participants

Age distribution of participants

Participation across countries

Designation Distribution of Participants

Women In STEM

Demographic, Education, Designation, and Salary.


Insights

  • Close to 50% of the people who took this survey have a master’s degree.
  • 47% of women respondents have a master’s degree, as opposed to 43% of men.
  • Data Scientist and Software Engineer are the two most popular designations. The number of students who took the survey is nearly equal to the number of data scientists.
  • 24% of women respondents are students, as compared to 20% of men.
  • A larger share of women than men are Students, Statisticians, Product/Program Managers, Data Analysts, and Research Scientists.
  • A larger share of men than women are Data Scientists, SWEs, DBA/DB Engineers, and Data Engineers.
  • 6.87% of women are unemployed, as compared to 4.46% of men.
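
The gender comparisons above read as within-group shares: each percentage is taken over all respondents of that gender, not over the whole survey. A minimal sketch of that arithmetic follows. The toy responses and the helper name share_within_group are made up for illustration and are not the survey data or the dashboard's own code.

```python
from collections import Counter

# Hypothetical toy responses as (gender, designation) pairs; NOT survey data.
responses = [
    ("Woman", "Student"), ("Woman", "Data Scientist"),
    ("Woman", "Student"), ("Woman", "Data Analyst"),
    ("Man", "Student"), ("Man", "Data Scientist"),
    ("Man", "Software Engineer"), ("Man", "Data Scientist"),
    ("Man", "Software Engineer"),
]


def share_within_group(rows, group, category):
    """Percentage of respondents in `group` whose answer is `category`."""
    # Count answers only among respondents belonging to the given group.
    answers = Counter(cat for g, cat in rows if g == group)
    total = sum(answers.values())
    return 100 * answers[category] / total if total else 0.0


women_students = share_within_group(responses, "Woman", "Student")
men_students = share_within_group(responses, "Man", "Student")
print(f"Students: {women_students:.1f}% of women vs {men_students:.1f}% of men")
```

Normalizing within each group keeps the comparison meaningful when one group has far more respondents than the other.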

Machine Learning at Work.


Insights

  • The largest group of respondents are exploring ML models at work and may put a model into production one day. A close second is the group of people who’ve put a model into production in the last 2 years.
  • More women (20.22%) do not use ML at work as opposed to men (18%).
  • More men (19.69%) work in teams that have well-established ML methods at work as opposed to women (16.05%).
  • The two most common Data Science team sizes among respondents are 1-2 and 20+, i.e. either a small exploratory team or a full-fledged team.
  • More women (24.48%) work in teams of size 20+ as opposed to men (23.08%).
  • More men (22.47%) work in teams of size 1-2 as opposed to women (19.44%).

Tools and Technologies used in Data Science.


Insights

  • MySQL and PostgreSQL are the most used RDBMS products.
  • More women (23.51%) use MySQL as compared to men (22.28%). More men (15.73%) use PostgreSQL as compared to women (13.42%).
  • Scikit-learn is the most popular ML framework, used by close to 50% of the respondents. Keras is a close second in popularity.
  • More women (25.32%) use scikit-learn in Python as compared to men (22.98%). More women (3.52%) use caret, an ML library in R, as compared to men (2.72%).
  • More men than women use deep learning frameworks such as PyTorch, TensorFlow, and Keras.
  • Matplotlib and Seaborn are the most popular data visualization libraries. A close third is the ggplot2 library.
  • More men (34.11%) use the Matplotlib library as compared to women (31.47%). More women (17.41%) use the ggplot2 library as compared to men (12.71%).
  • Jupyter Notebook/Lab is the most popular editor, used by over 50% of the respondents. VS Code and RStudio come a close second.
  • Kaggle Kernels and Google Colab are the most popular hosted notebook services.

Algorithms used in ML (NLP and Computer Vision).


Insights

  • Word embeddings are the most popular NLP technique, followed by encoder-decoder models.
  • Automated Model Selection is the most popular tool used, followed by Data Augmentation Techniques.
  • Image classification is the most common computer vision method used.

Coding Experience and Recommendations.


Insights

  • Close to 50% of the respondents have spent 0-2 years writing code to analyze data.
  • More women (28.88%) than men (23.79%) have less than 1 year of experience writing code to analyze data. More men than women have more than 1 year of experience writing code.
  • Python is the most popular programming language, followed by SQL and R.
  • Python is by far the most recommended language for beginners, with over 50% of the respondents recommending it. R is second.
  • More men (79.8%) recommend Python as compared to women (73.38%), while more women (11.4%) recommend R as compared to men (8.94%).

Data Science Media and Courses Platforms.


Insights

  • Kaggle is the most followed data science media source, followed by blogs such as Towards Data Science.
  • More women consume content from Kaggle and Blogs as compared to men.
  • Coursera, Kaggle, Udemy, and university courses are the most popular sources for learning data science through courses.
  • More women (13.55%) learn through University courses than men (10.77%).

R vs Python

Where is R/Python being used?


Insights

  • The United States and India are the countries where R and Python are used the most.
  • United States has more R users while India has more Python users.

Who is using R and Python?


Insights

  • People in the age group 25-29 use R/Python the most.
  • Data Scientists use R and Python more than any other designation. Software Engineers use Python far more than R.
  • Python users consistently outnumber R users across all salary ranges.
  • A higher number of Statisticians use R than Python.
  • More people with 1-2 years of coding experience use Python while more people with 3-5 years of coding experience use R.
  • People with a master’s degree use Python and R more than people with any other education level.

Algorithms used in ML (NLP, AutoML, and Computer Vision).