2023-05-03

So What is Data Science?

An interdisciplinary field that uses:

  • Statistics: Analyze and interpret data

  • Computer Science: Develop algorithms and leverage technology

  • Domain Expertise: Understand specific industries and problems

Goals:

  • Extract insights from raw data

  • Make data-driven decisions

  • Predict and classify outcomes

  • Optimize processes and systems

Key Components of Data Science

  1. Data Collection: Gather structured & unstructured data

  2. Data Preprocessing: Clean & transform data

  3. Exploratory Data Analysis: Understand data patterns

  4. Data Visualization: Present insights visually

  5. Model Development: Build predictive/classification models

    • Supervised Learning: Train models with labeled data (e.g., regression, such as linear regression and regression trees; classification, such as logistic regression and classification trees)

    • Unsupervised Learning: Discover hidden patterns without labels (e.g., clustering; dimensionality reduction, which we did not cover)

    • Natural Language Processing (NLP): Analyze and process text data (e.g., sentiment analysis, shown last class; topic modeling)

  6. Model Evaluation: Assess model performance and refine
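As a rough illustration of steps 2, 5, and 6 for supervised regression, here is a minimal sketch in plain Python (standard library only, so pandas and scikit-learn are not assumed). The toy data and the helper names clean, fit_line, and mse are hypothetical. Rows with a missing value are dropped as a simple preprocessing step, a line is fit by ordinary least squares, and the fit is evaluated by mean squared error on held-out rows that were never used for fitting.

```python
from statistics import mean

def clean(rows):
    """Preprocessing: keep only rows where both x and y are present."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_line(pairs):
    """Fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def mse(pairs, intercept, slope):
    """Evaluation: mean squared error of the line's predictions on pairs."""
    return mean((y - (intercept + slope * x)) ** 2 for x, y in pairs)

# Hypothetical toy data; None marks a missing measurement.
raw_train = [(1, 3.1), (2, 4.9), (None, 6.0), (3, 7.2), (4, 8.8), (5, None), (5, 11.0)]
held_out = [(6, 13.0), (7, 14.9)]  # evaluation rows, never used for fitting

train = clean(raw_train)
intercept, slope = fit_line(train)
print(f"y = {intercept:.2f} + {slope:.2f}x, held-out MSE = {mse(held_out, intercept, slope):.5f}")
```

Keeping the held-out rows out of the fit is what makes the error estimate an honest check of performance on new data. In practice a library such as scikit-learn (LinearRegression, mean_squared_error) would typically replace these hand-written helpers.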
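For the unsupervised case, the sketch below shows k-means clustering (Lloyd's algorithm), one common clustering method; it is not necessarily the method used in class. Each point is assigned to its nearest centroid, each centroid is moved to the mean of its points, and the two steps repeat until no centroid moves. The points, the starting centroids, and the function name k_means are hypothetical; real implementations usually pick starting centroids randomly or with a smarter scheme.

```python
from math import dist

def k_means(points, centroids, max_iter=10):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups, and two starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
for center, members in zip(centroids, clusters):
    print(center, members)
```

No labels are given to the algorithm; the two groups emerge from distances alone.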
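The sentiment analysis shown last class may have used a different approach; as one simple possibility, here is a minimal lexicon-based sketch, assuming a tiny hypothetical word list (LEXICON) in place of a real published lexicon. The polarity scores of the words in a text are summed, and the sign of the total gives the label.

```python
import re

# Hypothetical toy lexicon mapping words to polarity scores.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "boring": -1, "terrible": -2}

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Return (score, label); words missing from the lexicon count as 0."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(text))
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"

reviews = [
    "I love this course, and the labs are great!",
    "The lecture was boring and the audio was terrible.",
    "Class meets on Wednesday.",
]
for review in reviews:
    print(sentiment(review), review)
```

A plain sum of word scores ignores context, so a phrase such as "not good" would still score as positive; this is one reason practical tools add rules for negation or use trained models.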

Bureau of Labor Statistics: Career Outlooks (May 2022)

Data Scientist: Hawk-Eye

Two hybrid openings: Atlanta, GA and London, England

The ideal candidate will possess a passion for sports & analytics, and the ability to derive insights from complex data sets.

Requirements Include:

  • Bachelor’s degree in Statistics, Data Science, Computer Science, Mathematics, or a related field; a Master’s degree is preferred.
  • 2 years of experience in data analysis, predictive modelling, or machine learning
  • Strong knowledge of basketball rules, strategies, and advanced statistical concepts.
  • Proficiency in programming languages such as Python, R, or Julia, and experience with data manipulation and visualisation tools like pandas, dplyr, plotly, or D3.js
  • Familiarity with machine learning frameworks and libraries, such as TensorFlow, PyTorch, or Scikit-learn.
  • Excellent analytical, problem-solving, and critical thinking skills, with the ability to interpret complex data and draw meaningful conclusions.
  • Strong communication and presentation skills, with the ability to effectively convey complex information to both technical and non-technical audiences.

Data Analyst: TEK Systems

One hybrid opening: Lexington, MA

Responsibilities include analytic support for the Compensation Analytics program, other assigned program support and process improvement, and managing the implementation of assigned projects.

Requirements Include:

  • College, university or equivalent degree required
  • Statistics background (academic or work experience)
  • Understanding and interpreting regression models (these are already built out)

Research Software Engineer, Data Science: Dana-Farber Cancer Institute

Remote opening: Boston, MA

The Department of Data Science at the DFCI seeks candidates with a strong R programming background. As part of the department’s mission to collaborate with basic biologists and clinical researchers to better understand cancer and improve treatment, our department develops new statistical methods and data analysis pipelines and implements these as R packages or shiny dashboards.

Train SAS users to use dplyr or data.table

Requirements Include:

  • Minimum Education: Successful completion of a coding training/coursework, software certificate program, or similar; or current enrollment in a bachelor’s degree program in Computer Science, Software Engineering, or a related field.
  • You should have demonstrable experience in R, shiny, C++, Python, and Unix.
  • You will be familiar with version control (e.g., git) and standard development practice tools and be able to write modular, maintainable, and testable code.
  • A high level of communication skills is essential to be able to elicit complex requirements from, and convey complex requirements to, groups with differing technical backgrounds.

Data Science Graduate Programs: MS & PhD

Pros:

  • Advanced knowledge & skills
  • Specialized job opportunities
  • Networking & collaboration
  • Higher earning potential

Cons:

  • Time commitment
  • Opportunity cost
  • Financial considerations (although many PhD programs are fully funded and include a small stipend)
  • Competitive entry

Common Job Titles for Advanced Degree Holders

MS in Data Science:

  • Senior Data Scientist
  • Machine Learning Engineer

PhD in Data Science:

  • Research Scientist
  • Senior Data Scientist
  • Machine Learning Engineer
  • Professor
  • Principal Data Scientist
  • AI Researcher

In the meantime, check out: