19/01/2016

Introduction to Data Science

  • Data science is an emerging field

  • What came before data science?
    • Statistics
    • Databases
  • Was data science born in industry or in academia?

Data Science Skillset

Data Science Profile

  • The skills that a data scientist should have
    • Computer Science
    • Mathematics
    • Statistics
    • Machine Learning
    • Domain Expertise
    • Communication and Presentation Skills
    • Data Visualization

Data Science is Multidisciplinary

The Data Scientist

Introduction to Data Science

  • Why is data science so important now?
    • Massive data all around our lives
    • Abundance of inexpensive computing power
    • Much of what we do online is tracked
    • Datafication of our behaviour

Introduction to Data Science

  • What kind of data?
    • Internet
    • Finances
    • Medical industry
    • Pharmaceuticals
    • Bioinformatics
    • Social welfare
    • Government
    • Education
    • Retail
    • Science
    • more and more…

Introduction to Data Science

  • Is sheer volume what makes this data interesting?
    • Data itself becomes the building block of data products
      • Amazon recommendation systems
      • Netflix
      • Finances: credit ratings, trading algorithms
      • Education: personalized learning
      • Government: policies based on data
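
As a rough illustration of how raw behavioural data can become a data product, the sketch below builds a toy "bought together" recommender from co-occurrence counts. The purchase baskets, the item names, and the functions `build_cooccurrence` and `recommend` are all hypothetical, invented for this example. Production systems such as Amazon's or Netflix's are far more elaborate and are not described in these slides.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: one set of items per customer.
baskets = [
    {"book", "lamp"},
    {"book", "lamp", "pen"},
    {"book", "pen"},
    {"lamp", "desk"},
]


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items shares a basket."""
    counts = {}
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(item, counts, k=2):
    """Return up to k items most often bought together with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts.get(item, Counter()).items(),
                    key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:k]]


cooccurrence = build_cooccurrence(baskets)
print(recommend("book", cooccurrence))
```

Every new purchase changes the counts, and the counts change what customers are shown next, so even this toy version is driven entirely by the data it collects.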

Introduction to Data Science

  • Particularities of data products
    • Massive, culturally saturated feedback loop
      • Our behavior changes the product
      • The product changes our behavior
    • Technology plays an important role
      • Data centers
      • Large-scale processing
      • Large amounts of memory
      • Bandwidth
  • "The Rise of Big Data", Kenneth Neil Cukier and Viktor Mayer-Schoenberger, Foreign Affairs, May/June 2013.

Datafication

  • Datafication is a recent concept
    • taking all aspects of life and turning them into data
    • Location datafied with latitude and longitude, later with GPS
    • Facebook datafies friendship through likes
    • Google's augmented-reality glasses datafy the gaze
    • Twitter datafies stray thoughts
    • LinkedIn datafies professional networks
  • Now we can use this data

Datafication

  • We can learn about people who share their data
  • We also share data passively whenever we use the internet
    • We are datafied through cookies
    • We are datafied by sensors when we walk down the street
      • Cameras
  • Importance of datafication
    • "Once we datafy things, we can transform their purpose and turn the information into new forms of value"
    • "we" –> entrepreneurs
    • "value" –> increased efficiency through automation

What is a Data Scientist in Academia

  • Not clearly defined

    • …an academic data scientist is a scientist, trained in anything from social science to biology, who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, while simultaneously solving a real world problem. (Cathy O'Neil & Rachel Schutt)

What is a Data Scientist in Industry

  • Chief Data Scientist
    • Sets the data strategy of the company
      • Engineering and infrastructure for collecting and logging data
      • Privacy concerns
      • How data will be used to make decisions
      • How data is used to build products
    • Manages the team of engineers, scientists, and analysts
    • Communicates with the CEO, CTO, etc.
    • Patents innovative solutions
    • Sets research goals

What is a Data Scientist in Industry

  • This is better defined

    • Someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human.
  • Collects, cleans and munges data
  • Understands biases in data
  • Performs exploratory data analysis
  • Finds patterns, builds models and algorithms
  • Designs experiments
  • Communicates with team members
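
The first few activities in the list above can be sketched end to end on a tiny example. This is a minimal illustration only: the records in `raw` (hours studied and exam score) are made up, and the helper names `clean`, `summarize`, and `fit_line` are hypothetical. It uses only the Python standard library, and the "model" is a plain least-squares line.

```python
from statistics import mean, median

# Hypothetical raw records: (hours_studied, exam_score) as strings,
# including missing and unparseable values.
raw = [("1", "52"), ("2", "55"), ("", "60"),
       ("3", "61"), ("4", "n/a"), ("5", "70")]


def clean(rows):
    """Keep only rows where both fields parse as numbers."""
    cleaned = []
    for x, y in rows:
        try:
            cleaned.append((float(x), float(y)))
        except ValueError:
            # Dropping rows is itself a possible source of bias:
            # worth checking which rows are lost, and why.
            continue
    return cleaned


def summarize(values):
    """Basic exploratory statistics for one variable."""
    return {"n": len(values), "mean": mean(values),
            "median": median(values),
            "min": min(values), "max": max(values)}


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


points = clean(raw)
summary = summarize([y for _, y in points])
slope, intercept = fit_line(points)
print(f"kept {len(points)} of {len(raw)} rows")
print(summary)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

The printed line count and summary are the kind of result that would then be communicated to team members, together with a note on how many records were discarded.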

Big Data & Data Science & Knowledge Discovery

Big Data [sorry] & Data Science: What Does a Data Scientist Do? by Carlos Somohano, Founder Data Science London

Data Science (defined)

  • The science and art of
    • Discovering what we don't know from data
      • As in data mining
    • Obtaining predictive, actionable insight from data
    • Creating data products that have business impact
    • Communicating relevant business stories from data
    • Building confidence in decisions that drive business value

http://datasciencelondon.org

10 Things…

Toolkit…

Known, Unknowns, DIKUW

Data Science Process

A Data Product

  • We consider a data product to be:
    • curated and crafted from raw data
    • a result of exploration & iterations
    • a machine that learns from data
    • an answer to known unknowns or unknown unknowns
    • a mechanism that triggers immediate business value
    • a probabilistic window of future events or behavior

from Big Data [sorry] & Data Science: What Does a Data Scientist Do?

Data Jiu-Jitsu

Developing Data Products
