Data Science Introduction

Dmitrii Pianov
04/14/2017

Definition

  • Data Science - an interdisciplinary field that seeks to extract knowledge from data
  • Buzzword or not?
  • Why learn it?
  • You can extract and analyze practically any data you want!

Components

By Field

  • Statistics
  • Computer Science
  • Applications (NLP, Finance, Image Processing)

By Type

  • Machine Learning
  • Data Mining
  • Visualization

Problems ML deals with

  • Can we learn from the data?
    • Suppose we want to predict the next major financial crisis. Why is it hard to do? There is not enough information to extract: we only have around 100 documented data points.

  • Even if we can, how precise will our knowledge be?
    • It is a question of modeling. Sometimes we have to accept errors.

  • Is it practically possible to implement learning?
    • Remember the Netflix competition: why did it fail?
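
To make the "not enough data points" concern concrete, here is a minimal sketch of how the uncertainty of an estimate shrinks with sample size. It assumes independent observations and a hypothetical event rate of 5%; neither assumption comes from the slides, and `standard_error` is an illustrative helper name.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent observations."""
    return sqrt(p * (1 - p) / n)

# Hypothetical event rate of 5% (an assumption for illustration only).
for n in (100, 10_000):
    print(n, round(standard_error(0.05, n), 4))
```

With 100 points the standard error is roughly 0.022, almost half of the 5% rate itself; with 10,000 points it drops to roughly 0.002. This is one way to see why some error has to be accepted when data are scarce.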

Methods of ML

  • Supervised Learning - classification/regression of new data based on historical data (algo trading, weather forecasts).
  • Unsupervised Learning - learning patterns in the data (image, sound recognition).
  • Reinforcement Learning - objective-based learning (MarI/O, Chess, Go algorithms).
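
The difference between the first two methods can be sketched in a few lines of plain Python. This is a toy illustration, not a production approach: `fit_line` and `two_means` are hypothetical helper names, the data are made up, and the clustering assumes one-dimensional points with at least two distinct values.

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: learn y = a*x + b from labeled (x, y) pairs by least squares."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def two_means(points, iterations=10):
    """Unsupervised: find two cluster centers in unlabeled 1-D points (k-means, k=2)."""
    c0, c1 = min(points), max(points)  # start from the two extremes
    for _ in range(iterations):
        # Assign each point to its nearest center, then recompute the centers.
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = mean(g0), mean(g1)
    return c0, c1

# Supervised: the "history" contains the answers (ys), so we can predict new ones.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("prediction for x=5:", a * 5 + b)

# Unsupervised: no answers are given, only the points themselves.
print("cluster centers:", two_means([1.0, 1.2, 0.8, 5.0, 5.2, 4.8]))
```

The supervised function needs the target values `ys` to learn from, while the unsupervised one receives only the points and has to discover the grouping on its own.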

You want to become a Data Scientist?

  • Great!
  • You need to know math! Statistics, linear algebra, numerical methods, and optimization theory are a good start.
  • For ML, some programming languages are better than others. Try Python, R, MATLAB, Julia (maybe).
  • Do not forget about data storage! Relational databases, NoSQL stores such as MongoDB, and Hadoop will be useful.

Thank you!
