Data Science Introduction
Dmitrii Pianov
04/14/2017
Definition
- Data Science - an interdisciplinary field that seeks to extract knowledge from data
- Buzzword or not?
- Why learn it?
- You can extract and analyze practically any data you want!
Components
By Field
- Statistics
- Computer Science
- Applications (NLP, Finance, Image Processing)
By Type
- Machine Learning
- Data Mining
- Visualization
Problems ML deals with
- Can we learn from the data?
- Suppose we want to predict the next major financial crisis. Why is it hard to do? There is not enough information to extract: we only have around 100 documented data points.
- Even if we can, how precise will our knowledge be?
- It is a question of modeling. Sometimes we have to accept errors.
- Is it practically possible to implement learning?
- Remember the Netflix competition: why did it fail?
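
To make the precision question concrete, here is a minimal sketch of how the uncertainty of an estimate depends on the number of data points. The data is hypothetical (random draws from a normal distribution, not real crisis records), and `standard_error` is a helper name introduced only for this illustration.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the sample mean: sample std / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical data: draws from a normal distribution (mean 0, std 1).
small = [rng.gauss(0.0, 1.0) for _ in range(100)]
large = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

se_small = standard_error(small)
se_large = standard_error(large)

print(f"n = 100:    mean = {statistics.mean(small):+.3f}, standard error = {se_small:.3f}")
print(f"n = 10000:  mean = {statistics.mean(large):+.3f}, standard error = {se_large:.3f}")
```

With 100 times more points the standard error drops by roughly a factor of 10, which is why about 100 documented data points leave a lot of room for error.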
Methods of ML
- Supervised Learning - classification/regression of new data based on labeled historical data (algo trading, weather forecasts).
- Unsupervised Learning - learning patterns in unlabeled data (clustering, feature learning for images and sound).
- Reinforcement Learning - objective-based learning (MarI/O, Chess, Go algorithms).
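
The difference between the first two methods can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not a production approach: the temperature data is made up, and `predict_1nn` and `two_means` are hypothetical helper names. The supervised part uses a 1-nearest-neighbor rule; the unsupervised part uses a one-dimensional k-means with two clusters.

```python
def predict_1nn(history, x):
    """Supervised: history is a list of (feature, label) pairs.

    Return the label of the historical point closest to x.
    """
    _, nearest_label = min(history, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2).

    Return the two cluster centers as (low_center, high_center).
    """
    lo, hi = min(values), max(values)  # start from the two extremes
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not near_hi:  # all values identical: nothing to split
            break
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return lo, hi


# Made-up labeled history for the supervised case.
history = [(1.0, "cold"), (2.0, "cold"), (9.0, "hot"), (10.0, "hot")]
print(predict_1nn(history, 8.2))  # prints "hot"

# The same features without labels for the unsupervised case.
print(two_means([1.0, 2.0, 9.0, 10.0]))  # prints (1.5, 9.5)
```

The supervised function needs the labels to answer; the unsupervised one finds the two groups on its own but cannot say which one is "hot".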

You want to become a Data Scientist?
- Great!
- You need to know math! Statistics, linear algebra, numerical methods, and optimization theory are a good start.
- For ML, some programming languages are better than others. Try Python, R, MATLAB, Julia (maybe).
- Do not forget about data storage! Relational databases, NoSQL stores (such as MongoDB), and Hadoop will be useful.