Gareth James - Daniela Witten - Trevor Hastie - Robert Tibshirani

-An Introduction to Statistical Learning with Applications in R-

I. Introduction

Supervised learning: involves building a statistical model for predicting, or estimating, an output based on one or more inputs.

Unsupervised learning: Inputs, but no outputs. We are still able to learn relationships and structure from such data.

Predicting a continuous or quantitative output value is often referred to as a regression problem.

Predicting a non-numerical valueโ€“a categorical or qualitative output, is referred to as a classification problem.

Situations involving only input variables, with no corresponding output, where we are trying to group observations by their observed characterstics is a clustering problem.

A note on notation throughout this book and these exercises

n denotes the number of observations and p denotes the number of variables.

xij represents the value of the jth varaiable for ith observation.