Dummy variables

For example, suppose the explanatory variable was housing tenure, with three categories: owner occupier, rented private, and rented local authority.

Because there are three categories, we would need to choose one of them as the baseline category and create two dummy variables (one fewer than the number of categories). For example, if we chose owner occupier as the baseline category, we would code the dummy variables (House1 and House2) like this:

\[\begin{array}{|l|c|c|} \hline Tenure: &House1 &House2\\ \hline \hline Owner occupier &0& 0\\ \hline Rented private &1 &0\\ \hline Rented local authority &0 &1\\ \hline \end{array}\]
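A minimal pure-Python sketch of the coding in the table, assuming the categories are plain strings. The helper `dummy_code` is hypothetical, not a library function. Passing a different `baseline` moves the all-zeros row to that category.

```python
# Dummy (treatment) coding with an explicit baseline category.
# The function name dummy_code and its signature are hypothetical;
# the tenure labels and House1/House2 ordering follow the table above.


def dummy_code(values, categories, baseline):
    """Return (columns, rows): one 0/1 tuple per value, baseline omitted.

    Each non-baseline category gets its own column, so k categories
    produce k - 1 dummy variables and the baseline is coded all zeros.
    """
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} is not a known category")
    columns = [c for c in categories if c != baseline]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category {v!r}")
        # 1 in the column that matches v, 0 elsewhere
        rows.append(tuple(1 if v == c else 0 for c in columns))
    return columns, rows


tenure = ["Owner occupier", "Rented private", "Rented local authority"]
columns, coded = dummy_code(tenure, categories=tenure, baseline="Owner occupier")
for label, (house1, house2) in zip(tenure, coded):
    print(f"{label}: House1={house1}, House2={house2}")
# Owner occupier: House1=0, House2=0
# Rented private: House1=1, House2=0
# Rented local authority: House1=0, House2=1
```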


One-Hot Encoding

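One-hot encoding converts a categorical variable into a set of binary (0/1) indicator variables, one per category, so that it can be given to ML algorithms that expect numeric input. Unlike assigning each category an integer code, it does not imply any order or distance between the categories.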


In the table below, each company in the CompanyName column has been assigned an integer code:

╔═════════════╦═══════════════════╦════════╗
║ CompanyName ║ Categorical value ║ Price  ║
╠═════════════╬═══════════════════╬════════╣
║ VW          ║         1         ║ 20000  ║
║ Acura       ║         2         ║ 10011  ║
║ Honda       ║         3         ║ 50000  ║
║ Honda       ║         3         ║ 10000  ║
╚═════════════╩═══════════════════╩════════╝
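The integer codes in the table are only labels, but a model that reads the column as a number may infer an order or spacing between the companies that does not exist. The sketch below (pure Python, with a hypothetical `one_hot` helper and assumed `is_<name>` column names) replaces the CompanyName column with one binary column per company and keeps Price unchanged.

```python
# Minimal one-hot sketch for the CompanyName/Price table (pure Python).
# The helper name one_hot and the "is_<name>" column names are assumptions.

rows = [
    ("VW", 20000),
    ("Acura", 10011),
    ("Honda", 50000),
    ("Honda", 10000),
]


def one_hot(labels):
    """Map each label to a 0/1 list with one position per distinct category."""
    # Keep categories in first-seen order so the columns are stable
    categories = list(dict.fromkeys(labels))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for label in labels:
        vec = [0] * len(categories)
        vec[index[label]] = 1  # exactly one position is "hot"
        vectors.append(vec)
    return categories, vectors


categories, vectors = one_hot([name for name, _ in rows])
header = [f"is_{c}" for c in categories] + ["Price"]
print(header)
for vec, (_, price) in zip(vectors, rows):
    # Both Honda rows get the same indicator vector; Price is carried over
    print(vec + [price])
```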

Example:

Suppose you have a ‘flower’ feature that can take the values ‘daffodil’, ‘lily’, and ‘rose’. One-hot encoding converts the ‘flower’ feature into three binary features, ‘is_daffodil’, ‘is_lily’, and ‘is_rose’, exactly one of which is 1 for any given flower.
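A sketch of the flower example with a fixed list of allowed values, assuming the three flowers named above are the only ones. The names `encode_flower` and `decode_flower` are hypothetical. Decoding works because exactly one feature is 1.

```python
# One-hot encoding of the 'flower' feature with a fixed vocabulary.
# FLOWERS, encode_flower and decode_flower are illustrative names only.

FLOWERS = ("daffodil", "lily", "rose")


def encode_flower(flower):
    """Return {'is_daffodil': 0/1, 'is_lily': 0/1, 'is_rose': 0/1}."""
    if flower not in FLOWERS:
        raise ValueError(f"unknown flower {flower!r}")
    return {f"is_{name}": int(flower == name) for name in FLOWERS}


def decode_flower(features):
    """Invert encode_flower by finding the single feature that is 1."""
    hot = [name for name in FLOWERS if features[f"is_{name}"] == 1]
    if len(hot) != 1:
        raise ValueError("expected exactly one feature to be 1")
    return hot[0]


print(encode_flower("lily"))  # {'is_daffodil': 0, 'is_lily': 1, 'is_rose': 0}
print(decode_flower(encode_flower("rose")))  # rose
```

Dropping one of the three columns, say `is_daffodil`, gives the dummy-variable coding from the first section with daffodil as the baseline: a flower with `is_lily = 0` and `is_rose = 0` must be a daffodil.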