Dummy variables
- When an explanatory variable is categorical we can use dummy variables to contrast the different categories.
- For each variable we choose a baseline category and then contrast all remaining categories with that baseline.
- If an explanatory variable has \(k\) categories, we need \(k-1\) dummy variables to investigate all the differences in the categories with respect to the dependent variable.
For example, suppose the explanatory variable was housing tenure, coded like this:
- [1:] Owner occupier
- [2:] Renting from a private landlord
- [3:] Renting from the local authority
We would therefore need to choose a baseline category and create two dummy variables. For example, if we chose owner occupier as the baseline category, we would code the two dummy variables (House1 and House2) like this:
\[\begin{array}{|l|c|c|} \hline Tenure: &House1 &House2\\ \hline \hline Owner occupier &0& 0\\ \hline Rented private &1 &0\\ \hline Rented local authority &0 &1\\ \hline \end{array}\]
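As a concrete illustration, here is a minimal sketch of this coding in Python, assuming the pandas library is available. Note that pandas names the resulting columns after the categories rather than House1/House2, but the 0/1 pattern matches the table above.

```python
import pandas as pd

# Toy data for the tenure example above (values assumed for illustration).
df = pd.DataFrame({
    "Tenure": ["Owner occupier", "Rented private",
               "Rented local authority", "Owner occupier"],
})

# Fix the category order so that "Owner occupier" comes first and is
# therefore dropped as the baseline when drop_first=True creates the
# k-1 = 2 dummy variables.
df["Tenure"] = pd.Categorical(
    df["Tenure"],
    categories=["Owner occupier", "Rented private", "Rented local authority"],
)

dummies = pd.get_dummies(df["Tenure"], prefix="House",
                         drop_first=True, dtype=int)
print(dummies)
# Owner-occupier rows are 0 in both dummy columns; each of the other two
# tenures gets a 1 in its own column, as in the table above.
```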
One-Hot Encoding
One-hot encoding is a process by which categorical variables are converted into a numerical form that can be supplied to ML algorithms, helping them make better predictions. For example, suppose a car dataset assigns each company name an arbitrary categorical value:
╔═════════════╦═══════════════════╦═══════╗
║ CompanyName ║ Categorical value ║ Price ║
╠═════════════╬═══════════════════╬═══════╣
║ VW          ║ 1                 ║ 20000 ║
║ Acura       ║ 2                 ║ 10011 ║
║ Honda       ║ 3                 ║ 50000 ║
║ Honda       ║ 3                 ║ 10000 ║
╚═════════════╩═══════════════════╩═══════╝
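A minimal sketch of one-hot encoding this table in Python, assuming pandas (scikit-learn's OneHotEncoder would serve equally well):

```python
import pandas as pd

# The car data from the table above.
cars = pd.DataFrame({
    "CompanyName": ["VW", "Acura", "Honda", "Honda"],
    "Price": [20000, 10011, 50000, 10000],
})

# Replace the single CompanyName column with one binary column per company.
encoded = pd.get_dummies(cars, columns=["CompanyName"], dtype=int)
print(encoded)
#    Price  CompanyName_Acura  CompanyName_Honda  CompanyName_VW
# 0  20000                  0                  0               1
# 1  10011                  1                  0               0
# 2  50000                  0                  1               0
# 3  10000                  0                  1               0
```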
Example:
Suppose you have a ‘flower’ feature which can take the values ‘daffodil’, ‘lily’, and ‘rose’. One-hot encoding converts the ‘flower’ feature into three binary features: ‘is_daffodil’, ‘is_lily’, and ‘is_rose’.
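A short sketch of the flower example, again assuming pandas; the prefix 'is' reproduces the is_daffodil / is_lily / is_rose column names:

```python
import pandas as pd

# Toy flower data (values assumed for illustration).
flowers = pd.DataFrame({"flower": ["daffodil", "lily", "rose", "lily"]})

# One binary column per flower type; each row has exactly one 1.
onehot = pd.get_dummies(flowers["flower"], prefix="is", dtype=int)
print(onehot)
#    is_daffodil  is_lily  is_rose
# 0            1        0        0
# 1            0        1        0
# 2            0        0        1
# 3            0        1        0
```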