What is a Decision Tree?

A Decision Tree is a predictive model that can be drawn as a tree. Its goal is to predict an unknown variable from other known data inputs. The data are split step by step according to the parameters set in the model, and each final branch (leaf) gives a prediction for the unknown variable.


Intro: Exercise on creating a Decision Tree

In the following exercise we will create a Decision Tree using a data set that already ships with R: mtcars. We will attempt to predict the number of cylinders (cyl) each vehicle has using the known variables horsepower (hp), engine displacement (disp), and weight (wt).
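Before building anything, it can help to look at the variables involved and at how many cars fall into each cylinder class. The sketch below is illustrative only; the names `vars` and `cyl_counts` are hypothetical and not part of the original exercise.

```r
# mtcars ships with base R (datasets package), so nothing needs installing
data(mtcars)

# Hypothetical helper: keep only the outcome (cyl) and the three predictors
vars <- mtcars[, c("cyl", "hp", "disp", "wt")]
str(vars)

# Count how many of the 32 cars have 4, 6 or 8 cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

The counts show the three classes the tree will have to separate.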


Section 1 - Create a new data set: pcars

We will create a new data set called pcars, in which we convert the row names (the car models) into a column named cars. We will use each car's hp, disp and wt to determine how many cylinders it has.
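One possible way to build pcars is sketched below. The tutorial's exact code is not shown here, so treat this as an assumption: it prepends the row names as a `cars` column and then resets the row names to plain integers.

```r
data(mtcars)

# Move the car model names out of the row names and into a "cars" column
pcars <- cbind(cars = rownames(mtcars), mtcars)

# The model names now live in "cars", so reset the row names to 1..32
rownames(pcars) <- NULL

# Preview the identifier, the outcome and the three predictors
head(pcars[, c("cars", "cyl", "hp", "disp", "wt")])
```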


Section 2 - Breaking down our data set

The data set contains 32 vehicles. We will split it into a training set, pcars_training, with 22 observations and a test set, pcars_test, with 10 observations.
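The split could be done as in this sketch. The random sampling, the seed value 42 and the helper name `train_idx` are assumptions for illustration; the original tutorial may have selected its 22 training rows differently.

```r
data(mtcars)
pcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(pcars) <- NULL

# Assumption: a random 22/10 split; set.seed makes it reproducible
set.seed(42)
train_idx <- sample(seq_len(nrow(pcars)), size = 22)  # hypothetical name

pcars_training <- pcars[train_idx, ]   # 22 rows used to grow the tree
pcars_test     <- pcars[-train_idx, ]  # the remaining 10 rows

c(training = nrow(pcars_training), test = nrow(pcars_test))
```

Because every row lands in exactly one of the two sets, no car is used both to train and to evaluate the tree.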


Section 3 - Constructing the Decision Tree

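Now we will use the rpart function. We convert cyl into a factor because we are classifying the size of the engine into one of three categories (4, 6 or 8 cylinders) rather than predicting a continuous number. cyl is the predicted variable, and hp, disp and wt are the variables that allow us to predict it. The minsplit parameter sets the minimum number of observations a node must contain before rpart attempts to split it; it does not set the number of categories. rpart's default minsplit is 20, which is close to the size of our 22-row training set, so a smaller value is needed for the tree to split far enough to separate all three classes.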
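A minimal sketch of fitting the classification tree follows. It repeats the earlier steps so it runs on its own. The value `minsplit = 5`, the seed and the object name `tree` are illustrative assumptions, not the tutorial's own settings.

```r
library(rpart)  # distributed with R as a recommended package

data(mtcars)
pcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(pcars) <- NULL
pcars$cyl <- factor(pcars$cyl)  # classification target with levels 4, 6, 8

set.seed(42)  # assumption: random 22/10 split
train_idx <- sample(seq_len(nrow(pcars)), size = 22)
pcars_training <- pcars[train_idx, ]

# cyl is the response; hp, disp and wt are the predictors.
# method = "class" asks for a classification tree;
# minsplit = 5 is an illustrative value below the default of 20
tree <- rpart(cyl ~ hp + disp + wt,
              data = pcars_training,
              method = "class",
              control = rpart.control(minsplit = 5))
print(tree)
```

The printed output lists each split and, for every node, the predicted class and the class proportions.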


Section 4 - Displaying the Decision Tree

Now that we have created the tree, it is time to plot it. Reading the labeled tree from the top: if disp is greater than 153, the car goes down the right branch and is most likely a 6-cylinder car. If disp is less than 153, it goes down the left branch and is most likely a 4-cylinder car.
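The plotting function used in the tutorial is not shown, so the sketch below assumes base R's plot and text methods for rpart objects. Because it uses its own assumed seed and minsplit value, the split thresholds it draws may differ from the 153 quoted above.

```r
library(rpart)

data(mtcars)
pcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(pcars) <- NULL
pcars$cyl <- factor(pcars$cyl)

set.seed(42)  # assumption: random 22/10 split
train_idx <- sample(seq_len(nrow(pcars)), size = 22)
pcars_training <- pcars[train_idx, ]

tree <- rpart(cyl ~ hp + disp + wt, data = pcars_training,
              method = "class", control = rpart.control(minsplit = 5))

# Draw the branches with even spacing and leave room for labels
plot(tree, uniform = TRUE, margin = 0.1)
# Label splits and leaves; use.n = TRUE adds per-class counts at each leaf
text(tree, use.n = TRUE)
```

The rpart.plot package offers a more polished alternative if it is installed.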


Section 5 - Predicting the test set

Now we will display the results of our decision tree by using it to make predictions on pcars_test, adding a new column, Predicted, that shows the predicted number of cylinders. The important columns to compare in this table are cyl and Predicted.
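The sketch below shows one way to add the Predicted column and compare it with cyl. It rebuilds the tree so it can run on its own, with the same assumed seed and minsplit as before. The confusion table and the `accuracy` value are extra summaries added for illustration, not part of the original exercise.

```r
library(rpart)

data(mtcars)
pcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(pcars) <- NULL
pcars$cyl <- factor(pcars$cyl)

set.seed(42)  # assumption: random 22/10 split
train_idx <- sample(seq_len(nrow(pcars)), size = 22)
pcars_training <- pcars[train_idx, ]
pcars_test     <- pcars[-train_idx, ]

tree <- rpart(cyl ~ hp + disp + wt, data = pcars_training,
              method = "class", control = rpart.control(minsplit = 5))

# type = "class" returns the predicted class label for each test car
pcars_test$Predicted <- predict(tree, newdata = pcars_test, type = "class")

# Side-by-side comparison of actual and predicted cylinders
print(pcars_test[, c("cars", "cyl", "Predicted")])

# Confusion table: rows are actual classes, columns are predictions
print(table(Actual = pcars_test$cyl, Predicted = pcars_test$Predicted))

# Share of test cars classified correctly
accuracy <- mean(as.character(pcars_test$cyl) ==
                 as.character(pcars_test$Predicted))
accuracy
```

With only 10 test cars, each misclassification moves the accuracy by 10 percentage points, so the result should be read with caution.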

```