library(tidyverse)
library(openintro)

Instructions

  • Complete the code in the code chunks.
  • Type the answers as text below each question.
  • Knit your file.
  • Upload the resulting HTML file to Canvas.

Question 1 (12 pts)

The MNIST dataset is one of the most well-known datasets in machine learning and is widely used for training image processing systems. It contains images of handwritten digits; each image is a 28x28-pixel grayscale image, with pixel values ranging from 0 (white) to 255 (black). The dataset is structured as follows:

  • Label: This variable denotes the class of the handwritten digit image, i.e., the digit depicted (0 through 9). In some non-standard variants of the MNIST dataset that have been extended to alphabets, the label may instead represent letters.

  • Pix1, Pix2, …, Pix784: These variables represent the pixel values of the 28x28 pixel images. Each image is “flattened” into a single row with 784 columns (28 multiplied by 28), where each PixN corresponds to the grayscale value of a pixel. Each pixel value ranges from 0 to 255, where 0 corresponds to a completely white pixel and 255 corresponds to a completely black pixel.

Structure of the Data:

  • Rows: Each row in the dataset corresponds to a single image (a single handwritten digit) along with its label.
  • Columns: The first column is typically the Label and the remaining 784 columns are the pixel intensity values from the top-left to the bottom-right of the image.

Goal: The goal with MNIST is to build a model that can predict the Label from the 784 pixel values, effectively allowing a computer to recognize handwritten digits.
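
To build intuition for this layout, here is a minimal sketch (not part of the graded code) that reshapes one flattened row back into a 28x28 matrix and draws it. The file name matches the one used below; the row-major flattening order is an assumption, so the digit may need flipping if it appears mirrored.

# Minimal sketch: view one flattened image as a 28x28 picture
DataMnist <- readRDS("MN500.rds")                         # same file as in the code below
digit <- as.numeric(DataMnist[1, -1])                     # drop Label, keep Pix1..Pix784
img <- matrix(digit, nrow = 28, ncol = 28, byrow = TRUE)  # assumed row-major flattening
# draw the matrix the way it reads; 0 maps to white and 255 to black, per the description
image(t(img)[, 28:1], col = gray(seq(1, 0, length.out = 256)), axes = FALSE)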

Instructions

  1. Load the necessary libraries that will be used for performing k-Nearest Neighbors (kNN) classification and creating a confusion matrix.
  2. Load the MNIST dataset from a file named “MN500.rds” and ensure that the Label variable is set as a factor, as it represents categorical data.
  3. Split the dataset into training and testing subsets, using 70% of the data for training and the remaining 30% for testing. Set a random seed for reproducibility.
  4. Extract the label column from both the training and test datasets for later use in the kNN model.
  5. Remove the label column from the training and test datasets so that only pixel values are used for the kNN algorithm.
  6. Use the knn function to classify test data based on the training data, using k = 5 as the number of nearest neighbors.
  7. Use CrossTable from the gmodels library to construct a confusion matrix to compare the actual and predicted labels.
  8. Calculate the accuracy of the kNN predictions by comparing them to the true test labels, then print the accuracy as a percentage.

Complete the code below by filling in the blanks (“___”).

# Load necessary libraries
library(class)
library(gmodels)
# Load MNIST dataset
DataMnist <- readRDS("MN500.rds")
DataMnist$Label <- factor(DataMnist$Label)

# Create Training and Testing Data
set.seed(123)
index <- sample(1:nrow(DataMnist), 0.7 * nrow(DataMnist))
DataTrain <- DataMnist[index, ]
DataTest <- DataMnist[-index, ]

# Prepare the data for KNN (excluding the label column for training/testing dataset)
train_labels <- DataTrain$Label
test_labels <- DataTest$Label

# Remove the label column from the datasets for KNN
DataTrain <- DataTrain[, -1]
DataTest <- DataTest[,-1]

# Perform KNN for k = 5
predicted_labels <- knn(train = DataTrain, test = DataTest, cl = train_labels, k = 5)
print(predicted_labels)
##   [1] 7 1 1 9 4 6 1 9 7 3 4 0 7 1 1 3 1 4 4 0 1 9 1 4 7 4 0 7 1 1 3 1 1 6 9 6 5
##  [38] 1 9 4 7 6 9 5 6 6 1 6 7 1 8 0 1 5 6 6 4 1 2 8 1 8 5 2 5 1 2 1 1 3 5 0 5 3
##  [75] 5 2 4 1 0 2 1 2 5 3 9 9 7 9 0 8 1 0 6 9 3 0 1 5 3 9 7 6 5 1 0 1 9 5 6 7 4
## [112] 1 4 6 6 5 9 4 0 7 7 9 2 4 7 1 9 8 3 2 9 8 3 0 4 6 6 7 7 5 8 9 7 1 7 1 5 4
## [149] 0 6
## Levels: 0 1 2 3 4 5 6 7 8 9
# Create Confusion Matrix using CrossTable
CrossTable(x = test_labels, y = predicted_labels, prop.chisq = FALSE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  150 
## 
##  
##              | predicted_labels 
##  test_labels |         0 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 | Row Total | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            0 |        12 |         0 |         1 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |        13 | 
##              |     0.923 |     0.000 |     0.077 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.087 | 
##              |     0.923 |     0.000 |     0.125 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |           | 
##              |     0.080 |     0.000 |     0.007 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            1 |         0 |        23 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |        23 | 
##              |     0.000 |     1.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.153 | 
##              |     0.000 |     0.742 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.153 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            2 |         0 |         2 |         6 |         0 |         1 |         0 |         0 |         1 |         0 |         0 |        10 | 
##              |     0.000 |     0.200 |     0.600 |     0.000 |     0.100 |     0.000 |     0.000 |     0.100 |     0.000 |     0.000 |     0.067 | 
##              |     0.000 |     0.065 |     0.750 |     0.000 |     0.067 |     0.000 |     0.000 |     0.059 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.013 |     0.040 |     0.000 |     0.007 |     0.000 |     0.000 |     0.007 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            3 |         0 |         1 |         0 |         9 |         0 |         0 |         0 |         1 |         0 |         0 |        11 | 
##              |     0.000 |     0.091 |     0.000 |     0.818 |     0.000 |     0.000 |     0.000 |     0.091 |     0.000 |     0.000 |     0.073 | 
##              |     0.000 |     0.032 |     0.000 |     0.900 |     0.000 |     0.000 |     0.000 |     0.059 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.007 |     0.000 |     0.060 |     0.000 |     0.000 |     0.000 |     0.007 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            4 |         0 |         0 |         1 |         0 |        10 |         0 |         0 |         0 |         0 |         2 |        13 | 
##              |     0.000 |     0.000 |     0.077 |     0.000 |     0.769 |     0.000 |     0.000 |     0.000 |     0.000 |     0.154 |     0.087 | 
##              |     0.000 |     0.000 |     0.125 |     0.000 |     0.667 |     0.000 |     0.000 |     0.000 |     0.000 |     0.118 |           | 
##              |     0.000 |     0.000 |     0.007 |     0.000 |     0.067 |     0.000 |     0.000 |     0.000 |     0.000 |     0.013 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            5 |         0 |         2 |         0 |         1 |         2 |        12 |         1 |         0 |         0 |         0 |        18 | 
##              |     0.000 |     0.111 |     0.000 |     0.056 |     0.111 |     0.667 |     0.056 |     0.000 |     0.000 |     0.000 |     0.120 | 
##              |     0.000 |     0.065 |     0.000 |     0.100 |     0.133 |     0.800 |     0.059 |     0.000 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.013 |     0.000 |     0.007 |     0.013 |     0.080 |     0.007 |     0.000 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            6 |         1 |         0 |         0 |         0 |         0 |         0 |        16 |         0 |         0 |         0 |        17 | 
##              |     0.059 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.941 |     0.000 |     0.000 |     0.000 |     0.113 | 
##              |     0.077 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.941 |     0.000 |     0.000 |     0.000 |           | 
##              |     0.007 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.107 |     0.000 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            7 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |        13 |         0 |         0 |        13 | 
##              |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     1.000 |     0.000 |     0.000 |     0.087 | 
##              |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.765 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.087 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            8 |         0 |         2 |         0 |         0 |         1 |         3 |         0 |         0 |         6 |         1 |        13 | 
##              |     0.000 |     0.154 |     0.000 |     0.000 |     0.077 |     0.231 |     0.000 |     0.000 |     0.462 |     0.077 |     0.087 | 
##              |     0.000 |     0.065 |     0.000 |     0.000 |     0.067 |     0.200 |     0.000 |     0.000 |     0.857 |     0.059 |           | 
##              |     0.000 |     0.013 |     0.000 |     0.000 |     0.007 |     0.020 |     0.000 |     0.000 |     0.040 |     0.007 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##            9 |         0 |         1 |         0 |         0 |         1 |         0 |         0 |         2 |         1 |        14 |        19 | 
##              |     0.000 |     0.053 |     0.000 |     0.000 |     0.053 |     0.000 |     0.000 |     0.105 |     0.053 |     0.737 |     0.127 | 
##              |     0.000 |     0.032 |     0.000 |     0.000 |     0.067 |     0.000 |     0.000 |     0.118 |     0.143 |     0.824 |           | 
##              |     0.000 |     0.007 |     0.000 |     0.000 |     0.007 |     0.000 |     0.000 |     0.013 |     0.007 |     0.093 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Column Total |        13 |        31 |         8 |        10 |        15 |        15 |        17 |        17 |         7 |        17 |       150 | 
##              |     0.087 |     0.207 |     0.053 |     0.067 |     0.100 |     0.100 |     0.113 |     0.113 |     0.047 |     0.113 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## 
## 
# Calculate accuracy
accuracy <- mean(predicted_labels == test_labels)
print(paste("Accuracy:", round(accuracy * 100, 2), "%"))
## [1] "Accuracy: 80.67 %"

Question 2 (18 pts)

The Titanic dataset is a classic dataset in data science and machine learning, typically used to demonstrate classification tasks. It is available in the file “Titanic.csv”. Here’s a general description of the Titanic dataset and its variables:

Description of the Titanic Dataset:

The dataset contains data about the passengers who were onboard the ill-fated RMS Titanic. Your goal is to predict the survival of the passengers based on various features. Below are the variables you will use in this dataset:

Survived: Indicates whether the passenger survived (1) or did not survive (0).

Class: Passenger class, a proxy for socio-economic status (1 = 1st class, 2 = 2nd class, 3 = 3rd class).

Name: Full name of the passenger.

Sex: Gender of the passenger (male or female).

Age: Age of the passenger in years. Some entries may have missing ages.

SibSp: Number of siblings and spouses aboard the Titanic.

Parch: Number of parents and children aboard the Titanic.

Fare: Passenger fare.
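
Because Age may contain missing values, it is worth checking before modeling; rpart handles missing predictor values internally via surrogate splits, so rows do not need to be dropped. A minimal sketch, assuming the column names described above:

# Count missing values in the modeling columns (Age is the likely offender)
Titanic <- read.csv("Titanic.csv")
colSums(is.na(Titanic[, c("Survived", "Sex", "Class", "Age", "Fare")]))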

Instructions

  1. Load the rpart library for decision tree modeling and the gmodels library for creating a confusion matrix.
  2. Filter the dataset to include only the relevant columns: “Survived”, “Sex”, “Class”, “Age”, and “Fare”.
  3. Convert the “Survived” column to a factor, as it represents categorical binary outcomes.
  4. Set a random seed for reproducibility, and split the dataset into training and test sets. Use 75% of the data for training and 25% for testing.
  5. Train a decision tree model using the rpart function. Predict “Survived” based on all other features, with a maximum depth for tree branches set to 3 for simplicity.
  6. Visualize the trained decision tree using the rpart.plot function, customizing the yes/no labels on each node.
  7. Use the trained model to predict outcomes for the test data, resulting in predicted class labels.
  8. Use the CrossTable function from the gmodels package to generate a confusion matrix comparing actual and predicted outcomes.
  9. Calculate the accuracy of the predictions by comparing predicted labels against actual labels. Print the accuracy percentage.

Complete the code below by filling in the blanks (“___”).

# Load necessary libraries
library(rpart)
#install.packages('gmodels')
library(gmodels)

# Load the Titanic dataset from Titanic.csv
Titanic <- read.csv("Titanic.csv")

# Select relevant columns
Titanic <- Titanic[,c("Survived", "Sex", "Class", "Age", "Fare")]

# Factorize the Survived column
Titanic$Survived <- as.factor(Titanic$Survived)

# Set seed and split the data
set.seed(777)
train_indices <- sample(1:nrow(Titanic), 0.75 * nrow(Titanic))
DataTrain <- Titanic[train_indices, ]
DataTest <- Titanic[-train_indices, ]

# Train the decision tree model
ModelDesignDecTree <- rpart(Survived ~ ., data = DataTrain, method = "class", control = rpart.control(cp=0, maxdepth = 3))

# Visualize the decision tree
library(rpart.plot)
#install.packages('rpart.plot')
rpart.plot(ModelDesignDecTree, yes.text = "YES", no.text = "NO", roundint = FALSE)

# Predict on the test set
predicted_labels <- predict(ModelDesignDecTree, DataTest, type = "class")

# Create confusion matrix using CrossTable
CrossTable(x = DataTest$Survived, y = predicted_labels, prop.chisq = FALSE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  222 
## 
##  
##                   | predicted_labels 
## DataTest$Survived |         0 |         1 | Row Total | 
## ------------------|-----------|-----------|-----------|
##                 0 |       115 |        12 |       127 | 
##                   |     0.906 |     0.094 |     0.572 | 
##                   |     0.846 |     0.140 |           | 
##                   |     0.518 |     0.054 |           | 
## ------------------|-----------|-----------|-----------|
##                 1 |        21 |        74 |        95 | 
##                   |     0.221 |     0.779 |     0.428 | 
##                   |     0.154 |     0.860 |           | 
##                   |     0.095 |     0.333 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |       136 |        86 |       222 | 
##                   |     0.613 |     0.387 |           | 
## ------------------|-----------|-----------|-----------|
## 
## 
# Calculate accuracy
accuracy <- mean(predicted_labels == DataTest$Survived)
print(paste("Accuracy:", round(accuracy * 100, 2), "%"))
## [1] "Accuracy: 85.14 %"

Based on the plot of the decision tree, answer the following questions by typing your answer below each question.

  1. What is the survival rate for adult male passengers 13 years or older, regardless of the class they traveled in and the fare they paid?

Answer: 46%

  2. What is the survival rate for young male passengers (younger than 13 years), regardless of which class they traveled in and the fare they paid?

Answer: 2%

  3. What is the survival rate for young male passengers (younger than 13 years) traveling in Third Class, regardless of the fare they paid?

Answer: 2%

  4. What is the survival rate for female passengers, regardless of age and without considering the class they traveled in or the fare they paid?

Answer: 34%

  5. When considering the class female passengers traveled in, we can see that female passengers, regardless of age, had a survival rate of ___ when they traveled in First or Second Class, regardless of the fare they paid.

Answer: 18%
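
As a cross-check on the answers above, the empirical survival rates can be computed directly from the data using the tidyverse loaded at the top. A minimal sketch (the file is re-read so Survived is numeric 0/1 here; these raw proportions over the full file will not match the tree's training-set leaves exactly):

# Empirical survival rates by sex, and by sex within class
library(tidyverse)
Titanic <- read.csv("Titanic.csv")
Titanic %>%
  group_by(Sex) %>%
  summarise(SurvivalRate = mean(Survived == 1))
Titanic %>%
  group_by(Sex, Class) %>%
  summarise(SurvivalRate = mean(Survived == 1), .groups = "drop")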
