```{r}
# packages
library(dplyr)
library(caret)       # For data splitting
library(data.table)  # For one-hot encoding
library(scales)      # For Min-Max scaling
```

reading in the file

```{r}
data <- read.csv("C:/Users/William/OneDrive - Northeastern University/College/4th Year/Spring/DS4420/Lab1/pokedata_num.csv")
head(data)
```

separating out columns that are not needed and creating the target column

```{r}
X <- data %>% select(-c(hp, pokedex, name))
y <- data$hp
```

scaling each feature between 0 and 1 with rescale, which returns a scaled dataframe

```{r}
X_scaled <- as.data.frame(lapply(X, rescale))  # min-max scaling to [0, 1], per the scales import
```
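A quick sanity check (illustrative, not in the original) that every scaled column now spans [0, 1]:

```{r}
range(sapply(X_scaled, range))  # should print 0 and 1
```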

bias column

```{r}
X_scaled["bias"] <- 1
X_scaled
```

splitting the dataset into training and test sets

```{r}
set.seed(42)
train_indices <- createDataPartition(y, p = 0.75, list = FALSE)
X_train <- X_scaled[train_indices, ]
X_test  <- X_scaled[-train_indices, ]
y_train <- y[train_indices]
y_test  <- y[-train_indices]
```

Training data

```{r}
X_train
y_train
```

Testing data

```{r}
X_test
y_test
```

setting up infrastructure

number of nodes in the hidden layer

```{r}
hidden_nodes <- 10
```

creating random weights

```{r}
set.seed(123)
W1 <- matrix(rnorm(7 * hidden_nodes), nrow = 7, ncol = hidden_nodes)
W2 <- matrix(rnorm(hidden_nodes * 1), nrow = hidden_nodes, ncol = 1)
```

$f_w$ and $y_i$ are scalar values, with $f_w$ being our predicted value and $y_i$ being the actual y value.

For the first derivative (with respect to $W^{(2)}$), $h$ can be calculated as $\mathrm{ReLU}(W^{(1)\top} x)$ and has dimensions 10x1: $W^{(1)\top}$ is 10x7 since there are 10 hidden nodes and 7 features including the bias, $x$ is 7x1, and multiplying them together results in 10x1.

For the second derivative (with respect to $W^{(1)}$), $x$ has dimensions 7x1 and $W^{(2)}$ is 10x1. Because of the ReLU activation, $W^{(2)}$ enters through a Hadamard (element-wise) product with the indicator of which entries of $W^{(1)\top} x$ are positive; that product is still 10x1. Finally, $x$ (7x1) multiplied by the transpose of that vector (1x10) results in a 7x10 matrix, matching the shape of $W^{(1)}$.
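Putting the dimension analysis together in the batched, row-major form the code below uses (each row of $X$ is one example; $\hat{y}$ denotes the vector of predictions, notation added here for clarity), the gradients of the MSE can be sketched as

$$
\frac{\partial\,\mathrm{MSE}}{\partial W^{(2)}} = \frac{2}{n}\, H^\top (\hat{y} - y),
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial W^{(1)}} = \frac{2}{n}\, X^\top \Big[ (\hat{y} - y)\, W^{(2)\top} \odot \mathbb{1}\big[X W^{(1)} > 0\big] \Big],
$$

where $H = \mathrm{ReLU}(X W^{(1)})$ is $n \times 10$, $\hat{y} - y$ is $n \times 1$, and $\odot$ is the Hadamard product. The bracketed term is $n \times 10$, so the second gradient comes out $7 \times 10$, as required.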

gradient descent

```{r}
errors <- numeric()
epochs <- 500
eta <- 0.01          # learning rate (assumed value; tune as needed)
n <- nrow(X_train)   # number of training examples
```

relu function

```{r}
relu <- function(x) {
  pmax(x, 0)  # pmax keeps the dimensions of x; wrapping it in matrix() would collapse them
}
```
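A toy check (not in the original) that `relu` keeps matrix dimensions, which matters because `h` must stay n x 10 for the matrix products below:

```{r}
m <- matrix(c(-1, 2, -3, 4), nrow = 2)
relu(m)  # still a 2 x 2 matrix, with negatives clipped to 0
```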

```{r}
# %*% requires a matrix, so convert the training data frame first
X_train <- as.matrix(X_train)

for (epoch in 1:epochs) {
  # calculating the hidden layer and passing through relu
  h_mul <- X_train %*% W1
  h <- relu(h_mul)

  # predicting y by multiplying h with W2
  pred_y <- h %*% W2
  error <- pred_y - y_train

  # calculating mse
  mse <- mean(error^2)
  errors[epoch] <- mse

  # backpropagating
  d_W2 <- (2 / n) * t(h) %*% error
  mat1 <- ifelse(h > 0, 1, 0)              # ReLU derivative (1 where the activation was positive)
  error_term <- (error %*% t(W2)) * mat1   # n x 10: propagate the error through W2, then gate by ReLU
  d_W1 <- (2 / n) * t(X_train) %*% error_term

  W2 <- W2 - eta * d_W2
  W1 <- W1 - eta * d_W1
}
```
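The held-out split from earlier is never scored, so here is a minimal sketch (an assumed addition, not part of the original lab) of evaluating the trained network on it, reusing the `W1`/`W2` left behind by the loop:

```{r}
# Assumed addition: forward pass on the test split with the trained weights
X_test_mat <- as.matrix(X_test)
pred_test <- relu(X_test_mat %*% W1) %*% W2
mean((pred_test - y_test)^2)  # test-set MSE
```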

The loop originally produced no output: `h1` was referenced where the activations are stored in `h`, `n` and `eta` were never defined, `%%` (modulo) was used in place of `%*%` (matrix multiplication), and `X_train` was still a data frame, which `%*%` rejects. The version above fixes each of these.

```{r}
plot(1:epochs, errors, type = "l",
     main = "Error over Epochs", xlab = "Epochs", ylab = "Error")
```