Due date

The due date for this exam is Wednesday, May 3, by 11:59PM. Late submissions will not be accepted apart from exceptional circumstances. Consequently, you should plan on submitting before the due date.

Problem 1)

Construct a confusion matrix for the dataset. In other words, create a \(10 \times 10\) matrix, where entry (\(i, j\)) indicates the total number of times that an image from category \(i\) was categorized by a human as belonging to category \(j\).
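As a small illustration of the definition above, the sketch below builds a confusion matrix for a made-up dataset with 4 images and 3 categories. The names toy_counts and toy_labels and all of the numbers are hypothetical; rowsum() is base R and adds up the rows of a matrix that share a group label. This is one possible approach under those assumptions, not necessarily the intended solution.

```r
# Hypothetical toy data: 4 images (rows), 3 categories (columns).
# Each row holds how many people assigned that image to each category.
toy_counts <- matrix(c(9, 1, 0,
                       8, 0, 2,
                       1, 7, 2,
                       0, 3, 7), nrow = 4, byrow = TRUE)
toy_labels <- c(1, 1, 2, 3)  # true category of each image

# Sum the rows that share a true label: one row per true category,
# one column per assigned category.
toy_confusion <- rowsum(toy_counts, group = toy_labels)
print(toy_confusion)  # row 1 (true category 1) is 17 1 2
```

Entry (1, 3) is 2 because images whose true category is 1 were assigned to category 3 twice in total.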

Solution:

confusion_matrix <- matrix(0, nrow = 10, ncol = 10) #10 x 10 matrix with 0s

counts <- read.csv("cifar10h-counts.csv")
#counts[1,] <- 1:10
labels <- read.csv("cifar10h-labels.csv")

data <- cbind(labels[,2],counts)

for (i in 1:nrow(counts)) {
  true_label <- labels$category_id[i]

  # Add all human responses for this image to the row of its true category,
  # so entry (i, j) is the total number of times category i was labeled j
  confusion_matrix[true_label, ] <- confusion_matrix[true_label, ] + unlist(counts[i, 1:10])
}

print(confusion_matrix)

Problem 2)

Convert the counts in your confusion matrix from problem 1 into probabilities. For example, the first row should indicate the probability that an airplane was assigned to each of the 10 possible categories.

Which category has the highest probability of a correct categorization?

Which category has the lowest probability of a correct categorization?

Solution:

label_names <- c("airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck")

prob_matrix <- confusion_matrix / rowSums(confusion_matrix) # divide each row by its total so it sums to 1
print(prob_matrix)

max_prob <- max(diag(prob_matrix))
max_category <- which(diag(prob_matrix) == max_prob)

# Loop in case several categories tie for the highest probability
for (index in max_category) {
  cat(label_names[index], "has the highest probability of a correct categorization\n")
}

min_prob <- min(diag(prob_matrix))
min_category <- which(diag(prob_matrix) == min_prob)
# Loop in case several categories tie for the lowest probability
for (index in min_category) {
  cat(label_names[index], "has the lowest probability of a correct categorization\n")
}

Problem 3)

Generate a plot of the confusion probabilities. Your plot should be a \(10 \times 10\) tile image where the fill color of each tile indicates the probability of assigning true category \(i\) to category \(j\).

Note: For creating this plot, you should set values along the main diagonal to NA. These correspond to correct categorizations. The reason to set these to NA is that most values along the diagonal are close to 1, while values off the diagonal are close to zero. If you include the correct categorizations, it makes it hard to visualize the kinds of confusions people make.

Hint: See ?geom_tile in the ggplot2 package (part of the tidyverse). Note, however, that geom_tile() expects a data frame with 3 columns, for example x, y, and z, where the fill color of each tile comes from the z column.
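The 3-column layout mentioned in the hint can be sketched with base R alone. The example below uses a made-up 2 x 2 matrix; the names toy_probs, true, assigned, and prob are hypothetical choices for illustration, and this is only one of several ways to reshape a matrix.

```r
# Hypothetical 2 x 2 matrix of confusion probabilities with the diagonal set to NA.
toy_probs <- matrix(c(NA, 0.2,
                      0.1, NA), nrow = 2, byrow = TRUE,
                    dimnames = list(true = c("cat", "dog"),
                                    assigned = c("cat", "dog")))

# as.table() + as.data.frame() gives one row per matrix cell:
# two columns for the cell position and one for its value.
toy_long <- as.data.frame(as.table(toy_probs), responseName = "prob")
print(toy_long)
```

A data frame in this shape could then be passed to ggplot() with aes(x = assigned, y = true, fill = prob) and geom_tile().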

Solution:

library(ggplot2)

# Work on a copy so prob_matrix keeps its diagonal for the later problems
plot_matrix <- prob_matrix
diag(plot_matrix) <- NA  # hide the correct categorizations

conf_df <- reshape2::melt(plot_matrix, varnames = c("i", "j"), value.name = "prob")

# melt() returns integer indices; convert them to factors so the discrete axes show category names
conf_df$i <- factor(conf_df$i, levels = 1:10, labels = label_names)
conf_df$j <- factor(conf_df$j, levels = 1:10, labels = label_names)

ggplot(conf_df, aes(x = j, y = i, fill = prob)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red", na.value = "grey80") +
  labs(title = "Confusion Probabilities", x = "Assigned Category", y = "True Category") +
  theme(plot.title = element_text(size = 20, face = "bold"),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = 45, hjust = 1))

Problem 4)

In information theory, the entropy of a probability distribution measures how much uncertainty there is associated with that distribution. Consider a discrete random variable \(x\), with a corresponding probability distribution over outcomes, \(p(x)\). If there are \(n\) possible outcomes, then the entropy of the distribution, measured in bits, is given by:

\[ H(x) = -\sum_{i=1}^n p(x = i)\,\left[\mathrm{log}_2\,p(x=i)\right] \]

(Note \(\mathrm{log}_2\) indicates the logarithm base 2).
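To make the formula concrete, here is a minimal sketch that evaluates it for a few simple distributions. The helper name entropy_bits is hypothetical, and dropping zero-probability outcomes follows the usual convention that 0 log 0 counts as 0.

```r
# Entropy in bits of a discrete probability distribution p (a vector that sums to 1).
entropy_bits <- function(p) {
  p <- p[p > 0]        # outcomes with probability 0 contribute nothing
  -sum(p * log2(p))
}

h_coin    <- entropy_bits(c(0.5, 0.5))     # fair coin: 1 bit
h_certain <- entropy_bits(c(1, 0, 0, 0))   # one certain outcome: 0 bits
h_uniform <- entropy_bits(rep(0.1, 10))    # uniform over 10 outcomes: log2(10) bits
print(c(h_coin, h_certain, h_uniform))
```

A category that people almost always label correctly behaves like the second case and has entropy near 0, while a category whose responses are spread evenly across labels approaches the third case.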

  • Compute the entropy (in bits) associated with each of the 10 image categories, using the probability distributions you computed in Problem 2.

  • Using ggplot, generate a bar chart of the entropies for each image category. Make sure the x-axis is labeled with the categories (not just the integer labels).

Solution:

library(tidyverse)  # also loads ggplot2

# Entropy of each row of prob_matrix; na.rm = TRUE drops the NaN terms that come from 0 * log2(0)
category_entropy <- -1 * sapply(1:10, function(i) sum(prob_matrix[i,] * log2(prob_matrix[i,]), na.rm = TRUE))

entropy_df <- data.frame(category = label_names, entropy = category_entropy)
entropy_df <- entropy_df[order(entropy_df$entropy, decreasing = TRUE), ]

ggplot(entropy_df, aes(x = reorder(category, entropy), y = entropy)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "Image Category", y = "Entropy (bits)") +
  ggtitle("Entropy of Image Categories") +
  theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 45, hjust = 1)) 

Problem 5)

The raw dataset consists of 10,000 rows by 10 columns. In other words, for each image we have a 10-dimensional feature vector that characterizes people's mental representation of that image. In this problem and the next, you will use principal component analysis (PCA) to reduce the dimensionality of the dataset.

Use the prcomp() function to perform PCA on the dataset. Make sure you center the data as part of performing PCA.

Print out the first eigenvector.

Hint: This should be a vector of length 10.

Solution:

counts_matrix <- counts[, 1:10]

pca <- prcomp(counts_matrix, center = TRUE)
print(pca$rotation[,1])

Problem 6)

Generate a scatterplot that shows all 10,000 stimuli projected down to a 2D space (the first two principal components). The color of each plot marker should be based on the true category of each image.

Ensure that your figure includes a legend that shows the mapping between colors and categories (using the English labels rather than integer codes).

Hint: See the retx argument to the prcomp() function.
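As a rough illustration of what retx provides, the sketch below runs prcomp() on random toy data and checks that the returned scores equal the centered data multiplied by the eigenvectors. The names toy, pca_toy, and manual_scores are hypothetical, and the data are random numbers rather than anything from the exam dataset.

```r
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical data: 10 observations, 4 features

# retx = TRUE stores the projected observations in pca_toy$x
pca_toy <- prcomp(toy, center = TRUE, retx = TRUE)

# The same projection by hand: center each column, then multiply by the eigenvectors
manual_scores <- scale(toy, center = TRUE, scale = FALSE) %*% pca_toy$rotation

print(all.equal(unname(manual_scores), unname(pca_toy$x)))
print(sum(pca_toy$rotation[, 1]^2))  # each eigenvector has unit length
```

The first two columns of pca_toy$x are the coordinates one would plot for a 2D projection.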

Solution:

# Use only the 10 count columns (not the label column); prcomp() centers the data itself
pca <- prcomp(counts[, 1:10], center = TRUE, retx = TRUE)

# First 2 principal components, with the true category name of each image
pca_2d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                     category = factor(label_names[labels$category_id], levels = label_names))

ggplot(pca_2d, aes(x = PC1, y = PC2, color = category)) +
  geom_point(size = 1) +
  labs(x = "PC1", y = "PC2", title = "PCA Scatterplot") +
  theme_bw() +
  theme(legend.position = "right")

Problem 7)

Based on the results from problems 5 and 6, how would you interpret the first principal component? In other words, how do images that vary along this dimension differ from each other? (I am looking for a plain English description.)

Is there anything else noteworthy about the results you obtained?

Solution:

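The first principal component (Problem 5) has positive loadings for the animal categories (bird, cat, deer, dog, frog, and horse) and negative loadings for the machine/vehicle categories (airplane, automobile, ship, and truck). Images that vary along this dimension therefore differ mainly in whether people see them as animals or as machines/vehicles. The sign of a principal component is arbitrary, so this does not by itself show that one group is easier to categorize; it may instead indicate that people's confusions tend to stay within the animal group or within the machine/vehicle group.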

Problem 8)

For this problem, you will train a neural network on the CIFAR-10 image dataset, i.e., we will be working with the actual images, not the human data.

If you run the following code,

library(keras)
full_dataset <- dataset_cifar10()

train_images <- full_dataset$train$x / 255
train_labels <- to_categorical(full_dataset$train$y)

test_images <- full_dataset$test$x / 255
test_labels <- to_categorical(full_dataset$test$y)

it will download the CIFAR-10 dataset, and extract four variables: train_images, train_labels, test_images and test_labels.

  • train_images is a \(50,000 \times 32 \times 32 \times 3\) array. There are 50,000 images, each one is \(32 \times 32\) pixels, and since they are color images, there are three color channels (storing the R, G, and B values). We divide the values in this array by 255 to normalize the pixel intensities to a number between 0 and 1 (remember that scaling input variables is a critical step with neural networks).

  • The second variable, train_labels is a \(50,000 \times 10\) matrix. It uses a “one-hot code” suitable for training a neural network. In other words, each row of this matrix contains all zeros except for the true category label for that image. These will be the target values for the output of your neural network.

  • test_images and test_labels have similar structure, except they contain 10,000 rows (in fact, these are the 10,000 images used in the human experiment).
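To make the one-hot code described above concrete, here is a small base R sketch that does not need Keras. The label vector toy_y and the choice of 3 categories are made up for illustration; the labels are zero-based, which is the convention to_categorical() assumes.

```r
toy_y <- c(0, 2, 1, 2)  # hypothetical zero-based integer labels for 4 images, 3 categories

# Row k of the identity matrix is the one-hot code for category k (one-based),
# so indexing with toy_y + 1 builds the 4 x 3 one-hot matrix.
toy_one_hot <- diag(3)[toy_y + 1, ]
print(toy_one_hot)

# which.max() on each row recovers the (one-based) category; subtract 1 for zero-based labels.
toy_recovered <- apply(toy_one_hot, 1, which.max) - 1
print(toy_recovered)
```

Each row contains a single 1, in the column of the true category, and zeros elsewhere.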

Using Keras, construct, and train, a neural network on this dataset. The architecture of the neural network is up to you. A minimal working example is given below; you should find a way to improve on the performance of this one.

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(32, 32, 3)) %>% 
  layer_dense(units = 10, activation = "softmax")

If you are feeling ambitious, the following webpage demonstrates a convolutional neural network (CNN) that achieves 82% accuracy on the test set:

https://tensorflow.rstudio.com/examples/cifar10_cnn.html

If you google Keras + CIFAR-10 you will find plenty of other tutorials (though you may have to translate from Python to R).

Some hints and guidelines:

  • The input layer for your network should have input_shape = c(32, 32, 3)
  • The output layer should use a softmax activation function
  • The loss function for training the model should be "categorical_crossentropy"

What is the final accuracy for your trained model on the test set?

Solution:

model %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)

history <- model %>% fit(
  train_images, train_labels,
  epochs = 10,
  batch_size = 64,
  validation_split = 0.2
)

test_metrics <- model %>% evaluate(test_images, test_labels)

cat("Final accuracy for trained model:", test_metrics[["accuracy"]], "\n")

Problem 9)

After you have trained your model, if you call

pred <- model$predict(test_images)

you will get a \(10,000 \times 10\) matrix, containing the probability of assigning each image to each category. Note that the true category labels for each image are identical to those contained in the file cifar10h-labels.csv.

An interesting question is the following: How similar is the model's notion of similarity to human behavior?

To answer this question, compute the confusion matrix for your trained model, following essentially the same procedure you used in Problems 1 and 2.

Hint: The \(10,000 \times 10\) matrix you start with for this problem contains probabilities rather than counts. However, you should be able to solve this problem in essentially the same way as Problem 1.
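One way to read the hint is sketched below on made-up numbers: a matrix of predicted probabilities can be aggregated by true category either by summing the probabilities directly or by first converting each row to a single predicted label. The names toy_pred, toy_true, soft_confusion, and hard_confusion are hypothetical, and neither variant is claimed to be the intended solution.

```r
# Hypothetical predicted probabilities for 4 images and 3 categories (each row sums to 1).
toy_pred <- matrix(c(0.7, 0.2, 0.1,
                     0.4, 0.5, 0.1,
                     0.1, 0.8, 0.1,
                     0.2, 0.2, 0.6), nrow = 4, byrow = TRUE)
toy_true <- c(1, 1, 2, 3)  # true category of each image

# Variant 1: sum the probabilities of all images that share a true category.
soft_confusion <- rowsum(toy_pred, group = toy_true)

# Variant 2: take the most probable category per image, then cross-tabulate.
# Fixing the factor levels keeps the table square even if a category is never predicted.
toy_hard <- apply(toy_pred, 1, which.max)
hard_confusion <- table(true = factor(toy_true, levels = 1:3),
                        predicted = factor(toy_hard, levels = 1:3))

print(soft_confusion)
print(hard_confusion)
```

Dividing each row of either matrix by its row sum gives confusion probabilities that can be compared with the human ones.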

Solution:

pred_labels <- apply(pred, 1, which.max)
true_labels <- apply(test_labels, 1, which.max)

# Fix the factor levels so the table stays 10 x 10 even if a category is never predicted
model_confusion_matrix <- table(factor(true_labels, levels = 1:10),
                                factor(pred_labels, levels = 1:10))
print(model_confusion_matrix)

# Use a separate name so the human prob_matrix from Problem 2 is not overwritten
model_prob_matrix <- model_confusion_matrix / rowSums(model_confusion_matrix) # each row holds 1,000 test images
print(model_prob_matrix)

max_prob <- max(diag(model_prob_matrix))
max_category <- which(diag(model_prob_matrix) == max_prob)

# Loop in case several categories tie
for (index in max_category) {
  cat(label_names[index], "has the highest probability of a correct categorization\n")
}

min_prob <- min(diag(model_prob_matrix))
min_category <- which(diag(model_prob_matrix) == min_prob)

for (index in min_category) {
  cat(label_names[index], "has the lowest probability of a correct categorization\n")
}

print("The model guessed most categories correctly, except for categories 3 and 4, which are animals. It is much worse at categorizing than humans are; however, like humans, it categorizes machines/vehicles best.")

Problem 10)

Interpret your results from Problem 9 relative to the results you obtained from the human data in Problem 2. Are there notable ways in which the machine learning model is similar to humans? Are there ways in which the model is substantially different?

(I am looking for a ~1 paragraph explanation.)

Solution:

The machine learning model is similar to humans in that both categorize machines/vehicles more accurately than animals. The model, however, makes many more mistakes than humans do. Several of its mistakes involved animals such as deer and cats: the model confused cat and deer with every other category at least once, and it confused both most often with dog. Human participants similarly confused deer with horse, but far less often than the model made its mistakes. Confusing a deer with a horse is also more understandable than confusing it with a dog, since deer and horses share more features. Another thing the model and humans have in common is that both tend to confuse animals with other animals and machines/vehicles with other machines/vehicles.