The source code for this notebook can be found on GitHub.

library(tidyverse)
library(parallel)  # mclapply() and detectCores()

Exercise 1

Alternative \(d\).

Exercise 2

So, (ii) and (iv), alternative \(a\).

Exercise 3

Let \(D_1, D_2 \in \{w, b\}\) be the random variables representing the first and second draws, respectively. Conditioning on which of the two bags was picked (each with probability \(\frac{1}{2}\)): the bag with two black balls always yields a black first and second draw, while the mixed bag yields a black first draw half the time and never a black second draw after it. We want to compute the conditional probability \(P[D_2 = b \mid D_1 = b]\):

\[ \begin{align} P\left[D_2 = b \mid D_1 = b\right] &= \frac{P[D_2 = b \cap D_1 = b]}{P[D_1 = b]} \\ &= \frac{\frac{1}{2} \cdot 1 \cdot 1 + \frac{1}{2}\cdot \frac{1}{2} \cdot 0}{\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot\frac{1}{2}} \\ &= \frac{\frac{1}{2}}{\frac{3}{4}} = \frac{2}{3} \end{align} \]

And this corresponds to answer \(d\).
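We can also run a quick Monte Carlo check of that \(\frac{2}{3}\); a sketch, assuming the two-bag setup described above:

# Draw both balls from a randomly chosen bag, without replacement, and
# estimate P[D2 = b | D1 = b]; the estimate should approach 2/3.
set.seed(1)
bags <- list(c("b", "b"), c("b", "w"))
draws <- replicate(100000, sample(bags[[sample.int(2, 1)]]))
mean(draws[2, draws[1, ] == "b"] == "b")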

Exercise 4

The probability of getting no red marbles is the probability that all \(10\) independent Bernoulli trials fail, where each trial draws a red marble with probability \(0.55\). This gives \((1 - 0.55)^{10}\):

p <- (1 - 0.55)**10
formatC(p, format = 'e', digits = 3)
## [1] "3.405e-04"

So answer \(b\).

Exercise 5

This is given by \(1\) minus the probability that each and every sample has at least one red marble in it. The probability that a sample has at least one red marble, in turn, is \(1 - p\), where \(p\) is the probability we calculated in Exercise 4. Therefore:

round(1 - (1 - p)**1000, digits = 3)
## [1] 0.289
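Equivalently, treating each of the \(1000\) samples as a Bernoulli trial that "succeeds" (contains no red marble) with probability \(p\):

# P[at least one of the 1000 samples has no red marble].
pbinom(0, size = 1000, prob = p, lower.tail = FALSE)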

And that corresponds to answer \(c\).

Exercises 7-10: Linear Perceptron

For the next exercises we will need a linear perceptron. We start by coding one up:

# Perceptron hypothesis: the sign of each point's inner product with w.
pla_h <- function(X, w)  {
  sign(X %*% w)
}

# One PLA step: pick a random misclassified point and nudge w towards
# classifying it correctly. Returns NULL when no point is misclassified.
pla_it <- function(X, z, w) {
  k <- which(pla_h(X, w) != z)
  if (is_empty(k)) {
    return(NULL)
  }
  k <- k[sample.int(length(k), 1)]
  w + z[k] * X[k,]
}

# Runs PLA for at most n iterations, or until convergence, returning the
# full history of weight vectors starting from the initial w.
pla <- function(X, z, n = Inf, w = NULL) {
  w <- if (is.null(w)) rep(0, dim(X)[2]) else w
  c(list(as.list(w)), {
    w <- pla_it(X, z, w)
    if (!is.null(w) && n > 0) {
      pla(X, z, n - 1, w)
    }
  })
}

# Turns a weight vector w = (w0, w1, w2) into the intercept and slope of
# the boundary w0 + w1*x + w2*y = 0, i.e. y = -w0/w2 - (w1/w2)*x.
# (The all-zero initial w maps to the line y = 0.)
pla_line <- function(w) {
  coef <- c(intercept = -w[1]/w[3], slope = -w[2]/w[3])
  coef[is.nan(coef)] <- 0
  coef
}

We will also need code to generate data points scattered uniformly over the \([-1, 1]^2\) square, and to build the initial linearly separable dataset from them:

# n points uniform over the [smin, smax] square, with a leading column
# of 1s for the bias coordinate.
generate_points <- function(n, smin, smax) {
  cbind(1, matrix(runif(2*n, smin, smax), ncol = 2)) 
}

# A random line through two points drawn uniformly from the square,
# returned as (intercept, slope).
generate_line <- function(smin, smax) {
  x <- runif(2, smin, smax)
  y <- runif(2, smin, smax)
  slope <- (y[2] - y[1]) / (x[2] - x[1])
  c(
    intercept = y[1] - slope * x[1],
    slope = slope
  )
}

# A linearly separable dataset: the points X, the target line w
# (generated at random if not given), and the labels z assigned by the
# target function.
generate_ls_dataset <- function(n, w = NULL, target = f) {
  X <- generate_points(n, -1, 1)
  w <- if (is.null(w)) generate_line(-1, 1) else w
  list(
    X = X,
    w = w,
    z = target(X, w)
  )
}

# Split a points matrix into its coordinate columns and its last column;
# no_bias = TRUE drops the leading column of 1s.
get_x <- function(X, no_bias = FALSE) X[, (no_bias + 1):(ncol(X) - 1)]
get_y <- function(X) X[, ncol(X)]

# The target function: labels each point by the side of the line
# w = (intercept, slope) it falls on, via sign(intercept + slope*x - y).
f <- function(X, w) {
  general_f(X, c(w, -1))
}

# Labels points by the sign of their inner product with a full weight
# vector.
general_f <- function(X, w) {
  sign(X %*% w)
}
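As a quick smoke test of these pieces (a hypothetical run; the seed is arbitrary), the final weights in the history should classify every training point correctly:

# Generate a separable dataset, run PLA to convergence, and check the
# final hypothesis against the training labels.
set.seed(1)
D <- generate_ls_dataset(n = 20)
w_final <- unlist(tail(pla(D$X, D$z), 1))
all(pla_h(D$X, w_final) == D$z)  # TRUE once PLA has converged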

Exercise 7

We run \(1000\) simulations and look at the statistics for the convergence time, measured as the length of the weight history returned by pla.

summary(
  unlist(mclapply(
    1:1000,
    function(x) {
      D <- generate_ls_dataset(n = 10)  
      length(pla(
        X = D$X, 
        z = D$z, 
        w = c(0, 0, 0)
      ))
    },
    mc.cores = detectCores()
  ))
)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0     4.0     7.0    10.5    12.0   220.0

My perceptron converges in \(\sim 10\) iterations, on average, which is closer to \(15\) than it is to \(5\), so we get answer \(b\). We plot runs of the perceptron in Figures 1 and 2, and they indeed seem to converge to the right boundary.
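The plot_pla helper is defined in the notebook source linked at the top. A minimal sketch of what it does, assuming history is the list of weight vectors returned by pla:

# Sketch of plot_pla: scatter the labeled points, draw every hypothesis
# in the PLA history as a faint line, and overlay the target in red.
plot_pla <- function(D, history, last_only = TRUE) {
  if (last_only) history <- tail(history, 1)
  lines <- map_dfr(history, function(w) {
    coef <- pla_line(unlist(w))
    tibble(intercept = coef[["intercept"]], slope = coef[["slope"]])
  })
  ggplot(tibble(x = D$X[, 2], y = D$X[, 3], class = factor(D$z))) +
    geom_point(aes(x, y, colour = class)) +
    geom_abline(aes(intercept = intercept, slope = slope),
                data = lines, alpha = 0.3) +
    geom_abline(intercept = D$w[["intercept"]], slope = D$w[["slope"]],
                colour = "red") +
    coord_cartesian(xlim = c(-1, 1), ylim = c(-1, 1))
}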

D <- generate_ls_dataset(n = 10)
plot_pla(
  D, pla(D$X, D$z), last_only = FALSE
)

Figure 1: Perceptron after convergence with 10 data points.

D <- generate_ls_dataset(n = 400)
plot_pla(
  D, pla(D$X, D$z), last_only = FALSE
)

Figure 2: Perceptron after convergence with 400 data points.

Exercise 8

We will shoot random points at the board and count how many fall in the region between \(f\) and \(g\), where the two disagree.

# Estimates P[f != g] as the disagreement rate between the target line t
# and the hypothesis line g over n uniform points.
error_sim <- function(t, g, n) {
  X <- generate_points(n, -1, 1)
  sum(f(X, t) != f(X, g)) / n
}
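As a sanity check, a line compared against itself should disagree nowhere:

# error_sim of a target against itself is exactly zero.
w_check <- generate_line(-1, 1)
error_sim(t = w_check, g = w_check, n = 1000)  # 0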

We will run \(1000\) trials, in each computing the fraction of disagreeing points over a batch of \(100\,000\). We then average those fractions together. Averages-of-averages are fine as long as the batches are all the same size:

\[ \frac{\frac{1}{n}\sum_{i=1}^n s_i + \frac{1}{n}\sum_{i=n+1}^{2n} s_i}{2} = \frac{\frac{1}{n}\sum_{i=1}^{2n}s_i }{2} = \frac{1}{2n}\sum_{i=1}^{2n}s_i \]
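A quick numeric illustration of that identity:

# The mean of two equal-size batch means equals the overall mean.
s <- runif(10)
all.equal(mean(c(mean(s[1:5]), mean(s[6:10]))), mean(s))  # TRUE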

And here we go.

summary(
  unlist(mclapply(
    1:1000,
    function(x) {
      D <- generate_ls_dataset(n = 10) 
      error_sim(
        t = D$w, 
        g = pla_line(unlist(pla(D$X, D$z) %>% tail(1))), 
        n = 100000
      )
    },
     mc.cores = detectCores()
  ))
)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00305 0.05694 0.10477 0.20347 0.18980 0.98333

So answer \(c\), as the mean is closer to \(0.1\) than it is to \(0.5\).

Exercise 9

We rerun the convergence experiment from Exercise 7, this time with \(N = 100\) training points.

summary(
  unlist(mclapply(
    1:1000,
    function(x) {
      D <- generate_ls_dataset(n = 100)  
      length(pla(
        X = D$X, 
        z = D$z, 
        w = c(0, 0, 0)
      ))
    },
    mc.cores = detectCores()
  ))
)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   34.75   62.00  113.55  118.50 2732.00

So answer \(b\), as the mean is closer to \(100\) than it is to \(500\).

Exercise 10

We run the simulation as before.

summary(
  unlist(mclapply(
    1:1000,
    function(x) {
      D <- generate_ls_dataset(n = 100) 
      error_sim(
        t = D$w, 
        g = pla_line(unlist(pla(D$X, D$z) %>% tail(1))), 
        n = 100000
      )
    },
    mc.cores = detectCores()
  ))
)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00035 0.00596 0.01085 0.03007 0.01879 0.99653

So answer \(b\), as the mean is closer to \(0.01\) than it is to \(0.1\).