Built using RStudio 1.0.35 notebook.

Problem

What is the pattern of the following equations?

Source: http://www.analyticbridge.com/forum/topics/interesting-math-brain-teaser

It takes a human few seconds/minutes to find the solution.

But, can we solve this problem using machine learning?

Machine Learning Solving

Dataset

(data <- data.frame(x1 = c(6, 9, 8, 5, 7, 9, 10, 15), x2 = c(4, 2, 5, 2, 6, 8, 6, 3), y = c(210, 711, 313, 37, 113, 117, 416, 1218)))

Multinomial Linear Regression

Let’s start by coding few helpers.

Compute models

get.predict <- function(model, train, test, max.degrees = 5) {
    lapply(
      1:max.degrees,
      function(d) {
        m <- model(train, d)
        suppressWarnings(round(predict(m, test)))
      }
    )
}
get.predict.lm <- function(train, test, max.degrees = 5) {
  get.predict(
    function(train, degree) {
      lm(y ~ poly(x1, degree = degree, raw = T) + poly(x2, degree = degree, raw = T), train)
    },
    train,
    test,
    max.degrees
  )
}

Compute errors

get.errors <- function(y, pred) {
  if (!is.list(pred)) {
    pred <- list(pred)
  }
  err <- lapply(
    pred,
    function(p) {
      sum((y - p) ^ 2) / length(y)
    }
  )
  df <- data.frame(
      degree = 1:length(err),
      error = unlist(err)
  )
  return(df)
}

Prediction

pred <- get.predict.lm(data, data[, c("x1", "x2")])
(err <- get.errors(data$y, pred))

Only 4 degrees are required to correctly predict result (error = 0).

Reasons are:

  • We used a (very) small dataset, which allows model to learn from it, and easily overfit.
  • We predicted with the same dataset used to train the model, which is not a good way to do in machine learning.

Dataset++

Let’s generate a bigger dataset.

get.dataset <- function(to = 20) {
    l <- lapply(
      1:to,
      function(x) {
        as.data.frame(
          t(
            sapply(
              1:x,
              function(y) {
                c(x, y, strtoi(paste0(ifelse(x == y, "", sprintf("%s", x - y)), sprintf("%s", x + y))))
              }
            )
          )
        )
      }
    )
    df <- do.call(rbind, l)
    names(df) <- c("x1", "x2", "y")
    
    return(df)
}
# Will generate a dataset of 10585 samples
data.pp <- get.dataset(150)
plot.bt <- function(d) {
  cols <- terrain.colors(max(d$y))
  plot(
    d$x1, d$x2,
    xlab = "x1", ylab = "x2",
    col = cols[d$y],
    pch = 19,
    cex.axis = .8,
    cex.lab = .7
  )
}
plot.bt(data.pp)

get.train.test <- function(data, train.pct = .8){
  train.idx <- sample(1:nrow(data), nrow(data) * train.pct)
  return(list(train = data[train.idx,], test = data[-train.idx,]))
}
eval.model <- function(f.pred, data, train.pct = .8, max.degrees = 20) {
  tt <- get.train.test(data, train.pct)
  train <- tt$train
  test <- tt$test
  
  pred <- f.pred(train, test[, c("x1", "x2")], max.degrees)
  err <- get.errors(test$y, pred)
  
  return(list(pred = pred, err = err))
}
eval.model.lm <- function(data, train.pct = .8, max.degrees = 20) {
  eval.model(
    function(train, test, max.degrees) {
      get.predict.lm(train, test, max.degrees = max.degrees)
    },
    data,
    train.pct = train.pct,
    max.degrees = max.degrees
  )
}
plot.eval.model <- function(em) {
  plot(
    em$err$degree,
    em$err$error,
    type = "b",
    ylim = c(0, max(em$err$error)),
    xlab = "degree",
    ylab = "error",
    sub = paste0("Smallest error = ", round(min(em$err$error))),
    pch = 19,
    cex.axis = .8,
    cex.lab = .7
  )
}
em.lm <- eval.model.lm(data.pp)
plot.eval.model(em.lm)

When we try to apply some machine learning with train/test dataset, multinomial linear regression doesn’t succeed in predicting correct values.

This is because there is no linearity in the pattern (due to the specific pattern (x1, x2) => (x1 - x2)(x1 + x2).

Feature Engineering

Let’s add 2 extra features (f1 and f2).

data.ppf <- data.pp
data.ppf$f1 <- data.ppf$x1 + data.ppf$x2
data.ppf$f2 <- as.integer((data.ppf$x1 - data.ppf$x2) * 10 ^ ceiling(log10(data.ppf$f1)))
data.ppf$x1 <- NULL
data.ppf$x2 <- NULL
names(data.ppf) <- c("y", "x1", "x2")
head(data.ppf, 20)
plot.bt(data.ppf)

em.ppf <- eval.model.lm(data.ppf)
plot.eval.model(em.ppf)

While still far from being perfect, feature engineering helped minimizing error (1.450495e+07 vs 1.00194e+08). Note better approximation is not surprising as new features include brain teaser logic.

Neural Network

Let’s try with a neural network, supposed to better suit non-linear problems.

Train neural network

Resources:

library(nnet)
maxs <- apply(data.pp, 2, max)
mins <- apply(data.pp, 2, min)
tt.nn <- get.train.test(as.data.frame(scale(data.pp, center = mins, scale = maxs)))
model.nn <- nnet(
  y ~ x1 + x2,
  data = tt.nn$train,
  size = 5,
  maxit = 500,
  trace = F
)

Prediction

Even if neural network got a lower error (3.628796e+06), we are still far from perfection, as its capacity to find the correct prediction is poor (2 exact guesses for 2265 items).

Conclusion

This concludes my attempt to predict this brain teaser using machine learning.

My feeling is machine learning is better in finding approximations than predicting exact values, as there is always some uncertainty.

I’m convinced there is a better way to do, but I must admit my knowledge is to limited to go further.

Please do not hesitate to contact me if you have remarks or suggestions. They will be very welcome.

---
title: "Brain Teaser"
output:
  html_notebook: default
  html_document: default
  pdf_document: default
---

*Built using RStudio 1.0.35 notebook.*

## Problem

What is the pattern of the following equations?

- 6 + 4 = 210
- 9 + 2 = 711
- 8 + 5 = 313
- 5 + 2 = 37
- 7 + 6 = 113
- 9 + 8 = 117
- 10 + 6 = 416
- 15 + 3 = 1218

Source: http://www.analyticbridge.com/forum/topics/interesting-math-brain-teaser

It takes a human few seconds/minutes to find the solution.

But, can we solve this problem using machine learning?

## Machine Learning Solving

### Dataset

```{r dataset}
(data <- data.frame(x1 = c(6, 9, 8, 5, 7, 9, 10, 15), x2 = c(4, 2, 5, 2, 6, 8, 6, 3), y = c(210, 711, 313, 37, 113, 117, 416, 1218)))
```

### Multinomial Linear Regression

Let's start by coding few helpers.

#### Compute models

```{r compute_models}
get.predict <- function(model, train, test, max.degrees = 5) {
    lapply(
      1:max.degrees,
      function(d) {
        m <- model(train, d)
        suppressWarnings(round(predict(m, test)))
      }
    )
}

get.predict.lm <- function(train, test, max.degrees = 5) {
  get.predict(
    function(train, degree) {
      lm(y ~ poly(x1, degree = degree, raw = T) + poly(x2, degree = degree, raw = T), train)
    },
    train,
    test,
    max.degrees
  )
}
```

#### Compute errors

```{r compute_errors}
get.errors <- function(y, pred) {
  if (!is.list(pred)) {
    pred <- list(pred)
  }
  err <- lapply(
    pred,
    function(p) {
      sum((y - p) ^ 2) / length(y)
    }
  )
  df <- data.frame(
      degree = 1:length(err),
      error = unlist(err)
  )
  return(df)
}
```

#### Prediction

```{r}
pred <- get.predict.lm(data, data[, c("x1", "x2")])
(err <- get.errors(data$y, pred))
```

Only 4 degrees are required to correctly predict result (error = 0).

Reasons are:

- We used a (very) small dataset, which allows model to learn from it, and easily overfit.
- We predicted with the same dataset used to train the model, which is not a good way to do in machine learning.

### Dataset++

Let's generate a bigger dataset.

```{r dataset_plusplus}
get.dataset <- function(to = 20) {
    l <- lapply(
      1:to,
      function(x) {
        as.data.frame(
          t(
            sapply(
              1:x,
              function(y) {
                c(x, y, strtoi(paste0(ifelse(x == y, "", sprintf("%s", x - y)), sprintf("%s", x + y))))
              }
            )
          )
        )
      }
    )

    df <- do.call(rbind, l)
    names(df) <- c("x1", "x2", "y")
    
    return(df)
}
```

```{r}
# Will generate a dataset of 10585 samples
data.pp <- get.dataset(150)
```

```{r terrain_plot}
plot.bt <- function(d) {
  cols <- terrain.colors(max(d$y))
  plot(
    d$x1, d$x2,
    xlab = "x1", ylab = "x2",
    col = cols[d$y],
    pch = 19,
    cex.axis = .8,
    cex.lab = .7
  )
}
```

```{r}
plot.bt(data.pp)
```

```{r}
get.train.test <- function(data, train.pct = .8){
  train.idx <- sample(1:nrow(data), nrow(data) * train.pct)

  return(list(train = data[train.idx,], test = data[-train.idx,]))
}

eval.model <- function(f.pred, data, train.pct = .8, max.degrees = 20) {
  tt <- get.train.test(data, train.pct)
  train <- tt$train
  test <- tt$test
  
  pred <- f.pred(train, test[, c("x1", "x2")], max.degrees)
  err <- get.errors(test$y, pred)
  
  return(list(pred = pred, err = err))
}

eval.model.lm <- function(data, train.pct = .8, max.degrees = 20) {
  eval.model(
    function(train, test, max.degrees) {
      get.predict.lm(train, test, max.degrees = max.degrees)
    },
    data,
    train.pct = train.pct,
    max.degrees = max.degrees
  )
}

plot.eval.model <- function(em) {
  plot(
    em$err$degree,
    em$err$error,
    type = "b",
    ylim = c(0, max(em$err$error)),
    xlab = "degree",
    ylab = "error",
    sub = paste0("Smallest error = ", round(min(em$err$error))),
    pch = 19,
    cex.axis = .8,
    cex.lab = .7
  )
}
```

```{r}
em.lm <- eval.model.lm(data.pp)
plot.eval.model(em.lm)
```

When we try to apply some machine learning with train/test dataset, multinomial linear regression doesn't succeed in predicting correct values.

This is because there is **no linearity** in the pattern (due to the specific pattern **(x1, x2) => (x1 - x2)(x1 + x2)**.

### Feature Engineering

Let's add 2 extra features (f1 and f2).

```{r}
data.ppf <- data.pp
data.ppf$f1 <- data.ppf$x1 + data.ppf$x2
data.ppf$f2 <- as.integer((data.ppf$x1 - data.ppf$x2) * 10 ^ ceiling(log10(data.ppf$f1)))
data.ppf$x1 <- NULL
data.ppf$x2 <- NULL
names(data.ppf) <- c("y", "x1", "x2")
```

```{r}
head(data.ppf, 20)
```

```{r}
plot.bt(data.ppf)
```

```{r}
em.ppf <- eval.model.lm(data.ppf)
plot.eval.model(em.ppf)
```

While still far from being perfect, feature engineering helped minimizing error (`r format(min(em.ppf$err$error), scientific = T)` vs `r format(min(em.lm$err$error), scientific = T)`). Note better approximation is not surprising as new features include brain teaser logic.

### Neural Network

Let's try with a neural network, supposed to better suit non-linear problems.

#### Train neural network

Resources:

- http://datascienceplus.com/fitting-neural-network-in-r/
- http://www.di.fc.ul.pt/~jpn/r/neuralnets/neuralnets.html
- https://heuristically.wordpress.com/2011/11/17/using-neural-network-for-regression/

```{r}
library(nnet)

maxs <- apply(data.pp, 2, max)
mins <- apply(data.pp, 2, min)

tt.nn <- get.train.test(as.data.frame(scale(data.pp, center = mins, scale = maxs)))

model.nn <- nnet(
  y ~ x1 + x2,
  data = tt.nn$train,
  size = 20,
  maxit = 500,
  trace = F
)
```

#### Prediction

```{r}
pred.nn <- predict(model.nn, tt.nn$test[c("x1", "x2")])
```

```{r}
# Un-scale
nn.check <- as.data.frame(
  cbind(
    round(tt.nn$test$y * (maxs["y"] - mins["y"]) + mins["y"]),
    round(pred.nn * (maxs["y"] - mins["y"]) + mins["y"])
  )
)
names(nn.check) <- c("true", "guess")

err.nn <- get.errors(nn.check$true, nn.check$guess)

boxplot(
  nn.check$true - nn.check$guess,
  horizontal = T,
  main = "Estimation Error Distribution",
  sub = paste0("Error = ", round(err.nn$error)),
  cex.axis = .8,
  cex.lab = .7,
  col = "lightgreen"
)
```

Even if neural network got a lower error (`r format(err.nn$error, scientific = T)`), we are still far from perfection, as its capacity to find the correct prediction is poor (`r nrow(nn.check[nn.check$true == nn.check$guess,])` exact guesses for `r nrow(nn.check)` items).

## Conclusion

This concludes my attempt to predict this brain teaser using machine learning.

My feeling is machine learning is better in finding approximations than predicting exact values, as there is always some uncertainty.

I'm convinced there is a better way to do, but I must admit my knowledge is to limited to go further.

Please do not hesitate to contact me if you have remarks or suggestions. They will be very welcome.
