As your eyes move across this document, the image hitting your retina is constantly shifting. Yet you hardly notice. One reason you do not is that your brain recognizes letters regardless of their position in your field of view.
Consider the following image. Look first at the blue dot and then at the red one. Notice that the number ‘2’ between them is recognizable whichever dot you focus on, even though its image falls on a completely different set of neurons each time.
This is an important observation for anyone who wants to understand how the brain processes images, and equally for anyone who wants to teach a computer to process them. In fact, the algorithms that are best at image recognition learn a representation of the world that accounts for many shifts of focus, what mathematicians call [translations][translations].
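To make "translation" concrete: shifting an image moves every pixel by the same offset, so the shape is unchanged and only its position differs. The sketch below shows this in base R on a 28 x 28 binary matrix. `translate()` and the toy stroke are hypothetical illustrations, not part of the analysis that follows, and pixels shifted in from outside the frame are assumed to be background (0).

```r
# Hypothetical helper: shift a binary image dx columns right and dy rows down.
# Pixels that move in from outside the frame are filled with 0 (background).
translate <- function(img, dx = 0, dy = 0) {
  out <- matrix(0, nrow(img), ncol(img))
  rows <- seq_len(nrow(img))
  cols <- seq_len(ncol(img))
  # Keep only the source rows/columns that still land inside the frame
  src_r <- rows[rows + dy >= 1 & rows + dy <= nrow(img)]
  src_c <- cols[cols + dx >= 1 & cols + dx <= ncol(img)]
  out[src_r + dy, src_c + dx] <- img[src_r, src_c]
  out
}

img <- matrix(0, 28, 28)
img[10:18, 14] <- 1              # a vertical stroke, like a crude "1"
shifted <- translate(img, dx = 5)

which(shifted[10, ] == 1)        # 19: the stroke moved 5 columns right
sum(shifted) == sum(img)         # TRUE: same shape, new position
```

Shifting back with `translate(shifted, dx = -5)` recovers the original, because no part of the stroke left the frame.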
However, as we all know, images do not just translate; they also reverse, rotate, and distort in many ways we have no words for. Yet each of these variants remains perfectly recognizable to our visual system.
The aforementioned algorithms do not account for all of these reasonable distortions; they certainly handle far fewer than our brains do. Moreover, the rules of translation are programmed into these algorithms explicitly; the algorithms do not learn them the way they learn to recognize digits.
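One common way translation is built in (for example in convolutional neural networks) is weight sharing: the same small filter is slid over every position of the image, so shifting the input simply shifts the response. The base-R sketch below illustrates that property. `conv2d()`, the filter, and the toy images are hypothetical, and real networks use optimized library routines rather than loops like these.

```r
# Hypothetical helper: "valid" 2-D cross-correlation (no padding). The same
# kernel k is applied at every position, which makes the operation
# translation-equivariant.
conv2d <- function(img, k) {
  out <- matrix(0, nrow(img) - nrow(k) + 1, ncol(img) - ncol(k) + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      window <- img[i:(i + nrow(k) - 1), j:(j + ncol(k) - 1)]
      out[i, j] <- sum(window * k)
    }
  }
  out
}

# A 3 x 3 filter that responds to vertical lines: every row is (-1, 2, -1)
vertical_line <- matrix(c(-1, 2, -1), nrow = 3, ncol = 3, byrow = TRUE)

img <- matrix(0, 12, 12)
img[3:8, 5] <- 1                 # vertical stroke in column 5
shifted <- matrix(0, 12, 12)
shifted[3:8, 8] <- 1             # the same stroke, 3 columns to the right

a <- conv2d(img, vertical_line)
b <- conv2d(shifted, vertical_line)

# The response map moves by the same 3 columns as the input
all.equal(a[, 1:7], b[, 4:10])   # TRUE
```

Nothing in `conv2d()` is learned about shifts; the equivariance follows from reusing one kernel everywhere, which is the sense in which the rule is programmed in rather than learned.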
The number of distortions that can happen to an image is infinite, but that does not mean that every distortion is valid.
Ever since I heard of the success of algorithms that rely on translation, I have wondered whether such rules can be learned. I now propose that they can, and submit the following experiment as evidence. In it, I show that we can learn that flipping an image upside down is a valid transformation, but randomly rearranging its pixels is not.
# Libraries
library(ggplot2)
library(reshape2)
library(plyr)
# Util
# TODO factor out theme_nothing
theme_nothing = function(...) theme(
  axis.line=element_blank(),
  axis.text.x=element_blank(),
  axis.text.y=element_blank(),
  axis.ticks=element_blank(),
  axis.title.x=element_blank(),
  axis.title.y=element_blank(),
  legend.position="none",
  panel.background=element_blank(),
  panel.border=element_blank(),
  panel.grid.major=element_blank(),
  panel.grid.minor=element_blank(),
  plot.background=element_blank(),
  strip.text.x = element_blank(),
  strip.background = element_blank())
order_factor_levels_by_frequency <- function(f) factor(f, levels=names(table(f)[order(table(f), decreasing=TRUE)]))
y <- function(pixel) -floor(pixel / 28) + 28
x <- function(pixel) pixel %% 28
concatenate_columns <- function(df) do.call("paste", df)
pixel <- function(x, y) (28 - y) * 28 + x
#pixel(0, 28) == 0
#pixel(17, 25) == 101
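Because the helpers above convert between a pixel index and (x, y) coordinates, a quick sanity check is cheap. This sketch copies `x()`, `y()` and `pixel()` so it runs on its own, and uses a hypothetical one-row data frame `df` to show that `pixel()` returns a 0-based index while R data frame columns are 1-based.

```r
# Copies of the helpers above, so this block is self-contained
y <- function(pixel) -floor(pixel / 28) + 28
x <- function(pixel) pixel %% 28
pixel <- function(x, y) (28 - y) * 28 + x

p <- 0:783
all(pixel(x(p), y(p)) == p)      # TRUE: the round trip is lossless
range(x(p))                      # 0 27
range(y(p))                      # 1 28 (y = 28 is the top row)

# Hypothetical one-row data frame with columns named "0".."783", like `data` below
df <- data.frame(matrix(0, 1, 784))
names(df) <- 0:783
names(df)[pixel(13, 13)]         # "432": selecting by position is off by one
names(df)[pixel(13, 13) + 1]     # "433": the pixel at x = 13, y = 13
```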
# Data
data <- "http://dl.dropboxusercontent.com/u/1131693/mnist_tiny.csv" # sample from http://www.pjreddie.com/projects/mnist-in-csv/
data <- read.csv(data)
# Format
# Convert the data to the following long format, with one row per pixel
# per image:
#
# id        (1 to n)    a unique number for each image
# pattern   (text)      the image's pixel values pasted into one string
# pixel     (0 to 783)  the position of the pixel
# intensity (0 or 1)    binary, whether the pixel is on or off
#
# The coordinates of a pixel, x (0 to 27) and y (1 to 28), are computed
# on demand with x(pixel) and y(pixel); the digit labels are kept in the
# separate `digit` vector.
#
# e.g.
#
# id pixel intensity
#  1     0         0
#  1     1         0
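# Split off the label column and keep only the 784 pixel columns
digit <- data[,1]
data <- data[,2:ncol(data)]
# Binarize: a pixel is "on" (1) if its grayscale value (0 to 255) exceeds 100
data <- data.frame(llply(data, function(col) as.numeric(col > 100)))
# Name each column by its 0-based pixel index
names(data) <- c(0:783)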
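The thresholding step can be checked on a few made-up values. This sketch uses hypothetical grayscale values and base `lapply()` in place of `plyr::llply()`, and compares the column-by-column approach with an equivalent comparison on the whole matrix. Note that the comparison is strict, so a value of exactly 100 stays off.

```r
# Hypothetical grayscale values (0 to 255) for two images of four pixels
gray <- data.frame(a = c(0, 255), b = c(101, 100), c = c(30, 180), d = c(100, 99))

# Column by column, as above (base lapply instead of plyr::llply)
by_column <- as.data.frame(lapply(gray, function(col) as.numeric(col > 100)))

# The same threshold applied to the whole matrix at once
by_matrix <- as.data.frame((as.matrix(gray) > 100) * 1)

all.equal(by_column, by_matrix)  # TRUE
by_matrix$b                      # 1 0: 101 is on, exactly 100 is off
```

The matrix form avoids the per-column function call; either way the result is a data frame of 0/1 doubles with the original column names.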
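# Reshape wide data (one column per pixel) into the long format above
elongate <- function(d=data){
  # Columns sort by name as strings ("0", "1", "10", "100", ...), so for the
  # full 784 columns the pattern string is not in spatial order; it is only
  # used as a grouping key
  d <- d[,order(names(d))]
  d$pattern = concatenate_columns(d)
  d$id <- 1:(nrow(d))
  d <- melt(d,
            id.vars=c('id', 'pattern'),
            variable.name='pixel',
            value.name='intensity')
  d$pixel <- as.numeric(as.character(d$pixel))
  # Sort by image, then by pixel
  d <- d[order(d$id, d$pixel),]
  d
}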
long <- elongate()
head(long, 2)
## id
## 1 1
## 100 1
## pattern
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## pixel intensity
## 1 0 0
## 100 1 0
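For readers unfamiliar with `reshape2::melt()`, here is a hedged base-R sketch of the same wide-to-long reshape on a hypothetical toy dataset of two four-pixel images. It mirrors what `elongate()` produces (pattern key, image id, one row per pixel) but is an illustration, not a drop-in replacement.

```r
# Hypothetical toy data: two images of four pixels, columns named "0".."3"
wide <- data.frame(c(0, 1), c(1, 1), c(0, 0), c(1, 0))
names(wide) <- 0:3
pixels <- as.character(0:3)

# One string per image, used as a grouping key (like concatenate_columns())
wide$pattern <- do.call(paste, wide[, pixels])
wide$id <- seq_len(nrow(wide))

# Base-R equivalent of melt(wide, id.vars = c("id", "pattern")):
# one row per (image, pixel) pair
long <- data.frame(
  id        = rep(wide$id, times = length(pixels)),
  pattern   = rep(wide$pattern, times = length(pixels)),
  pixel     = rep(0:3, each = nrow(wide)),
  intensity = unlist(wide[, pixels], use.names = FALSE)
)
long <- long[order(long$id, long$pixel), ]
long
```

`unlist()` stacks the pixel columns one after another, which is why `id` repeats with `times` and `pixel` repeats with `each`.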
# Show sample digits
# Highlight areas of interest
g <- long[long$id <= 6,]
g$of_interest <- with(g, as.numeric(((x(pixel) == 13) & (y(pixel) %in% c(13:15)))))
ggplot(g, aes(x(pixel), y(pixel), alpha=intensity)) +
  geom_tile() +
  geom_tile(alpha=0.5, aes(fill=of_interest)) +
  facet_wrap(~ id) +
  theme_nothing()
# What are the common pixel patterns
# of three pixels one atop another?
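# Note: pixel() returns a 0-based index while R columns are 1-based, so
# selecting by position picks the columns named "432", "404" and "376"
# (x = 12, one pixel left of the highlighted area); pixel(...) + 1 would
# match the highlight exactly.
g <- elongate(data[,pixel(x=13, y=13:15)])
ggplot(g, aes(x(pixel), y(pixel), alpha=intensity)) +
  geom_tile() +
  facet_wrap(~ pattern + id) +
  theme_nothing()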
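The faceted plot shows one panel per image; to see which three-pixel patterns are actually common, one could also tabulate the pattern strings. A minimal sketch on hypothetical toy values (five images, columns ordered top, middle, bottom), using the same paste-the-columns key as `concatenate_columns()`:

```r
# Hypothetical toy values for a 3-pixel vertical window in five images
window <- data.frame(top    = c(0, 1, 1, 0, 1),
                     middle = c(0, 1, 1, 0, 1),
                     bottom = c(0, 1, 0, 0, 1))

# One key per image, e.g. "1 1 0"
patterns <- do.call(paste, window)

# How often each pattern occurs, and its share of all images
counts <- table(patterns)
counts
prop.table(counts)
```

Applied to the three selected pixel columns of the real data, the same two calls would list the observed patterns and their frequencies.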
# What if we flip them upside down?
# i.e.
# 1 3
# 2 -> 2
# 3 1
g <- data[,pixel(x=13, y=13:15)]  # columns "432", "404", "376": pixel() is 0-based, R columns are 1-based
new_order <- c(3, 2, 1)
names(g) <- names(g)[new_order]
head(g, 2)
## 376 404 432
## 1 0 0 0
## 2 1 1 0
g <- elongate(g)
ggplot(g, aes(x(pixel), y(pixel), alpha=intensity)) +
  geom_tile() +
  facet_wrap(~ pattern + id) +
  theme_nothing()
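The relabeling above flips a three-pixel window about its middle pixel. The same idea extends to a whole image: flipping upside down maps y to 29 - y (y runs from 1 to 28) and leaves x alone. This sketch copies the coordinate helpers so it runs on its own; `flip_vertical()` and the random test image are hypothetical.

```r
# Copies of the coordinate helpers, so this block is self-contained
y <- function(pixel) -floor(pixel / 28) + 28
x <- function(pixel) pixel %% 28
pixel <- function(x, y) (28 - y) * 28 + x

# Hypothetical helper: where pixel p lands after an upside-down flip
flip_vertical <- function(p) pixel(x(p), 29 - y(p))

p <- 0:783
setequal(flip_vertical(p), p)               # TRUE: a rearrangement of pixels
all(flip_vertical(flip_vertical(p)) == p)   # TRUE: flipping twice undoes it

# Apply it to a random binary image stored as 784 values in row-major order
set.seed(1)
img <- as.numeric(runif(784) > 0.8)
flipped <- numeric(784)
flipped[flip_vertical(p) + 1] <- img[p + 1] # +1: R vectors are 1-based

# Same result as reversing the rows of the 28 x 28 matrix
m <- matrix(img, 28, 28, byrow = TRUE)      # row 1 = top of the image
all(matrix(flipped, 28, 28, byrow = TRUE) == m[28:1, ])  # TRUE
```

Seen this way, a flip is just one particular permutation of the 784 pixel labels, the same kind of object as the "unlikely" rearrangement that follows.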
# What if we re-arrange them in an unlikely way?
# i.e.
# 1 1
# 2 -> 3
# 3 2
g <- data[,pixel(x=13, y=13:15)]  # columns "432", "404", "376": pixel() is 0-based, R columns are 1-based
new_order <- c(1, 3, 2)
names(g) <- names(g)[new_order]
head(g, 2)
## 432 376 404
## 1 0 0 0
## 2 1 1 0
g <- elongate(g)
ggplot(g, aes(x(pixel), y(pixel), alpha=intensity)) +
  geom_tile() +
  facet_wrap(~ pattern + id) +
  theme_nothing()
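One possible way to turn these pictures into a number, offered as a hypothetical sketch rather than the method used above: estimate how often each three-pixel pattern occurs, then score a rearrangement by how common the patterns it produces are. A transformation that maps common patterns to common patterns scores about as well as doing nothing; one that creates rare patterns scores lower. The toy windows, the add-one smoothing, and `score()` are all assumptions.

```r
# Hypothetical toy sample of 3-pixel vertical windows (top, middle, bottom).
# Pen strokes are contiguous, so patterns with gaps such as "1 0 1" never occur.
windows <- rbind(
  matrix(c(0, 0, 0), 40, 3, byrow = TRUE),
  matrix(c(1, 1, 1), 30, 3, byrow = TRUE),
  matrix(c(1, 1, 0), 10, 3, byrow = TRUE),
  matrix(c(0, 1, 1), 10, 3, byrow = TRUE),
  matrix(c(1, 0, 0),  5, 3, byrow = TRUE),
  matrix(c(0, 0, 1),  5, 3, byrow = TRUE)
)

key <- function(m) apply(m, 1, paste, collapse = " ")

# Pattern frequencies, with add-one smoothing so unseen patterns are rare
# rather than impossible
all_patterns <- key(as.matrix(expand.grid(0:1, 0:1, 0:1)))
counts <- table(factor(key(windows), levels = all_patterns)) + 1
freq <- counts / sum(counts)

# Mean log-frequency of the patterns produced by reordering the three pixels
score <- function(new_order) {
  mean(log(as.numeric(freq[key(windows[, new_order])])))
}

score(c(1, 2, 3))  # unchanged
score(c(3, 2, 1))  # upside down: equal to the unchanged score on this sample
score(c(1, 3, 2))  # middle and bottom swapped: lower
```

On the real data, `windows` would hold the selected pixel columns of each image instead of these made-up rows.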