What does a "difficult" MNIST digit look like?

A question I've asked myself repeatedly. It's always interesting when a new deep learning architecture beats the state of the art. The MNIST test set contains 10,000 images. At the time of writing, Hinton's capsule networks have achieved the state of the art with 0.25% test error, which translates to 25 misclassified digits. Not bad at all. But what do those digits look like? And how does this compare to human performance?

In this blog post I'll try to build some intuition for how the state of the art compares to human performance by looking at the MNIST digits that a simple convnet, written in Keras, gets wrong.
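
To make the setup concrete, here is a minimal sketch of the kind of model I mean. The specific architecture, hyperparameters, and training settings below are my own illustrative choices, not a fixed recipe; the point is just to train a small convnet and collect the test digits it misclassifies.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST: 60,000 training and 10,000 test images of 28x28 grayscale digits
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0

# A small convnet: two conv/pool stages followed by a dense classifier
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

# Collect the indices of the misclassified test digits for inspection
preds = model.predict(x_test).argmax(axis=1)
wrong = np.where(preds != y_test)[0]
print(f"{len(wrong)} of {len(y_test)} test digits misclassified")
```

With the misclassified indices in hand, plotting `x_test[wrong]` alongside the true and predicted labels is all it takes to eyeball the "difficult" digits.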