The graph below shows the amount of time it takes to train a network to classify
pictures from the ImageNet corpus (an image database) with a high degree of
accuracy. This metric is a proxy for the time it takes well-resourced actors in the AI
field to train large networks to perform AI tasks, such as image classification.
Because image classification is a relatively generic supervised learning task,
progress on this metric also correlates with faster training times for other AI
applications. In a year and a half, the time required to train such a network has
fallen from about one hour to about four minutes.
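The figures above imply roughly a fifteen-fold speedup; a trivial check of that arithmetic, using the approximate endpoints quoted in the text:

```python
# Rough speedup implied by the (approximate) figures in the text:
# training time fell from ~1 hour to ~4 minutes.
hours_before = 1.0      # approximate training time a year and a half ago
minutes_after = 4.0     # approximate training time today
speedup = (hours_before * 60) / minutes_after
print(speedup)  # → 15.0
```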
The ImageNet training time metric also reflects the industrialization of AI research.
The factors that go into reducing ImageNet training time include algorithmic
innovations and infrastructure investments (e.g., in the underlying hardware used
to train the system, or in the software used to connect that hardware together).
The graph below shows the performance of AI systems on the task of determining
the syntactic structure of sentences (parsing).
Parsing is a first step toward understanding natural language in certain
tasks, such as question answering. Originally done using algorithms similar to
those used for parsing programming languages, it is now almost universally done
using deep learning. Since 2003, F1 scores for all sentences have increased by 9
percentage points (roughly a 10% relative increase).
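F1 here is the harmonic mean of bracketing precision and recall; a minimal sketch of the computation, with hypothetical bracket counts for illustration:

```python
# Minimal sketch of the F1 metric used to score parsers: the harmonic
# mean of bracketing precision and recall. The bracket counts below are
# hypothetical, for illustration only.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# In parser evaluation: precision = correct brackets / predicted brackets,
# recall = correct brackets / gold-standard brackets.
correct, predicted, gold = 90, 95, 100   # hypothetical counts
p, r = correct / predicted, correct / gold
print(round(f1_score(p, r), 4))  # → 0.9231
```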
The graph below shows the performance of AI systems on the task of translating
news from English to German and from German to English.
BLEU scores for translations from English to German are 3.5x greater today than
they were in 2008; scores for translations from German to English are 2.5x
greater over the same time frame. Because each year uses different test sets,
scores are not perfectly comparable across years (we believe this contributes to
the drop in 2017 — see more in the appendix). Still, BLEU scores indicate
progress in machine translation.
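BLEU itself is the geometric mean of modified n-gram precisions, scaled by a brevity penalty; a simplified single-reference sketch (production implementations such as sacreBLEU add smoothing and tokenization details omitted here):

```python
# Simplified single-reference BLEU: the geometric mean of modified n-gram
# precisions, scaled by a brevity penalty. Real implementations (e.g.
# sacreBLEU) add smoothing and tokenization details omitted here.
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / sum(cand.values())))
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(log_precisions) / max_n)

# A candidate identical to its reference scores 1.0.
sent = "the cat sat on the mat".split()
print(bleu(sent, sent))  # → 1.0
```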