Attention is All You Need
Most sequence processing today relies on recurrent or convolutional models in an encoder-decoder configuration, and the best performing models connect the encoder and decoder through an attention mechanism. I'll describe my recent work at Google on the Transformer, a simple network architecture based solely on attention mechanisms. Experiments on neural machine translation tasks show improvements over the previous best results at a fraction of the training cost of state-of-the-art recurrent models.
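The core operation behind the Transformer is scaled dot-product attention: each query is compared against all keys, the resulting scores are normalized with a softmax, and the output is a weighted sum of the values. A minimal NumPy sketch (shapes and names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarities, scaled
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of values

# Toy example: 2 queries attending over 3 key/value pairs, d_k = d_v = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimensionality, which would otherwise push the softmax into regions with vanishing gradients.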