Attention is all you get

For the past decade, there has been a new major architectural fad in deep learning every year or two.
For the past two years, one such fad has been the transformer, an attention-based model that has superseded RNNs in most sequence-learning applications. I'll give an overview of the model, discuss some non-physics applications, and suggest some possibilities for physics.