Attention
Some well-explained blog articles on the attention mechanism.
Attention and Augmented Recurrent Neural Networks
Reference: https://distill.pub/2016/augmented-rnns/
Our guess is that these “augmented RNNs” will have an important role to play in extending deep learning’s capabilities over the coming years.
Attention Is All You Need
Reference: https://arxiv.org/abs/1706.03762
Video Explanation: https://www.youtube.com/watch?v=iDulhoQ2pro
Attention? Attention!
Reference: https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
Attention has been a fairly popular concept and a useful tool in the deep learning community in recent years. In this post, we look into how attention was invented, and at various attention mechanisms and models, such as the Transformer and SNAIL.
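To make the core idea behind these posts concrete, here is a minimal sketch (not from any of the linked articles) of attention as a softmax-weighted sum: a query scores each encoder state, the scores become weights, and the weights mix the states into a context vector. The state and query values below are made-up toy data.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder states (4 positions, dimension 3) and one decoder query.
states = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0]])
query = np.array([1.0, 0.0, 0.0])

scores = states @ query          # one similarity score per position
weights = softmax(scores)        # scores normalized to a distribution
context = weights @ states       # weighted sum of the states
```

The weights sum to 1, so the context vector is a convex combination of the states, leaning toward the positions most similar to the query.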
Transformers from Scratch
Reference: http://www.peterbloem.nl/blog/transformers
Transformers are a very exciting family of machine learning architectures. Many good tutorials exist (e.g. [1, 2]) but in the last few years, transformers have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work. This post is an attempt to explain directly how modern transformers work, and why, without some of the historical baggage.
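As a rough illustration of what such a post covers, here is a sketch of the scaled dot-product self-attention at the heart of the Transformer, assuming single-head attention and randomly initialized projection matrices (both simplifications, not the full architecture):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (t, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scale by sqrt(d_k) to keep the softmax inputs in a reasonable range.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax: each position attends over all positions.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
t, d = 5, 8                                   # toy sequence length and model dim
x = rng.normal(size=(t, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, wq, wk, wv)           # same shape as the input: (5, 8)
```

A real Transformer block adds multiple heads, an output projection, residual connections, layer normalization, and a feed-forward sublayer on top of this core operation.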