# Overview of Decoder-only Transformer Models #flashcard
A transformer model that uses only the decoder component. Like the encoder, the decoder produces a numerical representation of each word, but it uses a different attention mechanism. The decoder component has the following characteristics:
- Uni-directional - only considers the words before a given word in a sentence
- Masked self-attention - the numerical representation of a word does not consider words to its 'right' (i.e. later in the sentence); those subsequent words are 'masked', so each word only attends to the words on its left (see the causal-mask sketch at the end of this note)
- Auto-regressive - each word it generates is fed back into the decoder as input for the next step (see the generation-loop sketch at the end of this note)
<!--ID: 1750528252711-->

## Examples
- [[CTRL]]
- [[GPT]]
- [[Transformer XL]]

# Key Considerations

# Pros

# Cons

# Use Cases
- Good for generative tasks
- Text generation

# Related Topics
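
# Sketches
A minimal sketch of masked (causal) self-attention, assuming plain numpy rather than any specific framework; the function name and shapes are illustrative. The key step is the mask that sets attention scores for 'future' positions to negative infinity, so they get zero weight after the softmax:

```python
import numpy as np

def masked_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (seq_len, d_model).
    Position i may only attend to positions 0..i (words to its left).
    """
    seq_len, d_model = q.shape
    scores = q @ k.T / np.sqrt(d_model)  # (seq_len, seq_len)
    # Causal mask: positions j > i (words to the 'right') are masked out
    # with -inf so they receive zero weight after the softmax.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (seq_len, d_model)

# Toy usage: 4 tokens, 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(masked_self_attention(x, x, x).shape)  # (4, 8)
```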
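
A minimal sketch of the auto-regressive loop, assuming a hypothetical `decoder` callable that maps a token sequence to next-token scores (real models like [[GPT]] work on tensors of token IDs, but the feed-back structure is the same):

```python
import numpy as np

def generate(decoder, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy auto-regressive decoding: each generated token is
    appended to the input and fed back into the decoder."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = decoder(ids)             # scores over the vocabulary
        next_id = int(np.argmax(scores))  # greedy pick; sampling is also common
        ids.append(next_id)               # feed the new token back in
        if eos_id is not None and next_id == eos_id:
            break
    return ids

# Toy decoder (hypothetical): always prefers token (last_id + 1) % 10.
toy_decoder = lambda ids: np.eye(10)[(ids[-1] + 1) % 10]
print(generate(toy_decoder, [3], max_new_tokens=5))  # [3, 4, 5, 6, 7, 8]
```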