# Overview of Decoder-only Transformer Models #flashcard
A transformer model that uses only the decoder component. Like the encoder, the decoder produces a numerical representation of each word, but it uses a different attention mechanism.
The decoder component has the following characteristics:
- Uni-directional - the representation of a word only considers the words that come before it in the sentence
- Masked self-attention - the numerical representation of a word does not take into account words to its 'right' (i.e. later in the sentence); those subsequent words are 'masked' out (see the sketch after this list). In principle the mask could be flipped to hide the left context instead, but decoders conventionally mask the right.
- Auto-regressive - each word the decoder generates is fed back in as input when generating the next word
<!--ID: 1750528252711-->
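
To make the masking concrete, here is a minimal NumPy sketch of masked (causal) self-attention. It is illustrative only: the query/key/value weight matrices of a real attention head are omitted, and raw dot products stand in for learned scoring.

```python
import numpy as np

def masked_self_attention(x):
    """Toy single-head self-attention with a causal mask.

    x: (seq_len, d) array of word vectors. Weight matrices are omitted
    for brevity; scores are plain scaled dot products.
    """
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)                   # (seq_len, seq_len) similarities
    # Causal mask: position i may only attend to positions j <= i,
    # so everything above the diagonal (the 'right' context) is hidden.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # each row ignores later words
```

For a four-word sentence, the resulting attention weights form a lower-triangular matrix: word 1 attends only to itself, while word 4 attends to all four words.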
## Examples
- [[CTRL]]
- [[GPT]]
- [[Transformer XL]]
# Key Considerations
## Pros
## Cons
# Use Cases
- Generative tasks, most notably text generation (see the decoding sketch below)
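
The auto-regressive loop behind text generation can be sketched as greedy decoding. `model` here is a hypothetical callable that maps a sequence of token ids to next-token logits (real library APIs differ):

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy auto-regressive decoding.

    `model` is an assumed interface: it takes a list of token ids and
    returns logits over the vocabulary for the next token.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)               # scores for the next token
        next_id = int(np.argmax(logits))  # greedy: take the most likely token
        ids.append(next_id)               # the output becomes the next input
        if next_id == eos_id:
            break                         # stop at end-of-sequence, if given
    return ids
```

Sampling strategies (temperature, top-k, nucleus) would replace the `argmax` step, but the feed-back structure of the loop stays the same.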
# Related Topics