Transformers

GPT Neo

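Overview

The GPTNeo model was released in the EleutherAI/gpt-neo repository by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2-like causal language model trained on the Pile dataset.

The architecture is similar to GPT2, except that GPT Neo uses local attention in every other layer, with a window size of 256 tokens.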
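The alternating attention pattern can be illustrated with a small pure-Python sketch. The helper names `layer_attention_types` and `local_causal_mask` are hypothetical and not part of transformers. The sketch assumes the first layer is global and that a local layer lets each position attend to itself and at most the `window_size - 1` positions before it.

```python
def layer_attention_types(num_layers):
    """Alternate global and local attention, starting with global (assumed order)."""
    return ["global" if i % 2 == 0 else "local" for i in range(num_layers)]


def local_causal_mask(seq_len, window_size):
    """Return mask[i][j] == True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the sliding window (i - j < window_size).
        [j <= i and i - j < window_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Tiny example: 6 positions with a window of 3 (GPT Neo itself uses 256).
mask = local_causal_mask(seq_len=6, window_size=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

print(layer_attention_types(4))
```

Each printed row marks the key positions one query may attend to. In a local layer no row has more than `window_size` marks, whereas a global causal layer would let the last position see every earlier one.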

This model was contributed by valhalla.

Usage example

The generate() method can be used to generate text using the GPT Neo model.

>>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer

>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
>>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

>>> prompt = (
...     "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
...     "researchers was the fact that the unicorns spoke perfect English."
... )

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
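To give a rough intuition for what `do_sample=True` with `temperature=0.9` does, here is a conceptual sketch of temperature sampling in plain Python. It is not the code path that generate() runs; the function names and the three-element `logits` list are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
print(softmax_with_temperature(logits, temperature=0.9))
print(sample_token(logits, temperature=0.9, rng=random.Random(0)))
```

A temperature below 1 shifts probability mass toward the highest-scoring token, so 0.9 makes sampling slightly more conservative than sampling from the unscaled distribution.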

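Combining GPT-Neo and Flash Attention 2

First, make sure to install the latest version of Flash Attention 2 so that it includes the sliding window attention feature, and make sure your hardware is compatible with Flash-Attention 2. For more details on installation, refer to the Flash Attention 2 installation instructions.

Also make sure to load your model in half-precision (e.g. torch.float16).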
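A compatibility check along these lines could be sketched as follows. Both helper names are hypothetical, and the threshold is an assumption: the sketch treats CUDA compute capability 8.0 or newer as compatible, which should be confirmed against the Flash Attention 2 documentation for the installed version. It falls back to the "eager" attention implementation when torch, a GPU, or the `flash_attn` package is missing.

```python
import importlib.util


def supports_flash_attention_2(major, minor=0):
    """Assumption: Flash Attention 2 needs CUDA compute capability 8.0 or newer."""
    return (major, minor) >= (8, 0)


def pick_attn_implementation():
    """Return "flash_attention_2" when the setup looks compatible, else "eager"."""
    try:
        import torch
    except ImportError:
        return "eager"
    if not torch.cuda.is_available():
        return "eager"
    if importlib.util.find_spec("flash_attn") is None:
        return "eager"  # the flash-attn package is not installed
    major, minor = torch.cuda.get_device_capability()
    if supports_flash_attention_2(major, minor):
        return "flash_attention_2"
    return "eager"


print(pick_attn_implementation())
```

The returned string can be passed as `attn_implementation` to `from_pretrained`, together with `torch_dtype=torch.float16` when Flash Attention 2 is selected.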

To load and run a model using Flash Attention 2, refer to the snippet below:

>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B", torch_dtype=torch.float16, attn_implementation="flash_attention_2")
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

>>> prompt = "def hello_world():"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
>>> model = model.to(device)

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
'def hello_world():\n    >>> run_script("hello.py")\n    >>> exit(0)\n<|endoftext|>'

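Expected speedups

Below is an expected speedup diagram that compares pure inference time between the native implementation in transformers and the Flash Attention 2 version of the model, both using the EleutherAI/gpt-neo-2.7B checkpoint. Note that GPT-Neo cannot be trained or run on very long contexts, because its maximum position embeddings are limited to 2048. This limit applies to all gpt-neo models and is not specific to FA-2.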
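The 2048-position limit also bounds how many tokens can be generated after a given prompt. A minimal sketch of that arithmetic, using a hypothetical helper `max_new_tokens_budget` that is not part of transformers:

```python
MAX_POSITIONS = 2048  # position-embedding limit stated above for GPT Neo


def max_new_tokens_budget(prompt_length, requested, max_positions=MAX_POSITIONS):
    """Clamp the requested number of new tokens so prompt + output fits the limit."""
    if prompt_length >= max_positions:
        raise ValueError(
            f"prompt has {prompt_length} tokens but only {max_positions} positions exist"
        )
    return min(requested, max_positions - prompt_length)


print(max_new_tokens_budget(prompt_length=2000, requested=100))  # 48
print(max_new_tokens_budget(prompt_length=20, requested=100))    # 100
```

In practice `prompt_length` would be the number of tokens in the encoded prompt, for example `model_inputs.input_ids.shape[1]`, and the result would be passed as `max_new_tokens`.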

Resources

GPTNeoConfig

class transformers.GPTNeoConfig

GPTNeoModel

class transformers.GPTNeoModel

forward

GPTNeoForCausalLM

class transformers.GPTNeoForCausalLM

forward

GPTNeoForQuestionAnswering

class transformers.GPTNeoForQuestionAnswering

forward

GPTNeoForSequenceClassification

class transformers.GPTNeoForSequenceClassification

forward

GPTNeoForTokenClassification

class transformers.GPTNeoForTokenClassification

forward

FlaxGPTNeoModel

class transformers.FlaxGPTNeoModel

__call__

FlaxGPTNeoForCausalLM

class transformers.FlaxGPTNeoForCausalLM

__call__
