vllm.inputs

Modules:

Name	Description
`data`
`parse`
`preprocess`
`registry`

DecoderOnlyInputs `module-attribute` ¶

DecoderOnlyInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]

The inputs in LLMEngine before they are passed to the model executor. This specifies the required data for decoder-only models.

INPUT_REGISTRY `module-attribute` ¶

INPUT_REGISTRY = InputRegistry()

The global InputRegistry, which LLMEngine uses to dispatch data processing according to the target model.

ProcessorInputs `module-attribute` ¶

ProcessorInputs = Union[
    DecoderOnlyInputs, EncoderDecoderInputs
]

The outputs from vllm.inputs.preprocess.InputPreprocessor.

PromptType `module-attribute` ¶

PromptType = Union[
    SingletonPrompt, ExplicitEncoderDecoderPrompt
]

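Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:

- A text prompt (str or TextPrompt)
- A tokenized prompt (TokensPrompt)
- An embeddings prompt (EmbedsPrompt)
- A single data structure containing both an encoder and a decoder prompt (ExplicitEncoderDecoderPrompt)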

SingletonInputs `module-attribute` ¶

SingletonInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]

A processed SingletonPrompt which can be passed to vllm.sequence.Sequence.
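Each member of this union carries a `type` field (documented further down for TokenInputs and EmbedsInputs), so code that consumes processed inputs can branch on it. The following is a minimal sketch, assuming local stand-in TypedDicts that mirror only the fields shown on this page; nothing is imported from vllm, MultiModalInputs is left out, and `describe` is a hypothetical helper.

```python
from typing import Literal, TypedDict, Union


# Local stand-ins mirroring the documented fields; NOT the vllm classes.
class TokenInputs(TypedDict):
    type: Literal["token"]
    prompt_token_ids: list[int]


class EmbedsInputs(TypedDict):
    type: Literal["embeds"]
    prompt_embeds: object  # a torch.Tensor in vLLM; kept opaque here


SingletonInputs = Union[TokenInputs, EmbedsInputs]


def describe(inputs: SingletonInputs) -> str:
    """Branch on the `type` discriminator (hypothetical helper)."""
    if inputs["type"] == "token":
        return f"token inputs with {len(inputs['prompt_token_ids'])} ids"
    if inputs["type"] == "embeds":
        return "embedding inputs"
    raise ValueError(f"unexpected type: {inputs['type']!r}")


print(describe({"type": "token", "prompt_token_ids": [1, 2, 3]}))
```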

SingletonPrompt `module-attribute` ¶

SingletonPrompt = Union[
    str, TextPrompt, TokensPrompt, EmbedsPrompt
]

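Set of possible schemas for a single prompt:

- A text prompt (str or TextPrompt)
- A tokenized prompt (TokensPrompt)
- An embeddings prompt (EmbedsPrompt)

Note that "singleton" is as opposed to a data structure which encapsulates multiple prompts, i.e. the sort of data structure which may be used with encoder/decoder models when the user wishes to express both the encoder and decoder prompts explicitly, such as ExplicitEncoderDecoderPrompt.

A prompt of type SingletonPrompt may be employed (1) as input to a decoder-only model, (2) as input to the encoder of an encoder/decoder model, in the scenario where the decoder prompt is not specified explicitly, or (3) as a member of a larger data structure encapsulating more than one prompt, such as ExplicitEncoderDecoderPrompt.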
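Because TypedDict instances are ordinary dicts at runtime, the singleton forms can be told apart by shape. The sketch below assumes plain dicts with the keys documented for TextPrompt and TokensPrompt on this page; `prompt_kind` is a hypothetical helper, and the embeddings form is omitted because its fields are not listed here.

```python
from typing import Union


def prompt_kind(prompt: Union[str, dict]) -> str:
    """Classify a singleton prompt by its shape (hypothetical helper)."""
    if isinstance(prompt, str):
        return "text"
    if "prompt_token_ids" in prompt:
        return "tokens"
    if "prompt" in prompt:
        return "text"
    raise TypeError("not one of the singleton forms covered in this sketch")


examples = [
    "Hello, world",                     # bare str
    {"prompt": "Hello, world"},         # TextPrompt-shaped dict
    {"prompt_token_ids": [101, 7592]},  # TokensPrompt-shaped dict
]
kinds = [prompt_kind(p) for p in examples]
print(kinds)
```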

__all__ `module-attribute` ¶

__all__ = [
    "TextPrompt",
    "TokensPrompt",
    "PromptType",
    "SingletonPrompt",
    "ExplicitEncoderDecoderPrompt",
    "TokenInputs",
    "EmbedsInputs",
    "token_inputs",
    "embeds_inputs",
    "DecoderOnlyInputs",
    "EncoderDecoderInputs",
    "ProcessorInputs",
    "SingletonInputs",
    "build_explicit_enc_dec_prompt",
    "to_enc_dec_tuple_list",
    "zip_enc_dec_prompts",
    "INPUT_REGISTRY",
    "DummyData",
    "InputContext",
    "InputProcessingContext",
    "InputRegistry",
]

DummyData ¶

Bases: NamedTuple

Dummy data used for profiling.

Note: This is only used in V0.

Source code in vllm/inputs/registry.py

class DummyData(NamedTuple):
    """
    Dummy data used for profiling.

    Note: This is only used in V0.
    """

    seq_data: SequenceData
    multi_modal_data: Optional[MultiModalDataDict] = None
    multi_modal_placeholders: Optional[MultiModalPlaceholderDict] = None

multi_modal_data `class-attribute` `instance-attribute` ¶

multi_modal_data: Optional[MultiModalDataDict] = None

multi_modal_placeholders `class-attribute` `instance-attribute` ¶

multi_modal_placeholders: Optional[
    MultiModalPlaceholderDict
] = None

seq_data `instance-attribute` ¶

seq_data: SequenceData

EmbedsInputs ¶

Bases: TypedDict

Represents embeddings-based inputs.

Source code in vllm/inputs/data.py

class EmbedsInputs(TypedDict):
    """Represents embeddings-based inputs."""

    type: Literal["embeds"]
    """The type of inputs."""

    prompt_embeds: torch.Tensor
    """The embeddings of the prompt."""

    cache_salt: NotRequired[str]
    """
    Optional cache salt to be used for prefix caching.
    """

cache_salt `instance-attribute` ¶

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

prompt_embeds `instance-attribute` ¶

prompt_embeds: Tensor

The embeddings of the prompt.

type `instance-attribute` ¶

type: Literal['embeds']

The type of inputs.

EncoderDecoderInputs ¶

Bases: TypedDict

The inputs in LLMEngine before they are passed to the model executor.

This specifies the required data for encoder-decoder models.

Source code in vllm/inputs/data.py

class EncoderDecoderInputs(TypedDict):
    """
    The inputs in [`LLMEngine`][vllm.engine.llm_engine.LLMEngine] before they
    are passed to the model executor.

    This specifies the required data for encoder-decoder models.
    """

    encoder: Union[TokenInputs, "MultiModalInputs"]
    """The inputs for the encoder portion."""

    decoder: Union[TokenInputs, "MultiModalInputs"]
    """The inputs for the decoder portion."""

decoder `instance-attribute` ¶

decoder: Union[TokenInputs, MultiModalInputs]

The inputs for the decoder portion.

encoder `instance-attribute` ¶

encoder: Union[TokenInputs, MultiModalInputs]

The inputs for the encoder portion.

ExplicitEncoderDecoderPrompt ¶

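Bases: TypedDict, Generic[_T1_co, _T2_co]

Represents an encoder/decoder model input prompt, comprising an explicit encoder prompt and a decoder prompt.

The encoder and decoder prompts may each be formatted according to any of the SingletonPrompt schemas, and are not required to use the same schema.

Only the encoder prompt may have multi-modal data. mm_processor_kwargs should be set at the top level, not inside the encoder or decoder prompt, since it is agnostic to the encoder/decoder split.

Note that an ExplicitEncoderDecoderPrompt may not be used as an input to a decoder-only model, and that the encoder_prompt and decoder_prompt fields of this data structure must themselves be SingletonPrompt instances.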

Source code in vllm/inputs/data.py

class ExplicitEncoderDecoderPrompt(TypedDict, Generic[_T1_co, _T2_co]):
    """
    Represents an encoder/decoder model input prompt,
    comprising an explicit encoder prompt and a decoder prompt.

    The encoder and decoder prompts, respectively, may be formatted
    according to any of the
    [`SingletonPrompt`][vllm.inputs.data.SingletonPrompt] schemas,
    and are not required to have the same schema.

    Only the encoder prompt may have multi-modal data. mm_processor_kwargs
    should be at the top-level, and should not be set in the encoder/decoder
    prompts, since they are agnostic to the encoder/decoder.

    Note that an
    [`ExplicitEncoderDecoderPrompt`][vllm.inputs.data.ExplicitEncoderDecoderPrompt]
    may not be used as an input to a decoder-only model,
    and that the `encoder_prompt` and `decoder_prompt`
    fields of this data structure themselves must be
    [`SingletonPrompt`][vllm.inputs.data.SingletonPrompt] instances.
    """

    encoder_prompt: _T1_co

    decoder_prompt: Optional[_T2_co]

    mm_processor_kwargs: NotRequired[dict[str, Any]]

decoder_prompt `instance-attribute` ¶

decoder_prompt: Optional[_T2_co]

encoder_prompt `instance-attribute` ¶

encoder_prompt: _T1_co

mm_processor_kwargs `instance-attribute` ¶

mm_processor_kwargs: NotRequired[dict[str, Any]]

InputContext `dataclass` ¶

Contains information about the model which may be used to modify the inputs.

Source code in vllm/inputs/registry.py

@dataclass(frozen=True)
class InputContext:
    """
    Contains information about the model which may be used to
    modify the inputs.
    """

    model_config: ModelConfig
    """The configuration of the model."""

    def get_hf_config(
        self,
        typ: Union[type[_C], tuple[type[_C], ...]] = PretrainedConfig,
        /,
    ) -> _C:
        """
        Get the HuggingFace configuration
        (`transformers.PretrainedConfig`) of the model,
        additionally checking its type.

        Raises:
            TypeError: If the configuration is not of the specified type.
        """
        hf_config = self.model_config.hf_config
        if not isinstance(hf_config, typ):
            raise TypeError("Invalid type of HuggingFace config. "
                            f"Expected type: {typ}, but "
                            f"found type: {type(hf_config)}")

        return hf_config

    def get_hf_image_processor_config(self) -> dict[str, Any]:
        """
        Get the HuggingFace image processor configuration of the model.
        """
        return self.model_config.hf_image_processor_config

    def get_mm_config(self):
        """
        Get the multimodal config of the model.

        Raises:
            RuntimeError: If the model is not a multimodal model.
        """
        mm_config = self.model_config.multimodal_config
        if mm_config is None:
            raise RuntimeError("Not a multimodal model")

        return mm_config

    def get_hf_processor(
        self,
        typ: Union[type[_P], tuple[type[_P], ...]] = ProcessorMixin,
        /,
        **kwargs: object,
    ) -> _P:
        """
        Get the HuggingFace processor
        (`transformers.ProcessorMixin`) of the model,
        additionally checking its type.

        Raises:
            TypeError: If the processor is not of the specified type.
        """
        return cached_processor_from_config(
            self.model_config,
            processor_cls=typ,
            **kwargs,
        )

    def init_processor(
        self,
        typ: type[_T],
        /,
        **kwargs: object,
    ) -> _T:
        """
        Initialize a HuggingFace-like processor class, merging the
        keyword arguments with those in the model's configuration.
        """
        mm_config = self.model_config.get_multimodal_config()
        base_kwargs = mm_config.mm_processor_kwargs
        if base_kwargs is None:
            base_kwargs = {}

        merged_kwargs = {**base_kwargs, **kwargs}

        return typ(**merged_kwargs)

model_config `instance-attribute` ¶

model_config: ModelConfig

The configuration of the model.

__init__ ¶

__init__(model_config: ModelConfig) -> None

get_hf_config ¶

get_hf_config(
    typ: Union[
        type[_C], tuple[type[_C], ...]
    ] = PretrainedConfig,
) -> _C

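Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.

Raises:

Type	Description
`TypeError`	If the configuration is not of the specified type.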

Source code in vllm/inputs/registry.py

def get_hf_config(
    self,
    typ: Union[type[_C], tuple[type[_C], ...]] = PretrainedConfig,
    /,
) -> _C:
    """
    Get the HuggingFace configuration
    (`transformers.PretrainedConfig`) of the model,
    additionally checking its type.

    Raises:
        TypeError: If the configuration is not of the specified type.
    """
    hf_config = self.model_config.hf_config
    if not isinstance(hf_config, typ):
        raise TypeError("Invalid type of HuggingFace config. "
                        f"Expected type: {typ}, but "
                        f"found type: {type(hf_config)}")

    return hf_config
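The `typ` argument goes straight into `isinstance`, so it may be a single class or a tuple of classes. The following sketch illustrates that check in isolation; the config classes are hypothetical stand-ins for transformers.PretrainedConfig subclasses, and `checked_config` is an illustrative function rather than part of vLLM.

```python
class BaseConfig:
    """Stand-in for transformers.PretrainedConfig (assumption)."""


class LlamaLikeConfig(BaseConfig):
    """Hypothetical model-specific config."""


class OtherConfig(BaseConfig):
    """Another hypothetical config."""


def checked_config(config, typ=BaseConfig):
    # Same pattern as the listing above: isinstance accepts a class
    # or a tuple of classes.
    if not isinstance(config, typ):
        raise TypeError("Invalid type of config. "
                        f"Expected type: {typ}, but "
                        f"found type: {type(config)}")
    return config


cfg = checked_config(LlamaLikeConfig(), (LlamaLikeConfig, OtherConfig))

try:
    checked_config(OtherConfig(), LlamaLikeConfig)
except TypeError as exc:
    print(f"rejected: {exc}")
```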

get_hf_image_processor_config ¶

get_hf_image_processor_config() -> dict[str, Any]

Get the HuggingFace image processor configuration of the model.

Source code in vllm/inputs/registry.py

def get_hf_image_processor_config(self) -> dict[str, Any]:
    """
    Get the HuggingFace image processor configuration of the model.
    """
    return self.model_config.hf_image_processor_config

get_hf_processor ¶

get_hf_processor(
    typ: Union[
        type[_P], tuple[type[_P], ...]
    ] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P

Get the HuggingFace processor (transformers.ProcessorMixin) of the model, additionally checking its type.

Raises:

Type	Description
`TypeError`	If the processor is not of the specified type.

Source code in vllm/inputs/registry.py

def get_hf_processor(
    self,
    typ: Union[type[_P], tuple[type[_P], ...]] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P:
    """
    Get the HuggingFace processor
    (`transformers.ProcessorMixin`) of the model,
    additionally checking its type.

    Raises:
        TypeError: If the processor is not of the specified type.
    """
    return cached_processor_from_config(
        self.model_config,
        processor_cls=typ,
        **kwargs,
    )

get_mm_config ¶

get_mm_config()

Get the multimodal config of the model.

Raises:

Type	Description
`RuntimeError`	If the model is not a multimodal model.

Source code in vllm/inputs/registry.py

def get_mm_config(self):
    """
    Get the multimodal config of the model.

    Raises:
        RuntimeError: If the model is not a multimodal model.
    """
    mm_config = self.model_config.multimodal_config
    if mm_config is None:
        raise RuntimeError("Not a multimodal model")

    return mm_config

init_processor ¶

init_processor(typ: type[_T], /, **kwargs: object) -> _T

Initialize a HuggingFace-like processor class, merging the keyword arguments with those in the model's configuration.

Source code in vllm/inputs/registry.py

def init_processor(
    self,
    typ: type[_T],
    /,
    **kwargs: object,
) -> _T:
    """
    Initialize a HuggingFace-like processor class, merging the
    keyword arguments with those in the model's configuration.
    """
    mm_config = self.model_config.get_multimodal_config()
    base_kwargs = mm_config.mm_processor_kwargs
    if base_kwargs is None:
        base_kwargs = {}

    merged_kwargs = {**base_kwargs, **kwargs}

    return typ(**merged_kwargs)
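The merge `{**base_kwargs, **kwargs}` means that a keyword passed at the call site overrides the same key from the model configuration. A minimal sketch of that precedence, assuming a hypothetical `FakeProcessor` class and an illustrative free function in place of the method:

```python
from typing import Any, Optional


class FakeProcessor:
    """Hypothetical HuggingFace-like processor, used only for illustration."""

    def __init__(self, size: int = 224, do_resize: bool = True) -> None:
        self.size = size
        self.do_resize = do_resize


def init_with_merged_kwargs(typ, base_kwargs: Optional[dict[str, Any]],
                            **kwargs: Any):
    # Mirrors the listing above: config-level kwargs first,
    # call-site kwargs second, so the latter win on conflicts.
    if base_kwargs is None:
        base_kwargs = {}
    merged_kwargs = {**base_kwargs, **kwargs}
    return typ(**merged_kwargs)


proc = init_with_merged_kwargs(
    FakeProcessor,
    {"size": 336, "do_resize": False},  # as if read from the model config
    size=448,                           # call-site override
)
print(proc.size, proc.do_resize)
```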

InputProcessingContext `dataclass` ¶

Bases: InputContext

Source code in vllm/inputs/registry.py

@dataclass(frozen=True)
class InputProcessingContext(InputContext):
    tokenizer: AnyTokenizer
    """The tokenizer used to tokenize the inputs."""

    def get_hf_processor(
        self,
        typ: Union[type[_P], tuple[type[_P], ...]] = ProcessorMixin,
        /,
        **kwargs: object,
    ) -> _P:
        return super().get_hf_processor(
            typ,
            tokenizer=self.tokenizer,
            **kwargs,
        )

    def call_hf_processor(
        self,
        hf_processor: ProcessorMixin,
        data: Mapping[str, object],
        kwargs: Mapping[str, object] = {},
    ) -> Union[BatchFeature, JSONTree]:
        """
        Call `hf_processor` on the prompt `data`
        (text, image, audio...) with configurable options `kwargs`.
        """
        assert callable(hf_processor)

        mm_config = self.model_config.get_multimodal_config()
        merged_kwargs = mm_config.merge_mm_processor_kwargs(kwargs)

        allowed_kwargs = get_allowed_kwarg_only_overrides(
            hf_processor,
            merged_kwargs,
            requires_kw_only=False,
            allow_var_kwargs=True,
        )

        def maybe_cast_dtype(x):
            # This mimics the behavior of transformers.BatchFeature
            if isinstance(x, torch.Tensor) and x.is_floating_point():
                return x.to(dtype=self.model_config.dtype)
            return x

        try:
            output = hf_processor(**data,
                                  **allowed_kwargs,
                                  return_tensors="pt")
            # this emulates output.to(dtype=self.model_config.dtype)
            if isinstance(output, BatchFeature):
                cast_output = json_map_leaves(maybe_cast_dtype, output.data)
                return BatchFeature(cast_output)

            cast_output = json_map_leaves(maybe_cast_dtype, output)

            logger.warning_once(
                f"{type(hf_processor).__name__} did not return `BatchFeature`. "
                "Make sure to match the behaviour of `ProcessorMixin` when "
                "implementing custom processors.")
            return cast_output

        except Exception as exc:
            msg = (f"Failed to apply {type(hf_processor).__name__} "
                   f"on data={data} with kwargs={allowed_kwargs}")

            raise ValueError(msg) from exc

tokenizer `instance-attribute` ¶

tokenizer: AnyTokenizer

The tokenizer used to tokenize the inputs.

__init__ ¶

__init__(
    model_config: ModelConfig, tokenizer: AnyTokenizer
) -> None

call_hf_processor ¶

call_hf_processor(
    hf_processor: ProcessorMixin,
    data: Mapping[str, object],
    kwargs: Mapping[str, object] = {},
) -> Union[BatchFeature, JSONTree]

Call hf_processor on the prompt data (text, image, audio...) with configurable options kwargs.

Source code in vllm/inputs/registry.py

def call_hf_processor(
    self,
    hf_processor: ProcessorMixin,
    data: Mapping[str, object],
    kwargs: Mapping[str, object] = {},
) -> Union[BatchFeature, JSONTree]:
    """
    Call `hf_processor` on the prompt `data`
    (text, image, audio...) with configurable options `kwargs`.
    """
    assert callable(hf_processor)

    mm_config = self.model_config.get_multimodal_config()
    merged_kwargs = mm_config.merge_mm_processor_kwargs(kwargs)

    allowed_kwargs = get_allowed_kwarg_only_overrides(
        hf_processor,
        merged_kwargs,
        requires_kw_only=False,
        allow_var_kwargs=True,
    )

    def maybe_cast_dtype(x):
        # This mimics the behavior of transformers.BatchFeature
        if isinstance(x, torch.Tensor) and x.is_floating_point():
            return x.to(dtype=self.model_config.dtype)
        return x

    try:
        output = hf_processor(**data,
                              **allowed_kwargs,
                              return_tensors="pt")
        # this emulates output.to(dtype=self.model_config.dtype)
        if isinstance(output, BatchFeature):
            cast_output = json_map_leaves(maybe_cast_dtype, output.data)
            return BatchFeature(cast_output)

        cast_output = json_map_leaves(maybe_cast_dtype, output)

        logger.warning_once(
            f"{type(hf_processor).__name__} did not return `BatchFeature`. "
            "Make sure to match the behaviour of `ProcessorMixin` when "
            "implementing custom processors.")
        return cast_output

    except Exception as exc:
        msg = (f"Failed to apply {type(hf_processor).__name__} "
               f"on data={data} with kwargs={allowed_kwargs}")

        raise ValueError(msg) from exc

get_hf_processor ¶

get_hf_processor(
    typ: Union[
        type[_P], tuple[type[_P], ...]
    ] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P

Source code in vllm/inputs/registry.py

def get_hf_processor(
    self,
    typ: Union[type[_P], tuple[type[_P], ...]] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P:
    return super().get_hf_processor(
        typ,
        tokenizer=self.tokenizer,
        **kwargs,
    )

InputRegistry ¶

Note: This is only used in V0.

Source code in vllm/inputs/registry.py

class InputRegistry:
    """
    Note: This is only used in V0.
    """

    def dummy_data_for_profiling(
        self,
        model_config: ModelConfig,
        seq_len: int,
        mm_registry: MultiModalRegistry,
        is_encoder_data: bool = False,
    ) -> DummyData:
        """
        Create dummy data for profiling the memory usage of a model.

        The model is identified by ``model_config``.
        """
        # Avoid circular import
        from vllm.sequence import SequenceData

        if not model_config.is_multimodal_model:
            seq_data = SequenceData.from_prompt_token_counts((0, seq_len))
            return DummyData(seq_data=seq_data)

        # Encoder dummy data does not contain multi-modal data
        if is_encoder_data:
            enc_data = mm_registry.get_encoder_dummy_data(
                model_config, seq_len)
            seq_data = SequenceData.from_seqs(enc_data.prompt_token_ids)
            return DummyData(seq_data=seq_data)

        dec_data = mm_registry.get_decoder_dummy_data(model_config, seq_len)

        return DummyData(
            seq_data=SequenceData.from_seqs(dec_data.prompt_token_ids),
            multi_modal_data=dec_data.multi_modal_data,
            multi_modal_placeholders=dec_data.multi_modal_placeholders,
        )

dummy_data_for_profiling ¶

dummy_data_for_profiling(
    model_config: ModelConfig,
    seq_len: int,
    mm_registry: MultiModalRegistry,
    is_encoder_data: bool = False,
) -> DummyData

Create dummy data for profiling the memory usage of a model.

The model is identified by model_config.

Source code in vllm/inputs/registry.py

def dummy_data_for_profiling(
    self,
    model_config: ModelConfig,
    seq_len: int,
    mm_registry: MultiModalRegistry,
    is_encoder_data: bool = False,
) -> DummyData:
    """
    Create dummy data for profiling the memory usage of a model.

    The model is identified by ``model_config``.
    """
    # Avoid circular import
    from vllm.sequence import SequenceData

    if not model_config.is_multimodal_model:
        seq_data = SequenceData.from_prompt_token_counts((0, seq_len))
        return DummyData(seq_data=seq_data)

    # Encoder dummy data does not contain multi-modal data
    if is_encoder_data:
        enc_data = mm_registry.get_encoder_dummy_data(
            model_config, seq_len)
        seq_data = SequenceData.from_seqs(enc_data.prompt_token_ids)
        return DummyData(seq_data=seq_data)

    dec_data = mm_registry.get_decoder_dummy_data(model_config, seq_len)

    return DummyData(
        seq_data=SequenceData.from_seqs(dec_data.prompt_token_ids),
        multi_modal_data=dec_data.multi_modal_data,
        multi_modal_placeholders=dec_data.multi_modal_placeholders,
    )

TextPrompt ¶

Bases: TypedDict

Schema for a text prompt.

Source code in vllm/inputs/data.py

class TextPrompt(TypedDict):
    """Schema for a text prompt."""

    prompt: str
    """The input text to be tokenized before passing to the model."""

    multi_modal_data: NotRequired["MultiModalDataDict"]
    """
    Optional multi-modal data to pass to the model,
    if the model supports it.
    """

    mm_processor_kwargs: NotRequired[dict[str, Any]]
    """
    Optional multi-modal processor kwargs to be forwarded to the
    multimodal input mapper & processor. Note that if multiple modalities
    have registered mappers etc for the model being considered, we attempt
    to pass the mm_processor_kwargs to each of them.
    """

    cache_salt: NotRequired[str]
    """
    Optional cache salt to be used for prefix caching.
    """

cache_salt `instance-attribute` ¶

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

mm_processor_kwargs `instance-attribute` ¶

mm_processor_kwargs: NotRequired[dict[str, Any]]

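Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper and processor. Note that if multiple modalities have registered mappers etc. for the model being considered, vLLM attempts to pass the mm_processor_kwargs to each of them.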

multi_modal_data `instance-attribute` ¶

multi_modal_data: NotRequired[MultiModalDataDict]

Optional multi-modal data to pass to the model, if the model supports it.

prompt `instance-attribute` ¶

prompt: str

The input text to be tokenized before passing to the model.

TokenInputs ¶

Bases: TypedDict

Represents token-based inputs.

Source code in vllm/inputs/data.py

class TokenInputs(TypedDict):
    """Represents token-based inputs."""

    type: Literal["token"]
    """The type of inputs."""

    prompt_token_ids: list[int]
    """The token IDs of the prompt."""

    token_type_ids: NotRequired[list[int]]
    """The token type IDs of the prompt."""

    prompt: NotRequired[str]
    """
    The original prompt text corresponding to the token IDs, if available.
    """

    cache_salt: NotRequired[str]
    """
    Optional cache salt to be used for prefix caching.
    """

cache_salt `instance-attribute` ¶

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

prompt `instance-attribute` ¶

prompt: NotRequired[str]

The original prompt text corresponding to the token IDs, if available.

prompt_token_ids `instance-attribute` ¶

prompt_token_ids: list[int]

The token IDs of the prompt.

token_type_ids `instance-attribute` ¶

token_type_ids: NotRequired[list[int]]

The token type IDs of the prompt.

type `instance-attribute` ¶

type: Literal['token']

The type of inputs.

TokensPrompt ¶

Bases: TypedDict

Schema for a tokenized prompt.

Source code in vllm/inputs/data.py

class TokensPrompt(TypedDict):
    """Schema for a tokenized prompt."""

    prompt_token_ids: list[int]
    """A list of token IDs to pass to the model."""

    token_type_ids: NotRequired[list[int]]
    """A list of token type IDs to pass to the cross encoder model."""

    multi_modal_data: NotRequired["MultiModalDataDict"]
    """
    Optional multi-modal data to pass to the model,
    if the model supports it.
    """

    mm_processor_kwargs: NotRequired[dict[str, Any]]
    """
    Optional multi-modal processor kwargs to be forwarded to the
    multimodal input mapper & processor. Note that if multiple modalities
    have registered mappers etc for the model being considered, we attempt
    to pass the mm_processor_kwargs to each of them.
    """

    cache_salt: NotRequired[str]
    """
    Optional cache salt to be used for prefix caching.
    """

cache_salt `instance-attribute` ¶

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

mm_processor_kwargs `instance-attribute` ¶

mm_processor_kwargs: NotRequired[dict[str, Any]]

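Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper and processor. Note that if multiple modalities have registered mappers etc. for the model being considered, vLLM attempts to pass the mm_processor_kwargs to each of them.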

multi_modal_data `instance-attribute` ¶

multi_modal_data: NotRequired[MultiModalDataDict]

Optional multi-modal data to pass to the model, if the model supports it.

prompt_token_ids `instance-attribute` ¶

prompt_token_ids: list[int]

A list of token IDs to pass to the model.

token_type_ids `instance-attribute` ¶

token_type_ids: NotRequired[list[int]]

A list of token type IDs to pass to the cross encoder model.

build_explicit_enc_dec_prompt ¶

build_explicit_enc_dec_prompt(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> ExplicitEncoderDecoderPrompt[_T1, _T2]

Source code in vllm/inputs/data.py

def build_explicit_enc_dec_prompt(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> ExplicitEncoderDecoderPrompt[_T1, _T2]:
    if mm_processor_kwargs is None:
        mm_processor_kwargs = {}
    return ExplicitEncoderDecoderPrompt(
        encoder_prompt=encoder_prompt,
        decoder_prompt=decoder_prompt,
        mm_processor_kwargs=mm_processor_kwargs,
    )
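As a rough usage sketch, the encoder and decoder prompts can use different singleton schemas, and an omitted `mm_processor_kwargs` becomes an empty dict. The function below is a local mirror of the listing above that returns a plain dict so the example runs without vllm installed; the prompt text and token ID are made up for illustration.

```python
from typing import Any, Optional


def build_explicit_enc_dec_prompt(
    encoder_prompt: Any,
    decoder_prompt: Optional[Any],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    # Local mirror of the documented function, using a plain dict
    # in place of the ExplicitEncoderDecoderPrompt TypedDict.
    if mm_processor_kwargs is None:
        mm_processor_kwargs = {}
    return {
        "encoder_prompt": encoder_prompt,
        "decoder_prompt": decoder_prompt,
        "mm_processor_kwargs": mm_processor_kwargs,
    }


# Encoder uses a bare string, decoder uses a TokensPrompt-shaped dict.
prompt = build_explicit_enc_dec_prompt(
    encoder_prompt="Translate to French: Hello",
    decoder_prompt={"prompt_token_ids": [0]},
)
print(prompt)
```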

embeds_inputs ¶

embeds_inputs(
    prompt_embeds: Tensor, cache_salt: Optional[str] = None
) -> EmbedsInputs

Construct EmbedsInputs from optional values.

Source code in vllm/inputs/data.py

def embeds_inputs(
    prompt_embeds: torch.Tensor,
    cache_salt: Optional[str] = None,
) -> EmbedsInputs:
    """Construct [`EmbedsInputs`][vllm.inputs.data.EmbedsInputs] from optional
    values."""
    inputs = EmbedsInputs(type="embeds", prompt_embeds=prompt_embeds)

    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs

to_enc_dec_tuple_list ¶

to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[
        ExplicitEncoderDecoderPrompt[_T1, _T2]
    ],
) -> list[tuple[_T1, Optional[_T2]]]

Source code in vllm/inputs/data.py

def to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[ExplicitEncoderDecoderPrompt[_T1, _T2]],
) -> list[tuple[_T1, Optional[_T2]]]:
    return [(enc_dec_prompt["encoder_prompt"],
             enc_dec_prompt["decoder_prompt"])
            for enc_dec_prompt in enc_dec_prompts]
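This helper is the inverse of zipping: it unpacks each prompt into an (encoder, decoder) pair and drops `mm_processor_kwargs`. A small sketch, assuming plain dicts with the documented keys and a local mirror of the function:

```python
from typing import Any, Iterable, Optional


def to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[dict[str, Any]],
) -> list[tuple[Any, Optional[Any]]]:
    # Local mirror of the listing above; only the two prompt fields survive.
    return [(p["encoder_prompt"], p["decoder_prompt"])
            for p in enc_dec_prompts]


prompts = [
    {"encoder_prompt": "a", "decoder_prompt": "b", "mm_processor_kwargs": {}},
    {"encoder_prompt": "c", "decoder_prompt": None, "mm_processor_kwargs": {}},
]
pairs = to_enc_dec_tuple_list(prompts)
print(pairs)
```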

token_inputs ¶

token_inputs(
    prompt_token_ids: list[int],
    token_type_ids: Optional[list[int]] = None,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> TokenInputs

Construct TokenInputs from optional values.

Source code in vllm/inputs/data.py

def token_inputs(
    prompt_token_ids: list[int],
    token_type_ids: Optional[list[int]] = None,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> TokenInputs:
    """Construct [`TokenInputs`][vllm.inputs.data.TokenInputs] from optional
    values."""
    inputs = TokenInputs(type="token", prompt_token_ids=prompt_token_ids)

    if prompt is not None:
        inputs["prompt"] = prompt
    if token_type_ids is not None:
        inputs["token_type_ids"] = token_type_ids
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs
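Optional arguments left as None are omitted from the result rather than stored as None, which matches the NotRequired fields on TokenInputs. The sketch below is a local mirror of the listing that returns a plain dict; the token IDs and the `"tenant-a"` salt are arbitrary illustrative values.

```python
from typing import Any, Optional


def token_inputs(
    prompt_token_ids: list[int],
    token_type_ids: Optional[list[int]] = None,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> dict[str, Any]:
    # Local mirror of the documented function, using a plain dict.
    inputs: dict[str, Any] = {
        "type": "token",
        "prompt_token_ids": prompt_token_ids,
    }
    if prompt is not None:
        inputs["prompt"] = prompt
    if token_type_ids is not None:
        inputs["token_type_ids"] = token_type_ids
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt
    return inputs


minimal = token_inputs([1, 2, 3])
full = token_inputs([1, 2, 3], prompt="abc", cache_salt="tenant-a")
print(sorted(minimal), sorted(full))
```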

zip_enc_dec_prompts ¶

zip_enc_dec_prompts(
    enc_prompts: Iterable[_T1],
    dec_prompts: Iterable[Optional[_T2]],
    mm_processor_kwargs: Optional[
        Union[Iterable[dict[str, Any]], dict[str, Any]]
    ] = None,
) -> list[ExplicitEncoderDecoderPrompt[_T1, _T2]]

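Zip encoder and decoder prompts together into a list of ExplicitEncoderDecoderPrompt instances.

mm_processor_kwargs may also be provided; if a dict is passed, the same dictionary is used for every encoder/decoder prompt. If an iterable is provided, it is zipped with the encoder/decoder prompts.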

Source code in vllm/inputs/data.py

def zip_enc_dec_prompts(
    enc_prompts: Iterable[_T1],
    dec_prompts: Iterable[Optional[_T2]],
    mm_processor_kwargs: Optional[Union[Iterable[dict[str, Any]],
                                        dict[str, Any]]] = None,
) -> list[ExplicitEncoderDecoderPrompt[_T1, _T2]]:
    """
    Zip encoder and decoder prompts together into a list of
    [`ExplicitEncoderDecoderPrompt`][vllm.inputs.data.ExplicitEncoderDecoderPrompt]
    instances.

    ``mm_processor_kwargs`` may also be provided; if a dict is passed, the same
    dictionary will be used for every encoder/decoder prompt. If an iterable is
    provided, it will be zipped with the encoder/decoder prompts.
    """
    if mm_processor_kwargs is None:
        mm_processor_kwargs = cast(dict[str, Any], {})
    if isinstance(mm_processor_kwargs, dict):
        return [
            build_explicit_enc_dec_prompt(
                encoder_prompt,
                decoder_prompt,
                cast(dict[str, Any], mm_processor_kwargs),
            ) for (encoder_prompt,
                   decoder_prompt) in zip(enc_prompts, dec_prompts)
        ]
    return [
        build_explicit_enc_dec_prompt(encoder_prompt, decoder_prompt,
                                      mm_proc_kwargs)
        for (encoder_prompt, decoder_prompt, mm_proc_kwargs
             ) in zip(enc_prompts, dec_prompts, mm_processor_kwargs)
    ]
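Two consequences follow from the listing above: a dict argument is shared by reference across all resulting prompts, and because the built-in `zip` stops at the shortest input, mismatched lengths are truncated silently. The sketch below is a local mirror using plain dicts so it runs without vllm; `num_crops` is only an illustrative key.

```python
from typing import Any, Iterable, Optional, Union


def zip_enc_dec_prompts(
    enc_prompts: Iterable[Any],
    dec_prompts: Iterable[Optional[Any]],
    mm_processor_kwargs: Optional[Union[Iterable[dict[str, Any]],
                                        dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    # Local mirror of the documented function, using plain dicts.
    if mm_processor_kwargs is None:
        mm_processor_kwargs = {}
    if isinstance(mm_processor_kwargs, dict):
        # One dict: the same object is attached to every prompt.
        return [{"encoder_prompt": enc, "decoder_prompt": dec,
                 "mm_processor_kwargs": mm_processor_kwargs}
                for enc, dec in zip(enc_prompts, dec_prompts)]
    # An iterable: zipped position by position with the prompts.
    return [{"encoder_prompt": enc, "decoder_prompt": dec,
             "mm_processor_kwargs": kw}
            for enc, dec, kw in zip(enc_prompts, dec_prompts,
                                    mm_processor_kwargs)]


shared = zip_enc_dec_prompts(["a", "b"], ["x", None], {"num_crops": 4})
per_prompt = zip_enc_dec_prompts(["a", "b"], ["x", None],
                                 [{"num_crops": 4}, {"num_crops": 16}])
truncated = zip_enc_dec_prompts(["a", "b", "c"], ["x"])
print(len(shared), len(per_prompt), len(truncated))
```

If each prompt needs to mutate its own kwargs later, passing an iterable of separate dicts avoids the shared reference.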
