Text To Speech


The Text To Speech pipeline generates speech from text.

Example

The following shows a simple example using this pipeline.

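import numpy as np

from txtai.pipeline import TextToSpeech

# Create and run pipeline
tts = TextToSpeech()
tts("Say something here")

# Stream audio - incrementally generates snippets of audio
# (yield from is only valid inside a generator function)
def stream():
    yield from tts(
        "Say something here. And say something else.".split(),
        stream=True
    )

# Generate audio using a speaker id
tts = TextToSpeech("neuml/vctk-vits-onnx")
tts("Say something here", speaker=15)

# Generate audio using speaker embeddings
tts = TextToSpeech("neuml/txtai-speecht5-onnx")
tts("Say something here", speaker=np.array(...))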
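Generated audio usually needs to be saved or played back. The following is a minimal sketch of writing a waveform to a WAV file with the Python standard library only. `write_wav` is a hypothetical helper, not part of txtai, and it assumes the audio is a one-dimensional sequence of float samples in the range [-1.0, 1.0] at the pipeline's target sample rate; this page does not state the exact return format, so check it for the installed version.

```python
import array
import sys
import wave


def write_wav(path, samples, rate=22050):
    """Writes mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration, not part of txtai.
    Returns the number of frames written.
    """

    # Clip each sample to [-1.0, 1.0] and scale to the signed 16-bit range
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))

    # WAV data is little-endian, swap bytes on big-endian machines
    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)     # mono
        wav.setsampwidth(2)     # 2 bytes per sample = 16-bit
        wav.setframerate(rate)  # sample rate in Hz
        wav.writeframes(pcm.tobytes())

    return len(pcm)
```

Under the assumption above, a call such as `write_wav("out.wav", tts("Say something here"))` would store the generated speech on disk.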

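See the link below for a more detailed example.

Notebook	Description
Text to speech generation	Generate speech from text

This pipeline is backed by ONNX models from the Hugging Face Hub. The following models are currently available.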

ljspeech-jets-onnx
ljspeech-vits-onnx
vctk-vits-onnx
txtai-speecht5-onnx

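Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline class. Configuration-driven pipelines are run with workflows or the API.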

config.yml

# Create pipeline using lower case class name
texttospeech:

# Run pipeline with workflow
workflow:
  tts:
    tasks:
      - action: texttospeech

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tts", ["Say something here"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'
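The same request can be sent from Python. The sketch below mirrors the curl command using only the standard library; `workflow_request` and `run_workflow` are hypothetical helper names, and the assumption that the endpoint replies with a JSON body is not confirmed on this page.

```python
import json
import urllib.request


def workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """Builds a POST request with the same URL, header and body as the curl command."""

    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, url="http://localhost:8000/workflow"):
    """Sends the request and decodes the response (assumed to be JSON)."""

    with urllib.request.urlopen(workflow_request(name, elements, url)) as response:
        return json.loads(response.read())
```

With the API server from the command above running, `run_workflow("tts", ["Say something here"])` would post the same payload as the curl call.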

Methods

Python documentation for the pipeline.

`__init__(path=None, maxtokens=512, rate=22050)`

Creates a new TextToSpeech pipeline.

Parameters:

Name	Description	Default
`path`	optional model path	`None`
`maxtokens`	maximum number of tokens model can process, defaults to 512	`512`
`rate`	target sample rate, defaults to 22050	`22050`

Source code in txtai/pipeline/audio/texttospeech.py

def __init__(self, path=None, maxtokens=512, rate=22050):
    """
    Creates a new TextToSpeech pipeline.

    Args:
        path: optional model path
        maxtokens: maximum number of tokens model can process, defaults to 512
        rate: target sample rate, defaults to 22050
    """

    if not TTS:
        raise ImportError('TextToSpeech pipeline is not available - install "pipeline" extra to enable')

    # Default path
    path = path if path else "neuml/ljspeech-jets-onnx"

    # Target sample rate
    self.rate = rate

    # Load target tts pipeline
    self.pipeline = ESPnet(path, maxtokens, self.providers()) if self.hasfile(path, "model.onnx") else SpeechT5(path, maxtokens, self.providers())

`__call__(text, stream=False, speaker=1)`

Generates speech from text. Text longer than maxtokens will be batched and returned as a single waveform per text input.

This method supports text as a string or a list. If the input is a string, the return type is audio. If text is a list, the return type is a list.
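The input and output behavior described above can be illustrated with a small stand-alone sketch. This is not txtai's implementation: `speak`, `batches` and `fake_synthesize` are hypothetical names, tokens are approximated by whitespace-separated words, and the synthesizer is a stand-in that returns one number per token instead of real audio.

```python
def batches(tokens, maxtokens):
    """Yields consecutive slices of at most maxtokens tokens."""
    for start in range(0, len(tokens), maxtokens):
        yield tokens[start:start + maxtokens]


def fake_synthesize(batch):
    """Stand-in for a speech model: returns one sample per token."""
    return [float(len(token)) for token in batch]


def speak(text, synthesize=fake_synthesize, maxtokens=512):
    """Returns one waveform for a string input, a list of waveforms for a list input."""

    # Treat a single string as a one-element list
    texts = [text] if isinstance(text, str) else text

    results = []
    for item in texts:
        # Assumption: whitespace tokenization, a real model uses its own tokenizer
        tokens = item.split()

        # Long inputs are processed in batches and joined into a single waveform
        waveform = []
        for batch in batches(tokens, maxtokens):
            waveform.extend(synthesize(batch))

        results.append(waveform)

    # Match the return type to the input type
    return results[0] if isinstance(text, str) else results
```

Calling `speak` with a string returns a single waveform even when the text spans several batches, while calling it with a list returns one waveform per element.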