
Text To Speech


The Text To Speech pipeline generates speech from text.

Example

The following shows a simple example using this pipeline.

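import numpy as np

from txtai.pipeline import TextToSpeech

# Create and run pipeline
tts = TextToSpeech()
tts("Say something here")

# Stream audio - incrementally generates snippets of audio
# (yield from is only valid inside a generator function, hence the wrapper)
def stream():
    yield from tts(
        "Say something here. And say something else.".split(),
        stream=True
    )

# Generate audio using a speaker id
tts = TextToSpeech("neuml/vctk-vits-onnx")
tts("Say something here", speaker=15)

# Generate audio using speaker embeddings
# (the ... placeholder stands for the speaker embedding values)
tts = TextToSpeech("neuml/txtai-speecht5-onnx")
tts("Say something here", speaker=np.array(...))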
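The generated audio is usually written to a file or played back. The following is a minimal sketch of one way to save it, using only the Python standard library. It assumes the audio is a mono sequence of float samples in the range [-1.0, 1.0]; the `save_wav` helper is hypothetical and not part of txtai, and a synthetic tone stands in for real pipeline output so the block runs without a model.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, audio, rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in audio:
        # Clip to the valid range, then scale to a signed 16-bit integer
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))


# Stand-in for pipeline output: one second of a 440 Hz tone at 22050 Hz
rate = 22050
audio = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]

path = os.path.join(tempfile.gettempdir(), "speech.wav")
save_wav(path, audio, rate)
```

With a real pipeline, the samples and sample rate would come from the `tts(...)` result instead of the synthetic tone. The exact shape of that result should be checked against the installed txtai version before unpacking it.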

See the link below for a more detailed example.

Notebook                   Description
Text to speech generation  Generate speech from text

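This pipeline is backed by ONNX models from the Hugging Face Hub. The models referenced on this page are neuml/ljspeech-jets-onnx (the default), neuml/vctk-vits-onnx and neuml/txtai-speecht5-onnx.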

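Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline class. Configuration-driven pipelines are run with workflows or the API.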

config.yml

# Create pipeline using lower case class name
texttospeech:

# Run pipeline with workflow
workflow:
  tts:
    tasks:
      - action: texttospeech

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tts", ["Say something here"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'
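The same request can be issued from Python. The sketch below builds an equivalent POST request with the standard library, assuming the API is listening on `http://localhost:8000` as in the curl command. The `build_request` and `run_workflow` helpers are hypothetical names used for illustration. Only the request is constructed at import time, so the block runs without a live server.

```python
import json
import urllib.request


def build_request(name, elements, url="http://localhost:8000/workflow"):
    """Build the same POST request that the curl command sends."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements):
    """Send the request and return the raw response body (API must be running)."""
    with urllib.request.urlopen(build_request(name, elements)) as response:
        return response.read()


# Inspect the request without sending it
request = build_request("tts", ["Say something here"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `run_workflow("tts", ["Say something here"])` sends the request once the API process started above is up.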

Methods

Python documentation for the pipeline.

__init__(path=None, maxtokens=512, rate=22050)

Creates a new TextToSpeech pipeline.

Parameters:

Name       Description                                                       Default
path       optional model path                                               None
maxtokens  maximum number of tokens model can process, defaults to 512       512
rate       target sample rate, defaults to 22050                             22050

Source code in txtai/pipeline/audio/texttospeech.py
def __init__(self, path=None, maxtokens=512, rate=22050):
    """
    Creates a new TextToSpeech pipeline.

    Args:
        path: optional model path
        maxtokens: maximum number of tokens model can process, defaults to 512
        rate: target sample rate, defaults to 22050
    """

    if not TTS:
        raise ImportError('TextToSpeech pipeline is not available - install "pipeline" extra to enable')

    # Default path
    path = path if path else "neuml/ljspeech-jets-onnx"

    # Target sample rate
    self.rate = rate

    # Load target tts pipeline
    self.pipeline = ESPnet(path, maxtokens, self.providers()) if self.hasfile(path, "model.onnx") else SpeechT5(path, maxtokens, self.providers())

__call__(text, stream=False, speaker=1)

Generates speech from text. Text longer than maxtokens will be batched and returned as a single waveform per text input.
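The batching behavior can be pictured with a small sketch. This is not the library's implementation: it assumes the input is already tokenized, splits at fixed token counts, and uses a hypothetical `synthesize` stand-in that emits one silent sample per token. The point is only that each chunk stays within `maxtokens` and the pieces are joined back into a single waveform.

```python
def batch(tokens, maxtokens=512):
    """Split a token sequence into consecutive chunks of at most maxtokens."""
    return [tokens[i:i + maxtokens] for i in range(0, len(tokens), maxtokens)]


def synthesize(tokens):
    """Hypothetical stand-in for the model: one silent sample per token."""
    return [0.0] * len(tokens)


def speak(tokens, maxtokens=512):
    """Run each chunk through the model and join the pieces into one waveform."""
    waveform = []
    for chunk in batch(tokens, maxtokens):
        waveform.extend(synthesize(chunk))
    return waveform


tokens = list(range(1200))
sizes = [len(chunk) for chunk in batch(tokens)]
print(sizes)               # [512, 512, 176]
print(len(speak(tokens)))  # 1200
```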

This method supports text as a string or a list. If the input is a string, the return type is audio. If text is a list, the return type is a list.
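The input handling described above can be tried without loading a model. The sketch below uses a hypothetical `SketchTTS` class whose `__call__` follows the same string-versus-list logic as the source listed further down. The `execute` and `stream` bodies are placeholders and assumptions, returning labels instead of audio.

```python
import types


class SketchTTS:
    """Hypothetical stand-in that mirrors the input handling of __call__."""

    def execute(self, text, speaker):
        # Placeholder for synthesis: returns a label instead of audio
        return f"audio({text!r}, speaker={speaker})"

    def stream(self, texts, speaker):
        # Generator: each result is produced only when the caller asks for it
        for text in texts:
            yield self.execute(text, speaker)

    def __call__(self, text, stream=False, speaker=1):
        # Wrap a single string so both input types share one code path
        texts = [text] if isinstance(text, str) else text
        if stream:
            return self.stream(texts, speaker)
        results = [self.execute(x, speaker) for x in texts]
        # Unwrap again so a string input yields a single result
        return results[0] if isinstance(text, str) else results


tts = SketchTTS()
single = tts("hello")                          # string in, one result out
many = tts(["hello", "world"])                 # list in, list out
chunks = tts(["hello", "world"], stream=True)  # generator of results
```

Note that with `stream=True` the call returns a generator regardless of the input type, so results have to be consumed by iteration.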

Parameters:

Name     Description                                    Default
text     input text, a string or a list of strings      required
stream   stream response if True, defaults to False     False
speaker  speaker id, defaults to 1                      1

Returns:

list of (audio, sample rate)

Source code in txtai/pipeline/audio/texttospeech.py
def __call__(self, text, stream=False, speaker=1):
    """
    Generates speech from text. Text longer than maxtokens will be batched and returned
    as a single waveform per text input.

    This method supports text as a string or a list. If the input is a string,
    the return type is audio. If text is a list, the return type is a list.

    Args:
        text: text|list
        stream: stream response if True, defaults to False
        speaker: speaker id, defaults to 1

    Returns:
        list of (audio, sample rate)
    """

    # Convert results to a list if necessary
    texts = [text] if isinstance(text, str) else text

    # Streaming response
    if stream:
        return self.stream(texts, speaker)

    # Transform text to speech
    results = [self.execute(x, speaker) for x in texts]

    # Return results
    return results[0] if isinstance(text, str) else results