
Microphone


The Microphone pipeline reads input speech from a microphone device. This pipeline is designed to run on local machines, given that it requires access to read from an input device.

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()

This pipeline may require additional system dependencies. See this section for more.
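The pipeline returns the recorded speech along with its sample rate, so the output can be passed on to other audio pipelines. The following is a minimal sketch pairing it with txtai's Transcription pipeline, assuming the transcription dependencies are also installed.

from txtai.pipeline import Microphone, Transcription

# Record a speech segment from the default input device
microphone = Microphone()
audio, rate = microphone()

# Transcribe the raw audio data to text
transcribe = Transcription()
print(transcribe(audio, rate))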

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'

Methods

Python documentation for the pipeline.

__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)

Creates a new Microphone pipeline.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| rate | | sample rate to record audio in, defaults to 16000 (16 kHz) | 16000 |
| vadmode | | aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter | 3 |
| vadframe | | voice activity detector frame size in ms, defaults to 20 | 20 |
| vadthreshold | | percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6 | 0.6 |
| voicestart | | starting frequency to use for voice filtering, defaults to 300 | 300 |
| voiceend | | ending frequency to use for voice filtering, defaults to 3400 | 3400 |
| active | | minimum number of active speech chunks to require before considering this speech, defaults to 5 | 5 |
| pause | | number of non-speech chunks to keep before considering speech complete, defaults to 8 | 8 |
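All constructor parameters are optional. As an illustrative sketch using the documented parameters (the values below are examples, not recommendations), the voice activity filter can be relaxed and the pipeline made to wait longer before ending a speech segment.

from txtai.pipeline import Microphone

# Less aggressive voice activity filter, lower speech threshold and a
# longer pause before a segment is considered complete
microphone = Microphone(vadmode=1, vadthreshold=0.5, pause=16)
audio, rate = microphone()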
Source code in txtai/pipeline/audio/microphone.py
def __init__(self, rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8):
    """
    Creates a new Microphone pipeline.

    Args:
        rate: sample rate to record audio in, defaults to 16000 (16 kHz)
        vadmode: aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter
        vadframe: voice activity detector frame size in ms, defaults to 20
        vadthreshold: percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6
        voicestart: starting frequency to use for voice filtering, defaults to 300
        voiceend: ending frequency to use for voice filtering, defaults to 3400
        active: minimum number of active speech chunks to require before considering this speech, defaults to 5
        pause: number of non-speech chunks to keep before considering speech complete, defaults to 8
    """

    if not MICROPHONE:
        raise ImportError(
            (
                'Microphone pipeline is not available - install "pipeline" extra to enable. '
                "Also check that the portaudio system library is available."
            )
        )

    # Sample rate
    self.rate = rate

    # Voice activity detector
    self.vad = webrtcvad.Vad(vadmode)
    self.vadframe = vadframe
    self.vadthreshold = vadthreshold

    # Voice spectrum
    self.voicestart = voicestart
    self.voiceend = voiceend

    # Audio chunks counts
    self.active = active
    self.pause = pause

__call__(device=None)

Reads audio from an input device.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| device | | optional input device id, otherwise uses system default | None |

Returns:

| Type | Description |
|------|-------------|
| | list of (audio, sample rate) |

Source code in txtai/pipeline/audio/microphone.py
def __call__(self, device=None):
    """
    Reads audio from an input device.

    Args:
        device: optional input device id, otherwise uses system default

    Returns:
        list of (audio, sample rate)
    """

    # Listen for audio
    audio = self.listen(device[0] if isinstance(device, list) else device)

    # Return single element if single element passed in
    return (audio, self.rate) if device is None or not isinstance(device, list) else [(audio, self.rate)]
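
When multiple input devices are available, a device id can be passed to select one. The sketch below lists devices with the sounddevice package, assuming it is available in the environment; the device id used here is hypothetical.

import sounddevice as sd

from txtai.pipeline import Microphone

# List available audio devices and their ids
print(sd.query_devices())

# Read from a specific input device instead of the system default
microphone = Microphone()
audio, rate = microphone(device=1)  # hypothetical device id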