Whisper prompting guide

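OpenAI's audio transcription API has an optional parameter called prompt.

The prompt is intended to help stitch together multiple audio segments. By submitting the transcript of the prior segment via the prompt, the Whisper model can use that context to better understand the speech and maintain a consistent writing style.

However, prompts do not need to be genuine transcripts of prior audio segments. _Fictitious_ prompts can be submitted to steer the model toward particular spellings or styles.

This notebook shares two techniques for using fictitious prompts to steer the model's output:

  • Transcript generation: GPT can convert instructions into fictitious transcripts for Whisper to emulate.
  • Spelling guide: a spelling guide can tell the model how to spell names of people, products, companies, etc.

These techniques are not especially reliable, but they can be useful in some situations.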
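The stitching use case can be sketched roughly as follows. This is a minimal illustration rather than code from the notebook: `transcribe_in_sequence` and `transcribe_fn` are hypothetical names, and `transcribe_fn` is assumed to be any callable that takes an audio path and a prompt and returns the transcript text (for example, a thin wrapper around `client.audio.transcriptions.create`).

```python
from typing import Callable, List


def transcribe_in_sequence(
    segment_paths: List[str],
    transcribe_fn: Callable[[str, str], str],
) -> List[str]:
    """Transcribe segments in order, prompting each with the previous transcript.

    transcribe_fn is a hypothetical callable: (audio_path, prompt) -> transcript text.
    """
    transcripts: List[str] = []
    previous_text = ""  # the first segment has no prior context
    for path in segment_paths:
        text = transcribe_fn(path, previous_text)
        transcripts.append(text)
        previous_text = text  # carry context forward to the next segment
    return transcripts
```

Passing the transcription step in as a callable keeps the chaining logic separate from the API call, so the same loop can be exercised with a stub before spending API requests.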

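Comparison with GPT prompting

Prompting Whisper is not the same as prompting GPT. For example, if you submit an attempted instruction such as "Format lists in Markdown format", the model will not comply, because it follows the style of the prompt rather than any instructions contained in it.

In addition, the prompt is limited to 224 tokens. If the prompt is longer than 224 tokens, only the final 224 tokens are considered; all earlier tokens are silently ignored. The tokenizer used is the multilingual Whisper tokenizer.

To get good results, craft examples that portray your desired style.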
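The truncation behavior described above can be pictured with a short sketch. The helper name `effective_prompt_tokens` is hypothetical, and the word list below is only a stand-in for real Whisper tokens (an assumption for illustration); actual token counts depend on the multilingual Whisper tokenizer and will differ.

```python
from typing import Sequence

PROMPT_TOKEN_LIMIT = 224  # limit stated in this guide


def effective_prompt_tokens(tokens: Sequence, limit: int = PROMPT_TOKEN_LIMIT) -> list:
    """Return the tokens that would still be considered: the final `limit` tokens."""
    if limit <= 0:
        return []
    return list(tokens[-limit:])


# Stand-in tokens: 300 placeholder items instead of real Whisper token ids.
words = [f"word{i}" for i in range(300)]
kept = effective_prompt_tokens(words)
print(len(kept), kept[0])  # 224 word76
```

The practical consequence is that the most important style examples or spellings should sit at the end of a long prompt, since the beginning is what gets dropped.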

Setup

To get started, let's:

  • Import the OpenAI Python library (if you don't have it, install it with pip install openai)
  • Download a few example audio files

# imports
from openai import OpenAI  # for making OpenAI API calls
import urllib.request  # for downloading example audio files (urlretrieve lives in urllib.request)
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

# set download paths
up_first_remote_filepath = "https://cdn.openai.com/API/examples/data/upfirstpodcastchunkthree.wav"
bbq_plans_remote_filepath = "https://cdn.openai.com/API/examples/data/bbq_plans.wav"
product_names_remote_filepath = "https://cdn.openai.com/API/examples/data/product_names.wav"

# set local save locations
up_first_filepath = "data/upfirstpodcastchunkthree.wav"
bbq_plans_filepath = "data/bbq_plans.wav"
product_names_filepath = "data/product_names.wav"

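# make sure the local data directory exists, then download the example audio files
os.makedirs("data", exist_ok=True)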
urllib.request.urlretrieve(up_first_remote_filepath, up_first_filepath)
urllib.request.urlretrieve(bbq_plans_remote_filepath, bbq_plans_filepath)
urllib.request.urlretrieve(product_names_remote_filepath, product_names_filepath)


('data/product_names.wav', <http.client.HTTPMessage at 0x1105ac490>)

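As a baseline, we'll transcribe an NPR podcast segment

Our audio file for this example is a segment of the NPR podcast Up First.

Let's get our baseline transcription first, then introduce prompts.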

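# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(audio_filepath, prompt: str) -> str:
    """Given a prompt, transcribe the audio file."""
    # open the file in a context manager so it is closed after the request
    with open(audio_filepath, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript.text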


# baseline transcription with no prompt
transcribe(up_first_filepath, prompt="")

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

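Transcripts follow the style of the prompt

In the unprompted transcript, "President Biden" is capitalized. If we pass in a fictitious lowercase prompt, "president biden", Whisper may match the style and generate the transcript in all lowercase. In the run shown below, however, this short prompt did not change the capitalization.

# lowercase prompt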
transcribe(up_first_filepath, prompt="president biden")

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Don't call me Shirley. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

Be aware that when prompts are short, Whisper may be less reliable at following their style.

# short prompts are less reliable
transcribe(up_first_filepath, prompt="president biden.")

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem, and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

Long prompts may be more reliable at steering Whisper.

# long prompts are more reliable
transcribe(up_first_filepath, prompt="i have some advice for you. multiple sentences help establish a pattern. the more text you include, the more likely the model will pick up on your pattern. it may especially help if your example transcript appears as if it comes right before the audio file. in this case, that could mean mentioning the contacts i stick in my eyes.")

"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, just kind of pain in the butt? no, it is. it is. and i sometimes just kind of miss the eye. i don't know if you know, um, the movie airplane? yes. of course. where he says i have a drinking problem. and that he keeps missing his face with the drink. that's me in the contact lens. surely, you must know that i know the movie airplane. i do. i do know that. don't call me surely. stop calling me surely. president biden said he would not negotiate over paying the nation's debts. but he is meeting today with house speaker kevin mccarthy. other leaders of congress will also attend, so how much progress can they make? i'm amy martinez with steve inskeep, and this is up first from npr news. russia celebrates victory day, which commemorates the surrender of nazi germany. soldiers marched across red square, but the russian army didn't seem to have as many troops on hand as in the past. so what does this ritual say about the war russia is fighting right now?"
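One rough way to compare how closely different transcripts follow a lowercase style is to measure the share of lowercase letters. The helper below is a hypothetical sketch for that purpose, not part of the notebook, and it only captures casing, not punctuation or other stylistic traits.

```python
def lowercase_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are lowercase."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # no letters to judge
    return sum(ch.islower() for ch in letters) / len(letters)


# Excerpt in the style of the long-prompt transcript above
print(lowercase_ratio("president biden said he would not negotiate"))  # 1.0
# A capitalized excerpt scores below 1.0
print(lowercase_ratio("President Biden said he would not negotiate"))
```

A ratio of 1.0 means the transcript is fully lowercase; anything lower suggests the prompt's casing was followed only partially or not at all.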

Whisper is also less likely to follow rare or odd styles.

# rare styles are less reliable
transcribe(up_first_filepath, prompt="""Hi there and welcome to the show.
###
Today we are quite excited.
###
Let's jump right in.
###""")

"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to like, it's not a pain in the butt. It is. And I sometimes just kind of miss the eye. I don't know if you know, um, the movie airplane where, of course, where he says I have a drinking problem and that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie airplane. Uh, I do. I do know that. Stop calling me Shirley.  President Biden said he would not negotiate over paying the nation's debts, but he is meeting today with house speaker, Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I mean, Martinez with Steve Inskeep, and this is up first from NPR news. Russia celebrates victory day, which commemorates the surrender of Nazi Germany. Soldiers marched across red square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war? Russia is fighting right now."

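Pass names in the prompt to prevent misspellings

Whisper may incorrectly transcribe uncommon proper nouns such as names of products, companies, or people.

We'll illustrate this with an example audio file that is full of product names.

# baseline transcription with no prompt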
transcribe(product_names_filepath, prompt="")

'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings, from the P3 Quattro, a unique investment portfolio quadrant, to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X and experience non-standard equity trading with E3 Equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk, Quid, Quill, Inc., we turn complex finance into creative solutions. Join us in redefining financial services.'

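To get Whisper to use our preferred spellings, let's pass the product and company names in the prompt, as a glossary for Whisper to follow.

# adding the correct spellings of the product names helps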
transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover")

'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalize your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.'
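When the list of names comes from a product catalog or a contact list, the glossary prompt can be assembled programmatically. The sketch below is an illustration under that assumption; `build_glossary_prompt` and its `label` parameter are hypothetical names, and the comma-separated format simply mirrors the prompt used above.

```python
from typing import Iterable


def build_glossary_prompt(terms: Iterable[str], label: str = "") -> str:
    """Join preferred spellings into a comma-separated prompt, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    glossary = ", ".join(cleaned)
    return f"{label}: {glossary}" if label else glossary


print(build_glossary_prompt(["QuirkQuid Quill Inc", "P3-Quattro", "O3-Omni", "P3-Quattro"]))
# QuirkQuid Quill Inc, P3-Quattro, O3-Omni
```

Because only the final 224 tokens of a prompt are considered, a very long glossary may lose its earliest entries, so it is worth keeping the list short.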

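Now, let's switch to another audio recording, authored specifically for this demonstration, on the topic of an odd barbecue.

To begin, we'll establish our baseline transcript using Whisper.

# baseline transcription with no prompt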
transcribe(bbq_plans_filepath, prompt="")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Amy and Sean. We're going to a barbecue here in Brooklyn, hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Amy and Sean."

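While Whisper's transcription was accurate, it had to guess at various spellings. For example, it assumed the friends' names were spelled Amy and Sean rather than Aimee and Shawn. Let's see if we can steer the spelling with a prompt.

# spelling prompt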
transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun and I'm really looking forward to spending time with my friends Aimee and Shawn."

Success!

Let's try the same with more ambiguously spelled words.

# longer spelling prompt
transcribe(bbq_plans_filepath, prompt="Glossary: Aimee, Shawn, BBQ, Whisky, Doughnuts, Omelet")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully, it's actually going to be a little bit of an odd barbecue. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."

# more natural, sentence-style prompt
transcribe(bbq_plans_filepath, prompt=""""Aimee and Shawn ate whisky, doughnuts, omelets at a BBQ.""")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a BBQ here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd BBQ. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whisky. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."
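To check which preferred spellings actually made it into a transcript, a simple lookup can help. The helper below is a hypothetical sketch, not part of the notebook; it uses a case-insensitive substring match, so "Omelet" also counts as found in "omelets". The excerpt is taken from the transcript produced with the "Glossary:" prompt.

```python
from typing import Dict, Iterable


def spelling_hits(transcript: str, terms: Iterable[str]) -> Dict[str, bool]:
    """Map each term to whether it appears in the transcript (case-insensitive substring match)."""
    lowered = transcript.lower()
    return {term: term.lower() in lowered for term in terms}


excerpt = "We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey."
print(spelling_hits(excerpt, ["Whisky", "Doughnuts", "Omelet"]))
# {'Whisky': False, 'Doughnuts': True, 'Omelet': True}
```

Here the glossary prompt steered "doughnuts" but not "whisky", which matches the guide's caution that these techniques are not fully reliable.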

GPT can generate fictitious prompts

One potential tool for generating fictitious prompts is GPT. We can give GPT instructions and use it to generate long fictitious transcripts with which to prompt Whisper.

# define a function for GPT to generate fictitious prompts
def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Given an instruction, generate a fictitious prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
            },  # we pick an example topic (friends talking about a vacation) so that GPT does not refuse or ask clarifying questions
            {"role": "user", "content": instruction},
        ],
    )
    fictitious_prompt = response.choices[0].message.content
    return fictitious_prompt


# ellipses example
prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
print(prompt)

Oh, do you remember that amazing vacation we took to Maine?... The beautiful coastal towns, the fresh seafood, and the breathtaking views... It was truly a trip to remember... I still can't get over how picturesque it was... The quaint little fishing villages with their colorful houses... And the lighthouses dotting the rugged coastline... It felt like we were in a postcard... And the lobster... Oh, the lobster... I've never tasted anything so delicious... We must have had it every day... And let's not forget about the clam chowder... Creamy, flavorful, and packed with fresh clams... It was like a taste of heaven... And the hikes we went on... The trails through the lush forests and along the rocky cliffs... The air was so crisp and invigorating... I could have spent hours just exploring the natural beauty of Maine... And the people we met... So friendly and welcoming... They made us feel right at home... I can't wait to go back and experience it all over again... Maine truly stole a piece of my heart...
transcribe(up_first_filepath, prompt=prompt)

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. Oh, you don't know... I don't know if you know the movie Airplane? Yes. Where... Of course. Where he says, I have a drinking problem. And that he keeps missing his face with the drink. That's me in the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Don't call me Shirley. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I'm Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

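Whisper prompts are best for specifying otherwise ambiguous styles. The prompt will not override the model's comprehension of the audio. For example, if the speakers are not speaking in a deep Southern accent, a prompt will not cause the transcript to be written in one.

# southern accent example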
prompt = fictitious_prompt_from_instruction("Write in a deep, heavy, Southern accent.")
print(prompt)
transcribe(up_first_filepath, prompt=prompt)

Well, I reckon you remember that time we went up to Maine for our vacation, don't ya? Boy, oh boy, what a trip that was! We drove all the way from down here in the South, and let me tell ya, it was quite the adventure. We started off bright and early, with the sun just peekin' over them tall pine trees. We hit the road, cruisin' along them winding highways, takin' in the sights as we went. I tell ya, the scenery up there was somethin' else. Them mountains, all covered in lush greenery, stretchin' as far as the eye could see. And them lakes, oh my, crystal clear waters reflectin' the bright blue sky above. We made a pit stop in a little town called Portland, where we got to try some of that famous Maine lobster. Now, I ain't never tasted anything quite like it. Fresh outta the ocean, melt-in-your-mouth goodness, I tell ya. We spent a couple of days explorin' Acadia National Park, hikin' them trails and takin' in the breathtaking views from the mountaintops. And let me tell ya, that ocean breeze sure did feel mighty fine on our skin. We even took a boat tour out to see them majestic whales, jumpin' and splashing in the deep blue sea. It was a sight to behold, my friend. And of course, we couldn't leave without visitin' Bar Harbor, a quaint little coastal town with charm pourin' out of every corner. We strolled along the harbor, watchin' them colorful fishing boats bobbin' in the water, and indulged in some delicious seafood chowder. Maine sure did steal a piece of our hearts, my friend. The memories we made on that trip will stay with us forever.
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kinda pain in the butt? No, it is. It is. And I sometimes just kinda miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says, I have a drinking problem. And that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I'm Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"