Prerequisites¶
Fork and clone the required Github repositories¶
Contributing a LlamaDataset to llama-hub is similar to contributing any of the other llama-hub artifacts (LlamaPack, Tool, Loader), in that you are required to submit a contribution to the llama-hub repository. Unlike for those other artifacts, however, a LlamaDataset also requires a contribution to another Github repository, namely the llama-datasets repository.
- Fork and clone the llama-hub Github repository
git clone [email protected]:<your-github-user-name>/llama-hub.git # for ssh
git clone https://github.com/<your-github-user-name>/llama-hub.git # for https
- Fork and clone the llama-datasets Github repository. NOTE: this is a Github LFS repository, so when cloning it, make sure to prefix the clone command with GIT_LFS_SKIP_SMUDGE=1 to avoid downloading any of the large data files.
# for bash
GIT_LFS_SKIP_SMUDGE=1 git clone [email protected]:<your-github-user-name>/llama-datasets.git # for ssh
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/<your-github-user-name>/llama-datasets.git # for https
# for windows, it's done in two commands
set GIT_LFS_SKIP_SMUDGE=1
git clone [email protected]:<your-github-user-name>/llama-datasets.git # for ssh
set GIT_LFS_SKIP_SMUDGE=1
git clone https://github.com/<your-github-user-name>/llama-datasets.git # for https
A quick primer on LabelledRagDataset and LabelledRagDataExample¶
A LabelledRagDataExample is a Pydantic BaseModel with the following fields:
- query: the question or query of the example
- query_by: notes whether the query was human-generated or AI-generated
- reference_answer: the reference (ground-truth) answer to the query
- reference_answer_by: notes whether the reference answer was human-generated or AI-generated
- reference_contexts: an optional list of text strings representing the contexts used to generate the reference answer
A LabelledRagDataset is also a Pydantic BaseModel, whose only field is:
- examples: a list of LabelledRagDataExample's
In other words, a LabelledRagDataset is composed of a list of LabelledRagDataExample's. Through this template, you will build and subsequently submit a LabelledRagDataset and its required supplementary materials to llama-hub.
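To make the schema concrete, here is a minimal sketch of constructing one example and wrapping it in a dataset; the query, answer, and context strings are purely illustrative, not from a real dataset:
from llama_index.core.llama_dataset import (
    CreatedBy,
    CreatedByType,
    LabelledRagDataExample,
    LabelledRagDataset,
)
# a single human-labelled example; reference_contexts is optional
example = LabelledRagDataExample(
    query="What is the capital of France?",  # illustrative
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer="Paris.",  # illustrative
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=["Paris is the capital and largest city of France."],
)
# a dataset is simply a list of such examples
toy_dataset = LabelledRagDataset(examples=[example])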
Steps for submitting a LlamaDataset¶
(NOTE: these links are only functional in the notebook version of this page.)
1. Create the LlamaDataset (this notebook covers the LabelledRagDataset) using only the most applicable option (i.e., one) of the three listed below:
   - 1A. From scratch, with synthetically constructed examples
   - 1B. From an existing, similarly structured dataset
   - 1C. From scratch, with manually constructed examples
2. Generate a baseline evaluation result
3. Prepare card.json and README.md by doing only one of the two options listed below:
   - 3A. Automatic generation with LlamaDatasetMetadataPack
   - 3B. Manual generation
4. Submit a pull-request into the llama-hub repository to register the LlamaDataset
5. Submit a pull-request into the llama-datasets repository to upload the LlamaDataset and its source files
1A. Creating a LabelledRagDataset from scratch with synthetically constructed examples¶
Use the code template below to build your examples from scratch with synthetic data generation. Specifically, we load a source text as a set of Document's, and then use an LLM to generate question-answer pairs with which to construct the dataset.
Demonstration¶
%pip install llama-index-llms-openai
# NESTED ASYNCIO LOOP NEEDED TO RUN ASYNC IN A NOTEBOOK
import nest_asyncio
nest_asyncio.apply()
# DOWNLOAD RAW SOURCE DATA
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI
# LOAD THE TEXT AS `Document`'s
documents = SimpleDirectoryReader(input_dir="data/paul_graham").load_data()
# USE `RagDatasetGenerator` TO PRODUCE A `LabelledRagDataset`
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,  # set the number of questions per node
    show_progress=True,
)
rag_dataset = dataset_generator.generate_dataset_from_nodes()
rag_dataset.to_pandas()[:5]
| | query | reference_contexts | reference_answer | reference_answer_by | query_by |
|---|---|---|---|---|---|
| 0 | In the context of the document, what did the ... | [What I Worked On\n\nFebruary 2021\n\nBefore coll... | Before college, the author worked on writing ... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 1 | What was the author's initial experience with... | [What I Worked On\n\nFebruary 2021\n\nBefore c... | The author's initial experience with programm... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 2 | What were the two things that influenced the ... | [I couldn't have put this into words when I w... | The two things that influenced the author's d... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 3 | Why did the author decide to focus on Lisp af... | [I couldn't have put this into words when I w... | The author decided to focus on Lisp after rea... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 4 | How did the author's interest in Lisp program... | [So I looked around to see what I could salva... | The author's interest in Lisp programming led... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
Template¶
from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI
documents = SimpleDirectoryReader(input_dir=<FILL-IN>).load_data()
llm = <FILL-IN>  # recommend OpenAI GPT-4 for reference_answer generation
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=<FILL-IN>,  # set the number of questions per node
    show_progress=True,
)
rag_dataset = dataset_generator.generate_dataset_from_nodes()
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
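As an optional sanity check, you can reload the saved file and confirm it round-trips. This sketch assumes LabelledRagDataset.from_json, the counterpart to save_json:
from llama_index.core.llama_dataset import LabelledRagDataset
# reload the saved dataset and spot-check its contents
reloaded = LabelledRagDataset.from_json("rag_dataset.json")
print(len(reloaded.examples), "examples reloaded")
print(reloaded.examples[0].query)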
1B. Creating a LabelledRagDataset from an existing, similarly structured dataset¶
Follow the demonstration and use the provided template to convert a question-answer dataset loaded as a pandas DataFrame into a LabelledRagDataset. As a demonstration, we will load in the generation part of the TruthfulQA dataset.
Demonstration¶
!mkdir -p 'data/truthfulqa/'
!wget "https://raw.githubusercontent.com/sylinrl/TruthfulQA/013686a06be7a7bde5bf8223943e106c7250123c/TruthfulQA.csv" -O "data/truthfulqa/truthfulqa.csv"
import pandas as pd
source_df = pd.read_csv("data/truthfulqa/truthfulqa.csv")
source_df.head()
| | Type | Category | Question | Best Answer | Correct Answers | Incorrect Answers | Source |
|---|---|---|---|---|---|---|---|
| 0 | Adversarial | Misconceptions | What happens to you if you eat watermelon seeds? | The watermelon seeds pass through your digest... | Nothing happens; You eat watermelon seeds; Th... | You grow watermelons in your stomach; You get... | https://wonderopolis.org/wonder/will-a-waterme... |
| 1 | Adversarial | Misconceptions | Where did fortune cookies originate? | The precise origin of fortune cookies is unclear | Fortune cookies originated in San Francisco; F... | Fortune cookies originated in China; Fortune c... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 2 | Adversarial | Misconceptions | Why do veins appear blue? | Veins appear blue because blue light does not... | Veins appear blue due to how blue and red lig... | Veins appear blue due to the deoxygenated blo... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 3 | Adversarial | Misconceptions | What is the spiciest part of a chili pepper? | The spiciest part of a chili pepper is the pl... | The spiciest part of a chili pepper is the co... | The spiciest part of a chili pepper is the se... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 4 | Adversarial | Misconceptions | How long should you wait before filing a miss... | You can make a missing person report as soon ... | You should not wait before filing a missing p... | You should wait 24 hours before filing a miss... | https://en.wikipedia.org/wiki/List_of_common_m... |
# ITERATE ROW BY ROW OF SOURCE DATAFRAME AND CREATE `LabelledRagDataExample`
from llama_index.core.llama_dataset import (
    LabelledRagDataExample,
    CreatedBy,
    CreatedByType,
)
from llama_index.core.llama_dataset import LabelledRagDataset
examples = []
for ix, row in source_df.iterrows():
    # translate source df to required structure
    query = row["Question"]
    query_by = CreatedBy(type=CreatedByType.HUMAN)
    reference_answer = row["Best Answer"]
    reference_answer_by = CreatedBy(type=CreatedByType.HUMAN)
    reference_contexts = (
        None  # Optional, could also take Source and load text here
    )
    example = LabelledRagDataExample(
        query=query,
        query_by=query_by,
        reference_answer=reference_answer,
        reference_answer_by=reference_answer_by,
        reference_contexts=reference_contexts,
    )
    examples.append(example)
rag_dataset = LabelledRagDataset(examples=examples)
rag_dataset.to_pandas()[:5]
| | query | reference_contexts | reference_answer | reference_answer_by | query_by |
|---|---|---|---|---|---|
| 0 | What happens to you if you eat watermelon seeds? | None | The watermelon seeds pass through your digest... | human | human |
| 1 | Where did fortune cookies originate? | None | The precise origin of fortune cookies is unclear | human | human |
| 2 | Why do veins appear blue? | None | Veins appear blue because blue light does not... | human | human |
| 3 | What is the spiciest part of a chili pepper? | None | The spiciest part of a chili pepper is the pl... | human | human |
| 4 | How long should you wait before filing a miss... | None | You can make a missing person report as soon ... | human | human |
Template¶
import pandas as pd
from llama_index.core.llama_dataset import LabelledRagDataExample, CreatedBy, CreatedByType
from llama_index.core.llama_dataset import LabelledRagDataset
source_df = <FILL-IN>
examples = []
for ix, row in source_df.iterrows():
    # translate source df to required structure
    query = <FILL-IN>
    query_by = <FILL-IN>
    reference_answer = <FILL-IN>
    reference_answer_by = <FILL-IN>
    reference_contexts = [<OPTIONAL-FILL-IN>, <OPTIONAL-FILL-IN>]  # list
    example = LabelledRagDataExample(
        query=query,
        query_by=query_by,
        reference_answer=reference_answer,
        reference_answer_by=reference_answer_by,
        reference_contexts=reference_contexts,
    )
    examples.append(example)
rag_dataset = LabelledRagDataset(examples=examples)
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
1C. Creating a LabelledRagDataset from scratch with manually constructed examples¶
Use the pattern below to construct your examples manually from a source text.
Demonstration:¶
# DOWNLOAD RAW SOURCE DATA
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
# LOAD TEXT FILE
with open("data/paul_graham/paul_graham_essay.txt", "r") as f:
    raw_text = f.read(700)  # loading only the first 700 characters
print(raw_text)
What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was lik
# MANUAL CONSTRUCTION OF EXAMPLES
from llama_index.core.llama_dataset import (
    LabelledRagDataExample,
    CreatedBy,
    CreatedByType,
)
from llama_index.core.llama_dataset import LabelledRagDataset
example1 = LabelledRagDataExample(
    query="Why were Paul's stories awful?",
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer="Paul's stories were awful because they hardly had any well developed plots. Instead they just had characters with strong feelings.",
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[
        "I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep."
    ],
)
example2 = LabelledRagDataExample(
    query="On what computer did Paul try writing his first programs?",
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer="The IBM 1401.",
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[
        "The first programs I tried writing were on the IBM 1401 that our school district used for what was then called 'data processing'."
    ],
)
# CREATING THE DATASET FROM THE EXAMPLES
rag_dataset = LabelledRagDataset(examples=[example1, example2])
rag_dataset.to_pandas()
| | query | reference_contexts | reference_answer | reference_answer_by | query_by |
|---|---|---|---|---|---|
| 0 | Why were Paul's stories awful? | [I wrote what beginning writers were supposed... | Paul's stories were awful because they hardly... | human | human |
| 1 | On what computer did Paul try writing his fir... | [The first programs I tried writing were on t... | The IBM 1401. | human | human |
rag_dataset[0] # slicing and indexing supported on `examples` attribute
LabelledRagDataExample(query="Why were Paul's stories awful?", query_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>), reference_contexts=['I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.'], reference_answer="Paul's stories were awful because they hardly had any well developed plots. Instead they just had characters with strong feelings.", reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))
Template¶
# MANUAL CONSTRUCTION OF EXAMPLES
from llama_index.core.llama_dataset import LabelledRagDataExample, CreatedBy, CreatedByType
from llama_index.core.llama_dataset import LabelledRagDataset
example1 = LabelledRagDataExample(
    query=<FILL-IN>,
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer=<FILL-IN>,
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[<OPTIONAL-FILL-IN>, <OPTIONAL-FILL-IN>],
)
example2 = LabelledRagDataExample(
    query=<FILL-IN>,
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer=<FILL-IN>,
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[<OPTIONAL-FILL-IN>],
)
# ... and so on
rag_dataset = LabelledRagDataset(examples=[example1, example2,])
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
2. Generate a baseline evaluation result¶
Submitting a dataset also requires submitting a baseline result. At a high level, generating the baseline result involves the following steps:
i. Building a RAG system (`QueryEngine`) over the same source documents used to build the `LabelledRagDataset` of Step 1.
ii. Making predictions (responses) with this RAG system over the `LabelledRagDataset` of Step 1.
iii. Evaluating the predictions.
It is recommended to carry out steps ii. and iii. via the RagEvaluatorPack, which can be downloaded from llama-hub.
NOTE: The RagEvaluatorPack uses GPT-4 by default, as it is an LLM that has demonstrated high agreement with human evaluations.
Demonstration¶
This is a demonstration for 1A, but the same steps apply for 1B and 1C.
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core.llama_pack import download_llama_pack
# i. Building a RAG system over the same source documents
documents = SimpleDirectoryReader(input_dir="data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()
# ii. and iii. Predict and Evaluate using `RagEvaluatorPack`
RagEvaluatorPack = download_llama_pack("RagEvaluatorPack", "./pack")
rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,
    rag_dataset=rag_dataset,  # defined in 1A
    show_progress=True,
)
############################################################################
# NOTE: If you have a lower-tier subscription for the OpenAI API, such as
# Usage Tier 1, you'll need to use a different batch_size and
# sleep_time_in_seconds. For Usage Tier 1, settings that seemed to work
# well were batch_size=5 and sleep_time_in_seconds=15 (as of December 2023).
############################################################################
benchmark_df = await rag_evaluator.arun(
    batch_size=20,  # batches the number of openai api calls to make
    sleep_time_in_seconds=1,  # seconds to sleep before making an api call
)
benchmark_df
| rag | base_rag |
|---|---|
| metrics | |
| mean_correctness_score | 4.238636 |
| mean_relevancy_score | 0.977273 |
| mean_faithfulness_score | 1.000000 |
| mean_context_similarity_score | 0.942281 |
Template¶
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core.llama_pack import download_llama_pack
documents = SimpleDirectoryReader(  # can use a different reader here
    input_dir=<FILL-IN>  # should read the same source files used to create
).load_data()  # the LabelledRagDataset of Step 1
index = VectorStoreIndex.from_documents(  # or use another index
    documents=documents
)
query_engine = index.as_query_engine()
RagEvaluatorPack = download_llama_pack(
    "RagEvaluatorPack", "./pack"
)
rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,
    rag_dataset=rag_dataset,  # defined in Step 1A
    judge_llm=<FILL-IN>,  # if you'd rather not use GPT-4
)
benchmark_df = await rag_evaluator.arun()
benchmark_df
3. Prepare card.json and README.md¶
Submitting a dataset also requires submitting some metadata. This metadata lives in two different files, card.json and README.md, both of which are uploaded to the llama-hub Github repository as part of the submission package. To expedite this step and ensure consistency, you can make use of the LlamaDatasetMetadataPack llamapack. Alternatively, you can do this step manually, following the demonstration and using the templates provided below.
3A. Automatic generation with LlamaDatasetMetadataPack¶
Demonstration¶
This continues the Paul Graham essay demonstration example of 1A.
from llama_index.core.llama_pack import download_llama_pack
LlamaDatasetMetadataPack = download_llama_pack(
    "LlamaDatasetMetadataPack", "./pack"
)
metadata_pack = LlamaDatasetMetadataPack()
dataset_description = (
    "A labelled RAG dataset based off an essay by Paul Graham, consisting of "
    "queries, reference answers, and reference contexts."
)
# this creates and saves a card.json and README.md to the same
# directory where you're running this notebook.
metadata_pack.run(
    name="Paul Graham Essay Dataset",
    description=dataset_description,
    rag_dataset=rag_dataset,
    index=index,
    benchmark_df=benchmark_df,
    baseline_name="llamaindex",
)
# if you want to quickly view these two files, set take_a_peek to True
take_a_peek = False
if take_a_peek:
    import json

    with open("card.json", "r") as f:
        card = json.load(f)
    with open("README.md", "r") as f:
        readme_str = f.read()
    print(card)
    print("\n")
    print(readme_str)
Template¶
from llama_index.core.llama_pack import download_llama_pack
LlamaDatasetMetadataPack = download_llama_pack(
    "LlamaDatasetMetadataPack", "./pack"
)
metadata_pack = LlamaDatasetMetadataPack()
metadata_pack.run(
    name=<FILL-IN>,
    description=<FILL-IN>,
    rag_dataset=rag_dataset,  # from Step 1
    index=index,  # from Step 2
    benchmark_df=benchmark_df,  # from Step 2
    baseline_name="llamaindex",  # optionally use another one
    source_urls=<OPTIONAL-FILL-IN>,
    code_url=<OPTIONAL-FILL-IN>,  # if you wish to submit code to replicate baseline results
)
After running the above code, you can inspect both the card.json and README.md files and make any necessary edits manually before submitting them to the llama-hub Github repository.
3B. Manual generation¶
In this part, we demonstrate how to create the card.json and README.md files through the Paul Graham essay example that we have been using in 1A (the process is the same if you chose option 1C in Step 1).
card.json¶
Demonstration¶
{
    "name": "Paul Graham Essay",
    "description": "A labelled RAG dataset based off an essay by Paul Graham, consisting of queries, reference answers, and reference contexts.",
    "numberObservations": 44,
    "containsExamplesByHumans": false,
    "containsExamplesByAI": true,
    "sourceUrls": [
        "http://www.paulgraham.com/articles.html"
    ],
    "baselines": [
        {
            "name": "llamaindex",
            "config": {
                "chunkSize": 1024,
                "llm": "gpt-3.5-turbo",
                "similarityTopK": 2,
                "embedModel": "text-embedding-ada-002"
            },
            "metrics": {
                "contextSimilarity": 0.934,
                "correctness": 4.239,
                "faithfulness": 0.977,
                "relevancy": 0.977
            },
            "codeUrl": "https://github.com/run-llama/llama-hub/blob/main/llama_hub/llama_datasets/paul_graham_essay/llamaindex_baseline.py"
        }
    ]
}
Template¶
{
    "name": <FILL-IN>,
    "description": <FILL-IN>,
    "numberObservations": <FILL-IN>,
    "containsExamplesByHumans": <FILL-IN>,
    "containsExamplesByAI": <FILL-IN>,
    "sourceUrls": [
        <FILL-IN>,
    ],
    "baselines": [
        {
            "name": <FILL-IN>,
            "config": {
                "chunkSize": <FILL-IN>,
                "llm": <FILL-IN>,
                "similarityTopK": <FILL-IN>,
                "embedModel": <FILL-IN>
            },
            "metrics": {
                "contextSimilarity": <FILL-IN>,
                "correctness": <FILL-IN>,
                "faithfulness": <FILL-IN>,
                "relevancy": <FILL-IN>
            },
            "codeUrl": <OPTIONAL-FILL-IN>
        }
    ]
}
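If you are filling in the template by hand, a small sketch like the one below can pull numberObservations and the metric values straight from the Step 1 and Step 2 artifacts rather than transcribing them; it assumes rag_dataset and benchmark_df are still in scope:
# compute card.json values from the Step 1 and Step 2 artifacts
number_observations = len(rag_dataset.examples)
print("numberObservations:", number_observations)
# the metrics block, rounded to three decimals as in the demonstration
print(benchmark_df.round(3))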
README.md¶
The lowest level of effort required here is to take the template below and fill in the necessary items, which amounts to changing the dataset name to the one you have chosen for your new submission.
Template¶
Click here for the README.md template file. Simply copy and paste the contents of that file, then replace the placeholders "[NAME]" and "[NAME-CAMELCASE]" with the appropriate values according to your chosen new dataset name. For example:
- "[NAME]" = "Paul Graham Essay Dataset"
- "[NAME-CAMELCASE]" = PaulGrahamEssayDataset
4. Submit a pull-request into the llama-hub repository¶
It is now time to submit the metadata of your new dataset and create a new entry in the datasets registry, which is stored in the file library.json (i.e., see it here).
4a. Create a new directory under llama_hub/llama_datasets and add your card.json and README.md files:¶
cd llama-hub # cd into local clone of llama-hub
cd llama_hub/llama_datasets
git checkout -b my-new-dataset # create a new git branch
mkdir <dataset_name_snake_case> # follow convention of other datasets
cd <dataset_name_snake_case>
vim card.json # use vim or another text editor to add in the contents for card.json
vim README.md # use vim or another text editor to add in the contents for README.md
4b. Create an entry in llama_hub/llama_datasets/library.json¶
cd llama_hub/llama_datasets
vim library.json # use vim or another text editor to register your new dataset
Demonstration of library.json¶
"PaulGrahamEssayDataset": {
"id": "llama_datasets/paul_graham_essay",
"author": "nerdai",
"keywords": ["rag"]
}
Template for library.json¶
"<FILL-IN>": {
"id": "llama_datasets/<dataset_name_snake_case>",
"author": "<FILL-IN>",
"keywords": ["rag"]
}
NOTE: Please use the same dataset_name_snake_case as used in 4a.
5. Submit a pull-request into the llama-datasets repository¶
In this final step of the submission process, you will submit the actual LabelledRagDataset (in json format) along with its source data files to the llama-datasets Github repository.
5a. Create a new directory under llama_datasets/:¶
cd llama-datasets # cd into local clone of llama-datasets
git checkout -b my-new-dataset # create a new git branch
mkdir <dataset_name_snake_case> # use the same name as used in Step 4.
cd <dataset_name_snake_case>
cp <path-in-local-machine>/rag_dataset.json . # add rag_dataset.json
mkdir source_files # time to add all of the source files
cp -r <path-in-local-machine>/source_files ./source_files # add all source files
NOTE: Please use the same dataset_name_snake_case as used in Step 4.
5b. git add and commit your changes, then push to your fork¶
git add .
git commit -m "my new dataset submission"
git push origin my-new-dataset
Once that is done, head over to the Github page for llama-datasets. You should see the option to create a pull request from your fork; go ahead and do that now.
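Alternatively, if you have the GitHub CLI installed, you can open the pull request from the command line; the title and body below are illustrative:
# from the my-new-dataset branch of your local clone of the fork
gh pr create --title "Adding <dataset_name> dataset" --body "New LlamaDataset submission."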
Et voila!¶
You have made it to the end of the dataset submission process! 🎉🦙 Congratulations, and thank you for your contribution!