Schemas

BaseSample

Bases: BaseModel

Base class for evaluation samples.

to_dict

to_dict() -> Dict

Get the dictionary representation of the sample, excluding attributes whose value is None.

Source code in ragas/src/ragas/dataset_schema.py

def to_dict(self) -> t.Dict:
    """
    Get the dictionary representation of the sample without attributes that are None.
    """
    return self.model_dump(exclude_none=True)

get_features

get_features() -> List[str]

Get the features of the sample that are not None.

Source code in ragas/src/ragas/dataset_schema.py

def get_features(self) -> t.List[str]:
    """
    Get the features of the sample that are not None.
    """
    return list(self.to_dict().keys())

to_string

to_string() -> str

Get the string representation of the sample.

Source code in ragas/src/ragas/dataset_schema.py

def to_string(self) -> str:
    """
    Get the string representation of the sample.
    """
    sample_dict = self.to_dict()
    return "".join(f"\n{key}:\n\t{val}\n" for key, val in sample_dict.items())
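The three methods above build on one another: `to_dict` drops None-valued attributes, `get_features` lists the remaining keys, and `to_string` renders each key/value pair on its own lines. The following is a minimal sketch of that behavior using a plain dataclass; `SketchSample` is a hypothetical stand-in and not the ragas class, which relies on pydantic's `model_dump(exclude_none=True)`.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class SketchSample:
    """Hypothetical stand-in for a sample; not the ragas BaseSample."""

    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None

    def to_dict(self) -> Dict[str, str]:
        # Imitates model_dump(exclude_none=True): skip attributes that are None.
        return {k: v for k, v in asdict(self).items() if v is not None}

    def get_features(self) -> List[str]:
        # Features are simply the keys that survived the None filter.
        return list(self.to_dict().keys())

    def to_string(self) -> str:
        # Same format string as the listing above.
        return "".join(f"\n{key}:\n\t{val}\n" for key, val in self.to_dict().items())


sample = SketchSample(user_input="What is Ragas?", response="An evaluation library.")
features = sample.get_features()
text = sample.to_string()
```

Because `reference` was never set, it appears in neither `features` nor `text`.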

SingleTurnSample

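Bases: BaseSample

Represents an evaluation sample for a single-turn interaction.

Attributes:

Name	Type	Description
`user_input`	`Optional[str]`	The user's input query.
`retrieved_contexts`	`Optional[List[str]]`	List of contexts retrieved for the query.
`reference_contexts`	`Optional[List[str]]`	List of reference contexts for the query.
`response`	`Optional[str]`	The response generated for the query.
`multi_responses`	`Optional[List[str]]`	List of multiple responses generated for the query.
`reference`	`Optional[str]`	The reference answer for the query.
`rubric`	`Optional[Dict[str, str]]`	Evaluation rubric for the sample.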

MultiTurnSample

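Bases: BaseSample

Represents an evaluation sample for a multi-turn interaction.

Attributes:

Name	Type	Description
`user_input`	`List[Union[HumanMessage, AIMessage, ToolMessage]]`	List of messages representing the conversation turns.
`reference`	`(Optional[str], optional)`	The reference answer or expected outcome for the conversation.
`reference_tool_calls`	`(Optional[List[ToolCall]], optional)`	List of expected tool calls for the conversation.
`rubrics`	`(Optional[Dict[str, str]], optional)`	Evaluation rubrics for the conversation.
`reference_topics`	`(Optional[List[str]], optional)`	List of reference topics for the conversation.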

validate_user_input `classmethod`

validate_user_input(messages: List[Union[HumanMessage, AIMessage, ToolMessage]]) -> List[Union[HumanMessage, AIMessage, ToolMessage]]

Validates the user input messages.

Source code in ragas/src/ragas/dataset_schema.py

@field_validator("user_input")
@classmethod
def validate_user_input(
    cls,
    messages: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]],
) -> t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]:
    """Validates the user input messages."""
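    # all() is required here: a bare generator expression is always truthy,
    # so without it the type check could never fail.
    if not all(
        isinstance(m, (HumanMessage, AIMessage, ToolMessage)) for m in messages
    ):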
        raise ValueError(
            "All inputs must be instances of HumanMessage, AIMessage, or ToolMessage."
        )

    has_seen_ai_message = False

    for i, m in enumerate(messages):
        if isinstance(m, AIMessage):
            has_seen_ai_message = True

        elif isinstance(m, ToolMessage):
            # Rule 1: ToolMessage must be preceded by an AIMessage somewhere in the conversation
            if not has_seen_ai_message:
                raise ValueError(
                    "ToolMessage must be preceded by an AIMessage somewhere in the conversation."
                )

            # Rule 2: ToolMessage must follow an AIMessage or another ToolMessage
            if i > 0:
                prev_message = messages[i - 1]

                if isinstance(prev_message, AIMessage):
                    # Rule 3: If following AIMessage, that message must have tool_calls
                    if not prev_message.tool_calls:
                        raise ValueError(
                            "ToolMessage must follow an AIMessage where tools were called."
                        )
                elif not isinstance(prev_message, ToolMessage):
                    # Not following AIMessage or ToolMessage
                    raise ValueError(
                        "ToolMessage must follow an AIMessage or another ToolMessage."
                    )

    return messages
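The ordering rules enforced by the validator can be exercised in isolation. The sketch below reimplements them with three hypothetical stand-in classes (`Human`, `AI`, `Tool`) instead of the ragas message types, so it illustrates the rules rather than the library's own code path.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Human:
    content: str


@dataclass
class AI:
    content: str
    tool_calls: List[dict] = field(default_factory=list)


@dataclass
class Tool:
    content: str


Msg = Union[Human, AI, Tool]


def check_order(messages: List[Msg]) -> List[Msg]:
    """Apply the same three ordering rules as the listing above."""
    if not all(isinstance(m, (Human, AI, Tool)) for m in messages):
        raise ValueError("unsupported message type")
    seen_ai = False
    for i, m in enumerate(messages):
        if isinstance(m, AI):
            seen_ai = True
        elif isinstance(m, Tool):
            # Rule 1: some AI message must come earlier (so i > 0 from here on).
            if not seen_ai:
                raise ValueError("tool output before any AI message")
            prev = messages[i - 1]
            # Rule 3: an AI message directly before must have called a tool.
            if isinstance(prev, AI) and not prev.tool_calls:
                raise ValueError("previous AI message made no tool call")
            # Rule 2: the predecessor must be an AI or another tool message.
            if isinstance(prev, Human):
                raise ValueError("tool output directly after a human message")
    return messages


def is_valid(messages: List[Msg]) -> bool:
    try:
        check_order(messages)
    except ValueError:
        return False
    return True


valid = [
    Human("What is the weather in Paris?"),
    AI("Checking.", tool_calls=[{"name": "get_weather", "args": {"city": "Paris"}}]),
    Tool("18C and sunny"),
    AI("It is 18C and sunny in Paris."),
]
no_tool_call = [Human("Hi"), AI("Hello"), Tool("orphan output")]
```

`valid` passes, while `no_tool_call` fails because the AI message preceding the tool output made no tool call.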

to_messages

to_messages()

Converts the user input messages to a list of dictionaries.

Source code in ragas/src/ragas/dataset_schema.py

def to_messages(self):
    """Converts the user input messages to a list of dictionaries."""
    return [m.model_dump() for m in self.user_input]

pretty_repr

pretty_repr()

Returns a pretty string representation of the conversation.

Source code in ragas/src/ragas/dataset_schema.py

def pretty_repr(self):
    """Returns a pretty string representation of the conversation."""
    lines = []
    for m in self.user_input:
        lines.append(m.pretty_repr())

    return "\n".join(lines)

RagasDataset `dataclass`

RagasDataset(samples: List[Sample])

Bases: ABC, Generic[Sample]

to_list `abstractmethod`

to_list() -> List[Dict]

Converts the samples to a list of dictionaries.

Source code in ragas/src/ragas/dataset_schema.py

@abstractmethod
def to_list(self) -> t.List[t.Dict]:
    """Converts the samples to a list of dictionaries."""
    pass

from_list `abstractmethod` `classmethod`

from_list(data: List[Dict]) -> T

Creates a RagasDataset from a list of dictionaries.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
@abstractmethod
def from_list(cls: t.Type[T], data: t.List[t.Dict]) -> T:
    """Creates a RagasDataset from a list of dictionaries."""
    pass

validate_samples

validate_samples(samples: List[Sample]) -> List[Sample]

Validates that all samples are of the same type.

Source code in ragas/src/ragas/dataset_schema.py

def validate_samples(self, samples: t.List[Sample]) -> t.List[Sample]:
    """Validates that all samples are of the same type."""
    if len(samples) == 0:
        return samples

    first_sample_type = type(samples[0])
    for i, sample in enumerate(samples):
        if not isinstance(sample, first_sample_type):
            raise ValueError(
                f"Sample at index {i} is of type {type(sample)}, expected {first_sample_type}"
            )

    return samples
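A standalone version of the same check makes the behavior easy to try on plain values. This is a sketch under the assumption that any Python objects can stand in for samples; `validate_same_type` and `error_for` are hypothetical helper names.

```python
from typing import Any, List


def validate_same_type(samples: List[Any]) -> List[Any]:
    """Raise ValueError unless every item is an instance of the first item's type."""
    if len(samples) == 0:
        return samples
    first_type = type(samples[0])
    for i, sample in enumerate(samples):
        if not isinstance(sample, first_type):
            raise ValueError(
                f"Sample at index {i} is of type {type(sample)}, expected {first_type}"
            )
    return samples


def error_for(samples: List[Any]) -> str:
    """Return the validation error message, or an empty string when validation passes."""
    try:
        validate_same_type(samples)
    except ValueError as exc:
        return str(exc)
    return ""


mixed_error = error_for(["single-turn", 2])
```

Since the check uses `isinstance`, an instance of a subclass of the first sample's type is accepted as well.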

get_sample_type

get_sample_type() -> Type[Sample]

Returns the type of the samples in the dataset.

Source code in ragas/src/ragas/dataset_schema.py

def get_sample_type(self) -> t.Type[Sample]:
    """Returns the type of the samples in the dataset."""
    return type(self.samples[0])

to_hf_dataset

to_hf_dataset() -> Dataset

Converts the dataset to a Hugging Face Dataset.

Source code in ragas/src/ragas/dataset_schema.py

def to_hf_dataset(self) -> HFDataset:
    """Converts the dataset to a Hugging Face Dataset."""
    try:
        from datasets import Dataset as HFDataset
    except ImportError:
        raise ImportError(
            "datasets is not installed. Please install it to use this function."
        )

    return HFDataset.from_list(self.to_list())

from_hf_dataset `classmethod`

from_hf_dataset(dataset: Dataset) -> T

Creates an EvaluationDataset from a Hugging Face Dataset.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_hf_dataset(cls: t.Type[T], dataset: HFDataset) -> T:
    """Creates an EvaluationDataset from a Hugging Face Dataset."""
    return cls.from_list(dataset.to_list())

to_pandas

to_pandas() -> DataFrame

Converts the dataset to a pandas DataFrame.

Source code in ragas/src/ragas/dataset_schema.py

def to_pandas(self) -> PandasDataframe:
    """Converts the dataset to a pandas DataFrame."""
    try:
        import pandas as pd
    except ImportError:
        raise ImportError(
            "pandas is not installed. Please install it to use this function."
        )

    data = self.to_list()
    return pd.DataFrame(data)

from_pandas `classmethod`

from_pandas(dataframe: DataFrame)

Creates an EvaluationDataset from a pandas DataFrame.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_pandas(cls, dataframe: PandasDataframe):
    """Creates an EvaluationDataset from a pandas DataFrame."""
    return cls.from_list(dataframe.to_dict(orient="records"))

features

features()

Returns the features of the samples.

Source code in ragas/src/ragas/dataset_schema.py

def features(self):
    """Returns the features of the samples."""
    return self.samples[0].get_features()

from_dict `classmethod`

from_dict(mapping: Dict) -> T

Creates an EvaluationDataset from a dictionary.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_dict(cls: t.Type[T], mapping: t.Dict) -> T:
    """Creates an EvaluationDataset from a dictionary."""
    samples = []
    if all(
        # Inspect each item's own user_input rather than only the first item's.
        "user_input" in item and isinstance(item["user_input"], list)
        for item in mapping
    ):
        samples.extend(MultiTurnSample(**sample) for sample in mapping)
    else:
        samples.extend(SingleTurnSample(**sample) for sample in mapping)
    return cls(samples=samples)
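The choice between multi-turn and single-turn samples hinges on whether `user_input` is a list. A minimal sketch of that dispatch follows; `infer_sample_kind` is a hypothetical helper that checks each record's own `user_input` and returns a label instead of constructing sample objects.

```python
from typing import Any, Dict, List


def infer_sample_kind(records: List[Dict[str, Any]]) -> str:
    """Return "multi_turn" only when every record carries a list-valued user_input."""
    if all(
        "user_input" in record and isinstance(record["user_input"], list)
        for record in records
    ):
        return "multi_turn"
    return "single_turn"


single = [{"user_input": "What is Ragas?", "response": "An evaluation library."}]
multi = [{"user_input": [{"type": "human", "content": "Hi"}]}]
mixed = single + multi

kinds = [infer_sample_kind(single), infer_sample_kind(multi), infer_sample_kind(mixed)]
```

A mixed list falls through to the single-turn branch, where constructing the samples would most likely fail validation for the list-valued record.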

to_csv

to_csv(path: Union[str, Path])

Converts the dataset to a CSV file.

Source code in ragas/src/ragas/dataset_schema.py

def to_csv(self, path: t.Union[str, Path]):
    """Converts the dataset to a CSV file."""
    import csv

    data = self.to_list()
    if not data:
        return

    fieldnames = data[0].keys()

    with open(path, "w", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in data:
            writer.writerow(row)
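The header is taken from the first row only, which matters because rows may differ in their keys once None-valued attributes are dropped. The sketch below writes to an in-memory buffer instead of a file, using hypothetical rows, to show how `csv.DictWriter` treats a row that lacks a column.

```python
import csv
import io

# Hypothetical rows; the second one has no "response" key.
rows = [
    {"user_input": "q1", "response": "a1"},
    {"user_input": "q2"},
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
for row in rows:
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer.writerow(row)

csv_lines = buffer.getvalue().splitlines()
```

The reverse case is stricter: with `DictWriter`'s default settings, a row containing a key that is absent from `fieldnames` raises `ValueError`.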

to_jsonl

to_jsonl(path: Union[str, Path])

Converts the dataset to a JSONL file.

Source code in ragas/src/ragas/dataset_schema.py

def to_jsonl(self, path: t.Union[str, Path]):
    """Converts the dataset to a JSONL file."""
    with open(path, "w") as jsonlfile:
        for sample in self.to_list():
            jsonlfile.write(json.dumps(sample, ensure_ascii=False) + "\n")

from_jsonl `classmethod`

from_jsonl(path: Union[str, Path]) -> T

Creates an EvaluationDataset from a JSONL file.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_jsonl(cls: t.Type[T], path: t.Union[str, Path]) -> T:
    """Creates an EvaluationDataset from a JSONL file."""
    with open(path, "r") as jsonlfile:
        data = [json.loads(line) for line in jsonlfile]
    return cls.from_list(data)
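The JSONL format used by `to_jsonl` and `from_jsonl` is one JSON object per line, written with `ensure_ascii=False` so that non-ASCII text stays readable. A small round-trip sketch with an in-memory buffer and made-up records:

```python
import io
import json

# Hypothetical records, including non-ASCII text.
records = [
    {"user_input": "¿Qué es Ragas?", "response": "Una biblioteca de evaluación."},
    {"user_input": "你好"},
]

buffer = io.StringIO()
for record in records:
    # One JSON document per line; non-ASCII characters are kept as-is.
    buffer.write(json.dumps(record, ensure_ascii=False) + "\n")

raw_text = buffer.getvalue()

buffer.seek(0)
loaded = [json.loads(line) for line in buffer]
```

When writing to a real file, passing an explicit `encoding="utf-8"` to `open` is a reasonable precaution, since non-ASCII characters are written unescaped.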

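EvaluationDataset `dataclass`

EvaluationDataset(samples: List[Sample])

Bases: RagasDataset[SingleTurnSampleOrMultiTurnSample]

Represents a dataset of evaluation samples.

Attributes:

Name	Type	Description
`samples`	`List[BaseSample]`	A list of evaluation samples.

Methods:

Name	Description
`validate_samples`	Validates that all samples are of the same type.
`get_sample_type`	Returns the type of the samples in the dataset.
`to_hf_dataset`	Converts the dataset to a Hugging Face Dataset.
`to_pandas`	Converts the dataset to a pandas DataFrame.
`features`	Returns the features of the samples.
`from_list`	Creates an EvaluationDataset from a list of dictionaries.
`from_dict`	Creates an EvaluationDataset from a dictionary.
`to_csv`	Converts the dataset to a CSV file.
`to_jsonl`	Converts the dataset to a JSONL file.
`from_jsonl`	Creates an EvaluationDataset from a JSONL file.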

EvaluationResult `dataclass`

EvaluationResult(scores: List[Dict[str, Any]], dataset: EvaluationDataset, binary_columns: List[str] = list(), cost_cb: Optional[CostCallbackHandler] = None, traces: List[Dict[str, Any]] = list(), ragas_traces: Dict[str, ChainRun] = dict(), run_id: Optional[UUID] = None)

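A class to store and process the results of the evaluation.

Attributes:

Name	Type	Description
`scores`	`Dataset`	The dataset containing the scores of the evaluation.
`dataset`	`(Dataset, optional)`	The original dataset used for the evaluation. Default is None.
`binary_columns`	`list of str, optional`	List of columns that are binary metrics. Default is an empty list.
`cost_cb`	`(CostCallbackHandler, optional)`	The callback handler for cost computation. Default is None.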

to_pandas

to_pandas(batch_size: int | None = None, batched: bool = False)

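Convert the result to a pandas DataFrame.

Parameters:

Name	Type	Description	Default
`batch_size`	`int`	The batch size for conversion. Default is None.	`None`
`batched`	`bool`	Whether to convert in batches. Default is False.	`False`

Returns:

Type	Description
`DataFrame`	The result as a pandas DataFrame.

Raises:

Type	Description
`ValueError`	If the dataset is not provided.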

Source code in ragas/src/ragas/dataset_schema.py

def to_pandas(self, batch_size: int | None = None, batched: bool = False):
    """
    Convert the result to a pandas DataFrame.

    Parameters
    ----------
    batch_size : int, optional
        The batch size for conversion. Default is None.
    batched : bool, optional
        Whether to convert in batches. Default is False.

    Returns
    -------
    pandas.DataFrame
        The result as a pandas DataFrame.

    Raises
    ------
    ValueError
        If the dataset is not provided.
    """
    try:
        import pandas as pd
    except ImportError:
        raise ImportError(
            "pandas is not installed. Please install it to use this function."
        )

    if self.dataset is None:
        raise ValueError("dataset is not provided for the results class")
    assert len(self.scores) == len(self.dataset)
    # convert both to pandas dataframes and concatenate
    scores_df = pd.DataFrame(self.scores)
    dataset_df = self.dataset.to_pandas()
    return pd.concat([dataset_df, scores_df], axis=1)
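The concatenation along `axis=1` places the score columns next to the dataset columns, row by row. The following pandas-free sketch shows the same idea on lists of dictionaries; `join_rows` is a hypothetical helper and the metric name is made up.

```python
from typing import Any, Dict, List


def join_rows(
    dataset_rows: List[Dict[str, Any]], scores: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Merge the i-th score dict into the i-th dataset row."""
    if len(dataset_rows) != len(scores):
        raise ValueError("scores and dataset must have the same length")
    return [{**row, **score} for row, score in zip(dataset_rows, scores)]


dataset_rows = [
    {"user_input": "q1", "response": "a1"},
    {"user_input": "q2", "response": "a2"},
]
scores = [{"example_metric": 1.0}, {"example_metric": 0.5}]

combined = join_rows(dataset_rows, scores)
```

As in the listing above, the pairing is purely positional, which is why the lengths must match.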

total_tokens

total_tokens() -> Union[List[TokenUsage], TokenUsage]

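Compute the total tokens used in the evaluation.

Returns:

Type	Description
`list of TokenUsage or TokenUsage`	The total tokens used.

Raises:

Type	Description
`ValueError`	If the cost callback handler is not provided.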

Source code in ragas/src/ragas/dataset_schema.py

def total_tokens(self) -> t.Union[t.List[TokenUsage], TokenUsage]:
    """
    Compute the total tokens used in the evaluation.

    Returns
    -------
    list of TokenUsage or TokenUsage
        The total tokens used.

    Raises
    ------
    ValueError
        If the cost callback handler is not provided.
    """
    if self.cost_cb is None:
        raise ValueError(
            "The evaluate() run was not configured for computing cost. Please provide a token_usage_parser function to evaluate() to compute cost."
        )
    return self.cost_cb.total_tokens()

total_cost

total_cost(cost_per_input_token: Optional[float] = None, cost_per_output_token: Optional[float] = None, per_model_costs: Dict[str, Tuple[float, float]] = {}) -> float

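Compute the total cost of the evaluation.

Parameters:

Name	Type	Description	Default
`cost_per_input_token`	`float`	The cost per input token. Default is None.	`None`
`cost_per_output_token`	`float`	The cost per output token. Default is None.	`None`
`per_model_costs`	`dict of str to tuple of float`	The per model costs. Default is an empty dictionary.	`{}`

Returns:

Type	Description
`float`	The total cost of the evaluation.

Raises:

Type	Description
`ValueError`	If the cost callback handler is not provided.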

Source code in ragas/src/ragas/dataset_schema.py

def total_cost(
    self,
    cost_per_input_token: t.Optional[float] = None,
    cost_per_output_token: t.Optional[float] = None,
    per_model_costs: t.Dict[str, t.Tuple[float, float]] = {},
) -> float:
    """
    Compute the total cost of the evaluation.

    Parameters
    ----------
    cost_per_input_token : float, optional
        The cost per input token. Default is None.
    cost_per_output_token : float, optional
        The cost per output token. Default is None.
    per_model_costs : dict of str to tuple of float, optional
        The per model costs. Default is an empty dictionary.

    Returns
    -------
    float
        The total cost of the evaluation.

    Raises
    ------
    ValueError
        If the cost callback handler is not provided.
    """
    if self.cost_cb is None:
        raise ValueError(
            "The evaluate() run was not configured for computing cost. Please provide a token_usage_parser function to evaluate() to compute cost."
        )
    return self.cost_cb.total_cost(
        cost_per_input_token, cost_per_output_token, per_model_costs
    )
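How the callback handler turns token counts into a cost is not shown on this page. As a rough illustration only, the sketch below assumes the simplest possible model, a linear price per input token and per output token; `UsageSketch` and `estimate_cost` are hypothetical names, not ragas APIs, and the prices are invented.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Hypothetical token counts; not the ragas TokenUsage class."""

    input_tokens: int
    output_tokens: int


def estimate_cost(
    usage: UsageSketch, cost_per_input_token: float, cost_per_output_token: float
) -> float:
    # Assumed linear pricing: each token type is billed at its own flat rate.
    return (
        usage.input_tokens * cost_per_input_token
        + usage.output_tokens * cost_per_output_token
    )


usage = UsageSketch(input_tokens=1000, output_tokens=500)
cost = estimate_cost(usage, cost_per_input_token=3e-6, cost_per_output_token=15e-6)
```

With these made-up numbers the estimate is 0.003 for input plus 0.0075 for output.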

MetricAnnotation

Bases: BaseModel

from_json `classmethod`

from_json(path: str, metric_name: Optional[str]) -> 'MetricAnnotation'

Load annotations from a JSON file.

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_json(cls, path: str, metric_name: t.Optional[str]) -> "MetricAnnotation":
    """Load annotations from a JSON file"""
    # Use a context manager so the file handle is closed after reading.
    with open(path) as f:
        dataset = json.load(f)
    return cls._process_dataset(dataset, metric_name)

from_app `classmethod`

from_app(run_id: str, metric_name: Optional[str] = None) -> 'MetricAnnotation'

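Fetch annotations from a URL using either evaluation result or run_id.

Parameters:

Name	Type	Description	Default
`run_id`	`str`	Direct run ID to fetch annotations.	required
`metric_name`	`str`	Name of the specific metric to filter.	`None`

Returns:

Type	Description
`MetricAnnotation`	Annotation data from the API.

Raises:

Type	Description
`ValueError`	If run_id is not provided.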

Source code in ragas/src/ragas/dataset_schema.py

@classmethod
def from_app(
    cls,
    run_id: str,
    metric_name: t.Optional[str] = None,
) -> "MetricAnnotation":
    """
    Fetch annotations from a URL using either evaluation result or run_id

    Parameters
    ----------
    run_id : str
        Direct run ID to fetch annotations
    metric_name : str, optional
        Name of the specific metric to filter

    Returns
    -------
    MetricAnnotation
        Annotation data from the API

    Raises
    ------
    ValueError
        If run_id is not provided
    """
    if run_id is None:
        raise ValueError("run_id must be provided")

    endpoint = f"/api/v1/alignment/evaluation/annotation/{run_id}"

    app_token = get_app_token()
    base_url = get_api_url()
    app_url = get_app_url()

    response = requests.get(
        f"{base_url}{endpoint}",
        headers={
            "Content-Type": "application/json",
            "x-app-token": app_token,
            "x-source": RAGAS_API_SOURCE,
            "x-app-version": __version__,
        },
    )

    check_api_response(response)
    dataset = response.json()["data"]

    if not dataset:
        evaluation_url = build_evaluation_app_url(app_url, run_id)
        raise ValueError(
            f"No annotations found. Please annotate the Evaluation first then run this method. "
            f"\nNote: you can annotate the evaluations using the Ragas app by going to {evaluation_url}"
        )

    return cls._process_dataset(dataset, metric_name)

SingleMetricAnnotation

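Bases: BaseModel

train_test_split

train_test_split(test_size: float = 0.2, seed: int = 42, stratify: Optional[List[Any]] = None) -> Tuple['SingleMetricAnnotation', 'SingleMetricAnnotation']

Split the dataset into training and testing sets. In the source shown here the method is not implemented and raises NotImplementedError.

Parameters:

Name	Type	Description	Default
`test_size`	`float`	The proportion of the dataset to include in the test split.	`0.2`
`seed`	`int`	Random seed for reproducibility.	`42`
`stratify`	`list`	The column values to stratify the split on.	`None`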

Source code in ragas/src/ragas/dataset_schema.py

def train_test_split(
    self,
    test_size: float = 0.2,
    seed: int = 42,
    stratify: t.Optional[t.List[t.Any]] = None,
) -> t.Tuple["SingleMetricAnnotation", "SingleMetricAnnotation"]:
    """
    Split the dataset into training and testing sets.

    Parameters:
        test_size (float): The proportion of the dataset to include in the test split.
        seed (int): Random seed for reproducibility.
        stratify (list): The column values to stratify the split on.
    """
    raise NotImplementedError

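sample

sample(n: int, stratify_key: Optional[str] = None) -> 'SingleMetricAnnotation'

Create a subset of the dataset.

Parameters:

Name	Type	Description	Default
`n`	`int`	The number of samples to include in the subset.	required
`stratify_key`	`str`	The column to stratify the subset on.	`None`

Returns:

Type	Description
`SingleMetricAnnotation`	A subset of the dataset with `n` samples.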

Source code in ragas/src/ragas/dataset_schema.py

def sample(
    self, n: int, stratify_key: t.Optional[str] = None
) -> "SingleMetricAnnotation":
    """
    Create a subset of the dataset.

    Parameters:
        n (int): The number of samples to include in the subset.
        stratify_key (str): The column to stratify the subset on.

    Returns:
        SingleMetricAnnotation: A subset of the dataset with `n` samples.
    """
    if n > len(self.samples):
        raise ValueError(
            "Requested sample size exceeds the number of available samples."
        )

    if stratify_key is None:
        # Simple random sampling
        sampled_indices = random.sample(range(len(self.samples)), n)
        sampled_samples = [self.samples[i] for i in sampled_indices]
    else:
        # Stratified sampling
        class_groups = defaultdict(list)
        for idx, sample in enumerate(self.samples):
            key = sample[stratify_key]
            class_groups[key].append(idx)

        # Determine the proportion of samples to take from each class
        total_samples = sum(len(indices) for indices in class_groups.values())
        proportions = {
            cls: len(indices) / total_samples
            for cls, indices in class_groups.items()
        }

        sampled_indices = []
        for cls, indices in class_groups.items():
            cls_sample_count = int(np.round(proportions[cls] * n))
            cls_sample_count = min(
                cls_sample_count, len(indices)
            )  # Don't oversample
            sampled_indices.extend(random.sample(indices, cls_sample_count))

        # Handle any rounding discrepancies to ensure exactly `n` samples
        while len(sampled_indices) < n:
            remaining_indices = set(range(len(self.samples))) - set(sampled_indices)
            if not remaining_indices:
                break
            sampled_indices.append(random.choice(list(remaining_indices)))

        sampled_samples = [self.samples[i] for i in sampled_indices]

    return SingleMetricAnnotation(name=self.name, samples=sampled_samples)
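The stratified branch first decides how many samples to draw from each class in proportion to the class size. The sketch below isolates that allocation step; `proportional_counts` is a hypothetical helper, and it uses Python's built-in `round` where the listing uses `np.round` (both round halves to the nearest even number).

```python
from typing import Dict


def proportional_counts(class_sizes: Dict[str, int], n: int) -> Dict[str, int]:
    """Number of samples to draw per class for a stratified subset of size n."""
    total = sum(class_sizes.values())
    counts = {}
    for label, size in class_sizes.items():
        wanted = int(round(size / total * n))
        # Never ask for more samples than the class contains.
        counts[label] = min(wanted, size)
    return counts


even_split = proportional_counts({"pass": 8, "fail": 2}, 5)
tie_split = proportional_counts({"a": 3, "b": 3}, 3)
```

Rounding does not always add up to `n`: `even_split` sums to 5, while `tie_split` rounds 1.5 up to 2 for both classes and sums to 4. The top-up loop in the listing handles a shortfall; a surplus such as this one is not trimmed by that loop.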

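batch

batch(batch_size: int, drop_last_batch: bool = False)

Create batches from the shuffled samples. The batches are returned as a list.

Parameters:

Name	Type	Description	Default
`batch_size`	`int`	The number of samples in each batch.	required
`drop_last_batch`	`bool`	Whether to drop the last batch if it is smaller than the specified batch size.	`False`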

Source code in ragas/src/ragas/dataset_schema.py

def batch(
    self,
    batch_size: int,
    drop_last_batch: bool = False,
):
    """
    Create a batch iterator.

    Parameters:
        batch_size (int): The number of samples in each batch.
        drop_last_batch (bool): Whether to drop the last batch if it is smaller than the specified batch size.
    """

    samples = self.samples[:]
    random.shuffle(samples)

    all_batches = [
        samples[i : i + batch_size]
        for i in range(0, len(samples), batch_size)
        if len(samples[i : i + batch_size]) == batch_size or not drop_last_batch
    ]

    return all_batches
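The effect of `drop_last_batch` is easiest to see without the shuffle. The following sketch uses a hypothetical `make_batches` helper that slices a sequence in order; the library method shuffles a copy of the samples first, so its batch contents are random.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(
    items: Sequence[T], batch_size: int, drop_last_batch: bool = False
) -> List[List[T]]:
    """Slice items into consecutive batches, optionally dropping a short final batch."""
    batches = [
        list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)
    ]
    if drop_last_batch and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


items = [1, 2, 3, 4, 5, 6, 7]
kept = make_batches(items, batch_size=3)
dropped = make_batches(items, batch_size=3, drop_last_batch=True)
```

Seven items with a batch size of three give two full batches plus a remainder of one, which is kept or dropped depending on the flag.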

stratified_batches

stratified_batches(batch_size: int, stratify_key: str, drop_last_batch: bool = False, replace: bool = False) -> List[List[SampleAnnotation]]

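Create stratified batches based on a specified key, ensuring proportional representation.

Parameters:

Name	Type	Description	Default
`batch_size`	`int`	Number of samples per batch.	required
`stratify_key`	`str`	Key in `metric_input` used for stratification (e.g., class labels).	required
`drop_last_batch`	`bool`	If True, drops the last batch if it has fewer samples than `batch_size`.	`False`
`replace`	`bool`	If True, allows reusing samples from the same class to fill a batch if necessary.	`False`

Returns:

Type	Description
`List[List[SampleAnnotation]]`	A list of stratified batches, each batch being a list of SampleAnnotation objects.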

Source code in ragas/src/ragas/dataset_schema.py

def stratified_batches(
    self,
    batch_size: int,
    stratify_key: str,
    drop_last_batch: bool = False,
    replace: bool = False,
) -> t.List[t.List[SampleAnnotation]]:
    """
    Create stratified batches based on a specified key, ensuring proportional representation.

    Parameters:
        batch_size (int): Number of samples per batch.
        stratify_key (str): Key in `metric_input` used for stratification (e.g., class labels).
        drop_last_batch (bool): If True, drops the last batch if it has fewer samples than `batch_size`.
        replace (bool): If True, allows reusing samples from the same class to fill a batch if necessary.

    Returns:
        List[List[SampleAnnotation]]: A list of stratified batches, each batch being a list of SampleAnnotation objects.
    """
    # Group samples based on the stratification key
    class_groups = defaultdict(list)
    for sample in self.samples:
        key = sample[stratify_key]
        class_groups[key].append(sample)

    # Shuffle each class group for randomness
    for group in class_groups.values():
        random.shuffle(group)

    # Determine the number of batches required
    total_samples = len(self.samples)
    num_batches = (
        np.ceil(total_samples / batch_size).astype(int)
        if drop_last_batch
        else np.floor(total_samples / batch_size).astype(int)
    )
    samples_per_class_per_batch = {
        cls: max(1, len(samples) // num_batches)
        for cls, samples in class_groups.items()
    }

    # Create stratified batches
    all_batches = []
    while len(all_batches) < num_batches:
        batch = []
        for cls, samples in list(class_groups.items()):
            # Determine the number of samples to take from this class
            count = min(
                samples_per_class_per_batch[cls],
                len(samples),
                batch_size - len(batch),
            )
            if count > 0:
                # Add samples from the current class
                batch.extend(samples[:count])
                class_groups[cls] = samples[count:]  # Remove used samples
            elif replace and len(batch) < batch_size:
                # Reuse samples if `replace` is True
                batch.extend(random.choices(samples, k=batch_size - len(batch)))

        # Shuffle the batch to mix classes
        random.shuffle(batch)
        if len(batch) == batch_size or not drop_last_batch:
            all_batches.append(batch)

    return all_batches

get_prompt_annotations

get_prompt_annotations() -> Dict[str, List[PromptAnnotation]]

Get all the prompt annotations for each prompt as a list. Only accepted samples are included.

Source code in ragas/src/ragas/dataset_schema.py

def get_prompt_annotations(self) -> t.Dict[str, t.List[PromptAnnotation]]:
    """
    Get all the prompt annotations for each prompt as a list.
    """
    prompt_annotations = defaultdict(list)
    for sample in self.samples:
        if sample.is_accepted:
            for prompt_name, prompt_annotation in sample.prompts.items():
                prompt_annotations[prompt_name].append(prompt_annotation)
    return prompt_annotations
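The grouping logic is a standard `defaultdict(list)` accumulation that skips samples which were not accepted. A minimal sketch with plain dictionaries standing in for the annotation objects (the prompt names and annotation values are invented):

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical annotated samples; the real objects expose is_accepted and prompts as attributes.
samples = [
    {"is_accepted": True, "prompts": {"scoring_prompt": "annotation-1"}},
    {"is_accepted": False, "prompts": {"scoring_prompt": "annotation-2"}},
    {
        "is_accepted": True,
        "prompts": {"scoring_prompt": "annotation-3", "summary_prompt": "annotation-4"},
    },
]


def group_accepted(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collect annotations per prompt name, ignoring samples that were not accepted."""
    grouped = defaultdict(list)
    for item in items:
        if item["is_accepted"]:
            for prompt_name, annotation in item["prompts"].items():
                grouped[prompt_name].append(annotation)
    return dict(grouped)


grouped = group_accepted(samples)
```

The rejected sample's `annotation-2` does not appear in the result.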

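Message

Bases: BaseModel

Represents a generic message.

Attributes:

Name	Type	Description
`content`	`str`	The content of the message.
`metadata`	`(Optional[Dict[str, Any]], optional)`	Additional metadata associated with the message.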

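ToolCall

Bases: BaseModel

Represents a tool call with a name and arguments.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the tool being called.	required
`args`	`Dict[str, Any]`	A dictionary of arguments for the tool call, where keys are argument names and values can be strings, integers, or floats.	required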

HumanMessage

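Bases: Message

Represents a message from a human user.

Attributes:

Name	Type	Description
`type`	`Literal[human]`	The type of the message, always set to "human".

Methods:

Name	Description
`pretty_repr`	Returns a formatted string representation of the human message.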

pretty_repr

pretty_repr()

Returns a formatted string representation of the human message.

Source code in ragas/src/ragas/messages.py

def pretty_repr(self):
    """Returns a formatted string representation of the human message."""
    return f"Human: {self.content}"

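ToolMessage

Bases: Message

Represents a message from a tool.

Attributes:

Name	Type	Description
`type`	`Literal[tool]`	The type of the message, always set to "tool".

Methods:

Name	Description
`pretty_repr`	Returns a formatted string representation of the tool message.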

pretty_repr

pretty_repr()

Returns a formatted string representation of the tool message.

Source code in ragas/src/ragas/messages.py

def pretty_repr(self):
    """Returns a formatted string representation of the tool message."""
    return f"ToolOutput: {self.content}"

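AIMessage

Bases: Message

Represents a message from an AI.

Attributes:

Name	Type	Description
`type`	`Literal[ai]`	The type of the message, always set to "ai".
`tool_calls`	`Optional[List[ToolCall]]`	A list of tool calls made by the AI, if any.
`metadata`	`Optional[Dict[str, Any]]`	Additional metadata associated with the AI message.

Methods:

Name	Description
`to_dict`	Returns a dictionary representation of the AI message.
`pretty_repr`	Returns a formatted string representation of the AI message.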

to_dict

to_dict(**kwargs)

Returns a dictionary representation of the AI message.

Source code in ragas/src/ragas/messages.py

def to_dict(self, **kwargs):
    """
    Returns a dictionary representation of the AI message.
    """
    content = (
        self.content
        if self.tool_calls is None
        else {
            "text": self.content,
            # model_dump() is the pydantic v2 spelling of the deprecated dict().
            "tool_calls": [tc.model_dump() for tc in self.tool_calls],
        }
    )
    return {"content": content, "type": self.type}

pretty_repr

pretty_repr()

Returns a formatted string representation of the AI message.

Source code in ragas/src/ragas/messages.py

def pretty_repr(self):
    """
    Returns a formatted string representation of the AI message.
    """
    lines = []
    if self.content != "":
        lines.append(f"AI: {self.content}")
    if self.tool_calls is not None:
        lines.append("Tools:")
        for tc in self.tool_calls:
            lines.append(f"  {tc.name}: {tc.args}")

    return "\n".join(lines)
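To see what the formatted output looks like, the sketch below mirrors the `pretty_repr` logic with two hypothetical dataclasses, `ToolCallSketch` and `AIMessageSketch`, in place of the pydantic models.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in for ToolCall."""

    name: str
    args: Dict[str, Any]


@dataclass
class AIMessageSketch:
    """Hypothetical stand-in for AIMessage."""

    content: str
    tool_calls: Optional[List[ToolCallSketch]] = None

    def pretty_repr(self) -> str:
        lines = []
        # The text line is skipped when the message carries only tool calls.
        if self.content != "":
            lines.append(f"AI: {self.content}")
        if self.tool_calls is not None:
            lines.append("Tools:")
            for tc in self.tool_calls:
                lines.append(f"  {tc.name}: {tc.args}")
        return "\n".join(lines)


message = AIMessageSketch(
    content="Let me check.",
    tool_calls=[ToolCallSketch(name="get_weather", args={"city": "Paris"})],
)
rendered = message.pretty_repr()
plain = AIMessageSketch(content="Hello!").pretty_repr()
```

A message without tool calls renders as a single `AI: ...` line, while tool calls are listed under an indented `Tools:` block.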

ragas.evaluation.EvaluationResult `dataclass`

EvaluationResult(scores: List[Dict[str, Any]], dataset: EvaluationDataset, binary_columns: List[str] = list(), cost_cb: Optional[CostCallbackHandler] = None, traces: List[Dict[str, Any]] = list(), ragas_traces: Dict[str, ChainRun] = dict(), run_id: Optional[UUID] = None)

A class to store and process the results of the evaluation.
