Behavior Evaluation

Automatically evaluate the instruction adherence and task completion of the traced Pirate agent.

Pirate Example
from divi import obs_openai, observable
from divi.evaluation import Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
        )

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")

Custom Settings

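| Property | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | gpt-4o | Name of the model used for evaluation |
| temperature | float | 0.5 | Sampling temperature used during evaluation |
| n_rounds | int | 5 | Number of evaluation rounds |
| max_concurrency | int | 10 | Maximum number of concurrent requests |
| api_key | str | OPENAI_API_KEY | API key used for evaluation |
| base_url | str | OPENAI_BASE_URL | URL of the provider used for evaluation |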

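By default, api_key and base_url take their values from the environment variables of the same names (OPENAI_API_KEY and OPENAI_BASE_URL); the other options default to the values listed in the table above. You can customize the settings as follows: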

Pirate Example
from divi import obs_openai, observable
from divi.evaluation import EvaluatorConfig, Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
            eval=EvaluatorConfig(
                model="gpt-4.1",
                n_rounds=2,
            ),
        )

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")
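
The default behavior described in the settings table can be pictured with a small stand-alone sketch. It is not divi's implementation: `EvalSettings` and `resolve` are hypothetical names that only mirror the documented fields and defaults, and it assumes an explicitly passed value wins over the environment variable.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvalSettings:
    """Hypothetical stand-in mirroring the settings table; NOT divi's EvaluatorConfig."""

    model: str = "gpt-4o"
    temperature: float = 0.5
    n_rounds: int = 5
    max_concurrency: int = 10
    # None means "fall back to the environment variable".
    api_key: Optional[str] = None
    base_url: Optional[str] = None


def resolve(settings: EvalSettings, env: Mapping[str, str] = os.environ) -> EvalSettings:
    """Fill api_key and base_url from the environment when they were not set explicitly."""
    return replace(
        settings,
        api_key=settings.api_key or env.get("OPENAI_API_KEY"),
        base_url=settings.base_url or env.get("OPENAI_BASE_URL"),
    )


# A fake environment keeps the demo independent of the real one.
demo_env = {"OPENAI_API_KEY": "sk-demo", "OPENAI_BASE_URL": "https://example.invalid/v1"}

# Only model and n_rounds are overridden; everything else keeps its default.
resolved = resolve(EvalSettings(model="gpt-4.1", n_rounds=2), env=demo_env)
print(resolved)
```

Under these assumptions, overriding only `model` and `n_rounds` leaves `temperature` and `max_concurrency` at their table defaults, while the credentials are picked up from the environment.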

Notes

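Evaluation currently relies on OpenAI's structured outputs, so make sure the model you choose supports that feature. We strongly recommend gpt-4o or a newer model to ensure evaluation quality and compatibility.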
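
To illustrate what the structured output requirement means in practice, here is a rough sketch of a judge request that constrains the reply to a JSON schema. The `response_format` shape follows OpenAI's structured outputs for Chat Completions; the verdict schema, its field names, and the helper functions are hypothetical and are not taken from divi. No request is sent; the sample reply is hard-coded.

```python
import json

# Hypothetical schema for one evaluation verdict; field names are illustrative only.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number"},
        "reasoning": {"type": "string"},
    },
    # Strict structured outputs expect every property to be required
    # and additional properties to be disallowed.
    "required": ["score", "reasoning"],
    "additionalProperties": False,
}


def build_response_format(name: str = "verdict") -> dict:
    """Build a response_format argument in the shape used by OpenAI structured outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": VERDICT_SCHEMA},
    }


def parse_verdict(raw: str) -> dict:
    """Parse a model reply and check that the required fields are present."""
    data = json.loads(raw)
    missing = [key for key in VERDICT_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Hard-coded stand-in for a model reply that conforms to the schema.
sample_reply = '{"score": 0.8, "reasoning": "Stayed in pirate voice throughout."}'
verdict = parse_verdict(sample_reply)
print(build_response_format()["type"], verdict["score"])
```

A model without structured output support may return free-form text instead of JSON matching the schema, which is why the choice of evaluation model matters.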