Behavior Evaluation

Automate evaluation of instruction adherence and task completion for your traced Pirate agent.

Pirate Example
from divi import obs_openai, observable
from divi.evaluation import Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
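        # Wrap the OpenAI client so completions made through it are traced
        # under "Pirate" and scored for instruction adherence and task completion.
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
        )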

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")

Custom Configuration

Property          Type    Default                    Description
model             str     gpt-4o                     Model used for evaluation
temperature       float   0.5                        Sampling temperature for evaluation
n_rounds          int     5                          Number of evaluation rounds
max_concurrency   int     10                         Maximum number of concurrent requests
api_key           str     OPENAI_API_KEY env var     API key for the evaluation service
base_url          str     OPENAI_BASE_URL env var    Base URL of the evaluation service
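The table does not say how the n_rounds results are combined or how max_concurrency is enforced. A minimal sketch, assuming each round is one independent judge request and the final score is the mean across rounds, is shown below. `judge_once` is a hypothetical stub that stands in for a real evaluation call; none of this is divi's actual implementation.

```python
import asyncio
import statistics


async def judge_once(output: str, round_id: int) -> float:
    """Hypothetical stand-in for one LLM-judge request (no real API call).

    `output` is unused by this stub; it returns a deterministic score in
    [0, 1] so the example is reproducible.
    """
    await asyncio.sleep(0.01)  # simulate network latency
    return 0.9 if round_id % 2 == 0 else 0.7


async def evaluate(
    output: str, n_rounds: int = 5, max_concurrency: int = 10
) -> tuple[float, int]:
    """Run n_rounds judge calls, at most max_concurrency at a time.

    Returns (mean score across rounds, peak number of requests in flight).
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded(round_id: int) -> float:
        nonlocal in_flight, peak
        async with semaphore:  # blocks once max_concurrency requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await judge_once(output, round_id)
            finally:
                in_flight -= 1

    scores = await asyncio.gather(*(bounded(i) for i in range(n_rounds)))
    return statistics.mean(scores), peak


if __name__ == "__main__":
    mean_score, peak = asyncio.run(
        evaluate("Arr, use isinstance()!", n_rounds=5, max_concurrency=2)
    )
    print(f"mean={mean_score:.2f}, peak_in_flight={peak}")
```

Averaging several rounds smooths out the run-to-run variation of a sampled judge (temperature 0.5 by default), while the semaphore caps how many evaluation requests hit the API at once.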

By default, api_key and base_url are read from the OPENAI_API_KEY and OPENAI_BASE_URL environment variables, and the remaining options use the defaults listed in the table above. To customize them, pass an EvaluatorConfig to obs_openai through its eval argument:

Pirate Example
from divi import obs_openai, observable
from divi.evaluation import EvaluatorConfig, Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
            # Override only the options you need; the rest keep their defaults.
            eval=EvaluatorConfig(
                model="gpt-4.1",
                n_rounds=2,
            ),
        )

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")
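The default resolution described for this configuration (explicit values first, then the OPENAI_API_KEY and OPENAI_BASE_URL environment variables for api_key and base_url, then the table defaults for everything else) can be sketched with a plain dataclass. `EvalSettings` is a hypothetical name used only for illustration; it is not divi's EvaluatorConfig, whose internals are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalSettings:
    """Hypothetical mirror of the documented options (not divi's EvaluatorConfig)."""

    model: str = "gpt-4o"
    temperature: float = 0.5
    n_rounds: int = 5
    max_concurrency: int = 10
    api_key: Optional[str] = None
    base_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Explicit values win; otherwise fall back to the OpenAI environment
        # variables. Missing variables leave the field as None.
        if self.api_key is None:
            self.api_key = os.environ.get("OPENAI_API_KEY")
        if self.base_url is None:
            self.base_url = os.environ.get("OPENAI_BASE_URL")


if __name__ == "__main__":
    settings = EvalSettings(model="gpt-4.1", n_rounds=2)
    # temperature and max_concurrency keep their defaults; api_key and
    # base_url come from the environment.
    print(settings)
```

`EvalSettings(model="gpt-4.1", n_rounds=2)` mirrors the EvaluatorConfig call in the example: the two overridden fields change and every other field keeps its default.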

Notes

Evaluation currently depends on OpenAI’s Structured Outputs feature, so make sure the model you choose for evaluation supports it. We strongly recommend gpt-4o or a newer model for effective, compatible evaluations.
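Structured output matters because each evaluation round has to come back as machine-readable data rather than free text. As an illustration only, assume a hypothetical verdict schema with a numeric score and a short reasoning string; the sketch below checks a judge reply against it using only the standard library. The `Verdict` fields and the 0 to 1 score range are assumptions, not divi's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-round verdict; not divi's real schema."""

    score: float  # assumed range: 0.0 (fails) to 1.0 (fully satisfies)
    reasoning: str


def parse_verdict(raw: str) -> Verdict:
    """Parse a judge reply, raising ValueError if it does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    score = data.get("score")
    reasoning = data.get("reasoning")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'score' must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("'score' must be between 0 and 1")
    if not isinstance(reasoning, str):
        raise ValueError("'reasoning' must be a string")
    return Verdict(score=float(score), reasoning=reasoning)


if __name__ == "__main__":
    print(parse_verdict('{"score": 0.9, "reasoning": "Stays in pirate voice."}'))
    try:
        # Free text, the kind of reply a model might give without structured output.
        parse_verdict("Arr, 9 out of 10!")
    except ValueError as err:
        print("rejected:", err)
```

With Structured Outputs, the API constrains the evaluation model's reply to a supplied JSON schema, so malformed replies like the second case are designed not to occur; with a model that lacks the feature, the evaluator would have to detect and handle them itself.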