Evaluation - Divine Agent

Behavior Evaluation

Automate evaluation of instruction adherence and task completion for your traced Pirate agent.

Pirate Example

from divi import obs_openai, observable
from divi.evaluation import Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
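        # Wrap the OpenAI client so its calls are traced and scored
        # for instruction adherence and task completion.
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
        )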

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")

Custom Configuration

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | str | gpt-4o | Model name used for evaluation |
| `temperature` | float | 0.5 | Temperature parameter for evaluation |
| `n_rounds` | int | 5 | Number of evaluation rounds |
| `max_concurrency` | int | 10 | Maximum number of concurrent requests |
| `api_key` | str | OPENAI_API_KEY | API key for evaluation |
| `base_url` | str | OPENAI_BASE_URL | Service URL for evaluation |

By default, `api_key` and `base_url` are read from the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables, and the other options take the defaults listed in the table above. To override any of them, pass an `EvaluatorConfig` through the `eval` argument:

Pirate Example

from divi import obs_openai, observable
from divi.evaluation import EvaluatorConfig, Score
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()


class Pirate:
    def __init__(self):
        self.client = obs_openai(
            OpenAI(),
            name="Pirate",
            scores=[Score.instruction_adherence, Score.task_completion],
            # Override the evaluator's model and number of rounds;
            # every other option keeps its default.
            eval=EvaluatorConfig(
                model="gpt-4.1",
                n_rounds=2,
            ),
        )

    @observable(name="Talk with pirate")
    def talk(self, message: str):
        """Talk like a pirate."""
        res = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "developer", "content": "Talk like a pirate."},
                {
                    "role": "user",
                    "content": message,
                },
            ],
        )
        return res.choices[0].message.content


pirate = Pirate()
pirate.talk("How do I check if a Python object is an instance of a class?")
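
To make the default resolution concrete, here is a minimal sketch of how the options in the table could resolve when only some of them are overridden. `SketchEvaluatorConfig` is a hypothetical stand-in written for illustration; it mirrors the documented defaults but is not divi's `EvaluatorConfig`, whose internals are not described here.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchEvaluatorConfig:
    """Hypothetical stand-in mirroring the documented defaults (not divi's class)."""

    model: str = "gpt-4o"
    temperature: float = 0.5
    n_rounds: int = 5
    max_concurrency: int = 10
    # Assumption: unset credentials fall back to the environment variables
    # named in the table, read when the config object is created.
    api_key: Optional[str] = field(
        default_factory=lambda: os.environ.get("OPENAI_API_KEY")
    )
    base_url: Optional[str] = field(
        default_factory=lambda: os.environ.get("OPENAI_BASE_URL")
    )


# Same overrides as the Pirate example: only model and n_rounds change.
config = SketchEvaluatorConfig(model="gpt-4.1", n_rounds=2)
print(config)
```

Under these assumptions, `temperature` and `max_concurrency` stay at 0.5 and 10, and the credentials come from the environment unless they are passed explicitly.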

Notes

Evaluation currently depends on OpenAI's structured output, so make sure the evaluation model you choose supports it. We strongly recommend gpt-4o or a newer model for effective, compatible evaluation.
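
As a rough illustration of why structured output matters, the sketch below shows a judge reply constrained to a JSON schema and then combined across rounds. The schema fields (`passed`, `reason`), the helper names, and the pass-rate aggregation are all assumptions made for this example; they are not divi's actual schema or scoring logic. Only the outer `response_format` shape follows OpenAI's chat completions API.

```python
import json

# Hypothetical verdict schema; the field names are illustrative only.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "passed": {"type": "boolean"},
        "reason": {"type": "string"},
    },
    "required": ["passed", "reason"],
    "additionalProperties": False,
}

# Shape of the `response_format` argument that requests structured output
# from OpenAI chat completions. No request is sent in this sketch.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "verdict", "strict": True, "schema": VERDICT_SCHEMA},
}


def parse_verdict(raw: str) -> dict:
    """Parse one judge reply and check it matches the hypothetical schema."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(VERDICT_SCHEMA["required"]):
        raise ValueError("reply does not match the verdict schema")
    if not isinstance(data["passed"], bool) or not isinstance(data["reason"], str):
        raise ValueError("verdict fields have the wrong types")
    return data


def pass_rate(replies: list[str]) -> float:
    """Fraction of rounds that passed (one assumed way to combine n_rounds)."""
    verdicts = [parse_verdict(reply) for reply in replies]
    return sum(v["passed"] for v in verdicts) / len(verdicts)


# Two simulated rounds, standing in for model replies.
rounds = [
    '{"passed": true, "reason": "Stayed in pirate voice."}',
    '{"passed": false, "reason": "Dropped the pirate voice midway."}',
]
print(pass_rate(rounds))
```

A model without structured-output support may return free text that fails this kind of parsing, which is the compatibility risk the note above warns about.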
