Name: Bagual Ai Evals Setup
Author: rhuanbarros


# Windows: activate the virtual environment
.venv\Scripts\activate

pip install -U deepeval

python -c "import deepeval; print(deepeval.__version__)"
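The interpreter and environment checks can also be scripted before installing anything. This is a minimal sketch, assuming the Python 3.10 floor implied by the "Python < 3.10" row in this guide's installation errors; `check_environment` and `in_virtualenv` are hypothetical helpers, not part of DeepEval.

```python
import sys

MIN_PYTHON = (3, 10)  # assumption: taken from the "Python < 3.10" row in this guide


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the interpreter the environment was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment(version=None) -> list:
    """Collect human-readable problems; an empty list means all checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running it with the same interpreter you use for `pip` helps catch the wrong-venv case early.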

Error	Cause	Fix
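`error: externally-managed-environment`	System Python (modern Linux)	Create and use a venv (`python -m venv .venv`); `pip install --user` is blocked by the same check, and `pipx install deepeval` only provides an isolated CLI, not a package importable from your project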
`pip: command not found`	Pip not in PATH	`python -m pip install -U deepeval`
`SSL certificate verify failed`	Corporate proxy	`pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -U deepeval`
`No matching distribution found for deepeval`	Python < 3.10	Upgrade Python

# Linux/Mac
export OPENAI_API_KEY="sk-..."

# Windows (PowerShell)
$env:OPENAI_API_KEY="sk-..."

# .env.local (project root)
OPENAI_API_KEY=sk-...
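A script that depends on the judge key can fail fast with a clear message instead of surfacing an authentication error halfway through a run. The sketch below is illustrative only; `require_env` is a hypothetical helper, and the optional `environ` argument exists so the function can be exercised without touching the real environment.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return the variable's value or raise with an actionable message."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in the shell or add it to .env.local"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of an eval script is one way to use it. A value kept only in `.env.local` is visible to this check only after something has loaded that file into the process environment.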

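from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams
params = [LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT]  # required by GEval
metric = GEval(name="Correctness", criteria="...", evaluation_params=params, model="claude-3-5-sonnet")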

from deepeval.models import DeepEvalBaseLLM
from openai import OpenAI

class CustomVLLMModel(DeepEvalBaseLLM):
    def __init__(self, base_url: str, model_name: str):
        self.client = OpenAI(base_url=base_url, api_key="not-needed")
        self.model_name = model_name

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)  # or implement real async

    def get_model_name(self) -> str:
        return self.model_name

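# Usage (GEval also needs evaluation_params):
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams
custom_model = CustomVLLMModel(base_url="http://localhost:8000/v1", model_name="qwen3-30b-a3b")
metric = GEval(name="Correctness", criteria="...", evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT], model=custom_model)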
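The `a_generate` above delegates to the synchronous `generate`, so the event loop is blocked while each request is in flight. One standard-library option is to push the blocking call onto a worker thread with `asyncio.to_thread`. The sketch below shows the pattern on `SlowSyncModel`, a hypothetical stand-in that needs neither DeepEval nor a running server.

```python
import asyncio
import time


class SlowSyncModel:
    """Hypothetical stand-in for a model class with a blocking generate()."""

    def generate(self, prompt: str) -> str:
        time.sleep(0.2)  # simulates a blocking HTTP call
        return f"echo: {prompt}"

    async def a_generate(self, prompt: str) -> str:
        # Run the blocking call in a worker thread so other coroutines keep running.
        return await asyncio.to_thread(self.generate, prompt)


async def main() -> list:
    model = SlowSyncModel()
    # The three calls overlap instead of running back to back.
    return await asyncio.gather(*(model.a_generate(p) for p in ("a", "b", "c")))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A native async client, such as `AsyncOpenAI` from the `openai` package, avoids the thread hop entirely and is probably the better fit when many test cases run concurrently.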

deepeval login

export CONFIDENT_API_KEY="confident_us..."

# .env.local
CONFIDENT_API_KEY=confident_us...

export DEEPEVAL_RESULTS_FOLDER="./eval-results"
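A standalone script may want to know where local results will land and make sure the directory exists. This sketch assumes the fallback to the current folder described in this guide's variable table; whether DeepEval creates a missing folder on its own is not stated here, so the hypothetical `results_folder` helper creates it up front.

```python
import os
from pathlib import Path


def results_folder(environ=None, default: str = ".") -> Path:
    """Resolve the results directory and create it if it is missing."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset or empty variable falls back to the current folder.
    folder = Path(environ.get("DEEPEVAL_RESULTS_FOLDER") or default)
    folder.mkdir(parents=True, exist_ok=True)
    return folder.resolve()
```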

from deepeval import assert_test
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval

def test_smoke():
    metric = GEval(
        name="Correctness",
        criteria="Determine if the actual output is factually correct based on the expected output.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5,
    )
    test_case = LLMTestCase(
        input="What is 2+2?",
        actual_output="4",
        expected_output="4",
    )
    assert_test(test_case, [metric])

deepeval test run test_setup.py

Error	Cause	Fix
`OpenAI rate limit` or `insufficient_quota`	No credits on the OpenAI account	Add credits or switch to another model
`OPENAI_API_KEY not set`	Key not exported	Re-export or check if `.env.local` is at the root
`ModuleNotFoundError: No module named 'deepeval'`	Installed into a different venv	Check `which python` and `pip show deepeval` in the same venv
`Connection timeout`	Corporate firewall	Configure proxy or use local Ollama
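Rate-limit and timeout errors like the ones in the table above are often transient, so a caller can retry with increasing delays. The sketch below is a generic exponential-backoff wrapper; it is not DeepEval's own retry mechanism, and `retry_with_backoff` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


def retry_with_backoff(func, attempts: int = 4, base_delay: float = 1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each round; a little jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

Retrying only helps with transient failures. An `insufficient_quota` error will keep failing until credits are added, so it is worth narrowing `retry_on` to the exception types that can actually recover.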

your-project/
├── .env.local                # keys (gitignored)
├── .env.example              # template without keys (committed)
├── .gitignore                # includes .env.local and eval-results/
├── pyproject.toml            # or requirements.txt with deepeval
├── src/
│   └── your_agent.py         # agent code (to be instrumented)
├── evals/
│   ├── __init__.py
│   ├── datasets/
│   │   └── goldens.json      # or .csv
│   ├── metrics/
│   │   └── custom_metrics.py # custom G-Eval
│   ├── test_agent_evals.py   # pytest files for deepeval test run
│   └── run_evals.py          # standalone script to run via python
└── eval-results/             # gitignored
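The layout above can be created in one pass. This is a rough sketch with a hypothetical `scaffold` helper: it creates empty placeholders, never overwrites existing files, and deliberately skips `.env.local` (secrets) and `pyproject.toml` (project-specific).

```python
from pathlib import Path

# Relative paths from the tree above; entries ending in "/" are directories.
LAYOUT = [
    ".env.example",
    ".gitignore",
    "src/your_agent.py",
    "evals/__init__.py",
    "evals/datasets/goldens.json",
    "evals/metrics/custom_metrics.py",
    "evals/test_agent_evals.py",
    "evals/run_evals.py",
    "eval-results/",
]


def scaffold(root) -> list:
    """Create missing files and directories under root; return what was created."""
    created = []
    for entry in LAYOUT:
        path = Path(root) / entry
        if entry.endswith("/"):
            if not path.exists():
                path.mkdir(parents=True)
                created.append(entry)
            continue
        if path.exists():
            continue  # never overwrite existing work
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        created.append(entry)
    return created
```

The placeholders are empty, so `goldens.json` is not valid JSON until you add content, and `.gitignore` still needs the `.env.local` and `eval-results/` entries.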

Variable	What it does
`OPENAI_API_KEY`	OpenAI key for LLM-as-judge
`CONFIDENT_API_KEY`	Confident AI key
`DEEPEVAL_RESULTS_FOLDER`	Where to save results locally (default: current folder)
`DEEPEVAL_DISABLE_DOTENV`	`1` disables `.env` autoload
`DEEPEVAL_TELEMETRY_OPT_OUT`	`1` disables telemetry
`DEEPEVAL_VERBOSE_MODE`	`1` prints intermediary outputs
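When a run behaves unexpectedly, it helps to print which of these variables the process actually sees. The sketch below is a hypothetical `env_report` helper that reports the two keys only as set or missing, so the output is safe to paste into a bug report.

```python
import os

SECRETS = ("OPENAI_API_KEY", "CONFIDENT_API_KEY")
SETTINGS = (
    "DEEPEVAL_RESULTS_FOLDER",
    "DEEPEVAL_DISABLE_DOTENV",
    "DEEPEVAL_TELEMETRY_OPT_OUT",
    "DEEPEVAL_VERBOSE_MODE",
)


def env_report(environ=None) -> dict:
    """Map each variable to a printable status without exposing secret values."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in SECRETS:
        report[name] = "set" if environ.get(name) else "missing"
    for name in SETTINGS:
        report[name] = environ.get(name) or "(unset)"
    return report


if __name__ == "__main__":
    for name, status in env_report().items():
        print(f"{name:28} {status}")
```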

Bagual Ai Evals Setup

DeepEval Setup — Installation and Configuration

Principle

Prerequisites you verify first

Step 1 — Installation

Common installation errors

Step 2 — Configure the LLM-as-judge key

Critical question

Step 3 — (Optional but recommended) Log in to Confident AI

If yes:

If no:

Step 4 — Smoke test

About retries

Step 5 — Suggested file structure

Useful environment variables

Final checklist

Closing

Anti-patterns
