ZAI Monitor

Methodology

Short version: the Coding Plan workflow runs every monitored model sequentially under the same settings and compares directional performance across matching time windows.

How Measurements Are Taken

The Coding Plan workflow is triggered on a schedule and runs all monitored models sequentially.
Runs use the same prompt shape and runtime settings for each model.
Requests use streamed chat completions, and we timestamp header arrival, first SSE event, first token, and completion.
For each run, we record time to first token (TTFT), total latency, generation windows, token-throughput metrics, and the success or failure outcome.
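
As an illustration of how the recorded timestamps could be turned into these metrics, here is a minimal sketch. It is not the monitor's actual code: the `RunTimestamps` fields, the `derive_metrics` function, and the exact metric definitions (TTFT measured from request send, throughput measured over the window between first token and completion) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunTimestamps:
    """Timestamps in seconds for one streamed request (hypothetical field names)."""
    request_sent: float
    headers_received: float
    first_sse_event: float
    first_token: Optional[float]  # None if no token ever arrived
    completed: Optional[float]    # None if the stream never finished
    output_tokens: int = 0


def derive_metrics(ts: RunTimestamps) -> dict:
    """Derive per-run metrics; the definitions here are assumptions, not the monitor's."""
    if ts.first_token is None or ts.completed is None:
        # A run without a first token or a completion is counted as a failure.
        return {"success": False}

    generation_window = ts.completed - ts.first_token
    return {
        "success": True,
        "header_latency_s": ts.headers_received - ts.request_sent,
        "ttft_s": ts.first_token - ts.request_sent,
        "total_latency_s": ts.completed - ts.request_sent,
        "generation_window_s": generation_window,
        # Avoid dividing by zero when all tokens arrive in a single event.
        "tokens_per_second": (
            ts.output_tokens / generation_window if generation_window > 0 else None
        ),
    }
```

In a sketch like this, the timestamps would come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments during a run cannot produce negative durations.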

Sampling cadence: data is collected hourly.

Prompt Suite

The monitor uses two prompt types to avoid overfitting to one response style.

Prompt 1: Code Generation + Tests

Asks for a Python function plus exactly 2 pytest tests, under strict formatting constraints.

Prompt 2: JSON Analysis

Asks for structured metrics derived from sample request logs, including error handling and brief calculations.

This dashboard is directional, not a controlled lab benchmark. Network conditions and provider load can influence any individual run.
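
Because any single run can be noisy, one way to read the data directionally is to compare a robust summary, such as the median, across equal time windows instead of comparing individual runs. The sketch below assumes runs arrive as `(model, unix_timestamp, value)` tuples and uses a one-day default window; the `median_by_window` name, the tuple shape, and the window size are all hypothetical and not taken from the dashboard.

```python
from collections import defaultdict
from statistics import median


def median_by_window(runs, window_s=86400):
    """Group (model, unix_ts, value) runs into fixed windows and take the median.

    Returns a dict keyed by (model, window_start_unix_ts).
    The input shape and the default window length are assumptions.
    """
    buckets = defaultdict(list)
    for model, unix_ts, value in runs:
        # Align each run to the start of the window that contains it.
        window_start = int(unix_ts // window_s) * window_s
        buckets[(model, window_start)].append(value)
    return {key: median(values) for key, values in buckets.items()}
```

With hourly sampling, a one-day window would hold up to 24 runs per model, so one unusually slow request would shift the median far less than it would shift a mean.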
