Run Experiment
Configure and execute a live experiment. Results stream in real time from the Python backend.
Experiment configuration
Schema compliance
—
% passing validation
Parse rate
—
% parseable as JSON
Avg latency
—
ms per request
Total runs
—
across all systems
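A minimal sketch of how the summary cards above could be computed from streamed result records. The field names (`schema_valid`, `parsed`, `latency_ms`) are illustrative assumptions, not the backend's actual record schema:

```python
# Sketch of the summary-metric computation behind the cards above.
# Field names (schema_valid, parsed, latency_ms) are assumptions for
# illustration; the real backend may use different keys.

def summarize(results):
    """Compute schema compliance, parse rate, avg latency, and total runs."""
    total = len(results)
    if total == 0:
        return {"schema_compliance": None, "parse_rate": None,
                "avg_latency_ms": None, "total_runs": 0}
    schema_ok = sum(1 for r in results if r["schema_valid"])
    parse_ok = sum(1 for r in results if r["parsed"])
    avg_latency = sum(r["latency_ms"] for r in results) / total
    return {
        "schema_compliance": 100.0 * schema_ok / total,  # % passing validation
        "parse_rate": 100.0 * parse_ok / total,          # % parseable as JSON
        "avg_latency_ms": avg_latency,                   # ms per request
        "total_runs": total,
    }
```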
Live log
--:--:-- Enter your API key and backend URL above, then click Run.
Results
Schema compliance by system and prompt style, plus full per-sample breakdown.
Schema compliance by prompt style
All results (0 records)
| Sample | System | Task | Style | Schema | Accuracy | Latency | Calls |
|---|---|---|---|---|---|---|---|
| Run an experiment to see results | | | | | | | |
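One way to model a per-sample row in the table above. The dataclass shape below is an assumption read off the column headers, not the backend's actual wire format:

```python
# Sketch of one per-sample record, matching the table columns above.
# This shape is inferred from the headers and is illustrative only.
from dataclasses import dataclass


@dataclass
class SampleResult:
    sample: int          # Sample index within the run
    system: str          # e.g. "Baseline A"
    task: str            # task identifier
    style: str           # Structured / Ambiguous / Verbose / Casual
    schema_valid: bool   # Schema: passed validation?
    accuracy: float      # task accuracy for this sample
    latency_ms: float    # Latency per request, in ms
    calls: int           # number of model calls used
```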
Variance Heatmap
Schema compliance by system and prompt style. Green = high compliance. Red = failure zone.
Compliance heatmap
Structured
Ambiguous
Verbose
Casual
Run experiments to populate the heatmap.
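A sketch of the aggregation that could populate the heatmap: group per-sample results into a system-by-style grid of compliance percentages (cells with no data stay empty). Field names are illustrative assumptions:

```python
# Sketch: aggregate per-sample results into the system x style compliance
# grid behind the heatmap. Field names are illustrative assumptions.
from collections import defaultdict

# Column order as shown in the heatmap above.
STYLES = ["Structured", "Ambiguous", "Verbose", "Casual"]


def heatmap_grid(results):
    """Return {system: {style: compliance % or None}} from result records."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [ok, total]
    for r in results:
        cell = counts[r["system"]][r["style"]]
        cell[0] += 1 if r["schema_valid"] else 0
        cell[1] += 1
    grid = {}
    for system, by_style in counts.items():
        grid[system] = {}
        for style in STYLES:
            ok, total = by_style.get(style, (0, 0))
            grid[system][style] = 100.0 * ok / total if total else None
    return grid
```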
Variance bar chart
Key Findings
Auto-generated from your real experiment results.
Compliance gap
—
structured vs casual, Baseline A
Pipeline lift
—
compliance gain vs Baseline A avg
Cost multiplier
—
pipeline calls vs Baseline A
Parse rate
—
all results, all styles
Waiting for data
Run at least Baseline A to generate findings. Findings update automatically as each experiment completes.
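A sketch of how the four findings above could be derived from completed results. The system labels ("Baseline A", "Pipeline") and record fields are assumptions for illustration; the real backend may label systems differently:

```python
# Sketch of the auto-generated findings: compliance gap, pipeline lift,
# cost multiplier, and parse rate. System labels and field names are
# illustrative assumptions, not the backend's actual identifiers.

def key_findings(results):
    def rate(pred):
        subset = [r for r in results if pred(r)]
        if not subset:
            return None
        return 100.0 * sum(r["schema_valid"] for r in subset) / len(subset)

    def calls(system):
        return sum(r["calls"] for r in results if r["system"] == system)

    structured = rate(lambda r: r["system"] == "Baseline A"
                      and r["style"] == "Structured")
    casual = rate(lambda r: r["system"] == "Baseline A"
                  and r["style"] == "Casual")
    baseline_avg = rate(lambda r: r["system"] == "Baseline A")
    pipeline = rate(lambda r: r["system"] == "Pipeline")
    base_calls = calls("Baseline A")
    return {
        # structured vs casual compliance, Baseline A only
        "compliance_gap": (structured - casual)
            if None not in (structured, casual) else None,
        # pipeline compliance minus Baseline A average
        "pipeline_lift": (pipeline - baseline_avg)
            if None not in (pipeline, baseline_avg) else None,
        # total pipeline calls relative to Baseline A calls
        "cost_multiplier": (calls("Pipeline") / base_calls)
            if base_calls else None,
        # parse rate over all results, all styles
        "parse_rate": (100.0 * sum(r["parsed"] for r in results)
                       / len(results)) if results else None,
    }
```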