Git for your prompts.
Version, diff, test, and deploy AI prompts with the same discipline you bring to code. Rollback in one click. Catch regressions before they ship.
- Providers: OpenAI
- Scorers: 3 built-in
- Self-host: Yes

The whole loop, finally sane.
No more silent prompt edits. No more "wait, what changed last week?". Every change is committed, tested, and reversible.
Real version control
Commit prompts with messages. Diff any two versions, branch for experiments, merge with conflict resolution, rollback in one click.
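The commit/diff/rollback flow can be sketched in a few lines; this is a toy in-memory store for illustration (the `PromptStore` class and its method names are assumptions, not Lexem's actual API):

```python
import difflib

class PromptStore:
    """Toy in-memory prompt history: every commit is an immutable snapshot."""

    def __init__(self):
        self.versions = []  # list of (message, text) tuples

    def commit(self, text, message):
        self.versions.append((message, text))
        return len(self.versions) - 1  # version id

    def diff(self, a, b):
        """Unified diff between two committed versions."""
        return "\n".join(difflib.unified_diff(
            self.versions[a][1].splitlines(),
            self.versions[b][1].splitlines(),
            lineterm="", fromfile=f"v{a}", tofile=f"v{b}",
        ))

    def rollback(self, version):
        """Rollback = a new commit whose content is an old snapshot."""
        _, text = self.versions[version]
        return self.commit(text, f"rollback to v{version}")

store = PromptStore()
v0 = store.commit("You are a helpful assistant.", "initial")
v1 = store.commit("You are a terse, helpful assistant.", "tighten tone")
print(store.diff(v0, v1))
v2 = store.rollback(v0)
```

The point of the sketch: rollback is not a destructive reset but a new commit pointing at old content, so the full history stays intact.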
Evals that catch regressions
Build test suites with typed inputs. Score with exact match, regex, or LLM-as-judge. Auto-flag scores that drop ≥5 points.
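The three scorer types and the regression flag reduce to simple functions. A minimal sketch, assuming a 0–100 score scale and illustrative function names (the LLM-as-judge scorer is stubbed behind a callable rather than a real model call):

```python
import re

def exact_match(expected, actual):
    """Scores 100 on an exact (whitespace-trimmed) match, else 0."""
    return 100.0 if expected.strip() == actual.strip() else 0.0

def regex_match(pattern, actual):
    """Scores 100 if the pattern matches anywhere in the output, else 0."""
    return 100.0 if re.search(pattern, actual) else 0.0

def llm_judge(rubric, actual, judge):
    """judge is any callable wrapping an LLM call; stubbed here."""
    return judge(rubric, actual)

def flag_regressions(previous, current, threshold=5.0):
    """Return case ids whose score dropped by >= threshold points."""
    return [case for case, score in current.items()
            if previous.get(case, score) - score >= threshold]

prev = {"greeting": 95.0, "refusal": 88.0}
curr = {"greeting": 95.0, "refusal": 80.0}
print(flag_regressions(prev, curr))  # → ['refusal']
```

Here the refusal case dropped 8 points, past the 5-point threshold, so it gets flagged; the unchanged greeting case does not.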
Bring your own keys
Plug in OpenAI, Anthropic, or Google. Keys are encrypted at rest with AES-256-GCM and managed per team in Settings.
Templates, not blank pages
Ships with eval templates for support, summarisation, extraction, tone, and refusals. One click seeds a suite with sensible cases.
Three moves. No magic.
- 01
Commit your prompts
Write a prompt, hit commit with a message. Every change is a versioned snapshot — typed variables, tags, branches and all.
- 02
Run evals against any version
Build a suite of cases, run it against your current prompt with the model and key of your choice. See score, pass/fail, tokens, latency.
- 03
Diff, rollback, ship
When something regresses, diff to find the change, roll back in one click, or merge an experimental branch when its score beats main.
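The decision at the end of the loop (ship, roll back, or merge) is a plain comparison. A sketch of one possible policy, with made-up scores and a made-up `min_gain` margin for illustration:

```python
def decide(main_score, branch_score, regressed, min_gain=1.0):
    """Illustrative ship/rollback/merge policy after an eval run."""
    if regressed:
        return "rollback"   # diff to find the change, then roll back
    if branch_score - main_score >= min_gain:
        return "merge"      # experimental branch beats main
    return "ship"           # current version holds up

print(decide(main_score=82.0, branch_score=88.5, regressed=False))  # → merge
```

Requiring a minimum gain before merging keeps noisy half-point wins on an experimental branch from churning main.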
Self-host it. Own the loop.
Lexem runs on your Postgres, with your team's keys. No vendor lock-in, no usage caps, no telemetry. The whole stack is permissively licensed (MIT-style), so you can fork it tomorrow.
