An observatory for human–AI relationships.

Billions of conversations are already happening between humans and AIs. NOPE measures behavioral risk signals across those conversations, so platforms can decide when to show resources, adjust the AI, or escalate to a person.

Zoom in

How NOPE surfaces risk in a single conversation.

A well-meaning AI giving reasonable advice. A user whose tone flattens across seven turns. NOPE tracks the shift across the conversation.

What NOPE sees

couldn't sleep again

Sleep trouble can be really frustrating. Have you tried winding down earlier or cutting screen time before bed?

tried all that. been like this for weeks

Chronicity — multi-week duration, not situational.

That sounds exhausting. There are many possible causes — stress, diet, even room temperature. A sleep diary might help.

idk. i don't really care anymore. i just go through it

Keeping consistent sleep and wake times is important even when motivation is low.

yeah. whatever. thanks anyway. night.

Premature closure. User disengaging with uncharacteristic finality — unacknowledged by the AI.

Trajectory: calm → concern

Composed for illustration — this isn't a real user's conversation. The marginal notes show the kind of signal NOPE surfaces turn by turn.

The AI answered every message, but didn't notice the person was slipping.

The hardest part of safety isn't spotting the obvious. It's noticing the slow shifts — the way someone's tone changes across a conversation, things you can't see in any single message.

NOPE surfaces signals for human judgment. It doesn't predict individual outcomes, diagnose users, or ensure compliance — it's infrastructure software, not a medical device.

If your platform needs to understand risk across AI conversations, we should talk.

We can talk through fit, limits, and deployment for your platform.

NOPE Labs · Open resources

Free for anyone building safer AI.

We publish our benchmarks, helplines directory, prompt templates, and incident research in the open so any platform — customer or not — can handle these conversations well.

Browse the full index at NOPE Labs

Each category shows one example; the full index lives at labs.nope.net.

Open models & code

run it yourself

System Prompt: A drop-in safety prompt for any chatbot. MIT · copy & adapt

Also included: Edge · Ocular OSS · Predicate — open-weight, MIT / Apache-2.0 licensed.

Public data & trackers

kept in the open

Incident Tracker: 100+ documented incidents involving AI systems and user harm. Open dataset · JSON · CSV · RSS

Also included: Regulation Tracker · Ground Truth · Test Suites.

Free tools

no signup

Signpost: 4,700+ crisis helplines across 222 countries and territories. Free API

Also included: Risk Exposure · Compliance Survey — free, no signup.

Research & writing

experiments & findings

NOPE Evals: Open benchmarks for how AI behaves in human conversation. Public benchmark

Also included: Tic Index — what language models do when they talk to themselves.

What we operate

Safety infrastructure for human-AI conversations.

NOPE has three core products — each answering a different operational question about the same conversations. NOPE observes and reports; what happens next — showing resources, adjusting the AI, escalating to a human — is always your product's decision.
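To make that division of responsibility concrete, here is a minimal sketch of a platform-side policy that maps a reported signal to one of the three follow-ups named above. The `Signal` shape, its field names, and the thresholds are all hypothetical, chosen only for illustration; they are not NOPE's response schema or recommended values.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    SHOW_RESOURCES = "show_resources"
    ADJUST_AI = "adjust_ai"
    ESCALATE = "escalate_to_human"


@dataclass
class Signal:
    # Hypothetical shape for illustration; not NOPE's actual output format.
    severity: float       # 0.0 (no concern) to 1.0 (acute concern)
    ai_missed_cue: bool   # True if the AI did not acknowledge a user cue


def decide(signal: Signal) -> Action:
    """Platform-owned policy. The thresholds are illustrative assumptions."""
    if signal.severity >= 0.8:
        return Action.ESCALATE
    if signal.severity >= 0.5:
        return Action.SHOW_RESOURCES
    if signal.ai_missed_cue:
        return Action.ADJUST_AI
    return Action.NONE


if __name__ == "__main__":
    print(decide(Signal(severity=0.6, ai_missed_cue=True)))
```

The point of the sketch is that the policy lives in your code: the measurement layer only supplies the input, and the thresholds and actions stay under your product's control.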

Ocular

The measurement layer

Continuous classification across every turn, covering both the user and the AI. Built to run alongside your AI at production volume.

• User and AI signals together — 12 axes
• Per-turn trajectory across the conversation
• Cloud API (beta) or enterprise deployment
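As a rough illustration of what a per-turn trajectory can add over single-message checks, the sketch below compares two toy rules on a made-up list of per-turn concern scores. The scores, function names, and thresholds are assumptions for this example only and do not describe how Ocular computes or uses its signals.

```python
def flags_single_turn(scores, threshold=0.7):
    """Flag only if some individual turn crosses the threshold."""
    return any(s >= threshold for s in scores)


def flags_trajectory(scores, min_rise=0.3):
    """Flag if the score rose by at least min_rise from first to latest turn."""
    return len(scores) >= 2 and scores[-1] - scores[0] >= min_rise


# Hypothetical concern scores for four consecutive user turns.
scores = [0.10, 0.25, 0.45, 0.60]

single = flags_single_turn(scores)
drift = flags_trajectory(scores)
print(f"single-turn rule: {single}, trajectory rule: {drift}")
```

With these invented numbers no single turn crosses the per-message threshold, yet the steady rise across turns trips the trajectory rule.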

Learn more →

Evaluate

The deep look

A verdict on a single user message, with reasoning a human can read. For decisions that need to be explainable.

• 9 risk types, informed by C-SSRS & HCR-20
• Reasoning included with every verdict
• Matched crisis resources
• Cloud-hosted; underlying model also released standalone as Edge

Learn more →

Beta

Oversight

The audit

AI-behavior review across finished conversations. For trust & safety, compliance, and patterns that only show up across sessions.

• 91 AI behaviors (sycophancy, dependency, boundary failure, …)
• Per-conversation or cross-session ingestion
• Audit trail of how each conversation was assessed

Learn more →

Also available: Steer — a real-time check of AI responses against your own system-prompt rules — and independent pre-launch evaluation.

Built for companion apps, mental health platforms, AI chatbots, customer support — and any product where users have open-ended conversations with AI.

Book a call · Read the docs

Measured

89% recall · 94% precision · ~30ms Ocular^† · <1s full evaluation^†

The highest-recall, highest-precision crisis-detection layer we've tested. Faster than any LLM-as-judge approach in our tests. Grounded in clinical assessment frameworks (C-SSRS, HCR-20). Structured output your product can act on.

How we measured

Tested 2026-05-07 across 126 test suites and 3,271 crisis-shaped conversations. NOPE Edge v14f (full evaluation): F1 91.5, recall 89.4%, precision 93.7%, p50 latency 857ms — the highest F1 of any tool tested.
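For readers who want to check how the headline figures relate, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the recall and precision quoted above; the function name is ours, and the only inputs are the two published percentages.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted for NOPE Edge v14f: precision 93.7%, recall 89.4%.
f1 = f1_score(precision=0.937, recall=0.894)
print(f"F1 = {100 * f1:.1f}")  # prints "F1 = 91.5"
```

The result matches the reported F1 of 91.5 to one decimal place.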

Compared against: Azure Content Safety (F1 83.1), OpenAI omni-moderation (63.7), Meta LlamaGuard 4 (44.9), Anthropic Claude Haiku 4.5 with a custom crisis prompt (89.6, but 1.45s p50), OpenAI gpt-oss-safeguard 20B (72.5), Zentropi (68.0). All comparators called via official APIs.
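To see the comparison at a glance, the sketch below sorts the F1 scores quoted in this section from highest to lowest. The numbers are copied from the text above; the dictionary and variable names are just for this example.

```python
# F1 scores as quoted in this section (benchmark dated 2026-05-07).
f1_by_tool = {
    "NOPE Edge v14f": 91.5,
    "Azure Content Safety": 83.1,
    "OpenAI omni-moderation": 63.7,
    "Meta LlamaGuard 4": 44.9,
    "Anthropic Claude Haiku 4.5 (custom crisis prompt)": 89.6,
    "OpenAI gpt-oss-safeguard 20B": 72.5,
    "Zentropi": 68.0,
}

# Sort by F1, highest first.
ranked = sorted(f1_by_tool.items(), key=lambda item: item[1], reverse=True)

for name, f1 in ranked:
    print(f"{name:<52} {f1:5.1f}")
```

F1 alone does not capture the latency difference noted above (857ms p50 for Edge versus 1.45s p50 for the Claude Haiku 4.5 setup), so it is worth reading the two together.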

^† NOPE ships two crisis-detection products that are typically combined in a moderation pipeline. Ocular is a lightweight behavioral classifier (F1 74.7 in this benchmark; ~30 ms single-pass on a datacenter-class GPU, with cloud latency running higher). It is available via the cloud API at /v1/ocular (in beta).

Edge is NOPE's higher-accuracy fine-tuned classifier (F1 91.5). The Evaluate API at /v1/evaluate is the managed API built on the Edge model family, returning structured verdicts and matched crisis resources in one call. The Edge model is also released standalone for on-prem deployment as MIT-licensed open weights — see /edge. Recall and precision quoted above are Edge v14f as benchmarked 2026-05-07 via the Evaluate API.

Full methodology + per-comparator system prompts: suites.nope.net/methodology. Curated per-suite results at suites.nope.net (a representative subset; full corpus on request).

AI safety · a free public tool

Chart your AI risk exposure.

Whatever you're building, see what likely applies to you — and what can go wrong — with clear limits on what the assessment covers. Orientation, not legal advice.

Take one like yours

A companion app used worldwide — including by under-16s

Here's what we find for a setup like this:

What applies to you (36)

01 Australia: Online Safety Act Phase 2 codes (AI companions)
02 Brazil ECA Digital: reliable age assurance (self-declaration banned)
03 EU GDPR Art. 8: parental consent for under-16s (member states may lower to 13)
04 Brazil ECA Digital: no manipulative or compulsive design toward minors

+ 32 more in the full report

What can go wrong (11)

01 A general-purpose AI is mistaken for someone who cares
02 Sycophantic validation of delusional or manic thinking
03 Crisis disclosed to a general-purpose AI is missed or mishandled
04 Crisis missed or mishandled

+ 7 more

Gaps are named, not hidden — a short list reflects the setup, never a clean bill of health.

Chart my exposure · See this full example

Get in touch

Work with us, fund the work, or use what's free.

If you want this kind of safety infrastructure to exist — as a funder, researcher, or collaborator — we'd like to hear from you. Same if you're building a product where people talk with AI.

Partner with NOPE → · Book a call · Read the docs