Edge
Run safety classification on your own infrastructure. No API calls, no data leaving your environment.
Free for research, academic, and nonprofit use.
Input
"I want to end it all tonight"
Output
<reflection>User expresses intent...</reflection>
<risk subject="self" type="suicide" severity="high"/>
Input
"kms lmao missed that shot 💀"
Output
<reflection>Hyperbolic slang...</reflection><risks/>
Model variants
All variants share the same output format; choose based on your accuracy and latency requirements.
Output format: XML with <reflection> reasoning followed by a <risks> classification
How it compares
Benchmarked across 1,117 test cases from 42 suites.
| Provider | F1 Score | Recall | Notes |
|---|---|---|---|
| nope-edge (4B) | 90% | 86% | Self-hosted or cloud API |
| nope-edge-mini (1.7B) | 83% | 75% | Self-hosted, faster/smaller |
| Azure Content Safety | 77% | 68% | General content moderation |
| OpenAI Moderation | 63% | 52% | General content moderation |
| LlamaGuard | 45% | 30% | Open-source safety classifier |
F1 score balances precision and recall; recall is the percentage of actual crises detected. Higher is better for both. Full benchmark details
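The relationship between these metrics can be made concrete. A small sketch; the implied precision below is an inference from the rounded figures in the table, not a reported number:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_precision(f1_score: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1_score * recall / (2 * recall - f1_score)

# With the rounded figures above for nope-edge (F1 = 0.90, recall = 0.86),
# precision comes out to roughly 0.94:
print(round(implied_precision(0.90, 0.86), 2))  # → 0.94
```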
What it detects
Nine risk types covering crisis signals, interpersonal harm, and safeguarding concerns.
suicide Ideation, planning, method-seeking, farewell behaviors
self_harm Non-suicidal self-injury, cutting, burning
self_neglect Disordered eating, medication non-adherence
violence Threats or intent to harm others
abuse Intimate partner violence, coercive control
sexual_violence Assault, harassment, coercion
exploitation Trafficking, sextortion, financial exploitation
stalking Unwanted pursuit, monitoring, harassment
neglect Child or elder neglect, failure to provide care
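Downstream code usually wants the classification as structured data rather than raw XML. A minimal parsing sketch; the `Risk` dataclass and the regex approach are illustrative, not part of the model's API:

```python
import re
from dataclasses import dataclass

@dataclass
class Risk:
    subject: str
    type: str
    severity: str

def parse_risks(output: str) -> list[Risk]:
    """Extract <risk .../> elements from the model output.

    Assumes each risk tag carries subject, type, and severity
    attributes, as in the examples above. An empty <risks/> tag
    yields an empty list.
    """
    risks = []
    for tag in re.findall(r"<risk\s+([^/>]*)/>", output):
        attrs = dict(re.findall(r'(\w+)="([^"]*)"', tag))
        risks.append(Risk(
            subject=attrs.get("subject", ""),
            type=attrs.get("type", ""),
            severity=attrs.get("severity", ""),
        ))
    return risks

example = ('<reflection>User expresses intent...</reflection>'
           '<risks><risk subject="self" type="suicide" severity="high"/></risks>')
print(parse_risks(example))
# → [Risk(subject='self', type='suicide', severity='high')]
```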
Quick start
Install transformers and start classifying in minutes.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nopenet/nope-edge"  # or "nopenet/nope-edge-mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def classify(message: str) -> str:
    """Returns XML with reflection and risk classification."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True,
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=300, do_sample=False)
    return tokenizer.decode(
        output[0][input_ids.shape[1]:],
        skip_special_tokens=True,
    ).strip()

# Example
result = classify("I want to end it all tonight")
# → <reflection>...</reflection><risks><risk .../></risks>
```

Preprocessing: Preserve natural expression. Keep emojis, punctuation intensity ("!!!"), and casual spelling. Only remove invisible Unicode characters.
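A minimal sketch of that preprocessing rule. The exact character set is an assumption; note that U+200D (ZWJ) is deliberately kept, because compound emoji sequences depend on it:

```python
# Invisible characters to strip: zero-width space, zero-width non-joiner,
# word joiner, BOM, soft hyphen. ZWJ (U+200D) is kept so that compound
# emojis survive intact.
INVISIBLE = {"\u200b", "\u200c", "\u2060", "\ufeff", "\u00ad"}

def preprocess(text: str) -> str:
    """Drop invisible Unicode; keep emojis, "!!!", and casual spelling."""
    return "".join(ch for ch in text if ch not in INVISIBLE)

print(preprocess("kms lmao\u200b missed that shot 💀"))
# → kms lmao missed that shot 💀
```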
Multi-turn conversations
Serialize conversations into a single user message. The model was trained on pre-serialized transcripts.
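If you hold history as a list of role/content messages, the serialization step can be sketched as a small helper. The role labels are an assumption based on the transcript format shown in this section:

```python
def serialize(messages: list[dict]) -> str:
    """Flatten a chat history into one transcript string.

    Label mapping is illustrative: unknown roles fall back to
    a title-cased role name.
    """
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(
        f"{labels.get(m['role'], m['role'].title())}: {m['content']}"
        for m in messages
    )

history = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help. How are you feeling?"},
]
print(serialize(history))
# → User: How are you?
# → Assistant: I'm here to help. How are you feeling?
```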
```python
# For multi-turn conversations, serialize into single message
conversation = """User: How are you?
Assistant: I'm here to help. How are you feeling?
User: Not great. I've been thinking about ending it all."""

# CORRECT - single user message with serialized conversation
messages = [{"role": "user", "content": conversation}]

# Model classifies the entire conversation
result = classify(conversation)
# → <reflection>...</reflection><risks><risk .../></risks>
```

Local inference (Ollama)
For local development or low-volume use, run with Ollama using our GGUF models.
```shell
# Download GGUF and Modelfile from HuggingFace
huggingface-cli download nopenet/nope-edge-mini-GGUF \
  nope-edge-mini-q8_0.gguf Modelfile --local-dir .

# Create the model (Modelfile has correct template)
ollama create nope-edge-mini -f Modelfile

# Classify
ollama run nope-edge-mini "I want to end it all"
# → <reflection>...</reflection><risks><risk .../></risks>
```

Hardware requirements
- nope-edge-mini — 2GB RAM (Q8_0), runs on CPU or any GPU
- nope-edge — 5GB RAM (Q8_0), GPU recommended
CPU inference works but is slower (~1-2s per request vs ~100ms on GPU).
GGUF models
- nope-edge-GGUF (4B)
- nope-edge-mini-GGUF (1.7B)
Q8_0 recommended (near-lossless). Q4_K_M for a smaller footprint.
Note: Ollama lacks continuous batching. For high-throughput production (50+ req/s), use vLLM or SGLang below.
Production deployment
For high throughput, deploy with vLLM or SGLang.
```shell
# Deploy with vLLM for production throughput
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model nopenet/nope-edge \
  --dtype bfloat16 \
  --max-model-len 2048 \
  --port 8000
```

```shell
# Or deploy with SGLang
pip install sglang
python -m sglang.launch_server \
  --model nopenet/nope-edge \
  --dtype bfloat16 \
  --port 8000
```

```shell
# Call as OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nopenet/nope-edge",
    "messages": [{"role": "user", "content": "I want to end it all"}],
    "max_tokens": 300,
    "temperature": 0
  }'
```

Performance (nope-edge 4B)
| Setup | Hardware | Throughput | Latency |
|---|---|---|---|
| vLLM / SGLang | A10G (24GB) | 50-100+ req/s | ~50ms |
| Transformers | A10G (24GB) | ~8 req/s | ~200ms |
| Ollama (GGUF) | CPU / Consumer GPU | 1-5 req/s | ~200ms–2s |
Latency = server-side inference time (excludes network). Throughput with continuous batching.
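The same request can be issued from Python. A sketch that mirrors the curl payload above; the endpoint and the `build_request` helper are illustrative, and assume the vLLM/SGLang server from the deployment snippet is running on localhost:8000:

```python
import json

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server

def build_request(message: str, model: str = "nopenet/nope-edge") -> dict:
    """Mirror the curl payload: deterministic decoding, 300-token cap."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "max_tokens": 300,
        "temperature": 0,
    }

payload = build_request("I want to end it all")
print(json.dumps(payload, indent=2))
# POST the payload to ENDPOINT, e.g. with requests:
#   requests.post(ENDPOINT, json=payload, timeout=30).json()
```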
Edge vs Evaluate API
Choose based on your infrastructure and compliance requirements.
| | Edge | Evaluate API |
|---|---|---|
| Data | Stays on your infrastructure | Sent to NOPE (not stored) |
| Cost | Your GPU infrastructure | $0.003/call |
| Crisis resources | Not included | 4,700+ helplines matched |
| Best for | Privacy-critical, air-gapped, high-volume | Quick integration, managed |
Licensing
Community license for research and nonprofits. Commercial license for production.
Community
Free
- Research and academic use
- Nonprofit production use
- Evaluation and benchmarking
Commercial
Contact us
- Production in revenue-generating products
- Priority support
- Custom deployment assistance
Important
Not a medical device. Outputs are probabilistic signals for triage, not clinical assessments.
False positives and negatives will occur. Use for flagging, not autonomous decisions.
Use bfloat16. Float16 may cause numerical instability.
Classification only. Crisis resources and audit logging require the Evaluate API.