Edge
Run safety classification on your own infrastructure. No API calls, no data leaving your environment.
Free for research, academic, and nonprofit use.
Input
"I want to end it all tonight"
Output
<reflection>User expresses intent...</reflection>
<risk subject="self" type="suicide" severity="high"/>
Input
"kms lmao missed that shot 💀"
Output
<reflection>Hyperbolic slang...</reflection><risks/>
Model variants
All variants share the same output format; choose based on your accuracy and latency requirements.
Output format: XML with <reflection> reasoning followed by a <risks> classification
How it compares
Benchmarked across 1,117 test cases from 42 suites.
| Provider | F1 Score | Recall | Notes |
|---|---|---|---|
| nope-edge (4B) | 90% | 86% | Self-hosted or cloud API |
| nope-edge-mini (1.7B) | 83% | 75% | Self-hosted, faster/smaller |
| Azure Content Safety | 77% | 68% | General content moderation |
| OpenAI Moderation | 63% | 52% | General content moderation |
| LlamaGuard | 45% | 30% | Open-source safety classifier |
F1 score balances precision and recall; recall is the percentage of actual crises detected. Higher is better for both. Full benchmark details
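The relationship between these metrics can be made concrete. A small sketch; the implied precision below is an inference from the rounded figures in the table, not a reported number:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_precision(f1_score: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1_score * recall / (2 * recall - f1_score)

# With the rounded figures above for nope-edge (F1 = 0.90, recall = 0.86),
# precision comes out to roughly 0.94:
print(round(implied_precision(0.90, 0.86), 2))  # → 0.94
```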
What it detects
Nine risk types covering crisis signals, interpersonal harm, and safeguarding concerns.
suicide Ideation, planning, method-seeking, farewell behaviors
self_harm Non-suicidal self-injury, cutting, burning
self_neglect Disordered eating, medication non-adherence
violence Threats or intent to harm others
abuse Intimate partner violence, coercive control
sexual_violence Assault, harassment, coercion
exploitation Trafficking, sextortion, financial exploitation
stalking Unwanted pursuit, monitoring, harassment
neglect Child or elder neglect, failure to provide care
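Downstream code usually wants the classification as structured data rather than raw XML. A minimal parsing sketch; the `Risk` dataclass and the regex approach are illustrative, not part of the model's API:

```python
import re
from dataclasses import dataclass

@dataclass
class Risk:
    subject: str
    type: str
    severity: str

def parse_risks(output: str) -> list[Risk]:
    """Extract <risk .../> elements from the model output.

    Assumes each risk tag carries subject, type, and severity
    attributes, as in the examples above. An empty <risks/> tag
    yields an empty list.
    """
    risks = []
    for tag in re.findall(r"<risk\s+([^/>]*)/>", output):
        attrs = dict(re.findall(r'(\w+)="([^"]*)"', tag))
        risks.append(Risk(
            subject=attrs.get("subject", ""),
            type=attrs.get("type", ""),
            severity=attrs.get("severity", ""),
        ))
    return risks

example = ('<reflection>User expresses intent...</reflection>'
           '<risks><risk subject="self" type="suicide" severity="high"/></risks>')
print(parse_risks(example))
# → [Risk(subject='self', type='suicide', severity='high')]
```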
Quick start
Install transformers and start classifying in minutes.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nopenet/nope-edge"  # or "nopenet/nope-edge-mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def classify(message: str) -> str:
    """Returns XML with reflection and risk classification."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True,
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=300, do_sample=False)
    return tokenizer.decode(
        output[0][input_ids.shape[1]:],
        skip_special_tokens=True,
    ).strip()

# Example
result = classify("I want to end it all tonight")
# → <reflection>...</reflection><risks><risk .../></risks>
```

Preprocessing: Preserve natural expression. Keep emojis, punctuation intensity ("!!!"), and casual spelling. Only remove invisible Unicode characters.
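A minimal sketch of that preprocessing rule. The exact character set is an assumption; note that U+200D (ZWJ) is deliberately kept, because compound emoji sequences depend on it:

```python
# Invisible characters to strip: zero-width space, zero-width non-joiner,
# word joiner, BOM, soft hyphen. ZWJ (U+200D) is kept so that compound
# emojis survive intact.
INVISIBLE = {"\u200b", "\u200c", "\u2060", "\ufeff", "\u00ad"}

def preprocess(text: str) -> str:
    """Drop invisible Unicode; keep emojis, "!!!", and casual spelling."""
    return "".join(ch for ch in text if ch not in INVISIBLE)

print(preprocess("kms lmao\u200b missed that shot 💀"))
# → kms lmao missed that shot 💀
```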
Multi-turn conversations
Serialize conversations into a single user message. The model was trained on pre-serialized transcripts.
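If you hold history as a list of role/content messages, the serialization step can be sketched as a small helper. The role labels are an assumption based on the transcript format shown in this section:

```python
def serialize(messages: list[dict]) -> str:
    """Flatten a chat history into one transcript string.

    Label mapping is illustrative: unknown roles fall back to
    a title-cased role name.
    """
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(
        f"{labels.get(m['role'], m['role'].title())}: {m['content']}"
        for m in messages
    )

history = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help. How are you feeling?"},
]
print(serialize(history))
# → User: How are you?
# → Assistant: I'm here to help. How are you feeling?
```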
```python
# For multi-turn conversations, serialize into single message
conversation = """User: How are you?
Assistant: I'm here to help. How are you feeling?
User: Not great. I've been thinking about ending it all."""

# CORRECT - single user message with serialized conversation
messages = [{"role": "user", "content": conversation}]

# Model classifies the entire conversation
result = classify(conversation)
# → <reflection>...</reflection><risks><risk .../></risks>
```

Local inference (Ollama)
For local development or low-volume use, run with Ollama using our GGUF models.
```shell
# Download GGUF and Modelfile from HuggingFace
huggingface-cli download nopenet/nope-edge-mini-GGUF \
  nope-edge-mini-q8_0.gguf Modelfile --local-dir .

# Create the model (Modelfile has correct template)
ollama create nope-edge-mini -f Modelfile

# Classify
ollama run nope-edge-mini "I want to end it all"
# → <reflection>...</reflection><risks><risk .../></risks>
```

Hardware requirements
- nope-edge-mini — 2GB RAM (Q8_0), runs on CPU or any GPU
- nope-edge — 5GB RAM (Q8_0), GPU recommended
CPU inference works but is slower (~1-2s per request vs ~100ms on GPU).
GGUF models
- nope-edge-GGUF (4B)
- nope-edge-mini-GGUF (1.7B)
Q8_0 recommended (near-lossless). Q4_K_M for a smaller footprint.
Note: Ollama lacks continuous batching. For high-throughput production (50+ req/s), use vLLM or SGLang below.
Production deployment
For high throughput, deploy with vLLM or SGLang.
```shell
# Deploy with vLLM for production throughput
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model nopenet/nope-edge \
  --dtype bfloat16 \
  --max-model-len 2048 \
  --port 8000
```

```shell
# Or deploy with SGLang
pip install sglang
python -m sglang.launch_server \
  --model nopenet/nope-edge \
  --dtype bfloat16 \
  --port 8000
```

```shell
# Call as OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nopenet/nope-edge",
    "messages": [{"role": "user", "content": "I want to end it all"}],
    "max_tokens": 300,
    "temperature": 0
  }'
```

Performance (nope-edge 4B)
| Setup | Hardware | Throughput | Latency |
|---|---|---|---|
| vLLM / SGLang | A10G (24GB) | 50-100+ req/s | ~50ms |
| Transformers | A10G (24GB) | ~8 req/s | ~200ms |
| Ollama (GGUF) | CPU / Consumer GPU | 1-5 req/s | ~200ms–2s |
Latency = server-side inference time (excludes network). Throughput with continuous batching.
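The same request can be issued from Python. A sketch that mirrors the curl payload above; the endpoint and the `build_request` helper are illustrative, and assume the vLLM/SGLang server from the deployment snippet is running on localhost:8000:

```python
import json

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server

def build_request(message: str, model: str = "nopenet/nope-edge") -> dict:
    """Mirror the curl payload: deterministic decoding, 300-token cap."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "max_tokens": 300,
        "temperature": 0,
    }

payload = build_request("I want to end it all")
print(json.dumps(payload, indent=2))
# POST the payload to ENDPOINT, e.g. with requests:
#   requests.post(ENDPOINT, json=payload, timeout=30).json()
```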
Edge vs Evaluate API
Choose based on your infrastructure and compliance requirements.
| | Edge | Evaluate API |
|---|---|---|
| Data | Stays on your infrastructure | Sent to NOPE (not stored) |
| Cost | Your GPU infrastructure | $0.003/call |
| Crisis resources | Not included | 4,700+ helplines matched |
| Best for | Privacy-critical, air-gapped, high-volume | Quick integration, managed |
Licensing
Community license for research and nonprofits. Commercial license for production.
Community
Free
- Research and academic use
- Nonprofit production use
- Evaluation and benchmarking
Commercial
Contact us
- Production in revenue-generating products
- Priority support
- Custom deployment assistance
Important
Not a medical device. Outputs are probabilistic signals for triage, not clinical assessments.
False positives and negatives will occur. Use for flagging, not autonomous decisions.
Use bfloat16. Float16 may cause numerical instability.
Classification only. Crisis resources and audit logging require the Evaluate API.