
Edge

Run safety classification on your own infrastructure. No API calls, no data leaving your environment.

Free for research, academic, and nonprofit use.

Input

"I want to end it all tonight"

Output

<reflection>User expresses intent...</reflection>
<risk subject="self" type="suicide" severity="high"/>


Input

"kms lmao missed that shot 💀"

Output

<reflection>Hyperbolic slang...</reflection><risks/>

Model variants

Both variants emit the same output format; choose based on your accuracy and latency requirements.

nope-edge

Recommended

Maximum accuracy for production deployments.

  • F1 Score: 90%
  • Recall: 86%
  • Parameters: 4B
  • VRAM: ~8GB
Download

nope-edge-mini

Smaller

Faster inference for high-volume environments.

  • F1 Score: 83%
  • Recall: 75%
  • Parameters: 1.7B
  • VRAM: ~4GB
Download

Output format: XML with <reflection> reasoning and <risks> classification

How it compares

Benchmarked across 1,117 test cases from 42 suites.

Comparison of safety classification providers by F1 score and recall
Provider               F1 Score   Recall   Notes
nope-edge (4B)         90%        86%      Self-hosted or cloud API
nope-edge-mini (1.7B)  83%        75%      Self-hosted, faster/smaller
Azure Content Safety   77%        68%      General content moderation
OpenAI Moderation      63%        52%      General content moderation
LlamaGuard             45%        30%      Open-source safety classifier

F1 score balances precision and recall. Recall = % of actual crises detected. Higher is better for both. Full benchmark details

What it detects

Nine risk types covering crisis signals, interpersonal harm, and safeguarding concerns.

  • suicide: Ideation, planning, method-seeking, farewell behaviors
  • self_harm: Non-suicidal self-injury, cutting, burning
  • self_neglect: Disordered eating, medication non-adherence
  • violence: Threats or intent to harm others
  • abuse: Intimate partner violence, coercive control
  • sexual_violence: Assault, harassment, coercion
  • exploitation: Trafficking, sextortion, financial exploitation
  • stalking: Unwanted pursuit, monitoring, harassment
  • neglect: Child or elder neglect, failure to provide care

  • Severity: mild, moderate, high, critical
  • Imminence: chronic, acute, urgent, emergency
  • Subject: self (speaker at risk) or other (third party)
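These fields can be pulled out of the model's XML output with the standard library parser. A minimal sketch, where the sample output string and the `parse_risks` helper are illustrative, not part of the model API:

```python
import xml.etree.ElementTree as ET

def parse_risks(xml_text: str) -> list[dict]:
    """Extract risk attributes from the model's XML output fragment."""
    # The output is an XML fragment, so wrap it in a root element first.
    root = ET.fromstring(f"<root>{xml_text}</root>")
    # Works whether <risk/> elements are bare or nested inside <risks>.
    return [risk.attrib for risk in root.iter("risk")]

output = (
    "<reflection>User expresses intent...</reflection>"
    '<risk subject="self" type="suicide" severity="high"/>'
)
print(parse_risks(output))
# → [{'subject': 'self', 'type': 'suicide', 'severity': 'high'}]
```

An empty `<risks/>` fragment yields an empty list, which makes "no risk detected" easy to branch on.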

Quick start

Install transformers and start classifying in minutes.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nopenet/nope-edge"  # or "nopenet/nope-edge-mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

def classify(message: str) -> str:
    """Returns XML with reflection and risk classification."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=300, do_sample=False)

    return tokenizer.decode(
        output[0][input_ids.shape[1]:],
        skip_special_tokens=True
    ).strip()

# Example
result = classify("I want to end it all tonight")
# → <reflection>...</reflection><risks><risk .../></risks>

Preprocessing: Preserve natural expression. Keep emojis, punctuation intensity ("!!!"), and casual spelling. Only remove invisible Unicode characters.
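A minimal preprocessing sketch along these lines, stripping common zero-width characters while leaving everything else untouched. The exact character set here is an assumption; extend it to whatever invisible codepoints appear in your data:

```python
import re

# Invisible formatting characters to strip: zero-width space, zero-width
# non-joiner/joiner, word joiner, BOM. Emojis, "!!!", and casual spelling
# are deliberately kept as-is.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

def preprocess(message: str) -> str:
    """Remove invisible Unicode characters, preserving natural expression."""
    return _INVISIBLE.sub("", message)

print(preprocess("kms\u200b lmao missed that shot 💀"))
# → kms lmao missed that shot 💀
```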

Multi-turn conversations

Serialize conversations into a single user message. The model was trained on pre-serialized transcripts.

# For multi-turn conversations, serialize into a single message
conversation = """User: How are you?
Assistant: I'm here to help. How are you feeling?
User: Not great. I've been thinking about ending it all."""

# classify() already wraps the text as a single user message:
# [{"role": "user", "content": conversation}]

# The model classifies the entire conversation
result = classify(conversation)
# → <reflection>...</reflection><risks><risk .../></risks>
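If your history lives in chat-message dicts, a small helper can flatten it into this transcript format. The `serialize_conversation` helper and its `User:`/`Assistant:` labels follow the example above and are illustrative:

```python
def serialize_conversation(messages: list[dict]) -> str:
    """Flatten a chat history into a single serialized transcript."""
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(f"{labels[m['role']]}: {m['content']}" for m in messages)

history = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help. How are you feeling?"},
    {"role": "user", "content": "Not great. I've been thinking about ending it all."},
]
print(serialize_conversation(history))
```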

Local inference (Ollama)

For local development or low-volume use, run with Ollama using our GGUF models.

# Download GGUF and Modelfile from HuggingFace
huggingface-cli download nopenet/nope-edge-mini-GGUF \
    nope-edge-mini-q8_0.gguf Modelfile --local-dir .

# Create the model (Modelfile has correct template)
ollama create nope-edge-mini -f Modelfile

# Classify
ollama run nope-edge-mini "I want to end it all"
# → <reflection>...</reflection><risks><risk .../></risks>

Hardware requirements

  • nope-edge-mini — 2GB RAM (Q8_0), runs on CPU or any GPU
  • nope-edge — 5GB RAM (Q8_0), GPU recommended

CPU inference works but is slower (~1-2s vs ~100ms on GPU).

GGUF models

Q8_0 recommended (near-lossless). Q4_K_M for a smaller footprint.

Note: Ollama lacks continuous batching. For high-throughput production (50+ req/s), use vLLM or SGLang below.

Production deployment

For high throughput, deploy with vLLM or SGLang.

# Deploy with vLLM for production throughput
pip install vllm

python -m vllm.entrypoints.openai.api_server \
    --model nopenet/nope-edge \
    --dtype bfloat16 \
    --max-model-len 2048 \
    --port 8000
# Or deploy with SGLang
pip install sglang

python -m sglang.launch_server \
    --model-path nopenet/nope-edge \
    --dtype bfloat16 \
    --port 8000
# Call as OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nopenet/nope-edge",
    "messages": [{"role": "user", "content": "I want to end it all"}],
    "max_tokens": 300,
    "temperature": 0
  }'
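The same endpoint can also be called from Python using only the standard library. A minimal sketch assuming the server above is running on localhost:8000; `build_request` and `classify_remote` are illustrative helpers, not part of any SDK:

```python
import json
import urllib.request

def build_request(message: str) -> dict:
    """Request body for the OpenAI-compatible chat completions endpoint."""
    return {
        "model": "nopenet/nope-edge",
        "messages": [{"role": "user", "content": message}],
        "max_tokens": 300,
        "temperature": 0,
    }

def classify_remote(message: str, base_url: str = "http://localhost:8000") -> str:
    """POST to the local vLLM/SGLang server and return the model's XML output."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Setting `temperature` to 0 matches the greedy decoding used in the transformers quick start.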

Performance (nope-edge 4B)

Performance benchmarks by deployment setup
Setup           Hardware             Throughput      Latency
vLLM / SGLang   A10G (24GB)          50-100+ req/s   ~50ms
Transformers    A10G (24GB)          ~8 req/s        ~200ms
Ollama (GGUF)   CPU / Consumer GPU   1-5 req/s       ~200ms–2s

Latency = server-side inference time (excludes network). Throughput with continuous batching.

Edge vs Evaluate API

Choose based on your infrastructure and compliance requirements.

Feature comparison between Edge and Evaluate API
Feature            Edge                                        Evaluate API
Data               Stays on your infrastructure                Sent to NOPE (not stored)
Cost               Your GPU infrastructure                     $0.003/call
Crisis resources   Not included                                4,700+ helplines matched
Best for           Privacy-critical, air-gapped, high-volume   Quick integration, managed

Licensing

Community license for research and nonprofits. Commercial license for production.

Community

Free
  • Research and academic use
  • Nonprofit production use
  • Evaluation and benchmarking
Download from HuggingFace

Commercial

Contact us
  • Production in revenue-generating products
  • Priority support
  • Custom deployment assistance
Contact for pricing

Important

Not a medical device. Outputs are probabilistic signals for triage, not clinical assessments.

False positives and negatives will occur. Use for flagging, not autonomous decisions.

Use bfloat16. Float16 may cause numerical instability.

Classification only. Crisis resources and audit logging require the Evaluate API.