What we're trying to do
People will talk to AI about wanting to die. What happens next matters. Two questions define that moment:
"Is this user in a vulnerable state?"
"Is my AI making things worse?"
Most platforms ask neither. Some ask the first. Almost none ask the second.
Is this user in a vulnerable state?
Every AI system needs to know when it's talking to someone in distress. Not to surveil them, but to respond appropriately. A user considering self-harm deserves different engagement than someone debugging code.
Most systems either miss these signals entirely or panic when they see them. The panic response (dumping a generic hotline number regardless of context) does little good and often does harm [4]. It teaches users to hide their suffering.
Good detection doesn't mean "flag and alert." It means understanding what's happening well enough to respond with care. Distinguishing dark humor from despair. Recognizing escalation patterns. Matching resources to the actual person and their context.
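To make that concrete, here is a rough sketch in Python of what a detector's output could look like if it aims for understanding rather than a flag. Every field name below is hypothetical, invented for illustration; it is not a spec from any real system.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):
    NONE = "none"
    ELEVATED = "elevated"
    ACUTE = "acute"


@dataclass
class UserStateAssessment:
    """Hypothetical output of a crisis-signal detector.

    A single 'flagged' bit loses everything a good response needs:
    tone, trajectory, and what would actually help this person.
    """
    risk_level: RiskLevel
    signals: list[str] = field(default_factory=list)              # e.g. "hopelessness", "plan language"
    protective_factors: list[str] = field(default_factory=list)   # e.g. "future plans", "support network"
    escalating: bool = False           # is distress rising across recent turns?
    likely_dark_humor: bool = False    # joking register vs. genuine despair
    suggested_resources: list[str] = field(default_factory=list)  # matched to this person's context, not generic
```

The shape of the output is the point: a response strategy can be built from an assessment like this, but not from a boolean.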
Is my AI making things worse?
This is the question almost no one asks. Even platforms that detect user vulnerability rarely examine what their AI does in response. They assume the model's alignment training is sufficient.
It isn't. The documented harms (deaths, hospitalizations, minors harmed [1][2]) didn't happen because AI systems said something obviously terrible. They happened because patterns accumulated across conversations: validation of hopelessness, reinforcement of isolation, subtle encouragement of dependency.
No single message would have been flagged. The harm emerged from the trajectory. Turn after turn building on previous turns, each individually plausible, collectively dangerous.
Patterns that accumulate invisibly
Each response could pass content moderation. Together, across weeks of conversation, they form a pattern associated with documented harm.
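A toy example of why the unit of analysis matters. The per-turn scores and thresholds below are invented for illustration; the point is only that per-message checks and trajectory checks can disagree.

```python
# Hypothetical per-turn "concern" scores for one AI's responses over weeks.
# All numbers are made up, including both thresholds.
turn_scores = [0.18, 0.22, 0.25, 0.24, 0.29, 0.31, 0.30, 0.34]
PER_MESSAGE_THRESHOLD = 0.5   # what per-message moderation checks
TRAJECTORY_THRESHOLD = 2.0    # what a trajectory check might accumulate toward

# Per-message moderation: no single turn trips the threshold, so nothing is flagged.
flagged_individually = [s for s in turn_scores if s > PER_MESSAGE_THRESHOLD]
print(flagged_individually)      # [] -- every response "passes"

# Trajectory view: the same scores, accumulated and trending upward.
cumulative = sum(turn_scores)
trending_up = turn_scores[-1] > turn_scores[0]
print(cumulative > TRAJECTORY_THRESHOLD, trending_up)   # True True -- visible only in aggregate
```

The arithmetic is trivial; the design consequence isn't. The conversation, not the message, has to be the thing you evaluate.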
Why both questions matter
Detection alone doesn't prevent harm if the AI responds poorly. You can correctly identify someone in crisis and still make things worse: a panic response that alienates them, platitudes that feel dismissive, validation that reinforces their worst thoughts.
Behavior monitoring alone misses who is vulnerable. The same AI response might be harmless to most users but devastating to someone already in a fragile state. Context matters.
The documented harms involve both. A vulnerable user seeks connection and understanding. They encounter an AI that, through sycophancy and pattern-matching to their emotions, gradually makes things worse.
The cases that made headlines
A teenager in California: thousands of messages over months, ChatGPT allegedly providing noose instructions and helping draft a suicide note [1]. A man in Connecticut: months of conversations where ChatGPT validated paranoid beliefs about his mother, preceding a murder-suicide [2]. These weren't edge cases or jailbreaks. They were extended conversations where both questions went unanswered.
What we're building
Infrastructure that asks both questions continuously. Not embedded in the model doing the talking, but answered by independent systems watching the conversation.
This isn't a smarter model. It's not a better alignment technique. It's a second layer: observing, evaluating, providing signal that platforms can act on.
For the first question: real-time analysis of user messages for crisis signals. Not keyword matching, but comprehension. Understanding context, tone, escalation patterns, protective factors.
For the second question: behavior analysis across conversation turns. Detecting patterns associated with documented harm, before they reach the crisis point that makes headlines.
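As a sketch of the shape this takes (assuming hypothetical interfaces; none of these names describe a real API), the second layer runs both questions against the full conversation and returns a signal for the platform rather than intervening itself:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Turn:
    role: str   # "user" or "assistant"
    text: str


class UserStateAnalyzer(Protocol):
    """Question 1: is this user in a vulnerable state?"""
    def assess(self, conversation: list[Turn]) -> dict: ...


class BehaviorAnalyzer(Protocol):
    """Question 2: is the AI's behavior, across turns, making things worse?"""
    def assess(self, conversation: list[Turn]) -> dict: ...


def evaluate_conversation(
    conversation: list[Turn],
    user_analyzer: UserStateAnalyzer,
    behavior_analyzer: BehaviorAnalyzer,
) -> dict:
    """Run both questions outside the model doing the talking.

    The return value is a signal the platform can act on (routing, review,
    a changed response strategy), not an automatic intervention.
    """
    return {
        "user_state": user_analyzer.assess(conversation),      # crisis signals, escalation, protective factors
        "ai_behavior": behavior_analyzer.assess(conversation),  # sycophancy, isolation reinforcement, dependency
    }
```

The design choice that matters is independence: the evaluator is not the model doing the talking, so its judgment isn't subject to the same conversational pressure toward agreement.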
The honest limitation
Text alone cannot tell you what's actually happening in a person's mind. Two people can write identical words from completely different psychological states. An AI can say the same thing to two people, and one will benefit while the other is harmed.
The full solution to AI safety in mental health requires humans: clinicians, researchers, support systems, and community. Not just better algorithms. We're building infrastructure that enables better responses. But infrastructure is only part of the answer.
Sources
[1] Raine v. OpenAI, Inc., et al. (Cal. Super. Ct., filed Aug. 26, 2025). See also: Senator Padilla press release.
[2] Jargon J, Kessler S. "A Troubled Man, His Chatbot and a Murder-Suicide in Old Greenwich." Wall Street Journal, Aug. 29, 2025.
[3] McBain R, et al. "Evaluation of Alignment Between Large Language Models and Expert Clinicians in Suicide Risk Assessment." Psychiatric Services, 2025. RAND Corporation study testing 9,000 responses found "significant variability and inconsistency at intermediate risk levels."
[4] Blanchard M, Farber BA. "'It is never okay to talk about suicide': Patients' reasons for concealing suicidal ideation in psychotherapy." Psychotherapy Research, 2020;30(1):124–136.