AI diagnostic tool for rare diseases

The problem

Rare diseases hide in fragmented patient data – therapists rarely have time to spot the pattern.

A clinic digitalization service provider came to us. Their therapists at a private clinic see dozens of patients per day, with each patient's history scattered across handwritten notes, scanned forms, and PDFs. Subtle signals that point to a rare condition get lost in the noise. And because the clinic operates on an internal network, no cloud-based AI tool was on the table.

Therapists spend excessive time processing unstructured patient data – notes, scans, PDFs
Dozens of patients per day – hard to maintain a coherent picture of each patient's condition over time
Clinic data lives on internal servers – cloud-based solutions are not an option

The starting point

Before

Patient data scattered across notes, scans, and PDFs with no shared structure
No semantic search across a patient's full history
Rare-disease patterns invisible without manual cross-referencing
HIPAA-style constraints – no data could leave the clinic's network

The challenge

Build an AI assistant that runs entirely inside the clinic's internal network, consolidates fragmented patient data into a structured record, and flags candidate diagnoses – including rare-disease hints – for the therapist to validate.

The solution

A specialized AI assistant that converts handwritten notes, scans, and PDFs into structured, validated patient data – then pushes it directly to the clinic's MIS.

MIS integration – full access to historical patient data
Symptom extraction & coding – ICD code mapping with semantic synonym handling
Manual validation UI – doctor reviews and approves before MIS upload
Disease suggestions – AI-powered early-detection hints, including rare conditions
Data anonymization – compliant AI processing on-premise

Key decisions

1. On-premise AI deployment

The full pipeline runs inside the clinic's internal network – no cloud, no external data transfer. Models, embeddings, and vector storage are all deployed on clinic infrastructure. Cloud-based LLM APIs were a non-starter from day one, so the architecture is built around on-prem inference from the ground up.
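One way to make the "no external data transfer" rule enforceable rather than aspirational is a startup check that refuses any configured service URL pointing outside the internal network. The snippet below is a minimal sketch of that idea, not the deployed implementation: the function names and service names are hypothetical, and it assumes services are addressed by IP literal (hostnames are rejected because resolving them safely is out of scope here).

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """Return True only if the URL's host is a private or loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a DNS name): rejected in this sketch.
        return False
    return addr.is_private or addr.is_loopback


def assert_on_prem(endpoints: dict[str, str]) -> None:
    """Raise if any configured service endpoint is not on the internal network."""
    offenders = [name for name, url in endpoints.items()
                 if not is_internal_endpoint(url)]
    if offenders:
        raise ValueError(f"endpoints outside the internal network: {offenders}")


# Hypothetical service map: model inference, embeddings, vector storage.
assert_on_prem({
    "llm": "http://10.0.0.5:8000/v1",
    "embeddings": "http://10.0.0.6:8001/embed",
    "vector_store": "http://192.168.1.20:6333",
})
```

Running the check at startup means a misconfigured cloud URL fails loudly before any patient data is processed.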

2. Medical synonym recognition

"Fever" and "high temperature" mean the same thing. "Subfebrile temperature" is a distinct condition. Off-the-shelf semantic similarity tends to blur that line and treat all three as interchangeable. We built domain-aware deduplication that treats medical synonyms correctly while preserving clinically meaningful distinctions – critical when the goal is rare-disease detection, where a missed nuance is a missed diagnosis.
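To illustrate the distinction, here is a minimal sketch of deduplication driven by a curated synonym table instead of a raw similarity threshold. It is an assumption about how such a step could look, not our production logic: the table contents, `CANONICAL`, and `deduplicate` are hypothetical, and "pyrexia" is added only as an extra example synonym.

```python
# Hypothetical curated table: surface form -> canonical clinical concept.
# Terms that must stay distinct simply map to different concepts.
CANONICAL = {
    "fever": "fever",
    "high temperature": "fever",
    "pyrexia": "fever",
    "subfebrile temperature": "subfebrile temperature",
}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so lookups are not format-sensitive."""
    return " ".join(term.lower().split())


def deduplicate(symptoms: list[str]) -> list[str]:
    """Merge known synonyms; keep unknown or distinct terms as separate entries."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in symptoms:
        key = normalize(raw)
        # Unknown terms are kept as-is and never merged by guesswork.
        concept = CANONICAL.get(key, key)
        if concept not in seen:
            seen.add(concept)
            result.append(concept)
    return result


print(deduplicate(["Fever", "high temperature", "Subfebrile  temperature"]))
# ['fever', 'subfebrile temperature']
```

The design choice is that merging is opt-in: two terms collapse only when the table says so, which keeps a clinically distinct finding from being absorbed into a more common one.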

3. Patient data anonymization throughout the pipeline

Sensitive identifiers are stripped before data reaches the AI processing steps and re-attached only at the validation UI. The therapist sees the full record; intermediate AI stages do not. Compliance is structural, not an audit-time afterthought.
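The strip-then-re-attach flow can be sketched as a small vault that swaps identifiers for an opaque token. This is an illustrative sketch under stated assumptions: `IdentityVault`, `IDENTIFIER_FIELDS`, and the record layout are hypothetical, the mapping lives in memory, and only structured fields are handled (identifiers buried in free-text notes would need separate treatment).

```python
import uuid

# Hypothetical set of fields treated as direct identifiers.
IDENTIFIER_FIELDS = {"name", "date_of_birth", "address", "phone"}


class IdentityVault:
    """Holds identifiers out of reach of the AI stages, keyed by an opaque token."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def strip(self, record: dict) -> dict:
        """Return the clinical part of the record with a token instead of identifiers."""
        token = uuid.uuid4().hex
        self._store[token] = {k: v for k, v in record.items()
                              if k in IDENTIFIER_FIELDS}
        clinical = {k: v for k, v in record.items()
                    if k not in IDENTIFIER_FIELDS}
        clinical["patient_token"] = token
        return clinical

    def reattach(self, clinical: dict) -> dict:
        """Rebuild the full record; intended to be called only by the validation UI."""
        identifiers = self._store[clinical["patient_token"]]
        full = {k: v for k, v in clinical.items() if k != "patient_token"}
        full.update(identifiers)
        return full


vault = IdentityVault()
record = {"name": "Jane Doe", "phone": "555-0100", "symptoms": ["fever"]}
clinical = vault.strip(record)       # what the intermediate AI stages see
restored = vault.reattach(clinical)  # what the therapist sees
```

Because the AI stages only ever receive the output of `strip`, they cannot leak what they were never given.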

4. Doctor-in-the-loop, not doctor-out-of-the-loop

The AI extracts and suggests. The doctor validates before anything reaches the MIS. The system is positioned as an assistant that surfaces signals – not a black box that writes records autonomously. That framing was load-bearing for clinical adoption.
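In code terms, the rule is that nothing is eligible for upload until a doctor has explicitly approved it. The following is a minimal sketch of that gate, assuming a simple per-item review status; `ExtractedItem`, `Status`, `review`, and `approved_for_upload` are hypothetical names, and the actual MIS call is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ExtractedItem:
    """One AI-extracted value awaiting a doctor's decision."""
    field_name: str
    value: str
    status: Status = Status.PENDING


def review(item: ExtractedItem, approve: bool) -> None:
    """Record the doctor's decision on a single extracted item."""
    item.status = Status.APPROVED if approve else Status.REJECTED


def approved_for_upload(items: list[ExtractedItem]) -> list[ExtractedItem]:
    """Only explicitly approved items may reach the MIS; pending ones are held back."""
    return [item for item in items if item.status is Status.APPROVED]


items = [
    ExtractedItem("symptom", "fever"),
    ExtractedItem("icd_code", "R50.9"),  # illustrative code value
]
review(items[0], approve=True)
print([i.field_name for i in approved_for_upload(items)])
# ['symptom']
```

Defaulting every item to `PENDING` means an unreviewed suggestion is treated the same as a rejected one at upload time.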

Results

+20% therapist efficiency
~90% of patient parameters extracted automatically
Full pipeline running on-premise, no external data transfer

"Their ability to combine velocity and quality of the final product is on another level. I'm also impressed with their deep knowledge of all aspects of AI development." – Client