AI diagnostic tool for rare diseases
The problem
Rare diseases hide in fragmented patient data – therapists rarely have time to spot the pattern.
A clinic digitalization service provider came to us. Therapists at the private clinic they work with see dozens of patients per day, and each patient's history is scattered across handwritten notes, scanned forms, and PDFs. Subtle signals that point to a rare condition get lost in the noise. And because the clinic operates on an internal network, no cloud-based AI tool was on the table.
- Therapists spend excessive time processing unstructured patient data – notes, scans, PDFs
- Dozens of patients per day – hard to maintain a coherent picture of each patient's condition over time
- Clinic data lives on internal servers – cloud-based solutions are not an option
The starting point
Before
- Patient data scattered across notes, scans, and PDFs with no shared structure
- No semantic search across a patient's full history
- Rare-disease patterns invisible without manual cross-referencing
- HIPAA-style constraints – no data could leave the clinic's network
The challenge
Build an AI assistant that runs entirely inside the clinic's internal network, consolidates fragmented patient data into a structured record, and flags candidate diagnoses – including rare-disease hints – for the therapist to validate.
The solution
A specialized AI assistant that converts any input format into structured patient data – then pushes it to the clinic's MIS once a doctor has reviewed and approved it.
- MIS integration – full access to historical patient data
- Symptom extraction & coding – ICD code mapping with semantic synonym handling
- Manual validation UI – doctor reviews and approves before MIS upload
- Disease suggestions – AI-powered early-detection hints, including rare conditions
- Data anonymization – compliant AI processing on-premise
Key decisions
1. On-premise AI deployment
The full pipeline runs inside the clinic's internal network – no cloud, no external data transfer. Models, embeddings, and vector storage are all deployed on clinic infrastructure. Cloud-based LLM APIs were a non-starter from day one, so the architecture is built around on-prem inference from the ground up.
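One way to make the "no external data transfer" rule checkable in code is a guard on every endpoint the pipeline is configured to call. The sketch below is illustrative only: `is_internal_endpoint` and the example addresses are hypothetical, and it accepts only literal private or loopback IP addresses, rejecting hostnames rather than resolving them.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """Return True only if the URL's host is a literal private or loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are not resolved in this sketch, so they are rejected.
        return False
    return addr.is_private or addr.is_loopback


# Hypothetical service addresses, for illustration only.
print(is_internal_endpoint("http://10.0.0.12:8000/v1"))
print(is_internal_endpoint("https://api.example.com/v1"))
```

A check like this would typically run at startup against the configured model, embedding, and vector-store addresses, so a misconfigured external URL fails fast instead of silently sending data out.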
2. Medical synonym recognition
"Fever" and "high temperature" mean the same thing. "Subfebrile temperature" is a distinct condition. Off-the-shelf semantic similarity confuses the two. We built domain-aware deduplication that treats medical synonyms correctly while preserving clinically meaningful distinctions – critical when the goal is rare-disease detection, where a missed nuance is a missed diagnosis.
3. Patient data anonymization throughout the pipeline
Sensitive identifiers are stripped before data reaches the AI processing steps and re-attached only at the validation UI. The therapist sees the full record; intermediate AI stages do not. Compliance is structural, not an audit-time afterthought.
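A minimal sketch of the strip-then-re-attach pattern, assuming records are flat dictionaries. The field list in `IDENTIFIER_FIELDS`, the `IdentityVault` class, and the token scheme are hypothetical, and identifiers buried in free-text notes would need their own scrubbing, which this sketch does not attempt.

```python
import uuid

# Hypothetical set of fields treated as sensitive identifiers.
IDENTIFIER_FIELDS = {"name", "date_of_birth", "address", "phone"}


class IdentityVault:
    """Holds identifiers outside the AI stages, keyed by a random token."""

    def __init__(self) -> None:
        self._store = {}

    def strip(self, record: dict) -> dict:
        """Return a copy without identifiers, tagged with an opaque token."""
        token = uuid.uuid4().hex
        self._store[token] = {
            k: v for k, v in record.items() if k in IDENTIFIER_FIELDS
        }
        clinical = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        clinical["token"] = token
        return clinical

    def reattach(self, clinical: dict) -> dict:
        """Rebuild the full record for the validation UI."""
        full = {k: v for k, v in clinical.items() if k != "token"}
        full.update(self._store[clinical["token"]])
        return full


vault = IdentityVault()
stripped = vault.strip({"name": "Jane Doe", "symptoms": ["fever"]})
print(sorted(stripped))  # only 'symptoms' and 'token' reach the AI stages
```

Because the AI stages only ever receive the output of `strip`, they have no path back to the identifiers; only the component that holds the vault can rebuild the full record.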
4. Doctor-in-the-loop, not doctor-out-of-the-loop
The AI extracts and suggests. The doctor validates before anything reaches the MIS. The system is positioned as an assistant that surfaces signals – not a black box that writes records autonomously. That framing was load-bearing for clinical adoption.
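The validation gate can be sketched as a status field that only a human review can change, with the upload step ignoring anything not approved. The names below (`Suggestion`, `review`, `upload_to_mis`) are hypothetical, and the MIS is stood in for by a plain list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    patient_token: str
    finding: str
    status: Status = Status.PENDING
    reviewed_by: Optional[str] = None


def review(suggestion: Suggestion, doctor: str, approve: bool) -> None:
    """Record the doctor's decision; the only path out of PENDING."""
    suggestion.status = Status.APPROVED if approve else Status.REJECTED
    suggestion.reviewed_by = doctor


def upload_to_mis(suggestions: List[Suggestion], mis: list) -> int:
    """Append only doctor-approved suggestions; return how many were uploaded."""
    approved = [s for s in suggestions if s.status is Status.APPROVED]
    mis.extend(approved)
    return len(approved)


a = Suggestion("p1", "fever")
b = Suggestion("p1", "possible rare condition")
review(a, "dr_example", approve=True)
mis_records = []
print(upload_to_mis([a, b], mis_records))  # b is still pending, so it is skipped
```

Making PENDING the default means an AI-generated suggestion can never reach the MIS by omission; it takes an explicit review to let it through.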
Results
- +20% therapist efficiency
- ~90% of patient parameters extracted automatically
- Full pipeline running on-premise, no external data transfer
"Their ability to combine velocity and quality of the final product is on another level. I'm also impressed with their deep knowledge of all aspects of AI development." – Client