Fully Local · Air-Gapped · Clinical Document Intelligence

DOCUMENTITIS

Where every page tells a story.

An on-premise AI platform that reads unstructured clinical documents — scanned notes, handwritten prescriptions, lab reports — and turns them into structured, source-linked, research-ready data. Nothing ever leaves the hospital network.

HIPAA-aligned · GDPR-aligned · FDA 21 CFR Part 11 · ICH E6 (R2) GCP · Runs on Apple Silicon M4, no cloud
19 documents processed
94 variables extracted
85.5% average extraction confidence
19 AI patient summaries
How this started

2024. First-year MBBS.

"Why is all of this still on paper?"

So instead of drawing the brachial plexus in our notebooks, we started sketching system architecture.

We walked in expecting stethoscopes and scrubs. We got mountains of paper instead — handwritten records, illegible prescriptions, lab reports stuffed into folders that no one would ever open again.

Between Anatomy lectures and Biochemistry vivas, we kept asking each other the same question — and we couldn't let it go.

Three medical students. One conviction: clinical records should never be a barrier to better care.

The real bottleneck

80% of clinical data is trapped in unstructured form.

Before a study can even begin enrolling, someone has to read it all by hand.

30–60 min

Per patient

Manual chart abstraction from scanned notes, handwritten prescriptions and free-text discharge summaries — for every single record.

100s of hrs

Per trial

Multiplied across a 500-patient study, that's hundreds of hours of pure data entry before the first patient is enrolled.

Integrity risk

Inter-rater variability

Different abstractors read the same chart differently — quietly threatening the integrity of multi-site research data.
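
The per-trial figure follows from the per-patient one: at 30 to 60 minutes per chart, a 500-patient study comes to roughly 250 to 500 hours of abstraction. A quick check of that arithmetic (the function name is illustrative only):

```python
def abstraction_hours(patients: int, minutes_per_chart: float) -> float:
    """Total manual abstraction time in hours for a study."""
    return patients * minutes_per_chart / 60

# Range quoted above: 30 to 60 minutes per patient, 500 patients.
low_estimate = abstraction_hours(500, 30)
high_estimate = abstraction_hours(500, 60)
print(f"{low_estimate:.0f} to {high_estimate:.0f} hours")
```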

What DOCUMENTITIS does

Five gated stages. One unbroken chain of custody.

A fully local pipeline that converts unstructured clinical documents into structured, research-ready data — every step on the hospital's own servers.

01 Upload
02 OCR Quality Gate
03 AI Extraction
04 Validation
05 Export
On-premise: Every processing step runs on the hospital's own servers.
Source-linked: Every extracted data point links back to its source line in the original document.
Knows its limits: Uncertain extractions are flagged for human review, never silently trusted.
Export-ready: Outputs flow directly into REDCap and Medidata Rave.
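
To make "gated" concrete, here is a minimal sketch of how five stages could run in order, with a document stopping at the first gate it fails and every decision written to an audit log. This is an illustration under stated assumptions, not the shipped implementation: the `Document` class, the stage functions, and the 0.80 OCR cut-off are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Document:
    name: str
    ocr_confidence: float  # assumed 0-1 score reported by the OCR layer
    variables: Dict[str, str] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

def upload(doc: Document) -> bool:
    return bool(doc.name)

def ocr_quality_gate(doc: Document) -> bool:
    # 0.80 is an illustrative cut-off, not a documented one.
    return doc.ocr_confidence >= 0.80

def ai_extraction(doc: Document) -> bool:
    # Stand-in for the local LLM call; writes one placeholder variable.
    doc.variables["example_variable"] = "example_value"
    return True

def validation(doc: Document) -> bool:
    # Pass only if something was extracted and no value is empty.
    return bool(doc.variables) and all(doc.variables.values())

def export(doc: Document) -> bool:
    return True  # stand-in for writing an export file

STAGES: List[Tuple[str, Callable[[Document], bool]]] = [
    ("Upload", upload),
    ("OCR Quality Gate", ocr_quality_gate),
    ("AI Extraction", ai_extraction),
    ("Validation", validation),
    ("Export", export),
]

def run_pipeline(doc: Document) -> str:
    """Run stages in order; return the gate that stopped the document, or 'Complete'."""
    for name, stage in STAGES:
        passed = stage(doc)
        doc.audit_log.append(f"{name}: {'passed' if passed else 'held for review'}")
        if not passed:
            return name
    return "Complete"
```

A clean scan passes all five gates, while a poor scan is held at the OCR Quality Gate and never reaches extraction. In both cases `audit_log` records what happened at each step.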
The intelligence behind it

Two AI layers. Both running locally.

OCR Layer

Reads scanned & handwritten documents, block-by-block
dots.ocr · Tesseract 5 · EasyOCR — ensemble OCR. Low-confidence blocks are auto-flagged for human review.
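
One simple way to combine several engines per block is to keep the reading with the highest confidence and flag the block when even the best reading is weak. The sketch below assumes that strategy purely for illustration; the actual ensemble logic may differ. The `merge_block` function, its input shape, and the 0.80 review cut-off are hypothetical, and the sample readings are made up.

```python
from typing import Dict, Tuple

def merge_block(readings: Dict[str, Tuple[str, float]],
                review_below: float = 0.80) -> dict:
    """Pick the most confident engine's text for one block.

    readings maps an engine name to (text, confidence in 0-1).
    review_below is an illustrative cut-off, not a documented value.
    """
    engine, (text, confidence) = max(readings.items(), key=lambda kv: kv[1][1])
    return {
        "text": text,
        "engine": engine,
        "confidence": confidence,
        "needs_review": confidence < review_below,
    }

# Made-up readings for a printed line and a handwritten one.
printed = merge_block({"tesseract": ("Hb 11.2 g/dL", 0.91),
                       "easyocr": ("Hb 11.2 g/dl", 0.84)})
handwritten = merge_block({"tesseract": ("T@b", 0.35),
                           "easyocr": ("Tab", 0.52)})
```

Here the printed line is accepted from the more confident engine, while the handwritten line is kept but marked `needs_review` because no engine read it confidently.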

Local LLMs via Ollama

Schema-constrained extraction, validation & summarisation
Nemotron-mini — extracts schema-constrained clinical variables.
DeepSeek-R1 (8B/14B) — validates findings, normalises units, generates patient summaries.
Llama 3.1 — powers a clinical assistant chatbot across all documents.
All of this runs on consumer-grade hardware (Apple Silicon M4). No cloud APIs. No GPU clusters.
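
"Schema-constrained" can be read as: the model may only return variables the study schema defines, with the right type, and each one must carry a pointer back to its source line. The sketch below checks a model's JSON output against such a schema. It does not call Ollama, and the schema, field names, and sample payload are hypothetical, made-up values for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical study schema: variable name -> expected Python type.
SCHEMA = {"hemoglobin_g_dl": float, "diagnosis": str}

@dataclass
class Extraction:
    variable: str
    value: object
    source_line: int   # line in the original document the value came from
    confidence: float

def parse_extractions(raw_json: str) -> List[Extraction]:
    """Keep only items that match the schema and carry a source line."""
    accepted = []
    for item in json.loads(raw_json):
        expected = SCHEMA.get(item.get("variable"))
        if expected is None or "source_line" not in item:
            continue  # unknown variable or no source link: reject
        value = item.get("value")
        # Accept whole numbers where a float is expected (but not booleans).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            continue  # wrong type: reject
        accepted.append(Extraction(item["variable"], value,
                                   int(item["source_line"]),
                                   float(item.get("confidence", 0.0))))
    return accepted

# Made-up model output: one valid item, one unknown variable, one wrong type.
raw = json.dumps([
    {"variable": "hemoglobin_g_dl", "value": 11.2, "source_line": 14, "confidence": 0.93},
    {"variable": "favourite_colour", "value": "blue", "source_line": 3, "confidence": 0.99},
    {"variable": "diagnosis", "value": 42, "source_line": 5, "confidence": 0.90},
])
result = parse_extractions(raw)
```

Only the hemoglobin value survives, and it still knows which line of the source document it came from.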
What's already built — v1.0.0

A working console, today.

Not a roadmap. Five live modules you can open right now.

Why local

How DOCUMENTITIS differs from existing tools.

| | Cloud Clinical AI | DOCUMENTITIS |
| Data location | Sent to cloud | Stays in hospital |
| Infrastructure | Data-center GPUs | Consumer hardware |
| Traceability | Black-box outputs | Source-linked, audit-grade |
| Regulatory fit | Difficult in India / EU | Built natively for data sovereignty |
| Cost to hospital | High recurring fees | One-time deployment |
Where this fits best

Built for clinical research teams.

Retrospective Research

Automates chart abstraction for studies on sepsis, readmission rates and oncology outcomes — turning months of manual review into hours.

Multi-Site Trial Consistency

Standardises extraction across hospitals, reducing the site-to-site inter-rater variability that compromises trial integrity.

Adverse Event Detection

Continuously evaluates records to flag abnormal lab values and safety signals — supporting real-time pharmacovigilance.

Built within legal & ethical frameworks

Trust is the architecture — not a feature.

HIPAA-aligned
GDPR-aligned
FDA 21 CFR Part 11
ICH E6 (R2) GCP
Any extraction with a confidence score below 0.89 is automatically paused and flagged for mandatory clinician verification. Development uses de-identified records only.
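
As a rough sketch of that rule, the routing can be as small as one comparison per extraction. The 0.89 threshold comes from the text above. The `route_extractions` function, the dictionary shape, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.89  # stated above: below this, a clinician must verify

def route_extractions(extractions: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split extractions into (accepted, held_for_clinician_review)."""
    accepted, held = [], []
    for item in extractions:
        if item["confidence"] < REVIEW_THRESHOLD:
            held.append(item)      # paused until a clinician verifies it
        else:
            accepted.append(item)
    return accepted, held

# Made-up sample values for illustration.
sample = [
    {"variable": "hemoglobin_g_dl", "confidence": 0.95},
    {"variable": "diagnosis", "confidence": 0.89},
    {"variable": "drug_dose", "confidence": 0.72},
]
accepted, held = route_extractions(sample)
```

Note that a score of exactly 0.89 is accepted under this reading, since the rule says "below 0.89".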
What we're seeking

A clinical research partner.

We're three medical students with a working v1.0.0. Here's where the right institution changes everything.

01

Institutional Partnership

A clinical research institution willing to host a pilot deployment, strictly on de-identified historical data.

02

Validation Dataset Access

Permission to validate our extraction accuracy against ground-truth data, under proper ethical clearance.

03

Mentorship & Direction

Clinical research expertise to help us refine which use cases to prioritise first.

admin@produsa.dev
DOCUMENTITIS

Three medical students.
One conviction.

Clinical records should never be a barrier to better care.