
Scientific Trial PDF Extraction Cut from 2 Hours to 15 Minutes at >90% Accuracy

We developed an AI assistant for a top-10 crop science multinational that extracts structured PECO fields from scientific trial PDFs with human-in-the-loop review.

Netherlands

Solution:

AI Assistant

Tech Stack:

Backend & APIs, Python

Services:

UI Design, DevOps, UX Design, Product Analytics, AI, QA, Software Development, Workflow Automation, Backend, AI Transformation

KPI:

Project timeline: 7 weeks from kickoff to working pipeline.

Time per article: 2 hours → 15 minutes.

Extraction accuracy: >90% on the typical corpus.

Client & Context

A top-10 global crop science company. Agronomy and regulatory teams run systematic reviews of trial literature.

Each review uses the PECO framework — Population, Exposure, Comparator, Outcome. Analysts pulled these fields manually from PDFs.

One article took two hours. The team handles hundreds of articles a year.


Goals

1. Cut analyst time per article without losing accuracy.
2. Hit ≥85% extraction accuracy on a held-out test set.
3. Ship a deployment plan that includes an on-prem option.

Challenges & How We Overcame Them

A single PECO field scatters across methods, results, and supplementary material.

Wider retrieval: top-N candidates instead of top-3.

LLM filters the candidate set for precision.

SciBERT-class embeddings adapted to the in-domain corpus.

Long-context prompts lose recall past ~32K tokens.

Retrieval architecture tuned for recall over precision.

Missing a fact costs more than reviewing one extra candidate.

PDF layouts vary heavily across publishers and journals.

Training and evaluation set expanded to several samples per format.

PDF parsing carries fragment coordinates end-to-end.

Image-embedded text handled via coordinate awareness.

Stakeholders disagreed on what counts as a correct extraction.

Schema locked in week one.

TP / FP / FN evaluation scheme agreed before model work started.

Diagram showing key PDF extraction challenges: scattered facts, context limits, heterogeneous layouts, and agreed evaluation rules

From Schema Definition to Production-Ready Pipeline

Schema definition

We formalized the extraction schema as extended PECO — four standard fields plus study-design and regional-context attributes.

Defining the schema surfaced disagreements between research leads. Resolving them before any model work started was the highest-leverage hour of the engagement.

Quality criteria, evaluation scheme, and working assumptions were locked in week one.
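As a rough illustration of what an extended PECO record could look like in code, here is a minimal Python sketch. The class name `ExtendedPeco`, the attribute names, the free-text `str` types, and the sample values are all assumptions made for this example; the case study does not publish the actual schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExtendedPeco:
    """Hypothetical record shape; field names and types are assumed."""

    # The four standard PECO fields.
    population: Optional[str] = None
    exposure: Optional[str] = None
    comparator: Optional[str] = None
    outcome: Optional[str] = None
    # Extension attributes mentioned in the text; exact shape is assumed.
    study_design: Optional[str] = None
    regional_context: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of fields the extractor left empty, in schema order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Invented sample values, for illustration only.
record = ExtendedPeco(
    population="winter wheat",
    exposure="fungicide treatment",
    outcome="grain yield",
)
print(record.missing_fields())
```

Treating an empty field as `None` rather than an empty string keeps "not found" distinct from "found but blank", which matters once false negatives are counted per field.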

Extended PECO schema diagram with Population, Exposure, Comparator, Outcome, study design, and regional context

Retrieval and extraction pipeline

The decisive move was widening retrieval from top-3 passages to a broader candidate set, then letting the LLM filter.

Per-field extraction recall rose into the 90% range, with no observable precision loss on the validation set.

Retrieval ran on SciBERT-class scientific embeddings adapted to the in-domain corpus. A GPT-class model handled field-level extraction.
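The retrieve-wide-then-filter idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the production pipeline: the 2-D vectors stand in for real embeddings, the `keep` callback stands in for the LLM filter, and every function name and passage here is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Passage = Tuple[str, Vector]  # (passage text, embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_n(query: Vector, passages: List[Passage], n: int) -> List[str]:
    """Return the n passages most similar to the query, best first."""
    ranked = sorted(passages, key=lambda p: cosine(query, p[1]), reverse=True)
    return [text for text, _ in ranked[:n]]


def extract_candidates(
    query: Vector, passages: List[Passage], n: int, keep: Callable[[str], bool]
) -> List[str]:
    """Retrieve broadly for recall, then filter for precision."""
    return [text for text in retrieve_top_n(query, passages, n) if keep(text)]


def mentions_wheat(text: str) -> bool:
    """Stand-in for the LLM filter (assumption: a simple keyword check)."""
    return "wheat" in text


# Toy embeddings, invented for illustration only.
passages = [
    ("Methods: plots were sown with wheat.", [0.9, 0.1]),
    ("Funding statement.", [0.8, 0.2]),
    ("Acknowledgements.", [0.7, 0.3]),
    ("Supplement: wheat cultivar details.", [0.5, 0.5]),
]
query = [1.0, 0.0]

narrow = extract_candidates(query, passages, 3, mentions_wheat)
wide = extract_candidates(query, passages, 4, mentions_wheat)
print(narrow)
print(wide)
```

In this toy corpus the relevant supplementary passage ranks fourth, so a top-3 cutoff drops it before the filter ever sees it, while the wider candidate set recovers it and the filter discards the irrelevant passages.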

Retrieval architecture diagram comparing top-3 extraction with top-N retrieval and LLM filtering for higher recall

PDF parsing with traceability

Fragment coordinates are carried end-to-end through the pipeline, so every extracted fact traces back to its source span.

Text embedded inside images and figures was handled with coordinate awareness. This outperformed classic OCR and text-layer parsers on this corpus.

Heterogeneous article layouts were the second engineering hurdle. Adding several samples per format to the training and evaluation set meaningfully improved robustness on edge layouts.
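One way to picture source-span traceability is a fact record that always carries the fragment it came from. The sketch below is an assumption about shape only: the class names, the `(x0, y0, x1, y1)` bounding-box convention, and the sample coordinates are hypothetical and not taken from the delivered system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fragment:
    """A parsed piece of the PDF with its location (assumed layout)."""

    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1 (assumed units: points)
    text: str


@dataclass(frozen=True)
class ExtractedFact:
    """An extracted value that keeps a reference to its source fragment."""

    field_name: str
    value: str
    source: Fragment


def locate(fact: ExtractedFact) -> str:
    """Format the source location so a reviewer UI could jump to it."""
    x0, y0, x1, y1 = fact.source.bbox
    return f"p.{fact.source.page} [{x0:.0f},{y0:.0f},{x1:.0f},{y1:.0f}]"


# Invented sample values, for illustration only.
fragment = Fragment(page=3, bbox=(72.0, 540.0, 523.0, 566.0),
                    text="Plots were sown with winter wheat.")
fact = ExtractedFact(field_name="population", value="winter wheat", source=fragment)
print(locate(fact))
```

Making both records immutable means the coordinates attached at parse time cannot be silently altered by later pipeline stages.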

Scientific trial PDF with highlighted source text linked to coordinate-aware traceability

Analyst review interface

The web UI is a fix-and-confirm review surface. Each screen puts the analyst between an extracted fact and its source span.

Authentication is SSO-friendly, and PDFs are uploaded by drag and drop.

The results screen shows each extracted field next to the passage it came from. Quick-edit affordances make corrections take seconds.

Review interface showing extracted PECO fields linked to highlighted source passages in the original PDF

Metrics and production memo

We tracked a three-tier metric pyramid. The North Star metric was end-to-end review time at the required accuracy.

Product metrics covered time per article, share of articles closed without analyst edits, and analyst satisfaction. Technical metrics covered TP / FP / FN per field, pipeline latency, and infra cost per article.
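For the per-field TP / FP / FN counts, a small Python sketch shows how precision and recall could be derived. The counting rules here are an assumption, since the case study does not spell out the agreed scheme: an exact match is a true positive, any other non-empty prediction is a false positive, and any gold value that was not matched is a false negative. The names and sample pairs are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class FieldCounts:
    tp: int = 0
    fp: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        total = self.tp + self.fp
        return self.tp / total if total else 0.0

    @property
    def recall(self) -> float:
        total = self.tp + self.fn
        return self.tp / total if total else 0.0


def score_field(pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> FieldCounts:
    """Count TP / FP / FN over (predicted, gold) pairs for one field."""
    counts = FieldCounts()
    for predicted, gold in pairs:
        if predicted is not None and predicted == gold:
            counts.tp += 1
            continue
        if predicted is not None:
            counts.fp += 1  # something was extracted, but it was wrong
        if gold is not None:
            counts.fn += 1  # a real value was missed or mis-extracted
    return counts


# Invented (predicted, gold) pairs for one field, for illustration only.
population = score_field([
    ("wheat", "wheat"),
    ("maize", "wheat"),
    (None, "barley"),
    ("soy", "soy"),
])
print(population, population.precision, population.recall)
```

Under these rules a wrong extraction counts against both precision and recall, which is one of the choices stakeholders would need to agree on before comparing numbers.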

Quality metrics pyramid with north star, product metrics, and technical metrics for the extraction pipeline

Compliance & Security

No personal data in scope; privacy alignment satisfied by default.

On-prem deployment available for production.

Source-span traceability for every extracted fact.

Quality-drift alerts on AI components.

Results

1. Time per article: 2 hours → 15 minutes with human review.
2. Extraction accuracy: >90% on the typical corpus.
3. Annual savings: ~875 hours, assuming 10 reviews per year × 50 articles per review (≈$48K at $55/hour fully loaded).
4. Pilot delivered in 7 weeks from kickoff to working pipeline.
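The annual savings figure follows directly from the numbers quoted above. The short Python check below reproduces the arithmetic; the variable names are illustrative, and the inputs are the ones stated in this case study.

```python
# Inputs as stated in the results above.
reviews_per_year = 10
articles_per_review = 50
minutes_before = 120  # 2 hours per article
minutes_after = 15
hourly_rate_usd = 55  # fully loaded

articles_per_year = reviews_per_year * articles_per_review
hours_saved = articles_per_year * (minutes_before - minutes_after) / 60
savings_usd = hours_saved * hourly_rate_usd

print(articles_per_year, hours_saved, savings_usd)
```

That is 500 articles a year, 1.75 hours saved on each, for 875 hours and about $48K.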