From Protocol to FHIR: Mapping Eligibility Criteria to Structured EHR Data
When a sponsor delivers a final protocol, the eligibility criteria section looks straightforward: inclusion and exclusion criteria listed in plain clinical language, often with the precision required for regulatory review. The challenge for CROs implementing AI-assisted screening is that this plain language needs to be translated into something a database can evaluate: structured queries against FHIR resources, with specific code sets, value ranges, and temporal relationships.
That translation work is not automatic. It requires both clinical knowledge (what does “active infection requiring systemic treatment” mean in terms of observable EHR data?) and data engineering knowledge (which FHIR resource and which code system will reliably surface that condition?). Getting it wrong cuts both ways: an exclusion query that matches too broadly rejects eligible patients, and one that matches too narrowly fails to catch disqualifying conditions until late in screening.
Breaking Down a Criterion into FHIR Queries
Consider a common exclusion: “Prior treatment with any anti-PD-1 or anti-PD-L1 antibody.” The FHIR mapping requires three steps: identifying the relevant medication class (checkpoint inhibitors in the PD-1/PD-L1 pathway); translating that class to RxNorm concepts that cover both the generic ingredients (nivolumab, pembrolizumab, atezolizumab, durvalumab, avelumab, cemiplimab) and their branded products; and querying the MedicationRequest and MedicationStatement resources with a patient-level filter. Because these agents are given by infusion, sites that record infusions as MedicationAdministration resources may need that resource queried as well.
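As a minimal sketch of the query step, assuming a FHIR R4 server that supports the standard `patient` and `code` search parameters, the snippet below builds one search URL per medication resource type. The base URL, the patient ID, the function name, and the RxCUI values are all placeholders for illustration; a real code set would be looked up in RxNorm and reviewed clinically.

```python
from urllib.parse import urlencode

RXNORM_SYSTEM = "http://www.nlm.nih.gov/research/umls/rxnorm"

# Placeholder RxCUIs for illustration only. Real values must be looked up in
# RxNorm and reviewed clinically before use.
ANTI_PD1_PDL1_RXCUIS = {
    "nivolumab": "PLACEHOLDER-1",
    "pembrolizumab": "PLACEHOLDER-2",
    "atezolizumab": "PLACEHOLDER-3",
    "durvalumab": "PLACEHOLDER-4",
    "avelumab": "PLACEHOLDER-5",
    "cemiplimab": "PLACEHOLDER-6",
}


def build_medication_queries(
    base_url,
    patient_id,
    rxcuis,
    resource_types=("MedicationRequest", "MedicationStatement"),
):
    """Return one FHIR search URL per resource type (hypothetical helper)."""
    # Token search syntax is system|code; comma-separated values are OR'ed.
    code_param = ",".join(f"{RXNORM_SYSTEM}|{code}" for code in sorted(rxcuis))
    query = urlencode({"patient": patient_id, "code": code_param})
    return [f"{base_url}/{rtype}?{query}" for rtype in resource_types]


urls = build_medication_queries(
    "https://fhir.example.org", "example-patient", ANTI_PD1_PDL1_RXCUIS.values()
)
```

Any non-empty result from either query would flag the candidate for the exclusion. Note that a list of ingredient-level codes alone may miss records coded at the product level, so the code set usually needs to be expanded before it is used this way.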
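A criterion like “adequate renal function (creatinine clearance ≥ 45 mL/min)” does not map to a single stored lab value. The usual input is serum creatinine from the Observation resource, LOINC code 2160-0 (Creatinine [Mass/volume] in Serum or Plasma), which is a concentration rather than a clearance, so the clearance has to be estimated with whatever formula the protocol specifies (Cockcroft-Gault, for example, also needs age, weight, and sex). Some sites instead report a calculated GFR under its own LOINC code. Either way, the query needs a value threshold and a recency window (typically the most recent value within 28 days of enrollment). That window handling is easy to miss, and omitting it lets stale results drive the screening decision.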
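The sketch below illustrates the threshold and recency-window logic under stated assumptions: the protocol specifies the Cockcroft-Gault estimate, serum creatinine is in mg/dL, and observations have already been reduced to simple dicts with `date` and `scr_mg_dl` keys (a hypothetical shape, not a FHIR structure). The function names are illustrative.

```python
from datetime import date, timedelta


def most_recent_within(observations, reference_date, window_days=28):
    """Return the latest observation dated within the window, or None."""
    earliest = reference_date - timedelta(days=window_days)
    in_window = [o for o in observations if earliest <= o["date"] <= reference_date]
    return max(in_window, key=lambda o: o["date"], default=None)


def cockcroft_gault(scr_mg_dl, age_years, weight_kg, is_female):
    """Estimate creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = ((140 - age_years) * weight_kg) / (72 * scr_mg_dl)
    return crcl * 0.85 if is_female else crcl


def meets_renal_criterion(observations, reference_date, age_years, weight_kg,
                          is_female, threshold=45.0):
    """True/False if a recent value exists; None if no value is in the window."""
    latest = most_recent_within(observations, reference_date)
    if latest is None:
        return None  # no usable lab: route to manual review, do not guess
    crcl = cockcroft_gault(latest["scr_mg_dl"], age_years, weight_kg, is_female)
    return crcl >= threshold


labs = [
    {"date": date(2024, 1, 1), "scr_mg_dl": 3.0},  # outside the 28-day window
    {"date": date(2024, 3, 1), "scr_mg_dl": 1.0},  # inside the window
]
result = meets_renal_criterion(labs, date(2024, 3, 15), 60, 72.0, False)
```

Returning `None` rather than `False` when no value falls inside the window keeps "no recent lab" distinct from "lab fails the threshold", which matter differently to a coordinator.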
The Code Set Problem
Protocol language rarely specifies which code system to use. “Prior diagnosis of hepatitis B or C” could be documented in ICD-10-CM (B18.1, B18.2, B19.10, B19.11…), SNOMED CT, or as a free-text problem list entry in a system that didn’t enforce structured coding at data entry. A mapping that only queries ICD-10 will miss patients whose hepatitis diagnosis was entered via SNOMED or as an unstructured problem.
Robust eligibility mapping requires building code set definitions that span the coding systems your participating sites actually use, then validating those code sets against historical patient populations to check for systematic under-capture or over-capture before the trial goes live.
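One way to sketch a cross-system code set and a basic capture check is shown below. It assumes Condition resources are available as parsed JSON dicts; the ICD-10-CM codes are the ones named above, while the SNOMED CT entry is a placeholder rather than a real concept ID, and the function names are hypothetical.

```python
from collections import Counter

ICD10CM = "http://hl7.org/fhir/sid/icd-10-cm"
SNOMED = "http://snomed.info/sct"

# (system, code) pairs. The SNOMED CT code is a placeholder for illustration.
HEPATITIS_B_C = {
    (ICD10CM, "B18.1"),
    (ICD10CM, "B18.2"),
    (SNOMED, "PLACEHOLDER-HEPATITIS"),
}


def classify_condition(condition, value_set):
    """Return 'match', 'no_match', or 'free_text_only' for a Condition dict."""
    code = condition.get("code", {})
    codings = code.get("coding", [])
    if not codings:
        # Nothing structured to evaluate; free text needs human review.
        return "free_text_only" if code.get("text") else "no_match"
    if any((c.get("system"), c.get("code")) in value_set for c in codings):
        return "match"
    return "no_match"


def capture_by_system(conditions, value_set):
    """Count matching codings per code system across a historical population."""
    counts = Counter()
    for condition in conditions:
        for c in condition.get("code", {}).get("coding", []):
            if (c.get("system"), c.get("code")) in value_set:
                counts[c["system"]] += 1
    return counts


history = [
    {"code": {"coding": [{"system": ICD10CM, "code": "B18.2"}]}},
    {"code": {"coding": [{"system": SNOMED, "code": "PLACEHOLDER-HEPATITIS"}]}},
    {"code": {"text": "hx of hep C, treated"}},
]
labels = [classify_condition(c, HEPATITIS_B_C) for c in history]
counts = capture_by_system(history, HEPATITIS_B_C)
```

Running the tally per site against historical data gives a rough signal: a site with no hits in one system, or a large share of free-text-only entries, suggests the code set or the site's documentation practice needs a closer look before go-live.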
Temporal Relationships
Many criteria include time-based logic: “no chemotherapy within 21 days of enrollment,” “prior radiation to the target field more than 6 months ago.” These require calculating intervals between the candidate’s enrollment date (or screening date) and the relevant clinical event date. FHIR resources expose date fields, but those fields are often populated inconsistently: procedure dates may reflect billing date rather than clinical date, and some sites populate effective dates while others use recorded dates.
Understanding which date fields are reliably populated at each participating site, and calibrating temporal queries accordingly, is part of the protocol intake work that precedes screening. It’s also why eligibility rule sets need to be reviewed with site coordinators, not just with the sponsor’s medical monitor.
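A minimal sketch of site-calibrated date handling follows. It assumes FHIR R4 Procedure resources parsed into dicts (where the clinical date lives in `performedDateTime` or `performedPeriod.start`), a hypothetical per-site configuration listing which field to trust first, and an inclusive window boundary; the protocol, not this code, decides whether day 21 counts as inside the window.

```python
from datetime import date

# Hypothetical per-site preference for which Procedure date field to trust first.
SITE_DATE_FIELDS = {
    "site-a": ["performedDateTime", "performedPeriod.start"],
    "site-b": ["performedPeriod.start", "performedDateTime"],
}


def get_path(resource, dotted_path):
    """Follow a dotted path through nested dicts; return None if absent."""
    value = resource
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def event_date(resource, site_id):
    """Return the first usable full date for this site, or None."""
    for path in SITE_DATE_FIELDS.get(site_id, []):
        raw = get_path(resource, path)
        # FHIR allows partial dates such as "2024-03"; skip those here.
        if isinstance(raw, str) and len(raw) >= 10:
            return date.fromisoformat(raw[:10])
    return None


def within_window(resource, site_id, screening_date, window_days):
    """True if the event falls 0..window_days before screening; None if undated."""
    when = event_date(resource, site_id)
    if when is None:
        return None  # unknown date: send to coordinator review
    return 0 <= (screening_date - when).days <= window_days


chemo = {"performedDateTime": "2024-03-01T10:00:00Z"}
violates = within_window(chemo, "site-a", date(2024, 3, 15), 21)
```

Keeping the field preference in configuration rather than in the query logic means a coordinator's correction ("our performed dates are billing dates") becomes a one-line change for that site.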
Versioning for Protocol Amendments
Protocols change. Amendments mid-enrollment are common, and each amendment may modify inclusion or exclusion criteria. A well-managed eligibility rule set is versioned so that amendments can be incorporated as rule updates applied from a specific screening date forward, without invalidating screening decisions made under the previous version.
The practical implication: eligibility mapping infrastructure built for a single static protocol will require significant rework when the first amendment arrives. Mapping frameworks designed with versioning built in can absorb amendments in days rather than weeks.
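Date-effective versioning can be sketched as a lookup from screening date to rule set version. The class names, the rule strings, and the amendment shown are all hypothetical; real rules would be executable query definitions rather than text.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RuleSetVersion:
    label: str
    effective_from: date
    rules: tuple  # illustrative rule descriptions, not executable queries


class VersionedRuleSet:
    def __init__(self, versions):
        self._versions = sorted(versions, key=lambda v: v.effective_from)
        self._starts = [v.effective_from for v in self._versions]

    def version_for(self, screening_date):
        """Return the version in force on the given screening date."""
        idx = bisect_right(self._starts, screening_date) - 1
        if idx < 0:
            raise ValueError("screening date precedes the first protocol version")
        return self._versions[idx]


rule_set = VersionedRuleSet([
    RuleSetVersion("original", date(2024, 1, 1),
                   ("no chemotherapy within 21 days of enrollment",)),
    # Hypothetical amendment, for illustration only.
    RuleSetVersion("amendment-1", date(2024, 6, 1),
                   ("no chemotherapy within 28 days of enrollment",)),
])

before = rule_set.version_for(date(2024, 5, 31)).label
after = rule_set.version_for(date(2024, 6, 1)).label
```

Because each screening decision is evaluated against the version in force on its own screening date, decisions made before an amendment stay reproducible after the amendment is loaded.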