Regulatory Considerations

Patient Privacy in EHR-Based Matching: What Your IRB Will Ask and How to Answer

Cohortbridge Editorial · September 20, 2025 · 10 min read


IRBs at academic medical centers and health systems are increasingly fielding requests to review EHR-based patient matching activities for clinical trial feasibility. Many of these reviews are straightforward — the matching doesn't constitute human subjects research, and the IRB issues a determination letter rather than full protocol approval. But some IRBs are asking harder questions, and the answers those questions demand depend on architecture choices that a CRO made weeks or months before the IRB submission was drafted. Answering those questions after the fact is significantly harder than designing for them in advance.

This piece addresses the three most substantive questions IRBs and health system research compliance offices ask about EHR-based patient matching, and explains why the technical architecture of the matching platform determines the answers more than any consent form or data use policy does.

Question 1: Does This Constitute Human Subjects Research?

Under the Common Rule (45 CFR Part 46), research is a systematic investigation designed to develop or contribute to generalizable knowledge (45 CFR 46.102(l)), and a human subject includes a living individual about whom an investigator obtains identifiable private information (45 CFR 46.102(e)). The threshold question is whether EHR-based patient matching for clinical trial recruitment meets both definitions.

The regulatory analysis generally turns on two factors. First, is the matching activity designed to generate generalizable knowledge, or is it a preparatory activity to support a specific research protocol? Feasibility matching to identify eligible patients for a specific trial is typically characterized as "preparatory to research" rather than as research itself. That phrase comes from the HIPAA Privacy Rule at 45 CFR 164.512(i)(1)(ii), not from the Common Rule; the closest Common Rule counterpart is 45 CFR 46.116(g), which lets an IRB approve access to records for screening, recruiting, or determining eligibility without informed consent. Second, does the matching platform access individually identifiable protected health information, or does it operate on de-identified data?

An architecture in which the matching engine runs locally within the health system's network — querying the EHR database directly and returning only de-identified cohort identifiers to the CRO — has a meaningfully different regulatory footprint than an architecture that transfers identified patient records to a cloud platform outside the health system's control. The former can often be characterized as a quality improvement or operational activity rather than research; the latter is harder to distinguish from research involving identified PHI, which triggers the full Common Rule analysis.

We're not saying that cloud-based architectures can't be used for patient matching — they can, under appropriate HIPAA protections and with appropriate regulatory determinations. We're saying that IRBs and research compliance offices notice the difference, and that the architecture choice directly affects how clean the regulatory pathway is.

Question 2: Is a Waiver of Authorization Required?

Even where the feasibility matching activity is characterized as preparatory to research rather than research itself, accessing patient EHR records without individual patient authorization still needs a basis under HIPAA, most commonly a waiver. Under HIPAA's Privacy Rule at 45 CFR 164.512(i), uses or disclosures of PHI for research may be permitted without individual authorization under a waiver granted by an IRB or Privacy Board, provided three criteria are met: the use or disclosure involves no more than minimal risk to privacy; the research could not practicably be conducted without the waiver; and the research could not practicably be conducted without access to the PHI. (Providing subjects with additional pertinent information after participation, which is sometimes listed alongside these, is a condition for waiving informed consent under the Common Rule, not a HIPAA waiver criterion.)

For EHR-based feasibility matching, the two practicability criteria are usually supportable. The argument that you cannot identify trial-eligible patients without accessing their health records is straightforward. The "minimal risk to privacy" criterion is where architecture matters: a matching system that keeps identified records inside the EHR environment, retains none outside it, and produces cohort output with no individually identifying information presents a significantly lower privacy risk profile than a system that copies identified records elsewhere or returns them in its output.

Health system IRBs that see the technical architecture documented clearly — data flow, data residency, retention policy, output format — are substantially more efficient in their review process than IRBs that receive a conceptual description of the matching activity without specifics. That documentation isn't just a formality; it's the substantive basis for the IRB's risk assessment.

Question 3: What Happens to the EHR Query Results?

This is the question that creates the most variance in IRB responses. A CRO's IRB submission needs to clearly describe what the matching platform produces as output, where that output goes, who can see it, how long it's retained, and whether it contains any PHI.

The key distinction is between the working data used by the matching engine during the query and the output delivered to the CRO. During the matching process, the engine necessarily accesses records that contain PHI — that's what it's querying. But the output — the cohort list delivered to the CRO's feasibility team — should contain no individually identifying information. If the output contains a de-identified reference ID, an eligibility confidence score, and matched criteria indicators, but no name, date of birth, MRN, or other direct identifier, then the CRO's feasibility team never receives PHI. That's the architectural goal, and it's a meaningful regulatory distinction.
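The split between working data and delivered output can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular platform: the field names, the `CohortRow` and `LocalCrosswalk` classes, and the list of direct identifiers are all hypothetical. The reference ID is random rather than derived from the MRN, and the crosswalk that maps it back is assumed to stay inside the health system.

```python
import secrets
from dataclasses import dataclass, fields

# Hypothetical field names for direct identifiers; a real list would follow
# the de-identification standard cited in the IRB submission.
DIRECT_IDENTIFIER_FIELDS = frozenset(
    {"name", "date_of_birth", "mrn", "ssn", "address", "phone", "email"}
)


@dataclass(frozen=True)
class CohortRow:
    """One row of the output delivered to the CRO's feasibility team."""
    reference_id: str
    confidence: float
    matched_criteria: tuple


class LocalCrosswalk:
    """Maps random reference IDs back to MRNs. Assumed to stay inside the
    health system's network and never be included in the export."""

    def __init__(self):
        self._mrn_by_ref = {}

    def issue(self, mrn):
        # Random, not derived from the MRN, so the ID alone reveals nothing.
        ref = secrets.token_hex(8)
        self._mrn_by_ref[ref] = mrn
        return ref

    def resolve(self, ref):
        return self._mrn_by_ref[ref]


def to_cohort_row(record, crosswalk):
    """Build the export row from an identified working record (a dict).
    Only the three output fields are copied; everything else is dropped."""
    return CohortRow(
        reference_id=crosswalk.issue(record["mrn"]),
        confidence=round(record["confidence"], 2),
        matched_criteria=tuple(sorted(record["matched_criteria"])),
    )


def exported_field_names():
    """Field names present in the export schema, for an automated check."""
    return {f.name for f in fields(CohortRow)}
```

A pre-release check such as `exported_field_names() & DIRECT_IDENTIFIER_FIELDS` returning an empty set is the kind of evidence that can be attached to the submission alongside the data flow description.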

Some IRBs are also asking about downstream use: once the CRO receives the de-identified cohort list, what happens? Does the list get shared with the sponsor? Does it get used to inform site selection decisions for protocols other than the one being evaluated? These are legitimate questions, and the data use agreement governing the matching activity should address them explicitly. A protocol-specific use restriction — the cohort data from Protocol A may not be used for purposes related to Protocol B — is both reasonable and IRB-reviewable, and documenting it in advance prevents questions from arising later.
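A protocol-specific use restriction is easier for an IRB to evaluate when it can also be enforced in software. The sketch below shows one way that might look; `CohortRelease`, `check_use`, and the recipient labels are hypothetical names chosen for illustration, and the actual terms would come from the data use agreement.

```python
from dataclasses import dataclass
from datetime import date


class UseRestrictionError(PermissionError):
    """Raised when a requested use falls outside the data use agreement."""


@dataclass(frozen=True)
class CohortRelease:
    """Terms attached to one de-identified cohort list (hypothetical shape)."""
    protocol_id: str
    permitted_recipients: frozenset
    retain_until: date


def check_use(release, *, protocol_id, recipient, on_date):
    """Raise UseRestrictionError unless the requested use matches the terms."""
    if protocol_id != release.protocol_id:
        raise UseRestrictionError(
            f"cohort released for {release.protocol_id}, not {protocol_id}"
        )
    if recipient not in release.permitted_recipients:
        raise UseRestrictionError(f"{recipient} is not a permitted recipient")
    if on_date > release.retain_until:
        raise UseRestrictionError("retention period has expired")
```

With a release scoped to Protocol A, a request that names Protocol B fails at the first check, which mirrors the restriction described above.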

GDPR Considerations for International Programs

For CROs running feasibility programs that include EU health systems, GDPR Article 9 governs the processing of special category data, which includes health data. Where processing health data for research preparation relies on Article 9(2)(j) (processing for scientific or historical research purposes), it requires a basis in Union or Member State law and appropriate safeguards under Article 89(1). This is a different framework from HIPAA's waiver mechanism, and EU health systems may require separate legal-basis documentation before granting API access.

The architectural preference for on-premises or locally executed matching has additional force under GDPR: health data that never leaves the EU health system's network may face fewer cross-border transfer restrictions than data transferred to a US-based cloud platform, where the post-Schrems II landscape adds complexity. For international programs, this architectural consideration should be part of the initial program design conversation, not something addressed when the EU site coordinator asks where patient data is going.

Building the IRB Package That Actually Answers the Questions

A well-constructed IRB submission for EHR-based feasibility matching includes: a precise technical description of the matching platform's data access pattern; a data flow diagram showing where PHI exists during the matching process and what data leaves the health system; the de-identification standard applied to the output; a data use agreement covering purpose restriction and retention limits; and documentation that the activity is preparatory to research (or the applicable regulatory characterization) rather than human subjects research as defined under 45 CFR 46.102(e) and (l).
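The five components listed above lend themselves to a simple completeness check before the package goes out. The sketch below is one hypothetical way to encode it; the keys and the `missing_components` helper are illustrative names, not part of any IRB system or submission template.

```python
# Hypothetical keys for the five components listed above, each paired with
# a short description of what the IRB expects to find.
REQUIRED_COMPONENTS = {
    "data_access_pattern": "Technical description of how the platform queries the EHR",
    "data_flow_diagram": "Where PHI exists during matching and what leaves the health system",
    "deidentification_standard": "Standard applied to the delivered output",
    "data_use_agreement": "Purpose restriction and retention limits",
    "regulatory_characterization": "Why the activity is not human subjects research",
}


def missing_components(package):
    """Return the required keys that are absent or blank in the package.

    `package` is assumed to be a dict mapping component keys to a document
    reference or text; anything that is not a non-empty string counts as missing.
    """
    missing = []
    for key in REQUIRED_COMPONENTS:
        value = package.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing
```

An empty result does not mean the IRB will agree with the characterization; it only means the reviewer has something to evaluate for each item.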

What makes IRB review efficient is not finding the right argument; it's giving the IRB enough information to make its own determination. An IRB that has to request supplemental information through two rounds of queries can take as much as three times as long as one that receives a complete submission. The preparation cost is front-loaded, but it is substantially less than the enrollment timeline cost of a delayed feasibility determination.

Cohortbridge's data handling architecture is documented in detail on the patient privacy page, including the de-identification approach, data residency, and protocol-level data isolation. For CROs working through IRB submissions that include EHR-based matching components, that documentation is available as a reference for IRB packages. The how it works overview provides the technical data flow description that IRB submissions typically require.
