🤖 Multi-Agent Orchestration for Community Paramedicine — A 24-Hour Build

November 15, 2025

By Ted Steinmann

What happens when you give a product-minded engineer 24 hours, a healthcare use case, and access to an agent framework?

Our team runs periodic hooley days — dedicated 24-hour windows where you pick a problem, build something real, and present what you shipped. No slides. No strategy decks. Just working software.

For this one, I chose a problem I'd been thinking about for a while: community paramedicine data is fragmented across formats, systems, and standards — and the paramedics who need it most have the least time to piece it together.

The goal was simple: build a conversational AI assistant that could pull patient context from multiple healthcare data sources and deliver unified, actionable guidance — in a single day.


🔍 The Problem: Fragmented Data in the Field

Community paramedics conduct home visits for patients with complex conditions. Before each visit, they need to understand the patient's recent history — hospital discharges, EMS encounters, primary care notes, imaging results.

That information exists, but it's scattered across:

  • CDA documents (hospital discharge summaries in XML)
  • NEMSIS records (EMS run sheets with vitals and interventions)
  • PDFs (primary care notes, scanned documents)
  • Imaging reports (radiology findings, DICOM metadata)

In practice, paramedics often arrive at a home visit with incomplete context — or spend significant time manually assembling it. The question was: could an agent-based system do this assembly automatically?


⚙️ The Architecture: Fan-Out/Fan-In with Specialized Agents

I built the system using the Microsoft Agent Framework with Azure OpenAI (GPT-4o) as the reasoning layer. The architecture follows a fan-out/fan-in workflow pattern with 7 specialized agents:

Orchestrator → [CDA | NEMSIS | PDF | Imaging] → Aggregator → Summarization → Decision Support
               (parallel data extraction)        (fan-in)     (synthesis)      (recommendations)

Each agent has a focused responsibility:

  • Orchestrator: Entry point — discovers patients, routes queries, coordinates data gathering
  • CDA Agent: Extracts diagnoses, procedures, and medications from XML clinical documents
  • NEMSIS Agent: Parses vitals, interventions, and transport details from EMS records
  • PDF Agent: Extracts clinical narratives and care plans from unstructured documents
  • Imaging Agent: Identifies findings and recommendations from radiology reports
  • Summarization Agent: Consolidates multi-source data with deduplication rules
  • Decision Support Agent: Produces conversational output with structured RED FLAGS, MEDICATION CONCERNS, and RECOMMENDED ACTIONS
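The fan-out/fan-in shape is easy to see in plain Python. This is a framework-agnostic sketch using asyncio (the actual build used the Microsoft Agent Framework's workflow primitives); the stub agents and the `encounter` dict are illustrative stand-ins for GPT-4o-backed extraction agents.

```python
import asyncio

# Stub data agents; in the real build each was an LLM-backed agent that
# extracts only from the format it owns (CDA, NEMSIS, PDF, imaging).
async def run_agent(name: str, encounter: dict) -> dict:
    return {"source": name, "findings": encounter.get(name, [])}

async def fan_out_fan_in(encounter: dict) -> dict:
    sources = ["cda", "nemsis", "pdf", "imaging"]
    # Fan-out: run the four extraction agents in parallel.
    results = await asyncio.gather(*(run_agent(s, encounter) for s in sources))
    # Fan-in: consolidate the parallel results before summarization.
    return {r["source"]: r["findings"] for r in results}

encounter = {
    "cda": ["CHF", "HTN", "T2DM"],
    "nemsis": ["BP 150/90", "SpO2 92%"],
    "pdf": [],
    "imaging": ["pulmonary edema"],
}
merged = asyncio.run(fan_out_fan_in(encounter))
```

The consolidated dict is what flows into the Summarization and Decision Support stages downstream.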

The Orchestrator exposes tools for patient discovery (list_patients, list_encounters, get_encounter_files) and file routing (classify_file), giving the system the ability to navigate the data landscape before extracting from it.
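A minimal sketch of what `classify_file` routing can look like. The routing rules below are assumptions for illustration, not the build's actual logic — though the XML-sniffing detail is real: both CDA and NEMSIS are XML, so extension alone can't separate them (a CDA document's root element is `ClinicalDocument`; a NEMSIS v3 record's is `EMSDataSet`).

```python
def classify_file(filename: str, head: str = "") -> str:
    """Route a patient file to the data agent that owns its format.

    `head` is the first chunk of the file's text, used to tell the two
    XML standards apart by their root element.
    """
    name = filename.lower()
    if name.endswith(".pdf"):
        return "pdf_agent"
    if name.endswith(".dcm"):
        return "imaging_agent"
    if name.endswith(".xml"):
        # CDA and NEMSIS are both XML; sniff the root element.
        if "ClinicalDocument" in head:
            return "cda_agent"
        if "EMSDataSet" in head:
            return "nemsis_agent"
    return "pdf_agent"  # fall back to unstructured extraction
```

Exposing this as a tool lets the Orchestrator decide routing per file rather than hard-wiring the pipeline.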


🧠 Context Engineering: Where the Real Work Happened

The architecture came together quickly. The prompt engineering — or more accurately, context engineering — was where most of the 24 hours went.

A few things I learned iteratively:

Redundant red flag detection was a problem. Early versions had every agent surfacing clinical warnings independently. The Summarization Agent would then produce a wall of duplicated alerts. The fix: move all red flag logic exclusively to the Decision Support agent, and add explicit stop/deduplication instructions upstream. This is a form of output guardrails — shaping agent behavior so downstream consumers get clean signal.
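Concretely, the guardrail lived in prompt structure: every upstream agent got an explicit stop instruction, and only Decision Support was allowed to alert. The wording below is illustrative, not the build's actual prompts.

```python
# Appended to every extraction agent's system prompt (illustrative wording).
EXTRACTION_SUFFIX = (
    "Extract facts only. Do NOT flag risks, warnings, or red flags; "
    "a downstream agent owns all clinical alerting."
)

# Only the Decision Support agent may raise alerts, exactly once each.
DECISION_SUPPORT_PROMPT = (
    "You are the only agent that may raise alerts. From the consolidated "
    "summary, produce RED FLAGS, MEDICATION CONCERNS, and RECOMMENDED "
    "ACTIONS. Deduplicate: state each concern once."
)

def build_prompt(base: str, is_extraction: bool) -> str:
    """Attach the stop instruction to extraction agents only."""
    return f"{base}\n\n{EXTRACTION_SUFFIX}" if is_extraction else base
```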

Fan-in synchronization required custom code. The Agent Framework's built-in AgentExecutor couldn't natively accept list[AgentExecutorResponse] from multiple parallel agents. I built a custom DataAgentAggregator (an Executor subclass) to handle the fan-in — collecting parallel extraction results and consolidating them before passing to Summarization.

Token compression mattered. The four data agents collectively produced ~4K tokens of extracted content. The Summarization Agent compressed this to ~1K tokens of consolidated clinical data — enough for Decision Support to reason over without hitting context limits or producing unfocused output.
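A quick way to keep that budget honest while iterating is a rough token estimate. The ~4-characters-per-token heuristic below is an approximation only (a real count would use the model's tokenizer, e.g. tiktoken); the budget value mirrors the ~1K-token target above.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def within_budget(consolidated: str, budget: int = 1_000) -> bool:
    """Check the Summarization agent's output fits its target slice of context."""
    return approx_tokens(consolidated) <= budget
```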


🎯 What It Produced

Here's what the system generates for a sample patient encounter — a 59-year-old male with CHF, hypertension, and type 2 diabetes who recently had a CHF exacerbation:

⚠️ RED FLAGS TO WATCH: Worsening respiratory distress, particularly if associated with hypoxia or tachypnea. Increased edema or lack of response to adjusted diuretic dose. Symptoms suggestive of low cardiac output or progression of heart failure.

💊 MEDICATION CONCERNS: Ensure adherence to increased furosemide dose. Check for overdiuresis or electrolyte imbalances. Confirm no duplicative or conflicting CHF management.

📋 RECOMMENDED ACTIONS: Evaluate current symptoms, especially respiratory status and degree of edema. Review adherence to prescribed furosemide and assess for symptomatic improvement. If no clinical improvement or signs of decompensation, consider expediting cardiology consult or transport.

The output is conversational, structured, and designed for a paramedic preparing for a home visit — not for charting or clinical documentation. Human-in-the-loop review is expected before any clinical action.


💡 What 24 Hours Taught Me

Building this in a single hooley day reinforced a few things:

  • Agent frameworks lower the scaffolding cost dramatically. The orchestration patterns that would have taken weeks to build from scratch were available as composable primitives. The hard work shifts from plumbing to prompt design.

  • Context engineering > prompt engineering. Getting the right information to the right agent at the right time matters more than clever prompt wording. The biggest quality improvements came from restructuring what each agent received, not how it was asked.

  • Guardrails are a design choice, not an afterthought. Deciding where clinical safety logic lives (Decision Support only, not scattered across agents) was an architectural decision that improved both output quality and maintainability.

  • Fan-out/fan-in is a natural fit for healthcare data. Patient records are inherently multi-source and multi-format. Parallel extraction with centralized synthesis mirrors how a clinician actually assembles context — just faster.


✨ Closing Thought

This wasn't a production system. It was a proof of concept built in 24 hours to test whether agentic orchestration could meaningfully reduce the cognitive load on community paramedics.

The answer was yes — with caveats about guardrails, validation, and the irreplaceable role of clinical judgment. But as a demonstration of what's possible when you combine structured healthcare data with multi-agent AI, it exceeded what I expected to ship in a single day.

The technologies — Microsoft Agent Framework, Azure OpenAI, DevUI for workflow visualization, and Python 3.13 with Docker — all contributed. But the real takeaway was simpler: well-scoped problems with clear data boundaries are ideal candidates for agentic workflows, and a focused day is enough to prove it.



Categories: blog

Tags: technology, systems, data