Your agent’s reasoning is fine. Its memory isn’t. GraphRAG turns organizational knowledge into a connected graph that agents can traverse for incident response.
Source: Decoding AI Magazine – Anca Ioana Muscalagiu | Date: January 20, 2026
The Problem
Production incidents aren’t slowed by the lack of a fix. They’re slowed by the lack of clarity.
Context is scattered across Slack threads, Confluence pages, half-written runbooks, and the memories of engineers who may have left the company. When a pager fires at 02:13, engineers spend more time reconstructing context than actually resolving the issue.
The knowledge that holds systems together is relational: services depend on services, teams own systems, incidents recur in patterns. A flat vector store retrieves similar text but cannot express ownership chains, dependency graphs, or blast radius.
What Is GraphRAG?
GraphRAG is RAG where retrieval is guided by graph structure, not just similarity scores.
| Approach | Retrieval Method | Good For |
|---|---|---|
| Traditional RAG | Semantically similar text chunks from vector DB | Point lookups, specific facts |
| GraphRAG | Connected knowledge via graph traversal | Coverage questions, dependency chains, synthesis across systems |
Traditional RAG answers: “What’s the most relevant chunk?”
GraphRAG answers: “What do we know about this issue across teams, services, and history?”
Two-Phase Architecture
Phase 1: Graph Generation (Offline)
- Source Documents to Text Chunks. Break runbooks, postmortems, architecture docs into indexable pieces.
- Text Chunks to Element Instances. Extract entities (services, teams, incidents) and relationships (DEPENDS_ON, OWNED_BY) from each chunk.
- Element Instances to Element Summaries. LLM generates concise summaries for each entity and relationship. These summaries become the node/edge properties and the basis for embeddings.
- Element Summaries to Graph Communities. Cluster the graph using hierarchical Leiden. Communities naturally align with operational boundaries (a platform area, a group of interdependent services).
- Graph Communities to Community Summaries. LLM summarizes each community. These become the primary retrieval units.
Phase 2: Query Answering (Runtime)
- Semantic entry point lookup. Use embeddings to find the most relevant nodes for the alert.
- Graph expansion. Traverse edges (DEPENDS_ON, OWNED_BY, HAS_RUNBOOK) to capture blast radius and context.
- Community-level synthesis. Identify relevant communities, generate intermediate answers from each, merge into a global response.
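The graph expansion step is a bounded breadth-first traversal. A toy sketch, with the embedding-based entry lookup assumed to have already resolved the alert to a node, and a hard-coded adjacency map standing in for the DEPENDS_ON edges in Neo4j:

```python
from collections import deque

# Toy dependency edges standing in for DEPENDS_ON relationships in the graph.
DEPENDS_ON = {
    "payments-api": ["auth-service", "ledger-service"],
    "auth-service": ["user-db"],
    "ledger-service": ["ledger-db"],
    "user-db": [],
    "ledger-db": [],
}

def blast_radius(entry: str, max_hops: int = 2) -> set:
    """Bounded graph expansion: collect every service reachable from the
    entry node within max_hops, mirroring DEPENDS_ON*1..2 in Cypher."""
    seen, frontier = set(), deque([(entry, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.add(dep)
                frontier.append((dep, depth + 1))
    return seen

# blast_radius("payments-api")
# -> {"auth-service", "ledger-service", "user-db", "ledger-db"}
```

The hop bound matters: unbounded traversal of a dense dependency graph can pull in most of the system, which defeats the purpose of scoping the blast radius.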
Knowledge Graph Schema
Nodes:
| Type | Properties |
|---|---|
| Service | name, domain, tier, repo, tags, embedding |
| Team | name, oncall channel, owners, embedding |
| Incident | id, timestamp, severity, summary, embedding |
| Runbook | url, title, steps summary, embedding |
| Doc | source, url, title, embedding |
| Release/PR | id, timestamp, author, summary, embedding |
Relationships:
| Edge | Direction |
|---|---|
| DEPENDS_ON | Service -> Service |
| OWNED_BY | Service -> Team |
| AFFECTED | Incident -> Service |
| RESPONDED_BY | Incident -> Team |
| HAS_RUNBOOK | Service -> Runbook |
| DOCUMENTED_IN | Service/Incident -> Doc |
| RELATED_TO | Incident <-> Incident |
| INTRODUCED_BY | Incident/Service -> Release/PR |
Each node carries a vector embedding derived from its LLM-generated summary, not from raw documents.
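One practical use of a fixed schema is validating LLM-extracted triples before they are written to the graph. A sketch, encoding the relationship table above as allowed (source label, relation, target label) combinations; the structure is illustrative, not the article's code:

```python
# Allowed relationship types and the node labels they may connect,
# mirroring the relationship table above.
SCHEMA = {
    "DEPENDS_ON":    ({"Service"}, {"Service"}),
    "OWNED_BY":      ({"Service"}, {"Team"}),
    "AFFECTED":      ({"Incident"}, {"Service"}),
    "RESPONDED_BY":  ({"Incident"}, {"Team"}),
    "HAS_RUNBOOK":   ({"Service"}, {"Runbook"}),
    "DOCUMENTED_IN": ({"Service", "Incident"}, {"Doc"}),
    "RELATED_TO":    ({"Incident"}, {"Incident"}),
    "INTRODUCED_BY": ({"Incident", "Service"}, {"Release/PR"}),
}

def edge_is_valid(src_label: str, relation: str, dst_label: str) -> bool:
    """Check an extracted triple against the schema before writing it to
    Neo4j; rejecting malformed edges keeps LLM extraction noise out of
    the graph."""
    if relation not in SCHEMA:
        return False
    allowed_src, allowed_dst = SCHEMA[relation]
    return src_label in allowed_src and dst_label in allowed_dst
```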
Concrete Example: Payments API 5xx Spike
Input: A Confluence runbook about “Payments API 5xx spike after deploy.”
Extracted entities:
- Service: payments-api, auth-service, ledger-service
- Team: Payments Platform
- Runbook: “Payments API 5xx spike after deploy”
Extracted relationships:
- payments-api DEPENDS_ON auth-service
- payments-api DEPENDS_ON ledger-service
- payments-api OWNED_BY Payments Platform
- payments-api HAS_RUNBOOK “Payments API 5xx spike after deploy”
At query time (alert: payments-api 5xx spike):
```cypher
MATCH (s:Service {name: "payments-api"})
OPTIONAL MATCH (s)-[:DEPENDS_ON]->(dep:Service)
OPTIONAL MATCH (s)-[:OWNED_BY]->(t:Team)
OPTIONAL MATCH (s)-[:HAS_RUNBOOK]->(r:Runbook)
RETURN s, collect(dep) AS dependencies, t AS owner, collect(r) AS runbooks
```
Bounded expansion for blast radius:
```cypher
MATCH (s:Service {name: "payments-api"})-[:DEPENDS_ON*1..2]->(dep:Service)
RETURN s, collect(DISTINCT dep) AS deps_2_hops
```
This reconstructs a slice of the system: which services are involved, how far the blast radius extends, who owns what, and which operational knowledge applies.
System Architecture
Five components with clear boundaries:
- Alerting System. Prometheus detects threshold breaches, routes via Alertmanager to FastAPI webhook. Single entry point for all incidents.
- Agent Component. FastAPI server + Agent Controller. Orchestrates GraphRAG queries, MCP tool calls, and LLM inference. Custom explicit agent loop (no framework), making behavior predictable and debuggable.
- GraphRAG Component. Neo4j graph database with vector embeddings on nodes. Graph Query Engine performs semantic search + traversal.
- MCP Servers. Global MCP router forwards to specialized servers: Confluence (docs), GitHub (code changes), Slack (history + notifications), Prometheus (live metrics).
- Observability. Opik traces prompts, tool calls, and retrieval latency end to end.
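At the entry point, the webhook handler only needs to pull a few fields out of the Alertmanager payload before handing off to the agent. A stdlib-only sketch: the payload shape follows Alertmanager's webhook format, but the choice of labels (alertname, service, severity) is an assumption about how alerts are labeled in this setup.

```python
import json

def parse_alerts(payload: str) -> list:
    """Extract the fields the agent needs from an Alertmanager webhook body.
    Only firing alerts are kept; resolved ones don't start an investigation."""
    body = json.loads(payload)
    firing = [a for a in body.get("alerts", []) if a.get("status") == "firing"]
    return [
        {
            "alertname": a["labels"].get("alertname", "unknown"),
            "service": a["labels"].get("service"),
            "severity": a["labels"].get("severity", "none"),
            "summary": a.get("annotations", {}).get("summary", ""),
        }
        for a in firing
    ]
```

In the FastAPI server this would run inside the webhook route; keeping it as a pure function makes the single entry point trivially testable.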
Data Flow
- Prometheus alert fires webhook to FastAPI
- Agent Controller queries GraphRAG for related services/teams
- Graph Query Engine: semantic search + edge traversal
- Agent sends tool plan to LLM (Gemini)
- MCP servers return live data (metrics, recent deploys, Slack discussions, docs)
- LLM synthesizes structured incident report
- Slack MCP posts to affected team channels
Steps 4-6 can loop as the LLM requests additional tool calls.
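The explicit agent loop behind steps 2-7 can be sketched as a plain function. This is an illustrative skeleton, not the article's controller: the LLM and the MCP tools are injected as stubs (in the real stack they would be Gemini and the MCP servers), and the decision format is an assumed convention.

```python
def run_agent(alert: dict, graph_lookup, call_llm, tools: dict,
              max_steps: int = 5):
    """Explicit agent loop, no framework: gather graph context once, then let
    the LLM request tool calls until it returns a final report or the step
    budget runs out. Every step is visible and debuggable."""
    context = {"graph": graph_lookup(alert["service"]), "tool_results": []}
    for _ in range(max_steps):
        decision = call_llm(alert, context)  # e.g. {"action": "get_metrics", "args": {...}}
        if decision["action"] == "final_report":
            return decision["report"]
        result = tools[decision["action"]](**decision.get("args", {}))
        context["tool_results"].append((decision["action"], result))
    return f"Step budget exhausted; partial context: {context['tool_results']!r}"
```

The step cap and the explicit context dict are the point: nothing loops or accumulates state except what this function makes visible, which is what the article means by predictable and debuggable behavior.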
Graph vs MCP: Data Priority
The graph holds structure and history. MCP holds what’s happening right now.
Priority order for conflicting information:
- MCP servers provide current state (deployments, metrics, discussions)
- Graph provides historical patterns and documented structure
- If they conflict, use MCP data and flag the discrepancy
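That priority rule is simple enough to write down. A minimal sketch, assuming both sources are reduced to flat key/value views of a service: live MCP values win, and every disagreement is flagged so the final report can surface it.

```python
def merge_context(graph_view: dict, mcp_view: dict):
    """Merge historical graph data with live MCP data. MCP wins on conflicts,
    and each disagreement is recorded rather than silently overwritten."""
    merged, discrepancies = dict(graph_view), []
    for key, live in mcp_view.items():
        if key in graph_view and graph_view[key] != live:
            discrepancies.append(
                f"{key}: graph says {graph_view[key]!r}, live data says {live!r}"
            )
        merged[key] = live
    return merged, discrepancies
```

Flagging rather than dropping the stale value matters: a discrepancy between documented structure and live state (say, a replica count) is itself a useful signal during an incident.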
Maintenance cadence: Build the graph once from existing docs, then run daily syncs. Production topology changes slowly enough that nightly updates suffice.
Tech Stack Choices
| Component | Choice | Rationale |
|---|---|---|
| App server | FastAPI | Async by default, I/O-heavy workloads |
| Agent orchestration | Custom controller | Explicit loop, no hidden abstractions from frameworks |
| Graph DB | Neo4j | Native graph traversal + vector indexing on nodes |
| Retrieval | LlamaIndex PropertyGraph | Built-in agentic GraphRAG support |
| LLM | Gemini via gateway | Provider-agnostic abstraction layer |
| Observability | Opik | End-to-end trace logging for agent behavior |
Key design decision: custom agent controller over LangChain/LangGraph. Frameworks hide execution order and error handling behind abstractions that become liabilities in production. For incident response, behavior must be predictable and debuggable.
When to Use GraphRAG vs Vector RAG
Use GraphRAG when:
- Questions are about coverage and synthesis, not similarity (“What do we know across teams?”)
- The domain is inherently relational (services, dependencies, ownership)
- You need to trace propagation paths (blast radius, dependency chains)
- Context is fragmented across many systems
Use traditional RAG when:
- Questions target specific facts in specific documents
- Relationships between entities aren’t central to the answer
- The corpus is flat and doesn’t have meaningful graph structure
Key Insight
The retrieval step becomes an act of structural reasoning over the organization itself. We are no longer pulling “relevant documents.” We are reconstructing a slice of the system.
Related
- Agent Reliability Chasm – Why 95% of agent PoCs fail in production
- MCP Server for Project Context – MCP as the real-time context layer
- Information Theory for Coding Agents – Signal efficiency in retrieval
- Progressive Disclosure of Context – Load only relevant context
- Institutional Memory via Learning Files – Persistent knowledge across sessions

