Your agent’s reasoning is fine. Its memory isn’t. GraphRAG turns organizational knowledge into a connected graph that agents can traverse for incident response.
Source: Decoding AI Magazine – Anca Ioana Muscalagiu | Date: January 20, 2026
The Problem
Production incidents aren’t slowed by the lack of a fix. They’re slowed by the lack of clarity.
Context is scattered across Slack threads, Confluence pages, half-written runbooks, and the memories of engineers who may have left the company. When a pager fires at 02:13, engineers spend more time reconstructing context than actually resolving the issue.
The knowledge that holds systems together is relational: services depend on services, teams own systems, incidents recur in patterns. A flat vector store retrieves similar text but cannot express ownership chains, dependency graphs, or blast radius.
What Is GraphRAG?
GraphRAG is RAG where retrieval is guided by graph structure, not just similarity scores.
| Approach | Retrieval Method | Good For |
|---|---|---|
| Traditional RAG | Semantically similar text chunks from vector DB | Point lookups, specific facts |
| GraphRAG | Connected knowledge via graph traversal | Coverage questions, dependency chains, synthesis across systems |
Traditional RAG answers: “What’s the most relevant chunk?”
GraphRAG answers: “What do we know about this issue across teams, services, and history?”
Two-Phase Architecture
Phase 1: Graph Generation (Offline)
- Source Documents to Text Chunks. Break runbooks, postmortems, architecture docs into indexable pieces.
- Text Chunks to Element Instances. Extract entities (services, teams, incidents) and relationships (DEPENDS_ON, OWNED_BY) from each chunk.
- Element Instances to Element Summaries. LLM generates concise summaries for each entity and relationship. These summaries become the node/edge properties and the basis for embeddings.
- Element Summaries to Graph Communities. Cluster the graph using hierarchical Leiden. Communities naturally align with operational boundaries (a platform area, a group of interdependent services).
- Graph Communities to Community Summaries. LLM summarizes each community. These become the primary retrieval units.
Phase 2: Query Answering (Runtime)
- Semantic entry point lookup. Use embeddings to find the most relevant nodes for the alert.
- Graph expansion. Traverse edges (DEPENDS_ON, OWNED_BY, HAS_RUNBOOK) to capture blast radius and context.
- Community-level synthesis. Identify relevant communities, generate intermediate answers from each, merge into a global response.
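The graph expansion step is a bounded breadth-first traversal. A toy sketch, with the embedding-based entry lookup assumed to have already resolved the alert to a node, and a hard-coded adjacency map standing in for the DEPENDS_ON edges in Neo4j:

```python
from collections import deque

# Toy dependency edges standing in for DEPENDS_ON relationships in the graph.
DEPENDS_ON = {
    "payments-api": ["auth-service", "ledger-service"],
    "auth-service": ["user-db"],
    "ledger-service": ["ledger-db"],
    "user-db": [],
    "ledger-db": [],
}

def blast_radius(entry: str, max_hops: int = 2) -> set:
    """Bounded graph expansion: collect every service reachable from the
    entry node within max_hops, mirroring DEPENDS_ON*1..2 in Cypher."""
    seen, frontier = set(), deque([(entry, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.add(dep)
                frontier.append((dep, depth + 1))
    return seen

# blast_radius("payments-api")
# -> {"auth-service", "ledger-service", "user-db", "ledger-db"}
```

The hop bound matters: unbounded traversal of a dense dependency graph can pull in most of the system, which defeats the purpose of scoping the blast radius.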
Knowledge Graph Schema
Nodes:
| Type | Properties |
|---|---|
| Service | name, domain, tier, repo, tags, embedding |
| Team | name, oncall channel, owners, embedding |
| Incident | id, timestamp, severity, summary, embedding |
| Runbook | url, title, steps summary, embedding |
| Doc | source, url, title, embedding |
| Release/PR | id, timestamp, author, summary, embedding |
Relationships:
| Edge | Direction |
|---|---|
| DEPENDS_ON | Service -> Service |
| OWNED_BY | Service -> Team |
| AFFECTED | Incident -> Service |
| RESPONDED_BY | Incident -> Team |
| HAS_RUNBOOK | Service -> Runbook |
| DOCUMENTED_IN | Service/Incident -> Doc |
| RELATED_TO | Incident <-> Incident |
| INTRODUCED_BY | Incident/Service -> Release/PR |
Each node carries a vector embedding derived from its LLM-generated summary, not from raw documents.
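One practical use of a fixed schema is validating LLM-extracted triples before they are written to the graph. A sketch, encoding the relationship table above as allowed (source label, relation, target label) combinations; the structure is illustrative, not the article's code:

```python
# Allowed relationship types and the node labels they may connect,
# mirroring the relationship table above.
SCHEMA = {
    "DEPENDS_ON":    ({"Service"}, {"Service"}),
    "OWNED_BY":      ({"Service"}, {"Team"}),
    "AFFECTED":      ({"Incident"}, {"Service"}),
    "RESPONDED_BY":  ({"Incident"}, {"Team"}),
    "HAS_RUNBOOK":   ({"Service"}, {"Runbook"}),
    "DOCUMENTED_IN": ({"Service", "Incident"}, {"Doc"}),
    "RELATED_TO":    ({"Incident"}, {"Incident"}),
    "INTRODUCED_BY": ({"Incident", "Service"}, {"Release/PR"}),
}

def edge_is_valid(src_label: str, relation: str, dst_label: str) -> bool:
    """Check an extracted triple against the schema before writing it to
    Neo4j; rejecting malformed edges keeps LLM extraction noise out of
    the graph."""
    if relation not in SCHEMA:
        return False
    allowed_src, allowed_dst = SCHEMA[relation]
    return src_label in allowed_src and dst_label in allowed_dst
```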
Concrete Example: Payments API 5xx Spike
Input: A Confluence runbook about “Payments API 5xx spike after deploy.”
Extracted entities:
- Service: payments-api, auth-service, ledger-service
- Team: Payments Platform
- Runbook: “Payments API 5xx spike after deploy”
Extracted relationships:
- payments-api DEPENDS_ON auth-service
- payments-api DEPENDS_ON ledger-service
- payments-api OWNED_BY Payments Platform
- payments-api HAS_RUNBOOK “Payments API 5xx spike after deploy”
At query time (alert: payments-api 5xx spike):
```cypher
MATCH (s:Service {name: "payments-api"})
OPTIONAL MATCH (s)-[:DEPENDS_ON]->(dep:Service)
OPTIONAL MATCH (s)-[:OWNED_BY]->(t:Team)
OPTIONAL MATCH (s)-[:HAS_RUNBOOK]->(r:Runbook)
RETURN s, collect(dep) AS dependencies, t AS owner, collect(r) AS runbooks
```
Bounded expansion for blast radius:
```cypher
MATCH (s:Service {name: "payments-api"})-[:DEPENDS_ON*1..2]->(dep:Service)
RETURN s, collect(DISTINCT dep) AS deps_2_hops
```
This reconstructs a slice of the system: which services are involved, how far the blast radius extends, who owns what, and which operational knowledge applies.
System Architecture
Five components with clear boundaries:
- Alerting System. Prometheus detects threshold breaches, routes via Alertmanager to FastAPI webhook. Single entry point for all incidents.
- Agent Component. FastAPI server + Agent Controller. Orchestrates GraphRAG queries, MCP tool calls, and LLM inference. Custom explicit agent loop (no framework), making behavior predictable and debuggable.
- GraphRAG Component. Neo4j graph database with vector embeddings on nodes. Graph Query Engine performs semantic search + traversal.
- MCP Servers. Global MCP router forwards to specialized servers: Confluence (docs), GitHub (code changes), Slack (history + notifications), Prometheus (live metrics).
- Observability. Opik traces prompts, tool calls, and retrieval latency end to end.
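At the entry point, the webhook handler only needs to pull a few fields out of the Alertmanager payload before handing off to the agent. A stdlib-only sketch: the payload shape follows Alertmanager's webhook format, but the choice of labels (alertname, service, severity) is an assumption about how alerts are labeled in this setup.

```python
import json

def parse_alerts(payload: str) -> list:
    """Extract the fields the agent needs from an Alertmanager webhook body.
    Only firing alerts are kept; resolved ones don't start an investigation."""
    body = json.loads(payload)
    firing = [a for a in body.get("alerts", []) if a.get("status") == "firing"]
    return [
        {
            "alertname": a["labels"].get("alertname", "unknown"),
            "service": a["labels"].get("service"),
            "severity": a["labels"].get("severity", "none"),
            "summary": a.get("annotations", {}).get("summary", ""),
        }
        for a in firing
    ]
```

In the FastAPI server this would run inside the webhook route; keeping it as a pure function makes the single entry point trivially testable.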
Data Flow
- Prometheus alert fires webhook to FastAPI
- Agent Controller queries GraphRAG for related services/teams
- Graph Query Engine: semantic search + edge traversal
- Agent sends tool plan to LLM (Gemini)
- MCP servers return live data (metrics, recent deploys, Slack discussions, docs)
- LLM synthesizes structured incident report
- Slack MCP posts to affected team channels
Steps 4-6 can loop as the LLM requests additional tool calls.
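The explicit agent loop behind steps 2-7 can be sketched as a plain function. This is an illustrative skeleton, not the article's controller: the LLM and the MCP tools are injected as stubs (in the real stack they would be Gemini and the MCP servers), and the decision format is an assumed convention.

```python
def run_agent(alert: dict, graph_lookup, call_llm, tools: dict,
              max_steps: int = 5):
    """Explicit agent loop, no framework: gather graph context once, then let
    the LLM request tool calls until it returns a final report or the step
    budget runs out. Every step is visible and debuggable."""
    context = {"graph": graph_lookup(alert["service"]), "tool_results": []}
    for _ in range(max_steps):
        decision = call_llm(alert, context)  # e.g. {"action": "get_metrics", "args": {...}}
        if decision["action"] == "final_report":
            return decision["report"]
        result = tools[decision["action"]](**decision.get("args", {}))
        context["tool_results"].append((decision["action"], result))
    return f"Step budget exhausted; partial context: {context['tool_results']!r}"
```

The step cap and the explicit context dict are the point: nothing loops or accumulates state except what this function makes visible, which is what the article means by predictable and debuggable behavior.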
Graph vs MCP: Data Priority
The graph holds structure and history. MCP holds what’s happening right now.
Priority order for conflicting information:
- MCP servers provide current state (deployments, metrics, discussions)
- Graph provides historical patterns and documented structure
- If they conflict, use MCP data and flag the discrepancy
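That priority rule is simple enough to write down. A minimal sketch, assuming both sources are reduced to flat key/value views of a service: live MCP values win, and every disagreement is flagged so the final report can surface it.

```python
def merge_context(graph_view: dict, mcp_view: dict):
    """Merge historical graph data with live MCP data. MCP wins on conflicts,
    and each disagreement is recorded rather than silently overwritten."""
    merged, discrepancies = dict(graph_view), []
    for key, live in mcp_view.items():
        if key in graph_view and graph_view[key] != live:
            discrepancies.append(
                f"{key}: graph says {graph_view[key]!r}, live data says {live!r}"
            )
        merged[key] = live
    return merged, discrepancies
```

Flagging rather than dropping the stale value matters: a discrepancy between documented structure and live state (say, a replica count) is itself a useful signal during an incident.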
Maintenance cadence: Build the graph once from existing docs, then run daily syncs. Production topology changes slowly enough that nightly updates suffice.
Tech Stack Choices
| Component | Choice | Rationale |
|---|---|---|
| App server | FastAPI | Async by default, I/O-heavy workloads |
| Agent orchestration | Custom controller | Explicit loop, no hidden abstractions from frameworks |
| Graph DB | Neo4j | Native graph traversal + vector indexing on nodes |
| Retrieval | LlamaIndex PropertyGraph | Built-in agentic GraphRAG support |
| LLM | Gemini via gateway | Provider-agnostic abstraction layer |
| Observability | Opik | End-to-end trace logging for agent behavior |
Key design decision: custom agent controller over LangChain/LangGraph. Frameworks hide execution order and error handling behind abstractions that become liabilities in production. For incident response, behavior must be predictable and debuggable.
When to Use GraphRAG vs Vector RAG
Use GraphRAG when:
- Questions are about coverage and synthesis, not similarity (“What do we know across teams?”)
- The domain is inherently relational (services, dependencies, ownership)
- You need to trace propagation paths (blast radius, dependency chains)
- Context is fragmented across many systems
Use traditional RAG when:
- Questions target specific facts in specific documents
- Relationships between entities aren’t central to the answer
- The corpus is flat and doesn’t have meaningful graph structure
Key Insight
The retrieval step becomes an act of structural reasoning over the organization itself. We are no longer pulling “relevant documents.” We are reconstructing a slice of the system.
Related
- Agent Reliability Chasm – Why 95% of agent PoCs fail in production
- MCP Server for Project Context – MCP as the real-time context layer
- Information Theory for Coding Agents – Signal efficiency in retrieval
- Progressive Disclosure of Context – Load only relevant context
- Institutional Memory via Learning Files – Persistent knowledge across sessions

