Module 19 · Section 19.3

RAG with Knowledge Graphs

Combining structured knowledge representations with retrieval-augmented generation for multi-hop reasoning and entity-aware retrieval
★ Big Picture

Vector search finds documents that are semantically similar, but it cannot reason about relationships between entities. Knowledge graphs encode structured relationships (entities, relations, triples) that enable multi-hop reasoning, entity disambiguation, and relationship-aware retrieval. GraphRAG combines the structured reasoning of knowledge graphs with the generative power of LLMs, unlocking capabilities like "find all companies that partner with competitors of our top customer" that are impossible with vector search alone. This section covers knowledge graph fundamentals, graph embedding methods, and the GraphRAG paradigm.

1. Knowledge Graph Fundamentals

A knowledge graph (KG) is a structured representation of real-world entities and the relationships between them. Unlike documents stored as unstructured text, knowledge graphs encode information as a network of nodes (entities) connected by typed edges (relations). This structure enables precise queries about relationships, paths, and patterns that would require complex reasoning over unstructured text.

1.1 Entities, Relations, and Triples

The fundamental unit of a knowledge graph is the triple, expressed as (subject, predicate, object) or equivalently (head, relation, tail). For example, (Albert Einstein, bornIn, Ulm) encodes a single fact. A knowledge graph is simply a collection of such triples, forming a directed labeled graph where entities are nodes and relations are edge labels.
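
To make the triple representation concrete, a toy knowledge graph can be stored as a plain list of (head, relation, tail) tuples, and multi-hop questions become chained lookups. This is a minimal sketch with illustrative entity and relation names:

```python
# A toy knowledge graph as (head, relation, tail) triples.
triples = [
    ("Albert Einstein", "bornIn", "Ulm"),
    ("Albert Einstein", "developed", "General Relativity"),
    ("Albert Einstein", "wonAward", "Nobel Prize 1921"),
    ("General Relativity", "field", "Physics"),
]

def objects_of(kg, head, relation):
    """Return all tails for a given (head, relation) pair."""
    return [t for h, r, t in kg if h == head and r == relation]

def two_hop(kg, head, rel1, rel2):
    """Follow two relations in sequence: head -rel1-> x -rel2-> ?"""
    return [t2 for t1 in objects_of(kg, head, rel1)
            for t2 in objects_of(kg, t1, rel2)]

print(objects_of(triples, "Albert Einstein", "bornIn"))           # ['Ulm']
print(two_hop(triples, "Albert Einstein", "developed", "field"))  # ['Physics']
```

The two-hop call answers "what field is the theory Einstein developed in?", a query a pure vector search over sentences cannot express as a relationship constraint.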

Figure 19.7: A knowledge graph encodes entities as nodes and relationships as labeled directed edges, enabling structured queries and multi-hop reasoning.

1.2 RDF vs. Property Graphs

Two main formalisms exist for representing knowledge graphs. RDF (Resource Description Framework) is a W3C standard that represents all knowledge as triples and uses URIs to identify entities and relations. RDF is queried using SPARQL and is the foundation of the Semantic Web (Wikidata, DBpedia). Property graphs allow nodes and edges to carry arbitrary key-value properties, offering greater flexibility. Property graphs are queried using Cypher (Neo4j) or Gremlin and are more commonly used in industry applications.

Feature            | RDF                                   | Property Graphs
-------------------|---------------------------------------|--------------------------------
Data model         | Subject-Predicate-Object triples      | Nodes and edges with properties
Query language     | SPARQL                                | Cypher, Gremlin
Edge properties    | Requires reification (complex)        | Native support
Standards          | W3C standard, widely interoperable    | No universal standard
Typical databases  | Blazegraph, Virtuoso, Stardog         | Neo4j, Amazon Neptune, TigerGraph
Best for           | Linked data, semantic web, ontologies | Application data, recommendations
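
To make the query-language difference concrete, here is the same one-hop question ("where was Albert Einstein born?") expressed in both formalisms. The identifiers are illustrative; the Cypher version assumes the `Entity`/`RELATES` schema used later in this section:

```python
# The same question in both query languages (illustrative identifiers;
# SPARQL prefix declarations omitted for brevity).
sparql = """
SELECT ?place WHERE {
  ?person rdfs:label "Albert Einstein" .
  ?person ex:bornIn ?place .
}
"""

cypher = """
MATCH (p:Entity {name: 'Albert Einstein'})-[:RELATES {type: 'bornIn'}]->(place)
RETURN place.name
"""
```

SPARQL matches triple patterns against a global graph of URIs, while Cypher matches node-and-edge patterns with inline property filters; the same information need maps naturally onto both.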

2. Building Knowledge Graphs with LLMs

Traditionally, knowledge graph construction required extensive manual curation or specialized NLP pipelines for entity extraction and relation classification. LLMs have dramatically simplified this process by extracting structured triples directly from unstructured text through carefully designed prompts.

from openai import OpenAI
import json

client = OpenAI()

def extract_triples(text, entity_types=None):
    """Extract knowledge graph triples from text using an LLM."""
    type_hint = ""
    if entity_types:
        type_hint = f"\nFocus on these entity types: {', '.join(entity_types)}"

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""Extract knowledge graph triples from the text.
Return a JSON object with a single key "triples" whose value is an
array of objects with keys: head, relation, tail.
Use concise, canonical entity names.
Use lowercase relation names with underscores.{type_hint}

Example output:
{{"triples": [
  {{"head": "OpenAI", "relation": "developed", "tail": "GPT-4"}},
  {{"head": "GPT-4", "relation": "is_a", "tail": "language model"}}
]}}"""
        }, {
            "role": "user",
            "content": text
        }],
        response_format={"type": "json_object"},
        temperature=0.0
    )

    result = json.loads(response.choices[0].message.content)
    return result.get("triples", result)

2.1 Storing Triples in Neo4j

from neo4j import GraphDatabase

class KnowledgeGraphStore:
    """Store and query knowledge graph triples in Neo4j."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def add_triples(self, triples):
        """Insert triples as nodes and relationships."""
        with self.driver.session() as session:
            for triple in triples:
                # Cypher cannot parameterize relationship types, so the
                # relation name is stored as a property on a generic edge.
                session.run("""
                    MERGE (h:Entity {name: $head})
                    MERGE (t:Entity {name: $tail})
                    MERGE (h)-[r:RELATES {type: $relation}]->(t)
                """, head=triple["head"],
                     tail=triple["tail"],
                     relation=triple["relation"])

    def find_neighbors(self, entity, max_hops=2):
        """Find all entities within N hops of the given entity."""
        with self.driver.session() as session:
            # Variable-length path bounds cannot be parameterized in Cypher,
            # hence the f-string; max_hops must be a trusted integer.
            result = session.run(f"""
                MATCH path = (e:Entity {{name: $entity}})
                      -[*1..{max_hops}]-(neighbor)
                RETURN path
                LIMIT 50
            """, entity=entity)
            return [record["path"] for record in result]

    def query_relationship(self, head, tail):
        """Find relationship paths between two entities."""
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                    (h:Entity {name: $head})-[*]-(t:Entity {name: $tail})
                )
                RETURN path
            """, head=head, tail=tail)
            return [record["path"] for record in result]
ⓘ Entity Resolution

One of the hardest challenges in LLM-powered KG construction is entity resolution: determining that "Einstein," "Albert Einstein," and "A. Einstein" all refer to the same entity. Without careful entity resolution, the graph becomes fragmented with duplicate nodes. Common approaches include embedding-based deduplication (comparing entity name embeddings), LLM-based coreference resolution, and linking to external KGs like Wikidata for canonical entity identifiers.
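
A minimal deduplication sketch follows, using string similarity (`difflib`) as a cheap stand-in for the embedding-based comparison described above; the threshold and names are illustrative, and a production system would compare embeddings and handle ambiguous names with more care:

```python
from difflib import SequenceMatcher

def resolve_entities(names, threshold=0.65):
    """Greedy dedup: map each name to the first sufficiently
    similar canonical name seen so far (illustrative threshold)."""
    canonical = {}  # surface form -> canonical form
    seen = []       # canonical forms chosen so far
    for name in names:
        match = next(
            (c for c in seen
             if SequenceMatcher(None, name.lower(), c.lower()).ratio() >= threshold),
            None,
        )
        canonical[name] = match if match else name
        if match is None:
            seen.append(name)
    return canonical

mapping = resolve_entities(["Albert Einstein", "Einstein", "A. Einstein", "Niels Bohr"])
# Maps the three Einstein variants to one canonical node; Bohr stays separate.
```

Note the greedy pass is order-sensitive: it works here because the fullest name appears first, which is one reason real systems prefer clustering over a single pass.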

3. Graph Embeddings

Graph embeddings map entities and relations from a knowledge graph into continuous vector spaces, enabling similarity-based retrieval, link prediction (predicting missing triples), and integration with dense retrieval systems. The key challenge is preserving the structural properties of the graph in the embedding space.

3.1 TransE

TransE (Bordes et al., 2013) is the foundational graph embedding model. It represents relations as translations in embedding space: if (h, r, t) is a true triple, then the embedding of the head entity plus the embedding of the relation should approximately equal the embedding of the tail entity: h + r ≈ t. The model is trained by minimizing a margin-based loss that pushes correct triples closer than corrupted triples.

3.2 DistMult

DistMult (Yang et al., 2015) models relations as diagonal matrices (represented as vectors) and scores triples using a bilinear product: score(h, r, t) = h · diag(r) · t. This is equivalent to an element-wise product followed by a sum. DistMult is simple and effective but cannot model asymmetric relations (it gives the same score to (h, r, t) and (t, r, h)), which limits its applicability.

import torch
import torch.nn as nn

class TransE(nn.Module):
    """TransE knowledge graph embedding model."""

    def __init__(self, num_entities, num_relations, dim=128):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)

        # Initialize with uniform distribution
        nn.init.uniform_(self.entity_emb.weight, -6/dim**0.5, 6/dim**0.5)
        nn.init.uniform_(self.relation_emb.weight, -6/dim**0.5, 6/dim**0.5)

    def forward(self, heads, relations, tails):
        """Score triples: lower distance = more plausible."""
        h = self.entity_emb(heads)
        r = self.relation_emb(relations)
        t = self.entity_emb(tails)

        # TransE scoring: ||h + r - t||
        score = torch.norm(h + r - t, p=2, dim=-1)
        return score

    def margin_loss(self, pos_score, neg_score, margin=1.0):
        """Margin-based ranking loss."""
        return torch.relu(margin + pos_score - neg_score).mean()

4. GraphRAG

GraphRAG (Microsoft Research, 2024) represents a paradigm shift in how knowledge graphs are used for RAG. Instead of querying a pre-existing knowledge graph, GraphRAG builds a knowledge graph from the document corpus itself, then uses community detection to create hierarchical summaries of entity clusters. This enables both local retrieval (finding specific facts) and global retrieval (answering questions that require synthesizing information across the entire corpus).

4.1 The GraphRAG Pipeline

  1. Entity and relation extraction: An LLM extracts entities and relationships from each document chunk, building a corpus-wide knowledge graph.
  2. Community detection: The Leiden algorithm partitions the graph into communities of closely connected entities, creating a hierarchy from fine-grained clusters to broad themes.
  3. Community summarization: An LLM generates a summary for each community, capturing the key entities, relationships, and themes within that cluster.
  4. Query routing: At query time, the system determines whether to use local search (entity-based retrieval for specific questions) or global search (community summary-based retrieval for broad questions).
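
The indexing half of the pipeline (steps 1 through 3) can be sketched end-to-end on a toy graph. Here connected components stand in for the Leiden algorithm (which additionally splits dense subgraphs and builds a hierarchy), and a stub replaces the LLM summarizer; everything is illustrative:

```python
from collections import defaultdict

triples = [
    ("Einstein", "developed", "General Relativity"),
    ("General Relativity", "field", "Physics"),
    ("Turing", "proposed", "Turing Machine"),
]

# Step 2: merge triples into an undirected adjacency list.
adj = defaultdict(set)
for h, _, t in triples:
    adj[h].add(t)
    adj[t].add(h)

def communities(adj):
    """Connected components -- a crude stand-in for Leiden clustering."""
    seen, comms = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comms.append(comp)
    return comms

def summarize(community):
    """Stub for the LLM community-summary call."""
    return "Community: " + ", ".join(sorted(community))

summaries = [summarize(c) for c in communities(adj)]
# Two communities: a physics cluster and a computing cluster.
```
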
Figure 19.8: GraphRAG builds a knowledge graph from documents, detects communities, generates summaries, then routes queries to local (entity) or global (community) search.
★ Key Insight

GraphRAG's most powerful capability is answering global questions that require synthesizing information across an entire corpus. Questions like "What are the major themes in this dataset?" or "How do the research areas in this collection relate to each other?" cannot be answered by retrieving a few chunks. GraphRAG's community summaries pre-compute these corpus-level insights at indexing time, making global queries feasible at serving time through a map-reduce approach over community summaries.
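
The map-reduce over community summaries can be sketched as follows, with a keyword-overlap stub standing in for the LLM "map" call and a simple concatenation as the "reduce" step (scoring, summaries, and parameters are all illustrative):

```python
import re

def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def map_step(query, summary):
    """Stub for the LLM map call: score a community summary's
    relevance to the query. Here: crude keyword overlap."""
    return (len(tokens(query) & tokens(summary)), summary)

def global_search(query, community_summaries, top_n=2):
    """Map over all summaries, keep the most relevant, then reduce."""
    scored = sorted((map_step(query, s) for s in community_summaries), reverse=True)
    kept = [s for score, s in scored[:top_n] if score > 0]
    # Reduce: a real system would have an LLM synthesize a final answer.
    return "\n".join(kept)

summaries = [
    "Community about physics: Einstein, relativity, quantum theory",
    "Community about computing: Turing, machines, algorithms",
]
context = global_search("What are the main themes in physics?", summaries)
```

The key point is architectural: because summaries were pre-computed at indexing time, the serving-time work is a cheap score-and-combine over a small number of summaries rather than retrieval over the whole corpus.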

5. Hybrid KG + Vector Retrieval

The most effective production systems combine knowledge graph traversal with vector search. The knowledge graph provides structured multi-hop reasoning and entity-aware retrieval, while vector search provides semantic similarity for unstructured content that the KG does not cover.

def hybrid_kg_vector_retrieve(query, kg_store, vector_collection, k=5):
    """Combine KG traversal with vector search."""

    # Step 1: Extract entities from the query
    # (extract_entities_from_query and format_paths are assumed helpers,
    # e.g. an LLM/NER call and a path-to-text formatter)
    entities = extract_entities_from_query(query)

    # Step 2: KG retrieval (structured paths)
    kg_context = []
    for entity in entities:
        neighbors = kg_store.find_neighbors(entity, max_hops=2)
        kg_context.extend(format_paths(neighbors))

    # Step 3: Vector retrieval (semantic similarity)
    vector_results = vector_collection.query(
        query_texts=[query],
        n_results=k
    )
    vector_context = vector_results["documents"][0]

    # Step 4: Combine both context types
    combined_context = f"""Structured Knowledge (from Knowledge Graph):
{chr(10).join(kg_context)}

Relevant Documents (from semantic search):
{chr(10).join(vector_context)}"""

    return combined_context
⚠ GraphRAG Cost Considerations

GraphRAG's indexing phase is significantly more expensive than standard RAG because it requires LLM calls for every chunk (entity extraction), every entity pair (relation validation), and every community (summary generation). For a corpus of 10,000 documents, indexing can cost $50 to $500 in LLM API calls depending on the model and chunk count. However, this is a one-time cost, and the resulting graph with community summaries enables query capabilities that are impossible with vector-only RAG.
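
A back-of-envelope estimator for the indexing cost can make the trade-off explicit. Every number below (token counts, price, chunk and community counts) is an assumption for illustration; check current model pricing before relying on it:

```python
def graphrag_index_cost(num_chunks, num_communities,
                        tokens_per_extraction=1500,
                        tokens_per_summary=3000,
                        price_per_1k_tokens=0.005):
    """Rough indexing cost: one extraction call per chunk plus one
    summary call per community. All parameters are illustrative."""
    total_tokens = (num_chunks * tokens_per_extraction
                    + num_communities * tokens_per_summary)
    return total_tokens / 1000 * price_per_1k_tokens

# e.g. 10,000 documents split into ~30,000 chunks with ~2,000 communities
cost = graphrag_index_cost(num_chunks=30_000, num_communities=2_000)
# With these assumed numbers, the estimate lands inside the $50-$500 range.
```

Extraction dominates because it scales with chunk count; this is why chunk size and model choice are the main levers on GraphRAG indexing cost.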

6. When to Use Knowledge Graph RAG

Use Case              | Vector RAG           | KG RAG                   | GraphRAG
----------------------|----------------------|--------------------------|---------------------
Simple factual Q&A    | Excellent            | Good                     | Overkill
Multi-hop reasoning   | Poor                 | Excellent                | Excellent
Corpus-level themes   | Poor                 | Moderate                 | Excellent
Entity disambiguation | Poor                 | Excellent                | Good
Setup complexity      | Low                  | High                     | Very high
Indexing cost         | Low (embedding only) | Medium (KG construction) | High (LLM-intensive)

Section 19.3 Quiz

1. What is a knowledge graph triple, and how does it differ from a vector embedding?
Answer:
A triple is a structured fact expressed as (subject, predicate, object), for example (Einstein, bornIn, Ulm). It encodes an explicit, typed relationship between two entities. A vector embedding is a continuous numerical representation of text in a high-dimensional space that captures semantic similarity but does not encode explicit relationships. Triples are discrete and interpretable; embeddings are continuous and opaque.
2. How does TransE represent relations in embedding space?
Answer:
TransE represents relations as translation vectors in embedding space. For a true triple (h, r, t), the model learns embeddings such that h + r is approximately equal to t. The training objective minimizes the distance ||h + r - t|| for correct triples while maximizing it for corrupted (false) triples using a margin-based ranking loss.
3. What is the key innovation of GraphRAG compared to traditional KG-based RAG?
Answer:
GraphRAG builds a knowledge graph from the document corpus itself (rather than relying on a pre-existing KG), then applies community detection to find clusters of related entities, and generates LLM summaries for each community. This enables "global search" over community summaries to answer corpus-level questions (e.g., "What are the main themes?") that are impossible with entity-level or chunk-level retrieval alone.
4. What is the main limitation of DistMult compared to TransE?
Answer:
DistMult cannot model asymmetric relations. Because it scores triples using a symmetric bilinear product (element-wise multiplication followed by sum), it assigns the same score to (h, r, t) and (t, r, h). This means it cannot distinguish "A manages B" from "B manages A." TransE handles asymmetry naturally because h + r = t does not imply t + r = h.
5. Why is entity resolution particularly challenging in LLM-based KG construction?
Answer:
LLMs extract entity mentions as surface strings, so the same entity may appear with different names, abbreviations, or spellings across documents (e.g., "Einstein," "Albert Einstein," "A. Einstein"). Without entity resolution, the graph becomes fragmented with duplicate nodes that should be merged. This is hard because it requires disambiguating entities with similar names (e.g., "Jordan" the country vs. "Jordan" the person) and linking variant spellings to canonical forms.

Key Takeaways