Vector search finds documents that are semantically similar, but it cannot reason about relationships between entities. Knowledge graphs encode structured relationships (entities, relations, triples) that enable multi-hop reasoning, entity disambiguation, and relationship-aware retrieval. GraphRAG combines the structured reasoning of knowledge graphs with the generative power of LLMs, unlocking capabilities like "find all companies that partner with competitors of our top customer" that are impossible with vector search alone. This section covers knowledge graph fundamentals, graph embedding methods, and the GraphRAG paradigm.
1. Knowledge Graph Fundamentals
A knowledge graph (KG) is a structured representation of real-world entities and the relationships between them. Unlike documents stored as unstructured text, knowledge graphs encode information as a network of nodes (entities) connected by typed edges (relations). This structure enables precise queries about relationships, paths, and patterns that would require complex reasoning over unstructured text.
1.1 Entities, Relations, and Triples
The fundamental unit of a knowledge graph is the triple, expressed as
(subject, predicate, object) or equivalently (head, relation, tail).
For example, (Albert Einstein, bornIn, Ulm) encodes a single fact. A knowledge graph
is simply a collection of such triples, forming a directed labeled graph where entities are nodes
and relations are edge labels.
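The triple model is simple enough to sketch in a few lines. This toy example (names chosen for illustration) stores a graph as a set of Python tuples and answers pattern queries where `None` acts as a wildcard:

```python
# A knowledge graph as a set of (head, relation, tail) triples.
triples = {
    ("Albert Einstein", "bornIn", "Ulm"),
    ("Albert Einstein", "wonAward", "Nobel Prize in Physics"),
    ("Ulm", "locatedIn", "Germany"),
}

def match(head=None, relation=None, tail=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return [
        (h, r, t) for (h, r, t) in triples
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    ]

# Where was Einstein born?
print(match(head="Albert Einstein", relation="bornIn"))
```

Real triple stores index triples by head, relation, and tail so these pattern lookups run in constant time rather than by scanning every triple.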
1.2 RDF vs. Property Graphs
Two main formalisms exist for representing knowledge graphs. RDF (Resource Description Framework) is a W3C standard that represents all knowledge as triples and uses URIs to identify entities and relations. RDF is queried using SPARQL and is the foundation of the Semantic Web (Wikidata, DBpedia). Property graphs allow nodes and edges to carry arbitrary key-value properties, offering greater flexibility. Property graphs are queried using Cypher (Neo4j) or Gremlin and are more commonly used in industry applications.
| Feature | RDF | Property Graphs |
|---|---|---|
| Data model | Subject-Predicate-Object triples | Nodes and edges with properties |
| Query language | SPARQL | Cypher, Gremlin |
| Edge properties | Requires reification (complex) | Native support |
| Standards | W3C standard, widely interoperable | No universal standard |
| Typical databases | Blazegraph, Virtuoso, Stardog | Neo4j, Amazon Neptune, TigerGraph |
| Best for | Linked data, semantic web, ontologies | Application data, recommendations |
2. Building Knowledge Graphs with LLMs
Traditionally, knowledge graph construction required extensive manual curation or specialized NLP pipelines for entity extraction and relation classification. LLMs have dramatically simplified this process by extracting structured triples directly from unstructured text through carefully designed prompts.
```python
from openai import OpenAI
import json

client = OpenAI()

def extract_triples(text, entity_types=None):
    """Extract knowledge graph triples from text using an LLM."""
    type_hint = ""
    if entity_types:
        type_hint = f"\nFocus on these entity types: {', '.join(entity_types)}"
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                # JSON mode requires a top-level object, so we ask for
                # an object with a "triples" key rather than a bare array.
                "content": f"""Extract knowledge graph triples from the text.
Return a JSON object with a "triples" key holding an array of objects
with keys: head, relation, tail.
Use concise, canonical entity names.
Use lowercase relation names with underscores.{type_hint}

Example output:
{{"triples": [
    {{"head": "OpenAI", "relation": "developed", "tail": "GPT-4"}},
    {{"head": "GPT-4", "relation": "is_a", "tail": "language model"}}
]}}"""
            },
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    result = json.loads(response.choices[0].message.content)
    return result.get("triples", [])
```
2.1 Storing Triples in Neo4j
```python
from neo4j import GraphDatabase

class KnowledgeGraphStore:
    """Store and query knowledge graph triples in Neo4j."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def add_triples(self, triples):
        """Insert triples as nodes and relationships."""
        with self.driver.session() as session:
            for triple in triples:
                session.run("""
                    MERGE (h:Entity {name: $head})
                    MERGE (t:Entity {name: $tail})
                    MERGE (h)-[r:RELATES {type: $relation}]->(t)
                """, head=triple["head"], tail=triple["tail"],
                     relation=triple["relation"])

    def find_neighbors(self, entity, max_hops=2):
        """Find all entities within N hops of the given entity."""
        with self.driver.session() as session:
            # Variable-length patterns cannot be parameterized in Cypher,
            # so the hop count is interpolated into the query string.
            result = session.run(f"""
                MATCH path = (e:Entity {{name: $entity}})-[*1..{max_hops}]-(neighbor)
                RETURN path LIMIT 50
            """, entity=entity)
            return [record["path"] for record in result]

    def query_relationship(self, head, tail):
        """Find relationship paths between two entities."""
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                    (h:Entity {name: $head})-[*]-(t:Entity {name: $tail})
                )
                RETURN path
            """, head=head, tail=tail)
            return [record["path"] for record in result]
```
One of the hardest challenges in LLM-powered KG construction is entity resolution: determining that "Einstein," "Albert Einstein," and "A. Einstein" all refer to the same entity. Without careful entity resolution, the graph becomes fragmented with duplicate nodes. Common approaches include embedding-based deduplication (comparing entity name embeddings), LLM-based coreference resolution, and linking to external KGs like Wikidata for canonical entity identifiers.
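The embedding-based deduplication approach can be sketched as a greedy pass that maps each mention to the first sufficiently similar canonical name. To keep this sketch dependency-free, it substitutes string similarity (`difflib`) for embedding similarity; a production system would compare name embeddings instead, and the threshold would need tuning:

```python
from difflib import SequenceMatcher

def resolve_entities(names, threshold=0.6):
    """Greedily map each name to the first seen canonical name whose
    string similarity exceeds the threshold. (A real system would use
    embedding similarity here, not character-level similarity.)"""
    canonical = []  # representative names seen so far
    mapping = {}
    for name in names:
        best, best_sim = None, 0.0
        for cand in canonical:
            sim = SequenceMatcher(None, name.lower(), cand.lower()).ratio()
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= threshold:
            mapping[name] = best      # merge into existing entity
        else:
            canonical.append(name)    # new canonical entity
            mapping[name] = name
    return mapping

print(resolve_entities(["Albert Einstein", "Einstein", "A. Einstein", "Niels Bohr"]))
```

Greedy merging is order-sensitive: processing the longest or most frequent mention first tends to produce better canonical names.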
3. Graph Embeddings
Graph embeddings map entities and relations from a knowledge graph into continuous vector spaces, enabling similarity-based retrieval, link prediction (predicting missing triples), and integration with dense retrieval systems. The key challenge is preserving the structural properties of the graph in the embedding space.
3.1 TransE
TransE (Bordes et al., 2013) is the foundational graph embedding model. It represents relations
as translations in embedding space: if (h, r, t) is a true triple, then the
embedding of the head entity plus the embedding of the relation should approximately equal
the embedding of the tail entity: h + r ≈ t. The model is trained by
minimizing a margin-based loss that pushes correct triples closer than corrupted triples.
3.2 DistMult
DistMult (Yang et al., 2015) models relations as diagonal matrices (represented as vectors) and
scores triples using a bilinear product: score(h, r, t) = h · diag(r) · t.
This is equivalent to an element-wise product followed by a sum. DistMult is simple and effective
but cannot model asymmetric relations (it gives the same score to (h, r, t) and
(t, r, h)), which limits its applicability.
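The bilinear score fits in one line, and the symmetry limitation falls directly out of it, since an element-wise product is unchanged when h and t swap. A small sketch with made-up vectors:

```python
import torch

def distmult_score(h, r, t):
    """DistMult: score(h, r, t) = sum(h * r * t), where r holds the
    diagonal of the relation matrix as a vector."""
    return (h * r * t).sum(dim=-1)

h = torch.tensor([0.5, -1.0, 2.0])
r = torch.tensor([1.0, 0.2, -0.3])
t = torch.tensor([2.0, 0.0, 1.0])

print(distmult_score(h, r, t))
# Symmetric in h and t -- the asymmetry limitation:
assert torch.isclose(distmult_score(h, r, t), distmult_score(t, r, h))
```

Successors such as ComplEx address this by using complex-valued embeddings, which break the h/t symmetry while keeping the bilinear form.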
```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    """TransE knowledge graph embedding model."""

    def __init__(self, num_entities, num_relations, dim=128):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)
        # Initialize with uniform distribution
        nn.init.uniform_(self.entity_emb.weight, -6 / dim**0.5, 6 / dim**0.5)
        nn.init.uniform_(self.relation_emb.weight, -6 / dim**0.5, 6 / dim**0.5)

    def forward(self, heads, relations, tails):
        """Score triples: lower distance = more plausible."""
        h = self.entity_emb(heads)
        r = self.relation_emb(relations)
        t = self.entity_emb(tails)
        # TransE scoring: ||h + r - t||
        return torch.norm(h + r - t, p=2, dim=-1)

    def margin_loss(self, pos_score, neg_score, margin=1.0):
        """Margin-based ranking loss."""
        return torch.relu(margin + pos_score - neg_score).mean()
```
4. GraphRAG
GraphRAG (Microsoft Research, 2024) represents a paradigm shift in how knowledge graphs are used for RAG. Instead of querying a pre-existing knowledge graph, GraphRAG builds a knowledge graph from the document corpus itself, then uses community detection to create hierarchical summaries of entity clusters. This enables both local retrieval (finding specific facts) and global retrieval (answering questions that require synthesizing information across the entire corpus).
4.1 The GraphRAG Pipeline
- Entity and relation extraction: An LLM extracts entities and relationships from each document chunk, building a corpus-wide knowledge graph.
- Community detection: The Leiden algorithm partitions the graph into communities of closely connected entities, creating a hierarchy from fine-grained clusters to broad themes.
- Community summarization: An LLM generates a summary for each community, capturing the key entities, relationships, and themes within that cluster.
- Query routing: At query time, the system determines whether to use local search (entity-based retrieval for specific questions) or global search (community summary-based retrieval for broad questions).
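The community detection step can be sketched on a toy entity graph. GraphRAG uses the Leiden algorithm (available via `leidenalg`/`igraph`); as a dependency-light stand-in, this sketch uses greedy modularity maximization from NetworkX, which produces similar cluster structure on small graphs. The graph here is two tightly connected entity clusters joined by a single weak edge:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy entity graph: two clusters joined by one cross-cluster edge.
G = nx.Graph()
G.add_edges_from([
    ("OpenAI", "GPT-4"), ("GPT-4", "language model"), ("OpenAI", "language model"),
    ("Neo4j", "Cypher"), ("Cypher", "property graph"), ("Neo4j", "property graph"),
    ("GPT-4", "Cypher"),  # weak link between the clusters
])

communities = greedy_modularity_communities(G)
for i, members in enumerate(communities):
    print(f"Community {i}: {sorted(members)}")
```

In GraphRAG each of these communities would then be handed to an LLM to produce its summary, and Leiden's hierarchy yields communities at several resolutions rather than a single flat partition.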
GraphRAG's most powerful capability is answering global questions that require synthesizing information across an entire corpus. Questions like "What are the major themes in this dataset?" or "How do the research areas in this collection relate to each other?" cannot be answered by retrieving a few chunks. GraphRAG's community summaries pre-compute these corpus-level insights at indexing time, making global queries feasible at serving time through a map-reduce approach over community summaries.
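The map-reduce pattern over community summaries can be sketched as follows, where `llm` stands in for any chat-completion call (a hypothetical helper, not a specific API):

```python
def global_search(question, community_summaries, llm):
    """Map-reduce global search over pre-computed community summaries."""
    # Map: answer the question against each community summary independently.
    partial_answers = [
        llm(f"Using only this summary, answer '{question}':\n{summary}")
        for summary in community_summaries
    ]
    # Reduce: synthesize the partial answers into one final response.
    joined = "\n".join(partial_answers)
    return llm(f"Combine these partial answers to '{question}':\n{joined}")

# Demo with a trivial echo "LLM" (returns the last prompt line),
# just to show the data flow through map and reduce:
fake_llm = lambda prompt: prompt.splitlines()[-1]
print(global_search("What are the major themes?",
                    ["Summary A", "Summary B"], fake_llm))
```

The map stage parallelizes trivially across communities, which is what keeps global queries tractable even over thousands of community summaries.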
5. Hybrid KG + Vector Retrieval
The most effective production systems combine knowledge graph traversal with vector search. The knowledge graph provides structured multi-hop reasoning and entity-aware retrieval, while vector search provides semantic similarity for unstructured content that the KG does not cover.
```python
def hybrid_kg_vector_retrieve(query, kg_store, vector_collection, k=5):
    """Combine KG traversal with vector search."""
    # Step 1: Extract entities from the query
    entities = extract_entities_from_query(query)

    # Step 2: KG retrieval (structured paths)
    kg_context = []
    for entity in entities:
        neighbors = kg_store.find_neighbors(entity, max_hops=2)
        kg_context.extend(format_paths(neighbors))

    # Step 3: Vector retrieval (semantic similarity)
    vector_results = vector_collection.query(
        query_texts=[query],
        n_results=k,
    )
    vector_context = vector_results["documents"][0]

    # Step 4: Combine both context types
    combined_context = f"""Structured Knowledge (from Knowledge Graph):
{chr(10).join(kg_context)}

Relevant Documents (from semantic search):
{chr(10).join(vector_context)}"""
    return combined_context
```
GraphRAG's indexing phase is significantly more expensive than standard RAG because it requires LLM calls for every chunk (entity extraction), every entity pair (relation validation), and every community (summary generation). For a corpus of 10,000 documents, indexing can cost $50 to $500 in LLM API calls depending on the model and chunk count. However, this is a one-time cost, and the resulting graph with community summaries enables query capabilities that are impossible with vector-only RAG.
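A back-of-the-envelope estimate makes the cost structure concrete. Every number below is an illustrative assumption (chunk counts, token usage, and per-token pricing all vary widely by model and corpus):

```python
# Illustrative GraphRAG indexing cost estimate -- all figures assumed.
docs = 10_000
chunks_per_doc = 5               # assumed chunking granularity
tokens_per_call = 1_500          # assumed prompt + completion per LLM call
cost_per_1k_tokens = 0.004       # assumed blended USD rate

extraction_calls = docs * chunks_per_doc   # one extraction call per chunk
community_calls = 2_000                    # assumed community count
total_calls = extraction_calls + community_calls

cost = total_calls * tokens_per_call / 1_000 * cost_per_1k_tokens
print(f"~${cost:,.0f} one-time indexing cost")
```

Extraction dominates: the per-chunk calls outnumber the community summaries by an order of magnitude, so chunk size and corpus deduplication are the main cost levers.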
6. When to Use Knowledge Graph RAG
| Use Case | Vector RAG | KG RAG | GraphRAG |
|---|---|---|---|
| Simple factual Q&A | Excellent | Good | Overkill |
| Multi-hop reasoning | Poor | Excellent | Excellent |
| Corpus-level themes | Poor | Moderate | Excellent |
| Entity disambiguation | Poor | Excellent | Good |
| Setup complexity | Low | High | Very High |
| Indexing cost | Low (embedding only) | Medium (KG construction) | High (LLM-intensive) |
Key Takeaways
- Knowledge graphs encode structured relationships: Triples (subject, predicate, object) enable multi-hop reasoning and entity-aware retrieval that vector search cannot provide.
- LLMs simplify KG construction: Prompted LLMs can extract entities and relations from text directly, replacing complex NLP pipelines, though entity resolution remains challenging.
- Graph embeddings bridge KGs and vector spaces: TransE and DistMult map entities and relations into continuous vectors, enabling similarity-based operations on structured knowledge.
- GraphRAG enables corpus-level reasoning: By building KGs from documents and using community detection with hierarchical summaries, GraphRAG answers global questions that require cross-document synthesis.
- Combine KG and vector retrieval: The most effective production systems use KG traversal for structured multi-hop queries and vector search for semantic similarity, merging both into a unified context.