Module 24 · Section 24.4

LLM-Powered Recommendation & Search

LLMs as recommendation engines, conversational recommendation, LLM-powered search, query understanding, and user preference modeling
★ Big Picture

LLMs are transforming both search and recommendation from retrieval problems into reasoning problems. Traditional search returns ranked documents matching keywords. LLM-powered search (Perplexity, Google AI Overviews) understands intent, synthesizes information across sources, and generates direct answers with citations. Similarly, traditional recommendation relies on collaborative filtering and content-based features, while LLM-powered recommendation understands nuanced preferences expressed in natural language and can explain its reasoning. This shift from pattern matching to comprehension represents a fundamental change in how users discover information and products.

1. LLMs as Recommendation Engines

LLMs can serve as recommendation engines by leveraging their world knowledge and reasoning abilities. Given a description of user preferences, past interactions, and a catalog of items, an LLM can generate personalized recommendations with natural language explanations. This approach excels for cold-start scenarios (new users with no history) and for nuanced preferences that are difficult to capture with traditional feature vectors.

from openai import OpenAI
import json

client = OpenAI()

def recommend_items(user_profile: str, catalog: list, n: int = 5) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"""You are a recommendation engine.
Given a user profile and catalog, recommend {n} items.
Return JSON with 'recommendations' array, each having:
'item_id', 'score' (0-1), 'reasoning' (brief explanation)."""},
            {"role": "user", "content": f"""User Profile: {user_profile}
Catalog: {json.dumps(catalog)}"""},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

recs = recommend_items(
    user_profile="Enjoys sci-fi with strong worldbuilding, dislikes romance subplots",
    catalog=[
        {"id": "b1", "title": "Dune", "genre": "sci-fi"},
        {"id": "b2", "title": "The Notebook", "genre": "romance"},
        {"id": "b3", "title": "Neuromancer", "genre": "sci-fi"},
    ],
)

2. LLM-Powered Search

LLM-powered search systems like Perplexity represent a paradigm shift from "ten blue links" to direct answers with cited sources. The architecture combines a search engine (retrieving relevant web pages), a reader model (extracting key information from each source), and a generator model (synthesizing a coherent answer with inline citations). This is essentially RAG (Module 19) applied at web scale.

[Diagram: User Query (natural language) → Query Understanding (rewrite + expand) → Web Search / Knowledge (vector store) → Reader + Synthesizer (LLM generation) → Answer with citations]
Figure 24.6: LLM-powered search architecture. The query is understood and expanded, sources are retrieved and read, and the LLM synthesizes an answer with inline citations.
# Building a simple LLM-powered search with RAG
from openai import OpenAI

client = OpenAI()

def llm_search(query: str, search_results: list) -> str:
    # Format search results as context
    context = "\n\n".join([
        f"[Source {i+1}] {r['title']}\nURL: {r['url']}\n{r['snippet']}"
        for i, r in enumerate(search_results)
    ])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Answer the user's query using the provided sources.
Cite sources inline using [Source N] notation. Be concise and factual.
If sources conflict, note the disagreement."""},
            {"role": "user", "content": f"Query: {query}\n\nSources:\n{context}"},
        ],
    )
    return response.choices[0].message.content
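The function above covers the synthesis stage; the query-understanding stage (rewrite + expand in Figure 24.6) can be sketched separately. In the sketch below, the network call is kept apart from the prompt construction and response parsing so those pieces are easy to test; the prompt wording and the `"variants"` JSON shape are illustrative choices, not a fixed API.

```python
import json

def build_expansion_prompt(query: str, n: int = 3) -> list[dict]:
    """Messages asking the model for n search-optimized rewrites as JSON."""
    return [
        {"role": "system", "content": (
            f"Rewrite the user's query into {n} keyword-focused web search "
            'queries. Return JSON: {"variants": ["...", "..."]}'
        )},
        {"role": "user", "content": query},
    ]

def parse_variants(raw: str) -> list[str]:
    """Extract the rewritten queries from the model's JSON response."""
    return json.loads(raw)["variants"]

# Wiring it to the client used elsewhere in this section:
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=build_expansion_prompt("best laptop for data science"),
#     response_format={"type": "json_object"},
# )
# variants = parse_variants(response.choices[0].message.content)
```

Each variant is then sent to the search engine, and the union of results feeds the reader/synthesizer stage.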

3. Conversational Recommendation

Conversational recommendation combines dialogue management with recommendation logic. Instead of a one-shot recommendation, the system engages in a multi-turn conversation to elicit preferences, clarify constraints, and refine suggestions. This is particularly valuable for high-consideration purchases (electronics, travel, real estate) where user needs are complex and evolving.

from openai import OpenAI

client = OpenAI()

class ConversationalRecommender:
    def __init__(self, catalog_context: str):
        self.messages = [{
            "role": "system",
            "content": f"""You are a helpful product recommendation assistant.
Ask clarifying questions to understand user needs before recommending.
Available products:\n{catalog_context}
Always explain why each recommendation fits the user's stated needs."""
        }]

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

recommender = ConversationalRecommender(catalog_context="...")
print(recommender.chat("I need a laptop for data science work"))
🔍 Key Insight

The fundamental advantage of LLM-powered recommendation over traditional collaborative filtering is explainability and preference elicitation. An LLM can explain "I recommended this because you mentioned you prefer quiet keyboards, and this laptop has a low-profile mechanical keyboard" while collaborative filtering can only say "users like you also bought this." This explainability builds user trust and enables the system to correct misunderstandings through dialogue, creating a more effective recommendation loop.

4. User Preference Modeling

LLMs can build rich user preference models from natural language interactions, product reviews, and browsing histories. Rather than reducing preferences to sparse feature vectors, LLMs maintain a natural language summary of what the user likes, dislikes, and values. This "preference narrative" can be updated through conversation and used to condition future recommendations.
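The preference narrative can be sketched as a small store that accumulates likes and dislikes and renders a text summary to condition future recommendations. This is a minimal sketch: in production an LLM would rewrite the summary after each interaction, while here plain string formatting stands in so the data flow stays visible. All names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceNarrative:
    likes: list = field(default_factory=list)
    dislikes: list = field(default_factory=list)

    def update(self, liked=(), disliked=()):
        # Accumulate new signals without duplicating existing ones.
        for item in liked:
            if item not in self.likes:
                self.likes.append(item)
        for item in disliked:
            if item not in self.dislikes:
                self.dislikes.append(item)

    def summary(self) -> str:
        """Render the narrative used to condition future recommendations."""
        return (
            f"Likes: {', '.join(self.likes) or 'unknown'}. "
            f"Dislikes: {', '.join(self.dislikes) or 'unknown'}."
        )

profile = PreferenceNarrative()
profile.update(liked=["strong worldbuilding"], disliked=["romance subplots"])
# profile.summary() can now be passed as user_profile to recommend_items()
```

Because the summary is plain text, it can be inspected, edited by the user, and carried across sessions, unlike an opaque embedding vector.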

| Approach                | Cold Start | Explainability | Scale     | Latency   |
|-------------------------|------------|----------------|-----------|-----------|
| Collaborative Filtering | Poor       | Low            | Excellent | Very fast |
| Content-Based           | Good       | Medium         | Good      | Fast      |
| LLM Recommendation      | Excellent  | High           | Limited   | Slow      |
| Hybrid (CF + LLM)       | Good       | High           | Good      | Moderate  |
⚠ Scalability Challenges

LLM-based recommendation faces significant scalability challenges. Generating a personalized recommendation for each user request requires an LLM inference call, which is orders of magnitude slower and more expensive than a collaborative filtering lookup. Production systems address this through caching (pre-compute recommendations for popular queries), hybrid architectures (use CF for candidate generation, LLM for re-ranking and explanation), and batching (generate recommendations in bulk during off-peak hours).
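The hybrid pattern can be sketched as a cheap candidate-generation step that narrows the catalog before any LLM call. The scorer below is a toy co-occurrence count standing in for real collaborative filtering; only the top-k survivors would then be sent to an LLM re-ranker such as `recommend_items()` from Section 1.

```python
from collections import Counter

def candidate_generation(user_history: set, all_histories: list, catalog: set, k: int = 3) -> list:
    """Score catalog items by how often they co-occur with the user's items."""
    scores = Counter()
    for other in all_histories:
        overlap = len(user_history & other)
        if overlap == 0:
            continue  # no shared items, this history tells us nothing
        for item in other - user_history:
            if item in catalog:
                scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]

candidates = candidate_generation(
    user_history={"b1"},
    all_histories=[{"b1", "b3"}, {"b1", "b3", "b5"}, {"b2", "b4"}],
    catalog={"b1", "b2", "b3", "b4", "b5"},
)
# Only `candidates`, not the full catalog, is passed to the LLM re-ranker,
# keeping the expensive inference call bounded regardless of catalog size.
```

The design choice is the one named in the callout: traditional methods handle scale, and the LLM only re-ranks and explains a short pre-filtered list.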

Knowledge Check

1. How does LLM-powered search differ from traditional keyword search?
Traditional search matches keywords to documents and returns ranked results. LLM-powered search understands query intent, expands or rewrites the query, retrieves relevant sources, and synthesizes a direct answer with citations. It transforms search from a retrieval problem into a comprehension and generation problem, providing answers rather than links.
2. Why are LLMs effective for cold-start recommendation scenarios?
Cold-start is when a new user has no interaction history, making collaborative filtering impossible. LLMs can leverage their world knowledge to make recommendations from a natural language description of preferences alone. A user saying "I enjoy complex strategy games with resource management" gives an LLM enough information to recommend relevant games, while collaborative filtering would have nothing to work with.
3. What is the architecture of an LLM-powered search system like Perplexity?
The architecture has three stages: (1) query understanding, where the LLM rewrites and expands the user's query into search-optimized forms; (2) retrieval, where a web search engine and/or knowledge base finds relevant sources; and (3) synthesis, where the LLM reads the retrieved content and generates a coherent answer with inline citations to specific sources. This is essentially RAG applied at web scale.
4. What advantage does conversational recommendation have over one-shot recommendation?
Conversational recommendation engages in multi-turn dialogue to elicit preferences, clarify constraints, and refine suggestions. This is valuable for complex decisions where user needs are nuanced and evolving. It can ask "Do you prioritize portability or screen size?" to disambiguate preferences, correct misunderstandings, and progressively narrow recommendations to an ideal match.
5. How do hybrid systems address the scalability limitations of LLM-based recommendation?
Hybrid systems use traditional methods (collaborative filtering, content-based) for fast candidate generation from large catalogs, then use LLMs for re-ranking the top candidates and generating natural language explanations. This provides the scalability of traditional methods with the explainability and preference understanding of LLMs, keeping the expensive LLM inference limited to a small number of pre-filtered candidates.

Key Takeaways