Module 26 · Section 26.2

Frontend & User Interfaces

Gradio, Streamlit, Chainlit, Open WebUI, and the Vercel AI SDK for building interactive LLM application frontends
★ Big Picture

The best LLM backend is useless without a good frontend. Python-native frameworks like Gradio and Streamlit let ML engineers build demos and internal tools in minutes without writing any JavaScript. Chainlit provides a purpose-built conversational interface with features like step-by-step reasoning display and file uploads. For production-grade consumer applications, the Vercel AI SDK offers React/Next.js components with built-in streaming support. This section covers when to use each framework and provides a working example of each.

1. Framework Comparison

Framework       Language     Streaming               Auth                     Best For
Gradio          Python       Yes (built-in)          Basic / OAuth            ML demos, HuggingFace Spaces
Streamlit       Python       Yes (st.write_stream)   Community / Enterprise   Data apps, dashboards
Chainlit        Python       Yes (native)            OAuth / custom           Conversational AI, agent UIs
Open WebUI      Python/JS    Yes                     Built-in multi-user      Self-hosted ChatGPT alternative
Vercel AI SDK   TypeScript   Yes (useChat hook)      Next.js auth             Production consumer apps

2. Gradio Chat Interface

import gradio as gr
from openai import OpenAI

client = OpenAI()

def chat(message, history):
    """Gradio chat handler with streaming.

    `history` arrives as (user, bot) pairs in ChatInterface's default
    tuple format; rebuild it into OpenAI-style message dicts.
    """
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    stream = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, stream=True
    )
    partial = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        partial += delta
        yield partial

demo = gr.ChatInterface(
    fn=chat,
    title="LLM Chat Demo",
    description="Streaming chat powered by GPT-4o-mini",
    examples=["Explain RAG in simple terms", "Write a haiku about ML"],
)
demo.launch(share=True)  # share=True exposes a temporary public URL via Gradio's tunnel
[Figure: three architecture patterns. (1) Python monolith — Gradio/Streamlit UI and LLM logic in one process; best for demos, prototypes, internal tools. (2) Chat-first — Chainlit UI with conversation and step display over LangChain/LlamaIndex agent logic; best for agent UIs and step-by-step reasoning. (3) Decoupled — Next.js + AI SDK React frontend with API routes in front of a FastAPI model-serving backend; best for production consumer apps.]
Figure 26.2.1: Three common frontend architecture patterns, from monolithic Python demos to decoupled production stacks.
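The history-conversion loop inside the Gradio handler above is worth factoring into a standalone, unit-testable helper. A minimal sketch (the name history_to_messages is ours, not part of Gradio's API):

```python
def history_to_messages(message, history, system="You are a helpful assistant."):
    """Convert Gradio tuple-format history into OpenAI-style message dicts.

    `history` is a list of (user, bot) pairs, as ChatInterface passes
    by default; `message` is the newest user turn, appended last.
    """
    messages = [{"role": "system", "content": system}]
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    return messages
```

Keeping the conversion separate from the streaming logic makes it trivial to test the prompt assembly without touching the API client.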

3. Streamlit Chat Application

import streamlit as st
from openai import OpenAI

st.title("Streamlit LLM Chat")
client = OpenAI()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display existing messages
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle new input
if prompt := st.chat_input("Ask anything..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=st.session_state.messages,
            stream=True,
        )
        response = st.write_stream(
            chunk.choices[0].delta.content or ""
            for chunk in stream
        )
    st.session_state.messages.append({"role": "assistant", "content": response})

4. Chainlit for Conversational AI

import chainlit as cl
from openai import AsyncOpenAI

client = AsyncOpenAI()

@cl.on_chat_start
async def start():
    cl.user_session.set("history", [])
    await cl.Message(content="Hello! How can I help you today?").send()

@cl.on_message
async def on_message(message: cl.Message):
    history = cl.user_session.get("history")
    history.append({"role": "user", "content": message.content})

    msg = cl.Message(content="")
    await msg.send()

    stream = await client.chat.completions.create(
        model="gpt-4o-mini", messages=history, stream=True
    )

    full_response = ""
    async for chunk in stream:
        token = chunk.choices[0].delta.content or ""
        full_response += token
        await msg.stream_token(token)

    await msg.update()
    history.append({"role": "assistant", "content": full_response})
    cl.user_session.set("history", history)
📝 Note

Chainlit excels at displaying multi-step agent reasoning. Its @cl.step decorator lets you show intermediate tool calls, retrieval results, and thinking processes as collapsible steps in the chat UI, which is invaluable for debugging and user transparency.

5. Vercel AI SDK with Next.js

// app/api/chat/route.ts (Next.js API route)
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o-mini"),
    system: "You are a helpful assistant.",
    messages,
  });

  return result.toDataStreamResponse();
}

// app/page.tsx (React component)
"use client";
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
[Figure: Open WebUI self-hosted architecture. Browser (multi-user UI) → Open WebUI (Python + SvelteKit) → Ollama (local) or any OpenAI-compatible API, with SQLite/PostgreSQL storing chat history, users, and RAG data.]
Figure 26.2.2: Open WebUI provides a self-hosted, multi-user chat interface that connects to Ollama or any OpenAI-compatible API.
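Every backend Open WebUI can talk to — Ollama's /v1 endpoint, vLLM, or a hosted API — accepts the same OpenAI-style chat payload. A stdlib-only sketch of that request shape (the base URL and model name are assumptions for a default local Ollama install):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:11434/v1",  # Ollama's OpenAI-compatible base URL
        "llama3.2",                   # assumes this model has been pulled locally
        [{"role": "user", "content": "Hello!"}],
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the payload is identical across backends, swapping Ollama for a hosted provider is a one-line base-URL change.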
⚠ Warning

Streamlit reruns the entire script on every interaction. For LLM applications, this means you must store chat history in st.session_state and guard expensive operations (model loading, API client initialization) with caching decorators like @st.cache_resource. Failing to do so causes repeated model loads and lost conversation context.

★ Key Insight

Choose your frontend framework based on your audience. Gradio and Streamlit are optimal for internal tools, demos, and ML team workflows. Chainlit is the best choice for agent-heavy applications where you need to show reasoning steps. For external, consumer-facing products with custom branding and complex UX, use the Vercel AI SDK with Next.js to get full control over the interface.

Knowledge Check

1. What is the main advantage of Gradio's ChatInterface over building a custom chat UI?

Show Answer
Gradio's ChatInterface handles conversation history management, streaming display, retry/undo buttons, and example prompts out of the box. It also provides automatic API endpoints and one-click public sharing via Gradio's tunneling, eliminating the need for any frontend code.

2. Why must chat history be stored in st.session_state in Streamlit?

Show Answer
Streamlit reruns the entire Python script from top to bottom on every user interaction. Without session_state, all local variables (including chat history) would be reset on each rerun, causing the conversation to be lost after every message.

3. What does Chainlit's @cl.step decorator provide that other frameworks lack?

Show Answer
The @cl.step decorator creates collapsible, nested step displays in the chat UI that show intermediate agent reasoning, tool calls, retrieval results, and processing stages. This is essential for debugging complex agent workflows and providing transparency to users about how the system reached its answer.

4. How does the Vercel AI SDK's useChat hook simplify streaming chat implementation?

Show Answer
The useChat hook manages the entire client-side chat lifecycle: it maintains message state, handles form submission, parses streaming responses from the server, updates the UI incrementally as tokens arrive, and provides loading/error states. This eliminates the need to manually implement SSE parsing, state management, and streaming display logic.

5. When would you choose Open WebUI over building a custom frontend?

Show Answer
Open WebUI is the best choice when you need a self-hosted, multi-user chat interface that supports multiple model backends (Ollama, OpenAI-compatible APIs), built-in RAG with document uploads, user management, and conversation history. It provides a ChatGPT-like experience without any custom development, making it ideal for teams that want to deploy open-source models internally.

Key Takeaways