Module 26 · Section 26.2

Frontend & User Interfaces

Gradio, Streamlit, Chainlit, Open WebUI, and the Vercel AI SDK for building interactive LLM application frontends
★ Big Picture

The best LLM backend is useless without a good frontend. Python-native frameworks like Gradio and Streamlit let ML engineers build demos and internal tools in minutes without writing any JavaScript. Chainlit provides a purpose-built conversational interface with features like step-by-step reasoning display and file uploads. For production-grade consumer applications, the Vercel AI SDK offers React/Next.js components with built-in streaming support. This section covers when to use each framework and provides a working example of each.

1. Framework Comparison

Framework       Language     Streaming               Auth                     Best For
Gradio          Python       Yes (built-in)          Basic / OAuth            ML demos, HuggingFace Spaces
Streamlit       Python       Yes (st.write_stream)   Community / Enterprise   Data apps, dashboards
Chainlit        Python       Yes (native)            OAuth / custom           Conversational AI, agent UIs
Open WebUI      Python/JS    Yes                     Built-in multi-user      Self-hosted ChatGPT alternative
Vercel AI SDK   TypeScript   Yes (useChat hook)      Next.js auth             Production consumer apps

2. Gradio Chat Interface

import gradio as gr
from openai import OpenAI

client = OpenAI()

def chat(message, history):
    """Gradio chat handler with streaming.

    `history` arrives as (user, bot) pairs in ChatInterface's default
    tuple format; rebuild it into OpenAI-style message dicts.
    """
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    stream = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, stream=True
    )
    partial = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        partial += delta
        yield partial

demo = gr.ChatInterface(
    fn=chat,
    title="LLM Chat Demo",
    description="Streaming chat powered by GPT-4o-mini",
    examples=["Explain RAG in simple terms", "Write a haiku about ML"],
)
demo.launch(share=True)  # share=True exposes a temporary public URL via Gradio's tunnel
[Figure: three architecture patterns. (1) Python monolith — Gradio/Streamlit UI and LLM logic in one process; best for demos, prototypes, internal tools. (2) Chat-first — Chainlit UI with conversation and step display over LangChain/LlamaIndex agent logic; best for agent UIs and step-by-step reasoning. (3) Decoupled — Next.js + AI SDK React frontend with API routes in front of a FastAPI model-serving backend; best for production consumer apps.]
Figure 26.2.1: Three common frontend architecture patterns, from monolithic Python demos to decoupled production stacks.
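The history-conversion loop inside the Gradio handler above is worth factoring into a standalone, unit-testable helper. A minimal sketch (the name history_to_messages is ours, not part of Gradio's API):

```python
def history_to_messages(message, history, system="You are a helpful assistant."):
    """Convert Gradio tuple-format history into OpenAI-style message dicts.

    `history` is a list of (user, bot) pairs, as ChatInterface passes
    by default; `message` is the newest user turn, appended last.
    """
    messages = [{"role": "system", "content": system}]
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    return messages
```

Keeping the conversion separate from the streaming logic makes it trivial to test the prompt assembly without touching the API client.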

3. Streamlit Chat Application

import streamlit as st
from openai import OpenAI

st.title("Streamlit LLM Chat")
client = OpenAI()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display existing messages
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle new input
if prompt := st.chat_input("Ask anything..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=st.session_state.messages,
            stream=True,
        )
        response = st.write_stream(
            chunk.choices[0].delta.content or ""
            for chunk in stream
        )
    st.session_state.messages.append({"role": "assistant", "content": response})

4. Chainlit for Conversational AI

import chainlit as cl
from openai import AsyncOpenAI

client = AsyncOpenAI()

@cl.on_chat_start
async def start():
    cl.user_session.set("history", [])
    await cl.Message(content="Hello! How can I help you today?").send()

@cl.on_message
async def on_message(message: cl.Message):
    history = cl.user_session.get("history")
    history.append({"role": "user", "content": message.content})

    msg = cl.Message(content="")
    await msg.send()

    stream = await client.chat.completions.create(
        model="gpt-4o-mini", messages=history, stream=True
    )

    full_response = ""
    async for chunk in stream:
        token = chunk.choices[0].delta.content or ""
        full_response += token
        await msg.stream_token(token)

    await msg.update()
    history.append({"role": "assistant", "content": full_response})
    cl.user_session.set("history", history)
📝 Note

Chainlit excels at displaying multi-step agent reasoning. Its @cl.step decorator lets you show intermediate tool calls, retrieval results, and thinking processes as collapsible steps in the chat UI, which is invaluable for debugging and user transparency.

5. Vercel AI SDK with Next.js

// app/api/chat/route.ts (Next.js API route)
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o-mini"),
    system: "You are a helpful assistant.",
    messages,
  });

  return result.toDataStreamResponse();
}

// app/page.tsx (React component)
"use client";
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
[Figure: Open WebUI self-hosted architecture. Browser (multi-user UI) → Open WebUI (Python + SvelteKit) → Ollama (local) or any OpenAI-compatible API, with SQLite/PostgreSQL storing chat history, users, and RAG data.]
Figure 26.2.2: Open WebUI provides a self-hosted, multi-user chat interface that connects to Ollama or any OpenAI-compatible API.
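Every backend Open WebUI can talk to — Ollama's /v1 endpoint, vLLM, or a hosted API — accepts the same OpenAI-style chat payload. A stdlib-only sketch of that request shape (the base URL and model name are assumptions for a default local Ollama install):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:11434/v1",  # Ollama's OpenAI-compatible base URL
        "llama3.2",                   # assumes this model has been pulled locally
        [{"role": "user", "content": "Hello!"}],
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the payload is identical across backends, swapping Ollama for a hosted provider is a one-line base-URL change.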
⚠ Warning

Streamlit reruns the entire script on every interaction. For LLM applications, this means you must store chat history in st.session_state and guard expensive operations (model loading, API client initialization) with caching decorators like @st.cache_resource. Failing to do so causes repeated model loads and lost conversation context.

★ Key Insight

Choose your frontend framework based on your audience. Gradio and Streamlit are optimal for internal tools, demos, and ML team workflows. Chainlit is the best choice for agent-heavy applications where you need to show reasoning steps. For external, consumer-facing products with custom branding and complex UX, use the Vercel AI SDK with Next.js to get full control over the interface.

Knowledge Check

1. What is the main advantage of Gradio's ChatInterface over building a custom chat UI?

Show Answer
Gradio's ChatInterface handles conversation history management, streaming display, retry/undo buttons, and example prompts out of the box. It also provides automatic API endpoints and one-click public sharing via Gradio's tunneling, eliminating the need for any frontend code.

2. Why must chat history be stored in st.session_state in Streamlit?

Show Answer
Streamlit reruns the entire Python script from top to bottom on every user interaction. Without session_state, all local variables (including chat history) would be reset on each rerun, causing the conversation to be lost after every message.

3. What does Chainlit's @cl.step decorator provide that other frameworks lack?

Show Answer
The @cl.step decorator creates collapsible, nested step displays in the chat UI that show intermediate agent reasoning, tool calls, retrieval results, and processing stages. This is essential for debugging complex agent workflows and providing transparency to users about how the system reached its answer.

4. How does the Vercel AI SDK's useChat hook simplify streaming chat implementation?

Show Answer
The useChat hook manages the entire client-side chat lifecycle: it maintains message state, handles form submission, parses streaming responses from the server, updates the UI incrementally as tokens arrive, and provides loading/error states. This eliminates the need to manually implement SSE parsing, state management, and streaming display logic.

5. When would you choose Open WebUI over building a custom frontend?

Show Answer
Open WebUI is the best choice when you need a self-hosted, multi-user chat interface that supports multiple model backends (Ollama, OpenAI-compatible APIs), built-in RAG with document uploads, user management, and conversation history. It provides a ChatGPT-like experience without any custom development, making it ideal for teams that want to deploy open-source models internally.

Key Takeaways