Student Tutorial

Who This Is For

This tutorial is for students who want to understand StableSteering as both a runnable application and a research system.

You do not need to know every detail of diffusion models before reading this. The goal is to build intuition first, then connect that intuition to the actual implementation in this repository.

Learning Goal

By the end of this tutorial, you should understand:

  1. why one-shot prompting is limiting and how interactive steering addresses it
  2. how the steering loop turns user feedback into better generation rounds
  3. how the implementation is organized into frontend, API, orchestration, generation, and storage layers
  4. how to read the code and where to go next

1. Motivation

Text-to-image systems are powerful, but one-shot prompting is often frustrating in practice. A user may know the direction they want, but not the exact words needed to get there in a single prompt.

Typical problems include under-specified prompts, large output changes from small wording tweaks, and slow trial-and-error prompt editing.

StableSteering addresses this by turning image generation into an interactive process. Instead of asking the user to solve everything with one perfect prompt, the system:

  1. starts from the user's text prompt
  2. proposes several nearby candidate directions
  3. lets the user express preferences
  4. updates the steering state
  5. generates a better next round

This makes the system useful for creative exploration, product ideation, concept design, and research on human-in-the-loop generation.

For the fuller research framing, see motivation.md.

2. Background Intuition

At a high level, a diffusion image model turns text conditioning into image samples. StableSteering explores the idea that we can gently move the conditioning representation in a direction that better matches user preferences.

You can think of the system like this: the model proposes, the user reacts, and the conditioning shifts a little in response, round after round.

This does not require the user to manually edit embeddings. The user only interacts through images, ratings, or preference choices.
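The "gentle move" described above can be sketched on a toy vector. Note that this is only an illustration: the real system operates on text-encoder embeddings inside the diffusion pipeline, and the function name, vector sizes, and update rule here are invented for clarity.

```python
# Toy sketch of nudging a conditioning vector toward a preferred direction.
# The real conditioning is a high-dimensional text-encoder embedding; this
# 3-d example only illustrates the idea of a small additive step.

def steer(embedding, direction, alpha=0.5):
    """Move `embedding` a small step of size `alpha` along `direction`."""
    return [e + alpha * d for e, d in zip(embedding, direction)]

base = [0.0, 1.0, 0.5]              # current conditioning (toy values)
towards_preference = [1.0, 0.0, -1.0]  # a direction the user seems to like

steered = steer(base, towards_preference)  # [0.5, 1.0, 0.0]
```

The key point is that the user never computes `direction` by hand; the system infers it from ratings and preference choices.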

The main theoretical ideas behind the project are steering in the conditioning (embedding) space and learning a direction of improvement from preference feedback.

Illustration of the steering loop

This Gemini-generated illustration is meant to build intuition for the interaction cycle: start from a prompt, propose candidates, observe user preference, update the steering state, and repeat.

For the fuller theory discussion, see theoretical_background.md.

3. System Concept

StableSteering is best understood as a loop:

  1. The user enters a text prompt.
  2. The system creates a session and initializes a steering state.
  3. The backend generates a round of candidate images.
  4. The user gives feedback.
  5. The backend updates the steering state.
  6. The next round is generated from that updated state.

This process continues until the user is satisfied or the experiment ends.
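The six steps above can be condensed into a small sketch. Everything here is illustrative: the class name, fields, and update rule are invented, and the real session logic lives in the orchestrator.

```python
from dataclasses import dataclass, field

# Minimal sketch of the session loop: create a session, generate a round,
# apply feedback, repeat. Names are illustrative, not the real API.

@dataclass
class SteeringSession:
    prompt: str
    round_index: int = 0
    state: list = field(default_factory=lambda: [0.0, 0.0])

    def generate_round(self, n_candidates=4):
        # In the real system this calls the generation backend on GPU;
        # here we return placeholder candidate descriptors.
        self.round_index += 1
        return [f"round{self.round_index}-candidate{i}" for i in range(n_candidates)]

    def apply_feedback(self, preferred_direction, step=0.5):
        # Pull the steering state toward the direction the user preferred.
        self.state = [s + step * d for s, d in zip(self.state, preferred_direction)]

session = SteeringSession(prompt="an explorer motorcycle")
candidates = session.generate_round()
session.apply_feedback([1.0, 0.0])
```

Each pass through `generate_round` and `apply_feedback` corresponds to one round of the loop.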

What Makes It Different

The important shift is that the user does not need to produce the perfect prompt upfront. Instead, the user can express preference through interaction.

That gives the system two forms of information: the explicit intent stated in the original prompt, and the preferences the user reveals through interaction.

Together, these allow the session to move toward a better result over time.

Illustration of the system architecture

This illustration gives a high-level picture of the runtime path from prompt entry to generation, feedback, storage, and reporting. Treat it as a conceptual map rather than a strict engineering diagram.

4. Main Parts Of The Implementation

The implementation has a few core layers.

Frontend

The frontend is the user-facing part of the app. It lets the user enter a prompt, view each round of candidate images, and submit feedback.

Relevant files:

API Layer

The FastAPI app exposes the system as HTTP routes. It handles session creation, round generation requests, feedback submission, and job status queries.

Relevant file: main.py
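To make the HTTP contract concrete, here is a hypothetical sketch of the kind of handlers such an API layer exposes, written as plain functions so the request and response shapes are easy to see. The route paths in the comments and all field names are invented, not the project's real API.

```python
import uuid

# Hypothetical sketch of the API layer's contract. In the real app these
# would be FastAPI route handlers; here they are plain functions.

SESSIONS = {}

def create_session(prompt):
    # e.g. POST /sessions -> returns an id the frontend uses afterwards
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = {"prompt": prompt, "rounds": []}
    return {"session_id": session_id}

def submit_feedback(session_id, chosen_candidate):
    # e.g. POST /sessions/{id}/feedback -> records the user's preference
    SESSIONS[session_id]["rounds"].append({"chosen": chosen_candidate})
    return {"status": "accepted"}

resp = create_session("a cinematic motorcycle photo")
ack = submit_feedback(resp["session_id"], "candidate-2")
```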

Orchestration Layer

The orchestrator coordinates the session lifecycle. It is the heart of the application logic.

It decides how to initialize a session's steering state, schedule generation rounds, apply feedback updates, and persist results.

Relevant file: orchestrator.py

Generation Layer

The generation layer connects the system to the image model. In the real runtime, this uses a Diffusers-backed Stable Diffusion pipeline on GPU.

The project also contains a mock backend, but that is reserved strictly for testing.

Relevant file: generation.py
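A common way to structure this split between a real GPU backend and a test-only mock is a shared interface. The sketch below shows that seam; the class and method names are illustrative guesses, not the project's actual classes.

```python
from abc import ABC, abstractmethod

# Sketch of the generation-layer seam: the Diffusers-backed backend and
# the test-only mock would both satisfy one interface.

class GenerationBackend(ABC):
    @abstractmethod
    def generate(self, prompt, n_images):
        """Return n_images artifacts for the given prompt."""

class MockBackend(GenerationBackend):
    # Deterministic stand-in so tests never touch a GPU or download weights.
    def generate(self, prompt, n_images):
        return [f"mock-image-{i}-for:{prompt}" for i in range(n_images)]

backend = MockBackend()
images = backend.generate("a motorcycle", 2)
```

The orchestrator only depends on the interface, so swapping the mock for the real pipeline changes no session logic.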

Storage Layer

The storage layer persists experiments, sessions, rounds, and related state. The current implementation uses SQLite for structured records, while artifacts and traces are stored on disk.

Relevant file: repository.py
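The SQLite side of this can be sketched with the standard library. The table and column names below are invented for illustration; the real schema in the repository layer will differ.

```python
import sqlite3

# Minimal sketch of structured session records in SQLite. Table and
# column names are illustrative, not the project's real schema.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rounds (
        session_id TEXT,
        round_index INTEGER,
        feedback TEXT
    )
""")
conn.execute(
    "INSERT INTO rounds VALUES (?, ?, ?)",
    ("s1", 1, "liked candidate 2"),
)
conn.commit()

rows = conn.execute(
    "SELECT round_index, feedback FROM rounds WHERE session_id = ?", ("s1",)
).fetchall()
```

Large artifacts (images, traces) stay on disk; only the structured records go through SQL.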

5. A Typical User Session

Here is the normal user flow.

Step 1: Start From A Prompt

The user begins with a text description such as:

A premium cinematic product hero photo of an expedition-ready electric explorer motorcycle...

This is important because the system is prompt-first. The prompt defines the initial semantic goal.

Step 2: Generate Candidate Images

The system produces multiple candidates for the current round. These candidates are not uncontrolled random samples; they are drawn from the neighborhood of the current steering state.
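"Sampling around the steering state" can be pictured as adding small random perturbations to the state, one per candidate. This is a toy illustration; the function name, noise scale, and distribution are all assumptions, not the project's actual sampler.

```python
import random

# Toy sketch: each candidate is the steering state plus a small random
# perturbation, so candidates cluster around the current state.

def propose_candidates(state, n=4, scale=0.2, seed=0):
    rng = random.Random(seed)  # seeded so a round is reproducible
    return [[s + rng.gauss(0.0, scale) for s in state] for _ in range(n)]

candidates = propose_candidates([0.5, -0.5], n=3)
```

A smaller `scale` keeps candidates close to what the user already liked; a larger one explores more aggressively.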

Step 3: User Feedback

The user reviews the proposals and gives feedback, for example by selecting a preferred image, rating candidates, or choosing between alternatives.

Step 4: Steering Update

The backend converts that feedback into an updated internal state. That state influences the next generation round.
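A simple way to picture this conversion is an update that pulls the steering state a fraction of the way toward the representation of the chosen candidate. This is a toy rule for intuition, not the project's actual update.

```python
# Toy feedback-to-state update: move the state a fraction eta of the way
# toward the chosen candidate's (illustrative) representation.

def update_state(state, chosen, eta=0.5):
    return [s + eta * (c - s) for s, c in zip(state, chosen)]

state = [0.0, 0.0]
state = update_state(state, chosen=[1.0, -1.0])  # [0.5, -0.5]
```

Repeated over rounds, updates like this accumulate into a direction that reflects the user's preferences.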

Step 5: New Round

The next round should better reflect the user's preferences. Over multiple rounds, the session ideally becomes more aligned, more coherent, and more useful.

Session lifecycle diagram

6. Why Async Jobs Matter

Real image generation can take time, especially on a GPU pipeline. Because of that, StableSteering uses async jobs for round generation and feedback application.

This matters for user experience because the interface stays responsive while generation runs: the user can poll job status and keep interacting instead of waiting on a blocked request.

This makes the system feel like a real interactive tool instead of a blocking demo script.
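The submit-then-poll pattern can be sketched with threads. The real backend uses async HTTP endpoints rather than raw threads, and all names here are illustrative.

```python
import threading
import time
import uuid

# Sketch of the async-job pattern: submitting returns a job id at once,
# and the caller polls status while the slow work runs in the background.

JOBS = {}

def submit_generation(prompt):
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "result": None}

    def work():
        time.sleep(0.01)  # stands in for slow GPU generation
        JOBS[job_id] = {"status": "done", "result": f"images-for:{prompt}"}

    threading.Thread(target=work).start()
    return job_id  # returned immediately, before the work finishes

job = submit_generation("a motorcycle")
for _ in range(1000):  # the frontend would poll a status endpoint instead
    if JOBS[job]["status"] == "done":
        break
    time.sleep(0.005)
```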

For more on the actual endpoints and job flow, see developer_guide.md and user_guide.md.

7. Why Traces And HTML Reports Matter

StableSteering is not only an app. It is also a research system.

That means we care about reproducibility, inspectability, and an honest record of how each session evolved.

The backend saves readable per-session traces and generates an HTML report. This is valuable for debugging, qualitative analysis of sessions, and sharing results.

Illustration of a session trace report

This illustration matches the idea of the saved HTML report: one place to inspect proposed images, user actions, feedback, and how the session evolved over time.

You can see this in the generated example bundles under output/examples/.
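The idea of a trace plus a report can be sketched in a few lines: record each round as structured data, then render it once as HTML. The real report format in the example bundles is richer; the field names below are invented.

```python
import json

# Toy sketch: a per-session trace as JSON, rendered into a minimal
# HTML table. Field names are illustrative, not the real trace schema.

trace = {
    "prompt": "a motorcycle",
    "rounds": [
        {"round": 1, "chosen": "candidate-2"},
        {"round": 2, "chosen": "candidate-0"},
    ],
}

trace_json = json.dumps(trace, indent=2)  # what would be saved to disk

rows = "".join(
    f"<tr><td>{r['round']}</td><td>{r['chosen']}</td></tr>"
    for r in trace["rounds"]
)
html = f"<html><body><h1>{trace['prompt']}</h1><table>{rows}</table></body></html>"
```

Because the trace is plain structured data, the same record can drive debugging, the HTML report, and later quantitative analysis.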

8. How To Study The Code

If you are learning the system for the first time, this is a good reading order:

  1. Read motivation.md.
  2. Read theoretical_background.md.
  3. Read system_specification.md.
  4. Read quick_start.md.
  5. Read user_guide.md.
  6. Read developer_guide.md.
  7. Open main.py.
  8. Open orchestrator.py.
  9. Open generation.py.
  10. Open repository.py.

9. Suggested Exercises

If you are using this project to learn, try these exercises:

Concept Exercises

Implementation Exercises

Research Exercises

10. Big Picture

StableSteering sits at the intersection of diffusion-based image generation, human-in-the-loop interaction, and research tooling.

That makes it a good teaching project. It is concrete enough to run, inspect, and modify, but rich enough to support meaningful research questions.

Next Documents

After this tutorial, the most useful next reads are quick_start.md, user_guide.md, and developer_guide.md.