Research Improvement Roadmap

1. Purpose

This document tracks the highest-value research improvements for StableSteering as a study platform.

It focuses on:

It does not focus on core engineering execution. That belongs in:

2. Current Research Baseline

The current system already supports:

This is enough for exploratory pilot work, but not yet enough to support strong claims about algorithm quality, usability, or scientific validity.

3. Main Research Gaps

The largest current gaps are:

4. Priority Levels

5. R0: Research Validity Priorities

5.1 Establish a baseline comparison matrix

Why it matters:

Implementation notes:

Success signal:

5.2 Add explicit study protocols

Why it matters:

Implementation notes:

Success signal:

5.3 Improve confound logging

Why it matters:

Implementation notes:

Success signal:

5.4 Define research success criteria

Why it matters:

Implementation notes:

Success signal:

6. R1: Better Measurement and Analysis

6.1 Add stronger outcome metrics

Why it matters:

Implementation notes:

Success signal:

6.2 Build analysis-ready exports

Why it matters:

Implementation notes:

Success signal:

6.3 Add notebook-based analysis templates

Why it matters:

Implementation notes:

Success signal:

6.4 Strengthen replay as a research asset

Why it matters:

Implementation notes:

Success signal:

7. R1: Better Human Interaction Research

7.1 Move beyond rating-only interaction

Why it matters:

Implementation notes:

Success signal:

7.2 Evaluate user consistency and fatigue

Why it matters:

Implementation notes:

Success signal:

7.3 Study interface bias

Why it matters:

Implementation notes:

Success signal:

7.4 Study richer elicitation modes and UI patterns

Why it matters:

Implementation notes:

Success signal:

8. R1: Synthetic Data Research Direction

8.1 Build realistic synthetic steering trajectories toward an anchor

Why it matters:

Implementation notes:

Success signal:

8.2 Build diversity-oriented synthetic sampling around one or more steered locations

Why it matters:

Implementation notes:

Success signal:

8.3 Use synthetic data to pretrain and stress-test steering algorithms

Why it matters:

Implementation notes:

Success signal:

8.4 Treat synthetic-user realism itself as a research problem

Why it matters:

Implementation notes:

Success signal:

8.5 Extend steering research to richer diffusion workflows

Why it matters:

Implementation notes:

Success signal:

9. R2: Strategy Research Expansions

9.1 Study steering-dimension selection methods

Why it matters:

Implementation notes:

Success signal:

9.2 Add richer steering representations

Why it matters:

Implementation notes:

Success signal:

9.3 Add stronger samplers

Why it matters:

Implementation notes:

Success signal:

9.4 Add stronger preference and reward models

Why it matters:

Implementation notes:

Success signal:

9.5 Add stronger updaters

Why it matters:

Implementation notes:

Success signal:

10. Study Program Milestones

Milestone R-A: Pilot Validity

Milestone R-B: Reliable Measurement

Milestone R-C: Comparative Research

11. Suggested Execution Order

  1. define the baseline comparison matrix
  2. define pilot protocols and prompt/task sets
  3. improve confound logging
  4. define explicit research success criteria
  5. build analysis-ready exports
  6. create notebook-based analysis templates
  7. strengthen replay as a research asset
  8. compare feedback modalities with real users
  9. study richer elicitation modes and UI patterns
  10. evaluate consistency, fatigue, and interface bias
  11. define anchor-seeking synthetic-user tasks
  12. define diversity-seeking synthetic-user tasks
  13. build synthetic stress-test corpora
  14. evaluate synthetic-user realism
  15. extend studies to image-prompt, inpainting, and ControlNet workflows
  16. compare steering-dimension selection methods
  17. compare richer representations, samplers, preference models, and updaters

12. Summary

The next research phase should shift from “can the system run?” to “can the system support credible conclusions?”

That means focusing on: