Documentation Audit Ledger
This file records the top 30 improvements identified and applied for each core document.
2026-03-26 Sync Update
This documentation set was re-audited and synchronized with the current implementation after the following system changes:
- GPU-only real Diffusers runtime became the default app path
- mock generation was restricted to explicit test harnesses
- backend and frontend tracing were added and persisted under data/traces/
- per-session HTML trace reports were added under data/traces/sessions/<session_id>/report.html
- async round generation and feedback submission were exposed through job endpoints with visible progress
- a real GPU-backed end-to-end example bundle was added under output/examples/real_e2e_example_run/
- a student-oriented tutorial was added to bridge motivation, theory, and implementation
- the roadmap set was expanded to include image-prompt, inpainting, and ControlNet steering directions
- lifecycle guards were added for duplicate feedback and premature next-round generation
- browser coverage was expanded with headed debug support and replay export API smoke coverage
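The lifecycle guards mentioned above (duplicate feedback, premature next-round generation) could look roughly like the sketch below. It is a minimal illustration and not the project's actual code: the names SessionGuard, LifecycleError, start_next_round, and submit_feedback are hypothetical, and the rule that each round accepts exactly one feedback submission is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


class LifecycleError(Exception):
    """Raised when a request violates the assumed round lifecycle."""


@dataclass
class SessionGuard:
    # Hypothetical in-memory stand-in for persisted session state.
    current_round: int = -1  # -1 means no round has been generated yet
    feedback_received: Dict[int, bool] = field(default_factory=dict)

    def start_next_round(self) -> int:
        # Guard: refuse a new round while the current one has no feedback.
        if self.current_round >= 0 and not self.feedback_received[self.current_round]:
            raise LifecycleError("feedback for the current round is still pending")
        self.current_round += 1
        self.feedback_received[self.current_round] = False
        return self.current_round

    def submit_feedback(self, round_index: int) -> None:
        # Guard: feedback must target the round that is currently open.
        if round_index not in self.feedback_received or round_index != self.current_round:
            raise LifecycleError("feedback must target the current round")
        # Guard: reject a second submission for the same round.
        if self.feedback_received[round_index]:
            raise LifecycleError("feedback for this round was already submitted")
        self.feedback_received[round_index] = True
```

In this sketch both guards fail loudly with an exception, so an API layer could map LifecycleError to a client-error response instead of silently ignoring the request.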
The most heavily updated documents in this sync were:
- README.md
- quick_start.md
- user_guide.md
- developer_guide.md
- faq.md
- student_tutorial.md
- pre_implementation_blueprint.md
- system_specification.md
- system_test_specification.md
1. Motivation: Top 30 Improvements Applied
- Added a document-role section so readers know when to use this file.
- Added links to the related documents for navigation across the spec set.
- Reframed the introduction around the research problem rather than only system purpose.
- Made the core problem statement more explicit and concrete.
- Expanded the list of generation variables that create steering instability.
- Clarified that the issue is not only prompt sensitivity but also user-control mismatch.
- Added a crisp central research claim.
- Separated the central claim from the iterative loop description.
- Clarified that the goal is to study controllability, not only image quality.
- Added a section explaining why the project matters beyond curiosity.
- Split value into research value and practical value.
- Expanded the research questions into a more useful study agenda.
- Added a question about fatigue and inconsistency across rounds.
- Added a question about interface bias and interaction design.
- Added a section explaining why current interfaces are inadequate for research.
- Strengthened the rationale for exact reproducibility and replay.
- Added intended outcomes so the document points toward deliverables.
- Expanded the goals section to include replay and comparative analysis.
- Tightened non-goals to reduce scope ambiguity.
- Renamed the experimental matrix to "first comparison grid" to position it as a starting point rather than a complete design.
- Clarified that the matrix is intentionally manageable for early research.
- Added overfitting to one workflow as an explicit confound.
- Reworded risk statements so they are testable rather than purely descriptive.
- Added a requirement to log confounds instead of merely acknowledging them.
- Added explicit success criteria for deciding whether the project is worth continuing.
- Improved section ordering from problem to claim to value to goals to risks.
- Made wording more decisive and less repetitive.
- Improved cross-document consistency with the other spec files.
- Reduced ambiguity around the research purpose of the platform.
- Strengthened the summary so it reflects the document's main claim.
2. Theoretical Background: Top 30 Improvements Applied
- Added a document-role section to define the purpose of the theory doc.
- Added links back to the motivation and system docs.
- Clarified that the document is scoped to the minimum theory needed for design.
- Simplified the diffusion overview without losing technical meaning.
- Made the consequence of embedding-based conditioning more explicit.
- Added a dedicated section on why prompt rewriting is hard.
- Clarified the discrete-text versus continuous-control mismatch.
- Added explicit mention that prompt rewriting is still useful but limited.
- Expanded the embedding discussion beyond full-tensor control.
- Added the notion of tradeoffs among steering representations.
- Reframed low-dimensional steering as a controllable search space.
- Added a section explaining why local search is a reasonable framing.
- Clarified that the system is not solving global optimization.
- Connected low-dimensional search to interpretability and replay.
- Strengthened the preference-learning framing.
- Expanded the list of feedback types to match later system design.
- Clarified that the latent reward is noisy and only partially observed.
- Tightened the explanation of exploration versus exploitation.
- Linked the exploration problem directly to real human attention constraints.
- Expanded the seed-sensitivity explanation into an identification problem.
- Made seed-control implications explicit for system design.
- Added a stronger explanation of trust regions and anchoring.
- Added a comparison section for multiple representation and update choices.
- Connected theory choices to concrete engineering consequences.
- Added a section on the limits of the theory so the document is not overstated.
- Named entanglement and instability as theoretical limits.
- Improved continuity between sections by making each one motivate the next.
- Increased consistency with the terminology used in the system spec.
- Improved the summary so it restates the practical theoretical justification.
- Reduced the chance that readers interpret the theory as a claim of guaranteed smoothness.
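As an illustration of the trust-region and anchoring idea listed above, the sketch below clips a proposed low-dimensional steering vector so it stays within a fixed distance of an anchor point. This is one possible reading and not the theory document's definition: the function name clip_to_trust_region, the Euclidean norm, and the use of the incumbent as the anchor are all assumptions.

```python
import math
from typing import List, Sequence


def clip_to_trust_region(
    anchor: Sequence[float],
    proposal: Sequence[float],
    radius: float,
) -> List[float]:
    """Return the proposal, pulled back toward the anchor if it lies
    farther than `radius` away (Euclidean distance, an assumption)."""
    if len(anchor) != len(proposal):
        raise ValueError("anchor and proposal must have the same dimension")
    if radius < 0:
        raise ValueError("radius must be non-negative")

    step = [p - a for a, p in zip(anchor, proposal)]
    norm = math.sqrt(sum(s * s for s in step))
    if norm <= radius:
        # Already inside the trust region: accept the proposal unchanged.
        return list(proposal)

    # Outside: keep the direction of the step but shrink its length to `radius`.
    scale = radius / norm
    return [a + scale * s for a, s in zip(anchor, step)]
```

Clipping keeps the direction suggested by the update while bounding how far a single round can move, which is consistent with the local-search framing rather than global optimization.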
3. System Specification: Top 30 Improvements Applied
- Added a document-role section to clarify that this is the main functional contract.
- Added links to related documents for navigation and alignment.
- Added an explicit scope section stating what the document does and does not cover.
- Added a short system-goals section before architecture details.
- Added a canonical user workflow to anchor the rest of the spec.
- Added core system invariants that implementation must preserve.
- Strengthened experiment fields to include steering and control parameters.
- Strengthened session fields with incumbent reference and status.
- Strengthened round fields with render status and update summary.
- Strengthened candidate fields with render status and metadata expectations.
- Strengthened feedback-event fields with normalized payload requirements.
- Added lifecycle states for experiments.
- Added lifecycle states for sessions.
- Added lifecycle states for candidate rendering.
- Expanded frontend requirements to include failure behavior.
- Clarified required dashboard actions and elements.
- Clarified session-setup inputs and actions.
- Clarified interactive-page behavior and stable candidate ordering.
- Tightened accessibility requirements with focus visibility and hover independence.
- Expanded backend modules to include storage-layer responsibilities.
- Improved the data model with updated_at, status, and normalized payload fields.
- Added API conventions in addition to endpoint lists.
- Added GET /experiments/{experiment_id} for complete experiment retrieval.
- Clarified response requirements for write endpoints.
- Tightened the steering-representation section with a default equation explanation.
- Separated required versus optional samplers and updaters more clearly.
- Tightened the unified feedback schema language.
- Added operational constraints for the v1 environment.
- Updated the suggested project structure to point at system_specification.md.
- Strengthened the summary around architectural priorities and research use.
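To make the candidate lifecycle and the status and updated_at fields more concrete, here is a minimal sketch of a record whose status may only change along allowed transitions. The state names (pending, rendered, failed), the transition table, and the Candidate class are placeholders chosen for illustration; the actual states are defined in system_specification.md and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RenderStatus(Enum):
    # Placeholder state names, assumed for this sketch only.
    PENDING = "pending"
    RENDERED = "rendered"
    FAILED = "failed"


# Assumed transition table: a pending candidate either renders or fails,
# and both outcomes are terminal.
ALLOWED_TRANSITIONS = {
    RenderStatus.PENDING: {RenderStatus.RENDERED, RenderStatus.FAILED},
    RenderStatus.RENDERED: set(),
    RenderStatus.FAILED: set(),
}


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Candidate:
    candidate_id: str
    status: RenderStatus = RenderStatus.PENDING
    updated_at: datetime = field(default_factory=utc_now)

    def transition(self, new_status: RenderStatus) -> None:
        # Reject any change that the transition table does not allow.
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        # Refresh the timestamp on every successful state change.
        self.updated_at = utc_now()
```

Keeping the transition table in one place means a round with partial success can be summarized by counting candidates per status, without inspecting render output.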
4. System Test Specification: Top 30 Improvements Applied
- Added a document-role section explaining why tests are part of the research method.
- Added links to the implementation-facing specs.
- Added explicit test objectives before listing categories.
- Added a test-environment strategy to reduce unnecessary dependence on real generation.
- Clarified the distinction between logic, service, and end-to-end tests.
- Expanded steering unit tests with invalid-dimension failures.
- Expanded sampler unit tests with role-tag verification.
- Expanded feedback tests with critique preservation.
- Expanded feedback tests with skip and uncertain actions.
- Expanded updater tests with trust-region checks.
- Expanded seed-policy tests with missing-metadata failure handling.
- Added persistence and schema unit-test coverage.
- Strengthened the generation integration test to cover partial success.
- Strengthened the replay integration test to include round-order stability.
- Split the sampler-swap and updater-swap checks into separate cases in the plug-in tests.
- Added API contract integration tests.
- Expanded end-to-end coverage to require at least two feedback modes.
- Expanded end-to-end coverage to include replay opening.
- Added recoverable-error display checks to end-to-end tests.
- Strengthened deterministic replay checks with round summaries.
- Added a separate regression-test section.
- Added edge-case prompt regression coverage.
- Added edge-case feedback-payload regression coverage.
- Added replay-bug regression coverage.
- Added an explicit failure-mode test section.
- Added export-failure testing.
- Added database interruption and resume testing.
- Expanded fixtures with schema snapshots.
- Strengthened acceptance criteria with failure-mode coverage.
- Added test-reporting expectations so failures are easier to interpret.
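The deterministic replay and round-order checks listed above can be exercised without real generation. The sketch below uses a hash-based stand-in for the renderer; mock_generate, run_round, and replay_matches are hypothetical names, and treating a round as an ordered list of per-seed outputs is an assumption about how replay is compared.

```python
import hashlib
from typing import List, Sequence


def mock_generate(prompt: str, seed: int) -> str:
    """Stand-in for image generation: returns a digest instead of pixels,
    so equal (prompt, seed) pairs always produce equal output."""
    return hashlib.sha256(f"{prompt}|{seed}".encode("utf-8")).hexdigest()


def run_round(prompt: str, seeds: Sequence[int]) -> List[str]:
    # Candidate order follows seed order, so ordering is part of the result.
    return [mock_generate(prompt, seed) for seed in seeds]


def replay_matches(original: Sequence[str], prompt: str, seeds: Sequence[int]) -> bool:
    # Replay is considered faithful only if every candidate matches in order.
    return run_round(prompt, seeds) == list(original)
```

A replay with the same prompt and seeds matches, while reordering the seeds does not, which is the property an order-stability test would assert.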
5. Pre-Implementation Blueprint: Top 30 Improvements Applied
- Added a document-role section to frame this as an implementation handoff.
- Added links to the related research and test docs.
- Added implementation principles before scope details.
- Reframed v1 scope as a concrete engineering boundary.
- Tightened out-of-scope items to reduce future drift.
- Added a clear assumptions section to lock environment defaults.
- Added a requirement for mock generation during testing.
- Reworked open decisions into decisions that should be fixed before coding.
- Kept default-model choice explicit and actionable.
- Kept default-basis choice explicit and actionable.
- Kept default-feedback choice explicit and actionable.
- Kept default-updater choice explicit and actionable.
- Clarified frontend responsibilities versus non-responsibilities.
- Clarified backend responsibilities versus non-responsibilities.
- Clarified storage responsibilities and exclusions.
- Strengthened the session contract before implementation.
- Strengthened the candidate contract before implementation.
- Strengthened the feedback contract before implementation.
- Strengthened the replay contract before implementation.
- Renamed implementation order to delivery order for clearer project planning.
- Tightened minimal API decisions as pre-coding agreements.
- Expanded non-functional requirements for reproducibility.
- Expanded non-functional requirements for debuggability.
- Expanded non-functional requirements for modularity.
- Reworked risk sections into explicit risk-and-mitigation pairs.
- Added a clearer definition of implementation readiness.
- Added delivery milestones to make the blueprint easier to execute.
- Improved consistency of terminology with the system spec.
- Reduced ambiguity around what must be decided before coding starts.
- Strengthened the summary so the document reads as an actual handoff artifact.