Documentation Audit Ledger

This file records the top 30 improvements identified and applied for each core document.

2026-03-26 Sync Update

This documentation set was re-audited and synchronized with the current implementation after the following system changes:

The GPU-only real Diffusers runtime became the default app path.
Mock generation was restricted to explicit test harnesses.
Backend and frontend tracing were added and persisted under data/traces/.
Per-session HTML trace reports were added under data/traces/sessions/<session_id>/report.html.
Async round generation and feedback submission were exposed through job endpoints with visible progress.
A real GPU-backed end-to-end example bundle was added under output/examples/real_e2e_example_run/.
A student-oriented tutorial was added to bridge motivation, theory, and implementation.
The roadmap set was expanded to include image-prompt, inpainting, and ControlNet steering directions.
Lifecycle guards were added for duplicate feedback and premature next-round generation.
Browser coverage was expanded with headed debug support and replay export API smoke coverage.
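
The lifecycle guards listed above can be illustrated with a minimal sketch. It assumes a simple in-memory session model; the names SessionState, RoundState, and LifecycleError are hypothetical and are not taken from the implementation, which persists this state and exposes it through job endpoints.

```python
from dataclasses import dataclass, field


class LifecycleError(Exception):
    """Raised when a request would violate the round lifecycle."""


@dataclass
class RoundState:
    # Hypothetical in-memory stand-in for a persisted round record.
    round_index: int
    feedback_submitted: bool = False


@dataclass
class SessionState:
    # Hypothetical session container; the real system stores this in its database.
    session_id: str
    rounds: list = field(default_factory=list)

    def start_next_round(self) -> RoundState:
        # Guard: the current round must have feedback before a new one begins.
        if self.rounds and not self.rounds[-1].feedback_submitted:
            raise LifecycleError("feedback for the current round is still pending")
        new_round = RoundState(round_index=len(self.rounds))
        self.rounds.append(new_round)
        return new_round

    def submit_feedback(self, round_index: int) -> None:
        if not 0 <= round_index < len(self.rounds):
            raise LifecycleError(f"unknown round {round_index}")
        current = self.rounds[round_index]
        # Guard: a round accepts feedback exactly once.
        if current.feedback_submitted:
            raise LifecycleError(f"round {round_index} already has feedback")
        current.feedback_submitted = True
```

In this sketch both guards fail loudly instead of silently ignoring the request, which is one way to keep a duplicate submission or an early round request visible in traces.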

The most heavily updated documents in this sync were:

1. Motivation: Top 30 Improvements Applied

Added a document-role section so readers know when to use this file.
Added links to the related documents for navigation across the spec set.
Reframed the introduction around the research problem rather than only system purpose.
Made the core problem statement more explicit and concrete.
Expanded the list of generation variables that create steering instability.
Clarified that the issue is not only prompt sensitivity but also user-control mismatch.
Added a crisp central research claim.
Separated the central claim from the iterative loop description.
Clarified that the goal is to study controllability, not only image quality.
Added a section explaining why the project matters beyond curiosity.
Split value into research value and practical value.
Expanded the research questions into a more useful study agenda.
Added a question about fatigue and inconsistency across rounds.
Added a question about interface bias and interaction design.
Added a section explaining why current interfaces are inadequate for research.
Strengthened the rationale for exact reproducibility and replay.
Added intended outcomes so the document points toward deliverables.
Expanded the goals section to include replay and comparative analysis.
Tightened non-goals to reduce scope ambiguity.
Renamed the experimental matrix to "first comparison grid" to position it as a starting point rather than a complete design.
Clarified that the matrix is intentionally manageable for early research.
Added overfitting to one workflow as an explicit confound.
Reworded risk statements so they are testable rather than purely descriptive.
Added a requirement to log confounds instead of merely acknowledging them.
Added explicit success criteria for deciding whether the project is worth continuing.
Improved section ordering from problem to claim to value to goals to risks.
Made wording more decisive and less repetitive.
Improved cross-document consistency with the other spec files.
Reduced ambiguity around the research purpose of the platform.
Strengthened the summary so it reflects the document's main claim.

2. Theoretical Background: Top 30 Improvements Applied

Added a document-role section to define the purpose of the theory doc.
Added links back to the motivation and system docs.
Clarified that the document is scoped to the minimum theory needed for design.
Simplified the diffusion overview without losing technical meaning.
Made the consequence of embedding-based conditioning more explicit.
Added a dedicated section on why prompt rewriting is hard.
Clarified the discrete-text versus continuous-control mismatch.
Added explicit mention that prompt rewriting is still useful but limited.
Expanded the embedding discussion beyond full-tensor control.
Added the notion of tradeoffs among steering representations.
Reframed low-dimensional steering as a controllable search space.
Added a section explaining why local search is a reasonable framing.
Clarified that the system is not solving global optimization.
Connected low-dimensional search to interpretability and replay.
Strengthened the preference-learning framing.
Expanded the list of feedback types to match later system design.
Clarified that the latent reward is noisy and only partially observed.
Tightened the explanation of exploration versus exploitation.
Linked the exploration problem directly to real human attention constraints.
Expanded the seed-sensitivity explanation into an identification problem.
Made seed-control implications explicit for system design.
Added a stronger explanation of trust regions and anchoring.
Added a comparison section for multiple representation and update choices.
Connected theory choices to concrete engineering consequences.
Added a section on the limits of the theory so the document is not overstated.
Named entanglement and instability as theoretical limits.
Improved continuity between sections by making each one motivate the next.
Increased consistency with the terminology used in the system spec.
Improved the summary so it restates the practical theoretical justification.
Reduced the chance that readers interpret the theory as a claim of guaranteed smoothness.
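
The trust-region and anchoring idea named in this list can be sketched as a projection step in a low-dimensional steering space. This is an illustration under stated assumptions, not the documented updater: the function name trust_region_step, the L2 norm, and the plain-list vector representation are all hypothetical choices.

```python
import math


def trust_region_step(anchor, proposal, radius):
    """Return `proposal`, pulled back to lie within `radius` of `anchor` (L2 norm).

    `anchor` plays the role of the incumbent steering vector and `proposal`
    is the point an updater would like to move to. Both are assumed to be
    low-dimensional coordinate lists of equal length.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if len(anchor) != len(proposal):
        raise ValueError("anchor and proposal must have the same dimension")

    delta = [p - a for a, p in zip(anchor, proposal)]
    norm = math.sqrt(sum(d * d for d in delta))

    # Inside the trust region: accept the proposal unchanged.
    if norm <= radius:
        return list(proposal)

    # Outside: keep the direction, shrink the step to the region boundary.
    scale = radius / norm
    return [a + scale * d for a, d in zip(anchor, delta)]
```

A caller would pass the incumbent as the anchor each round, so a single noisy feedback event cannot move the steering state further than the chosen radius.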

3. System Specification: Top 30 Improvements Applied

Added a document-role section to clarify that this is the main functional contract.
Added links to related documents for navigation and alignment.
Added an explicit scope section stating what the document does and does not cover.
Added a short system-goals section before architecture details.
Added a canonical user workflow to anchor the rest of the spec.
Added core system invariants that implementation must preserve.
Strengthened experiment fields to include steering and control parameters.
Strengthened session fields with incumbent reference and status.
Strengthened round fields with render status and update summary.
Strengthened candidate fields with render status and metadata expectations.
Strengthened feedback-event fields with normalized payload requirements.
Added lifecycle states for experiments.
Added lifecycle states for sessions.
Added lifecycle states for candidate rendering.
Expanded frontend requirements to include failure behavior.
Clarified required dashboard actions and elements.
Clarified session-setup inputs and actions.
Clarified interactive-page behavior and stable candidate ordering.
Tightened accessibility requirements with focus visibility and hover independence.
Expanded backend modules to include storage-layer responsibilities.
Improved the data model with updated_at, status, and normalized payload fields.
Added API conventions in addition to endpoint lists.
Added GET /experiments/{experiment_id} for complete experiment retrieval.
Clarified response requirements for write endpoints.
Tightened the steering-representation section with a default equation explanation.
Separated required versus optional samplers and updaters more clearly.
Tightened the unified feedback schema language.
Added operational constraints for the v1 environment.
Updated the suggested project structure to point at system_specification.md.
Strengthened the summary around architectural priorities and research use.
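
The normalized-payload requirement for feedback events can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the feedback kinds ("rating", "pairwise", "skip", "uncertain"), the payload keys, the 1 to 5 rating scale, and the FeedbackEvent field names are hypothetical and may differ from the schema in system_specification.md.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical feedback kinds; the actual set is defined by the system spec.
KNOWN_KINDS = {"rating", "pairwise", "skip", "uncertain"}


@dataclass(frozen=True)
class FeedbackEvent:
    round_id: str
    kind: str
    raw_payload: dict         # what the frontend sent, kept for replay
    normalized_payload: dict  # common shape consumed by updaters
    critique: Optional[str] = None


def normalize_feedback(round_id: str, kind: str, raw_payload: dict) -> FeedbackEvent:
    """Map a kind-specific payload onto one shared {"scores": {...}} shape."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unsupported feedback kind: {kind}")

    if kind == "rating":
        # Assumed input: {"candidate_id": str, "score": 1..5}; rescaled to [0, 1].
        score = raw_payload["score"]
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        scores = {raw_payload["candidate_id"]: (score - 1) / 4}
    elif kind == "pairwise":
        # Assumed input: {"winner": str, "loser": str}.
        scores = {raw_payload["winner"]: 1.0, raw_payload["loser"]: 0.0}
    else:
        # Skip and uncertain carry no preference signal but are still recorded.
        scores = {}

    return FeedbackEvent(
        round_id=round_id,
        kind=kind,
        raw_payload=dict(raw_payload),
        normalized_payload={"scores": scores},
        critique=raw_payload.get("critique"),  # free-text critique is preserved
    )
```

Keeping the raw payload next to the normalized one is a design choice in this sketch: it lets a replay re-run normalization if the mapping changes later.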

4. System Test Specification: Top 30 Improvements Applied

Added a document-role section explaining why tests are part of the research method.
Added links to the implementation-facing specs.
Added explicit test objectives before listing categories.
Added a test-environment strategy to reduce unnecessary dependence on real generation.
Clarified the distinction between logic, service, and end-to-end tests.
Expanded steering unit tests with invalid-dimension failures.
Expanded sampler unit tests with role-tag verification.
Expanded feedback tests with critique preservation.
Expanded feedback tests with skip and uncertain actions.
Expanded updater tests with trust-region checks.
Expanded seed-policy tests with missing-metadata failure handling.
Added persistence and schema unit-test coverage.
Strengthened the generation integration test to cover partial success.
Strengthened the replay integration test to include round-order stability.
Split sampler and updater swap checks explicitly in plug-in tests.
Added API contract integration tests.
Expanded end-to-end coverage to require at least two feedback modes.
Expanded end-to-end coverage to include replay opening.
Added recoverable-error display checks to end-to-end tests.
Strengthened deterministic replay checks with round summaries.
Added a separate regression-test section.
Added edge-case prompt regression coverage.
Added edge-case feedback-payload regression coverage.
Added replay-bug regression coverage.
Added an explicit failure-mode test section.
Added export-failure testing.
Added database interruption and resume testing.
Expanded fixtures with schema snapshots.
Strengthened acceptance criteria with failure-mode coverage.
Added test-reporting expectations so failures are easier to interpret.
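
The seed-policy and deterministic-replay items above can be made concrete with a short sketch. It assumes one possible policy, a seed derived by hashing session, round, and candidate identifiers; the function derive_seed and its metadata keys are hypothetical and are not the documented seed policy.

```python
import hashlib

# Hypothetical metadata keys a seed policy might require.
REQUIRED_KEYS = ("session_id", "round_index", "candidate_index")


def derive_seed(metadata: dict) -> int:
    """Derive a reproducible 32-bit seed from candidate metadata.

    Raises ValueError when required metadata is missing, so a replay
    cannot silently fall back to a random seed.
    """
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"missing seed metadata: {', '.join(missing)}")

    token = ":".join(str(metadata[key]) for key in REQUIRED_KEYS)
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Use the first four bytes as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")
```

A replay test under this assumption would recompute the seed from stored metadata and compare it with the seed recorded at generation time.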

5. Pre-Implementation Blueprint: Top 30 Improvements Applied

Added a document-role section to frame this as an implementation handoff.
Added links to the related research and test docs.
Added implementation principles before scope details.
Reframed v1 scope as a concrete engineering boundary.
Tightened out-of-scope items to reduce future drift.
Added a clear assumptions section to lock environment defaults.
Added a requirement for mock generation during testing.
Reworked open decisions into decisions that should be fixed before coding.
Kept default-model choice explicit and actionable.
Kept default-basis choice explicit and actionable.
Kept default-feedback choice explicit and actionable.
Kept default-updater choice explicit and actionable.
Clarified frontend responsibilities versus non-responsibilities.
Clarified backend responsibilities versus non-responsibilities.
Clarified storage responsibilities and exclusions.
Strengthened the session contract before implementation.
Strengthened the candidate contract before implementation.
Strengthened the feedback contract before implementation.
Strengthened the replay contract before implementation.
Renamed implementation order to delivery order for clearer project planning.
Tightened minimal API decisions as pre-coding agreements.
Expanded non-functional requirements for reproducibility.
Expanded non-functional requirements for debuggability.
Expanded non-functional requirements for modularity.
Reworked risk sections into explicit risk-and-mitigation pairs.
Added a clearer definition of implementation readiness.
Added delivery milestones to make the blueprint easier to execute.
Improved consistency of terminology with the system spec.
Reduced ambiguity around what must be decided before coding starts.
Strengthened the summary so the document reads as an actual handoff artifact.