Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00


Requirements Document

Introduction

This feature produces a sanitized version of the existing 6-page intelligence pipeline deep-dive documentation (docs/intelligence-pipeline-deep-dive/) for use in a work presentation. The sanitized version strips all financial, market, and trading language, including stock tickers, buy/sell/hold actions, portfolio allocation, broker APIs, and domain-specific framing, and reframes the content as a general-purpose AI decision intelligence pipeline. The sanitized docs are stored as a separate doc group under docs/sanitized-pipeline-deep-dive/, leaving the original documents untouched. All engineering depth (algorithms, formulas, architectural patterns, queue topologies, database schemas, code module references, and Mermaid diagrams) is preserved; only the domain-specific framing changes.

Glossary

  • Source_Docs: The original 6-page documentation set at docs/intelligence-pipeline-deep-dive/, including index.md, pages 01 through 06, and the diagrams/ subdirectory containing 6 Mermaid diagram files.
  • Sanitized_Docs: The output documentation set at docs/sanitized-pipeline-deep-dive/, mirroring the structure of Source_Docs with all financial/market/trading language replaced by domain-neutral equivalents.
  • Sanitization_Engine: The process (manual or automated) that transforms Source_Docs into Sanitized_Docs by applying the terminology mapping and content reframing rules defined in this document.
  • Terminology_Map: The defined set of financial/market/trading terms and their domain-neutral replacements used by the Sanitization_Engine.
  • Entity_Identifier: The domain-neutral replacement for stock ticker symbols (e.g., AAPL, TSLA) in Sanitized_Docs.
  • Decision_Term: A domain-neutral action term (act, defer, monitor, observe) that replaces trading actions (buy, sell, hold, watch) in Sanitized_Docs.
  • Decision_Execution_Engine: The domain-neutral name for the trading engine in Sanitized_Docs.
  • Execution_Adapter: The domain-neutral name for broker adapters and broker API references in Sanitized_Docs.
  • Allocation_Pool: The domain-neutral name for portfolio references in Sanitized_Docs.
  • Commitment_Sizing: The domain-neutral name for position sizing in Sanitized_Docs.

Requirements

Requirement 1: Separate Output Directory

User Story: As a presenter, I want the sanitized docs stored in a separate directory from the originals, so that the original documentation remains untouched and both versions coexist.

Acceptance Criteria

  1. THE Sanitization_Engine SHALL write all output files to docs/sanitized-pipeline-deep-dive/.
  2. THE Sanitization_Engine SHALL NOT modify, overwrite, or delete any file under docs/intelligence-pipeline-deep-dive/.
  3. THE Sanitized_Docs SHALL contain an index.md file at the root of docs/sanitized-pipeline-deep-dive/.
  4. THE Sanitized_Docs SHALL contain a diagrams/ subdirectory under docs/sanitized-pipeline-deep-dive/.
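
One way to verify these criteria is to hash the source tree before and after a sanitization run and to check the output layout. The following is a minimal sketch, not part of the spec; the function names `snapshot` and `check_output_layout` are hypothetical.

```python
import hashlib
from pathlib import Path

# Directory names are taken from the acceptance criteria above.
SOURCE_DIR = Path("docs/intelligence-pipeline-deep-dive")
OUTPUT_DIR = Path("docs/sanitized-pipeline-deep-dive")


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_output_layout(output_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means criteria 3 and 4 hold."""
    problems = []
    if not (output_dir / "index.md").is_file():
        problems.append("missing index.md")
    if not (output_dir / "diagrams").is_dir():
        problems.append("missing diagrams/ subdirectory")
    return problems
```

A run would call `snapshot(SOURCE_DIR)` before and after sanitization and compare the two dictionaries; any difference means a source file was modified, added, or deleted.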

Requirement 2: Mirror the 6-Page Structure

User Story: As a presenter, I want the sanitized docs to mirror the same 6-page structure as the originals, so that readers familiar with the original can navigate the sanitized version identically.

Acceptance Criteria

  1. THE Sanitized_Docs SHALL contain exactly 6 numbered page files matching the naming pattern of Source_Docs: 01-*.md through 06-*.md.
  2. THE Sanitized_Docs SHALL contain an index.md with a table of contents linking to all 6 pages and all diagrams, mirroring the structure of the Source_Docs index.
  3. THE Sanitized_Docs SHALL contain one Mermaid diagram file in diagrams/ for each diagram file present in docs/intelligence-pipeline-deep-dive/diagrams/.
  4. WHEN a Source_Docs page contains internal cross-references to other pages or diagrams, THE Sanitized_Docs equivalent page SHALL contain corresponding cross-references pointing to the Sanitized_Docs versions of those pages and diagrams.
  5. THE Sanitized_Docs page filenames SHALL use sanitized titles (e.g., 06-decision-execution.md instead of 06-trading-decisions-and-execution.md).
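
The structural criteria above lend themselves to an automated check. The sketch below assumes diagram files use the .md extension (as the example filenames in this document suggest) and uses a hypothetical helper named `check_structure`; it does not validate cross-references or index links.

```python
import re
from pathlib import Path

# Any two-digit numbered page, so that a stray 07-*.md is also detected.
PAGE_PATTERN = re.compile(r"^(\d{2})-.+\.md$")
EXPECTED_NUMBERS = ["01", "02", "03", "04", "05", "06"]


def numbered_pages(doc_root: Path) -> list[str]:
    """Return the sorted filenames of numbered pages directly under doc_root."""
    return sorted(p.name for p in doc_root.glob("*.md") if PAGE_PATTERN.match(p.name))


def check_structure(source_root: Path, sanitized_root: Path) -> list[str]:
    """Return a list of structural problems; an empty list means the checks passed."""
    problems = []
    numbers = [name[:2] for name in numbered_pages(sanitized_root)]
    if numbers != EXPECTED_NUMBERS:
        problems.append(f"expected pages 01 through 06, found {numbers}")
    if not (sanitized_root / "index.md").is_file():
        problems.append("missing index.md")
    source_count = len(list((source_root / "diagrams").glob("*.md")))
    sanitized_count = len(list((sanitized_root / "diagrams").glob("*.md")))
    if source_count != sanitized_count:
        problems.append(f"diagram count differs: {source_count} vs {sanitized_count}")
    return problems
```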

Requirement 3: Strip Financial and Trading Terminology

User Story: As a presenter, I want all financial, market, and trading language removed from the sanitized docs, so that the presentation focuses on engineering without revealing the financial domain.

Acceptance Criteria

  1. THE Sanitized_Docs SHALL NOT contain any stock ticker symbols (e.g., AAPL, TSLA, NVDA, XOM, META).
  2. THE Sanitized_Docs SHALL NOT contain the trading action terms "buy", "sell", "hold", or "watch" when used as system action labels or decision outputs.
  3. THE Sanitized_Docs SHALL NOT contain the terms "trading engine", "paper trading", "live trading", "paper_eligible", or "live_eligible".
  4. THE Sanitized_Docs SHALL NOT contain the terms "portfolio", "portfolio allocation", "portfolio heat", or "portfolio snapshots" when referring to the resource management domain concept.
  5. THE Sanitized_Docs SHALL NOT contain references to "broker", "Alpaca", "broker adapter", or "broker API".
  6. THE Sanitized_Docs SHALL NOT contain the terms "stock market", "Wall Street", "bullish", "bearish", "position sizing" (as a financial concept label), or "stop-loss" (as a financial concept label).
  7. THE Sanitized_Docs SHALL NOT contain company names used as financial examples (e.g., "Apple", "Tesla", "NVIDIA" when used in a stock/market context).
  8. THE Sanitized_Docs SHALL NOT contain the terms "SEC EDGAR", "SEC filings", "10-K", "10-Q", "8-K", "earnings", "earnings call", or "earnings report" as domain-specific financial references.
  9. THE Sanitized_Docs SHALL NOT contain references to "Polygon.io" or "Polygon" as a financial data provider name.
  10. THE Sanitized_Docs SHALL NOT contain the term "Stonks Oracle" or "stonks" as a system name.
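
A simple scanner can flag most of these terms in the output. The sketch below is illustrative only: the term lists are copied from the criteria above, the function name `find_violations` is hypothetical, and context-dependent terms (the action labels "buy", "sell", "hold", "watch", and company names) are deliberately left out because a plain text match cannot tell an action label from ordinary prose.

```python
import re

# Matched as case-insensitive substrings, so "broker" also catches
# identifiers such as AlpacaBrokerAdapter.
FORBIDDEN_TERMS = [
    "trading engine", "paper trading", "live trading",
    "paper_eligible", "live_eligible", "portfolio", "broker", "Alpaca",
    "stock market", "Wall Street", "bullish", "bearish",
    "position sizing", "stop-loss", "SEC EDGAR", "SEC filings",
    "10-K", "10-Q", "8-K", "earnings", "Polygon", "stonks",
]

# Matched case-sensitively on word boundaries, so "Metadata" does not trip META.
TICKERS = ["AAPL", "TSLA", "NVDA", "XOM", "META"]


def find_violations(text: str) -> list[tuple[int, str]]:
    """Return (line number, term) pairs for every forbidden term found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for term in FORBIDDEN_TERMS:
            if term.lower() in lowered:
                hits.append((lineno, term))
        for ticker in TICKERS:
            if re.search(rf"\b{ticker}\b", line):
                hits.append((lineno, ticker))
    return hits
```

Substring matching errs on the side of false positives, which suits a review aid; a reviewer still has to judge the context-dependent terms by hand.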

Requirement 4: Apply Domain-Neutral Terminology Mapping

User Story: As a presenter, I want consistent domain-neutral replacements for all stripped terms, so that the sanitized docs read coherently as a general-purpose AI decision intelligence pipeline.

Acceptance Criteria

  1. WHEN the Source_Docs use "stock ticker" or specific ticker symbols, THE Sanitized_Docs SHALL use "entity identifier" or "tracked entity".
  2. WHEN the Source_Docs use "buy/sell/hold/watch" as action labels, THE Sanitized_Docs SHALL use "act/defer/monitor/observe" or equivalent neutral decision terms.
  3. WHEN the Source_Docs use "trading engine", THE Sanitized_Docs SHALL use "decision execution engine" or "action engine".
  4. WHEN the Source_Docs use "portfolio", THE Sanitized_Docs SHALL use "resource pool" or "allocation pool".
  5. WHEN the Source_Docs use "broker" or "Alpaca", THE Sanitized_Docs SHALL use "execution adapter" or "external execution API".
  6. WHEN the Source_Docs use "paper trading", THE Sanitized_Docs SHALL use "simulation mode" or "dry-run mode".
  7. WHEN the Source_Docs use "live trading", THE Sanitized_Docs SHALL use "live execution mode" or "production mode".
  8. WHEN the Source_Docs use "bullish" or "bearish", THE Sanitized_Docs SHALL use "positive" or "negative" (or "favorable"/"unfavorable").
  9. WHEN the Source_Docs use "position sizing", THE Sanitized_Docs SHALL use "resource allocation" or "commitment sizing".
  10. WHEN the Source_Docs use "stop-loss", THE Sanitized_Docs SHALL use "risk threshold" or "loss limit".
  11. WHEN the Source_Docs use "Stonks Oracle" or "stonks", THE Sanitized_Docs SHALL use a neutral system name such as "the platform" or "the system".
  12. WHEN the Source_Docs use "SEC EDGAR" or "SEC filings", THE Sanitized_Docs SHALL use "regulatory filings source" or "public records API".
  13. WHEN the Source_Docs use "Polygon.io" or "Polygon", THE Sanitized_Docs SHALL use "external data provider" or "data source API".
  14. WHEN the Source_Docs use "earnings" as a catalyst type or event, THE Sanitized_Docs SHALL use "performance report" or "periodic disclosure".
  15. THE Sanitized_Docs SHALL apply the Terminology_Map consistently across all 6 pages, the index, and all diagram files.
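
As a rough illustration of consistent application, the Terminology_Map can be held in one dictionary and applied in a single regex pass. This is a sketch under assumptions: where the criteria allow two replacements, one is picked arbitrarily; the action labels (buy/sell/hold/watch) are omitted because they need context; and replacements do not restore capitalization. The names `TERMINOLOGY_MAP` and `sanitize` are hypothetical.

```python
import re

# Keys are lowercase; each value is one of the replacements allowed above.
TERMINOLOGY_MAP = {
    "stock ticker": "entity identifier",
    "trading engine": "decision execution engine",
    "paper trading": "simulation mode",
    "live trading": "live execution mode",
    "portfolio": "allocation pool",
    "broker": "execution adapter",
    "alpaca": "execution adapter",
    "bullish": "positive",
    "bearish": "negative",
    "position sizing": "commitment sizing",
    "stop-loss": "risk threshold",
    "stonks oracle": "the platform",
    "sec edgar": "regulatory filings source",
    "sec filings": "regulatory filings source",
    "polygon.io": "external data provider",
    "polygon": "external data provider",
    "earnings": "performance report",
}

# Longest terms first so "polygon.io" wins over "polygon". Word boundaries
# keep identifiers such as portfolio_snapshots out of this prose-level pass.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(TERMINOLOGY_MAP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def sanitize(text: str) -> str:
    """Replace every mapped term in one pass, so replacements are never re-scanned."""
    return _PATTERN.sub(lambda m: TERMINOLOGY_MAP[m.group(0).lower()], text)
```

For example, `sanitize("The trading engine flags a bullish signal.")` yields "The decision execution engine flags a positive signal."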

Requirement 5: Preserve Engineering and Technical Depth

User Story: As a presenter, I want all engineering concepts, algorithms, formulas, and architectural details preserved, so that the sanitized docs demonstrate the technical sophistication of the system.

Acceptance Criteria

  1. THE Sanitized_Docs SHALL preserve all references to Redis queue patterns, including queue names and rpush/lpop/blpop operations.
  2. THE Sanitized_Docs SHALL preserve all references to PostgreSQL tables, including table names and column descriptions.
  3. THE Sanitized_Docs SHALL preserve all references to MinIO buckets and storage patterns.
  4. THE Sanitized_Docs SHALL preserve all references to Ollama as the LLM inference provider.
  5. THE Sanitized_Docs SHALL preserve the composite signal scoring formula: combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier.
  6. THE Sanitized_Docs SHALL preserve the confidence computation formula with log₂ scaling and its four components (unique source count, average extraction credibility, signal agreement with sample-size dampening, contradiction penalty).
  7. THE Sanitized_Docs SHALL preserve the weighted sentiment average formula: weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score).
  8. THE Sanitized_Docs SHALL preserve all code module path references (e.g., services/aggregation/scoring.py, services/recommendation/eligibility.py).
  9. THE Sanitized_Docs SHALL preserve the three-layer signal architecture, renaming the layers with domain-neutral labels (e.g., "Entity-Specific Signals", "Environmental Signals", "Relational Signals") while retaining the weight ratios (1.0, 0.3, 0.2).
  10. THE Sanitized_Docs SHALL preserve all threshold values, configuration parameters, and numeric constants (e.g., confidence gate of 0.2, recency half-lives per window, eligibility thresholds).
  11. THE Sanitized_Docs SHALL preserve all Markdown table structures containing technical parameters and thresholds.
  12. THE Sanitized_Docs SHALL preserve the contradiction detection algorithm, evidence ranking methodology, and trend projection computation.
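
For reference, the two formulas quoted in criteria 5 and 7 can be written directly as code. This is a minimal sketch of the formulas as stated here, not the contents of services/aggregation/scoring.py; how each factor (gate, recency, credibility, and so on) is computed is outside this document, so they are plain inputs, and the zero-denominator fallback is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    combined_weight: float
    impact_score: float
    sentiment_value: float


def combined_weight(
    gate: float,
    recency: float,
    credibility: float,
    novelty_bonus: float,
    market_context_multiplier: float,
) -> float:
    """combined = gate x recency x credibility x (1 + novelty_bonus) x market_context_multiplier"""
    return gate * recency * credibility * (1 + novelty_bonus) * market_context_multiplier


def weighted_sentiment_average(signals: list[Signal]) -> float:
    """Sum(w x impact x sentiment) / Sum(w x impact) over all signals."""
    denominator = sum(s.combined_weight * s.impact_score for s in signals)
    if denominator == 0:
        return 0.0  # assumption: no effective weight means a neutral average
    numerator = sum(
        s.combined_weight * s.impact_score * s.sentiment_value for s in signals
    )
    return numerator / denominator
```

Because every factor in the combined weight is multiplicative, a gate of 0 removes a signal from both the numerator and the denominator of the average.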

Requirement 6: Sanitize Mermaid Diagrams

User Story: As a presenter, I want the Mermaid diagrams sanitized with the same terminology mapping as the narrative pages, so that diagrams and text are consistent.

Acceptance Criteria

  1. THE Sanitized_Docs SHALL contain one sanitized Mermaid diagram file for each of the 6 diagram files in Source_Docs.
  2. WHEN a Source_Docs diagram contains financial/trading terminology (e.g., "trading engine", "buy/sell", "paper_eligible", "bullish/bearish", ticker symbols), THE corresponding Sanitized_Docs diagram SHALL use the same domain-neutral replacements defined in the Terminology_Map.
  3. THE Sanitized_Docs diagrams SHALL preserve all Mermaid syntax, node relationships, subgraph structures, and flow directions from the Source_Docs diagrams.
  4. THE Sanitized_Docs diagrams SHALL preserve all code module path references and service names within diagram nodes.
  5. THE Sanitized_Docs diagram filenames SHALL use sanitized names where the original names contain financial terms (e.g., decision-engine-loop.md instead of trading-engine-decision-loop.md).
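
Criterion 3 can be spot-checked by comparing a coarse structural fingerprint of each diagram before and after sanitization. The sketch below is a heuristic, not a Mermaid parser: it counts a few common flowchart arrow forms and subgraph blocks, and a node label that happens to contain an arrow-like string would skew the count. The name `structure_fingerprint` is hypothetical.

```python
import re

# Common flowchart link forms: -->, ---, -.->, ==>
_EDGE = re.compile(r"-->|---|-\.->|==>")


def structure_fingerprint(mermaid: str) -> dict[str, int]:
    """Count edges, subgraph openings, and 'end' lines in Mermaid source text."""
    lines = [line.strip() for line in mermaid.splitlines()]
    return {
        "edges": sum(len(_EDGE.findall(line)) for line in lines),
        "subgraphs": sum(1 for line in lines if line.startswith("subgraph ")),
        "ends": sum(1 for line in lines if line == "end"),
    }
```

If the fingerprints of a source diagram and its sanitized counterpart differ, an edge or subgraph was probably lost or added during relabeling; equal fingerprints do not prove the graphs are identical.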

Requirement 7: Sanitize Redis Key and Queue Name References

User Story: As a presenter, I want Redis key patterns and queue names sanitized where they contain financial terms, so that even infrastructure-level references are domain-neutral.

Acceptance Criteria

  1. WHEN a Source_Docs Redis queue name contains "stonks" (e.g., stonks:queue:ingestion), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix (e.g., app:queue:ingestion).
  2. WHEN a Source_Docs Redis key pattern contains a trading-specific segment such as "trading" or "broker" (e.g., stonks:queue:broker_orders, stonks:trading:circuit_breaker:*), THE Sanitized_Docs SHALL replace that segment with a neutral equivalent (e.g., app:queue:execution_orders, app:execution:circuit_breaker:*).
  3. THE Sanitized_Docs SHALL apply Redis key sanitization consistently across all narrative pages and diagram files.
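
Because Redis keys here are colon-delimited, the replacement can work segment by segment. The sketch below covers only the three example keys given above; the segment table `SEGMENT_MAP` and the function name are hypothetical and would need extending for any other key in Source_Docs.

```python
# Segment-level replacements derived from the examples in the criteria above.
SEGMENT_MAP = {
    "stonks": "app",
    "trading": "execution",
    "broker_orders": "execution_orders",
}


def sanitize_redis_key(key: str) -> str:
    """Replace each colon-delimited segment that has a neutral equivalent."""
    return ":".join(SEGMENT_MAP.get(segment, segment) for segment in key.split(":"))
```

Matching whole segments rather than substrings leaves engineering terms such as circuit_breaker untouched.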

Requirement 8: Sanitize MinIO Bucket Name References

User Story: As a presenter, I want MinIO bucket names sanitized where they contain financial terms, so that storage references are domain-neutral.

Acceptance Criteria

  1. WHEN a Source_Docs MinIO bucket name contains "stonks" (e.g., stonks-raw-market, stonks-raw-news, stonks-normalized), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix and replace any remaining domain-revealing segment with a neutral equivalent (e.g., app-raw-data, app-raw-content, app-normalized).
  2. THE Sanitized_Docs SHALL apply MinIO bucket name sanitization consistently across all narrative pages and diagram files.
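
The example names above change more than the prefix (stonks-raw-market becomes app-raw-data), so an explicit lookup is safer than a blanket prefix swap. The following sketch rewrites bucket names wherever they appear in page text; the prefix-only fallback for buckets not listed in this document is an assumption, and `sanitize_buckets` is a hypothetical name.

```python
import re

# Explicit renames taken from the examples in the criteria above.
BUCKET_MAP = {
    "stonks-raw-market": "app-raw-data",
    "stonks-raw-news": "app-raw-content",
    "stonks-normalized": "app-normalized",
}

# A bucket name: "stonks" followed by one or more hyphenated lowercase segments.
_BUCKET_RE = re.compile(r"\bstonks(?:-[a-z0-9]+)+")


def _rename(bucket: str) -> str:
    if bucket in BUCKET_MAP:
        return BUCKET_MAP[bucket]
    # Assumption: unlisted buckets only need the prefix replaced.
    return "app-" + bucket[len("stonks-"):]


def sanitize_buckets(text: str) -> str:
    """Rewrite every stonks-* bucket name found in the text."""
    return _BUCKET_RE.sub(lambda m: _rename(m.group(0)), text)
```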

Requirement 9: Sanitize Database Table and Column References Where Needed

User Story: As a presenter, I want database table and column names that contain obvious financial terms sanitized, so that schema references are domain-neutral while the overall schema structure is preserved.

Acceptance Criteria

  1. WHEN a Source_Docs database table name contains "trading" (e.g., trading_decisions), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., execution_decisions).
  2. WHEN a Source_Docs database table or column references "portfolio" (e.g., portfolio_snapshots, portfolio_pct), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., pool_snapshots, allocation_pct).
  3. THE Sanitized_Docs SHALL preserve all other database table names that do not contain financial-specific terms (e.g., documents, document_intelligence, trend_windows, recommendations).
  4. THE Sanitized_Docs SHALL apply database reference sanitization consistently across all narrative pages.
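
Table and column names are whole identifiers, so they can be renamed by exact match without disturbing neighbors such as trend_windows. The sketch below uses only the three renames given as examples above; `IDENTIFIER_MAP` and `sanitize_identifiers` are hypothetical names.

```python
import re

# Renames taken from the examples in the criteria above.
IDENTIFIER_MAP = {
    "trading_decisions": "execution_decisions",
    "portfolio_snapshots": "pool_snapshots",
    "portfolio_pct": "allocation_pct",
}

# Word boundaries treat "_" as part of the identifier, so only exact names match.
_IDENTIFIER_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in IDENTIFIER_MAP) + r")\b"
)


def sanitize_identifiers(text: str) -> str:
    """Rename mapped table and column identifiers; leave everything else as-is."""
    return _IDENTIFIER_RE.sub(lambda m: IDENTIFIER_MAP[m.group(1)], text)
```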

Requirement 10: Sanitize Example Scenarios and Inline References

User Story: As a presenter, I want all inline examples, scenario walkthroughs, and narrative references sanitized, so that no financial context leaks through illustrative content.

Acceptance Criteria

  1. WHEN a Source_Docs page uses a specific company name or ticker in an example scenario (e.g., "a bearish article about AAPL"), THE Sanitized_Docs SHALL replace the reference with a generic entity (e.g., "a negative-sentiment article about Entity-A").
  2. WHEN a Source_Docs page describes a financial event as an example (e.g., "earnings miss", "tariff announcement affecting XOM"), THE Sanitized_Docs SHALL reframe the example using domain-neutral language (e.g., "a negative performance disclosure", "a regulatory policy change affecting Entity-B").
  3. WHEN a Source_Docs page references market-specific concepts in narrative flow (e.g., "markets move fast", "trading volume", "intraday swings"), THE Sanitized_Docs SHALL reframe using neutral language (e.g., "conditions change rapidly", "activity volume", "short-term fluctuations").
  4. THE Sanitized_Docs SHALL preserve the logical structure and teaching purpose of all example scenarios while removing the financial framing.

Requirement 11: Preserve Acceptable Engineering Terms

User Story: As a presenter, I want general engineering terms that happen to overlap with financial language preserved when they describe engineering patterns, so that the technical accuracy is maintained.

Acceptance Criteria

  1. THE Sanitized_Docs SHALL preserve the term "circuit breaker" when it describes the engineering safety pattern (rate limiting, cascading failure prevention).
  2. THE Sanitized_Docs SHALL preserve the term "exponential backoff" and all retry/backoff patterns.
  3. THE Sanitized_Docs SHALL preserve all adapter pattern references (the software design pattern), renaming only the domain-specific adapter names (e.g., "AlpacaBrokerAdapter" becomes a neutral name).
  4. THE Sanitized_Docs SHALL preserve the term "signal" as used in the signal processing and scoring context.
  5. THE Sanitized_Docs SHALL preserve the terms "trend", "sentiment", "confidence", "contradiction", and "evidence" as used in the data analysis context.

Requirement 12: Reframe the System Narrative

User Story: As a presenter, I want the overall system narrative reframed as a general-purpose AI decision intelligence pipeline, so that the presentation tells a coherent story without financial context.

Acceptance Criteria

  1. THE Sanitized_Docs index page SHALL describe the system as an "AI-driven intelligence-to-decision pipeline" that ingests data from multiple sources, extracts structured intelligence via NLP/LLM, scores and weights signals, aggregates trends across time windows, generates recommendations with quality gates, and executes decisions autonomously with safety mechanisms.
  2. THE Sanitized_Docs page 01 SHALL describe data ingestion from "multiple external data sources" rather than from financial-specific APIs.
  3. THE Sanitized_Docs page 06 SHALL describe "autonomous decision execution with safety mechanisms" rather than "trading decisions and execution".
  4. WHEN the Source_Docs conclusion references the "intelligence-to-decision pipeline in Stonks Oracle", THE Sanitized_Docs conclusion SHALL reference the "intelligence-to-decision pipeline" without a financial system name.
  5. THE Sanitized_Docs SHALL maintain the narrative flow where each page ends with a transition to the next page, preserving the end-to-end story structure.