stonks-oracle/.kiro/specs/sanitized-pipeline-docs/requirements.md
Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00

# Requirements Document
## Introduction
This feature produces a sanitized version of the existing 6-page intelligence pipeline deep dive documentation (`docs/intelligence-pipeline-deep-dive/`) for use in a work presentation. The sanitized version strips all financial, market, and trading language — stock tickers, buy/sell/hold actions, portfolio allocation, broker APIs, and domain-specific framing — and reframes the content as a general-purpose AI decision intelligence pipeline. The sanitized docs are stored as a separate doc group under `docs/sanitized-pipeline-deep-dive/`, preserving the original documents untouched. All engineering depth — algorithms, formulas, architectural patterns, queue topologies, database schemas, code module references, and Mermaid diagrams — is preserved. Only the domain-specific framing changes.
## Glossary
- **Source_Docs**: The original 6-page documentation set at `docs/intelligence-pipeline-deep-dive/`, including `index.md`, pages `01` through `06`, and the `diagrams/` subdirectory containing 6 Mermaid diagram files.
- **Sanitized_Docs**: The output documentation set at `docs/sanitized-pipeline-deep-dive/`, mirroring the structure of Source_Docs with all financial/market/trading language replaced by domain-neutral equivalents.
- **Sanitization_Engine**: The process (manual or automated) that transforms Source_Docs into Sanitized_Docs by applying the terminology mapping and content reframing rules defined in this document.
- **Terminology_Map**: The defined set of financial/market/trading terms and their domain-neutral replacements used by the Sanitization_Engine.
- **Entity_Identifier**: The domain-neutral replacement for stock ticker symbols (e.g., AAPL, TSLA) in Sanitized_Docs.
- **Decision_Term**: A domain-neutral action term (act, defer, monitor, observe) that replaces trading actions (buy, sell, hold, watch) in Sanitized_Docs.
- **Decision_Execution_Engine**: The domain-neutral name for the trading engine in Sanitized_Docs.
- **Execution_Adapter**: The domain-neutral name for broker adapters and broker API references in Sanitized_Docs.
- **Allocation_Pool**: The domain-neutral name for portfolio references in Sanitized_Docs.
- **Commitment_Sizing**: The domain-neutral name for position sizing in Sanitized_Docs.
---
## Requirements
### Requirement 1: Separate Output Directory
**User Story:** As a presenter, I want the sanitized docs stored in a separate directory from the originals, so that the original documentation remains untouched and both versions coexist.
#### Acceptance Criteria
1. THE Sanitization_Engine SHALL write all output files to `docs/sanitized-pipeline-deep-dive/`.
2. THE Sanitization_Engine SHALL NOT modify, overwrite, or delete any file under `docs/intelligence-pipeline-deep-dive/`.
3. THE Sanitized_Docs SHALL contain an `index.md` file at the root of `docs/sanitized-pipeline-deep-dive/`.
4. THE Sanitized_Docs SHALL contain a `diagrams/` subdirectory under `docs/sanitized-pipeline-deep-dive/`.
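
Criterion 2 can be checked mechanically by hashing the source tree before and after a sanitization run. The following is a minimal sketch, not part of the spec; the helper names `snapshot_tree` and `assert_untouched` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def assert_untouched(before: dict[str, str], after: dict[str, str]) -> None:
    """Raise AssertionError if any file was added, removed, or changed."""
    if before != after:
        changed = sorted(
            name
            for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)
        )
        raise AssertionError(f"Source_Docs modified: {changed}")
```

A run would take one snapshot of `docs/intelligence-pipeline-deep-dive/` before sanitizing, a second one afterwards, and pass both to `assert_untouched`.
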
---
### Requirement 2: Mirror the 6-Page Structure
**User Story:** As a presenter, I want the sanitized docs to mirror the same 6-page structure as the originals, so that readers familiar with the original can navigate the sanitized version identically.
#### Acceptance Criteria
1. THE Sanitized_Docs SHALL contain exactly 6 numbered page files matching the naming pattern of Source_Docs: `01-*.md` through `06-*.md`.
2. THE Sanitized_Docs SHALL contain an `index.md` with a table of contents linking to all 6 pages and all diagrams, mirroring the structure of the Source_Docs index.
3. THE Sanitized_Docs SHALL contain one Mermaid diagram file in `diagrams/` for each diagram file present in `docs/intelligence-pipeline-deep-dive/diagrams/`.
4. WHEN a Source_Docs page contains internal cross-references to other pages or diagrams, THE Sanitized_Docs equivalent page SHALL contain corresponding cross-references pointing to the Sanitized_Docs versions of those pages and diagrams.
5. THE Sanitized_Docs page filenames SHALL use sanitized titles (e.g., `06-decision-execution.md` instead of `06-trading-decisions-and-execution.md`).
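
The structural criteria above lend themselves to an automated check. This sketch assumes that diagram files use the `.md` extension (as the filename examples in this document suggest); `check_structure` is a hypothetical helper, and it does not validate cross-references (criterion 4).

```python
import re
from pathlib import Path

# Numbered pages follow the pattern 01-*.md through 06-*.md.
PAGE_PATTERN = re.compile(r"^0[1-6]-.+\.md$")


def _count_md(directory: Path) -> int:
    """Count Markdown files directly inside a directory (0 if it is missing)."""
    return len(list(directory.glob("*.md"))) if directory.is_dir() else 0


def check_structure(source: Path, sanitized: Path) -> list[str]:
    """Return a list of structural problems; an empty list means the mirror is intact."""
    problems = []
    if not (sanitized / "index.md").is_file():
        problems.append("missing index.md")
    pages = sorted(p.name for p in sanitized.glob("*.md") if PAGE_PATTERN.match(p.name))
    if [name[:2] for name in pages] != ["01", "02", "03", "04", "05", "06"]:
        problems.append(f"expected exactly pages 01-06, found {pages}")
    src_count = _count_md(source / "diagrams")
    out_count = _count_md(sanitized / "diagrams")
    if src_count != out_count:
        problems.append(f"diagram count mismatch: source {src_count}, sanitized {out_count}")
    return problems
```
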
---
### Requirement 3: Strip Financial and Trading Terminology
**User Story:** As a presenter, I want all financial, market, and trading language removed from the sanitized docs, so that the presentation focuses on engineering without revealing the financial domain.
#### Acceptance Criteria
1. THE Sanitized_Docs SHALL NOT contain any stock ticker symbols (e.g., AAPL, TSLA, NVDA, XOM, META).
2. THE Sanitized_Docs SHALL NOT contain the trading action terms "buy", "sell", "hold", or "watch" when used as system action labels or decision outputs.
3. THE Sanitized_Docs SHALL NOT contain the terms "trading engine", "paper trading", "live trading", "paper_eligible", or "live_eligible".
4. THE Sanitized_Docs SHALL NOT contain the terms "portfolio", "portfolio allocation", "portfolio heat", or "portfolio snapshots" when referring to the resource management domain concept.
5. THE Sanitized_Docs SHALL NOT contain references to "broker", "Alpaca", "broker adapter", or "broker API".
6. THE Sanitized_Docs SHALL NOT contain the terms "stock market", "Wall Street", "bullish", "bearish", "position sizing" (as a financial concept label), or "stop-loss" (as a financial concept label).
7. THE Sanitized_Docs SHALL NOT contain company names used as financial examples (e.g., "Apple", "Tesla", "NVIDIA" when used in a stock/market context).
8. THE Sanitized_Docs SHALL NOT contain the terms "SEC EDGAR", "SEC filings", "10-K", "10-Q", "8-K", "earnings", "earnings call", or "earnings report" as domain-specific financial references.
9. THE Sanitized_Docs SHALL NOT contain references to "Polygon.io" or "Polygon" as a financial data provider name.
10. THE Sanitized_Docs SHALL NOT contain the term "Stonks Oracle" or "stonks" as a system name.
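
A leak scanner can flag most of these terms automatically. The sketch below covers only terms that are banned unconditionally, plus the ticker symbols named in criterion 1; terms that are banned only in a particular sense (such as "hold", "watch", or "portfolio") still need human review. The lists and the `find_leaks` helper are illustrative assumptions, not an exhaustive implementation.

```python
import re

# Subset of Requirement 3 terms that are banned in every context (case-insensitive).
BANNED_TERMS = [
    "trading engine", "paper trading", "live trading",
    "paper_eligible", "live_eligible",
    "broker", "Alpaca", "stock market", "Wall Street",
    "bullish", "bearish", "SEC EDGAR", "Polygon.io", "Polygon", "stonks",
]
# Example tickers from criterion 1 (case-sensitive, so prose "meta" is not flagged).
BANNED_TICKERS = ["AAPL", "TSLA", "NVDA", "XOM", "META"]


def _alternation(terms: list[str]) -> str:
    """Build a regex alternation, longest term first so it wins over its prefixes."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Underscores and colons count as separators, so `broker_orders` is still flagged.
_TERM_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:" + _alternation(BANNED_TERMS) + r")s?(?![A-Za-z0-9])",
    re.IGNORECASE,
)
_TICKER_RE = re.compile(r"\b(?:" + _alternation(BANNED_TICKERS) + r")\b")


def find_leaks(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for every banned term found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in (_TERM_RE, _TICKER_RE):
            hits.extend((lineno, m.group(0)) for m in pattern.finditer(line))
    return hits
```
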
---
### Requirement 4: Apply Domain-Neutral Terminology Mapping
**User Story:** As a presenter, I want consistent domain-neutral replacements for all stripped terms, so that the sanitized docs read coherently as a general-purpose AI decision intelligence pipeline.
#### Acceptance Criteria
1. WHEN the Source_Docs use "stock ticker" or specific ticker symbols, THE Sanitized_Docs SHALL use "entity identifier" or "tracked entity".
2. WHEN the Source_Docs use "buy/sell/hold/watch" as action labels, THE Sanitized_Docs SHALL use "act/defer/monitor/observe" or equivalent neutral decision terms.
3. WHEN the Source_Docs use "trading engine", THE Sanitized_Docs SHALL use "decision execution engine" or "action engine".
4. WHEN the Source_Docs use "portfolio", THE Sanitized_Docs SHALL use "resource pool" or "allocation pool".
5. WHEN the Source_Docs use "broker" or "Alpaca", THE Sanitized_Docs SHALL use "execution adapter" or "external execution API".
6. WHEN the Source_Docs use "paper trading", THE Sanitized_Docs SHALL use "simulation mode" or "dry-run mode".
7. WHEN the Source_Docs use "live trading", THE Sanitized_Docs SHALL use "live execution mode" or "production mode".
8. WHEN the Source_Docs use "bullish" or "bearish", THE Sanitized_Docs SHALL use "positive" or "negative" (or "favorable"/"unfavorable").
9. WHEN the Source_Docs use "position sizing", THE Sanitized_Docs SHALL use "resource allocation" or "commitment sizing".
10. WHEN the Source_Docs use "stop-loss", THE Sanitized_Docs SHALL use "risk threshold" or "loss limit".
11. WHEN the Source_Docs use "Stonks Oracle" or "stonks", THE Sanitized_Docs SHALL use a neutral system name such as "the platform" or "the system".
12. WHEN the Source_Docs use "SEC EDGAR" or "SEC filings", THE Sanitized_Docs SHALL use "regulatory filings source" or "public records API".
13. WHEN the Source_Docs use "Polygon.io" or "Polygon", THE Sanitized_Docs SHALL use "external data provider" or "data source API".
14. WHEN the Source_Docs use "earnings" as a catalyst type or event, THE Sanitized_Docs SHALL use "performance report" or "periodic disclosure".
15. THE Sanitized_Docs SHALL apply the Terminology_Map consistently across all 6 pages, the index, and all diagram files.
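
One way to apply the mapping consistently is a single-pass substitution, so that replacement text is never rescanned and rewritten a second time. The sketch below uses a subset of the mapping above and picks the first listed replacement for each term; the `sanitize` function and its capitalization rule are assumptions for illustration.

```python
import re

# Subset of the Terminology_Map, using the first replacement offered for each term.
TERMINOLOGY_MAP = {
    "stock ticker": "entity identifier",
    "trading engine": "decision execution engine",
    "paper trading": "simulation mode",
    "live trading": "live execution mode",
    "portfolio": "allocation pool",
    "broker": "execution adapter",
    "Alpaca": "execution adapter",
    "bullish": "positive",
    "bearish": "negative",
    "position sizing": "commitment sizing",
    "stop-loss": "risk threshold",
    "SEC EDGAR": "regulatory filings source",
    "Polygon.io": "external data provider",
}

_LOOKUP = {key.lower(): (key, neutral) for key, neutral in TERMINOLOGY_MAP.items()}
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(TERMINOLOGY_MAP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def _replace(match: re.Match) -> str:
    found = match.group(0)
    key, neutral = _LOOKUP[found.lower()]
    # Keep sentence-initial capitalization for common nouns; proper nouns
    # (keys written with capitals) map to the lowercase neutral term as-is.
    if key.islower() and found[0].isupper():
        return neutral[0].upper() + neutral[1:]
    return neutral


def sanitize(text: str) -> str:
    """Replace every mapped term in one pass over the text."""
    return _PATTERN.sub(_replace, text)
```

For example, `sanitize("Bullish signals route to the trading engine via the Alpaca API.")` yields `"Positive signals route to the decision execution engine via the execution adapter API."`.
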
---
### Requirement 5: Preserve Engineering and Technical Depth
**User Story:** As a presenter, I want all engineering concepts, algorithms, formulas, and architectural details preserved, so that the sanitized docs demonstrate the technical sophistication of the system.
#### Acceptance Criteria
1. THE Sanitized_Docs SHALL preserve all references to Redis queue patterns, including queue names and `rpush`/`lpop`/`blpop` operations.
2. THE Sanitized_Docs SHALL preserve all references to PostgreSQL tables, including table names and column descriptions.
3. THE Sanitized_Docs SHALL preserve all references to MinIO buckets and storage patterns.
4. THE Sanitized_Docs SHALL preserve all references to Ollama as the LLM inference provider.
5. THE Sanitized_Docs SHALL preserve the composite signal scoring formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
6. THE Sanitized_Docs SHALL preserve the confidence computation formula with log₂ scaling and its four components (unique source count, average extraction credibility, signal agreement with sample-size dampening, contradiction penalty).
7. THE Sanitized_Docs SHALL preserve the weighted sentiment average formula: `weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)`.
8. THE Sanitized_Docs SHALL preserve all code module path references (e.g., `services/aggregation/scoring.py`, `services/recommendation/eligibility.py`).
9. THE Sanitized_Docs SHALL preserve the three-layer signal architecture, renaming the layers with domain-neutral labels (e.g., "Entity-Specific Signals", "Environmental Signals", "Relational Signals") while retaining the weight ratios (1.0, 0.3, 0.2).
10. THE Sanitized_Docs SHALL preserve all threshold values, configuration parameters, and numeric constants (e.g., confidence gate of 0.2, recency half-lives per window, eligibility thresholds).
11. THE Sanitized_Docs SHALL preserve all Markdown table structures containing technical parameters and thresholds.
12. THE Sanitized_Docs SHALL preserve the contradiction detection algorithm, evidence ranking methodology, and trend projection computation.
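
The two formulas quoted in criteria 5 and 7 translate directly into code. The sketch below takes every factor as an input, because this document does not define how `gate` or `recency` are computed; returning `0.0` for an empty or zero-weight set is an assumption, not a documented behavior.

```python
def combined_weight(
    gate: float,
    recency: float,
    credibility: float,
    novelty_bonus: float,
    market_context_multiplier: float,
) -> float:
    """combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier"""
    return gate * recency * credibility * (1 + novelty_bonus) * market_context_multiplier


def weighted_sentiment(signals: list[tuple[float, float, float]]) -> float:
    """Weighted average over (combined_weight, impact_score, sentiment_value) triples."""
    numerator = sum(weight * impact * sentiment for weight, impact, sentiment in signals)
    denominator = sum(weight * impact for weight, impact, _ in signals)
    # Assumption: treat an empty or zero-weight signal set as neutral.
    return numerator / denominator if denominator else 0.0
```

With two signals `(1.0, 2.0, 1.0)` and `(1.0, 1.0, -0.5)`, the numerator is `2.0 - 0.5 = 1.5` and the denominator is `3.0`, giving a weighted average of `0.5`.
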
---
### Requirement 6: Sanitize Mermaid Diagrams
**User Story:** As a presenter, I want the Mermaid diagrams sanitized with the same terminology mapping as the narrative pages, so that diagrams and text are consistent.
#### Acceptance Criteria
1. THE Sanitized_Docs SHALL contain one sanitized Mermaid diagram file for each of the 6 diagram files in Source_Docs.
2. WHEN a Source_Docs diagram contains financial/trading terminology (e.g., "trading engine", "buy/sell", "paper_eligible", "bullish/bearish", ticker symbols), THE corresponding Sanitized_Docs diagram SHALL use the same domain-neutral replacements defined in the Terminology_Map.
3. THE Sanitized_Docs diagrams SHALL preserve all Mermaid syntax, node relationships, subgraph structures, and flow directions from the Source_Docs diagrams.
4. THE Sanitized_Docs diagrams SHALL preserve all code module path references and service names within diagram nodes.
5. THE Sanitized_Docs diagram filenames SHALL use sanitized names where the original names contain financial terms (e.g., `decision-engine-loop.md` instead of `trading-engine-decision-loop.md`).
---
### Requirement 7: Sanitize Redis Key and Queue Name References
**User Story:** As a presenter, I want Redis key patterns and queue names sanitized where they contain financial terms, so that even infrastructure-level references are domain-neutral.
#### Acceptance Criteria
1. WHEN a Source_Docs Redis queue name contains "stonks" (e.g., `stonks:queue:ingestion`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix (e.g., `app:queue:ingestion`).
2. WHEN a Source_Docs Redis key pattern contains "trading" or "broker" (e.g., `stonks:queue:broker_orders`, `stonks:trading:circuit_breaker:*`), THE Sanitized_Docs SHALL replace the trading-specific segment with a neutral equivalent (e.g., `app:queue:execution_orders`, `app:execution:circuit_breaker:*`).
3. THE Sanitized_Docs SHALL apply Redis key sanitization consistently across all narrative pages and diagram files.
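
Because Redis keys are colon-delimited, the renames can be applied per segment, which leaves neutral segments such as `queue` and `circuit_breaker` untouched. This is a minimal sketch; `SEGMENT_MAP` contains only the segments shown in the examples above, and `sanitize_redis_key` is a hypothetical helper.

```python
# Segment-level renames taken from the examples in this requirement.
SEGMENT_MAP = {
    "stonks": "app",
    "trading": "execution",
    "broker_orders": "execution_orders",
}


def sanitize_redis_key(key: str) -> str:
    """Rename each colon-delimited segment that has a neutral equivalent."""
    return ":".join(SEGMENT_MAP.get(segment, segment) for segment in key.split(":"))
```
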
---
### Requirement 8: Sanitize MinIO Bucket Name References
**User Story:** As a presenter, I want MinIO bucket names sanitized where they contain financial terms, so that storage references are domain-neutral.
#### Acceptance Criteria
1. WHEN a Source_Docs MinIO bucket name contains "stonks" (e.g., `stonks-raw-market`, `stonks-raw-news`, `stonks-normalized`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix and replace any remaining domain-specific segment (such as "market") with a neutral equivalent (e.g., `app-raw-data`, `app-raw-content`, `app-normalized`).
2. THE Sanitized_Docs SHALL apply MinIO bucket name sanitization consistently across all narrative pages and diagram files.
---
### Requirement 9: Sanitize Database Table and Column References Where Needed
**User Story:** As a presenter, I want database table and column names that contain obvious financial terms sanitized, while preserving the overall schema structure.
#### Acceptance Criteria
1. WHEN a Source_Docs database table name contains "trading" (e.g., `trading_decisions`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `execution_decisions`).
2. WHEN a Source_Docs database table or column references "portfolio" (e.g., `portfolio_snapshots`, `portfolio_pct`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `pool_snapshots`, `allocation_pct`).
3. THE Sanitized_Docs SHALL preserve all other database table names that do not contain financial-specific terms (e.g., `documents`, `document_intelligence`, `trend_windows`, `recommendations`).
4. THE Sanitized_Docs SHALL apply database reference sanitization consistently across all narrative pages.
---
### Requirement 10: Sanitize Example Scenarios and Inline References
**User Story:** As a presenter, I want all inline examples, scenario walkthroughs, and narrative references sanitized, so that no financial context leaks through illustrative content.
#### Acceptance Criteria
1. WHEN a Source_Docs page uses a specific company name or ticker in an example scenario (e.g., "a bearish article about AAPL"), THE Sanitized_Docs SHALL replace the reference with a generic entity (e.g., "a negative-sentiment article about Entity-A").
2. WHEN a Source_Docs page describes a financial event as an example (e.g., "earnings miss", "tariff announcement affecting XOM"), THE Sanitized_Docs SHALL reframe the example using domain-neutral language (e.g., "a negative performance disclosure", "a regulatory policy change affecting Entity-B").
3. WHEN a Source_Docs page references market-specific concepts in narrative flow (e.g., "markets move fast", "trading volume", "intraday swings"), THE Sanitized_Docs SHALL reframe using neutral language (e.g., "conditions change rapidly", "activity volume", "short-term fluctuations").
4. THE Sanitized_Docs SHALL preserve the logical structure and teaching purpose of all example scenarios while removing the financial framing.
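
To keep example scenarios coherent, the same source entity should map to the same placeholder everywhere it appears. The sketch below assigns `Entity-A`, `Entity-B`, and so on in order of first appearance; the `EntityAliaser` class and the caller-supplied ticker set are assumptions for illustration.

```python
import re
from string import ascii_uppercase


class EntityAliaser:
    """Replace known ticker symbols with stable Entity-A, Entity-B, ... labels."""

    def __init__(self, tickers: set[str]) -> None:
        if not tickers:
            raise ValueError("at least one ticker is required")
        # Case-sensitive, whole-word match on the supplied symbols only.
        self._pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in sorted(tickers)) + r")\b"
        )
        self._aliases: dict[str, str] = {}

    def _alias(self, match: re.Match) -> str:
        ticker = match.group(0)
        if ticker not in self._aliases:
            index = len(self._aliases)
            if index >= len(ascii_uppercase):
                raise ValueError("more than 26 distinct entities")
            self._aliases[ticker] = f"Entity-{ascii_uppercase[index]}"
        return self._aliases[ticker]

    def sanitize(self, text: str) -> str:
        """Rewrite text, reusing any alias already assigned to a ticker."""
        return self._pattern.sub(self._alias, text)
```

Reusing one `EntityAliaser` instance across all pages and diagrams keeps the labels consistent between documents.
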
---
### Requirement 11: Preserve Acceptable Engineering Terms
**User Story:** As a presenter, I want general engineering terms that happen to overlap with financial language preserved when they describe engineering patterns, so that the technical accuracy is maintained.
#### Acceptance Criteria
1. THE Sanitized_Docs SHALL preserve the term "circuit breaker" when it describes the engineering safety pattern (rate limiting, cascading failure prevention).
2. THE Sanitized_Docs SHALL preserve the term "exponential backoff" and all retry/backoff patterns.
3. THE Sanitized_Docs SHALL preserve all adapter pattern references (the software design pattern), renaming only the domain-specific adapter names (e.g., "AlpacaBrokerAdapter" becomes a neutral name).
4. THE Sanitized_Docs SHALL preserve the term "signal" as used in the signal processing and scoring context.
5. THE Sanitized_Docs SHALL preserve the terms "trend", "sentiment", "confidence", "contradiction", and "evidence" as used in the data analysis context.
---
### Requirement 12: Reframe the System Narrative
**User Story:** As a presenter, I want the overall system narrative reframed as a general-purpose AI decision intelligence pipeline, so that the presentation tells a coherent story without financial context.
#### Acceptance Criteria
1. THE Sanitized_Docs index page SHALL describe the system as an "AI-driven intelligence-to-decision pipeline" that ingests data from multiple sources, extracts structured intelligence via NLP/LLM, scores and weights signals, aggregates trends across time windows, generates recommendations with quality gates, and executes decisions autonomously with safety mechanisms.
2. THE Sanitized_Docs page 01 SHALL describe data ingestion from "multiple external data sources" rather than from financial-specific APIs.
3. THE Sanitized_Docs page 06 SHALL describe "autonomous decision execution with safety mechanisms" rather than "trading decisions and execution".
4. WHEN the Source_Docs conclusion references the "intelligence-to-decision pipeline in Stonks Oracle", THE Sanitized_Docs conclusion SHALL reference the "intelligence-to-decision pipeline" without a financial system name.
5. THE Sanitized_Docs SHALL maintain the narrative flow where each page ends with a transition to the next page, preserving the end-to-end story structure.