# Requirements Document

## Introduction

This feature produces a sanitized version of the existing 6-page intelligence pipeline deep-dive documentation (`docs/intelligence-pipeline-deep-dive/`) for use in a work presentation. The sanitized version strips all financial, market, and trading language (stock tickers, buy/sell/hold actions, portfolio allocation, broker APIs, and other domain-specific framing) and reframes the content as a general-purpose AI decision intelligence pipeline. The sanitized docs are stored as a separate doc group under `docs/sanitized-pipeline-deep-dive/`, leaving the original documents untouched. All engineering depth is preserved: algorithms, formulas, architectural patterns, queue topologies, database schemas, code module references, and Mermaid diagrams. Only the domain-specific framing changes.

## Glossary

- **Source_Docs**: The original 6-page documentation set at `docs/intelligence-pipeline-deep-dive/`, including `index.md`, pages `01` through `06`, and the `diagrams/` subdirectory containing 6 Mermaid diagram files.
- **Sanitized_Docs**: The output documentation set at `docs/sanitized-pipeline-deep-dive/`, mirroring the structure of Source_Docs with all financial/market/trading language replaced by domain-neutral equivalents.
- **Sanitization_Engine**: The process (manual or automated) that transforms Source_Docs into Sanitized_Docs by applying the terminology mapping and content reframing rules defined in this document.
- **Terminology_Map**: The defined set of financial/market/trading terms and their domain-neutral replacements used by the Sanitization_Engine.
- **Entity_Identifier**: The domain-neutral replacement for stock ticker symbols (e.g., AAPL, TSLA) in Sanitized_Docs.
- **Decision_Term**: A domain-neutral action term (act, defer, monitor, observe) that replaces trading actions (buy, sell, hold, watch) in Sanitized_Docs.
- **Decision_Execution_Engine**: The domain-neutral name for the trading engine in Sanitized_Docs.
- **Execution_Adapter**: The domain-neutral name for broker adapters and broker API references in Sanitized_Docs.
- **Allocation_Pool**: The domain-neutral name for portfolio references in Sanitized_Docs.
- **Commitment_Sizing**: The domain-neutral name for position sizing in Sanitized_Docs.
---

## Requirements

### Requirement 1: Separate Output Directory

**User Story:** As a presenter, I want the sanitized docs stored in a separate directory from the originals, so that the original documentation remains untouched and both versions coexist.

#### Acceptance Criteria

1. THE Sanitization_Engine SHALL write all output files to `docs/sanitized-pipeline-deep-dive/`.
2. THE Sanitization_Engine SHALL NOT modify, overwrite, or delete any file under `docs/intelligence-pipeline-deep-dive/`.
3. THE Sanitized_Docs SHALL contain an `index.md` file at the root of `docs/sanitized-pipeline-deep-dive/`.
4. THE Sanitized_Docs SHALL contain a `diagrams/` subdirectory under `docs/sanitized-pipeline-deep-dive/`.
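#### Illustrative Sketch

This is a minimal sketch of an output-path guard for an automated Sanitization_Engine. It assumes that paths are handled as POSIX-style strings relative to the repository root. The guard normalizes each path before the prefix check, so a path such as `../intelligence-pipeline-deep-dive/index.md` is rejected instead of being written into Source_Docs. `output_path` and `required_layout_present` are hypothetical helpers, not part of the existing codebase.

```python
import posixpath

OUTPUT_ROOT = "docs/sanitized-pipeline-deep-dive"


def output_path(relative: str) -> str:
    """Resolve a page-relative path under OUTPUT_ROOT (Criterion 1).

    Anything that normalizes to a location outside OUTPUT_ROOT, including
    docs/intelligence-pipeline-deep-dive/, is rejected (Criterion 2).
    """
    candidate = posixpath.normpath(posixpath.join(OUTPUT_ROOT, relative))
    if not candidate.startswith(OUTPUT_ROOT + "/"):
        raise ValueError(f"refusing to write outside {OUTPUT_ROOT}: {candidate}")
    return candidate


def required_layout_present(written: set) -> bool:
    """Criteria 3 and 4: a root index.md plus at least one file under diagrams/."""
    has_index = f"{OUTPUT_ROOT}/index.md" in written
    has_diagrams = any(p.startswith(f"{OUTPUT_ROOT}/diagrams/") for p in written)
    return has_index and has_diagrams
```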
---

### Requirement 2: Mirror the 6-Page Structure

**User Story:** As a presenter, I want the sanitized docs to mirror the same 6-page structure as the originals, so that readers familiar with the original can navigate the sanitized version identically.

#### Acceptance Criteria

1. THE Sanitized_Docs SHALL contain exactly 6 numbered page files matching the naming pattern of Source_Docs: `01-*.md` through `06-*.md`.
2. THE Sanitized_Docs SHALL contain an `index.md` with a table of contents linking to all 6 pages and all diagrams, mirroring the structure of the Source_Docs index.
3. THE Sanitized_Docs SHALL contain one Mermaid diagram file in `diagrams/` for each diagram file present in `docs/intelligence-pipeline-deep-dive/diagrams/`.
4. WHEN a Source_Docs page contains internal cross-references to other pages or diagrams, THE Sanitized_Docs equivalent page SHALL contain corresponding cross-references pointing to the Sanitized_Docs versions of those pages and diagrams.
5. THE Sanitized_Docs page filenames SHALL use sanitized titles (e.g., `06-decision-execution.md` instead of `06-trading-decisions-and-execution.md`).
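#### Illustrative Sketch

This sketch shows the structural checks. It rests on three assumptions:

- Both doc roots can be listed as paths relative to the root.
- Pages follow the `NN-slug.md` pattern shown in Criterion 5.
- Cross-references are ordinary relative Markdown links.

`mirrors_structure` covers Criteria 1 through 3 and `rewrite_links` covers Criterion 4. Both names are hypothetical.

```python
import re

# Root-level page files: two-digit number 01-06, a slug, then .md.
PAGE = re.compile(r"^(0[1-6])-[a-z0-9-]+\.md$")


def page_numbers(files):
    """Sorted page numbers ("01".."06") found among root-level filenames."""
    return sorted(m.group(1) for f in files if (m := PAGE.match(f)))


def diagram_files(files):
    return [f for f in files if f.startswith("diagrams/") and f.endswith(".md")]


def mirrors_structure(source_files, sanitized_files):
    """Criteria 1-3, given paths relative to each doc root."""
    pages = page_numbers(sanitized_files)
    return (
        "index.md" in sanitized_files
        and pages == ["01", "02", "03", "04", "05", "06"]
        and page_numbers(source_files) == pages
        and len(diagram_files(sanitized_files)) == len(diagram_files(source_files))
    )


def rewrite_links(markdown, renames):
    """Criterion 4: point relative links at the renamed sanitized files.

    `renames` maps source paths to sanitized paths; anchors (#...) survive
    because the match stops before '#'.
    """
    def swap(match):
        return "](" + renames.get(match.group(1), match.group(1))
    return re.sub(r"\]\(([^)#\s]+)", swap, markdown)
```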
---

### Requirement 3: Strip Financial and Trading Terminology

**User Story:** As a presenter, I want all financial, market, and trading language removed from the sanitized docs, so that the presentation focuses on engineering without revealing the financial domain.

#### Acceptance Criteria

1. THE Sanitized_Docs SHALL NOT contain any stock ticker symbols (e.g., AAPL, TSLA, NVDA, XOM, META).
2. THE Sanitized_Docs SHALL NOT contain the trading action terms "buy", "sell", "hold", or "watch" when used as system action labels or decision outputs.
3. THE Sanitized_Docs SHALL NOT contain the terms "trading engine", "paper trading", "live trading", "paper_eligible", or "live_eligible".
4. THE Sanitized_Docs SHALL NOT contain the terms "portfolio", "portfolio allocation", "portfolio heat", or "portfolio snapshots" when referring to the resource management domain concept.
5. THE Sanitized_Docs SHALL NOT contain references to "broker", "Alpaca", "broker adapter", or "broker API".
6. THE Sanitized_Docs SHALL NOT contain the terms "stock market", "Wall Street", "bullish", "bearish", "position sizing" (as a financial concept label), or "stop-loss" (as a financial concept label).
7. THE Sanitized_Docs SHALL NOT contain company names used as financial examples (e.g., "Apple", "Tesla", "NVIDIA" when used in a stock/market context).
8. THE Sanitized_Docs SHALL NOT contain the terms "SEC EDGAR", "SEC filings", "10-K", "10-Q", "8-K", "earnings", "earnings call", or "earnings report" as domain-specific financial references.
9. THE Sanitized_Docs SHALL NOT contain references to "Polygon.io" or "Polygon" as a financial data provider name.
10. THE Sanitized_Docs SHALL NOT contain the term "Stonks Oracle" or "stonks" as a system name.
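#### Illustrative Sketch

This is a first-pass scanner for the deny-list. It assumes that substring matching is acceptable for distinctive terms, which means identifiers such as `portfolio_pct` and `AlpacaBrokerAdapter` are caught as well as prose. The sketch leaves out the context-dependent criteria (action labels in Criterion 2, company names in Criterion 7); those would need manual review or a narrower rule. The term list is an illustrative subset, and `find_violations` is a hypothetical helper.

```python
import re

# Illustrative subset of the Requirement 3 deny-list. Distinctive terms are
# matched as case-insensitive substrings so identifiers are caught too.
DENY_TERMS = [
    "trading engine", "paper trading", "live trading", "paper_eligible",
    "live_eligible", "portfolio", "broker", "alpaca", "bullish", "bearish",
    "stop-loss", "sec edgar", "polygon", "stonks",
]
# Example tickers from Criterion 1, matched case-sensitively on word
# boundaries so ordinary words are not flagged.
TICKERS = re.compile(r"\b(?:AAPL|TSLA|NVDA|XOM|META)\b")


def find_violations(text):
    """Return (line_number, term) pairs for every deny-listed hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        hits += [(lineno, term) for term in DENY_TERMS if term in lowered]
        hits += [(lineno, m.group(0)) for m in TICKERS.finditer(line)]
    return hits
```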
---

### Requirement 4: Apply Domain-Neutral Terminology Mapping

**User Story:** As a presenter, I want consistent domain-neutral replacements for all stripped terms, so that the sanitized docs read coherently as a general-purpose AI decision intelligence pipeline.

#### Acceptance Criteria

1. WHEN the Source_Docs use "stock ticker" or specific ticker symbols, THE Sanitized_Docs SHALL use "entity identifier" or "tracked entity".
2. WHEN the Source_Docs use "buy/sell/hold/watch" as action labels, THE Sanitized_Docs SHALL use "act/defer/monitor/observe" or equivalent neutral decision terms.
3. WHEN the Source_Docs use "trading engine", THE Sanitized_Docs SHALL use "decision execution engine" or "action engine".
4. WHEN the Source_Docs use "portfolio", THE Sanitized_Docs SHALL use "resource pool" or "allocation pool".
5. WHEN the Source_Docs use "broker" or "Alpaca", THE Sanitized_Docs SHALL use "execution adapter" or "external execution API".
6. WHEN the Source_Docs use "paper trading", THE Sanitized_Docs SHALL use "simulation mode" or "dry-run mode".
7. WHEN the Source_Docs use "live trading", THE Sanitized_Docs SHALL use "live execution mode" or "production mode".
8. WHEN the Source_Docs use "bullish" or "bearish", THE Sanitized_Docs SHALL use "positive" or "negative" (or "favorable"/"unfavorable").
9. WHEN the Source_Docs use "position sizing", THE Sanitized_Docs SHALL use "resource allocation" or "commitment sizing".
10. WHEN the Source_Docs use "stop-loss", THE Sanitized_Docs SHALL use "risk threshold" or "loss limit".
11. WHEN the Source_Docs use "Stonks Oracle" or "stonks", THE Sanitized_Docs SHALL use a neutral system name such as "the platform" or "the system".
12. WHEN the Source_Docs use "SEC EDGAR" or "SEC filings", THE Sanitized_Docs SHALL use "regulatory filings source" or "public records API".
13. WHEN the Source_Docs use "Polygon.io" or "Polygon", THE Sanitized_Docs SHALL use "external data provider" or "data source API".
14. WHEN the Source_Docs use "earnings" as a catalyst type or event, THE Sanitized_Docs SHALL use "performance report" or "periodic disclosure".
15. THE Sanitized_Docs SHALL apply the Terminology_Map consistently across all 6 pages, the index, and all diagram files.
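#### Illustrative Sketch

This sketch performs a one-pass replacement over a subset of the Terminology_Map. It assumes that the first option in each criterion is the chosen replacement. The `broker API` entry is a further assumption, based on the "external execution API" alternative in Criterion 5.

Two design choices support the consistency required by Criterion 15:

- **Longest match first.** This keeps "broker API" from becoming "execution adapter API".
- **A single pass.** Replaced text is never rewritten again.

`sanitize` is a hypothetical helper, and its capitalization rule is a simplification.

```python
import re

# Subset of the Terminology_Map using the first option from each criterion.
TERMINOLOGY_MAP = {
    "trading engine": "decision execution engine",
    "paper trading": "simulation mode",
    "live trading": "live execution mode",
    "broker API": "external execution API",  # assumed entry (Criterion 5)
    "broker": "execution adapter",
    "portfolio": "resource pool",
    "position sizing": "resource allocation",
    "stop-loss": "risk threshold",
    "bullish": "positive",
    "bearish": "negative",
    "Stonks Oracle": "the platform",
    "SEC EDGAR": "regulatory filings source",
    "Polygon.io": "external data provider",
    "Polygon": "external data provider",
}
_LOOKUP = {term.lower(): neutral for term, neutral in TERMINOLOGY_MAP.items()}
# Longest terms first, so "broker API" beats "broker" and "Polygon.io"
# beats "Polygon" at the same position.
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(t) for t in sorted(TERMINOLOGY_MAP, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


def _starts_sentence(text, index):
    before = text[:index].rstrip(" ")
    return before == "" or before.endswith((".", "!", "?", "\n"))


def sanitize(text):
    """Apply the map in one pass so replaced text is never rewritten again."""
    def replace(match):
        found = match.group(1)
        neutral = _LOOKUP[found.lower()]
        # Capitalize only where a capitalized term opened a sentence.
        if found[0].isupper() and _starts_sentence(match.string, match.start()):
            return neutral[0].upper() + neutral[1:]
        return neutral
    return _PATTERN.sub(replace, text)
```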
---

### Requirement 5: Preserve Engineering and Technical Depth

**User Story:** As a presenter, I want all engineering concepts, algorithms, formulas, and architectural details preserved, so that the sanitized docs demonstrate the technical sophistication of the system.

#### Acceptance Criteria

1. THE Sanitized_Docs SHALL preserve all references to Redis queue patterns, including queue names and `rpush`/`lpop`/`blpop` operations; queue names change only as specified in Requirement 7.
2. THE Sanitized_Docs SHALL preserve all references to PostgreSQL tables, including table names and column descriptions; table and column names change only as specified in Requirement 9.
3. THE Sanitized_Docs SHALL preserve all references to MinIO buckets and storage patterns; bucket names change only as specified in Requirement 8.
4. THE Sanitized_Docs SHALL preserve all references to Ollama as the LLM inference provider.
5. THE Sanitized_Docs SHALL preserve the composite signal scoring formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
6. THE Sanitized_Docs SHALL preserve the confidence computation formula with log₂ scaling and its four components (unique source count, average extraction credibility, signal agreement with sample-size dampening, contradiction penalty).
7. THE Sanitized_Docs SHALL preserve the weighted sentiment average formula: `weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)`.
8. THE Sanitized_Docs SHALL preserve all code module path references (e.g., `services/aggregation/scoring.py`, `services/recommendation/eligibility.py`).
9. THE Sanitized_Docs SHALL preserve the three-layer signal architecture, renaming the layers with domain-neutral labels (e.g., "Entity-Specific Signals", "Environmental Signals", "Relational Signals") while retaining the layer weights (1.0, 0.3, 0.2).
10. THE Sanitized_Docs SHALL preserve all threshold values, configuration parameters, and numeric constants (e.g., confidence gate of 0.2, recency half-lives per window, eligibility thresholds).
11. THE Sanitized_Docs SHALL preserve all Markdown table structures containing technical parameters and thresholds.
12. THE Sanitized_Docs SHALL preserve the contradiction detection algorithm, evidence ranking methodology, and trend projection computation.
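#### Illustrative Sketch

The formulas in Criteria 5 and 7 can be written out directly. Doing so makes it easy to confirm that renaming the surrounding prose leaves the arithmetic untouched. This sketch rests on the following assumptions:

- The argument names mirror the formula terms.
- Returning 0.0 when the denominator is zero is a choice made for this sketch.
- The Criterion 6 confidence formula is omitted because its exact form is not stated here.
- `formulas_preserved` (a hypothetical helper) assumes that formulas appear as backticked inline code.

```python
import re

# Layer weights preserved from the source docs (Criterion 9), keyed by the
# example neutral labels suggested in this requirement.
LAYER_WEIGHTS = {
    "Entity-Specific Signals": 1.0,
    "Environmental Signals": 0.3,
    "Relational Signals": 0.2,
}


def combined_score(gate, recency, credibility, novelty_bonus, market_context_multiplier):
    """Composite signal score, term for term as written in Criterion 5."""
    return gate * recency * credibility * (1 + novelty_bonus) * market_context_multiplier


def weighted_sentiment(signals):
    """Criterion 7. Each signal is (combined_weight, impact_score, sentiment_value)."""
    numerator = sum(w * impact * s for w, impact, s in signals)
    denominator = sum(w * impact for w, impact, _ in signals)
    # Zero-denominator fallback is an assumption of this sketch.
    return numerator / denominator if denominator else 0.0


def formulas_preserved(source_md, sanitized_md):
    """True if every backticked span containing '=' in the source page
    also appears verbatim in the sanitized page."""
    def formulas(md):
        return {span for span in re.findall(r"`([^`]+)`", md) if "=" in span}
    return formulas(source_md) <= formulas(sanitized_md)
```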
---

### Requirement 6: Sanitize Mermaid Diagrams

**User Story:** As a presenter, I want the Mermaid diagrams sanitized with the same terminology mapping as the narrative pages, so that diagrams and text are consistent.

#### Acceptance Criteria

1. THE Sanitized_Docs SHALL contain one sanitized Mermaid diagram file for each of the 6 diagram files in Source_Docs.
2. WHEN a Source_Docs diagram contains financial/trading terminology (e.g., "trading engine", "buy/sell", "paper_eligible", "bullish/bearish", ticker symbols), THE corresponding Sanitized_Docs diagram SHALL use the same domain-neutral replacements defined in the Terminology_Map.
3. THE Sanitized_Docs diagrams SHALL preserve all Mermaid syntax, node relationships, subgraph structures, and flow directions from the Source_Docs diagrams.
4. THE Sanitized_Docs diagrams SHALL preserve all code module path references and service names within diagram nodes.
5. THE Sanitized_Docs diagram filenames SHALL use sanitized names where the original names contain financial terms (e.g., `decision-engine-loop.md` instead of `trading-engine-decision-loop.md`).
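#### Illustrative Sketch

This is a topology check for Criterion 3. It assumes that diagrams use `flowchart`/`graph` syntax with plain `-->` edges. It also assumes that node IDs stay unchanged and that only bracketed labels and edge text are rewritten. Other arrow styles (such as `-.->` or `==>`) are not recognized. `diagram_shape` and `same_shape` are hypothetical helpers.

```python
import re

# Node ID, optional shape/label ([..], (..) or {..}), an optional |edge label|,
# then the target node ID. Only plain `-->` arrows are recognized.
EDGE = re.compile(
    r"^\s*(\w+)(?:\[[^\]]*\]|\([^)]*\)|\{[^}]*\})?\s*-->(?:\|[^|]*\|)?\s*(\w+)"
)
SUBGRAPH = re.compile(r"^\s*subgraph\b")
DIRECTION = re.compile(r"^\s*(?:flowchart|graph)\s+(TB|TD|BT|LR|RL)\b")


def diagram_shape(mermaid):
    """Summarize topology: edges as (source, target) IDs, subgraph count, direction."""
    lines = mermaid.splitlines()
    edges = [m.groups() for line in lines if (m := EDGE.match(line))]
    subgraphs = sum(1 for line in lines if SUBGRAPH.match(line))
    directions = [m.group(1) for line in lines if (m := DIRECTION.match(line))]
    return edges, subgraphs, directions


def same_shape(source, sanitized):
    """Criterion 3: relabeling must not change edges, subgraphs, or direction."""
    return diagram_shape(source) == diagram_shape(sanitized)
```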
---

### Requirement 7: Sanitize Redis Key and Queue Name References

**User Story:** As a presenter, I want Redis key patterns and queue names sanitized where they contain financial terms, so that even infrastructure-level references are domain-neutral.

#### Acceptance Criteria

1. WHEN a Source_Docs Redis queue name contains "stonks" (e.g., `stonks:queue:ingestion`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix (e.g., `app:queue:ingestion`).
2. WHEN a Source_Docs Redis key pattern contains a trading-specific segment such as "trading" or "broker" (e.g., `stonks:queue:broker_orders`, `stonks:trading:circuit_breaker:*`), THE Sanitized_Docs SHALL replace that segment with a neutral equivalent (e.g., `app:queue:execution_orders`, `app:execution:circuit_breaker:*`).
3. THE Sanitized_Docs SHALL apply Redis key sanitization consistently across all narrative pages and diagram files.
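#### Illustrative Sketch

This sketch renames Redis keys one segment at a time, using only the renames shown in Criteria 1 and 2. Splitting on `:` and replacing whole segments keeps two kinds of segment intact: `circuit_breaker`, which Requirement 11 preserves, and trailing `*` wildcards. `SEGMENT_RENAMES`, `sanitize_redis_key`, and `sanitize_redis_keys` are hypothetical names.

```python
import re

# Segment renames taken from the Requirement 7 examples; other trading-specific
# segments found in the source docs would be added here.
SEGMENT_RENAMES = {
    "stonks": "app",
    "trading": "execution",
    "broker_orders": "execution_orders",
}
# A "stonks" key: colon-separated word segments, optionally ending in a wildcard.
REDIS_KEY = re.compile(r"\bstonks(?::[\w*]+)+")


def sanitize_redis_key(key):
    """Rename whole segments only, so `circuit_breaker` and `*` survive."""
    return ":".join(SEGMENT_RENAMES.get(seg, seg) for seg in key.split(":"))


def sanitize_redis_keys(text):
    """Rewrite every stonks-prefixed key in a page or diagram (Criterion 3)."""
    return REDIS_KEY.sub(lambda m: sanitize_redis_key(m.group(0)), text)
```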
---

### Requirement 8: Sanitize MinIO Bucket Name References

**User Story:** As a presenter, I want MinIO bucket names sanitized where they contain financial terms, so that storage references are domain-neutral.

#### Acceptance Criteria

1. WHEN a Source_Docs MinIO bucket name contains "stonks" (e.g., `stonks-raw-market`, `stonks-raw-news`, `stonks-normalized`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix and rename any segment that reveals the source domain (e.g., `app-raw-data`, `app-raw-content`, `app-normalized`).
2. THE Sanitized_Docs SHALL apply MinIO bucket name sanitization consistently across all narrative pages and diagram files.
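#### Illustrative Sketch

The Criterion 1 examples also rename the segment after the prefix (`raw-market` becomes `raw-data`, and `raw-news` becomes `raw-content`). For that reason, a lookup table fits better than a plain prefix swap. This sketch uses the three listed buckets. For any bucket not in the table, it falls back to a prefix swap, which is an assumption. `sanitize_buckets` is a hypothetical helper.

```python
import re

# Explicit lookup built from the Requirement 8 examples.
BUCKET_RENAMES = {
    "stonks-raw-market": "app-raw-data",
    "stonks-raw-news": "app-raw-content",
    "stonks-normalized": "app-normalized",
}
# Bucket names: lowercase letters, digits and hyphens after the prefix.
BUCKET = re.compile(r"\bstonks-[a-z0-9-]+")


def sanitize_buckets(text):
    def rename(match):
        name = match.group(0)
        # Unlisted buckets fall back to a plain prefix swap (an assumption).
        return BUCKET_RENAMES.get(name, "app-" + name[len("stonks-"):])
    return BUCKET.sub(rename, text)
```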
---

### Requirement 9: Sanitize Database Table and Column References Where Needed

**User Story:** As a presenter, I want database table and column names that contain obvious financial terms sanitized, while preserving the overall schema structure.

#### Acceptance Criteria

1. WHEN a Source_Docs database table name contains "trading" (e.g., `trading_decisions`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `execution_decisions`).
2. WHEN a Source_Docs database table or column references "portfolio" (e.g., `portfolio_snapshots`, `portfolio_pct`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `pool_snapshots`, `allocation_pct`).
3. THE Sanitized_Docs SHALL preserve all other database table names that do not contain financial-specific terms (e.g., `documents`, `document_intelligence`, `trend_windows`, `recommendations`).
4. THE Sanitized_Docs SHALL apply database reference sanitization consistently across all narrative pages.
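#### Illustrative Sketch

This sketch renames whole identifiers, using the three examples from Criteria 1 and 2. Word boundaries (`\b`) treat underscores as part of an identifier. As a result, only complete names are rewritten, and unrelated tables such as `documents` or `trend_windows` pass through untouched (Criterion 3). Any further renames found in the source schema would be assumptions to add to the hypothetical `SCHEMA_RENAMES` table.

```python
import re

# Renames from the Requirement 9 examples; unlisted names are left unchanged.
SCHEMA_RENAMES = {
    "trading_decisions": "execution_decisions",
    "portfolio_snapshots": "pool_snapshots",
    "portfolio_pct": "allocation_pct",
}
# \b on both sides: "trading_decisions_archive" is not partially rewritten.
_IDENTIFIER = re.compile(r"\b(" + "|".join(map(re.escape, SCHEMA_RENAMES)) + r")\b")


def sanitize_schema_refs(text):
    return _IDENTIFIER.sub(lambda m: SCHEMA_RENAMES[m.group(1)], text)
```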
---

### Requirement 10: Sanitize Example Scenarios and Inline References

**User Story:** As a presenter, I want all inline examples, scenario walkthroughs, and narrative references sanitized, so that no financial context leaks through illustrative content.

#### Acceptance Criteria

1. WHEN a Source_Docs page uses a specific company name or ticker in an example scenario (e.g., "a bearish article about AAPL"), THE Sanitized_Docs SHALL replace the reference with a generic entity (e.g., "a negative-sentiment article about Entity-A").
2. WHEN a Source_Docs page describes a financial event as an example (e.g., "earnings miss", "tariff announcement affecting XOM"), THE Sanitized_Docs SHALL reframe the example using domain-neutral language (e.g., "a negative performance disclosure", "a regulatory policy change affecting Entity-B").
3. WHEN a Source_Docs page references market-specific concepts in narrative flow (e.g., "markets move fast", "trading volume", "intraday swings"), THE Sanitized_Docs SHALL reframe using neutral language (e.g., "conditions change rapidly", "activity volume", "short-term fluctuations").
4. THE Sanitized_Docs SHALL preserve the logical structure and teaching purpose of all example scenarios while removing the financial framing.
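#### Illustrative Sketch

Criteria 1 and 2 read most cleanly when the same source entity always becomes the same placeholder. This sketch assumes a hypothetical, hand-maintained list of entities and aliases, in which the ticker and company name of one entity share a single label. It assigns Entity-A, Entity-B, and so on in order of first appearance, and it supports at most 26 entities. It does not judge whether a company name appears in a market context (Requirement 3, Criterion 7), so matches still need review. `assign_placeholders` is a hypothetical helper.

```python
import re
from string import ascii_uppercase

# Hypothetical entity list; a real pass would collect these from Source_Docs.
KNOWN_ENTITIES = ["AAPL", "Apple", "TSLA", "Tesla", "XOM", "NVDA", "NVIDIA"]
# Company names map to their ticker so both share one placeholder.
ALIASES = {"Apple": "AAPL", "Tesla": "TSLA", "NVIDIA": "NVDA"}
_ENTITY = re.compile(r"\b(" + "|".join(map(re.escape, KNOWN_ENTITIES)) + r")\b")


def assign_placeholders(text):
    """Replace entities with Entity-A, Entity-B, ... in order of first appearance."""
    labels = {}

    def label(match):
        key = ALIASES.get(match.group(1), match.group(1))
        if key not in labels:
            # ascii_uppercase limits this sketch to 26 distinct entities.
            labels[key] = f"Entity-{ascii_uppercase[len(labels)]}"
        return labels[key]

    return _ENTITY.sub(label, text)
```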
---

### Requirement 11: Preserve Acceptable Engineering Terms

**User Story:** As a presenter, I want general engineering terms that happen to overlap with financial language preserved when they describe engineering patterns, so that the technical accuracy is maintained.

#### Acceptance Criteria

1. THE Sanitized_Docs SHALL preserve the term "circuit breaker" when it describes the engineering safety pattern (rate limiting, cascading failure prevention).
2. THE Sanitized_Docs SHALL preserve the term "exponential backoff" and all retry/backoff patterns.
3. THE Sanitized_Docs SHALL preserve all adapter pattern references (the software design pattern), renaming only the domain-specific adapter names (e.g., "AlpacaBrokerAdapter" is renamed in line with the Execution_Adapter glossary term).
4. THE Sanitized_Docs SHALL preserve the term "signal" as used in the signal processing and scoring context.
5. THE Sanitized_Docs SHALL preserve the terms "trend", "sentiment", "confidence", "contradiction", and "evidence" as used in the data analysis context.
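#### Illustrative Sketch

Requirement 3, Criterion 2 forbids "buy", "sell", "hold", and "watch" only as action labels, and this requirement keeps engineering vocabulary intact. A detector therefore has to be narrower than a plain word search. This sketch makes a hypothetical assumption: action labels appear either as backticked or quoted tokens, or as upper-case enum members. Under that assumption, words such as "threshold", "watchdog", and "circuit breaker" are never flagged. `flagged_action_labels` is a hypothetical helper.

```python
import re

# Assumed convention: labels look like `buy`, "sell", 'watch' or HOLD.
ACTION_LABEL = re.compile(
    r"""[`"'](?:buy|sell|hold|watch)[`"']|\b(?:BUY|SELL|HOLD|WATCH)\b"""
)


def flagged_action_labels(text):
    """Return action-label tokens in order of appearance."""
    return [m.group(0) for m in ACTION_LABEL.finditer(text)]
```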
---

### Requirement 12: Reframe the System Narrative

**User Story:** As a presenter, I want the overall system narrative reframed as a general-purpose AI decision intelligence pipeline, so that the presentation tells a coherent story without financial context.

#### Acceptance Criteria

1. THE Sanitized_Docs index page SHALL describe the system as an "AI-driven intelligence-to-decision pipeline" that ingests data from multiple sources, extracts structured intelligence via NLP/LLM, scores and weights signals, aggregates trends across time windows, generates recommendations with quality gates, and executes decisions autonomously with safety mechanisms.
2. THE Sanitized_Docs page 01 SHALL describe data ingestion from "multiple external data sources" rather than from financial-specific APIs.
3. THE Sanitized_Docs page 06 SHALL describe "autonomous decision execution with safety mechanisms" rather than "trading decisions and execution".
4. WHEN the Source_Docs conclusion references the "intelligence-to-decision pipeline in Stonks Oracle", THE Sanitized_Docs conclusion SHALL reference the "intelligence-to-decision pipeline" without a financial system name.
5. THE Sanitized_Docs SHALL maintain the narrative flow where each page ends with a transition to the next page, preserving the end-to-end story structure.
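#### Illustrative Sketch

This is a rough check for Criterion 5. It assumes that each transition is written as a Markdown link to the next page's filename, in the final paragraph of the page. Page 06 has no successor, so only pages 01 through 05 are checked. `missing_transitions` is a hypothetical helper.

```python
def missing_transitions(pages):
    """Return pages whose final paragraph does not link to the next page.

    `pages` maps sanitized page filenames (01-*.md ... 06-*.md) to Markdown.
    The last page has no successor, so zip() skips it.
    """
    names = sorted(pages)
    missing = []
    for current, following in zip(names, names[1:]):
        # Paragraphs are assumed to be separated by blank lines.
        paragraphs = [p for p in pages[current].split("\n\n") if p.strip()]
        last = paragraphs[-1] if paragraphs else ""
        if f"]({following}" not in last:
            missing.append(current)
    return missing
```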