# Stonks Oracle - Requirements

## Overview

This feature builds an AI-assisted market intelligence, execution, and analytics platform for a Kubernetes-hosted environment. The platform ingests market symbols, licensed market data, company-specific news, regulatory filings, scraped web sources, and broker execution events; stores raw and normalized artifacts; extracts structured JSON with local Ollama models; computes trend and sentiment summaries; and optionally places trades through a broker integration.

The platform SHALL also maintain a local analytics lake on MinIO using Hive-compatible partitioned data, support Athena-like SQL querying over captured market and trade data, and expose QuickSight-like dashboards for research, review, and audit.

The initial release is focused on reliable ingestion, deterministic structured extraction, explainable trend scoring, paper trading safety, and internal analytics visibility.

## User Stories

- As an operator, I want to register companies, tickers, sectors, watchlists, and source rules so the system knows what to monitor.
- As an analyst, I want every raw article, filing, market snapshot, and scrape artifact preserved so I can audit downstream AI conclusions.
- As a data engineer, I want structured JSON extraction from each article and filing so downstream analytics are queryable.
- As a strategist, I want aggregated trend assessments per symbol, sector, and market regime so I can evaluate opportunities.
- As a trader, I want the system to generate explainable trade recommendations with explicit confidence, catalysts, and risk notes.
- As a risk owner, I want strict controls on automated trading so the system cannot place unsafe orders.
- As a quantitative reviewer, I want to query historical market data, AI predictions, and executed trades in one SQL-accessible analytics plane.
- As a dashboard user, I want QuickSight-like visualizations for performance, signal quality, prediction accuracy, and model behavior.
- As a platform owner, I want the system to run fully inside Kubernetes against local Ollama and self-hosted analytics components.

## Functional Requirements

### 1. Watchlist and source management

#### Requirement 1.1

WHEN an operator creates or updates a tracked company THE SYSTEM SHALL persist the company profile including ticker, legal name, aliases, exchange, sector, industry, market cap bucket, and source configuration.

#### Requirement 1.2

WHEN an operator defines a source configuration for a company THE SYSTEM SHALL support source types including market data APIs, news API feeds, SEC or investor relations URLs, company press release pages, earnings transcript sources, curated web pages, and broker-linked execution sources.

#### Requirement 1.3

WHEN a company has aliases, brands, or product names THE SYSTEM SHALL use those aliases during source retrieval, de-duplication, entity matching, and extraction.

### 2. External API integrations

#### Requirement 2.1

WHEN the scheduler triggers a market ingestion cycle THE SYSTEM SHALL fetch configured market data API results for tracked companies and persist raw response payloads.

#### Requirement 2.2

WHEN the scheduler triggers a news ingestion cycle THE SYSTEM SHALL fetch configured news API results for tracked companies and persist raw response payloads.

#### Requirement 2.3

WHEN the scheduler triggers a regulatory ingestion cycle THE SYSTEM SHALL fetch configured filing or issuer event data from authoritative sources such as SEC-style APIs and persist raw response payloads.

#### Requirement 2.4

WHEN trade automation is enabled THE SYSTEM SHALL integrate with at least one broker API that supports paper trading, order placement, order status retrieval, positions, account balances, and execution events.

#### Requirement 2.5

WHEN external APIs enforce rate limits or quotas THE SYSTEM SHALL coordinate request pacing, retries, and backoff across workers.

### 3. Ingestion and raw artifact retention

#### Requirement 3.1

WHEN a scraper retrieves an article, filing, or web page THE SYSTEM SHALL store the raw HTML, rendered text, metadata, retrieval timestamp, and retrieval source in object storage.

#### Requirement 3.2

WHEN an article, filing, or market payload is ingested THE SYSTEM SHALL generate a stable content hash and use it to prevent duplicate processing.

#### Requirement 3.3

WHEN the system stores a raw artifact THE SYSTEM SHALL persist an associated metadata record containing symbol, source, URL when applicable, title, publication time, retrieval time, language when applicable, and content hash.

#### Requirement 3.4

WHEN content retrieval fails THE SYSTEM SHALL record the failure reason, retry policy state, and next eligible retry time.

### 4. Parsing and normalization

#### Requirement 4.1

WHEN a raw article or filing enters the parsing stage THE SYSTEM SHALL extract normalized text, author data when available, publisher, tags, mentioned entities, outbound links, and document type.

#### Requirement 4.2

WHEN the system detects boilerplate or repeated template text THE SYSTEM SHALL reduce or remove boilerplate before AI extraction while retaining the original raw artifact for audit.

#### Requirement 4.3

WHEN the parser cannot confidently extract article body text THE SYSTEM SHALL flag the document for low-quality extraction and prevent it from influencing downstream trading until reviewed or reprocessed.

### 5. AI article and document extraction

#### Requirement 5.1

WHEN a normalized article or filing is ready for AI extraction THE SYSTEM SHALL send the document to a local Ollama model using structured output with an explicit JSON schema.

#### Requirement 5.2

WHEN the model returns extraction output THE SYSTEM SHALL validate the response against the expected schema before saving it.
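One way to implement the Requirement 5.2 check is sketched below, assuming the Ollama structured-output response arrives as a JSON string. `EXTRACTION_SCHEMA`, its field subset, and the 0 to 1 confidence range are hypothetical; in practice the validator would more likely reuse the same JSON Schema sent to the model, for example through a JSON Schema validation library. Returning the errors instead of raising lets the caller persist both the failed output and its validation errors, as Requirement 5.4 asks.

```python
import json
from typing import Any, Optional

# Hypothetical subset of the document intelligence schema: field -> accepted types.
EXTRACTION_SCHEMA: dict[str, tuple[type, ...]] = {
    "document_id": (str,),
    "document_type": (str,),
    "tickers_referenced": (list,),
    "document_summary": (str,),
    "confidence_score": (int, float),
}


def validate_extraction(raw_output: str) -> tuple[Optional[dict[str, Any]], list[str]]:
    """Parse model output and check it against EXTRACTION_SCHEMA.

    Returns (parsed_object, []) on success or (None, errors) on failure, so the
    caller can persist the failed output together with its validation errors.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]

    if not isinstance(parsed, dict):
        return None, ["top-level value must be a JSON object"]

    errors: list[str] = []
    for field, expected in EXTRACTION_SCHEMA.items():
        value = parsed.get(field)
        if field not in parsed:
            errors.append(f"missing field: {field}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            errors.append(f"wrong type for {field}: {type(value).__name__}")

    score = parsed.get("confidence_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool):
        if not 0.0 <= score <= 1.0:
            errors.append("confidence_score must be between 0 and 1")

    return (None, errors) if errors else (parsed, [])
```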
#### Requirement 5.3

WHEN extraction succeeds THE SYSTEM SHALL produce a canonical document intelligence object with at minimum:

- document_id
- document_type
- source metadata
- tickers referenced
- companies referenced
- document summary
- sentiment by company
- catalyst type
- impact horizon
- key facts
- risks mentioned
- macro themes
- confidence score
- extraction warnings
- model metadata

#### Requirement 5.4

WHEN the model response is invalid, incomplete, or hallucinatory (for example, it cites tickers, companies, or facts that do not appear in the source document) THE SYSTEM SHALL retry extraction according to policy and preserve both the failed output and validation errors.

#### Requirement 5.5

WHEN a document is materially relevant to multiple companies THE SYSTEM SHALL emit one shared document record and one per-company impact record for each materially relevant company.

### 6. Aggregation and trend analysis

#### Requirement 6.1

WHEN multiple document intelligence objects and market observations exist for a company THE SYSTEM SHALL generate a rolling company trend summary over configurable windows including intraday, 1-day, 7-day, 30-day, and 90-day intervals.

#### Requirement 6.2

WHEN generating a company trend summary THE SYSTEM SHALL consider sentiment, catalyst frequency, source credibility, recency decay, contradiction detection, document novelty, and current market context.

#### Requirement 6.3

WHEN generating a market-wide trend summary THE SYSTEM SHALL aggregate company-level signals into sector and market-level summaries.

#### Requirement 6.4

WHEN contradictory signals exist across sources THE SYSTEM SHALL represent disagreement explicitly rather than collapsing it into a single unsupported conclusion.

#### Requirement 6.5

WHEN a trend summary is produced THE SYSTEM SHALL include explainability fields listing the top supporting and opposing evidence.

### 7. Trade recommendation generation

#### Requirement 7.1

WHEN a company trend summary is available THE SYSTEM SHALL be able to generate a recommendation object containing action type, thesis, confidence, expected horizon, invalidation conditions, and cited evidence.

#### Requirement 7.2

WHEN a recommendation is generated THE SYSTEM SHALL separate descriptive analysis from prescriptive trade action and include a risk classification.

#### Requirement 7.3

WHEN the system proposes a trade THE SYSTEM SHALL attach position sizing guidance based on configured portfolio rules rather than unconstrained model output.

#### Requirement 7.4

WHEN confidence or data quality falls below configured thresholds THE SYSTEM SHALL suppress automated trade eligibility and mark the recommendation as informational only.

### 8. Trade execution and safety controls

#### Requirement 8.1

WHEN trade automation is enabled THE SYSTEM SHALL support paper trading mode and live trading mode as separate execution environments.

#### Requirement 8.2

WHEN live trading mode is enabled THE SYSTEM SHALL require operator approval controls, risk limits, and broker credential isolation.

#### Requirement 8.3

WHEN the system places an order THE SYSTEM SHALL persist the full decision trace including signals used, prompt versions, model versions, thresholds, and broker response.

#### Requirement 8.4

WHEN a proposed order violates configured risk controls THE SYSTEM SHALL reject the order before broker submission.

#### Requirement 8.5

WHEN a broker API is unavailable or partially fails THE SYSTEM SHALL fail closed and SHALL NOT place duplicate or ambiguous orders.

### 9. Storage and queryability

#### Requirement 9.1

WHEN storing raw artifacts THE SYSTEM SHALL use MinIO object storage as the system of record for HTML, text, API payloads, prompts, model outputs, and exported analytical datasets.
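Requirements 3.2 and 9.1 pair naturally as content-addressed storage: identical bytes always produce the same hash, so a repeat fetch can be recognized before it is reprocessed. The sketch below is illustrative only; it assumes SHA-256 over the raw payload, a hypothetical `raw/<source_type>/<SYMBOL>/<YYYY-MM-DD>/<sha256>.<ext>` key layout, and an in-memory `Deduplicator` standing in for a durable index such as a unique constraint in PostgreSQL.

```python
import hashlib
from datetime import datetime, timezone


def content_hash(payload: bytes) -> str:
    """Stable SHA-256 hex digest of the artifact bytes (Req 3.2)."""
    return hashlib.sha256(payload).hexdigest()


def raw_artifact_key(source_type: str, symbol: str, retrieved_at: datetime,
                     payload: bytes, extension: str) -> str:
    """Build a hypothetical MinIO object key; retrieved_at should be timezone-aware."""
    day = retrieved_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/{source_type}/{symbol.upper()}/{day}/{content_hash(payload)}.{extension}"


class Deduplicator:
    """In-memory stand-in for a durable hash index (e.g. a PostgreSQL unique constraint)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, payload: bytes) -> bool:
        """Return True the first time a payload's hash is seen, False afterwards."""
        digest = content_hash(payload)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

For pages whose markup changes on every fetch (rotating ads, embedded timestamps), hashing the normalized text from Requirement 4.1 rather than the raw HTML may give a more stable deduplication key, while the raw bytes are still stored for audit.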
#### Requirement 9.2

WHEN storing normalized relational data THE SYSTEM SHALL use PostgreSQL for companies, watchlists, article metadata, document intelligence objects, trends, recommendations, operational execution records, and control-plane state.

#### Requirement 9.3

WHEN low-latency coordination or caching is required THE SYSTEM SHALL use Redis for job state, distributed locks, short-lived caches, and rate-limit coordination.

#### Requirement 9.4

WHEN historical analytical queries are needed THE SYSTEM SHALL persist analytical fact datasets in Hive-compatible partitioned form on MinIO so that market data, predictions, and trade outcomes can be queried together.

#### Requirement 9.5

WHEN analytical table management is required THE SYSTEM SHALL support a lakehouse table abstraction that permits append-only fact ingestion, partitioned queries, and schema evolution.

### 10. SQL analytics and dashboards

#### Requirement 10.1

WHEN a user or service executes an analytical query THE SYSTEM SHALL provide an Athena-like SQL query service over MinIO-hosted analytical datasets.

#### Requirement 10.2

WHEN a dashboard user explores market, prediction, and trade data THE SYSTEM SHALL expose QuickSight-like dashboards for performance, confidence, prediction accuracy, evidence coverage, and model behavior.

#### Requirement 10.3

WHEN analytical results combine AI outputs with executed trades and market outcomes THE SYSTEM SHALL support joins across predicted signals, broker executions, and realized performance data.

#### Requirement 10.4

WHEN dashboards or research queries need drill-down capability THE SYSTEM SHALL provide traceability from analytical aggregates back to underlying documents, prompts, model outputs, and raw artifacts.

### 11. APIs and UI

#### Requirement 11.1

WHEN a client requests company analytics THE SYSTEM SHALL expose APIs for document timelines, trend summaries, recommendation history, execution history, and evidence drill-down.
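Prediction accuracy (Requirement 10.2) depends on the kind of join described in Requirement 10.3. In the platform that join would run as SQL in the Athena-like query service; the Python sketch below only illustrates the metric logic. The `Prediction` and `Outcome` record shapes, the +1/-1/0 direction encoding, skipping neutral calls, and counting a zero return as a miss are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    symbol: str
    as_of: str       # ISO date the signal was produced
    direction: int   # hypothetical encoding: +1 bullish, -1 bearish, 0 neutral


@dataclass(frozen=True)
class Outcome:
    symbol: str
    as_of: str
    realized_return: float  # forward return over the prediction horizon


def directional_accuracy(predictions: list[Prediction],
                         outcomes: list[Outcome]) -> tuple[float, int]:
    """Inner-join predictions to outcomes on (symbol, as_of) and score direction.

    Neutral predictions and predictions without an outcome are skipped.
    Returns (accuracy, scored_count); accuracy is 0.0 when nothing is scored.
    """
    realized = {(o.symbol, o.as_of): o.realized_return for o in outcomes}
    hits = scored = 0
    for p in predictions:
        key = (p.symbol, p.as_of)
        if p.direction == 0 or key not in realized:
            continue
        scored += 1
        ret = realized[key]
        # A hit requires the realized return to share the predicted sign; zero counts as a miss.
        if (ret > 0 and p.direction > 0) or (ret < 0 and p.direction < 0):
            hits += 1
    return (hits / scored if scored else 0.0), scored
```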
#### Requirement 11.2

WHEN an operator inspects a recommendation THE SYSTEM SHALL display the contributing document intelligence objects, the raw sources used, and any market context features that influenced the decision.

#### Requirement 11.3

WHEN a user reviews an order decision THE SYSTEM SHALL expose a full audit trail from ingestion through broker execution and eventual market outcome.

### 12. Observability and operations

#### Requirement 12.1

WHEN a pipeline stage runs THE SYSTEM SHALL emit structured logs, metrics, and traces for ingestion, parsing, extraction, aggregation, analytics publication, and trading.

#### Requirement 12.2

WHEN model performance degrades THE SYSTEM SHALL surface schema failure rates, latency percentiles, token usage estimates, and extraction retry counts.

#### Requirement 12.3

WHEN source coverage changes materially THE SYSTEM SHALL alert operators about sustained source failures, symbol coverage gaps, or analytical publication lag.

## Non-Functional Requirements

#### Requirement N1

WHEN the system processes documents and market events concurrently THE SYSTEM SHALL support horizontal scaling across Kubernetes workers.

#### Requirement N2

WHEN the system stores model-derived conclusions THE SYSTEM SHALL preserve enough provenance to reproduce or challenge those conclusions later.

#### Requirement N3

WHEN the system handles licensed or restricted content THE SYSTEM SHALL preserve source metadata, access policy, and retention policy for each artifact.

#### Requirement N4

WHEN the system publishes analytical datasets THE SYSTEM SHALL ensure queryable partitions are written atomically or with an equivalent consistency guarantee.

#### Requirement N5

WHEN trade execution is enabled THE SYSTEM SHALL prioritize fail-closed behavior over availability in ambiguous conditions.
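Requirements 8.4, 8.5, and N5 can be read together as a pre-trade gate. The sketch below is a hypothetical, in-memory illustration: it rejects orders that breach configured notional limits before any broker call, never submits the same client order ID twice, and on any broker failure leaves the order marked `UNKNOWN` and halts further automated orders instead of retrying. It assumes the broker accepts a caller-supplied client order ID for idempotency, which many broker APIs support but which should be confirmed for the chosen broker; a real implementation would persist order state in PostgreSQL rather than a dict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OrderState(Enum):
    REJECTED = "rejected"    # blocked by risk controls; never sent to the broker
    ACCEPTED = "accepted"    # broker call returned normally
    UNKNOWN = "unknown"      # outcome ambiguous; needs reconciliation before trading resumes
    DUPLICATE = "duplicate"  # client order ID already used; not resubmitted


@dataclass(frozen=True)
class RiskLimits:
    max_order_notional: float
    max_position_notional: float


@dataclass(frozen=True)
class ProposedOrder:
    client_order_id: str
    symbol: str
    quantity: int        # positive = buy, negative = sell
    limit_price: float


class FailClosedExecutor:
    """Hypothetical pre-trade gate that rejects unsafe orders and halts on ambiguity."""

    def __init__(self, limits: RiskLimits, submit: Callable[[ProposedOrder], str]) -> None:
        self._limits = limits
        self._submit = submit  # stand-in for the broker client call
        self._states: dict[str, OrderState] = {}
        self.halted = False

    def violates_limits(self, order: ProposedOrder, position_notional: float) -> bool:
        """position_notional is the signed notional of the current position in the symbol."""
        order_notional = abs(order.quantity) * order.limit_price
        new_position = position_notional + order.quantity * order.limit_price
        return (
            order.quantity == 0
            or order.limit_price <= 0
            or order_notional > self._limits.max_order_notional
            or abs(new_position) > self._limits.max_position_notional
        )

    def place(self, order: ProposedOrder, position_notional: float) -> OrderState:
        # Idempotency: a client order ID is submitted at most once.
        if order.client_order_id in self._states:
            return OrderState.DUPLICATE
        # Risk controls run before any broker call; a halted executor rejects everything.
        if self.halted or self.violates_limits(order, position_notional):
            self._states[order.client_order_id] = OrderState.REJECTED
            return OrderState.REJECTED
        # Record intent first, so a crash mid-call leaves the order marked ambiguous.
        self._states[order.client_order_id] = OrderState.UNKNOWN
        try:
            self._submit(order)
        except Exception:
            # Fail closed: no automatic retry, and no further automated orders.
            self.halted = True
            return OrderState.UNKNOWN
        self._states[order.client_order_id] = OrderState.ACCEPTED
        return OrderState.ACCEPTED
```

Treating every exception as ambiguous is deliberately conservative: an explicit broker rejection could be distinguished and recorded without halting, but only once the chosen broker's error contract is known.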
#### Requirement N6

WHEN dashboards query large historical datasets THE SYSTEM SHALL support partition pruning and index or metadata strategies that keep typical analyst queries responsive.
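Requirements 9.4 and N6 rely on Hive-style `key=value` directory segments, which let a query engine skip partitions that cannot match a filter. The sketch below assumes a hypothetical `analytics/<table>/dt=<date>/symbol=<ticker>/` layout and uses `urllib.parse.quote` to keep partition values path-safe (Hive applies its own escaping rules, so this encoding is a placeholder). It prunes by parsing object keys; a production query engine would normally prune from catalog or table-format metadata instead of listing objects.

```python
from datetime import date
from typing import Optional
from urllib.parse import quote, unquote


def partition_prefix(table: str, dt: date, symbol: str) -> str:
    """Hypothetical Hive-style prefix: analytics/<table>/dt=YYYY-MM-DD/symbol=<ticker>/."""
    return f"analytics/{table}/dt={dt.isoformat()}/symbol={quote(symbol, safe='')}/"


def parse_partitions(key: str) -> dict[str, str]:
    """Extract key=value directory segments from an object key."""
    parts: dict[str, str] = {}
    for segment in key.split("/")[:-1]:  # the final segment is the file name
        name, sep, value = segment.partition("=")
        if sep:
            parts[name] = unquote(value)
    return parts


def prune(keys: list[str], start: date, end: date,
          symbols: Optional[set[str]] = None) -> list[str]:
    """Keep objects whose dt partition is in [start, end] and, optionally, whose symbol is in symbols."""
    selected = []
    for key in keys:
        parts = parse_partitions(key)
        if "dt" not in parts:
            continue  # not a partitioned object; skip rather than scan it
        if not start <= date.fromisoformat(parts["dt"]) <= end:
            continue
        if symbols is not None and parts.get("symbol") not in symbols:
            continue
        selected.append(key)
    return selected
```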