Stonks Oracle - Requirements

Overview

This feature builds an AI-assisted market intelligence, execution, and analytics platform for a Kubernetes-hosted environment. The platform ingests market symbols, licensed market data, company-specific news, regulatory filings, scraped web sources, and broker execution events; stores raw and normalized artifacts; extracts structured JSON with local Ollama models; computes trend and sentiment summaries; and optionally places trades through a broker integration.

The platform SHALL also maintain a local analytics lake on MinIO using Hive-compatible partitioned data, support Athena-like SQL querying over captured market and trade data, and expose QuickSight-like dashboards for research, review, and audit.

The initial release is focused on reliable ingestion, deterministic structured extraction, explainable trend scoring, paper trading safety, and internal analytics visibility.

User Stories

  • As an operator, I want to register companies, tickers, sectors, watchlists, and source rules so the system knows what to monitor.
  • As an analyst, I want every raw article, filing, market snapshot, and scrape artifact preserved so I can audit downstream AI conclusions.
  • As a data engineer, I want structured JSON extraction from each article and filing so downstream analytics are queryable.
  • As a strategist, I want aggregated trend assessments per symbol, sector, and market regime so I can evaluate opportunities.
  • As a trader, I want the system to generate explainable trade recommendations with explicit confidence, catalysts, and risk notes.
  • As a risk owner, I want strict controls on automated trading so the system cannot place unsafe orders.
  • As a quantitative reviewer, I want to query historical market data, AI predictions, and executed trades in one SQL-accessible analytics plane.
  • As a dashboard user, I want QuickSight-like visualizations for performance, signal quality, prediction accuracy, and model behavior.
  • As a platform owner, I want the system to run fully inside Kubernetes against local Ollama and self-hosted analytics components.

Functional Requirements

1. Watchlist and source management

Requirement 1.1

WHEN an operator creates or updates a tracked company THE SYSTEM SHALL persist the company profile including ticker, legal name, aliases, exchange, sector, industry, market cap bucket, and source configuration.

Requirement 1.2

WHEN an operator defines a source configuration for a company THE SYSTEM SHALL support source types including market data APIs, news API feeds, SEC or investor relations URLs, company press release pages, earnings transcript sources, curated web pages, and broker-linked execution sources.

Requirement 1.3

WHEN a company has aliases, brands, or product names THE SYSTEM SHALL use those aliases during source retrieval, de-duplication, entity matching, and extraction.

2. External API integrations

Requirement 2.1

WHEN the scheduler triggers a market ingestion cycle THE SYSTEM SHALL fetch configured market data API results for tracked companies and persist raw response payloads.

Requirement 2.2

WHEN the scheduler triggers a news ingestion cycle THE SYSTEM SHALL fetch configured news API results for tracked companies and persist raw response payloads.

Requirement 2.3

WHEN the scheduler triggers a regulatory ingestion cycle THE SYSTEM SHALL fetch configured filing or issuer event data from authoritative sources such as SEC-style APIs and persist raw response payloads.

Requirement 2.4

WHEN trade automation is enabled THE SYSTEM SHALL integrate with at least one broker API that supports paper trading, order placement, order status retrieval, positions, account balances, and execution events.

Requirement 2.5

WHEN external APIs enforce rate limits or quotas THE SYSTEM SHALL coordinate request pacing, retries, and backoff across workers.
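
Illustrative sketch (not part of the requirement): one possible way to pace retries is capped exponential backoff with full jitter. The function name `backoff_delay` and the `base` and `cap` defaults are assumptions chosen for illustration. Coordination across workers would additionally need shared state, which is not shown here.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a full-jitter delay in seconds for a zero-based retry attempt."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
    return rng.uniform(0.0, ceiling)           # jitter spreads workers apart
```

Randomizing the delay keeps workers that failed at the same moment from retrying in lockstep against the same quota.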

3. Ingestion and raw artifact retention

Requirement 3.1

WHEN a scraper retrieves an article, filing, or web page THE SYSTEM SHALL store the raw HTML, rendered text, metadata, retrieval timestamp, and retrieval source in object storage.

Requirement 3.2

WHEN an article, filing, or market payload is ingested THE SYSTEM SHALL generate a stable content hash and use it to prevent duplicate processing.
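
Illustrative sketch (not part of the requirement): a stable content hash could be derived by normalizing the text before hashing, so that trivial whitespace or Unicode-form differences do not defeat de-duplication. The choice of SHA-256, the normalization steps, and the in-memory `seen` set are assumptions; a real deployment would presumably check the hash against a persistent store.

```python
import hashlib
import unicodedata


def content_hash(text: str) -> str:
    """Hash text after Unicode and whitespace normalization."""
    normalized = unicodedata.normalize("NFC", text)
    normalized = " ".join(normalized.split())  # collapse runs of whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


seen = set()  # stand-in for a persistent hash index (assumption)


def is_duplicate(text: str) -> bool:
    """Return True if an equivalent document was already processed."""
    digest = content_hash(text)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```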

Requirement 3.3

WHEN the system stores a raw artifact THE SYSTEM SHALL persist an associated metadata record containing symbol, source, URL when applicable, title, publication time, retrieval time, language when applicable, and content hash.

Requirement 3.4

WHEN content retrieval fails THE SYSTEM SHALL record the failure reason, retry policy state, and next eligible retry time.
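
Illustrative sketch (not part of the requirement): the failure record could carry the reason, the attempt count, and the next eligible retry time in one immutable object. The class name `RetrievalFailure`, the doubling delay schedule, and the `max_attempts` and `base_seconds` defaults are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetrievalFailure:
    url: str
    reason: str
    attempts: int
    exhausted: bool                     # True once the retry policy gives up
    next_retry_at: Optional[datetime]   # None when exhausted


def record_failure(url: str, reason: str, attempts: int, now: datetime,
                   max_attempts: int = 5, base_seconds: int = 30) -> RetrievalFailure:
    """Build the failure record after one more failed attempt."""
    attempts += 1
    if attempts >= max_attempts:
        return RetrievalFailure(url, reason, attempts, True, None)
    delay = timedelta(seconds=base_seconds * 2 ** (attempts - 1))
    return RetrievalFailure(url, reason, attempts, False, now + delay)
```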

4. Parsing and normalization

Requirement 4.1

WHEN a raw article or filing enters the parsing stage THE SYSTEM SHALL extract normalized text, author data when available, publisher, tags, mentioned entities, outbound links, and document type.

Requirement 4.2

WHEN the system detects boilerplate or repeated template text THE SYSTEM SHALL reduce or remove boilerplate before AI extraction while retaining the original raw artifact for audit.

Requirement 4.3

WHEN the parser cannot confidently extract article body text THE SYSTEM SHALL flag the document for low-quality extraction and prevent it from influencing downstream trading until reviewed or reprocessed.

5. AI article and document extraction

Requirement 5.1

WHEN a normalized article or filing is ready for AI extraction THE SYSTEM SHALL send the document to a local Ollama model using structured output with an explicit JSON schema.

Requirement 5.2

WHEN the model returns extraction output THE SYSTEM SHALL validate the response against the expected schema before saving it.

Requirement 5.3

WHEN extraction succeeds THE SYSTEM SHALL produce a canonical document intelligence object with at minimum:

  • document_id
  • document_type
  • source metadata
  • tickers referenced
  • companies referenced
  • document summary
  • sentiment by company
  • catalyst type
  • impact horizon
  • key facts
  • risks mentioned
  • macro themes
  • confidence score
  • extraction warnings
  • model metadata
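
Illustrative sketch (not part of the requirement): validation of model output before saving (Requirement 5.2) could start with a required-field and type check over a subset of the fields listed above. Only `document_id` and `document_type` are named in this document; the other key names and the 0 to 1 confidence range are assumptions. A production system would more likely validate against a full JSON Schema.

```python
import json

# Subset of the canonical object; key names other than document_id and
# document_type are hypothetical.
REQUIRED_FIELDS = {
    "document_id": str,
    "document_type": str,
    "tickers": list,
    "summary": str,
    "confidence": (int, float),
    "warnings": list,
}


def validate_extraction(raw: str):
    """Return (parsed_object_or_None, list_of_validation_errors)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return None, ["top-level value must be an object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            errors.append(f"missing field: {name}")
        elif isinstance(obj[name], bool) or not isinstance(obj[name], expected):
            errors.append(f"wrong type for {name}")
    conf = obj.get("confidence")
    if isinstance(conf, (int, float)) and not isinstance(conf, bool):
        if not 0.0 <= conf <= 1.0:
            errors.append("confidence out of range [0, 1]")
    return (obj if not errors else None), errors
```

Returning the error list alongside the result makes it straightforward to preserve validation errors next to the failed output, as Requirement 5.4 asks.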

Requirement 5.4

WHEN the model response is invalid, incomplete, or contains claims not supported by the source document THE SYSTEM SHALL retry extraction according to policy and preserve both the failed output and validation errors.

Requirement 5.5

WHEN a document is materially relevant to multiple companies THE SYSTEM SHALL emit one shared document record and one or more per-company impact records.

6. Aggregation and trend analysis

Requirement 6.1

WHEN multiple document intelligence objects and market observations exist for a company THE SYSTEM SHALL generate a rolling company trend summary over configurable windows including intraday, 1-day, 7-day, 30-day, and 90-day intervals.

Requirement 6.2

WHEN generating a company trend summary THE SYSTEM SHALL consider sentiment, catalyst frequency, source credibility, recency decay, contradiction detection, document novelty, and current market context.
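
Illustrative sketch (not part of the requirement): two of the factors above, source credibility and recency decay, could be combined as a weighted average in which each observation's weight halves every `half_life_days`. The function name, the input tuple layout, and the 7-day half-life default are assumptions; the remaining factors are not modeled here.

```python
def decayed_sentiment(observations, half_life_days: float = 7.0) -> float:
    """Credibility- and recency-weighted mean sentiment.

    observations: iterable of (sentiment in [-1, 1], credibility in [0, 1], age_days).
    Returns 0.0 when there is no weighted evidence.
    """
    numerator = 0.0
    denominator = 0.0
    for sentiment, credibility, age_days in observations:
        weight = credibility * 0.5 ** (age_days / half_life_days)
        numerator += weight * sentiment
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0
```

For example, a fresh positive document and a week-old negative one of equal credibility receive weights 1 and 0.5, so the result leans positive rather than cancelling out.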

Requirement 6.3

WHEN generating a market-wide trend summary THE SYSTEM SHALL aggregate company-level signals into sector and market-level summaries.

Requirement 6.4

WHEN contradictory signals exist across sources THE SYSTEM SHALL represent disagreement explicitly rather than collapsing it into a single unsupported conclusion.
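
Illustrative sketch (not part of the requirement): disagreement could be kept explicit by reporting the bullish and bearish evidence weights separately, together with a contradiction score, instead of a single net number. The score definition used here (twice the minority share, so 0 means unanimous and 1 means an even split) is an assumption.

```python
def disagreement(signals):
    """Summarize opposing evidence.

    signals: list of (sentiment, weight) pairs; sign of sentiment gives direction.
    """
    bullish = sum(w for s, w in signals if s > 0)
    bearish = sum(w for s, w in signals if s < 0)
    total = bullish + bearish
    contradiction = 0.0 if total == 0 else 2 * min(bullish, bearish) / total
    return {
        "bullish_weight": bullish,
        "bearish_weight": bearish,
        "contradiction": contradiction,
    }
```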

Requirement 6.5

WHEN a trend summary is produced THE SYSTEM SHALL include explainability fields listing the top supporting and opposing evidence.

7. Trade recommendation generation

Requirement 7.1

WHEN a company trend summary is available THE SYSTEM SHALL be able to generate a recommendation object containing action type, thesis, confidence, expected horizon, invalidation conditions, and cited evidence.

Requirement 7.2

WHEN a recommendation is generated THE SYSTEM SHALL separate descriptive analysis from prescriptive trade action and include a risk classification.

Requirement 7.3

WHEN the system proposes a trade THE SYSTEM SHALL attach position sizing guidance based on configured portfolio rules rather than unconstrained model output.
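
Illustrative sketch (not part of the requirement): position sizing guidance could treat the model's suggested quantity as an upper bound only, clamped by a configured portfolio rule. The rule shown (a maximum fraction of account equity per position) and all parameter names are assumptions.

```python
import math


def size_position(equity: float, price: float,
                  max_position_fraction: float, requested_shares: int) -> int:
    """Clamp a requested share count to the configured per-position cap."""
    if equity <= 0 or price <= 0:
        return 0  # no sizing without valid account and price data
    cap = math.floor(equity * max_position_fraction / price)
    return max(0, min(requested_shares, cap))
```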

Requirement 7.4

WHEN the confidence or data quality falls below configured thresholds THE SYSTEM SHALL suppress automated trade eligibility and mark the recommendation as informational only.
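
Illustrative sketch (not part of the requirement): the eligibility gate could be a pure function of the two scores and their configured thresholds. The threshold defaults and the returned labels are assumptions.

```python
def trade_eligibility(confidence: float, data_quality: float,
                      min_confidence: float = 0.7, min_quality: float = 0.8) -> str:
    """Return 'eligible' only when both scores meet their thresholds."""
    if confidence < min_confidence or data_quality < min_quality:
        return "informational_only"
    return "eligible"
```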

8. Trade execution and safety controls

Requirement 8.1

WHEN trade automation is enabled THE SYSTEM SHALL support paper trading mode and live trading mode as separate execution environments.

Requirement 8.2

WHEN live trading mode is enabled THE SYSTEM SHALL require operator approval controls, risk limits, and broker credential isolation.

Requirement 8.3

WHEN the system places an order THE SYSTEM SHALL persist the full decision trace including signals used, prompt versions, model versions, thresholds, and broker response.

Requirement 8.4

WHEN a proposed order violates configured risk controls THE SYSTEM SHALL reject the order before broker submission.
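
Illustrative sketch (not part of the requirement): a pre-submission check could return the list of violated controls, with an empty list meaning the order may proceed. The `RiskLimits` fields and the violation labels are assumptions, loosely modeled on the controls named in Requirement 13.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    max_order_value: float
    daily_loss_cap: float
    locked_symbols: frozenset


def check_order(symbol: str, quantity: int, price: float,
                realized_loss_today: float, limits: RiskLimits) -> list:
    """Return the names of violated risk controls; empty means acceptable."""
    violations = []
    if symbol in limits.locked_symbols:
        violations.append("symbol_locked")
    if quantity * price > limits.max_order_value:
        violations.append("max_order_value_exceeded")
    if realized_loss_today >= limits.daily_loss_cap:
        violations.append("daily_loss_cap_reached")
    return violations
```

Collecting every violation, rather than stopping at the first, gives the decision trace in Requirement 8.3 a complete rejection reason.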

Requirement 8.5

WHEN a broker API is unavailable or partially fails THE SYSTEM SHALL fail closed and SHALL NOT place duplicate or ambiguous orders.
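
Illustrative sketch (not part of the requirement): duplicate orders could be avoided by deriving a deterministic client order id and recording intent before calling the broker, so that a failed call leaves the order marked as ambiguous instead of being retried blindly. The names `client_order_id`, `submit_once`, and `AmbiguousOrderState` are hypothetical, `send` stands in for a broker client call, and the in-memory `submitted` set stands in for durable storage.

```python
import hashlib


class AmbiguousOrderState(Exception):
    """Raised when the broker outcome is unknown and needs reconciliation."""


def client_order_id(recommendation_id: str, symbol: str, side: str, quantity: int) -> str:
    """Deterministic id: the same decision always maps to the same order id."""
    raw = f"{recommendation_id}|{symbol}|{side}|{quantity}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def submit_once(order_id: str, submitted: set, send) -> str:
    """Submit an order at most once; fail closed on any broker error."""
    if order_id in submitted:
        return "skipped_duplicate"
    submitted.add(order_id)  # record intent before contacting the broker
    try:
        send(order_id)
    except Exception as exc:
        # Outcome unknown: keep the id marked so no automatic resubmission occurs.
        raise AmbiguousOrderState(order_id) from exc
    return "submitted"
```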

9. Storage and queryability

Requirement 9.1

WHEN storing raw artifacts THE SYSTEM SHALL use MinIO object storage as the system of record for HTML, text, API payloads, prompts, model outputs, and exported analytical datasets.

Requirement 9.2

WHEN storing normalized relational data THE SYSTEM SHALL use PostgreSQL for companies, watchlists, article metadata, document intelligence objects, trends, recommendations, operational execution records, and control-plane state.

Requirement 9.3

WHEN low-latency coordination or caching is required THE SYSTEM SHALL use Redis for job state, distributed locks, short-lived caches, and rate-limit coordination.

Requirement 9.4

WHEN historical analytical queries are needed THE SYSTEM SHALL persist analytical fact datasets in Hive-compatible partitioned form on MinIO so that market data, predictions, and trade outcomes can be queried together.
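
Illustrative sketch (not part of the requirement): Hive-compatible partitioning encodes partition columns as `key=value` path segments in the object key. The partition columns chosen here (`dt` and `symbol`), the table name, and the file name are assumptions.

```python
from datetime import date


def partition_key(table: str, symbol: str, day: date, filename: str) -> str:
    """Build an object key using Hive-style key=value partition directories."""
    return f"{table}/dt={day.isoformat()}/symbol={symbol}/{filename}"
```

For example, `partition_key("trades", "AAPL", date(2024, 3, 5), "part-0000.parquet")` yields `trades/dt=2024-03-05/symbol=AAPL/part-0000.parquet`, a layout that lets a query engine prune partitions by date or symbol.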

Requirement 9.5

WHEN analytical table management is required THE SYSTEM SHALL support a lakehouse table abstraction that permits append-only fact ingestion, partitioned queries, and schema evolution.

10. SQL analytics and dashboards

Requirement 10.1

WHEN a user or service executes an analytical query THE SYSTEM SHALL provide an Athena-like SQL query service over MinIO-hosted analytical datasets.

Requirement 10.2

WHEN a dashboard user explores market, prediction, and trade data THE SYSTEM SHALL expose QuickSight-like dashboards for performance, confidence, prediction accuracy, evidence coverage, and model behavior.

Requirement 10.3

WHEN analytical results combine AI outputs with executed trades and market outcomes THE SYSTEM SHALL support joins across predicted signals, broker executions, and realized performance data.

Requirement 10.4

WHEN dashboards or research queries need drill-down capability THE SYSTEM SHALL provide traceability from analytical aggregates back to underlying documents, prompts, model outputs, and raw artifacts.

11. APIs and UI

Requirement 11.1

WHEN a client requests company analytics THE SYSTEM SHALL expose APIs for document timelines, trend summaries, recommendation history, execution history, and evidence drill-down.

Requirement 11.2

WHEN an operator inspects a recommendation THE SYSTEM SHALL display the contributing document intelligence objects, the raw sources used, and any market context features that influenced the decision.

Requirement 11.3

WHEN a user reviews an order decision THE SYSTEM SHALL expose a full audit trail from ingestion through broker execution and eventual market outcome.

12. Observability and operations

Requirement 12.1

WHEN a pipeline stage runs THE SYSTEM SHALL emit structured logs, metrics, and traces for ingestion, parsing, extraction, aggregation, analytics publication, and trading.

Requirement 12.2

WHEN model performance degrades THE SYSTEM SHALL surface schema failure rates, latency percentiles, token usage estimates, and extraction retry counts.

Requirement 12.3

WHEN source coverage changes materially THE SYSTEM SHALL alert operators about sustained source failures, symbol coverage gaps, or analytical publication lag.

13. Web dashboard and control plane UI

Requirement 13.1

User Story: As an operator, I want a web-based dashboard so that I can view and control every aspect of the platform without using curl or raw API calls.

Acceptance Criteria
  1. WHEN an operator opens the dashboard URL, THE Dashboard SHALL render a navigation layout with sidebar links to all major sections including companies, documents, trends, recommendations, orders, positions, trading controls, pipeline health, source management, analytics explorer, and system settings.
  2. WHEN the dashboard loads, THE Dashboard SHALL authenticate against the Query API and display the current system health status.

Requirement 13.2

User Story: As an operator, I want to manage companies, watchlists, aliases, and sources through the dashboard so that I can configure what the platform monitors.

Acceptance Criteria
  1. WHEN an operator navigates to the companies section, THE Dashboard SHALL display a searchable, sortable table of all tracked companies with ticker, name, sector, active status, and source count.
  2. WHEN an operator clicks a company row, THE Dashboard SHALL display a detail view with editable fields for sector, industry, market cap bucket, and active toggle, plus tabs for aliases, sources, and document history.
  3. WHEN an operator adds or edits a source for a company, THE Dashboard SHALL present a form with source type selection, configuration fields, credibility score slider, retention days, and access policy.
  4. WHEN an operator manages watchlists, THE Dashboard SHALL provide create, list, and member management views with drag-and-drop or multi-select company assignment.

Requirement 13.3

User Story: As an analyst, I want to browse documents, intelligence extractions, and trend summaries through the dashboard so that I can review what the AI is producing.

Acceptance Criteria
  1. WHEN an analyst navigates to the documents section, THE Dashboard SHALL display a filterable timeline of documents with columns for title, type, source, ticker mentions, published date, parse quality, and extraction status.
  2. WHEN an analyst clicks a document, THE Dashboard SHALL display the full document detail including intelligence extraction, company impacts, sentiment scores, key facts, risks, and links to raw artifacts in MinIO.
  3. WHEN an analyst navigates to the trends section, THE Dashboard SHALL display trend summaries per company with direction indicators, strength bars, confidence scores, contradiction scores, and expandable evidence lists.
  4. WHEN an analyst drills into a trend, THE Dashboard SHALL display the full evidence chain linking trend windows to contributing documents, intelligence objects, and raw sources.

Requirement 13.4

User Story: As a trader, I want to view recommendations, orders, positions, and PnL through the dashboard so that I can monitor trading activity and audit decisions.

Acceptance Criteria
  1. WHEN a trader navigates to the recommendations section, THE Dashboard SHALL display a filterable list with ticker, action, mode, confidence, thesis preview, and generation timestamp.
  2. WHEN a trader clicks a recommendation, THE Dashboard SHALL display the full evidence drill-down including contributing documents, intelligence objects, risk evaluation, and any linked orders.
  3. WHEN a trader navigates to the orders section, THE Dashboard SHALL display order history with status badges, fill information, and expandable audit trails.
  4. WHEN a trader navigates to the positions section, THE Dashboard SHALL display current positions with unrealized and realized PnL, entry prices, and current prices.

Requirement 13.5

User Story: As a risk owner, I want to manage trading mode, risk configuration, operator approvals, and symbol lockouts through the dashboard so that I can control execution safety.

Acceptance Criteria
  1. WHEN a risk owner navigates to trading controls, THE Dashboard SHALL display the current trading mode with a selector for paper, live, and disabled modes, and SHALL require a confirmation dialog for mode changes.
  2. WHEN a risk owner views pending approvals, THE Dashboard SHALL display a queue of pending operator approval requests with ticker, side, quantity, estimated value, and approve/reject buttons.
  3. WHEN a risk owner edits risk configuration, THE Dashboard SHALL present a form for max position size, daily loss cap, sector exposure limits, and cooldown periods.
  4. WHEN a risk owner views lockouts, THE Dashboard SHALL display active symbol lockouts with type, reason, and expiration time.

Requirement 13.6

User Story: As a platform owner, I want DevOps dashboards showing pipeline health, ingestion throughput, model performance, and source coverage so that I can monitor system operations.

Acceptance Criteria
  1. WHEN a platform owner navigates to the pipeline health section, THE Dashboard SHALL display document counts at each processing stage, parsing quality distribution, extraction validation rates, and trend generation counts.
  2. WHEN a platform owner views ingestion throughput, THE Dashboard SHALL display time-series charts of ingestion runs, items fetched, and success and failure rates, broken down by source type and time bucket.
  3. WHEN a platform owner views model performance, THE Dashboard SHALL display success rates, latency percentiles, retry rates, confidence distributions, and recent extraction failures with error details.
  4. WHEN a platform owner views source coverage, THE Dashboard SHALL display a matrix of companies versus source types showing coverage gaps and stale sources.

Requirement 13.7

User Story: As an analyst, I want an interactive SQL query explorer in the dashboard so that I can run ad-hoc queries against the lakehouse without needing a separate Trino client.

Acceptance Criteria
  1. WHEN an analyst opens the analytics explorer, THE Dashboard SHALL display a SQL editor with syntax highlighting, auto-complete for table and column names, and an execute button.
  2. WHEN an analyst executes a query, THE Dashboard SHALL display results in a paginated, sortable table with column type indicators and row count.
  3. WHEN an analyst wants to visualize query results, THE Dashboard SHALL provide chart type selection including line, bar, scatter, pie, and heatmap with configurable axis mappings.
  4. WHEN an analyst saves a query, THE Dashboard SHALL persist the query with a name and description for later reuse, and provide a saved queries list.

Requirement 13.8

User Story: As a dashboard user, I want pre-built analytical dashboards for market intelligence review so that I can quickly assess signal quality and trading performance.

Acceptance Criteria
  1. WHEN a user opens the dashboards section, THE Dashboard SHALL display a gallery of pre-built dashboards including symbol overview, sentiment heatmap, prediction accuracy, paper trading PnL, and model quality.
  2. WHEN a user opens a pre-built dashboard, THE Dashboard SHALL render interactive charts with date range selectors, ticker filters, and drill-down capability.
  3. WHEN a user interacts with a chart, THE Dashboard SHALL support click-through navigation from aggregated metrics to underlying detail views.

Non-Functional Requirements

Requirement N1

WHEN the system processes documents and market events concurrently THE SYSTEM SHALL support horizontal scaling across Kubernetes workers.

Requirement N2

WHEN the system stores model-derived conclusions THE SYSTEM SHALL preserve enough provenance to reproduce or challenge those conclusions later.

Requirement N3

WHEN the system handles licensed or restricted content THE SYSTEM SHALL preserve source metadata, access policy, and retention policy for each artifact.

Requirement N4

WHEN the system publishes analytical datasets THE SYSTEM SHALL ensure queryable partitions are written atomically or with an equivalent consistency guarantee.

Requirement N5

WHEN trade execution is enabled THE SYSTEM SHALL prioritize fail-closed behavior over availability in ambiguous conditions.

Requirement N6

WHEN dashboards query large historical datasets THE SYSTEM SHALL support partition pruning and index or metadata strategies that keep typical analyst queries responsive.