270 lines
13 KiB
Markdown
270 lines
13 KiB
Markdown
# Stonks Oracle - Requirements
|
|
|
|
## Overview
|
|
This feature builds an AI-assisted market intelligence, execution, and analytics platform for a Kubernetes-hosted environment. The platform ingests market symbols, licensed market data, company-specific news, regulatory filings, scraped web sources, and broker execution events; stores raw and normalized artifacts; extracts structured JSON with local Ollama models; computes trend and sentiment summaries; and optionally places trades through a broker integration.
|
|
|
|
The platform SHALL also maintain a local analytics lake on MinIO using Hive-compatible partitioned data, support Athena-like SQL querying over captured market and trade data, and expose QuickSight-like dashboards for research, review, and audit.
|
|
|
|
The initial release is focused on reliable ingestion, deterministic structured extraction, explainable trend scoring, paper trading safety, and internal analytics visibility.
|
|
|
|
## User Stories
|
|
- As an operator, I want to register companies, tickers, sectors, watchlists, and source rules so the system knows what to monitor.
|
|
- As an analyst, I want every raw article, filing, market snapshot, and scrape artifact preserved so I can audit downstream AI conclusions.
|
|
- As a data engineer, I want structured JSON extraction from each article and filing so downstream analytics are queryable.
|
|
- As a strategist, I want aggregated trend assessments per symbol, sector, and market regime so I can evaluate opportunities.
|
|
- As a trader, I want the system to generate explainable trade recommendations with explicit confidence, catalysts, and risk notes.
|
|
- As a risk owner, I want strict controls on automated trading so the system cannot place unsafe orders.
|
|
- As a quantitative reviewer, I want to query historical market data, AI predictions, and executed trades in one SQL-accessible analytics plane.
|
|
- As a dashboard user, I want QuickSight-like visualizations for performance, signal quality, prediction accuracy, and model behavior.
|
|
- As a platform owner, I want the system to run fully inside Kubernetes against local Ollama and self-hosted analytics components.
|
|
|
|
## Functional Requirements
|
|
|
|
### 1. Watchlist and source management
|
|
#### Requirement 1.1
|
|
WHEN an operator creates or updates a tracked company
|
|
THE SYSTEM SHALL persist the company profile including ticker, legal name, aliases, exchange, sector, industry, market cap bucket, and source configuration.
|
|
|
|
#### Requirement 1.2
|
|
WHEN an operator defines a source configuration for a company
|
|
THE SYSTEM SHALL support source types including market data APIs, news API feeds, SEC or investor relations URLs, company press release pages, earnings transcript sources, curated web pages, and broker-linked execution sources.
|
|
|
|
#### Requirement 1.3
|
|
WHEN a company has aliases, brands, or product names
|
|
THE SYSTEM SHALL use those aliases during source retrieval, de-duplication, entity matching, and extraction.
|
|
|
|
### 2. External API integrations
|
|
#### Requirement 2.1
|
|
WHEN the scheduler triggers a market ingestion cycle
|
|
THE SYSTEM SHALL fetch configured market data API results for tracked companies and persist raw response payloads.
|
|
|
|
#### Requirement 2.2
|
|
WHEN the scheduler triggers a news ingestion cycle
|
|
THE SYSTEM SHALL fetch configured news API results for tracked companies and persist raw response payloads.
|
|
|
|
#### Requirement 2.3
|
|
WHEN the scheduler triggers a regulatory ingestion cycle
|
|
THE SYSTEM SHALL fetch configured filing or issuer event data from authoritative sources such as SEC-style APIs and persist raw response payloads.
|
|
|
|
#### Requirement 2.4
|
|
WHEN trade automation is enabled
|
|
THE SYSTEM SHALL integrate with at least one broker API that supports paper trading, order placement, order status retrieval, positions, account balances, and execution events.
|
|
|
|
#### Requirement 2.5
|
|
WHEN external APIs enforce rate limits or quotas
|
|
THE SYSTEM SHALL coordinate request pacing, retries, and backoff across workers.
|
|
|
|
### 3. Ingestion and raw artifact retention
|
|
#### Requirement 3.1
|
|
WHEN a scraper retrieves an article, filing, or web page
|
|
THE SYSTEM SHALL store the raw HTML, rendered text, metadata, retrieval timestamp, and retrieval source in object storage.
|
|
|
|
#### Requirement 3.2
|
|
WHEN an article, filing, or market payload is ingested
|
|
THE SYSTEM SHALL generate a stable content hash and use it to prevent duplicate processing.
|
|
|
|
#### Requirement 3.3
|
|
WHEN the system stores a raw artifact
|
|
THE SYSTEM SHALL persist an associated metadata record containing symbol, source, URL when applicable, title, publication time, retrieval time, language when applicable, and content hash.
|
|
|
|
#### Requirement 3.4
|
|
WHEN content retrieval fails
|
|
THE SYSTEM SHALL record the failure reason, retry policy state, and next eligible retry time.
|
|
|
|
### 4. Parsing and normalization
|
|
#### Requirement 4.1
|
|
WHEN a raw article or filing enters the parsing stage
|
|
THE SYSTEM SHALL extract normalized text, author data when available, publisher, tags, mentioned entities, outbound links, and document type.
|
|
|
|
#### Requirement 4.2
|
|
WHEN the system detects boilerplate or repeated template text
|
|
THE SYSTEM SHALL reduce or remove boilerplate before AI extraction while retaining the original raw artifact for audit.
|
|
|
|
#### Requirement 4.3
|
|
WHEN the parser cannot confidently extract article body text
|
|
THE SYSTEM SHALL flag the document for low-quality extraction and prevent it from influencing downstream trading until reviewed or reprocessed.
|
|
|
|
### 5. AI article and document extraction
|
|
#### Requirement 5.1
|
|
WHEN a normalized article or filing is ready for AI extraction
|
|
THE SYSTEM SHALL send the document to a local Ollama model using structured output with an explicit JSON schema.
|
|
|
|
#### Requirement 5.2
|
|
WHEN the model returns extraction output
|
|
THE SYSTEM SHALL validate the response against the expected schema before saving it.
|
|
|
|
#### Requirement 5.3
|
|
WHEN extraction succeeds
|
|
THE SYSTEM SHALL produce a canonical document intelligence object with at minimum:
|
|
- document_id
|
|
- document_type
|
|
- source metadata
|
|
- tickers referenced
|
|
- companies referenced
|
|
- document summary
|
|
- sentiment by company
|
|
- catalyst type
|
|
- impact horizon
|
|
- key facts
|
|
- risks mentioned
|
|
- macro themes
|
|
- confidence score
|
|
- extraction warnings
|
|
- model metadata
|
|
|
|
#### Requirement 5.4
|
|
WHEN the model response is invalid, incomplete, or hallucinatory
|
|
THE SYSTEM SHALL retry extraction according to policy and preserve both the failed output and validation errors.
|
|
|
|
#### Requirement 5.5
|
|
WHEN a document is materially relevant to multiple companies
|
|
THE SYSTEM SHALL emit one shared document record and one or more per-company impact records.
|
|
|
|
### 6. Aggregation and trend analysis
|
|
#### Requirement 6.1
|
|
WHEN multiple document intelligence objects and market observations exist for a company
|
|
THE SYSTEM SHALL generate a rolling company trend summary over configurable windows including intraday, 1 day, 7 day, 30 day, and 90 day intervals.
|
|
|
|
#### Requirement 6.2
|
|
WHEN generating a company trend summary
|
|
THE SYSTEM SHALL consider sentiment, catalyst frequency, source credibility, recency decay, contradiction detection, document novelty, and current market context.
|
|
|
|
#### Requirement 6.3
|
|
WHEN generating a market-wide trend summary
|
|
THE SYSTEM SHALL aggregate company-level signals into sector and market-level summaries.
|
|
|
|
#### Requirement 6.4
|
|
WHEN contradictory signals exist across sources
|
|
THE SYSTEM SHALL represent disagreement explicitly rather than collapsing it into a single unsupported conclusion.
|
|
|
|
#### Requirement 6.5
|
|
WHEN a trend summary is produced
|
|
THE SYSTEM SHALL include explainability fields listing the top supporting and opposing evidence.
|
|
|
|
### 7. Trade recommendation generation
|
|
#### Requirement 7.1
|
|
WHEN a company trend summary is available
|
|
THE SYSTEM SHALL be able to generate a recommendation object containing action type, thesis, confidence, expected horizon, invalidation conditions, and cited evidence.
|
|
|
|
#### Requirement 7.2
|
|
WHEN a recommendation is generated
|
|
THE SYSTEM SHALL separate descriptive analysis from prescriptive trade action and include a risk classification.
|
|
|
|
#### Requirement 7.3
|
|
WHEN the system proposes a trade
|
|
THE SYSTEM SHALL attach position sizing guidance based on configured portfolio rules rather than unconstrained model output.
|
|
|
|
#### Requirement 7.4
|
|
WHEN the confidence or data quality falls below configured thresholds
|
|
THE SYSTEM SHALL suppress automated trade eligibility and mark the recommendation as informational only.
|
|
|
|
### 8. Trade execution and safety controls
|
|
#### Requirement 8.1
|
|
WHEN trade automation is enabled
|
|
THE SYSTEM SHALL support paper trading mode and live trading mode as separate execution environments.
|
|
|
|
#### Requirement 8.2
|
|
WHEN live trading mode is enabled
|
|
THE SYSTEM SHALL require operator approval controls, risk limits, and broker credential isolation.
|
|
|
|
#### Requirement 8.3
|
|
WHEN the system places an order
|
|
THE SYSTEM SHALL persist the full decision trace including signals used, prompt versions, model versions, thresholds, and broker response.
|
|
|
|
#### Requirement 8.4
|
|
WHEN a proposed order violates configured risk controls
|
|
THE SYSTEM SHALL reject the order before broker submission.
|
|
|
|
#### Requirement 8.5
|
|
WHEN a broker API is unavailable or partially fails
|
|
THE SYSTEM SHALL fail closed and SHALL NOT place duplicate or ambiguous orders.
|
|
|
|
### 9. Storage and queryability
|
|
#### Requirement 9.1
|
|
WHEN storing raw artifacts
|
|
THE SYSTEM SHALL use MinIO object storage as the system of record for HTML, text, API payloads, prompts, model outputs, and exported analytical datasets.
|
|
|
|
#### Requirement 9.2
|
|
WHEN storing normalized relational data
|
|
THE SYSTEM SHALL use PostgreSQL for companies, watchlists, article metadata, document intelligence objects, trends, recommendations, operational execution records, and control-plane state.
|
|
|
|
#### Requirement 9.3
|
|
WHEN low-latency coordination or caching is required
|
|
THE SYSTEM SHALL use Redis for job state, distributed locks, short-lived caches, and rate-limit coordination.
|
|
|
|
#### Requirement 9.4
|
|
WHEN historical analytical queries are needed
|
|
THE SYSTEM SHALL persist analytical fact datasets in Hive-compatible partitioned form on MinIO so that market data, predictions, and trade outcomes can be queried together.
|
|
|
|
#### Requirement 9.5
|
|
WHEN analytical table management is required
|
|
THE SYSTEM SHALL support a lakehouse table abstraction that permits append-only fact ingestion, partitioned queries, and schema evolution.
|
|
|
|
### 10. SQL analytics and dashboards
|
|
#### Requirement 10.1
|
|
WHEN a user or service executes an analytical query
|
|
THE SYSTEM SHALL provide an Athena-like SQL query service over MinIO-hosted analytical datasets.
|
|
|
|
#### Requirement 10.2
|
|
WHEN a dashboard user explores market, prediction, and trade data
|
|
THE SYSTEM SHALL expose QuickSight-like dashboards for performance, confidence, prediction accuracy, evidence coverage, and model behavior.
|
|
|
|
#### Requirement 10.3
|
|
WHEN analytical results combine AI outputs with executed trades and market outcomes
|
|
THE SYSTEM SHALL support joins across predicted signals, broker executions, and realized performance data.
|
|
|
|
#### Requirement 10.4
|
|
WHEN dashboards or research queries need drill-down capability
|
|
THE SYSTEM SHALL provide traceability from analytical aggregates back to underlying documents, prompts, model outputs, and raw artifacts.
|
|
|
|
### 11. APIs and UI
|
|
#### Requirement 11.1
|
|
WHEN a client requests company analytics
|
|
THE SYSTEM SHALL expose APIs for document timelines, trend summaries, recommendation history, execution history, and evidence drill-down.
|
|
|
|
#### Requirement 11.2
|
|
WHEN an operator inspects a recommendation
|
|
THE SYSTEM SHALL display the contributing document intelligence objects, the raw sources used, and any market context features that influenced the decision.
|
|
|
|
#### Requirement 11.3
|
|
WHEN a user reviews an order decision
|
|
THE SYSTEM SHALL expose a full audit trail from ingestion through broker execution and eventual market outcome.
|
|
|
|
### 12. Observability and operations
|
|
#### Requirement 12.1
|
|
WHEN a pipeline stage runs
|
|
THE SYSTEM SHALL emit structured logs, metrics, and traces for ingestion, parsing, extraction, aggregation, analytics publication, and trading.
|
|
|
|
#### Requirement 12.2
|
|
WHEN model performance degrades
|
|
THE SYSTEM SHALL surface schema failure rates, latency percentiles, token usage estimates, and extraction retry counts.
|
|
|
|
#### Requirement 12.3
|
|
WHEN source coverage changes materially
|
|
THE SYSTEM SHALL alert operators about sustained source failures, symbol coverage gaps, or analytical publication lag.
|
|
|
|
## Non-Functional Requirements
|
|
#### Requirement N1
|
|
WHEN the system processes documents and market events concurrently
|
|
THE SYSTEM SHALL support horizontal scaling across Kubernetes workers.
|
|
|
|
#### Requirement N2
|
|
WHEN the system stores model-derived conclusions
|
|
THE SYSTEM SHALL preserve enough provenance to reproduce or challenge those conclusions later.
|
|
|
|
#### Requirement N3
|
|
WHEN the system handles licensed or restricted content
|
|
THE SYSTEM SHALL preserve source metadata, access policy, and retention policy for each artifact.
|
|
|
|
#### Requirement N4
|
|
WHEN the system publishes analytical datasets
|
|
THE SYSTEM SHALL ensure queryable partitions are written atomically or with an equivalent consistency guarantee.
|
|
|
|
#### Requirement N5
|
|
WHEN trade execution is enabled
|
|
THE SYSTEM SHALL prioritize fail-closed behavior over availability in ambiguous conditions.
|
|
|
|
#### Requirement N6
|
|
WHEN dashboards query large historical datasets
|
|
THE SYSTEM SHALL support partition pruning and index or metadata strategies that keep typical analyst queries responsive.
|