phase 0+1: project scaffold, k8s manifests, CI pipeline, steering, hooks, tests

- Repository structure for all services, infra, lakehouse, dashboards
- K8s manifests targeting stonks-oracle namespace with GHCR images
- Ingress via Traefik with ca-issuer TLS for internal services
- ConfigMap wired to existing cluster services (pg, redis, minio, ollama)
- GitHub Actions workflow for lint, test, multi-service container builds
- Dockerfile with build-arg CMD per service
- Makefile for local build/push/deploy
- Steering rules for TDD workflow, K8s conventions, project context
- Agent hooks for lint-on-save, test-on-save, k8s-validate, phase-commit
- Ruff linter config, all lint issues fixed
- 14 passing tests for schemas, config, redis keys
- PostgreSQL migrations, Trino catalogs, Superset config, MinIO lifecycle
2026-04-11 03:25:08 -07:00
parent 8cfc4f423b
commit ebea70573b
90 changed files with 3590 additions and 19 deletions
@@ -0,0 +1,14 @@
+---
+name: Lint Python on Save
+description: Run ruff linter when any Python file is saved
+version: "1.0"
+trigger:
+  type: onSave
+  filePattern: "**/*.py"
+---
+
+When any Python file is saved:
+
+1. Run `ruff check {filePath}` on the saved file
+2. If there are fixable issues, run `ruff check --fix {filePath}` to auto-fix
+3. Report any remaining issues concisely
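The three steps above could be wired together roughly as follows. This is a sketch rather than the hook's actual implementation: `ruff_commands` and `lint_on_save` are hypothetical names, `ruff` is assumed to be on `PATH`, and for simplicity the fix pass runs whenever the first check fails instead of only when fixable issues are reported.

```python
import subprocess
from typing import List


def ruff_commands(file_path: str) -> List[List[str]]:
    """Return the check and auto-fix commands in the order the hook runs them."""
    return [
        ["ruff", "check", file_path],
        ["ruff", "check", "--fix", file_path],
    ]


def lint_on_save(file_path: str) -> int:
    """Check, auto-fix if needed, then report whatever is still failing."""
    check_cmd, fix_cmd = ruff_commands(file_path)
    first = subprocess.run(check_cmd, capture_output=True, text=True)
    if first.returncode == 0:
        return 0  # nothing to fix or report
    subprocess.run(fix_cmd, capture_output=True, text=True)
    # Re-check so that only issues surviving --fix are reported.
    final = subprocess.run(check_cmd, capture_output=True, text=True)
    if final.returncode != 0:
        print(final.stdout.strip())
    return final.returncode
```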
@@ -0,0 +1,15 @@
+---
+name: Phase Commit and Push
+description: Commit and push after completing a spec phase task
+version: "1.0"
+trigger:
+  type: manual
+---
+
+When triggered manually after completing a phase:
+
+1. Run `git add -A`
+2. Ask the user for a commit message, suggesting format: `phase N: short description`
+3. Run `git commit -m "{message}"`
+4. Run `git push origin main`
+5. Report the commit SHA and confirm push succeeded
@@ -0,0 +1,16 @@
+---
+name: Run Tests on Save
+description: Automatically run relevant tests when a Python service file is saved
+version: "1.0"
+trigger:
+  type: onSave
+  filePattern: "services/**/*.py"
+---
+
+When a Python file under `services/` is saved:
+
+1. Identify which service module was modified (e.g. `services/ingestion/worker.py` → `ingestion`)
+2. Look for corresponding tests in `tests/` matching the service name
+3. Run `pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist
+4. If no specific test file exists, run `ruff check` on the modified file to catch syntax/lint issues
+5. Report results concisely — only show failures or a one-line success confirmation
@@ -0,0 +1,16 @@
+---
+name: Validate K8s Manifests
+description: Validate Kubernetes YAML when manifest files are saved
+version: "1.0"
+trigger:
+  type: onSave
+  filePattern: "infra/k8s/**/*.yaml"
+---
+
+When a Kubernetes manifest YAML file is saved:
+
+1. Parse the YAML to check for syntax errors
+2. Verify required fields exist (apiVersion, kind, metadata)
+3. Check that namespace is set to `stonks-oracle` for application resources
+4. Verify image references point to `ghcr.io/celesrenata/stonks-oracle/`
+5. Report any issues found
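Checks 2 to 4 can be expressed as a small function over an already-parsed manifest. The following is a minimal sketch under stated assumptions: the manifest is passed in as a `dict` (for instance from a YAML loader, which is left out here), the function names are hypothetical, and the namespace check is applied to every manifest even though the hook limits it to application resources.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("apiVersion", "kind", "metadata")
EXPECTED_NAMESPACE = "stonks-oracle"
IMAGE_PREFIX = "ghcr.io/celesrenata/stonks-oracle/"


def find_images(node: Any) -> List[str]:
    """Recursively collect every string stored under an `image` key."""
    images: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                images.append(value)
            else:
                images.extend(find_images(value))
    elif isinstance(node, list):
        for item in node:
            images.extend(find_images(item))
    return images


def validate_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable issues; an empty list means the checks passed."""
    issues = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in manifest]
    namespace = (manifest.get("metadata") or {}).get("namespace")
    if namespace != EXPECTED_NAMESPACE:
        issues.append(f"namespace is {namespace!r}, expected {EXPECTED_NAMESPACE!r}")
    for image in find_images(manifest):
        if not image.startswith(IMAGE_PREFIX):
            issues.append(f"image outside {IMAGE_PREFIX}: {image}")
    return issues
```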
@@ -0,0 +1,480 @@
+# Stonks Oracle - Design
+
+## 1. Purpose
+Stonks Oracle is a Kubernetes-native AI market intelligence and trading platform. It ingests structured market data, company news, filings, and curated web content; preserves raw artifacts in MinIO; extracts structured intelligence objects with local Ollama models; aggregates signals into trend and recommendation outputs; optionally executes trades through a broker integration; and publishes historical datasets into a local lakehouse for Athena-like querying and QuickSight-like dashboards.
+
+This design prioritizes:
+- deterministic data contracts
+- auditability of every AI-derived conclusion
+- safe paper-trading-first automation
+- self-hosted analytics on MinIO-backed datasets
+- clear separation between operational state and analytical state
+
+## 2. Architecture Summary
+The platform is split into two planes:
+
+### 2.1 Operational plane
+Handles ingestion, parsing, structured extraction, signal generation, risk evaluation, trade execution, and control APIs.
+
+Primary stores:
+- PostgreSQL for operational state and transactional records
+- Redis for queues, locks, and hot cache state
+- MinIO for raw artifacts, prompts, model outputs, and exported datasets
+
+### 2.2 Analytical plane
+Handles historical fact storage, SQL query access, research, scorecards, and dashboards.
+
+Primary components:
+- MinIO as S3-compatible object store
+- Hive-compatible partition layout for query compatibility
+- Iceberg tables as the preferred lakehouse abstraction for managed analytical datasets
+- Trino as the Athena-like SQL query engine
+- Apache Superset as the QuickSight-like dashboard and exploration layer
+
+## 3. External Integrations
+
+### 3.1 Market Data API
+Used for:
+- quotes
+- OHLCV bars
+- reference data
+- corporate actions
+- earnings calendars
+- optional market news or fundamentals
+
+### 3.2 News API
+Used for:
+- company-linked headlines
+- publisher metadata
+- article URLs
+- article summaries when licensed
+
+### 3.3 Filings / Regulatory API
+Used for:
+- SEC-style company submissions
+- 8-K, 10-Q, 10-K, and related filings
+- structured issuer event discovery
+
+### 3.4 Web Scraper
+Used for:
+- full article body retrieval when API content is partial
+- investor relations pages
+- curated press release sources
+- transcript or presentation retrieval when permitted
+
+### 3.5 Broker API
+Used for:
+- paper-trading simulation or sandbox trading
+- live order submission when enabled
+- order acknowledgements and rejections
+- fills and cancellations
+- positions and account balances
+
+## 4. Logical Components
+
+### 4.1 Symbol Registry Service
+Responsibilities:
+- manage companies, aliases, watchlists, sectors, and source configurations
+- manage source trust or credibility policies
+- manage symbol-to-document matching rules
+
+### 4.2 Scheduler / Orchestrator
+Responsibilities:
+- trigger market, news, filings, and scrape jobs
+- manage polling cadences by source class
+- coordinate backoff, retries, and dedupe windows
+- publish downstream jobs to workers
+
+### 4.3 Ingestion Adapters
+Subcomponents:
+- Market data adapter
+- News API adapter
+- Filings adapter
+- Broker event adapter
+
+Responsibilities:
+- fetch external payloads
+- preserve raw responses in MinIO
+- normalize metadata into PostgreSQL
+- emit processing jobs for parsing or publication
+
+### 4.4 Scraper / Parser Service
+Responsibilities:
+- fetch and render source pages
+- extract normalized text and metadata
+- reduce boilerplate and duplicated template text
+- score parser quality and extraction confidence
+- persist normalized artifacts
+
+### 4.5 Ollama Extraction Service
+Responsibilities:
+- call local Ollama models using schema-constrained JSON output
+- produce canonical document intelligence objects
+- preserve prompts, schemas, model metadata, and raw outputs
+- validate schema and semantic consistency
+- retry invalid generations under policy
+
+### 4.6 Aggregation Engine
+Responsibilities:
+- combine document intelligence with market context
+- compute rolling trend summaries by company, sector, and market
+- track contradiction and agreement signals
+- score evidence with recency decay and source weighting
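One way to read "recency decay and source weighting" is a credibility-weighted average in which each document's weight halves after a fixed interval. The sketch below illustrates that idea only. The `Evidence` type, the signed `impact_score`, and the 72-hour half-life are assumptions for illustration, not values taken from this design.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evidence:
    impact_score: float        # assumed signed: positive bullish, negative bearish
    source_credibility: float  # assumed to lie in [0, 1]
    age_hours: float           # time since publication


def decay_weight(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Exponential decay: the weight halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_trend_score(items: Iterable[Evidence], half_life_hours: float = 72.0) -> float:
    """Credibility- and recency-weighted mean of impact scores (0.0 if no evidence)."""
    numerator = 0.0
    denominator = 0.0
    for item in items:
        weight = item.source_credibility * decay_weight(item.age_hours, half_life_hours)
        numerator += weight * item.impact_score
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0
```

With these defaults, a fresh bullish document and an equally credible bearish one that is 72 hours old net out to a mildly bullish score of 1/3.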
+
+### 4.7 Recommendation Engine
+Responsibilities:
+- generate explainable recommendation objects from aggregated evidence
+- separate deterministic eligibility scoring from final action mapping
+- produce suggested action, thesis, horizon, and invalidation conditions
+- publish analytical prediction facts to the lake
+
+### 4.8 Risk Engine
+Responsibilities:
+- enforce guardrails such as max position size, daily loss cap, exposure by sector, symbol cooldowns, news shock lockouts, and operator approval rules
+- determine whether a recommendation is eligible for paper or live execution
+- block ambiguous or unsafe orders before broker submission
+
+### 4.9 Broker Adapter
+Responsibilities:
+- abstract one or more trading APIs
+- support paper mode and live mode
+- record submission, acknowledgement, rejection, fill, and cancellation events
+- guarantee idempotent order submission keys
+- publish order and fill facts to both PostgreSQL and the analytical lake
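An idempotent submission key can be derived deterministically from the fields that identify an order intent, so a retried submission maps to the same key. This is a sketch only. The choice of fields and the function name are assumptions, and how the key is passed to a given broker depends on that broker's API.

```python
import hashlib


def order_idempotency_key(
    recommendation_id: str,
    account_id: str,
    ticker: str,
    side: str,
    mode: str,
) -> str:
    """Derive a stable key: identical order intents always hash to the same value."""
    material = "|".join(
        [recommendation_id, account_id, ticker.upper(), side.lower(), mode.lower()]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Including `mode` in the key keeps a paper order and a live order for the same recommendation from colliding.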
+
+### 4.10 Lake Publisher
+Responsibilities:
+- transform operational records into analytics-friendly fact datasets
+- publish append-only partitioned tables to MinIO
+- maintain Iceberg metadata or equivalent lakehouse metadata
+- expose datasets such as predictions, outcomes, fills, bars, and PnL
+
+### 4.11 Query API / Dashboard
+Responsibilities:
+- expose companies, documents, trends, recommendations, and orders
+- provide evidence drill-down and audit views
+- provide operator controls for live-trading enablement and review queues
+- expose links into analytical dashboards and query tools
+
+### 4.12 SQL Query Engine and BI Layer
+Components:
+- Trino coordinator and workers
+- Hive Metastore or Iceberg catalog service
+- Apache Superset
+
+Responsibilities:
+- provide Athena-like SQL access to MinIO-hosted tables
+- support dashboard datasets and ad hoc exploration
+- support joins between market facts, AI predictions, and executed trades
+
+## 5. Storage Model
+
+### 5.1 Operational stores
+#### PostgreSQL
+Used for:
+- companies and aliases
+- watchlists and source configs
+- article and filing metadata
+- document intelligence objects
+- trend summaries
+- recommendations
+- risk evaluations
+- orders and execution events
+- control-plane state and audit records
+
+#### Redis
+Used for:
+- distributed locks for symbol-source retrieval
+- ingestion rate-limit counters
+- job queue state
+- retry backoff state
+- dedupe markers
+- cache for hot API and dashboard views
+
+#### MinIO object storage
+Used for:
+- raw API payloads
+- raw article HTML and normalized text
+- prompts, schemas, and raw model results
+- exported analytical datasets
+- audit traces and reproducibility bundles
+
+### 5.2 MinIO bucket layout
+Recommended buckets:
+- `stonks-raw-market` — raw market API payloads
+- `stonks-raw-news` — raw news API payloads and article HTML
+- `stonks-raw-filings` — raw filings and issuer event payloads
+- `stonks-normalized` — cleaned text and parser outputs
+- `stonks-llm-prompts` — prompts and schemas used
+- `stonks-llm-results` — raw model outputs and validation reports
+- `stonks-lakehouse` — partitioned analytical datasets and table metadata
+- `stonks-audit` — execution traces and exported reports
+
+Suggested raw object path pattern:
+```text
+/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.json
+/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.html
+```
+
+Suggested analytical path pattern:
+```text
+/warehouse/{table_name}/dt={yyyy-mm-dd}/symbol={ticker}/part-*.parquet
+```
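The two path patterns above translate directly into small key builders. The sketch below keeps the leading slash exactly as written in the patterns. The function names and the `extension` parameter are hypothetical.

```python
from datetime import date


def raw_object_key(
    stage: str,
    symbol: str,
    day: date,
    document_id: str,
    artifact_type: str,
    extension: str,
) -> str:
    """Build a raw artifact key following the suggested raw object path pattern."""
    return (
        f"/{stage}/{symbol}/{day:%Y}/{day:%m}/{day:%d}/"
        f"{document_id}/{artifact_type}.{extension}"
    )


def analytical_partition_prefix(table_name: str, day: date, ticker: str) -> str:
    """Build the Hive-style partition prefix under which Parquet part files are written."""
    return f"/warehouse/{table_name}/dt={day.isoformat()}/symbol={ticker}/"
```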
+
+### 5.3 Lakehouse model
+Preferred design:
+- Parquet files stored in MinIO
+- Hive-compatible partitioning for interoperability
+- Iceberg table metadata for managed analytical tables
+- Trino catalogs for SQL access
+
+Rationale:
+- Hive-compatible layouts preserve broad engine compatibility
+- Iceberg improves schema evolution, partition handling, and table maintenance
+- Trino can query MinIO-backed object storage and supports both Hive and Iceberg catalogs
+
+## 6. Data Model
+
+### 6.1 PostgreSQL schema outline
+Core tables:
+- `companies`
+- `company_aliases`
+- `watchlists`
+- `watchlist_members`
+- `sources`
+- `api_credentials_refs`
+- `ingestion_runs`
+- `market_snapshots`
+- `documents`
+- `document_versions`
+- `document_company_mentions`
+- `document_intelligence`
+- `document_impact_records`
+- `trend_windows`
+- `recommendations`
+- `recommendation_evidence`
+- `risk_evaluations`
+- `broker_accounts`
+- `orders`
+- `order_events`
+- `positions`
+- `audit_events`
+
+### 6.2 Article or document metadata record
+```json
+{
+  "document_id": "uuid",
+  "document_type": "article|filing|transcript|press_release",
+  "symbol_candidates": ["AAPL", "MSFT"],
+  "source_type": "news_api",
+  "publisher": "string",
+  "url": "string",
+  "canonical_url": "string",
+  "title": "string",
+  "published_at": "2026-04-09T00:00:00Z",
+  "retrieved_at": "2026-04-09T00:00:00Z",
+  "language": "en",
+  "content_hash": "sha256",
+  "storage_refs": {
+    "raw_html": "s3://...",
+    "raw_payload": "s3://..."
+  }
+}
+```
+
+### 6.3 Document intelligence schema
+```json
+{
+  "document_id": "uuid",
+  "summary": "string",
+  "companies": [
+    {
+      "ticker": "AAPL",
+      "company_name": "Apple Inc.",
+      "relevance": 0.95,
+      "sentiment": "positive",
+      "impact_score": 0.71,
+      "impact_horizon": "1d_30d",
+      "catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
+      "key_facts": ["string"],
+      "risks": ["string"],
+      "evidence_spans": ["string"]
+    }
+  ],
+  "macro_themes": ["rates", "ai_capex"],
+  "novelty_score": 0.64,
+  "source_credibility": 0.8,
+  "extraction_warnings": ["ambiguous_ticker_reference"],
+  "confidence": 0.86,
+  "model": {
+    "provider": "ollama",
+    "model_name": "gpt-oss:20b",
+    "prompt_version": "document-intel-v2",
+    "schema_version": "2.0.0"
+  }
+}
+```
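A validator for the object above might start with structural checks like the following. This is a hand-rolled sketch using only the standard library, not the project's shared schema code. The list of required keys and the assumption that scores lie in [0, 1] are inferred from the example values.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("document_id", "summary", "companies", "confidence", "model")


def _is_unit_interval(value: Any) -> bool:
    """True for a real number in [0, 1]; booleans are rejected explicitly."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0.0 <= value <= 1.0
    )


def validate_document_intelligence(obj: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the object passed these checks."""
    errors = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in obj]
    if "confidence" in obj and not _is_unit_interval(obj["confidence"]):
        errors.append("confidence must be a number in [0, 1]")
    companies = obj.get("companies", [])
    if not isinstance(companies, list):
        errors.append("companies must be a list")
        return errors
    for index, company in enumerate(companies):
        if not isinstance(company, dict) or not company.get("ticker"):
            errors.append(f"companies[{index}] needs a ticker")
        elif not _is_unit_interval(company.get("relevance")):
            errors.append(f"companies[{index}].relevance must be a number in [0, 1]")
    return errors
```

Returning a list of errors instead of raising on the first one makes it straightforward to store the full validation report alongside a failed model output.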
+
+### 6.4 Trend summary schema
+```json
+{
+  "entity_type": "company",
+  "entity_id": "AAPL",
+  "window": "7d",
+  "trend_direction": "bullish|bearish|mixed|neutral",
+  "trend_strength": 0.68,
+  "confidence": 0.74,
+  "top_supporting_evidence": ["document_id_1", "document_id_2"],
+  "top_opposing_evidence": ["document_id_3"],
+  "dominant_catalysts": ["product", "rating_change"],
+  "material_risks": ["regulatory scrutiny"],
+  "contradiction_score": 0.22
+}
+```
+
+### 6.5 Recommendation schema
+```json
+{
+  "recommendation_id": "uuid",
+  "ticker": "AAPL",
+  "action": "buy|sell|hold|watch",
+  "mode": "informational|paper_eligible|live_eligible",
+  "confidence": 0.72,
+  "time_horizon": "swing_1d_10d",
+  "thesis": "string",
+  "invalidation_conditions": ["string"],
+  "position_sizing": {
+    "portfolio_pct": 0.02,
+    "max_loss_pct": 0.005
+  },
+  "evidence_refs": ["document_id_1", "document_id_2"],
+  "model_metadata": {
+    "version": "recommendation-v1"
+  }
+}
+```
+
+## 7. Analytical Lake Datasets
+The analytical plane should expose the following logical fact tables:
+- `lake.market_bars`
+- `lake.market_quotes`
+- `lake.company_events`
+- `lake.documents`
+- `lake.document_extractions`
+- `lake.trade_signals`
+- `lake.trade_orders`
+- `lake.trade_fills`
+- `lake.positions_daily`
+- `lake.pnl_daily`
+- `lake.prediction_vs_outcome`
+
+Recommended partitioning examples:
+- market data: partition by `dt`, optionally adding a symbol transform later
+- documents: partition by `dt` and optionally `source_type`
+- predictions: partition by `dt` and `model_version`
+- fills and PnL: partition by `dt` and broker account
+
+## 8. Data Flows
+
+### 8.1 Market and document ingestion flow
+1. Scheduler selects due symbols and sources.
+2. Adapters fetch market, news, and filings payloads.
+3. Raw payloads are written to MinIO.
+4. Metadata records are written to PostgreSQL.
+5. New documents are emitted to parser jobs.
+
+### 8.2 Extraction flow
+1. Parser produces normalized text and confidence score.
+2. Extraction worker sends document to Ollama with schema-bound output.
+3. Validator checks schema and semantic consistency.
+4. Canonical intelligence object is stored in PostgreSQL and MinIO.
+5. Aggregation jobs are triggered for impacted symbols.
+
+### 8.3 Recommendation and trade flow
+1. Aggregation engine updates trend windows.
+2. Recommendation engine emits a recommendation object.
+3. Risk engine determines eligibility and allowed execution mode.
+4. Broker adapter places paper or live orders when authorized.
+5. Broker events update PostgreSQL and publish analytical facts to the lake.
+
+### 8.4 Lake publication flow
+1. Operational records are transformed into analytical facts.
+2. Facts are written as partitioned Parquet files to MinIO.
+3. Table metadata is updated through Iceberg or equivalent catalog operations.
+4. Trino exposes the datasets for SQL.
+5. Superset uses Trino datasets for dashboards and ad hoc exploration.
+
+## 9. Query and Dashboard Surface
+
+### 9.1 Operational API
+Should expose:
+- company and watchlist configuration
+- source health and job state
+- document timelines and evidence
+- recommendation history
+- order history and audit trail
+- risk configuration and trading mode
+
+### 9.2 Analytical surface
+Should expose:
+- SQL access through Trino
+- dashboard datasets in Superset
+- scorecards for prediction accuracy and PnL
+- evidence-to-outcome drill-down views
+- model performance and extraction failure dashboards
+
+Suggested starter dashboards:
+- symbol overview
+- market sentiment heatmap
+- prediction confidence vs realized move
+- paper trading PnL
+- model extraction quality
+- source coverage and ingestion lag
+
+## 10. Reliability and Safety
+- Broker submission must be idempotent.
+- Live trading must be disabled by default.
+- Paper trading must be the first enabled execution mode.
+- Invalid model output must not advance to trade execution.
+- Low-quality document extraction must not influence live trading.
+- All analytical publication jobs should be replayable.
+- Every recommendation and order should be reproducible from saved prompts, source refs, and model metadata.
+
+## 11. Deployment Notes
+Recommended Kubernetes workloads:
+- `symbol-registry-api`
+- `scheduler`
+- `market-adapter-worker`
+- `news-adapter-worker`
+- `filings-adapter-worker`
+- `scraper-worker`
+- `parser-worker`
+- `ollama-extractor-worker`
+- `aggregation-worker`
+- `recommendation-worker`
+- `risk-engine-api`
+- `broker-adapter`
+- `lake-publisher`
+- `trino-coordinator`
+- `trino-worker`
+- `superset-web`
+- `postgres`
+- `redis`
+- `minio`
+
+## 12. Deliberate Scope Boundaries for v1
+Included in v1:
+- tracked watchlists
+- market, news, filings, and broker integrations
+- Ollama structured extraction
+- trend aggregation and recommendation objects
+- paper trading with strict controls
+- MinIO-backed analytics lake
+- Trino and Superset self-hosted analytics
+
+Deferred from v1:
+- options trading
+- full order book or tick-level market microstructure
+- online model retraining
+- fully autonomous live trading with no approval workflow
+- advanced portfolio optimization beyond basic sizing and risk caps
@@ -0,0 +1,269 @@
+# Stonks Oracle - Requirements
+
+## Overview
+This feature builds an AI-assisted market intelligence, execution, and analytics platform for a Kubernetes-hosted environment. The platform ingests market symbols, licensed market data, company-specific news, regulatory filings, scraped web sources, and broker execution events; stores raw and normalized artifacts; extracts structured JSON with local Ollama models; computes trend and sentiment summaries; and optionally places trades through a broker integration.
+
+The platform SHALL also maintain a local analytics lake on MinIO using Hive-compatible partitioned data, support Athena-like SQL querying over captured market and trade data, and expose QuickSight-like dashboards for research, review, and audit.
+
+The initial release is focused on reliable ingestion, deterministic structured extraction, explainable trend scoring, paper trading safety, and internal analytics visibility.
+
+## User Stories
+- As an operator, I want to register companies, tickers, sectors, watchlists, and source rules so the system knows what to monitor.
+- As an analyst, I want every raw article, filing, market snapshot, and scrape artifact preserved so I can audit downstream AI conclusions.
+- As a data engineer, I want structured JSON extraction from each article and filing so downstream analytics are queryable.
+- As a strategist, I want aggregated trend assessments per symbol, sector, and market regime so I can evaluate opportunities.
+- As a trader, I want the system to generate explainable trade recommendations with explicit confidence, catalysts, and risk notes.
+- As a risk owner, I want strict controls on automated trading so the system cannot place unsafe orders.
+- As a quantitative reviewer, I want to query historical market data, AI predictions, and executed trades in one SQL-accessible analytics plane.
+- As a dashboard user, I want QuickSight-like visualizations for performance, signal quality, prediction accuracy, and model behavior.
+- As a platform owner, I want the system to run fully inside Kubernetes against local Ollama and self-hosted analytics components.
+
+## Functional Requirements
+
+### 1. Watchlist and source management
+#### Requirement 1.1
+WHEN an operator creates or updates a tracked company
+THE SYSTEM SHALL persist the company profile including ticker, legal name, aliases, exchange, sector, industry, market cap bucket, and source configuration.
+
+#### Requirement 1.2
+WHEN an operator defines a source configuration for a company
+THE SYSTEM SHALL support source types including market data APIs, news API feeds, SEC or investor relations URLs, company press release pages, earnings transcript sources, curated web pages, and broker-linked execution sources.
+
+#### Requirement 1.3
+WHEN a company has aliases, brands, or product names
+THE SYSTEM SHALL use those aliases during source retrieval, de-duplication, entity matching, and extraction.
+
+### 2. External API integrations
+#### Requirement 2.1
+WHEN the scheduler triggers a market ingestion cycle
+THE SYSTEM SHALL fetch configured market data API results for tracked companies and persist raw response payloads.
+
+#### Requirement 2.2
+WHEN the scheduler triggers a news ingestion cycle
+THE SYSTEM SHALL fetch configured news API results for tracked companies and persist raw response payloads.
+
+#### Requirement 2.3
+WHEN the scheduler triggers a regulatory ingestion cycle
+THE SYSTEM SHALL fetch configured filing or issuer event data from authoritative sources such as SEC-style APIs and persist raw response payloads.
+
+#### Requirement 2.4
+WHEN trade automation is enabled
+THE SYSTEM SHALL integrate with at least one broker API that supports paper trading, order placement, order status retrieval, positions, account balances, and execution events.
+
+#### Requirement 2.5
+WHEN external APIs enforce rate limits or quotas
+THE SYSTEM SHALL coordinate request pacing, retries, and backoff across workers.
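The per-request delay in Requirement 2.5 is commonly computed with capped exponential backoff plus jitter. The sketch below shows only that calculation. The defaults are hypothetical, and coordination across workers (for example through shared counters in Redis) is out of scope here.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base_seconds: float = 1.0,
    cap_seconds: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    return rng() * ceiling
```

Passing `rng` as a parameter keeps the function deterministic under test while defaulting to real randomness in a worker.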
+
+### 3. Ingestion and raw artifact retention
+#### Requirement 3.1
+WHEN a scraper retrieves an article, filing, or web page
+THE SYSTEM SHALL store the raw HTML, rendered text, metadata, retrieval timestamp, and retrieval source in object storage.
+
+#### Requirement 3.2
+WHEN an article, filing, or market payload is ingested
+THE SYSTEM SHALL generate a stable content hash and use it to prevent duplicate processing.
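A stable content hash here can be a SHA-256 digest of the raw payload bytes, matching the `sha256` value shown for `content_hash` in the design's metadata record. The following sketch pairs it with an in-memory dedupe set. The class is a hypothetical stand-in for whatever shared store (Redis markers or a PostgreSQL unique index) the services actually use.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Hex SHA-256 digest of the raw payload; identical bytes give identical hashes."""
    return hashlib.sha256(payload).hexdigest()


class InMemoryDedupe:
    """Tracks hashes already seen so the same payload is processed only once."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, payload: bytes) -> bool:
        """Return True the first time a payload is seen, False on every repeat."""
        digest = content_hash(payload)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```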
+
+#### Requirement 3.3
+WHEN the system stores a raw artifact
+THE SYSTEM SHALL persist an associated metadata record containing symbol, source, URL when applicable, title, publication time, retrieval time, language when applicable, and content hash.
+
+#### Requirement 3.4
+WHEN content retrieval fails
+THE SYSTEM SHALL record the failure reason, retry policy state, and next eligible retry time.
+
+### 4. Parsing and normalization
+#### Requirement 4.1
+WHEN a raw article or filing enters the parsing stage
+THE SYSTEM SHALL extract normalized text, author data when available, publisher, tags, mentioned entities, outbound links, and document type.
+
+#### Requirement 4.2
+WHEN the system detects boilerplate or repeated template text
+THE SYSTEM SHALL reduce or remove boilerplate before AI extraction while retaining the original raw artifact for audit.
+
+#### Requirement 4.3
+WHEN the parser cannot confidently extract article body text
+THE SYSTEM SHALL flag the document for low-quality extraction and prevent it from influencing downstream trading until reviewed or reprocessed.
+
+### 5. AI article and document extraction
+#### Requirement 5.1
+WHEN a normalized article or filing is ready for AI extraction
+THE SYSTEM SHALL send the document to a local Ollama model using structured output with an explicit JSON schema.
+
+#### Requirement 5.2
+WHEN the model returns extraction output
+THE SYSTEM SHALL validate the response against the expected schema before saving it.
+
+#### Requirement 5.3
+WHEN extraction succeeds
+THE SYSTEM SHALL produce a canonical document intelligence object with at minimum:
+- document_id
+- document_type
+- source metadata
+- tickers referenced
+- companies referenced
+- document summary
+- sentiment by company
+- catalyst type
+- impact horizon
+- key facts
+- risks mentioned
+- macro themes
+- confidence score
+- extraction warnings
+- model metadata
+
+#### Requirement 5.4
+WHEN the model response is invalid, incomplete, or hallucinatory
+THE SYSTEM SHALL retry extraction according to policy and preserve both the failed output and validation errors.
+
+#### Requirement 5.5
+WHEN a document is materially relevant to multiple companies
+THE SYSTEM SHALL emit one shared document record and one or more per-company impact records.
+
+### 6. Aggregation and trend analysis
+#### Requirement 6.1
+WHEN multiple document intelligence objects and market observations exist for a company
+THE SYSTEM SHALL generate a rolling company trend summary over configurable windows including intraday, 1 day, 7 day, 30 day, and 90 day intervals.
+
+#### Requirement 6.2
+WHEN generating a company trend summary
+THE SYSTEM SHALL consider sentiment, catalyst frequency, source credibility, recency decay, contradiction detection, document novelty, and current market context.
+
+#### Requirement 6.3
+WHEN generating a market-wide trend summary
+THE SYSTEM SHALL aggregate company-level signals into sector and market-level summaries.
+
+#### Requirement 6.4
+WHEN contradictory signals exist across sources
+THE SYSTEM SHALL represent disagreement explicitly rather than collapsing it into a single unsupported conclusion.
+
+#### Requirement 6.5
+WHEN a trend summary is produced
+THE SYSTEM SHALL include explainability fields listing the top supporting and opposing evidence.
+
+### 7. Trade recommendation generation
+#### Requirement 7.1
+WHEN a company trend summary is available
+THE SYSTEM SHALL be able to generate a recommendation object containing action type, thesis, confidence, expected horizon, invalidation conditions, and cited evidence.
+
+#### Requirement 7.2
+WHEN a recommendation is generated
+THE SYSTEM SHALL separate descriptive analysis from prescriptive trade action and include a risk classification.
+
+#### Requirement 7.3
+WHEN the system proposes a trade
+THE SYSTEM SHALL attach position sizing guidance based on configured portfolio rules rather than unconstrained model output.
+
+#### Requirement 7.4
+WHEN the confidence or data quality falls below configured thresholds
+THE SYSTEM SHALL suppress automated trade eligibility and mark the recommendation as informational only.
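Requirement 7.4 amounts to a threshold gate in front of the recommendation's `mode` field. A minimal sketch follows, assuming the mode values from the design's recommendation schema. The threshold defaults and the `data_quality` input are hypothetical placeholders for configured values.

```python
def execution_mode(
    confidence: float,
    data_quality: float,
    live_enabled: bool = False,
    min_confidence: float = 0.7,
    min_data_quality: float = 0.6,
) -> str:
    """Map confidence and data quality onto the recommendation `mode` values."""
    if confidence < min_confidence or data_quality < min_data_quality:
        # Below either threshold: never eligible for automated execution.
        return "informational"
    return "live_eligible" if live_enabled else "paper_eligible"
```

Defaulting `live_enabled` to `False` keeps paper trading as the first eligible mode, so live eligibility is only ever an explicit opt-in.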
+
+### 8. Trade execution and safety controls
+#### Requirement 8.1
+WHEN trade automation is enabled
+THE SYSTEM SHALL support paper trading mode and live trading mode as separate execution environments.
+
+#### Requirement 8.2
+WHEN live trading mode is enabled
+THE SYSTEM SHALL require operator approval controls, risk limits, and broker credential isolation.
+
+#### Requirement 8.3
+WHEN the system places an order
+THE SYSTEM SHALL persist the full decision trace including signals used, prompt versions, model versions, thresholds, and broker response.
+
+#### Requirement 8.4
+WHEN a proposed order violates configured risk controls
+THE SYSTEM SHALL reject the order before broker submission.
+
+#### Requirement 8.5
+WHEN a broker API is unavailable or partially fails
+THE SYSTEM SHALL fail closed and SHALL NOT place duplicate or ambiguous orders.
+
+### 9. Storage and queryability
+#### Requirement 9.1
+WHEN storing raw artifacts
+THE SYSTEM SHALL use MinIO object storage as the system of record for HTML, text, API payloads, prompts, model outputs, and exported analytical datasets.
+
+#### Requirement 9.2
+WHEN storing normalized relational data
+THE SYSTEM SHALL use PostgreSQL for companies, watchlists, article metadata, document intelligence objects, trends, recommendations, operational execution records, and control-plane state.
+
+#### Requirement 9.3
+WHEN low-latency coordination or caching is required
+THE SYSTEM SHALL use Redis for job state, distributed locks, short-lived caches, and rate-limit coordination.
+
+#### Requirement 9.4
+WHEN historical analytical queries are needed
+THE SYSTEM SHALL persist analytical fact datasets in Hive-compatible partitioned form on MinIO so that market data, predictions, and trade outcomes can be queried together.
+
+#### Requirement 9.5
+WHEN analytical table management is required
+THE SYSTEM SHALL support a lakehouse table abstraction that permits append-only fact ingestion, partitioned queries, and schema evolution.
+
+### 10. SQL analytics and dashboards
+#### Requirement 10.1
+WHEN a user or service executes an analytical query
+THE SYSTEM SHALL provide an Athena-like SQL query service over MinIO-hosted analytical datasets.
+
+#### Requirement 10.2
+WHEN a dashboard user explores market, prediction, and trade data
+THE SYSTEM SHALL expose QuickSight-like dashboards for performance, confidence, prediction accuracy, evidence coverage, and model behavior.
+
+#### Requirement 10.3
+WHEN analytical results combine AI outputs with executed trades and market outcomes
+THE SYSTEM SHALL support joins across predicted signals, broker executions, and realized performance data.
+
+#### Requirement 10.4
+WHEN dashboards or research queries need drill-down capability
+THE SYSTEM SHALL provide traceability from analytical aggregates back to underlying documents, prompts, model outputs, and raw artifacts.
+
+### 11. APIs and UI
+#### Requirement 11.1
+WHEN a client requests company analytics
+THE SYSTEM SHALL expose APIs for document timelines, trend summaries, recommendation history, execution history, and evidence drill-down.
+
+#### Requirement 11.2
+WHEN an operator inspects a recommendation
+THE SYSTEM SHALL display the contributing document intelligence objects, the raw sources used, and any market context features that influenced the decision.
+
+#### Requirement 11.3
+WHEN a user reviews an order decision
+THE SYSTEM SHALL expose a full audit trail from ingestion through broker execution and eventual market outcome.
+
+### 12. Observability and operations
+#### Requirement 12.1
+WHEN a pipeline stage runs
+THE SYSTEM SHALL emit structured logs, metrics, and traces for ingestion, parsing, extraction, aggregation, analytics publication, and trading.
+
+#### Requirement 12.2
+WHEN model performance degrades
+THE SYSTEM SHALL surface schema failure rates, latency percentiles, token usage estimates, and extraction retry counts.
+
+#### Requirement 12.3
+WHEN source coverage changes materially
+THE SYSTEM SHALL alert operators about sustained source failures, symbol coverage gaps, or analytical publication lag.
+
+## Non-Functional Requirements
+#### Requirement N1
+WHEN the system processes documents and market events concurrently
+THE SYSTEM SHALL support horizontal scaling across Kubernetes workers.
+
+#### Requirement N2
+WHEN the system stores model-derived conclusions
+THE SYSTEM SHALL preserve enough provenance to reproduce or challenge those conclusions later.
+
+#### Requirement N3
+WHEN the system handles licensed or restricted content
+THE SYSTEM SHALL preserve source metadata, access policy, and retention policy for each artifact.
+
+#### Requirement N4
+WHEN the system publishes analytical datasets
+THE SYSTEM SHALL ensure queryable partitions are written atomically or with an equivalent consistency guarantee.
+
+#### Requirement N5
+WHEN trade execution is enabled
+THE SYSTEM SHALL prioritize fail-closed behavior over availability in ambiguous conditions.
+
+#### Requirement N6
+WHEN dashboards query large historical datasets
+THE SYSTEM SHALL support partition pruning and index or metadata strategies that keep typical analyst queries responsive.
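Requirement N5 (fail-closed over availability) can be illustrated with a small gate. This is a minimal sketch, not code from this commit: `CheckStatus`, `PreTradeCheck`, and `may_submit_order` are hypothetical names, and the only rule shown is that anything other than an explicit pass blocks the order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CheckStatus(Enum):
    OK = "ok"
    FAILED = "failed"
    UNKNOWN = "unknown"  # e.g. broker timeout or stale market data


@dataclass(frozen=True)
class PreTradeCheck:
    """Hypothetical result of one pre-trade check."""
    name: str
    status: CheckStatus


def may_submit_order(checks: List[PreTradeCheck]) -> bool:
    """Allow submission only when every check is explicitly OK.

    An empty list, a FAILED check, or an UNKNOWN check all block the
    order, so ambiguity never defaults to "go".
    """
    if not checks:
        return False
    return all(check.status is CheckStatus.OK for check in checks)
```

The design choice is that `UNKNOWN` is treated exactly like `FAILED`: the system gives up availability rather than guess.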
@@ -0,0 +1,129 @@
+# Stonks Oracle - Tasks
+
+## Phase 0 - Project Setup
+- [x] Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
+- [x] Choose implementation language for services (Python preferred for scraping/LLM workflows)
+- [x] Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
+- [x] Add Kubernetes manifests or Helm chart skeletons for all core components
+- [x] Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation
+
+## Phase 1 - Core Data and Infrastructure
+- [x] Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
+- [x] Create MinIO bucket provisioning and lifecycle policies
+- [x] Create Redis key conventions and queue abstractions
+- [x] Implement shared config loader for environment variables and secrets
+- [x] Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
+- [x] Stand up initial Trino catalog configuration for MinIO-backed datasets
+- [x] Stand up Superset with environment-backed datasource configuration
+
+## Phase 2 - Symbol Registry and Source Management
+- [ ] Build symbol registry API endpoints for companies, aliases, watchlists, and sources
+- [ ] Add source credibility, retention policy, and access policy fields
+- [ ] Add source classes for market data API, news API, filings API, web scrape, and broker adapter
+- [ ] Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
+- [ ] Add seed data support for an initial tracked watchlist
+
+## Phase 3 - External API Adapters
+- [ ] Implement scheduler for symbol and source polling windows
+- [ ] Implement market data API adapter interface
+- [ ] Implement first concrete market data provider adapter
+- [ ] Implement news API adapter interface
+- [ ] Implement first concrete news API provider adapter
+- [ ] Implement filings or regulatory adapter interface
+- [ ] Implement first concrete filings provider adapter
+- [ ] Implement broker API adapter interface for paper trading and order events
+- [ ] Implement rate-limit coordination, retries, and backoff across adapters
+
+## Phase 4 - Ingestion Pipeline
+- [ ] Implement web scraper worker for curated URLs and article pages
+- [ ] Implement canonical URL normalization and content hashing
+- [ ] Implement raw artifact upload to MinIO
+- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
+- [ ] Implement retry and failure tracking for source retrieval
+- [ ] Implement dedupe logic across article and filing sources
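The "canonical URL normalization and content hashing" task above could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: the function names, the `TRACKING_PARAMS` list, and the specific rules (lowercase host, drop default ports and fragments, sort query parameters, collapse whitespace before hashing) are illustrative choices.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change the article content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def canonicalize_url(url: str) -> str:
    """Return a normalized form of `url` suitable for dedupe lookups."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port and (scheme, port) not in {("http", 80), ("https", 443)}:
        host = f"{host}:{port}"
    # Drop tracking parameters and sort the rest for a stable ordering.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two fetches of the same article through different tracking links would then map to one canonical URL, and two bodies differing only in whitespace would share one hash.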
+
+## Phase 5 - Parsing and Normalization
+- [ ] Implement HTML-to-text parsing pipeline
+- [ ] Implement boilerplate reduction and body extraction heuristics
+- [ ] Implement parser quality scoring and confidence flags
+- [ ] Implement company mention detection using ticker, alias, and name matching
+- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL
+
+## Phase 6 - Ollama Structured Extraction
+- [ ] Build extraction prompt templates with anti-hallucination instructions
+- [ ] Build JSON schema definitions for document intelligence extraction
+- [ ] Implement Ollama client wrapper using structured output format
+- [ ] Implement schema validation and semantic validation layers
+- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
+- [ ] Add retry behavior for invalid or incomplete model responses
+- [ ] Add model performance metrics and dashboards
+
+## Phase 7 - Aggregation and Trend Engine
+- [ ] Implement recency decay and source credibility weighting
+- [ ] Integrate market context features into aggregation windows
+- [ ] Implement company-level rolling window aggregation
+- [ ] Implement contradiction detection and disagreement representation
+- [ ] Implement sector and market rollups
+- [ ] Implement evidence ranking for supporting and opposing documents
+- [ ] Persist trend windows and evidence mappings
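One way to read "recency decay and source credibility weighting" is an exponentially decayed, credibility-scaled average. The sketch below is an assumption about how that could work, not the design committed here: the `Signal` fields, the 48-hour half-life default, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical per-document signal feeding a trend window."""
    score: float        # e.g. sentiment in [-1, 1]
    age_hours: float    # time since publication
    credibility: float  # source credibility in [0, 1]


def signal_weight(signal: Signal, half_life_hours: float = 48.0) -> float:
    """Credibility scaled by a decay that halves every `half_life_hours`."""
    return signal.credibility * 0.5 ** (signal.age_hours / half_life_hours)


def weighted_score(signals: List[Signal], half_life_hours: float = 48.0) -> Optional[float]:
    """Weighted mean of scores, or None when no signal carries any weight."""
    weights = [signal_weight(s, half_life_hours) for s in signals]
    total = sum(weights)
    if total <= 0:
        return None
    return sum(w * s.score for w, s in zip(weights, signals)) / total
```

Returning `None` for an empty or zero-weight window keeps "no evidence" distinct from a neutral score of 0.0.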
+
+## Phase 8 - Recommendation Engine
+- [ ] Design deterministic recommendation eligibility logic
+- [ ] Implement recommendation generation from aggregated scores and evidence
+- [ ] Add optional LLM wording layer for thesis generation only
+- [ ] Persist recommendation objects and evidence citations
+- [ ] Add suppression logic for low-quality data or low confidence
+- [ ] Publish prediction facts to analytical tables
+
+## Phase 9 - Risk Engine and Trade Adapter
+- [ ] Implement portfolio and account risk configuration model
+- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
+- [ ] Implement paper trading adapter behavior and state sync
+- [ ] Integrate first broker API in sandbox mode
+- [ ] Implement idempotent order submission keys and duplicate prevention
+- [ ] Implement full execution audit trail
+- [ ] Add operator approval workflow for live trading mode
+- [ ] Publish order, fill, and position facts to analytical tables
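The "idempotent order submission keys and duplicate prevention" task can be sketched as a deterministic key plus a seen-set. Everything named here is hypothetical: the key fields, `order_idempotency_key`, and `InMemoryOrderLedger` are illustrations, and a real deployment would presumably back the ledger with a durable store rather than process memory.

```python
import hashlib
from typing import Set


def order_idempotency_key(recommendation_id: str, symbol: str, side: str, quantity: int) -> str:
    """Derive the same key every time the same logical order is built."""
    payload = "|".join([recommendation_id, symbol.upper(), side.lower(), str(quantity)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryOrderLedger:
    """Tracks which keys were already submitted (illustration only)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def submit_once(self, key: str) -> bool:
        """Return True the first time a key is seen, False on any retry."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A retry after a timeout rebuilds the same key, so `submit_once` returns `False` and the adapter can skip the second broker call.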
+
+## Phase 10 - Lakehouse and SQL Analytics
+- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
+- [ ] Implement Parquet writers for analytical datasets
+- [ ] Implement Hive-compatible partition layout conventions on MinIO
+- [ ] Implement Iceberg table creation and metadata management for analytical datasets
+- [ ] Implement lake publisher jobs from operational data into analytical fact tables
+- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
+- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards
+
+## Phase 11 - Query API and Dashboard
+- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
+- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
+- [ ] Build admin controls for source health, symbol configs, and trading mode
+- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
+- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
+
+## Phase 12 - Observability and Hardening
+- [ ] Add structured logs and distributed tracing across services
+- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
+- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
+- [ ] Add dead-letter queues and replay tooling
+- [ ] Add data retention and lifecycle controls for raw and derived artifacts
+- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control
+
+## Phase 13 - Verification and Rollout
+- [ ] Create replay dataset from archived documents for deterministic extraction testing
+- [ ] Create integration tests for the full ingest-to-recommendation flow
+- [ ] Create paper trading simulation scenarios
+- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
+- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
+- [ ] Run shadow mode before enabling any live execution
+- [ ] Prepare operator runbook and incident response procedures
+
+## Recommended First Vertical Slice
+- [ ] Track 5 to 10 symbols
+- [ ] Ingest one market data API, one news API, and one filings source per symbol group
+- [ ] Persist raw artifacts to MinIO and metadata to PostgreSQL
+- [ ] Extract structured document intelligence through Ollama
+- [ ] Generate 7-day company trend summaries with market context
+- [ ] Produce paper-trade recommendations only
+- [ ] Publish analytical facts for bars, signals, and paper trades into MinIO
+- [ ] Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views
@@ -0,0 +1,44 @@
+# Development Process — Test-Develop-Debug
+
+## Workflow
+1. Write or update tests for the target behavior
+2. Implement the minimal code to pass
+3. Debug failures, fix, re-run
+4. Commit and push after each phase completes
+5. GitHub Actions builds container images and pushes to GHCR
+6. Deploy to cluster via `kubectl apply`
+
+## Testing
+- Use `pytest` with `pytest-asyncio` for async code
+- Tests live alongside service code or in a top-level `tests/` directory
+- Run tests with `pytest --tb=short -q` or `pytest -x` for fail-fast
+- Focus on core logic, not mocking infrastructure
+
+## Build and Deploy
+- Always build and test Docker images locally before pushing to GitHub
+- Only push to GitHub after local build succeeds — don't waste CI credits on broken builds
+- Dockerfile at `docker/Dockerfile`
+- GitHub workflow at `.github/workflows/build.yml`
+- Images tagged as `ghcr.io/celesrenata/stonks-oracle/<service>:<sha>` and `:latest`
+- K8s manifests reference GHCR images
+- Deploy: `kubectl apply -f infra/k8s/`
+- Local build: `make build` → verify → `git push` → CI builds and pushes to GHCR
+
+## Git Conventions
+- Commit after each completed phase task
+- Commit message format: `phase N: short description`
+- Push to `main` branch triggers CI
+
+## Code Style
+- Python 3.12, type hints everywhere
+- Pydantic for data validation
+- FastAPI for HTTP services
+- asyncio + asyncpg/aioredis for async I/O
+- Minimal dependencies, prefer stdlib where possible
+
+## Documentation
+- Do NOT create large summary/success markdown files after each step
+- Keep notes short, concise, and organized under `docs/notes/`
+- Name note files to match the task they relate to (e.g. `docs/notes/phase0-k8s-manifests.md`)
+- This makes them recallable by task without guessing
+- If a note isn't useful for future reference, don't write it
@@ -0,0 +1,33 @@
+---
+inclusion: fileMatch
+fileMatchPattern: "infra/k8s/**"
+---
+# Kubernetes Conventions
+
+## Namespace
+All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
+
+## TLS
+- Internal services: use `ca-issuer` ClusterIssuer (local CA)
+- Public-facing services (Superset, Query API): use `celestium-le-production` ClusterIssuer (Let's Encrypt)
+- Annotate ingress with `cert-manager.io/cluster-issuer`
+
+## Ingress
+- Traefik ingress controller
+- Domain pattern: `<service>.celestium.life`
+- Always create both HTTP and HTTPS ingress rules
+
+## Service References
+- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
+- Redis: `redis-master.redis-service.svc.cluster.local:6379`
+- MinIO API: `minio.minio-service.svc.cluster.local:80`
+- Ollama: `ollama.ollama-service.svc.cluster.local:11434`
+
+## Images
+- All images from `ghcr.io/celesrenata/stonks-oracle/<service>:latest`
+- Use `imagePullPolicy: Always` in production
+- Use `imagePullSecrets` referencing `ghcr-secret` if repo is private
+
+## Labels
+- `app.kubernetes.io/part-of: stonks-oracle`
+- `app: <service-name>`
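The conventions above can be combined into one Ingress object. The following builds it as a plain Python dict; it is a sketch, not a manifest from this commit. The namespace, issuers, domain pattern, and labels come from the conventions, while `ingress_manifest`, the `<service>-tls` secret name, the `traefik` ingress class name, and the backend port are assumptions.

```python
from typing import Any, Dict

NAMESPACE = "stonks-oracle"
DOMAIN = "celestium.life"


def ingress_manifest(service: str, port: int, public: bool = False) -> Dict[str, Any]:
    """Build a networking.k8s.io/v1 Ingress following the conventions."""
    # Public-facing services use Let's Encrypt; everything else uses the local CA.
    issuer = "celestium-le-production" if public else "ca-issuer"
    host = f"{service}.{DOMAIN}"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            "namespace": NAMESPACE,
            "labels": {
                "app.kubernetes.io/part-of": "stonks-oracle",
                "app": service,
            },
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            "ingressClassName": "traefik",  # assumed class name
            "tls": [{"hosts": [host], "secretName": f"{service}-tls"}],  # assumed secret name
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }
```

For example, `ingress_manifest("query-api", 8000, public=True)` would select the `celestium-le-production` issuer and the host `query-api.celestium.life`; the service name and port in that call are made up for illustration.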
@@ -0,0 +1,33 @@
+# Stonks Oracle — Project Context
+
+## Overview
+Stonks Oracle is a Kubernetes-native AI market intelligence and paper-trading platform.
+Python monorepo with services under `services/`, infrastructure under `infra/`, lakehouse schemas under `lakehouse/`, and dashboards under `dashboards/`.
+
+## Infrastructure
+- Kubernetes cluster: 4x NixOS nodes (gremlin-1 through gremlin-4), reachable via `kubectl`, `virtctl`, `ssh root@gremlin-{1,2,3,4}`
+- NixOS configs stored at `/etc/nixos` on gremlin-1, git-pushed to other hosts
+- Ingress: Traefik, domain `*.celestium.life`
+- Cert-Manager: `ca-issuer` (local CA) for internal services, `celestium-le-production` (Let's Encrypt) for public-facing
+- Container registry: `ghcr.io/celesrenata/stonks-oracle`
+- CI: GitHub Actions builds containers, cluster pulls from GHCR
+
+## Existing Cluster Services (do NOT redeploy these)
+- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
+- Redis: `redis-master.redis-service.svc.cluster.local:6379`
+- MinIO: `minio.minio-service.svc.cluster.local:80` (API), console at `minio-crawler-console.minio-service.svc.cluster.local:9090`
+- Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB
+
+## Development Process
+- Test-Develop-Debug (TDD) cycle
+- After each phase: commit, push, build via GitHub Actions, deploy to cluster
+- Local builds for dev iteration, GitHub workflows for CI/CD
+- Python 3.12, NixOS dev environment
+
+## Key Conventions
+- All services use `services/shared/config.py` for configuration via env vars
+- Redis queues defined in `services/shared/redis_keys.py`
+- Pydantic schemas in `services/shared/schemas.py`
+- K8s manifests in `infra/k8s/`, all in `stonks-oracle` namespace
+- Lakehouse DDL in `lakehouse/schemas/`
+- Crawler patterns inspired by Noctipede (`~/sources/splinterstice/noctipede`): BeautifulSoup + requests with retry adapters, content hashing, boilerplate stripping, quality scoring
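The "configuration via env vars" convention, combined with the existing cluster service addresses listed above, might be wired together as follows. This is not the contents of `services/shared/config.py`, which is not shown in this section: the `ClusterEndpoints` class, `load_endpoints`, and every environment variable name are hypothetical. Only the default addresses are taken from the text.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClusterEndpoints:
    """Addresses of the pre-existing cluster services."""
    postgres: str
    redis: str
    minio: str
    ollama: str


def load_endpoints(env: Optional[Mapping[str, str]] = None) -> ClusterEndpoints:
    """Read endpoints from env vars, falling back to the in-cluster defaults."""
    source = os.environ if env is None else env
    return ClusterEndpoints(
        # Variable names below are assumptions made for this sketch.
        postgres=source.get("POSTGRES_ADDR", "postgresql-rw.postgresql-service.svc.cluster.local:5432"),
        redis=source.get("REDIS_ADDR", "redis-master.redis-service.svc.cluster.local:6379"),
        minio=source.get("MINIO_ADDR", "minio.minio-service.svc.cluster.local:80"),
        ollama=source.get("OLLAMA_ADDR", "ollama.ollama-service.svc.cluster.local:11434"),
    )
```

Passing an explicit mapping, as in `load_endpoints({"REDIS_ADDR": "localhost:6379"})`, keeps tests independent of the process environment.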