phase 0+1: project scaffold, k8s manifests, CI pipeline, steering, hooks, tests
- Repository structure for all services, infra, lakehouse, dashboards
- K8s manifests targeting stonks-oracle namespace with GHCR images
- Ingress via Traefik with ca-issuer TLS for internal services
- ConfigMap wired to existing cluster services (pg, redis, minio, ollama)
- GitHub Actions workflow for lint, test, multi-service container builds
- Dockerfile with build-arg CMD per service
- Makefile for local build/push/deploy
- Steering rules for TDD workflow, K8s conventions, project context
- Agent hooks for lint-on-save, test-on-save, k8s-validate, phase-commit
- Ruff linter config, all lint issues fixed
- 14 passing tests for schemas, config, redis keys
- PostgreSQL migrations, Trino catalogs, Superset config, MinIO lifecycle
@@ -0,0 +1,14 @@
---
name: Lint Python on Save
description: Run ruff linter when any Python file is saved
version: "1.0"
trigger:
  type: onSave
  filePattern: "**/*.py"
---

When any Python file is saved:

1. Run `ruff check {filePath}` on the saved file
2. If there are fixable issues, run `ruff check --fix {filePath}` to auto-fix
3. Report any remaining issues concisely
@@ -0,0 +1,15 @@
---
name: Phase Commit and Push
description: Commit and push after completing a spec phase task
version: "1.0"
trigger:
  type: manual
---

When triggered manually after completing a phase:

1. Run `git add -A`
2. Ask the user for a commit message, suggesting format: `phase N: short description`
3. Run `git commit -m "{message}"`
4. Run `git push origin main`
5. Report the commit SHA and confirm push succeeded
@@ -0,0 +1,16 @@
---
name: Run Tests on Save
description: Automatically run relevant tests when a Python service file is saved
version: "1.0"
trigger:
  type: onSave
  filePattern: "services/**/*.py"
---

When a Python file under `services/` is saved:

1. Identify which service module was modified (e.g. `services/ingestion/worker.py` → `ingestion`)
2. Look for corresponding tests in `tests/` matching the service name
3. Run `pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist
4. If no specific test file exists, run `ruff check` on the modified file to catch syntax/lint issues
5. Report results concisely: only show failures or a one-line success confirmation
@@ -0,0 +1,16 @@
---
name: Validate K8s Manifests
description: Validate Kubernetes YAML when manifest files are saved
version: "1.0"
trigger:
  type: onSave
  filePattern: "infra/k8s/**/*.yaml"
---

When a Kubernetes manifest YAML file is saved:

1. Parse the YAML to check for syntax errors
2. Verify required fields exist (apiVersion, kind, metadata)
3. Check that namespace is set to `stonks-oracle` for application resources
4. Verify image references point to `ghcr.io/celesrenata/stonks-oracle/`
5. Report any issues found
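
Checks 2 to 4 could look roughly like the sketch below, which works on a manifest that has already been parsed into a Python dict (YAML parsing itself is not shown). The function names are hypothetical. As a simplifying assumption, the sketch applies the namespace check to every resource and flags every image outside the expected prefix, so cluster-scoped resources and intentional third-party images would need an allow-list in practice.

```python
from typing import Any, List

REQUIRED_FIELDS = ("apiVersion", "kind", "metadata")
EXPECTED_NAMESPACE = "stonks-oracle"
IMAGE_PREFIX = "ghcr.io/celesrenata/stonks-oracle/"


def find_images(node: Any) -> List[str]:
    """Recursively collect every string stored under an `image` key."""
    images: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                images.append(value)
            else:
                images.extend(find_images(value))
    elif isinstance(node, list):
        for item in node:
            images.extend(find_images(item))
    return images


def validate_manifest(manifest: dict) -> List[str]:
    """Return a list of human-readable issues; an empty list means no findings."""
    issues = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in manifest]
    metadata = manifest.get("metadata")
    namespace = metadata.get("namespace") if isinstance(metadata, dict) else None
    if namespace != EXPECTED_NAMESPACE:
        issues.append(f"namespace is {namespace!r}, expected {EXPECTED_NAMESPACE!r}")
    for image in find_images(manifest):
        if not image.startswith(IMAGE_PREFIX):
            issues.append(f"image outside expected registry path: {image}")
    return issues
```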
@@ -0,0 +1,480 @@
# Stonks Oracle - Design

## 1. Purpose
Stonks Oracle is a Kubernetes-native AI market intelligence and trading platform. It ingests structured market data, company news, filings, and curated web content; preserves raw artifacts in MinIO; extracts structured intelligence objects with local Ollama models; aggregates signals into trend and recommendation outputs; optionally executes trades through a broker integration; and publishes historical datasets into a local lakehouse for Athena-like querying and QuickSight-like dashboards.

This design prioritizes:
- deterministic data contracts
- auditability of every AI-derived conclusion
- safe paper-trading-first automation
- self-hosted analytics on MinIO-backed datasets
- clear separation between operational state and analytical state

## 2. Architecture Summary
The platform is split into two planes:

### 2.1 Operational plane
Handles ingestion, parsing, structured extraction, signal generation, risk evaluation, trade execution, and control APIs.

Primary stores:
- PostgreSQL for operational state and transactional records
- Redis for queues, locks, and hot cache state
- MinIO for raw artifacts, prompts, model outputs, and exported datasets

### 2.2 Analytical plane
Handles historical fact storage, SQL query access, research, scorecards, and dashboards.

Primary components:
- MinIO as S3-compatible object store
- Hive-compatible partition layout for query compatibility
- Iceberg tables as the preferred lakehouse abstraction for managed analytical datasets
- Trino as the Athena-like SQL query engine
- Apache Superset as the QuickSight-like dashboard and exploration layer

## 3. External Integrations

### 3.1 Market Data API
Used for:
- quotes
- OHLCV bars
- reference data
- corporate actions
- earnings calendars
- optional market news or fundamentals

### 3.2 News API
Used for:
- company-linked headlines
- publisher metadata
- article URLs
- article summaries when licensed

### 3.3 Filings / Regulatory API
Used for:
- SEC-style company submissions
- 8-K, 10-Q, 10-K, and related filings
- structured issuer event discovery

### 3.4 Web Scraper
Used for:
- full article body retrieval when API content is partial
- investor relations pages
- curated press release sources
- transcript or presentation retrieval when permitted

### 3.5 Broker API
Used for:
- paper-trading simulation or sandbox trading
- live order submission when enabled
- order acknowledgements and rejections
- fills and cancellations
- positions and account balances

## 4. Logical Components

### 4.1 Symbol Registry Service
Responsibilities:
- manage companies, aliases, watchlists, sectors, and source configurations
- manage source trust or credibility policies
- manage symbol-to-document matching rules

### 4.2 Scheduler / Orchestrator
Responsibilities:
- trigger market, news, filings, and scrape jobs
- manage polling cadences by source class
- coordinate backoff, retries, and dedupe windows
- publish downstream jobs to workers

### 4.3 Ingestion Adapters
Subcomponents:
- Market data adapter
- News API adapter
- Filings adapter
- Broker event adapter

Responsibilities:
- fetch external payloads
- preserve raw responses in MinIO
- normalize metadata into PostgreSQL
- emit processing jobs for parsing or publication

### 4.4 Scraper / Parser Service
Responsibilities:
- fetch and render source pages
- extract normalized text and metadata
- reduce boilerplate and duplicated template text
- score parser quality and extraction confidence
- persist normalized artifacts

### 4.5 Ollama Extraction Service
Responsibilities:
- call local Ollama models using schema-constrained JSON output
- produce canonical document intelligence objects
- preserve prompts, schemas, model metadata, and raw outputs
- validate schema and semantic consistency
- retry invalid generations under policy

### 4.6 Aggregation Engine
Responsibilities:
- combine document intelligence with market context
- compute rolling trend summaries by company, sector, and market
- track contradiction and agreement signals
- score evidence with recency decay and source weighting

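One way to read "recency decay and source weighting" is a credibility-weighted average in which each document's weight halves after a fixed interval. The sketch below illustrates that reading only; the design does not fix a formula. The `Evidence` type, the numeric sentiment encoding in [-1, 1], and the 48-hour half-life are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evidence:
    sentiment: float           # assumed encoding: -1.0 (bearish) to +1.0 (bullish)
    source_credibility: float  # 0.0 to 1.0, as in the document intelligence object
    age_hours: float           # time since publication


def decay_weight(age_hours: float, half_life_hours: float = 48.0) -> float:
    """Exponential decay: the weight halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def trend_score(items: Iterable[Evidence], half_life_hours: float = 48.0) -> float:
    """Credibility- and recency-weighted mean sentiment; 0.0 when there is no evidence."""
    weighted_sum = 0.0
    total_weight = 0.0
    for item in items:
        weight = item.source_credibility * decay_weight(item.age_hours, half_life_hours)
        weighted_sum += weight * item.sentiment
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0
```

With a fresh bullish document and a two-day-old bearish one of equal credibility, the weights are 1.0 and 0.5, so the score stays positive but is pulled toward zero.
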
### 4.7 Recommendation Engine
Responsibilities:
- generate explainable recommendation objects from aggregated evidence
- separate deterministic eligibility scoring from final action mapping
- produce suggested action, thesis, horizon, and invalidation conditions
- publish analytical prediction facts to the lake

### 4.8 Risk Engine
Responsibilities:
- enforce guardrails such as max position size, daily loss cap, exposure by sector, symbol cooldowns, news shock lockouts, and operator approval rules
- determine whether a recommendation is eligible for paper or live execution
- block ambiguous or unsafe orders before broker submission

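A guardrail check of this kind might be sketched as a pure function that returns a decision plus the reasons behind it, so each rejection can be stored for audit. The names `RiskLimits`, `PortfolioState`, and `evaluate_order`, and every default limit, are hypothetical; real values would come from operator configuration, and news shock lockouts and approval rules are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RiskLimits:
    # Hypothetical defaults, expressed as fractions of portfolio equity.
    max_position_pct: float = 0.05
    daily_loss_cap_pct: float = 0.02
    max_sector_exposure_pct: float = 0.25


@dataclass(frozen=True)
class PortfolioState:
    daily_loss_pct: float        # realized loss so far today
    sector_exposure_pct: float   # current exposure to the order's sector
    symbol_in_cooldown: bool = False


def evaluate_order(
    portfolio_pct: float,
    state: PortfolioState,
    limits: RiskLimits = RiskLimits(),
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); any violated guardrail blocks the order."""
    reasons: List[str] = []
    if portfolio_pct > limits.max_position_pct:
        reasons.append("position size exceeds max_position_pct")
    if state.daily_loss_pct >= limits.daily_loss_cap_pct:
        reasons.append("daily loss cap reached")
    if state.sector_exposure_pct + portfolio_pct > limits.max_sector_exposure_pct:
        reasons.append("sector exposure would exceed limit")
    if state.symbol_in_cooldown:
        reasons.append("symbol is in cooldown")
    return (not reasons, reasons)
```
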
### 4.9 Broker Adapter
Responsibilities:
- abstract one or more trading APIs
- support paper mode and live mode
- record submission, acknowledgement, rejection, fill, and cancellation events
- guarantee idempotent order submission keys
- publish order and fill facts to both PostgreSQL and the analytical lake

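Idempotent submission is commonly built on a deterministic key derived from the decision that produced the order, so a retry reuses the key instead of creating a second order. The sketch below shows the idea with an in-memory log. The key fields and the `InMemorySubmissionLog` class are assumptions for illustration; a deployed adapter would more likely rely on a unique constraint in PostgreSQL or a client order ID accepted by the broker.

```python
import hashlib
from typing import Callable, Dict


def order_idempotency_key(recommendation_id: str, account_id: str, side: str, quantity: int) -> str:
    """Derive a stable key from the fields that define one intended order."""
    payload = "|".join([recommendation_id, account_id, side, str(quantity)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemorySubmissionLog:
    """Illustrative stand-in for a durable store of already-submitted keys."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def submit_once(self, key: str, submit: Callable[[], str]) -> str:
        """Call `submit` only if the key is new; otherwise return the stored result."""
        if key in self._results:
            return self._results[key]
        result = submit()
        self._results[key] = result
        return result
```

A retry after a timeout would call `submit_once` with the same key and receive the original acknowledgement rather than placing a duplicate order.
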
### 4.10 Lake Publisher
Responsibilities:
- transform operational records into analytics-friendly fact datasets
- publish append-only partitioned tables to MinIO
- maintain Iceberg metadata or equivalent lakehouse metadata
- expose datasets such as predictions, outcomes, fills, bars, and PnL

### 4.11 Query API / Dashboard
Responsibilities:
- expose companies, documents, trends, recommendations, and orders
- provide evidence drill-down and audit views
- provide operator controls for live-trading enablement and review queues
- expose links into analytical dashboards and query tools

### 4.12 SQL Query Engine and BI Layer
Components:
- Trino coordinator and workers
- Hive Metastore or Iceberg catalog service
- Apache Superset

Responsibilities:
- provide Athena-like SQL access to MinIO-hosted tables
- support dashboard datasets and ad hoc exploration
- support joins between market facts, AI predictions, and executed trades

## 5. Storage Model

### 5.1 Operational stores
#### PostgreSQL
Used for:
- companies and aliases
- watchlists and source configs
- article and filing metadata
- document intelligence objects
- trend summaries
- recommendations
- risk evaluations
- orders and execution events
- control-plane state and audit records

#### Redis
Used for:
- distributed locks for symbol-source retrieval
- ingestion rate-limit counters
- job queue state
- retry backoff state
- dedupe markers
- cache for hot API and dashboard views

#### MinIO object storage
Used for:
- raw API payloads
- raw article HTML and normalized text
- prompts, schemas, and raw model results
- exported analytical datasets
- audit traces and reproducibility bundles

### 5.2 MinIO bucket layout
Recommended buckets:
- `stonks-raw-market`: raw market API payloads
- `stonks-raw-news`: raw news API payloads and article HTML
- `stonks-raw-filings`: raw filings and issuer event payloads
- `stonks-normalized`: cleaned text and parser outputs
- `stonks-llm-prompts`: prompts and schemas used
- `stonks-llm-results`: raw model outputs and validation reports
- `stonks-lakehouse`: partitioned analytical datasets and table metadata
- `stonks-audit`: execution traces and exported reports

Suggested raw object path pattern:
```text
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.json
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.html
```

Suggested analytical path pattern:
```text
/warehouse/{table_name}/dt={yyyy-mm-dd}/symbol={ticker}/part-*.parquet
```

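The two path patterns above can be produced by small helper functions, sketched below. The function names are hypothetical. One assumption to note: the helpers emit keys without the leading `/`, treating the slash in the patterns as "relative to the bucket root", because S3-compatible stores treat a leading slash as part of the key name.

```python
from datetime import date


def raw_object_key(
    stage: str,
    symbol: str,
    day: date,
    document_id: str,
    artifact_type: str,
    extension: str,
) -> str:
    """Build a raw artifact key: {stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.{ext}."""
    return (
        f"{stage}/{symbol}/{day:%Y}/{day:%m}/{day:%d}/"
        f"{document_id}/{artifact_type}.{extension}"
    )


def analytical_partition_prefix(table_name: str, day: date, ticker: str) -> str:
    """Build the Hive-style partition prefix under which Parquet part files are written."""
    return f"warehouse/{table_name}/dt={day.isoformat()}/symbol={ticker}/"
```

For example, `raw_object_key("raw", "AAPL", date(2026, 4, 9), "doc-1", "payload", "json")` yields `raw/AAPL/2026/04/09/doc-1/payload.json`.
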
### 5.3 Lakehouse model
Preferred design:
- Parquet files stored in MinIO
- Hive-compatible partitioning for interoperability
- Iceberg table metadata for managed analytical tables
- Trino catalogs for SQL access

Rationale:
- Hive-compatible layouts preserve broad engine compatibility
- Iceberg improves schema evolution, partition handling, and table maintenance
- Trino can query MinIO-backed object storage and supports both Hive and Iceberg catalogs

## 6. Data Model

### 6.1 PostgreSQL schema outline
Core tables:
- `companies`
- `company_aliases`
- `watchlists`
- `watchlist_members`
- `sources`
- `api_credentials_refs`
- `ingestion_runs`
- `market_snapshots`
- `documents`
- `document_versions`
- `document_company_mentions`
- `document_intelligence`
- `document_impact_records`
- `trend_windows`
- `recommendations`
- `recommendation_evidence`
- `risk_evaluations`
- `broker_accounts`
- `orders`
- `order_events`
- `positions`
- `audit_events`

### 6.2 Article or document metadata record
```json
{
  "document_id": "uuid",
  "document_type": "article|filing|transcript|press_release",
  "symbol_candidates": ["AAPL", "MSFT"],
  "source_type": "news_api",
  "publisher": "string",
  "url": "string",
  "canonical_url": "string",
  "title": "string",
  "published_at": "2026-04-09T00:00:00Z",
  "retrieved_at": "2026-04-09T00:00:00Z",
  "language": "en",
  "content_hash": "sha256",
  "storage_refs": {
    "raw_html": "s3://...",
    "raw_payload": "s3://..."
  }
}
```

### 6.3 Document intelligence schema
```json
{
  "document_id": "uuid",
  "summary": "string",
  "companies": [
    {
      "ticker": "AAPL",
      "company_name": "Apple Inc.",
      "relevance": 0.95,
      "sentiment": "positive",
      "impact_score": 0.71,
      "impact_horizon": "1d_30d",
      "catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
      "key_facts": ["string"],
      "risks": ["string"],
      "evidence_spans": ["string"]
    }
  ],
  "macro_themes": ["rates", "ai_capex"],
  "novelty_score": 0.64,
  "source_credibility": 0.8,
  "extraction_warnings": ["ambiguous_ticker_reference"],
  "confidence": 0.86,
  "model": {
    "provider": "ollama",
    "model_name": "gpt-oss:20b",
    "prompt_version": "document-intel-v2",
    "schema_version": "2.0.0"
  }
}
```

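Before an object of this shape is stored, the raw model output needs a validation pass. The sketch below uses only the standard library and checks a small subset of the schema. It rests on two assumptions: that the score fields are numbers in [0, 1], which the example values suggest but the schema does not state, and that the chosen required-field list is representative. The function name is hypothetical, and a fuller implementation would likely use a JSON Schema or model-validation library instead.

```python
import json
from typing import Any, List

REQUIRED_TOP_LEVEL = ("document_id", "summary", "companies", "confidence", "model")
TOP_LEVEL_SCORES = ("novelty_score", "source_credibility", "confidence")
COMPANY_SCORES = ("relevance", "impact_score")


def _in_unit_interval(value: Any) -> bool:
    """True for real numbers in [0, 1]; booleans are rejected explicitly."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0.0 <= value <= 1.0


def validate_document_intelligence(raw_output: str) -> List[str]:
    """Return validation errors for a raw model response; empty means it passed."""
    try:
        doc = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    errors = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    for key in TOP_LEVEL_SCORES:
        if key in doc and not _in_unit_interval(doc[key]):
            errors.append(f"{key} must be a number in [0, 1]")
    companies = doc.get("companies", [])
    if not isinstance(companies, list):
        errors.append("companies must be a list")
        companies = []
    for index, company in enumerate(companies):
        if not isinstance(company, dict) or not company.get("ticker"):
            errors.append(f"companies[{index}] needs a ticker")
            continue
        for key in COMPANY_SCORES:
            if key in company and not _in_unit_interval(company[key]):
                errors.append(f"companies[{index}].{key} must be a number in [0, 1]")
    return errors
```

A non-empty result would send the document down the retry path, with the failed output and the error list preserved alongside it.
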
### 6.4 Trend summary schema
```json
{
  "entity_type": "company",
  "entity_id": "AAPL",
  "window": "7d",
  "trend_direction": "bullish|bearish|mixed|neutral",
  "trend_strength": 0.68,
  "confidence": 0.74,
  "top_supporting_evidence": ["document_id_1", "document_id_2"],
  "top_opposing_evidence": ["document_id_3"],
  "dominant_catalysts": ["product", "rating_change"],
  "material_risks": ["regulatory scrutiny"],
  "contradiction_score": 0.22
}
```

### 6.5 Recommendation schema
```json
{
  "recommendation_id": "uuid",
  "ticker": "AAPL",
  "action": "buy|sell|hold|watch",
  "mode": "informational|paper_eligible|live_eligible",
  "confidence": 0.72,
  "time_horizon": "swing_1d_10d",
  "thesis": "string",
  "invalidation_conditions": ["string"],
  "position_sizing": {
    "portfolio_pct": 0.02,
    "max_loss_pct": 0.005
  },
  "evidence_refs": ["document_id_1", "document_id_2"],
  "model_metadata": {
    "version": "recommendation-v1"
  }
}
```

## 7. Analytical Lake Datasets
The analytical plane should expose the following logical fact tables:
- `lake.market_bars`
- `lake.market_quotes`
- `lake.company_events`
- `lake.documents`
- `lake.document_extractions`
- `lake.trade_signals`
- `lake.trade_orders`
- `lake.trade_fills`
- `lake.positions_daily`
- `lake.pnl_daily`
- `lake.prediction_vs_outcome`

Recommended partitioning examples:
- market data: partition by `dt`, with an optional symbol transform added later
- documents: partition by `dt` and optionally `source_type`
- predictions: partition by `dt` and `model_version`
- fills and PnL: partition by `dt` and broker account

## 8. Data Flows

### 8.1 Market and document ingestion flow
1. Scheduler selects due symbols and sources.
2. Adapters fetch market, news, and filings payloads.
3. Raw payloads are written to MinIO.
4. Metadata records are written to PostgreSQL.
5. New documents are emitted to parser jobs.

### 8.2 Extraction flow
1. Parser produces normalized text and confidence score.
2. Extraction worker sends document to Ollama with schema-bound output.
3. Validator checks schema and semantic consistency.
4. Canonical intelligence object is stored in PostgreSQL and MinIO.
5. Aggregation jobs are triggered for impacted symbols.

### 8.3 Recommendation and trade flow
1. Aggregation engine updates trend windows.
2. Recommendation engine emits a recommendation object.
3. Risk engine determines eligibility and allowed execution mode.
4. Broker adapter places paper or live orders when authorized.
5. Broker events update PostgreSQL and publish analytical facts to the lake.

### 8.4 Lake publication flow
1. Operational records are transformed into analytical facts.
2. Facts are written as partitioned Parquet files to MinIO.
3. Table metadata is updated through Iceberg or equivalent catalog operations.
4. Trino exposes the datasets for SQL.
5. Superset uses Trino datasets for dashboards and ad hoc exploration.

## 9. Query and Dashboard Surface

### 9.1 Operational API
Should expose:
- company and watchlist configuration
- source health and job state
- document timelines and evidence
- recommendation history
- order history and audit trail
- risk configuration and trading mode

### 9.2 Analytical surface
Should expose:
- SQL access through Trino
- dashboard datasets in Superset
- scorecards for prediction accuracy and PnL
- evidence-to-outcome drill-down views
- model performance and extraction failure dashboards

Suggested starter dashboards:
- symbol overview
- market sentiment heatmap
- prediction confidence vs realized move
- paper trading PnL
- model extraction quality
- source coverage and ingestion lag

## 10. Reliability and Safety
- Broker submission must be idempotent.
- Live trading must be disabled by default.
- Paper trading must be the first enabled execution mode.
- Invalid model output must not advance to trade execution.
- Low-quality document extraction must not influence live trading.
- All analytical publication jobs should be replayable.
- Every recommendation and order should be reproducible from saved prompts, source refs, and model metadata.

## 11. Deployment Notes
Recommended Kubernetes workloads:
- `symbol-registry-api`
- `scheduler`
- `market-adapter-worker`
- `news-adapter-worker`
- `filings-adapter-worker`
- `scraper-worker`
- `parser-worker`
- `ollama-extractor-worker`
- `aggregation-worker`
- `recommendation-worker`
- `risk-engine-api`
- `broker-adapter`
- `lake-publisher`
- `trino-coordinator`
- `trino-worker`
- `superset-web`
- `postgres`
- `redis`
- `minio`

## 12. Deliberate Scope Boundaries for v1
Included in v1:
- tracked watchlists
- market, news, filings, and broker integrations
- Ollama structured extraction
- trend aggregation and recommendation objects
- paper trading with strict controls
- MinIO-backed analytics lake
- Trino and Superset self-hosted analytics

Deferred from v1:
- options trading
- full order book or tick-level market microstructure
- online model retraining
- fully autonomous live trading with no approval workflow
- advanced portfolio optimization beyond basic sizing and risk caps
@@ -0,0 +1,269 @@
# Stonks Oracle - Requirements

## Overview
This feature builds an AI-assisted market intelligence, execution, and analytics platform for a Kubernetes-hosted environment. The platform ingests market symbols, licensed market data, company-specific news, regulatory filings, scraped web sources, and broker execution events; stores raw and normalized artifacts; extracts structured JSON with local Ollama models; computes trend and sentiment summaries; and optionally places trades through a broker integration.

The platform SHALL also maintain a local analytics lake on MinIO using Hive-compatible partitioned data, support Athena-like SQL querying over captured market and trade data, and expose QuickSight-like dashboards for research, review, and audit.

The initial release is focused on reliable ingestion, deterministic structured extraction, explainable trend scoring, paper trading safety, and internal analytics visibility.

## User Stories
- As an operator, I want to register companies, tickers, sectors, watchlists, and source rules so the system knows what to monitor.
- As an analyst, I want every raw article, filing, market snapshot, and scrape artifact preserved so I can audit downstream AI conclusions.
- As a data engineer, I want structured JSON extraction from each article and filing so downstream analytics are queryable.
- As a strategist, I want aggregated trend assessments per symbol, sector, and market regime so I can evaluate opportunities.
- As a trader, I want the system to generate explainable trade recommendations with explicit confidence, catalysts, and risk notes.
- As a risk owner, I want strict controls on automated trading so the system cannot place unsafe orders.
- As a quantitative reviewer, I want to query historical market data, AI predictions, and executed trades in one SQL-accessible analytics plane.
- As a dashboard user, I want QuickSight-like visualizations for performance, signal quality, prediction accuracy, and model behavior.
- As a platform owner, I want the system to run fully inside Kubernetes against local Ollama and self-hosted analytics components.

## Functional Requirements

### 1. Watchlist and source management
#### Requirement 1.1
WHEN an operator creates or updates a tracked company
THE SYSTEM SHALL persist the company profile including ticker, legal name, aliases, exchange, sector, industry, market cap bucket, and source configuration.

#### Requirement 1.2
WHEN an operator defines a source configuration for a company
THE SYSTEM SHALL support source types including market data APIs, news API feeds, SEC or investor relations URLs, company press release pages, earnings transcript sources, curated web pages, and broker-linked execution sources.

#### Requirement 1.3
WHEN a company has aliases, brands, or product names
THE SYSTEM SHALL use those aliases during source retrieval, de-duplication, entity matching, and extraction.

### 2. External API integrations
#### Requirement 2.1
WHEN the scheduler triggers a market ingestion cycle
THE SYSTEM SHALL fetch configured market data API results for tracked companies and persist raw response payloads.

#### Requirement 2.2
WHEN the scheduler triggers a news ingestion cycle
THE SYSTEM SHALL fetch configured news API results for tracked companies and persist raw response payloads.

#### Requirement 2.3
WHEN the scheduler triggers a regulatory ingestion cycle
THE SYSTEM SHALL fetch configured filing or issuer event data from authoritative sources such as SEC-style APIs and persist raw response payloads.

#### Requirement 2.4
WHEN trade automation is enabled
THE SYSTEM SHALL integrate with at least one broker API that supports paper trading, order placement, order status retrieval, positions, account balances, and execution events.

#### Requirement 2.5
WHEN external APIs enforce rate limits or quotas
THE SYSTEM SHALL coordinate request pacing, retries, and backoff across workers.

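For Requirement 2.5, one widely used retry policy is exponential backoff with full jitter, sketched below for a single worker. The function name, the 1-second base, and the 60-second cap are assumptions; the requirement itself does not prescribe a policy. Coordination across workers, for example via shared counters in Redis, is not shown.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base_seconds: float = 1.0,
    cap_seconds: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    `attempt` is zero-based; `rng` must return a float in [0, 1) and is
    injectable so the schedule can be tested deterministically.
    """
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    return rng() * ceiling
```

Randomizing the delay spreads retries from many workers over the window instead of having them all hit the upstream API at the same instant.
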
### 3. Ingestion and raw artifact retention
#### Requirement 3.1
WHEN a scraper retrieves an article, filing, or web page
THE SYSTEM SHALL store the raw HTML, rendered text, metadata, retrieval timestamp, and retrieval source in object storage.

#### Requirement 3.2
WHEN an article, filing, or market payload is ingested
THE SYSTEM SHALL generate a stable content hash and use it to prevent duplicate processing.

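A stable content hash for Requirement 3.2 could be computed as in the sketch below. It assumes SHA-256, which the `content_hash` field in the design's metadata record suggests, and it assumes that text is whitespace-normalized before hashing so trivially reformatted copies collapse to one hash. Both the normalization rule and the function names are illustrative choices, not part of the requirement.

```python
import hashlib
from typing import Set


def content_hash(payload: bytes) -> str:
    """SHA-256 hex digest of the exact bytes given."""
    return hashlib.sha256(payload).hexdigest()


def text_content_hash(text: str) -> str:
    """Hash text after collapsing runs of whitespace, so layout changes do not alter it."""
    normalized = " ".join(text.split())
    return content_hash(normalized.encode("utf-8"))


def is_new_content(digest: str, seen: Set[str]) -> bool:
    """Record the digest and report whether it was unseen (in-memory stand-in for a dedupe store)."""
    if digest in seen:
        return False
    seen.add(digest)
    return True
```
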
#### Requirement 3.3
WHEN the system stores a raw artifact
THE SYSTEM SHALL persist an associated metadata record containing symbol, source, URL when applicable, title, publication time, retrieval time, language when applicable, and content hash.

#### Requirement 3.4
WHEN content retrieval fails
THE SYSTEM SHALL record the failure reason, retry policy state, and next eligible retry time.

### 4. Parsing and normalization
#### Requirement 4.1
WHEN a raw article or filing enters the parsing stage
THE SYSTEM SHALL extract normalized text, author data when available, publisher, tags, mentioned entities, outbound links, and document type.

#### Requirement 4.2
WHEN the system detects boilerplate or repeated template text
THE SYSTEM SHALL reduce or remove boilerplate before AI extraction while retaining the original raw artifact for audit.

#### Requirement 4.3
WHEN the parser cannot confidently extract article body text
THE SYSTEM SHALL flag the document for low-quality extraction and prevent it from influencing downstream trading until reviewed or reprocessed.

### 5. AI article and document extraction
#### Requirement 5.1
WHEN a normalized article or filing is ready for AI extraction
THE SYSTEM SHALL send the document to a local Ollama model using structured output with an explicit JSON schema.

#### Requirement 5.2
WHEN the model returns extraction output
THE SYSTEM SHALL validate the response against the expected schema before saving it.

#### Requirement 5.3
WHEN extraction succeeds
THE SYSTEM SHALL produce a canonical document intelligence object with at minimum:
- document_id
- document_type
- source metadata
- tickers referenced
- companies referenced
- document summary
- sentiment by company
- catalyst type
- impact horizon
- key facts
- risks mentioned
- macro themes
- confidence score
- extraction warnings
- model metadata

#### Requirement 5.4
WHEN the model response is invalid, incomplete, or hallucinatory
THE SYSTEM SHALL retry extraction according to policy and preserve both the failed output and validation errors.

#### Requirement 5.5
WHEN a document is materially relevant to multiple companies
THE SYSTEM SHALL emit one shared document record and one or more per-company impact records.

### 6. Aggregation and trend analysis
#### Requirement 6.1
WHEN multiple document intelligence objects and market observations exist for a company
THE SYSTEM SHALL generate a rolling company trend summary over configurable windows including intraday, 1 day, 7 day, 30 day, and 90 day intervals.

#### Requirement 6.2
WHEN generating a company trend summary
THE SYSTEM SHALL consider sentiment, catalyst frequency, source credibility, recency decay, contradiction detection, document novelty, and current market context.

#### Requirement 6.3
WHEN generating a market-wide trend summary
THE SYSTEM SHALL aggregate company-level signals into sector and market-level summaries.

#### Requirement 6.4
WHEN contradictory signals exist across sources
THE SYSTEM SHALL represent disagreement explicitly rather than collapsing it into a single unsupported conclusion.

#### Requirement 6.5
WHEN a trend summary is produced
THE SYSTEM SHALL include explainability fields listing the top supporting and opposing evidence.

### 7. Trade recommendation generation
#### Requirement 7.1
WHEN a company trend summary is available
THE SYSTEM SHALL be able to generate a recommendation object containing action type, thesis, confidence, expected horizon, invalidation conditions, and cited evidence.

#### Requirement 7.2
WHEN a recommendation is generated
THE SYSTEM SHALL separate descriptive analysis from prescriptive trade action and include a risk classification.

#### Requirement 7.3
WHEN the system proposes a trade
THE SYSTEM SHALL attach position sizing guidance based on configured portfolio rules rather than unconstrained model output.

#### Requirement 7.4
WHEN the confidence or data quality falls below configured thresholds
THE SYSTEM SHALL suppress automated trade eligibility and mark the recommendation as informational only.

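The gating in Requirement 7.4 can be expressed as a small deterministic function, sketched below. The mode names match the `mode` values in the design's recommendation schema; the function name, the `data_quality` input, and both threshold defaults are hypothetical and would come from configuration.

```python
def eligibility_mode(
    confidence: float,
    data_quality: float,
    min_confidence: float = 0.6,
    min_data_quality: float = 0.7,
    live_trading_enabled: bool = False,
) -> str:
    """Map confidence and data quality to an execution mode.

    Anything below either threshold is informational only. Live eligibility
    additionally requires that live trading has been explicitly enabled.
    """
    if confidence < min_confidence or data_quality < min_data_quality:
        return "informational"
    return "live_eligible" if live_trading_enabled else "paper_eligible"
```

Keeping this step outside the model means a low-confidence recommendation cannot become trade-eligible because of how the model phrased its output.
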
### 8. Trade execution and safety controls
#### Requirement 8.1
WHEN trade automation is enabled
THE SYSTEM SHALL support paper trading mode and live trading mode as separate execution environments.

#### Requirement 8.2
WHEN live trading mode is enabled
THE SYSTEM SHALL require operator approval controls, risk limits, and broker credential isolation.

#### Requirement 8.3
WHEN the system places an order
THE SYSTEM SHALL persist the full decision trace including signals used, prompt versions, model versions, thresholds, and broker response.

#### Requirement 8.4
WHEN a proposed order violates configured risk controls
THE SYSTEM SHALL reject the order before broker submission.

#### Requirement 8.5
WHEN a broker API is unavailable or partially fails
THE SYSTEM SHALL fail closed and SHALL NOT place duplicate or ambiguous orders.

### 9. Storage and queryability
|
||||
#### Requirement 9.1
|
||||
WHEN storing raw artifacts
|
||||
THE SYSTEM SHALL use MinIO object storage as the system of record for HTML, text, API payloads, prompts, model outputs, and exported analytical datasets.
|
||||
|
||||
#### Requirement 9.2
|
||||
WHEN storing normalized relational data
|
||||
THE SYSTEM SHALL use PostgreSQL for companies, watchlists, article metadata, document intelligence objects, trends, recommendations, operational execution records, and control-plane state.
|
||||
|
||||
#### Requirement 9.3
|
||||
WHEN low-latency coordination or caching is required
|
||||
THE SYSTEM SHALL use Redis for job state, distributed locks, short-lived caches, and rate-limit coordination.
|
||||
|
||||
#### Requirement 9.4
|
||||
WHEN historical analytical queries are needed
|
||||
THE SYSTEM SHALL persist analytical fact datasets in Hive-compatible partitioned form on MinIO so that market data, predictions, and trade outcomes can be queried together.
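
Hive-compatible partitioning encodes partition columns as `key=value` directory segments in the object path. The helper below is a small sketch of that layout; the partition columns (`symbol`, `dt`) and the dataset name are assumptions, since the spec does not fix them here.

```python
from datetime import date


def partition_path(dataset: str, symbol: str, day: date) -> str:
    """Build a Hive-style object prefix such as signals/symbol=ACME/dt=2024-01-05/."""
    # Partition columns chosen for illustration; the real layout may differ.
    return f"{dataset}/symbol={symbol}/dt={day.isoformat()}/"
```

Query engines that understand this layout can prune partitions when a query filters on the partition columns.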
|
||||
|
||||
#### Requirement 9.5
|
||||
WHEN analytical table management is required
|
||||
THE SYSTEM SHALL support a lakehouse table abstraction that permits append-only fact ingestion, partitioned queries, and schema evolution.
|
||||
|
||||
### 10. SQL analytics and dashboards
|
||||
#### Requirement 10.1
|
||||
WHEN a user or service executes an analytical query
|
||||
THE SYSTEM SHALL provide an Athena-like SQL query service over MinIO-hosted analytical datasets.
|
||||
|
||||
#### Requirement 10.2
|
||||
WHEN a dashboard user explores market, prediction, and trade data
|
||||
THE SYSTEM SHALL expose QuickSight-like dashboards for performance, confidence, prediction accuracy, evidence coverage, and model behavior.
|
||||
|
||||
#### Requirement 10.3
|
||||
WHEN analytical results combine AI outputs with executed trades and market outcomes
|
||||
THE SYSTEM SHALL support joins across predicted signals, broker executions, and realized performance data.
|
||||
|
||||
#### Requirement 10.4
|
||||
WHEN dashboards or research queries need drill-down capability
|
||||
THE SYSTEM SHALL provide traceability from analytical aggregates back to underlying documents, prompts, model outputs, and raw artifacts.
|
||||
|
||||
### 11. APIs and UI
|
||||
#### Requirement 11.1
|
||||
WHEN a client requests company analytics
|
||||
THE SYSTEM SHALL expose APIs for document timelines, trend summaries, recommendation history, execution history, and evidence drill-down.
|
||||
|
||||
#### Requirement 11.2
|
||||
WHEN an operator inspects a recommendation
|
||||
THE SYSTEM SHALL display the contributing document intelligence objects, the raw sources used, and any market context features that influenced the decision.
|
||||
|
||||
#### Requirement 11.3
|
||||
WHEN a user reviews an order decision
|
||||
THE SYSTEM SHALL expose a full audit trail from ingestion through broker execution and eventual market outcome.
|
||||
|
||||
### 12. Observability and operations
|
||||
#### Requirement 12.1
|
||||
WHEN a pipeline stage runs
|
||||
THE SYSTEM SHALL emit structured logs, metrics, and traces for ingestion, parsing, extraction, aggregation, analytics publication, and trading.
|
||||
|
||||
#### Requirement 12.2
|
||||
WHEN model performance degrades
|
||||
THE SYSTEM SHALL surface schema failure rates, latency percentiles, token usage estimates, and extraction retry counts.
|
||||
|
||||
#### Requirement 12.3
|
||||
WHEN source coverage changes materially
|
||||
THE SYSTEM SHALL alert operators about sustained source failures, symbol coverage gaps, or analytical publication lag.
|
||||
|
||||
## Non-Functional Requirements
|
||||
#### Requirement N1
|
||||
WHEN the system processes documents and market events concurrently
|
||||
THE SYSTEM SHALL support horizontal scaling across Kubernetes workers.
|
||||
|
||||
#### Requirement N2
|
||||
WHEN the system stores model-derived conclusions
|
||||
THE SYSTEM SHALL preserve enough provenance to reproduce or challenge those conclusions later.
|
||||
|
||||
#### Requirement N3
|
||||
WHEN the system handles licensed or restricted content
|
||||
THE SYSTEM SHALL preserve source metadata, access policy, and retention policy for each artifact.
|
||||
|
||||
#### Requirement N4
|
||||
WHEN the system publishes analytical datasets
|
||||
THE SYSTEM SHALL ensure queryable partitions are written atomically or with an equivalent consistency guarantee.
|
||||
|
||||
#### Requirement N5
|
||||
WHEN trade execution is enabled
|
||||
THE SYSTEM SHALL prioritize fail-closed behavior over availability in ambiguous conditions.
|
||||
|
||||
#### Requirement N6
|
||||
WHEN dashboards query large historical datasets
|
||||
THE SYSTEM SHALL support partition pruning and index or metadata strategies that keep typical analyst queries responsive.
|
||||
@@ -0,0 +1,129 @@
|
||||
# Stonks Oracle - Tasks
|
||||
|
||||
## Phase 0 - Project Setup
|
||||
- [x] Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
|
||||
- [x] Choose implementation language for services (Python preferred for scraping/LLM workflows)
|
||||
- [x] Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
|
||||
- [x] Add Kubernetes manifests or Helm chart skeletons for all core components
|
||||
- [x] Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation
|
||||
|
||||
## Phase 1 - Core Data and Infrastructure
|
||||
- [x] Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
|
||||
- [x] Create MinIO bucket provisioning and lifecycle policies
|
||||
- [x] Create Redis key conventions and queue abstractions
|
||||
- [x] Implement shared config loader for environment variables and secrets
|
||||
- [x] Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
|
||||
- [x] Stand up initial Trino catalog configuration for MinIO-backed datasets
|
||||
- [x] Stand up Superset with environment-backed datasource configuration
|
||||
|
||||
## Phase 2 - Symbol Registry and Source Management
|
||||
- [ ] Build symbol registry API endpoints for companies, aliases, watchlists, and sources
|
||||
- [ ] Add source credibility, retention policy, and access policy fields
|
||||
- [ ] Add source classes for market data API, news API, filings API, web scrape, and broker adapter
|
||||
- [ ] Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
|
||||
- [ ] Add seed data support for an initial tracked watchlist
|
||||
## Phase 3 - External API Adapters
|
||||
- [ ] Implement scheduler for symbol and source polling windows
|
||||
- [ ] Implement market data API adapter interface
|
||||
- [ ] Implement first concrete market data provider adapter
|
||||
- [ ] Implement news API adapter interface
|
||||
- [ ] Implement first concrete news API provider adapter
|
||||
- [ ] Implement filings or regulatory adapter interface
|
||||
- [ ] Implement first concrete filings provider adapter
|
||||
- [ ] Implement broker API adapter interface for paper trading and order events
|
||||
- [ ] Implement rate-limit coordination, retries, and backoff across adapters
|
||||
|
||||
## Phase 4 - Ingestion Pipeline
|
||||
- [ ] Implement web scraper worker for curated URLs and article pages
|
||||
- [ ] Implement canonical URL normalization and content hashing
|
||||
- [ ] Implement raw artifact upload to MinIO
|
||||
- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
|
||||
- [ ] Implement retry and failure tracking for source retrieval
|
||||
- [ ] Implement dedupe logic across article and filing sources
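
The URL normalization and content hashing tasks above could look roughly like the following stdlib-only sketch. The specific rules (drop `utm_*` parameters, sort the query, strip the fragment and trailing slash, collapse whitespace before hashing) are assumptions chosen for illustration; the real pipeline may need different ones.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking-parameter prefixes; extend as needed.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different links compare equal."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith(TRACKING_PREFIXES)
    ]
    query.sort()
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def content_hash(text: str) -> str:
    """Hash text after collapsing whitespace, for exact-duplicate detection."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Note that lowercasing the whole network location also lowercases any embedded credentials, which is acceptable only if such URLs are not expected as sources.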
|
||||
|
||||
## Phase 5 - Parsing and Normalization
|
||||
- [ ] Implement HTML-to-text parsing pipeline
|
||||
- [ ] Implement boilerplate reduction and body extraction heuristics
|
||||
- [ ] Implement parser quality scoring and confidence flags
|
||||
- [ ] Implement company mention detection using ticker, alias, and name matching
|
||||
- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL
|
||||
|
||||
## Phase 6 - Ollama Structured Extraction
|
||||
- [ ] Build extraction prompt templates with anti-hallucination instructions
|
||||
- [ ] Build JSON schema definitions for document intelligence extraction
|
||||
- [ ] Implement Ollama client wrapper using structured output format
|
||||
- [ ] Implement schema validation and semantic validation layers
|
||||
- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
|
||||
- [ ] Add retry behavior for invalid or incomplete model responses
|
||||
- [ ] Add model performance metrics and dashboards
|
||||
|
||||
## Phase 7 - Aggregation and Trend Engine
|
||||
- [ ] Implement recency decay and source credibility weighting
|
||||
- [ ] Integrate market context features into aggregation windows
|
||||
- [ ] Implement company-level rolling window aggregation
|
||||
- [ ] Implement contradiction detection and disagreement representation
|
||||
- [ ] Implement sector and market rollups
|
||||
- [ ] Implement evidence ranking for supporting and opposing documents
|
||||
- [ ] Persist trend windows and evidence mappings
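
Recency decay and source credibility weighting can be combined into a single weighted average. The sketch below assumes exponential decay with a half-life and a per-document sentiment score in a fixed range; the half-life default, the tuple layout, and the use of sentiment as the aggregated quantity are hypothetical.

```python
from typing import Iterable


def decay_weight(age_hours: float, half_life_hours: float = 48.0) -> float:
    """Exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_sentiment(
    observations: Iterable[tuple[float, float, float]],
    half_life_hours: float = 48.0,
) -> float:
    """Aggregate (sentiment, age_hours, credibility) tuples into one score."""
    numerator = 0.0
    denominator = 0.0
    for sentiment, age_hours, credibility in observations:
        weight = credibility * decay_weight(age_hours, half_life_hours)
        numerator += weight * sentiment
        denominator += weight
    # With no usable evidence, fall back to a neutral score.
    return numerator / denominator if denominator > 0 else 0.0
```

For example, a fresh positive document and a two-day-old negative one of equal credibility get weights of 1 and 0.5, so the aggregate leans positive instead of cancelling out.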
|
||||
|
||||
## Phase 8 - Recommendation Engine
|
||||
- [ ] Design deterministic recommendation eligibility logic
|
||||
- [ ] Implement recommendation generation from aggregated scores and evidence
|
||||
- [ ] Add optional LLM wording layer for thesis generation only
|
||||
- [ ] Persist recommendation objects and evidence citations
|
||||
- [ ] Add suppression logic for low-quality data or low confidence
|
||||
- [ ] Publish prediction facts to analytical tables
|
||||
|
||||
## Phase 9 - Risk Engine and Trade Adapter
|
||||
- [ ] Implement portfolio and account risk configuration model
|
||||
- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
|
||||
- [ ] Implement paper trading adapter behavior and state sync
|
||||
- [ ] Integrate first broker API in sandbox mode
|
||||
- [ ] Implement idempotent order submission keys and duplicate prevention
|
||||
- [ ] Implement full execution audit trail
|
||||
- [ ] Add operator approval workflow for live trading mode
|
||||
- [ ] Publish order, fill, and position facts to analytical tables
|
||||
|
||||
## Phase 10 - Lakehouse and SQL Analytics
|
||||
- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
|
||||
- [ ] Implement Parquet writers for analytical datasets
|
||||
- [ ] Implement Hive-compatible partition layout conventions on MinIO
|
||||
- [ ] Implement Iceberg table creation and metadata management for analytical datasets
|
||||
- [ ] Implement lake publisher jobs from operational data into analytical fact tables
|
||||
- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
|
||||
- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards
|
||||
|
||||
## Phase 11 - Query API and Dashboard
|
||||
- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
|
||||
- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
|
||||
- [ ] Build admin controls for source health, symbol configs, and trading mode
|
||||
- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
|
||||
- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
|
||||
|
||||
## Phase 12 - Observability and Hardening
|
||||
- [ ] Add structured logs and distributed tracing across services
|
||||
- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
|
||||
- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
|
||||
- [ ] Add dead-letter queues and replay tooling
|
||||
- [ ] Add data retention and lifecycle controls for raw and derived artifacts
|
||||
- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control
|
||||
|
||||
## Phase 13 - Verification and Rollout
|
||||
- [ ] Create replay dataset from archived documents for deterministic extraction testing
|
||||
- [ ] Create integration tests for the full ingest-to-recommendation flow
|
||||
- [ ] Create paper trading simulation scenarios
|
||||
- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
|
||||
- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
|
||||
- [ ] Run shadow mode before enabling any live execution
|
||||
- [ ] Prepare operator runbook and incident response procedures
|
||||
|
||||
## Recommended First Vertical Slice
|
||||
- [ ] Track 5 to 10 symbols
|
||||
- [ ] Ingest one market data API, one news API, and one filings source per symbol group
|
||||
- [ ] Persist raw artifacts to MinIO and metadata to PostgreSQL
|
||||
- [ ] Extract structured document intelligence through Ollama
|
||||
- [ ] Generate 7-day company trend summaries with market context
|
||||
- [ ] Produce paper-trade recommendations only
|
||||
- [ ] Publish analytical facts for bars, signals, and paper trades into MinIO
|
||||
- [ ] Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views
|
||||
@@ -0,0 +1,44 @@
|
||||
# Development Process — Test-Develop-Debug
|
||||
|
||||
## Workflow
|
||||
1. Write or update tests for the target behavior
|
||||
2. Implement the minimal code to pass
|
||||
3. Debug failures, fix, re-run
|
||||
4. Commit and push after each phase completes
|
||||
5. GitHub Actions builds container images and pushes to GHCR
|
||||
6. Deploy to cluster via `kubectl apply`
|
||||
|
||||
## Testing
|
||||
- Use `pytest` with `pytest-asyncio` for async code
|
||||
- Tests live alongside service code or in a top-level `tests/` directory
|
||||
- Run tests with `pytest --tb=short -q` or `pytest -x` for fail-fast
|
||||
- Focus on core logic, not mocking infrastructure
|
||||
|
||||
## Build and Deploy
|
||||
- Always build and test Docker images locally before pushing to GitHub
|
||||
- Only push to GitHub after local build succeeds — don't waste CI credits on broken builds
|
||||
- Dockerfile at `docker/Dockerfile`
|
||||
- GitHub workflow at `.github/workflows/build.yml`
|
||||
- Images tagged as `ghcr.io/celesrenata/stonks-oracle/<service>:<sha>` and `:latest`
|
||||
- K8s manifests reference GHCR images
|
||||
- Deploy: `kubectl apply -f infra/k8s/`
|
||||
- Local build: `make build` → verify → `git push` → CI builds and pushes to GHCR
|
||||
|
||||
## Git Conventions
|
||||
- Commit after each completed phase task
|
||||
- Commit message format: `phase N: short description`
|
||||
- Push to `main` branch triggers CI
|
||||
|
||||
## Code Style
|
||||
- Python 3.12, type hints everywhere
|
||||
- Pydantic for data validation
|
||||
- FastAPI for HTTP services
|
||||
- asyncio + asyncpg/aioredis for async I/O
|
||||
- Minimal dependencies, prefer stdlib where possible
|
||||
|
||||
## Documentation
|
||||
- Do NOT create large summary/success markdown files after each step
|
||||
- Keep notes short, concise, and organized under `docs/notes/`
|
||||
- Name note files to match the task they relate to (e.g. `docs/notes/phase0-k8s-manifests.md`)
|
||||
- This makes them recallable by task without guessing
|
||||
- If a note isn't useful for future reference, don't write it
|
||||
@@ -0,0 +1,33 @@
|
||||
---
|
||||
inclusion: fileMatch
|
||||
fileMatchPattern: "infra/k8s/**"
|
||||
---
|
||||
# Kubernetes Conventions
|
||||
|
||||
## Namespace
|
||||
All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
|
||||
|
||||
## TLS
|
||||
- Internal services: use `ca-issuer` ClusterIssuer (local CA)
|
||||
- Public-facing services (Superset, Query API): use `celestium-le-production` ClusterIssuer (Let's Encrypt)
|
||||
- Annotate ingress with `cert-manager.io/cluster-issuer`
|
||||
|
||||
## Ingress
|
||||
- Traefik ingress controller
|
||||
- Domain pattern: `<service>.celestium.life`
|
||||
- Always create both HTTP and HTTPS ingress rules
|
||||
|
||||
## Service References
|
||||
- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
|
||||
- Redis: `redis-master.redis-service.svc.cluster.local:6379`
|
||||
- MinIO API: `minio.minio-service.svc.cluster.local:80`
|
||||
- Ollama: `ollama.ollama-service.svc.cluster.local:11434`
|
||||
|
||||
## Images
|
||||
- All images from `ghcr.io/celesrenata/stonks-oracle/<service>:latest`
|
||||
- Use `imagePullPolicy: Always` in production
|
||||
- Use `imagePullSecrets` referencing `ghcr-secret` if repo is private
|
||||
|
||||
## Labels
|
||||
- `app.kubernetes.io/part-of: stonks-oracle`
|
||||
- `app: <service-name>`
|
||||
@@ -0,0 +1,33 @@
|
||||
# Stonks Oracle — Project Context
|
||||
|
||||
## Overview
|
||||
Stonks Oracle is a Kubernetes-native AI market intelligence and paper-trading platform.
|
||||
Python monorepo with services under `services/`, infrastructure under `infra/`, lakehouse schemas under `lakehouse/`, and dashboards under `dashboards/`.
|
||||
|
||||
## Infrastructure
|
||||
- Kubernetes cluster: 4x NixOS nodes (gremlin-1 through gremlin-4), reachable via `kubectl`, `virtctl`, `ssh root@gremlin-{1,2,3,4}`
|
||||
- NixOS configs stored at `/etc/nixos` on gremlin-1, git-pushed to other hosts
|
||||
- Ingress: Traefik, domain `*.celestium.life`
|
||||
- Cert-Manager: `ca-issuer` (local CA) for internal services, `celestium-le-production` (Let's Encrypt) for public-facing
|
||||
- Container registry: `ghcr.io/celesrenata/stonks-oracle`
|
||||
- CI: GitHub Actions builds containers, cluster pulls from GHCR
|
||||
|
||||
## Existing Cluster Services (do NOT redeploy these)
|
||||
- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
|
||||
- Redis: `redis-master.redis-service.svc.cluster.local:6379`
|
||||
- MinIO: `minio.minio-service.svc.cluster.local:80` (API), console at `minio-crawler-console.minio-service.svc.cluster.local:9090`
|
||||
- Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB
|
||||
|
||||
## Development Process
|
||||
- Test-Develop-Debug (TDD) cycle
|
||||
- After each phase: commit, push, build via GitHub Actions, deploy to cluster
|
||||
- Local builds for dev iteration, GitHub workflows for CI/CD
|
||||
- Python 3.12, NixOS dev environment
|
||||
|
||||
## Key Conventions
|
||||
- All services use `services/shared/config.py` for configuration via env vars
|
||||
- Redis queues defined in `services/shared/redis_keys.py`
|
||||
- Pydantic schemas in `services/shared/schemas.py`
|
||||
- K8s manifests in `infra/k8s/`, all in `stonks-oracle` namespace
|
||||
- Lakehouse DDL in `lakehouse/schemas/`
|
||||
- Crawler patterns inspired by Noctipede (`~/sources/splinterstice/noctipede`): BeautifulSoup + requests with retry adapters, content hashing, boilerplate stripping, quality scoring
|
||||