phase 14-15: docker build validation and helm deployment

Celes Renata
2026-04-11 11:59:45 -07:00
parent 7394d241c9
commit ce10afa034
179 changed files with 32559 additions and 576 deletions
+127 -73
@@ -24,99 +24,153 @@
- [x] Add seed data support for an initial tracked watchlist
## Phase 3 - External API Adapters
- [ ] Implement scheduler for symbol and source polling windows
- [ ] Implement market data API adapter interface
- [ ] Implement first concrete market data provider adapter
- [ ] Implement news API adapter interface
- [ ] Implement first concrete news API provider adapter
- [ ] Implement filings or regulatory adapter interface
- [ ] Implement first concrete filings provider adapter
- [ ] Implement broker API adapter interface for paper trading and order events
- [ ] Implement rate-limit coordination, retries, and backoff across adapters
- [x] Implement scheduler for symbol and source polling windows
- [x] Implement market data API adapter interface
- [x] Implement first concrete market data provider adapter
- [x] Implement news API adapter interface
- [x] Implement first concrete news API provider adapter
- [x] Implement filings or regulatory adapter interface
- [x] Implement first concrete filings provider adapter
- [x] Implement broker API adapter interface for paper trading and order events
- [x] Implement rate-limit coordination, retries, and backoff across adapters
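
The retry and backoff item above could look roughly like the following. This is a minimal sketch, not the project's implementation: `backoff_delay` and `call_with_retries` are hypothetical names, the defaults (0.5 s base, 30 s cap, 5 attempts) are assumptions, and "full jitter" is one common backoff strategy chosen here for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound on the wait for a zero-based attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn until it succeeds or max_attempts is exhausted (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Re-raise the last failure instead of sleeping again.
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, exponential bound].
            sleep(rng.uniform(0.0, backoff_delay(attempt, base, cap)))
    raise RuntimeError("unreachable")
```

A real adapter would likely retry only on transient errors (timeouts, HTTP 429 or 5xx) rather than on every exception; the broad `except` keeps the sketch short.
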
## Phase 4 - Ingestion Pipeline
- [ ] Implement web scraper worker for curated URLs and article pages
- [ ] Implement canonical URL normalization and content hashing
- [ ] Implement raw artifact upload to MinIO
- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- [ ] Implement retry and failure tracking for source retrieval
- [ ] Implement dedupe logic across article and filing sources
- [x] Implement web scraper worker for curated URLs and article pages
- [x] Implement canonical URL normalization and content hashing
- [x] Implement raw artifact upload to MinIO
- [x] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- [x] Implement retry and failure tracking for source retrieval
- [x] Implement dedupe logic across article and filing sources
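
As an illustration of the URL normalization and content hashing items above, here is a small standard-library sketch. The function names, the list of tracking parameters, and the specific normalization rules are assumptions for the example, not taken from the repository.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters to drop; a real list would be project-specific.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def canonicalize_url(url: str) -> str:
    """Return a normalized URL so equivalent links compare equal (hypothetical rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # also drops any userinfo
    port = parts.port
    default_port = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default_port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest for a stable order.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_KEYS and not k.startswith(TRACKING_PREFIXES)
    )
    # The fragment is discarded because it does not change the fetched resource.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of whitespace-collapsed text, usable as a dedupe key."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two documents with the same canonical URL or the same content hash can then be treated as duplicate candidates; near-duplicate detection would need something beyond an exact hash.
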
## Phase 5 - Parsing and Normalization
- [ ] Implement HTML-to-text parsing pipeline
- [ ] Implement boilerplate reduction and body extraction heuristics
- [ ] Implement parser quality scoring and confidence flags
- [ ] Implement company mention detection using ticker, alias, and name matching
- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL
- [x] Implement HTML-to-text parsing pipeline
- [x] Implement boilerplate reduction and body extraction heuristics
- [x] Implement parser quality scoring and confidence flags
- [x] Implement company mention detection using ticker, alias, and name matching
- [x] Persist normalized text and parser outputs to MinIO and PostgreSQL
## Phase 6 - Ollama Structured Extraction
- [ ] Build extraction prompt templates with anti-hallucination instructions
- [ ] Build JSON schema definitions for document intelligence extraction
- [ ] Implement Ollama client wrapper using structured output format
- [ ] Implement schema validation and semantic validation layers
- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- [ ] Add retry behavior for invalid or incomplete model responses
- [ ] Add model performance metrics and dashboards
- [x] Build extraction prompt templates with anti-hallucination instructions
- [x] Build JSON schema definitions for document intelligence extraction
- [x] Implement Ollama client wrapper using structured output format
- [x] Implement schema validation and semantic validation layers
- [x] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- [x] Add retry behavior for invalid or incomplete model responses
- [x] Add model performance metrics and dashboards
## Phase 7 - Aggregation and Trend Engine
- [ ] Implement recency decay and source credibility weighting
- [ ] Integrate market context features into aggregation windows
- [ ] Implement company-level rolling window aggregation
- [ ] Implement contradiction detection and disagreement representation
- [ ] Implement sector and market rollups
- [ ] Implement evidence ranking for supporting and opposing documents
- [ ] Persist trend windows and evidence mappings
- [x] Implement recency decay and source credibility weighting
- [x] Integrate market context features into aggregation windows
- [x] Implement company-level rolling window aggregation
- [x] Implement contradiction detection and disagreement representation
- [x] Implement sector and market rollups
- [x] Implement evidence ranking for supporting and opposing documents
- [x] Persist trend windows and evidence mappings
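
One way to read "recency decay and source credibility weighting" is as a weighted mean in which each document's weight is its source credibility times an exponential decay of its age. The sketch below assumes that interpretation; the `Signal` fields, the 24-hour half-life, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Signal:
    score: float        # e.g. sentiment in [-1, 1] (assumed scale)
    age_hours: float    # time since publication
    credibility: float  # source weight in [0, 1] (assumed scale)


def recency_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_score(signals: Iterable[Signal], half_life_hours: float = 24.0) -> float:
    """Credibility- and recency-weighted mean score; 0.0 when there is no weight."""
    numerator = 0.0
    denominator = 0.0
    for s in signals:
        w = s.credibility * recency_weight(s.age_hours, half_life_hours)
        numerator += w * s.score
        denominator += w
    return numerator / denominator if denominator > 0 else 0.0
```

For example, a fresh positive signal and a day-old negative signal of equal credibility combine to (1 - 0.5) / 1.5, about 0.33, so the newer document dominates without erasing the older one.
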
## Phase 8 - Recommendation Engine
- [ ] Design deterministic recommendation eligibility logic
- [ ] Implement recommendation generation from aggregated scores and evidence
- [ ] Add optional LLM wording layer for thesis generation only
- [ ] Persist recommendation objects and evidence citations
- [ ] Add suppression logic for low-quality data or low confidence
- [ ] Publish prediction facts to analytical tables
- [x] Design deterministic recommendation eligibility logic
- [x] Implement recommendation generation from aggregated scores and evidence
- [x] Add optional LLM wording layer for thesis generation only
- [x] Persist recommendation objects and evidence citations
- [x] Add suppression logic for low-quality data or low confidence
- [x] Publish prediction facts to analytical tables
## Phase 9 - Risk Engine and Trade Adapter
- [ ] Implement portfolio and account risk configuration model
- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- [ ] Implement paper trading adapter behavior and state sync
- [ ] Integrate first broker API in sandbox mode
- [ ] Implement idempotent order submission keys and duplicate prevention
- [ ] Implement full execution audit trail
- [ ] Add operator approval workflow for live trading mode
- [ ] Publish order, fill, and position facts to analytical tables
- [x] Implement portfolio and account risk configuration model
- [x] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- [x] Implement paper trading adapter behavior and state sync
- [x] Integrate first broker API in sandbox mode
- [x] Implement idempotent order submission keys and duplicate prevention
- [x] Implement full execution audit trail
- [x] Add operator approval workflow for live trading mode
- [x] Publish order, fill, and position facts to analytical tables
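
The idempotent order submission item above can be sketched as a deterministic key derived from the order intent plus a lookup that refuses to send the same key twice. Everything here is illustrative: the key fields, `idempotency_key`, and the in-memory `OrderLedger` are assumptions, and a real service would presumably back the lookup with a durable unique constraint rather than a dict.

```python
import hashlib
import json
from decimal import Decimal
from typing import Callable, Dict, Tuple


def idempotency_key(recommendation_id: str, symbol: str, side: str, quantity: Decimal) -> str:
    """Derive a stable key from the order intent (hypothetical field set)."""
    payload = json.dumps(
        {
            "recommendation_id": recommendation_id,
            "symbol": symbol.upper(),
            "side": side.lower(),
            # Normalize so Decimal("10") and Decimal("10.0") produce the same key.
            "quantity": format(quantity.normalize(), "f"),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderLedger:
    """In-memory stand-in for a durable store keyed by idempotency key."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def submit(self, key: str, send: Callable[[], str]) -> Tuple[str, bool]:
        """Return (order_id, was_sent); send() runs at most once per key."""
        if key in self._seen:
            return self._seen[key], False
        order_id = send()
        self._seen[key] = order_id
        return order_id, True
```

Because the key depends only on the intent, a retry after a timeout maps to the same key and returns the original order id instead of placing a second order.
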
## Phase 10 - Lakehouse and SQL Analytics
- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- [ ] Implement Parquet writers for analytical datasets
- [ ] Implement Hive-compatible partition layout conventions on MinIO
- [ ] Implement Iceberg table creation and metadata management for analytical datasets
- [ ] Implement lake publisher jobs from operational data into analytical fact tables
- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards
- [x] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- [x] Implement Parquet writers for analytical datasets
- [x] Implement Hive-compatible partition layout conventions on MinIO
- [x] Implement Iceberg table creation and metadata management for analytical datasets
- [x] Implement lake publisher jobs from operational data into analytical fact tables
- [x] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- [x] Add example SQL views for prediction-vs-outcome and paper-trade scorecards
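
A Hive-compatible partition layout encodes partition columns as `key=value` directory names under the table prefix. The sketch below shows that convention only; the `lake` prefix, the `fact_bars` table name used in the usage note, and the choice of `dt` and `symbol` as partition keys are assumptions, not the project's actual layout.

```python
from datetime import date
from typing import Dict
from urllib.parse import quote


def partition_path(table: str, symbol: str, day: date, prefix: str = "lake") -> str:
    """Build a Hive-style object key prefix: <prefix>/<table>/dt=<date>/symbol=<symbol>/."""
    # Percent-encode the value so characters such as "/" cannot break the layout.
    return f"{prefix}/{table}/dt={day.isoformat()}/symbol={quote(symbol, safe='')}/"


def parse_partitions(path: str) -> Dict[str, str]:
    """Recover partition columns from the key=value segments of a path."""
    segments = path.strip("/").split("/")
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)
```

With these assumptions, `partition_path("fact_bars", "BRK.B", date(2026, 4, 11))` yields `lake/fact_bars/dt=2026-04-11/symbol=BRK.B/`, and a query engine that understands Hive partitioning can prune by `dt` or `symbol` without reading the Parquet files.
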
## Phase 11 - Query API and Dashboard
- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
- [ ] Build admin controls for source health, symbol configs, and trading mode
- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
- [x] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- [x] Build evidence drill-down view linking recommendations to source documents and raw artifacts
- [x] Build admin controls for source health, symbol configs, and trading mode
- [x] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- [x] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
## Phase 12 - Observability and Hardening
- [ ] Add structured logs and distributed tracing across services
- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- [ ] Add dead-letter queues and replay tooling
- [ ] Add data retention and lifecycle controls for raw and derived artifacts
- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control
- [x] Add structured logs and distributed tracing across services
- [x] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- [x] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- [x] Add dead-letter queues and replay tooling
- [x] Add data retention and lifecycle controls for raw and derived artifacts
- [x] Add security review for secrets, network policies, trading isolation, and dashboard access control
## Phase 13 - Verification and Rollout
- [ ] Create replay dataset from archived documents for deterministic extraction testing
- [ ] Create integration tests for the full ingest-to-recommendation flow
- [ ] Create paper trading simulation scenarios
- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
- [ ] Run shadow mode before enabling any live execution
- [ ] Prepare operator runbook and incident response procedures
- [x] Create replay dataset from archived documents for deterministic extraction testing
- [x] Create integration tests for the full ingest-to-recommendation flow
- [x] Create paper trading simulation scenarios
- [x] Validate fail-closed behavior for broker outages and ambiguous order states
- [x] Validate lake publication and Trino query correctness over partitioned MinIO datasets
- [x] ~~Run shadow mode~~ moved to Phase 15.6 (post-deployment)
- [x] ~~Prepare operator runbook~~ moved to Phase 15.7 (post-deployment)
## Phase 14 - Local Docker Build Validation
- [x] 14. Build and validate all Docker containers locally
- [x] 14.1 Build all 11 service containers locally using the Makefile
- Run `make build` to build scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, risk, broker-adapter, lake-publisher, and query-api images
- Fix any build failures (missing dependencies, import errors, syntax issues)
- _Requirements: N1, 12.1_
- [x] 14.2 Validate schema and logic consistency across all services
- Run the full test suite with `pytest tests/ -x --tb=short -q` to catch import errors, schema mismatches, and logic inconsistencies
- Verify all shared schemas in `services/shared/schemas.py` are consistent with what each service expects
- Verify config loader fields match the configmap and secrets definitions
- Fix any mismatches found between services, schemas, migrations, and K8s manifests
- _Requirements: 5.2, 5.3, 9.2, N2_
- [x] 14.3 Verify each container starts without immediate crash
- Run each built image with `docker run --rm` and a quick health check or `--help` flag to confirm the entrypoint resolves
- Fix any runtime import errors or missing module paths
- _Requirements: N1_
## Phase 15 - CI Validation, Helm Deployment, and Cluster Rollout
- [-] 15. Commit, push, validate CI, create Helm chart, and deploy to cluster
- [-] 15.1 Commit and push code to GitHub
- Configure git with SSH key for the private repo
- Commit all current changes with message `phase 14-15: docker build validation and helm deployment`
- Push to main branch
- _Requirements: N1_
- [ ] 15.2 Validate GitHub Actions workflow builds containers
- Monitor the GitHub Actions run to confirm lint-and-test and build-services jobs succeed
- Fix any CI failures and re-push if needed
- _Requirements: N1_
- [ ] 15.3 Create Helm chart for stonks-oracle deployment
- Create `infra/helm/stonks-oracle/Chart.yaml` with chart metadata
- Create `infra/helm/stonks-oracle/values.yaml` with configurable image tags, replica counts, resource limits, and environment references
- Create Helm templates for all deployments, services, configmap, secrets, ingress, and network policies from existing K8s manifests
- Add imagePullSecrets configuration for GHCR private registry access
- Add a template for a Kubernetes Secret of type `kubernetes.io/dockerconfigjson` for GHCR authentication
- _Requirements: N1, 8.2_
- [ ] 15.4 Configure GHCR image pull authentication on the cluster
- Create a `docker-registry` secret in the `stonks-oracle` namespace with GHCR credentials (using a GitHub PAT or deploy key)
- Reference the imagePullSecret in all deployment specs via the Helm values
- _Requirements: 8.2, N1_
- [ ] 15.5 Deploy stonks-oracle to the cluster via Helm
- Run `helm install` or `helm upgrade --install` targeting the `stonks-oracle` namespace
- Verify all pods reach Running/Ready state
- Verify services and ingress endpoints are reachable
- Debug and fix any deployment issues (CrashLoopBackOff, image pull errors, config mismatches)
- _Requirements: N1, 12.1_
- [ ] 15.6 Run shadow mode before enabling any live execution
- Confirm all services are running and processing in paper-only mode
- Validate end-to-end data flow from ingestion through recommendation without live trades
- _Requirements: N5, 8.1_
- [ ] 15.7 Prepare operator runbook and incident response procedures
- Document service restart procedures, log access, and common failure modes
- Document how to toggle trading modes and approve live execution
- _Requirements: 8.2, 12.1_
## Recommended First Vertical Slice
- [ ] Track 5 to 10 symbols