Commit Graph

340 Commits

Author SHA1 Message Date
Celes Renata 2904d3215a ci: retrigger after webhook cleanup 2026-04-21 01:55:50 +00:00
Celes Renata c166aafc40 fix: reduce integration test timeout to 5 minutes
Tests complete in ~7s. The 10-minute timeout was causing unnecessary
wait time on failures. Reduced Job activeDeadlineSeconds and kubectl
wait timeout to 300s.
2026-04-21 01:52:11 +00:00
Celes Renata 490d7a25a8 fix: update signal flow test assertions to match actual API responses
- rec['mode'] can be 'autonomous' (not just informational/paper/live)
- risk check uses 'check_name'/'result' not 'name'/'passed'
- decision type can be 'execute' not just 'act'/'skip'
2026-04-21 01:34:49 +00:00
Celes Renata be526ae614 feat: pipeline on/off toggle with per-stage Helm control
- Added pipelineEnabled flag to Helm values (default: true)
- Worker services (scheduler, ingestion, parser, extractor, aggregation,
  recommendation, broker-adapter, lake-publisher) scale to 0 when disabled
- API services always run regardless of toggle
- Redis-based runtime toggle: POST /api/ops/pipeline/toggle
- Scheduler checks the flag before each cycle
- Frontend: green/red Pipeline ON/OFF button on the pipeline page
- Beta defaults to pipelineEnabled: false
- Base values.yaml: blanked external URLs (Ollama, Polygon, Alpaca)
  so stages only connect to what they explicitly configure
2026-04-21 00:21:53 +00:00
Celes Renata a19ed086fe fix: blank external URLs in base values, set only in stage overrides
Base values.yaml now has empty OLLAMA_BASE_URL, MARKET_DATA_BASE_URL,
and BROKER_PROVIDER. Only paper (and eventually live) set the real
URLs. Beta inherits empty defaults so it can't reach external services.
2026-04-21 00:15:22 +00:00
Celes Renata d8fce71178 fix: disable pipeline workers and blank API keys in beta stage
Beta is for API testing only. Scale scheduler, ingestion, parser,
extractor, aggregation, recommendation, broker-adapter, and
lake-publisher to 0 replicas. Blank out Polygon and Alpaca keys.
Infra secrets (postgres, redis, minio) kept so API services work.
2026-04-21 00:14:02 +00:00
Celes Renata 7eab50fda9 fix: disable all external connections and pipeline workers in beta
Beta is for API testing only. Blanked out Polygon/Alpaca/Ollama
credentials, set OLLAMA_BASE_URL to localhost:99999, and scaled
scheduler/ingestion/parser/extractor/aggregation/recommendation/
broker-adapter/lake-publisher to 0 replicas.
2026-04-21 00:13:42 +00:00
Celes Renata 7071bba92d fix: increase stale threshold to 4h to prevent duplicate enqueuing
The 30-minute threshold was shorter than the queue drain time, causing
the recovery sweep to re-enqueue docs that were already queued but not
yet processed. Bumped to 4 hours with matching marker TTL.
2026-04-20 18:05:30 +00:00
Celes Renata 20faa8e20d fix: bake secrets into values-paper.yaml and auto-seed on empty DB
- All paper stage credentials now in values-paper.yaml so ArgoCD
  renders them correctly on every sync (no more empty secrets)
- Added seed-if-empty init container to scheduler: runs the seed
  script if the companies table is empty after migrations
2026-04-20 17:40:41 +00:00
Celes Renata 46c24aefab fix: prevent duplicate queue entries with Redis SET markers
Recovery sweeps and the retry endpoint now check a per-document Redis
key (SET NX, 1h TTL) before pushing to the queue. If the marker exists,
the doc is already enqueued and gets skipped. This prevents the
scheduler from re-enqueuing the same parsed docs every 5 minutes.
2026-04-20 17:24:53 +00:00
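The marker check described in 46c24aefab can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `enqueue_once`, the marker key layout, and the in-memory `FakeRedis` stand-in are all assumptions made for the example. Only the SET NX semantics and the 1h TTL come from the commit message. With a real redis-py client, `client.set(key, value, nx=True, ex=ttl)` behaves the same way (truthy when the key was created, `None` when it already existed).

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

MARKER_TTL_SECONDS = 3600  # 1h TTL, as stated in the commit message


class FakeRedis:
    """In-memory stand-in for the two Redis commands used below (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._strings: Dict[str, Tuple[str, Optional[float]]] = {}
        self.lists: Dict[str, List[str]] = {}

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        now = self._clock()
        current = self._strings.get(key)
        # Treat an expired entry as absent.
        if current is not None and current[1] is not None and current[1] <= now:
            current = None
        # NX: refuse to overwrite a live key.
        if nx and current is not None:
            return None
        expires_at = now + ex if ex is not None else None
        self._strings[key] = (value, expires_at)
        return True

    def rpush(self, key: str, value: str) -> int:
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])


def enqueue_once(client, queue: str, doc_id: str,
                 ttl: int = MARKER_TTL_SECONDS) -> bool:
    """Push doc_id onto queue unless a live marker says it is already enqueued."""
    marker = f"{queue}:enqueued:{doc_id}"  # hypothetical key layout
    if not client.set(marker, "1", nx=True, ex=ttl):
        return False  # marker exists: skip the duplicate
    client.rpush(queue, doc_id)
    return True
```

Because the marker is created atomically by SET NX, two sweeps racing on the same document cannot both enqueue it. Once the TTL lapses, the document becomes eligible again, which is why the later commit 7071bba92d ties the marker TTL to the stale threshold.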
Celes Renata 288c5333b5 fix: use queue_key() for stage-prefixed Redis queue names in pipeline endpoints
The pipeline health, SSE stream, and retry endpoints were hardcoding
'stonks:queue:{name}' but services use DEPLOY_STAGE prefix
('stonks:paper:queue:{name}'). Now uses queue_key() from redis_keys.py.
2026-04-20 13:16:11 +00:00
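For illustration, a stage-prefixed key helper in the spirit of 288c5333b5 might look like the sketch below. The real `queue_key()` in redis_keys.py is not shown in this log, so the signature, the optional `stage` parameter, and the unprefixed fallback are assumptions. Only the two key shapes ('stonks:queue:{name}' and 'stonks:paper:queue:{name}') and the DEPLOY_STAGE variable are taken from the commit message.

```python
import os
from typing import Optional


def queue_key(name: str, stage: Optional[str] = None) -> str:
    """Build a Redis queue key, prefixed with the deploy stage when one is set."""
    if stage is None:
        # Assumption: the stage is read from the DEPLOY_STAGE environment variable.
        stage = os.environ.get("DEPLOY_STAGE", "")
    if stage:
        return f"stonks:{stage}:queue:{name}"
    # Assumption: with no stage configured, fall back to the unprefixed form.
    return f"stonks:queue:{name}"
```

Routing every reader and writer through one helper means the producing services and the pipeline endpoints cannot disagree about the key name, which is the mismatch this commit fixed.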
Celes Renata 740ddc1c54 fix: revert extractor to 1 replica (single GPU bottleneck) 2026-04-20 12:16:51 +00:00
Celes Renata 7fc54a6023 ci: retrigger with docker hub token 2026-04-20 11:44:43 +00:00
Celes Renata 5b07821ed4 ci: retrigger with all secrets 2026-04-20 11:40:32 +00:00
Celes Renata 8218a17d23 ci: retrigger with secrets configured 2026-04-20 11:38:43 +00:00
Celes Renata 9dc36c32bc ci: retrigger with trusted repo 2026-04-20 11:37:06 +00:00
Celes Renata 25f8c0dc97 ci: trigger fresh woodpecker build 2026-04-20 11:36:17 +00:00
Celes Renata 7a082667b6 ci: retrigger build (webhook restored) 2026-04-20 11:16:33 +00:00
Celes Renata f20c737de3 ci: retrigger after webhook fix 2026-04-20 11:14:44 +00:00
Celes Renata b6ca89742d ci: retrigger build 2026-04-20 11:07:30 +00:00
Celes Renata f1f0b7e34c fix: scale extractor to 3 replicas in paper stage
The extraction queue had 3000+ SEC filings backed up with a single
extractor pod processing them at 10-115s each. Ollama handles
concurrent requests so multiple extractor pods can share the GPU.
2026-04-20 10:59:05 +00:00
Celes Renata de35279269 feat: retry failed extractions button on pipeline page
- POST /api/ops/pipeline/retry-failed endpoint resets extraction_failed
  docs to parsed, deletes failed intelligence rows, and re-enqueues
  them (batch of 200)
- Scheduler now auto-retries extraction_failed docs every ~10 minutes
  (100 per cycle, 60-min cooldown per doc)
- Pipeline page shows 'Retry Failed (N)' button when extraction_failed
  count > 0, with pending/success/error states
2026-04-20 08:09:29 +00:00
Celes Renata 5289f0f195 fix: use kubectl wait for job completion detection in inttest pipeline
The polling loop checked conditions[0].type which missed the Complete
condition when it wasn't at index 0. Switch to kubectl wait
--for=condition=complete which handles condition matching reliably.
2026-04-20 07:47:10 +00:00
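The index-0 problem described in 5289f0f195 is easiest to see against a Job status object. The sketch below is not the pipeline's code (which uses kubectl wait). It is a hypothetical Python helper, `job_outcome`, that scans every condition by type rather than reading only the first one. The sample condition list is an assumed example of a Job whose Complete condition is not at index 0.

```python
from typing import Any, Dict, Optional


def job_outcome(job: Dict[str, Any]) -> Optional[str]:
    """Return 'Complete' or 'Failed' if any condition reports it, else None."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        # Match on the condition type, wherever it sits in the list.
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    return None


# Assumed example: the Complete condition is the second entry.
sample_job = {
    "status": {
        "conditions": [
            {"type": "SuccessCriteriaMet", "status": "True"},
            {"type": "Complete", "status": "True"},
        ]
    }
}

# The check described in the commit only looks at the first entry and misses it.
naive_result = sample_job["status"]["conditions"][0]["type"] == "Complete"
robust_result = job_outcome(sample_job)
```

Here `naive_result` is False even though the Job finished, while `robust_result` is "Complete".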
Celes Renata 40d4fd8197 ci: retrigger after woodpecker oauth reauth 2026-04-20 07:29:57 +00:00
Celes Renata eda49be927 ci: retrigger woodpecker after oauth fix 2026-04-20 07:27:43 +00:00
Celes Renata 1c861e3948 ci: trigger woodpecker pipeline 2026-04-20 07:20:34 +00:00
Celes Renata 950ff03f7e fix: join through document_intelligence in patterns endpoint
The inline catalyst_type query in GET /api/patterns/{ticker} referenced
dir.document_id which does not exist on document_impact_records. The
table links to documents via intelligence_id -> document_intelligence ->
document_id. Added the missing JOIN to match the pattern used in
_SELF_PATTERN_QUERY.
2026-04-20 07:12:13 +00:00
Celes Renata 5acb2fb43e fix: resolve 6 integration test failures
1. patterns endpoint: fix query referencing non-existent column
   di.catalyst_type → dir.catalyst_type (column is on document_impact_records)
2. lockouts seed: use relative timestamps (now + 7d) so active lockout
   is always in the future regardless of when tests run
3. create_agent: make slug optional with auto-generation from name
4. create_source: json.dumps(config) + ::jsonb cast for asyncpg JSONB compat
5. approval_expiry: return count as int (len(expired)) not the list itself
6. metrics_consistency: fix test assertion to match API contract
   (total >= active + reserve, not total == active + reserve + unrealized)
2026-04-20 04:30:13 +00:00
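Item 2 of 5acb2fb43e (relative timestamps in the lockouts seed) can be illustrated with a short sketch. The function names and the shape of the seed row are hypothetical. Only the "now + 7d" offset comes from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def seed_active_lockout(now: Optional[datetime] = None, days: int = 7) -> Dict[str, datetime]:
    """Build a lockout whose expiry is relative to the time the seed runs."""
    now = now or datetime.now(timezone.utc)
    return {"starts_at": now, "expires_at": now + timedelta(days=days)}


def is_lockout_active(lockout: Dict[str, datetime], now: Optional[datetime] = None) -> bool:
    """A lockout is active while the current time is before its expiry."""
    now = now or datetime.now(timezone.utc)
    return lockout["starts_at"] <= now < lockout["expires_at"]
```

A hard-coded expiry date eventually falls into the past and the "active lockout" test starts failing. An expiry computed from the seed time stays in the future for any run within the 7-day window.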
Celes Renata 422326bf83 fix: inttest pipeline timeout and busybox grep compatibility
- Poll job status instead of kubectl wait (catches Failed condition
  immediately instead of waiting 600s for Complete that never comes)
- Replace grep -oP (Perl regex) with POSIX grep -o (BusyBox compat)
2026-04-20 04:23:19 +00:00
Celes Renata a735e569b2 fix: run steps in parallel within each workflow
All build steps and test steps now have depends_on: [] so they
execute concurrently within their workflow instead of sequentially.
2026-04-20 04:07:26 +00:00
Celes Renata 40eacdf8d2 ci: trigger pipeline with multi-workflow config 2026-04-20 04:04:04 +00:00
Celes Renata c81e17f527 fix: split pipeline into 4 workflows for cross-node scheduling
Multi-workflow with local-path storage + mismatchLabelKeys anti-affinity
forces each workflow onto a different cluster node:
- test: lint + pytest + vitest (node A)
- build-1: scheduler, symbol-registry, ingestion, parser (node B)
- build-2: extractor, aggregation, recommendation, risk (node C)
- build-3: broker-adapter, lake-publisher, query-api, trading-engine, dashboard, superset (node D)
- finalize: integration-test + github mirror (any available node)
2026-04-20 03:31:50 +00:00
Celes Renata 9850dc45b1 fix: use longhorn-rwx storage for cross-node pipeline scheduling
- StorageClass longhorn-rwx with hard NFS mount (no softerr)
- Prevents Remote I/O errors from share-manager startup race
- RWX allows steps to spread across all 4 cluster nodes
- podAntiAffinity ensures different workflows use different nodes
2026-04-20 03:23:54 +00:00
Celes Renata a2f76cea96 ci: trigger pipeline to validate node spreading 2026-04-20 02:54:14 +00:00
Celes Renata 8e90b49f78 fix: add resource requests and scheduling docs to woodpecker pipeline
- Add backend_options.kubernetes.resources to all build/test steps
- Helps scheduler spread pods across cluster nodes (works with
  agent-level WOODPECKER_BACKEND_K8S_POD_AFFINITY anti-affinity)
- Build steps: 1Gi/1000m requests, 2Gi/4000m limits
- Test steps: 256-512Mi/200-500m requests
2026-04-20 02:48:47 +00:00
Celes Renata 898f89926d feat: beta API integration test suite — 85 new tests across 6 modules
Extends integration test coverage from 108 to 193 tests for the beta gate.

New test modules:
- test_query_api_extended.py (33 tests): documents, evidence, macro/competitive, ops/admin, agents, analytics
- test_registry_write_paths.py (16 tests): write paths, validation, duplicates, competitor/exposure CRUD
- test_risk_approval_lifecycle.py (8 tests): evaluation edge cases, full approval lifecycle
- test_trading_extended.py (12 tests): config round-trips, decision filtering, override validation
- test_cross_service_roundtrip.py (4 tests): cross-service data consistency
- test_error_handling.py (12 tests): 404s, 422s, empty states, health checks

Seed script extended with watchlists, approvals, lockouts, notifications,
ingestion runs, saved queries, and daily risk snapshots.
2026-04-20 02:34:19 +00:00
Celes Renata 8f67d326c9 feat: derive POSTGRES_DB and Redis prefix from DEPLOY_STAGE for pipeline isolation 2026-04-20 01:33:14 +00:00
Celes Renata d64ce82649 fix: scheduler timezone-aware datetime subtraction in is_source_due 2026-04-20 00:47:26 +00:00
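The commit above does not show its change, so the following is only a sketch of the general problem it names: Python raises TypeError when a timezone-aware datetime is subtracted from a naive one (or vice versa). The signature of `is_source_due`, its parameters, and the choice to treat naive values as UTC are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_source_due(last_run: Optional[datetime], interval: timedelta,
                  now: Optional[datetime] = None) -> bool:
    """Return True when at least `interval` has elapsed since `last_run`."""
    now = now or datetime.now(timezone.utc)
    if last_run is None:
        return True  # never run before: due immediately
    if last_run.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        last_run = last_run.replace(tzinfo=timezone.utc)
    # Both operands are now timezone-aware, so the subtraction is valid.
    return now - last_run >= interval
```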
Celes Renata f3aac0ac3d fix: superset config uses POSTGRES_DB and REDIS_DB env vars for stage isolation 2026-04-19 23:49:11 +00:00
Celes Renata 0f2f0460a6 fix: dedicated scheduler Dockerfile with psql for migrations, remove Python splitter 2026-04-19 23:35:00 +00:00
Celes Renata 48fed18078 feat: per-stage PostgreSQL users for database isolation (stonks_beta, stonks_paper) 2026-04-19 23:17:22 +00:00
Celes Renata 47f10cd3cf fix: use Python asyncpg migration runner instead of psql, remove postgresql-client from image 2026-04-19 22:54:01 +00:00
Celes Renata 4d2adaa9e5 fix: allow infra/migrations in .dockerignore, add psql + migrations to Docker image 2026-04-19 22:51:24 +00:00
Celes Renata 021efba294 feat: auto-run migrations via psql init container on scheduler startup 2026-04-19 22:37:50 +00:00
Celes Renata 5c63264393 feat: stage-isolated infrastructure — separate Postgres DBs, Redis DBs, and MinIO bucket prefixes per stage 2026-04-19 22:20:03 +00:00
Celes Renata 2621b3c5c5 feat: add stage-specific ingress hostnames for beta and paper 2026-04-19 22:00:47 +00:00
Celes Renata 651ef838ce fix: add Argo Rollouts install, secrets seeding, and Kargo admin password fix to runmefirst.sh 2026-04-19 21:58:48 +00:00
Celes Renata 4425a023d9 fix: use correct argocd-update sources schema to pin image SHA tags 2026-04-19 21:16:31 +00:00
Celes Renata e5ed2c21a3 fix: pin image SHA tags in Kargo promotions, 1min warehouse poll, auto-promote paper 2026-04-19 20:54:02 +00:00
Celes Renata 827be709df fix: use Recreate strategy for hive-metastore and superset (RWO PVC) 2026-04-19 20:41:22 +00:00