Commit Graph

483 Commits

Author SHA1 Message Date
Celes Renata 7071bba92d fix: increase stale threshold to 4h to prevent duplicate enqueuing
The 30-minute threshold was shorter than the queue drain time, causing
the recovery sweep to re-enqueue docs that were already queued but not
yet processed. Bumped to 4 hours with matching marker TTL.
2026-04-20 18:05:30 +00:00
Celes Renata 20faa8e20d fix: bake secrets into values-paper.yaml and auto-seed on empty DB
- All paper stage credentials now in values-paper.yaml so ArgoCD
  renders them correctly on every sync (no more empty secrets)
- Added seed-if-empty init container to scheduler: runs the seed
  script if the companies table is empty after migrations
2026-04-20 17:40:41 +00:00
Celes Renata 46c24aefab fix: prevent duplicate queue entries with Redis SET markers
Recovery sweeps and the retry endpoint now check a per-document Redis
key (SET NX, 1h TTL) before pushing to the queue. If the marker exists,
the doc is already enqueued and gets skipped. This prevents the
scheduler from re-enqueuing the same parsed docs every 5 minutes.
2026-04-20 17:24:53 +00:00
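
The marker check described above can be sketched as follows. This is a minimal illustration, not the project's code: `FakeRedis`, `marker_key`, `enqueue_once`, and the key layout are hypothetical names, and the in-memory class only imitates the `set(key, value, nx=True, ex=ttl)` call that a redis-py client provides.

```python
import time
from typing import Optional


class FakeRedis:
    """In-memory stand-in that imitates redis-py's set(..., nx=True, ex=...)."""

    def __init__(self) -> None:
        self._expiry = {}  # key -> absolute expiry time (monotonic seconds)

    def set(self, key: str, value: str, nx: bool = False, ex: Optional[int] = None):
        now = time.monotonic()
        expires_at = self._expiry.get(key)
        live = expires_at is not None and expires_at > now
        if nx and live:
            return None  # redis-py returns None when NX blocks the write
        self._expiry[key] = now + ex if ex is not None else float("inf")
        return True


MARKER_TTL_SECONDS = 3600  # 1h TTL, as stated in the commit message


def marker_key(doc_id: str) -> str:
    # Hypothetical key layout; the real key name is not shown in the log.
    return f"stonks:enqueued:{doc_id}"


def enqueue_once(client, queue: list, doc_id: str) -> bool:
    """Push doc_id only if no live marker exists. Returns True if pushed."""
    if client.set(marker_key(doc_id), "1", nx=True, ex=MARKER_TTL_SECONDS):
        queue.append(doc_id)
        return True
    return False  # marker present: the doc is already enqueued, skip it
```

Calling `enqueue_once` twice for the same document pushes it once; the second call is skipped until the marker expires.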
Celes Renata 288c5333b5 fix: use queue_key() for stage-prefixed Redis queue names in pipeline endpoints
The pipeline health, SSE stream, and retry endpoints were hardcoding
'stonks:queue:{name}' but services use DEPLOY_STAGE prefix
('stonks:paper:queue:{name}'). Now uses queue_key() from redis_keys.py.
2026-04-20 13:16:11 +00:00
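
A helper along these lines would produce the two key shapes quoted above. The signature, the environment lookup, and the unprefixed fallback are assumptions; the actual `queue_key()` in redis_keys.py is not shown in the log.

```python
import os
from typing import Optional


def queue_key(name: str, stage: Optional[str] = None) -> str:
    """Build a queue key such as 'stonks:paper:queue:<name>'."""
    # Assumption: the stage comes from DEPLOY_STAGE when not passed explicitly.
    if stage is None:
        stage = os.environ.get("DEPLOY_STAGE", "")
    if stage:
        return f"stonks:{stage}:queue:{name}"
    # Assumption: with no stage set, fall back to the unprefixed form.
    return f"stonks:queue:{name}"
```

Routing every endpoint through one helper keeps readers and writers of a queue on the same key, which is the mismatch this commit fixes.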
Celes Renata 740ddc1c54 fix: revert extractor to 1 replica (single GPU bottleneck) 2026-04-20 12:16:51 +00:00
Celes Renata 7fc54a6023 ci: retrigger with docker hub token 2026-04-20 11:44:43 +00:00
Celes Renata 5b07821ed4 ci: retrigger with all secrets 2026-04-20 11:40:32 +00:00
Celes Renata 8218a17d23 ci: retrigger with secrets configured 2026-04-20 11:38:43 +00:00
Celes Renata 9dc36c32bc ci: retrigger with trusted repo 2026-04-20 11:37:06 +00:00
Celes Renata 25f8c0dc97 ci: trigger fresh woodpecker build 2026-04-20 11:36:17 +00:00
Celes Renata 7a082667b6 ci: retrigger build (webhook restored) 2026-04-20 11:16:33 +00:00
Celes Renata f20c737de3 ci: retrigger after webhook fix 2026-04-20 11:14:44 +00:00
Celes Renata b6ca89742d ci: retrigger build 2026-04-20 11:07:30 +00:00
Celes Renata f1f0b7e34c fix: scale extractor to 3 replicas in paper stage
The extraction queue had 3000+ SEC filings backed up with a single
extractor pod processing them at 10-115s each. Ollama handles
concurrent requests so multiple extractor pods can share the GPU.
2026-04-20 10:59:05 +00:00
Celes Renata de35279269 feat: retry failed extractions button on pipeline page
- POST /api/ops/pipeline/retry-failed endpoint resets extraction_failed
  docs to parsed, deletes failed intelligence rows, and re-enqueues
  them (batch of 200)
- Scheduler now auto-retries extraction_failed docs every ~10 minutes
  (100 per cycle, 60-min cooldown per doc)
- Pipeline page shows 'Retry Failed (N)' button when extraction_failed
  count > 0, with pending/success/error states
2026-04-20 08:09:29 +00:00
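
The scheduler-side retry rule (100 per cycle, 60-minute cooldown per doc) might look roughly like this. The function name, the input shape, and the oldest-first ordering are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

RETRY_BATCH = 100                       # docs per cycle, from the commit message
RETRY_COOLDOWN = timedelta(minutes=60)  # per-doc cooldown, from the commit message


def select_retry_batch(failed_docs, now, limit=RETRY_BATCH, cooldown=RETRY_COOLDOWN):
    """Return ids of failed docs whose last attempt is older than the cooldown.

    failed_docs: iterable of (doc_id, last_attempt_at) tuples (hypothetical shape).
    """
    eligible = [
        (last_attempt, doc_id)
        for doc_id, last_attempt in failed_docs
        if now - last_attempt >= cooldown
    ]
    eligible.sort()  # assumption: retry the oldest attempts first
    return [doc_id for _, doc_id in eligible[:limit]]
```

Timezone-aware datetimes should be used for both `now` and the stored attempt times so the subtraction is well defined.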
Celes Renata 5289f0f195 fix: use kubectl wait for job completion detection in inttest pipeline
The polling loop checked conditions[0].type, which missed the Complete
condition when it wasn't at index 0. Switch to kubectl wait
--for=condition=complete, which matches the condition by type.
2026-04-20 07:47:10 +00:00
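
To illustrate the index-0 bug, the sketch below scans every entry in a Job's `status.conditions` list rather than reading only the first. It is an alternative to `kubectl wait`, not what the commit implements; `job_state` is a hypothetical helper operating on the parsed `.status` object.

```python
def job_state(status: dict) -> str:
    """Return 'complete', 'failed', or 'running' from a Job's .status dict."""
    for cond in status.get("conditions") or []:
        if cond.get("status") != "True":
            continue  # a condition only counts while its status is "True"
        if cond.get("type") == "Complete":
            return "complete"
        if cond.get("type") == "Failed":
            return "failed"
    return "running"
```

A check on `conditions[0]["type"]` alone would report a finished Job as still running whenever another condition type is listed first.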
Celes Renata 40d4fd8197 ci: retrigger after woodpecker oauth reauth 2026-04-20 07:29:57 +00:00
Celes Renata eda49be927 ci: retrigger woodpecker after oauth fix 2026-04-20 07:27:43 +00:00
Celes Renata 1c861e3948 ci: trigger woodpecker pipeline 2026-04-20 07:20:34 +00:00
Celes Renata 950ff03f7e fix: join through document_intelligence in patterns endpoint
The inline catalyst_type query in GET /api/patterns/{ticker} referenced
dir.document_id which does not exist on document_impact_records. The
table links to documents via intelligence_id -> document_intelligence ->
document_id. Added the missing JOIN to match the pattern used in
_SELF_PATTERN_QUERY.
2026-04-20 07:12:13 +00:00
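
The join path can be reproduced on a toy schema. The sketch uses an in-memory SQLite database as a stand-in for the project's PostgreSQL, and the `id` primary-key columns and sample rows are assumptions; only `intelligence_id`, `document_id`, and `catalyst_type` come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_intelligence (id INTEGER PRIMARY KEY, document_id INTEGER);
    CREATE TABLE document_impact_records (
        id INTEGER PRIMARY KEY, intelligence_id INTEGER, catalyst_type TEXT
    );
    INSERT INTO document_intelligence VALUES (10, 501);
    INSERT INTO document_impact_records VALUES (1, 10, 'earnings');
    """
)

# document_impact_records has no document_id column, so the document is
# reached through intelligence_id -> document_intelligence.document_id.
CATALYST_QUERY = """
    SELECT di.document_id, dir.catalyst_type
    FROM document_impact_records AS dir
    JOIN document_intelligence AS di ON di.id = dir.intelligence_id
"""
rows = conn.execute(CATALYST_QUERY).fetchall()
```

Selecting `dir.document_id` directly on this schema raises a "no such column" error, which mirrors the failure the commit describes.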
Celes Renata 5acb2fb43e fix: resolve 6 integration test failures
1. patterns endpoint: fix query referencing non-existent column
   di.catalyst_type → dir.catalyst_type (column is on document_impact_records)
2. lockouts seed: use relative timestamps (now + 7d) so active lockout
   is always in the future regardless of when tests run
3. create_agent: make slug optional with auto-generation from name
4. create_source: json.dumps(config) + ::jsonb cast for asyncpg JSONB compat
5. approval_expiry: return count as int (len(expired)) not the list itself
6. metrics_consistency: fix test assertion to match API contract
   (total >= active + reserve, not total == active + reserve + unrealized)
2026-04-20 04:30:13 +00:00
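
Item 2 above (relative seed timestamps) can be sketched as below. `lockout_seed_window` and the one-day start offset are hypothetical; only the "now + 7d" expiry is taken from the commit message.

```python
from datetime import datetime, timedelta, timezone


def lockout_seed_window(now=None):
    """Return (starts_at, expires_at) for a lockout that is active right now."""
    now = now or datetime.now(timezone.utc)
    starts_at = now - timedelta(days=1)   # assumption: started yesterday
    expires_at = now + timedelta(days=7)  # "now + 7d", per the commit message
    return starts_at, expires_at
```

Because both ends are computed from the current time, the seeded lockout stays active no matter when the test suite runs, unlike a hardcoded date that eventually falls into the past.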
Celes Renata 422326bf83 fix: inttest pipeline timeout and busybox grep compatibility
- Poll job status instead of kubectl wait (catches Failed condition
  immediately instead of waiting 600s for Complete that never comes)
- Replace grep -oP (Perl regex) with POSIX grep -o (BusyBox compat)
2026-04-20 04:23:19 +00:00
Celes Renata a735e569b2 fix: run steps in parallel within each workflow
All build steps and test steps now have depends_on: [] so they
execute concurrently within their workflow instead of sequentially.
2026-04-20 04:07:26 +00:00
Celes Renata 40eacdf8d2 ci: trigger pipeline with multi-workflow config 2026-04-20 04:04:04 +00:00
Celes Renata c81e17f527 fix: split pipeline into 4 workflows for cross-node scheduling
Multi-workflow with local-path storage + mismatchLabelKeys anti-affinity
forces each workflow onto a different cluster node:
- test: lint + pytest + vitest (node A)
- build-1: scheduler, symbol-registry, ingestion, parser (node B)
- build-2: extractor, aggregation, recommendation, risk (node C)
- build-3: broker-adapter, lake-publisher, query-api, trading-engine, dashboard, superset (node D)
- finalize: integration-test + github mirror (any available node)
2026-04-20 03:31:50 +00:00
Celes Renata 9850dc45b1 fix: use longhorn-rwx storage for cross-node pipeline scheduling
- StorageClass longhorn-rwx with hard NFS mount (no softerr)
- Prevents Remote I/O errors from share-manager startup race
- RWX allows steps to spread across all 4 cluster nodes
- podAntiAffinity ensures different workflows use different nodes
2026-04-20 03:23:54 +00:00
Celes Renata a2f76cea96 ci: trigger pipeline to validate node spreading 2026-04-20 02:54:14 +00:00
Celes Renata 8e90b49f78 fix: add resource requests and scheduling docs to woodpecker pipeline
- Add backend_options.kubernetes.resources to all build/test steps
- Helps scheduler spread pods across cluster nodes (works with
  agent-level WOODPECKER_BACKEND_K8S_POD_AFFINITY anti-affinity)
- Build steps: 1Gi/1000m requests, 2Gi/4000m limits
- Test steps: 256-512Mi/200-500m requests
2026-04-20 02:48:47 +00:00
Celes Renata 898f89926d feat: beta API integration test suite — 85 new tests across 6 modules
Extends integration test coverage from 108 to 193 tests for the beta gate.

New test modules:
- test_query_api_extended.py (33 tests): documents, evidence, macro/competitive, ops/admin, agents, analytics
- test_registry_write_paths.py (16 tests): write paths, validation, duplicates, competitor/exposure CRUD
- test_risk_approval_lifecycle.py (8 tests): evaluation edge cases, full approval lifecycle
- test_trading_extended.py (12 tests): config round-trips, decision filtering, override validation
- test_cross_service_roundtrip.py (4 tests): cross-service data consistency
- test_error_handling.py (12 tests): 404s, 422s, empty states, health checks

Seed script extended with watchlists, approvals, lockouts, notifications,
ingestion runs, saved queries, and daily risk snapshots.
2026-04-20 02:34:19 +00:00
Celes Renata 8f67d326c9 feat: derive POSTGRES_DB and Redis prefix from DEPLOY_STAGE for pipeline isolation 2026-04-20 01:33:14 +00:00
Celes Renata d64ce82649 fix: scheduler timezone-aware datetime subtraction in is_source_due 2026-04-20 00:47:26 +00:00
Celes Renata f3aac0ac3d fix: superset config uses POSTGRES_DB and REDIS_DB env vars for stage isolation 2026-04-19 23:49:11 +00:00
Celes Renata 0f2f0460a6 fix: dedicated scheduler Dockerfile with psql for migrations, remove Python splitter 2026-04-19 23:35:00 +00:00
Celes Renata 48fed18078 feat: per-stage PostgreSQL users for database isolation (stonks_beta, stonks_paper) 2026-04-19 23:17:22 +00:00
Celes Renata 47f10cd3cf fix: use Python asyncpg migration runner instead of psql, remove postgresql-client from image 2026-04-19 22:54:01 +00:00
Celes Renata 4d2adaa9e5 fix: allow infra/migrations in .dockerignore, add psql + migrations to Docker image 2026-04-19 22:51:24 +00:00
Celes Renata 021efba294 feat: auto-run migrations via psql init container on scheduler startup 2026-04-19 22:37:50 +00:00
Celes Renata 5c63264393 feat: stage-isolated infrastructure — separate Postgres DBs, Redis DBs, and MinIO bucket prefixes per stage 2026-04-19 22:20:03 +00:00
Celes Renata 2621b3c5c5 feat: add stage-specific ingress hostnames for beta and paper 2026-04-19 22:00:47 +00:00
Celes Renata 651ef838ce fix: add Argo Rollouts install, secrets seeding, and Kargo admin password fix to runmefirst.sh 2026-04-19 21:58:48 +00:00
Celes Renata 4425a023d9 fix: use correct argocd-update sources schema to pin image SHA tags 2026-04-19 21:16:31 +00:00
Celes Renata e5ed2c21a3 fix: pin image SHA tags in Kargo promotions, 1min warehouse poll, auto-promote paper 2026-04-19 20:54:02 +00:00
Celes Renata 827be709df fix: use Recreate strategy for hive-metastore and superset (RWO PVC) 2026-04-19 20:41:22 +00:00
Celes Renata dbd9e74784 fix: add ignoreDifferences for secrets in ArgoCD apps, fix warehouse strategy and Kargo auth annotations 2026-04-19 20:27:31 +00:00
Celes Renata 014ffa2fd2 fix: Kargo promotion pipeline — add AnalysisRun CRD, fix warehouse image strategy, add authorized-stage annotations, remove proxy from ArgoCD 2026-04-19 20:08:46 +00:00
Celes Renata a9be904afe fix: guard ghcr-secret template against nil ghcrAuth values 2026-04-19 19:51:29 +00:00
Celes Renata 886911149f ci: add unshallow fetch and suppress ssh-keyscan stderr in mirror step 2026-04-19 19:46:11 +00:00
Celes Renata 1f69a27e3b fix: replace mktemp with PID-based temp path for BusyBox compat
BusyBox mktemp in alpine/k8s doesn't support a .json suffix in the template.
The mktemp failure triggered set -e, causing pipeline to report failure
despite all 93 tests passing.
2026-04-19 19:35:02 +00:00
Celes Renata 4df513d096 fix: remove bucket-init job, wait for pods before readiness check
- Remove minio-bucket-init Job entirely (seed_minio.py creates bucket)
- Wait for pods to exist before kubectl wait --for=condition=ready
- Fixes 'no matching resources found' race when pods are still ContainerCreating
2026-04-19 19:25:49 +00:00
Celes Renata b2b8aca7c6 fix: inttest runner crash and minio bucket-init proxy issue
- Remove --profiling-output arg from runner.yaml (plugin uses default path)
- Inline profiling hooks in root conftest.py with graceful fallback
- Replace mc-based bucket-init with Python urllib (no proxy interference)
- Add explicit ProxyHandler({}) to guarantee no proxy usage in bucket-init
2026-04-19 19:15:20 +00:00
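
The explicit `ProxyHandler({})` from the last bullet can be shown with the standard library alone. The endpoint and bucket name are hypothetical, and the request is built but not sent; a real MinIO server may also require a signed request, which this sketch does not cover.

```python
import urllib.request


def build_no_proxy_opener() -> urllib.request.OpenerDirector:
    """Opener that ignores proxy settings from the environment."""
    # Passing an empty mapping replaces the default ProxyHandler, which
    # would otherwise pick up variables such as HTTP_PROXY / HTTPS_PROXY.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


opener = build_no_proxy_opener()

# Hypothetical in-cluster endpoint and bucket name, for illustration only.
request = urllib.request.Request("http://minio:9000/example-bucket", method="PUT")

# opener.open(request, timeout=10) would send the request directly to the
# host, bypassing any configured proxy.
```

Using a dedicated opener rather than `urllib.request.urlopen` keeps the no-proxy behaviour local to this code path instead of changing the process-wide default.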