- All paper stage credentials now in values-paper.yaml so ArgoCD
renders them correctly on every sync (no more empty secrets)
- Added seed-if-empty init container to scheduler: runs the seed
script if the companies table is empty after migrations
Recovery sweeps and the retry endpoint now check a per-document Redis
key (SET NX, 1h TTL) before pushing to the queue. If the marker exists,
the doc is already enqueued and gets skipped. This prevents the
scheduler from re-enqueuing the same parsed docs every 5 minutes.
The pipeline health, SSE stream, and retry endpoints were hardcoding
'stonks:queue:{name}' but services use DEPLOY_STAGE prefix
('stonks:paper:queue:{name}'). Now uses queue_key() from redis_keys.py.
The extraction queue had 3000+ SEC filings backed up with a single
extractor pod processing them at 10-115s each. Ollama handles
concurrent requests so multiple extractor pods can share the GPU.
- POST /api/ops/pipeline/retry-failed endpoint resets extraction_failed
docs to parsed, deletes failed intelligence rows, and re-enqueues
them (batch of 200)
- Scheduler now auto-retries extraction_failed docs every ~10 minutes
(100 per cycle, 60-min cooldown per doc)
- Pipeline page shows 'Retry Failed (N)' button when extraction_failed
count > 0, with pending/success/error states
The polling loop checked conditions[0].type which missed the Complete
condition when it wasn't at index 0. Switch to kubectl wait
--for=condition=complete which handles condition matching reliably.
The inline catalyst_type query in GET /api/patterns/{ticker} referenced
dir.document_id which does not exist on document_impact_records. The
table links to documents via intelligence_id -> document_intelligence ->
document_id. Added the missing JOIN to match the pattern used in
_SELF_PATTERN_QUERY.
1. patterns endpoint: fix query referencing non-existent column
di.catalyst_type → dir.catalyst_type (column is on document_impact_records)
2. lockouts seed: use relative timestamps (now + 7d) so active lockout
is always in the future regardless of when tests run
3. create_agent: make slug optional with auto-generation from name
4. create_source: json.dumps(config) + ::jsonb cast for asyncpg JSONB compat
5. approval_expiry: return count as int (len(expired)) not the list itself
6. metrics_consistency: fix test assertion to match API contract
(total >= active + reserve, not total == active + reserve + unrealized)
- Poll job status instead of kubectl wait (catches Failed condition
immediately instead of waiting 600s for Complete that never comes)
- Replace grep -oP (Perl regex) with POSIX grep -o (BusyBox compat)
- StorageClass longhorn-rwx with hard NFS mount (no softerr)
- Prevents Remote I/O errors from share-manager startup race
- RWX allows steps to spread across all 4 cluster nodes
- podAntiAffinity ensures different workflows use different nodes
BusyBox mktemp in alpine/k8s doesn't support .json suffix in template.
The mktemp failure triggered set -e, causing pipeline to report failure
despite all 93 tests passing.
- Remove minio-bucket-init Job entirely (seed_minio.py creates bucket)
- Wait for pods to exist before kubectl wait --for=condition=ready
- Fixes 'no matching resources found' race when pods are still ContainerCreating