Beta was pointing at stonks_beta DB where tables were owned by postgres
superuser, causing permission denied for the stonks app user. Switch to
sharing stonks_paper DB/user (already has proper grants). DEPLOY_STAGE=beta
still isolates Redis keys and MinIO buckets. Added market data API key
so beta can test ingestion when pipeline is toggled ON.
The plugin-docker-buildx inherits proxy env vars from the pod/node.
Setting http_proxy/https_proxy to empty strings overrides any
inherited proxy config so Docker can reach Harbor directly.
SSL filtering is off on the proxy. The proxy env vars were causing
Docker login failures (proxy intercepting Harbor auth) and pip hash
mismatches (proxy caching stale packages). Keep only the CA cert
mount for any remaining TLS needs.
- pipelineEnabled: true in beta so all pods run (Kargo happy)
- PIPELINE_DEFAULT_OFF=true in beta config — scheduler initializes
the Redis toggle to OFF on first boot
- Shared Ollama (10.1.1.12:2701) between beta and paper
- Flip pipeline ON from the UI when testing, OFF when done
- Optimistic UI update for the toggle button
The Kyverno policy injected HTTP_PROXY into build pods but NO_PROXY
was missing .celestium.life. Docker login to registry.celestium.life
was going through the Squid proxy which does SSL interception,
causing auth failures.
Buildkit resolves registry hostnames using its own resolver which
doesn't use the custom_dns setting. Adding an explicit host entry
ensures registry.celestium.life resolves even when cluster DNS
can't reach the proxy DNS.
kubectl wait fails immediately with 'no matching resources found' if
pods haven't been created yet. Added a poll loop to wait for all 3
infra pods (postgres, redis, minio) to exist before running wait.
Permanent fix for cluster rebuilds:
- OAuth2 client_id/secret baked into woodpecker/values.yaml
- WOODPECKER_AGENT_SECRET shared between server and agents
- runmefirst.sh uses baked creds if present, creates fresh ones only
if values.yaml still has placeholders
- Agents survive DB wipes since they auth via shared secret
Tests complete in ~7s. The 10-minute timeout was causing unnecessary
wait time on failures. Reduced Job activeDeadlineSeconds and kubectl
wait timeout to 300s.
- rec['mode'] can be 'autonomous' (not just informational/paper/live)
- risk check uses 'check_name'/'result' not 'name'/'passed'
- decision type can be 'execute' not just 'act'/'skip'
- Added pipelineEnabled flag to Helm values (default: true)
- Worker services (scheduler, ingestion, parser, extractor, aggregation,
recommendation, broker-adapter, lake-publisher) scale to 0 when disabled
- API services always run regardless of toggle
- Redis-based runtime toggle: POST /api/ops/pipeline/toggle
- Scheduler checks the flag before each cycle
- Frontend: green/red Pipeline ON/OFF button on the pipeline page
- Beta defaults to pipelineEnabled: false
- Base values.yaml: blanked external URLs (Ollama, Polygon, Alpaca)
so stages only connect to what they explicitly configure
Base values.yaml now has empty OLLAMA_BASE_URL, MARKET_DATA_BASE_URL,
and BROKER_PROVIDER. Only paper (and eventually live) set the real
URLs. Beta inherits empty defaults so it can't reach external services.
Beta is for API testing only. Scale scheduler, ingestion, parser,
extractor, aggregation, recommendation, broker-adapter, and
lake-publisher to 0 replicas. Blank out Polygon and Alpaca keys.
Infra secrets (postgres, redis, minio) kept so API services work.
Beta is for API testing only. Blanked out Polygon/Alpaca/Ollama
credentials, set OLLAMA_BASE_URL to localhost:99999, and scaled
scheduler/ingestion/parser/extractor/aggregation/recommendation/
broker-adapter/lake-publisher to 0 replicas.
The 30-minute threshold was shorter than the queue drain time, causing
the recovery sweep to re-enqueue docs that were already queued but not
yet processed. Bumped to 4 hours with matching marker TTL.
- All paper stage credentials now in values-paper.yaml so ArgoCD
renders them correctly on every sync (no more empty secrets)
- Added seed-if-empty init container to scheduler: runs the seed
script if the companies table is empty after migrations
Recovery sweeps and the retry endpoint now check a per-document Redis
key (SET NX, 1h TTL) before pushing to the queue. If the marker exists,
the doc is already enqueued and gets skipped. This prevents the
scheduler from re-enqueuing the same parsed docs every 5 minutes.
The pipeline health, SSE stream, and retry endpoints were hardcoding
'stonks:queue:{name}' but services use DEPLOY_STAGE prefix
('stonks:paper:queue:{name}'). Now uses queue_key() from redis_keys.py.
The extraction queue had 3000+ SEC filings backed up with a single
extractor pod processing them at 10-115s each. Ollama handles
concurrent requests so multiple extractor pods can share the GPU.
- POST /api/ops/pipeline/retry-failed endpoint resets extraction_failed
docs to parsed, deletes failed intelligence rows, and re-enqueues
them (batch of 200)
- Scheduler now auto-retries extraction_failed docs every ~10 minutes
(100 per cycle, 60-min cooldown per doc)
- Pipeline page shows 'Retry Failed (N)' button when extraction_failed
count > 0, with pending/success/error states